Babelfish: SQL Server-to-Postgres Translation Layer (aws.amazon.com)

288 points by ke4qqq 5 years ago | 113 comments

Don't hate me for it, but I'd like this for MySQL to postgres too. At least as a stepping stone.

Use case: some of my SQL syntax depends on MySQL but I realize I made a poor life choice and would rather have transactional DDL and a myriad of better features on postgres.

lukaseder 5 years ago | |

Use https://www.jooq.org/translate/

victor106 5 years ago | | |

This is so awesome. Wish I found it earlier. Thanks for posting

pletnes 5 years ago | |

For SQL server, AWS can save their customers money by cutting license cost (to MS). MySQL is already free so I don’t see how they can benefit from such a project.

bbatha 5 years ago | | |

They can go one better — they can fork the parsers out of both projects and run it against a common query planner/storage engine (aurora).

jcriddle4 5 years ago | | |

If you have several thousand stored procs and triggers, then not having to rewrite all of that is likely over a million in savings.

etaioinshrdlu 5 years ago | | |

I agree, but there might be a creative way to do it that increases customer happiness and still is a decent business.

ksec 5 years ago | | |

>don’t see how they can benefit from such a project

I thought there was bad blood between Oracle and Amazon?

linsomniac 5 years ago | |

You might be able to get away with staying with MySQL or a variant. Years ago I did an "interview problem" to generate a report of some accounting data of a sample data set in MySQL. The details are foggy now, but I ended up doing it all using advanced SQL features that I thought only existed in PostgreSQL (you know, or Oracle), but whatever engine I ended up using had it as well. It was probably the latest Percona or Maria?

Conlectus 5 years ago | | |

MySQL has worse guarantees in several cases, though. For example, using `SERIALIZABLE` isolation means your queries can deadlock when run in parallel.

Likewise, selecting a column not present in a GROUP BY leads to random values being returned.
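
To make the GROUP BY point concrete, here is a small sketch. It runs against an in-memory SQLite database through Python's `sqlite3` module only so that it is self-contained; the `orders` table and its rows are invented for illustration. The ambiguous form is left as a comment, and the query that actually runs aggregates every non-grouped column, so the result no longer depends on which row the engine happens to pick.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, item TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 'pen', 3), ('ann', 'ink', 7), ('bob', 'pad', 5);
""")

# Ambiguous: `item` is neither grouped nor aggregated, so which of ann's
# items comes back is left to the engine.
#   SELECT customer, item, SUM(amount) FROM orders GROUP BY customer;

# Unambiguous: every selected column is either grouped or aggregated.
rows = conn.execute("""
    SELECT customer, MIN(item), SUM(amount)
    FROM orders
    GROUP BY customer
    ORDER BY customer
""").fetchall()
print(rows)  # [('ann', 'ink', 10), ('bob', 'pad', 5)]
```

As far as I know, MySQL's `ONLY_FULL_GROUP_BY` SQL mode rejects the ambiguous form outright instead of returning an arbitrary value.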

gavinray 5 years ago | |

pgloader does this (and much much more) in a single command:

https://github.com/dimitri/pgloader

  pgloader \
    mysql://user:password@mysql:3306/database \
    postgresql://postgres:root@localhost:5432/postgres

This is a gem of a tool. Surprised nobody has mentioned it. Supports SQLite and MS-SQL too.

https://pgloader.readthedocs.io/en/latest/ref/mysql.html#

acmecorps 5 years ago | | |

Absolutely love pgloader! So painless and quite straightforward. Saved me lots of time migrating my production db. Highly, highly recommend if anyone wants to migrate from MySQL to PG!

robocat 5 years ago | |

However if the translation layer is designed for consistency, like Babelfish, it would have to replicate many of the MySQL “faults” too... e.g. it would need to have the same non-transactional DDL so it acted “correctly”!

And it would need to replicate the specialised data types like utf8mb3.

To use PostgreSQL features (such as JSON datatype) would presumably need a separate data connection and transaction, because if it isn’t implemented by MySQL, then the syntax wouldn’t be supported by Babelfish.

You would get some wins, but you don’t get a blend of the best of both.

etaioinshrdlu 5 years ago | | |

It would be nice if it could aim for doing the sane thing over exact MySQL compatibility. But I agree it would be a minefield...

lutoma 5 years ago | |

I would love that. There is so much mediocre but popular software out there that uses hard-coded MySQL queries instead of some abstraction layer.

Having to use MySQL/MariaDB after usually dealing with PostgreSQL always feels like such a downgrade.

brian_herman 5 years ago | |

https://aws.amazon.com/about-aws/whats-new/2016/04/aws-schem...

touisteur 5 years ago | |

Can't you do it with a foreign data wrapper or the equivalent in mysql? Maybe keep your mysql-specific queries as views in mysql and call them from pg?

Just spitballing, sorry, I love FDWs.

ziftface 5 years ago | |

Something like this would solve a lot of my problems

leesalminen 5 years ago | | |

I’d considered migrating MySQL to Postgres on a 200 table production app more than once. Couldn’t find any good tooling at the time so I just sucked it up and lived with my life choices.

The_rationalist 5 years ago | |

Why not migrate to MariaDB? It's better than MySQL and has easy and great clustering, unlike Postgres.

joshuaellinger 5 years ago |

This is great. I suspect that I have exactly the right use case for this.

The two main issues with running SQL Server are (1) you have to license all the cores on a system and (2) the standard license only recognizes up to 64GB RAM. So I actually wound up buying a 3GHz single-socket system for around $10K to save $20K on the SQL license.

With this, I can move a couple of the big DBs to another system that has 32 Cores with 256GB RAM and the entire DB will fit in memory, put in 5GB ethernet, and gain a tremendous amount of performance.

But, more importantly, I can migrate the workload on a case-by-case basis. Human costs always dwarf my software and hardware costs.

sqlserver1 5 years ago | |

(2) is not accurate. SQL Server Standard Edition supports up to 128 GB RAM

https://docs.microsoft.com/en-us/sql/sql-server/editions-and...

ed25519FUUU 5 years ago | | |

For the highest socket license.

c17r 5 years ago | |

I still remember the day that MS switched the SQL Server pricing from per-socket to per-core. Dark day, indeed.

orf 5 years ago |

> A commonly used datatype to store monetary values is the MONEY data type. In SQL Server, the MONEY data type’s behavior is fixed using four digits to the right of the decimal (e.g., $12.8123). However, in PostgreSQL, the MONEY data type is fixed using two digits to the right of the decimal.

> So, when the application tries to store a value of $12.8123, by example, PostgreSQL will round to $12.81. This subtle difference will result in a rounding error and break an application if not correctly addressed. To ensure correctness in Babelfish, we need to ensure such differences, small and large, are handled with absolute fidelity.

How are they going to solve this with just a query translation layer? Isn't information lost on save?

dragonwriter 5 years ago | |

> How are they going to solve this with just a query translation layer?

Well, the translation layer isn't just a query (DQL) translation layer, it's an SQL translation layer including DDL, DML, etc. Since both Postgres MONEY and SQL Server MONEY are 8-byte, fixed-precision decimal types, with the only difference being the position of the implicit decimal, a translation layer can use one as the backing store for something that is logically treated as the other without data loss, though it will have to be aware of the difference when presenting data and also when doing conversions to other datatypes, doing math other than addition/subtraction, etc.

It would be even easier, I think, to just use, what, DECIMAL(19,4) in Postgres for SQL Server MONEY, with some special handling to have the right failure behavior at the edge of the slightly-narrower range of the SQL Server MONEY type.
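
A minimal sketch of the backing-store idea, in Python and not anything Babelfish is known to ship: keep the value as a 64-bit integer count of ten-thousandths, so $12.8123 survives a round trip, whereas rounding to two places first does not. The helper names `to_scaled` and `from_scaled` are made up for illustration, and the range check stands in for the "right failure behavior" at the edge of the 8-byte range.

```python
from decimal import Decimal, ROUND_HALF_UP

SCALE = 4  # SQL Server MONEY keeps four fractional digits
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def to_scaled(value: str) -> int:
    """Convert a decimal string to an integer count of 1/10000 units."""
    scaled = Decimal(value).scaleb(SCALE).to_integral_value(rounding=ROUND_HALF_UP)
    n = int(scaled)
    if not INT64_MIN <= n <= INT64_MAX:
        raise OverflowError(f"{value} is outside the 8-byte MONEY range")
    return n

def from_scaled(n: int) -> Decimal:
    """Convert the stored integer back to a four-digit decimal."""
    return Decimal(n).scaleb(-SCALE)

stored = to_scaled("12.8123")
print(stored)               # 128123
print(from_scaled(stored))  # 12.8123

# Rounding to two fractional digits first is where the information is lost.
print(Decimal("12.8123").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 12.81
```

With this layout the largest storable value is 922337203685477.5807, which is narrower than what DECIMAL(19,4) can hold, hence the explicit overflow check.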

arusahni 5 years ago | | |

One can also change the precision of PostgreSQL's money type. From the docs:

> The fractional precision is determined by the database's lc_monetary setting.

radiowave 5 years ago | |

My guess would be: by not using Postgres's money type.

Tostino 5 years ago | |

I'd say the best solution here is if a really nice money data type was developed that is configurable as necessary to hopefully satisfy the MS SQL idiosyncrasies while also maybe supporting a bit richer superset of desirable features you'd want in a money type. You could even have things like temporal currency conversion type queries if you had the right feature set.

OJFord 5 years ago | |

Presumably by not storing SQL Server's MONEYs in pg MONEYs, but CASTing to a pg MONEY if pg asks for it.

bdcravens 5 years ago |

As someone who has been fighting with a SQL Server to PostgreSQL conversion, this sounds AMAZING. Too bad it won't be available before my conversion is complete (and if it is, that's an even sadder proposition).

PeterZaitsev 5 years ago |

This would be even greater news if it were not vaporware: "Babelfish for PostgreSQL will be available on Github in 2021." https://babelfish-for-postgresql.github.io/babelfish-for-pos...

xupybd 5 years ago |

I hope this means I can finally connect postgres to excel with the same ease I can connect SQL server.

pletnes 5 years ago | |

What does “connect” mean in this context? Read data from SQL server into an excel spreadsheet?

pc86 5 years ago | | |

Not the GP but yes I've seen a lot of folks go through the Data tab in Excel and connect to a SQL database to display data directly.

linuxhiker 5 years ago |

This is interesting because it will also help Sybase migrations. SQL Server is the "brand name" but there are still a lot of people stuck on Sybase.

JonathonW 5 years ago | |

Will it? MSSQL and Sybase diverged somewhere around 27 years ago; anything that Sybase and Microsoft did differently since then would likely be completely incompatible.

Svip 5 years ago | | |

As someone who converted a 400'000+ LoC Sybase SQL codebase to Microsoft SQL Server about 5-6 years ago: they did diverge after around 2000, but not by a lot. We ended up with a codebase that could support both Sybase and MSSQL; where differences occurred, which wasn't that often, we essentially used compiler directives.

throwaway201103 5 years ago | | |

I would guess at least 80% of common SQL and T-SQL from Sybase is still completely compatible with SQL Server.

As a side note, I had no idea Sybase still existed at all. Looks like it's now part of SAP's portfolio.

lukaseder 5 years ago | |

The two dialects are very different today

beoberha 5 years ago |

Sounds like they’re open sourcing it to get some help on it. I have to wonder if they’ve found it not worth the time to make it fully production ready.

justizin 5 years ago |

i swear the fuck to god if one more piece of technology is called babelfish i am going to find out who is responsible and toilet paper their house.

technion 5 years ago | |

I currently support the following products named "Integrity":

- Law firm management software
- Document management software
- A DVR appliance

Previously I supported HPE Integrity hardware. Certain names just seem way overused.

justizin 5 years ago | |

imagine how hilarious douglas adams would find it to try and google babelfish today lol.

temp667 5 years ago |

Why not do this for Oracle? I've not found SQL Server to be too bad compared to the crazy Oracle stuff (light experience only - maybe bigger players have it worse?).

edoceo 5 years ago |

Here's ms2pg: https://edoceo.com/dev/ms2pg

A tool I made and used over a decade ago when migrating a bunch of stuff

kentbrew 5 years ago |

Ancient muscle memory completes the URL thusly:

babelfish.altavista.digital.com

statictype 5 years ago |

I tried to build a T-Sql-to-pgsql compiler to enable us to migrate our code but ran into some fundamental issues.

Sql Server allows you to have arbitrary statements/declarations embedded in your sql queries. It also doesn't require type information to be specified in many places.

How does this translator get around that?

For example, if I have this bit of unoptimized T-Sql:

   declare @m int
   select @m=[MeterID] from EnergyMeters where MeterLocation='/a/b/c';

   select sum([Value]) from EnergyData where [MeterID]=@m;

How would this get translated to pgsql? (Yes, you can combine this specific statement into a single query - this is a trivial example to highlight the point)

dragonwriter 5 years ago | |

If you are just binding variables with early (before the last one that returns data) selects like that, just turning them into subqueries or factoring them out to CTEs works, which should be reasonably straightforward mechanically.
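
A rough sketch of that rewrite, using the table and column names from the parent comment. It runs against an in-memory SQLite database through Python's `sqlite3` module purely so that it is self-contained; the sample rows are invented, and nothing here is claimed to be how Babelfish itself performs the translation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EnergyMeters (MeterID INTEGER PRIMARY KEY, MeterLocation TEXT);
    CREATE TABLE EnergyData (MeterID INTEGER, Value REAL);
    INSERT INTO EnergyMeters VALUES (1, '/a/b/c'), (2, '/x/y/z');
    INSERT INTO EnergyData VALUES (1, 10.0), (1, 2.5), (2, 99.0);
""")

# The T-SQL variable @m becomes a CTE that the final SELECT reads from.
CTE_QUERY = """
    WITH m AS (
        SELECT MeterID FROM EnergyMeters WHERE MeterLocation = ?
    )
    SELECT SUM(Value) FROM EnergyData
    WHERE MeterID = (SELECT MeterID FROM m)
"""

total = conn.execute(CTE_QUERY, ("/a/b/c",)).fetchone()[0]
print(total)  # 12.5
```

The scalar subquery assumes the location matches at most one meter; a mechanical translator would have to decide what to do when the original variable assignment could see several rows or none.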

statictype 5 years ago |

This sounds like it could be a massive competitive advantage for AWS over Azure.

It would be difficult for Microsoft to cannibalize their Azure SQL sales by building a similar translation layer.

fellowniusmonk 5 years ago |

About every 3 years I've tried to migrate using some db migration tool, and it always seems to throw frustrating string/formatting errors. Each time I've smacked my forehead and ended up just grabbing Ruby and ActiveRecord, which always seems to work without any weird parsing errors.

xet7 5 years ago |

1) Is there a MongoDB-to-Postgres translation layer?

2) Is there a converter that can convert the schema and transfer all data from:

2.1) MongoDB to SQLite?

2.2) MongoDB to PostgreSQL?

c17r 5 years ago | |

https://github.com/thomas4019/pgmongo

conroy 5 years ago |

Any idea what language it’s written in?

BrentOzar 5 years ago | |

"Babelfish is written in C, which is the same programming language used to develop PostgreSQL. Some parts of Babelfish are developed using procedural language in PL/pgSQL. Many test cases are written in PL/pgSQL and T-SQL."

conroy 5 years ago | | |

Argh, I scanned the article multiple times and missed that section. Thank you