Notes on Migrating from MS SQL Server to PostgreSQL
I'm not a database guru or SQL sherpa, but from time to time I do fill the role of de facto DBA. The following are some notes and observations from a recent, successful, migration from Microsoft SQL Server to PostgreSQL. Is it everything you need to know? Absolutely not. But there are some points and resources that will hopefully be helpful.
A Note on Versions
Many of the articles that I came across comparing SQL Server and Postgres, aside from those that were useless from a practical standpoint, were outdated. That is, comparing PG 9.3 with SQL Server 2014 just isn't going to paint an accurate picture of where they stand today (February 2020). So, for context, this was a migration from a dedicated instance of MS SQL Server 2016, as well as cloud-based Azure SQL Databases to DigitalOcean's managed PostgreSQL hosting, running PostgreSQL v11.
Moving the Database
I thought this would be the hard part, but the actual migration of the data from one database engine to the other turned out to be rather straightforward, thanks to the aptly named tool: MS SQL to PostgreSQL. The data transfers was fast, it retained foreign keys and indexes, and allowed a high degree of control over the operation and handling of tables and columns. A few dry runs to testing servers were necessary to work out the exact settings and process, but we were very happy with the results. The more labor intensive part of changing database engines was updating our application code to be PostgreSQL compatible.
Updating Application Code for PostgreSQL
While not comprehensive, here are some of the differences between SQL Server and Postgres that we needed to account for in our applications. In no particular order:
Lowercase table and column names
PostgreSQL handles the casing of identifiers differently than SQL Server, which is to say, case matters. There are numerous discussions about this online, so I won't repeat them. The approach we took was to lowercase all identifers. At the database level, this was done by the data transfer tool mentioned earlier, but within the application, we had to do it manually. No magic tool here. We worked our way through the code base, lowercasing as we went. Visual Studio Code's shortcut Command-K-L (⌘+K+L) came in very handy.
Switch from TOP to LIMIT
Postgres, like MySQL, applies limits to the result set at the end of the SQL statement using
LIMIT, while SQL Server does this at the outset, using
TOP. So we had to go through updating queries from
SELECT TOP 1 to
SELECT ... LIMIT 1. It wasn't hard; it just takes a while.
Use RETURNING with INSERT to retrieve identity
When an INSERT statement generates an identity value, applications frequently have need to retrieve and use that generated identifier. For example, if you insert a new client, you probably want to know the client's ID. MS SQL Server provides
SCOPE_IDENTITY() for this purpose, and some programming languages will return the generated identifier value automatically following an INSERT query. With PostgreSQL, you need to modify your INSERT statement, adding a line that specifies the column value you need returned. Here's how that looks:
INSERT INTO client(firstname, lastname)
VALUES ('Testy', 'McTester')
-- returns the value generated for this record in the id column
The reason for this has to do with Postgres apparently not having a concept of a table's "identity". You can read more about it here on Stack Overflow.
CONCAT instead of
Postgres does not allow the use of the
+ operator for string concatenation, so queries composed like the example here will fail:
SELECT firstname + ' ' + lastname AS fullname
This syntax will need to be rewritten, using
CONCAT, as shown here:
SELECT CONCAT(firstname, ' ', lastname) AS fullname
While putting this together, I learned that
CONCAT is supported in SQL Server - I just hadn't ever used it. While a bit strange at first, I prefer the
CONCAT syntax now; I find the resulting code to be cleaner - easier to read, write, and modify.
Accounting for the Boolean data type
In SQL Server, the bit data type is typically used as a stand in for boolean values. Postgres provides a true boolean data type, unsurprisingly named boolean. However, because they are fundementally different data types (though meant to convey the same thing), they behave in subtly different ways.
For example, the bit data type can be compared to integers, as well as boolean string values, as seen in these SQL Server examples:
-- SQL Server
-- bit to integer comparison (valid)
WHERE status = 1
-- bit to boolean string comparison (valid)
WHERE status = 'true'
However, SQL Server does not allow columns of the bit data type to be actually used as booleans:
-- SQL Server
-- bit to boolean comparison (not valid)
WHERE status = TRUE
-- bit as boolean implying truth (not valid)
On the other hand, the PostgreSQL boolean data type cannot be compared with integers; it results in the following error:
ERROR: operator does not exist: boolean = integer (which we saw a lot of) - that is, the following is invalid in Postgres:
-- boolean to integer comparison (not valid)
WHERE status = 1
However, as you would expect, the boolean data type in Postgres does function as a true boolean, so the following examples are valid in Postgres:
-- boolean to boolean comparison (valid)
WHERE status = TRUE
-- boolean auto cast to boolean string (valid)
WHERE status = 'TRUE'
WHERE status = 'T'
-- boolean implying true/false (valid)
WHERE NOT status
All of which is a long-winded way to say that there were instances in our codebase where we were comparing boolean columns to 1 or 0, and we needed to update them to use the actual booleans TRUE or FALSE.
Different reserved keywords
I don't have a comprehensive list of the different reserved keywords - just a note that this difference can cause issues. In our case, we had a column named
offset, which was acceptable in SQL Server, but which we needed to quote in order to use in Postgres, as the keyword is reserved. For reference, here's the list of SQL Server reserved keywords, and here are the Postgres SQL keywords.
Formatting dates and times:
This is a minor difference, but even minor differences need to be changed. When formatting dates, as in a SELECT statement, the function and syntax used are different. While SQL Server used
FORMAT, Postgres uses
to_char, as in this example:
-- SQL Server
FORMAT(lastmodified,'M/d/yy h:mm tt') AS readabledatetime
to_char(lastmodified,'FMMM/FMDD/YY HH12:MI AM') AS readabledatetime
-- Example Output: 8/1/19 8:21 PM
Here are the Postgres docs on formatting, which you'll need to review if this type of change impacts you. I was impressed with how configurable the formatting options were - for instance, note that in the example above, the
FM prefix is used to supress leading 0s.
STUFF... FOR XML PATH('') to
If you've never needed to use SQL Server's
FOR XML PATH('') to concatenate the results of a subquery as a list... then count yourself lucky and just skip this section. If you have, then you know that the syntax is not straightforward and can be difficult to parse. Here's an example of what I'm talking about (along with a SQL fiddle if you want to follow along):
STUFF((SELECT ', ' + title
WHERE authorid = a.authorid
FOR xml PATH ('')), 1, 2, ''
) AS works
FROM author a
Thankfully, PostgreSQL uses a more intuitive function,
string_agg to provide this functionality, so here's how you would rewrite the above example:
(SELECT string_agg(title, ', ')
WHERE authorid = a.authorid) AS works
FROM author a
string_agg function actually allows you to dispense with the subquery and use a JOIN/GROUP BY instead, so you could also rewrite the example as:
string_agg(at.title, ', ') AS works
FROM author a
LEFT OUTER JOIN author_title at
ON a.authorid = at.authorid
GROUP BY a.author
While database incompatibilities are always work to resolve, cases like this are nice, in that the changes at least feel like improvements. That said, I also found out that
string_agg is now available in SQL Server, as of SQL Server 2017.
Obviously, I haven't covered all of the differences between Microsoft SQL Server and PostreSQL. The process and issues you encounter will depend highly on your application architecture. For example, we don't use common table expressions (CTEs), which apparently can cause performance issues in all but the latest (v12) versions of Postgres. Bottom line - you can do all the reading you want about incompatibilities, but there's no substitute for testing, testing, testing your application code.
For a further discussion of this topic, I'd recommend this Reddit thread: Coming from SQL Server; it touches on some of the benefits of moving to Postgres as well, which I really don't have the space to explore here. If you've made a similar migration, or just have more familiarity with database engines, please share you insights in the comments!
The biggest issue that we encountered was actually resolved with the latest update to the app (v4.5) - in earlier versions, when transferring tables with a primary key comprised of multiple columns, the order of the columns within the key/index was sometimes changed. I should also note that it's not a free tool. But, at $49 (with a limited, free trial), we found it well worth the time it saved. ↩︎
And yes, I know that a query-builder library or ORM would mitigate much of this, but it's still good to know. ↩︎