PostgreSQL Data Migration Tips(engineering.tilt.com) |
PostgreSQL Data Migration Tips(engineering.tilt.com) |
FOR UPDATE NOWAIT immediately locks the rows being retrieved in the first step (as if they were to be updated)
Technically "FOR UPDATE" only makes an attempt to lock rows, and the "NOWAIT" instructs postgres, in the case that another transaction already has a lock on the row, to raise an error immediately instead of the default behavior of waiting for the lock to become available.The normal way to handle batch updates is to perform the loop outside of postgresql, so that each batch is in its own transaction.
AFAIK top-level plpgsql statements (including DO blocks run in psql) execute in a single transaction
This is true according to the docs. Anonymous code blocks are "transient anonymous functions", and functions are executed within a transaction. it seems like you end up slowly locking the entire table
The selected rows would be locked for update, delete, and select for updates, but not for regular reads. Perhaps his users table is used primarily for reads, which made this command run with negligible consequences?http://www.postgresql.org/docs/9.4/static/sql-do.html
http://www.postgresql.org/docs/9.4/static/plpgsql-structure....
https://gist.github.com/aanari/349c7d97ed50c6f69930#file-bat...
By creating a separate function for the locking and updating of rows, we ensure that the `BEGIN/END` transaction is handled per iteration rather than at the very end, so we only lock rows while they are being processed. Since Postgres does not support nested transaction blocks, calling a defined function from within an anonymous function block seemed to be the easiest and clearest path to achieve this.
It also means that god forbid it did die halfway through, and PG isn't smart enough to pick up where it left off safely, you won't lose any data, and at worst would end up with a duplicate archived row (easy enough to catch with some maintenance scripts and origin ids)
The anonymous block is implicitly called in a transaction, so all the calls to batch_at_will will be executed in the parent transaction; there's no way around this (except for using db_link but that's pretty smelly imo).
The only way to batch update while only locking rows in the batch is to run the loop outside of postgres like OP suggested.
In the end, queries are code, and code is our way to communicate or intent to the next developer, so it's better to do the delete after the insert
Transaction semantics are a fundamental feature of pg, of SQL, and of ACID databases generally. Anything that doesn't preserve them is not an "almost compatible alternative".
> Perhaps more importantly, it sends the message to future devs (including me) that rows should only be deleted after they've already been archived.
Using an explicit transaction, even if it is not necessary, communicates to future devs the actual intent, which isn't "rows deleted after they are archived" but "deletion and archive should occur in a single atomic step, and either both happen or neither happen." The ordered approach is just a way to express the least harm alternative in an environment that doesn't provide atomicity guarantees, but that's inappropriate when working in an environment that actually provides atomic transactions.
One of the biggest problems that programmers with a stronger background outside of databases often have in writing SQL DB code is that they keep using ingrained workarounds for the fact that many programming environments lack convenient support for grouping operations into atomic units in SQL, which supports such grouping.
I agree, but, to nitpick, if transaction semantics change that much, you aren't dealing with an almost compatible alternative. Further, I'll wager vast sums of money that pg will not change in that way, ever.
> In the end, queries are code, and code is our way to communicate or intent to the next developer, so it's better to do the delete after the insert
I wholeheartedly agree, so long as your code correctly captures the requirements. For ( contrived, uncommon ) example if the requirement states that, transactionally, the row must either be in either the live table or the archive table, but not both ( for reporting non-duplication, ) then the form that performs the insert-delete is the correct capture of that requirement.