-
Notifications
You must be signed in to change notification settings - Fork 4k
sql: step txn after restoring from a savepoint for exception handling #156902
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
PL/pgSQL routines use a savepoint to revert any modifications that happened within a block that has an exception handler. It was previously possible to trigger an assertion failure because `RollbackToSavepoint` can leave the read sequence number within the range of ignored sequence numbers. If the user-provided exception handling code contained non-volatile reads, we wouldn't step the read sequence number. This commit fixes the bug by stepping the txn after restoring the savepoint during exception handling. Fixes cockroachdb#156412 Release note (bug fix): Fixed a bug that could cause an internal error in some cases for PL/pgSQL routines that perform database reads within an exception block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yuzefovich reviewed 3 of 3 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @michae2)
pkg/kv/txn.go line 1829 at r1 (raw file):
// This method is only valid when called on RootTxns. // // NB: after calling RollbackToSavepoint, the transaction's read sequence number
Hm, we should check with KV team about this. Currently it doesn't look like we do this in other places - e.g. in dispatchReadCommittedStmtToExecutionEngine within the retry loop we create a savepoint and then roll back to it if we hit a retry error, and even though we clear that error from the txn, we don't step the txn. Is that a bug, or is there something different from the routine usage of the txn?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @stevendanna)
pkg/kv/txn.go line 1829 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Hm, we should check with KV team about this. Currently it doesn't look like we do this in other places - e.g. in
dispatchReadCommittedStmtToExecutionEnginewithin the retry loop we create a savepoint and then roll back to it if we hit a retry error, and even though we clear that error from the txn, we don't step the txn. Is that a bug, or is there something different from the routine usage of the txn?
BTW we've seen this error in sentry (#139827 in PREPARE and #144383 in COPY) as well as on serverless last week (also in COPY).
Looks like there is some active related work in #156366. We should ask @stevendanna to weigh in on the approach here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @stevendanna)
pkg/kv/txn.go line 1829 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
BTW we've seen this error in sentry (#139827 in PREPARE and #144383 in COPY) as well as on serverless last week (also in COPY).
Looks like there is some active related work in #156366. We should ask @stevendanna to weigh in on the approach here.
Indeed, we have bugs in those spots too. What happens is that on the main query path we unconditionally step the txn before every stmt, but COPY and PREPARE don't hit that path.
| // NB: after calling RollbackToSavepoint, the transaction's read sequence number | ||
| // must be stepped by calling Step() before any further reads are performed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is viable to change how RollbackToSavepoint works. If we adopted my PR here: #156366 and then changed RollbackToSavepoint to always step the read sequence what problems might we cause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That behavior seems more intuitive for users of txns with stepping enabled. The only case I'm aware of where we want to keep an older sequence number is after returning from a volatile routine, and for that we already explicitly restore the sequence number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this in KV / SQL pod, and Steven will look into cleaning up the API around this, but as a bug fix, this seems good.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @stevendanna)
pkg/kv/txn.go line 1830 at r1 (raw file):
// // NB: after calling RollbackToSavepoint, the transaction's read sequence number // must be stepped by calling Step() before any further reads are performed.
nit: I think we can hit an assertion when doing writes too, so I'd reword the comment to be "before any further requests are performed".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@michae2 reviewed 3 of 3 files at r1, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball and @stevendanna)
PL/pgSQL routines use a savepoint to revert any modifications that happened within a block that has an exception handler. It was previously possible to trigger an assertion failure because
RollbackToSavepointcan leave the read sequence number within the range of ignored sequence numbers. If the user-provided exception handling code contained non-volatile reads, we wouldn't step the read sequence number.This commit fixes the bug by stepping the txn after restoring the savepoint during exception handling.
Fixes #156412
Release note (bug fix): Fixed a bug that could cause an internal error in some cases for PL/pgSQL routines that perform database reads within an exception block.