Skip to content

Conversation

@DrewKimball
Copy link
Collaborator

PL/pgSQL routines use a savepoint to revert any modifications that happened within a block that has an exception handler. It was previously possible to trigger an assertion failure because RollbackToSavepoint can leave the read sequence number within the range of ignored sequence numbers. If the user-provided exception handling code contained non-volatile reads, we wouldn't step the read sequence number.

This commit fixes the bug by stepping the txn after restoring the savepoint during exception handling.

Fixes #156412

Release note (bug fix): Fixed a bug that could cause an internal error in some cases for PL/pgSQL routines that perform database reads within an exception block.

PL/pgSQL routines use a savepoint to revert any modifications that happened
within a block that has an exception handler. It was previously possible to
trigger an assertion failure because `RollbackToSavepoint` can leave the
read sequence number within the range of ignored sequence numbers. If the
user-provided exception handling code contained non-volatile reads, we
wouldn't step the read sequence number.

This commit fixes the bug by stepping the txn after restoring the savepoint
during exception handling.

Fixes cockroachdb#156412

Release note (bug fix): Fixed a bug that could cause an internal error
in some cases for PL/pgSQL routines that perform database reads within
an exception block.
@DrewKimball DrewKimball requested review from a team and michae2 and removed request for a team November 4, 2025 23:24
@DrewKimball DrewKimball requested a review from a team as a code owner November 4, 2025 23:24
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuzefovich reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/kv/txn.go line 1829 at r1 (raw file):

// This method is only valid when called on RootTxns.
//
// NB: after calling RollbackToSavepoint, the transaction's read sequence number

Hm, we should check with KV team about this. Currently it doesn't look like we do this in other places - e.g. in dispatchReadCommittedStmtToExecutionEngine within the retry loop we create a savepoint and then roll back to it if we hit a retry error, and even though we clear that error from the txn, we don't step the txn. Is that a bug, or is there something different from the routine usage of the txn?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @stevendanna)


pkg/kv/txn.go line 1829 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Hm, we should check with KV team about this. Currently it doesn't look like we do this in other places - e.g. in dispatchReadCommittedStmtToExecutionEngine within the retry loop we create a savepoint and then roll back to it if we hit a retry error, and even though we clear that error from the txn, we don't step the txn. Is that a bug, or is there something different from the routine usage of the txn?

BTW we've seen this error in sentry (#139827 in PREPARE and #144383 in COPY) as well as on serverless last week (also in COPY).

Looks like there is some active related work in #156366. We should ask @stevendanna to weigh in on the approach here.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @stevendanna)


pkg/kv/txn.go line 1829 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

BTW we've seen this error in sentry (#139827 in PREPARE and #144383 in COPY) as well as on serverless last week (also in COPY).

Looks like there is some active related work in #156366. We should ask @stevendanna to weigh in on the approach here.

Indeed, we have bugs in those spots too. What happens is that on the main query path we unconditionally step the txn before every stmt, but COPY and PREPARE don't hit that path.

Comment on lines +1829 to +1830
// NB: after calling RollbackToSavepoint, the transaction's read sequence number
// must be stepped by calling Step() before any further reads are performed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is viable to change how RollbackToSavepoint works. If we adopted my PR here: #156366 and then changed RollbackToSavepoint to always step the read sequence what problems might we cause?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That behavior seems more intuitive for users of txns with stepping enabled. The only case I'm aware of where we want to keep an older sequence number is after returning from a volatile routine, and for that we already explicitly restore the sequence number.

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this in KV / SQL pod, and Steven will look into cleaning up the API around this, but as a bug fix, this seems good. :lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @stevendanna)


pkg/kv/txn.go line 1830 at r1 (raw file):

//
// NB: after calling RollbackToSavepoint, the transaction's read sequence number
// must be stepped by calling Step() before any further reads are performed.

nit: I think we can hit an assertion when doing writes too, so I'd reword the comment to be "before any further requests are performed".

Copy link
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice!

@michae2 reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball and @stevendanna)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: assertion failure caused by routine with exception handler

5 participants