Monday, October 14, 2024

InnoDB data_locks: a happy ending/new beginning in MySQL 8.0.40 ! (?)

 

Earlier today lefred tweeted that the new MySQL releases were rolling out today! As per the usual release process, the yum repositories typically have the release rpms available first, followed by github (release tags here) and then the release notes.


Since the release notes/commits are not out yet(at time of writing!) I’ll hold back on commenting on other changes, but one change I’ve really been looking forward to is improvements in the performance_schema.data_locks and performance_schema.data_lock_waits tables which I was notified of in my Bug #112035 report last month. You can see more on these tables in the MySQL docs, but at a high level these views can be used to view and troubleshoot locks being held by InnoDB. You have lock contention or want to see a lock tree? Check these tables!

But not so fast!!!

For years now, when dealing with high concurrency MySQL deployments, touching these tables should be done with extreme caution, especially in an automated fashion! There have been multiple bugs reported [1] where querying these views on systems under heavy load can lead to InnoDB freezing up and stalling, which is the last thing you need when you are trying to debug a server which is already struggling due to lock contention! I’ve lost count of the number monitoring scripts and services I have seen bitten by this over the years.

Just to demonstrate, here is what we can see in mysql 8.0.39, if you attempt to:
  1. Run a sysbench workload (top pane)
  2. Have an open transaction holding many row locks (middle pane). Here I' doing so on an unrelated set of tables(20m rows) to the foreground transactions for demonstration purposes. 
  3. In a third session, select from the performance_schema.data_locks table (bottom pane)
MySQL 8.0.39 : limit clause

As you can see, when the select ENGINE,ENGINE_LOCK_ID,ENGINE_TRANSACTION_ID from data_locks limit 1; is executed, the sysbench workload stalls to zero until the query is completed(14.4 seconds here). But why is this? To simplify, its because when querying this table, InnoDB must gather the trx_sys mutex while scanning through all the active transactions/row locks. If there are a lot of active transactions or locks being held by InnoDB, the mutex can be held for quite some time while the view is materializing, blocking foreground transactions from “doing work“. (see linked bug reports for more detailed info)
So when I seen the following in my bug report Bug #112035 last month I was pretty excited. 

Posted by developer:

Added the following note to the MySQL Server 8.0.40 release notes:

Redesigned the performance schema data_locks and data_lock_waits tables so
that querying them does not require an exclusive global mutex on the
transaction or lock system. It now iterates over buckets of hash tables
that hold the locks to only latch the actively processed shard, when
previously it iterated over the transactions. This also improves the
iteration logic complexity in terms of speed and memory to decrease the
impact of these queries on the rest of the system.

Note that the query result might show an incomplete list of transaction locks
if it committed, started, or otherwise changed the set of owned locks
in-between visiting two buckets. This differs from previous behavior which
always showed a consistent snapshot of locks held by individual
transactions, although two different transactions could have been
presented at different moments. In other words, the new approach gives a
consistent view of a single wait queue to show conflicting locks with a
waiting lock because they are always in the same bucket, while the old
approach could miss some of them because they belonged to other
transactions. The old approach would always show all the other locks held
by a reported transaction but could miss locks of other transactions even
if they were conflicting.
Also, thanks to the MySQL team for the detailed explanations in the proposed release note entry!

So back to where we started, when I seen lefred’s tweet I couldn't wait to give it a spin. After running the same 8.0.39 test from above on 8.0.40, I could no longer see the stall, and my basic query completed in less than a second.
MySQL 8.0.40 : limit clause

Running without the limit clause will increase execution time as we need to materialize the view, but will not cause a stall for the entire duration in my test case, as noticed in 8.0.39:
MySQL 8.0.40 : No limit clause

Conclusion

- While this is only a first look and not an extensive test, so be sure to test and report any issues you may find. Saying that, its a very promising sign and hopefully will put this to bed; allowing users monitor, troubleshoot and track locks/waits more granularly, with less risk, going forward. 

- Note the changes in behavior for these tables starting from 8.0.40 from bug report Bug #112035 and review 8.0/8.4/9.x release notes for final information once released. 

- Time will tell, but if the above holds true, its a good enough reason to go to 8.0.40 on its own!

Also, kudos to the MySQL team for including this fix in the 8.0 releases, and not just LTS and innovation releases. In the past this has been the case as mentioned by JFG here, but nice to see they have taken feedback onboard. Exited to see what else is included in 8.0.40/8.4.3/9.2.0 when their release notes are put out.(not available at time of writing) Long live da Dolphin!! 🐬 :-)

Appendix:

Where you will find MySQL release notes once published:


Some MySQL docs on locking:

Some other resources with lots more deep dive info on InnoDB internals/Locking



[1] Example bug reports for the data_locks tables, not exhaustive.

No comments:

Post a Comment

InnoDB data_locks: a happy ending/new beginning in MySQL 8.0.40 ! (?)

  Earlier today lefred tweeted that the new MySQL releases were rolling out today! As per the usual release process, the yum repositories t...