Sunday, July 7, 2024

Bug hunting in MySQL with git bisect


Troubleshooting a database crash or regression can be a tedious task. Whether you have recently deployed new application changes, upgraded your database or "nothing has changed" ™ it can be tricky to corner why the database started behaving differently to how it has before.


In the case of a database crash or regression post upgrade, my usual first step in troubleshooting will be to take a look at the database workload and backtrace in the `mysql-error.log`, and see where it's crashing. From the stack you should be able to get a good idea of where the engine is crashing. 
Upon checking that, I will review release notes to see if any changes have been made in this area recently or if there are any open bug reports. Sometimes you will get a good idea of what change may have introduced the issue, but others can be difficult to corner and we have to resort to using other tools. The release note review process can be a lot easier if you only went from one minor to the next, but this becomes a lot tricker when moving multiple minors, 8.0.23 to 8.0.38 will have hundreds of changes.
Whether or not you get a good idea of what changed, the next step is to get a reliable repro. This is key for understanding the issue and opening a good quality bug report. In a lot of cases this can be very difficult, such as when "production" like workload and data is required; especially when debugging is required on production servers. 


Provided you can get a reliable way of reproducing the crash or regression you can then start narrowing down which release or change may have led to this change in behavior. Here I will show how you can do this using git bisect


As per the docs:

This command uses a binary search algorithm to find which commit in your project’s history introduced a bug. You use it by first telling it a "bad" commit that is known to contain the bug, and a "good" commit that is known to be before the bug was introduced. Then git bisect picks a commit between those two endpoints and asks you whether the selected commit is "good" or "bad". It continues narrowing down the range until it finds the exact commit that introduced the change.


 

Git bisect process

With git bisect, you can specify two points in time(branches, tags or commits), execute a reproducible test case and evaluate if the regression/change of behavior is still present. Once the bisect process is complete you should be presented with a commit where the issue started occurring. 

The most well-known process to:

  1. Start the bisect process outlining the known good and bad commits/branches.
  2. Build MySQL
  3. Run your test
    1. If test fails: Mark as bad
    2. If test Succeeds: Mark as good
  4. Repeat

You continue this process until you are presented with the “first” offending commit. This can be quite a tedious process of manually stepping through commits, so here we will demonstrate using git bisect run to automatically handle this for you. With git bisect run you create a script which builds MySQL and executes your test case. Based on the exit status, bisect will automatically mark it as good or bad, and eventually spit out the first bad commit. Just start it, go for a coffee and come back to your bad commit! 

Note: See the git bisect docs for more information on how exit status codes are handled. In general, exit code 0 will be “good”, non-zero will be “bad“. 


Once you have the commit, you can start focusing your debugging on a single area of the code, and if you have upgraded multiple minor releases find which release the regression was introduced.
Even if you do not feel comfortable, or it’s not possible to revert or fix the issue, knowing what commit introduced a change in behavior can help in a few ways :

  • It improves the quality of bug reports. All bug reports should include reproducible test cases, this makes things a lot easier for the bug verification team to identify cause and introduce a fix. Providing the commit which introduced the regression is even better; make life easier for the database developers!
  • Knowing what area of the code triggers the behavior can also help DBAs and Application developers better understand what caused the regression, and if possible, introduce an application side change which can help in avoiding the problem until a permanent fix is in place. For example, if you see the regression/crash is caused by a commit relating to INSTANT DDL in InnoDB, you can make application-level changes to avoid this to restore availability until a permanent fix is available.

In this post I will demonstrate using git bisect run to identify a commit which led to a regression in 8.0.37, that has since been fixed in 8.0.38. You can see more information on the bug here PS-9222 , originally reported in private bug MySQL bug # 114837. I have chosen this as we have a reproducible test case already using the MySQL test framework (MTR). 

Note:
The MySQL test framework (MTR) is being used here but it is not required. You can do whatever test you want in your shell script; you just need to ensure the correct exit status is returned so 
git bisect run knows whether it should mark is as good or bad. For example, here is a simple script which catches the exit code and returns something git bisect run
 is comfortable with:


#!/bin/bash

# Build MySQL
build_mysql.sh 

# Start MySQL binary with a custom data directory
~/mysql-server/bld/runtime_output_directory/mysqld \
--defaults-group-suffix=.1 --defaults-file=~/my.cnf \
--log-output=file --explain-format=TRADITIONAL_STRICT \
--loose-debug-sync-timeout=600 --initialize=0 --core-file \
--init-file=~/init.sql

exit_code=$?
if [[ ${exit_code} -gt 127 ]]; then
    exit 1
fi

Example


The first step is to get the mysql-server source code
git clone https://github.com/mysql/mysql-server.git
Once you have the source code, checkout your branch. In MySQL, each release has its own tag. So, we will checkout mysql-8.0.37

cd mysql-server
git checkout mysql-8.0.37

Now we will get the test cases from PS-9222.
  1. mysql-test/suite/innodb/include/instant_ddl_recovery.inc
  2. mysql-test/suite/innodb/r/instant_alter_stored_column_order.result
  3. mysql-test/suite/innodb/r/instant_ddl_recovery_debug.result
  4. mysql-test/suite/innodb/t/instant_alter_stored_column_order.test

Build MySQL and verify that the above test cases fail on mysql-8.0.37. I won’t go through the MySQL build process here, but here are the results of the above.

mkdir mysql-server/bld
cd mysql-server/bld
cmake .. -DWITH_DEBUG=1 -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1  -DDOWNLOAD_BOOST=1 -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DWITH_ROUTER=OFF -DWITH_MYSQLX=OFF -DWITH_UNIT_TESTS=OFF
make -j 8

Note: Here I’m building a debug build. It’s not a requirement for git bisect but a lot of MTR tests will require it. If you are not using MTR, you can build non-debug and use it in the same way


Once built, try running the test using mtr in your bld directory, this is linked to mysql-test-run-pl. (see here for usage information):

 /home/ec2-user/mysql-server/bld/mysql-test/mtr innodb.instant_alter_stored_column_order;


Upon running the test, you will see that the test did not pass, the engine crashed(reproduced the issue!) and a non-zero exit status was returned. 


[ec2-user@i-0c6e9baba12c0d932 mysql-server]$ /home/ec2-user/mysql-server/bld/mysql-test/mtr innodb.instant_alter_stored_column_order;
Logging: /home/ec2-user/mysql-server/mysql-test/mysql-test-run.pl  innodb.instant_alter_stored_column_order
MySQL Version 8.0.37
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ec2-user/mysql-server/bld/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] innodb.instant_alter_stored_column_order  [ fail ]
        Test ended at 2024-07-07 22:50:42

CURRENT_TEST: innodb.instant_alter_stored_column_order
mysqltest: At line 17: Query 'UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1'' failed.
ERROR 2013 (HY000): Lost connection to MySQL server during query

The result from queries just before the failure was:
# PS-9222 - Testing if ALGORITHM=instant crashes server
CREATE TABLE t1 (c1 TINYTEXT COLLATE ascii_bin NOT NULL, c2 DATETIME(3) NOT NULL, c3 TEXT, PRIMARY KEY (c1(30)));
INSERT INTO t1 (c1, c2, c3) VALUE ('k1','2021-12-21','something');
INSERT INTO t1 (c1, c2, c3) VALUE ('k3','2021-12-21','something else');
ALTER TABLE t1 ADD COLUMN c4 VARCHAR(18) NOT NULL, algorithm=instant;
UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1';
safe_process[187000]: Child process: 187001, exit: 1


Server [mysqld.1 - pid: 186954, winpid: 186954, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2024-07-07T22:50:22.121344Z 0 [Note] [MY-013667] [Server] Error-log destination "stderr" is not a file. Can not restore error log messages from previous run.
2024-07-07T22:50:22.116996Z 0 [Warning] [MY-013711] [Server] Manifest file '/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld.my' is not read-only. For better security, please make sure that the file is read-only.
2024-07-07T22:50:22.118817Z 0 [Warning] [MY-010099] [Server] Insecure configuration for --secure-file-priv: Data directory is accessible through --secure-file-priv. Consider choosing a different directory.
2024-07-07T22:50:22.118828Z 0 [Warning] [MY-010101] [Server] Insecure configuration for --secure-file-priv: Location is accessible to all OS users. Consider choosing a different directory.
2024-07-07T22:50:22.118909Z 0 [Note] [MY-013932] [Server] BuildID[sha1]=22bfc288cb831a3fc0bdeee5e573d5e8649563d4
2024-07-07T22:50:22.118923Z 0 [Note] [MY-010949] [Server] Basedir set to /home/ec2-user/mysql-server/bld/.
2024-07-07T22:50:22.118933Z 0 [System] [MY-010116] [Server] /home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld (mysqld 8.0.37-debug) starting as process 186955
2024-07-07T22:50:22.125897Z 0 [Warning] [MY-010075] [Server] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 4aaa3aec-3cb3-11ef-a9d1-0e76622b41ab.
2024-07-07T22:50:22.130885Z 0 [Note] [MY-012366] [InnoDB] Using Linux native AIO
2024-07-07T22:50:22.131239Z 0 [Note] [MY-010747] [Server] Plugin 'FEDERATED' is disabled.
2024-07-07T22:50:22.131324Z 0 [Note] [MY-010747] [Server] Plugin 'ndbcluster' is disabled.
2024-07-07T22:50:22.131357Z 0 [Note] [MY-010747] [Server] Plugin 'ndbinfo' is disabled.
2024-07-07T22:50:22.131368Z 0 [Note] [MY-010747] [Server] Plugin 'ndb_transid_mysql_connection_map' is disabled.
2024-07-07T22:50:22.133583Z 1 [System] [MY-013576] [InnoDB] InnoDB initialization has started.
2024-07-07T22:50:22.133620Z 1 [Note] [MY-013546] [InnoDB] Atomic write enabled
2024-07-07T22:50:22.133675Z 1 [Note] [MY-012932] [InnoDB] PUNCH HOLE support available
2024-07-07T22:50:22.133690Z 1 [Note] [MY-012937] [InnoDB] !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2024-07-07T22:50:22.133695Z 1 [Note] [MY-012944] [InnoDB] Uses event mutexes
2024-07-07T22:50:22.133701Z 1 [Note] [MY-012945] [InnoDB] GCC builtin __atomic_thread_fence() is used for memory barrier
2024-07-07T22:50:22.133708Z 1 [Note] [MY-012948] [InnoDB] Compressed tables use zlib 1.2.13
2024-07-07T22:50:22.137355Z 1 [Note] [MY-012951] [InnoDB] Using hardware accelerated crc32 and polynomial multiplication.
2024-07-07T22:50:22.137865Z 1 [Note] [MY-012203] [InnoDB] Directories to scan './'
2024-07-07T22:50:22.137908Z 1 [Note] [MY-012204] [InnoDB] Scanning './'
2024-07-07T22:50:22.138749Z 1 [Note] [MY-012208] [InnoDB] Completed space ID check of 8 files.
2024-07-07T22:50:22.139293Z 1 [Note] [MY-012955] [InnoDB] Initializing buffer pool, total size = 24.000000M, instances = 1, chunk size =24.000000M
2024-07-07T22:50:22.143342Z 1 [Note] [MY-012957] [InnoDB] Completed initialization of buffer pool
2024-07-07T22:50:22.148312Z 0 [Note] [MY-011952] [InnoDB] If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2024-07-07T22:50:22.148587Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_0.dblwr' for doublewrite
2024-07-07T22:50:22.148880Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_1.dblwr' for doublewrite
2024-07-07T22:50:22.159089Z 1 [Note] [MY-013566] [InnoDB] Double write buffer files: 2
2024-07-07T22:50:22.159109Z 1 [Note] [MY-013565] [InnoDB] Double write buffer pages per instance: 4
2024-07-07T22:50:22.159140Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_0.dblwr' for doublewrite
2024-07-07T22:50:22.159155Z 1 [Note] [MY-013532] [InnoDB] Using './#ib_16384_1.dblwr' for doublewrite
2024-07-07T22:50:22.229760Z 1 [Note] [MY-013883] [InnoDB] The latest found checkpoint is at lsn = 20116776 in redo log file ./#innodb_redo/#ib_redo61.
2024-07-07T22:50:22.229839Z 1 [Note] [MY-013086] [InnoDB] Starting to parse redo log at lsn = 20116527, whereas checkpoint_lsn = 20116776 and start_lsn = 20116480
2024-07-07T22:50:22.234657Z 1 [Note] [MY-013083] [InnoDB] Log background threads are being started...
2024-07-07T22:50:22.235450Z 1 [Note] [MY-012532] [InnoDB] Applying a batch of 0 redo log records ...
2024-07-07T22:50:22.235464Z 1 [Note] [MY-012535] [InnoDB] Apply batch completed!
2024-07-07T22:50:22.235902Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_001'.
2024-07-07T22:50:22.236084Z 1 [Note] [MY-013252] [InnoDB] Using undo tablespace './undo_002'.
2024-07-07T22:50:22.236510Z 1 [Note] [MY-012910] [InnoDB] Opened 2 existing undo tablespaces.
2024-07-07T22:50:22.236573Z 1 [Note] [MY-011980] [InnoDB] GTID recovery trx_no: 1086
2024-07-07T22:50:22.294196Z 1 [Note] [MY-013776] [InnoDB] Parallel initialization of rseg complete
2024-07-07T22:50:22.294247Z 1 [Note] [MY-013777] [InnoDB] Time taken to initialize rseg using 4 thread: 57685 ms.
2024-07-07T22:50:22.294501Z 1 [Note] [MY-012923] [InnoDB] Creating shared tablespace for temporary tables
2024-07-07T22:50:22.294574Z 1 [Note] [MY-012265] [InnoDB] Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2024-07-07T22:50:22.326292Z 1 [Note] [MY-012266] [InnoDB] File './ibtmp1' size is now 12 MB.
2024-07-07T22:50:22.326574Z 1 [Note] [MY-013627] [InnoDB] Scanning temp tablespace dir:'./#innodb_temp/'
2024-07-07T22:50:22.368475Z 1 [Note] [MY-013018] [InnoDB] Created 128 and tracked 128 new rollback segment(s) in the temporary tablespace. 128 are now active.
2024-07-07T22:50:22.369219Z 1 [Note] [MY-012976] [InnoDB] 8.0.37 started; log sequence number 20116786
2024-07-07T22:50:22.369525Z 1 [System] [MY-013577] [InnoDB] InnoDB initialization has ended.
2024-07-07T22:50:22.384916Z 1 [Note] [MY-011089] [Server] Data dictionary restarting version '80023'.
2024-07-07T22:50:22.845682Z 1 [Note] [MY-012357] [InnoDB] Reading DD tablespace files
2024-07-07T22:50:22.849801Z 1 [Note] [MY-012356] [InnoDB] Scanned 10 tablespaces. Validated 10.
2024-07-07T22:50:22.924166Z 1 [Note] [MY-010006] [Server] Using data dictionary with version '80023'.
2024-07-07T22:50:23.700027Z 0 [Note] [MY-010902] [Server] Thread priority attribute setting in Resource Group SQL shall be ignored due to unsupported platform or insufficient privilege.
2024-07-07T22:50:23.715619Z 0 [Note] [MY-013911] [Server] Crash recovery finished in binlog engine. No attempts to commit, rollback or prepare any transactions.
2024-07-07T22:50:23.715648Z 0 [Note] [MY-013911] [Server] Crash recovery finished in InnoDB engine. No attempts to commit, rollback or prepare any transactions.
2024-07-07T22:50:23.721492Z 0 [Note] [MY-012487] [InnoDB] DDL log recovery : begin
2024-07-07T22:50:23.721664Z 0 [Note] [MY-012488] [InnoDB] DDL log recovery : end
2024-07-07T22:50:23.722107Z 0 [Note] [MY-011946] [InnoDB] Loading buffer pool(s) from /home/ec2-user/mysql-server/bld/mysql-test/var/mysqld.1/data/ib_buffer_pool
2024-07-07T22:50:23.728378Z 0 [Note] [MY-011946] [InnoDB] Buffer pool(s) load completed at 240708  1:50:23
2024-07-07T22:50:23.762999Z 0 [Note] [MY-010303] [Server] Skipping generation of SSL certificates as options related to SSL are specified.
2024-07-07T22:50:23.765041Z 0 [Warning] [MY-010068] [Server] CA certificate /home/ec2-user/mysql-server/mysql-test/std_data/cacert.pem is self signed.
2024-07-07T22:50:23.765066Z 0 [System] [MY-013602] [Server] Channel mysql_main configured to support TLS. Encrypted connections are now supported for this channel.
2024-07-07T22:50:23.765077Z 0 [Note] [MY-010310] [Server] Skipping generation of RSA key pair as --sha256_password_auto_generate_rsa_keys is set to OFF.
2024-07-07T22:50:23.765082Z 0 [Note] [MY-010310] [Server] Skipping generation of RSA key pair as --caching_sha2_password_auto_generate_rsa_keys is set to OFF.
2024-07-07T22:50:23.766780Z 0 [Note] [MY-010252] [Server] Server hostname (bind-address): '*'; port: 13000
2024-07-07T22:50:23.766827Z 0 [Note] [MY-010253] [Server] IPv6 is available.
2024-07-07T22:50:23.766839Z 0 [Note] [MY-010264] [Server]   - '::' resolves to '::';
2024-07-07T22:50:23.766859Z 0 [Note] [MY-010251] [Server] Server socket created on IP: '::'.
2024-07-07T22:50:23.780117Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-port=13020'.
2024-07-07T22:50:23.780140Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-socket=/home/ec2-user/mysql-server/bld/mysql-test/var/tmp/mysqlx.1.sock'.
2024-07-07T22:50:23.780146Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl=DISABLED'.
2024-07-07T22:50:23.780150Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl-ca='.
2024-07-07T22:50:23.780153Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl-cert='.
2024-07-07T22:50:23.780157Z 0 [Warning] [MY-000067] [Server] unknown variable 'loose-mysqlx-ssl-key='.
2024-07-07T22:50:23.856915Z 0 [Note] [MY-011025] [Repl] Failed to start replica threads for channel ''.
2024-07-07T22:50:23.865353Z 5 [Note] [MY-010051] [Server] Event Scheduler: scheduler thread started with id 5
2024-07-07T22:50:23.865943Z 0 [System] [MY-010931] [Server] /home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld: ready for connections. Version: '8.0.37-debug' socket: '/home/ec2-user/mysql-server/bld/mysql-test/var/tmp/mysqld.1.sock'  port: 13000  Source distribution.
2024-07-07T22:50:24.332276Z 9 [ERROR] [MY-013183] [InnoDB] Assertion failure: mtr0log.cc:647:total > (ulint)(log_ptr - log_start) thread 139761664595520
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-07-07T22:50:24Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=22bfc288cb831a3fc0bdeee5e573d5e8649563d4
Thread pointer: 0x7f1c2c011bd0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f1ccc5f09f0 thread_stack 0x100000
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x49e97e7]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(print_fatal_signal(int)+0x3a2) [0x35c5d19]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_server_abort()+0x6b) [0x35c5fcb]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_abort()+0xd) [0x49e01d5]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4de03e9]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c0165a]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c0176e]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c03767]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mlog_open_and_write_index(mtr_t*, unsigned char const*, dict_index_t const*, mlog_id_t, unsigned long, unsigned char*&)+0x5c5) [0x4c01d48]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c38a52]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(page_cur_delete_rec(page_cur_t*, dict_index_t const*, unsigned long const*, mtr_t*)+0x363) [0x4c39019]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(btr_cur_pessimistic_update(unsigned long, btr_cur_t*, unsigned long**, mem_block_info_t**, mem_block_info_t*, big_rec_t**, upd_t*, unsigned long, que_thr_t*, unsigned long, unsigned long, mtr_t*, btr_pcur_t*)+0xe7e) [0x4e64749]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d24445]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d24d1b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d251af]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_upd_step(que_thr_t*)+0x248) [0x4d2566a]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4ca8b43]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_update_for_mysql(unsigned char const*, row_prebuilt_t*)+0xbb) [0x4ca8f2f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ha_innobase::update_row(unsigned char const*, unsigned char*)+0x423) [0x4a58f2d]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x1ea) [0x3770398]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::update_single_table(THD*)+0x214b) [0x34d3f63]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::execute_inner(THD*)+0x10c) [0x34d6d74]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_dml::execute(THD*)+0x6ed) [0x341d0b1]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x2156) [0x33971cb]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x798) [0x339c811]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x16ec) [0x3392541]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(do_command(THD*)+0x5bd) [0x339048e]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x35b0edf]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x5685133]
/lib64/libc.so.6(+0x9f7f2) [0x7f1cd589f7f2]
/lib64/libc.so.6(+0x3f450) [0x7f1cd583f450]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f1c2c5cc1c0): UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1'
Connection ID (thread ID): 9
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
safe_process[186954]: Child process: 186955, killed by signal: 6
----------SERVER LOG END-------------


 - the logfile can be found in '/home/ec2-user/mysql-server/bld/mysql-test/var/log/innodb.instant_alter_stored_column_order/instant_alter_stored_column_order.log'

[100%] shutdown_report                           [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 0.000 of 33 seconds executing testcases

Completed: Failed 1/2 tests, 50.00% were successful.

Failing test(s): innodb.instant_alter_stored_column_order

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

mysql-test-run: *** ERROR: there were failing test cases
[ec2-user@i-0c6e9baba12c0d932 mysql-server]$ echo $?
1
[ec2-user@i-0c6e9baba12c0d932 mysql-server]$


Now that we have verified the above, we can stitch it all together into a single shell script, and start git bisect:


$ cat ~/bisect_commands.sh
#!/bin/bash
#

# Build MySQL
cd /home/ec2-user/mysql-server/bld
cmake .. -DWITH_DEBUG=1 -DWITH_SSL=system -DWITH_ZLIB=bundled -DMYSQL_MAINTAINER_MODE=0 -DENABLED_LOCAL_INFILE=1  -DDOWNLOAD_BOOST=1 -DWITH_BOOST=$PWD/../boost \
  -DCMAKE_CXX_FLAGS="$CXXF" -DCMAKE_C_FLAGS="$CF" \
  -DWITH_ROUTER=OFF -DWITH_MYSQLX=OFF -DWITH_UNIT_TESTS=OFF
make -j 8

# Run test case, after each build.
# If test passes(exit code 0), bisect will mark as good, otherwise bad. 
/home/ec2-user/mysql-server/bld/mysql-test/mtr innodb.instant_alter_stored_column_order;


Now start bisect specifying the good and bad releases. Here I’m using mysql-8.0.36 where I know we dont crash, and mysql-8.0.37 where I know we do crash!


[ec2-user@i-0c6e9baba12c0d932 bld]$ git bisect start mysql-8.0.37 mysql-8.0.36
Bisecting: 156 revisions left to test after this (roughly 7 steps)
[310704bc6b3a6dad52726bf8820f5a1e3194c844] Bug#35993709 Several MTR tests fail when MySQL Server built with VS2022


Then I execute git bisect run ~/bisect_commands.sh and go grab coffee!


[ec2-user@i-0c6e9baba12c0d932 bld]$ chmod +x ~/bisect_commands.sh
[ec2-user@i-0c6e9baba12c0d932 bld]$ git bisect run ~/bisect_commands.sh >> ~/bisect_log.txt 2>&1


It can take a while to complete, but you can tail the ~/bisect_log.txt to check progress. While running you will see outputs of the MTR test run, mysql logs and the bisect commands. For the bisect commands you will see something like the following after each build and test run letting you know how the bisect is progressing


[ec2-user@i-0c6e9baba12c0d932 ~]$ grep "Bisect" bisect_log.txt
Bisecting: 77 revisions left to test after this (roughly 6 steps)
Bisecting: 38 revisions left to test after this (roughly 5 steps)
Bisecting: 19 revisions left to test after this (roughly 4 steps)
Bisecting: 9 revisions left to test after this (roughly 3 steps)
Bisecting: 4 revisions left to test after this (roughly 2 steps)
Bisecting: 2 revisions left to test after this (roughly 1 step)
Bisecting: 0 revisions left to test after this (roughly 0 steps)

You can see the full output here, but looking at the log you will see the outputs from the build, and also that in some cases the test passed, while others it failed. This is expected and is part of the bisect process.


Failed:


Logging: /home/ec2-user/mysql-server/mysql-test/mysql-test-run.pl  innodb.instant_alter_stored_column_order
MySQL Version 8.0.37
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ec2-user/mysql-server/bld/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] innodb.instant_alter_stored_column_order  [ fail ]
        Test ended at 2024-07-07 23:21:12
        Completed: Failed 1/2 tests, 50.00% were successful.
.
.
.
2024-07-07T23:20:55.043456Z 9 [ERROR] [MY-013183] [InnoDB] Assertion failure: mtr0log.cc:646:total > (ulint)(log_ptr - log_start) thread 139912252782144
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/8.0/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
2024-07-07T23:20:55Z UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
BuildID[sha1]=b8458559fd278542e79ef051af2982224cca197d
Thread pointer: 0x7f3f3c5b8d20
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f3fdc206a30 thread_stack 0x100000
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x43) [0x49e2b43]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(print_fatal_signal(int)+0x3a2) [0x35c0a1d]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_server_abort()+0x6b) [0x35c0ccf]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(my_abort()+0xd) [0x49d9531]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x1d1) [0x4dd936f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4bfa71c]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4bfa830]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4bfc829]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mlog_open_and_write_index(mtr_t*, unsigned char const*, dict_index_t const*, mlog_id_t, unsigned long, unsigned char*&)+0x5c5) [0x4bfae0a]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4c31aae]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(page_cur_delete_rec(page_cur_t*, dict_index_t const*, unsigned long const*, mtr_t*)+0x363) [0x4c32075]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(btr_cur_pessimistic_update(unsigned long, btr_cur_t*, unsigned long**, mem_block_info_t**, mem_block_info_t*, big_rec_t**, upd_t*, unsigned long, que_thr_t*, unsigned long, unsigned long, mtr_t*, btr_pcur_t*)+0xe7e) [0x4e5d707]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d1d4a1]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d1dd77]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4d1e20b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_upd_step(que_thr_t*)+0x248) [0x4d1e6c6]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x4ca1b9f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(row_update_for_mysql(unsigned char const*, row_prebuilt_t*)+0xbb) [0x4ca1f8b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(ha_innobase::update_row(unsigned char const*, unsigned char*)+0x423) [0x4a5228b]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(handler::ha_update_row(unsigned char const*, unsigned char*)+0x1ea) [0x376afdc]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::update_single_table(THD*)+0x214b) [0x34cec67]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_update::execute_inner(THD*)+0x10c) [0x34d1a78]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(Sql_cmd_dml::execute(THD*)+0x6ed) [0x3417af1]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(mysql_execute_command(THD*, bool)+0x2156) [0x3391b99]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_sql_command(THD*, Parser_state*)+0x798) [0x33971df]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(dispatch_command(THD*, COM_DATA const*, enum_server_command)+0x16ec) [0x338cf0f]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld(do_command(THD*)+0x5bd) [0x338ae5c]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x35abbe3]
/home/ec2-user/mysql-server/bld/runtime_output_directory/mysqld() [0x567d049]
/lib64/libc.so.6(+0x9f7f2) [0x7f3fe3a9f7f2]
/lib64/libc.so.6(+0x3f450) [0x7f3fe3a3f450]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f3f3c368790): UPDATE t1 SET c4 = 'value' WHERE c1 = 'k1'
Connection ID (thread ID): 9
Status: NOT_KILLED

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file
safe_process[220791]: Child process: 220792, killed by signal: 6
----------SERVER LOG END-------------
.
.
.
Failing test(s): innodb.instant_alter_stored_column_order

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

mysql-test-run: *** ERROR: there were failing test cases
Bisecting: 4 revisions left to test after this (roughly 2 steps)


Passed:



Logging: /home/ec2-user/mysql-server/mysql-test/mysql-test-run.pl  innodb.instant_alter_stored_column_order
MySQL Version 8.0.37
Checking supported features
 - Binaries are debug compiled
Using 'all' suites
Collecting tests
Checking leftover processes
Removing old var directory
Creating var directory '/home/ec2-user/mysql-server/bld/mysql-test/var'
Installing system database
Using parallel: 1

==============================================================================
                  TEST NAME                       RESULT  TIME (ms) COMMENT
------------------------------------------------------------------------------
[ 50%] innodb.instant_alter_stored_column_order  [ pass ]   2530
[100%] shutdown_report                           [ pass ]
------------------------------------------------------------------------------
The servers were restarted 0 times
The servers were reinitialized 0 times
Spent 2.530 of 20 seconds executing testcases

Completed: All 2 tests were successful.

Bisecting: 2 revisions left to test after this (roughly 1 step)
[908d0de4420946a45bd89fd9bd79702fd5c28773] Bug #33970854  Connect timeout too long towards MGMD nodes

At the very end of the bisect log, you will see a message indicating what the “first bad commit“ is:


e6e13a8f64ca1005f0d2d4da8463fdbef3767917 is the first bad commit
commit e6e13a8f64ca1005f0d2d4da8463fdbef3767917
Author: Yasufumi Kinoshita <yasufumi.kinoshita@oracle.com>
Date:   Tue Aug 22 12:03:52 2023 +0900

    Bug#35183686: ALTER_STORED_COLUMN_ORDER with algorithm=INSTANT makes wrong INSERT REDO log

    REDO log is not logged (non instant add/del) column order change with instant DDL.
    It might cause wrong REDO log replay when recovery, very danger.

    The fix makes the column order change logged also correctly.

    Change-Id: I7e617735bb9d327136b18b5165039e1155f1fe50

 .../suite/innodb/include/instant_ddl_recovery.inc  | 23 +++++++++++
 .../innodb/r/instant_ddl_recovery_debug.result     | 45 ++++++++++++++++++++
 storage/innobase/mtr/mtr0log.cc                    | 48 ++++++++++++++++++----
 3 files changed, 107 insertions(+), 9 deletions(-)
bisect found first bad commit



In this case it identified the first bad commit as “Bug#35183686: ALTER_STORED_COLUMN_ORDER with algorithm=INSTANT makes wrong INSERT REDO log”, which as we guessed, is related to instant DDL changes in InnoDB, that was introduced in MySQL 8.0.37. This makes sense based on our repro! After reviewing release notes on 8.0.38, you will be happy to see this is already fixed so you can just upgrade![1] And remember, you have a test case; so why not test it out on 8.0.38 prior to upgrading!

[1]

InnoDB: MySQL unexpectedly halted on an UPDATE after an ALTER TABLE operation. (Bug #36571091)
References: This issue is a regression of: Bug #35183686.

ref


What now?

Now that you know the offending commit, you have a few options.

  1. If you are not on the latest MySQL version, try checking release notes and upgrading to see if the issue is fixed. In this case, the issue is fixed in 8.0.38. 
  2. If the issue is not fixed, try digging deeper into the cause of the issue. Knowing what introduced the bug is a great head start into understanding why, and a kickstart to the debugging process. Read the commit message as to why this change was made and familiarize yourself with this area of code. Once you have done this consider making a contribution to have it included in a future release. If you don’t feel comfortable or cannot fix the issue, don’t fret! You should still open a bug report including your repro, test case and let the bug verification team know what you think may be causing the issue, or at least what commit you think introduced the problem. I’m sure they would be happy with the leg up!
  3. Even if you do open a bug report, MySQL releases are quarterly and prior to release, there are no announcements if/when your contribution will be accepted or the bug fixed. So it may be a good idea, if not already fixed in another release to evaluate if any application or database parameter changes could alleviate the pain. For the example above, since its related to INSTANT DDL, one option could be to consider using other DDL types such as the INPLACE DDL algorithm, or an online schema change tool such as PTOSCGH-ost or Spirit.


Thanks for reading this far, hope this will help someone else out!

 

No comments:

Post a Comment

Bug hunting in MySQL with git bisect

Troubleshooting a database crash or regression can be a tedious task. Whether you have recently deployed new application changes, upgraded y...