从错误日志中检查,有两个信号量等待时间超长,MySQL自动crash了:
# grep " 600 " 31-226-mysql-error.log
2021-12-27T02:25:15.086231Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
2021-12-27T03:22:41.057752Z 0 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
# grep "ready for connections" 31-226-mysql-error.log
2021-12-27T02:57:54.430405Z 0 [Note] /usr/sbin/mysqld: ready for connections.
2021-12-27T04:01:35.813766Z 0 [Note] /usr/sbin/mysqld: ready for connections.
srv_error_monitor_thread发现存在阻塞超过600s的latch锁时,如果连续多次检测该锁仍没有释放,就会触发自动crash,避免MySQL持续hung住。
检查等待信息量时间长的线程有3个:
# grep "Thread 140487646721792 " 31-226-mysql-error.log
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 241 seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 244 seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 264 seconds the semaphore:
......--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 923 seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 924 seconds the semaphore:
--Thread 140487646721792 has waited at buf0buf.cc line 3975 for 944 seconds the semaphore:
# grep "Thread 140487338841856" 31-226-mysql-error.log
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 241 seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 244 seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 264 seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 272 seconds the semaphore:
.......--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 923 seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 924 seconds the semaphore:
--Thread 140487338841856 has waited at buf0buf.cc line 3975 for 944 seconds the semaphore:
# grep "Thread 140487643309824" 31-226-mysql-error.log
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 241 seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 244 seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 264 seconds the semaphore:
......
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 923 seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 924 seconds the semaphore:
--Thread 140487643309824 has waited at buf0buf.cc line 3975 for 944 seconds the semaphore:
这3个线程都出现了58次等待信号量,查询系统的信号量:
# cat /proc/sys/kernel/sem
250 32000 32 128
说明:
第一列,表示每个信号集中的最大信号量数目。
第二列,表示系统范围内的最大信号量总数目。
第三列,表示每个信号发生时的最大系统操作数目。
第四列,表示系统范围内的最大信号集总数目。
增大信号量只能治标,不治本,实际上还是数据库中有效率底下的SQL语句,优化SQL才是正解!