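Overview

Performance Schema (PFS) is MySQL's fine-grained performance monitoring and diagnostic tool. It covers the performance-related modules: statements, I/O, memory, locks, and so on. The data PFS collects is stored by the PERFORMANCE_SCHEMA storage engine and kept entirely in memory. This article focuses on PFS memory management. It first analyzes the memory management mechanism from the source code, then walks through the PFS workflow using one instrument as an example, and finally covers the memory-related PFS parameters. The code discussed is from MySQL 8.0.18.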

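PFS memory management

Core data structures

PFS_buffer_scalable_container

PFS_buffer_scalable_container handles memory management (allocation, growth, release). The global_*_container variables (called containers below) are global singletons. A container stores data at two levels: pages and records. Taking global_thread_container as an example, by default it holds multiple PFS_thread_array objects (pages), and each page holds multiple PFS_thread objects (records).

[Figure: internal structure of PFS_buffer_scalable_container]

PFS_buffer_scalable_container code (abridged):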

  template <class T, int PFS_PAGE_SIZE, int PFS_PAGE_COUNT,
            class U = PFS_buffer_default_array<T>,
            class V = PFS_buffer_default_allocator<T>>
  class PFS_buffer_scalable_container {
   public:
    typedef T value_type;      // record type
    typedef U array_type;      // page type
    typedef V allocator_type;  // page allocator; must implement alloc_array/free_array

    value_type *allocate(pfs_dirty_state *dirty_state);            // allocate a record
    void deallocate(value_type *pfs) { m_array.deallocate(pfs); }  // release a record

    array_type m_array;           // start of the memory
    size_t m_max;                 // PFS_PAGE_SIZE * PFS_PAGE_COUNT
    allocator_type *m_allocator;  // page allocator
  };

  class PFS_thread_allocator {
   public:
    int alloc_array(PFS_thread_array *array);
    void free_array(PFS_thread_array *array);
  };

Once instantiated, the container objects are responsible for memory allocation in each PFS module. They map to system tables as follows:

  Container                  Table
  global_account_container   events_%_summary_by_account_by_event_name
  global_host_container      events_%_summary_by_host_by_event_name
  global_thread_container    events_%_summary_by_thread_by_event_name
  global_user_container      events_%_summary_by_user_by_event_name
  global_mutex_container     mutex_instances
  global_rwlock_container    rwlock_instances
  global_cond_container      cond_instances
  global_socket_container    socket_instances
  global_mdl_container       metadata_locks

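PFS memory management model

1) Memory is preallocated at startup, and more is allocated on demand while the server runs

PFS memory allocation happens when a page is allocated (the alloc_array function). Initialization at startup allocates some pages, and a new page is allocated at runtime when the existing pages fill up. Record allocation inside a page uses atomic operations instead of locks. The pseudocode below shows how global_thread_container allocates a thread record at runtime.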

  PFS_thread *pfs = global_thread_container.allocate(&dirty_state)
  {
    if (m_full) { m_lost++; return NULL; }  // container is full: return at once
    while (monotonic < monotonic_max) {
      array = m_pages[index];
      pfs = array->allocate(dirty_state);   // try to allocate from an existing page
      if (pfs != NULL) {
        pfs->m_page = reinterpret_cast<PFS_opaque_container_page *>(array);
        return pfs;
      }
      monotonic++;                          // this page is full: try the next one
    }
    array = new array_type();                  // no free record: allocate a new page
    int rc = m_allocator->alloc_array(array);  // calls PFS_MALLOC_ARRAY internally
  }
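The two-level layout described above can be illustrated with a small standalone sketch. It is not the MySQL implementation: Record, Page and Container are hypothetical names, a plain atomic flag stands in for the real record lock, and a mutex guards page creation for simplicity. It only shows the idea of claiming a record inside a page with compare-and-swap and adding a page when the existing ones are full.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>

// Hypothetical record type standing in for PFS_thread.
struct Record {
  std::atomic<bool> in_use{false};
};

// A page is a fixed-size array of records.
class Page {
 public:
  explicit Page(std::size_t size) : size_(size), records_(new Record[size]) {}

  // Claim a free record with compare-and-swap; no lock is taken.
  Record *allocate() {
    for (std::size_t i = 0; i < size_; ++i) {
      bool expected = false;
      if (records_[i].in_use.compare_exchange_strong(expected, true)) {
        return &records_[i];
      }
    }
    return nullptr;  // every record in this page is taken
  }

 private:
  std::size_t size_;
  std::unique_ptr<Record[]> records_;
};

// A container owns up to max_pages pages and creates them lazily.
class Container {
 public:
  Container(std::size_t page_size, std::size_t max_pages)
      : page_size_(page_size),
        max_pages_(max_pages),
        pages_(new std::atomic<Page *>[max_pages]) {
    for (std::size_t i = 0; i < max_pages_; ++i) pages_[i].store(nullptr);
  }
  ~Container() {
    for (std::size_t i = 0; i < max_pages_; ++i) delete pages_[i].load();
  }
  Container(const Container &) = delete;
  Container &operator=(const Container &) = delete;

  // Try existing pages first, then grow by one page, up to max_pages.
  Record *allocate() {
    for (std::size_t i = 0; i < max_pages_; ++i) {
      Page *page = pages_[i].load(std::memory_order_acquire);
      if (page == nullptr) page = add_page(i);
      if (Record *r = page->allocate()) return r;
    }
    ++lost_;  // container is full: count the lost record
    return nullptr;
  }

  std::size_t page_count() const { return page_count_.load(); }
  std::size_t lost() const { return lost_.load(); }

 private:
  // Page creation is rare, so taking a mutex here is acceptable.
  Page *add_page(std::size_t i) {
    std::lock_guard<std::mutex> guard(grow_mutex_);
    Page *page = pages_[i].load(std::memory_order_acquire);
    if (page == nullptr) {  // re-check under the lock
      page = new Page(page_size_);
      pages_[i].store(page, std::memory_order_release);
      ++page_count_;
    }
    return page;
  }

  std::size_t page_size_;
  std::size_t max_pages_;
  std::unique_ptr<std::atomic<Page *>[]> pages_;
  std::atomic<std::size_t> page_count_{0};
  std::atomic<std::size_t> lost_{0};
  std::mutex grow_mutex_;
};
```

With Container c(2, 2), the first two calls to allocate() are served by one page, the third creates a second page, and the fifth returns nullptr and increments the lost counter.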

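2) Records are fixed-length: each allocation requests a fixed amount of memory and zero-fills it

The actual allocation is done by m_allocator->alloc_array. Take PFS_thread_allocator::alloc_array as an example. PFS_thread holds the thread-level statement/wait/error data. Every PFS_thread object is allocated a fixed amount of memory. For statements, for example, MySQL supports 220 statement types, and each PFS_thread reserves a slot for all 220 up front and initializes it to 0. This is a major reason for PFS memory consumption.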

  int PFS_thread_allocator::alloc_array(PFS_thread_array *array) {
    size_t size = array->m_max;  // number of records (PFS_thread objects) in one page
    size_t index;
    size_t waits_sizing = size * wait_class_max;                // wait_class_max: number of wait event classes
    size_t statements_sizing = size * statement_class_max;      // statement_class_max: number of statement types
    size_t transactions_sizing = size * transaction_class_max;  // number of transaction classes
    size_t errors_sizing =
        (max_server_errors != 0) ? size * error_class_max : 0;  // number of error classes
    ...
    array->m_ptr =
        PFS_MALLOC_ARRAY(&builtin_memory_thread, size, sizeof(PFS_thread),
                         PFS_thread, MYF(MY_ZEROFILL));
    array->m_instr_class_waits_array = PFS_MALLOC_ARRAY(
        &builtin_memory_thread_waits, waits_sizing, sizeof(PFS_single_stat),
        PFS_single_stat, MYF(MY_ZEROFILL));
    array->m_instr_class_statements_array = PFS_MALLOC_ARRAY(
        &builtin_memory_thread_statements, statements_sizing,
        sizeof(PFS_statement_stat), PFS_statement_stat, MYF(MY_ZEROFILL));
    array->m_instr_class_errors_array = PFS_MALLOC_ARRAY(
        &builtin_memory_host_errors, errors_sizing, sizeof(PFS_error_stat),
        PFS_error_stat, MYF(MY_ZEROFILL));
    ...
  }
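To see why fixed-length records add up quickly, the sizing arithmetic can be written out as a rough sketch. PageSizing and page_bytes are hypothetical names, and only two of the per-class arrays are modeled. The byte sizes used in any call are placeholders to be replaced with the real sizeof values of your build; only the figure of 220 statement types comes from the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of one page; all sizes are supplied by the caller.
struct PageSizing {
  std::size_t records_per_page;          // PFS_thread objects in one page
  std::size_t statement_classes;         // e.g. 220 statement types
  std::size_t wait_classes;              // number of wait event classes
  std::size_t bytes_per_record;          // stand-in for sizeof(PFS_thread)
  std::size_t bytes_per_statement_stat;  // stand-in for sizeof(PFS_statement_stat)
  std::size_t bytes_per_wait_stat;       // stand-in for sizeof(PFS_single_stat)
};

// Every record reserves one stat slot per class, whether or not it is used,
// so the page cost is records * (record + classes * stat size).
constexpr std::size_t page_bytes(const PageSizing &s) {
  return s.records_per_page *
         (s.bytes_per_record +
          s.statement_classes * s.bytes_per_statement_stat +
          s.wait_classes * s.bytes_per_wait_stat);
}
```

For instance, with the placeholder values PageSizing{256, 220, 500, 1024, 200, 32}, page_bytes returns 15622144, about 15 MB for a single page. The point is the multiplication, not the specific numbers.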

3) Memory is not released while the server runs; it is released only at shutdown

The code below shows how global_thread_container releases a thread record:

  global_thread_container.deallocate(pfs);
  {  // only marks the record as reusable; no memory is actually released
    safe_pfs->m_lock.allocated_to_free();
    page->m_full = false;
    m_full = false;
  }
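The "mark as free, keep the memory" behavior can be sketched with a small state word. This is an illustration loosely inspired by the allocated_to_free() call above, not the actual pfs_lock code: SlotLock, its bit layout (state in the low 2 bits, version above) and its method bodies are assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical slot lock: low 2 bits hold the state, the rest a version.
class SlotLock {
 public:
  static constexpr std::uint32_t kStateMask = 0x3;
  static constexpr std::uint32_t kVersionInc = 4;
  enum State : std::uint32_t { kFree = 0, kDirty = 1, kAllocated = 2 };

  // Claim a free slot; fails if it is not free or another thread won.
  bool free_to_dirty() {
    std::uint32_t old_val = word_.load();
    if ((old_val & kStateMask) != kFree) return false;
    std::uint32_t new_val = (old_val & ~kStateMask) | kDirty;
    return word_.compare_exchange_strong(old_val, new_val);
  }

  // Publish the initialized record and bump the version so that readers
  // can tell a reused slot from the previous occupant.
  void dirty_to_allocated() {
    std::uint32_t old_val = word_.load();
    assert((old_val & kStateMask) == kDirty);
    word_.store(((old_val & ~kStateMask) + kVersionInc) | kAllocated);
  }

  // Release the slot: only the state changes, the memory stays in place.
  void allocated_to_free() {
    std::uint32_t old_val = word_.load();
    assert((old_val & kStateMask) == kAllocated);
    word_.store((old_val & ~kStateMask) | kFree);
  }

  State state() const { return static_cast<State>(word_.load() & kStateMask); }
  std::uint32_t version() const { return word_.load() >> 2; }

 private:
  std::atomic<std::uint32_t> word_{0};
};
```

After allocated_to_free() the slot can be claimed again by free_to_dirty(), which is why a freed record is reused by later threads instead of being returned to the operating system.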

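4) Data is aggregated at different levels of granularity

In the performance_schema database, the same metric appears in many tables; each table represents one aggregation dimension.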

  mysql> show tables like '%statement%summary%';
  +----------------------------------------------------+
  | Tables_in_performance_schema (%statement%summary%) |
  +----------------------------------------------------+
  | events_statements_summary_by_account_by_event_name |
  | events_statements_summary_by_digest                |
  | events_statements_summary_by_digest_supplement     |
  | events_statements_summary_by_host_by_event_name    |
  | events_statements_summary_by_program               |
  | events_statements_summary_by_thread_by_event_name  |
  | events_statements_summary_by_user_by_event_name    |
  | events_statements_summary_global_by_event_name     |
  +----------------------------------------------------+

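Internally, the different statistical dimensions are called aggregates. Each piece of data is stored only once, and at runtime it is rolled up from finer-grained to coarser-grained dimensions. The comments in pfs.cc explain this with diagrams like the one below. The statement case is reproduced here for readers to work through.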

    statement_locker(T, S)
     |
     | [1]
     |
  1a |-> pfs_thread(T).event_name(S)                  =====>> [A], [B], [C], [D], [E]
     |    |
     |    | [2]
     |    |
     | 2a |-> pfs_account(U, H).event_name(S)         =====>> [B], [C], [D], [E]
     |    .    |
     |    .    | [3-RESET]
     |    .    |
     | 2b .....+-> pfs_user(U).event_name(S)          =====>> [C]
     |    .    |
     | 2c .....+-> pfs_host(H).event_name(S)          =====>> [D], [E]
     |    .    .    |
     |    .    .    | [4-RESET]
     | 2d .    .    |
  1b |----+----+----+-> pfs_statement_class(S)        =====>> [E]
     |
  1c |-> pfs_thread(T).statement_current(S)           =====>> [F]
     |
  1d |-> pfs_thread(T).statement_history(S)           =====>> [G]
     |
  1e |-> statement_history_long(S)                    =====>> [H]
     |
  1f |-> statement_digest(S)                          =====>> [I]

  Implemented as:
  - [1] #pfs_start_statement_v2(), #pfs_end_statement_v2()
        (1a, 1b) is an aggregation by EVENT_NAME,
        (1c, 1d, 1e) is an aggregation by TIME,
        (1f) is an aggregation by DIGEST
        all of these are orthogonal,
        and implemented in #pfs_end_statement_v2().
  - [2] #pfs_delete_thread_v1(), #aggregate_thread_statements()
  - [3] @c PFS_account::aggregate_statements()
  - [4] @c PFS_host::aggregate_statements()
  - [A] EVENTS_STATEMENTS_SUMMARY_BY_THREAD_BY_EVENT_NAME,
        @c table_esms_by_thread_by_event_name::make_row()
  - [B] EVENTS_STATEMENTS_SUMMARY_BY_ACCOUNT_BY_EVENT_NAME,
        @c table_esms_by_account_by_event_name::make_row()
  - [C] EVENTS_STATEMENTS_SUMMARY_BY_USER_BY_EVENT_NAME,
        @c table_esms_by_user_by_event_name::make_row()
  - [D] EVENTS_STATEMENTS_SUMMARY_BY_HOST_BY_EVENT_NAME,
        @c table_esms_by_host_by_event_name::make_row()
  - [E] EVENTS_STATEMENTS_SUMMARY_GLOBAL_BY_EVENT_NAME,
        @c table_esms_global_by_event_name::make_row()
  - [F] EVENTS_STATEMENTS_CURRENT,
        @c table_events_statements_current::make_row()
  - [G] EVENTS_STATEMENTS_HISTORY,
        @c table_events_statements_history::make_row()
  - [H] EVENTS_STATEMENTS_HISTORY_LONG,
        @c table_events_statements_history_long::make_row()
  - [I] EVENTS_STATEMENTS_SUMMARY_BY_DIGEST
        @c table_esms_by_digest::make_row()
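One way to read the diagram is as a chain of "flush upward and reset" steps, with the coarser tables computed by summing whatever is still held at the finer levels. The sketch below is a deliberate simplification under that reading: it models a single event name and a single thread -> account -> host -> global chain, leaves out the user branch, and uses hypothetical names (StatementStat, Levels, flush, global_view) that do not appear in the MySQL source.

```cpp
#include <cassert>
#include <cstdint>

// Counters for one event name at one aggregation level.
struct StatementStat {
  std::uint64_t count = 0;
  std::uint64_t sum_timer = 0;
};

inline void add(StatementStat &to, const StatementStat &from) {
  to.count += from.count;
  to.sum_timer += from.sum_timer;
}

// Steps [2], [3] and [4]: move the counters one level up, then reset the
// source so that each statement is counted in exactly one place.
inline void flush(StatementStat &from, StatementStat &to) {
  add(to, from);
  from = StatementStat{};
}

// One slot per level of the simplified chain.
struct Levels {
  StatementStat thread;
  StatementStat account;
  StatementStat host;
  StatementStat global;
};

// Row [E]: the global view sums every level that still holds data.
inline StatementStat global_view(const Levels &l) {
  StatementStat total;
  add(total, l.thread);
  add(total, l.account);
  add(total, l.host);
  add(total, l.global);
  return total;
}
```

Recording three statements in thread and then calling flush(thread, account), flush(account, host) and flush(host, global) leaves global_view unchanged at every step, which is the property the single-copy design relies on.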

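PFS performance monitoring flow

This section uses one statement instrument as an example to walk through how PFS collects performance data. The data ends up in the events_statements_summary_by_thread_by_event_name table. The thread_instrumentation switch in setup_consumers must be enabled beforehand.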

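Thread creation

Entry point: PSI_THREAD_CALL(new_thread). When a thread starts, it requests memory from the global container (global_thread_container) and initializes its monitoring data. It first tries to claim a free record in an existing page; if none is found, a new page is allocated.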

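Before a statement starts

Entry point: MYSQL_START_STATEMENT. It is called where statement execution begins, for example in dispatch_command, to initialize the statement statistics and record the SQL start time.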

After a statement ends

Entry point: MYSQL_END_STATEMENT

  void pfs_end_statement_v2(PSI_statement_locker *locker, void *stmt_da)
  {
    PSI_statement_locker_state *state =
        reinterpret_cast<PSI_statement_locker_state *>(locker);
    // fill in the PFS statement event
    PFS_events_statements *pfs =
        reinterpret_cast<PFS_events_statements *>(state->m_statement);
    insert_events_statements_history(thread, pfs);  // write to EVENTS_STATEMENTS_HISTORY
    insert_events_statements_history_long(pfs);     // write to EVENTS_STATEMENTS_HISTORY_LONG
    // locate the slot to write to
    event_name_array = thread->write_instr_class_statements_stats();  // PFS_statement_stat*
    stat = &event_name_array[index];
    // fill in stat, which backs the summary table
    stat->m_lock_time += state->m_lock_time;
  }
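The start/end pair above can be modeled in isolation to show where the data lands. The sketch below is an assumption-laden simplification: ThreadInstr, StatementEvent and their members are hypothetical, timestamps are passed in explicitly instead of being read from a timer, and the history is a plain ring buffer whose size plays the role of performance_schema_events_statements_history_size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct StatementStat {
  std::uint64_t count = 0;
  std::uint64_t sum_timer = 0;
};

struct StatementEvent {
  std::size_t class_index;    // which statement type this is
  std::uint64_t timer_start;  // captured when the statement starts
  std::uint64_t timer_end;    // filled in when the statement ends
};

// Hypothetical per-thread instrumentation state.
class ThreadInstr {
 public:
  ThreadInstr(std::size_t class_count, std::size_t history_size)
      : stats_(class_count), history_(history_size) {}

  // Statement start: remember the class and the start time.
  StatementEvent start_statement(std::size_t class_index,
                                 std::uint64_t now) const {
    return StatementEvent{class_index, now, 0};
  }

  // Statement end: update the preallocated per-class slot, then append the
  // event to the fixed-size history, overwriting the oldest entry.
  void end_statement(StatementEvent event, std::uint64_t now) {
    event.timer_end = now;
    StatementStat &stat = stats_.at(event.class_index);
    ++stat.count;
    stat.sum_timer += event.timer_end - event.timer_start;
    if (!history_.empty()) {
      history_[written_ % history_.size()] = event;
      ++written_;
    }
  }

  const StatementStat &stat(std::size_t class_index) const {
    return stats_.at(class_index);
  }
  std::size_t history_rows() const {
    return written_ < history_.size() ? written_ : history_.size();
  }

 private:
  std::vector<StatementStat> stats_;     // one slot per statement class
  std::vector<StatementEvent> history_;  // ring buffer of recent events
  std::size_t written_ = 0;
};
```

Note that neither path allocates memory: both the per-class slots and the history buffer are sized when the thread record is set up, which matches the fixed-length record model described earlier.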

Thread termination

Entry point: PSI_THREAD_CALL(delete_current_thread)

  void pfs_delete_current_thread_vc(void) {
    // roll the thread's data up into the account or host statistics
    aggregate_thread(thread, thread->m_account, thread->m_user, thread->m_host);
    ...
    // destroy the PFS thread; global_thread_container takes the record back
    global_thread_container.deallocate(thread);
  }

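PFS memory parameter settings

The main parameters that affect PFS memory usage are described below.

performance_schema_max_%_instances

These control the number of instrumented objects; internally they cap the capacity of the corresponding container.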

  +------------------------------------------------------+-------+
  | Variable_name                                        | Value |
  +------------------------------------------------------+-------+
  | performance_schema_max_cond_instances                | -1    |
  | performance_schema_max_file_instances                | -1    |
  | performance_schema_max_mutex_instances               | -1    |
  | performance_schema_max_prepared_statements_instances | -1    |
  | performance_schema_max_program_instances             | -1    |
  | performance_schema_max_rwlock_instances              | -1    |
  | performance_schema_max_socket_instances              | -1    |
  | performance_schema_max_table_instances               | -1    |
  | performance_schema_max_thread_instances              | -1    |
  +------------------------------------------------------+-------+

  Variable                                              Container
  performance_schema_max_cond_instances                 global_cond_container
  performance_schema_max_file_instances                 global_file_container
  performance_schema_max_mutex_instances                global_mutex_container
  performance_schema_max_prepared_statements_instances  global_prepared_stmt_container
  performance_schema_max_program_instances              global_program_container
  performance_schema_max_rwlock_instances               global_rwlock_container
  performance_schema_max_socket_instances               global_socket_container
  performance_schema_max_table_instances                global_table_share_container
  performance_schema_max_thread_instances               global_thread_container
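How such a setting could translate into container capacity is sketched below. This is an illustrative model, not the server code: Capacity and plan_capacity are hypothetical names, and the handling of the three cases (0 disables allocation, a positive value bounds it, a negative value such as -1 lets the container grow up to its compile-time page limit) is an assumption about the intended behavior.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result of sizing a container from a max_*_instances value.
struct Capacity {
  bool full;                   // true: nothing can ever be allocated
  std::size_t max_pages;       // pages the container may create
  std::size_t last_page_size;  // records in the final page
};

inline Capacity plan_capacity(long max_size, std::size_t page_size,
                              std::size_t page_count) {
  // Default: negative value, grow on demand up to page_count full pages.
  Capacity c{false, page_count, page_size};
  if (max_size == 0) {
    // Instrumentation disabled: no pages at all.
    c.full = true;
    c.max_pages = 0;
  } else if (max_size > 0) {
    // Bounded: round up to whole pages; the last page may be partial.
    std::size_t wanted = static_cast<std::size_t>(max_size);
    c.max_pages = (wanted + page_size - 1) / page_size;
    if (wanted % page_size != 0) c.last_page_size = wanted % page_size;
    if (c.max_pages > page_count) {
      // Never exceed the compile-time page limit.
      c.max_pages = page_count;
      c.last_page_size = page_size;
    }
  }
  return c;
}
```

For example, plan_capacity(1000, 256, 256) yields 4 pages with 232 records in the last one, while plan_capacity(-1, 256, 256) allows all 256 pages.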

performance_schema_%_size

These set the maximum number of rows in the corresponding tables.

  mysql> show global variables like 'performance_schema_%_size';
  +----------------------------------------------------------+-------+
  | Variable_name                                            | Value |
  +----------------------------------------------------------+-------+
  | performance_schema_accounts_size                         | -1    |
  | performance_schema_digests_size                          | 100   |
  | performance_schema_error_size                            | 20    |
  | performance_schema_events_stages_history_long_size       | 10000 |
  | performance_schema_events_stages_history_size            | 10    |
  | performance_schema_events_statements_history_long_size   | 10000 |
  | performance_schema_events_statements_history_size        | 10    |
  | performance_schema_events_transactions_history_long_size | 10000 |
  | performance_schema_events_transactions_history_size      | 10    |
  | performance_schema_events_waits_history_long_size        | 10000 |
  | performance_schema_events_waits_history_size             | 10    |
  | performance_schema_hosts_size                            | -1    |
  | performance_schema_session_connect_attrs_size            | 512   |
  | performance_schema_setup_actors_size                     | -1    |
  | performance_schema_setup_objects_size                    | -1    |
  | performance_schema_users_size                            | -1    |
  +----------------------------------------------------------+-------+

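Other parameters:

performance_schema_error_size: the number of server error codes that are instrumented. If you do not need error-code monitoring, consider lowering it.
performance_schema_digests_size: the maximum number of rows in the events_statements_summary_by_digest table.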