背景

数据库的性能优化是一个非常经典的话题,数据库的优化手段以及优化的角度也各不相同。

例如,可以从OS内核、网络、块设备、编译器、文件系统、SQL、数据库参数、业务逻辑、源码等各个方面去进行优化。

但是如果在优化前了解瓶颈在什么地方,可以向打鼹鼠一样,先打大的,起到事半功倍的效果。

本文将针对Linux平台下的PostgreSQL服务,讲解如何诊断PostgreSQL 数据库服务的瓶颈,了解瓶颈在哪个方面,甚至可以了解到瓶颈在哪段代码?

有了比较详细的性能诊断报告,可以给出比较好的指导优化建议。

我们要用到的工具是 《Linux 性能诊断(profiling) perf使用指南》

使用perf跟踪PostgreSQL的前提

要输出完备的跟踪信息,如符号表,call stack traces, 汇编指令。必须在编译PostgreSQL时设置相应的编译开关,例如GCC需要开启如下

  1. CFLAGS="-g -ggdb -fno-omit-frame-pointer"

建议Linux内核编译时加上

  1. CONFIG_KALLSYMS=y
  2. CONFIG_FRAME_POINTER=y

编译perf时需要支持libunwind, 并加上

  1. gcc -g dwarf
  2. man gcc
  3. -gdwarf-version
  4. Produce debugging information in DWARF format (if that is supported). This is the format used by DBX on IRIX 6. The value of version may be either 2 or 3; the default version is 3.
  5. Note that with DWARF version 2 some ports require, and will always use, some non-conflicting DWARF 3 extensions in the unwind tables.

如果是yum安装的软件,可以安装对应的debuginfo包。

例子

  1. wget https://ftp.postgresql.org/pub/source/v9.6.1/postgresql-9.6.1.tar.bz2
  2. tar -jxvf postgresql-9.6.1.tar.bz2
  3. cd postgresql-9.6.1
  4. export USE_NAMED_POSIX_SEMAPHORES=1
  5. LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto -g -ggdb -fno-omit-frame-pointer" ./configure --prefix=/home/digoal/pgsql9.6
  6. LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto -g -ggdb -fno-omit-frame-pointer" make world -j 64
  7. LIBS=-lpthread CC="/home/digoal/gcc6.2.0/bin/gcc" CFLAGS="-O3 -flto -g -ggdb -fno-omit-frame-pointer" make install-world

常用跟踪手段

1. benchmark

首先,需要定一个目标,比如insert only的场景,如何找到其瓶颈。

创建测试表

  1. postgres=# create table perf_test_ins(id serial8 primary key, info text, crt_time timestamp);
  2. alCREATE TABLE
  3. postgres=# alter sequence perf_test_ins_id_seq cache 10000;
  4. ALTER SEQUENCE

创建测试脚本,不停插入

  1. vi test.sql
  2. insert into perf_test_ins(info,crt_time) values ('test',now());

疲劳压测开始

  1. pgbench -M prepared -n -r -P 1 -f ./test.sql -h $PGDATA -c 32 -j 32 -T 600

2. 实时跟踪

pgbench压测的同时,实时观测。

  1. su - root
  2. perf top -agv -F 1000
  3. 解释
  4. perf top --help
  5. NAME
  6. perf-top - System profiling tool.
  7. SYNOPSIS
  8. perf top [-e <EVENT> | --event=EVENT] [<options>]
  9. DESCRIPTION
  10. This command generates and displays a performance counter profile in real time.
  11. -F <freq>, --freq=<freq>
  12. Profile at this frequency.
  13. -a, --all-cpus
  14. System-wide collection. (default)
  15. -g, --group
  16. Put the counters into a counter group.
  17. -K, --hide_kernel_symbols
  18. Hide kernel symbols.
  19. -U, --hide_user_symbols
  20. Hide user symbols.
  21. -v, --verbose
  22. Be more verbose (show counter open errors, etc).

输出

  1. PerfTop: 23287 irqs/sec kernel:31.2% exact: 0.0% [1000Hz cycles], (all, 24 CPUs)
  2. --------------------------------------------------------------------------------------------------------------
  3. samples pcnt RIP function DSO
  4. _______ _____ ________________ _________________________________________ _____________________________________
  5. 4179.00 1.6% 0000000000524e10 AllocSetAlloc.lto_priv.1186 /home/digoal/pgsql9.6/bin/postgres
  6. 3966.00 1.5% 0000000000043d10 __GI_vfprintf /lib64/libc-2.12.so
  7. 3822.00 1.5% ffffffff8150bb50 schedule [kernel.kallsyms]
  8. 3659.00 1.4% 0000000000082e10 __memset_sse2 /lib64/libc-2.12.so
  9. 3523.00 1.4% 00000000005782d0 LWLockAcquire.constprop.931 /home/digoal/pgsql9.6/bin/postgres
  10. 3219.00 1.2% 00000000002c9390 LWLockRelease /home/digoal/pgsql9.6/bin/postgres
  11. 3114.00 1.2% 0000000000332e10 PostgresMain /home/digoal/pgsql9.6/bin/postgres
  12. 3042.00 1.2% ffffffff8150e8b0 _spin_lock [kernel.kallsyms]
  13. 2948.00 1.1% 0000000000573fe0 hash_search_with_hash_value.constprop.930 /home/digoal/pgsql9.6/bin/postgres
  14. 2901.00 1.1% 00000000005746d0 hash_search_with_hash_value.constprop.928 /home/digoal/pgsql9.6/bin/postgres
  15. 2848.00 1.1% 00000000002db5f0 GetSnapshotData /home/digoal/pgsql9.6/bin/postgres
  16. 2471.00 1.0% 00000000005338b0 hash_seq_search /home/digoal/pgsql9.6/bin/postgres
  17. 2275.00 0.9% 0000000000579ad0 LWLockAcquire.constprop.932 /home/digoal/pgsql9.6/bin/postgres
  18. 2244.00 0.9% 0000000000078460 _int_malloc /lib64/libc-2.12.so
  19. 2215.00 0.9% 0000000000088aa0 memcpy /lib64/libc-2.12.so
  20. 2163.00 0.8% 00000000002df050 LockAcquireExtended /home/digoal/pgsql9.6/bin/postgres
  21. 2057.00 0.8% 0000000000131a70 __strlen_sse42 /lib64/libc-2.12.so
  22. 2017.00 0.8% 00000000002ba2d0 _bt_compare /home/digoal/pgsql9.6/bin/postgres
  23. 1977.00 0.8% 0000000000006e40 doCustom.lto_priv.13 /home/digoal/pgsql9.6/bin/pgbench
  24. 1951.00 0.8% ffffffff8100a630 __switch_to [kernel.kallsyms]
  25. 1855.00 0.7% 000000000053d5e0 hash_search_with_hash_value /home/digoal/pgsql9.6/bin/postgres
  26. 1767.00 0.7% 00000000001c6670 hash_any /home/digoal/pgsql9.6/bin/postgres
  27. 1726.00 0.7% 00000000002c8310 PinBuffer.isra.3.lto_priv.2340 /home/digoal/pgsql9.6/bin/postgres
  28. 1668.00 0.6% 00000000000ee570 XLogInsertRecord /home/digoal/pgsql9.6/bin/postgres
  29. 1604.00 0.6% 0000000000250500 pgstat_report_activity /home/digoal/pgsql9.6/bin/postgres
  30. 1572.00 0.6% ffffffff81189c00 fget_light [kernel.kallsyms]
  31. 1565.00 0.6% 0000000000426970 SearchCatCache /home/digoal/pgsql9.6/bin/postgres
  32. 1460.00 0.6% ffffffff810e03a0 __audit_syscall_exit [kernel.kallsyms]
  33. 1415.00 0.5% ffffffff81057e80 update_curr [kernel.kallsyms]
  34. 1409.00 0.5% 00000000000777f0 _int_free /lib64/libc-2.12.so
  35. 1386.00 0.5% 00000000000fbc10 CommitTransaction.lto_priv.2946 /home/digoal/pgsql9.6/bin/postgres
  36. 1296.00 0.5% ffffffff8119f6d0 do_select [kernel.kallsyms]
  37. 1285.00 0.5% 00000000002c8690 LWLockWaitListLock.lto_priv.1265 /home/digoal/pgsql9.6/bin/postgres
  38. 1268.00 0.5% ffffffff8150e790 _spin_lock_irqsave [kernel.kallsyms]
  39. 1247.00 0.5% 0000000000577650 hash_search_with_hash_value.constprop.929 /home/digoal/pgsql9.6/bin/postgres
  40. 1220.00 0.5% 0000000000538a10 AllocSetFree.lto_priv.1187 /home/digoal/pgsql9.6/bin/postgres
  41. 1201.00 0.5% 0000000000389660 ExecModifyTable /home/digoal/pgsql9.6/bin/postgres
  42. 1185.00 0.5% 000000000009b6d0 gettimeofday /lib64/libc-2.12.so
  43. 1171.00 0.5% ffffffff81278a60 copy_user_generic_string [kernel.kallsyms]
  44. 1142.00 0.4% 000000000013cb10 __memcmp_sse4_1 /lib64/libc-2.12.so
  45. 1124.00 0.4% 0000000000032620 __GI___sigsetjmp /lib64/libc-2.12.so
  46. 1123.00 0.4% 00000000000ef2b0 XLogInsert /home/digoal/pgsql9.6/bin/postgres
  47. 1112.00 0.4% 00000000002d29e0 LockReleaseAll /home/digoal/pgsql9.6/bin/postgres
  48. 1108.00 0.4% 00000000003c7620 ExecInitExpr /home/digoal/pgsql9.6/bin/postgres
  49. 1056.00 0.4% 000000000000ac80 threadRun.lto_priv.20 /home/digoal/pgsql9.6/bin/pgbench
  50. 1052.00 0.4% ffffffff81062a40 select_task_rq_fair [kernel.kallsyms]
  51. 1033.00 0.4% 0000000000017710 pqParseInput3 /home/digoal/pgsql9.6/lib/libpq.so.5.9

按Enter键弹出可以配置的选项以及当前状态。

  1. Mapped keys:
  2. [d] display refresh delay. (2)
  3. [e] display entries (lines). (47)
  4. [f] profile display filter (count). (5)
  5. [F] annotate display filter (percent). (5%)
  6. [s] annotate symbol. (NULL)
  7. [S] stop annotation.
  8. [K] hide kernel_symbols symbols. (no) 是否显示kernel symbols
  9. [U] hide user symbols. (no) 是否显示user symbols
  10. [z] toggle sample zeroing. (0)
  11. [qQ] quit.

调整压测命令,连接回环地址,(前面连接的是unix socket)

  1. pgbench -M prepared -n -r -P 1 -f ./test.sql -h 127.0.0.1 -c 32 -j 32 -T 600

再次观测perf top结果, 采样消耗排名第一的变成了这条,找到了一个瓶颈,关闭iptables,这条就消失了,TPS也提升了。

  1. 5871.00 1.7% 0000000000002e40 ipt_do_table /lib/modules/2.6.32-358.23.2.ali1195.el6.x86_64/kernel/net/ipv4/netfilter/ip_tables.ko

3. 跟踪事件计数(指定事件)

指定命令进行跟踪,或者指定PID进行跟踪。

  1. NAME
  2. perf-stat - Run a command and gather performance counter statistics
  3. SYNOPSIS
  4. perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
  5. perf stat [-e <EVENT> | --event=EVENT] [-a] <command> [<options>]
  6. DESCRIPTION
  7. This command runs a command and gathers performance counter statistics from it.
  8. -a, --all-cpus
  9. system-wide collection from all CPUs
  10. -c, --scale
  11. scale/normalize counter values
  12. -B, --big-num
  13. print large numbers with thousands separators according to locale
  14. -v, --verbose
  15. be more verbose (show counter open errors, etc)
  16. -p, --pid=<pid>
  17. stat events on existing process id
  18. -t, --tid=<tid>
  19. stat events on existing thread id
  20. -r, --repeat=<n>
  21. repeat command and print average + stddev (max: 100)

跟踪批量插入一批记录的事件调用统计

  1. event来自perf list的输出
  2. perf stat -e block:*,syscalls:*,xfs:* -a -B -v -c -- /home/digoal/pgsql9.6/bin/psql -h 127.0.0.1 -p 5288 -q -U postgres postgres -c "drop table if exists x; create table x as select a FROM generate_series(1,1000000) a;";

输出

  1. 0 block:block_rq_abort [98.79%]
  2. 0 block:block_rq_requeue [98.79%]
  3. 995 block:block_rq_complete [98.79%]
  4. 993 block:block_rq_insert [98.79%]
  5. 997 block:block_rq_issue [98.79%]
  6. 0 block:block_bio_bounce [98.79%]
  7. 0 block:block_bio_complete [98.79%]
  8. 18 block:block_bio_backmerge [98.79%]
  9. 1 block:block_bio_frontmerge [98.79%]
  10. 1,011 block:block_bio_queue [98.79%]
  11. 992 block:block_getrq [98.79%]
  12. 2 block:block_sleeprq [98.79%]
  13. 250 block:block_plug [98.79%]
  14. 2 block:block_unplug_timer [98.79%]
  15. 361 block:block_unplug_io [98.79%]
  16. 0 block:block_split [98.79%]
  17. 1,011 block:block_remap [98.79%]
  18. 0 block:block_rq_remap [98.79%]
  19. 5 syscalls:sys_enter_socket [98.79%]
  20. 5 syscalls:sys_exit_socket [98.79%]
  21. 0 syscalls:sys_enter_socketpair [98.79%]
  22. 0 syscalls:sys_exit_socketpair [98.79%]
  23. ... ...

计数例子

  1. perf stat -a -B -v -c -- /home/digoal/pgsql9.6/bin/psql -h 127.0.0.1 -p 5288 -q -U postgres postgres -c "drop table if exists x; create table x as select a FROM generate_series(1,1000000) a;";
  2. 22433.920027 task-clock # 24.042 CPUs utilized [100.00%]
  3. 4,275 context-switches # 0.000 M/sec [100.00%]
  4. 98 CPU-migrations # 0.000 M/sec [100.00%]
  5. 56,484 page-faults # 0.003 M/sec
  6. 2,687,623,473 cycles # 0.120 GHz [80.73%]
  7. 2,747,044,652 stalled-cycles-frontend # 102.21% frontend cycles idle [92.42%]
  8. 2,358,755,489 stalled-cycles-backend # 87.76% backend cycles idle [81.89%]
  9. 3,373,693,967 instructions # 1.26 insns per cycle 越高越好,说明CPU没闲着。至少要大于1,但是注意同步锁等待这个值也可能很高,CPU空转,这就不好了。
  10. # 0.81 stalled cycles per insn [91.50%] 越低越好,说明CPU没有闲着。
  11. 632,351,270 branches # 28.187 M/sec [91.82%]
  12. 1,236,798 branch-misses # 0.20% of all branches [53.19%]
  13. 0.933128299 seconds time elapsed

perf stat重点看的是instructions,判断CPU是否充分被利用,如果没有,说明系统有其他瓶颈,例如IO等待,或者其他等待。

4. 采样跟踪

这个是用得最多的,先采样,后分析报告。

  1. NAME
  2. perf-record - Run a command and record its profile into perf.data
  3. SYNOPSIS
  4. perf record [-e <EVENT> | --event=EVENT] [-l] [-a] <command>
  5. perf record [-e <EVENT> | --event=EVENT] [-l] [-a] <command> [<options>]
  6. DESCRIPTION
  7. This command runs a command and gathers a performance counter profile from it, into perf.data - without displaying anything.
  8. This file can then be inspected later on, using perf report.
  9. -A, --append
  10. Append to the output file to do incremental profiling.
  11. -o, --output=
  12. Output file name.
  13. -e, --event= event来自perf list的输出
  14. Select the PMU event.
  15. Selection can be a symbolic event name (use perf list to list all events) or a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a hexadecimal event descriptor.
  16. -a, --all-cpus
  17. System-wide collection from all CPUs.
  18. -g, --call-graph
  19. Do call-graph (stack chain/backtrace) recording.
  20. -v, --verbose
  21. Be more verbose (show counter open errors, etc).
  22. -s, --stat 相对的是默认的per cpu mode
  23. Per thread counts.
  24. -G name,..., --cgroup name,... 如果你只想跟踪某一些进程的统计信息,可以将这些进程放到一个cgroup中,跟踪这个cgroup的。
  25. monitor only in the container (cgroup) called "name".
  26. This option is available only in per-cpu mode.
  27. The cgroup filesystem must be mounted.
  28. All threads belonging to container "name" are monitored when they run on the monitored CPUs.
  29. Multiple cgroups can be provided.
  30. Each cgroup is applied to the corresponding event, i.e., first cgroup to first event, second cgroup to second event and so on. It is
  31. possible to provide an empty cgroup (monitor all the time) using, e.g., -G foo,,bar.
  32. Cgroups must have corresponding events, i.e., they always refer to events defined earlier on the command line.

跑pgbench压测,和前面的pgbench一样,不断并发插入。

然后开启统计收集,收集10秒的统计信息,默认输出到perf.data文件。

  1. perf record -avg -- sleep 10
  2. [ perf record: Woken up 142 times to write data ]
  3. [ perf record: Captured and wrote 37.111 MB perf.data (~1621415 samples) ]

5. 生成报告

前面使用perf record收集了统计信息到perf.data,接下来就对perf.data进行分析,输出报告。

  1. NAME
  2. perf-report - Read perf.data (created by perf record) and display the profile
  3. SYNOPSIS
  4. perf report [-i <file> | --input=file]
  5. DESCRIPTION
  6. This command displays the performance counter profile information recorded via perf record.
  7. -i, --input=
  8. Input file name. (default: perf.data)
  9. -v, --verbose
  10. Be more verbose. (show symbol address, etc)
  11. -n, --show-nr-samples
  12. Show the number of samples for each symbol
  13. --showcpuutilization
  14. Show sample percentage for different cpu modes.
  15. -T, --threads
  16. Show per-thread event counters
  17. -U, --hide-unresolved
  18. Only display entries resolved to a symbol.
  19. -s, --sort=
  20. Sort by key(s): pid, comm, dso, symbol, parent.
  21. -g [type,min], --call-graph
  22. Display call chains using type and min percent threshold. type can be either:
  23. · flat: single column, linear exposure of call chains.
  24. · graph: use a graph tree, displaying absolute overhead rates.
  25. · fractal: like graph, but displays relative rates. Each branch of the tree is considered as a new profiled object.
  26. Default: fractal,0.5.
  27. --stdio
  28. Use the stdio interface.
  29. --tui
  30. Use the TUI interface, that is integrated with annotate and allows zooming into DSOs or threads, among other features. Use of --tui requires a tty, if one is not present, as when piping to other
  31. commands, the stdio interface is used.

--tui模式可以查看概览,支持交互式指令。

  1. 输入E全展开,输入C全收敛。 展开后可以使用键盘的上下键进行浏览。
  2. [.] 表示userspace
  3. [k] 表示kernel
  4. 1.39% 0.00% 0.01% 15 postgres /home/digoal/pgsql9.6/bin/postgres 0x524e3a d [.] AllocSetAlloc.lto_priv.1186
  5. 1.29% 0.00% 0.01% 22 postgres /lib64/libc-2.12.so 0x83127 B [.] __memset_sse2
  6. 1.25% 0.00% 0.00% 1 postgres /home/digoal/pgsql9.6/bin/postgres 0x57831d d [.] LWLockAcquire.constprop.931
  7. 1.24% 0.00% 0.05% 107 postgres /home/digoal/pgsql9.6/bin/postgres 0x574751 d [.] hash_search_with_hash_value.constprop.928
  8. 1.07% 0.00% 0.00% 2 postgres /home/digoal/pgsql9.6/bin/postgres 0x2c946f d [.] LWLockRelease
  9. 1.01% 0.00% 0.00% 1 postgres /home/digoal/pgsql9.6/bin/postgres 0x574035 d [.] hash_search_with_hash_value.constprop.930
  10. 0.98% 0.00% 0.03% 74 postgres /home/digoal/pgsql9.6/bin/postgres 0x334832 d [.] PostgresMain
  11. 0.94% 0.00% 0.02% 49 postgres /home/digoal/pgsql9.6/bin/postgres 0x2db87b d [.] GetSnapshotData
  12. 0.94% 0.00% 0.00% 1 postgres [kernel.kallsyms] 0xffffffff8150e8b1 k [k] _spin_lock
  13. 0.83% 0.00% 0.00% 1 postgres [kernel.kallsyms] 0xffffffff8150edb0 k [k] page_fault
  14. 0.81% 0.00% 0.00% 1 postgres /lib64/libc-2.12.so 0x88aa0 B [.] memcpy
  15. 0.77% 0.00% 0.22% 465 postgres /home/digoal/pgsql9.6/bin/postgres 0x533939 d [.] hash_seq_search
  16. 0.75% 0.00% 0.00% 1 postgres /lib64/libc-2.12.so 0x454ac B [.] __GI_vfprintf
  17. 0.75% 0.00% 0.03% 59 postgres /home/digoal/pgsql9.6/bin/postgres 0x579b1d d [.] LWLockAcquire.constprop.932
  18. 0.73% 0.00% 0.02% 42 postgres /home/digoal/pgsql9.6/bin/postgres 0x2df3ab d [.] LockAcquireExtended
  19. 0.67% 0.00% 0.00% 2 postgres /home/digoal/pgsql9.6/bin/postgres 0x1c66f9 d [.] hash_any
  20. 0.65% 0.00% 0.02% 49 postgres /home/digoal/pgsql9.6/bin/postgres 0x2ba33a d [.] _bt_compare
  21. 0.63% 0.00% 0.00% 3 postgres /home/digoal/pgsql9.6/bin/postgres 0xee632 d [.] XLogInsertRecord
  22. 0.62% 0.00% 0.00% 1 postgres /home/digoal/pgsql9.6/bin/postgres 0x53d5e1 d [.] hash_search_with_hash_value
  23. 0.59% 0.00% 0.59% 1318 pgbench /lib64/libc-2.12.so 0x47956 B [.] __GI_vfprintf
  24. 0.57% 0.00% 0.00% 1 postgres /lib64/libc-2.12.so 0x131aa7 B [.] __strlen_sse42
  25. 0.55% 0.00% 0.55% 1241 pgbench /home/digoal/pgsql9.6/bin/pgbench 0x85b3 d [.] doCustom.lto_priv.13
  26. 0.55% 0.00% 0.03% 55 postgres /home/digoal/pgsql9.6/bin/postgres 0x2c8327 d [.] PinBuffer.isra.3.lto_priv.2340
  27. 0.54% 0.00% 0.02% 35 postgres /home/digoal/pgsql9.6/bin/postgres 0x25059f d [.] pgstat_report_activity
  28. 0.53% 0.00% 0.02% 43 postgres /lib64/libc-2.12.so 0x78462 B [.] _int_malloc
  29. 0.52% 0.00% 0.01% 33 postgres /home/digoal/pgsql9.6/bin/postgres 0x426a36 d [.] SearchCatCache
  30. 0.47% 0.00% 0.00% 3 postgres [kernel.kallsyms] 0xffffffff81272a1c k [k] radix_tree_lookup_slot
  31. 0.45% 0.00% 0.01% 27 postgres /home/digoal/pgsql9.6/bin/postgres 0xfbe51 d [.] CommitTransaction.lto_priv.2946
  32. 0.44% 0.00% 0.00% 1 postgres /home/digoal/pgsql9.6/bin/postgres 0x5776f5 d [.] hash_search_with_hash_value.constprop.929
  33. 0.43% 0.00% 0.01% 30 postgres /home/digoal/pgsql9.6/bin/postgres 0x38a044 d [.] ExecModifyTable
  34. 0.41% 0.00% 0.00% 1 postgres [kernel.kallsyms] 0xffffffff8150bef4 k [k] schedule
  35. 0.40% 0.00% 0.01% 22 postgres /home/digoal/pgsql9.6/bin/postgres 0x2c86c2 d [.] LWLockWaitListLock.lto_priv.1265
  36. 0.40% 0.00% 0.01% 29 postgres /home/digoal/pgsql9.6/bin/postgres 0x2d2b2d d [.] LockReleaseAll
  37. 0.40% 0.00% 0.01% 12 postgres /home/digoal/pgsql9.6/bin/postgres 0x538a11 d [.] AllocSetFree.lto_priv.1187
  38. 0.39% 0.00% 0.00% 1 postgres [kernel.kallsyms] 0xffffffff8111e567 k [k] find_get_page
  39. 0.39% 0.01% 0.00% 24 postgres [kernel.kallsyms] 0xffffffff8148d1f8 k [k] tcp_sendmsg
  40. 0.39% 0.00% 0.01% 34 postgres /home/digoal/pgsql9.6/bin/postgres 0xef545 d [.] XLogInsert
  41. 0.39% 0.00% 0.01% 25 postgres /lib64/libc-2.12.so 0x32623 B [.] __GI___sigsetjmp
  42. 0.38% 0.00% 0.38% 860 pgbench /home/digoal/pgsql9.6/bin/pgbench 0xb3f3 d [.] threadRun.lto_priv.20
  43. 0.38% 0.00% 0.01% 33 postgres /lib64/libc-2.12.so 0x77cf2 B [.] _int_free
  44. 0.38% 0.00% 0.00% 1 postgres /lib64/libc-2.12.so 0x13cb10 B [.] __memcmp_sse4_1
  45. 0.38% 0.00% 0.02% 36 postgres /home/digoal/pgsql9.6/bin/postgres 0x3c762f d [.] ExecInitExpr
  46. 0.36% 0.36% 0.00% 816 pgbench [kernel.kallsyms] 0xffffffff8119f950 k [k] do_select
  47. 0.36% 0.00% 0.00% 4 postgres /home/digoal/pgsql9.6/bin/postgres 0x21e005 d [.] ReadBuffer_common.lto_priv.1215
  48. 0.36% 0.00% 0.01% 28 postgres /home/digoal/pgsql9.6/bin/postgres 0x1e5e50 d [.] _bt_doinsert
  49. 0.35% 0.00% 0.01% 16 postgres [vdso] 0x7fffb61ff8b0 ! [.] 0x7fffb61ff8b0
  50. 0.35% 0.00% 0.00% 1 postgres /lib64/libc-2.12.so 0x9b6db B [.] gettimeofday
  51. 0.35% 0.00% 0.35% 775 pgbench /home/digoal/pgsql9.6/lib/libpq.so.5.9 0x18057 d [.] pqParseInput3
  52. 0.32% 0.00% 0.01% 18 postgres /home/digoal/pgsql9.6/bin/postgres 0x2f9979 d [.] expression_tree_walker
  53. 0.31% 0.00% 0.01% 18 postgres /home/digoal/pgsql9.6/bin/postgres 0x306a5c d [.] socket_putmessage.lto_priv.1817
  54. 0.31% 0.31% 0.00% 702 pgbench [kernel.kallsyms] 0xffffffff8148c9be k [k] tcp_sendmsg
  55. 0.31% 0.00% 0.01% 17 postgres /home/digoal/pgsql9.6/bin/postgres 0x38891e d [.] ExecInitModifyTable
  56. 0.31% 0.00% 0.00% 1 postgres [kernel.kallsyms] 0xffffffff81057ff7 k [k] update_curr
  57. 0.31% 0.01% 0.00% 27 postgres [kernel.kallsyms] 0xffffffff81494ea6 k [k] tcp_ack

展开后的例子

  1. - 1.39% 0.00% 0.01% 15 postgres /home/digoal/pgsql9.6/bin/postgres 0x524e3a d [.] AllocSetAlloc.lto_priv.1186
  2. - AllocSetAlloc.lto_priv.1186
  3. - 33.72% MemoryContextAlloc
  4. - 27.57% MemoryContextCreate
  5. - AllocSetContextCreate.constprop.955
  6. - 43.77% StartTransaction.lto_priv.2177
  7. start_xact_command.part.3
  8. PostgresMain
  9. ServerLoop
  10. PostmasterMain
  11. main
  12. __libc_start_main
  13. - 16.08% CreatePortal.constprop.203
  14. PostgresMain
  15. ServerLoop
  16. PostmasterMain
  17. main
  18. __libc_start_main
  19. - 14.06% MakePerTupleExprContext
  20. ExecInsertIndexTuples.constprop.430
  21. ExecModifyTable
  22. ExecProcNode
  23. standard_ExecutorRun
  24. pgss_ExecutorRun
  25. ProcessQuery
  26. PortalRunMulti
  27. PortalRun
  28. PostgresMain
  29. ServerLoop
  30. PostmasterMain
  31. main
  32. __libc_start_main
  33. - 13.35% CreateExecutorState
  34. standard_ExecutorStart
  35. pgss_ExecutorStart
  36. ProcessQuery
  37. PortalRunMulti
  38. PortalRun
  39. PostgresMain
  40. ServerLoop
  41. PostmasterMain
  42. main
  43. __libc_start_main
  44. - 12.74% ExecInitResult
  45. ExecInitNode
  46. ExecInitModifyTable
  47. ExecInitNode
  48. standard_ExecutorStart
  49. pgss_ExecutorStart
  50. ProcessQuery
  51. PortalRunMulti
  52. PortalRun
  53. PostgresMain
  54. ServerLoop
  55. ...................略................

如果要查看详情,建议使用—stdio模式,看得非常清晰

  1. #perf report -v -n --showcpuutilization -g --stdio
  2. # Events: 224K cycles
  3. #
  4. # Overhead Samples sys us Command Shared Object
  5. (全局开销占比) (采样) (系统开销占比, 系统开销耗时us) (命令) (共享对象)
  6. # ........ .......... .............. ............................................... ........................................................
  7. #
  8. 1.39% 0.00% 0.01% 15 postgres /home/digoal/pgsql9.6/bin/postgres 0x524e3a d [.] AllocSetAlloc.lto_priv.1186
  9. |
  10. --- AllocSetAlloc.lto_priv.1186
  11. |
  12. |--33.72%-- MemoryContextAlloc
  13. | |
  14. | |--27.57%-- MemoryContextCreate
  15. | | AllocSetContextCreate.constprop.955
  16. | | |
  17. | | |--43.77%-- StartTransaction.lto_priv.2177
  18. | | | start_xact_command.part.3
  19. | | | PostgresMain
  20. | | | ServerLoop
  21. | | | PostmasterMain
  22. | | | main
  23. | | | __libc_start_main
  24. | | |
  25. | | |--16.08%-- CreatePortal.constprop.203
  26. | | | PostgresMain
  27. | | | ServerLoop
  28. | | | PostmasterMain
  29. | | | main
  30. | | | __libc_start_main
  31. | | |
  32. | | |--14.06%-- MakePerTupleExprContext
  33. | | | ExecInsertIndexTuples.constprop.430
  34. | | | ExecModifyTable
  35. | | | ExecProcNode
  36. | | | standard_ExecutorRun
  37. | | | pgss_ExecutorRun
  38. | | | ProcessQuery
  39. | | | PortalRunMulti
  40. | | | PortalRun
  41. | | | PostgresMain
  42. | | | ServerLoop
  43. | | | PostmasterMain
  44. | | | main
  45. | | | __libc_start_main
  46. | | |
  47. | | |--13.35%-- CreateExecutorState
  48. | | | standard_ExecutorStart
  49. | | | pgss_ExecutorStart
  50. | | | ProcessQuery
  51. | | | PortalRunMulti
  52. | | | PortalRun
  53. | | | PostgresMain
  54. | | | ServerLoop
  55. | | | PostmasterMain
  56. | | | main
  57. | | | __libc_start_main
  58. ...............略.............................

perf report输出为本文,可以较为详细的进行代码瓶颈的分析。

6. 生成火焰图

如果你觉得perf report的文本输出不够直观,可以生成火焰图,就更加直观了。

生成火焰图需要将perf record的原始数据弄出来,然后绘图。

perf.data的原始数据perf script即可输出。

生成火焰图

  1. # git clone https://github.com/brendangregg/FlameGraph # or download it from github
  2. # mv perf.data FlameGraph/
  3. # cd FlameGraph
  4. # perf script | ./stackcollapse-perf.pl > out.perf-folded
  5. # cat out.perf-folded | ./flamegraph.pl > perf-kernel.svg

perf-kernel.svg

svg可以使用浏览器打开,转换的图片如下

pic

指定内核内置事件跟踪

perf top|stat|record 都支持通过-e指定事件进行跟踪。

支持的事件如下

  1. perf list
  2. cpu-cycles OR cycles [Hardware event]
  3. stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
  4. stalled-cycles-backend OR idle-cycles-backend [Hardware event]
  5. instructions [Hardware event]
  6. cache-references [Hardware event]
  7. cache-misses [Hardware event]
  8. branch-instructions OR branches [Hardware event]
  9. branch-misses [Hardware event]
  10. bus-cycles [Hardware event]
  11. ......................略........................

事件支持写通配符,多个事件使用逗号隔开。

  1. perf top -ag -e xfs:xfs_attr_list_sf,skb:kfree_skb,block:*,irq:softirq*
  2. No CONFIG_PERF_EVENTS=y kernel support configured? 编译内核是需要配置CONFIG_PERF_EVENTS=y
  1. perf stat -e xfs:xfs_attr_list_sf,skb:kfree_skb,block:*,irq:softirq* -- sleep 10
  2. Performance counter stats for 'sleep 10':
  3. 0 xfs:xfs_attr_list_sf
  4. 0 skb:kfree_skb
  5. 0 block:block_rq_abort
  6. 0 block:block_rq_requeue
  7. 0 block:block_rq_complete
  8. 0 block:block_rq_insert
  9. 0 block:block_rq_issue
  10. 0 block:block_bio_bounce
  11. 0 block:block_bio_complete
  12. 0 block:block_bio_backmerge
  13. 0 block:block_bio_frontmerge
  14. 0 block:block_bio_queue
  15. 0 block:block_getrq
  16. 0 block:block_sleeprq
  17. 0 block:block_plug
  18. 0 block:block_unplug_timer
  19. 0 block:block_unplug_io
  20. 0 block:block_split
  21. 0 block:block_remap
  22. 0 block:block_rq_remap
  23. 0 irq:softirq_entry
  24. 0 irq:softirq_exit
  25. 0 irq:softirq_raise
  26. 10.001021697 seconds time elapsed
  1. perf record -avg -e xfs:xfs_attr_list_sf,skb:kfree_skb,block:*,irq:softirq* -- sleep 10
  2. perf report -vg --stdio
  3. # Events: 17 skb:kfree_skb
  4. #
  5. # Overhead Command Shared Object Symbol
  6. # ........ .............. ........................................... ................
  7. #
  8. 35.29% init [kernel.kallsyms] 0xffffffff8103c94b k [k] native_safe_halt
  9. |
  10. --- native_safe_halt
  11. 11.76% syslog-ng /lib64/libpthread-2.12.so 0xea10 B [.] __recvmsg
  12. |
  13. --- __recvmsg
  14. ...........略.......................

kernel source 动态跟踪

如果perf list列出的事件不能满足跟踪需求,你可以自行添加任意代码进行跟踪。

这个功能需要perf和内核的支持,详见《Linux 性能诊断(profiling) perf使用指南》

  1. 1. Listing variables available for tcp_sendmsg():
  2. # perf probe -V tcp_sendmsg
  3. Available variables at tcp_sendmsg
  4. @<tcp_sendmsg+0>
  5. size_t size
  6. struct kiocb* iocb
  7. struct msghdr* msg
  8. struct sock* sk
  9. 2. Creating a probe for tcp_sendmsg() with the "size" variable:
  10. # perf probe --add 'tcp_sendmsg size'
  11. Added new event:
  12. probe:tcp_sendmsg (on tcp_sendmsg with size)
  13. 3. You can now use it in all perf tools, such as:
  14. perf record -e probe:tcp_sendmsg -aR sleep 1

PostgreSQL source 动态跟踪

这个功能需要perf和软件编译时的支持,详见《Linux 性能诊断(profiling) perf使用指南》

  1. sudo perf record -g dwarf -e block:block_rq_*,syscalls:sys_enter_write,syscalls:sys_enter_fsync -p 4495
  1. sudo perf probe -x `which postgres` XLogFileInit
  2. sudo perf probe -x `which postgres` XLogFileOpen
  3. sudo perf probe -x `which postgres` XLogFlush
  1. sudo perf probe -l
  2. probe_postgres:XLogFileInit (on 0x000000000009a360)
  3. probe_postgres:XLogFileOpen (on 0x000000000009a860)
  4. probe_postgres:XLogFlush (on 0x00000000000a0670)
  1. sudo perf record -g dwarf -u postgres -e probe_postgres:XLogFileInit,probe_postgres:XLogFileOpen,probe_postgres:XLogFlush
  1. sudo perf stat -e probe_postgres:XLogFileInit,probe_postgres:XLogFileOpen,probe_postgres:XLogFlush -a -- /usr/pgsql-9.2/bin/pgbench -U postgres craig -c 2 -t 10000
  2. starting vacuum...end.
  3. transaction type: TPC-B (sort of)
  4. scaling factor: 100
  5. query mode: simple
  6. number of clients: 2
  7. number of threads: 1
  8. number of transactions per client: 10000
  9. number of transactions actually processed: 20000/20000
  10. tps = 715.854663 (including connections establishing)
  11. tps = 716.092133 (excluding connections establishing)
  12. Performance counter stats for '/usr/pgsql-9.2/bin/pgbench -U postgres craig -c 2 -t 10000':
  13. 64 probe_postgres:XLogFileInit [100.00%]
  14. 0 probe_postgres:XLogFileOpen [100.00%]
  15. 55,440 probe_postgres:XLogFlush
  16. 27.987364469 seconds time elapsed

参考

1. https://wiki.postgresql.org/wiki/Profiling\_with\_perf

2. http://blog.2ndquadrant.com/tracing-postgresql-perf/

3. 《Linux 性能诊断(profiling) perf使用指南》

4. http://blog.2ndquadrant.com/testing-postgresql-vagrant-computer/

5. http://blog.2ndquadrant.com/devops-testing-postgresql-with-vagrant-on-your-computer-part-two/

6. https://www.quora.com/How-do-I-compile-a-Linux-perf-tool-with-all-features-For-Linux-4-0-on-Ubuntu