BE OOM Analysis

SinceVersion 1.2.0

Ideally, as described in Memory Limit Exceeded Analysis, BE regularly checks the remaining available memory of the operating system and responds in time when memory runs low, for example by triggering a memory GC to release caches or by canceling queries that overrun their memory. However, both the refresh of process memory statistics and memory GC lag somewhat, and it is difficult to catch every large memory allocation, so there is still a risk of OOM.

Solution

Refer to BE Configuration Items to reduce mem_limit and increase max_sys_mem_available_low_water_mark_bytes in be.conf.
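
As a rough illustration of why raising max_sys_mem_available_low_water_mark_bytes leaves more headroom, the Python sketch below compares the operating system's MemAvailable value with a low-water mark. It is not BE's implementation: the function names, the use of /proc/meminfo text as input, and the sample numbers are assumptions made for this example.

```python
import re


def mem_available_bytes(meminfo_text):
    """Return MemAvailable from /proc/meminfo-style text, in bytes."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemAvailable not found")
    return int(match.group(1)) * 1024


def below_low_water_mark(meminfo_text, low_water_mark_bytes):
    """True when available system memory has dropped below the mark."""
    return mem_available_bytes(meminfo_text) < low_water_mark_bytes


# Illustrative sample only; on Linux, read the real text from /proc/meminfo.
SAMPLE_MEMINFO = """MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    2097152 kB
"""

if __name__ == "__main__":
    for mark in (1 << 30, 3 << 30):  # hypothetical marks of 1 GiB and 3 GiB
        print(mark, below_low_water_mark(SAMPLE_MEMINFO, mark))
```

With the same available memory, a higher mark makes the check trip earlier, which gives GC and query cancellation more time to take effect before the operating system runs out of memory.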

Memory analysis

To further understand where the BE process was using memory before the OOM, and to reduce its memory usage, follow the steps below.

  1. Run dmesg -T to confirm the time of the OOM and the process memory at that time.

  2. Check whether there is a Memory Tracker Summary log at the end of be/log/be.INFO. If there is, BE detected the memory overrun; go to step 3. Otherwise, go to step 8.

    Memory Tracker Summary:
    Type=consistency, Used=0(0 B), Peak=0(0 B)
    Type=batch_load, Used=0(0 B), Peak=0(0 B)
    Type=clone, Used=0(0 B), Peak=0(0 B)
    Type=schema_change, Used=0(0 B), Peak=0(0 B)
    Type=compaction, Used=0(0 B), Peak=0(0 B)
    Type=load, Used=0(0 B), Peak=0(0 B)
    Type=query, Used=206.67 MB(216708729 B), Peak=565.26 MB(592723181 B)
    Type=global, Used=930.42 MB(975614571 B), Peak=1017.42 MB(1066840223 B)
    Type=tc/jemalloc_cache, Used=51.97 MB(54494616 B), Peak=-1.00 B(-1 B)
    Type=process, Used=1.16 GB(1246817916 B), Peak=-1.00 B(-1 B)
    MemTrackerLimiter Label=Orphan, Type=global, Limit=-1.00 B(-1 B), Used=474.20 MB(497233597 B), Peak=649.18 MB(680718208 B)
    MemTracker Label=BufferAllocator, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
    MemTracker Label=LoadChannelMgr, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
    MemTracker Label=StorageEngine, Parent Label=Orphan, Used=320.56 MB(336132488 B), Peak=322.56 MB(338229824 B)
    MemTracker Label=SegCompaction, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
    MemTracker Label=SegmentMeta, Parent Label=Orphan, Used=948.64 KB(971404 B), Peak=943.64 KB(966285 B)
    MemTracker Label=TabletManager, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
    MemTrackerLimiter Label=DataPageCache, Type=global, Limit=-1.00 B(-1 B), Used=455.22 MB(477329882 B), Peak=454.18 MB(476244180 B)
    MemTrackerLimiter Label=IndexPageCache, Type=global, Limit=-1.00 B(-1 B), Used=1.00 MB(1051092 B), Peak=0(0 B)
    MemTrackerLimiter Label=SegmentCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
    MemTrackerLimiter Label=DiskIO, Type=global, Limit=2.47 GB(2655423201 B), Used=0(0 B), Peak=0(0 B)
    MemTrackerLimiter Label=ChunkAllocator, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
    MemTrackerLimiter Label=LastestSuccessChannelCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
    MemTrackerLimiter Label=DeleteBitmap AggCache, Type=global, Limit=-1.00 B(-1 B), Used=0(0 B), Peak=0(0 B)
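
To decide which of the following steps applies, it can help to rank the per-type lines of a summary like the one above by their Used bytes. The Python sketch below does this; the function names and the regular expression are assumptions based only on the line format shown in this sample, so other BE versions may need a different pattern.

```python
import re

# Matches lines such as "Type=query, Used=206.67 MB(216708729 B), Peak=..."
TYPE_LINE = re.compile(r"Type=(?P<type>[^,]+), Used=[^(]*\((?P<used>-?\d+) B\)")


def usage_by_type(summary_text):
    """Map each memory type to its Used bytes, ignoring per-label lines."""
    usage = {}
    for line in summary_text.splitlines():
        if "Label=" in line:  # detail lines also carry Type=, skip them
            continue
        match = TYPE_LINE.search(line)
        if match:
            usage[match.group("type")] = int(match.group("used"))
    return usage


def rank_types(usage, skip=("process",)):
    """Return (type, bytes) pairs, largest first, without the process total."""
    pairs = [(name, used) for name, used in usage.items() if name not in skip]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# A few lines copied from the sample summary above.
SAMPLE = """Type=load, Used=0(0 B), Peak=0(0 B)
Type=query, Used=206.67 MB(216708729 B), Peak=565.26 MB(592723181 B)
Type=global, Used=930.42 MB(975614571 B), Peak=1017.42 MB(1066840223 B)
Type=process, Used=1.16 GB(1246817916 B), Peak=-1.00 B(-1 B)
MemTrackerLimiter Label=Orphan, Type=global, Limit=-1.00 B(-1 B), Used=474.20 MB(497233597 B), Peak=649.18 MB(680718208 B)
"""

if __name__ == "__main__":
    for name, used in rank_types(usage_by_type(SAMPLE)):
        print(f"{name}: {used} B")
```

In this sample, type=global ranks first, which points to step 7.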
  3. When the end of be/log/be.INFO before the OOM contains the system memory exceeded log, use the log analysis method in Memory Limit Exceeded Analysis to check the memory usage of each category in the process. If type=query memory usage is high and the query running before the OOM is known, continue to step 4; if that query is unknown, continue to step 5. If type=load memory usage is high, continue to step 6. If type=global memory usage is high, continue to step 7.

  4. If type=query memory usage is high and the query running before the OOM is known (for example, on a test cluster or from a scheduled task), restart the BE node, refer to Memory Tracker to view real-time memory tracker statistics, run set global enable_profile=true, and retry the query. Then observe where specific operators use memory, confirm whether the query's memory usage is reasonable, and consider optimizing the SQL's memory usage, for example by adjusting the join order.

  5. If type=query memory usage is high and the query running before the OOM is unknown (for example, on an online cluster), search be/log/be.INFO from the end backward for the log lines "Deregister query/load memory tracker, queryId" and "Register query/load memory tracker, query/load id". If both lines are printed for the same query ID, that query or load completed successfully. If there is only a Register line and no Deregister line, that query or load was still running before the OOM. This yields all queries and loads that were running before the OOM; analyze the memory usage of the suspicious large-memory ones with the method in step 4.

  6. type=load memory usage is high.

  7. If type=global memory usage is high, check the detailed type=global statistics in the second half of the Memory Tracker Summary log. If DataPageCache, IndexPageCache, SegmentCache, ChunkAllocator, LastestSuccessChannelCache, etc. use a lot of memory, refer to BE Configuration Items and consider changing the cache sizes. If Orphan memory usage is too large, continue the analysis as follows.

  • If the sum of the tracker statistics with Parent Label=Orphan accounts for only a small part of the Orphan memory, a large amount of memory is currently not tracked accurately, such as the memory used by brpc. In this case, consider using a heap profile, as described in Memory Tracker, to further analyze where the memory is used.
  • If the tracker statistics with Parent Label=Orphan account for most of the Orphan memory: when Label=TabletManager uses a lot of memory, check the number of tablets in the cluster, and if there are too many, delete tables or data that are no longer used; when Label=StorageEngine uses too much memory, check the number of segment files in the cluster, and consider manually triggering compaction if there are too many.
  8. If be/log/be.INFO does not contain a Memory Tracker Summary log before the OOM, BE did not detect the memory overrun in time. Check the Grafana memory monitoring to confirm the memory growth trend of BE before the OOM. If the OOM is reproducible, consider adding memory_debug=true to be.conf. After the cluster is restarted, memory statistics are printed every second; observe the last Memory Tracker Summary log before the OOM and continue to step 3 for analysis.
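
The Orphan check in step 7 can be sketched in Python as follows: sum the Used bytes of the trackers with Parent Label=Orphan and compare the total with the Orphan tracker itself. The function name and the regular expressions are assumptions derived from the sample summary in step 2, and what counts as "a small part" or "most" is left to the reader's judgment.

```python
import re

USED = r"Used=[^(]*\((-?\d+) B\)"
ORPHAN_LINE = re.compile(r"MemTrackerLimiter Label=Orphan,.*?" + USED)
CHILD_LINE = re.compile(r"MemTracker Label=([^,]+), Parent Label=Orphan, " + USED)


def orphan_coverage(summary_text):
    """Return (orphan_used, child_used_by_label, fraction covered by children)."""
    orphan_match = ORPHAN_LINE.search(summary_text)
    if orphan_match is None:
        raise ValueError("Orphan tracker line not found")
    orphan_used = int(orphan_match.group(1))
    children = {label: int(used) for label, used in CHILD_LINE.findall(summary_text)}
    covered = sum(children.values())
    fraction = covered / orphan_used if orphan_used > 0 else 0.0
    return orphan_used, children, fraction


# Lines copied from the sample summary in step 2.
SAMPLE = """MemTrackerLimiter Label=Orphan, Type=global, Limit=-1.00 B(-1 B), Used=474.20 MB(497233597 B), Peak=649.18 MB(680718208 B)
MemTracker Label=StorageEngine, Parent Label=Orphan, Used=320.56 MB(336132488 B), Peak=322.56 MB(338229824 B)
MemTracker Label=SegmentMeta, Parent Label=Orphan, Used=948.64 KB(971404 B), Peak=943.64 KB(966285 B)
MemTracker Label=TabletManager, Parent Label=Orphan, Used=0(0 B), Peak=0(0 B)
"""

if __name__ == "__main__":
    orphan_used, children, fraction = orphan_coverage(SAMPLE)
    largest = max(children, key=children.get)
    print(f"children cover {fraction:.0%} of Orphan; largest is {largest}")
```

In this sample, the child trackers cover roughly two thirds of the Orphan memory and StorageEngine is the largest of them, so the segment file check in the second bullet is the natural next step.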