From: eLinux.org

Accurate Memory Measurement

Introduction

This page describes techniques and issues involved in measuring Linux
system memory accurately. This is important for embedded systems, which
usually have limited memory and no swap space. It is currently (as of
the 2.4 and 2.6 kernels) very difficult to get an accurate count of
used and free memory for the system. Having an accurate count could
potentially enable better error handling for out-of-memory conditions,
or error avoidance for low-memory conditions, in CE products.

This page currently lists three systems that aid in getting an accurate
memory measurement for the Linux kernel:

  • Panasonic’s memory usage API
  • Sony’s detailed memory accounting
  • Nokia’s out-of-memory notifier module (LSM)

Panasonic API for accurate memory count

Description Overview

This technique and API were presented by Panasonic on pages 15-18 of a
presentation available here.

Panasonic Presentation Excerpts

Page 15 - Memory Usage API 1/4

Motivation:

Customer requirements:

  • Consumer expects mobile phones to be more stable than PC.

Dynamic characteristics:

  • Dynamic characteristics of memory usage introduced by Linux
  • Difficult to estimate maximum memory usage at design time

Narrow margin:

  • the usual memory usage level is close to the limit of real
    capacity

A mobile phone should not crash or freeze when it accidentally hits the
limit of memory.

Page 16 - Memory Usage API 2/4

Strategy:

  • Estimate the room of memory at runtime
  • Refrain from activating a new application if the current room
    cannot satisfy it.

(a “memory alert” window pops up)

  • Existing means for estimating room of memory:

/proc/meminfo: underestimates room by excluding pages which can shrink.

  • Therefore: we implemented a memory usage API to estimate the
    current room of memory more accurately.
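
For comparison, here is a minimal user-space sketch (not from the
presentation) of the conventional estimate criticized above: summing
MemFree, Buffers and Cached from /proc/meminfo. It undercounts
reclaimable memory because shrinkable slab objects such as dentries and
i-nodes are not included.

    /* Conventional free-memory estimate from /proc/meminfo. */
    #include <stdio.h>
    #include <string.h>

    static long meminfo_kb(const char *field)
    {
        char line[128];
        long val = -1;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
            return -1;
        while (fgets(line, sizeof(line), f)) {
            if (strncmp(line, field, strlen(field)) == 0) {
                sscanf(line + strlen(field), " %ld", &val);
                break;
            }
        }
        fclose(f);
        return val;
    }

    int main(void)
    {
        long est = meminfo_kb("MemFree:") + meminfo_kb("Buffers:")
                 + meminfo_kb("Cached:");
        printf("conventional estimate: %ld kB\n", est);
        return 0;
    }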

Page 17 - Memory Usage API 3/4

Memory Usage API:

  • Estimates amount of page cache and slabs to be reclaimed by shrink
    in addition to free pages.
  • Execution time < 1 msec
  • Remaining issues:

      • Excludes the i-node cache and directory entry cache, which
        could be reclaimed (omitted for complexity and time
        consumption)

      • A race condition with shrink_caches() may cause inaccurate
        results

Page 18 - Memory Usage API 4/4

  • Memory Usage API gives a fairly good estimate of memory remaining.

Description:

A process was run to constantly allocate memory, eventually exhausting
the memory of the machine. While this was running, the memory usage API
was called to determine the amount of free memory remaining in the
machine. The machine had no other activity on it. The amount of memory
used by the process and the amount of memory remaining should add up to
the total memory on the machine. The diagram shows a pink line (B)
indicating the amount of memory used by the test program, a blue line
(A) indicating the return value from the memory usage API, and a yellow
line (A+B) showing the addition of the two values. The yellow line
fluctuates slightly due to some inaccuracies (a race condition with
shrink_caches), but overall stays fairly constant.

[Diagram: Memory Usage API test results (lines A, B, and A+B over time)]
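
The test program itself is not given in the presentation; a minimal
sketch of the kind of allocator described (line B) might look like the
following. Note that with default overcommit settings malloc() may not
fail before the OOM killer intervenes, so a real test should also
sample the memory usage API (line A) at each step.

    /* Allocate and touch memory in 1 MB steps until allocation fails. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK (1024 * 1024)

    int main(void)
    {
        size_t total = 0;
        char *p;

        while ((p = malloc(CHUNK)) != NULL) {
            memset(p, 0xaa, CHUNK);   /* fault in every page */
            total += CHUNK;
            printf("allocated %zu MB\n", total / CHUNK);
            /* here one would also sample the memory usage API (line A) */
        }
        return 0;
    }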

Description of algorithm

When the API is invoked:

  1. Get the number of free pages using nr_free_pages().
  2. Get the number of shrinkable page cache pages by inspecting the
     active and inactive page cache lists and counting pages that can
     be freed. The inspection logic is basically the same as
     shrink_cache(); the difference is whether pages are actually freed
     or not.
  3. Get the number of pages in the slab free lists.
  4. Get the number of i-node cache and directory entry cache pages. We
     do not inspect the status of those caches in detail, to save time.

I think this implementation is not mature enough. For example, a race
condition between kswapd and this API can introduce some amount of
error into the free page count.

Patch

Here’s a patch which adds a new function to determine the “shrinkable”
size of memory. This is against a 2.4.x kernel.

diff -bdaC 5 CEE3.1/slab.c NEW/slab.c
*** CEE3.1/slab.c	Wed Jul 13 01:53:00 2005
--- NEW/slab.c	Wed Jul 13 20:15:32 2005
***************
*** 2093,2097 ****
--- 2093,2149 ----
  #else
  	return -EINVAL;
  #endif
  }
  #endif
+ 
+ /*
+  * Count shrinkable slab pages.
+  */
+ int kmem_cache_shrinkable_size(void)
+ {
+ 	/* #define KMEM_CACHE_REAP_COUNT_DEBUG to enable debug printk */
+ 	extern kmem_cache_t *dentry_cache;
+ 	extern kmem_cache_t *inode_cachep;
+ 
+ 	int count = 0;
+ 	kmem_cache_t *searchp = &cache_cache;
+ 	struct list_head *q;
+ 
+ 	down(&cache_chain_sem);
+ 	do {
+ 		if ((searchp->flags & SLAB_NO_REAP) == 0) {
+ 			spin_lock_irq(&searchp->spinlock);
+ 			if ((searchp == inode_cachep) || (searchp == dentry_cache)) {
+ 				int active_slabs = 0;
+ 				int num_slabs = 0;
+ 				list_for_each(q, &searchp->slabs_full) {
+ 					active_slabs++;
+ 				}
+ 				list_for_each(q, &searchp->slabs_partial) {
+ 					active_slabs++;
+ 				}
+ 				list_for_each(q, &searchp->slabs_free) {
+ 					num_slabs++;
+ 				}
+ 				count += (active_slabs + num_slabs) * (1 << searchp->gfporder);
+ #ifdef KMEM_CACHE_REAP_COUNT_DEBUG
+ 				printk("kmem_cache_shrinkable_size: slab=%s active=%d num=%d total=%d\n",
+ 					searchp->name, active_slabs, num_slabs, count);
+ #endif
+ 			} else {
+ 				int num_slabs = 0;
+ 				list_for_each(q, &searchp->slabs_free) {
+ 					num_slabs++;
+ 				}
+ 				count += (num_slabs * (1 << searchp->gfporder));
+ #ifdef KMEM_CACHE_REAP_COUNT_DEBUG
+ 				printk("kmem_cache_shrinkable_size: slab=%s num=%d total=%d\n",
+ 					searchp->name, num_slabs, count);
+ #endif
+ 			}
+ 			spin_unlock_irq(&searchp->spinlock);
+ 		}
+ 		searchp = list_entry(searchp->next.next, kmem_cache_t, next);
+ 	} while (searchp != &cache_cache);
+ 	up(&cache_chain_sem);
+ 	return count;
+ }
diff -bdaC 5 CEE3.1/traps.c NEW/traps.c
*** CEE3.1/traps.c	Wed Jul 13 01:54:00 2005
--- NEW/traps.c	Wed Jul 13 20:23:52 2005
***************
*** 25,35 ****
  #include <linux/interrupt.h>
  #include <linux/init.h>
  #include <linux/trace.h>
- #include <asm/pgalloc.h>
  #include <asm/pgtable.h>
  #include <asm/system.h>
  #include <asm/uaccess.h>
  #include <asm/unistd.h>
  #include <asm/traps.h>
--- 25,34 ----
***************
*** 560,569 ****
--- 559,578 ----
  	case NR(usr26):
  	case NR(usr32):
  		break;
  #endif
+ 	case NR(getfreemem):
+ 	{
+ 		extern unsigned int nr_free_pages(void);
+ 		int FASTCALL(inspect_shrinkable_cache(unsigned int gfp_mask));
+ 		extern int kmem_cache_shrinkable_size(void);
+ 		int cache = inspect_shrinkable_cache(GFP_NOIO);
+ 		int kmem = kmem_cache_shrinkable_size();
+ 		int pages_min = (*((contig_page_data.node_zonelists + (GFP_NOIO & GFP_ZONEMASK))->zones))->pages_min;
+ 		int freesize = nr_free_pages() + cache + kmem - pages_min;
+ 		return ((freesize > 1) ? (freesize * 4) : 4);
+ 	}
  	default:
  		/* Calls 9f00xx..9f07ff are defined to return -ENOSYS
  		   if not implemented, rather than raising SIGILL. This
  		   way the calling program can gracefully determine whether
diff -bdaC 5 CEE3.1/vmscan.c NEW/vmscan.c
*** CEE3.1/vmscan.c	Wed Jul 13 01:53:00 2005
--- NEW/vmscan.c	Wed Jul 13 20:07:09 2005
***************
*** 851,855 ****
--- 851,919 ----
  	kernel_thread(kswapd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
  	return 0;
  }
  module_init(kswapd_init)
+ 
+ static int FASTCALL(do_inspect_shrinkable_cache(struct list_head *ll, int nr_list, unsigned int gfp_mask));
+ static int do_inspect_shrinkable_cache(struct list_head *ll, int nr_list, unsigned int gfp_mask)
+ {
+ 	struct list_head *entry;
+ 	int count = 0;
+ 
+ 	spin_lock(&pagemap_lru_lock);
+ 	list_for_each(entry, ll->prev)
+ 	{
+ 		struct page *page;
+ 		if (--nr_list < 0) {
+ 			break;
+ 		}
+ 		page = list_entry(entry, struct page, lru);
+ 		if (unlikely(!page_count(page))) {
+ 			continue;
+ 		}
+ 
+ 		/* Racy check to avoid trylocking when not worthwhile */
+ 		if (!page->buffers && (page_count(page) != 1 || !page->mapping)) {
+ 			continue;
+ 		}
+ 		if ((PageDirty(page) || DelallocPage(page)) && is_page_cache_freeable(page) && page->mapping) {
+ 			/*
+ 			 * It is not critical here to write it only if
+ 			 * the page is unmapped because any direct writer
+ 			 * like O_DIRECT would set the PG_dirty bitflag
+ 			 * on the physical page after having successfully
+ 			 * pinned it and after the I/O to the page is finished,
+ 			 * so the direct writes to the page cannot get lost.
+ 			 */
+ 			if (gfp_mask & __GFP_FS) {
+ 				continue;
+ 			}
+ 		}
+ 		if (page->buffers) {
+ 			continue;
+ 		}
+ 		if (!page->mapping || !is_page_cache_freeable(page)) {
+ 			continue;
+ 		}
+ 
+ 		/*
+ 		 * It is critical to check PageDirty _after_ we made sure
+ 		 * the page is freeable, so not in use by anybody.
+ 		 */
+ 		if (PageDirty(page)) {
+ 			continue;
+ 		}
+ 		count++;
+ 	}
+ 	spin_unlock(&pagemap_lru_lock);
+ 	return count;
+ }
+ 
+ 
+ int FASTCALL(inspect_shrinkable_cache(unsigned int gfp_mask));
+ int inspect_shrinkable_cache(unsigned int gfp_mask)
+ {
+ 	int shrinkable_count = do_inspect_shrinkable_cache(&inactive_list, nr_inactive_pages, gfp_mask);
+ 	shrinkable_count += do_inspect_shrinkable_cache(&active_list, nr_active_pages, gfp_mask);
+ 	return shrinkable_count;
+ }

Kernel 2.6 status

Sony has ported this feature to 2.6.11; see the next section.

Sony detailed memory accounting

Watching user space program memory usage

The Linux kernel provides the ability to view certain pieces of
information about system and per-process memory usage. However, the
information currently provided is not detailed enough. The feature
described here adds some extra memory instrumentation to the kernel, and
reports more detailed information about process memory usage, via some
new entries in the /proc filesystem.

The feature is described in detail in the specification below. In
summary, however, the feature adds some global and some per-process
entries in the /proc filesystem to provide detailed memory usage
information. The following system-wide entries are added:

  • /proc/nodeinfo - shows memory nodes on system (NUMA machines may
    have multiple, discontiguous nodes)

  • /proc/memmap - shows the number of users for each physical page on
    the system

The new per-process entries are:

  • /proc/<pid>/memmap - shows the number of users for each page mapped
    into the process address space

  • /proc/<pid>/nodemap - shows the node # for each page in the process
    address space

  • /proc/<pid>/statrm - shows total, resident, shared and dirty counts
    for pages for each VM area of a process

  • /proc/<pid>/statm - shows stats for page counts for different
    categories of pages of a process (lib, text, data, dirty, etc.)

  • User process memory usage monitor

  • Kernel 2.4 - joint development with MontaVista/Panasonic

The following specification was developed by MontaVista as part of a
joint development project with Sony and Panasonic:

  • Memory Accounting Tools Tech Spec

  • Code

  • ~/linux-mta-041004/fs/proc/proc_misc.c

  • You can easily isolate this function using
    CONFIG_MEMORY_ACCOUNTING

  • Doing this on the CELF 2005-05-03 tree yields the following patch:

  • Media:celf-2.4.20-memory-accounting.patch

  • This function utilizes Memory Typed Allocation to handle different
    memory types with NUMA-based technology. If you want to port this
    function to a vanilla 2.4/2.6 kernel, you should remove this
    dependency.

  • For kernel 2.6:

  • Show detailed page stat info, like PG_* flags; pages could be
    categorized as follows (this categorization needs checking):

  • PTE none (page table entry is not allocated yet)

  • Otherwise:

  • Resident (in-core)

  • shared/non-shared

  • shared COW zero page (page not yet copied/dirtied; the shared
    system-wide zero page)

  • shared COW page (page not yet copied/dirtied)
  • other type of shared page (need to show how many processes/threads
    share this)
  • non-shared page

  • active/inactive

  • dirty/clean
  • reserved/not
  • locked/not

  • pageout (not in-core)

  • cached/not cached

  • How about “/proc/<pid>/smaps”? It shows the categorized memory
    usage of each section of a process; a reader sketch follows below.
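
/proc/<pid>/smaps is present in mainline kernels (since 2.6.14). As a
hedged sketch of the kind of consumer suggested above, the per-mapping
"Rss:" fields can be summed to get total resident memory:

    /* Sum the "Rss:" fields of /proc/<pid>/smaps. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        char path[64], line[256];
        long rss_kb = 0, v;
        FILE *f;

        snprintf(path, sizeof(path), "/proc/%s/smaps",
                 argc > 1 ? argv[1] : "self");
        f = fopen(path, "r");
        if (!f) {
            perror(path);
            return 1;
        }
        while (fgets(line, sizeof(line), f))
            if (sscanf(line, "Rss: %ld kB", &v) == 1)
                rss_kb += v;
        fclose(f);
        printf("total Rss: %ld kB\n", rss_kb);
        return 0;
    }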

Kernel 2.6 status

Sony has ported the above features, along with Panasonic’s “accurate
memory counting API” described earlier, to kernel 2.6.11. The new
system call introduced by Panasonic’s original 2.4 patch was replaced
with a new /proc interface, “/proc/freemem”, for better acceptance.

The patches and the /proc interfaces they add are:

  Interface            Patch                       Description
  /proc/<pid>/statrm   memory-accounting.patch     Summary of resident/shared page info
  /proc/<pid>/pgstat   memory-accounting-1.patch   Detailed page info
  /proc/<pid>/memmap   memory-accounting.patch     Detailed page info of shared memory
  /proc/memmap         memory-accounting.patch     Usage of physical memory
  /proc/freemem        freemem-1.patch             Accurate memory counting API, see above.
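
The exact record format of /proc/freemem is not given here; assuming it
prints a single decimal value (an assumption, not verified against the
patch), a minimal user-space reader would be:

    /* Read the estimated free memory from the /proc/freemem interface. */
    #include <stdio.h>

    int main(void)
    {
        long freemem;
        FILE *f = fopen("/proc/freemem", "r");

        if (!f) {
            perror("/proc/freemem");
            return 1;
        }
        if (fscanf(f, "%ld", &freemem) == 1)
            printf("estimated free memory: %ld\n", freemem);
        fclose(f);
        return 0;
    }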

Actual Patch

All patches are included in

A brief description of the features is in:

Nokia out-of-memory notifier module

Description

The issue of low memory notification prior to OOM killing was raised at
a previous AG meeting. Nokia pointed out that they had an LSM module for
this and would see about getting the source available for it. This
module was part of the kernel source for their 770 internet tablet. The
code is implemented as an LSM module. Below is security/lowmem.c from
the 770 kernel source tree (2.6.12.3):

(The code was originally obtained from here. There is a .deb file,
which I de-archived with ‘ar -x’, then un-tarred data.tar.gz, then
un-tarred kernel-source-2.6.12.3.tar.bz2 and copied the file
security/lowmem.c.)

The heart of the measurement feature of this module is in the
low_vm_enough_memory() routine, about midway through the source:

lowmem.c source

#include <linux/config.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/mman.h>
#include <linux/init.h>
#include <linux/security.h>
#include <linux/sysctl.h>
#include <linux/swap.h>
#include <linux/kobject.h>
#include <linux/pagemap.h>
#include <linux/hugetlb.h>

#define MY_NAME "lowmem"

#define LOWMEM_MAX_UIDS 8

enum {
	VM_LOWMEM_DENY = 1,
	VM_LOWMEM_LEVEL1_NOTIFY,
	VM_LOWMEM_LEVEL2_NOTIFY,
	VM_LOWMEM_NR_DECAY_PAGES,
	VM_LOWMEM_ALLOWED_UIDS,
	VM_LOWMEM_ALLOWED_PAGES,
	VM_LOWMEM_USED_PAGES,
};

static unsigned int deny_percentage;
static unsigned int l1_notify, l2_notify;
static unsigned int nr_decay_pages;
static unsigned long allowed_pages;
static unsigned long used_pages;
static unsigned int allowed_uids[LOWMEM_MAX_UIDS];
static unsigned int minuid = 1;
static unsigned int maxuid = 65535;

static ctl_table lowmem_table[] = {
	{
		.ctl_name = VM_LOWMEM_DENY,
		.procname = "lowmem_deny_watermark",
		.data = &deny_percentage,
		.maxlen = sizeof(unsigned int),
		.mode = 0644,
		.child = NULL,
		.proc_handler = &proc_dointvec,
		.strategy = &sysctl_intvec,
	}, {
		.ctl_name = VM_LOWMEM_LEVEL1_NOTIFY,
		.procname = "lowmem_notify_low",
		.data = &l1_notify,
		.maxlen = sizeof(unsigned int),
		.mode = 0644,
		.child = NULL,
		.proc_handler = &proc_dointvec,
		.strategy = &sysctl_intvec,
	}, {
		.ctl_name = VM_LOWMEM_LEVEL2_NOTIFY,
		.procname = "lowmem_notify_high",
		.data = &l2_notify,
		.maxlen = sizeof(unsigned int),
		.mode = 0644,
		.child = NULL,
		.proc_handler = &proc_dointvec,
		.strategy = &sysctl_intvec,
	}, {
		.ctl_name = VM_LOWMEM_NR_DECAY_PAGES,
		.procname = "lowmem_nr_decay_pages",
		.data = &nr_decay_pages,
		.maxlen = sizeof(unsigned int),
		.mode = 0644,
		.child = NULL,
		.proc_handler = &proc_dointvec_minmax,
		.strategy = &sysctl_intvec,
	}, {
		.ctl_name = VM_LOWMEM_ALLOWED_UIDS,
		.procname = "lowmem_allowed_uids",
		.data = &allowed_uids,
		.maxlen = LOWMEM_MAX_UIDS * sizeof(unsigned int),
		.mode = 0644,
		.child = NULL,
		.proc_handler = &proc_dointvec_minmax,
		.strategy = &sysctl_intvec,
		.extra1 = &minuid,
		.extra2 = &maxuid,
	}, {
		.ctl_name = VM_LOWMEM_ALLOWED_PAGES,
		.procname = "lowmem_allowed_pages",
		.data = &allowed_pages,
		.maxlen = sizeof(unsigned long),
		.mode = 0444,
		.child = NULL,
		.proc_handler = &proc_dointvec_minmax,
		.strategy = &sysctl_intvec,
	}, {
		.ctl_name = VM_LOWMEM_USED_PAGES,
		.procname = "lowmem_used_pages",
		.data = &used_pages,
		.maxlen = sizeof(unsigned long),
		.mode = 0444,
		.child = NULL,
		.proc_handler = &proc_dointvec_minmax,
		.strategy = &sysctl_intvec,
	}, {
		.ctl_name = 0
	}
};

static ctl_table lowmem_root_table[] = {
	{
		.ctl_name = CTL_VM,
		.procname = "vm",
		.mode = 0555,
		.child = lowmem_table,
	}, {
		.ctl_name = 0
	}
};

#define KERNEL_ATTR_RO(_name) \
static struct subsys_attribute _name##_attr = __ATTR_RO(_name)

static int low_watermark_reached, high_watermark_reached;

static ssize_t low_watermark_show(struct subsystem *subsys, char *page)
{
	return sprintf(page, "%u\n", low_watermark_reached);
}

static ssize_t high_watermark_show(struct subsystem *subsys, char *page)
{
	return sprintf(page, "%u\n", high_watermark_reached);
}

KERNEL_ATTR_RO(low_watermark);
KERNEL_ATTR_RO(high_watermark);

static void low_watermark_state(int new_state)
{
	int changed = 0, r;

	if (low_watermark_reached != new_state) {
		low_watermark_reached = new_state;
		changed = 1;
	}
	if (changed) {
		r = kobject_uevent(&kernel_subsys.kset.kobj, KOBJ_CHANGE,
				   &low_watermark_attr.attr);
		if (r < 0)
			printk(KERN_ERR MY_NAME ": kobject_uevent failed: %d\n", r);
	}
}

static void high_watermark_state(int new_state)
{
	int changed = 0, r;

	if (high_watermark_reached != new_state) {
		high_watermark_reached = new_state;
		changed = 1;
	}
	if (changed) {
		r = kobject_uevent(&kernel_subsys.kset.kobj, KOBJ_CHANGE,
				   &high_watermark_attr.attr);
		if (r < 0)
			printk(KERN_ERR MY_NAME ": kobject_uevent failed: %d\n", r);
	}
}

static int low_vm_enough_memory(long pages)
{
	unsigned long free, allowed, used;
	unsigned long deny_threshold, level1, level2;
	int cap_sys_admin = 0, notify;

	if (cap_capable(current, CAP_SYS_ADMIN) == 0)
		cap_sys_admin = 1;

	/* We activate ourselves only after both parameters have been
	 * configured. */
	if (deny_percentage == 0 || l1_notify == 0 || l2_notify == 0)
		return __vm_enough_memory(pages, cap_sys_admin);

	allowed = totalram_pages - hugetlb_total_pages();
	deny_threshold = allowed * deny_percentage / 100;
	level1 = allowed * l1_notify / 100;
	level2 = allowed * l2_notify / 100;

	vm_acct_memory(pages);

	/* Easily freed pages when under VM pressure or direct reclaim */
	free = get_page_cache_size();
	free += nr_swap_pages + atomic_read(&slab_reclaim_pages);
	used = allowed - free;

	/* The hot path, plenty of memory */
	if (likely(used < level1))
		goto enough_memory;

	/* No luck, lets make it more expensive and try again.. */
	used -= nr_free_pages();

	if (used >= deny_threshold) {
		int i;

		allowed_pages = allowed;
		used_pages = used;
		low_watermark_state(1);
		high_watermark_state(1);
		/* Memory allocations by root are always allowed */
		if (cap_sys_admin)
			return 0;
		/* uids from allowed_uids vector are also allowed no matter what */
		for (i = 0; i < LOWMEM_MAX_UIDS && allowed_uids[i]; i++)
			if (current->uid == allowed_uids[i])
				return 0;
		vm_unacct_memory(pages);
		if (printk_ratelimit()) {
			printk(MY_NAME ": denying memory allocation to process %d (%s)\n",
			       current->pid, current->comm);
		}
		return -ENOMEM;
	}

enough_memory:
	/* See if we need to notify level 1 */
	low_watermark_state(used >= level1);

	/*
	 * In the level 2 notification case things are more complicated,
	 * as the level that we drop the state and send a notification
	 * should be lower than when it is first triggered. Having this
	 * on the same watermark level ends up bouncing back and forth
	 * when applications are being stupid.
	 */
	notify = used >= level2;
	if (notify || used + nr_decay_pages < level2)
		high_watermark_state(notify);

	/* We have plenty of memory */
	allowed_pages = allowed;
	used_pages = used;
	return 0;
}

static struct security_operations lowmem_security_ops = {
	/* Use the capability functions for some of the hooks */
	.ptrace = cap_ptrace,
	.capget = cap_capget,
	.capset_check = cap_capset_check,
	.capset_set = cap_capset_set,
	.capable = cap_capable,
	.bprm_apply_creds = cap_bprm_apply_creds,
	.bprm_set_security = cap_bprm_set_security,
	.task_post_setuid = cap_task_post_setuid,
	.task_reparent_to_init = cap_task_reparent_to_init,
	.vm_enough_memory = low_vm_enough_memory,
};

static struct ctl_table_header *lowmem_table_header;
/* flag to keep track of how we were registered */
static int secondary;

static int __init lowmem_init(void)
{
	int r;

	/* register ourselves with the security framework */
	if (register_security(&lowmem_security_ops)) {
		printk(KERN_ERR MY_NAME ": Failure registering with the kernel\n");
		/* try registering with primary module */
		if (mod_reg_security(MY_NAME, &lowmem_security_ops)) {
			printk(KERN_ERR ": Failure registering with the primary "
			       "security module.\n");
			return -EINVAL;
		}
		secondary = 1;
	}

	/* initialize the uids vector */
	memset(allowed_uids, 0, sizeof(allowed_uids));

	lowmem_table_header = register_sysctl_table(lowmem_root_table, 0);
	if (!lowmem_table_header)
		return -EPERM;

	r = sysfs_create_file(&kernel_subsys.kset.kobj,
			      &low_watermark_attr.attr);
	if (r)
		return r;
	r = sysfs_create_file(&kernel_subsys.kset.kobj,
			      &high_watermark_attr.attr);
	if (r)
		return r;

	printk(KERN_INFO MY_NAME ": Module initialized.\n");
	return 0;
}

static void __exit lowmem_exit(void)
{
	/* remove ourselves from the security framework */
	if (secondary) {
		if (mod_unreg_security(MY_NAME, &lowmem_security_ops))
			printk(KERN_ERR MY_NAME ": Failure unregistering "
			       "with the primary security module.\n");
	} else {
		if (unregister_security(&lowmem_security_ops)) {
			printk(KERN_ERR MY_NAME ": Failure unregistering "
			       "with the kernel.\n");
		}
	}

	unregister_sysctl_table(lowmem_table_header);

	sysfs_remove_file(&kernel_subsys.kset.kobj, &low_watermark_attr.attr);
	sysfs_remove_file(&kernel_subsys.kset.kobj, &high_watermark_attr.attr);

	printk(KERN_INFO MY_NAME ": Module removed.\n");
}

module_init(lowmem_init);
module_exit(lowmem_exit);

MODULE_DESCRIPTION("Low watermark LSM module");
MODULE_LICENSE("GPL");
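
Once the module is loaded and its sysctl parameters are configured,
user space can watch the two state files the module creates under
/sys/kernel (see KERNEL_ATTR_RO(low_watermark) and
KERNEL_ATTR_RO(high_watermark) above). A minimal polling sketch
follows; the module also fires KOBJ_CHANGE uevents, so a uevent
listener would avoid polling:

    /* Read the 0/1 watermark state files exposed by lowmem.c. */
    #include <stdio.h>

    static int read_flag(const char *path)
    {
        int v = -1;
        FILE *f = fopen(path, "r");

        if (f) {
            fscanf(f, "%d", &v);
            fclose(f);
        }
        return v;
    }

    int main(void)
    {
        printf("low watermark reached:  %d\n",
               read_flag("/sys/kernel/low_watermark"));
        printf("high watermark reached: %d\n",
               read_flag("/sys/kernel/high_watermark"));
        return 0;
    }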

lowmem patch

Here’s the feature in patch format (presumably against a 2.6.12.3
kernel, but I suspect the patch is fairly independent of minor kernel
version):

kpagemap

Matt Mackall mainlined a new “kpagemap” system in kernel version 2.6.25.

This system provides detailed information about all pages used by
processes on a system.

See the file Documentation/vm/pagemap.txt in the kernel source tree to
learn about the /proc interfaces used to obtain information from this
system.
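
For example, per the documented format (one 64-bit entry per virtual
page in /proc/<pid>/pagemap; bit 63 = page present, low bits = page
frame number), a process can look up the physical frame backing one of
its own addresses:

    /* Look up the pagemap entry for a virtual address of this process. */
    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>

    int main(void)
    {
        uint64_t entry;
        long pagesize = sysconf(_SC_PAGESIZE);
        int dummy = 42;                    /* something to look up */
        uintptr_t vaddr = (uintptr_t)&dummy;
        FILE *f = fopen("/proc/self/pagemap", "rb");

        if (!f) {
            perror("/proc/self/pagemap");
            return 1;
        }
        /* one 8-byte entry per virtual page */
        fseek(f, (vaddr / pagesize) * sizeof(entry), SEEK_SET);
        if (fread(&entry, sizeof(entry), 1, f) == 1) {
            if (entry & (1ULL << 63))
                printf("vaddr %#lx -> PFN %#llx\n", (unsigned long)vaddr,
                       (unsigned long long)(entry & ((1ULL << 55) - 1)));
            else
                printf("page not present\n");
        }
        fclose(f);
        return 0;
    }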

Matt gave a presentation on this system (before it was merged?) at
Embedded Linux Conference 2007. See Matt’s presentation for details.

Kernelnewbies question about measuring memory

Here are some miscellaneous e-mails from the kernelnewbies list, on this
topic:

  > I know that some part of memory is free, but it is used in caches
  > to optimize performance when the system needs to allocate more
  > memory. And dentry caches and disk buffer_heads are used to minimize
  > disk access. So, given the current mem info from "cat /proc/meminfo",
  > how should I calculate how much memory is really free currently in
  > the system?
  >
  > > cat /proc/meminfo
  >
  > MemTotal:  1017848 kB
  > MemFree:     10380 kB
  > Buffers:     37480 kB
  > Cached:     149868 kB
  >
  > Can I just assume that 70% of un-used memory (un-used = mem_total -
  > buffers - cached) is free, without actually causing the system to
  > swap?

  Is this what you are looking for? You may use the _SC_AVPHYS_PAGES
  field of sysconf:

      #include <unistd.h>
      /* e.g.: */ long ret = sysconf(_SC_AVPHYS_PAGES);

  Alternatively:

      #include <unistd.h>
      int get_avphys_pages(void);

  man sysconf for further reading; also check /proc/slabinfo.
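
A complete version of the sysconf() suggestion from the mail above
(_SC_AVPHYS_PAGES and get_avphys_pages() are glibc extensions; like
MemFree in /proc/meminfo, they do not count shrinkable caches):

    /* Report available physical memory via sysconf(). */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long pages = sysconf(_SC_AVPHYS_PAGES);
        long psize = sysconf(_SC_PAGESIZE);

        printf("available physical memory: %ld kB\n",
               pages * (psize / 1024));
        return 0;
    }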
