From: eLinux.org

System Size Auto-Reduction

This page has notes and an outline for Tim Bird’s Linux Auto-Reduction
research.

Contents

Talk info

Tim gave a talk on this research at LinuxCon Japan 2013 (May 29 in
Tokyo, Japan)

Title

Advanced size optimization of the Linux kernel

Abstract

This presentation will cover recent research by Tim on aggressive size
reduction of the Linux kernel. This will include results from using gcc
link-time optimization (LTO) with the ARM architecture (using Andi
Kleen’s out-of-tree patches), as well as results and discussion of other
optimization techniques (including whole-system optimization for
embedded devices).

This talk is directed at kernel developers interested in reducing the
size of their Linux systems (and possible improving their performance in
the process). The talk will be highly technical.

Final slides from talk

Media:Bird-Kernel-Size-Optimization-LCJ-2013.pdf

Research Areas

LTO

  • What is it?
  • what was required to get it to work?
  • Andi Kleen’s patch set
    • what do they do?
    • how big are they?
    • mainline status?
  • what is the size gain (see ELC poster)
  • what can be done with it?
  • long-term possibilities for LTO

global constraints

  • overall idea: create constraints external to code, and use for
    optimization
  • rationale: can’t maintain in-tree - too many config items
  • make the application of constraints automatic
  • use existing constraints to generate new constraints
  • constraints can flow between user-space and kernel

  • example: uid=0

  • constraint language
  • application by commenting out references (replace with 0 constant)
    • use compiler to find code references (via error messages)
      • eliminates problem with duplicate names (uid in different
        structure)
  • constant propagation (by, e.g. LTO) reduces code

syscall elimination

  • scan file system
  • create report of used and unused system calls
  • mark syscalls unused in kernel
    • arch/arm/kernel/calls.S (and arch/arm/kernel/entry-common.S
  • make sure unused syscalls are not
    __attribute__(externally_visible)
    • technique of asmlinkage_\
  • use LTO to eliminate calls
  • results: 50K-90K

ARM stack reduction

  • 4k stacks
  • stack extensions

cold code compression

  • D. Chanet did cold code compression
  • consists of:
    • profiling the kernel
    • marking code regions as cold or frozen
    • replacing them with stubs
    • compressing them
  • At execution time:
    • if a stub is called, it decompresses the code and calls it
    • stub is fixed up to directly call decompressed code in future
    • code is left decompressed forever

cold code compression

Results:

    • MUST see paper for details (it’s quite complicated)
  • on 2.4.25 kernel
    • cold code compression resulted in 7% reduction for i386 kernel
      and
    • 11.7% reduction for ARM kernel

Talk outline

This talk will be presented at LinuxCon Japan 2013:

Title

  • Advanced size optimization of the Linux kernel
    • by Tim Bird, Sony Mobile Communication

Self-Introduction

  • I am Tim Bird
  • Now working at Sony Mobile
  • Researching system size for many years
  • Long background in extremely small systems
    • pre-professional: first program on TRS-80, in basic, 8K ram
    • NetWare Lite - file and print server in 50K (in 1991)

The problem of Bloat

  • Software bloat occurs because systems are built with more software
    than is really needed for a given task
  • Open Source software meets the needs of thousands of different
    systems
    • Linux scales from tiny sensors to supercomputers (extreme SMP
      and high-end clusters)
    • Linux supports many, many features, only some of which are
      configurable
  • Software must be generalized for many use cases
  • bloat problem is:
    • How to re-specialize the software, eliminating unused features
      and dead code?

Bloat (cont.)

  • Software gets more generalized over time
  • Can’t use strategy of manual tuning (config options)
    • It gets harder and harder to remove things over time
    • About 13,000 config items now (2.6.12 had 4700)
    • You have to be an expert in too many things to reduce the kernel
  • Must rely on automated methods of reduction
  • Should use an additive, rather than subtractive method of building a
    system
    • ultimate vision: indicate what you want/need, and build up
      system to support it

Bloat (cont. 2)

  • In desktop or server, virtual memory makes bloat issue less
    important for user-space programs
    • Only working set of program is loaded - pages are loaded on
      demand
    • For kernel, all pages are always loaded

Automatic reduction (intro)

The problem with automatic reduction is that “the system” doesn’t know
what software is needed and what is not. there needs to be a way to tell
it about things that are not going to be used.

auto-reduce - story of 8 bytes of bloat

Story of the conditional check in kdb:

  • I found a bug in kdb, when a particular option was using in the
    configuration file
  • not everyone uses the configuration file
  • not everyone uses the particular option
  • bug only triggered in those circumstances
  • I wrote a small patch, to guard against use of a variable
    prematurely
  • problem: all users of KDB now have this check, and suffer this
    overhead
    • it wasn’t much, just a single compare
    • but this is how bloat builds up over time
    • It bothered me because I knew most people didn’t need the check
  • “correct” solution would be to parse the config file, and make the
    code compile-time configurable
    • this adds more complexity than it is worth.

generalizing the problem of bloat

System doesn’t know inputs:

  • It’s very easy to configure the kernel to omit the driver for
    missing hardware.
  • It’s very difficult to configure the kernel to omit error handling
    for bugs that

will never occur due to fixed use cases.

An example of fixed input (uid in kernel)

  • throughout the kernel, there are references to uid
    • comparisons, storing, referencing
  • it turns out this is set by setuid(), by the ‘login’ program.
    • login does a lookup and validates user account name in
      /etc/passwd
  • what if /etc/passwd only has ‘root’ and no others?
  • setuid() could only be called with a value of 0
  • can I encode this constraint on the system.

types of constraints

There are numerous other examples of constraints:

  • kernel command line arguments never used
  • syscalls never called by any program
  • parameters that are never used, or parameter values that are never
    passed in
    • e.g. ioctl value that is not possible
    • this only works in a fixed
  • /proc values never referenced
  • /sys values never referenced

also goes back up to user-space

  • return values that are not possible

kernel command line args

  • Documented in Documentation/kernel-parameters.txt
  • defined with __setup() and early_param from include/linux/init.h
    • approximately 480 __setup routines in kernel
    • about 200 __setup_* in System.map on ARM kernel build (98
      __setup_str_*)
    • about 230 early_param routines in kernel
  • points to function
  • almost always sets a variable, which would default to 0

  • on ARM, with only console_setup and early_mem marked as ‘used’,
    there was a 19K difference in size:

(non-LTO kernel)

  1. vmlinux.baseline-setup-used => vmlinux-param-used
  2. baseline other change percent
  3. text: 7680084 7663472 -16612 0%
  4. data: 362868 360516 -2352 0%
  5. bss: 745312 745184 -128 0%
  6. total: 8788264 8769172 -19092 0%
  • on ARM, with only console_setup and early_mem marked as ‘used’,
    there was a 19K difference in size:

(LTO kernel)

  1. vmlinux.lto-param => vmlinux.param-used
  2. baseline other change percent
  3. text: 1653672 1648920 -4752 0%
  4. data: 131636 130244 -1392 -1%
  5. bss: 50688 50528 -160 0%
  6. total: 1835996 1829692 -6304 0%

System.map from kernel with console_setup and early_mem as only
routines marked ‘used’:

  1. $ grep __setup System.map
  2. c00ea4bc T __setup_irq.153323
  3. c00f1adc t __setup_per_zone_wmarks.172539.15755
  4. c019d570 t __setup_str_early_mem.21664.160821
  5. c019d884 t __setup_str_console_setup.61958.160201
  6. c019ef00 t __setup_early_mem.21659.160819
  7. c019ef00 T __setup_start
  8. c019ef0c t __setup_console_setup.61953.160195
  9. c019ef18 T __setup_end

/proc values

  • Includes sysctl values
  • there are approximately 1200, NOT related specifically to a process
  • about 120 per process
    • about 80 related to networking (on my desktop box)
  • 40 others

Tiny Distribution

References

  • Chanet D. … “Automated reduction of the memory footprint of the
    linux kernel”
  • Haifen He. …”Code Compaction of an Operating System Kernel”
  • AnyKernel and Rumpkernel - see thesis by Antti Kantee - pooka (at)
    iki (dot) fi
    • https://github.com/rumpkernel/wiki/wiki
    • provides a system based on NetBSD to isolate sub-systems and
      drivers and allow their use in micro-kernels and user-space
    • haven’t read enough of it to determine if it could be applied to
      Linux, but sounds like just API wrapping
    • I’m not sure how robust it would be in the context of rapid
      mainline churn

Materials