From: eLinux.org

Kernel XIP

This page describes the use of Kernel Execute-In-Place as a bootup time
reduction technique.


Description

Execute-in-Place (see the Wikipedia entry) is a method of executing code
directly from long-term storage, instead of first loading it into RAM.

When the kernel is executed in place, the bootloader does not have to:

  1. read the kernel from flash,
  2. decompress the kernel, or
  3. write the kernel to RAM.

How to implement or use

TODO: describe how to achieve the technique (config options, command
args, etc.)

See Kernel XIP Instructions For OMAP.

Expected Improvement - about 0.5 seconds

The expected improvement from using this technique depends on the size
of the kernel and on the time needed to load it from persistent storage
and decompress it.

In general, time savings of about 0.5 seconds have been observed (463 ms
in Case 1 and 481 to 579 ms in Case 2 below).

Resources

Projects

  • Configure Linux For XIP describes experience with using both Kernel
    XIP and application XIP.

  • In this e-mail, David Woodhouse described issues with implementing
    support for Kernel XIP in flash. The requirements here are a bit
    different from supporting Kernel XIP in ROM, since the flash may be
    unreadable during certain flash operations (such as erase or
    program). Therefore, portions of the kernel must be copied to RAM,
    and certain kernel operations must be disallowed while the flash is
    unavailable.

Specifications

TODO: list or link to CELF specifications related to this technique

Patches

  • Kernel 2.6.10 now includes XIP support:

ARM PATCH 2154/2: XIP kernel for ARM

Patch from Nicolas Pitre

This patch allows for the kernel to be configured for XIP. A lot of
people are using semi hacked up XIP patches already so it is a good
idea to have a generic and clean implementation supporting all ARM
targets. The patch isn’t too intrusive.

It involves:

  • modifying the kernel entry code to map separate .text and .data
    sections in the initial page table, as well as relocating .data to
    ram when needed

  • modifying the linker script to account for the different VMA and
    LMA for .data, as well as making sure that .init.data gets
    relocated to ram

  • adding the final kernel mapping with a new MT_ROM mem type

  • distinguishing between XIP and non-XIP for bootmem and memory
    resource declaration

  • and adding proper target handling to Makefiles.

While at it, this also cleans up the kernel boot code a bit so the
kernel can now be compiled for any address in ram, removing the need
for a relation between kernel address and start of ram. Also throws in
some more comments.

And finally the _text, _etext, _end and similar variables are now
declared extern void instead of extern char, or even extern int. That
allows for operations on their address directly without any cast, and
trying to reference them by mistake would yield an error which is a
good thing.

Tested both configurations: XIP and non XIP, the latter producing a
kernel for execution from ram just as before.

Signed-off-by: Nicolas Pitre
Signed-off-by: Russell King

Case Studies

Case 1 - XIP on Arctic III PowerPC board

XIP was used on a PowerPC board, with the following results:

  • Hardware: PowerPC 405LP Arctic III, running at 266 MHz
  • Kernel Version: MontaVista Linux CEE 3.0 (based on 2.4.20)
  • Configuration: Features built statically into the kernel included:
    Arctic Ethernet, audio, and MTD; 405LP LCD and touchscreen; 405
    on-chip I2C; and pinned TLBs; Dynamic Power Management; preemptible
    kernel with selected spinlock breaking; serial driver and serial
    console (kernel messages are disabled for boot time measurements);
    TCP/IP (IP addresses are configured after boot) with network
    devices, network packet filtering, packet protocol, and IP
    multicast; virtual terminal; UNIX domain sockets and UNIX98 PTYs;
    Linux Driver Model; and /proc, sysfs, tmpfs, ramfs, cramfs, devpts
    filesystems.
  • Time without change: 1357 milliseconds
  • Time with change: 894 milliseconds
  • Total Reduction in boot time: 463 milliseconds

Table of bootup times:

Boot Stage                              Non-XIP Time   XIP Time
Copy kernel to RAM                      85 ms          12 ms
Decompress kernel                       453 ms         0 ms
Kernel time to initialize
  (time to first user space program)    819 ms         882 ms
Total kernel boot time                  1357 ms        894 ms
Reduction:                                             463 ms

  • XIP still has to copy the data segment.

Thanks to Todd Poynor of MontaVista for providing this information.

Case 2 - XIP on OMAP Innovator

XIP was used on a TI OMAP (Innovator board), with the following results:

  • Hardware: TI OMAP 1510, running at 168 MHz
  • Kernel Version: 2.4.20 (precursor to CELF tree)
  • Configuration: [need to put config information here]
  • See Kernel XIP Instructions For OMAP

                                      Non-XIP Time        Non-XIP Time
Boot Stage                            kernel compressed   kernel not compressed   XIP Time
Copy kernel to RAM                    56 ms               120 ms                  0 ms
Decompress kernel                     545 ms              0 ms                    0 ms
Kernel time to initialize
  (time to first user space program)  88 ms               208 ms                  110 ms
Total kernel boot time                689 ms              208 ms                  110 ms
Reduction:*                                               481 ms                  579 ms

Thanks to Hiroyuki Machida of Sony for providing this information.

Case 3 - comparing NOR XIP with OneNAND quick-copy to RAM

  • Hardware: TI OMAP 5912, running at 196 MHz (OSK5912 from Spectrum
    Digital)
  • Kernel Version: 2.6.10-omap1 (binary size is about 2 MB
    uncompressed)

Dongjun Shin of Samsung Electronics reports:

As I mentioned in the AG meeting, we have done some boot time
measurements on the OMAP 5912 target platform (OSK5912 from Spectrum
Digital). We did this experiment to identify the timing gap between NOR
XIP and NAND shadowing. Here are the results (all times are in
microseconds).

The column labeled “Tuning” means that we changed the NOR interface
setting of the OMAP (EMIFS) so that synchronous reads are used instead
of the default asynchronous reads.

In the OneNAND case, only the first 1 KB of the OneNAND can be used as
an XIP region, so we used a 1 KB IPL to load U-Boot. Shadowing means
that the kernel is copied to RAM.

The kernel initialization time is broken into two phases because we used
a timer register for measurement, and that timer is initialized during
kernel boot. Add the values for the two phases to get the total kernel
initialization time.

                                     NOR XIP               OneNAND shadowing
Boot stage                           Normal     Tuning     Compressed   Uncompressed
Boot loader CPU frequency            96 MHz                96 MHz
Boot loader (IPL)                    0          0          5,999        5,999
Boot loader (u-boot)                 388,146    372,538    356,821      356,810
Copy kernel to RAM                   0          0          35,029       56,884
Decompress kernel                    0          0          1,178,481    0
Kernel time to initialize, phase 1   18,964     12,826     9,091        9,119
Kernel time to initialize, phase 2   61,176     51,263     50,118       50,126
Total                                468,287    436,626    1,635,540    478,938

Times are in microseconds.

Questions

TimRiker asks:

  • What is the RAM/ROM footprint of these?
  • Are we close to using only SRAM for some implementations?
  • Has anyone looked at romfs and XIP user space?

Implementation Notes (from the field)

  • Discussion about XIP when the flash might be in use; note the
    mention of the ‘__xipram’ attribute (for partial XIP?)

Notes on configuring Linux for XIP (for PPC)

Using XIP with U-Boot on Arm

Wolfgang Denk, the primary author of the U-Boot bootloader, wrote the
following:

    >> Yes. But... _Does_ mkimage -x put header on the front of it?
    Yes, it does.
    >>> > * You program the resulting image at 0x10004000.
    >>> >
    >>> > What is programmed at 0x10004000 ? The xipImage code or the uboot header?
    >
    >>
    >> The u-boot headers, yes. Thats wrong. But how to use mkimage -x then?
    >> Is the header-caused offset known?
    Yes. The U-Boot header is 64 bytes.
    U-Boot expects (and verifies) that the entry point is equal to the load
    address plus the size of the U-Boot header.

More details are in the full thread, which is split across months in
the mailing-list archives.

How to determine offsets for sections

Dick Johnson explains how to set the physical addresses of ELF sections
by editing the kernel's linker scripts.

    On Fri, 21 Oct 2005, Sreeni wrote:
    >> Hi,
    >>
    >> I have a montavista XIP kernel running on ARM and my kernel will be in
    >> the flash. Since its XIP, I know that the ".text" portion of the
    >> kernel will be executed from flash but that ".data" needs to be placed
    >> in SDRAM. Now my question is - based on what offset this data will be
    >> placed?
    >>
    >> My SDRAM physicall address starts at 3000_0000 and flash starts at
    >> 0100_0000. when i allocated a global variable in the kernel module and
    >> when i try to check its actually physical address using virt_to_phys,
    >> its giving me the address in the range of 0100_0000 ~ 0600_0000 which
    >> is my flash (the PAGE_OFFSET doesn't work in case of XIP).
    >>
    >> Can you please help in knowing the physical address of my .data
    >> portion in this situation.
    >>
    >> Thanks
    >> Shree

    I don't know about the ARM in particular, but if you look in
    ../arch/arm/boot/compressed/vmlinux.lds.in, you will see that this
    linker file simply allocates the start address of each section as the
    next available address. The same is true of ../arch/arm/boot/bootp.lds.
    If you expect to have code, data elements and stack accessed at a
    specific physical offset, you modify the linker files. Note that "."
    means "right here", just like '$' in many assemblers. You can specify
    a physical offset simply as:

        ENTRY(_start)
        SECTIONS
        {
            . = 0x01000000;     /* like this for code */
            .text : {
                /* ... */
            }
            .rodata : { }
            . = 0x30000000;     /* like this for data */
            .data : { }
            .bss : { }
        }

    In the above, we have put .rodata (initialized ASCII stuff) right after
    the code in the .text section. You may need to extract this from the
    binary blob to put into your NVRAM. Also, any initialized data needs
    to be relocated to your writable SDRAM, and the .bss stuff needs to be
    zeroed. This is non-trivial. You may want to create a ".reloc" section
    that contains your initialized data, put it in your flash, and
    relocate it at startup.
    ...
    Cheers,
    Dick Johnson
