Cache Initialization

Initialization starts with an instance of Store reading the storage configuration file, by default storage.config. For each valid element in the file an instance of Span is created. These are of basically four types:

  • File

  • Directory

  • Disk

  • Raw device

After creating all the Span instances, they are grouped by device ID to internal linked lists attached to the Store::disk array 1. Spans that refer to the same directory, disk, or raw device are coalesced in to a single span. Spans that refer to the same file with overlapping offsets are also coalesced 2. This is all done in ink_cache_init() called during startup.

Note

The span logic is also used by the HostDB and more than one otherwise inexplicable feature is provided by the span logic for that module.

After configuration initialization, the cache processor is started by calling CacheProcessor::start(). This does a number of things:

For each valid span, an instance of CacheDisk is created. This class is a continuation and so can be used to perform potentially blocking operations on the span. The primary use of these is to be passed to the AIO threads as the callback when an I/O operation completes. These are then dispatched to AIO threads to perform storage unit initialization. After all of those have completed, the resulting storage is distributed across the volumes in cplist_reconfigure(). The CacheVol instances are created at this time.

Stripe Assignment

Every object that is stored in stored in a single, specific stripe. Stripe assignment is determining for an object what stripe that is. The logic described here sets up the stripe assignment table maps from the cache key to a specific stripe.

Cache stripe assignment setup is done once all stripes have initialized (that is, all stripe header read operations have completed). There is an instance of CacheHostRecord for each line in hosting.config and one generic instance (Cache::hosttable) for all cache volumes that are not explicitly assigned. If hosting.config is empty then all cache volumes will be in the generic record.

build_vol_hash_table() in iocore/cache/Cache.cc does the work of setting up the stripe assignment and is called for each CacheHostRecord and the generic host record. The stripes to be assigned are in CacheHostRecord::vols.

../../_images/cache-init-cachehostrecord.png

CacheHostRecord::vols is the union of all the stripes in the CacheVol instances in CacheHostRecord::cp.

An indirect index mapping is created to account for stripes that are not available. The total size of the stripes is computed at the same time. The forvol and getvol arrays are used for debugging, they are not essential to the assignment setup. rtable_entries is filled with stripe size divided by VOL_HASH_ALLOC_SIZE. These values are used to determine the number of assignment slots given to each stripe. For each stripe a seed for a 32 bit pseudo random number generator is created based on stripe properties. Another array of pairs of value and stripe index is filled using these. For each VOL_HASH_ALLOC_SIZE amount of space in a stripe, a pair is generated containing the stripe index and the next random number from that stripe’s generator. This array is then sorted in ascending order.

../../_images/cache-init-rtable-setup.png

In this example the total size of the stripes is 10 and an 8 bit pseudo random number generator is used.

The result is sampled in sections, the size of the sections selected to yield VOL_HASH_TABLE_SIZE sections. For each section the sample value is the midpoint of the section.For the example, the number of sections is set to 17 (because the number of sections should be a prime number). This yields 17 sections each of width 15 with a sample value equal to 7 more than the initial value. The results of applying this to the rtable is

../../_images/cache-init-rtable-result.png

Sampling results.

This process can be viewed as dividing a number line in to sampling sections, each section corresponding to a stripe assignment slot.

../../_images/cache-init-sampling.png

Sample sections.

Each random value for a stripe in the rtable array can be considered a node in this space. For one stripe this might look like

../../_images/cache-init-slots-single.png

Random value nodes for a single stripe.

The full array contains random value nodes for all the stripes.

../../_images/cache-init-selection.png

Random value nodes for all (four) stripes.

The stripe for that section (assignment slot) is the first node at or past the sample value. This can be seen as the arrow color in the previous figure. This provides a reasonably proportioned to size assignment of slots to stripes. It is also a consistent hash in that if a stripe is removed, the recomputation will tend to distribute assignments for the missing stripe across the other stripes in proportion to their sizes while not changing the assignment of any slot not assigned to the missing stripe. In essence for each sample point (assignment slot) a permutation of the stripes is implied by the order of the random value nodes past that sample point. The randomization serves to spread re-assigned slots to various stripes instead of a single one.

../../_images/cache-init-slots-minus-1.png

Remove the blue stripe.

If the stripe is restored the assignments will be the same as before the stripe was removed. The assignment is very sensitive to the properties of each stripe - changing a stripe size or location will effectively be as if it were a new stripe. In the figure the two blue stripe assignments are changed to purple and green respectively. If the blue stripe were added back those assignments and only those would revert to blue. This is because for each stripe the node sequence as generated by the pseudo random number generator depends only the properties of the stripes.

At runtime stripe selection is done by Cache::key_to_vol() which selects the CacheHostRecord instance then picks the stripe assignment slot in the array which determines the stripe for the object.

Footnotes

1

Work is under way on extending this to include objects that are in the memory cache.

2

This linked list is mostly ignored in later processing, causing all but one file or directory storage units on the same device to be ignored. See TS-1869.