14. Berkeley DB Slice Support - Key Design - 《Oracle Berkeley DB Programmer's Reference Guide （Version 18.1.32）》

Key Design

If your application does not perform transaction-protected operations on multiple records, then you do not need to give your keys any special consideration just because of slices. In this case, use keys designed exactly as you would if you were not using slices. The internal hashing algorithm will automatically distribute your records evenly across your slices based on the data that the hash finds in your keys.

However, if you do want to perform atomic operations on multiple records, then you do need to give your key design some thought. This is because the slice feature distributes your records across separate sub-environments. A transaction cannot cross environments, so you must design your keys so that the records you want to operate upon in a single transaction are all placed in the same slice. You do this by identifying the portion of your key which is slice-relevant.

For example, suppose your application contains information about corporate personnel. To adequately represent each employee in the corporation, you have the following record types:

Contact information, including the employee’s name, organization name, office number, phone number, and email address.
A photograph of the employee (stored as an external file).
The employee’s public key for digital signatures.

For a non-sliced database, it would be enough to create keys using employee ID and record type. That is, for an employee with ID 6591, your database would contain three keys:

6591;Contact
6591;Image
6591;Signature

However, for a sliced database, there is no guarantee that the three records using these keys would be placed in the same slice. If they are not in the same slice, then put and get operations for all three records cannot be wrapped in a single transaction. Without a single transaction, you cannot operate on these three records atomically.

Therefore, to support a sliced database, you need to identify a portion of the key that will be identical between records that you want to operate upon within a single transaction. In this simple example, you can use the employee’s ID to accomplish this. That is, you must identify the slice-relevant portion of the key so that DB knows what to use when dividing records across slices.
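The effect of hashing only the slice-relevant portion can be sketched as follows. This is an illustration only: the 32-bit FNV-1a hash is a stand-in and is not the internal hashing algorithm that DB uses, and the names fnv1a and pick_slice are hypothetical helpers, not part of the DB API. The point is simply that any hash computed over the same four bytes ("6591") must yield the same slice for all three records, whereas a hash over the full keys carries no such guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in hash (32-bit FNV-1a). This is an assumption made for
 * illustration; it is NOT Berkeley DB's internal hashing algorithm.
 */
static uint32_t
fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return (h);
}

/*
 * Hypothetical helper: map the first len bytes of key to one of
 * nslices slices. Passing the full key length hashes the whole key;
 * passing the length of the employee ID hashes only the
 * slice-relevant portion.
 */
static uint32_t
pick_slice(const char *key, size_t len, uint32_t nslices)
{
    return (fnv1a(key, len) % nslices);
}
```

With this sketch, pick_slice("6591;Contact", 4, n), pick_slice("6591;Image", 4, n), and pick_slice("6591;Signature", 4, n) always agree for any slice count n, because all three calls hash the identical bytes "6591".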

To identify the slice-relevant portion of the key, you define a callback, which you then set using the DB->set_slice_callback() method:

int
construct_slice_dbt(const DB *db, const DBT *key, DBT *slice)
{
    /*
     * key is the source key.
     *
     * slice is the DBT that we use to identify the slice-relevant
     * portion of the data in key.
     *
     * This example does not require lifting disjointed portions of
     * key to create a slice key, so we can get away with simply
     * setting slice->size to highlight the interesting portion of
     * key's data.
     */
    slice->data = key->data;
    /* The employee ID (for example, "6591") is the first 4 bytes. */
    slice->size = 4;
    return (0);
}
...
/*
 * Opening the environment is routine so we skip it here for
 * the sake of brevity. Assume we have created and opened an
 * environment handle, dbenv.
 */
DB *dbp = NULL;
DB_TXN *txn = NULL;
u_int32_t open_flags;
int ret;
ret = db_create(&dbp, dbenv, 0);
if (ret != 0) {
    /* db_create failed. handle error here */
}
open_flags = DB_CREATE | DB_THREAD | DB_AUTO_COMMIT;
/* Open the database with slice support. */
open_flags |= DB_SLICED;
/* Set the callback before opening the database */
ret = dbp->set_slice_callback(dbp, construct_slice_dbt);
if (ret != 0) {
    /* callback set failed. handle error here */
}
ret = dbp->open(dbp, 
        NULL,        /* Txn pointer */ 
        "mydb.db",   /* Database file name */
        NULL,        /* Logical database name */
        DB_BTREE,    /* Database is using btree access method */
        open_flags,  /* Open flags */
        0);          /* File mode. Using defaults. */
if (ret != 0) {
    /* db open failed. handle error here */
}

After that, you can read and write to your database exactly as you would if you were using a non-sliced database. Your threaded workload should scale across cores much better than it would in a non-sliced environment.
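The callback shown above hard-codes a four-byte employee ID. If your IDs can vary in length, one option is to locate the ';' separator instead. The following is a minimal sketch under that assumption: slice_prefix_len is a hypothetical helper, not part of the DB API, and it assumes keys are laid out as the employee ID, then ';', then the record type.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the length in bytes of the
 * slice-relevant prefix of a key laid out as "<ID>;<record type>".
 * The prefix is everything before the first ';'. If the key holds
 * no ';', the whole key is treated as slice-relevant.
 */
static size_t
slice_prefix_len(const void *data, size_t size)
{
    const char *start = data;
    const char *sep = memchr(data, ';', size);

    return (sep == NULL ? size : (size_t)(sep - start));
}
```

Inside a callback such as construct_slice_dbt(), the result could replace the constant, for example slice->size = (u_int32_t)slice_prefix_len(key->data, key->size). Every record type for a given employee still shares the same slice-relevant bytes, so the records continue to land in the same slice.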