Configure state storage

Pulsar Functions use Apache BookKeeper as the state storage interface. Pulsar integrates with the BookKeeper table service to store state for functions. For example, a WordCount function can store the state of its counters in the BookKeeper table service via the State APIs.

States are key-value pairs, where a key is a string and its value is arbitrary binary data - counters are stored as 64-bit big-endian binary values. Keys are scoped to an individual function and shared between instances of that function.
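To make the counter encoding concrete, here is a minimal Python sketch of how a 64-bit big-endian value can be packed and unpacked with the standard struct module. It assumes counters are signed, matching the long type used by the Java counter API. The helper names encode_counter and decode_counter are illustrative and not part of any Pulsar SDK.

```python
import struct


def encode_counter(value: int) -> bytes:
    """Pack an integer as a signed 64-bit big-endian value (8 bytes)."""
    return struct.pack(">q", value)


def decode_counter(raw: bytes) -> int:
    """Unpack 8 big-endian bytes back into an integer."""
    (value,) = struct.unpack(">q", raw)
    return value


encoded = encode_counter(42)
print(encoded.hex())            # 000000000000002a
print(decode_counter(encoded))  # 42
```

A decoder like this can be handy when a raw counter value is read back through a generic key/value call rather than through the counter API.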

Note

State storage is not available for Go functions.

Call state APIs

Pulsar Functions expose APIs for mutating and accessing state. These APIs are available on the Context object when you develop functions with the Java or Python SDK.

The following table outlines the state-related APIs that are available in Java and Python functions.

| State-related API | Java                          | Python       |
|-------------------|-------------------------------|--------------|
| Increment counter | incrCounter, incrCounterAsync | incr_counter |
| Retrieve counter  | getCounter, getCounterAsync   | get_counter  |
| Update state      | putState, putStateAsync       | put_state    |
| Retrieve state    | getState, getStateAsync       | get_state    |
| Delete state      | deleteState                   | del_counter  |

Increment counter

You can use incrCounter to increment the counter of a given key by the given amount. If the key does not exist, a new key is created.

  • Java

        /**
         * Increment the built-in distributed counter referred to by key.
         *
         * @param key The name of the key
         * @param amount The amount to be incremented
         */
        void incrCounter(String key, long amount);

    To asynchronously increment the counter, you can use incrCounterAsync.

        /**
         * Increment the built-in distributed counter referred to by key,
         * but don't wait for the completion of the increment operation.
         *
         * @param key The name of the key
         * @param amount The amount to be incremented
         */
        CompletableFuture<Void> incrCounterAsync(String key, long amount);

  • Python

        def incr_counter(self, key, amount):
            """incr the counter of a given key in the managed state"""

Retrieve counter

You can use getCounter to retrieve the counter of a given key mutated by incrCounter.

  • Java

        /**
         * Retrieve the counter value for the key.
         *
         * @param key name of the key
         * @return the amount of the counter value for this key
         */
        long getCounter(String key);

    To asynchronously retrieve the counter mutated by incrCounterAsync, you can use getCounterAsync.

        /**
         * Retrieve the counter value for the key, but don't wait
         * for the operation to be completed
         *
         * @param key name of the key
         * @return the amount of the counter value for this key
         */
        CompletableFuture<Long> getCounterAsync(String key);

  • Python

        def get_counter(self, key):
            """get the counter of a given key in the managed state"""
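The two counter calls are typically used together. The sketch below is a minimal Python illustration, not SDK code: InMemoryContext is a hypothetical stand-in that mimics only incr_counter and get_counter with a dictionary so the example runs without a Pulsar cluster, and it assumes that a counter which was never incremented reads as 0. Inside a real function, the runtime supplies the context object.

```python
class InMemoryContext:
    """Hypothetical stand-in for the function context; counters live in a dict."""

    def __init__(self):
        self._counters = {}

    def incr_counter(self, key, amount):
        # A missing key is created on first increment.
        self._counters[key] = self._counters.get(key, 0) + amount

    def get_counter(self, key):
        # Assumption: an unknown counter reads as 0.
        return self._counters.get(key, 0)


def count_event(event_type, context):
    """Increment the counter for event_type and return the running total."""
    context.incr_counter(event_type, 1)
    return context.get_counter(event_type)


context = InMemoryContext()
print(count_event("login", context))   # 1
print(count_event("login", context))   # 2
print(count_event("logout", context))  # 1
```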

Update state

Besides the counter API, Pulsar also exposes a general key/value API for functions to store and update the state of a given key.

  • Java

        /**
         * Update the state value for the key.
         *
         * @param key name of the key
         * @param value state value of the key
         */
        void putState(String key, ByteBuffer value);

    To asynchronously update the state of a given key, you can use putStateAsync.

        /**
         * Update the state value for the key, but don't wait for the operation to be completed
         *
         * @param key name of the key
         * @param value state value of the key
         */
        CompletableFuture<Void> putStateAsync(String key, ByteBuffer value);

  • Python

        def put_state(self, key, value):
            """update the value of a given key in the managed state"""

Retrieve state

You can use getState to retrieve the state of a given key.

  • Java

        /**
         * Retrieve the state value for the key.
         *
         * @param key name of the key
         * @return the state value for the key.
         */
        ByteBuffer getState(String key);

    To asynchronously retrieve the state of a given key, you can use getStateAsync.

        /**
         * Retrieve the state value for the key, but don't wait for the operation to be completed
         *
         * @param key name of the key
         * @return the state value for the key.
         */
        CompletableFuture<ByteBuffer> getStateAsync(String key);

  • Python

        def get_state(self, key):
            """get the value of a given key in the managed state"""
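Because state values are arbitrary binary data, structured values have to be serialized before they are stored. The following Python sketch shows one possible approach using JSON encoded as UTF-8 bytes. DictContext is a hypothetical dictionary-backed stand-in for the real context, and the assumption that get_state returns None for a missing key is illustrative only; check the behavior of your SDK version.

```python
import json


class DictContext:
    """Hypothetical stand-in exposing only put_state and get_state."""

    def __init__(self):
        self._state = {}

    def put_state(self, key, value):
        self._state[key] = value

    def get_state(self, key):
        # Assumption: a missing key yields None.
        return self._state.get(key)


def remember_last_order(user_id, order, context):
    """Store the latest order for a user and return the previous one, if any."""
    key = "last-order:" + user_id
    raw = context.get_state(key)
    previous = json.loads(raw.decode("utf-8")) if raw is not None else None
    context.put_state(key, json.dumps(order).encode("utf-8"))
    return previous


context = DictContext()
print(remember_last_order("alice", {"id": 1}, context))  # None
print(remember_last_order("alice", {"id": 2}, context))  # {'id': 1}
```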

Delete state

Note

Both counters and binary values share the same keyspace, so this API deletes either type.

  • Java

        /**
         * Delete the state value for the key.
         *
         * @param key name of the key
         */
        void deleteState(String key);
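The shared keyspace described in the note can be pictured with a small Python model. This is only a sketch under stated assumptions: SharedKeyspaceContext is a hypothetical stand-in that keeps counters and binary values in a single dictionary, stores counters as 64-bit big-endian bytes as described at the top of this page, and exposes deletion under the Python name del_counter listed in the API table.

```python
import struct


class SharedKeyspaceContext:
    """Hypothetical model: counters and binary values share one key/value map."""

    def __init__(self):
        self._table = {}

    def incr_counter(self, key, amount):
        # Counters are kept as signed 64-bit big-endian bytes; missing keys start at 0.
        current = struct.unpack(">q", self._table.get(key, bytes(8)))[0]
        self._table[key] = struct.pack(">q", current + amount)

    def put_state(self, key, value):
        self._table[key] = value

    def get_state(self, key):
        return self._table.get(key)

    def del_counter(self, key):
        # Removes whatever is stored under key, counter or binary value.
        self._table.pop(key, None)


context = SharedKeyspaceContext()
context.incr_counter("hits", 3)
context.put_state("greeting", b"hello")
context.del_counter("hits")
context.del_counter("greeting")
print(context.get_state("hits"), context.get_state("greeting"))  # None None
```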

Query state via CLI

In addition to storing and retrieving state through the State APIs, you can query the state of a function with CLI commands.

    bin/pulsar-admin functions querystate \
        --tenant <tenant> \
        --namespace <namespace> \
        --name <function-name> \
        --state-storage-url <bookkeeper-service-url> \
        --key <state-key> \
        [--watch]

If --watch is specified, the CLI tool keeps running to get the latest value of the provided state-key.
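When state needs to be queried from a script, the command above can be assembled programmatically. The Python sketch below only builds the argument list from the flags shown in this section. The helper name build_querystate_command and the sample tenant, namespace, function name, and storage URL are placeholders, not values taken from this page.

```python
import shlex


def build_querystate_command(tenant, namespace, name, storage_url, key, watch=False):
    """Return the pulsar-admin querystate invocation as a list of arguments."""
    command = [
        "bin/pulsar-admin", "functions", "querystate",
        "--tenant", tenant,
        "--namespace", namespace,
        "--name", name,
        "--state-storage-url", storage_url,
        "--key", key,
    ]
    if watch:
        # Keep the CLI running to follow the latest value of the key.
        command.append("--watch")
    return command


command = build_querystate_command(
    "public", "default", "word-count",  # placeholder tenant, namespace, function name
    "bk://localhost:4181",              # placeholder storage URL
    "hello",
    watch=True,
)
print(shlex.join(command))
```

The resulting list could then be passed to subprocess.run(command, check=True) from the Pulsar installation directory.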

Example

The WordCountFunction example demonstrates how a function stores state in Pulsar Functions.

  • Java

    The logic of WordCountFunction is simple and straightforward:

      1. The function splits the received string on literal periods, using the regex \\. (the Java string literal for the regex \.).

      2. For each resulting word, the function increments the counter by 1 via incrCounter(key, amount).

        import java.util.Arrays;

        import org.apache.pulsar.functions.api.Context;
        import org.apache.pulsar.functions.api.Function;

        public class WordCountFunction implements Function<String, Void> {
            @Override
            public Void process(String input, Context context) throws Exception {
                // Split the input on literal periods and increment the counter for each piece.
                Arrays.asList(input.split("\\.")).forEach(word -> context.incrCounter(word, 1));
                return null;
            }
        }

  • Python

    The logic of this WordCount function is simple and straightforward:

      1. The function first splits the received string into multiple words on whitespace.

      2. For each word, the function increments the counter by 1 via incr_counter(key, amount).

        from pulsar import Function

        class WordCount(Function):
            def process(self, item, context):
                # Split on whitespace and increment the counter for each word.
                for word in item.split():
                    context.incr_counter(word, 1)
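The counting logic can be exercised locally before deployment. The sketch below is a Python illustration under stated assumptions: RecordingContext is a hypothetical test double that tallies incr_counter calls with collections.Counter, and the WordCount class is written without the pulsar.Function base class so that it runs without the Pulsar client installed.

```python
from collections import Counter


class RecordingContext:
    """Hypothetical test double that tallies incr_counter calls."""

    def __init__(self):
        self.counters = Counter()

    def incr_counter(self, key, amount):
        self.counters[key] += amount


class WordCount:
    """Same logic as the Python example, minus the pulsar.Function base class."""

    def process(self, item, context):
        for word in item.split():
            context.incr_counter(word, 1)


context = RecordingContext()
WordCount().process("to be or not to be", context)
print(dict(context.counters))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```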