The ledger API is a lower-level API for BookKeeper that enables you to interact with ledgers directly.

The Java ledger API client

To get started with the Java client for BookKeeper, install the bookkeeper-server library as a dependency in your Java application.

For a more in-depth tutorial that involves a real use case for BookKeeper, see the Example application guide.


The BookKeeper Java client library is available via Maven Central and can be installed using Maven, Gradle, and other build tools.


If you’re using Maven, add this to your pom.xml build configuration file:

    <!-- in your <properties> block -->
    <bookkeeper.version>4.12.1</bookkeeper.version>

    <!-- in your <dependencies> block -->
    <dependency>
      <groupId>org.apache.bookkeeper</groupId>
      <artifactId>bookkeeper-server</artifactId>
      <version>${bookkeeper.version}</version>
    </dependency>

BookKeeper makes heavy use of the Google protobuf and Guava libraries. If your application might pull in different versions of protobuf or Guava through other dependencies, you can use the shaded library instead, which relocates the protobuf and Guava classes into a different namespace to avoid conflicts:

    <!-- in your <properties> block -->
    <bookkeeper.version>4.12.1</bookkeeper.version>

    <!-- in your <dependencies> block -->
    <dependency>
      <groupId>org.apache.bookkeeper</groupId>
      <artifactId>bookkeeper-server-shaded</artifactId>
      <version>${bookkeeper.version}</version>
    </dependency>


If you’re using Gradle, add this to your build.gradle build configuration file:

    dependencies {
        compile group: 'org.apache.bookkeeper', name: 'bookkeeper-server', version: '4.12.1'
    }

    // Alternatively:
    dependencies {
        compile 'org.apache.bookkeeper:bookkeeper-server:4.12.1'
    }

As with Maven, you can also configure Gradle to use the shaded jar:

    // use the `bookkeeper-server-shaded` jar
    dependencies {
        compile 'org.apache.bookkeeper:bookkeeper-server-shaded:'
    }

Connection string

When interacting with BookKeeper using the Java client, you need to provide your client with a connection string, for which you have three options:

  • Provide your entire ZooKeeper connection string, for example zk1:2181,zk2:2181,zk3:2181.
  • Provide a host and port for one node in your ZooKeeper cluster, for example zk1:2181. In general, it’s better to provide a full connection string (in case the ZooKeeper node you attempt to connect to is down).
  • If your ZooKeeper cluster can be discovered via DNS, you can provide the DNS name, for example
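
To see why the full connection string is the safer choice, it helps to look at what a client can do with it. The following is a minimal sketch using only the Java standard library. splitServers is a hypothetical helper (it is not part of the BookKeeper or ZooKeeper API) that breaks a connection string into its host:port entries. With three entries a client has two fallbacks if the first server is unreachable; with one entry it has none.

```java
import java.util.ArrayList;
import java.util.List;

public class ConnectionStringSketch {
    // Hypothetical helper: splits "host1:port1,host2:port2" into "host:port" entries.
    public static List<String> splitServers(String connectionString) {
        List<String> servers = new ArrayList<>();
        for (String part : connectionString.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                servers.add(trimmed);
            }
        }
        return servers;
    }

    public static void main(String[] args) {
        // Full connection string: three candidate servers
        System.out.println(splitServers("zk1:2181,zk2:2181,zk3:2181"));
        // Single node: only one candidate, so no fallback
        System.out.println(splitServers("zk1:2181").size());
    }
}
```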

Creating a new client

In order to create a new BookKeeper client object, you need to pass in a connection string. Here is an example client object using a ZooKeeper connection string:

    try {
        String connectionString = ""; // For a single-node, local ZooKeeper cluster
        BookKeeper bkClient = new BookKeeper(connectionString);
    } catch (InterruptedException | IOException | BKException e) {
        e.printStackTrace();
    }

If you’re running BookKeeper locally, using the localbookie command, use "" for your connection string, as in the example above.

There are, however, other ways that you can create a client object:

  • By passing in a ClientConfiguration object. Here’s an example:

      ClientConfiguration config = new ClientConfiguration();
      config.setZkServers(zkConnectionString);
      config.setAddEntryTimeout(2000);
      BookKeeper bkClient = new BookKeeper(config);

  • By specifying a ClientConfiguration and a ZooKeeper client object:

      ClientConfiguration config = new ClientConfiguration();
      config.setAddEntryTimeout(5000);
      ZooKeeper zkClient = new ZooKeeper(/* client args */);
      BookKeeper bkClient = new BookKeeper(config, zkClient);

  • Using the forConfig method:

      BookKeeper bkClient = BookKeeper.forConfig(conf).build();

Creating ledgers

The easiest way to create a ledger using the Java client is via the createLedger method, which creates a new ledger synchronously and returns a LedgerHandle. You must specify at least a DigestType and a password.

Here’s an example:

    byte[] password = "some-password".getBytes();
    LedgerHandle handle = bkClient.createLedger(BookKeeper.DigestType.MAC, password);

You can also create ledgers asynchronously.

Create ledgers asynchronously

    class LedgerCreationCallback implements AsyncCallback.CreateCallback {
        public void createComplete(int returnCode, LedgerHandle handle, Object ctx) {
            System.out.println("Ledger successfully created");
        }
    }

    bkClient.asyncCreateLedger(
            3,                            // ensemble size
            2,                            // quorum size
            BookKeeper.DigestType.MAC,
            password,
            new LedgerCreationCallback(),
            "some context"
    );

Adding entries to ledgers

    long entryId = ledger.addEntry("Some entry data".getBytes());

Add entries asynchronously

Reading entries from ledgers

    Enumeration<LedgerEntry> entries = handle.readEntries(1, 99);

To read all possible entries from the ledger:

    Enumeration<LedgerEntry> entries =
            handle.readEntries(0, handle.getLastAddConfirmed());

    while (entries.hasMoreElements()) {
        LedgerEntry entry = entries.nextElement();
        System.out.println("Successfully read entry " + entry.getEntryId());
    }

Reading entries after the LastAddConfirmed range

The readUnconfirmedEntries method lets you read entries beyond LastAddConfirmed. It reads without checking the local value of LastAddConfirmed, so it can return entries for which the writer has not yet received an acknowledgement. For entries in the range 0..LastAddConfirmed, BookKeeper guarantees that the writer has successfully received the acknowledgement. For entries outside that range, the writer may never have received the acknowledgement, so the reader may see entries before the writer does, which can cause consistency issues in some cases. A single call can span entries both before and after LastAddConfirmed; the consistency guarantees described above apply to each part of the range.

    Enumeration<LedgerEntry> entries =
            handle.readUnconfirmedEntries(0, lastEntryIdExpectedToRead);

    while (entries.hasMoreElements()) {
        LedgerEntry entry = entries.nextElement();
        System.out.println("Successfully read entry " + entry.getEntryId());
    }
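
The difference between a regular read and an unconfirmed read can be pictured with a small in-memory model. This is only a sketch under simplifying assumptions: LastAddConfirmedSketch and its methods are hypothetical and are not BookKeeper classes. A regular read refuses any range that goes past LastAddConfirmed, while an unconfirmed read only requires that the entries have been stored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a ledger; not a BookKeeper class.
public class LastAddConfirmedSketch {
    private final List<String> entries = new ArrayList<>();
    private long lastAddConfirmed = -1; // -1: no entry confirmed yet

    // The writer sends an entry; it is stored but not yet acknowledged.
    public long add(String data) {
        entries.add(data);
        return entries.size() - 1;
    }

    // The writer receives the acknowledgement for entryId.
    public void acknowledge(long entryId) {
        lastAddConfirmed = Math.max(lastAddConfirmed, entryId);
    }

    // Regular read: refuses anything beyond LastAddConfirmed.
    public List<String> readConfirmed(long first, long last) {
        if (first < 0 || first > last || last > lastAddConfirmed) {
            throw new IllegalArgumentException("range exceeds LastAddConfirmed");
        }
        return new ArrayList<>(entries.subList((int) first, (int) last + 1));
    }

    // Unconfirmed read: only requires that the entries are stored.
    public List<String> readUnconfirmed(long first, long last) {
        if (first < 0 || first > last || last >= entries.size()) {
            throw new IllegalArgumentException("range exceeds stored entries");
        }
        return new ArrayList<>(entries.subList((int) first, (int) last + 1));
    }

    public static void main(String[] args) {
        LastAddConfirmedSketch ledger = new LastAddConfirmedSketch();
        ledger.add("a");
        ledger.add("b");
        ledger.add("c");
        ledger.acknowledge(1); // only entries 0 and 1 are confirmed
        System.out.println(ledger.readConfirmed(0, 1));   // [a, b]
        System.out.println(ledger.readUnconfirmed(0, 2)); // [a, b, c]
    }
}
```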

Deleting ledgers

Ledgers can be deleted synchronously. The call may throw an exception:

    long ledgerId = 1234;

    try {
        bkClient.deleteLedger(ledgerId);
    } catch (Exception e) {
        e.printStackTrace();
    }

Delete ledgers asynchronously

Ledgers can also be deleted asynchronously:

    class DeleteEntryCallback implements AsyncCallback.DeleteCallback {
        public void deleteComplete(int returnCode, Object ctx) {
            System.out.println("Delete completed");
        }
    }

    bkClient.asyncDeleteLedger(ledgerId, new DeleteEntryCallback(), null);

Simple example

For a more involved BookKeeper client example, see the example application below.

In the code sample below, a BookKeeper client:

  • creates a ledger
  • writes entries to the ledger
  • closes the ledger (meaning no further writes are possible)
  • re-opens the ledger for reading
  • reads all available entries

    // Create a client object for the local ensemble. This
    // operation throws multiple exceptions, so make sure to
    // use a try/catch block when instantiating client objects.
    BookKeeper bkc = new BookKeeper("localhost:2181");

    // A password for the new ledger
    byte[] ledgerPassword = /* some sequence of bytes, perhaps random */;

    // Create a new ledger and fetch its identifier
    LedgerHandle lh = bkc.createLedger(BookKeeper.DigestType.MAC, ledgerPassword);
    long ledgerId = lh.getId();

    // Create a buffer for four-byte entries
    ByteBuffer entry = ByteBuffer.allocate(4);

    int numberOfEntries = 100;

    // Add entries to the ledger, then close it
    for (int i = 0; i < numberOfEntries; i++) {
        entry.putInt(i);
        entry.position(0);
        lh.addEntry(entry.array());
    }
    lh.close();

    // Open the ledger for reading
    lh = bkc.openLedger(ledgerId, BookKeeper.DigestType.MAC, ledgerPassword);

    // Read all available entries
    Enumeration<LedgerEntry> entries = lh.readEntries(0, numberOfEntries - 1);

    while (entries.hasMoreElements()) {
        ByteBuffer result = ByteBuffer.wrap(entries.nextElement().getEntry());
        Integer retrEntry = result.getInt();

        // Print the integer stored in each entry
        System.out.println(String.format("Result: %s", retrEntry));
    }

    // Close the ledger and the client
    lh.close();
    bkc.close();

Running this should return this output:

    Result: 0
    Result: 1
    Result: 2
    # etc
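
The example stores each integer as a four-byte entry and decodes it again on the read side. That encoding step can be tried on its own, without a running cluster. The sketch below uses only java.nio; encode and decode are hypothetical helper names introduced for illustration.

```java
import java.nio.ByteBuffer;

public class EntryEncodingSketch {
    // Encode an int as the four-byte payload of an entry.
    public static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode the four-byte payload back into an int.
    public static int decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            byte[] payload = encode(i);
            System.out.println(String.format("Result: %s", decode(payload)));
        }
    }
}
```

Allocating a fresh buffer per entry, as this sketch does, avoids having to reset the buffer position between writes.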

Example application

This tutorial walks you through building an example application that uses BookKeeper as the replicated log. The application uses the BookKeeper Java client to interact with BookKeeper.

The code for this tutorial can be found in this GitHub repo. The final code for the Dice class can be found here.


Before you start, you will need to have a BookKeeper cluster running locally on your machine. For installation instructions, see Installation.

To start up a cluster consisting of six bookies locally:

    $ bin/bookkeeper localbookie 6

You can specify a different number of bookies if you’d like.


The goal of the dice application is to have

  • multiple instances of this application,
  • possibly running on different machines,
  • all of which display the exact same sequence of numbers.

In other words, the log needs to be both durable and consistent, regardless of how many bookies are participating in the BookKeeper ensemble. If one of the bookies crashes or becomes unable to communicate with the other bookies in any way, every instance should still display the same sequence of numbers as the others. This tutorial will show you how to achieve this.

To begin, download the base application, compile and run it.

    $ git clone
    $ mvn package
    $ mvn exec:java -Dexec.mainClass=org.apache.bookkeeper.Dice

That should yield output that looks something like this:

    [INFO] Scanning for projects...
    [INFO]
    [INFO] ------------------------------------------------------------------------
    [INFO] Building tutorial 1.0-SNAPSHOT
    [INFO] ------------------------------------------------------------------------
    [INFO]
    [INFO] --- exec-maven-plugin:1.3.2:java (default-cli) @ tutorial ---
    [WARNING] Warning: killAfter is now deprecated. Do you need it ? Please comment on MEXEC-6.
    Value = 4
    Value = 5
    Value = 3

The base application

The application in this tutorial is a dice application. The Dice class below has a playDice function that generates a random number between 1 and 6 every second, prints the value of the dice roll, and runs indefinitely.

    public class Dice {
        Random r = new Random();

        void playDice() throws InterruptedException {
            while (true) {
                Thread.sleep(1000);
                System.out.println("Value = " + (r.nextInt(6) + 1));
            }
        }
    }

When you run the main function of this class, a new Dice object will be instantiated and then run indefinitely:

    public class Dice {
        // other methods

        public static void main(String[] args) throws InterruptedException {
            Dice d = new Dice();
            d.playDice();
        }
    }

Leaders and followers (and a bit of background)

To achieve this common view in multiple instances of the program, we need each instance to agree on what the next number in the sequence will be. For example, the instances must agree that 4 is the first number and 2 is the second number and 5 is the third number and so on. This is a difficult problem, especially in the case that any instance may go away at any time, and messages between the instances can be lost or reordered.

Luckily, there are already algorithms to solve this. Paxos is an abstract algorithm to implement this kind of agreement, while Zab and Raft are more practical protocols. This video gives a good overview about how these algorithms usually look. They all have a similar core.

It would be possible to run Paxos to agree on each number in the sequence. However, running Paxos each time can be expensive. Instead, Zab and Raft use a Paxos-like algorithm to elect a leader. The leader then decides what the sequence of events should be, putting them in a log, which the other instances can then follow to maintain the same state as the leader.

BookKeeper provides the functionality for the second part of the protocol, allowing a leader to write events to a log and have multiple followers tailing the log. However, BookKeeper does not do leader election. You will need a ZooKeeper or Raft instance for that purpose.
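
The leader-writes, followers-tail pattern can be sketched without any BookKeeper code. In the toy model below, a plain in-memory list stands in for the durable log, and Follower is a hypothetical class that remembers how far it has read. It only illustrates why followers that replay the same log end up with the same sequence; it says nothing about durability or failure handling.

```java
import java.util.ArrayList;
import java.util.List;

public class ReplicatedLogSketch {
    // A follower remembers how far it has read and replays the rest.
    public static class Follower {
        private int nextIndex = 0;
        private final List<Integer> seen = new ArrayList<>();

        public void catchUp(List<Integer> log) {
            while (nextIndex < log.size()) {
                seen.add(log.get(nextIndex));
                nextIndex++;
            }
        }

        public List<Integer> seen() {
            return seen;
        }
    }

    public static void main(String[] args) {
        List<Integer> log = new ArrayList<>(); // stands in for the shared log
        Follower a = new Follower();
        Follower b = new Follower();

        log.add(4);     // the leader decides the first number
        log.add(2);     // ... and the second
        a.catchUp(log); // follower a tails the log eagerly
        log.add(5);
        a.catchUp(log);
        b.catchUp(log); // follower b joins late and replays from the start

        System.out.println(a.seen());                  // [4, 2, 5]
        System.out.println(a.seen().equals(b.seen())); // true
    }
}
```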

Why not just use ZooKeeper?

There are a number of reasons:

  1. ZooKeeper’s log is only exposed through a tree-like interface. It can be hard to shoehorn your application into this.
  2. A ZooKeeper ensemble of multiple machines is limited to one log. You may want one log per resource, which will become expensive very quickly.
  3. Adding extra machines to a ZooKeeper ensemble increases neither capacity nor throughput.

BookKeeper can be seen as a means of exposing ZooKeeper’s replicated log to applications in a scalable fashion. ZooKeeper is still used by BookKeeper, however, to maintain consistency guarantees, though clients don’t need to interact with ZooKeeper directly.

Electing a leader

We’ll use ZooKeeper to elect a leader. A ZooKeeper instance will have been started locally when you started the localbookie application above. To verify it’s running, run the following command:

    $ echo stat | nc localhost 2181
    Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
    Clients:
     /[1](queued=0,recved=40,sent=41)
     /[1](queued=0,recved=11,sent=11)
     /[0](queued=0,recved=1,sent=0)
     /[1](queued=0,recved=38,sent=39)
     /[1](queued=0,recved=38,sent=39)
     /[1](queued=0,recved=38,sent=39)

    Latency min/avg/max: 0/0/23
    Received: 167
    Sent: 170
    Connections: 6
    Outstanding: 0
    Zxid: 0x11
    Mode: standalone
    Node count: 16

To interact with ZooKeeper, we’ll use the Curator client rather than the stock ZooKeeper client. Getting things right with the ZooKeeper client can be tricky, and Curator removes a lot of the sharp corners for you. In fact, Curator even provides a leader election recipe, so we need to do very little work to get leader election in our application.

    public class Dice extends LeaderSelectorListenerAdapter implements Closeable {

        final static String ZOOKEEPER_SERVER = "";
        final static String ELECTION_PATH = "/dice-elect";

        ...

        Dice() throws InterruptedException {
            curator = CuratorFrameworkFactory.newClient(ZOOKEEPER_SERVER,
                    2000, 10000, new ExponentialBackoffRetry(1000, 3));
            curator.start();
            curator.blockUntilConnected();

            leaderSelector = new LeaderSelector(curator, ELECTION_PATH, this);
            leaderSelector.autoRequeue();
            leaderSelector.start();
        }

In the constructor for Dice, we need to create the Curator client. We specify four things when creating the client: the location of the ZooKeeper service, the session timeout, the connect timeout and the retry policy.

The session timeout is a ZooKeeper concept. If the ZooKeeper server doesn’t hear anything from the client for this amount of time, any leases that the client holds are timed out. This is important for leader election: the Curator client takes a lease on ELECTION_PATH. The first instance to take the lease becomes the leader and the rest become followers, but their claims on the lease remain in the queue. If the first instance then goes away (because of a crash, for example), its session times out. Once the session times out, the lease is released and the next instance in the queue becomes the leader.

The call to autoRequeue() makes the client queue itself again if it loses the lease for some other reason, for example if it was still alive but a garbage collection pause caused it to lose its session, and thereby its lease. The session timeout here is set quite low so that transitions are quick when we test leader election. The optimal session timeout depends very much on the use case.

The other parameters are the connection timeout, i.e. the amount of time the client spends trying to connect to a ZooKeeper server before giving up, and the retry policy. The retry policy specifies how the client should respond to transient errors, such as connection loss. Operations that fail with transient errors can be retried, and this argument specifies how often the retries should occur.
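
The retry policy in the constructor is created with a base sleep time of 1000 ms and a maximum of 3 retries. The sketch below shows the general idea of exponential backoff using plain doubling. It is not Curator's implementation, which may compute delays differently (for example by adding randomness), and delayMs and shouldRetry are hypothetical names introduced for illustration.

```java
public class BackoffSketch {
    // Delay before the given retry (0-based), doubling each time.
    public static long delayMs(long baseSleepMs, int retryCount) {
        return baseSleepMs * (1L << retryCount);
    }

    // Give up once the maximum number of retries has been used.
    public static boolean shouldRetry(int retryCount, int maxRetries) {
        return retryCount < maxRetries;
    }

    public static void main(String[] args) {
        long baseSleepMs = 1000;
        int maxRetries = 3;
        for (int retry = 0; shouldRetry(retry, maxRetries); retry++) {
            System.out.println("retry " + retry + " after " + delayMs(baseSleepMs, retry) + " ms");
        }
    }
}
```

With these values the sketch waits 1000 ms, 2000 ms and 4000 ms before giving up.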

Finally, you’ll have noticed that Dice now extends LeaderSelectorListenerAdapter and implements Closeable. Closeable is there to close the resources we initialized in the constructor: the Curator client and the leaderSelector. LeaderSelectorListenerAdapter is a callback that the leaderSelector uses to notify the instance that it is now the leader. It is passed as the third argument to the LeaderSelector constructor.

    @Override
    public void takeLeadership(CuratorFramework client)
            throws Exception {
        synchronized (this) {
            leader = true;
            try {
                while (true) {
                    this.wait();
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                leader = false;
            }
        }
    }

takeLeadership() is the callback called by LeaderSelector when the instance becomes leader. It should only return when the instance wants to give up leadership. In our case we never do, so we wait on the current object until we’re interrupted. To signal to the rest of the program that we are the leader, we set a volatile boolean called leader to true. This is unset after we are interrupted.
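
The wait-until-interrupted pattern can be exercised without Curator. The sketch below runs the same loop on a plain thread and interrupts it, which plays the role of losing leadership. LeadershipFlagSketch, runOnce and the CountDownLatch are hypothetical additions for the demonstration; the latch only exists so the calling thread knows when the flag has been set.

```java
import java.util.concurrent.CountDownLatch;

public class LeadershipFlagSketch {
    private volatile boolean leader = false;
    private final CountDownLatch becameLeader = new CountDownLatch(1);

    // Same shape as takeLeadership(): hold leadership until interrupted.
    public void takeLeadership() {
        synchronized (this) {
            leader = true;
            becameLeader.countDown();
            try {
                while (true) {
                    this.wait();
                }
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                leader = false;
            }
        }
    }

    public boolean isLeader() {
        return leader;
    }

    // Returns {flag while leading, flag after the interrupt}.
    public static boolean[] runOnce() {
        try {
            LeadershipFlagSketch sketch = new LeadershipFlagSketch();
            Thread t = new Thread(sketch::takeLeadership);
            t.start();
            sketch.becameLeader.await();
            boolean during = sketch.isLeader();
            t.interrupt(); // stands in for losing the lease
            t.join();
            return new boolean[] {during, sketch.isLeader()};
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] flags = runOnce();
        System.out.println("while leading: " + flags[0] + ", after interrupt: " + flags[1]);
    }
}
```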

    void playDice() throws InterruptedException {
        while (true) {
            while (leader) {
                Thread.sleep(1000);
                System.out.println("Value = " + (r.nextInt(6) + 1)
                        + ", isLeader = " + leader);
            }
        }
    }

Finally, we modify the playDice function to only generate random numbers when it is the leader.

Run two instances of the program in two different terminals. You’ll see that one becomes leader and prints numbers and the other just sits there.

Now stop the leader using Control-Z. This will pause the process, but it won’t kill it. You will be dropped back to the shell in that terminal. After a couple of seconds, the session timeout, you will see that the other instance has become the leader. Zookeeper will guarantee that only one instance is selected as leader at any time.

Now go back to the shell that the original leader was on and wake up the process using fg. You’ll see something like the following:

    ...
    ...
    Value = 4, isLeader = true
    Value = 4, isLeader = true
    ^Z
    [1]+ Stopped mvn exec:java -Dexec.mainClass=org.apache.bookkeeper.Dice
    $ fg
    mvn exec:java -Dexec.mainClass=org.apache.bookkeeper.Dice
    Value = 3, isLeader = true
    Value = 1, isLeader = false


Since version 4.6, BookKeeper provides a new client API that leverages the Java 8 CompletableFuture facility. WriteHandle, WriteAdvHandle and ReadHandle were introduced to replace the generic LedgerHandle.

The new API is available in org.apache.bookkeeper.client.api. You should only use interfaces defined in this package.

Beware that in 4.6 this API is still experimental and may be subject to change in upcoming minor releases.

Create a new client

In order to create a new BookKeeper client object, you need to construct a ClientConfiguration object and set a connection string first, and then use BookKeeperBuilder to build the client.

Here is an example of building the BookKeeper client:

    // construct a client configuration instance
    ClientConfiguration conf = new ClientConfiguration();
    conf.setZkServers(zkConnectionString);
    conf.setZkLedgersRootPath("/path/to/ledgers/root");

    // build the bookkeeper client
    BookKeeper bk = BookKeeper.newBuilder(conf)
        .statsLogger(...)
        ...
        .build();

Create ledgers

The easiest way to create a ledger using the Java client is via the CreateBuilder. You must specify at least a DigestType and a password.

Here’s an example:

    BookKeeper bk = ...;

    byte[] password = "some-password".getBytes();

    WriteHandle wh = bk.newCreateLedgerOp()
        .withDigestType(DigestType.CRC32)
        .withPassword(password)
        .withEnsembleSize(3)
        .withWriteQuorumSize(3)
        .withAckQuorumSize(2)
        .execute()          // execute the creation op
        .get();             // wait for the execution to complete
A WriteHandle is returned for applications to write and read entries to and from the ledger.
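
The builder takes three sizes: the ensemble size, the write quorum size and the ack quorum size. The sketch below checks the relationship between them, assuming the usual constraint that the ensemble is at least as large as the write quorum, which in turn is at least as large as the ack quorum. isValid is a hypothetical helper for illustration, not a BookKeeper method.

```java
public class QuorumSketch {
    // Assumed constraint: ensembleSize >= writeQuorumSize >= ackQuorumSize >= 1.
    public static boolean isValid(int ensembleSize, int writeQuorumSize, int ackQuorumSize) {
        return ackQuorumSize >= 1
                && writeQuorumSize >= ackQuorumSize
                && ensembleSize >= writeQuorumSize;
    }

    public static void main(String[] args) {
        System.out.println(isValid(3, 3, 2)); // the sizes used in the example: true
        System.out.println(isValid(3, 2, 3)); // ack quorum larger than write quorum: false
        System.out.println(isValid(2, 3, 2)); // write quorum larger than ensemble: false
    }
}
```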

Write flags

You can specify the behaviour of the writer by setting WriteFlags at ledger creation time. These flags are applied only during write operations and are not recorded in the ledger metadata.

Available write flags:

  • DEFERRED_SYNC: Writes are acknowledged early, without waiting for guarantees of durability. Data will only be written to the OS page cache, without forcing an fsync.

    BookKeeper bk = ...;

    byte[] password = "some-password".getBytes();

    WriteHandle wh = bk.newCreateLedgerOp()
        .withDigestType(DigestType.CRC32)
        .withPassword(password)
        .withEnsembleSize(3)
        .withWriteQuorumSize(3)
        .withAckQuorumSize(2)
        .withWriteFlags(DEFERRED_SYNC)
        .execute()          // execute the creation op
        .get();             // wait for the execution to complete

Append entries to ledgers

The WriteHandle can be used for applications to append entries to the ledgers.

    WriteHandle wh = ...;

    // appendAsync returns a future; append(...) is the blocking variant
    CompletableFuture<Long> addFuture = wh.appendAsync("Some entry data".getBytes());

    // option 1: you can wait for add to complete synchronously
    try {
        long entryId = FutureUtils.result(addFuture);
    } catch (BKException bke) {
        // error handling
    }

    // option 2: you can process the result and exception asynchronously
    addFuture
        .thenApply(entryId -> {
            // process the result
        })
        .exceptionally(cause -> {
            // handle the exception
        });

    // option 3: bookkeeper provides a twitter-future-like event listener for processing result and exception asynchronously
    addFuture.whenComplete(new FutureEventListener<Long>() {
        @Override
        public void onSuccess(Long entryId) {
            // process the result
        }
        @Override
        public void onFailure(Throwable cause) {
            // handle the exception
        }
    });

The append method supports three representations of a byte array: the native Java byte[], the Java NIO ByteBuffer and the Netty ByteBuf. ByteBuf is recommended because it is more GC-friendly.
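
The asynchronous handling style relies only on the standard CompletableFuture API, so it can be tried without a cluster. In the sketch below, a plain CompletableFuture<Long> stands in for the future returned by an append; describe and unwrap are hypothetical helpers. It shows how a result and a failure travel through thenApply and exceptionally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class FutureHandlingSketch {
    // Failures from earlier stages arrive wrapped in CompletionException.
    static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    // Turn the outcome of an add into a message, whether it succeeded or failed.
    public static String describe(CompletableFuture<Long> addFuture) {
        return addFuture
                .thenApply(entryId -> "added entry " + entryId)
                .exceptionally(cause -> "add failed: " + unwrap(cause).getMessage())
                .join();
    }

    public static void main(String[] args) {
        // A future that already completed with entry id 7
        System.out.println(describe(CompletableFuture.completedFuture(7L)));

        // A future that failed; the exception type is only a stand-in
        CompletableFuture<Long> failed = new CompletableFuture<>();
        failed.completeExceptionally(new IllegalStateException("ledger closed"));
        System.out.println(describe(failed));
    }
}
```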

Open ledgers

You can open ledgers to read entries. Opening ledgers is done with the OpenBuilder. You must specify the ledgerId and the password in order to open a ledger.

Here’s an example:

    BookKeeper bk = ...;

    long ledgerId = ...;
    byte[] password = "some-password".getBytes();

    ReadHandle rh = bk.newOpenLedgerOp()
        .withLedgerId(ledgerId)
        .withPassword(password)
        .execute()          // execute the open op
        .get();             // wait for the execution to complete

A ReadHandle is returned for applications to read entries from the ledger.

Recovery vs NoRecovery

By default, the openBuilder opens the ledger in a NoRecovery mode. You can open the ledger in Recovery mode by specifying withRecovery(true) in the open builder.

    BookKeeper bk = ...;

    long ledgerId = ...;
    byte[] password = "some-password".getBytes();

    ReadHandle rh = bk.newOpenLedgerOp()
        .withLedgerId(ledgerId)
        .withPassword(password)
        .withRecovery(true)
        .execute()
        .get();

What is the difference between “Recovery” and “NoRecovery”?

If you open a ledger in “Recovery” mode, it fences and seals the ledger, so no more entries can be appended to it. The writer that is currently appending entries to the ledger will fail with LedgerFencedException.

In contrast, opening a ledger in “NoRecovery” mode does not fence and seal the ledger. “NoRecovery” mode is usually used by applications to tail a ledger.
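
The effect of the two modes on a writer can be pictured with a toy model. This is a deliberately simplified sketch: FencingSketch is hypothetical, keeps everything in memory, and uses IllegalStateException where a real writer would see a fencing error. It only shows that a recovery open stops further appends while a non-recovery open does not.

```java
public class FencingSketch {
    private boolean fenced = false;
    private long nextEntryId = 0;

    // Writer side: appending fails once the ledger has been fenced.
    public long append() {
        if (fenced) {
            throw new IllegalStateException("ledger fenced");
        }
        return nextEntryId++;
    }

    // Reader side: a recovery open fences the ledger, a plain open does not.
    // Returns the id of the last entry visible at open time (-1 if none).
    public long open(boolean recovery) {
        if (recovery) {
            fenced = true;
        }
        return nextEntryId - 1;
    }

    public static void main(String[] args) {
        FencingSketch ledger = new FencingSketch();
        ledger.append();
        ledger.append();

        ledger.open(false);                  // tailing reader: writer keeps going
        System.out.println(ledger.append()); // 2

        ledger.open(true);                   // recovery open: ledger is sealed
        try {
            ledger.append();
        } catch (IllegalStateException e) {
            System.out.println("writer failed: " + e.getMessage());
        }
    }
}
```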

Read entries from ledgers

The ReadHandle returned from the open builder can be used for applications to read entries from the ledgers.

    ReadHandle rh = ...;

    long startEntryId = ...;
    long endEntryId = ...;
    // readAsync returns a future; read(...) is the blocking variant
    CompletableFuture<LedgerEntries> readFuture = rh.readAsync(startEntryId, endEntryId);

    // option 1: you can wait for read to complete synchronously
    try {
        LedgerEntries entries = FutureUtils.result(readFuture);
    } catch (BKException bke) {
        // error handling
    }

    // option 2: you can process the result and exception asynchronously
    readFuture
        .thenApply(entries -> {
            // process the result
        })
        .exceptionally(cause -> {
            // handle the exception
        });

    // option 3: bookkeeper provides a twitter-future-like event listener for processing result and exception asynchronously
    readFuture.whenComplete(new FutureEventListener<LedgerEntries>() {
        @Override
        public void onSuccess(LedgerEntries entries) {
            // process the result
        }
        @Override
        public void onFailure(Throwable cause) {
            // handle the exception
        }
    });

Once you are done with processing the LedgerEntries, you can call #close() on the LedgerEntries instance to release the buffers held by it.

Applications are allowed to read any entries between 0 and LastAddConfirmed. If an application attempts to read entries beyond LastAddConfirmed, it will receive an IncorrectParameterException.

Read unconfirmed entries from ledgers

readUnconfirmed provides a mechanism for applications to read entries beyond LastAddConfirmed. Applications should be aware that readUnconfirmed doesn’t provide any repeatable-read consistency.

    CompletableFuture<LedgerEntries> readFuture = rh.readUnconfirmedAsync(startEntryId, endEntryId);

Tailing Reads

There are two methods for applications to achieve tailing reads: Polling and Long-Polling.


Polling

You can do this in a synchronous way:

    ReadHandle rh = ...;

    long startEntryId = 0L;
    long nextEntryId = startEntryId;
    int numEntriesPerBatch = 4;
    while (!rh.isClosed() || nextEntryId <= rh.getLastAddConfirmed()) {
        long lac = rh.getLastAddConfirmed();
        if (nextEntryId > lac) {
            // no more entries are added
            Thread.sleep(1000);

            lac = rh.readLastAddConfirmedAsync().get();
            continue;
        }

        long endEntryId = Math.min(lac, nextEntryId + numEntriesPerBatch - 1);
        LedgerEntries entries = rh.readAsync(nextEntryId, endEntryId).get();

        // process the entries

        nextEntryId = endEntryId + 1;
    }

Long Polling

    ReadHandle rh = ...;

    long startEntryId = 0L;
    long nextEntryId = startEntryId;
    int numEntriesPerBatch = 4;
    while (!rh.isClosed() || nextEntryId <= rh.getLastAddConfirmed()) {
        long lac = rh.getLastAddConfirmed();
        if (nextEntryId > lac) {
            // no more entries are added
            try (LastConfirmedAndEntry lacAndEntry =
                    rh.readLastAddConfirmedAndEntryAsync(nextEntryId, 1000, false).get()) {
                if (lacAndEntry.hasEntry()) {
                    // process the entry

                    ++nextEntryId;
                }
            }
        } else {
            long endEntryId = Math.min(lac, nextEntryId + numEntriesPerBatch - 1);
            LedgerEntries entries = rh.readAsync(nextEntryId, endEntryId).get();

            // process the entries
            nextEntryId = endEntryId + 1;
        }
    }
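
Both tailing loops cut the range between nextEntryId and LastAddConfirmed into batches using the same arithmetic. The sketch below isolates that calculation so it can be checked on its own. batches is a hypothetical helper that returns the inclusive [start, end] range of each batch; it performs no reads.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchRangeSketch {
    // Split [nextEntryId, lac] into inclusive ranges of at most numEntriesPerBatch entries.
    public static List<long[]> batches(long nextEntryId, long lac, int numEntriesPerBatch) {
        List<long[]> ranges = new ArrayList<>();
        while (nextEntryId <= lac) {
            long endEntryId = Math.min(lac, nextEntryId + numEntriesPerBatch - 1);
            ranges.add(new long[] {nextEntryId, endEntryId});
            nextEntryId = endEntryId + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Entries 0..9 confirmed, four entries per batch
        for (long[] range : batches(0, 9, 4)) {
            System.out.println(range[0] + ".." + range[1]);
        }
    }
}
```

For ten confirmed entries and a batch size of four, the ranges are 0..3, 4..7 and 8..9: the last batch is cut short so that it never reaches past LastAddConfirmed.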

Delete ledgers

Ledgers can be deleted by using DeleteBuilder.

    BookKeeper bk = ...;
    long ledgerId = ...;

    bk.newDeleteLedgerOp()
        .withLedgerId(ledgerId)
        .execute()
        .get();

Relaxing Durability

By default, BookKeeper acknowledges each write to the client if and only if it has been persisted durably (fsync called on the file system) by a quorum of bookies. In this case the LastAddConfirmed pointer is updated on the writer side, which guarantees to the writer that the data will not be lost and will always be readable by other clients.

On the client side you can temporarily relax this constraint by using the DEFERRED_SYNC write flag. With this flag, bookies acknowledge each entry after writing it to OS buffers, without waiting for an fsync. In this case the LastAddConfirmed pointer is not advanced on the writer side, nor is it updated on the reader’s side, because there is some chance of losing the entry. Such entries are still readable using the readUnconfirmed() API, but they are not readable using long-poll reads or the regular read() API.

To get durability guarantees, the writer must explicitly use the force() API, which returns only after all the bookies in the ensemble have acknowledged the call after performing an fsync to the disk that stores the journal. This advances the LastAddConfirmed pointer on the writer side, and it eventually becomes available to readers.

The close() operation on the writer writes the current LastAddConfirmed pointer to the ledger’s metadata, so it is up to the application to call force() before issuing the close command. If you never explicitly call force(), LastAddConfirmed remains unset (-1) in the ledger metadata and regular readers won’t be able to access the data.

    BookKeeper bk = ...;
    byte[] password = "some-password".getBytes();

    WriteHandle wh = bk.newCreateLedgerOp()
        .withDigestType(DigestType.CRC32)
        .withPassword(password)
        .withEnsembleSize(3)
        .withWriteQuorumSize(3)
        .withAckQuorumSize(2)
        .withWriteFlags(DEFERRED_SYNC)
        .execute()          // execute the creation op
        .get();             // wait for the execution to complete

    wh.force().get();       // wait for fsync, make data available to readers and to the replicator

    wh.close();             // seal the ledger
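
The interaction between DEFERRED_SYNC appends, force() and close() can be summarised with a small in-memory model. DeferredSyncSketch is hypothetical and tracks only two counters: the last entry appended and the LastAddConfirmed pointer. It assumes, as described above, that appends alone never advance LastAddConfirmed and that close() records whatever value the pointer has at that moment.

```java
public class DeferredSyncSketch {
    private long lastAddPushed = -1;    // last entry appended (acknowledged without fsync)
    private long lastAddConfirmed = -1; // -1 means unset

    // Append with deferred sync: acknowledged early, LastAddConfirmed is not advanced.
    public long append() {
        return ++lastAddPushed;
    }

    // force(): after the fsync, everything appended so far becomes confirmed.
    public void force() {
        lastAddConfirmed = lastAddPushed;
    }

    // close(): returns the LastAddConfirmed value recorded in the ledger metadata.
    public long close() {
        return lastAddConfirmed;
    }

    public static void main(String[] args) {
        DeferredSyncSketch withoutForce = new DeferredSyncSketch();
        withoutForce.append();
        withoutForce.append();
        System.out.println(withoutForce.close()); // -1: regular readers see nothing

        DeferredSyncSketch withForce = new DeferredSyncSketch();
        withForce.append();
        withForce.append();
        withForce.force();
        System.out.println(withForce.close());    // 1: both entries are readable
    }
}
```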