The etcd v3 API is designed to give users a cleaner and more efficient abstraction than etcd v2. The new API makes a number of semantic and protocol changes. For an overview, see Xiang Li’s video.

To prove out the design of the v3 API, the team has also built a number of example recipes; there is a video discussing these recipes too.

Design

  1. Flatten binary key-value space

  2. Keep the event history until compaction

    • Access to old versions of keys
    • User-controlled history compaction
  3. Support range query

    • Pagination support with limit argument
    • Support consistency guarantee across multiple range queries
  4. Replace TTL key with Lease

    • More efficient, low-cost keep-alive
    • A logical group of TTL keys
  5. Replace CAS/CAD with multi-object Txn

    • Much more powerful and flexible
  6. Support efficient watching with multiple ranges

  7. RPC API supports the complete set of APIs

    • More efficient than JSON/HTTP
    • Additional txn/lease support
  8. HTTP API supports a subset of APIs

    • Easy for people to try out etcd
    • Easy for people to write simple etcd applications
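
Item 4 above (replacing TTL keys with leases) can be illustrated with a toy in-memory model. This is only a sketch: the `Lease` and `LeaseStore` names and the explicit `now` argument (used instead of a real clock) are assumptions made for illustration, not the etcd implementation or its client API. It shows why a lease is cheaper than per-key TTLs: a single keep-alive refreshes every key attached to the lease, and the keys expire together.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    """Hypothetical lease: one TTL shared by a group of keys."""
    ttl: float
    expires_at: float
    keys: set = field(default_factory=set)


class LeaseStore:
    """Toy key-value store where keys may be attached to a lease."""

    def __init__(self):
        self.kv = {}
        self.leases = {}
        self._next_id = 1

    def grant(self, ttl, now):
        lease_id = self._next_id
        self._next_id += 1
        self.leases[lease_id] = Lease(ttl=ttl, expires_at=now + ttl)
        return lease_id

    def put(self, key, value, lease_id=None):
        self.kv[key] = value
        if lease_id is not None:
            self.leases[lease_id].keys.add(key)

    def keep_alive(self, lease_id, now):
        # A single call extends the lifetime of every attached key.
        lease = self.leases[lease_id]
        lease.expires_at = now + lease.ttl

    def expire(self, now):
        # Drop expired leases together with all of their keys.
        expired = [i for i, l in self.leases.items() if l.expires_at <= now]
        for lease_id in expired:
            for key in self.leases.pop(lease_id).keys:
                self.kv.pop(key, None)


store = LeaseStore()
lease = store.grant(ttl=10, now=0)
store.put("svc/a", "10.0.0.1", lease)
store.put("svc/b", "10.0.0.2", lease)
store.put("config", "static")      # no lease, never expires

store.keep_alive(lease, now=8)     # one refresh covers both svc keys
store.expire(now=15)               # lease now runs until 18, keys survive
alive_at_15 = sorted(store.kv)

store.expire(now=18)               # lease expired, both keys removed together
alive_at_18 = sorted(store.kv)
```

With per-key TTLs the client would have to refresh `svc/a` and `svc/b` separately; here the cost of keeping a group alive does not grow with the number of keys.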

Notes

Request Size Limitation

The maximum request size is around 1MB. Since etcd replicates requests in a streaming fashion, a very large request might block other requests for a long time. etcd is meant to store small configuration values, so we prevent users from submitting large requests. The limit also applies to Txn requests. We might raise the limit slightly in the future or make it configurable.
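
A client can guard against oversized requests before sending them. The sketch below is illustrative only: `check_request_size` and `RequestTooLarge` are hypothetical names, "around 1MB" is assumed to mean exactly 1 MiB, and the size is approximated as the sum of key and value lengths rather than the size of the encoded request.

```python
MAX_REQUEST_BYTES = 1024 * 1024  # assumption: "around 1MB" taken as exactly 1 MiB


class RequestTooLarge(ValueError):
    """Raised when a request payload exceeds the assumed limit."""


def check_request_size(puts, limit=MAX_REQUEST_BYTES):
    """Return the approximate payload size of a list of (key, value) byte
    pairs, raising RequestTooLarge when it exceeds the limit."""
    size = sum(len(k) + len(v) for k, v in puts)
    if size > limit:
        raise RequestTooLarge(f"request is {size} bytes, limit is {limit}")
    return size


small = check_request_size([(b"foo", b"bar")])

# A Txn carrying several puts is checked as a whole, not per operation.
big_txn = [(b"key%d" % i, b"x" * 300_000) for i in range(4)]
try:
    check_request_size(big_txn)
    rejected = False
except RequestTooLarge:
    rejected = True
```

Each put in `big_txn` is well under the limit on its own, but the transaction as a whole is not, which is the case the note about Txn requests describes.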

Protobuf Defined API

api protobuf

kv protobuf

Examples

Put a key (foo=bar)

    // A put is always successful
    Put( PutRequest { key = foo, value = bar } )

    PutResponse {
        cluster_id = 0x1000,
        member_id = 0x1,
        revision = 1,
        raft_term = 0x1,
    }

Get a key (assume we have foo=bar)

    Get( RangeRequest { key = foo } )

    RangeResponse {
        cluster_id = 0x1000,
        member_id = 0x1,
        revision = 1,
        raft_term = 0x1,
        kvs = {
            {
                key = foo,
                value = bar,
                create_revision = 1,
                mod_revision = 1,
                version = 1,
            },
        },
    }
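
The per-key fields in the response (`create_revision`, `mod_revision`, `version`) and the store-wide `revision` can be modeled with a small in-memory store. This is a sketch under stated assumptions, not etcd's storage engine: `ToyStore` and its methods are hypothetical, deletes are left out, and compaction is simplified to keeping, per key, the newest version at or before the compaction revision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyValue:
    key: str
    value: str
    create_revision: int   # store revision when the key was created
    mod_revision: int      # store revision of the last modification
    version: int           # number of writes to this key so far


class ToyStore:
    def __init__(self):
        self.revision = 0      # store-wide counter, bumped on every write
        self.compacted = 0
        self.history = {}      # key -> list of KeyValue, oldest first

    def put(self, key, value):
        self.revision += 1
        versions = self.history.setdefault(key, [])
        if versions:
            prev = versions[-1]
            kv = KeyValue(key, value, prev.create_revision,
                          self.revision, prev.version + 1)
        else:
            kv = KeyValue(key, value, self.revision, self.revision, 1)
        versions.append(kv)
        return self.revision

    def get(self, key, revision=None):
        rev = self.revision if revision is None else revision
        if rev < self.compacted:
            raise ValueError("requested revision has been compacted")
        visible = [kv for kv in self.history.get(key, [])
                   if kv.mod_revision <= rev]
        return visible[-1] if visible else None

    def compact(self, revision):
        # Discard versions that are no longer visible at or after `revision`.
        self.compacted = revision
        for key, versions in self.history.items():
            old = [kv for kv in versions if kv.mod_revision <= revision]
            new = [kv for kv in versions if kv.mod_revision > revision]
            self.history[key] = old[-1:] + new


store = ToyStore()
store.put("foo", "bar")      # revision 1
store.put("other", "x")      # revision 2
store.put("foo", "baz")      # revision 3

latest = store.get("foo")            # newest version of foo
old = store.get("foo", revision=1)   # foo as it was at revision 1

store.compact(3)
try:
    store.get("foo", revision=1)
    compacted_read_failed = False
except ValueError:
    compacted_read_failed = True
```

After the second write to `foo`, its `create_revision` stays at 1 while `mod_revision` follows the store revision and `version` counts the writes to that key alone.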

Range over a key space (assume we have foo0=bar0… foo100=bar100)

    Range( RangeRequest { key = foo, end_key = foo80, limit = 30 } )

    RangeResponse {
        cluster_id = 0x1000,
        member_id = 0x1,
        revision = 100,
        raft_term = 0x1,
        kvs = {
            {
                key = foo0,
                value = bar0,
                create_revision = 1,
                mod_revision = 1,
                version = 1,
            },
            ...,
            {
                key = foo30,
                value = bar30,
                create_revision = 30,
                mod_revision = 30,
                version = 1,
            },
        },
    }
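
Pagination with `limit` can be sketched over a sorted key list. The helper names `range_query` and `paginate` are hypothetical, the half-open interval `[key, end_key)` and the "resume from the last key plus a zero byte" step are assumptions of this sketch, and keys are compared byte-wise. Under that ordering `foo10` sorts before `foo2`, so the 30th key of the first page is `foo34` rather than `foo30`; the response above is best read as schematic.

```python
from bisect import bisect_left


def range_query(sorted_keys, key, end_key, limit=0):
    """Return (page, more) for keys in the half-open interval [key, end_key)."""
    lo = bisect_left(sorted_keys, key)
    hi = bisect_left(sorted_keys, end_key)
    matched = sorted_keys[lo:hi]
    if limit and len(matched) > limit:
        return matched[:limit], True
    return matched, False


def paginate(sorted_keys, key, end_key, limit):
    """Yield successive pages, resuming just after the last key returned."""
    start = key
    while True:
        page, more = range_query(sorted_keys, start, end_key, limit)
        yield page
        if not more:
            return
        # The smallest key strictly greater than page[-1] in byte order.
        start = page[-1] + b"\x00"


keys = sorted(b"foo%d" % i for i in range(101))   # foo0 ... foo100

pages = list(paginate(keys, b"foo", b"foo80", limit=30))
first_page = pages[0]
```

A real client that needs a consistent view across pages would also have to read every page at the same store revision; this toy model has no revisions, so that part is not shown.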

Finish a txn (assume we have foo0=bar0, foo1=bar1)

    Txn( TxnRequest {
        // mod_revision of foo0 is equal to 1, mod_revision of foo1 is greater than 1
        compare = {
            {compareType = equal, key = foo0, mod_revision = 1},
            {compareType = greater, key = foo1, mod_revision = 1},
        },
        // if the comparison succeeds, put foo2=success
        success = {PutRequest { key = foo2, value = success }},
        // if the comparison fails, put foo2=failure
        failure = {PutRequest { key = foo2, value = failure }},
    } )

    TxnResponse {
        cluster_id = 0x1000,
        member_id = 0x1,
        revision = 3,
        raft_term = 0x1,
        succeeded = true,
        responses = {
            // response of PUT foo2=success
            {
                cluster_id = 0x1000,
                member_id = 0x1,
                revision = 3,
                raft_term = 0x1,
            }
        }
    }
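
The compare-then-branch behavior of a Txn can be modeled in a few lines. This is a toy sketch, not the etcd implementation: `ToyTxnStore` is a hypothetical name, only `mod_revision` comparisons and puts are supported, a missing key is assumed to compare as `mod_revision` 0, and all puts in one transaction are assumed to share a single new revision.

```python
import operator

COMPARE_OPS = {
    "equal": operator.eq,
    "greater": operator.gt,
    "less": operator.lt,
}


class ToyTxnStore:
    def __init__(self):
        self.revision = 0
        self.data = {}   # key -> (value, mod_revision)

    def put(self, key, value):
        self.revision += 1
        self.data[key] = (value, self.revision)
        return self.revision

    def mod_revision(self, key):
        # Assumption: a key that does not exist compares as mod_revision 0.
        return self.data[key][1] if key in self.data else 0

    def txn(self, compare, success, failure):
        """compare: list of (op, key, target_mod_revision);
        success/failure: lists of (key, value) puts."""
        succeeded = all(
            COMPARE_OPS[op](self.mod_revision(key), target)
            for op, key, target in compare
        )
        ops = success if succeeded else failure
        if ops:
            self.revision += 1   # every put in the txn gets the same revision
            for key, value in ops:
                self.data[key] = (value, self.revision)
        return succeeded, self.revision


store = ToyTxnStore()
store.put("foo0", "bar0")   # revision 1
store.put("foo1", "bar1")   # revision 2

succeeded, revision = store.txn(
    compare=[("equal", "foo0", 1), ("greater", "foo1", 1)],
    success=[("foo2", "success")],
    failure=[("foo2", "failure")],
)
```

Starting from the state assumed in the example (`foo0` written at revision 1, `foo1` at revision 2), both comparisons hold, so the success branch runs and the store moves to revision 3, matching the response shown above.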

Watch on a key/range

    // this can be a watch request stream
    Watch( WatchRequest {
        key = foo,
        end_key = fop, // prefix foo
        start_revision = 20,
        end_revision = 10000,
        // notification frequency is decided by the server
        progress_notification = true,
    } )

    // put (foo0=bar0) event at 3
    WatchResponse {
        cluster_id = 0x1000,
        member_id = 0x1,
        revision = 3,
        raft_term = 0x1,
        event_type = put,
        kv = {
            key = foo0,
            value = bar0,
            create_revision = 1,
            mod_revision = 1,
            version = 1,
        },
    }

    // a notification at 2000
    WatchResponse {
        cluster_id = 0x1000,
        member_id = 0x1,
        revision = 2000,
        raft_term = 0x1,
        // nil event as notification
    }

    // put (foo0=bar3000) event at 3000
    WatchResponse {
        cluster_id = 0x1000,
        member_id = 0x1,
        revision = 3000,
        raft_term = 0x1,
        event_type = put,
        kv = {
            key = foo0,
            value = bar3000,
            create_revision = 1,
            mod_revision = 3000,
            version = 2,
        },
    }
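
The request above watches every key with the prefix `foo` by setting `end_key = fop`. A minimal sketch of how such an end key can be derived from a prefix follows. The function names are hypothetical, and returning `b"\x00"` for a prefix made only of `0xFF` bytes is an assumed sentinel for "no upper bound".

```python
def prefix_range_end(prefix: bytes) -> bytes:
    """Return the smallest key greater than every key starting with `prefix`."""
    end = bytearray(prefix)
    # Increment the last byte that is not 0xFF and drop everything after it.
    for i in reversed(range(len(end))):
        if end[i] < 0xFF:
            end[i] += 1
            return bytes(end[: i + 1])
    return b"\x00"  # assumption: sentinel meaning "no upper bound"


def in_range(key: bytes, start: bytes, end: bytes) -> bool:
    """Check membership in the half-open interval [start, end)."""
    return start <= key < end


end_key = prefix_range_end(b"foo")   # b"fop"

matches = [k for k in (b"fon", b"foo", b"foo0", b"foo\xff", b"fop")
           if in_range(k, b"foo", end_key)]
```

Only `foo`, `foo0`, and `foo\xff` fall inside the range, so events for `fon` or `fop` would not be delivered to this watcher.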