Apache HBase External APIs

This chapter covers information about accessing HBase from languages other than Java, and through custom protocols.

REST

Representational State Transfer (REST) was introduced in 2000 in Roy Fielding's doctoral dissertation. (Fielding is one of the principal authors of the HTTP specification.)

REST itself is outside the scope of this document; in general, REST allows client-server interaction through an API that is tied to the URL itself. This section discusses how to configure and run the REST server included with HBase, which exposes HBase tables, rows, cells, and metadata as URL-specified resources.

76.1. Starting and Stopping the REST Server

The included REST server runs inside an embedded Jetty servlet container, into which the REST servlet is deployed. Use one of the following commands to start the REST server in the foreground or background. The port is optional and defaults to 8080.

# Foreground
$ bin/hbase rest start -p <port>

# Background, logging to a file in $HBASE_LOGS_DIR
$ bin/hbase-daemon.sh start rest -p <port>

To stop the REST server, use Ctrl-C if it is running in the foreground, or the following command if it is running in the background:

$ bin/hbase-daemon.sh stop rest
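Once the server is running, you can verify that it responds by querying one of the endpoints described in the next section. This is a minimal sanity check, assuming the REST server is on the local machine and listening on the default port 8080:

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://localhost:8080/version/cluster"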

76.2. Configuring the REST Server and Client

For information about configuring the REST server and client for SSL, as well as doAs impersonation for the REST server, see Configure the Thrift Gateway to Authenticate on Behalf of the Client and other portions of the Securing Apache HBase chapter.
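As a rough, hypothetical illustration, SSL for the REST server is enabled through properties in hbase-site.xml along the following lines. The property names and values below are assumptions included only for sketching purposes; confirm them against the Securing Apache HBase chapter for your HBase version.

<!-- Hypothetical hbase-site.xml fragment; verify property names in the Securing Apache HBase chapter. -->
<property>
  <name>hbase.rest.ssl.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.rest.ssl.keystore.file</name>
  <value>/path/to/keystore.jks</value>
</property>
<property>
  <name>hbase.rest.ssl.keystore.password</name>
  <value>changeit</value>
</property>
<property>
  <name>hbase.rest.ssl.keystore.keypassword</name>
  <value>changeit</value>
</property>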

76.3. Using REST Endpoints

Table 11. Cluster-Wide Endpoints
Endpoint HTTP Verb Description Example

/version/cluster

GET

Version of HBase running on this cluster

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/version/cluster"

/status/cluster

GET

Cluster status

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/status/cluster"

/

GET

List of all non-system tables

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/"

Table 12. Namespace Endpoints
Endpoint HTTP Verb Description Example

/namespaces

GET

List all namespaces

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/"

/namespaces/namespace

GET

Describe a specific namespace

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/special_ns"

/namespaces/namespace

POST

Create a new namespace

curl -vi -X POST \
  -H "Accept: text/xml" \
  "example.com:8000/namespaces/special_ns"

/namespaces/namespace/tables

GET

List all tables in a specific namespace

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/special_ns/tables"

/namespaces/namespace

PUT

Alter an existing namespace. Currently not used.

curl -vi -X PUT \
  -H "Accept: text/xml" \
  "http://example.com:8000/namespaces/special_ns"

/namespaces/namespace

DELETE

Delete a namespace. The namespace must be empty.

curl -vi -X DELETE \
  -H "Accept: text/xml" \
  "example.com:8000/namespaces/special_ns"

Table 13. Table Endpoints
Endpoint HTTP Verb Description Example

/table/schema

GET

Describe the schema of the specified table

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/schema"

/table/schema

POST

Create a new table, or replace an existing table's schema

curl -vi -X POST \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" /></TableSchema>' \
  "http://example.com:8000/users/schema"

/table/schema

PUT

Update an existing table with the provided schema fragment

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<?xml version="1.0" encoding="UTF-8"?><TableSchema name="users"><ColumnSchema name="cf" KEEP_DELETED_CELLS="true" /></TableSchema>' \
  "http://example.com:8000/users/schema"

/table/schema

DELETE

Delete the table. You must use the /table/schema endpoint, not just /table/.

curl -vi -X DELETE \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/schema"

/table/regions

GET

List the regions of the table

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/regions"

Table 14. Endpoints for Get Operations
Endpoint HTTP Verb Description Example

/table/row/column:qualifier/timestamp

GET

Get the value of a single row. Values are Base-64 encoded; see the decoding example following this table.

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1"

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a/1458586888395"

/table/row/column:qualifier

GET

Get the value of a single column. Values are Base-64 encoded.

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a"

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a/"

/table/row/column:qualifier/?v=number_of_versions

GET

Get the specified number of versions of a given cell. Values are Base-64 encoded.

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/row1/cf:a?v=2"

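Because the values returned by these endpoints are Base-64 encoded, you will usually decode them before use. A quick sketch using the base64 command-line utility (also mentioned in the Put examples below; the sample value dmFsdWU1Cg== is taken from those examples):

# Encode a string for use in a request body
echo "row5" | base64              # prints cm93NQo=
# Decode a Base-64 value returned by the REST server
echo "dmFsdWU1Cg==" | base64 -d   # prints value5
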
Table 15. Endpoints for Scan Operations
Endpoint HTTP Verb Description Example

/table/scanner/

PUT

Get a Scanner object. Required by all other Scan operations. Adjust the batch parameter to the number of rows the scan should return in a batch. See the next example for adding filters to your scanner. The scanner endpoint URL is returned as the Location header in the HTTP response. The other examples in this table assume that the scanner endpoint is http://example.com:8000/users/scanner/145869072824375522207.

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<Scanner batch="1"/>' \
  "http://example.com:8000/users/scanner/"

/table/scanner/

PUT

To supply a filter to the Scanner object, or to configure the Scanner in any other way, you can create a text file and add your filter to it. For example, to return only rows whose keys start with u123, using a batch size of 100, the filter file would look like this:

<Scanner batch="100">
  <filter>
    {
      "type": "PrefixFilter",
      "value": "u123"
    }
  </filter>
</Scanner>

Pass the file to the -d argument of the curl request.

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d @filter.txt \
  "http://example.com:8000/users/scanner/"

/table/scanner/scanner-id

GET

Get the next batch from the scanner. Cell values are byte-encoded. If the scanner has been exhausted, HTTP status 204 is returned.

curl -vi -X GET \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/scanner/145869072824375522207"

/table/scanner/scanner-id

DELETE

Deletes the scanner and frees the resources it was using.

curl -vi -X DELETE \
  -H "Accept: text/xml" \
  "http://example.com:8000/users/scanner/145869072824375522207"

Table 16. Endpoints for Put Operations
Endpoint HTTP Verb Description Example

/table/row_key

PUT

Write a row to a table. The row key, column qualifier, and value must each be Base-64 encoded. To encode a string, use the base64 command-line utility; to decode it, use base64 -d. The payload is in the --data argument, and the /users/fakerow value is a placeholder. Insert multiple rows by adding them to the <CellSet> element. You can also save the data to be inserted to a file and pass it to the -d parameter with syntax like -d @filename.txt.

curl -vi -X PUT \
  -H "Accept: text/xml" \
  -H "Content-Type: text/xml" \
  -d '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><CellSet><Row key="cm93NQo="><Cell column="Y2Y6ZQo=">dmFsdWU1Cg==</Cell></Row></CellSet>' \
  "http://example.com:8000/users/fakerow"

curl -vi -X PUT \
  -H "Accept: text/json" \
  -H "Content-Type: text/json" \
  -d '{"Row":[{"key":"cm93NQo=", "Cell": [{"column":"Y2Y6ZQo=", "$":"dmFsdWU1Cg=="}]}]}' \
  "example.com:8000/users/fakerow"
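If you prefer to drive the REST gateway from Java rather than curl, HBase ships a remote client in the org.apache.hadoop.hbase.rest.client package. The following is a minimal sketch, assuming the gateway from the examples above is reachable at example.com:8000 and that the users table with column family cf exists; exact method names (for example Put.addColumn) can vary between HBase versions:

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.rest.client.Client;
import org.apache.hadoop.hbase.rest.client.Cluster;
import org.apache.hadoop.hbase.rest.client.RemoteHTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RestClientSketch {
  public static void main(String[] args) throws Exception {
    // Point the client at the REST gateway (host and port are assumptions for this sketch).
    Cluster cluster = new Cluster();
    cluster.add("example.com", 8000);
    Client client = new Client(cluster);

    // RemoteHTable talks to the gateway instead of opening a direct HBase connection.
    RemoteHTable table = new RemoteHTable(client, "users");
    try {
      // Write a cell, equivalent to the PUT /users/fakerow example above.
      Put put = new Put(Bytes.toBytes("row5"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("e"), Bytes.toBytes("value5"));
      table.put(put);

      // Read it back, equivalent to GET /users/row5/cf:e.
      Get get = new Get(Bytes.toBytes("row5"));
      get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("e"));
      Result result = table.get(get);
      System.out.println(Bytes.toString(
          result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("e"))));
    } finally {
      table.close();
    }
  }
}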

76.4. REST XML Schema

<schema xmlns="http://www.w3.org/2001/XMLSchema" xmlns:tns="RESTSchema">
  <element name="Version" type="tns:Version"></element>
  <complexType name="Version">
    <attribute name="REST" type="string"></attribute>
    <attribute name="JVM" type="string"></attribute>
    <attribute name="OS" type="string"></attribute>
    <attribute name="Server" type="string"></attribute>
    <attribute name="Jersey" type="string"></attribute>
  </complexType>
  <element name="TableList" type="tns:TableList"></element>
  <complexType name="TableList">
    <sequence>
      <element name="table" type="tns:Table" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
  </complexType>
  <complexType name="Table">
    <sequence>
      <element name="name" type="string"></element>
    </sequence>
  </complexType>
  <element name="TableInfo" type="tns:TableInfo"></element>
  <complexType name="TableInfo">
    <sequence>
      <element name="region" type="tns:TableRegion" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
    <attribute name="name" type="string"></attribute>
  </complexType>
  <complexType name="TableRegion">
    <attribute name="name" type="string"></attribute>
    <attribute name="id" type="int"></attribute>
    <attribute name="startKey" type="base64Binary"></attribute>
    <attribute name="endKey" type="base64Binary"></attribute>
    <attribute name="location" type="string"></attribute>
  </complexType>
  <element name="TableSchema" type="tns:TableSchema"></element>
  <complexType name="TableSchema">
    <sequence>
      <element name="column" type="tns:ColumnSchema" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
    <attribute name="name" type="string"></attribute>
    <anyAttribute></anyAttribute>
  </complexType>
  <complexType name="ColumnSchema">
    <attribute name="name" type="string"></attribute>
    <anyAttribute></anyAttribute>
  </complexType>
  <element name="CellSet" type="tns:CellSet"></element>
  <complexType name="CellSet">
    <sequence>
      <element name="row" type="tns:Row" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
  </complexType>
  <element name="Row" type="tns:Row"></element>
  <complexType name="Row">
    <sequence>
      <element name="key" type="base64Binary"></element>
      <element name="cell" type="tns:Cell" maxOccurs="unbounded" minOccurs="1"></element>
    </sequence>
  </complexType>
  <element name="Cell" type="tns:Cell"></element>
  <complexType name="Cell">
    <sequence>
      <element name="value" maxOccurs="1" minOccurs="1">
        <simpleType>
          <restriction base="base64Binary"></restriction>
        </simpleType>
      </element>
    </sequence>
    <attribute name="column" type="base64Binary" />
    <attribute name="timestamp" type="int" />
  </complexType>
  <element name="Scanner" type="tns:Scanner"></element>
  <complexType name="Scanner">
    <sequence>
      <element name="column" type="base64Binary" minOccurs="0" maxOccurs="unbounded"></element>
    </sequence>
    <sequence>
      <element name="filter" type="string" minOccurs="0" maxOccurs="1"></element>
    </sequence>
    <attribute name="startRow" type="base64Binary"></attribute>
    <attribute name="endRow" type="base64Binary"></attribute>
    <attribute name="batch" type="int"></attribute>
    <attribute name="startTime" type="int"></attribute>
    <attribute name="endTime" type="int"></attribute>
  </complexType>
  <element name="StorageClusterVersion" type="tns:StorageClusterVersion" />
  <complexType name="StorageClusterVersion">
    <attribute name="version" type="string"></attribute>
  </complexType>
  <element name="StorageClusterStatus" type="tns:StorageClusterStatus">
  </element>
  <complexType name="StorageClusterStatus">
    <sequence>
      <element name="liveNode" type="tns:Node" maxOccurs="unbounded" minOccurs="0">
      </element>
      <element name="deadNode" type="string" maxOccurs="unbounded" minOccurs="0">
      </element>
    </sequence>
    <attribute name="regions" type="int"></attribute>
    <attribute name="requests" type="int"></attribute>
    <attribute name="averageLoad" type="float"></attribute>
  </complexType>
  <complexType name="Node">
    <sequence>
      <element name="region" type="tns:Region" maxOccurs="unbounded" minOccurs="0">
      </element>
    </sequence>
    <attribute name="name" type="string"></attribute>
    <attribute name="startCode" type="int"></attribute>
    <attribute name="requests" type="int"></attribute>
    <attribute name="heapSizeMB" type="int"></attribute>
    <attribute name="maxHeapSizeMB" type="int"></attribute>
  </complexType>
  <complexType name="Region">
    <attribute name="name" type="base64Binary"></attribute>
    <attribute name="stores" type="int"></attribute>
    <attribute name="storefiles" type="int"></attribute>
    <attribute name="storefileSizeMB" type="int"></attribute>
    <attribute name="memstoreSizeMB" type="int"></attribute>
    <attribute name="storefileIndexSizeMB" type="int"></attribute>
  </complexType>
</schema>
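To make the schema concrete, here is a hypothetical Scanner document that asks for rows between two Base-64-encoded keys (u123 and u124, echoing the filter example earlier) with a batch size of 100. It is only an illustration of how the attributes and elements defined above fit together:

<Scanner batch="100" startRow="dTEyMw==" endRow="dTEyNA==">
  <!-- column names are Base-64 encoded; Y2Y6YQ== is "cf:a" -->
  <column>Y2Y6YQ==</column>
</Scanner>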

76.5. REST Protobufs Schema

message Version {
  optional string restVersion = 1;
  optional string jvmVersion = 2;
  optional string osVersion = 3;
  optional string serverVersion = 4;
  optional string jerseyVersion = 5;
}

message StorageClusterStatus {
  message Region {
    required bytes name = 1;
    optional int32 stores = 2;
    optional int32 storefiles = 3;
    optional int32 storefileSizeMB = 4;
    optional int32 memstoreSizeMB = 5;
    optional int32 storefileIndexSizeMB = 6;
  }
  message Node {
    required string name = 1;    // name:port
    optional int64 startCode = 2;
    optional int32 requests = 3;
    optional int32 heapSizeMB = 4;
    optional int32 maxHeapSizeMB = 5;
    repeated Region regions = 6;
  }
  // node status
  repeated Node liveNodes = 1;
  repeated string deadNodes = 2;
  // summary statistics
  optional int32 regions = 3;
  optional int32 requests = 4;
  optional double averageLoad = 5;
}

message TableList {
  repeated string name = 1;
}

message TableInfo {
  required string name = 1;
  message Region {
    required string name = 1;
    optional bytes startKey = 2;
    optional bytes endKey = 3;
    optional int64 id = 4;
    optional string location = 5;
  }
  repeated Region regions = 2;
}

message TableSchema {
  optional string name = 1;
  message Attribute {
    required string name = 1;
    required string value = 2;
  }
  repeated Attribute attrs = 2;
  repeated ColumnSchema columns = 3;
  // optional helpful encodings of commonly used attributes
  optional bool inMemory = 4;
  optional bool readOnly = 5;
}

message ColumnSchema {
  optional string name = 1;
  message Attribute {
    required string name = 1;
    required string value = 2;
  }
  repeated Attribute attrs = 2;
  // optional helpful encodings of commonly used attributes
  optional int32 ttl = 3;
  optional int32 maxVersions = 4;
  optional string compression = 5;
}

message Cell {
  optional bytes row = 1;    // unused if Cell is in a CellSet
  optional bytes column = 2;
  optional int64 timestamp = 3;
  optional bytes data = 4;
}

message CellSet {
  message Row {
    required bytes key = 1;
    repeated Cell values = 2;
  }
  repeated Row rows = 1;
}

message Scanner {
  optional bytes startRow = 1;
  optional bytes endRow = 2;
  repeated bytes columns = 3;
  optional int32 batch = 4;
  optional int64 startTime = 5;
  optional int64 endTime = 6;
}
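
These messages describe the payloads exchanged when a client asks the REST server for Protobuf encoding rather than XML or JSON. Assuming the gateway honors the application/x-protobuf media type for the endpoints listed earlier, such a request would look like this:

curl -vi -X GET \
  -H "Accept: application/x-protobuf" \
  "http://example.com:8000/version/cluster"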

77. Thrift

Documentation about Thrift has moved to Thrift API and Filter Language.

78. C/C++ Apache HBase Client

FB’s Chip Turner wrote a pure C/C++ client. Check it out.

79. Using Java Data Objects (JDO) with HBase

Example 41. JDO Example

This example uses JDO to create a table and an index, insert a row into a table, get a row, get a column value, perform a query, and do some additional HBase operations.

package com.apache.hadoop.hbase.client.jdo.examples;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Hashtable;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.tableindexed.IndexedTable;

import com.apache.hadoop.hbase.client.jdo.AbstractHBaseDBO;
import com.apache.hadoop.hbase.client.jdo.HBaseBigFile;
import com.apache.hadoop.hbase.client.jdo.HBaseDBOImpl;
import com.apache.hadoop.hbase.client.jdo.query.DeleteQuery;
import com.apache.hadoop.hbase.client.jdo.query.HBaseOrder;
import com.apache.hadoop.hbase.client.jdo.query.HBaseParam;
import com.apache.hadoop.hbase.client.jdo.query.InsertQuery;
import com.apache.hadoop.hbase.client.jdo.query.QSearch;
import com.apache.hadoop.hbase.client.jdo.query.SelectQuery;
import com.apache.hadoop.hbase.client.jdo.query.UpdateQuery;

/*
 * HBase JDO Example.
 *
 * Dependency libraries:
 * - commons-beanutils.jar
 * - commons-pool-1.5.5.jar
 * - hbase0.90.0-transactionl.jar
 *
 * You can extend the Delete, Select, Update, and Insert Query classes.
 */
public class HBaseExample {
  public static void main(String[] args) throws Exception {
    AbstractHBaseDBO dbo = new HBaseDBOImpl();

    // Drop the table if it already exists.
    if (dbo.isTableExist("user")) {
      dbo.deleteTable("user");
    }

    // Create the table.
    dbo.createTableIfNotExist("user", HBaseOrder.DESC, "account");
    //dbo.createTableIfNotExist("user", HBaseOrder.ASC, "account");

    // Create an index.
    String[] cols = {"id", "name"};
    dbo.addIndexExistingTable("user", "account", cols);

    // Insert a row.
    InsertQuery insert = dbo.createInsertQuery("user");
    UserBean bean = new UserBean();
    bean.setFamily("account");
    bean.setAge(20);
    bean.setEmail("ncanis@gmail.com");
    bean.setId("ncanis");
    bean.setName("ncanis");
    bean.setPassword("1111");
    insert.insert(bean);

    // Select a single row.
    SelectQuery select = dbo.createSelectQuery("user");
    UserBean resultBean = (UserBean) select.select(bean.getRow(), UserBean.class);

    // Select a single column value.
    String value = (String) select.selectColumn(bean.getRow(), "account", "id", String.class);

    // Search with options (QSearch has EQUAL, NOT_EQUAL, LIKE).
    // select id,password,name,email from account where id='ncanis' limit startRow,20
    HBaseParam param = new HBaseParam();
    param.setPage(bean.getRow(), 20);
    param.addColumn("id", "password", "name", "email");
    param.addSearchOption("id", "ncanis", QSearch.EQUAL);
    select.search("account", param, UserBean.class);

    // Check whether a column value exists.
    boolean isExist = select.existColumnValue("account", "id", "ncanis".getBytes());

    // Update the password.
    UpdateQuery update = dbo.createUpdateQuery("user");
    Hashtable<String, byte[]> colsTable = new Hashtable<String, byte[]>();
    colsTable.put("password", "2222".getBytes());
    update.update(bean.getRow(), "account", colsTable);

    // Delete the row.
    DeleteQuery delete = dbo.createDeleteQuery("user");
    delete.deleteRow(resultBean.getRow());

    ////////////////////////////////////
    // etc.

    // HTable pool with Apache Commons Pool:
    // borrow and release. HBasePoolManager(maxActive, minIdle etc..)
    IndexedTable table = dbo.getPool().borrow("user");
    dbo.getPool().release(table);

    // Upload a big file directly through Hadoop.
    HBaseBigFile bigFile = new HBaseBigFile();
    File file = new File("doc/movie.avi");
    FileInputStream fis = new FileInputStream(file);
    Path rootPath = new Path("/files/");
    String filename = "movie.avi";
    bigFile.uploadFile(rootPath, filename, fis, true);

    // Receive a file stream from Hadoop.
    Path p = new Path(rootPath, filename);
    InputStream is = bigFile.path2Stream(p, 4096);
  }
}
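
The example above refers to a UserBean class that is not shown. Purely as a hypothetical sketch of what such a bean might look like (the real class in the hbase-jdo sample project may differ), it is an ordinary POJO carrying the column family name, a row key, and the columns used above:

// Hypothetical bean for the JDO example above; field names mirror the setters called in main().
public class UserBean {
  private String family;   // column family, e.g. "account"
  private String id;       // also used as the row key in this sketch
  private String name;
  private String email;
  private String password;
  private int age;

  public void setFamily(String family) { this.family = family; }
  public String getFamily() { return family; }
  public void setId(String id) { this.id = id; }
  public String getId() { return id; }
  public void setName(String name) { this.name = name; }
  public String getName() { return name; }
  public void setEmail(String email) { this.email = email; }
  public String getEmail() { return email; }
  public void setPassword(String password) { this.password = password; }
  public String getPassword() { return password; }
  public void setAge(int age) { this.age = age; }
  public int getAge() { return age; }

  // The example calls getRow(); here we assume the row key is derived from the id.
  public byte[] getRow() { return id == null ? null : id.getBytes(); }
}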