5.21. Thrift Connector

The Thrift connector makes it possible to integrate with external storage systems without a custom Presto connector implementation.

In order to use the Thrift connector with an external system, you need to implement the PrestoThriftService interface, found below. Next, you configure the Thrift connector to point to a set of machines, called Thrift servers, that implement the interface. As part of the interface implementation, the Thrift servers provide metadata, splits and data. For metadata calls, the connector randomly chooses one of the available servers. Data calls are routed the same way unless a split includes a list of host addresses, in which case the request goes to one of those hosts. All requests are assumed to be idempotent and can be retried freely against any server.
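The server-selection rule described above can be sketched in a few lines of Python. This is an illustration, not the connector's implementation. HostAddress mirrors PrestoThriftHostAddress from the IDL. Split, choose_server and call_with_failover are hypothetical names, and picking randomly among a split's hosts is an assumption.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class HostAddress:
    """Mirrors PrestoThriftHostAddress from the IDL: a host name and a port."""
    host: str
    port: int


@dataclass
class Split:
    """Hypothetical stand-in for PrestoThriftSplit: an opaque id plus optional hosts."""
    split_id: bytes
    hosts: List[HostAddress] = field(default_factory=list)


def choose_server(servers: List[HostAddress],
                  split: Optional[Split] = None,
                  rng: Optional[random.Random] = None) -> HostAddress:
    """Metadata calls (split is None) and data calls for splits without hosts
    go to a random configured server; otherwise to a random host of the split."""
    rng = rng or random.Random()
    candidates = split.hosts if split is not None and split.hosts else servers
    return rng.choice(candidates)


def call_with_failover(request: Callable[[HostAddress], T],
                       servers: List[HostAddress],
                       split: Optional[Split] = None,
                       attempts: int = 3,
                       rng: Optional[random.Random] = None) -> T:
    """Requests are idempotent, so a failed call is simply reissued against a
    server chosen the same way (possibly a different one)."""
    rng = rng or random.Random()
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        server = choose_server(servers, split, rng)
        try:
            return request(server)
        except ConnectionError as error:
            last_error = error
    raise last_error or RuntimeError("no attempts were made")
```

Because every call is idempotent, the failover helper does not need to track which server handled a previous attempt.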

Configuration

To configure the Thrift connector, create a catalog properties file etc/catalog/thrift.properties with the following content, replacing the properties as appropriate:

  connector.name=presto-thrift
  presto.thrift.client.addresses=host:port,host:port

Multiple Thrift Systems

You can have as many catalogs as you need, so if you have additional Thrift systems to connect to, simply add another properties file to etc/catalog with a different name (making sure it ends in .properties).

Configuration Properties

The following configuration properties are available:

Property Name                              Description
presto.thrift.client.addresses             Locations of Thrift servers
presto-thrift.max-response-size            Maximum size of data returned from a Thrift server
presto-thrift.metadata-refresh-threads     Number of refresh threads for the metadata cache
presto.thrift.client.max-retries           Maximum number of retries for failed Thrift requests
presto.thrift.client.max-backoff-delay     Maximum interval between retry attempts
presto.thrift.client.min-backoff-delay     Minimum interval between retry attempts
presto.thrift.client.max-retry-time        Maximum duration across all attempts of a Thrift request
presto.thrift.client.backoff-scale-factor  Scale factor for exponential backoff
presto.thrift.client.connect-timeout       Connect timeout
presto.thrift.client.request-timeout       Request timeout
presto.thrift.client.socks-proxy           SOCKS proxy address
presto.thrift.client.max-frame-size        Maximum size of a raw Thrift response
presto.thrift.client.transport             Thrift transport type (UNFRAMED, FRAMED, HEADER)
presto.thrift.client.protocol              Thrift protocol type (BINARY, COMPACT, FB_COMPACT)
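The table lists the retry properties but does not spell out how they interact. The following Python sketch assumes a conventional exponential backoff. Under that assumption, the delay before retry n is min-backoff-delay multiplied by backoff-scale-factor raised to the power n, capped at max-backoff-delay. Retries stop after max-retries attempts, or once the accumulated delay would exceed max-retry-time. The function name and the use of plain seconds are hypothetical. Unlike the real max-retry-time, the sketch counts only waiting time, not time spent in the requests themselves.

```python
from typing import List


def backoff_delays(max_retries: int,
                   min_backoff_delay: float,
                   max_backoff_delay: float,
                   backoff_scale_factor: float,
                   max_retry_time: float) -> List[float]:
    """Return the wait (in seconds) before each retry under an ASSUMED
    exponential-backoff policy; not taken from the connector's source."""
    delays: List[float] = []
    elapsed = 0.0
    for attempt in range(max_retries):
        # Grow the delay geometrically, but never beyond the configured cap.
        delay = min(min_backoff_delay * backoff_scale_factor ** attempt,
                    max_backoff_delay)
        # Give up once the total waiting time would exceed the overall budget.
        if elapsed + delay > max_retry_time:
            break
        delays.append(delay)
        elapsed += delay
    return delays
```

For example, `backoff_delays(5, 1.0, 4.0, 2.0, 60.0)` returns `[1.0, 2.0, 4.0, 4.0, 4.0]`. The delay doubles until it reaches the 4-second cap.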

presto.thrift.client.addresses

Comma-separated list of Thrift servers in the form host:port. For example:

  presto.thrift.client.addresses=192.0.2.3:7777,192.0.2.4:7779

This property is required; there is no default.
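If you generate catalog files from a script, a check like the following Python sketch can catch malformed entries before the connector sees them. parse_addresses is a hypothetical helper, not part of Presto, and it does not handle bracketed IPv6 literals.

```python
from typing import List, Tuple


def parse_addresses(value: str) -> List[Tuple[str, int]]:
    """Split a host:port list such as '192.0.2.3:7777,192.0.2.4:7779'
    into (host, port) tuples, rejecting malformed entries."""
    if not value.strip():
        raise ValueError("presto.thrift.client.addresses is required")
    result = []
    for entry in value.split(","):
        entry = entry.strip()
        # rpartition splits on the last ':' so the port is always the tail.
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        port_number = int(port)
        if not 0 < port_number < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        result.append((host, port_number))
    return result
```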

presto-thrift.max-response-size

Maximum size of a data response that the connector accepts. This value is sent by the connector to the Thrift server when requesting data, allowing it to size the response appropriately.

This property is optional; the default is 16MB.
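On the server side, this limit arrives as the maxBytes argument of prestoGetRows in the IDL below. The Python sketch below shows one way a server might honor it. It assumes a single bigint column, an estimated 8 bytes per value, and a nextToken that simply encodes the next row offset. PageResult, get_rows and read_all are hypothetical simplifications of PrestoThriftPageResult and of the calling side. read_all keeps calling with the returned token until none comes back.

```python
from dataclasses import dataclass
from typing import List, Optional

BYTES_PER_VALUE = 8  # assumption: estimated size of one i64 value


@dataclass
class PageResult:
    """Simplified stand-in for PrestoThriftPageResult with one bigint column."""
    values: List[int]
    next_token: Optional[bytes]  # None means there is no more data


def get_rows(data: List[int], max_bytes: int,
             next_token: Optional[bytes]) -> PageResult:
    """Return the next batch of rows whose estimated size fits in max_bytes."""
    start = int(next_token.decode()) if next_token else 0
    # Always return at least one row so the caller makes progress.
    count = max(1, max_bytes // BYTES_PER_VALUE)
    end = min(start + count, len(data))
    token = str(end).encode() if end < len(data) else None
    return PageResult(values=data[start:end], next_token=token)


def read_all(data: List[int], max_bytes: int) -> List[int]:
    """Calling side: pass each returned token back until it is None."""
    out: List[int] = []
    token: Optional[bytes] = None
    while True:
        page = get_rows(data, max_bytes, token)
        out.extend(page.values)
        token = page.next_token
        if token is None:
            return out
```

Returning at least one row, even when maxBytes is tiny, guarantees progress. A production server would more likely measure the actual encoded size of each column block than rely on a fixed estimate.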

presto-thrift.metadata-refresh-threads

Number of threads used to refresh the metadata cache.

This property is optional; the default is 1.

Thrift IDL File

The following IDL describes the PrestoThriftService that must be implemented:

  enum PrestoThriftBound {
    BELOW = 1;
    EXACTLY = 2;
    ABOVE = 3;
  }

  exception PrestoThriftServiceException {
    1: string message;
    2: bool retryable;
  }

  struct PrestoThriftNullableSchemaName {
    1: optional string schemaName;
  }

  struct PrestoThriftSchemaTableName {
    1: string schemaName;
    2: string tableName;
  }

  struct PrestoThriftTableMetadata {
    1: PrestoThriftSchemaTableName schemaTableName;
    2: list<PrestoThriftColumnMetadata> columns;
    3: optional string comment;

    /**
     * Returns a list of key sets which can be used for index lookups.
     * The list is expected to have only unique key sets.
     * {@code set<set<string>>} is not used here because some languages (like php) don't support it.
     */
    4: optional list<set<string>> indexableKeys;
  }

  struct PrestoThriftColumnMetadata {
    1: string name;
    2: string type;
    3: optional string comment;
    4: bool hidden;
  }

  struct PrestoThriftNullableColumnSet {
    1: optional set<string> columns;
  }

  struct PrestoThriftTupleDomain {
    /**
     * Return a map of column names to constraints.
     */
    1: optional map<string, PrestoThriftDomain> domains;
  }

  /**
   * Set that either includes all values, or excludes all values.
   */
  struct PrestoThriftAllOrNoneValueSet {
    1: bool all;
  }

  /**
   * A set containing values that are uniquely identifiable.
   * Assumes an infinite number of possible values. The values may be collectively included (aka whitelist)
   * or collectively excluded (aka !whitelist).
   * This structure is used with comparable, but not orderable types like "json", "map".
   */
  struct PrestoThriftEquatableValueSet {
    1: bool whiteList;
    2: list<PrestoThriftBlock> values;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Elements of {@code ints} array are values for each row. If row is null then value is ignored.
   */
  struct PrestoThriftInteger {
    1: optional list<bool> nulls;
    2: optional list<i32> ints;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Elements of {@code longs} array are values for each row. If row is null then value is ignored.
   */
  struct PrestoThriftBigint {
    1: optional list<bool> nulls;
    2: optional list<i64> longs;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Elements of {@code doubles} array are values for each row. If row is null then value is ignored.
   */
  struct PrestoThriftDouble {
    1: optional list<bool> nulls;
    2: optional list<double> doubles;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Each element of {@code sizes} array contains the length in bytes for the corresponding element.
   * If row is null then the corresponding element in {@code sizes} is ignored.
   * {@code bytes} array contains UTF-8 encoded byte values.
   * Values for all rows are written to {@code bytes} array one after another.
   * The total number of bytes must be equal to the sum of all sizes.
   */
  struct PrestoThriftVarchar {
    1: optional list<bool> nulls;
    2: optional list<i32> sizes;
    3: optional binary bytes;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Elements of {@code booleans} array are values for each row. If row is null then value is ignored.
   */
  struct PrestoThriftBoolean {
    1: optional list<bool> nulls;
    2: optional list<bool> booleans;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Elements of {@code dates} array are date values for each row represented as the number
   * of days passed since 1970-01-01.
   * If row is null then value is ignored.
   */
  struct PrestoThriftDate {
    1: optional list<bool> nulls;
    2: optional list<i32> dates;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Elements of {@code timestamps} array are values for each row represented as the number
   * of milliseconds passed since 1970-01-01T00:00:00 UTC.
   * If row is null then value is ignored.
   */
  struct PrestoThriftTimestamp {
    1: optional list<bool> nulls;
    2: optional list<i64> timestamps;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Each element of {@code sizes} array contains the length in bytes for the corresponding element.
   * If row is null then the corresponding element in {@code sizes} is ignored.
   * {@code bytes} array contains UTF-8 encoded byte values for string representation of json.
   * Values for all rows are written to {@code bytes} array one after another.
   * The total number of bytes must be equal to the sum of all sizes.
   */
  struct PrestoThriftJson {
    1: optional list<bool> nulls;
    2: optional list<i32> sizes;
    3: optional binary bytes;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Each element of {@code sizes} array contains the length in bytes for the corresponding element.
   * If row is null then the corresponding element in {@code sizes} is ignored.
   * {@code bytes} array contains encoded byte values for HyperLogLog representation as defined in
   * Airlift specification: https://github.com/airlift/airlift/blob/master/stats/docs/hll.md
   * Values for all rows are written to {@code bytes} array one after another.
   * The total number of bytes must be equal to the sum of all sizes.
   */
  struct PrestoThriftHyperLogLog {
    1: optional list<bool> nulls;
    2: optional list<i32> sizes;
    3: optional binary bytes;
  }

  /**
   * Elements of {@code nulls} array determine if a value for a corresponding row is null.
   * Each element of {@code sizes} array contains the number of elements in the corresponding values array.
   * If row is null then the corresponding element in {@code sizes} is ignored.
   * {@code values} is a bigint block containing array elements one after another for all rows.
   * The total number of elements in bigint block must be equal to the sum of all sizes.
   */
  struct PrestoThriftBigintArray {
    1: optional list<bool> nulls;
    2: optional list<i32> sizes;
    3: optional PrestoThriftBigint values;
  }

  /**
   * A set containing zero or more Ranges of the same type over a continuous space of possible values.
   * Ranges are coalesced into the most compact representation of non-overlapping Ranges.
   * This structure is used with comparable and orderable types like bigint, integer, double, varchar, etc.
   */
  struct PrestoThriftRangeValueSet {
    1: list<PrestoThriftRange> ranges;
  }

  struct PrestoThriftId {
    1: binary id;
  }

  struct PrestoThriftSplitBatch {
    1: list<PrestoThriftSplit> splits;
    2: optional PrestoThriftId nextToken;
  }

  struct PrestoThriftSplit {
    /**
     * Encodes all the information needed to identify a batch of rows to return to Presto.
     * For a basic scan, includes schema name, table name, and output constraint.
     * For an index scan, includes schema name, table name, set of keys to lookup and output constraint.
     */
    1: PrestoThriftId splitId;

    /**
     * Identifies the set of hosts on which the rows are available. If empty, then the rows
     * are expected to be available on any host. The hosts in this list may be independent
     * from the hosts used to serve metadata requests.
     */
    2: list<PrestoThriftHostAddress> hosts;
  }

  struct PrestoThriftHostAddress {
    1: string host;
    2: i32 port;
  }

  struct PrestoThriftPageResult {
    /**
     * Returns data in a columnar format.
     * Columns in this list must be in the order they were requested by the engine.
     */
    1: list<PrestoThriftBlock> columnBlocks;

    2: i32 rowCount;
    3: optional PrestoThriftId nextToken;
  }

  struct PrestoThriftNullableTableMetadata {
    1: optional PrestoThriftTableMetadata tableMetadata;
  }

  struct PrestoThriftValueSet {
    1: optional PrestoThriftAllOrNoneValueSet allOrNoneValueSet;
    2: optional PrestoThriftEquatableValueSet equatableValueSet;
    3: optional PrestoThriftRangeValueSet rangeValueSet;
  }

  struct PrestoThriftBlock {
    1: optional PrestoThriftInteger integerData;
    2: optional PrestoThriftBigint bigintData;
    3: optional PrestoThriftDouble doubleData;
    4: optional PrestoThriftVarchar varcharData;
    5: optional PrestoThriftBoolean booleanData;
    6: optional PrestoThriftDate dateData;
    7: optional PrestoThriftTimestamp timestampData;
    8: optional PrestoThriftJson jsonData;
    9: optional PrestoThriftHyperLogLog hyperLogLogData;
    10: optional PrestoThriftBigintArray bigintArrayData;
  }

  /**
   * LOWER UNBOUNDED is specified with an empty value and an ABOVE bound
   * UPPER UNBOUNDED is specified with an empty value and a BELOW bound
   */
  struct PrestoThriftMarker {
    1: optional PrestoThriftBlock value;
    2: PrestoThriftBound bound;
  }

  struct PrestoThriftNullableToken {
    1: optional PrestoThriftId token;
  }

  struct PrestoThriftDomain {
    1: PrestoThriftValueSet valueSet;
    2: bool nullAllowed;
  }

  struct PrestoThriftRange {
    1: PrestoThriftMarker low;
    2: PrestoThriftMarker high;
  }

  /**
   * Presto Thrift service definition.
   * This Thrift service needs to be implemented in order to be used with the Thrift connector.
   */
  service PrestoThriftService {
    /**
     * Returns available schema names.
     */
    list<string> prestoListSchemaNames()
        throws (1: PrestoThriftServiceException ex1);

    /**
     * Returns tables for the given schema name.
     *
     * @param schemaNameOrNull a structure containing schema name or {@literal null}
     * @return a list of table names with corresponding schemas. If schema name is null then returns
     * a list of tables for all schemas. Returns an empty list if a schema does not exist
     */
    list<PrestoThriftSchemaTableName> prestoListTables(
        1: PrestoThriftNullableSchemaName schemaNameOrNull)
        throws (1: PrestoThriftServiceException ex1);

    /**
     * Returns metadata for a given table.
     *
     * @param schemaTableName schema and table name
     * @return metadata for a given table, or a {@literal null} value inside if it does not exist
     */
    PrestoThriftNullableTableMetadata prestoGetTableMetadata(
        1: PrestoThriftSchemaTableName schemaTableName)
        throws (1: PrestoThriftServiceException ex1);

    /**
     * Returns a batch of splits.
     *
     * @param schemaTableName schema and table name
     * @param desiredColumns a superset of columns to return; empty set means "no columns", {@literal null} set means "all columns"
     * @param outputConstraint constraint on the returned data
     * @param maxSplitCount maximum number of splits to return
     * @param nextToken token from a previous split batch or {@literal null} if it is the first call
     * @return a batch of splits
     */
    PrestoThriftSplitBatch prestoGetSplits(
        1: PrestoThriftSchemaTableName schemaTableName,
        2: PrestoThriftNullableColumnSet desiredColumns,
        3: PrestoThriftTupleDomain outputConstraint,
        4: i32 maxSplitCount,
        5: PrestoThriftNullableToken nextToken)
        throws (1: PrestoThriftServiceException ex1);

    /**
     * Returns a batch of index splits for the given batch of keys.
     * This method is called if index join strategy is chosen for a query.
     *
     * @param schemaTableName schema and table name
     * @param indexColumnNames specifies columns and their order for keys
     * @param outputColumnNames a list of column names to return
     * @param keys keys for which records need to be returned; includes only unique and non-null values
     * @param outputConstraint constraint on the returned data
     * @param maxSplitCount maximum number of splits to return
     * @param nextToken token from a previous split batch or {@literal null} if it is the first call
     * @return a batch of splits
     */
    PrestoThriftSplitBatch prestoGetIndexSplits(
        1: PrestoThriftSchemaTableName schemaTableName,
        2: list<string> indexColumnNames,
        3: list<string> outputColumnNames,
        4: PrestoThriftPageResult keys,
        5: PrestoThriftTupleDomain outputConstraint,
        6: i32 maxSplitCount,
        7: PrestoThriftNullableToken nextToken)
        throws (1: PrestoThriftServiceException ex1);

    /**
     * Returns a batch of rows for the given split.
     *
     * @param splitId split id as returned in split batch
     * @param columns a list of column names to return
     * @param maxBytes maximum size of returned data in bytes
     * @param nextToken token from a previous batch or {@literal null} if it is the first call
     * @return a batch of table data
     */
    PrestoThriftPageResult prestoGetRows(
        1: PrestoThriftId splitId,
        2: list<string> columns,
        3: i64 maxBytes,
        4: PrestoThriftNullableToken nextToken)
        throws (1: PrestoThriftServiceException ex1);
  }
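To make the columnar block layout concrete, here is a minimal Python sketch of how a server might fill a few of the block structures above. It uses plain dicts keyed by the IDL field names instead of generated Thrift classes, and it always emits the optional nulls list. The helper names (encode_fixed, encode_varchar, and so on) are hypothetical.

```python
import datetime as dt
from typing import Any, Dict, List, Optional, Sequence

EPOCH_DATE = dt.date(1970, 1, 1)
EPOCH_UTC = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)


def encode_fixed(values: Sequence[Any], field_name: str, placeholder: Any) -> Dict[str, Any]:
    """Layout shared by Integer, Bigint, Double, Boolean, Date and Timestamp:
    one null flag and one value slot per row. The slot of a null row is
    ignored by the reader, so any placeholder works."""
    return {
        "nulls": [v is None for v in values],
        field_name: [placeholder if v is None else v for v in values],
    }


def encode_varchar(values: Sequence[Optional[str]]) -> Dict[str, Any]:
    """PrestoThriftVarchar: per-row byte lengths plus one concatenated UTF-8
    buffer whose length equals sum(sizes). Null rows get size 0."""
    encoded = [b"" if v is None else v.encode("utf-8") for v in values]
    return {
        "nulls": [v is None for v in values],
        "sizes": [len(b) for b in encoded],
        "bytes": b"".join(encoded),
    }


def decode_varchar(block: Dict[str, Any]) -> List[Optional[str]]:
    """Inverse of encode_varchar: walk sizes, slicing the shared byte buffer."""
    out: List[Optional[str]] = []
    offset = 0
    for is_null, size in zip(block["nulls"], block["sizes"]):
        if is_null:
            out.append(None)  # size is ignored for null rows
            continue
        out.append(block["bytes"][offset:offset + size].decode("utf-8"))
        offset += size
    return out


def encode_dates(values: Sequence[Optional[dt.date]]) -> Dict[str, Any]:
    """PrestoThriftDate: number of days since 1970-01-01."""
    days = [None if d is None else (d - EPOCH_DATE).days for d in values]
    return encode_fixed(days, "dates", 0)


def encode_timestamps(values: Sequence[Optional[dt.datetime]]) -> Dict[str, Any]:
    """PrestoThriftTimestamp: milliseconds since 1970-01-01T00:00:00 UTC.
    Expects timezone-aware datetimes."""
    millis = [None if t is None else (t - EPOCH_UTC) // dt.timedelta(milliseconds=1)
              for t in values]
    return encode_fixed(millis, "timestamps", 0)


def encode_bigint_array(rows: Sequence[Optional[Sequence[Optional[int]]]]) -> Dict[str, Any]:
    """PrestoThriftBigintArray: sizes counts elements per row; the elements of
    all non-null rows are flattened into one nested bigint block."""
    flat = [x for row in rows if row is not None for x in row]
    return {
        "nulls": [row is None for row in rows],
        "sizes": [0 if row is None else len(row) for row in rows],
        "values": encode_fixed(flat, "longs", 0),
    }
```

Json and HyperLogLog blocks share the Varchar layout (nulls, sizes, bytes), so the same pattern applies with a different byte encoding.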
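The comment on PrestoThriftMarker only spells out the unbounded cases. The sketch below assumes the usual Presto marker semantics, where BELOW and ABOVE mean "just below" and "just above" the marker value. Under that assumption, it shows how a server might test a bigint value against a PrestoThriftDomain taken from outputConstraint. The Python classes and functions are hypothetical simplifications of the IDL structures, limited to integer values and omitting equatable value sets.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Tuple


class Bound(IntEnum):
    """Same numeric values as PrestoThriftBound in the IDL."""
    BELOW = 1
    EXACTLY = 2
    ABOVE = 3


@dataclass(frozen=True)
class Marker:
    """Simplified PrestoThriftMarker; value None plays the role of the
    empty value block used for unbounded markers."""
    value: Optional[int]
    bound: Bound


def _sort_key(m: Marker) -> Tuple[int, int, int]:
    # Empty value + ABOVE sorts before everything (lower unbounded);
    # empty value + BELOW sorts after everything (upper unbounded).
    # Bounded markers compare by value, then BELOW < EXACTLY < ABOVE.
    if m.value is None:
        if m.bound == Bound.EXACTLY:
            raise ValueError("an EXACTLY marker needs a value")
        return (-1, 0, 0) if m.bound == Bound.ABOVE else (1, 0, 0)
    return (0, m.value, int(m.bound))


def in_range(x: int, low: Marker, high: Marker) -> bool:
    """True if x lies in the PrestoThriftRange [low, high]."""
    point = Marker(x, Bound.EXACTLY)
    return _sort_key(low) <= _sort_key(point) <= _sort_key(high)


@dataclass(frozen=True)
class Domain:
    """Simplified PrestoThriftDomain: an all-or-none set or a list of ranges,
    plus nullAllowed."""
    null_allowed: bool
    all_values: Optional[bool] = None  # set when the value set is all-or-none
    ranges: Tuple[Tuple[Marker, Marker], ...] = ()


def domain_includes(domain: Domain, x: Optional[int]) -> bool:
    """Nulls are governed by nullAllowed; other values by the value set."""
    if x is None:
        return domain.null_allowed
    if domain.all_values is not None:
        return domain.all_values
    return any(in_range(x, low, high) for low, high in domain.ranges)
```

For example, x > 5 is the range from Marker(5, ABOVE) to the upper-unbounded Marker(None, BELOW), so in_range(5, ...) is False and in_range(6, ...) is True.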