HDFS

Name

hdfs

Description

The HDFS table-valued function (tvf) allows users to read and access file contents on HDFS as if they were a relational table. It currently supports the csv/csv_with_names/csv_with_names_and_types/json/parquet/orc file formats.

Syntax

  hdfs(
    "uri" = "..",
    "fs.defaultFS" = "...",
    "hadoop.username" = "...",
    "format" = "csv",
    "keyn" = "valuen"
    ...
  );

Parameter description

Parameters for accessing HDFS:

  • uri: (required) HDFS URI of the target file.
  • fs.defaultFS: (required)
  • hadoop.username: (required) Can be any string, but cannot be empty.
  • hadoop.security.authentication: (optional)
  • hadoop.kerberos.principal: (optional)
  • hadoop.kerberos.keytab: (optional)
  • dfs.client.read.shortcircuit: (optional)
  • dfs.domain.socket.path: (optional)
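
As a rough sketch of how the optional Kerberos parameters above might be combined on a secured cluster: the principal and keytab path below are hypothetical placeholders, and the value "kerberos" for hadoop.security.authentication is the standard Hadoop setting name, an assumption not stated in this document.

```sql
-- Sketch only: principal and keytab path are hypothetical placeholders.
-- "kerberos" is assumed to be the accepted value, following Hadoop convention.
select * from hdfs(
    "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student.csv",
    "fs.defaultFS" = "hdfs://127.0.0.1:8424",
    "hadoop.username" = "doris",
    "hadoop.security.authentication" = "kerberos",
    "hadoop.kerberos.principal" = "doris@EXAMPLE.COM",
    "hadoop.kerberos.keytab" = "/path/to/doris.keytab",
    "format" = "csv");
```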

File format parameters:

  • format: (required) Currently supports csv/csv_with_names/csv_with_names_and_types/json/parquet/orc.

  • column_separator: (optional) default ",".

  • line_delimiter: (optional) default "\n".

    The following 6 parameters are used when reading json format files. For details on their usage, refer to Json Load.

  • read_json_by_line: (optional) default "true"

  • strip_outer_array: (optional) default "false"

  • json_root: (optional) default ""

  • json_paths: (optional) default ""

  • num_as_string: (optional) default "false"

  • fuzzy_parse: (optional) default "false"
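
To illustrate the file format parameters, a minimal sketch of reading a pipe-separated csv file might look like the following. The file name student_pipe.csv is a hypothetical placeholder; only the parameter names come from the list above.

```sql
-- Sketch only: student_pipe.csv is a hypothetical file whose fields
-- are separated by "|" instead of the default ",".
select * from hdfs(
    "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student_pipe.csv",
    "fs.defaultFS" = "hdfs://127.0.0.1:8424",
    "hadoop.username" = "doris",
    "format" = "csv",
    "column_separator" = "|",
    "line_delimiter" = "\n");
```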

Examples

Read and access a csv format file on HDFS:

  MySQL [(none)]> select * from hdfs(
      "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student.csv",
      "fs.defaultFS" = "hdfs://127.0.0.1:8424",
      "hadoop.username" = "doris",
      "format" = "csv");
  +------+---------+------+
  | c1   | c2      | c3   |
  +------+---------+------+
  | 1    | alice   | 18   |
  | 2    | bob     | 20   |
  | 3    | jack    | 24   |
  | 4    | jackson | 19   |
  | 5    | liming  | 18   |
  +------+---------+------+
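
Because the tvf behaves like a relational table, its result can presumably be filtered and projected like any other table. The sketch below assumes the generated column names c1, c2, c3 shown in the output above; the explicit cast is a precaution in case csv columns are read as strings.

```sql
-- Sketch only: assumes the default column names c1..c3 from a plain csv file.
-- The cast guards against c3 being read as a string type.
select c2, c3 from hdfs(
    "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student.csv",
    "fs.defaultFS" = "hdfs://127.0.0.1:8424",
    "hadoop.username" = "doris",
    "format" = "csv")
where cast(c3 as int) > 18
order by c2;
```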

The tvf can also be used with desc function to view the file's schema:

  MySQL [(none)]> desc function hdfs(
      "uri" = "hdfs://127.0.0.1:8424/user/doris/csv_format_test/student_with_names.csv",
      "fs.defaultFS" = "hdfs://127.0.0.1:8424",
      "hadoop.username" = "doris",
      "format" = "csv_with_names");

Keywords

  hdfs, table-valued-function, tvf

Best Practice

For more detailed usage of the HDFS tvf, refer to the S3 tvf. The only difference between them is how the storage system is accessed.