Reading & Writing Hive Tables

Using the HiveCatalog and Flink’s connector to Hive, Flink can read from and write to Hive data as an alternative to Hive’s batch engine. Be sure to follow the instructions to include the correct dependencies in your application.

Reading From Hive

Assume Hive contains a single table in its default database, named mytable, that contains several rows.

  hive> show databases;
  OK
  default
  Time taken: 0.841 seconds, Fetched: 1 row(s)
  hive> show tables;
  OK
  Time taken: 0.087 seconds
  hive> CREATE TABLE mytable(name string, value double);
  OK
  Time taken: 0.127 seconds
  hive> SELECT * FROM mytable;
  OK
  Tom 4.72
  John 8.0
  Tom 24.2
  Bob 3.14
  Bob 4.72
  Tom 34.9
  Mary 4.79
  Tiff 2.72
  Bill 4.33
  Mary 77.7
  Time taken: 0.097 seconds, Fetched: 10 row(s)

With the data ready, you can connect to an existing Hive installation and begin querying.

  Flink SQL> show catalogs;
  myhive
  default_catalog
  # ------ Set the current catalog to be 'myhive' catalog if you haven't set it in the yaml file ------
  Flink SQL> use catalog myhive;
  # ------ See all registered databases in catalog 'myhive' ------
  Flink SQL> show databases;
  default
  # ------ See the previously registered table 'mytable' ------
  Flink SQL> show tables;
  mytable
  # ------ The table schema that Flink sees is the same that we created in Hive, two columns - name as string and value as double ------
  Flink SQL> describe mytable;
  root
  |-- name: name
  |-- type: STRING
  |-- name: value
  |-- type: DOUBLE
  Flink SQL> SELECT * FROM mytable;
  name value
  __________ __________
  Tom 4.72
  John 8.0
  Tom 24.2
  Bob 3.14
  Bob 4.72
  Tom 34.9
  Mary 4.79
  Tiff 2.72
  Bill 4.33
  Mary 77.7
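
The session above mentions setting the current catalog in the yaml file. As a rough sketch, the corresponding SQL Client configuration might look like the fragment below. The keys follow the SQL Client environment-file format of the Flink release this page appears to describe and may differ in other releases; the hive-conf-dir path and the hive-version value are placeholders, not values taken from this page.

  # Hypothetical excerpt of the SQL Client yaml file; adjust to your installation.
  execution:
    current-catalog: myhive          # makes 'use catalog myhive;' unnecessary

  catalogs:
    - name: myhive                   # catalog name shown by 'show catalogs;'
      type: hive
      hive-conf-dir: /opt/hive-conf  # placeholder: directory containing hive-site.xml
      hive-version: 2.3.4            # placeholder: must match the installed Hive version

With an entry like this in place, the catalog should appear in the output of show catalogs as soon as the SQL Client starts.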

Writing To Hive

Similarly, data can be written into Hive using an INSERT INTO statement.

  Flink SQL> INSERT INTO mytable (name, value) VALUES ('Tom', 4.72);

Limitations

The following are the major limitations of the Hive connector. We are actively working to close these gaps.

  • INSERT OVERWRITE is not supported.
  • Inserting into partitioned tables is not supported.
  • ACID tables are not supported.
  • Bucketed tables are not supported.
  • Some data types are not supported. See the limitations for details.
  • Only a limited number of table storage formats have been tested, namely text, SequenceFile, ORC, and Parquet.
  • Views are not supported.