Running Commands and SQL Statements in impala-shell

For information on available commands, see impala-shell Command Reference. You can see the full set of available commands by pressing TAB twice, for example:

  1. [impalad-host:21000] >
  2. connect describe explain help history insert quit refresh select set shell show use version
  3. [impalad-host:21000] >

Note: Commands must be terminated by a semi-colon. A command can span multiple lines.

For example:

  1. [localhost:21000] > select *
  2. > from t1
  3. > limit 5;
  4. +---------+-----------+
  5. | s1 | s2 |
  6. +---------+-----------+
  7. | hello | world |
  8. | goodbye | cleveland |
  9. +---------+-----------+

A comment is considered part of the statement it precedes, so when you enter a -- or /* */ comment, you get a continuation prompt until you finish entering a statement ending with a semicolon:

  1. [localhost:21000] > -- This is a test comment
  2. > show tables like 't*';
  3. +--------+
  4. | name |
  5. +--------+
  6. | t1 |
  7. | t2 |
  8. | tab1 |
  9. | tab2 |
  10. | tab3 |
  11. | text_t |
  12. +--------+

Use the up-arrow and down-arrow keys to cycle through and edit previous commands. impala-shell uses the readline library and so supports a standard set of keyboard shortcuts for editing and cursor movement, such as Ctrl-A for beginning of line and Ctrl-E for end of line.

In Impala 2.5 and higher, you can define substitution variables to be used within SQL statements processed by impala-shell. On the command line, you specify the option --var=variable_name=value. Within an interactive session or a script file processed by the -f option, you specify a SET command using the notation SET VAR:variable_name=value. Within a SQL statement, you substitute the value by using the notation ${var:variable_name}.

Note: Because this feature is part of impala-shell rather than the impalad backend, make sure the client system you are connecting from has the most recent impala-shell. You can use this feature with a new impala-shell connecting to an older impalad, but not the reverse.

For example, here are some impala-shell commands that define substitution variables and then use them in SQL statements executed through the -q and -f options. Notice how the -q argument strings are single-quoted to prevent shell expansion of the ${var:value} notation, and any string literals within the queries are enclosed by double quotation marks.

  1. $ impala-shell --var=tname=table1 --var=colname=x --var=coltype=string -q 'create table ${var:tname} (${var:colname} ${var:coltype}) stored as parquet'
  2. Starting Impala Shell without Kerberos authentication
  3. Connected to hostname
  4. Server version: impalad_version
  5. Query: create table table1 (x string) stored as parquet
  6. $ NEW_STRING="hello world"
  7. $ impala-shell --var=tname=table1 --var=insert_val="$NEW_STRING" -q 'insert into ${var:tname} values ("${var:insert_val}")'
  8. Starting Impala Shell without Kerberos authentication
  9. Connected to hostname
  10. Server version: impalad_version
  11. Query: insert into table1 values ("hello world")
  12. Inserted 1 row(s) in 1.40s
  13. $ for VAL in foo bar bletch
  14. do
  15. impala-shell --var=tname=table1 --var=insert_val="$VAL" -q 'insert into ${var:tname} values ("${var:insert_val}")'
  16. done
  17. ...
  18. Query: insert into table1 values ("foo")
  19. Inserted 1 row(s) in 0.22s
  20. Query: insert into table1 values ("bar")
  21. Inserted 1 row(s) in 0.11s
  22. Query: insert into table1 values ("bletch")
  23. Inserted 1 row(s) in 0.21s
  24. $ echo "Search for what substring?" ; read answer
  25. Search for what substring?
  26. b
  27. $ impala-shell --var=tname=table1 -q 'select x from ${var:tname} where x like "%${var:answer}%"'
  28. Starting Impala Shell without Kerberos authentication
  29. Connected to hostname
  30. Server version: impalad_version
  31. Query: select x from table1 where x like "%b%"
  32. +--------+
  33. | x |
  34. +--------+
  35. | bletch |
  36. | bar |
  37. +--------+
  38. Fetched 2 row(s) in 0.83s

Here is a substitution variable passed in by the --var option, and then referenced by statements issued interactively. Then the variable is cleared with the UNSET command, and defined again with the SET command.

  1. $ impala-shell --quiet --var=tname=table1
  2. Starting Impala Shell without Kerberos authentication
  3. ***********************************************************************************
  4. banner_message
  5. ***********************************************************************************
  6. [hostname:21000] > select count(*) from ${var:tname};
  7. +----------+
  8. | count(*) |
  9. +----------+
  10. | 4 |
  11. +----------+
  12. [hostname:21000] > unset var:tname;
  13. Unsetting variable TNAME
  14. [hostname:21000] > select count(*) from ${var:tname};
  15. Error: Unknown variable TNAME
  16. [hostname:21000] > set var:tname=table1;
  17. [hostname:21000] > select count(*) from ${var:tname};
  18. +----------+
  19. | count(*) |
  20. +----------+
  21. | 4 |
  22. +----------+

The following example shows how the SOURCE command can execute a series of statements from a file:

  1. $ cat commands.sql
  2. show databases;
  3. show tables in default;
  4. show functions in _impala_builtins like '*minute*';
  5. $ impala-shell -i localhost
  6. ...
  7. [localhost:21000] > source commands.sql;
  8. Query: show databases
  9. +------------------+----------------------------------------------+
  10. | name | comment |
  11. +------------------+----------------------------------------------+
  12. | _impala_builtins | System database for Impala builtin functions |
  13. | default | Default Hive database |
  14. +------------------+----------------------------------------------+
  15. Fetched 2 row(s) in 0.06s
  16. Query: show tables in default
  17. +-----------+
  18. | name |
  19. +-----------+
  20. | customers |
  21. | sample_07 |
  22. | sample_08 |
  23. | web_logs |
  24. +-----------+
  25. Fetched 4 row(s) in 0.02s
  26. Query: show functions in _impala_builtins like '*minute*'
  27. +-------------+--------------------------------+-------------+---------------+
  28. | return type | signature | binary type | is persistent |
  29. +-------------+--------------------------------+-------------+---------------+
  30. | INT | minute(TIMESTAMP) | BUILTIN | true |
  31. | TIMESTAMP | minutes_add(TIMESTAMP, BIGINT) | BUILTIN | true |
  32. | TIMESTAMP | minutes_add(TIMESTAMP, INT) | BUILTIN | true |
  33. | TIMESTAMP | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
  34. | TIMESTAMP | minutes_sub(TIMESTAMP, INT) | BUILTIN | true |
  35. +-------------+--------------------------------+-------------+---------------+
  36. Fetched 5 row(s) in 0.03s

The following example shows how a file that is run by the SOURCE command, or through the -q or -f options of impala-shell, can contain additional SOURCE commands. The first file, nested1.sql, runs an impala-shell command and then also runs the commands from nested2.sql. This ability for scripts to call each other is often useful for code that sets up schemas for applications or test environments.

  1. $ cat nested1.sql
  2. show functions in _impala_builtins like '*minute*';
  3. source nested2.sql
  4. $ cat nested2.sql
  5. show functions in _impala_builtins like '*hour*'
  6. $ impala-shell -i localhost -f nested1.sql
  7. Starting Impala Shell without Kerberos authentication
  8. Connected to localhost:21000
  9. ...
  10. Query: show functions in _impala_builtins like '*minute*'
  11. +-------------+--------------------------------+-------------+---------------+
  12. | return type | signature | binary type | is persistent |
  13. +-------------+--------------------------------+-------------+---------------+
  14. | INT | minute(TIMESTAMP) | BUILTIN | true |
  15. | TIMESTAMP | minutes_add(TIMESTAMP, BIGINT) | BUILTIN | true |
  16. | TIMESTAMP | minutes_add(TIMESTAMP, INT) | BUILTIN | true |
  17. | TIMESTAMP | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
  18. | TIMESTAMP | minutes_sub(TIMESTAMP, INT) | BUILTIN | true |
  19. +-------------+--------------------------------+-------------+---------------+
  20. Fetched 5 row(s) in 0.01s
  21. Query: show functions in _impala_builtins like '*hour*'
  22. +-------------+------------------------------+-------------+---------------+
  23. | return type | signature | binary type | is persistent |
  24. +-------------+------------------------------+-------------+---------------+
  25. | INT | hour(TIMESTAMP) | BUILTIN | true |
  26. | TIMESTAMP | hours_add(TIMESTAMP, BIGINT) | BUILTIN | true |
  27. | TIMESTAMP | hours_add(TIMESTAMP, INT) | BUILTIN | true |
  28. | TIMESTAMP | hours_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
  29. | TIMESTAMP | hours_sub(TIMESTAMP, INT) | BUILTIN | true |
  30. +-------------+------------------------------+-------------+---------------+
  31. Fetched 5 row(s) in 0.01s

Parent topic: Using the Impala Shell (impala-shell Command)

Rerunning impala-shell Commands

In Impala 2.10 and higher, you can use the rerun command, or its abbreviation @, to re-execute commands from the history list. The argument can be a positive integer (reflecting the number shown in history output) or a negative integer (reflecting the N’th last command in the history output. For example:

  1. [localhost:21000] > select * from p1 order by t limit 5;
  2. ...
  3. [localhost:21000] > show table stats p1;
  4. +-----------+--------+--------+------------------------------------------------------------+
  5. | #Rows | #Files | Size | Location |
  6. +-----------+--------+--------+------------------------------------------------------------+
  7. | 134217728 | 50 | 4.66MB | hdfs://test.example.com:8020/user/hive/warehouse/jdr.db/p1 |
  8. +-----------+--------+--------+------------------------------------------------------------+
  9. [localhost:21000] > compute stats p1;
  10. +-----------------------------------------+
  11. | summary |
  12. +-----------------------------------------+
  13. | Updated 1 partition(s) and 3 column(s). |
  14. +-----------------------------------------+
  15. [localhost:21000] > history;
  16. [1]: use jdr;
  17. [2]: history;
  18. [3]: show tables;
  19. [4]: select * from p1 order by t limit 5;
  20. [5]: show table stats p1;
  21. [6]: compute stats p1;
  22. [7]: history;
  23. [localhost:21000] > @-2; <- Rerun the 2nd last command in the history list
  24. Rerunning compute stats p1;
  25. +-----------------------------------------+
  26. | summary |
  27. +-----------------------------------------+
  28. | Updated 1 partition(s) and 3 column(s). |
  29. +-----------------------------------------+
  30. [localhost:21000] > history; <- History list is not updated by rerunning commands
  31. or by repeating the last command, in this case 'history'.
  32. [1]: use jdr;
  33. [2]: history;
  34. [3]: show tables;
  35. [4]: select * from p1 order by t limit 5;
  36. [5]: show table stats p1;
  37. [6]: compute stats p1;
  38. [7]: history;
  39. [localhost:21000] > @4; <- Rerun command #4 in the history list using short form '@'.
  40. Rerunning select * from p1 order by t limit 5;
  41. ...
  42. [localhost:21000] > rerun 4; <- Rerun command #4 using long form 'rerun'.
  43. Rerunning select * from p1 order by t limit 5;
  44. ...