S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)
Speeds up INSERT operations on tables or partitions residing on the Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind if an error occurs partway through the operation.
By default, Impala write operations to S3 tables and partitions involve a two-stage process. Impala writes intermediate files to S3, then (because S3 does not provide a “rename” operation) those intermediate files are copied to their final location, making the process more expensive than on a filesystem that supports renaming or moving files. This query option makes Impala skip the intermediate files and instead write the new data directly to the final destination.
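As a minimal sketch, the following impala-shell session toggles the option around an INSERT. The table names s3_sales and staging_sales are hypothetical; s3_sales is assumed to have been created with a LOCATION on S3.

```sql
-- Hypothetical tables: s3_sales resides on S3, staging_sales holds the source rows.

-- Turn the option off to use the slower two-stage write path
-- (intermediate files first, then a copy to the final location).
SET S3_SKIP_INSERT_STAGING=false;
INSERT INTO s3_sales SELECT * FROM staging_sales;

-- Turn the option back on to write directly to the final destination.
SET S3_SKIP_INSERT_STAGING=true;
INSERT INTO s3_sales SELECT * FROM staging_sales;
```

The setting applies to the current session only, so each impala-shell session or client connection can choose its own write path.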
Usage notes:
Important:
If a host that is participating in the INSERT operation fails partway through the query, you might be left with a table or partition that contains some but not all of the expected data files. Therefore, this option is most appropriate for a development or test environment where you have the ability to reconstruct the table if a problem during INSERT leaves the data in an inconsistent state.
The timing of file deletion during an INSERT OVERWRITE operation makes it impractical to write new files to S3 and delete the old files in a single operation. Therefore, this query option only affects regular INSERT statements that add to the existing data in a table, not INSERT OVERWRITE statements. Use TRUNCATE TABLE if you need to remove all contents from an S3 table before performing a fast INSERT with this option enabled.
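The TRUNCATE TABLE approach described above might look like the following sketch. The table names s3_sales and staging_sales are hypothetical and stand in for an S3-backed table and its data source.

```sql
-- Hypothetical tables: s3_sales resides on S3, staging_sales holds the replacement rows.
SET S3_SKIP_INSERT_STAGING=true;

-- Remove all existing data files from the S3 table.
TRUNCATE TABLE s3_sales;

-- Regular INSERT, which is the statement type this option speeds up.
INSERT INTO s3_sales SELECT * FROM staging_sales;
```

Unlike a single INSERT OVERWRITE statement, this sequence leaves the table empty between the two statements, so queries that run in that window see no rows.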
Performance improvements with this option enabled can be substantial. The speed increase might be more noticeable for non-partitioned tables than for partitioned tables.
Type: Boolean; recognized values are 1 and 0, or true and false; any other value is interpreted as false
Default: true (shown as 1 in output of SET statement)
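For illustration, the following statements, issued from impala-shell, show one way to inspect the current value and the equivalent spellings for turning the option off.

```sql
-- With no arguments, SET lists the query options and their current values.
SET;

-- Either spelling disables the option for the current session.
SET S3_SKIP_INSERT_STAGING=0;
SET S3_SKIP_INSERT_STAGING=false;
```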
Added in: Impala 2.6.0
Related information:
Using Impala with Amazon S3 Object Store