Parquet File Input

Source: GitHub (Hop), 2022-06-11 08:59:32

Description

This transform reads primitive values from Apache Parquet files. For more information about the file format, see Apache Parquet.

Options

Notes:

- To support reading from any location through Apache Commons VFS, each file is loaded completely into memory, one file at a time. Make sure the JVM has enough heap memory to hold the largest file you read.
- Long values can be deserialized to Dates if they hold epoch time, that is, milliseconds since 1970-01-01 00:00:00.000 UTC.
- Parquet Binary fields are read as Hop Strings by default, but you can also read them as Hop Binary.
- All input field values are passed through to the output.
- INT96 values are converted to the Hop Binary data type.
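
The value conversions described in these notes can be illustrated in plain Java. This is a minimal sketch, not Hop's implementation: the class `ParquetValueDecoding` and its methods are hypothetical names, and the raw values are assumed to be available already as a `long` or `byte[]`. The INT96 decoding assumes the legacy timestamp layout written by Impala, Hive and Spark (8 little-endian bytes of nanoseconds within the day, followed by a 4-byte little-endian Julian day number). If your INT96 data comes from another writer, this decoding may not apply. The string decoding assumes the Binary column holds UTF-8 text.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Date;

// Hypothetical helper class: illustrates the conversions, it is not Hop's own code.
public class ParquetValueDecoding {

    // Julian day number of 1970-01-01, the Unix epoch.
    private static final long JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588L;
    private static final long SECONDS_PER_DAY = 86_400L;

    // A Long holding milliseconds since 1970-01-01 00:00:00.000 UTC
    // maps directly onto java.util.Date.
    public static Date epochMillisToDate(long epochMillis) {
        return new Date(epochMillis);
    }

    // Assumption: the Binary column contains UTF-8 encoded text.
    public static String binaryToString(byte[] value) {
        return new String(value, StandardCharsets.UTF_8);
    }

    // Decodes the legacy INT96 timestamp layout (Impala/Hive/Spark):
    // bytes 0-7 = nanoseconds within the day, bytes 8-11 = Julian day number,
    // both little-endian.
    public static Instant int96ToInstant(byte[] int96) {
        if (int96 == null || int96.length != 12) {
            throw new IllegalArgumentException("an INT96 value must be exactly 12 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(int96).order(ByteOrder.LITTLE_ENDIAN);
        long nanosOfDay = buf.getLong();
        long julianDay = Integer.toUnsignedLong(buf.getInt());
        long epochDay = julianDay - JULIAN_DAY_OF_UNIX_EPOCH;
        // ofEpochSecond normalizes the nanosecond adjustment into whole seconds.
        return Instant.ofEpochSecond(epochDay * SECONDS_PER_DAY, nanosOfDay);
    }

    public static void main(String[] args) {
        System.out.println(epochMillisToDate(1_000L).toInstant()); // 1970-01-01T00:00:01Z

        // 12 raw bytes: 0 nanoseconds into Julian day 2440589 (1970-01-02).
        byte[] raw = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(0L).putInt(2_440_589).array();
        System.out.println(int96ToInstant(raw)); // 1970-01-02T00:00:00Z
    }
}
```

Since an INT96 value reaches the output as Hop Binary, a decoding step like `int96ToInstant` is only needed if you want to turn those bytes into a timestamp further down the pipeline.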

Option	Description
Transform name	Name of the transform. This name has to be unique within a single pipeline.
Filename field	The input field that holds the names of the Parquet files to read. Use a transform such as Get File Names to obtain the file names. Any supported file location can be used.
Fields	The fields to read from the Parquet files, each with the Hop data type it should have in the output.
Get fields button	Select a Parquet file; its schema is read to populate the Fields grid.
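
Because each file named in the Filename field is held completely in memory while it is read, a pre-flight check like the following can help you size the JVM heap (for example with `-Xmx`). This is a hypothetical sketch and not part of Hop: `ParquetFooterCheck` and its methods are illustrative names. It loads one file at a time, compares the file size with the heap that is still free, and validates the structure every Parquet file shares: the 4-byte magic `PAR1` at both ends, with a 4-byte little-endian length of the footer metadata (which holds the schema) just before the trailing magic. It does not decode the Thrift-encoded metadata itself.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical pre-flight check; not part of Hop.
public class ParquetFooterCheck {

    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Rough estimate: is there enough free heap left to hold a file of this size?
    public static boolean fitsInHeap(long fileSize) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return fileSize < rt.maxMemory() - used;
    }

    // Loads one file completely into memory, failing early if it clearly will not fit.
    public static byte[] loadWholeFile(Path path) throws IOException {
        long size = Files.size(path);
        if (!fitsInHeap(size)) {
            throw new IOException(path + " (" + size + " bytes) probably does not fit in the free heap");
        }
        return Files.readAllBytes(path);
    }

    // Checks the PAR1 magic at both ends and returns the length of the footer
    // metadata (the Thrift-encoded FileMetaData, which contains the schema).
    public static int footerLength(byte[] file) {
        // Minimum layout: magic (4) + footer length (4) + magic (4).
        if (file.length < 12) {
            throw new IllegalArgumentException("too small to be a Parquet file");
        }
        boolean headOk = Arrays.equals(Arrays.copyOfRange(file, 0, 4), MAGIC);
        boolean tailOk = Arrays.equals(Arrays.copyOfRange(file, file.length - 4, file.length), MAGIC);
        if (!headOk || !tailOk) {
            throw new IllegalArgumentException("missing PAR1 magic bytes");
        }
        int len = ByteBuffer.wrap(file, file.length - 8, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        if (len < 0 || len > file.length - 12) {
            throw new IllegalArgumentException("invalid footer length: " + len);
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        // Each argument stands in for one value of the Filename field.
        for (String name : args) {
            byte[] bytes = loadWholeFile(Paths.get(name));
            System.out.println(name + ": " + footerLength(bytes) + " bytes of footer metadata");
        }
    }
}
```

The heap estimate is deliberately rough: reading and converting rows needs memory beyond the raw file bytes, so leave headroom rather than sizing the heap to the largest file exactly.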

The copyright of this content belongs to Hop or its affiliates. To follow or support the content or the related open-source project, visit Hop.

This document was built with BookStack.