Description

A data source that reads a CSV (comma-separated values) file.

The file can reside in places including:

  • local file system
  • HDFS
  • HTTP
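
As a rough illustration of how these locations differ at the path level, the sketch below classifies a path by its URL scheme using only the Python standard library. The function name `classify_location` and the HDFS host in the example are hypothetical, and this is not how the operator itself resolves paths; it only shows what each kind of `filePath` value might look like.

```python
from urllib.parse import urlparse


def classify_location(file_path):
    """Guess where a file lives from the scheme of its path (illustrative only)."""
    scheme = urlparse(file_path).scheme.lower()
    if scheme in ("", "file"):
        # No scheme, or an explicit file:// scheme, is treated as a local path
        return "local file system"
    if scheme == "hdfs":
        return "hdfs"
    if scheme == "http":
        return "http"
    return "unknown"


# Example paths; the HDFS host name below is made up for illustration
print(classify_location("/tmp/iris.csv"))
print(classify_location("hdfs://namenode:9000/data/iris.csv"))
print(classify_location("http://alink-dataset.cn-hangzhou.oss.aliyun-inc.com/csv/iris.csv"))
```

Note that a Windows-style path such as `C:\data\iris.csv` would be misread by this sketch, because the drive letter is parsed as a scheme.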

Parameters

Name             Description                                       Type       Required?  Default Value
filePath         File path                                         String
schemaStr        Formatted schema                                  String
fieldDelimiter   Field delimiter                                   String                ","
quoteChar        Quote character                                   Character             "\""
skipBlankLine    Whether to skip blank lines                       Boolean               true
rowDelimiter     Row delimiter                                     String                "\n"
ignoreFirstLine  Whether to ignore the first line of the CSV file  Boolean               false
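
To make the meaning of these parameters concrete, here is a minimal sketch that mimics them with Python's standard `csv` module. It is not the operator's implementation: the helper names `parse_schema` and `read_csv_text`, the `CONVERTERS` mapping, and the handling of blank lines when `skip_blank_line` is false are all assumptions made for illustration, and the sketch supports only a single-character field delimiter.

```python
import csv

# Hypothetical mapping from schema type names to Python converters;
# only "double" and "string" appear in the examples on this page.
CONVERTERS = {"double": float, "string": str}


def parse_schema(schema_str):
    """Split 'name type, name type' into a list of (name, type) pairs."""
    fields = []
    for part in schema_str.split(","):
        name, type_name = part.split()
        fields.append((name, type_name.lower()))
    return fields


def read_csv_text(text, schema_str, field_delimiter=",", quote_char='"',
                  skip_blank_line=True, row_delimiter="\n",
                  ignore_first_line=False):
    """Parse CSV text into typed tuples, mirroring the parameter names above."""
    schema = parse_schema(schema_str)
    lines = text.split(row_delimiter)
    # A trailing row delimiter leaves one empty element; drop it
    if lines and lines[-1] == "":
        lines.pop()
    if ignore_first_line:
        lines = lines[1:]
    if skip_blank_line:
        lines = [line for line in lines if line.strip()]
    rows = []
    for record in csv.reader(lines, delimiter=field_delimiter,
                             quotechar=quote_char):
        # Assumption: a blank line that was not skipped becomes an empty tuple
        rows.append(tuple(CONVERTERS[t](v) for (_, t), v in zip(schema, record)))
    return rows


sample = 'sepal_length,category\n6.3,Iris-virginica\n\n5.0,"Iris-setosa"\n'
rows = read_csv_text(sample, "sepal_length double, category string",
                     ignore_first_line=True)
print(rows)  # [(6.3, 'Iris-virginica'), (5.0, 'Iris-setosa')]
```

In this sample the header row is dropped because `ignore_first_line` is true, the empty line is dropped because `skip_blank_line` defaults to true, and the quotes around the last value are removed by the quote character.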

Script Example

Csv Batch Source

  from pyalink.alink import *

  # Location of the CSV file and the schema describing its columns
  filePath = 'http://alink-dataset.cn-hangzhou.oss.aliyun-inc.com/csv/iris.csv'
  schema = 'sepal_length double, sepal_width double, petal_length double, petal_width double, category string'
  csvSource = CsvSourceBatchOp()\
      .setFilePath(filePath)\
      .setSchemaStr(schema)\
      .setFieldDelimiter(",")
  # Run the batch job and collect the result as a DataFrame
  BatchOperator.collectToDataframe(csvSource)

Results

     sepal_length  sepal_width  petal_length  petal_width        category
  0           6.3          3.3           6.0          2.5  Iris-virginica
  1           5.6          2.8           4.9          2.0  Iris-virginica
  2           5.0          3.3           1.4          0.2     Iris-setosa
  3           5.8          2.7           5.1          1.9  Iris-virginica
  4           7.0          3.2           4.7          1.4     Iris-setosa

Csv Stream Source

  from pyalink.alink import *

  # Location of the CSV file and the schema describing its columns
  filePath = 'http://alink-dataset.cn-hangzhou.oss.aliyun-inc.com/csv/iris.csv'
  schema = 'sepal_length double, sepal_width double, petal_length double, petal_width double, category string'
  csvSource = CsvSourceStreamOp()\
      .setFilePath(filePath)\
      .setSchemaStr(schema)\
      .setFieldDelimiter(",")
  # Register a print sink, then start the streaming job
  csvSource.print()
  StreamOperator.execute()

Results

     sepal_length  sepal_width  petal_length  petal_width         category
  1           5.5          2.4           3.8          1.1  Iris-versicolor
  2           6.1          2.6           5.6          1.4   Iris-virginica
  3           6.0          2.2           4.0          1.0  Iris-versicolor
  4           5.5          2.4           3.7          1.0  Iris-versicolor
  5           4.6          3.1           1.5          0.2      Iris-setosa