Text - NGram(batch) - 《Alink v1.0.1 Document》

Description

Transforms a document into a new document composed of all its n-grams. The document is split into an array of words by a word delimiter (a space by default). Sliding a window of n words over the array yields every n-gram, and the words of each n-gram are joined with a "_" character. All the n-grams are then joined with spaces to form the new document.
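
The sliding-window procedure described above can be sketched in plain Python. This is only an illustration of the described behaviour, not Alink's implementation: the function name `ngram_document` and its `delimiter` argument are hypothetical, and the handling of documents shorter than n words (an empty result here) is an assumption, since this page does not state what the operator does in that case.

```python
def ngram_document(text, n=2, delimiter=" "):
    """Join every run of n consecutive words with "_", then join the runs with spaces."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split(delimiter)
    # Slide a window of n words across the array; fewer than n words yields no windows.
    ngrams = ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return " ".join(ngrams)


print(ngram_document("Have a good day!"))       # Have_a a_good good_day!
print(ngram_document("Have a good day!", n=3))  # Have_a_good a_good_day!
```

With the default n = 2, the first call reproduces the corresponding row of the Results table.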

Parameters

Name	Description	Type	Required?	Default Value
n	NGram length	Integer		2
selectedCol	Name of the selected column used for processing	String	✓
outputCol	Name of the output column	String		null
reservedCols	Names of the columns to be retained in the output table	String[]		null
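
How the column parameters interact can be sketched on plain Python dictionaries. The helper `apply_ngram` below is hypothetical and is not part of Alink. It assumes that a null outputCol writes the result back into the selected column and that a null reservedCols keeps every input column, which is consistent with the Results table on this page but is not stated explicitly in the parameter descriptions.

```python
def apply_ngram(rows, selected_col, n=2, output_col=None, reserved_cols=None):
    """Hypothetical helper mimicking the column parameters on a list of dict rows."""
    # Assumption: a null outputCol reuses the selected column's name.
    target = output_col or selected_col
    out = []
    for row in rows:
        words = row[selected_col].split(" ")
        grams = " ".join("_".join(words[i:i + n]) for i in range(len(words) - n + 1))
        # Assumption: a null reservedCols keeps all input columns.
        keep = row.keys() if reserved_cols is None else reserved_cols
        new_row = {col: row[col] for col in keep}
        new_row[target] = grams
        out.append(new_row)
    return out


rows = [{"id": 1, "text": "Do you like math?"}]
print(apply_ngram(rows, "text"))
# [{'id': 1, 'text': 'Do_you you_like like_math?'}]
print(apply_ngram(rows, "text", n=3, output_col="ngrams", reserved_cols=["id"]))
# [{'id': 1, 'ngrams': 'Do_you_like you_like_math?'}]
```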

Script Example

Code

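import numpy as np
import pandas as pd
from pyalink.alink import *

# Assumes a PyAlink environment has already been set up, e.g. with useLocalEnv(1).
data = np.array([
    [0, 'That is an English Book!'],
    [1, 'Do you like math?'],
    [2, 'Have a good day!']
])
df = pd.DataFrame({"id": data[:, 0], "text": data[:, 1]})

# Batch: build bigrams (default n = 2) from the "text" column and collect the result.
inOp1 = dataframeToOperator(df, schemaStr='id long, text string', op_type='batch')
op = NGramBatchOp().setSelectedCol("text")
print(BatchOperator.collectToDataframe(op.linkFrom(inOp1)))

# Stream: the same transformation with the streaming operator.
inOp2 = dataframeToOperator(df, schemaStr='id long, text string', op_type='stream')
op = NGramStreamOp().setSelectedCol("text")
op.linkFrom(inOp2).print()
StreamOperator.execute()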

Results

    id    text
0    2    Have_a a_good good_day!
1    1    Do_you you_like like_math?
2    0    That_is is_an an_English English_Book!