Using filters

Filters are part of transforms and provide a DSL for removing examples (or sequences) from your dataset, so that you keep only the parts you need. A filter can be a one-liner for a single condition or can combine conditions with complex boolean logic.

  // Keep only examples whose MerchantCountryCode is "USA" or "CAN"; all other examples are removed
  TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
      .filter(new ConditionFilter(new CategoricalColumnCondition("MerchantCountryCode", ConditionOp.NotInSet, new HashSet<>(Arrays.asList("USA", "CAN")))))
      .build();
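The following plain-Java sketch mimics what the filter above does to individual records, without depending on DataVec. It assumes each record is a list of strings and places MerchantCountryCode at a hypothetical column index 2; `CountryFilterSketch` and its methods are illustrative names, not part of the DataVec API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CountryFilterSketch {
    // Hypothetical position of the MerchantCountryCode column in each record
    static final int COUNTRY_COLUMN = 2;
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("USA", "CAN"));

    // Like ConditionOp.NotInSet: the condition is true when the value is NOT in the set
    static boolean condition(List<String> record) {
        return !ALLOWED.contains(record.get(COUNTRY_COLUMN));
    }

    // The filter removes every record for which the condition is true
    static List<List<String>> apply(List<List<String>> records) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> record : records) {
            if (!condition(record)) {
                kept.add(record);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> records = Arrays.asList(
                Arrays.asList("2016-01-01", "12.50", "USA"),
                Arrays.asList("2016-01-02", "8.00", "FRA"),
                Arrays.asList("2016-01-03", "40.10", "CAN"));
        System.out.println(apply(records).size()); // 2 (the FRA record is removed)
    }
}
```

Note the inversion: the condition describes what to throw away, so "keep USA and CAN" is expressed as "remove anything not in {USA, CAN}".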

You can also write your own filters by implementing the Filter interface, although in most cases you will want to create a custom condition instead.
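As a rough sketch of what implementing a filter involves, the code below defines a hypothetical `SimpleFilter` interface that mirrors the `removeExample`/`removeSequence` pair documented on this page. It is not DataVec's `Filter` interface (which works on `Writable` values and is tied to an input schema); here records are plain strings, and the rule that a sequence is removed when any of its steps would be removed is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CustomFilterSketch {

    // Hypothetical stand-in for a filter interface; true means "remove"
    interface SimpleFilter {
        boolean removeExample(List<String> example);
        boolean removeSequence(List<List<String>> sequence);
    }

    // Removes examples whose value in one numeric column is negative
    static class NegativeValueFilter implements SimpleFilter {
        private final int column;

        NegativeValueFilter(int column) {
            this.column = column;
        }

        @Override
        public boolean removeExample(List<String> example) {
            return Double.parseDouble(example.get(column)) < 0;
        }

        // Assumption for this sketch: a sequence is removed if any step would be removed
        @Override
        public boolean removeSequence(List<List<String>> sequence) {
            for (List<String> step : sequence) {
                if (removeExample(step)) {
                    return true;
                }
            }
            return false;
        }
    }

    // Applies a filter to a dataset, keeping the examples it does not remove
    static List<List<String>> apply(SimpleFilter filter, List<List<String>> examples) {
        List<List<String>> kept = new ArrayList<>();
        for (List<String> example : examples) {
            if (!filter.removeExample(example)) {
                kept.add(example);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        SimpleFilter filter = new NegativeValueFilter(1);
        List<List<String>> examples = Arrays.asList(
                Arrays.asList("a", "10.0"),
                Arrays.asList("b", "-2.5"),
                Arrays.asList("c", "0"));
        System.out.println(apply(filter, examples)); // [[a, 10.0], [c, 0]]
    }
}
```

The same rule written as a condition would be a one-line predicate; a full filter class mainly pays off when the removal logic does not fit the condition model.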

Available filters


ConditionFilter


  • If the condition is satisfied (returns true): remove the example or sequence
  • If the condition is not satisfied (returns false): keep the example or sequence
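The remove-when-true contract described above is the same one `java.util.Collection.removeIf` uses, so it can be sketched in plain Java as follows. The class and method names are hypothetical, and treating a sequence as matching when any one of its steps satisfies the condition is an assumption made for illustration; DataVec's own handling of sequences may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class ConditionSemanticsSketch {

    // Condition true -> the example is removed; false -> it is kept.
    // This matches the contract of java.util.Collection.removeIf.
    static <T> List<T> filterExamples(List<T> examples, Predicate<? super T> condition) {
        List<T> result = new ArrayList<>(examples);
        result.removeIf(condition);
        return result;
    }

    // Assumption for this sketch: a sequence is removed when any of its steps satisfies the condition
    static <T> List<List<T>> filterSequences(List<List<T>> sequences, Predicate<? super T> condition) {
        List<List<T>> result = new ArrayList<>(sequences);
        result.removeIf(sequence -> sequence.stream().anyMatch(condition));
        return result;
    }

    public static void main(String[] args) {
        Predicate<Integer> isNegative = x -> x < 0;
        System.out.println(filterExamples(Arrays.asList(3, -1, 7), isNegative)); // [3, 7]
        System.out.println(filterSequences(
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(4, -5)), isNegative)); // [[1, 2]]
    }
}
```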

removeExample
  public boolean removeExample(Object writables)
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
  public boolean removeSequence(Object sequence)
  • param sequence sequence example
  • return true if example should be removed, false to keep
transform
  public Schema transform(Schema inputSchema)

Get the output schema for this transformation, given an input schema.

  • param inputSchema the input schema
outputColumnName
  public String outputColumnName()

The output column name after the operation has been applied.

  • return the output column name
columnName
  public String columnName()

The output column names. This will often be the same as the input.

  • return the output column names

Filter


Filter: an interface for removing examples (or sequences) according to some condition.


FilterInvalidValues


FilterInvalidValues: a filter operation that removes any example (or sequence) containing invalid values in any of a specified set of columns. Invalid values are determined with respect to the schema.
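To illustrate the idea, here is a self-contained sketch in which a hypothetical schema is just a list of column types, and "invalid" means the value cannot be parsed as that type (NaN and infinite doubles are also rejected). These validity rules, the class name, and the any-step rule for sequences are assumptions of the sketch; DataVec itself derives validity from the schema, as described above.

```java
import java.util.Arrays;
import java.util.List;

public class InvalidValuesSketch {

    enum ColumnType { INTEGER, DOUBLE, STRING }

    // Hypothetical validity rules standing in for the checks a real schema would perform
    static boolean isValid(ColumnType type, String value) {
        try {
            switch (type) {
                case INTEGER:
                    Integer.parseInt(value);
                    return true;
                case DOUBLE:
                    double d = Double.parseDouble(value);
                    return !Double.isNaN(d) && !Double.isInfinite(d);
                default:
                    return true; // any string counts as valid in this sketch
            }
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Remove the example if any of the checked columns holds an invalid value
    static boolean removeExample(List<ColumnType> schema, int[] columnsToCheck, List<String> example) {
        for (int col : columnsToCheck) {
            if (!isValid(schema.get(col), example.get(col))) {
                return true;
            }
        }
        return false;
    }

    // Assumption: a sequence is removed if any step contains an invalid value
    static boolean removeSequence(List<ColumnType> schema, int[] columnsToCheck, List<List<String>> sequence) {
        for (List<String> step : sequence) {
            if (removeExample(schema, columnsToCheck, step)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<ColumnType> schema = Arrays.asList(ColumnType.STRING, ColumnType.INTEGER, ColumnType.DOUBLE);
        int[] columnsToCheck = {1, 2};
        System.out.println(removeExample(schema, columnsToCheck, Arrays.asList("alice", "42", "3.5")));  // false
        System.out.println(removeExample(schema, columnsToCheck, Arrays.asList("bob", "forty", "1.0"))); // true
    }
}
```

Only the listed columns are checked, so a malformed value in a column outside `columnsToCheck` does not cause removal.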

transform
  public Schema transform(Schema inputSchema)
  • param columnsToFilterIfInvalid Columns to check for invalid values (given when the filter is constructed)
removeExample
  public boolean removeExample(Object writables)
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
  public boolean removeSequence(Object sequence)
  • param sequence sequence example
  • return true if example should be removed, false to keep
outputColumnName
  public String outputColumnName()

The output column name after the operation has been applied.

  • return the output column name
columnName
  public String columnName()

The output column names. This will often be the same as the input.

  • return the output column names

InvalidNumColumns


Removes invalid records, i.e. records whose number of columns does not match the number of columns in the input schema.
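Reading "invalid records of a certain size" as records whose width does not match the input schema, the check can be sketched in a few lines. Here the expected column count is passed in directly rather than taken from a `Schema`, the class name is hypothetical, and the rule that a sequence is dropped when any step has the wrong width is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.List;

public class InvalidNumColumnsSketch {
    // Expected number of columns; assumed here to stand in for the input schema's column count
    private final int expectedColumns;

    InvalidNumColumnsSketch(int expectedColumns) {
        this.expectedColumns = expectedColumns;
    }

    // Remove the example if its width differs from the expected column count
    boolean removeExample(List<?> example) {
        return example.size() != expectedColumns;
    }

    // Assumption: drop the whole sequence if any step has the wrong number of columns
    boolean removeSequence(List<? extends List<?>> sequence) {
        for (List<?> step : sequence) {
            if (removeExample(step)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InvalidNumColumnsSketch filter = new InvalidNumColumnsSketch(3);
        System.out.println(filter.removeExample(Arrays.asList("a", "b", "c"))); // false
        System.out.println(filter.removeExample(Arrays.asList("a", "b")));      // true
    }
}
```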

removeExample
  public boolean removeExample(Object writables)
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
  public boolean removeSequence(Object sequence)
  • param sequence sequence example
  • return true if example should be removed, false to keep
removeExample
  public boolean removeExample(List<Writable> writables)
  • param writables Example
  • return true if example should be removed, false to keep
removeSequence
  public boolean removeSequence(List<List<Writable>> sequence)
  • param sequence sequence example
  • return true if example should be removed, false to keep
transform
  public Schema transform(Schema inputSchema)

Get the output schema for this transformation, given an input schema.

  • param inputSchema the input schema
outputColumnName
  public String outputColumnName()

The output column name after the operation has been applied.

  • return the output column name
columnName
  public String columnName()

The output column names. This will often be the same as the input.

  • return the output column names