Transforming a Leap Frame

Transformers are a useful abstraction for computing data in a data frame, whether it is a leap frame in MLeap or a Spark data frame. Let’s see how to transform a data frame using a simple StringIndexer transformer.

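    // Imports assume MLeap's NodeShape-based runtime API; adjust to your version.
    // leapFrame is assumed to be defined already.
    import ml.combust.mleap.core.feature.StringIndexerModel
    import ml.combust.mleap.core.types.NodeShape
    import ml.combust.mleap.runtime.transformer.feature.StringIndexer

    // Create a StringIndexer that knows how to index the two strings
    // in our leap frame
    val stringIndexer = StringIndexer(
      shape = NodeShape().withStandardInput("a_string").withStandardOutput("a_string_index"),
      model = StringIndexerModel(Seq("Hello, MLeap!", "Another row")))

    // Transform our leap frame using the StringIndexer transformer,
    // then select the new index column and read it out as doubles
    val indices = (for (lf <- stringIndexer.transform(leapFrame);
                        lf2 <- lf.select("a_string_index")) yield {
      lf2.dataset.map(_.getDouble(0))
    }).get.toSeq

    // Make sure our indexer did its job
    assert(indices == Seq(0.0, 1.0))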
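To make the indexing step concrete, here is a minimal plain-Scala sketch with no MLeap dependency. It assumes, as the assertion above suggests, that each label maps to its position in the label sequence. The object name `StringIndexSketch` and its members are hypothetical and are not part of MLeap; the real transformer may treat unseen labels differently.

```scala
// Plain-Scala sketch of string indexing; illustrative only, not MLeap's code.
object StringIndexSketch {
  // Labels in the order given to the model; a label's position is its index.
  val labels: Seq[String] = Seq("Hello, MLeap!", "Another row")

  // Lookup table from each label to its position, stored as a Double.
  val labelToIndex: Map[String, Double] =
    labels.zipWithIndex.map { case (label, i) => label -> i.toDouble }.toMap

  // Index a column of strings. In this sketch an unseen label throws
  // NoSuchElementException.
  def index(column: Seq[String]): Seq[Double] = column.map(labelToIndex)

  def main(args: Array[String]): Unit = {
    val indices = index(Seq("Hello, MLeap!", "Another row"))
    // Same check as the leap frame example
    assert(indices == Seq(0.0, 1.0))
    println(indices.mkString(", "))
  }
}
```

The lookup table is built once and reused for every row, which is the general shape of a fitted indexer: the labels are fixed at training time and only applied at transform time.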

Transforming a Leap Frame with a Pipeline

The above example isn’t very interesting. The real power of data frames and transformers comes when you build entire pipelines out of them, going all the way from raw features to some sort of predictive algorithm. Let’s create a dummy pipeline that takes our indices from the string indexer, runs them through a one hot encoder, then executes a linear regression.

    // Imports assume MLeap's NodeShape-based runtime API; adjust to your version
    import ml.combust.mleap.core.feature.{OneHotEncoderModel, VectorAssemblerModel}
    import ml.combust.mleap.core.regression.LinearRegressionModel
    import ml.combust.mleap.core.types.{NodeShape, ScalarShape, TensorShape}
    import ml.combust.mleap.runtime.transformer.{Pipeline, PipelineModel}
    import ml.combust.mleap.runtime.transformer.feature.{OneHotEncoder, VectorAssembler}
    import ml.combust.mleap.runtime.transformer.regression.LinearRegression
    import org.apache.spark.ml.linalg.Vectors

    // Create our one hot encoder
    val oneHotEncoder = OneHotEncoder(
      shape = NodeShape.vector(1, 2,
        inputCol = "a_string_index",
        outputCol = "a_string_oh"),
      model = OneHotEncoderModel(2, dropLast = false))

    // Assemble some features together for use
    // by our linear regression
    val featureAssembler = VectorAssembler(
      shape = NodeShape().withInput("input0", "a_string_oh").
        withInput("input1", "a_double").withStandardOutput("features"),
      model = VectorAssemblerModel(Seq(TensorShape(2), ScalarShape())))

    // Create our linear regression
    // It has three coefficients: two for the one hot encoder's
    // size-2 vector and one for the a_double column
    val linearRegression = LinearRegression(
      shape = NodeShape.regression(3),
      model = LinearRegressionModel(Vectors.dense(2.0, 3.0, 6.0), 23.5))

    // Create a pipeline from all of our transformers
    val pipeline = Pipeline(
      shape = NodeShape(),
      model = PipelineModel(Seq(stringIndexer, oneHotEncoder, featureAssembler, linearRegression)))

    // Transform our leap frame using the pipeline
    val predictions = (for (lf <- pipeline.transform(leapFrame);
                            lf2 <- lf.select("prediction")) yield {
      lf2.dataset.map(_.getDouble(0))
    }).get.toSeq

    // Print our predictions
    // > 365.70000000000005
    // > 166.89999999999998
    println(predictions.mkString("\n"))
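The printed predictions can be checked by hand. The sketch below redoes the pipeline's arithmetic in plain Scala, without MLeap: index the string, one hot encode the index, append `a_double`, then apply the coefficients and intercept from the example. The `a_double` values 56.7 and 23.4 are an assumption, since the leap frame's contents are not shown in this section; they are the values consistent with the printed output. The object name `PipelineByHand` and its helpers are hypothetical.

```scala
// Plain-Scala re-computation of the example pipeline; illustrative only.
object PipelineByHand {
  // Labels, coefficients and intercept copied from the example above
  val labels: Seq[String] = Seq("Hello, MLeap!", "Another row")
  val coefficients: Seq[Double] = Seq(2.0, 3.0, 6.0)
  val intercept: Double = 23.5

  // One hot encode an index into a vector of the given size (no dropLast)
  def oneHot(index: Int, size: Int): Seq[Double] =
    Seq.tabulate(size)(i => if (i == index) 1.0 else 0.0)

  // Index -> one hot -> assemble with aDouble -> linear regression
  def predict(aString: String, aDouble: Double): Double = {
    val index = labels.indexOf(aString)
    require(index >= 0, s"unknown label: $aString")
    val features = oneHot(index, labels.size) :+ aDouble
    features.zip(coefficients).map { case (x, w) => x * w }.sum + intercept
  }

  def main(args: Array[String]): Unit = {
    // ASSUMED input rows: the a_double values are inferred, not from the source
    val rows = Seq(("Hello, MLeap!", 56.7), ("Another row", 23.4))
    val predictions = rows.map { case (s, d) => predict(s, d) }
    // Compare with a tolerance, since floating-point sums depend on ordering
    assert(math.abs(predictions(0) - 365.7) < 1e-9)
    assert(math.abs(predictions(1) - 166.9) < 1e-9)
    println(predictions.mkString("\n"))
  }
}
```

For the first row the features are (1, 0, 56.7), so the prediction is 2.0 + 6.0 × 56.7 + 23.5 = 365.7; for the second they are (0, 1, 23.4), giving 3.0 + 6.0 × 23.4 + 23.5 = 166.9.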

This is the task that MLeap was meant for: executing machine learning pipelines that were trained in Spark, PySpark, Scikit-learn or TensorFlow.