Serializing with MLeap

Serializing and deserializing with MLeap is a simple task. You can choose to serialize to a directory on the file system or to a zip file that can easily be shipped around.

Create a Simple MLeap Pipeline

    import ml.combust.bundle.BundleFile
    import ml.combust.bundle.serializer.SerializationFormat
    import ml.combust.mleap.core.feature.{OneHotEncoderModel, StringIndexerModel, VectorAssemblerModel}
    import ml.combust.mleap.core.regression.LinearRegressionModel
    // NodeShape, TensorShape and ScalarShape are assumed to live in core.types,
    // and PipelineModel next to Pipeline; adjust if your MLeap version differs
    import ml.combust.mleap.core.types.{NodeShape, ScalarShape, TensorShape}
    import ml.combust.mleap.runtime.transformer.{Pipeline, PipelineModel}
    import ml.combust.mleap.runtime.transformer.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
    import ml.combust.mleap.runtime.transformer.regression.LinearRegression
    import org.apache.spark.ml.linalg.Vectors
    import ml.combust.mleap.runtime.MleapSupport._
    import resource._

    // Create a sample pipeline that we will serialize
    // and then deserialize using various formats

    // Map the string column to a numeric index
    val stringIndexer = StringIndexer(
      shape = NodeShape.scalar(inputCol = "a_string", outputCol = "a_string_index"),
      model = StringIndexerModel(Seq("Hello, MLeap!", "Another row")))

    // One-hot encode the index into a vector of size 2
    val oneHotEncoder = OneHotEncoder(
      shape = NodeShape.vector(1, 2, inputCol = "a_string_index", outputCol = "a_string_oh"),
      model = OneHotEncoderModel(2, dropLast = false))

    // Assemble the one-hot vector and the a_double column into one feature vector
    val featureAssembler = VectorAssembler(
      shape = NodeShape().withInput("input0", "a_string_oh").
        withInput("input1", "a_double").withStandardOutput("features"),
      model = VectorAssemblerModel(Seq(TensorShape(2), ScalarShape())))

    // Linear regression over the 3 assembled features
    val linearRegression = LinearRegression(
      shape = NodeShape.regression(3),
      model = LinearRegressionModel(Vectors.dense(2.0, 3.0, 6.0), 23.5))

    // Chain the transformers into a single pipeline
    val pipeline = Pipeline(
      shape = NodeShape(),
      model = PipelineModel(Seq(stringIndexer, oneHotEncoder, featureAssembler, linearRegression)))

Serialize to Zip File

To serialize to a zip file, make sure the URI begins with jar:file and ends with .zip.

For example, jar:file:/tmp/mleap-bundle.zip.
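
The jar:file form has the same shape as the URIs accepted by the JDK's zip file system provider. As a rough sketch of what such a URI denotes, independent of MLeap, the snippet below creates a zip archive through java.nio. The temporary directory, the demo.zip name, and the hello.txt entry are made up for illustration, and whether BundleFile relies on this provider internally is an assumption rather than something this page states.

```scala
import java.net.URI
import java.nio.charset.StandardCharsets
import java.nio.file.{FileSystems, Files}
import java.util.{HashMap => JHashMap}

// Build a jar:file URI for a zip inside a fresh temp directory
// (hypothetical location, chosen only for this sketch)
val tmpDir = Files.createTempDirectory("zip-uri-demo")
val zipUri = URI.create("jar:" + tmpDir.resolve("demo.zip").toUri)

// "create" -> "true" asks the zip provider to create the archive if it is missing
val env = new JHashMap[String, String]()
env.put("create", "true")

// Open the archive as a file system, write one entry, then close it
val zipFs = FileSystems.newFileSystem(zipUri, env)
try {
  Files.write(zipFs.getPath("/hello.txt"), "hello".getBytes(StandardCharsets.UTF_8))
} finally {
  zipFs.close() // closing writes the archive to disk
}
```

Note that the parent directory of the zip file has to exist before the archive is opened, which is why the sketch creates it first.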

JSON Format

    for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
    }

Protobuf Format

    for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-protobuf.zip"))) {
      pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
    }

Serialize to Directory

To serialize to a directory, make sure the URI begins with file.

For example, file:/tmp/mleap-bundle-dir.
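
The two URI rules in this section can be written down as a small check. The sketch below is a hypothetical helper, not part of MLeap: the BundleLocation type and the classify function are names invented here, and the check only encodes the prefix and suffix rules stated above.

```scala
// Hypothetical types for this sketch only; MLeap does not define them
sealed trait BundleLocation
case object ZipLocation extends BundleLocation
case object DirectoryLocation extends BundleLocation

// Classify a bundle URI using the rules from this section:
//   zip file:  begins with jar:file and ends with .zip
//   directory: begins with file
def classify(uri: String): Option[BundleLocation] = {
  if (uri.startsWith("jar:file:") && uri.endsWith(".zip")) Some(ZipLocation)
  else if (uri.startsWith("file:")) Some(DirectoryLocation)
  else None // neither rule matched, e.g. a bare path without a scheme
}

val zipKind = classify("jar:file:/tmp/mleap-bundle.zip")
val dirKind = classify("file:/tmp/mleap-bundle-dir")
val unknown = classify("/tmp/mleap-bundle-dir")
```

A check like this could run before calling BundleFile, so that a bare path such as /tmp/mleap-bundle-dir is rejected early instead of failing later.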

JSON Format

    for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) {
      pipeline.writeBundle.format(SerializationFormat.Json).save(bundle)
    }

Protobuf Format

    for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-protobuf-dir"))) {
      pipeline.writeBundle.format(SerializationFormat.Protobuf).save(bundle)
    }

Deserializing

Deserializing is just as easy as serializing. You don’t need to know beforehand which format the MLeap Bundle was serialized in; you only need to know where the bundle is.

Zip Bundle

    // Deserialize a zip bundle
    // Use Scala ARM to make sure resources are managed properly
    val zipBundle = (for(bundle <- managed(BundleFile("jar:file:/tmp/mleap-examples/simple-json.zip"))) yield {
      bundle.loadMleapBundle().get
    }).opt.get

Directory Bundle

    // Deserialize a directory bundle
    // Use Scala ARM to make sure resources are managed properly
    val dirBundle = (for(bundle <- managed(BundleFile("file:/tmp/mleap-examples/simple-json-dir"))) yield {
      bundle.loadMleapBundle().get
    }).opt.get
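
Because the format does not have to be specified when loading, a single helper can in principle read any of the bundles written earlier. The sketch below reuses only the calls shown in this section. The loadRoot name is made up for illustration, and the root field used to reach the deserialized pipeline is an assumption about the loaded bundle object that should be checked against your MLeap version.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.runtime.MleapSupport._
import resource._

// Hypothetical helper: load the bundle at `uri` and return its root transformer.
// `.root` is assumed to expose the deserialized pipeline.
def loadRoot(uri: String) = (for(bundle <- managed(BundleFile(uri))) yield {
  bundle.loadMleapBundle().get.root
}).opt.get

// The same call is used for the protobuf bundles written earlier in this
// section; nothing here names the serialization format.
val fromProtobufZip = loadRoot("jar:file:/tmp/mleap-examples/simple-protobuf.zip")
val fromProtobufDir = loadRoot("file:/tmp/mleap-examples/simple-protobuf-dir")
```

Like the examples above, this calls .opt.get, which throws if loading fails. Code that should handle a missing or corrupt bundle gracefully would inspect the Option instead.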