Getting Started with Spark

MLeap Spark integration provides serialization of Spark-trained ML pipelines to MLeap Bundles. MLeap also provides several extensions to Spark, including enhanced one-hot encoding, one-vs-rest models, and unary/binary math transformations.
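To make the serialization workflow concrete, here is a minimal sketch of writing a fitted Spark pipeline to an MLeap Bundle, assuming the 0.14.0 API. The `pipeline` (a fitted `PipelineModel`), the training `DataFrame` `df`, and the output path are hypothetical placeholders.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.bundle.SparkBundleContext
import resource._

// Attach a transformed dataset so the bundle captures the pipeline's output schema.
// `pipeline` and `df` are assumed to exist already.
val context = SparkBundleContext().withDataset(pipeline.transform(df))

// Serialize the fitted pipeline to a zip-based MLeap Bundle on disk.
for (bundle <- managed(BundleFile("jar:file:/tmp/simple-spark-pipeline.zip"))) {
  pipeline.writeBundle.save(bundle)(context).get
}
```

The saved bundle can later be loaded and scored by the MLeap runtime without a SparkContext, which is the main motivation for serializing pipelines this way.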

Adding MLeap Spark to Your Project

MLeap Spark and its snapshots are hosted on Maven Central, so they should be easily accessible via a Maven build file or SBT. MLeap is currently cross-compiled for Scala versions 2.10 and 2.11. We try to maintain Scala compatibility with Spark.

Using SBT

libraryDependencies += "ml.combust.mleap" %% "mleap-spark" % "0.14.0"

To use MLeap extensions to Spark:

libraryDependencies += "ml.combust.mleap" %% "mleap-spark-extension" % "0.14.0"

Using Maven

<dependency>
  <groupId>ml.combust.mleap</groupId>
  <artifactId>mleap-spark_2.11</artifactId>
  <version>0.14.0</version>
</dependency>

To use MLeap extensions to Spark:

<dependency>
  <groupId>ml.combust.mleap</groupId>
  <artifactId>mleap-spark-extension_2.11</artifactId>
  <version>0.14.0</version>
</dependency>
  1. See the build instructions to build MLeap from source.
  2. See core concepts for an overview of ML pipelines.
  3. See the Spark documentation to learn how to train ML pipelines in Spark.
  4. See the demo notebooks to learn how to use MLeap with PySpark to serialize your pipelines to Bundle.ML and score with MLeap.