MLeap Troubleshooting

Must provide a sample dataset for the X transformer

This error occurs because you are trying to serialize a Spark transformer that relies on metadata available in the Spark DataFrame. To serialize it properly, MLeap needs access to that metadata so we can store all of the necessary values in the MLeap Bundle. The solution is to provide a sample DataFrame that has been transformed by your Spark ML Pipeline.

Fixed Code

  import ml.combust.bundle.BundleFile
  import ml.combust.mleap.spark.SparkSupport._
  import org.apache.spark.ml.bundle.SparkBundleContext
  import resource._
  // Use your Spark ML Pipeline to transform the Spark DataFrame
  val transformedDataset = sparkTransformer.transform(sparkDataset)
  // Create a custom SparkBundleContext and provide the transformed DataFrame
  implicit val sbc: SparkBundleContext = SparkBundleContext().withDataset(transformedDataset)
  // Serialize the pipeline as you would normally
  (for (bf <- managed(BundleFile(file))) yield {
    sparkTransformer.writeBundle.save(bf).get
  }).tried.get
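For a self-contained illustration, the sketch below fits a small pipeline, transforms the training data, and attaches the result to the SparkBundleContext before serializing. It is a minimal sketch under assumptions: VectorAssembler is used because, as far as we know, its MLeap serializer reads input shapes from the sample dataset (a typical trigger for this error); the column names, data, and output path `/tmp/mleap-sample-dataset.zip` are hypothetical; and it assumes an MLeap version that uses scala-arm's `managed`, as the fixed code does.

```scala
import ml.combust.bundle.BundleFile
import ml.combust.mleap.spark.SparkSupport._
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.bundle.SparkBundleContext
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession
import resource._

object SampleDatasetExample {
  def main(args: Array[String]): Unit = {
    // Local Spark session, just for this sketch
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("mleap-sample-dataset")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical training data: one categorical and two numeric columns
    val df = Seq(("a", 1.0, 2.0), ("b", 3.0, 4.0), ("a", 5.0, 6.0))
      .toDF("cat", "x1", "x2")

    // StringIndexer output feeds a VectorAssembler (assumed to need the sample dataset)
    val indexer = new StringIndexer().setInputCol("cat").setOutputCol("cat_idx")
    val assembler = new VectorAssembler()
      .setInputCols(Array("cat_idx", "x1", "x2"))
      .setOutputCol("features")
    val model = new Pipeline().setStages(Array(indexer, assembler)).fit(df)

    // Transform first so the DataFrame carries the column metadata,
    // then hand that DataFrame to MLeap through the bundle context
    val transformed = model.transform(df)
    implicit val sbc: SparkBundleContext = SparkBundleContext().withDataset(transformed)

    // Hypothetical output location for the zipped bundle
    (for (bf <- managed(BundleFile("jar:file:/tmp/mleap-sample-dataset.zip"))) yield {
      model.writeBundle.save(bf).get
    }).tried.get

    spark.stop()
  }
}
```

Leaving out the `withDataset(transformed)` call here should reproduce the "Must provide a sample dataset" error for the assembler stage; the transformed DataFrame is used rather than the raw input because the metadata MLeap needs is attached to the columns the pipeline produces.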