Introduction to Pipeline
- Preparation
- Run Pipeline

Introduction to Pipeline

In Pipcook, a Pipeline represents the training process of a model. So what kind of pipeline is needed to train a model? A developer can use a JSON file to describe the whole modeling pipeline, from sample collection and model definition through training to model evaluation:

{
  "plugins": {
    "dataCollect": {
      "package": "@pipcook/plugins-csv-data-collect",
      "params": {
        "url": "http://foobar"
      }
    },
    "dataAccess": {
      "package": "@pipcook/plugins-csv-data-access",
      "params": {
        "labelColumn": "output"
      }
    },
    "modelDefine": {
      "package": "@pipcook/plugins-bayesian-model-define"
    },
    "modelTrain": {
      "package": "@pipcook/plugins-bayesian-model-train"
    },
    "modelEvaluate": {
      "package": "@pipcook/plugins-bayesian-model-evaluate"
    }
  }
}

As shown above, a pipeline is composed of different plugins, and each plugin accepts a params field for passing it parameters. The pipeline interpreter then performs the corresponding operations according to each plugin's type and parameters.
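To make the structure of the configuration concrete, here is a minimal sketch of how it could be typed and walked in TypeScript. The type names, the `STAGE_ORDER` list, and `describePipeline` are hypothetical helpers written for illustration; they are not part of Pipcook's API, and the stage order is an assumption that mirrors the order of the example above.

```typescript
// Hypothetical types for illustration; not Pipcook's own type definitions.
interface PluginConfig {
  package: string;                    // npm package name of the plugin
  params?: Record<string, unknown>;   // optional parameters passed to the plugin
}

interface PipelineConfig {
  plugins: Record<string, PluginConfig>;
}

// Assumption: stages run in the order they appear in the example above.
const STAGE_ORDER: string[] = [
  "dataCollect",
  "dataAccess",
  "modelDefine",
  "modelTrain",
  "modelEvaluate",
];

// Return one line per configured stage, in stage order, skipping absent stages.
function describePipeline(config: PipelineConfig): string[] {
  const out: string[] = [];
  for (const stage of STAGE_ORDER) {
    const plugin = config.plugins[stage];
    if (plugin === undefined) {
      continue;
    }
    out.push(`${stage}: ${plugin.package} ${JSON.stringify(plugin.params ?? {})}`);
  }
  return out;
}

// A partial pipeline, declared out of order on purpose.
const example: PipelineConfig = {
  plugins: {
    modelDefine: { package: "@pipcook/plugins-bayesian-model-define" },
    dataCollect: {
      package: "@pipcook/plugins-csv-data-collect",
      params: { url: "http://foobar" },
    },
  },
};

const lines: string[] = describePipeline(example);
```

The key order inside the JSON does not matter in this sketch: the stage name alone decides when a plugin is visited, and a plugin without params is treated as having an empty parameter object.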

See Introduction to Plugin for more details about plugins.

Once such a pipeline is defined, we can run it through Pipcook.

Preparation

Follow the Pipcook Tools Initialization guide to get Pipcook ready.

Run Pipeline

Save the above JSON of your pipeline anywhere, and run:

$ pipcook run /path/to/your/pipeline-config.json

Once training finishes, an output directory is generated under the current working directory, cwd(3):

📂output
   ┣ 📂logs
   ┣ 📂model
   ┣ 📜package.json
   ┣ 📜metadata.json
   ┗ 📜index.js
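As a quick sanity check before using the result, the layout above can be compared against an actual directory listing. The sketch below is a hypothetical helper, not something Pipcook provides; the expected entry names are copied from the tree above, and the listing is passed in as a plain array so the function stays independent of any file system API.

```typescript
// Entry names copied from the directory tree above.
const EXPECTED_ENTRIES: string[] = [
  "logs",
  "model",
  "package.json",
  "metadata.json",
  "index.js",
];

// Hypothetical helper: return the expected entries that are absent from a listing.
function missingEntries(listing: string[]): string[] {
  return EXPECTED_ENTRIES.filter((name) => listing.indexOf(name) === -1);
}

// Example listings: one complete, one where the model directory is missing.
const complete: string[] = missingEntries([
  "index.js",
  "logs",
  "metadata.json",
  "model",
  "package.json",
]);
const incomplete: string[] = missingEntries(["logs", "package.json", "index.js"]);
```

In a Node.js script, the listing could come from `fs.readdirSync("output")`; an empty result means every entry shown in the tree is present.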

To get started with your trained model, follow the steps below:

$ npm install

This installs the dependencies, which include the plugins and Python packages. Pipcook can also use the tuna mirror when it downloads Python and the packages:

$ BOA_TUNA=1 npm install

Once the output is initialized, import it as follows:

import * as predict from './output';
predict('your input data');