Nuclio functions
Nuclio - High performance serverless for data processing and ML
Out of date
This guide contains outdated information pertaining to Kubeflow 1.0. This guide needs to be updated for Kubeflow 1.1.
Nuclio Overview
Nuclio is a high-performance serverless platform that runs on Docker or Kubernetes and automates the development, operation, and scaling of code (written in 8 supported languages). Nuclio focuses on data analytics and ML workloads: it provides extreme performance and parallelism, and supports stateful and data-intensive workloads, GPU resource optimization, check-pointing, and 14 native triggers/streaming protocols out of the box, including HTTP, Cron, batch, Kafka, Kinesis, Google pub/sub, Azure event-hub, and MQTT. Additional triggers can be added dynamically (e.g. Twitter feed).
Nuclio can run in the cloud as a managed offering, or on any Kubernetes cluster (cloud, on-prem, or edge).
read more about nuclio …
Using Nuclio In Data Science Pipelines
Nuclio functions can be used in the following ML pipeline tasks:
- Data collectors, ETL, stream processing
- Data preparation and analysis
- Hyperparameter model training
- Real-time model serving
- Feature vector assembly (real-time data preparation)
Containerized functions (with their dependent files and spec) can be created directly from a Jupyter Notebook using %nuclio magic commands or SDK API calls (see nuclio-jupyter). They can also be built and deployed using Kubeflow Pipelines (see: nuclio pipeline components), for example to deploy or update inference functions right after an ML model is updated.
Installing Nuclio over Kubernetes
The Nuclio GitHub repo contains detailed documentation on the installation and usage. You can also follow this interactive tutorial by O’Reilly Katacoda.
The simplest way to install Nuclio is with Helm. Assuming Helm is deployed on your cluster, run the following commands:
helm repo add nuclio https://nuclio.github.io/nuclio/charts
kubectl create ns nuclio
helm install nuclio nuclio/nuclio --namespace nuclio --set dashboard.nodePort=31000
kubectl -n nuclio get all
Browse to the dashboard URL, where you can create, test, and manage functions using a visual editor.
Note: you can change the NodePort number or skip that option for in-cluster use.
Writing and Deploying a Simple Function
The simplest way to write a Nuclio function is from within Jupyter. The entire notebook, portions of it, or code files can be turned into functions with a single magic/SDK command. See the SDK for detailed documentation.
The full notebook with the example below can be found here.
Before you begin, install the latest nuclio-jupyter package:
pip install --upgrade nuclio-jupyter
We write and test our code inside a notebook like any other data science code. We add %nuclio magic commands to describe additional configuration, such as which packages to install, CPU/memory/GPU resources, how the code is triggered (HTTP, cron, stream), environment variables, additional files to bundle (e.g. ML model, libraries), versioning, etc.
First, import the nuclio package. Note the ignore comment, which ensures that this line won’t be compiled later:
# nuclio: ignore
import nuclio
We add the function spec, environment, and configuration details using magic commands:
%nuclio cmd pip install textblob
%nuclio env TO_LANG=fr
%nuclio config spec.build.baseImage = "python:3.6-jessie"
We then write our code as usual, making sure there is a handler function that is invoked to initiate the run. The function accepts a context and an event, e.g. def handler(context, event).
Function code
The following example accepts text and performs NLP processing (correction, translation, sentiment analysis):
from textblob import TextBlob
import os

def handler(context, event):
    context.logger.info('This is an NLP example! ')

    # process and correct the text
    blob = TextBlob(str(event.body.decode('utf-8')))
    corrected = blob.correct()

    # debug print the text before and after correction
    context.logger.info_with("Corrected text", corrected=str(corrected), orig=str(blob))

    # calculate sentiments
    context.logger.info_with("Sentiment",
                             polarity=str(corrected.sentiment.polarity),
                             subjectivity=str(corrected.sentiment.subjectivity))

    # read target language from environment and return translated text
    lang = os.getenv('TO_LANG','fr')
    return str(corrected.translate(to=lang))
Now we can test the function using the built-in function context and examine its output:
# nuclio: ignore
event = nuclio.Event(body=b'good morning')
handler(context, event)
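To exercise a handler outside the notebook (for example in a plain unit test), one option is to pass in lightweight stand-ins for the context and event. The sketch below is only an illustration: StubLogger, make_context, and make_event are hypothetical helpers, not part of the nuclio SDK, and the toy handler stands in for the NLP handler so the example runs without textblob.

```python
from types import SimpleNamespace


class StubLogger:
    """Hypothetical stand-in for the context logger; it records calls in a list."""

    def __init__(self):
        self.records = []

    def info(self, message):
        self.records.append((message, {}))

    def info_with(self, message, **fields):
        self.records.append((message, fields))


def make_context():
    # Only the logger attribute used by the handler is provided here
    return SimpleNamespace(logger=StubLogger())


def make_event(body):
    # Only the body attribute used by the handler is provided here
    return SimpleNamespace(body=body)


def handler(context, event):
    # Toy handler used to exercise the stubs: it upper-cases the incoming text
    text = event.body.decode('utf-8')
    context.logger.info_with('Received text', length=len(text))
    return text.upper()


context = make_context()
result = handler(context, make_event(b'good morning'))
print(result)
print(context.logger.records)
```

Because the stubs record log calls, a test can check both the return value and what the handler logged.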
Finally, we deploy our function using the magic commands, the SDK, or a Kubeflow Pipeline. We can simply write and run the following command in a cell:
%nuclio deploy -n nlp -p ai -d <nuclio-dashboard-url>
The dashboard URL is http://cluster-ip:node-port, which you can see with kubectl -n nuclio get service nuclio-dashboard.
If you want more control, you can use the SDK:
# nuclio: ignore
import requests

# deploy the notebook code with extra configuration (env vars, config, etc.)
spec = nuclio.ConfigSpec(config={'spec.maxReplicas': 2}, env={'EXTRA_VAR': 'something'})
addr = nuclio.deploy_file(name='nlp', project='ai', verbose=True, spec=spec,
                          tag='v1.1', dashboard_url='<dashboard-url>')

# invoke the generated function
resp = requests.get('http://' + addr)
print(resp.text)
We can also deploy our function directly from Git:
addr = nuclio.deploy_file('git://github.com/nuclio/nuclio#master:/hack/examples/python/helloworld',
                          name='hw', project='myproj', dashboard_url='<dashboard-url>')
resp = requests.get('http://' + addr)
print(resp.text)
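The calls above send a GET request without a body. A handler that reads event.body, such as the NLP example, would likely need the text passed in the request. The following is a minimal sketch of one way to do that using only the Python standard library. It assumes addr has the host:port form used above. build_invoke_request and invoke are hypothetical helper names, and the address is a placeholder.

```python
import urllib.request


def build_invoke_request(addr, text):
    """Build a POST request that carries `text` as the request body."""
    return urllib.request.Request(
        url='http://' + addr,
        data=text.encode('utf-8'),
        headers={'Content-Type': 'text/plain'},
        method='POST',
    )


def invoke(addr, text, timeout=10):
    """Send the request and return the response body as a string."""
    request = build_invoke_request(addr, text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode('utf-8')


# Placeholder address; in practice use the value returned by the deploy call
request = build_invoke_request('203.0.113.10:31984', 'good morning')
print(request.get_method(), request.full_url)
```

Building the request separately from sending it lets you inspect the request without a running function. Calling invoke(addr, 'good morning') performs the actual HTTP call.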
Using Nuclio with Kubeflow Pipelines
We can deploy and test functions as part of a Kubeflow pipeline step. After installing Nuclio in your cluster (see the instructions above), you can run the following pipeline:
import kfp
from kfp import dsl

# load nuclio kubeflow components
nuclio_deploy = kfp.components.load_component(url='https://raw.githubusercontent.com/kubeflow/pipelines/master/components/nuclio/deploy/component.yaml')
nuclio_invoke = kfp.components.load_component(url='https://raw.githubusercontent.com/kubeflow/pipelines/master/components/nuclio/invoker/component.yaml')

@dsl.pipeline(
    name='Nuclio deploy and invoke demo',
    description='Nuclio demo, build/deploy a function from notebook + test the function rest endpoint'
)
def nuc_pipeline(
    txt='good morning',
):
    nb_path = 'https://raw.githubusercontent.com/nuclio/nuclio-jupyter/master/docs/nlp-example.ipynb'
    dashboard = 'http://nuclio-dashboard.nuclio.svc:8070'

    # build the function image & CRD from a notebook file (in the above URL)
    build = nuclio_deploy(url=nb_path, name='myfunc', project='myproj', tag='0.11', dashboard=dashboard)

    # test the function with real data (function URL is taken from the build output)
    test = nuclio_invoke(build.output, txt)
The code above assumes Nuclio was deployed into the nuclio namespace on the same cluster. When using a remote cluster or a different namespace, you only need to change the dashboard URL.
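The in-cluster dashboard address used in the pipeline follows the usual Kubernetes service DNS pattern of service name, then namespace, then svc. As a small sketch, a helper like the one below could build that URL for a different namespace. The function name dashboard_url is hypothetical, its defaults simply mirror the values in the pipeline above, and the mlops namespace is an invented example.

```python
def dashboard_url(namespace='nuclio', service='nuclio-dashboard', port=8070):
    """Return the in-cluster URL of the dashboard service (assumed naming)."""
    return f'http://{service}.{namespace}.svc:{port}'


# Default values reproduce the URL used in the pipeline
print(dashboard_url())

# Hypothetical example: Nuclio installed into a namespace called "mlops"
print(dashboard_url(namespace='mlops'))
```

For a remote cluster this pattern does not apply, and the externally reachable dashboard address would be needed instead.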
Refer to the nuclio pipeline components, which allow you to deploy, delete, or invoke functions.
Note: Nuclio is not limited to Python. This Jupyter notebook example shows how you can create a simple Bash function from a notebook. We can also create Go functions, for example, if we need performance/concurrency for our inference.
Nuclio function examples
Some useful function example Notebooks:
- Analyze Real-Time Data Using Spark Streaming, SQL, and ML
- Twitter Feed NLP
- Real-time Stock data reader
Last modified 08.10.2020: Fix minor documentation typos (#2256) (f391d15b)