Exercise 4: Operationalize ML Scoring with Azure ML and Data Factory

Duration: 20 mins

Synopsis: In this exercise, attendees will extend the Data Factory service to operationalize the scoring of data using the previously created ML model.

This exercise has 5 tasks:

Get out of Jail Free

If, for whatever reason, you cannot complete this lab, whether due to time constraints or an issue you are unable to troubleshoot, we have created a “get out of jail free” exercise. If you wish to use this exercise at any time, please proceed to Appendix B. Please note that using this exercise will skip all of the Azure Data Factory exercises. After completing Appendix B, you can continue to Exercise 5.

Task 1: Create Azure ML Linked Service

  1. Go back to the Azure Data Factory service blade.
  2. Click on Author and deploy in the Actions section.

    Screenshot

  3. Click on …More.

    Screenshot

  4. Click on New Compute.

    Screenshot

  5. Select Azure ML from the list.
  6. In the new window, be sure to change the JSON to match the following:

    • Back in Exercise 1, Task 9, you noted some values related to your ML web service. The value for mlEndpoint below is your web service’s Batch Requests URL (remember to remove the query string), and apiKey is the Primary Key of your web service.

      {
          "name": "AzureMLLinkedService",
          "properties": {
              "type": "AzureML",
              "description": "",
              "typeProperties": {
                  "mlEndpoint": "<Specify the batch scoring URL>",
                  "apiKey": "<Specify the published workspace model's API key>"
              }
          }
      }
  7. Click Deploy.
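
The query-string removal mentioned in step 6 can be sketched in Python. This is only an illustration; the URL below is a made-up example, not your actual Batch Requests URL:

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical Batch Requests URL (yours will differ):
raw = ("https://ussouthcentral.services.azureml.net/workspaces/abc123/"
       "services/def456/jobs?api-version=2.0")
print(strip_query(raw))
# https://ussouthcentral.services.azureml.net/workspaces/abc123/services/def456/jobs
```

The part after the `?` is dropped; everything before it is the value to paste into mlEndpoint.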

Task 2: Create Azure ML Input Dataset

  1. Click on …More.

    Screenshot

  2. To create a new dataset that will be copied into Azure Blob storage, click on New dataset at the top.

    Screenshot

  3. Select Azure Blob storage from the list.
  4. In the new window, be sure to change the JSON to match the following, or copy the JSON text below and paste it into the browser window.

    {
        "name": "AzureBlobDataInPut",
        "properties": {
            "type": "AzureBlob",
            "external": true,
            "linkedServiceName": "OutputLinkedService-AzureBlobStorage",
            "typeProperties": {
                "fileName": "FlightsAndWeather.csv",
                "folderPath": "sparkcontainer",
                "format": {
                    "type": "TextFormat"
                }
            },
            "availability": {
                "frequency": "Hour",
                "interval": 12
            }
        }
    }
  5. Click Deploy.

Task 3: Create Azure ML Scored Dataset

  1. Click on …More.

    Screenshot

  2. Click on New dataset.

    Screenshot

  3. Select Azure Blob storage from the list.
  4. In the new window, be sure to change the JSON to match the following, or copy the JSON text below and paste it into the browser window.

    {
        "name": "AzureBlobScoredDataOutPut",
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": "OutputLinkedService-AzureBlobStorage",
            "typeProperties": {
                "fileName": "Scored_FlightsAndWeather.csv",
                "folderPath": "sparkcontainer",
                "format": {
                    "type": "TextFormat"
                }
            },
            "availability": {
                "frequency": "Hour",
                "interval": 12
            }
        }
    }
  5. Click Deploy.

Task 4: Create Azure ML Predictive Pipeline

  1. Click on …More.

    Screenshot

  2. Click on New pipeline.

    Screenshot

  3. In the new window, be sure to change the JSON to match the following, or copy the JSON text below and paste it into the browser window.

    {
        "name": "PredictivePipeline",
        "properties": {
            "description": "Use AzureML model",
            "activities": [
                {
                    "type": "AzureMLBatchExecution",
                    "typeProperties": {
                        "webServiceInput": "AzureBlobDataInPut",
                        "webServiceOutputs": {
                            "output1": "AzureBlobScoredDataOutPut"
                        },
                        "globalParameters": {}
                    },
                    "inputs": [
                        {
                            "name": "AzureBlobDataInPut"
                        }
                    ],
                    "outputs": [
                        {
                            "name": "AzureBlobScoredDataOutPut"
                        }
                    ],
                    "policy": {
                        "timeout": "02:00:00",
                        "concurrency": 1,
                        "executionPriorityOrder": "NewestFirst",
                        "retry": 1
                    },
                    "name": "MLActivity",
                    "description": "prediction analysis on batch input",
                    "linkedServiceName": "AzureMLLinkedService"
                }
            ],
            "start": "2016-09-14T00:00:00Z",
            "end": "2016-09-15T00:00:00Z"
        }
    }
  4. Make sure to change start to today’s date and end to tomorrow’s date (today + 1 day).
  5. Click Deploy.
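
If you prefer to generate the start and end values rather than edit them by hand, a minimal Python sketch follows, assuming the UTC timestamp format shown in the pipeline JSON above:

```python
from datetime import datetime, timedelta, timezone

# Midnight UTC today, formatted the way the pipeline JSON expects.
today = datetime.now(timezone.utc).replace(hour=0, minute=0,
                                           second=0, microsecond=0)

fmt = "%Y-%m-%dT%H:%M:%SZ"
start = today.strftime(fmt)                       # e.g. "2016-09-14T00:00:00Z"
end = (today + timedelta(days=1)).strftime(fmt)   # e.g. "2016-09-15T00:00:00Z"
print(start, end)
```

Paste the two printed values into the pipeline's start and end properties before deploying.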

Task 5: Monitor Your Pipeline Activities

  1. Close the current blade by clicking on the X from the top right corner of the blade.
  2. Click on Monitor & Manage in the Actions section.
  3. Maximize the new window to see the diagram view of the data flow.

    Screenshot

  4. You should start to see activities with a Ready status listed at the bottom of the window.
  5. Close the Monitor & Manage browser tab.

Next Exercise: Exercise 5 - Summarize Data Using HDInsight Spark