Importing trained Spark MLlib models into Watson Machine Learning

If you have a Spark MLlib model that you trained outside of IBM Watson Machine Learning, this topic describes how to import that model into your Watson Machine Learning service.

 

Restrictions

  • Only classification and regression models are supported
  • Custom transformers, user-defined functions, and classes are not supported
  • See also: Supported frameworks

 

Example

For a demonstration of importing a Spark MLlib model, see the sample notebook referenced in Step 0 below.

 

Interface options

There are two options for importing trained Spark MLlib models:

  • Option 1: If you have saved your model in PMML format, see: Importing models saved in PMML format
  • Option 2: If you have saved your model using the save method of the model object, you can use the Watson Machine Learning Python client to import the model as described below

 

Step 0: Build and train a model, then save the model and training data

The following Python code snippet demonstrates:

  • Training a PipelineModel, pipeline_model_org
  • Saving the model in a directory called “tent-prediction-model”
  • Saving the training data in a file called “training-data.parquet”

# Define a pipeline with a single logistic regression stage
from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
lr = LogisticRegression( featuresCol="<features-column-name>", labelCol="<label-column-name>" )
pipeline_org = Pipeline( stages=[ lr ] )

# Train the model
pipeline_model_org = pipeline_org.fit( train_org )

# Save the model and the training data
pipeline_model_org.save( "tent-prediction-model" )
train_org.write.save( "training-data.parquet" )

Where:

  • train_org is a DataFrame object containing your labeled training data
  • <features-column-name> is the name of a feature vector column (for example, created using VectorAssembler)
  • <label-column-name> is the name of the label column (the column containing the known result) in your training data

For the full code example, see the sample notebook.

 

Step 1: Store the model in your Watson Machine Learning repository

The following example demonstrates loading the saved model and training data into memory, and then storing the PipelineModel in your Watson Machine Learning repository using the Watson Machine Learning Python client store_model method.

Example Python code

# Load the model and training data into memory
from pyspark.ml import Pipeline, PipelineModel
pipeline_model = PipelineModel.load( "tent-prediction-model" )
pipeline = Pipeline( stages = pipeline_model.stages )
train = spark.read.load( "training-data.parquet" )

# Store the model
from watson_machine_learning_client import WatsonMachineLearningAPIClient
client = WatsonMachineLearningAPIClient( <your-credentials> )
model_details = client.repository.store_model( pipeline_model, 'My Spark MLlib model', training_data=train, pipeline=pipeline )

Where:

  • <your-credentials> contains credentials for your Watson Machine Learning service (see: Looking up credentials)
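For illustration only, the credentials are passed as a Python dictionary. The exact keys depend on your service instance, and every value below is a hypothetical placeholder:

```python
# Illustrative only: one possible shape of a Watson Machine Learning
# credentials dictionary. All values are hypothetical placeholders;
# look up the real values for your service instance.
wml_credentials = {
    "url": "https://<wml-service-host>",
    "username": "<your-username>",
    "password": "<your-password>",
    "instance_id": "<your-instance-id>",
}

# The dictionary is passed directly to the client constructor:
# client = WatsonMachineLearningAPIClient( wml_credentials )
```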

All four parameters demonstrated are mandatory:

  1. The PipelineModel object
  2. A string containing a name you make up for the model (or a ModelMetaNames object specifying the name)
  3. The labeled training data you used to train the model
  4. The Pipeline of the PipelineModel object

 

Step 2: Deploy the stored model in your Watson Machine Learning service

The following example demonstrates deploying the stored model as a web service, which is the default deployment type:

model_id = model_details["metadata"]["guid"]
model_deployment_details = client.deployments.create( artifact_uid=model_id, name="My Spark MLlib model deployment" )

See: Deployments.create
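Once deployed, the web service can be sent scoring requests. The following is a hedged sketch only: the payload field name and values are hypothetical, and the commented-out calls assume the client object from Step 1 and a live deployment:

```python
# Hedged sketch: a scoring payload for the deployed web service.
# The field name and values are hypothetical; they must match the
# columns the model was trained on.
scoring_payload = {
    "fields": [ "<features-column-name>" ],
    "values": [ [ [ 34.0, 1.0 ] ] ],
}

# These calls require a live Watson Machine Learning deployment:
# scoring_url = client.deployments.get_scoring_url( model_deployment_details )
# predictions = client.deployments.score( scoring_url, scoring_payload )
```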