Importing trained Spark MLlib models into Watson Machine Learning
If you have a Spark MLlib model that you trained outside of IBM Watson Machine Learning, this topic describes how to import that model into your Watson Machine Learning service.
Restrictions
- Only classification and regression models are supported
- Custom transformers, user-defined functions, and classes are not supported
- See also: Supported frameworks
Example
A sample notebook demonstrates importing a Spark MLlib model.
Interface options
There are two options for importing trained Spark MLlib models:
- Option 1: If you have saved your model in PMML format, see: Importing models saved in PMML format
- Option 2: If you have saved your model using the `save` method of the model object, you can use the Watson Machine Learning Python client to import the model, as described below
Step 0: Build and train a model, then save the model and training data
The following Python code snippet demonstrates:
- Training a PipelineModel, `pipeline_model_org`
- Saving the model in a directory called "tent-prediction-model"
- Saving the training data in a file called "training-data.parquet"
from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
lr = LogisticRegression( featuresCol="<features-column-name>", labelCol="<label-column-name>" )
pipeline_org = Pipeline( stages=[ lr ] )
pipeline_model_org = pipeline_org.fit( train_org )
pipeline_model_org.save( "tent-prediction-model" )
train_org.write.save( "training-data.parquet" )
Where:
- `train_org` is a DataFrame object containing your labeled training data
- <features-column-name> is the name of a feature vector column (for example, created using VectorAssembler)
- <label-column-name> is the name of the label column (the column containing the known result) in your training data
For the full code example, see the sample notebook.
Step 1: Store the model in your Watson Machine Learning repository
The following example demonstrates loading the saved model and training data into memory, and then storing the PipelineModel in your Watson Machine Learning repository using the Watson Machine Learning Python client `store_model` method.
Example Python code
# Load the model and training data into memory
from pyspark.ml import Pipeline, PipelineModel
pipeline_model = PipelineModel.load( "tent-prediction-model" )
pipeline = Pipeline( stages = pipeline_model.stages )
train = spark.read.load( "training-data.parquet" )
# Store the model
from watson_machine_learning_client import WatsonMachineLearningAPIClient
client = WatsonMachineLearningAPIClient( <your-credentials> )
model_details = client.repository.store_model( pipeline_model, 'My Spark MLlib model', training_data=train, pipeline=pipeline )
Where:
- <your-credentials> contains credentials for your Watson Machine Learning service (see: Looking up credentials)
All four parameters demonstrated are mandatory:
- The PipelineModel object
- A string containing a name you make up for the model (or a `ModelMetaNames` object specifying the name)
- The labeled training data you used to train the model
- The Pipeline of the PipelineModel object
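The shape of the <your-credentials> dictionary depends on your service instance; copy the real values from the credentials page of your Watson Machine Learning service. The following is a hypothetical example with placeholder values:

```python
# Placeholder values shown for illustration; replace every entry
# with the credentials of your own service instance
wml_credentials = {
    "url"         : "https://us-south.ml.cloud.ibm.com",  # region endpoint (assumption)
    "username"    : "***",
    "password"    : "***",
    "instance_id" : "***",
}
```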
Step 2: Deploy the stored model in your Watson Machine Learning service
The following example demonstrates deploying the stored model as a web service, which is the default deployment type:
model_id = model_details["metadata"]["guid"]
model_deployment_details = client.deployments.create( artifact_uid=model_id, name="My Spark MLlib model deployment" )
See: Deployments.create
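Once the deployment completes, you can send new data to the scoring endpoint. The payload-building step below is a sketch, assuming hypothetical feature columns "AGE" and "INCOME"; the fields must match the columns the pipeline was trained on, and the client calls (shown commented out because they require a live service) are the Python client's scoring methods:

```python
# Hypothetical scoring payload; "fields" must match the training columns
scoring_payload = {
    "fields" : [ "AGE", "INCOME" ],
    "values" : [ [ 27, 48000 ], [ 45, 81000 ] ],
}

# With a live deployment, scoring would look like this:
# scoring_url = client.deployments.get_scoring_url( model_deployment_details )
# predictions = client.deployments.score( scoring_url, scoring_payload )
```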