ibm-watson-machine-learning
This notebook contains steps and code to train a scikit-learn model that uses a custom-defined transformer, and to use that model with the Watson Machine Learning service. Once the model is trained, the notebook shows how to persist the model and the custom transformer to the Watson Machine Learning Repository, then deploy and score the model using the Watson Machine Learning Python client.
In this notebook, we use the GNFUV dataset, which contains mobile sensor readings of humidity and temperature from Unmanned Surface Vehicles in a test-bed in Athens, to train a scikit-learn model that predicts the temperature.
Some familiarity with Python is helpful. This notebook uses Python & scikit-learn.
The learning goals of this notebook are:
- Training a scikit-learn model that uses a custom-defined transformer
- Persisting the model and the custom transformer library to the Watson Machine Learning Repository
- Deploying and scoring the model using the Watson Machine Learning Python client
Before you use the sample code in this notebook, you must perform the following setup tasks:
Authenticate the Watson Machine Learning service on IBM Cloud. You need to provide your platform `api_key` and instance `location`.
You can use the IBM Cloud CLI to retrieve the platform API key and instance location.
The API key can be generated in the following way:
ibmcloud login
ibmcloud iam api-key-create API_KEY_NAME
From the output, copy the value of `api_key`.
The location of your WML instance can be retrieved in the following way:
ibmcloud login --apikey API_KEY -a https://cloud.ibm.com
ibmcloud resource service-instance WML_INSTANCE_NAME
From the output, copy the value of `location`.
Tip: Your Cloud API key can be generated by going to the Users section of the Cloud console. From that page, click your name, scroll down to the API Keys section, and click Create an IBM Cloud API key. Give your key a name and click Create, then copy the created key and paste it below. You can also get a service-specific URL by going to the Endpoint URLs section of the Watson Machine Learning docs, and you can check your instance location in your Watson Machine Learning (WML) Service instance details.
You can also get a service-specific `apikey` by going to the Service IDs section of the Cloud console. From that page, click Create, then copy the created key and paste it below.
Action: Enter your `api_key` and `location` in the following cell.
api_key = 'PASTE YOUR PLATFORM API KEY HERE'
location = 'PASTE YOUR INSTANCE LOCATION HERE'
wml_credentials = {
"apikey": api_key,
"url": 'https://' + location + '.ml.cloud.ibm.com'
}
!pip install -U ibm-watson-machine-learning
from ibm_watson_machine_learning import APIClient
client = APIClient(wml_credentials)
First of all, you need to create a space that will be used for your work. If you do not have a space already created, you can use the Deployment Spaces Dashboard to create one. Once the space is created, copy its `space_id` and paste it below.
Tip: You can also use the SDK to prepare the space for your work. More information can be found here.
Action: Assign the space ID below.
space_id = 'PASTE YOUR SPACE ID HERE'
You can use the `list` method to print all existing spaces.
client.spaces.list(limit=10)
To be able to interact with all resources available in Watson Machine Learning, you need to set the space you will be using as the default space.
client.set.default_space(space_id)
The library `linalgnorm-0.1` is a Python distributable package that contains the implementation of a user-defined scikit-learn transformer, `LNormalizer`.
Any third-party libraries required by the custom transformer must be declared as dependencies of the library that contains the transformer's implementation.
In this section, we will create the library and install it in the current notebook environment.
!mkdir -p linalgnorm-0.1/linalg_norm
Define a custom scikit-learn transformer.
%%writefile linalgnorm-0.1/linalg_norm/sklearn_transformers.py
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np


class LNormalizer(BaseEstimator, TransformerMixin):
    def __init__(self, norm_ord=2):
        self.norm_ord = norm_ord
        self.row_norm_vals = None

    def fit(self, X, y=None):
        # column-wise norms of the training data (axis=0)
        self.row_norm_vals = np.linalg.norm(X, ord=self.norm_ord, axis=0)
        return self  # scikit-learn convention: fit returns self

    def transform(self, X, y=None):
        return X / self.row_norm_vals

    def fit_transform(self, X, y=None):
        self.fit(X, y)
        return self.transform(X, y)

    def get_norm_vals(self):
        return self.row_norm_vals
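To make the transformer's math concrete, here is the same column-norm computation on a tiny array (the values below are illustrative only):

```python
import numpy as np

# two columns; the L2 norm of column 0 is sqrt(3^2 + 4^2) = 5, of column 1 is 0.5
X = np.array([[3.0, 0.0],
              [4.0, 0.5]])
norms = np.linalg.norm(X, ord=2, axis=0)
X_scaled = X / norms  # what LNormalizer.transform() returns after fit(X)
print(norms)                                     # [5.  0.5]
print(np.linalg.norm(X_scaled, ord=2, axis=0))   # each column now has unit norm
```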
Wrap created code into Python source distribution package.
%%writefile linalgnorm-0.1/linalg_norm/__init__.py
__version__ = "0.1"
%%writefile linalgnorm-0.1/README.md
A simple library containing a simple custom scikit estimator.
%%writefile linalgnorm-0.1/setup.py
from setuptools import setup

VERSION = '0.1'

setup(name='linalgnorm',
      version=VERSION,
      url='https://github.ibm.com/NGP-TWC/repository/',
      author='IBM',
      author_email='[email protected]',
      license='IBM',
      packages=['linalg_norm'],
      zip_safe=False)
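The setup.py above declares no third-party dependencies. As noted earlier, any packages the custom transformer needs must be declared by the library itself; a sketch with a hypothetical `install_requires` entry follows (the pinned dependency is illustrative, not part of the original library):

```python
# hypothetical setup.py declaring a third-party dependency for the transformer
from setuptools import setup

setup(name='linalgnorm',
      version='0.1',
      packages=['linalg_norm'],
      # third-party packages the custom transformer depends on
      install_requires=['numpy'],
      zip_safe=False)
```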
%%bash
cd linalgnorm-0.1
python setup.py sdist --formats=zip
cd ..
mv linalgnorm-0.1/dist/linalgnorm-0.1.zip .
rm -rf linalgnorm-0.1
Install the created library package using the `pip` command.
!pip install linalgnorm-0.1.zip
Download the data from UCI repository - https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip
!rm -rf dataset
!mkdir dataset
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00452/GNFUV%20USV%20Dataset.zip --output-document=dataset/gnfuv_dataset.zip
!unzip dataset/gnfuv_dataset.zip -d dataset
Create a pandas dataframe from the downloaded dataset.
import json
import pandas as pd
import numpy as np
import os
from datetime import datetime
from json import JSONDecodeError

home_dir = './dataset'
pi_dirs = os.listdir(home_dir)
data_list = []
base_time = None
columns = None

for pi_dir in pi_dirs:
    if 'pi' not in pi_dir:
        continue
    curr_dir = os.path.join(home_dir, pi_dir)
    data_file = os.path.join(curr_dir, os.listdir(curr_dir)[0])
    with open(data_file, 'r') as f:
        # each line is a Python-style dict; convert quotes so it parses as JSON
        line = f.readline().strip().replace("'", '"')
        while line != '':
            try:
                input_json = json.loads(line)
                sensor_datetime = datetime.fromtimestamp(input_json['time'])
                if base_time is None:
                    base_time = datetime(sensor_datetime.year, sensor_datetime.month,
                                         sensor_datetime.day, 0, 0, 0, 0)
                # express time as seconds elapsed since midnight of the first reading
                input_json['time'] = (sensor_datetime - base_time).seconds
                data_list.append(list(input_json.values()))
                if columns is None:
                    columns = list(input_json.keys())
            except JSONDecodeError:
                # skip malformed lines
                pass
            line = f.readline().strip().replace("'", '"')

data_df = pd.DataFrame(data_list, columns=columns)
data_df.head()
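Each raw line in the GNFUV files is a Python-style dict with single quotes, which is why the loop above rewrites them to double quotes before calling `json.loads`. A sketch of the per-line parsing on one illustrative record (the field names follow the dataset's format; the values are made up):

```python
import json
from datetime import datetime

# one illustrative raw line in the single-quoted format the GNFUV files use
raw = "{'device': 'pi2', 'humidity': 47, 'temperature': 22, 'experiment': 1, 'time': 1524231600}"
record = json.loads(raw.replace("'", '"'))   # the same quote fix the loop applies
sensor_datetime = datetime.fromtimestamp(record['time'])
print(record['humidity'], record['temperature'])  # 47 22
```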
Create training and test datasets from the downloaded GNFUV-USV dataset.
from sklearn.model_selection import train_test_split
Y = data_df['temperature']
X = data_df.drop('temperature', axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=143)
In this section, you will use the custom transformer as a stage in a scikit-learn `Pipeline` and train a model.
Here, import the custom transformer defined in `linalgnorm-0.1.zip` and create an instance of it, which will in turn be used as a stage in the scikit-learn `Pipeline`.
from linalg_norm.sklearn_transformers import LNormalizer
lnorm_transf = LNormalizer()
Import the other objects required to train a model.
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
Now you can create a `Pipeline` with the user-defined transformer as one of the stages and train the model.
skl_pipeline = Pipeline(steps=[('normalizer', lnorm_transf), ('regression_estimator', LinearRegression())])
skl_pipeline.fit(X_train.loc[:, ['time', 'humidity']].values, y_train)
y_pred = skl_pipeline.predict(X_test.loc[:, ['time', 'humidity']].values)
rmse = np.mean((np.round(y_pred) - y_test.values)**2)**0.5
print('RMSE: {}'.format(rmse))
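The RMSE above is computed by hand (after rounding the predictions). The same quantity can also be obtained from scikit-learn's metrics; a sketch on synthetic values, since it would otherwise need the notebook's `y_test` and `y_pred`:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# synthetic stand-ins for the notebook's y_test values and predictions
y_true = np.array([20.0, 21.5, 19.0])
y_hat = np.array([20.5, 21.0, 18.0])

rmse_manual = np.mean((y_hat - y_true) ** 2) ** 0.5
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_hat))
print(rmse_manual, rmse_sklearn)  # both ways agree
```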
In this section, using the `ibm-watson-machine-learning` SDK, you will store `linalgnorm-0.1.zip` in the WML Repository by creating a package extension resource.
Define the metadata required to create the package extension resource.
The value of `file_path` in `client.package_extensions.store()` is the name of the library file that must be uploaded to WML.
Note: You can also use a conda environment configuration file (yaml) as the package extension input. In that case, set the `TYPE` to `conda_yml` and `file_path` to the yaml file:
client.package_extensions.ConfigurationMetaNames.TYPE = "conda_yml"
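As an illustration of the conda option, a minimal environment file could be written like this (the environment name and the pinned package below are hypothetical, not part of this notebook's workflow):

```python
# a minimal, hypothetical conda environment file for a conda_yml package extension
conda_yml = """\
name: custom-env
dependencies:
  - pip
  - pip:
    - linalgnorm==0.1
"""
with open("environment.yml", "w") as f:
    f.write(conda_yml)
```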
meta_prop_pkg_extn = {
client.package_extensions.ConfigurationMetaNames.NAME: "K_Linag_norm_skl",
client.package_extensions.ConfigurationMetaNames.DESCRIPTION: "Pkg extension for custom lib",
client.package_extensions.ConfigurationMetaNames.TYPE: "pip_zip"
}
pkg_extn_details = client.package_extensions.store(meta_props=meta_prop_pkg_extn, file_path="linalgnorm-0.1.zip")
pkg_extn_uid = client.package_extensions.get_uid(pkg_extn_details)
pkg_extn_url = client.package_extensions.get_href(pkg_extn_details)
Display the details of the package extension resource that was created in the above cell.
details = client.package_extensions.get_details(pkg_extn_uid)
print(json.dumps(details, indent=2))
Define the metadata required to create the software specification resource and bind the package extension to it. This software specification resource will be used to configure the online deployment runtime environment for the model.
client.software_specifications.ConfigurationMetaNames.show()
client.software_specifications.list()
base_sw_spec_uid = client.software_specifications.get_uid_by_name("runtime-23.1-py3.10")
meta_prop_sw_spec = {
client.software_specifications.ConfigurationMetaNames.NAME: "linalgnorm-0.1",
client.software_specifications.ConfigurationMetaNames.DESCRIPTION: "Software specification for linalgnorm-0.1",
client.software_specifications.ConfigurationMetaNames.BASE_SOFTWARE_SPECIFICATION: {"guid": base_sw_spec_uid}
}
sw_spec_details = client.software_specifications.store(meta_props=meta_prop_sw_spec)
sw_spec_uid = client.software_specifications.get_uid(sw_spec_details)
client.software_specifications.add_package_extension(sw_spec_uid, pkg_extn_uid)
Define the metadata to save the trained model to the WML Repository, along with the information about the software specification resource required by the model.
The `client.repository.ModelMetaNames.SOFTWARE_SPEC_UID` metadata property specifies the GUID of the software specification resource to associate with the model.
model_props = {
client.repository.ModelMetaNames.NAME: "Temp prediction model with custom lib",
client.repository.ModelMetaNames.TYPE: 'scikit-learn_1.1',
client.repository.ModelMetaNames.SOFTWARE_SPEC_UID: sw_spec_uid
}
Save the model to the WML Repository and display its saved metadata.
published_model = client.repository.store_model(model=skl_pipeline, meta_props=model_props)
published_model_uid = client.repository.get_model_id(published_model)
model_details = client.repository.get_details(published_model_uid)
print(json.dumps(model_details, indent=2))
In this section, you will deploy the saved model that uses the custom transformer and perform predictions. You will use WML client to perform these tasks.
metadata = {
client.deployments.ConfigurationMetaNames.NAME: "Deployment of custom lib model",
client.deployments.ConfigurationMetaNames.ONLINE: {}
}
created_deployment = client.deployments.create(published_model_uid, meta_props=metadata)
Note: Here we retrieve the deployment `uid` from the `created_deployment` object. In the next section, we show how to retrieve the deployment URL from the Watson Machine Learning instance.
deployment_uid = client.deployments.get_uid(created_deployment)
Now you can print an online scoring endpoint.
scoring_endpoint = client.deployments.get_scoring_href(created_deployment)
print(scoring_endpoint)
scoring_payload = {
"input_data": [{
'fields': ["time", "humidity"],
'values': [[79863, 47]]}]
}
Execute the method to perform online predictions and display the prediction results.
predictions = client.deployments.score(deployment_uid, scoring_payload)
print(json.dumps(predictions, indent=2))
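Assuming the usual WML online-scoring response shape (a `predictions` list whose entries carry `fields` and `values`), the predicted value can be extracted as below. The response here is mocked and the numeric value is illustrative:

```python
# mocked response in the shape returned by client.deployments.score()
predictions = {
    "predictions": [
        {"fields": ["prediction"], "values": [[21.7]]}
    ]
}
# first record, first value of the first prediction row
predicted_temperature = predictions["predictions"][0]["values"][0][0]
print(predicted_temperature)  # 21.7
```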
If you want to clean up all created assets, please follow this sample notebook.
You successfully completed this notebook!
You learned how to deploy and score a scikit-learn model that uses a custom transformer with the Watson Machine Learning service.
Check out our Online Documentation for more samples, tutorials, documentation, how-tos, and blog posts.
Krishnamurthy Arthanarisamy is a senior technical lead on the IBM Watson Machine Learning team. Krishna works on developing cloud services that cater to different stages of the machine learning and deep learning model lifecycle.
Lukasz Cmielowski, PhD, is a Software Architect and Data Scientist at IBM.
Copyright © 2020, 2021, 2022 IBM. This notebook and its source code are released under the terms of the MIT License.