After you complete the Part 2 - WML Federated Learning with XGBoost and Adult Income dataset - Party notebook, you should know how to:
Paste in the ID credentials you got from the end of the Part 1 notebook. If you have not run through Part 1, open the notebook and run through it first.
WML_SERVICES_HOST = 'us-south.ml.cloud.ibm.com' # or "eu-de.ml.cloud.ibm.com", "eu-gb.ml.cloud.ibm.com", "jp-tok.ml.cloud.ibm.com"
PROJECT_ID = 'XXX'
IAM_APIKEY = 'XXX'
RTS_ID = 'XXX'
TRAINING_ID = 'XXX'
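Before moving on, it can help to confirm that none of the values above were left as the 'XXX' placeholder. The following is a minimal sketch of such a check; the helper name find_unset_credentials is hypothetical and is not part of any IBM SDK, and the example values are made up for illustration.

```python
def find_unset_credentials(values, placeholder="XXX"):
    """Return the names whose value is empty or still equals the placeholder."""
    return [name for name, value in values.items()
            if not value or value == placeholder]


# Illustrative values only. In the notebook you would pass the real variables:
# {"PROJECT_ID": PROJECT_ID, "IAM_APIKEY": IAM_APIKEY,
#  "RTS_ID": RTS_ID, "TRAINING_ID": TRAINING_ID}
example = {
    "PROJECT_ID": "XXX",
    "IAM_APIKEY": "my-key",
    "RTS_ID": "",
    "TRAINING_ID": "abc",
}
missing = find_unset_credentials(example)
print("Still unset:", missing)
```

If the returned list is not empty, go back to the Part 1 notebook and copy the missing IDs before continuing.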
As the party, you must provide the dataset that you will use to train the Federated Learning model. In this tutorial, the Adult Income dataset is provided by default.
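import requests

# Download the Adult Income dataset and save it to the working directory.
dataset_resp = requests.get("https://api.dataplatform.cloud.ibm.com/v2/gallery-assets/entries/5fcc01b02d8f0e50af8972dc8963f98e/data",
                            allow_redirects=True)
dataset_resp.raise_for_status()  # stop here if the download failed
with open('adult.csv', 'wb') as f:
    f.write(dataset_resp.content)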
!ls -lh
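The file listing only confirms that adult.csv exists and shows its size. To also look at the first few rows, a small preview helper along these lines could be used. This is a sketch using only the Python standard library; the function name preview_csv is hypothetical, and it assumes the file is UTF-8 encoded, comma-separated text.

```python
import csv
import os
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return up to n_rows parsed rows from the top of a CSV file."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(islice(csv.reader(fh), n_rows))


# Only preview the file if the download step above has been run.
if os.path.exists("adult.csv"):
    for row in preview_csv("adult.csv", n_rows=3):
        print(row)
```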
In this section, you install the libraries and packages needed to run Federated Learning with the Python client.
This installs the ibm-watsonx-ai Python package together with its Federated Learning extra (fl-rt23.1-py3.10).
import sys
!{sys.executable} -m pip install --upgrade 'ibm-watsonx-ai[fl-rt23.1-py3.10]'
The following code imports the API client, authenticates with your API key, and sets the default project for the party.
from ibm_watsonx_ai import APIClient
wml_credentials = {
"apikey": IAM_APIKEY,
"url": "https://" + WML_SERVICES_HOST
}
wml_client = APIClient(wml_credentials)
wml_client.set.default_project(PROJECT_ID)
The party should run a data handler to ensure that its datasets are consistent and in a compatible format. In this tutorial, an example data handler for the Adult Income dataset is provided.
For more details on data handlers, see Customizing the data handler.
This data handler is written to the local working directory of this notebook.
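# Download the example data handler and save it to the working directory.
datahandler_resp = requests.get("https://raw.githubusercontent.com/IBMDataScience/sample-notebooks/master/Files/adult_sklearn_data_handler.py",
                                allow_redirects=True)
datahandler_resp.raise_for_status()  # stop here if the download failed
with open('adult_sklearn_data_handler.py', 'wb') as f:
    f.write(datahandler_resp.content)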
!ls -lh
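To give a rough idea of what a data handler does, here is a simplified, stand-alone sketch. It is not the downloaded adult_sklearn_data_handler.py and does not use the Federated Learning SDK. The class name TinyCsvDataHandler, the get_data method shape, the 80/20 split, and the assumptions that the first row is a header and the last column is the label are all illustrative. A real handler builds on the SDK's data handler base class and applies dataset-specific preprocessing.

```python
import csv


class TinyCsvDataHandler:
    """Hypothetical, simplified stand-in for a party data handler."""

    def __init__(self, data_config):
        # Mirrors the "info" dictionary used in the party configuration.
        self.file_name = data_config["txt_file"]

    def get_data(self, test_fraction=0.2):
        """Return ((x_train, y_train), (x_test, y_test)) as lists of strings."""
        with open(self.file_name, newline="", encoding="utf-8") as fh:
            rows = [row for row in csv.reader(fh) if row]
        records = rows[1:]                      # assumes the first row is a header
        features = [r[:-1] for r in records]    # assumes the label is the last column
        labels = [r[-1] for r in records]
        n_test = int(len(records) * test_fraction)
        split = len(records) - n_test
        return (features[:split], labels[:split]), (features[split:], labels[split:])
```

With a file such as adult.csv in the working directory, TinyCsvDataHandler({"txt_file": "./adult.csv"}).get_data() would return the two splits, provided the file matches the assumptions stated above.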
Each party must run their party configuration file to call out to the aggregator. Here is an example of a party configuration.
Because you have already defined the training ID, RTS ID, and data handler in the previous sections of this notebook, and the local training and protocol handler are both defined by the SDK, you only need to define the information for the dataset file under wml_client.remote_training_systems.ConfigurationMetaNames.DATA_HANDLER.
In this tutorial, the data path is already defined because the example Adult Income dataset was downloaded in a previous section.
import json
from pathlib import Path
working_dir = !pwd
pwd = working_dir[0]
party_config = {
wml_client.remote_training_systems.ConfigurationMetaNames.DATA_HANDLER: {
"info": {
"txt_file": "./adult.csv"
},
"name": "AdultSklearnDataHandler",
"path": "./adult_sklearn_data_handler.py"
}
}
print(json.dumps(party_config, indent=4))
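The configuration refers to two local files: the dataset and the data handler module. A quick check that both exist can save a failed connection attempt later. The sketch below uses only pathlib; the helper name missing_handler_files is hypothetical, and it assumes that every value under "info" is a file path, which holds for this tutorial's configuration but may not hold in general.

```python
from pathlib import Path


def missing_handler_files(handler_config):
    """Return the paths referenced by a data handler entry that do not exist."""
    # Assumes every value under "info" is a file path (true for this tutorial).
    candidates = [handler_config["path"], *handler_config["info"].values()]
    return [p for p in candidates if not Path(p).is_file()]


# Stand-alone example mirroring the DATA_HANDLER entry above. In the notebook:
# missing_handler_files(
#     party_config[wml_client.remote_training_systems.ConfigurationMetaNames.DATA_HANDLER])
example_handler = {
    "info": {"txt_file": "./adult.csv"},
    "name": "AdultSklearnDataHandler",
    "path": "./adult_sklearn_data_handler.py",
}
print("Missing files:", missing_handler_files(example_handler))
```

An empty list means both files were found relative to the current working directory.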
Here you can finally connect to the aggregator to begin training.
party = wml_client.remote_training_systems.create_party(RTS_ID, party_config)
party.monitor_logs()
party.run(aggregator_id=TRAINING_ID, asynchronous=False)
Congratulations! You have learned to:
Copyright © 2020-2024 IBM. This notebook and its source code are released under the terms of the MIT License.