As a data scientist, you need to train numerous models to identify the right combination of data and hyperparameters that optimizes the performance of your neural networks. You want to perform more experiments faster, train deeper networks, and explore broader hyperparameter spaces. IBM Watson Machine Learning accelerates this iterative cycle by simplifying the process of training models in parallel on an on-demand GPU compute cluster.
Here’s how to get started:
1. Set up your environment
2. Choose a training method
There are several ways to train your model: you can use the Python client or the command-line interface (CLI). For details, follow the CLI tutorial using TensorFlow.
3. Configure each training run
IBM Watson Machine Learning allows you to rapidly conduct deep learning iterations by submitting multiple training runs that can be queued for training. A training run consists of the following parts:
- Your neural network defined in one of the supported deep learning frameworks.
- A training run definition that specifies the number of GPUs to use and the location of the IBM Cloud Object Storage bucket that contains your data set.
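A training run is typically described in a manifest file alongside your model code. The sketch below shows the general shape of such a manifest (framework, compute configuration, and Cloud Object Storage references); the exact field names and schema depend on your Watson Machine Learning version, so treat this as illustrative, with the angle-bracket values as placeholders you supply:

```yaml
# Illustrative training-run manifest; verify field names against your
# Watson Machine Learning version's documentation.
model_definition:
  name: my-training-run
  framework:
    name: tensorflow
    version: "<framework-version>"
  execution:
    command: python3 train.py
    compute_configuration:
      name: <gpu-configuration>      # e.g. a named GPU tier
training_data_reference:
  name: training_data
  type: s3
  connection:
    endpoint_url: <cos-endpoint-url>
    access_key_id: <access-key>
    secret_access_key: <secret-key>
  source:
    bucket: <training-data-bucket>
training_results_reference:
  name: training_results
  type: s3
  connection:
    endpoint_url: <cos-endpoint-url>
    access_key_id: <access-key>
    secret_access_key: <secret-key>
  target:
    bucket: <results-bucket>
```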
4. Upload training data to the cloud
Before you can start training your neural networks, you first need to move your data into IBM Cloud. To do this, upload your training data to an IBM Cloud Object Storage service instance. When training is done, the output from your training runs is written back to your IBM Cloud Object Storage, where you can download it to your desktop.
Note: By default, Watson Machine Learning does not restrict the external sites that users can access as part of operations such as downloading data source files or installing Python library packages. If you would like to limit access to a list of approved sites, contact IBM Cloud support to request a custom network policy for your organization.
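The upload step above can be sketched as a small helper. `upload_training_data` is a hypothetical name of my own; the function works with any client that exposes an `upload_file(path, bucket, key)` method, such as a client created with the IBM COS SDK (`ibm_boto3`):

```python
import os

def upload_training_data(cos_client, bucket, data_dir, prefix="training-data"):
    """Upload every file under data_dir to a Cloud Object Storage bucket.

    cos_client is any object with an upload_file(path, bucket, key) method,
    e.g. a client created with the ibm-cos-sdk (ibm_boto3).
    Returns the list of object keys that were written.
    """
    keys = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            # Preserve the local directory layout inside the bucket.
            rel = os.path.relpath(path, data_dir)
            key = f"{prefix}/{rel.replace(os.sep, '/')}"
            cos_client.upload_file(path, bucket, key)
            keys.append(key)
    return keys
```

In practice you would build the client from your service credentials and endpoint URL; the credentials themselves come from your IBM Cloud Object Storage service instance.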
5. Start training
After setting up your training runs, use the Python client or the CLI to submit them to IBM Watson Machine Learning. IBM Watson Machine Learning packages each of your training runs and allocates it to a Kubernetes container with the requested resources and deep learning framework. Training runs execute in parallel, depending on the GPU resources available at your account level.
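Watson Machine Learning's internal scheduler is not something this article documents, but the queueing behaviour it describes can be modelled in a few lines: at most as many runs execute at once as there are GPUs available, and the rest wait in a queue until a slot frees up. `execute_runs` and `train_fn` are illustrative names of my own:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_runs(run_ids, available_gpus, train_fn):
    """Conceptual model of the scheduling described above: at most
    `available_gpus` training runs execute concurrently; submissions
    beyond that limit queue until a worker (GPU slot) frees up.
    """
    with ThreadPoolExecutor(max_workers=available_gpus) as pool:
        futures = {run_id: pool.submit(train_fn, run_id) for run_id in run_ids}
    # The `with` block waits for all queued runs to finish.
    return {run_id: f.result() for run_id, f in futures.items()}
```

This is only a sketch of the semantics, not the service's implementation; the point is that submitting more runs than you have GPUs is fine, since the extras simply wait their turn.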
6. Deploy to a REST endpoint and obtain predictions
Once you’ve trained and selected an optimal model, you’re ready to deploy it as a REST endpoint that your application can call for predictions.
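Scoring requests to a deployed model are JSON documents; Watson Machine Learning deployments commonly accept a fields/values shape, though the exact schema varies by API version, so check your service documentation. A minimal sketch, using a hypothetical helper name of my own:

```python
def build_scoring_payload(fields, rows):
    """Build a JSON-serializable scoring request in the fields/values
    shape commonly used by Watson Machine Learning online deployments.
    (The exact schema varies by API version; check your service docs.)
    """
    if any(len(row) != len(fields) for row in rows):
        raise ValueError("each row must have one value per field")
    return {"fields": list(fields), "values": [list(row) for row in rows]}
```

You would then POST this payload to your deployment's scoring URL with an authorization token, for example with the `requests` library.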