Data Science and MLOps use case

Last updated: Nov 21, 2024

To operationalize data analysis and model creation, your enterprise needs integrated systems and processes. Cloud Pak for Data as a Service provides the processes and technologies to enable your enterprise to develop and deploy machine learning models and other data science applications.


Challenges

You can solve the following challenges for your enterprise by implementing a Data Science and MLOps use case:

Accessing high-quality data: Organizations need to provide easy access to high-quality, governed data for data science teams who use the data to build models.
Operationalizing model building and deployment: Organizations need to implement repeatable processes to quickly and efficiently build and deploy models to production environments.
Monitoring and retraining models: Organizations need to automate the monitoring and retraining of models based on production feedback.

Example: Golden Bank's challenges

Follow the story of Golden Bank as it implements a Data Science and MLOps process to expand its business by offering low-rate mortgage renewals for online applications. Data scientists at Golden Bank need to create a mortgage approval model that avoids risk and treats all applicants fairly. They must also automate the model retraining to optimize model performance.

Process

To implement Data Science and MLOps for your enterprise, your organization can follow this process:

1. Prepare and share the data
2. Build and train models
3. Deploy models
4. Monitor deployed models
5. Automate the AI lifecycle

The watsonx.ai Studio, watsonx.ai Runtime, Watson OpenScale, and IBM Knowledge Catalog services in Cloud Pak for Data as a Service provide the tools and processes that your organization needs to implement a Data Science and MLOps solution.

Image showing the flow of the data science use case

1. Prepare and share the data

Data scientists can prepare their own data sets and share them in a catalog. The catalog serves as a feature store where your data science teams can find high-quality data assets with the features that they need. They can add data assets from a catalog into a project, where they collaborate to prepare, analyze, and model the data.

What you can use	What you can do	Best to use when
Data Refinery	Access and refine data from diverse data source connections. Materialize the resulting data sets as snapshots in time that might combine, join, or filter data for other data scientists to analyze and explore. Make the resulting data sets available in catalogs.	You need to visualize the data when you want to shape or cleanse it. You want to simplify the process of preparing large amounts of raw data for analysis.
Catalogs	Use catalogs in IBM Knowledge Catalog as a feature store to organize your assets to share among the collaborators in your organization. Take advantage of AI-powered semantic search and recommendations to help users find what they need.	Your users need to easily understand, collaborate on, enrich, and access high-quality data. You want to increase visibility of data and collaboration between business users. You need users to view, access, manipulate, and analyze data without understanding its physical format or location, and without having to move or copy it. You want users to enhance assets by rating and reviewing them.

Example: Golden Bank's catalog

The governance team leader creates a catalog, "Mortgage Approval Catalog", and adds the data stewards and data scientists as catalog collaborators. The data stewards publish the data assets that they created into the catalog. The data scientists find the data assets, curated by the data stewards, in the catalog and copy those assets to a project. In their project, the data scientists can refine the data to prepare it for training a model.
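
As a rough illustration of the kind of shaping that data scientists might do in a project, the following Python sketch joins two small data sets and drops incomplete rows. The record layouts, the field names (`applicant_id`, `income`, `credit_score`), and the `refine` helper are hypothetical and are not part of any Cloud Pak for Data as a Service API. In practice, this work would more likely be done in Data Refinery or in a notebook.

```python
# Hypothetical applicant and credit data; field names are illustrative only.
applicants = [
    {"applicant_id": 1, "income": 52000},
    {"applicant_id": 2, "income": None},  # incomplete record
    {"applicant_id": 3, "income": 87000},
]
credit = [
    {"applicant_id": 1, "credit_score": 640},
    {"applicant_id": 3, "credit_score": 720},
]


def refine(applicants, credit):
    """Inner-join the two data sets on applicant_id and drop rows with missing income."""
    scores = {row["applicant_id"]: row["credit_score"] for row in credit}
    refined = []
    for row in applicants:
        if row["income"] is None:
            continue  # cleanse: skip incomplete records
        if row["applicant_id"] not in scores:
            continue  # inner join: keep only applicants with a credit record
        refined.append({**row, "credit_score": scores[row["applicant_id"]]})
    return refined


training_rows = refine(applicants, credit)
```

The result is a single, cleansed data set that combines both sources, which is the shape of data that a training step typically expects.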

2. Build and train models

To get predictive insights based on your data, data scientists, business analysts, and machine learning engineers can build and train models. Data scientists use Cloud Pak for Data as a Service services to build the AI models, ensuring that the right algorithms and optimizations are used to make predictions that help to solve business problems.

What you can use	What you can do	Best to use when
AutoAI	Use AutoAI in watsonx.ai Studio to automatically select algorithms, engineer features, generate pipeline candidates, and train model pipeline candidates. Then, evaluate the ranked pipelines and save the best as models. Deploy the trained models to a space, or export the model training pipeline that you like from AutoAI into a notebook to refine it.	You want an advanced and automated way to build a good set of training pipelines and models quickly. You want to be able to export the generated pipelines to refine them.
Notebooks and scripts	Use notebooks and scripts in watsonx.ai Studio to write your own feature engineering, model training, and evaluation code in Python or R. Use training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage. Code with your favorite open source frameworks and libraries.	You want to use Python or R coding skills to have full control over the code that is used to create, train, and evaluate the models.
SPSS Modeler flows	Use SPSS Modeler flows in watsonx.ai Studio to create your own model training, evaluation, and scoring flows. Use training data sets that are available in the project, or connections to data sources such as databases, data lakes, or object storage.	You want a simple way to explore data and define model training, evaluation, and scoring flows.
RStudio	Analyze data and build and test models by working with R in RStudio.	You want to use a development environment to work in R.
Decision Optimization	Prepare data, import models, solve problems and compare scenarios, visualize data, find solutions, produce reports, and save models to deploy with watsonx.ai Runtime.	You need to evaluate millions of possibilities to find the best solution to a prescriptive analytics problem.
Federated learning	Train a common model that uses distributed data.	You need to train a model without moving, combining, or sharing data that is distributed across multiple locations.

Example: Golden Bank's model building and training

Data scientists at Golden Bank create a model, "Mortgage Approval Model", that avoids unanticipated risk and treats all applicants fairly. They want to track the history and performance of the model from the beginning, so they add a model use case to the "Mortgage Approval Catalog". They run a notebook to build the model and predict which applicants qualify for mortgages. The details of the model training are automatically captured as metadata in the model use case.
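
To make the idea of notebook-based training concrete, the following minimal sketch trains a small logistic regression classifier in plain Python and predicts an approval decision. It is not the Golden Bank notebook. The training rows, the two features (income in units of $100,000 and a debt-to-income ratio), and the function names are all hypothetical. A real notebook would typically use an open source framework rather than hand-written gradient descent.

```python
import math

# Hypothetical rows: (income in $100k, debt-to-income ratio) -> approved (1) or declined (0).
TRAINING_DATA = [
    ((0.3, 0.60), 0),
    ((0.4, 0.50), 0),
    ((0.5, 0.55), 0),
    ((1.0, 0.20), 1),
    ((1.2, 0.25), 1),
    ((1.5, 0.10), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Predicted probability of approval for one applicant."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(data, learning_rate=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            error = score(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        # Step against the mean gradient of the log loss.
        weights = [w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


def predict(model, features):
    """Return 1 (approve) when the predicted probability is at least 0.5, else 0."""
    weights, bias = model
    return 1 if score(weights, bias, features) >= 0.5 else 0


model = train(TRAINING_DATA)
decision = predict(model, (1.1, 0.225))  # hypothetical new applicant
```

The separation between training (`train`) and scoring (`predict`) mirrors the later steps in this use case, where the trained model is deployed and then called by applications.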

3. Deploy models

When operations team members deploy your AI models, the models become available for applications to use for scoring and predictions to help drive actions.

What you can use	What you can do	Best to use when
Spaces user interface	Use the Spaces UI to deploy models and other assets from projects to spaces.	You want to deploy models and view deployment information in a collaborative workspace.

Example: Golden Bank's model deployment

The operations team members at Golden Bank promote the "Mortgage Approval Model" from the project to a deployment space and then create an online model deployment.
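
After an online deployment exists, applications send it scoring requests. The following sketch only builds a request body and makes no network call. It assumes that the deployment accepts the `input_data` layout with `fields` and `values`, and the feature names are hypothetical, so check the API reference for your deployment before you rely on this shape.

```python
import json


def build_scoring_payload(fields, rows):
    """Build a JSON-serializable scoring request body.

    Assumption: the deployment accepts the input_data / fields / values layout.
    """
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row must have one value per field")
    return {
        "input_data": [
            {"fields": list(fields), "values": [list(row) for row in rows]}
        ]
    }


payload = build_scoring_payload(
    ["income", "debt_to_income"],  # hypothetical feature names
    [[110000, 0.225], [40000, 0.5]],
)
body = json.dumps(payload)  # string to send as the HTTP request body
```

Validating the row length before sending the request catches a common mistake early, namely a mismatch between the fields the model was trained on and the values that the application supplies.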

4. Monitor deployed models

After models are deployed, it is important to monitor them to make sure that they are performing well. Data scientists must watch for model performance and data consistency issues.

What you can use	What you can do	Best to use when
Watson OpenScale	Monitor model fairness issues across multiple features. Monitor model performance and data consistency over time. Explain how the model arrived at certain predictions with weighted factors. Maintain and report on model governance and lifecycle across your organization.	You have features that are protected or that might contribute to prediction fairness. You want to trace model performance and data consistency over time. You want to know why the model gives certain predictions.

Example: Golden Bank's model monitoring

Data scientists at Golden Bank use Watson OpenScale to monitor the deployed "Mortgage Approval Model" to ensure that it is accurate and treats all Golden Bank mortgage applicants fairly. They run a notebook to set up monitors for the model and then tweak the configuration by using the Watson OpenScale user interface. Using metrics from the Watson OpenScale quality monitor and fairness monitor, the data scientists determine how well the model predicts outcomes and whether it produces any biased outcomes. They also get insights into how the model comes to decisions so that the decisions can be explained to the mortgage applicants.
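
The following sketch illustrates the general idea behind a quality metric and a fairness metric. It is not how Watson OpenScale is implemented or configured. It computes accuracy against known outcomes and a disparate-impact style ratio, which is the rate of favorable outcomes for a monitored group divided by the rate for a reference group. The records, the group labels, and the function names are hypothetical.

```python
# Hypothetical scored records: model decision ("approved") and known outcome ("actual").
records = [
    {"group": "A", "approved": 1, "actual": 1},
    {"group": "A", "approved": 1, "actual": 1},
    {"group": "A", "approved": 1, "actual": 0},
    {"group": "A", "approved": 0, "actual": 0},
    {"group": "B", "approved": 1, "actual": 1},
    {"group": "B", "approved": 1, "actual": 1},
    {"group": "B", "approved": 0, "actual": 1},
    {"group": "B", "approved": 0, "actual": 0},
]


def favorable_rate(records, group):
    """Share of records in the group that received the favorable outcome."""
    members = [r for r in records if r["group"] == group]
    if not members:
        raise ValueError(f"no records for group {group!r}")
    return sum(r["approved"] for r in members) / len(members)


def disparate_impact(records, monitored, reference):
    """Favorable-outcome rate of the monitored group divided by that of the reference group."""
    reference_rate = favorable_rate(records, reference)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(records, monitored) / reference_rate


def accuracy(records):
    """Fraction of model decisions that match the known outcome."""
    return sum(r["approved"] == r["actual"] for r in records) / len(records)


quality = accuracy(records)
fairness = disparate_impact(records, monitored="B", reference="A")
```

A ratio well below 1 suggests that the monitored group receives favorable outcomes less often than the reference group, which is the kind of signal that prompts a closer look at the model.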

5. Automate the AI lifecycle

Your team can automate and simplify the MLOps and AI lifecycle with Orchestration Pipelines.

What you can use	What you can do	Best to use when
Orchestration Pipelines	Use pipelines to create repeatable and scheduled flows that automate notebook, Data Refinery, and machine learning pipelines, from data ingestion to model training, testing, and deployment.	You want to automate some or all of the steps in an MLOps flow.

Example: Golden Bank's automated ML lifecycle

The data scientists at Golden Bank can use pipelines to automate their complete Data Science and MLOps lifecycle, which simplifies the model retraining process.
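
The control flow of an automated retraining process can be sketched in a few lines of Python. This is a conceptual illustration only and does not use the Orchestration Pipelines interface or API. The step names, the metric names, and the 0.8 thresholds are hypothetical.

```python
def needs_retraining(metrics, min_accuracy=0.8, min_fairness=0.8):
    """Return True when any monitored metric falls below its threshold."""
    return metrics["accuracy"] < min_accuracy or metrics["fairness"] < min_fairness


def run_pipeline(steps, metrics):
    """Run the steps in order, but only when monitoring calls for retraining.

    steps is a list of (name, callable) pairs; returns the names of the steps that ran.
    """
    if not needs_retraining(metrics):
        return []
    executed = []
    for name, step in steps:
        step()
        executed.append(name)
    return executed


log = []
steps = [
    ("prepare_data", lambda: log.append("data prepared")),
    ("train_model", lambda: log.append("model trained")),
    ("deploy_model", lambda: log.append("model deployed")),
]
ran = run_pipeline(steps, {"accuracy": 0.75, "fairness": 0.67})
```

Keeping the retraining decision in one function and the ordered steps in another makes it easy to change a threshold or add a step, such as a test stage, without touching the rest of the flow.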

Tutorials for Data Science and MLOps

Tutorial	Description	Expertise for tutorial
Orchestrate an AI pipeline with model monitoring	Train a model, promote it to a deployment space, and deploy the model.	Run a notebook.
Orchestrate an AI pipeline with data integration	Create an end-to-end pipeline that prepares data and trains a model.	Use the Orchestration Pipelines drag and drop interface to create a pipeline.
