Data fabric solution overview
When you implement the data fabric solution on Cloud Pak for Data, you can solve the challenges of data access, data quality, data governance, and managing your data and AI lifecycles.
The data fabric solution on Cloud Pak for Data provides these main capabilities for managing and automating your data and AI lifecycles:
- Data access: Access your data across multiple clouds and on-premises in your existing data architecture.
- Self-service consumption: Share and use data and other assets from across the enterprise in catalogs.
- Accumulated knowledge: Understand your data through a common business vocabulary. Trust your data through history, lineage, and quality analysis.
- Collaborative innovation: Collaborate with others to discover insights. Prepare data, analyze data, and build models with a set of integrated tools for all levels of experience.
- Governance and compliance: Define rules to enforce data privacy. Track and document the detailed history of AI models to ensure compliance.
- Unified lifecycle: Automate the building, testing, deploying, and monitoring of data pipelines and AI models.
The following illustration shows how the data fabric supports use cases on the Cloud Pak for Data platform by integrating access to hybrid data sources with capabilities in a single UI experience.
The value of assets
With the data fabric, you can transform data into assets that accumulate meaning and value. Assets are more than just data. When you first create a connection to a data source, you have only basic information: how to access the data, and its tables, schemas, and data values. You start adding value while you ingest data by virtualizing, transforming, or replicating it in workspaces called projects.
When you curate the data, you add metadata to your data assets. You profile the data to classify it and compile statistics about the values. You enrich assets with business vocabulary that describes the semantic meaning of the data for your organization. You analyze data quality. The metadata that you add during curation is considered active metadata, because it is generated automatically through machine learning processes. When you rerun curation after your data changes, the metadata is updated based on automated data analysis.
As users work with the assets in projects, they create a third level of meaning: the history of how each asset is used and the relationships between assets. Users can analyze the data in notebooks or dashboards, or train machine learning models.
Users can also add information to assets, such as ratings and reviews, visualizations of the data, tags, and other relationships.
The following image shows how data assets accumulate value in a data fabric.
Models are also assets. You can track deployments and input data for the model, comparisons between models, compliance with corporate protocols, and performance metrics.
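The way an asset accumulates meaning can be sketched as a small data structure. The following Python example is only an illustration under stated assumptions: the `Asset` class, its fields, and its methods are hypothetical names chosen for this sketch and are not part of any Cloud Pak for Data API. It models the connection information, the metadata from curation, the usage history, and the information that users add.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical model of a data fabric asset (not a product API)."""

    name: str
    connection: dict  # basic information: how to access the data
    curation: dict = field(default_factory=dict)   # metadata added by curation
    usage: list = field(default_factory=list)      # history and relationships
    user_info: dict = field(default_factory=dict)  # ratings, reviews, tags

    def curate(self, classification, business_terms, quality_score):
        # Rerunning curation replaces the earlier results, which mirrors
        # how metadata is updated after the data changes.
        self.curation = {
            "classification": classification,
            "business_terms": list(business_terms),
            "quality_score": quality_score,
        }

    def record_use(self, tool, related_asset=None):
        # Each use adds to the history of the asset and can link it
        # to another asset, such as a model trained on it.
        self.usage.append({"tool": tool, "related_asset": related_asset})

    def metadata_levels(self):
        """Count how many of the three levels currently hold information."""
        levels = (self.connection, self.curation, self.usage)
        return sum(bool(level) for level in levels)


# Illustrative values only; the names below are made up for the sketch.
asset = Asset(
    name="customers",
    connection={"source": "example-db", "table": "CUSTOMERS"},
)
asset.curate("customer data", ["Customer ID"], 0.92)
asset.record_use("notebook", related_asset="churn-model")
asset.user_info["tags"] = ["sales"]
```

In this sketch, a newly connected asset holds one level of information, and `metadata_levels()` grows as curation and usage add to it.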
Use cases
Cloud Pak for Data as a Service provides four use cases as parts of the data fabric solution. You implement the data fabric as represented in each use case by creating one or more service instances that provide features and tools. Some services are included in multiple use cases.
Use cases represent ways to implement part of the data fabric solution so that your teams can start working while you build out other parts. You can start with any use case and add the others as you need them:
- If you have a more mature data governance model, start by establishing your business vocabulary, as described in the Data governance use case.
- If you want quicker time-to-value, start with data virtualization or data science, as described in the Data integration and the Data Science and MLOps use cases.
- If you need to ensure that your models are compliant with your organization's goals and regulations, start tracking your models, as described in the AI governance use case.
Explore each use case to learn about what you can accomplish and the tools you can use.
Data governance
Implement governance based on metadata that provides business knowledge and defines data protection. Provide high-quality data assets in self-service catalogs. Automate enforcement of data governance for regulatory compliance.
Services for this use case: Watson Knowledge Catalog and IBM Match 360 with Watson.
Data integration
Simplify and automate access to all your data, without moving it. Orchestrate data across a distributed landscape to create a network of instantly available information for data consumers.
Services for this use case: Watson Query, DataStage, and Watson Knowledge Catalog.
Data Science and MLOps
Operationalize data analysis and model creation with an automated workflow that prepares data and builds, deploys, monitors, and retrains models.
Services for this use case: Watson Studio, Watson Machine Learning, Watson OpenScale, and Watson Knowledge Catalog.
AI governance
Operationalize AI governance with an automated workflow that enforces fairness, quality, and explainability in your models.
Services for this use case: Watson Studio, Watson Machine Learning, Watson OpenScale, and Watson Knowledge Catalog.
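Because some services are included in multiple use cases, it can help to work out which services you need for a given combination. The following Python sketch records the lists of services from the use cases above in a dictionary. The names `USE_CASE_SERVICES`, `services_for`, and `shared_services` are hypothetical helpers for this illustration, not part of any product tooling.

```python
from collections import Counter

# Services for each use case, as listed in this topic.
USE_CASE_SERVICES = {
    "Data governance": [
        "Watson Knowledge Catalog",
        "IBM Match 360 with Watson",
    ],
    "Data integration": [
        "Watson Query",
        "DataStage",
        "Watson Knowledge Catalog",
    ],
    "Data Science and MLOps": [
        "Watson Studio",
        "Watson Machine Learning",
        "Watson OpenScale",
        "Watson Knowledge Catalog",
    ],
    "AI governance": [
        "Watson Studio",
        "Watson Machine Learning",
        "Watson OpenScale",
        "Watson Knowledge Catalog",
    ],
}


def services_for(use_cases):
    """Return the distinct services for the chosen use cases, in first-seen order."""
    needed = []
    for use_case in use_cases:
        for service in USE_CASE_SERVICES[use_case]:
            if service not in needed:
                needed.append(service)
    return needed


def shared_services():
    """Return the services that appear in more than one use case, sorted by name."""
    counts = Counter(
        service
        for services in USE_CASE_SERVICES.values()
        for service in services
    )
    return sorted(service for service, count in counts.items() if count > 1)


if __name__ == "__main__":
    print(services_for(["Data governance", "Data integration"]))
    print(shared_services())
```

For example, starting with the Data governance and Data integration use cases requires four distinct services, because Watson Knowledge Catalog is common to both.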