Managing feature groups (beta)
Create a feature group to preserve a set of columns of a data asset along with associated metadata for use with Machine Learning models. Publish feature groups to IBM Knowledge Catalog to use the catalog as a feature store. Other users can search for and reuse feature groups in IBM Knowledge Catalog without needing access to your project.
Requirements and restrictions
You can view a feature group for assets under the following circumstances.
- Required service
-
Watson Studio (for projects)
- Required permissions
-
To view this page, you can have any role in a project or catalog.
-
To edit or update information on this page, you must have the Editor or Admin role in the project or catalog.
- Workspaces
-
You can view the asset feature group in these workspaces:
- Projects
- Catalogs
- Types of assets
-
These types of assets can have a feature group:
- Tabular: CSV, TSV, Parquet, XLS, XLSX, AVRO, TXT, JSON files
- Connected data types that are structured and supported in both Watson Studio and IBM Knowledge Catalog
- Data size
-
No limit
Feature groups (beta)
Use IBM Knowledge Catalog as a feature store, where you can save and annotate data assets for use in your organization. Create a feature group to preserve a set of columns of a particular data asset along with the metadata used for Machine Learning. For example, if you have a set of features for a credit approval model, you can preserve the features used to train the model, as well as some metadata, including which column is used as the prediction target, and which columns are used for bias detection. Feature groups make it simple to preserve the metadata for the features used to train a machine learning model so other data scientists can use the same features. You can see the feature group tab when you preview a particular asset.
- Creating a feature group
- Editing a feature group
- Removing features or a feature group
- Sharing feature group with catalog
- Using the Python API for feature groups
Creating a feature group in a project
Before you begin
If you create a profile for the data asset before creating a feature group, you can select profile metadata to add values to the feature.
Create a feature group
You can select particular columns of data assets to form a feature group.
-
In the project Assets tab, click the name of the relevant asset to open the preview and select the Feature group tab. Here you can create a feature group or view and edit an existing one. An asset can have only one feature group. Click New feature group.
-
Select the columns that you want to be used in the feature group. Select the Name checkbox to include all the columns as features.
You can also create a feature group for data assets in IBM Knowledge Catalog. See Catalog assets for more information.
Editing a feature group
When you have selected the columns of the data asset to be used in the feature group, you can then view each feature and edit it to specify the role it will have in Machine Learning models.
-
Click a feature name and click Edit this feature. A window opens displaying the following tabs:
-
Details - provide the following information about the feature.
Select a Role to be assigned to the feature:
- Input: the feature can be used as input for training a Machine Learning model.
- Target: the feature to be used as the prediction target when the data is used to train a Machine Learning model.
- Identifier: the primary key, such as customer ID, used to identify the input data.
Enter a Description, Recipe (any method or formula used to create values for the feature) and any Tags.
-
Value descriptions
Value descriptions allow you to clarify the meaning of specific values. For example, consider a column "credit evaluation" with the values -1, 0 and 1. You can use value descriptions to explain these values: -1 might mean "evaluation rejected". You can enter descriptions for particular values. For numerical values, you can also specify a range. To specify a range of numerical values, enter the text [n,m], where n is the start and m is the end of the range, surrounded by brackets, and click Add. For example, to describe all age values between 18 and 24 as "millennials", enter [18,24] as the value and millennials as the description. If you have a profile defined, the profile values are displayed in the value descriptions list. From here you can select one value or multiple values.
-
Fairness information
You can define Monitor or Reference groups of values for monitoring bias. The values that are more at risk of biased outcomes can be placed in the Monitor group. These values are then compared to values in the Reference group. To specify a range of numerical values, enter the text [n,m], where n is the start and m is the end of the range, surrounded by brackets. For example, to monitor all age values between 18 and 35, enter [18,35]. Then select Monitor or Reference and click Add. You can also specify Favorable outcomes. See Fairness in AutoAI experiments for more information about fairness.
-
When you have edited the feature, click Save. You can now see your changes in the Feature Details window. Close this window to return to the feature group.
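The [n,m] range notation used for value descriptions and fairness groups can be illustrated with a short, library-independent Python sketch. It is not how IBM Knowledge Catalog evaluates ranges internally. The helper names (parse_range, matches, describe, fairness_group) are hypothetical, the endpoints are assumed to be inclusive, and the Reference range [36,90] is an invented example value.

```python
import re
from typing import Optional, Tuple

# Matches "[n,m]" where n and m are integers or decimals, optionally signed.
_RANGE_RE = re.compile(r"^\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]$")


def parse_range(text: str) -> Optional[Tuple[float, float]]:
    """Return (start, end) for a "[n,m]" entry, or None for a single value."""
    match = _RANGE_RE.match(text.strip())
    if match is None:
        return None
    start, end = float(match.group(1)), float(match.group(2))
    if start > end:
        raise ValueError(f"range start {start} is greater than end {end}")
    return start, end


def matches(entry: str, value) -> bool:
    """Check a value against a single-value entry or a "[n,m]" range entry."""
    bounds = parse_range(entry)
    if bounds is None:
        return str(value) == entry
    # Assumption: both endpoints of the range are inclusive.
    return isinstance(value, (int, float)) and bounds[0] <= value <= bounds[1]


def describe(value, descriptions: dict) -> Optional[str]:
    """Return the description of the first entry that matches the value."""
    for entry, text in descriptions.items():
        if matches(entry, value):
            return text
    return None


def fairness_group(value, monitor: list, reference: list) -> Optional[str]:
    """Return "Monitor" or "Reference" depending on which group holds the value."""
    for label, entries in (("Monitor", monitor), ("Reference", reference)):
        if any(matches(entry, value) for entry in entries):
            return label
    return None


# Value descriptions from the examples above.
credit_evaluation = {"-1": "evaluation rejected"}
age_descriptions = {"[18,24]": "millennials"}

# Fairness groups: [18,35] is from the text, [36,90] is a made-up reference range.
age_monitor = ["[18,35]"]
age_reference = ["[36,90]"]

print(describe(-1, credit_evaluation))                  # evaluation rejected
print(describe(20, age_descriptions))                   # millennials
print(fairness_group(30, age_monitor, age_reference))   # Monitor
```

Treating a single value as an exact string match and a bracketed entry as a numeric interval mirrors the two kinds of entries that the dialog accepts.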
Removing features from a group
To remove a feature from a group:
-
Preview the asset in the project and select the Feature group tab.
-
In the Features table that is displayed, select the feature (or features) that you want to remove.
-
In the toolbar that appears, select Remove from group.
The feature, or feature group if you selected all the features, is removed.
Searching for a feature group
You can search for assets or columns across all catalogs and projects. To filter your search results to find assets with a feature group, select Data to see the filter options, and select Feature group. Assets containing a feature group will then be listed in the search results.
Using the Python API to create and use feature groups
You can also use the assetframe-lib Python library in notebooks to create and edit feature groups. This library also allows you to use feature metadata, such as fairness information, when creating machine learning models.
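As a rough illustration of why role metadata is useful when building a model, the following sketch splits rows into model inputs and a prediction target based on feature roles. It deliberately does not use assetframe-lib, whose API is not described on this page. The Role enum, the FEATURE_ROLES mapping, the column names, and split_by_role are all hypothetical, and the sketch assumes exactly one Target feature.

```python
from enum import Enum


class Role(Enum):
    """The three feature roles described in this topic."""
    INPUT = "Input"
    TARGET = "Target"
    IDENTIFIER = "Identifier"


# Hypothetical feature group for a credit approval data asset.
FEATURE_ROLES = {
    "customer_id": Role.IDENTIFIER,
    "age": Role.INPUT,
    "income": Role.INPUT,
    "approved": Role.TARGET,
}


def split_by_role(rows, roles):
    """Split rows (a list of dicts) into input names, input values, and target values."""
    inputs = [name for name, role in roles.items() if role is Role.INPUT]
    targets = [name for name, role in roles.items() if role is Role.TARGET]
    # Assumption of this sketch: the feature group has exactly one Target.
    if len(targets) != 1:
        raise ValueError("expected exactly one Target feature")
    X = [[row[name] for name in inputs] for row in rows]
    y = [row[targets[0]] for row in rows]
    return inputs, X, y


rows = [
    {"customer_id": 1, "age": 30, "income": 50000, "approved": 1},
    {"customer_id": 2, "age": 45, "income": 82000, "approved": 0},
]

input_names, X, y = split_by_role(rows, FEATURE_ROLES)
print(input_names)  # ['age', 'income']
print(X)            # [[30, 50000], [45, 82000]]
print(y)            # [1, 0]
```

The Identifier column is left out of the inputs, so a primary key such as a customer ID is not used as a training feature.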
Learn more
For examples of how to create and use feature groups in notebooks, see:
- Creating and using feature store data sample project in the Resource hub
See also:
- Searching for assets in IBM Knowledge Catalog
- Searching for assets across all catalogs and projects
- Viewing assets in catalogs
- Editing assets in catalogs
- Publishing project assets to a catalog