Managing feature groups (beta)

Create a feature group to preserve a set of columns of a data asset, along with associated metadata, for use with machine learning models. Publish feature groups to IBM Knowledge Catalog so that the catalog can be used as a feature store. Other users can then search for and reuse feature groups in IBM Knowledge Catalog without needing access to your project.

Requirements and restrictions

You can view a feature group for assets under the following circumstances.

Required service

Watson Studio (for projects)

Required permissions

To view this page, you can have any role in a project or catalog.

To edit or update information on this page, you must have the Editor or Admin role in the project or catalog.

Workspaces

You can view the asset feature group in these workspaces:

  • Projects
  • Catalogs

Types of assets

These types of assets can have a feature group:

  • Tabular: CSV, TSV, Parquet, XLS, XLSX, AVRO, TXT, JSON files
  • Connected data types that are structured and supported in both Watson Studio and IBM Knowledge Catalog

Data size

No limit

Feature groups (beta)

Use IBM Knowledge Catalog as a feature store, where you can save and annotate data assets for use in your organization. Create a feature group to preserve a set of columns of a particular data asset along with the metadata used for Machine Learning. For example, if you have a set of features for a credit approval model, you can preserve the features used to train the model, as well as some metadata, including which column is used as the prediction target, and which columns are used for bias detection. Feature groups make it simple to preserve the metadata for the features used to train a machine learning model so other data scientists can use the same features. You can see the feature group tab when you preview a particular asset.
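
As a rough illustration of what a feature group records, the credit approval example above could be modeled in plain Python as follows. This is only a sketch: the class names, field names, and column names (Feature, FeatureGroup, CustomerID, Risk, and so on) are hypothetical and are not part of IBM Knowledge Catalog or any IBM library. Only the three roles (Input, Target, Identifier) come from this topic.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Roles named in this topic; everything else in this sketch is hypothetical.
ROLES = {"Input", "Target", "Identifier"}


@dataclass
class Feature:
    column: str
    role: str = "Input"
    description: str = ""
    tags: List[str] = field(default_factory=list)
    monitored: bool = False  # hypothetical flag: column is used for bias detection

    def __post_init__(self) -> None:
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")


@dataclass
class FeatureGroup:
    asset_name: str
    features: Dict[str, Feature] = field(default_factory=dict)

    def add(self, feature: Feature) -> None:
        # Keyed by column name, so each column appears at most once.
        self.features[feature.column] = feature

    def columns_with_role(self, role: str) -> List[str]:
        return [f.column for f in self.features.values() if f.role == role]


# Made-up columns for a credit approval data asset.
group = FeatureGroup("credit_applications.csv")
group.add(Feature("CustomerID", role="Identifier"))
group.add(Feature("Age", monitored=True))
group.add(Feature("LoanAmount"))
group.add(Feature("Risk", role="Target", description="Outcome of the credit evaluation"))

print(group.columns_with_role("Target"))
print([f.column for f in group.features.values() if f.monitored])
```

Another data scientist who receives this metadata can see which column is the prediction target and which columns are monitored for bias, without having to inspect how the original model was trained.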

Creating a feature group in a project

Before you begin

If you create a profile for the data asset before you create a feature group, you can select profile metadata to add values to the feature.

Create a feature group

You can select particular columns of data assets to form a feature group.

  1. In the project Assets tab, click the name of the relevant asset to open the preview and select the Feature group tab. Here you can create a feature group or view and edit an existing one. An asset can have only one feature group. Click New feature group.

  2. Select the columns that you want to be used in the feature group. Select the Name checkbox to include all the columns as features.

You can also create a feature group for data assets in IBM Knowledge Catalog. See Catalog assets for more information.

Editing a feature group

When you have selected the columns of the data asset to be used in the feature group, you can then view each feature and edit it to specify the role it will have in Machine Learning models.

  1. Click a feature name and click Edit this feature. A window opens displaying the following tabs:

    • Details - provide the following information about the feature.

      Select a Role to be assigned to the feature:

      • Input: the feature can be used as input for training a Machine Learning model.
      • Target: the feature to be used as the prediction target when the data is used to train a Machine Learning model.
      • Identifier: the primary key, such as customer ID, used to identify the input data.

      Enter a Description, Recipe (any method or formula used to create values for the feature) and any Tags.

    • Value descriptions

      Value descriptions clarify the meaning of specific values. For example, consider a column "credit evaluation" with the values -1, 0, and 1. You can use value descriptions to record that -1 means "evaluation rejected". For numerical values, you can also describe a range. To specify a range, enter [n,m], where n is the start and m is the end of the range, and click Add. For example, to describe all age values between 18 and 24 as "millennials", enter [18,24] as the value and millennials as the description. If a profile is defined for the asset, the profile values are displayed in the value descriptions list, where you can select one or more values.

    • Fairness information

      You can define Monitor or Reference groups of values for monitoring bias. The values that are more at risk of biased outcomes can be placed in the Monitor group. These values are then compared to values in the Reference group. To specify a range of numerical values, enter the following text [n,m] where n is the start and m is the end of the range, surrounded by brackets. For example, to monitor all age values between 18 and 35, enter [18,35]. Then select Monitor or Reference and click Add. You can also specify Favorable outcomes. See Fairness in AutoAI experiments for more information about fairness.

  2. When you have edited the feature, click Save. You can now see your changes in the Feature Details window. Close this window to return to the feature group.
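
The [n,m] range notation used on the Value descriptions and Fairness information tabs can be illustrated with a short Python sketch. It is a hypothetical illustration only: the function names, the inclusive treatment of both endpoints, and the sample Reference range are assumptions, and the code does not reflect how the product parses or stores these values.

```python
import re
from typing import List, Optional, Tuple, Union

Range = Tuple[float, float]

# Matches text such as "[18,24]" or "[ -1.5 , 3 ]".
_RANGE = re.compile(r"^\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]$")


def parse_value(text: str) -> Union[Range, str]:
    """Return (start, end) for '[n,m]' input; otherwise return the text as a single value."""
    text = text.strip()
    match = _RANGE.match(text)
    if not match:
        return text
    start, end = float(match.group(1)), float(match.group(2))
    if start > end:
        raise ValueError(f"range start {start} is greater than end {end}")
    return (start, end)


def matches(spec: Union[Range, str], value: object) -> bool:
    """Check a data value against a parsed entry (assumes ranges include both endpoints)."""
    if isinstance(spec, tuple):
        try:
            number = float(value)  # type: ignore[arg-type]
        except (TypeError, ValueError):
            return False
        return spec[0] <= number <= spec[1]
    return str(value) == spec


def lookup(value: object, entries: List[Tuple[str, str]]) -> Optional[str]:
    """Return the label of the first entry that matches the value, if any."""
    for text, label in entries:
        if matches(parse_value(text), value):
            return label
    return None


# Value descriptions taken from the examples in this topic.
credit_evaluation = [("-1", "evaluation rejected")]
age_descriptions = [("[18,24]", "millennials")]

# Fairness groups: [18,35] is the example from this topic; the Reference range is made up.
age_groups = [("[18,35]", "Monitor"), ("[36,65]", "Reference")]

print(lookup(-1, credit_evaluation))
print(lookup(21, age_descriptions))
print(lookup(30, age_groups))
```

The same lookup serves both tabs because in each case an entry is either a single value or a numerical range paired with a label: a description on the Value descriptions tab, or a Monitor or Reference group on the Fairness information tab.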

Removing features from a group

To remove a feature from a group:

  1. Preview the asset in the project and select the Feature group tab.

  2. In the Features table that is displayed, select the feature (or features) that you want to remove.

  3. In the toolbar that appears, select Remove from group.

The feature, or feature group if you selected all the features, is removed.

Sharing a feature group with IBM Knowledge Catalog

From a project to a catalog

If you have a catalog in IBM Knowledge Catalog, you can publish an asset from a project: click the three dots next to the data asset and select Publish to catalog. The catalog then contains the asset, and its feature group is displayed with the feature details populated in the catalog asset.

If you previously published an asset with a feature group from a project to a catalog and you then remove a feature from the project asset, you might also want to remove this feature from the catalog. You can either remove the asset from the catalog or republish it from the project and choose the appropriate duplicate action. For example, selecting overwrite removes the previous feature group from the catalog.

If you edit the description or tag of a feature in a project, your changes are not propagated to the catalog automatically. To propagate them, either republish the asset to the catalog and choose update as the duplicate action, or edit the feature directly in the catalog asset.

From a catalog to a project

Similarly, if you have features defined in a catalog, you can view the asset in the catalog, edit the asset, and add the catalog asset to the project. The project then contains the asset and its feature group is displayed with feature details populated in the project asset.

Searching for a feature group

You can search for assets or columns across all catalogs and projects. To filter your search results to find assets with a feature group, select Data to see the filter options, and select Feature group. Assets containing a feature group will then be listed in the search results.

Using the Python API to create and use feature groups

You can also use the assetframe-lib Python library in notebooks to create and edit feature groups. This library also allows you to use feature metadata, such as fairness information, when you create machine learning models.

Learn more

For examples on how to create and use feature groups in notebooks:
