The autoai-lib library for Python contains a set of functions that help you to interact with IBM watsonx.ai Runtime AutoAI experiments. Using the autoai-lib library, you can review and edit the data transformations that take place in the creation of classification and regression pipelines. Similarly, you can use the autoai-ts-libs library to review the data transformations that take place in the creation of time series (forecast) pipelines.
Installing autoai-lib and autoai-ts-libs for Python
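Both libraries are typically preinstalled in watsonx.ai notebook environments. If you need them elsewhere, a minimal sketch of installing from PyPI, assuming the package names autoai-libs and autoai-ts-libs:

    pip install autoai-libs
    pip install autoai-ts-libs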
You can configure the library logger autoai_libs to suit your requirements. For example, you can define and attach handlers or configure filters to customize log details such as warning and error handling. If you do not update the configuration, you get the default logging behavior: messages of severity warning and higher (that is, warning, error, and critical) are printed to stderr (standard error) without any special formatting. For more information about configuring logging, and to view examples, refer to the documentation for autoai-lib.
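Because autoai_libs logs through the standard Python logging module, you can adjust its output with standard-library calls alone. A minimal sketch; the handler, level, and format shown here are illustrative choices, not library defaults:

    import logging

    # Attach a handler with a timestamped format to the library logger
    # and lower the threshold so informational messages are also shown.
    logger = logging.getLogger('autoai_libs')
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s'))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)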
The autoai-lib functions
The instantiated project object that is created after you import the autoai-lib library exposes these functions:
compress_type
type of string compression: 'string' removes spaces from a string and 'hash' creates an int hash. Default is 'string'. 'hash' is used for columns with strings and cat_imp_strategy='most_frequent'.
dtypes_list
list containing strings that denote the type of each column of the input numpy array X (strings are among 'char_str', 'int_str', 'float_str', 'float_num', 'float_int_num', 'int_num', 'Boolean', 'Unknown'). If None, the column types are discovered. Default is None.
misslist_list
list containing lists of the missing values of each column of the input numpy array X. If None, the missing values of each column are discovered. Default is None.
missing_values_reference_list
reference list of missing values in the input numpy array X
activate_flag
flag that indicates that this transformer is active. If False, transform(X) outputs the input numpy array X unmodified.
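The parameters above belong to a string-compression transformer; the function name is omitted in this excerpt, so the CompressStrings name and import path in this sketch are assumptions based on the autoai-lib documentation:

    import numpy as np
    from autoai_libs.transformers.exportable import CompressStrings

    # Remove spaces from string values; entries that match the
    # missing-values reference list are left for later imputation.
    compress = CompressStrings(
        compress_type='string',
        dtypes_list=['char_str', 'char_str'],
        misslist_list=[[], []],
        missing_values_reference_list=['', '-', '?', float('nan')],
        activate_flag=True,
    )
    X = np.array([['new york', 'a b'], ['san jose', 'c d']], dtype=object)
    X_compressed = compress.fit(X).transform(X)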
Given a numpy array and a reference list of missing values for it, replaces the missing values with a special value (typically np.nan).
Given a numpy array and a reference list of known values for each column, replaces values that are not part of the reference list with a special value (typically np.nan). This method is typically used to remove labels from columns of a test data set that were not seen in the corresponding columns of the training data set.
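These two descriptions match the NumpyReplaceMissingValues and NumpyReplaceUnknownValues transformers; the names are omitted in this excerpt, so treat them and the import path as assumptions. A sketch of the first:

    import numpy as np
    from autoai_libs.transformers.exportable import NumpyReplaceMissingValues

    # Replace every occurrence of '?' with np.nan so that downstream
    # imputers can recognize the missing entries.
    replace_missing = NumpyReplaceMissingValues(missing_values=['?'], filling_values=np.nan)
    X = np.array([[1.0, '?'], [2.0, 3.0]], dtype=object)
    X_clean = replace_missing.fit(X).transform(X)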
strategy
string, optional, default="mean". The imputation strategy for missing values. - mean: replace by using the mean along each column. Can be used only with numeric data. - median: replace by using the median along each column. Can be used only with numeric data. - most_frequent: replace by using the most frequent value of each column. Can be used with strings or numeric data. - constant: replace with fill_value. Can be used with strings or numeric data.
missing_values
number, string, np.nan (default) or None. The placeholder for the missing values. All occurrences of missing_values are imputed.
sklearn_version_family
str indicating the sklearn version for backward compatibility with versions 019 and 020dev. Currently unused. Default is None.
activate_flag
flag that indicates that this transformer is active. If False, transform(X) outputs the input numpy array X unmodified.
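The parameters above belong to the categorical imputer; the CatImputer name and import path in this sketch are assumptions based on generated AutoAI pipelines:

    import numpy as np
    from autoai_libs.transformers.exportable import CatImputer

    # Impute missing categorical entries with the most frequent value
    # in each column.
    cat_imputer = CatImputer(
        strategy='most_frequent',
        missing_values=np.nan,
        sklearn_version_family=None,
        activate_flag=True,
    )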
autoai_libs.transformers.exportable.CatEncoder()
This method is a wrapper for a categorical encoder. If the encoding parameter is 'ordinal', it internally uses the sklearn OrdinalEncoder. If the encoding parameter is 'onehot' or 'onehot-dense', it internally uses the sklearn OneHotEncoder.
encoding
str, 'onehot', 'onehot-dense' or 'ordinal'. The type of encoding to use (default is 'ordinal'). 'onehot': encode the features by using a one-hot (aka one-of-K, also called 'dummy') scheme. This encoding creates a binary column for each category and returns a sparse matrix. 'onehot-dense': the same as 'onehot' but returns a dense array instead of a sparse matrix. 'ordinal': encode the features as ordinal integers. The result is a single column of integers (0 to n_categories - 1) per feature.
categories
'auto' or a list of lists/arrays of values. Categories (unique values) per feature: 'auto': determine categories automatically from the training data. list: categories[i] holds the categories that are expected in the ith column. The passed categories must be sorted and cannot mix strings and numeric values. The used categories can be found in the encoder.categories_ attribute.
dtype
number type, default np.float64. Desired dtype of the output.
handle_unknown
'error' (default) or 'ignore'. Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to 'ignore' and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature are all zeros. In the inverse transform, an unknown category is denoted as None. Ignoring unknown categories is not supported for encoding='ordinal'.
sklearn_version_family
str indicating the sklearn version for backward compatibility with versions 019 and 020dev. Currently unused. Default is None.
activate_flag
flag that indicates that this transformer is active. If False, transform(X) outputs the input numpy array X unmodified.
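A minimal sketch of CatEncoder with the default ordinal encoding; the parameter values follow the descriptions above:

    import numpy as np
    from autoai_libs.transformers.exportable import CatEncoder

    # Ordinal-encode two categorical columns; an unknown category seen
    # at transform time raises an error.
    encoder = CatEncoder(
        encoding='ordinal',
        categories='auto',
        dtype=np.float64,
        handle_unknown='error',
        sklearn_version_family=None,
        activate_flag=True,
    )
    X = np.array([['red', 'S'], ['blue', 'M'], ['red', 'M']], dtype=object)
    X_encoded = encoder.fit(X).transform(X)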
Given a numpy array X and a dtypes_list that denotes the types of its columns, converts columns of strings that represent floats (type 'float_str' in dtypes_list) to columns of floats and replaces their missing values with np.nan.
dtypes_list
list containing strings that denote the type of each column of the input numpy array X (strings are among 'char_str', 'int_str', 'float_str', 'float_num', 'float_int_num', 'int_num', 'Boolean', 'Unknown').
missing_values_reference_list
reference list of missing values
activate_flag
flag that indicates that this transformer is active. If False, transform(X) outputs the input numpy array X unmodified.
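The parameters above belong to the float-string conversion transformer; the FloatStr2Float name and import path in this sketch are assumptions based on generated AutoAI pipelines:

    from autoai_libs.transformers.exportable import FloatStr2Float

    # Convert the first column from float-like strings to floats,
    # mapping the listed missing-value markers to np.nan.
    floatstr2float = FloatStr2Float(
        dtypes_list=['float_str', 'float_num'],
        missing_values_reference_list=['', '-', '?'],
        activate_flag=True,
    )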
num_imp_strategy
string, optional (default="mean"). The imputation strategy: - If "mean", then replace missing values by using the mean along the axis. - If "median", then replace missing values by using the median along the axis. - If "most_frequent", then replace missing values by using the most frequent value along the axis.
missing_values
integer or "NaN", optional (default="NaN"). The placeholder for the missing values. All occurrences of missing_values are imputed. For missing values encoded as np.nan, use the string value "NaN".
activate_flag
flag that indicates that this transformer is active. If False, transform(X) outputs the input numpy array X unmodified.
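The parameters above belong to the numeric imputer; the NumImputer name and import path in this sketch are assumptions based on generated AutoAI pipelines:

    from autoai_libs.transformers.exportable import NumImputer

    # Replace NaN entries in numeric columns by using the default mean
    # strategy.
    num_imputer = NumImputer(missing_values='NaN', activate_flag=True)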
num_scaler_copy
Boolean, optional, default True. If False, try to avoid a copy and do in-place scaling instead. In-place scaling is not guaranteed to always work; for example, if the data is not a NumPy array or scipy.sparse CSR matrix, a copy might still be returned.
num_scaler_with_mean
Boolean, True by default. If True, center the data before scaling. An exception is raised when attempted on sparse matrices because centering them entails building a dense matrix, which in common use cases is likely to be too large to
fit in memory.
num_scaler_with_std
Boolean, True by default. If True, scale the data to unit variance (or equivalently, unit standard deviation).
use_scaler_flag
Boolean, flag that indicates that this transformer is active. If False, transform(X) outputs the input numpy array X unmodified. Default is True.
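These parameters belong to the optional standard scaler; the OptStandardScaler name and import path in this sketch are assumptions based on generated AutoAI pipelines:

    from autoai_libs.transformers.exportable import OptStandardScaler

    # Standardize numeric columns to zero mean and unit variance.
    scaler = OptStandardScaler(
        use_scaler_flag=True,
        num_scaler_copy=True,
        num_scaler_with_mean=True,
        num_scaler_with_std=True,
    )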
name
a string name that uniquely identifies this transformer from others
datatypes
a list of datatypes either of which are valid input to the transformer function (numeric, float, int, and so on)
feat_constraints
all constraints, which must be satisfied by a column to be considered a valid input to this transform
tgraph
tgraph object must be the starting TGraph( ) object. This parameter is optional and you can pass None, but that can result in some inefficiencies due to lack of caching.
apply_all
only use apply_all=True. It means that the transformer enumerates all features (or feature sets) that match the specified criteria and applies the provided function to each.
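These parameters describe the unary TA1 transform; the import path and the fc_methods helper in this sketch are taken from generated AutoAI pipelines and should be treated as assumptions:

    import numpy as np
    from autoai_libs.cognito.transforms.transform_utils import TA1
    from autoai_libs.utils import fc_methods

    # Apply numpy.rint (rounding) to every numeric column that is not
    # categorical.
    round_transform = TA1(
        fun=np.rint,
        name='round',
        datatypes=['numeric'],
        feat_constraints=[fc_methods.is_not_categorical],
        tgraph=None,
        apply_all=True,
    )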
name
a string name that uniquely identifies this transformer from others
datatypes1
a list of datatypes either of which are valid inputs (first parameter) to the transformer function (numeric, float, int, and so on)
feat_constraints1
all constraints, which must be satisfied by a column to be considered a valid input (first parameter) to this transform
datatypes2
a list of data types either of which are valid inputs (second parameter) to the transformer function (numeric, float, int, and so on)
feat_constraints2
all constraints, which must be satisfied by a column to be considered a valid input (second parameter) to this transform
tgraph
tgraph object must be the invoking TGraph( ) object. This parameter is optional and you can pass None, but that can result in some inefficiencies due to lack of caching.
apply_all
only use apply_all=True. It means that the transformer enumerates all features (or feature sets) that match the specified criteria and applies the provided function to each.
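Likewise for the binary TA2 transform; import path and helper assumed, as above:

    import numpy as np
    from autoai_libs.cognito.transforms.transform_utils import TA2
    from autoai_libs.utils import fc_methods

    # Apply numpy.add to every pair of numeric, non-categorical columns.
    sum_transform = TA2(
        fun=np.add,
        name='sum',
        datatypes1=['numeric'],
        feat_constraints1=[fc_methods.is_not_categorical],
        datatypes2=['numeric'],
        feat_constraints2=[fc_methods.is_not_categorical],
        tgraph=None,
        apply_all=True,
    )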
tans_class
a class that implements fit( ) and transform( ) in accordance with the transformation function definition
name
a string name that uniquely identifies this transformer from others
datatypes
list of datatypes either of which are valid input to the transformer function (numeric, float, int, and so on)
feat_constraints
all constraints, which must be satisfied by a column to be considered a valid input to this transform
tgraph
tgraph object must be the invoking TGraph( ) object. This parameter is optional and you can pass None, but that can result in some inefficiencies due to lack of caching.
apply_all
only use apply_all=True. It means that the transformer enumerates all features (or feature sets) that match the specified criteria and applies the provided function to each.
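These parameters describe the stateful TB1 transform, which fits an estimator-like object per matching feature; the import path and helper in this sketch are assumed, as above:

    from sklearn.preprocessing import StandardScaler
    from autoai_libs.cognito.transforms.transform_utils import TB1
    from autoai_libs.utils import fc_methods

    # Fit a StandardScaler to each numeric, non-categorical column.
    scale_transform = TB1(
        tans_class=StandardScaler,
        name='stdscaler',
        datatypes=['numeric'],
        feat_constraints=[fc_methods.is_not_categorical],
        tgraph=None,
        apply_all=True,
    )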
tans_class
a class that implements fit( ) and transform( ) in accordance with the transformation function definition
name
a string name that uniquely identifies this transformer from others
datatypes1
a list of data types either of which are valid inputs (first parameter) to the transformer function (numeric, float, int, and so on)
feat_constraints1
all constraints, which must be satisfied by a column to be considered a valid input (first parameter) to this transform
datatypes2
a list of data types either of which are valid inputs (second parameter) to the transformer function (numeric, float, int, and so on)
feat_constraints2
all constraints, which must be satisfied by a column to be considered a valid input (second parameter) to this transform
tgraph
tgraph object must be the invoking TGraph( ) object. This parameter is optional and you can pass None, but that can result in some inefficiencies due to lack of caching.
apply_all
only use apply_all=True. It means that the transformer enumerates all features (or feature sets) that match the specified criteria and applies the provided function to each.
tans_class
a class that implements fit( ) and transform( ) in accordance with the transformation function definition
name
a string name that uniquely identifies this transformer from others
tgraph
tgraph object must be the invoking TGraph( ) object. This parameter is optional and you can pass None, but that can result in some inefficiencies due to lack of caching.
apply_all
only use apply_all=True. It means that the transformer enumerates all features (or feature sets) that match the specified criteria and applies the provided function to each.
name
a string name that uniquely identifies this transformer from others
arg_count
number of inputs to the function. In the example after this list it is 1 (a unary function); for a binary function, it is 2, and so on
datatypes_list
a list of arg_count lists that correspond to the acceptable input data types for each parameter. In the example after this list, because arg_count=1, the result is one list within the outer list, and it contains a single type called 'numeric'. In another case, it might be a more specific type such as 'int', or an even more specific 'int64'.
feat_constraints_list
a list of arg_count lists that correspond to some constraints that can be imposed on selection of the input features
tgraph
tgraph object must be the invoking TGraph( ) object. This parameter is optional and you can pass None, but that can result in some inefficiencies due to lack of caching.
apply_all
only use apply_all=True. It means that the transformer enumerates all features (or feature sets) that match the specified criteria and applies the provided function to each.
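A sketch of the TGen form that the arg_count and datatypes_list descriptions above refer to; import path and helper assumed, as with the other cognito transforms:

    import numpy as np
    from autoai_libs.cognito.transforms.transform_utils import TGen
    from autoai_libs.utils import fc_methods

    # A unary transform (arg_count=1): one inner list of datatypes and
    # one inner list of feature constraints, matching numeric columns.
    sqrt_transform = TGen(
        fun=np.sqrt,
        name='sqrt',
        arg_count=1,
        datatypes_list=[['numeric']],
        feat_constraints_list=[[fc_methods.is_not_categorical]],
        tgraph=None,
        apply_all=True,
    )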
cols_ids_must_keep
serial numbers of the columns that must be kept irrespective of their feature importance
additional_col_count_to_keep
how many columns need to be retained
ptype
classification or regression
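These parameters describe the FS1 feature-selection step; the name and import path in this sketch are assumptions based on generated AutoAI pipelines:

    from autoai_libs.cognito.transforms.transform_utils import FS1

    # Keep the 15 most important engineered columns for a
    # classification problem; no columns are forced to stay.
    feature_selection = FS1(
        cols_ids_must_keep=[],
        additional_col_count_to_keep=15,
        ptype='classification',
    )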
The autoai-ts-libs functions
The combination of transformers and estimators is designed and chosen for each pipeline by the AutoAI Time Series system. Changing the transformers or the estimators in the generated pipeline notebook can cause unexpected results or even failure. We do not recommend changing the notebooks for generated pipelines, and therefore we do not currently provide the specification of the functions in the autoai-ts-libs library.