Glossary

This glossary provides terms and definitions for Watson Studio deployment environments. (Not all terms and definitions apply to Watson Studio Desktop.)

A

active runtime
An instance of an environment that is running to provide compute resources to analytical assets.
advanced data curation
The process of curating data with tools that automate the discovery, analysis, assignment of metadata, and publishing of data assets to a catalog.
algorithm
A formula applied to data to determine optimal ways to solve analytical problems.
analytical asset
An asset that runs code to analyze data. See also asset.
anonymize
To mask, substitute, or redact data in a column, based on the attribute classifier of the column and data policy rules.
asset
An item in a project or catalog that contains metadata about data or data analysis. See also analytical asset, data asset.
attribute classifier
A classification for a column in a relational data set.
AutoAI experiment
An automated training process that considers a series of training definitions and parameters to create a set of ranked pipelines as model candidates.
automated discovery
A method of discovering data assets that provides detailed quality analysis results for the data assets, assigns business terms and data classes, and publishes the data assets to the default catalog.
automation rule
During advanced data curation, a rule that automates the process of applying rule definitions and quality dimensions to data.

B

batch deployment
A method to deploy models that processes input data from a file and writes the output to a file.
business glossary
The set of business terms for an organization.
business lineage
The lifecycle of a unit of data, such as a table or a column, between information assets. Business lineage excludes data transformations that data lineage typically includes. See also data lineage.
business term
A word or phrase that defines a business concept in a standard way for an enterprise. Terms can be used to enrich the metadata of data assets and to define the criteria of data protection rules.

C

catalog
A repository of assets for an organization to share. Assets in catalogs can be governed by data protection rules and enriched by other governance artifacts, such as classifications, data classes, and business terms. Catalogs can store structured and unstructured data, references to data in external data sources, and other analytical assets, like machine learning models.
classification
In Watson Knowledge Catalog, a governance artifact that describes the sensitivity level of the data in a data asset.
cleanse
To ensure that all values in a data set are consistent and correctly recorded.
collaborator
A member of a group of people who are working together toward a common goal.
combinatorial problem
A problem that is difficult to solve because it requires multiple decisions to be made involving too many combinations of possible choices. Some examples are finding a grouping, ordering, or the assignment of objects.
compute resources
The hardware and software resources defined by an environment definition to run analytical assets.
confusion matrix
A table that provides a detailed numeric breakdown of annotated document sets. The table is used to compare the annotations that were added by a machine learning model to the annotations in the ground truth. The table reports the number of false positives, false negatives, true positives, and true negatives.
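The four counts named above can be sketched for a single label as follows. This is a minimal illustration, not the product's implementation; the function name `confusion_counts` and the sample annotations are hypothetical.

```python
from collections import Counter

def confusion_counts(ground_truth, predicted, positive_label):
    """Count TP, FP, FN, TN for one label treated as the positive class."""
    counts = Counter()
    for truth, guess in zip(ground_truth, predicted):
        if guess == positive_label:
            # The model assigned the label: correct (TP) or spurious (FP).
            counts["TP" if truth == positive_label else "FP"] += 1
        else:
            # The model did not assign the label: missed (FN) or correctly absent (TN).
            counts["FN" if truth == positive_label else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}

# Hypothetical annotations: ground truth versus model output.
truth = ["spam", "spam", "ham", "ham", "spam", "ham"]
model = ["spam", "ham", "ham", "spam", "spam", "ham"]
result = confusion_counts(truth, model, positive_label="spam")
print(result)
```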
connected data asset
A pointer to data that is accessed through a connection to an external data source.
connection
The information required to connect to a database. The actual information required varies according to the DBMS and connection method.
connection asset
An asset that contains information that enables connecting to a data source.
constraint
A condition that must be satisfied by the solution of a problem.
constraint programming (CP)
A mathematical technique based on logic programming and graph theory used to model and solve scheduling and combinatorial optimization problems.
continuous learning
Automating the tasks of monitoring model performance, retraining with new data, and redeploying to ensure prediction quality.
Core ML deployment
The process of downloading a deployment in Core ML format for use in iOS apps.
CP
See constraint programming.
CP Optimizer
The constraint programming engine in ILOG CPLEX.
curate
To create a data asset and prepare it to be published in a catalog. Curation can include enriching the data asset by assigning governance artifacts such as business terms, classification, and data classes, and analyzing the quality of the data in the data asset.
custom attribute
A custom-created property to use with information assets and governance artifacts.

D

dashboard asset
A set of visualizations of analytical results created in the dashboard editor without writing code.
data asset
An asset that points to data, for example, to an uploaded file. Connections and connected data assets are also considered data assets. See also asset.
data class
A governance artifact that categorizes columns in relational data sets according to the type of the data and how the data is used.
data lineage
The lifecycle of a unit of data, such as a table or a column, that indicates where the data comes from and how the data changes as it moves between persistent and transient data stores of any type. Lineage is often expressed as a graph of that data flow. See also business lineage.
data mining
The process of collecting critical business information from a data source, correlating the information, and uncovering associations, patterns, and trends. See also predictive analytics.
data protection rule
A governance artifact that specifies what data to control and how to control it. A data protection rule contains criteria and an action.
data quality analysis
A part of advanced data curation that identifies the structure, content, and overall quality of data.
Data Refinery flow
A data source, a chain of one or more operations that refine and shape that data source, and a target that the data moves to.
Data Refinery flow asset
An asset that is based on an ordered set of steps to cleanse, shape, and enhance data.
data rule
During advanced data curation, a rule that applies rule logic provided in rule definitions to analyze data.
data science
The analysis and visualization of structured and unstructured data to discover insights and knowledge.
data set
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a file or database table.
data source
A repository, queue, or feed for reading data, such as a Db2 database or IBM MQ.
data table
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a table.
Decision Optimization
A set of tools to build and solve prescriptive models using CPLEX engines and data.
Decision Optimization experiment
An asset that contains a group of scenarios that represent different model formulations or data sets related to the same problem that is being solved.
Decision Optimization model
A prescriptive model that can be solved with optimization to provide the best solution to a Decision Optimization problem.
Decision Optimization model builder
A user interface to edit experiments. In the model builder, data can be imported and prepared, and models can be formulated and linked to scenarios to be solved. Solutions can be compared, data visualized, notebooks generated, and scenarios saved as models for deployment.
Decision Optimization Modeling Assistant
A user interface to create optimization models in natural language.
Decision Optimization notebook
A Jupyter notebook that uses DOcplex, a native Python API for modeling and solving Decision Optimization problems.
Decision Optimization problem
A Decision Optimization model plus a data set. The same Decision Optimization model can be used for several problems. The model represents a situation where a decision needs to be taken, but there are too many alternatives or considerations for a person to examine all possibilities (for example, combinations) and choose the best solution in a reasonable time.
Decision Optimization scenario
A Decision Optimization model formulation and a data set. Multiple scenarios can be created, run and compared in the Decision Optimization model builder.
decision variable
One of a set of variables representing decisions to be made, whose values are determined by the optimization engine while ensuring that all constraints are satisfied and the objective is optimized.
deep learning experiment
An asset that is based on a logical grouping of one or more model training definitions that are connected in a neural network.
Deep Learning Experiment Builder
A tool for building deep learning experiment assets.
deployment
A model or application package that is available for use.
deployment space
A workspace where models are deployed and deployments managed.
discovery
The process of automatically creating data assets out of all tables from a connected data source.
DOcplex
A native Python API for modeling and solving Decision Optimization problems.

E

endpoint URL
A network destination address that identifies resources, such as services and objects. For example, an endpoint URL is used to identify the location of a model or function deployment when a user sends payload data to the deployment.
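As a rough sketch of sending payload data to an endpoint URL, the following builds (but does not send) an HTTP POST request with the Python standard library. The URL, the token, the field names, and the `input_data` payload shape are all assumptions for illustration; check the actual deployment's details before relying on any of them.

```python
import json
import urllib.request

# Hypothetical endpoint URL and token; substitute the values of a real deployment.
ENDPOINT_URL = "https://example.com/deployments/my-model/predictions"
API_TOKEN = "replace-with-a-real-token"

def build_scoring_request(endpoint_url, token, fields, rows):
    """Build (but do not send) a POST request that carries payload data."""
    # Assumed payload shape: column names plus rows of values.
    payload = {"input_data": [{"fields": fields, "values": rows}]}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

scoring_request = build_scoring_request(
    ENDPOINT_URL, API_TOKEN, ["age", "income"], [[42, 55000]]
)
print(scoring_request.get_method(), scoring_request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(scoring_request)`, which is left out here because the URL is a placeholder.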
environment
The compute resources for running analytical assets.
environment definition
A definition that specifies hardware and software resources to instantiate environment runtimes.
environment runtime
An instantiation of the environment definition to run analytical assets.
experiment asset
An asset that evaluates a group of model training definitions to create an optimal model asset. See also AutoAI experiment, deep learning experiment.
experiment builder
A tool for building experiment assets.

F

feasible solution
A solution to a problem that satisfies all the constraints of the problem without necessarily having an optimal objective.
feature selection
Identifying the columns of data that best support an accurate prediction or score.
feature transformation
In AutoAI, a phase of pipeline creation that applies algorithms to transform and optimize the training data to achieve the best outcome for the model type.
field ops node
A node in an SPSS Modeler flow that performs operations on data fields, such as filtering, deriving new fields, and determining the measurement level for given fields.
flow
A collection of nodes that define a set of steps for processing data or training a model.
flow canvas
See flow editor.
flow editor
A tool for creating flows.
folder asset
A pointer to a folder in IBM Cloud Object Storage.

G

Gantt chart
A graphical representation of a project timeline and duration in which schedule data is displayed as horizontal bars along a time scale.
governance artifact
Governance items that enrich or control data assets. Governance artifacts include business terms, classifications, data classes, policies, rules, and reference data sets.
governance rule
A governance artifact that provides a natural-language description of the criteria that are used to determine whether data assets are compliant with business objectives.
governance workflow
A task-based process to control the creating, modifying, and deleting of governance artifacts.
governed catalog
A catalog that has enforcement of data protection rules enabled.
graphical canvas
A tool for creating analytical assets by visually coding. A canvas is an area on which to place objects or nodes that can be connected to create a flow.
graph node
A node in an SPSS Modeler flow that displays data before or after modeling. Some common graphs include plots, histograms, web nodes, and evaluation charts.

H

HPO
See hyperparameter optimization.
hyperparameter
In machine learning, a parameter whose value is set before training as a way to increase model accuracy.
hyperparameter optimization (HPO)
The process for setting hyperparameter values to the settings that provide the most accurate model.
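One simple way to picture this process is an exhaustive grid search, sketched below. It is only one possible search strategy and is not necessarily the one the product uses. The `validation_score` function is a hypothetical stand-in for training a model and measuring its accuracy.

```python
from itertools import product

def validation_score(learning_rate, depth):
    """Stand-in for training a model and measuring accuracy (hypothetical formula)."""
    return 1.0 - abs(learning_rate - 0.1) - 0.05 * abs(depth - 4)

# Hypothetical hyperparameters and the candidate values to try for each.
search_space = {"learning_rate": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}

def grid_search(score_fn, space):
    """Try every combination of hyperparameter values and keep the best one."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(validation_score, search_space)
print(best_params, best_score)
```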

I

image
A software package that contains a set of libraries.
image recognition
A process where deep learning algorithms are used to classify images based on training data, such as with IBM Watson Visual Recognition.
import node
A node that pulls data into an SPSS Modeler flow, and always appears at the start of the flow.
infeasible problem
A Decision Optimization problem for which no feasible solution exists. This often indicates an incorrect constraint in the model formulation.
infeasible solution
A set of values for all the decision variables of a Decision Optimization problem where not all the constraints are satisfied.
information asset
An asset that provides an alternative view of data assets in a catalog.
ingest
To continuously add a high volume of real-time data to a database.
integer programming (IP)
A form of mathematical programming in which the model of a problem includes the additional stipulation that the domain of all the variables is restricted to integers.
IP
See integer programming.

J

job
A separately executable unit of work.
job dump
A file that is exported from a job. All the elements that are used to execute the job (model, data, solution, log) are contained in the file.
Jupyter notebook
See notebook.
Jupyter notebook editor
The standard notebook editor in a project. Locks the notebook during editing to prevent conflicts.

L

lineage
  1. The history of the events performed on an asset.
  2. The history of the flow of data through assets.
linear programming (LP)
A technique for the optimization of a linear function subject to linear constraints over decision variables. In LP, the model of a problem is expressed through numeric variables combined with linear constraints and governed by a linear objective function and by bounds on the variables.
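A common way to write such a model is shown below. The symbols are assumptions introduced for illustration: x is the vector of decision variables, c holds the objective coefficients, A and b define the linear constraints, and l and u are the bounds on the variables.

```latex
\begin{aligned}
\text{maximize}   \quad & c^{\top} x \\
\text{subject to} \quad & A x \le b, \\
                        & l \le x \le u
\end{aligned}
```

A minimization problem has the same shape with "minimize" in place of "maximize".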
LP
See linear programming.

M

machine learning framework
The libraries and runtime for training and deploying a model.
mask
  1. To obfuscate, substitute, or redact data in a column, based on the attribute classifier of the column and data protection rules.
  2. To replace data in a column with similarly formatted values that match the original format. A form of anonymization.
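The three behaviors (redact, substitute, and format-preserving masking) can be contrasted in a short sketch. This is an illustration of the ideas only, not the product's algorithms, and it is not cryptographically sound. The function names and the `salt` parameter are hypothetical.

```python
import hashlib
import random

def redact(value):
    """Replace the value with ten X characters, hiding both content and format."""
    return "X" * 10

def substitute(value, salt="demo-salt"):
    """Replace the value with a hash: the format is lost, but equal inputs
    map to equal outputs, so joins between tables still line up."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]

def format_preserving_mask(value, salt="demo-salt"):
    """Replace each digit or letter with another of the same kind, keeping the format."""
    rng = random.Random(salt + value)  # seeded so the same input gives the same output
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice("0123456789"))
        elif ch.isalpha():
            pool = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if ch.isupper() else "abcdefghijklmnopqrstuvwxyz"
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # keep separators such as '-' or '@'
    return "".join(out)

sample = "123-45-6789"  # hypothetical sensitive value
print(redact(sample), substitute(sample), format_preserving_mask(sample))
```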
master data
Reference data that remains the same for several jobs on the same model but that could still be changed if necessary.
mathematical programming (MP)
A field of mathematics, or operational research, used to model and solve Decision Optimization problems. This encompasses linear, integer, mixed integer and non-linear programming.
member
A user who has been added to an asset in a catalog.
metadata import
A method of importing metadata as information assets that provide the relationships between physical and logical data models. For example, importing metadata about reports and jobs related to data assets augments the lineage of those assets.
metadata interchange server
A computer that runs metadata imports with bridges and connectors.
MIP
See mixed integer programming.
mixed integer programming (MIP)
A branch of linear programming where some decision variables in the model take integer values and others take continuous values.
model
  1. In Decision Optimization, a mathematical formulation of a problem that can be solved with CPLEX optimization engines using different data sets.
  2. In a machine learning context, a set of functions and algorithms that have been trained and tested on a data set to provide predictions or decisions.
modeler flow asset
An asset that is based on a graphical representation of a data model or a neural network design.
model formulation
In Decision Optimization, the mathematical formulation of a model expressed as a list of decision variables, one or more objective functions to be maximized or minimized, and some constraints to be satisfied.
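To make the three ingredients concrete, here is a toy formulation solved by brute-force enumeration. The problem itself is hypothetical, and enumeration is used only because the example is tiny; it is not how an optimization engine such as CPLEX works.

```python
from itertools import product

# Decision variables: integers x and y, each in 0..4 (hypothetical toy problem).
domain = range(5)

def objective(x, y):
    """Objective function to maximize."""
    return 3 * x + 2 * y

# Constraints that every feasible solution must satisfy.
constraints = [
    lambda x, y: x + y <= 4,
    lambda x, y: x + 3 * y <= 6,
]

# Feasible solutions satisfy all constraints; the optimal one also maximizes the objective.
feasible = [
    (x, y)
    for x, y in product(domain, domain)
    if all(check(x, y) for check in constraints)
]
best = max(feasible, key=lambda point: objective(*point))
best_value = objective(*best)
print(len(feasible), best, best_value)
```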
modeling node
A node in SPSS Modeler that represents a statistical algorithm such as neural nets, decision trees, clustering algorithms, and data sequencing.
MP
See mathematical programming.

N

natural language
A modeling syntax that resembles natural human language (in English) to formulate models.
Natural Language Classifier model asset
An asset that is based on a set of custom textual content classifiers that are defined by the user.
neural network
A mathematical model for predicting or classifying cases by using a complex mathematical scheme that simulates an abstract version of brain cells. A neural network is trained by presenting it with a large number of observed cases, one at a time, and allowing it to update itself repeatedly until it learns the task.
node
The graphical representation of a data operation in a stream or flow. Different types of nodes have different shapes to indicate the type of operation that they perform.
notebook
An interactive document that contains executable code, descriptive text for that code, and the results of any code that is run.
notebook asset
An asset that is based on a Jupyter notebook file.
notebook kernel
The part of the Jupyter notebook editor that executes code and returns the computational results.

O

object detection
A capability of the IBM Visual Recognition service that uses deep learning algorithms to analyze images for particular objects and other content.
objective function
In Decision Optimization and operations research, an expression to optimize (that is, either to minimize or to maximize) while satisfying other constraints of the problem.
object storage
A method of storing data, typically used in the cloud, in which data is stored as discrete units, or objects, in a storage pool or repository that does not use a file hierarchy but that stores all objects at the same level.
online deployment
A method of accessing a deployment through an API endpoint, providing a real-time score or solution on new data.
OPL
See Optimization Programming Language.
OPL model
A model formulation expressed in OPL modeling language.
optimal solution
In operations research, a solution to a problem that optimizes the objective function (whether linear or quadratic) and satisfies all the other constraints of the problem.
optimization
The process of finding the most effective way to allocate scarce resources, to find the best elements or combinations from a very large set of alternatives, while minimizing or maximizing a defined objective. The process uses mathematical programming or constraint programming techniques and CPLEX optimization engines to find the best (optimal) solution.
Optimization Programming Language (OPL)
A modeling language for expressing model formulations of optimization problems in a format that can be solved by optimization engines such as IBM CPLEX.
output node
A node in an SPSS Modeler flow that produces various choices of output for data, charts, and model results. Output nodes usually appear as the last node in a flow or a branch of a flow.

P

payload
The data passed to a deployment to get back a score, prediction, or solution.
pipeline
A candidate model created by AutoAI.
pipeline leaderboard
A table showing the list of automatically generated candidate models, as pipelines, ranked according to the specified criteria.
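The ranking idea can be sketched in a few lines. The pipeline names and metric values below are hypothetical, and the `leaderboard` helper is an illustration rather than an AutoAI API.

```python
# Hypothetical candidate pipelines with hypothetical metric values.
candidates = [
    {"pipeline": "Pipeline 1", "accuracy": 0.81, "roc_auc": 0.88},
    {"pipeline": "Pipeline 2", "accuracy": 0.86, "roc_auc": 0.85},
    {"pipeline": "Pipeline 3", "accuracy": 0.79, "roc_auc": 0.91},
]

def leaderboard(pipelines, metric):
    """Return the pipelines sorted from best to worst on the chosen metric."""
    return sorted(pipelines, key=lambda entry: entry[metric], reverse=True)

by_accuracy = [entry["pipeline"] for entry in leaderboard(candidates, "accuracy")]
by_roc_auc = [entry["pipeline"] for entry in leaderboard(candidates, "roc_auc")]
print(by_accuracy)
print(by_roc_auc)
```

Changing the ranking criterion can change which pipeline comes first, as the two orderings show.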
placeholder
A field or variable to be replaced with a value.
policy
  1. A set of rules that protect data by controlling access to data assets or anonymizing sensitive data within data assets.
  2. A governance artifact that consists of one or more data protection and governance rules.
predictive analytics
A business process and a set of related technologies that are concerned with the prediction of future possibilities and trends. Predictive analytics applies such diverse disciplines as probability, statistics, machine learning, and artificial intelligence to business problems to find the best action for a given situation. See also data mining.
primary category
In Watson Knowledge Catalog, the category that contains the governance artifact. A category is similar to a folder or directory that organizes a user's governance artifacts.
profile
The generated metadata and statistics about the textual content of data.
project
A workspace to organize resources and collaboratively work on data.
publish
To copy an asset into a catalog.
Python
A programming language used in data science and AI.
Python DOcplex model
A model formulation expressed in Python.

Q

quality rule
During data quality analysis, a rule that checks whether specific conditions are met and identifies records that do not meet the conditions as rule violations.
quick scan
An advanced data curation method that analyzes a sample of each table or file to quickly provide analysis results, including data quality score, and automatically assigned data classes and business terms.

R

R
An extensible scripting language used in data science and AI that offers a wide variety of analytic, statistical, and graphical functions and techniques.
record ops node
A node in an SPSS Modeler flow that performs operations on data records, such as selecting, merging, and appending.
redact
To replace data values in a column with 10 Xs, which hides the sensitive values and the data format and does not preserve referential integrity.
reference data set
A governance artifact that defines values for specific types of columns.
refine
To cleanse and shape data.
rule
In Watson Knowledge Catalog, the criteria or logic that is used to analyze or protect data.
runtime environment
The predefined or custom hardware and software configuration that is used to run tools, such as notebooks in Watson Studio.

S

scoring
  1. The process of computing how closely the attributes for an incoming identity match the attributes of an existing entity.
  2. In machine learning, the process of measuring the confidence of a predicted outcome.
secondary category
An optional category that references the governance artifact.
sensitive data
Data that contains information that should not be visible to all users. For example, personally identifiable information or other information that is restricted by privacy regulations.
shape
To customize data by filtering, sorting, or removing columns; joining tables; and performing operations that include calculations, data groupings, hierarchies, and more.
SPSS Modeler
A tool for creating flows that build and train predictive models.
SQL pushback
In SPSS Modeler, the process of performing many data preparation and mining operations directly in the database through SQL code.
streams flow
A continuous, unidirectional flow of data that real-time analytics can be applied to.
streams flow asset
An asset that is based on a set of ordered steps to analyze data in real time.
substitute
To replace data in a column with values that don't match the original format but retain referential integrity.
supernode
An SPSS Modeler node that shrinks a data stream by encapsulating several nodes into one.

T

text classification
A type of model that automatically identifies and classifies text into specified categories.
trained model
A model that is ready to be deployed.
training
The initial stage of model building, involving a subset of the source data. The model can then be tested against a further, different subset for which the outcome is already known.
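The split between a training subset and a separate test subset can be sketched as follows. The helper name `train_test_split`, the 25% test fraction, and the sample records are assumptions chosen for illustration.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into training and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

# Hypothetical (input, known outcome) pairs.
records = [(x, 2 * x) for x in range(20)]
train, test = train_test_split(records)
print(len(train), len(test))
```

The model would be fitted on `train` only, then checked against `test`, where the outcomes are already known.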

U

unbounded problem
A Decision Optimization problem where an infinite number of solutions exists and the objective can take values up to infinity. This often indicates that a constraint is missing from the model formulation.
unstructured data
Any data that is stored in an unstructured format rather than in fixed fields. Data in a word processing document is an example of unstructured data.
unsupervised learning
A model for deep learning that allows raw, unlabeled data to be used to train a system with little to no human effort.

V

visualization
A graph, chart, plot, table, map, or any other visual representation of data.
Visual Recognition model asset
An asset that is based on a set of custom image classifiers that the user defines.

W

Watson Machine Learning model asset
An asset that is based on machine learning algorithms that are optimized for a training data set.