accelerator
In high-performance computing, a specialized circuit that takes some of the computational load from the CPU, increasing the efficiency of the system. For example, in deep learning, GPU-accelerated computing is often employed to offload part of the compute workload to a GPU while the main application runs on the CPU. See also graphics processing unit.
accountability
The expectation that organizations or individuals will ensure the proper functioning, throughout their lifecycle, of the AI systems that they design, develop, operate or deploy, in accordance with their roles and applicable regulatory frameworks.
This includes determining who is responsible for an AI mistake, which may require legal experts to determine liability on a case-by-case basis.
activation function
A function defining a neural unit's output given a set of incoming activations from other neurons.
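As a rough illustration of the definition above, the following sketch computes one neural unit's output from incoming activations. ReLU and the logistic sigmoid are two common activation functions; the input values, weights, and the helper name neuron_output are made up for the example.

```python
import math


def relu(x: float) -> float:
    """Rectified linear unit: keeps positive inputs, clamps negatives to 0."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps a real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias, activation):
    """Weighted sum of incoming activations, passed through the activation."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(z)


incoming = [0.5, -1.0, 2.0]  # illustrative activations from other neurons
weights = [0.4, 0.3, -0.2]   # illustrative connection weights
print(neuron_output(incoming, weights, 0.1, relu))
print(neuron_output(incoming, weights, 0.1, sigmoid))
```

The same weighted sum yields different outputs depending on which activation function is plugged in.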
active learning
A model for machine learning in which the system requests more labeled data only when it needs it.
active metadata
Metadata that is automatically updated based on analysis by machine learning processes. For example, profiling and data quality analysis automatically update metadata for data assets.
active runtime
An instance of an environment that is running to provide compute resources to assets that run code.
agent
An algorithm or a program that interacts with an environment to learn optimal actions or decisions, typically using reinforcement learning, to achieve a specific goal.
agentic AI
A generative AI flow that can decompose a prompt into multiple tasks, assign tasks to appropriate gen AI agents, and synthesize an answer without human intervention.
AI accelerator
Specialized silicon hardware designed to efficiently execute AI-related tasks like deep learning, machine learning, and neural networks for faster, energy-efficient computing. It can be a dedicated unit in a core, a separate chiplet on a multi-module chip, or a separate card.
AI ethics
A multidisciplinary field that studies how to optimize AI's beneficial impact while reducing risks and adverse outcomes. Examples of AI ethics issues are data responsibility and privacy, fairness, explainability, robustness, transparency,
environmental sustainability, inclusion, moral agency, value alignment, accountability, trust, and technology misuse.
AI governance
An organization's act of governing, through its corporate instructions, staff, processes and systems to direct, evaluate, monitor, and take corrective action throughout the AI lifecycle, to provide assurance that the AI system is operating
as the organization intends, as its stakeholders expect, and as required by relevant regulation.
AI safety
The field of research aiming to ensure artificial intelligence systems operate in a manner that is beneficial to humanity and don't inadvertently cause harm, addressing issues like reliability, fairness, transparency, and alignment of AI systems
with human values.
AI service
A deployable unit of code that contains the logic of a generative AI use case and provides an endpoint for inferencing from an application.
algorithm
A formula applied to data to determine optimal ways to solve analytical problems.
analytics
The science of studying data in order to find meaningful patterns in the data and draw conclusions based on those patterns.
appropriate trust
In an AI system, an amount of trust that is calibrated to its accuracy, reliability, and credibility.
artificial intelligence (AI)
The capability to acquire, process, create and apply knowledge in the form of a model to make predictions, recommendations or decisions.
artificial intelligence system (AI system)
A system that can make predictions, recommendations or decisions that influence physical or virtual environments, and whose outputs or behaviors are not necessarily pre-determined by its developer or user. AI systems are typically trained
with large quantities of structured or unstructured data, and might be designed to operate with varying levels of autonomy or none, to achieve human-defined objectives.
asset
An item that contains information about data, other valuable information, or code that works with data. See also data asset.
attention mechanism
A mechanism in deep learning models that determines which parts of the input a model focuses on when producing output.
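One widely used form of attention is scaled dot-product attention, as found in transformer models. The sketch below is a minimal single-query version in plain Python, assuming toy two-dimensional keys and values; it shows how similarity scores become weights that decide how much each input contributes to the output. The function names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    width = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
    return weights, output


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]       # toy keys, one per input position
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]   # toy values, one per input position
weights, output = attend(query, keys, values)
print(weights)
print(output)
```

Input positions whose keys align with the query receive larger weights, so their values dominate the output.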
AutoAI experiment
An automated training process that considers a series of training definitions and parameters to create a set of ranked pipelines as model candidates.
B
batch deployment
A method to deploy models that processes input data from a file, data connection, or connected data in a storage bucket, then writes the output to a selected destination.
bias
Systematic error in an AI system that has been designed, intentionally or not, in a way that may generate unfair decisions. Bias can be present both in the AI system and in the data used to train and test it. AI bias can emerge in an AI system
as a result of cultural expectations; technical limitations; or unanticipated deployment contexts. See also fairness.
bias detection
The process of calculating fairness metrics to detect when AI models are delivering unfair outcomes based on certain attributes.
bias mitigation
Reducing biases in AI models by curating training data and applying fairness techniques.
binary classification
A classification model with two classes. Predictions are a binary choice of one of the two classes.
C
classification model
A predictive model that predicts data in distinct categories. Classifications can be binary, with two classes of data, or multiclass, when there are more than two categories.
cleanse
To ensure that all values in a data set are consistent and correctly recorded.
cognitive forcing function
An intervention that is applied at a decision-making moment to disrupt heuristic reasoning and cause a person to engage in analytical thinking; examples include a checklist, a diagnostic time-out, or asking a person to rule out an alternative.
computational linguistics
Interdisciplinary field that explores approaches for computationally modeling natural languages.
compute resource
The hardware and software resources that are defined by an environment template to run assets in tools.
confusion matrix
A performance measurement that compares a model's predicted positive and negative outcomes with the actual positive and negative outcomes.
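For a binary classifier, the comparison described above reduces to four counts. The sketch below tallies them from two lists of made-up labels; the function name and the dictionary layout are illustrative choices, not a specific library's API.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


actual = [1, 0, 1, 1, 0, 0, 1, 0]      # made-up ground-truth labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up model predictions
cm = confusion_matrix(actual, predicted)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
print(cm)        # {'tp': 3, 'fp': 1, 'tn': 3, 'fn': 1}
print(accuracy)  # 0.75
```

Metrics such as accuracy, precision, and recall can all be derived from these four counts.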
connected data asset
A pointer to data that is accessed through a connection to an external data source.
connected folder asset
A pointer to a folder in IBM Cloud Object Storage.
connection
The information required to connect to a database. The actual information that is required varies according to the DBMS and connection method.
connection asset
An asset that contains information that enables connecting to a data source.
constraint
In databases, a relationship between tables.
In Decision Optimization, a condition that must be satisfied by the solution of a problem.
continuous learning
Automating the tasks of monitoring model performance, retraining with new data, and redeploying to ensure prediction quality.
convolutional neural network (CNN)
A class of neural network commonly used in computer vision tasks that uses convolutional layers to process image data.
Core ML deployment
The process of downloading a deployment in Core ML format for use in iOS apps.
corpus
A collection of source documents that are used to train a machine learning model.
CPLEX model
A Decision Optimization model that is formulated to be solved by the CPLEX engine.
CPO model
A constraint programming model that is formulated to be solved by the Decision Optimization CP Optimizer (CPO) engine.
cross-validation
A technique for testing how well a model generalizes in the absence of a hold-out test sample. Cross-validation divides the training data into a number of subsets, and then builds the same number of models, with each subset held out in turn. Each of those models is tested on its held-out subset, and the average accuracy of the models on those held-out subsets is used to estimate the accuracy of the model when applied to new data.
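The procedure described above can be sketched in a few lines of plain Python. This is a minimal k-fold illustration under simple assumptions: the "model" is a hypothetical stand-in that always predicts the most common training label, and all names and data are made up.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, holdout_indices) once per fold."""
    indices = list(range(n_samples))
    start = 0
    for fold in range(k):
        # Spread any remainder over the first folds so sizes differ by at most 1.
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        holdout = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, holdout
        start += size


def cross_validate(features, labels, k, fit, score):
    """Average the held-out score of k models, one per held-out subset."""
    scores = []
    for train_idx, holdout_idx in k_fold_indices(len(features), k):
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        scores.append(score(model,
                            [features[i] for i in holdout_idx],
                            [labels[i] for i in holdout_idx]))
    return sum(scores) / len(scores)


# Hypothetical stand-in model: always predicts the most common training label.
def fit_majority(features, labels):
    return max(set(labels), key=labels.count)


def accuracy(model, features, labels):
    return sum(1 for y in labels if y == model) / len(labels)


X = list(range(10))
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
print(cross_validate(X, y, k=5, fit=fit_majority, score=accuracy))
```

Every sample is held out exactly once, so the averaged score uses all of the data without ever testing a model on samples it was trained on.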
curate
To select, collect, preserve, and maintain content relevant to a specific topic. Curation establishes, maintains, and adds value to data; it transforms data into trusted information and knowledge.
D
data asset
An asset that points to data, for example, to an uploaded file. Connections and connected data assets are also considered data assets. See also asset.
data imputation
The substitution of missing values in a data set with estimated or explicit values.
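A minimal sketch of both variants named above, assuming missing values are represented as None: substituting an estimated value (here the mean of the observed values, one simple estimator among many) or an explicit fill value. The function name and sample data are illustrative.

```python
def impute(values, fill=None):
    """Replace None entries with `fill`, or with the observed mean if no fill is given."""
    observed = [v for v in values if v is not None]
    if fill is None:
        if not observed:
            raise ValueError("cannot estimate a mean from no observed values")
        fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


ages = [34, None, 29, 41, None]   # made-up column with two missing values
print(impute(ages))               # estimated value: mean of 34, 29, 41
print(impute(ages, fill=0))       # explicit value
```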
data lake
A large-scale data storage repository that stores raw data in any format in a flat architecture. Data lakes hold structured and unstructured data as well as binary data for the purpose of processing and analysis.
data lakehouse
A unified data storage and processing architecture that combines the flexibility of a data lake with the structured querying and performance optimizations of a data warehouse, enabling scalable and efficient data analysis for AI and analytics
applications.
data mining
The process of collecting critical business information from a data source, correlating the information, and uncovering associations, patterns, and trends. See also predictive analytics.
Data Refinery flow
A set of steps that cleanse and shape data to produce a new data asset.
data science
The analysis and visualization of structured and unstructured data to discover insights and knowledge.
data set
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a file or database table.
data source
A repository, queue, or feed for reading data, such as a database.
data table
A collection of data, usually in the form of rows (records) and columns (fields) and contained in a table.
data warehouse
A large, centralized repository of data collected from various sources that is used for reporting and data analysis. It primarily stores structured and semi-structured data, enabling businesses to make informed decisions.
decision boundary
A division of data points in a space into distinct groups or classifications.
decoder-only model
A model that generates output text word by word by inference from the input sequence. Decoder-only models are used for tasks such as generating text and answering questions.
deep learning
A computational model that uses interconnected nodes, organized into multiple hierarchical layers, to transform input data (first layer) through a series of computations to produce an output (final layer). Deep learning is inspired by the structure and function of the human brain. See also distributed deep learning.
deep neural network
A neural network with multiple hidden layers, allowing for more complex representations of the data.
deep reasoning
A class of machine learning in which systems generate insights from data to support cognitive tasks beyond perception and classification, such as common sense, changing situations, planning, and decision making.
deployment
A model or application package that is available for use.
deployment space
A workspace where models are deployed and deployments are managed.
deterministic
Describes a characteristic of computing systems when their outputs are completely determined by their inputs.
discriminative AI
A class of algorithm that focuses on finding a boundary that separates different classes in the data.
distributed deep learning (DDL)
An approach to deep learning training that leverages the methods of distributed computing. In a DDL environment, compute workload is distributed between the central processing unit and graphics processing unit. See also deep learning.
DOcplex
A Python API for modeling and solving Decision Optimization problems.
E
embedding
A numerical representation of a unit of information, such as a word or a sentence, as a vector of real-valued numbers. Embeddings are learned, low-dimensional representations of higher-dimensional data. See also encoding,
representation.
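Because embeddings are vectors, related items can be compared numerically. The sketch below uses cosine similarity, a common choice for this; the three-dimensional vectors are invented for illustration and are not output from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real ones typically have hundreds of dimensions.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}
print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))
print(cosine_similarity(embeddings["cat"], embeddings["invoice"]))
```

With these made-up vectors, "cat" scores closer to "kitten" than to "invoice", which is the kind of relationship learned embeddings are meant to capture.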
emergence
A property of foundation models in which the model exhibits behaviors that were not explicitly trained.
emergent behavior
A behavior exhibited by a foundation model that was not explicitly constructed.
encoder-decoder model
A model for both understanding input text and for generating output text based on the input text. Encoder-decoder models are used for tasks such as summarization or translation.
encoder-only model
A model that understands input text at the sentence level by transforming input sequences into representational vectors called embeddings. Encoder-only models are used for tasks such as classifying customer feedback and extracting information
from large documents.
encoding
The representation of a unit of information, such as a character or a word, as a set of numbers. See also embedding, positional encoding.
endpoint URL
A network destination address that identifies resources, such as services and objects. For example, an endpoint URL is used to identify the location of a model or function deployment when a user sends payload data to the deployment.
environment
The compute resources for running jobs.
environment runtime
An instantiation of the environment template to run assets.
environment template
A definition that specifies hardware and software resources to instantiate environment runtimes.
exogenous feature
A feature that can influence the predictive model but cannot be influenced in return. For example, temperatures can affect predicted ice cream sales, but ice cream sales cannot influence temperatures.
experiment
A model training process that considers a series of training definitions and parameters to determine the most accurate model configuration.
explainability
The ability of human users to trace, audit, and understand predictions that are made in applications that use AI systems.
The ability of an AI system to provide insights that humans can use to understand the causes of the system's predictions.
F
fairness
In an AI system, the equitable treatment of individuals or groups of individuals. The choice of a specific notion of equity for an AI system depends on the context in which it is used. See also bias.
feature
A property or characteristic of an item within a data set, for example, a column in a spreadsheet. In some cases, features are engineered as combinations of other features in the data set.
feature engineering
The process of selecting, transforming, and creating new features from raw data to improve the performance and predictive power of machine learning models.
feature group
A set of columns of a particular data asset along with the metadata that is used for machine learning.
feature selection
Identifying the columns of data that best support an accurate prediction or score in a machine learning model.
feature store
A centralized repository or system that manages and organizes features, providing a scalable and efficient way to store, retrieve, and share feature data across machine learning pipelines and applications.
feature transformation
In AutoAI, a phase of pipeline creation that applies algorithms to transform and optimize the training data to achieve the best outcome for the model type.
federated learning
The training of a common machine learning model that uses multiple data sources that are not moved, joined, or shared. The result is a better-trained model without compromising data security.
few-shot prompting
A prompting technique in which a small number of examples are provided to the model to demonstrate how to complete the task.
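In practice, a few-shot prompt is often just a string that places worked examples before the new input. The sketch below assembles one; the "Input:"/"Output:" layout, the function name, and the sample reviews are assumptions for illustration, not a format required by any particular model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # left open for the model to complete
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("The battery lasts all day.", "positive"),
     ("It stopped working after a week.", "negative")],
    "Setup was quick and painless.",
)
print(prompt)
```

Passing a single example to the same function would produce a one-shot prompt.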
fine tuning
The process of adapting a pre-trained model to perform a specific task by conducting additional training. Fine tuning may involve (1) updating the model’s existing parameters, known as full fine tuning, or (2) updating a subset of the model’s
existing parameters or adding new parameters to the model and training them while freezing the model’s existing parameters, known as parameter-efficient fine tuning.
flow
A collection of nodes that define a set of steps for processing data or training a model.
foundation model
An AI model that can be adapted to a wide range of downstream tasks. Foundation models are typically large-scale generative models that are trained on unlabeled data using self-supervision. As large scale models, foundation models can include
billions of parameters.
G
Gantt chart
A graphical representation of a project timeline and duration in which schedule data is displayed as horizontal bars along a time scale.
graphical builder
A tool for creating flow assets by visually coding. A canvas is an area on which to place objects or nodes that can be connected to create a flow.
graphics processing unit (GPU)
A specialized processor designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are heavily utilized in machine learning due to their parallel processing
capabilities. See also accelerator.
grounding
Providing a large language model with information to improve the accuracy of results.
H
hallucination
A response from a foundation model that includes off-topic, repetitive, incorrect, or fabricated content. Hallucinations involving fabricating details can happen when a model is prompted to generate text, but the model doesn't have enough
related text to draw upon to generate a result that contains the correct details.
HAP detection
The ability to detect and filter hate, abuse, and profanity in both prompts submitted by users and in responses generated by an AI model.
HAP detector
A sentence classifier that removes potentially harmful content, such as hate speech, abuse, and profanity, from foundation model output and input.
hold-out set
A set of labeled data that is intentionally withheld from both the training and validation sets, serving as an unbiased assessment of the final model's performance on unseen data.
homogenization
The trend in machine learning research in which a small number of deep neural net architectures, such as the transformer, are achieving state-of-the-art results across a wide variety of tasks.
human oversight
Human involvement in reviewing decisions rendered by an AI system, enabling human autonomy and accountability for decisions.
hyperparameter
In machine learning, a parameter whose value is set before training as a way to increase model accuracy.
hyperparameter optimization (HPO)
The process for setting hyperparameter values to the settings that provide the most accurate model.
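Grid search is one simple way to carry out this process: evaluate every combination of candidate values and keep the best. In the sketch below, toy_accuracy is a hypothetical stand-in for training and validating a model, and the hyperparameter names and candidate values are made up.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in `grid` and return the best (params, score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_accuracy(params):
    # Hypothetical stand-in for "train a model and measure validation accuracy".
    return 1.0 - abs(params["learning_rate"] - 0.1) - 0.01 * abs(params["depth"] - 4)


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best_params, best_score = grid_search(toy_accuracy, grid)
print(best_params, best_score)
```

Grid search scales poorly as the number of hyperparameters grows, which is why random search and other strategies are also used.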
I
image
A software package that contains a set of libraries.
incremental learning
The process of training a model using data that is continually updated without forgetting data obtained from the preceding tasks. This technique is used to train a model with batches of data from a large training data source.
inferencing
The process of running live data through a trained AI model to make a prediction or solve a task.
ingest
To continuously add a high volume of real-time data to a database.
To feed data into a system for the purpose of creating a base of knowledge.
insight
An accurate or deep understanding of something. Insights are derived using cognitive analytics to provide current snapshots and predictions of customer behaviors and attitudes.
intelligent AI
Artificial intelligence systems that can understand, learn, adapt, and implement knowledge, demonstrating abilities like decision-making, problem-solving, and understanding complex concepts, much like human intelligence.
intent
A purpose or goal expressed by customer input to a chatbot, such as answering a question or processing a bill payment.
L
label
A class or category assigned to a data point in supervised learning. Labels can be derived from data but are often applied by human labelers or annotators.
labeled data
Raw data that is assigned labels to add context or meaning so that it can be used to train machine learning models. For example, numeric values might be labeled as zip codes or ages to provide context for model inputs and outputs.
large language model (LLM)
A language model with a large number of parameters, trained on a large quantity of text.
latent space
An n-dimensional mathematical space in which data instances are embedded. A two-dimensional latent space embeds data as points within a 2D plane. See also representational space.
M
machine learning
A branch of artificial intelligence (AI) and computer science that focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving the accuracy of AI models.
machine learning framework
The libraries and runtime for training and deploying a model.
machine learning model
An AI model that is trained on a set of data to develop algorithms that it can use to analyze and learn from new data.
mental model
An individual’s understanding of how a system works and how their actions affect system outcomes. When these expectations do not match the actual capabilities of a system, it can lead to frustration, abandonment, or misuse.
misalignment
A discrepancy between the goals or behaviors that an AI system is optimized to achieve and the true, often complex, objectives of its human users or designers.
MLOps
A methodology that takes a machine learning model from development to production.
The practice of collaboration between data scientists and operations professionals to help manage the production machine learning (or deep learning) lifecycle. MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. It involves model development, training, validation, deployment, monitoring, and management and uses methods like CI/CD.
model
In a machine learning context, a set of functions and algorithms that have been trained and tested on a data set to provide predictions or decisions.
In Decision Optimization, a mathematical formulation of a problem that can be solved with CPLEX optimization engines using different data sets.
ModelOps
A methodology for managing the full lifecycle of an AI model, including training, deployment, scoring, evaluation, retraining, and updating.
monitored group
A class of data that is monitored to determine if the results from a predictive model differ significantly from the results of the reference group. Groups are commonly monitored based on characteristics that include race, gender, or age.
multiclass classification model
A classification model with more than two classes. For example, where a binary classification model predicts yes or no values, a multiclass model predicts yes, no, maybe, or not applicable.
multimodal model
A generative AI model that can process multiple types of data, such as text, images, and audio, and convert between them. For example, a multimodal model can take text input and generate image output.
multivariate time series
A time series experiment that contains two or more changing variables. For example, a time series model forecasting the electricity usage of three clients.
N
natural language processing (NLP)
A field of artificial intelligence and linguistics that studies the problems inherent in the processing and manipulation of natural language, with an aim to increase the ability of computers to understand human languages.
natural language processing library
A library that provides basic natural language processing functions for syntax analysis and out-of-the-box pre-trained models for a wide variety of text processing tasks.
neural network
A mathematical model for predicting or classifying cases by using a complex mathematical scheme that simulates an abstract version of brain cells. A neural network is trained by presenting it with a large number of observed cases, one at a
time, and allowing it to update itself repeatedly until it learns the task.
node
In an SPSS Modeler flow, the graphical representation of a data operation.
notebook
An interactive document that contains executable code, descriptive text for that code, and the results of any code that is run.
notebook kernel
The part of the notebook editor that executes code and returns the computational results.
O
object storage
A method of storing data, typically used in the cloud, in which data is stored as discrete units, or objects, in a storage pool or repository that does not use a file hierarchy but that stores all objects at the same level.
one-shot learning
A model for deep learning that is based on the premise that most human learning takes place upon receiving just one or two examples. This model is similar to unsupervised learning.
one-shot prompting
A prompting technique in which a single example is provided to the model to demonstrate how to complete the task.
online deployment
A method of accessing a model or Python code deployment through an API endpoint as a web service to generate predictions online, in real time.
ontology
An explicit formal specification of the representation of the objects, concepts, and other entities that can exist in some area of interest and the relationships among them.
operational asset
An asset that runs code in a tool or a job.
optimization
The process of finding the most appropriate solution to a precisely defined problem while respecting the imposed constraints and limitations. For example, determining how to allocate resources or how to find the best elements or combinations
from a large set of alternatives.
Optimization Programming Language
A modeling language for expressing model formulations of optimization problems in a format that can be solved by CPLEX optimization engines such as IBM CPLEX.
optimized metric
A metric used to measure the performance of the model. For example, accuracy is the typical metric used to measure the performance of a binary classification model.
orchestration
The process of creating an end-to-end flow that can train, run, deploy, test, and evaluate a machine learning model, and uses automation to coordinate the system, often using microservices.
overreliance
A user's acceptance of an incorrect recommendation made by an AI model. See also reliance, underreliance.
P
parameter
A configurable part of the model that is internal to a model and whose values are estimated or learned from data. Parameters are aspects of the model that are adjusted during the training process to help the model accurately predict the
output. The model's performance and predictive power largely depend on the values of these parameters.
A real-valued weight between 0.0 and 1.0 indicating the strength of connection between two neurons in a neural network.
party
In Federated Learning, an entity that contributes data for training a common model. The data is not moved or combined but each party gets the benefit of the federated training.
payload
The data that is passed to a deployment to get back a score, prediction, or solution.
payload logging
The capture of payload data and deployment output to monitor ongoing health of AI in business applications.
pipeline
In Watson Pipelines, an end-to-end flow of assets from creation through deployment.
In AutoAI, a candidate model.
pipeline leaderboard
In AutoAI, a table that shows the list of automatically generated candidate models, as pipelines, ranked according to the specified criteria.
policy
A strategy or rule that an agent follows to determine the next action based on the current state.
positional encoding
An encoding of an ordered sequence of data that includes positional information, such as encoding of words in a sentence that includes each word's position within the sentence. See also encoding.
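One well-known scheme is the sinusoidal positional encoding used in the original transformer architecture, in which each position maps to a vector of sine and cosine values at different frequencies. The sketch below is a minimal plain-Python version; the function name and the small dimension are illustrative, and other schemes (such as learned position embeddings) exist.

```python
import math


def positional_encoding(position, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd ones."""
    encoding = []
    for i in range(d_model):
        # Each pair of dimensions (2k, 2k+1) shares one frequency.
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


sentence = ["the", "cat", "sat"]
for pos, word in enumerate(sentence):
    print(word, [round(v, 3) for v in positional_encoding(pos, 4)])
```

In a transformer, a vector like this is typically added to each word's embedding so the model can distinguish the same word at different positions.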
predictive analytics
A business process and a set of related technologies that are concerned with the prediction of future possibilities and trends. Predictive analytics applies such diverse disciplines as probability, statistics, machine learning, and artificial
intelligence to business problems to find the best action for a specific situation. See also data mining.
pretrained model
An AI model that was previously trained on a large data set to accomplish a specific task. Pretrained models are used instead of building a model from scratch.
pretraining
The process of training a machine learning model on a large dataset before fine-tuning it for a specific task.
privacy
Assurance that information about an individual is protected from unauthorized access and inappropriate use.
probabilistic
The characteristic of being subject to randomness; non-deterministic. Probabilistic models do not necessarily produce the same outputs given the same inputs. See also generative variability.
project
A collaborative workspace for working with data and other assets.
prompt
Data, such as text or an image, that prepares, instructs, or conditions a foundation model's output.
A component of an action that indicates that user input is required for a field before making a transition to an output screen.
prompt engineering
The process of designing natural language prompts for a language model to perform a specific task.
prompting
The process of providing input to a foundation model to induce it to produce output.
prompt tuning
An efficient, low-cost way of adapting a pretrained model to new tasks without retraining the model or updating its weights. Prompt tuning involves learning a small number of new parameters that are appended to a model’s prompt, while freezing the model’s existing parameters.
pruning
The process of simplifying, shrinking, or trimming a decision tree or neural network. This is done by removing less important nodes or layers, reducing complexity to prevent overfitting and improve model generalization while maintaining its
predictive power.
Python
A programming language that is used in data science and AI.
Python function
A function that contains Python code to support a model in production.
Q
quantization
A method of compressing foundation model weights to speed up inferencing and reduce GPU memory needs.
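As a minimal sketch of the idea, the following Python example applies symmetric 8-bit quantization to a short list of weights. This is one simple scheme among several, and the function names `quantize_int8` and `dequantize` and the sample weights are assumptions made for illustration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # hypothetical model weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)
print([round(r, 4) for r in restored])
```

Each weight is stored as a small integer plus one shared scale, which needs less memory than a full-precision float at the cost of a rounding error of at most half the scale per weight.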
R
R
An extensible scripting language that is used in data science and AI that offers a wide variety of analytic, statistical, and graphical functions and techniques.
random seed
A number used to initialize a pseudorandom number generator. Random seeds enable reproducibility for processes that rely on random number generation.
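The reproducibility that a seed provides can be sketched with Python's standard `random` module. The helper name `sample` and the seed values are assumptions chosen for the example.

```python
import random

def sample(seed):
    """Draw five pseudorandom integers from a generator initialized with seed."""
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.randint(1, 100) for _ in range(5)]

first = sample(42)
second = sample(42)
other = sample(7)

# The same seed reproduces the same sequence.
print(first == second)
# A different seed generally gives a different sequence.
print(first, other)
```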
reference group
A group that is identified as most likely to receive a positive result in a predictive model. The results can be compared to a monitored group to look for potential bias in outcomes.
refine
To cleanse and shape data.
regression model
A model that relates a dependent variable to one or more independent variables.
reinforcement learning
A machine learning technique in which an agent learns to make sequential decisions in an environment to maximize a reward signal. Inspired by trial and error learning, agents interact with the environment, receive feedback, and adjust their
actions to achieve optimal policies.
reinforcement learning from human feedback (RLHF)
A method of aligning a large language model's responses to the instructions given in a prompt. RLHF requires that human annotators rank multiple outputs from the model. These rankings are then used to train a reward model. The reward model is then used, through reinforcement learning, to fine-tune the large language model's output.
reliance
In AI systems, a user’s acceptance of a recommendation made by, or the output generated by, an AI model. See also overreliance, underreliance.
representation
An encoding of a unit of information, often as a vector of real-valued numbers. See also embedding.
representational space
An n-dimensional mathematical space in which data instances are embedded. For example, a two-dimensional representational space embeds data as points in a 2D plane. See also latent space.
reranking
A generative AI process for ranking a set of document passages from most-to-least likely to answer a specified query.
retrieval augmented generation (RAG)
A technique in which a large language model is augmented with knowledge from external sources to generate text. In the retrieval step, relevant documents from an external source are identified from the user’s query. In the generation step,
portions of those documents are included in the LLM prompt to generate a response grounded in the retrieved documents.
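The two steps can be sketched in Python as follows. This is a toy illustration only: retrieval is done by counting shared words instead of comparing embeddings, the generation step stops at building the prompt because no model is called, and the function names `retrieve` and `build_prompt` and the sample documents are assumptions made for the example.

```python
def retrieve(query, documents, top_k=2):
    """Retrieval step: rank documents by how many words they share with the query."""
    query_words = set(query.lower().split())
    scored = []
    for doc in documents:
        overlap = len(query_words & set(doc.lower().split()))
        scored.append((overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def build_prompt(query, passages):
    """Generation step, first half: place the retrieved passages in the LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

documents = [  # hypothetical external source
    "Quantization compresses model weights to reduce memory use.",
    "A vector store is a repository for document embeddings.",
    "Temperature controls variation in generated output.",
]
query = "What does a vector store hold?"
passages = retrieve(query, documents, top_k=1)
prompt = build_prompt(query, passages)
print(prompt)
```

In a real system the prompt would then be sent to a large language model, and retrieval would typically query a vector store.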
reward
A signal used to guide an agent, typically a reinforcement learning agent, that provides feedback on the goodness of a decision.
runtime environment
The predefined or custom hardware and software configuration that is used to run tools or jobs, such as notebooks.
S
scoring
In machine learning, the process of measuring the confidence of a predicted outcome.
The process of computing how closely the attributes for an incoming identity match the attributes of an existing entity.
script
A file that contains Python or R scripts to support a model in production.
self-attention
An attention mechanism that uses information from the input data itself to determine what parts of the input to focus on when generating output.
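A minimal Python sketch of scaled dot-product self-attention follows. It is a simplification: real transformer layers first apply learned query, key, and value projections, which are omitted here so that each input vector serves as its own query, key, and value. The function names and the input vectors are assumptions made for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(inputs):
    """Return attended outputs and the attention weights for each position."""
    dim = len(inputs[0])
    outputs, all_weights = [], []
    for query in inputs:
        # Score this position against every position in the same input.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in inputs
        ]
        weights = softmax(scores)
        # The output is a weighted average of all input vectors.
        output = [
            sum(w * value[j] for w, value in zip(weights, inputs))
            for j in range(dim)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical token vectors
outputs, attn = self_attention(inputs)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of weights shows how strongly one position attends to every position of the same input, which is what distinguishes self-attention from attention over a separate sequence.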
self-supervised learning
A machine learning training method in which a model learns from unlabeled data by masking tokens in an input sequence and then trying to predict them. An example is the masked sequence "I like ________ sprouts", for which the model must predict the missing word.
sentiment analysis
Examination of the sentiment or emotion expressed in text, such as determining if a movie review is positive or negative.
shape
To customize data by filtering, sorting, or removing columns; joining tables; or performing operations that include calculations, data groupings, hierarchies, and more.
small data
Data that is accessible and comprehensible by humans. See also structured data.
SQL pushback
In SPSS Modeler, the process of performing many data preparation and mining operations directly in the database through SQL code.
structured data
Data that resides in fixed fields within a record or file. Relational databases and spreadsheets are examples of structured data. See also unstructured data, small data.
structured information
Items stored in structured resources, such as search engine indices, databases, or knowledge bases.
supervised learning
A machine learning training method in which a model is trained on a labeled dataset to make predictions on new data.
T
temperature
A parameter in a generative model that specifies the amount of variation in the generation process. Higher temperatures result in greater variability in the model's output.
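One common way that temperature takes effect is by dividing the model's raw scores (logits) before they are converted into token probabilities. The Python sketch below assumes that mechanism. The function name `softmax_with_temperature` and the logit values are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)
high = softmax_with_temperature(logits, 2.0)
print("temperature 0.5:", [round(p, 3) for p in low])
print("temperature 2.0:", [round(p, 3) for p in high])
```

At the lower temperature most of the probability concentrates on the top-scoring token. At the higher temperature the probabilities are more even, so sampled output varies more.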
text classification
A model that automatically identifies and classifies text into specified categories.
text extraction
A generative AI method of converting highly structured information into a simpler textual format for use as input to large language models.
time series
A set of values of a variable at periodic points in time.
time series model
A model that tracks and predicts data over time.
token
A discrete unit of meaning or analysis in a text, such as a word or subword.
tokenization
The process used in natural language processing to split a string of text into smaller units, such as words or subwords.
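A minimal word-level sketch in Python, using a regular expression from the standard `re` module, is shown below. Subword tokenizers used by large language models are more involved and are not shown. The function name `tokenize` and the pattern are assumptions made for the example.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("I like Brussels sprouts.")
print(tokens)  # ['I', 'like', 'Brussels', 'sprouts', '.']
```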
trained model
A model that is trained with actual data and is ready to be deployed to predict outcomes when presented with new data.
training
The initial stage of model building, involving a subset of the source data. The model learns by example from the known data. The model can then be tested against a further, different subset for which the outcome is already known.
training data
A collection of data that is used to train machine learning models.
training set
A set of labeled data that is used to train a machine learning model by exposing it to examples and their corresponding labels, enabling the model to learn patterns and make predictions.
transfer learning
A machine learning strategy in which a trained model is applied to a completely new problem.
transformer
A neural network architecture that uses positional encodings and the self-attention mechanism to predict the next token in a sequence of tokens.
transparency
Sharing appropriate information with stakeholders on how an AI system has been designed and developed. Examples of this information are what data is collected, how it will be used and stored, and who has access to it; and test results for
accuracy, robustness and bias.
trust calibration
The process of evaluating and adjusting one’s trust in an AI system based on factors such as its accuracy, reliability, and credibility.
Turing test
Proposed by Alan Turing in 1950, a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.
U
underreliance
A user's rejection of a correct recommendation made by an AI model. See also overreliance, reliance.
univariate time series
A time series experiment that contains only one changing variable. For example, a time series model that forecasts temperature has a single prediction column, the temperature.
unstructured data
Any data that is stored in an unstructured format rather than in fixed fields. Data in a word processing document is an example of unstructured data. See also structured data.
unstructured information
Data that is not contained in a fixed location, such as a natural language text document.
unsupervised learning
A model for deep learning that allows raw, unlabeled data to be used to train a system with little to no human effort.
A machine learning training method in which a model is not provided with labeled data and must find patterns or structure in the data on its own.
V
validation set
A separate set of labeled data that is used to evaluate the performance and generalization ability of a machine learning model during the training process, assisting in hyperparameter tuning and model selection.
vector
A one-dimensional, ordered list of numbers, such as [1, 2, 5] or [0.7, 0.2, -1.0].
vector index
An index that retrieves the vectorized embeddings of documents from a vector store.
vector store
A repository that stores vectorized embeddings of documents.
verbalizer
In generative AI, a template to format the data during tuning and inferencing.
virtual agent
A pretrained chat bot that can process natural language to respond and complete simple business transactions, or route more complicated requests to a human with subject matter expertise.
visualization
A graph, chart, plot, table, map, or any other visual representation of data.
W
weight
A coefficient for a node that transforms input data within the network's layer. Weight is a parameter that an AI model learns through training, adjusting its value to reduce errors in the model's predictions.
Z
zero-shot prompt
A prompting technique in which the model completes a task without being given a specific example of how to complete it.