Data formats for tuning foundation models
Last updated: Oct 08, 2024

Prepare a set of prompt examples to use to tune the model. The examples must contain the type of input that the model will need to process at run time and the appropriate output for the model to generate in response.

You can add one file as training data.

Training data requirements

Follow these guidelines when you create your training data:

  • Add 100 to 1,000 labeled examples.

    Between 50 and 10,000 examples are allowed.

  • The language of the training data must be English.

  • Keep your input and output examples within the maximum token limits that are used by the experiment. Otherwise, your example text will be truncated.

    For more information, see Controlling the number of tokens used.

    How tokens are counted differs by model, which makes the number of tokens difficult to estimate. For language-based foundation models, you can think of 256 tokens as about 130 to 170 words and 128 tokens as about 65 to 85 words. For more information, see Tokens and tokenization.
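The guidelines above can be turned into a rough pre-check. The following sketch is illustrative only: the function names are hypothetical, and the token estimate is derived solely from the rule of thumb that 256 tokens is about 130 to 170 words. Actual token counts depend on the tokenizer of the model that you tune.

```python
# Rough pre-check for example count and token length.
# Assumption: 256 tokens is about 130 to 170 words (rule of thumb from the
# guidelines). The real count depends on the model's tokenizer.

def example_count_status(count: int) -> str:
    """Classify the number of examples against the stated guidelines."""
    if 100 <= count <= 1000:
        return "recommended"
    if 50 <= count <= 10000:
        return "allowed"
    return "out of range"

def estimate_token_range(text: str) -> tuple[int, int]:
    """Return a (low, high) token estimate based on whitespace-separated words."""
    words = len(text.split())
    low = words * 256 // 170        # 170 words per 256 tokens, rounded down
    high = -(-words * 256 // 130)   # 130 words per 256 tokens, rounded up
    return low, high

def may_be_truncated(text: str, max_tokens: int) -> bool:
    """True if the upper token estimate exceeds the experiment's limit."""
    return estimate_token_range(text)[1] > max_tokens
```

For example, `may_be_truncated(example["input"], 256)` flags an input whose upper estimate exceeds a 256-token limit. Treat the result as a hint, not as an exact measurement.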

If you plan to use the tuned foundation model to classify data, follow these extra guidelines:

  • Try to limit the number of class labels to 10 or fewer.
  • Include an equal number of examples of each class type.
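As a minimal sketch of how you might check these two classification guidelines, the following code counts the distinct output labels and reports any imbalance. The function names are hypothetical, and the code assumes that each example is a dictionary with an `output` string that holds the class label.

```python
from collections import Counter

def label_counts(examples: list[dict]) -> Counter:
    """Count how many examples use each output label."""
    return Counter(example["output"].strip() for example in examples)

def check_classes(examples: list[dict], max_labels: int = 10) -> list[str]:
    """Return warnings for too many labels or an uneven class distribution."""
    counts = label_counts(examples)
    warnings = []
    if len(counts) > max_labels:
        warnings.append(f"{len(counts)} class labels found; try to use {max_labels} or fewer.")
    if len(set(counts.values())) > 1:
        warnings.append(f"Classes are not balanced: {dict(counts)}")
    return warnings
```

An empty list means that both guidelines are met. Because labels are compared as exact strings, small spelling differences between labels show up as separate classes.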

You can use the Prompt Lab to craft examples for the training data. For more information, see Prompt Lab.

After you collect a representative set of examples, group the examples into a set to use for training and a separate, smaller set to use for testing purposes.
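One way to create the two sets is to shuffle the examples and hold back a fraction for testing. The following sketch is an illustration, not a prescribed method: the function name, the 20% test fraction, and the fixed seed are all assumptions.

```python
import random

def split_examples(examples: list[dict], test_fraction: float = 0.2, seed: int = 0):
    """Shuffle a copy of the examples and split it into (training, testing) lists."""
    shuffled = list(examples)              # leave the caller's list unchanged
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]
```

Only the training set goes into the training data file. If you tune a classifier, check that the class balance still holds in the training set after the split.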

File format requirements

The training data file must meet these requirements:

  • Use one of the following formats:
    • JavaScript Object Notation (JSON)
    • JSON Lines (JSONL) format
  • The maximum file size that is allowed is 200 MB.
  • Each example must include one input and output pair.
  • If the input or output text includes quotation marks, escape each quotation mark with a backslash (\). For example, He said, \"Yes.\".
  • To represent a carriage return or line break, use the \n escape sequence. For example, ...end of paragraph.\nStart of new paragraph.
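If you generate the file with a script, a JSON library can apply the escaping rules for you. The following sketch uses the `json` module from the Python standard library. The helper name `to_json_line` is hypothetical.

```python
import json

def to_json_line(input_text: str, output_text: str) -> str:
    """Serialize one example; json.dumps escapes quotation marks and line breaks."""
    return json.dumps({"input": input_text, "output": output_text})

line = to_json_line('He said, "Yes."', "...end of paragraph.\nStart of new paragraph.")
# In the serialized text, each quotation mark inside a value appears as \"
# and the line break appears as the two characters \n.
print(line)
```

Writing the text by hand is more error prone, because a single unescaped quotation mark makes the file invalid JSON.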

JSON example

The following example shows an excerpt from a training data file with labeled prompts for a classification task in JSON format.

[
  {
    "input":"Message: When I try to log in, I get an error.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: Where can I find the plan prices?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: What is the difference between trial and paygo?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: The registration page crashed, and now I can't create a new account.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: What regions are supported?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: I can't remember my password.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: I'm having trouble registering for a new account.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: A teammate shared a service instance with me, but I can't access it. What's wrong?",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: What extra privileges does an administrator have?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: Can I create a service instance for data in a language other than English?",
    "output":"Class name: Question"
  }
]
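Before you upload a JSON file like this one, you might check it against the file format requirements. The following sketch is a hypothetical helper, not part of the product. It assumes that 200 MB means 200,000,000 bytes and that each example is an object with string values for `input` and `output`.

```python
import json
import os

MAX_FILE_BYTES = 200 * 1000 * 1000  # assumption: 200 MB counted in decimal megabytes

def load_json_training_file(path: str) -> list[dict]:
    """Load a JSON training file and raise ValueError if a requirement is not met."""
    if os.path.getsize(path) > MAX_FILE_BYTES:
        raise ValueError("File is larger than 200 MB.")
    with open(path, encoding="utf-8") as file:
        examples = json.load(file)
    if not isinstance(examples, list):
        raise ValueError("Top-level JSON value must be an array of examples.")
    for index, example in enumerate(examples):
        if not isinstance(example, dict):
            raise ValueError(f"Example {index} is not a JSON object.")
        for key in ("input", "output"):
            if not isinstance(example.get(key), str):
                raise ValueError(f"Example {index} needs a string value for '{key}'.")
    return examples
```

A call such as `load_json_training_file("train.json")` returns the list of examples when every check passes. The file name here is only a placeholder.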

JSONL example

The following example shows an excerpt from a training data file with labeled prompts for a classification task in JSONL format.

{"input":"Message: When I try to log in, I get an error.","output":"Class name: Problem"}
{"input":"Message: Where can I find the plan prices?","output":"Class name: Question"}
{"input":"Message: What is the difference between trial and paygo?","output":"Class name: Question"}
{"input":"Message: The registration page crashed, and now I can't create a new account.","output":"Class name: Problem"}
{"input":"Message: What regions are supported?","output":"Class name: Question"}
{"input":"Message: I can't remember my password.","output":"Class name: Problem"}
{"input":"Message: I'm having trouble registering for a new account.","output":"Class name: Problem"}
{"input":"Message: A teammate shared a service instance with me, but I can't access it. What's wrong?","output":"Class name: Problem"}
{"input":"Message: What extra privileges does an administrator have?","output":"Class name: Question"}
{"input":"Message: Can I create a service instance for data in a language other than English?","output":"Class name: Question"}
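In JSONL format, each line is a complete JSON object and there is no enclosing array. As a minimal sketch, the following hypothetical helpers write a list of examples as JSONL and read the file back. They assume UTF-8 encoding and skip blank lines when reading.

```python
import json

def write_jsonl(examples: list[dict], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as file:
        for example in examples:
            file.write(json.dumps(example, ensure_ascii=False) + "\n")

def read_jsonl(path: str) -> list[dict]:
    """Read a JSONL file into a list of examples, ignoring blank lines."""
    examples = []
    with open(path, encoding="utf-8") as file:
        for line in file:
            if line.strip():
                examples.append(json.loads(line))
    return examples
```

Because `json.dumps` escapes line breaks inside values, each example always stays on a single line.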

