Data formats for tuning foundation models

Last updated: March 04, 2025

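Prepare a set of prompt examples to use to tune the model. The examples must contain the type of input that the model will need to process at run time and the appropriate output for the model to generate in response.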

You can add one file as training data.

Training data requirements

Follow these guidelines when you create your training data:

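- Add 100 to 1,000 labeled examples. Between 50 and 10,000 examples are allowed.
- The training data must be in English.
- Keep the input and output of each example within the maximum token limits that are used by the experiment. Otherwise, the example text is truncated.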

For more information, see Controlling the number of tokens used.

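Because tokens are counted differently by different models, it is difficult to estimate the number of tokens. For language-based foundation models, you can think of 256 tokens as about 130 to 170 words, and 128 tokens as about 65 to 85 words. For more information, see Tokens and tokenization.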
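The rule of thumb above can be turned into a rough pre-check before you submit a file. The following is a minimal sketch that applies only the ratio quoted above (256 tokens for roughly 130 to 170 words). The function name `estimate_token_range` is hypothetical, and splitting on whitespace is an assumed stand-in for counting words. The result is an estimate, not a replacement for the model's own tokenizer.

```python
def estimate_token_range(text: str) -> tuple[int, int]:
    """Return a rough (low, high) token estimate for English text.

    Uses only the rule of thumb that 256 tokens correspond to about
    130 to 170 words. Actual counts depend on the model's tokenizer.
    """
    words = len(text.split())  # assumption: words are whitespace-separated
    low = (words * 256) // 170       # 170 words per 256 tokens, rounded down
    high = -(-(words * 256) // 130)  # 130 words per 256 tokens, rounded up
    return low, high


if __name__ == "__main__":
    sample = "Message: When I try to log in, I get an error."
    print(estimate_token_range(sample))
```

If the upper estimate for an example approaches the token limit of the experiment, consider shortening the example so that it is not truncated.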

If you plan to use the tuned foundation model to classify data, follow these additional guidelines:

- Limit the number of class labels to 10 or fewer.
- Include an equal number of examples of each class type.
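Both classification guidelines can be checked mechanically. The sketch below assumes that the full `output` string of each example is its class label. The helper name `check_class_balance` is hypothetical and not part of any product API.

```python
from collections import Counter

MAX_CLASS_LABELS = 10  # limit taken from the guideline above


def check_class_balance(examples: list[dict]) -> dict:
    """Count examples per output label and flag guideline violations."""
    # Assumption: the whole "output" string is the class label.
    counts = Counter(example["output"] for example in examples)
    return {
        "counts": dict(counts),
        "too_many_labels": len(counts) > MAX_CLASS_LABELS,
        # Balanced means every label occurs the same number of times.
        "balanced": len(set(counts.values())) <= 1,
    }
```

Because labels are compared as exact strings, a misspelled label is counted as a separate class, which makes this check a convenient way to catch labeling typos.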

You can use the Prompt Lab to create samples of training data. For more information, see Prompt Lab.

After you gather a representative set of samples, divide them into a set for training and a smaller set for testing.
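One way to make this split is to shuffle the examples and hold back a fraction of them. The following is a minimal sketch. The 20% test fraction and the fixed seed are illustrative assumptions (the text above says only that the test set is smaller), and `split_examples` is a hypothetical helper name.

```python
import random


def split_examples(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split off a smaller test set.

    Returns a (training_examples, test_examples) tuple.
    """
    if not 0 < test_fraction < 0.5:
        raise ValueError("test_fraction should be between 0 and 0.5")
    shuffled = list(examples)              # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]
```

For classification data, a plain random split can leave the classes uneven in either set, so you might want to verify the per-class counts after you split.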

File format requirements

The training data file must meet these requirements:

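- Use one of the following formats:
  - JavaScript Object Notation (JSON)
  - JSON Lines (JSONL) format
- The maximum file size that is allowed is 200 MB.
- Each example must include one input and output pair.
- If the input or output text contains double quotation marks, escape each quotation mark with a backslash (\). For example, He said, \"Yes.\"
- To represent a carriage return or line break, use the \n escape sequence. For example, ...end of paragraph.\nStart of new paragraph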
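If you generate the file programmatically, a JSON library applies the escaping rules for you. The sketch below uses Python's standard `json` module to serialize one example and to check a list of examples against the requirements above. The helper names are hypothetical, the 200 MB limit is read as decimal megabytes (an assumption), and "one input and output pair" is interpreted as exactly the two keys `input` and `output`.

```python
import json

MAX_FILE_BYTES = 200 * 1000 * 1000  # assumption: "200 MB" means decimal megabytes


def to_jsonl_line(input_text: str, output_text: str) -> str:
    """Serialize one example; json.dumps escapes quotation marks and newlines."""
    return json.dumps({"input": input_text, "output": output_text})


def find_problems(examples: list, encoded_size: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if encoded_size > MAX_FILE_BYTES:
        problems.append("file is larger than 200 MB")
    for index, example in enumerate(examples):
        # Assumption: each example holds exactly the keys "input" and "output".
        if not isinstance(example, dict) or set(example) != {"input", "output"}:
            problems.append(f"example {index} must have exactly 'input' and 'output'")
    return problems


if __name__ == "__main__":
    print(to_jsonl_line('He said, "Yes."', "end of paragraph.\nStart of new paragraph"))
```

Serializing with a library instead of building strings by hand avoids missing an escape in text that contains quotation marks or line breaks.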

JSON example

The following example shows an excerpt, in JSON format, from a training data file that contains labeled prompts for a classification task.

[
  {
    "input":"Message: When I try to log in, I get an error.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: Where can I find the plan prices?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: What is the difference between trial and paygo?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: The registration page crashed, and now I can't create a new account.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: What regions are supported?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: I can't remember my password.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: I'm having trouble registering for a new account.",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: A teammate shared a service instance with me, but I can't access it. What's wrong?",
    "output":"Class name: Problem"
  },
  {
    "input":"Message: What extra privileges does an administrator have?",
    "output":"Class name: Question"
  },
  {
    "input":"Message: Can I create a service instance for data in a language other than English?",
    "output":"Class name: Question"
  }
]

JSONL example

The following example shows an excerpt, in JSONL format, from a training data file that contains labeled prompts for a classification task.

{"input":"Message: When I try to log in, I get an error.","output":"Class name: Problem"}
{"input":"Message: Where can I find the plan prices?","output":"Class name: Question"}
{"input":"Message: What is the difference between trial and paygo?","output":"Class name: Question"}
{"input":"Message: The registration page crashed, and now I can't create a new account.","output":"Class name: Problem"}
{"input":"Message: What regions are supported?","output":"Class name: Question"}
{"input":"Message: I can't remember my password.","output":"Class name: Problem"}
{"input":"Message: I'm having trouble registering for a new account.","output":"Class name: Problem"}
{"input":"Message: A teammate shared a service instance with me, but I can't access it. What's wrong?","output":"Class name: Problem"}
{"input":"Message: What extra privileges does an administrator have?","output":"Class name: Question"}
{"input":"Message: Can I create a service instance for data in a language other than English?","output":"Class name: Question"}
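The JSON and JSONL excerpts hold the same kind of records and differ only in layout: JSON wraps the examples in one array, while JSONL places one object on each line. The following minimal sketch converts between the two with Python's standard `json` module. The function names are hypothetical.

```python
import json


def json_array_to_jsonl(json_text: str) -> str:
    """Convert a JSON array of examples to JSONL, one object per line."""
    examples = json.loads(json_text)
    return "\n".join(json.dumps(example) for example in examples) + "\n"


def jsonl_to_examples(jsonl_text: str) -> list[dict]:
    """Parse JSONL text into a list of examples, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.split("\n") if line.strip()]
```

Because each JSONL line is parsed on its own, a malformed record raises an error for that line only, which can make large files easier to debug than a single JSON array.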

