The Granite 8 Billion Japanese (granite-8b-japanese) model is an instruct variant initialized from the pre-trained Granite Base 8 Billion Japanese model. Pre-training covered 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code. The model is designed to work with Japanese text. IBM Generative AI large language foundation models are enterprise-level multilingual models trained on large volumes of data that have undergone intensive pre-processing and careful analysis.
Person or organization developing the model:
granite-8b-japanese was developed by IBM Research.
Model release date and version:
granite-8b-japanese version 1.0 was released on 2/29/2024.
Model type:
granite-8b-japanese is a decoder-only transformer model.
The following features were used in the design of the model:
Information about training algorithms, parameters, fairness constraints or other applied approaches, and features:
The model was trained with Megatron-LM using 4-way tensor parallelism, 4-way pipeline parallelism, and the Megatron distributed optimizer.
GPUs: 448x A100 80GB
Interconnect: 1600 gigabit InfiniBand
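As a rough sanity check on this topology (an inference from the stated numbers, not a published configuration), 4-way tensor parallelism times 4-way pipeline parallelism gives 16 GPUs per model replica, which across 448 GPUs leaves 28-way data parallelism:

```python
# Back-of-the-envelope breakdown of the stated training topology.
# In Megatron-LM, world_size = tensor_parallel * pipeline_parallel * data_parallel.
world_size = 448         # 448x A100 80GB
tensor_parallel = 4      # 4x tensor parallelism
pipeline_parallel = 4    # 4x pipeline parallelism

gpus_per_replica = tensor_parallel * pipeline_parallel  # 16 GPUs hold one model replica
data_parallel = world_size // gpus_per_replica          # 28 data-parallel replicas
print(gpus_per_replica, data_parallel)                  # -> 16 28
```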
License:
Available only through IBM products and offerings. Contact IBM for licensing terms.
Intended Use
Primary intended uses:
granite-8b-japanese is intended for Japanese text generation, summarization, question answering, classification, and extraction.
Primary intended users:
The primary users are IBM Enterprise clients looking to bolster their portfolios with Enterprise-level generative AI models.
Out-of-scope use cases:
granite-8b-japanese is not designed, tested, or supported for code use cases of any kind.
Factors
Relevant factors: granite-8b-japanese works with Japanese text. All datasets were cleansed of any markup (e.g., HTML tags), and all media was removed. The model was evaluated on the following Japanese benchmarks:
JCommonsenseQA is a Japanese version of CommonsenseQA (Talmor+, 2019), which is a multiple-choice question answering dataset that requires commonsense reasoning
ability. It is built using crowdsourcing with seeds extracted from the knowledge base ConceptNet.
JNLI is a Japanese version of the NLI (Natural Language Inference) dataset. NLI is the task of recognizing the inference relation that a premise sentence has to a hypothesis sentence. The inference relations are entailment (含意), contradiction (矛盾), and neutral (中立).
JSQuAD is a Japanese version of SQuAD (Rajpurkar+, 2016), a reading comprehension dataset. Each instance consists of a question about a given context (a Wikipedia article) and its answer. JSQuAD is based on SQuAD 1.1 (there are no unanswerable questions) and uses the Japanese Wikipedia dump as of 2021-11-01.
Japanese Questions on Knowledge of Entity (JAQKET) is a Japanese open-domain question answering dataset where the answers are Wikipedia article titles.
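Both JSQuAD and JAQKET are scored with F1 in the results tables below. As an illustration only: SQuAD-style F1 is the harmonic mean of precision and recall over overlapping answer tokens. The sketch below uses character-level tokens as a simple stand-in for Japanese word segmentation; the official evaluation scripts may tokenize differently.

```python
from collections import Counter

def squad_style_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span.

    Character-level tokens are used here as a simple stand-in for
    Japanese segmentation; official scorers may tokenize differently.
    """
    pred_tokens = list(prediction)
    gold_tokens = list(gold)
    overlap = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(squad_style_f1("東京都", "東京"))  # partial overlap -> 0.8
```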
XLSum-ja is a filtered Japanese subset of XLSum scored with ROUGE-2, as used by PaLM 2. Following PaLM 2, the data is filtered based on 15-gram overlap.
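ROUGE-2, reported for xlsum_ja in the tables below, is the F-measure of overlapping token bigrams between a generated and a reference summary. A minimal sketch follows; real Japanese ROUGE scoring normally segments text with a morphological analyzer such as MeCab first, so treat this as illustrative only.

```python
from collections import Counter

def rouge2_f1(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-2 F-measure: overlap of adjacent-token bigrams."""
    cand_bigrams = Counter(zip(candidate, candidate[1:]))
    ref_bigrams = Counter(zip(reference, reference[1:]))
    overlap = sum((cand_bigrams & ref_bigrams).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_bigrams.values())
    recall = overlap / sum(ref_bigrams.values())
    return 2 * precision * recall / (precision + recall)

# Toy example with pre-tokenized (here, character-level) sequences.
print(rouge2_f1(list("今日は良い天気"), list("今日は天気が良い")))
```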
XWinograd is a set of Winograd schema sentence pairs. For example:
ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。 (Bob asked Tom whether Tom would lend him some money.)
ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。 (Bob asked Tom whether Bob would lend him some money.)
In this case the first sentence is correct, because it doesn't make sense for Bob to ask Tom how much money Bob himself will loan. The task is for the model to assign the higher log likelihood to the reasonable sentence. Because of the way the task is defined, it is always zero-shot with no prompt. While XWinograd is a multilingual task, only its Japanese subset, which has 959 pairs, is used here.
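This pair-scoring protocol is straightforward to reproduce with any causal language model. Below is a minimal sketch assuming the Hugging Face transformers API and a placeholder model id (granite-8b-japanese itself is available only through IBM products); it scores each sentence by its summed token log-likelihood and picks the higher-scoring one.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id: substitute any Japanese causal LM you have access to.
MODEL_ID = "your-org/your-japanese-causal-lm"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def sentence_log_likelihood(sentence: str) -> float:
    """Sum of token log-probabilities the model assigns to the sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    # Predict token t from positions < t, then sum log-probs of the actual tokens.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    target = ids[:, 1:]
    return log_probs.gather(-1, target.unsqueeze(-1)).sum().item()

pair = [
    "ボブはトムに尋ねた。トムはお金をいくらか貸してくれるかと。",
    "ボブはトムに尋ねた。ボブはお金をいくらか貸してくれるかと。",
]
scores = [sentence_log_likelihood(s) for s in pair]
print("model prefers sentence", scores.index(max(scores)) + 1)
```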
Multilingual Grade School Math (MGSM) is a set of 250 grade-school math word problems in Japanese; the task is to produce the correct integer solution for each problem.
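Scoring MGSM comes down to pulling an integer out of the generated answer and comparing it with the gold solution. The exact prompt format and answer-parsing rules behind the reported numbers are not documented here, so the extraction regex below is only an assumed, illustrative choice.

```python
import re

def extract_integer(generation: str) -> int | None:
    """Return the last integer mentioned in the generated answer, if any."""
    matches = re.findall(r"-?\d+", generation.replace(",", ""))
    return int(matches[-1]) if matches else None

def mgsm_correct(generation: str, gold: int) -> bool:
    return extract_integer(generation) == gold

print(mgsm_correct("りんごは全部で 12 個です。答えは12です。", 12))  # True
```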
Zero-shot results
Task                   | Version | Metric       | Performance
jcommonsenseqa-1.1-0.3 | 1.1     | acc          | 0.7078
jnli-1.3-0.3           | 1.3     | balanced_acc | 0.5032
marc_ja-1.1-0.3        | 1.1     | balanced_acc | 0.6442
jsquad-1.1-0.3         | 1.1     | f1           | 59.3862
jaqket_v2-0.2-0.3      | 0.2     | f1           | 60.3066
xlsum_ja-1.0-0.3       | 1       | rouge2       | 7.2561
xwinograd_ja           | 1       | acc          | 0.683
mgsm-1.0-0.3           | 1       | acc          | 0.028
N-shot results
Task                   | Version | Metric       | Performance
jcommonsenseqa-1.1-0.3 | 1.1     | acc          | 0.807
jnli-1.3-0.3           | 1.3     | balanced_acc | 0.5935
marc_ja-1.1-0.3        | 1.1     | balanced_acc | 0.9461
jsquad-1.1-0.3         | 1.1     | f1           | 80.9671
jaqket_v2-0.2-0.3      | 0.2     | f1           | 74.9605
xlsum_ja-1.0-0.3       | 1       | rouge2       | 9.4874
xwinograd_ja           | 1       | acc          | 0.683
mgsm-1.0-0.3           | 1       | acc          | 0.116
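Several tasks above report balanced_acc. Assuming the usual definition (the unweighted mean of per-class recall), a classifier that always predicts the majority class scores only 0.5 on a binary task no matter how skewed the labels are, which makes the metric a fairer summary than plain accuracy for imbalanced datasets. A minimal sketch:

```python
from collections import defaultdict

def balanced_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Unweighted mean of per-class recall."""
    totals, hits = defaultdict(int), defaultdict(int)
    for g, p in zip(gold, predicted):
        totals[g] += 1
        hits[g] += int(g == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Skewed toy example: always predicting "entailment" only scores 0.5.
gold = ["entailment"] * 9 + ["contradiction"]
pred = ["entailment"] * 10
print(balanced_accuracy(gold, pred))  # 0.5
```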
Data, Limitations, and Recommendations
Data selection for training:
granite-8b-japanese was pre-trained on 1.0T tokens of English, 0.5T tokens of Japanese, and 0.1T tokens of code.
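As a back-of-the-envelope view of that mixture (derived from the stated token counts, not an official breakdown), English makes up roughly 62.5% of the 1.6T pre-training tokens, Japanese roughly 31%, and code roughly 6%:

```python
# Approximate shares implied by the stated pre-training token counts.
tokens = {"English": 1.0e12, "Japanese": 0.5e12, "code": 0.1e12}
total = sum(tokens.values())  # 1.6e12 tokens in total

for name, count in tokens.items():
    print(f"{name}: {count / total:.1%} of the pre-training corpus")
```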