Extracting sentiment with a custom transformer model
Last updated: Nov 07, 2024

You can train your own models for sentiment extraction based on the Slate IBM Foundation model. This pretrained model can be fine-tuned for your use case by training it on your specific input data.

For a list of available Slate models, see the following table:

List of available Slate models and their descriptions

pretrained-model_slate.153m.distilled_many_transformer_multilingual_uncased
    Generic, multi-purpose model
pretrained-model_slate.125m.finance_many_transformer_en_cased
    Model pretrained on finance content
pretrained-model_slate.110m.cybersecurity_many_transformer_en_uncased
    Model pretrained on cybersecurity content
pretrained-model_slate.125m.biomedical_many_transformer_en_cased
    Model pretrained on biomedical content
Note: Training transformer models is CPU- and memory-intensive. Depending on the size of your training data, the environment might not be large enough to complete the training. If you run into issues with the notebook kernel during training, create a custom notebook environment with more CPU and memory, and use it to run your notebook. If a GPU-based environment is available to you, use it for both training and inference. See Creating your own environment template.

Input data format for training

You need to provide a training and a development data set to the training function. The development data is usually around 10% of the training data. Each training or development sample is represented as a JSON object with a text field and a labels field. The text field contains the example text, and labels is an array that contains exactly one of positive, neutral, or negative.
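If your labeled samples are in a single collection, you can split off the development portion yourself. The following is a minimal stdlib-only sketch of a roughly 90/10 split; the helper name, sample data, and split ratio are illustrative assumptions, not part of the watson_nlp toolkit:

```python
import random

def split_train_dev(samples, dev_fraction=0.1, seed=42):
    """Shuffle a copy of the samples and split off roughly 10% for development."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    dev_size = max(1, int(len(shuffled) * dev_fraction))
    return shuffled[dev_size:], shuffled[:dev_size]

# Illustrative placeholder samples in the expected {"text", "labels"} shape
samples = [{"text": f"example {i}", "labels": ["neutral"]} for i in range(20)]
train_samples, dev_samples = split_train_dev(samples)
print(len(train_samples), len(dev_samples))  # 18 2
```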

The following is an example of an array with sample training data:

[
    {
        "text": "I am happy",
        "labels": ["positive"]
    },
    {
        "text": "I am sad",
        "labels": ["negative"]
    },
    {
        "text": "The sky is blue",
        "labels": ["neutral"]
    }
]
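Before training, each sample can be checked against this format and written to the JSON files that are consumed in the next step. This is a plain-Python sketch (the validate_sample helper is an assumption for illustration, not part of the watson_nlp toolkit):

```python
import json

VALID_LABELS = {"positive", "neutral", "negative"}

def validate_sample(sample):
    """Check that a sample has a string 'text' and exactly one valid label."""
    if not isinstance(sample.get("text"), str):
        raise ValueError(f"missing or non-string 'text': {sample}")
    labels = sample.get("labels")
    if not isinstance(labels, list) or len(labels) != 1 or labels[0] not in VALID_LABELS:
        raise ValueError(f"'labels' must contain exactly one of {sorted(VALID_LABELS)}: {sample}")

train_data = [
    {"text": "I am happy", "labels": ["positive"]},
    {"text": "I am sad", "labels": ["negative"]},
]
dev_data = [{"text": "The sky is blue", "labels": ["neutral"]}]

for sample in train_data + dev_data:
    validate_sample(sample)

# Write the files that prepare_data_from_json reads in the next step
with open("train_data.json", "w", encoding="utf-8") as f:
    json.dump(train_data, f, indent=2)
with open("dev_data.json", "w", encoding="utf-8") as f:
    json.dump(dev_data, f, indent=2)
```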

The training and development data sets are created as data streams from arrays of JSON objects. To create the data streams, you can use the utility method prepare_data_from_json:

import watson_nlp
from watson_nlp.toolkit.sentiment_analysis_utils.training import train_util as utils

training_data_file = "train_data.json"
dev_data_file = "dev_data.json"

train_stream = utils.prepare_data_from_json(training_data_file)
dev_stream = utils.prepare_data_from_json(dev_data_file)

Loading the pretrained model resources

The pretrained Slate IBM Foundation model must be loaded before it is passed to the training algorithm. In addition, load the syntax analysis models for the languages that are used in your input texts.

To load the model:

# Load the pretrained Slate IBM Foundation model
pretrained_model_resource = watson_nlp.load('<pretrained Slate model>')

# Download relevant syntax analysis models
syntax_model_en = watson_nlp.load('syntax_izumo_en_stock')
syntax_model_de = watson_nlp.load('syntax_izumo_de_stock')

# Create a list of all syntax analysis models
syntax_models = [syntax_model_en, syntax_model_de]

Training the model

For all options that are available for configuring sentiment transformer training, enter:

help(watson_nlp.workflows.sentiment.AggregatedSentiment.train_transformer)

The train_transformer method creates a workflow model, which automatically runs syntax analysis and the trained sentiment classification. In a subsequent step, enable language detection so that the workflow model can run on input text without any prerequisite information.

The following is a sample call that uses the input data and pretrained model from the previous sections:

from watson_nlp.workflows.sentiment import AggregatedSentiment

sentiment_model = AggregatedSentiment.train_transformer(
    train_data_stream=train_stream,
    dev_data_stream=dev_stream,
    syntax_model=syntax_models,
    pretrained_model_resource=pretrained_model_resource,
    label_list=['negative', 'neutral', 'positive'],
    learning_rate=2e-5,
    num_train_epochs=10,
    combine_approach="NON_NEUTRAL_MEAN",
    keep_model_artifacts=True
)
lang_detect_model = watson_nlp.load('lang-detect_izumo_multi_stock')

sentiment_model.enable_lang_detect(lang_detect_model)

Applying the model on new data

After you train the model on a data set, apply it to new data by using the run() method, just as you would with any of the existing pretrained blocks.

Sample code:

input_text = 'new input text'
sentiment_predictions = sentiment_model.run(input_text)

Parent topic: Creating your own models
