Last updated: Nov 27, 2024
Each party in a Federated Learning experiment needs a data handler to process its data. You or a data scientist must create the data handler. A data handler is a Python class that loads and transforms data so that all data for the experiment is in a consistent format.
About the data handler class
The data handler performs the following functions:
- Accesses the data that is required to train the model. For example, reads data from a CSV file into a Pandas data frame.
- Pre-processes the data so that it is in a consistent format across all parties. Some example cases are as follows:
  - The Date column might be stored as a time epoch or timestamp.
  - The Country column might be encoded or abbreviated.
- Ensures that the data formatting is in agreement across parties.
- Optional: performs feature engineering as needed.
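The pre-processing cases in the list above can be sketched with Pandas. This is a minimal illustration, not part of the IBM Federated Learning library: the `harmonize` function, the `COUNTRY_CODES` lookup table, the `Amount` column, and the sample rows are all hypothetical. It assumes one party stores Date as epoch seconds and Country as a full name, while another stores Date as a timestamp string and Country as an abbreviation.

```python
import io

import pandas as pd

# Hypothetical lookup table; in practice all parties would agree on one encoding.
COUNTRY_CODES = {"United States": "US", "Germany": "DE"}


def harmonize(df, date_is_epoch):
    """Return a copy of df with Date as a datetime and Country abbreviated."""
    out = df.copy()
    if date_is_epoch:
        # Epoch seconds -> datetime
        out["Date"] = pd.to_datetime(out["Date"], unit="s")
    else:
        # Timestamp strings -> datetime
        out["Date"] = pd.to_datetime(out["Date"])
    # Replace full country names with their code; leave existing codes untouched.
    out["Country"] = out["Country"].map(lambda c: COUNTRY_CODES.get(c, c))
    return out


# Party A: epoch seconds and full country names (sample data for illustration).
party_a = pd.read_csv(io.StringIO(
    "Date,Country,Amount\n1700000000,United States,10.5\n"))
# Party B: timestamp strings and abbreviated country names.
party_b = pd.read_csv(io.StringIO(
    "Date,Country,Amount\n2023-11-14 22:13:20,DE,7.0\n"))

a = harmonize(party_a, date_is_epoch=True)
b = harmonize(party_b, date_is_epoch=False)
```

After this step both parties hold the same columns with Date as a datetime and Country as a two-letter code, which is the kind of agreement the data handler is responsible for.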
The following illustration shows how a data handler is used to process data and make it consumable by the experiment:
One party might have multiple tables in a relational database while another party uses a CSV file. After each party processes its data with the data handler, the data has a unified format. For example, a party whose data was spread across separate tables joins those tables so that all of its data is in a single table.
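The two situations described above might look like the following sketch. It uses an in-memory SQLite database as a stand-in for the relational source; the table names, column names, and rows are hypothetical and chosen only for illustration.

```python
import io
import sqlite3

import pandas as pd

# Party A: data spread across two tables in a relational database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, country TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'US'), (2, 'DE');
    INSERT INTO orders VALUES (1, 10.5), (2, 7.0), (1, 3.0);
""")
# Join the separate tables into the single flat table the experiment expects.
party_a = pd.read_sql_query(
    "SELECT o.customer_id, c.country, o.amount "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id "
    "ORDER BY o.rowid",
    conn,
)
conn.close()

# Party B: data is already a single CSV file with the agreed columns.
party_b = pd.read_csv(io.StringIO("customer_id,country,amount\n3,FR,4.2\n"))
```

Each party keeps its own data locally; the point is only that both end up with the same columns in the same layout.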
Data handler template
A general data handler template is as follows:
# your import statements
from ibmfl.data.data_handler import DataHandler


class MyDataHandler(DataHandler):
    """
    Data handler for your dataset.
    """

    def __init__(self, data_config=None):
        super().__init__()
        self.file_name = None
        if data_config is not None:
            # This can be any string field.
            # For example, if your data set is in `csv` format,
            # <your_data_file_type> can be "CSV", ".csv", "csv", "csv_file" and more.
            if '<your_data_file_type>' in data_config:
                self.file_name = data_config['<your_data_file_type>']
            # extract other additional parameters from `data_config` if any.

        # load and preprocess the training and testing data
        self.load_and_preprocess_data()

        """
        # Example:
        # (self.x_train, self.y_train), (self.x_test, self.y_test) = self.load_dataset()
        """

    def load_and_preprocess_data(self):
        """
        Loads and pre-processes local datasets,
        and updates self.x_train, self.y_train, self.x_test, self.y_test.

        # Example:
        # return (self.x_train, self.y_train), (self.x_test, self.y_test)
        """
        pass

    def get_data(self):
        """
        Gets the prepared training and testing data.

        :return: ((x_train, y_train), (x_test, y_test))  # most built-in training modules expect data to be returned in this format
        :rtype: `tuple`

        This function should be as brief as possible. Any pre-processing operations should be performed in a separate function and not inside get_data(), especially computationally expensive ones.

        # Example:
        # X, y = load_somedata()
        # x_train, x_test, y_train, y_test = \
        #     train_test_split(X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)
        # return (x_train, y_train), (x_test, y_test)
        """
        pass

    def preprocess(self, X, y):
        pass
Parameters
your_data_file_type: This can be any string field. For example, if your data set is in csv format, your_data_file_type can be "CSV", ".csv", "csv", "csv_file" and more.
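As one possible way to fill in the template, the following sketch implements a handler for a CSV file. Several things here are assumptions rather than part of the library: the class name `CsvDataHandler`, the `"csv_file"` key in `data_config`, the convention that the last column is the label, the `test_fraction` parameter, and the simple tail split. A stand-in `DataHandler` base class is defined only so that the sketch runs when the `ibmfl` package is not installed.

```python
import os
import tempfile

import pandas as pd

try:
    from ibmfl.data.data_handler import DataHandler
except ImportError:
    # Stand-in base class (assumption) so the sketch runs without ibmfl.
    class DataHandler:
        def __init__(self, **kwargs):
            pass


class CsvDataHandler(DataHandler):
    """Hypothetical handler: loads a CSV file whose last column is the label."""

    def __init__(self, data_config=None):
        super().__init__()
        self.file_name = None
        if data_config is not None and "csv_file" in data_config:
            # "csv_file" is the string field chosen for this example.
            self.file_name = data_config["csv_file"]
        self.x_train = self.y_train = self.x_test = self.y_test = None
        self.load_and_preprocess_data()

    def load_and_preprocess_data(self, test_fraction=0.25):
        """Read the CSV once and hold out the last rows as test data."""
        if self.file_name is None:
            raise ValueError("data_config must contain a 'csv_file' entry")
        df = pd.read_csv(self.file_name)
        x = df.iloc[:, :-1].to_numpy(dtype="float32")
        y = df.iloc[:, -1].to_numpy()
        n_test = max(1, int(len(df) * test_fraction))
        self.x_train, self.y_train = x[:-n_test], y[:-n_test]
        self.x_test, self.y_test = x[-n_test:], y[-n_test:]

    def get_data(self):
        """Return the prepared data without doing any further processing."""
        return (self.x_train, self.y_train), (self.x_test, self.y_test)


# Usage with a small generated CSV file (sample data for illustration).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "party_data.csv")
    pd.DataFrame(
        {"f1": range(8), "f2": range(8, 16), "label": [0, 1] * 4}
    ).to_csv(path, index=False)
    handler = CsvDataHandler(data_config={"csv_file": path})

(x_train, y_train), (x_test, y_test) = handler.get_data()
```

All loading and splitting happens once in `load_and_preprocess_data`, so `get_data` stays brief, as the template recommends.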
Return a data generator defined by Keras or TensorFlow
The following is a code example that needs to be included as part of the get_data function to return a data generator defined by Keras or TensorFlow:
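# At the top of the module:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Inside get_data(), once x_train and y_train are available:
train_gen = ImageDataGenerator(rotation_range=8,
                               width_shift_range=0.08,
                               shear_range=0.3,
                               height_shift_range=0.08,
                               zoom_range=0.08)
train_datagenerator = train_gen.flow(
    x_train, y_train, batch_size=64)
return train_datagenerator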
Data handler examples