Each party in a Federated Learning experiment must have a data handler to process its data. You or a data scientist must create the data handler. A data handler is a Python class that loads and transforms data so that all data for the experiment is in a consistent format.
About the data handler class
The data handler performs the following functions:
Accesses the data that is required to train the model. For example, reads data from a CSV file into a Pandas data frame.
Pre-processes the data so that it is in a consistent format across all parties. Some example cases are as follows:
The Date column might be stored as a time epoch or timestamp.
The Country column might be encoded or abbreviated.
The data handler ensures that the data formatting agrees across all parties.
Optional: Performs feature engineering as needed.
The following illustration shows how a data handler is used to process data and make it consumable by the experiment:
One party might have multiple tables in a relational database while another party uses a CSV file. After the data is processed with the data handler, all parties have data in a unified format. For example, all data is put into a single table, with data that was previously spread across separate tables joined together.
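For illustration, the following is a minimal sketch of that kind of pre-processing. It converts an epoch-based Date column to timestamps, expands abbreviated Country codes, and joins two tables into a single frame. The column names, the country-code mapping, and the use of pandas are assumptions for this example only.

import pandas as pd

# Hypothetical mapping from abbreviated to full country names (illustrative).
COUNTRY_CODES = {"US": "United States", "DE": "Germany", "JP": "Japan"}

def to_unified_format(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    """Normalize column formats and join two tables into one frame."""
    # Convert an epoch-seconds Date column to a timestamp.
    orders["Date"] = pd.to_datetime(orders["Date"], unit="s")
    # Expand abbreviated country codes so all parties use the same encoding.
    customers["Country"] = customers["Country"].replace(COUNTRY_CODES)
    # Join previously separate tables into a single table.
    # "customer_id" is a hypothetical join key for this sketch.
    return orders.merge(customers, on="customer_id", how="left")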
Data handler template
A general data handler template is as follows:
# your import statements
from ibmfl.data.data_handler import DataHandler

class MyDataHandler(DataHandler):
    """
    Data handler for your dataset.
    """

    def __init__(self, data_config=None):
        super().__init__()
        self.file_name = None
        if data_config is not None:
            # This can be any string field.
            # For example, if your data set is in `csv` format,
            # <your_data_file_type> can be "CSV", ".csv", "csv", "csv_file" and more.
            if '<your_data_file_type>' in data_config:
                self.file_name = data_config['<your_data_file_type>']
            # extract other additional parameters from `info` if any.

        # load and preprocess the training and testing data
        self.load_and_preprocess_data()

        """
        # Example:
        # (self.x_train, self.y_train), (self.x_test, self.y_test) = self.load_dataset()
        """

    def load_and_preprocess_data(self):
        """
        Loads and pre-processes local datasets,
        and updates self.x_train, self.y_train, self.x_test, self.y_test.

        # Example:
        # return (self.x_train, self.y_train), (self.x_test, self.y_test)
        """
        pass

    def get_data(self):
        """
        Gets the prepared training and testing data.

        :return: ((x_train, y_train), (x_test, y_test)) # most built-in training modules expect data to be returned in this format
        :rtype: `tuple`

        This function should be as brief as possible. Any pre-processing
        operations should be performed in a separate function and not inside
        get_data(), especially computationally expensive ones.

        # Example:
        # X, y = load_somedata()
        # x_train, x_test, y_train, y_test = \
        #     train_test_split(X, y, test_size=TEST_SIZE, random_state=RANDOM_STATE)
        # return (x_train, y_train), (x_test, y_test)
        """
        pass

    def preprocess(self, X, y):
        pass
Parameters
your_data_file_type: This can be any string field. For example, if your data set is in csv format, your_data_file_type can be "CSV", ".csv", "csv", "csv_file" and
more.
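For illustration, the following is a minimal sketch of a complete data handler for a party whose data is in a single CSV file. The configuration key csv_file, the label column name, the 80/20 split, and the use of pandas and scikit-learn are all assumptions for this example; adapt them to your own data set and configuration.

import pandas as pd
from sklearn.model_selection import train_test_split

from ibmfl.data.data_handler import DataHandler


class CsvDataHandler(DataHandler):
    """Example data handler for a party whose data is a local CSV file."""

    def __init__(self, data_config=None):
        super().__init__()
        self.file_name = None
        if data_config is not None and 'csv_file' in data_config:
            # 'csv_file' is a hypothetical key; use whatever string field
            # your data configuration defines.
            self.file_name = data_config['csv_file']
        # load and preprocess the training and testing data
        self.load_and_preprocess_data()

    def load_and_preprocess_data(self):
        """Loads the CSV file and splits it into training and testing sets."""
        df = pd.read_csv(self.file_name)
        # 'label' is an assumed target column name for this sketch.
        X = df.drop(columns=['label']).values
        y = df['label'].values
        self.x_train, self.x_test, self.y_train, self.y_test = train_test_split(
            X, y, test_size=0.2, random_state=42)

    def get_data(self):
        """Returns data in the ((x_train, y_train), (x_test, y_test)) format."""
        return (self.x_train, self.y_train), (self.x_test, self.y_test)

Keeping the expensive loading and splitting work in load_and_preprocess_data keeps get_data brief, as recommended in the template.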
Return a data generator defined by Keras or TensorFlow
Copy link to section
The following code example shows how the get_data function can return a data generator defined by Keras or TensorFlow.
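The sketch below is illustrative only: it assumes that self.x_train and self.y_train were prepared as NumPy arrays in load_and_preprocess_data and that the Keras ImageDataGenerator class is used; the augmentation settings and batch size are arbitrary examples.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

def get_data(self):
    """Returns a Keras data generator for training instead of plain arrays."""
    # Illustrative augmentation settings; tune them for your own data set.
    train_gen = ImageDataGenerator(rotation_range=8,
                                   width_shift_range=0.08,
                                   height_shift_range=0.08,
                                   shear_range=0.3,
                                   zoom_range=0.08)
    # self.x_train and self.y_train are assumed to be NumPy arrays that were
    # prepared in load_and_preprocess_data().
    train_datagenerator = train_gen.flow(self.x_train, self.y_train, batch_size=64)
    return train_datagenerator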