The time series library provides functions for univariate, multivariate, and multi-key time series, and supports both numeric and categorical types.
The functionality provided by the library can be broadly categorized into:
Time series I/O, for creating and saving time series data
Time series functions, transforms, windowing or segmentation, and reducers
Time series SQL and SQL extensions to Spark to enable executing scalable time series functions
Some of the key functionality is shown in the following sections using examples.
Time series I/O
The primary input and output (I/O) functionality for a time series is through a pandas DataFrame or a Python list. The following code sample shows constructing a time series from a DataFrame:
>>> import numpy as np
>>> import pandas as pd
>>> import tspy
>>> data = np.array([['', 'key', 'timestamp', "value"],['', "a", 1, 27], ['', "b", 3, 4], ['', "a", 5, 17], ['', "a", 3, 7], ['', "b", 2, 45]])
>>> df = pd.DataFrame(data=data[1:, 1:], index=data[1:, 0], columns=data[0, 1:]).astype(dtype={'key': 'object', 'timestamp': 'int64', 'value': 'float64'})
>>> df
  key  timestamp  value
    a          1   27.0
    b          3    4.0
    a          5   17.0
    a          3    7.0
    b          2   45.0
>>> # Create a time series from a DataFrame, providing a timestamp and a value column
>>> ts = tspy.time_series(df, ts_column="timestamp", value_column="value")
>>> ts
TimeStamp: 1 Value: 27.0
TimeStamp: 2 Value: 45.0
TimeStamp: 3 Value: 4.0
TimeStamp: 3 Value: 7.0
TimeStamp: 5 Value: 17.0
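The printed observations are ordered by timestamp, and the two observations at timestamp 3 keep their input order. As a rough, library-independent sketch (plain pandas, not the tspy implementation, and assuming that a stable sort by timestamp is what produces this ordering), the same sequence can be reproduced like this:

```python
import pandas as pd

# Same rows as the DataFrame above: (key, timestamp, value).
df = pd.DataFrame(
    {
        "key": ["a", "b", "a", "a", "b"],
        "timestamp": [1, 3, 5, 3, 2],
        "value": [27.0, 4.0, 17.0, 7.0, 45.0],
    }
)

# A stable sort keeps rows with equal timestamps in their original order.
ordered = df.sort_values("timestamp", kind="stable")

# Pair each timestamp with its value, mirroring the observations printed above.
observations = list(zip(ordered["timestamp"].tolist(), ordered["value"].tolist()))
print(observations)
```

The key column is ignored here, just as it is in the call above, which names only a timestamp column and a value column.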
To convert a time series back to a pandas DataFrame, use the to_df function.
Time series data has no standard for its model or data types, unlike some data types such as spatial data, which is governed by standards from the Open Geospatial Consortium (OGC). The challenge with time series data is the wide variety of functions that need to be supported, similar to that of Spark Resilient Distributed Datasets (RDD).
The data model allows for a wide variety of operations: different forms of segmentation or windowing of time series, transformations or conversions of one time series to another, reducers that compute a static value from a time series, joins that combine multiple time series, and collectors of time series from different time zones. The time series library enables the plug-and-play of new functions while keeping the core data structure unchangeable. The library also supports numeric and categorical typed time series.
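To make these operation families concrete, here is a minimal sketch over plain Python lists of (timestamp, value) tuples. The helper names (transform, segment_by_time, reduce_mean) are hypothetical and are not part of the library's API; they only illustrate the roles of a transform, a segmentation, and a reducer.

```python
from itertools import groupby

# A univariate numeric series as (timestamp, value) observations,
# already sorted by timestamp (groupby below relies on that).
series = [(1, 7.2), (3, 4.5), (5, 4.5), (5, 4.6), (5, 7.1), (7, 3.9), (9, 1.1)]

def transform(ts, fn):
    """Transform: map every value, producing a new series."""
    return [(t, fn(v)) for t, v in ts]

def segment_by_time(ts, width):
    """Segmentation: group observations into fixed-width time buckets."""
    buckets = groupby(ts, key=lambda obs: obs[0] // width)
    return [list(group) for _, group in buckets]

def reduce_mean(ts):
    """Reducer: collapse a series into a single static value."""
    return sum(v for _, v in ts) / len(ts)

doubled = transform(series, lambda v: v * 2)
segments = segment_by_time(series, width=4)
means = [reduce_mean(seg) for seg in segments]
```

Each helper returns a new list and leaves the input untouched, which loosely mirrors the idea of adding functions while the core data structure stays unchangeable.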
With time zones and various human-readable time formats, a key aspect of the data model is support for a Time Reference System (TRS). Every time series is associated with a TRS (system default), which can be remapped to any specific choice of the user at any time, enabling easy transformation of a specific time series or a segment of a time series. See Using time reference system.
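The idea of a TRS can be sketched in plain Python as a start time plus a granularity that together give integer timestamps a wall-clock meaning. The function to_datetime and the chosen start time are assumptions made for illustration; the library's own TRS interface is described in the referenced section and is not reproduced here.

```python
from datetime import datetime, timedelta, timezone

def to_datetime(tick, start, granularity):
    """Map an integer timestamp to wall-clock time under a hypothetical TRS."""
    return start + tick * granularity

series = [(1, 27.0), (2, 45.0), (3, 4.0)]

# Hypothetical TRS: ticks are hours counted from 2020-01-01T00:00 UTC.
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
hourly = [(to_datetime(t, start, timedelta(hours=1)), v) for t, v in series]

# Remapping to a different TRS changes the interpretation, not the stored ticks.
daily = [(to_datetime(t, start, timedelta(days=1)), v) for t, v in series]
```

The stored observations stay the same in both cases; only the mapping from tick to calendar time differs.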
Further, to handle large-scale time series, the library offers a lazy evaluation construct by providing a mechanism for identifying the maximal narrow temporal dependency. This construct is very similar to a Spark computation graph, which also loads data into memory on an as-needed basis and performs the computations only when needed.
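Lazy evaluation of this kind can be approximated with Python generators. The sketch below is only an analogy, not the library's mechanism: source and sliding_mean are hypothetical names, and the window size stands in for the temporal dependency, since each output needs only the last few observations.

```python
from itertools import islice

def source(n, log):
    """Yield observations one at a time, recording each load in `log`."""
    for t in range(n):
        log.append(t)
        yield (t, float(t))

def sliding_mean(ts, size):
    """Lazily yield (timestamp, mean) over a sliding window of `size` values."""
    window = []
    for t, v in ts:
        window.append(v)
        if len(window) > size:
            window.pop(0)
        if len(window) == size:
            yield (t, sum(window) / size)

loaded = []
# Building the pipeline loads nothing; work happens only when results are pulled.
pipeline = sliding_mean(source(1_000_000, loaded), size=3)

# Pulling two results reads only the observations those results depend on.
first_two = list(islice(pipeline, 2))
```

Although the source could produce a million observations, only the first four are ever read to compute the two requested window means.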
Time series data types
You can use multiple data types as an element of a time series, spanning numeric, categorical, array, and dictionary data structures.
The following data types are supported in a time series:
Data type | Description
numeric | Time series with univariate observations of numeric type, including double and integer. For example: [(1, 7.2), (3, 4.5), (5, 4.5), (5, 4.6), (5, 7.1), (7, 3.9), (9, 1.1)]
numeric array | Time series with multivariate observations of numeric type, including double array and integer array. For example: [(1, [7.2, 8.74]), (3, [4.5, 9.44]), (5, [4.5, 10.12]), (5, [4.6, 12.91]), (5, [7.1, 9.90]), (7, [3.9, 3.76])]
string | Time series with univariate observations of type string. For example: [(1, "a"), (3, "b"), (5, "c"), (5, "d"), (5, "e"), (7, "f"), (9, "g")]
string array | Time series with multivariate observations of type string array. For example: [(1, ["a", "xq"]), (3, ["b", "zr"]), (5, ["c", "ms"]), (5, ["d", "rt"]), (5, ["e", "wu"]), (7, ["f", "vv"]), (9, ["g", "zw"])]
segment | Time series of segments, as output by the segmentBy function. The segments can be of any type, including numeric, string, numeric array, and string array. For example: [(1,[(1, 7.2), (3, 4.5)]), (5,[(5, 4.5), (5, 4.6), (5, 7.1)]), (7,[(7, 3.9), (9, 1.1)])]
dictionary | Time series of dictionaries. A dictionary can have arbitrary types inside it.
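The shapes listed above can be told apart by inspecting a single observation value. The classify helper below is a hypothetical illustration in plain Python, not a library function; it assumes that a segment is represented as a list of (timestamp, value) tuples, as in the examples.

```python
from numbers import Number

def classify(value):
    """Name the data type of a single observation value (hypothetical helper)."""
    if isinstance(value, Number):
        return "numeric"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "dictionary"
    if isinstance(value, list) and value:
        if all(isinstance(v, Number) for v in value):
            return "numeric array"
        if all(isinstance(v, str) for v in value):
            return "string array"
        # Assumption: a segment is a list of (timestamp, value) pairs.
        if all(isinstance(v, tuple) and len(v) == 2 for v in value):
            return "segment"
    raise TypeError(f"unsupported observation value: {value!r}")

# One observation per supported type, taken from the examples above.
observations = [(1, 7.2), (1, [7.2, 8.74]), (1, "a"), (1, ["a", "xq"]),
                (1, [(1, 7.2), (3, 4.5)]), (1, {"reading": 7.2})]
kinds = [classify(v) for _, v in observations]
```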
Time series functions
You can use the functions in the provided time series packages to analyze time series data, extract meaningful information, and create models that predict new values based on previously observed values. See Time series functions.