Using the time series library

To get started working with the time series library, import the library to your Python notebook or application:

# Import the package
import tspy

Creating a time series

To create a time series and use the library functions, you must decide on the data source. Supported data sources include:

  • In-memory lists
  • pandas DataFrames
  • In-memory collections of observations (using the ObservationCollection construct)
  • User-defined readers (using the TimeSeriesReader construct)

The following example shows ingesting data from an in-memory list:

ts = tspy.time_series.list([5.0, 2.0, 4.0, 6.0, 6.0, 7.0])
ts

The output is as follows:

TimeStamp: 0     Value: 5.0
TimeStamp: 1     Value: 2.0
TimeStamp: 2     Value: 4.0
TimeStamp: 3     Value: 6.0
TimeStamp: 4     Value: 6.0
TimeStamp: 5     Value: 7.0

You can also operate on many time-series at the same time by using the MultiTimeSeries construct. A MultiTimeSeries is essentially a dictionary of time series, where each time series has its own unique key. The time series are not aligned in time.

The MultiTimeSeries construct provides similar methods for transforming and ingesting as the single time series construct:

mts = tspy.multi_time_series.dict({
	"ts1": tspy.time_series.list([1.0, 2.0, 3.0]),
	"ts2": tspy.time_series.list([5.0, 2.0, 4.0, 5.0])
})

The output is the following:

ts2 time series
------------------------------
TimeStamp: 0     Value: 5.0
TimeStamp: 1     Value: 2.0
TimeStamp: 2     Value: 4.0
TimeStamp: 3     Value: 5.0
ts1 time series
------------------------------
TimeStamp: 0     Value: 1.0
TimeStamp: 1     Value: 2.0
TimeStamp: 2     Value: 3.0

Interpreting time

By default, a time series uses a long data type to denote when a given observation was created, which is referred to as a time tick. A time reference system is used for time series with timestamps that are human interpretable. See Using time reference system.

The following example shows how to create a simple time series where each index denotes a day after the start time of 1990-01-01:

import datetime
granularity = datetime.timedelta(days=1)
start_time = datetime.datetime(1990, 1, 1, 0, 0, 0, 0, tzinfo=datetime.timezone.utc)

ts = tspy.time_series.list([5.0, 2.0, 4.0, 6.0, 6.0, 7.0], granularity=granularity, start_time=start_time)
ts

The output is as follows:

TimeStamp: 1990-01-01T00:00Z     Value: 5.0
TimeStamp: 1990-01-02T00:00Z     Value: 2.0
TimeStamp: 1990-01-03T00:00Z     Value: 4.0
TimeStamp: 1990-01-04T00:00Z     Value: 6.0
TimeStamp: 1990-01-05T00:00Z     Value: 6.0
TimeStamp: 1990-01-06T00:00Z     Value: 7.0

Performing simple transformations

Transformations are functions which, when given one or more time series, return a new time series.

For example, to segment a time series into windows where each window is of size=3, sliding by 2 records, you can use the following method:

window_ts = ts.segment(3, 2)
window_ts

The output is as follows:

TimeStamp: 0     Value: original bounds: (0,2) actual bounds: (0,2) observations: [(0,5.0),(1,2.0),(2,4.0)]
TimeStamp: 2     Value: original bounds: (2,4) actual bounds: (2,4) observations: [(2,4.0),(3,6.0),(4,6.0)]

This example shows adding 1 to each value in a time series:

add_one_ts = ts.map(lambda x: x + 1)
add_one_ts

The output is as follows:

TimeStamp: 0     Value: 6.0
TimeStamp: 1     Value: 3.0
TimeStamp: 2     Value: 5.0
TimeStamp: 3     Value: 7.0
TimeStamp: 4     Value: 7.0
TimeStamp: 5     Value: 8.0

Or you can temporally left join a time series, for example ts with another time series ts2:

ts2 = tspy.time_series.list([1.0, 2.0, 3.0])
joined_ts = ts.left_join(ts2)
joined_ts

The output is as follows:

TimeStamp: 0     Value: [5.0, 1.0]
TimeStamp: 1     Value: [2.0, 2.0]
TimeStamp: 2     Value: [4.0, 3.0]
TimeStamp: 3     Value: [6.0, null]
TimeStamp: 4     Value: [6.0, null]
TimeStamp: 5     Value: [7.0, null]

Using transformers

A rich suite of built-in transformers is provided in the transformers package. Import the package to use the provided transformer functions:

from tspy.builders.functions import transformers

After you have added the package, you can transform data in a time series be using the transform method.

For example, to perform a difference on a time-series:

ts_diff = ts.transform(transformers.difference())

Here the output is:

TimeStamp: 1     Value: -3.0
TimeStamp: 2     Value: 2.0
TimeStamp: 3     Value: 2.0
TimeStamp: 4     Value: 0.0
TimeStamp: 5     Value: 1.0

Using reducers

Similar to the transformers package, you can reduce a time series by using methods provided by the reducers package. You can import the reducers package as follows:

from tspy.builders.functions import reducers

After you have imported the package, use the reduce method to get the average over a time-series for example:

avg = ts.reduce(reducers.average())
avg

This outputs:

5.0

Reducers have a special property that enables them to be used alongside segmentation transformations (hourly sum, avg in the window prior to an error occurring, and others). Because the output of a segmentation + reducer is a time series, the transform method is used.

For example, to segment into windows of size 3 and get the average across each window, use:

avg_windows_ts = ts.segment(3).transform(reducers.average())

This results in:

imeStamp: 0     Value: 3.6666666666666665
TimeStamp: 1     Value: 4.0
TimeStamp: 2     Value: 5.333333333333333
TimeStamp: 3     Value: 6.333333333333333

Graphing time series

Lazy evaluation is used when graphing a time series. When you graph a time series, you can do one of the following:

  • Collect the observations of the time series, which returns an ObservationCollection
  • Reduce the time series to a value or collection of values
  • Perform save or print operations

For example, to collect and return all of the values of a timeseries:

observations = ts.collect()
observations

This results in:

[(0,5.0),(1,2.0),(2,4.0),(3,6.0),(4,6.0),(5,7.0)]

To collect a range from a time series, use:

observations = ts[1:3] # same as ts.get_values(1, 3)
observations

Here the output is:

[(1,2.0),(2,4.0),(3,6.0)]

Note that a time series is optimized for range queries if the time series is periodic in nature.

Using the describe on a current time series, also graphs the time series:

describe_obj = ts.describe()
describe_obj

The output is:

min inter-arrival-time: 1
max inter-arrival-time: 1
mean inter-arrival-time: 1.0
top: 6.0
unique: 5
frequency: 2
first: TimeStamp: 0     Value: 5.0
last: TimeStamp: 5     Value: 7.0
count: 6
mean:5.0
std:1.632993161855452
min:2.0
max:7.0
25%:3.5
50%:5.5
75%:6.25

Learn more