Time series functions | IBM Cloud Pak for Data as a Service

Time series functions

Last updated: 28 Apr 2023

Time series functions are aggregate functions that operate on sequences of data values measured at points in time.

The following sections describe some of the time series functions available in the various time series packages.

Transforms

Transforms are functions that are applied to a time series and produce another time series. The time series library supports various types of transforms, including provided transforms (available through from tspy.functions import transformers) as well as user-defined transforms.

The following sample shows some of the provided transforms:

```python
>>> import tspy
>>> from tspy.functions import interpolators, transformers

>>> # Interpolation: resample every 2 time ticks using nearest-neighbor interpolation
>>> ts = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> periodicity = 2
>>> interp = interpolators.nearest(0.0)
>>> interp_ts = ts.resample(periodicity, interp)
>>> print(interp_ts)
TimeStamp: 0     Value: 1.0
TimeStamp: 2     Value: 3.0
TimeStamp: 4     Value: 5.0

>>> # Fillna: shift the series by 2 to introduce nulls at the start
>>> shift_ts = ts.shift(2)
>>> print(shift_ts)
TimeStamp: 0     Value: null
TimeStamp: 1     Value: null
TimeStamp: 2     Value: 1.0
TimeStamp: 3     Value: 2.0
TimeStamp: 4     Value: 3.0
TimeStamp: 5     Value: 4.0

>>> # ...then replace the nulls with 0.0
>>> null_filled_ts = shift_ts.fillna(interpolators.fill(0.0))
>>> print(null_filled_ts)
TimeStamp: 0     Value: 0.0
TimeStamp: 1     Value: 0.0
TimeStamp: 2     Value: 1.0
TimeStamp: 3     Value: 2.0
TimeStamp: 4     Value: 3.0
TimeStamp: 5     Value: 4.0

>>> # Additive White Gaussian Noise (AWGN); the noise is random, so values differ per run
>>> noise_ts = ts.transform(transformers.awgn(mean=0.0, sd=.03))
>>> print(noise_ts)
TimeStamp: 0     Value: 0.9962378841388397
TimeStamp: 1     Value: 1.9681980879378596
TimeStamp: 2     Value: 3.0289374962174405
TimeStamp: 3     Value: 3.990728648807705
TimeStamp: 4     Value: 4.935338359740761
TimeStamp: 5     Value: 6.03395072999318
```
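To make the idea of a user-defined transform concrete, the following plain-Python sketch (it does not use tspy) models a time series as a list of (timestamp, value) pairs, with None standing in for null, and implements shift, fillna, and AWGN as functions that each return a new series. The function names and the list-of-tuples representation are hypothetical; the shift semantics are modeled on the output shown above, not taken from the tspy implementation.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: a list of (timestamp, value) pairs; None means null.
Series = List[Tuple[int, Optional[float]]]


def shift(series: Series, k: int) -> Series:
    """Move each value k positions later (k >= 0), keeping the timestamps.

    The first k values become None and the last k values fall off the end.
    """
    values = [v for _, v in series]
    return [(t, values[i - k] if i >= k else None) for i, (t, _) in enumerate(series)]


def fillna(series: Series, fill_value: float) -> Series:
    """Replace every None value with fill_value."""
    return [(t, fill_value if v is None else v) for t, v in series]


def awgn(series: Series, mean: float = 0.0, sd: float = 1.0, seed: Optional[int] = None) -> Series:
    """Add independent Gaussian noise N(mean, sd) to every non-null value."""
    rng = random.Random(seed)
    return [(t, None if v is None else v + rng.gauss(mean, sd)) for t, v in series]


ts = [(i, float(i + 1)) for i in range(6)]  # timestamps 0..5, values 1.0..6.0
shifted = shift(ts, 2)
filled = fillna(shifted, 0.0)
noisy = awgn(ts, mean=0.0, sd=0.03, seed=42)
print(shifted)
print(filled)
print(noisy)
```

Because every function returns a new list rather than mutating its input, transforms compose naturally, for example fillna(shift(ts, 2), 0.0).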

Segmentation

Segmentation, or windowing, is the process of splitting a time series into multiple segments. The time series library supports various forms of segmentation and also lets you define your own segments.

Window-based segmentation

This type of time series segmentation is based on segment sizes specified by the user. Segments can be record-based or time-based. There are options for creating tumbling as well as sliding window-based segments.

```python
>>> import tspy
>>> ts_orig = (tspy.builder()
...     .add(tspy.observation(1, 1.0))
...     .add(tspy.observation(2, 2.0))
...     .add(tspy.observation(6, 6.0))
...     .result().to_time_series())
>>> ts_orig
timestamp: 1     Value: 1.0
timestamp: 2     Value: 2.0
timestamp: 6     Value: 6.0

>>> # Time-based sliding windows of size 3, advancing by 1 time tick
>>> ts = ts_orig.segment_by_time(3, 1)
>>> ts
timestamp: 1     Value: original bounds: (1,3) actual bounds: (1,2) observations: [(1,1.0),(2,2.0)]
timestamp: 2     Value: original bounds: (2,4) actual bounds: (2,2) observations: [(2,2.0)]
timestamp: 3     Value: this segment is empty
timestamp: 4     Value: original bounds: (4,6) actual bounds: (6,6) observations: [(6,6.0)]
```
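The window-based segmentation above can be sketched in plain Python as follows. This is not the tspy implementation; it assumes inclusive window bounds [start, start + size - 1], a step measured in timestamp units, and windows that begin at the first timestamp and stop once a window would extend past the last timestamp. Under those assumptions it yields the same bounds and observations as the segment_by_time(3, 1) output above. A tumbling window is the special case step == size.

```python
from typing import List, Tuple

Observation = Tuple[int, float]
Segment = Tuple[Tuple[int, int], List[Observation]]


def segment_by_time(obs: List[Observation], size: int, step: int) -> List[Segment]:
    """Sliding time windows with inclusive bounds; tumbling windows when step == size."""
    if not obs:
        return []
    obs = sorted(obs)
    last = obs[-1][0]
    segments = []
    start = obs[0][0]
    # Emit windows while the whole window fits inside the observed time range.
    while start + size - 1 <= last:
        end = start + size - 1
        members = [(t, v) for t, v in obs if start <= t <= end]
        segments.append(((start, end), members))  # empty windows are kept
        start += step
    return segments


observations = [(1, 1.0), (2, 2.0), (6, 6.0)]
sliding = segment_by_time(observations, 3, 1)
tumbling = segment_by_time(observations, 3, 3)
for bounds, members in sliding:
    print(bounds, members)
```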

Anchor-based segmentation

Anchor-based segmentation is a very important type of segmentation that creates a segment by anchoring on a specific lambda, which can be a simple value. Examples include looking at the events that preceded a 500 error, or examining the values that follow an observed anomaly. Variants of anchor-based segmentation include providing a range with multiple markers.

```python
>>> import tspy
>>> ts_orig = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0])
>>> ts_orig
timestamp: 0     Value: 1.0
timestamp: 1     Value: 2.0
timestamp: 2     Value: 3.0
timestamp: 3     Value: 4.0
timestamp: 4     Value: 5.0

>>> # Anchor on even values; take 1 tick before and 2 ticks after each anchor
>>> ts = ts_orig.segment_by_anchor(lambda x: x % 2 == 0, 1, 2)
>>> ts
timestamp: 1     Value: original bounds: (0,3) actual bounds: (0,3) observations: [(0,1.0),(1,2.0),(2,3.0),(3,4.0)]
timestamp: 3     Value: original bounds: (2,5) actual bounds: (2,4) observations: [(2,3.0),(3,4.0),(4,5.0)]
```
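The anchor-based behavior shown above can be approximated in plain Python. In this sketch (hypothetical names, not the tspy API), every observation whose value satisfies the predicate becomes an anchor at timestamp t, and its segment collects the observations with timestamps in the inclusive range [t - left, t + right]. For the example input it reproduces the two segments in the output above.

```python
from typing import Callable, List, Tuple

Observation = Tuple[int, float]
AnchorSegment = Tuple[int, Tuple[int, int], List[Observation]]


def segment_by_anchor(obs: List[Observation], is_anchor: Callable[[float], bool],
                      left: int, right: int) -> List[AnchorSegment]:
    """Return (anchor_timestamp, (lower, upper), members) for every anchor observation."""
    segments = []
    for t, v in obs:
        if is_anchor(v):
            lower, upper = t - left, t + right
            members = [(u, w) for u, w in obs if lower <= u <= upper]
            segments.append((t, (lower, upper), members))
    return segments


series = list(enumerate([1.0, 2.0, 3.0, 4.0, 5.0]))  # timestamps 0..4
segments = segment_by_anchor(series, lambda x: x % 2 == 0, 1, 2)
for segment in segments:
    print(segment)
```

The "actual bounds" in the tspy output correspond to the first and last member timestamps, which is why the second segment reports (2,4) even though its requested range is (2,5).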

Segmenters

Several specialized segmenters are provided out of the box in the segmenters package (imported with from tspy.functions import segmenters). One example is a segmenter that uses regression to segment a time series:

```python
>>> import tspy
>>> from tspy.functions import segmenters
>>> ts = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 1.0, -1.0, 50.0, 53.0, 56.0])
>>> max_error = .5
>>> skip = 1
>>> reg_sts = ts.to_segments(segmenters.regression(max_error, skip, use_relative=True))
>>> reg_sts
timestamp: 0     Value:   range: (0, 4)   outliers: {}
timestamp: 5     Value:   range: (5, 7)   outliers: {}
timestamp: 8     Value:   range: (8, 10)   outliers: {}
```
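As an illustration of regression-based segmentation in general, the following plain-Python sketch grows each segment greedily and closes it as soon as the least-squares line through the segment has an absolute residual larger than max_error. This is a simplified stand-in, not the algorithm behind segmenters.regression: it uses an absolute rather than relative error, ignores skip, and does not detect outliers. On the example data it happens to produce the same ranges as the output above.

```python
from typing import List, Tuple


def max_residual(points: List[Tuple[int, float]]) -> float:
    """Largest absolute residual of the least-squares line through the points."""
    n = len(points)
    if n < 3:
        return 0.0  # a line fits one or two points exactly
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return max(abs(y - (slope * x + intercept)) for x, y in points)


def regression_segments(values: List[float], max_error: float) -> List[Tuple[int, int]]:
    """Greedily extend each segment while a straight line still fits within max_error."""
    if not values:
        return []
    segments = []
    start = 0
    for end in range(1, len(values)):
        window = [(t, values[t]) for t in range(start, end + 1)]
        if max_residual(window) > max_error:
            segments.append((start, end - 1))  # close the segment before the offending point
            start = end
    segments.append((start, len(values) - 1))
    return segments


data = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 1.0, -1.0, 50.0, 53.0, 56.0]
ranges = regression_segments(data, 0.5)
print(ranges)
```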

Reducers

A reducer is a function that is applied to the values across a set of time series to produce a single value. Time series reducer functions are similar to the reducer concept used by Hadoop/Spark. This single value can be a collection, but more generally it is a single object. An example of a reducer function is averaging the values in a time series.
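The averaging reducer mentioned above can be sketched in plain Python with functools.reduce. The accumulator is a (sum, count) pair, which mirrors the Hadoop/Spark idea that partial results from separate partitions can be merged before the final value is computed. The step and merge names are hypothetical and not part of tspy.

```python
from functools import reduce
from typing import Tuple

Acc = Tuple[float, int]  # (running sum, number of values seen)


def step(acc: Acc, value: float) -> Acc:
    """Fold one value into a (sum, count) accumulator."""
    return (acc[0] + value, acc[1] + 1)


def merge(a: Acc, b: Acc) -> Acc:
    """Combine two partial results, e.g. from different partitions."""
    return (a[0] + b[0], a[1] + b[1])


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

total, count = reduce(step, values, (0.0, 0))
average = total / count
print(average)

# The same average from two "partitions" reduced separately and then merged
part1 = reduce(step, values[:3], (0.0, 0))
part2 = reduce(step, values[3:], (0.0, 0))
merged = merge(part1, part2)
print(merged[0] / merged[1])
```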

Several reducer functions are supported, including:

Distance reducers

Distance reducers are a class of reducers that compute the distance between two time series. The library supports numeric as well as categorical distance functions on sequences. These include time warping distance measurements such as Itakura Parallelogram, Sakoe-Chiba Band, DTW non-constrained, and DTW non-time warped constraints. Distribution distances such as the Hungarian distance and the Earth Mover's distance are also available.

For categorical time series distance measurements, you can use the Damerau-Levenshtein and Jaro-Winkler distance measures.
```python
>>> import tspy
>>> from tspy.functions import *
>>> ts = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> # Add random noise; the distance below therefore differs from run to run
>>> ts2 = ts.transform(transformers.awgn(sd=.3))
>>> dtw_distance = ts.reduce(ts2, reducers.dtw(lambda obs1, obs2: abs(obs1.value - obs2.value)))
>>> print(dtw_distance)
1.8557981638880405
```
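The following plain-Python sketch shows how a dynamic time warping (DTW) distance is computed by dynamic programming, using the same absolute-difference cost as the example above. The optional window argument restricts the warping path to a Sakoe-Chiba band; with window=None the distance is unconstrained. This is a textbook DTW, not the code behind reducers.dtw, and it works on plain lists of values rather than tspy observations.

```python
import math
from typing import Callable, Optional, Sequence


def dtw(a: Sequence[float], b: Sequence[float],
        dist: Callable[[float, float], float] = lambda x, y: abs(x - y),
        window: Optional[int] = None) -> float:
    """DTW distance between a and b, optionally limited to a Sakoe-Chiba band."""
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)  # no constraint
    window = max(window, abs(n - m))  # the band must be wide enough to reach (n, m)

    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


print(dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0]))  # repeated value absorbed by warping
print(dtw([1, 2, 3, 4], [2, 3, 4, 5]))              # unconstrained
print(dtw([1, 2, 3, 4], [2, 3, 4, 5], window=0))    # band of width 0: pure diagonal
```

Narrowing the band trades flexibility for speed and guards against pathological alignments: with window=0 the last call degenerates to the sum of pointwise differences.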

Math reducers

Several convenient math reducers for numeric time series are provided. They include basic ones such as average, sum, standard deviation, and moments. Entropy, kurtosis, FFT and its variants, various correlations, and histograms are also included. A convenient basic summarization reducer is the describe function, which provides basic information about a time series.

```python
>>> import tspy
>>> from tspy.functions import *
>>> ts = tspy.time_series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
>>> # Random noise: the results below differ from run to run
>>> ts2 = ts.transform(transformers.awgn(sd=.3))
>>> corr = ts.reduce(ts2, reducers.correlation())
>>> print(corr)
0.9938941942380525

>>> # Augmented Dickey-Fuller (ADF) test
>>> adf = ts.reduce(reducers.adf())
>>> print(adf)
pValue: -3.45
satisfies test: false

>>> # Granger causality test
>>> ts2 = ts.transform(transformers.awgn(sd=.3))
>>> granger = ts.reduce(ts2, reducers.granger(1))
>>> print(granger)  # f_stat, p_value, R2
-1.7123613937876463,-3.874412217575385,1.0
```
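For reference, a correlation reducer over two equally long value sequences can be sketched in plain Python as the Pearson correlation coefficient. Which correlation reducers.correlation computes is an assumption here (Pearson is only the most common choice), and the pearson function is hypothetical rather than part of tspy.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(pearson(values, [2 * v for v in values]))  # perfectly linear relationship
print(pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # perfectly inverse relationship
```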

Another basic reducer that is very useful for getting a first understanding of a time series is the describe reducer. The following example illustrates this reducer:

```python
>>> desc = ts.describe()
>>> print(desc)
min inter-arrival-time: 1
max inter-arrival-time: 1
mean inter-arrival-time: 1.0
top: null
unique: 6
frequency: 1
first: TimeStamp: 0     Value: 1.0
last: TimeStamp: 5     Value: 6.0
count: 6
mean:3.5
std:1.707825127659933
min:1.0
max:6.0
25%:1.75
50%:3.5
75%:5.25
```
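Most of the numeric fields in this summary can be recomputed with Python's statistics module, which helps when checking what describe reports. In the plain-Python sketch below, the choice of population standard deviation and of the "exclusive" (n + 1)·p quantile method is an inference: these choices reproduce the std and percentile values shown above for this series, but tspy's documented definitions may differ. The describe function here is hypothetical and omits the top, frequency, first, and last fields.

```python
import statistics
from typing import Dict, List, Tuple


def describe(obs: List[Tuple[int, float]]) -> Dict[str, float]:
    """Summary statistics for sorted (timestamp, value) pairs (at least two observations)."""
    timestamps = [t for t, _ in obs]
    values = [v for _, v in obs]
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # statistics.quantiles uses the "exclusive" (n + 1) * p interpolation by default
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "min inter-arrival-time": min(gaps),
        "max inter-arrival-time": max(gaps),
        "mean inter-arrival-time": sum(gaps) / len(gaps),
        "unique": len(set(values)),
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
    }


summary = describe(list(enumerate([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])))
for key, value in summary.items():
    print(f"{key}: {value}")
```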

Temporal joins

The library includes functions for temporal joins, that is, joining time series based on their timestamps. The join functions are similar to those in a database, including left, right, outer, inner, left outer, and right outer joins. The following sample code shows some of these join functions:

```python
import tspy
from tspy.functions import interpolators

# Create a collection of observations (materialized TimeSeries)
observations_left = tspy.observations(tspy.observation(1, 0.0), tspy.observation(3, 1.0), tspy.observation(8, 3.0), tspy.observation(9, 2.5))
observations_right = tspy.observations(tspy.observation(2, 2.0), tspy.observation(3, 1.5), tspy.observation(7, 4.0), tspy.observation(9, 5.5), tspy.observation(10, 4.5))

# Build TimeSeries from Observations
ts_left = observations_left.to_time_series()
ts_right = observations_right.to_time_series()

# Perform full join
ts_full = ts_left.full_join(ts_right)
print(ts_full)
```

```text
TimeStamp: 1     Value: [0.0, null]
TimeStamp: 2     Value: [null, 2.0]
TimeStamp: 3     Value: [1.0, 1.5]
TimeStamp: 7     Value: [null, 4.0]
TimeStamp: 8     Value: [3.0, null]
TimeStamp: 9     Value: [2.5, 5.5]
TimeStamp: 10     Value: [null, 4.5]
```

```python
# Perform left align with interpolation
ts_left_aligned, ts_right_aligned = ts_left.left_align(ts_right, interpolators.nearest(0.0))

print("left ts result")
print(ts_left_aligned)
print("right ts result")
print(ts_right_aligned)
```

```text
left ts result
TimeStamp: 1     Value: 0.0
TimeStamp: 3     Value: 1.0
TimeStamp: 8     Value: 3.0
TimeStamp: 9     Value: 2.5
right ts result
TimeStamp: 1     Value: 0.0
TimeStamp: 3     Value: 1.5
TimeStamp: 8     Value: 4.0
TimeStamp: 9     Value: 5.5
```
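The semantics of the full (outer) join can be sketched in plain Python with dictionaries keyed by timestamp, alongside an inner join for comparison. The sketch assumes each series has unique timestamps and uses None where tspy prints null; the function names are hypothetical. For the sample data, full_join produces the same rows as the tspy output above.

```python
from typing import Dict, List, Optional, Tuple

Observation = Tuple[int, float]
Row = Tuple[int, List[Optional[float]]]


def full_join(left: List[Observation], right: List[Observation]) -> List[Row]:
    """Outer join on timestamps; a side with no observation at t contributes None."""
    lmap: Dict[int, float] = dict(left)
    rmap: Dict[int, float] = dict(right)
    return [(t, [lmap.get(t), rmap.get(t)]) for t in sorted(lmap.keys() | rmap.keys())]


def inner_join(left: List[Observation], right: List[Observation]) -> List[Row]:
    """Keep only the timestamps present in both series."""
    lmap, rmap = dict(left), dict(right)
    return [(t, [lmap[t], rmap[t]]) for t in sorted(lmap.keys() & rmap.keys())]


left = [(1, 0.0), (3, 1.0), (8, 3.0), (9, 2.5)]
right = [(2, 2.0), (3, 1.5), (7, 4.0), (9, 5.5), (10, 4.5)]
full = full_join(left, right)
inner = inner_join(left, right)
for row in full:
    print(row)
print(inner)
```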

Forecasting

A key capability of the time series library is forecasting. The library includes functions for simple as well as complex forecasting models, including ARIMA, Exponential, Holt-Winters, and BATS. The following example shows how to create a Holt-Winters model:

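```python
import random

import tspy

# Illustrative values (assumed; the original example does not define them)
samples_per_season = 5
initial_training_seasons = 4

model = tspy.forecasters.hws(samples_per_season=samples_per_season, initial_training_seasons=initial_training_seasons)

# Feed 100 random observations to the model one at a time
# (the data is random, so the printed model and forecast differ per run)
for i in range(100):
    timestamp = i
    value = random.randint(1, 10) * 1.0
    model.update_model(timestamp, value)

print(model)
```

```text
Forecasting Model
  Algorithm: HWSAdditive=5 (aLevel=0.001, bSlope=0.001, gSeas=0.001) level=6.087789839896166, slope=0.018901997884893912, seasonal(amp,per,avg)=(1.411203455586738,5, 0,-0.0037471500727535465)
```

```python
# Forecast only after the model has been initialized
if model.is_initialized():
    print(model.forecast_at(120))
```

```text
6.334135728495107
```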
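For intuition about what an additive Holt-Winters model tracks (a level, a slope, and one seasonal offset per position in the season), the following plain-Python sketch implements the textbook additive update equations with fixed smoothing parameters alpha, beta, and gamma. The initialization from the first two seasons is one common choice and is an assumption; tspy.forecasters.hws has its own initialization (for example through initial_training_seasons) and parameter handling, so this is not a reimplementation of it.

```python
from typing import List


def holt_winters_additive(series: List[float], season_length: int, alpha: float,
                          beta: float, gamma: float, horizon: int) -> List[float]:
    """Fit additive Holt-Winters to series and return `horizon` point forecasts."""
    L = season_length
    if len(series) < 2 * L:
        raise ValueError("need at least two full seasons to initialize")

    # Initialization (assumed): slope from the means of the first two seasons,
    # seasonal offsets measured against the trend line through the first season.
    mean1 = sum(series[:L]) / L
    mean2 = sum(series[L:2 * L]) / L
    trend = (mean2 - mean1) / L
    center = (L - 1) / 2
    seasonals = [series[i] - (mean1 + (i - center) * trend) for i in range(L)]
    level = mean1 + center * trend  # level at time L - 1

    # Standard additive update equations, starting with the second season.
    for t in range(L, len(series)):
        y = series[t]
        s_prev = seasonals[t % L]  # seasonal estimate from one season ago
        prev_level = level
        level = alpha * (y - s_prev) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % L] = gamma * (y - level) + (1 - gamma) * s_prev

    # Forecast: level + h * trend + most recent seasonal offset for that position.
    last = len(series) - 1
    return [level + h * trend + seasonals[(last + h) % L] for h in range(1, horizon + 1)]


# Linear trend (slope 1) plus a repeating seasonal pattern [0, 2, -2]
data = [t + [0.0, 2.0, -2.0][t % 3] for t in range(12)]
forecast = holt_winters_additive(data, 3, alpha=0.5, beta=0.3, gamma=0.2, horizon=3)
print(forecast)
```

Because the sample data follows its trend and seasonal pattern exactly, the forecasts for timestamps 12, 13, and 14 continue that pattern (approximately 12, 15, and 12, up to floating-point rounding).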

```python
import tspy

ts = tspy.time_series([float(i) for i in range(10)])

print(ts)
```

```text
TimeStamp: 0     Value: 0.0
TimeStamp: 1     Value: 1.0
TimeStamp: 2     Value: 2.0
TimeStamp: 3     Value: 3.0
TimeStamp: 4     Value: 4.0
TimeStamp: 5     Value: 5.0
TimeStamp: 6     Value: 6.0
TimeStamp: 7     Value: 7.0
TimeStamp: 8     Value: 8.0
TimeStamp: 9     Value: 9.0
```

```python
num_predictions = 5
model = tspy.forecasters.auto(8)
confidence = .99

predictions = ts.forecast(num_predictions, model, confidence=confidence)

print(predictions.to_time_series())
```

```text
TimeStamp: 10     Value: {value=10.0, lower_bound=10.0, upper_bound=10.0, error=0.0}
TimeStamp: 11     Value: {value=10.997862810553725, lower_bound=9.934621260488143, upper_bound=12.061104360619307, error=0.41277640121597475}
TimeStamp: 12     Value: {value=11.996821082897318, lower_bound=10.704895525154571, upper_bound=13.288746640640065, error=0.5015571318964149}
TimeStamp: 13     Value: {value=12.995779355240911, lower_bound=11.50957896664928, upper_bound=14.481979743832543, error=0.5769793776877866}
TimeStamp: 14     Value: {value=13.994737627584504, lower_bound=12.33653268707341, upper_bound=15.652942568095598, error=0.6437557559526337}
```

```python
print(predictions.to_time_series().to_df())
```

```text
   timestamp      value  lower_bound  upper_bound     error
0         10  10.000000    10.000000    10.000000  0.000000
1         11  10.997863     9.934621    12.061104  0.412776
2         12  11.996821    10.704896    13.288747  0.501557
3         13  12.995779    11.509579    14.481980  0.576979
4         14  13.994738    12.336533    15.652943  0.643756
```
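The bounds in this output appear consistent with a normal-approximation interval, value ± z · error, where z is the two-sided standard normal quantile for the requested confidence (about 2.576 for 0.99). That relationship is inferred from the numbers above rather than documented here, so treat the following plain-Python check (with a hypothetical prediction_interval helper) as a sketch under that assumption.

```python
from statistics import NormalDist
from typing import Tuple


def prediction_interval(value: float, error: float, confidence: float) -> Tuple[float, float]:
    """Two-sided normal interval: value +/- z * error for the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. the 0.995 quantile for 0.99
    return value - z * error, value + z * error


# Values copied from the TimeStamp 11 and TimeStamp 14 rows above
lower_11, upper_11 = prediction_interval(10.997862810553725, 0.41277640121597475, 0.99)
lower_14, upper_14 = prediction_interval(13.994737627584504, 0.6437557559526337, 0.99)
print(lower_11, upper_11)
print(lower_14, upper_14)
```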

Time series SQL

The time series library is tightly integrated with Apache Spark. By using new data types in Spark Catalyst, you can perform time series SQL operations that scale out horizontally with Apache Spark. This lets you easily use the time series extensions in IBM Analytics Engine or in solutions that include IBM Analytics Engine functionality, such as the Watson Studio Spark environments.

The SQL extensions cover most aspects of the time series functions, including segmentation, transforms, reducers, forecasting, and I/O. See Analyzing time series data.

Learn more

To use the tspy Python SDK, see the tspy Python SDK documentation.