Time-series analysis
Introduction
A time series is commonly described as a set of observations indexed by time, typically recorded at successive and often equally spaced points in time.
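As a minimal illustration (the dates and values are made up), a time series in pandas is usually a Series indexed by timestamps, so that each observation is tied to the moment it was recorded:

```python
import numpy as np
import pandas as pd

# Hypothetical example: one observation per day for a week
index = pd.date_range("2024-01-01", periods=7, freq="D")
series = pd.Series(np.arange(7, dtype=float), index=index, name="value")

# The index carries both the ordering and the spacing of the observations
spacing = series.index[1] - series.index[0]
print(series.head(3))
print(spacing)
```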
Concepts
Peaks and troughs
Let’s start by creating a random walk: a series that, at each step, moves down by 1, stays put, or moves up by 1 at random.
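import numpy as np
import pandas as pd

N = 10000                    # number of steps
step_set = [-1, 0, 1]        # possible moves at each step
origin = np.zeros((1, 1))    # the walk starts at 0
step_shape = (N, 1)
steps = np.random.choice(a=step_set, size=step_shape)
# Prepend the origin and take the cumulative sum to turn steps into positions
path = np.concatenate([origin, steps]).cumsum(0)
df = pd.DataFrame(path, columns=['y'])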
from scipy.signal import find_peaks

subset = df.head(100)
# find_peaks returns (indices, properties); troughs are the peaks of the negated series
peaks = find_peaks(subset["y"])
troughs = find_peaks(-subset["y"])
peaks
(array([ 9, 20, 30, 37, 48, 52, 64, 77, 79, 84, 92]), {})
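A peak is a sample larger than both of its neighbours, and a trough is a peak of the negated signal. The sketch below spells out that strict definition in plain NumPy as an illustration, not a replacement for find_peaks; the helper names are hypothetical. Unlike this strict version, find_peaks also reports flat peaks (plateaus of equal values), returning the middle sample, which matters here because a step of 0 in the walk can create plateaus.

```python
import numpy as np

def strict_local_maxima(x):
    """Indices i with x[i-1] < x[i] > x[i+1]; endpoints are never peaks (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Compare every interior point with its left and right neighbour
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] > x[2:])
    return np.flatnonzero(is_peak) + 1  # shift back to indices of the full array

def strict_local_minima(x):
    """Troughs as peaks of the negated signal (hypothetical helper)."""
    return strict_local_maxima(-np.asarray(x, dtype=float))

signal = np.array([0, 2, 1, 3, 3, 1, 0, 4, 2])
maxima = strict_local_maxima(signal)  # the plateau at indices 3-4 is NOT reported
minima = strict_local_minima(signal)
print(maxima, minima)
```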
Autocorrelation
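Pandas provides an autocorrelation¹ plot function, pd.plotting.autocorrelation_plot, which plots the sample autocorrelation of a series against the lag.
import matplotlib.pyplot as plt

pd.plotting.autocorrelation_plot(df["y"])
plt.show()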
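The quantity being plotted can be computed directly. A minimal NumPy sketch of the standard sample autocorrelation estimator, \(r(h) = \frac{\sum_{t=1}^{n-h} (x_t - \bar{x})(x_{t+h} - \bar{x})}{\sum_{t=1}^{n} (x_t - \bar{x})^2}\), follows. Our reading is that autocorrelation_plot uses this same form, but treat that as an assumption; the helper names are hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation r(h) for lags h = 0..max_lag (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dev = x - x.mean()          # deviations from the sample mean
    denom = np.sum(dev ** 2)    # lag-0 sum of squares, so r(0) == 1
    return np.array([np.sum(dev[: n - h] * dev[h:]) / denom
                     for h in range(max_lag + 1)])

def white_noise_band(n, z=1.96):
    """Approximate 95% band (for z=1.96) on r(h) if the series were white noise."""
    return z / np.sqrt(n)

acf = sample_acf([1, 2, 3, 4], max_lag=2)
band = white_noise_band(4)
print(acf, band)
```

Lags whose r(h) falls outside ±band would be unlikely under a white-noise null. For a random walk, r(h) typically stays high and decays slowly over many lags, a common sign of non-stationarity.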
Differencing
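Differencing replaces each observation with its change from the previous one, \(x_t - x_{t-1}\). For a random walk, the first difference recovers the independent steps, which form a stationary series.
stationary = df['y'].diff()  # the first value is NaN: it has no previous observation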
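Differencing is the inverse of the cumulative sum used to build the walk. A small self-contained sketch on a freshly generated walk (the seeded numpy.random.default_rng generator and the series length are our choices, not the original's) checks both directions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
steps = rng.choice([-1, 0, 1], size=1000)
y = pd.Series(np.cumsum(steps), name="y")

# First difference y_t - y_{t-1}; element 0 is NaN because it has no predecessor
d = y.diff()

# Dropping the NaN gives back the steps from t = 1 onwards
recovered = d.dropna().to_numpy()
print(np.array_equal(recovered, steps[1:]))

# Undo the differencing: first observation plus the cumulative sum of the changes
rebuilt = y.iloc[0] + d.fillna(0).cumsum()
print(np.allclose(rebuilt, y))
```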
Tools
Here we look at some tools, mostly for Python, that support time-series analysis.
Data
import pandas as pd
df = pd.read_csv("../../data/streamad/uniDS.csv")
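The contents of uniDS.csv are not shown here. Assuming it has timestamp and value columns (the names used in the Tsfresh code) and that timestamp holds date-time strings, which is an assumption, loading it with parsed dates and a time index might look like the sketch below. It uses a small in-memory, made-up stand-in for the file.

```python
import io
import pandas as pd

# Hypothetical stand-in for uniDS.csv; the real file's values are not reproduced here
csv_text = """timestamp,value
2021-01-01 00:10:00,1.6
2021-01-01 00:00:00,1.5
2021-01-01 00:05:00,1.7
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])
# Sort by time and index by timestamp so time-based operations (resample, rolling) work
series = df.sort_values("timestamp").set_index("timestamp")["value"]
print(series)
```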
Tsfresh
Tsfresh² (Time Series Feature Extraction Based on Scalable Hypothesis Tests) is a Python package that automatically calculates a large number of time-series features and can filter them for relevance using statistical hypothesis tests, for use in classification and regression. It is typically used for feature engineering.
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute, make_forecasting_frame
from tsfresh.feature_extraction import ComprehensiveFCParameters, settings
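data = df[['timestamp', 'value']]
# Turn the single series into rolling windows of up to 100 past values,
# with the following value as the forecasting target
df_pass, y_air = make_forecasting_frame(data.value,
                                        kind="value",
                                        max_timeshift=100,
                                        rolling_direction=1)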
Rolling: 100%|██████████| 30/30 [00:05<00:00, 5.17it/s]
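make_forecasting_frame turns one series into many short windowed series plus a target vector, where each window holds the most recent past values and the target is the value that follows. The rough idea can be sketched in plain pandas. The function rolling_forecast_windows and its id/time/value columns are hypothetical; tsfresh's actual output has its own column layout and index conventions, so this is an illustration, not a re-implementation.

```python
import pandas as pd

def rolling_forecast_windows(x, max_timeshift):
    """Hypothetical, simplified analogue of a forecasting frame.

    For each position t >= 1, the up-to-max_timeshift values before t form one
    small series (labelled by id=t), and x[t] is the value to forecast from it.
    """
    x = pd.Series(x).reset_index(drop=True)
    rows, targets = [], {}
    for t in range(1, len(x)):
        start = max(0, t - max_timeshift)  # window never reaches before the series start
        for pos in range(start, t):
            rows.append({"id": t, "time": pos, "value": x.iloc[pos]})
        targets[t] = x.iloc[t]  # target: the next value after the window
    return pd.DataFrame(rows), pd.Series(targets, name="target")

windows, y = rolling_forecast_windows([10, 11, 13, 12, 15], max_timeshift=2)
print(windows.groupby("id")["value"].apply(list).to_dict())
print(y.to_dict())
```

A long-format frame like this (one id per window, a time column for ordering) is the shape tsfresh's feature extraction expects, which is why the forecasting problem is recast this way.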