# Time-series analysis

## Introduction

A time series is a sequence of observations indexed by time, usually recorded at successive (and often evenly spaced) points.

## Concepts

### Peaks and troughs

Let’s start by creating a random walk.

```
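import numpy as np
import pandas as pd

N = 10000                  # number of steps
step_set = [-1, 0, 1]      # each step moves down, stays put, or moves up
origin = np.zeros((1, 1))  # the walk starts at 0
step_shape = (N, 1)
steps = np.random.choice(a=step_set, size=step_shape)
# The cumulative sum of the steps gives the position at each time
path = np.concatenate([origin, steps]).cumsum(0)
df = pd.DataFrame(path, columns=["y"])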
```

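Next, we look for local maxima (peaks) and minima (troughs) in the first 100 observations using SciPy's `find_peaks`. Troughs are found by negating the series.

```
from scipy.signal import find_peaks

subset = df.head(100)
# find_peaks returns a tuple: (array of indices, dict of properties)
peaks = find_peaks(subset["y"])
# Troughs are the peaks of the negated series
troughs = find_peaks(-subset["y"])
peaks
```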

```
(array([ 9, 20, 30, 37, 48, 52, 64, 77, 79, 84, 92]), {})
```
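
Since `find_peaks` returns a tuple of indices and a properties dictionary, it is often convenient to unpack it and keep only the indices. The following is a minimal sketch on a small hand-made array (the values are hypothetical and chosen only so the extrema are easy to check by eye):

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical toy series, not taken from the random walk above
y = np.array([0, 1, 0, -1, 0, 2, 1, 2, 3, 2])

# find_peaks returns (indices, properties); keep only the indices
peak_idx, _ = find_peaks(y)
# Troughs are the peaks of the negated series
trough_idx, _ = find_peaks(-y)

print(peak_idx)    # indices of local maxima
print(trough_idx)  # indices of local minima
```

Here the peaks fall at indices 1, 5 and 8 and the troughs at indices 3 and 6. The first and last samples are never reported, because a peak needs a neighbour on both sides.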

### Autocorrelation

Autocorrelation^{1} measures how strongly a series correlates with a lagged copy of itself. Pandas provides a function that plots it across lags.

```
import matplotlib.pyplot as plt

pd.plotting.autocorrelation_plot(df["y"])
plt.show()
```
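
To read a single number off rather than a plot, `Series.autocorr` computes the correlation at one given lag. The sketch below builds its own random walk (the seed and variable names are arbitrary choices, not part of the example above) and compares the lag-1 autocorrelation of the walk with that of its underlying steps:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
steps = rng.choice([-1, 0, 1], size=10_000)
walk = pd.Series(steps.cumsum())

# Correlation between each value and the value one step earlier
lag1_walk = walk.autocorr(lag=1)
lag1_steps = pd.Series(steps).autocorr(lag=1)

print(f"walk:  {lag1_walk:.3f}")
print(f"steps: {lag1_steps:.3f}")
```

The walk should show a lag-1 autocorrelation close to 1, since each position is the previous one plus a small step, while the independent steps should sit close to 0.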

### Differencing

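Differencing replaces each observation with its change from the previous one, $x_t - x_{t-1}$. For a random walk, the differenced series is simply the sequence of steps, which is stationary. The first value has no predecessor, so pandas returns `NaN` for it.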

```
stationary = df['y'].diff()
```
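
As a quick check that differencing undoes the cumulative sum, the following sketch builds a small random walk from known steps and confirms that `diff()` recovers them. The seed and names are illustrative assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
steps = rng.choice([-1, 0, 1], size=1000)
walk = pd.Series(steps.cumsum(), name="y")

# x_t - x_{t-1}; the first entry is NaN because it has no predecessor
diffed = walk.diff()

# Dropping that NaN leaves exactly the steps after the first one
recovered = diffed.dropna().to_numpy()
print(np.array_equal(recovered, steps[1:]))
```

Note that `diff()` returns floats even for an integer series, because `NaN` cannot be stored in an integer column.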

## Tools

Here we'll look at some tools, mostly for Python, that support time-series analysis.

### Data

```
import pandas as pd
df = pd.read_csv("../../data/streamad/uniDS.csv")
```

### Tsfresh

`Tsfresh`^{2} (Time Series Feature Extraction Based on Scalable Hypothesis Tests) is a Python package that automatically calculates a large number of time-series features for use in classification and regression. It is typically used for feature engineering.

```
from tsfresh import extract_features, extract_relevant_features, select_features
from tsfresh.utilities.dataframe_functions import impute, make_forecasting_frame
from tsfresh.feature_extraction import ComprehensiveFCParameters, settings
```

```
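# Keep only the timestamp and value columns
data = df[["timestamp", "value"]].copy()
# For each time step, build a window of up to 100 preceding values;
# y_air holds the corresponding value to forecast
df_pass, y_air = make_forecasting_frame(data.value,
                                        kind="value",
                                        max_timeshift=100,
                                        rolling_direction=1)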
```

```
Rolling: 100%|██████████| 30/30 [00:05<00:00, 5.17it/s]
```