# Streaming anomaly detection

Useful algorithms:

## Experimental data

Let’s assume the following sequence of observations, which we will use with a variety of algorithms. The data is labelled with 0 for normal observations and 1 for anomalies.

    import pandas as pd

## The Welford algorithm

Welford’s method is an online (i.e. single-pass) algorithm for calculating the running variance and standard deviation. It is derived from the difference between the sums of squared differences for $N$ and $N-1$ observations.
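
As a sketch of where the update rules come from, write $M_k$ for the mean of the first $k$ observations and $S_k = \sum_{i=1}^{k} (x_i - M_k)^2$ for their sum of squared differences. This notation is an assumption, chosen to match the `M`, `S` and `k` attributes used in the implementation on this page. The recurrences are then:

$$
\begin{aligned}
M_k &= M_{k-1} + \frac{x_k - M_{k-1}}{k}, \\
S_k &= S_{k-1} + (x_k - M_{k-1})(x_k - M_k), \\
s_k^2 &= \frac{S_k}{k-1} \quad (k \ge 2),
\end{aligned}
$$

with $M_0 = S_0 = 0$, where $s_k$ is the sample standard deviation after $k$ observations. Only $k$, $M_k$ and $S_k$ need to be kept between updates, which is what makes the method single-pass.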

A basic implementation of Welford’s method can be:

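    import math


    class Welford:
        """Running mean and standard deviation via Welford's method."""

        def __init__(self):
            self.k = 0  # number of observations seen so far
            self.M = 0  # running mean
            self.S = 0  # running sum of squared differences from the mean

        def update(self, x):
            if x is None:
                return
            self.k += 1
            newM = self.M + (x - self.M) / self.k
            newS = self.S + (x - self.M) * (x - newM)
            self.M, self.S = newM, newS

        @property
        def mean(self):
            return self.M

        @property
        def meanfull(self):
            # Mean and its standard error; needs at least one observation.
            return self.mean, self.std / math.sqrt(self.k)

        @property
        def std(self):
            # Sample standard deviation; reported as 0 below two observations.
            if self.k < 2:
                return 0
            return math.sqrt(self.S / (self.k - 1))

        def __repr__(self):
            return "<Welford: {} +- {}>".format(self.mean, self.std)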
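
One way to gain confidence in an implementation like the one above is to compare its final result with the usual two-pass statistics. The following is a minimal, self-contained sketch of that check; the function name `running_stats` and the sample observations are hypothetical and are not part of the data used on this page.

```python
import math
import statistics


def running_stats(values):
    """Return the running (mean, sample std) after each observation."""
    k, mean, s = 0, 0.0, 0.0
    out = []
    for x in values:
        k += 1
        delta = x - mean          # difference from the previous mean
        mean += delta / k         # updated mean
        s += delta * (x - mean)   # updated sum of squared differences
        std = math.sqrt(s / (k - 1)) if k > 1 else 0.0
        out.append((mean, std))
    return out


# Hypothetical observations, used only for this check.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
final_mean, final_std = running_stats(observations)[-1]

# The single-pass result should agree with the batch statistics.
assert math.isclose(final_mean, statistics.mean(observations))
assert math.isclose(final_std, statistics.stdev(observations))
```

The comparison uses `statistics.stdev`, which is the sample standard deviation (dividing by $k-1$), to match the `std` property of the class.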


Applying Welford’s algorithm to our data, we have:

    welford = Welford()

    means = []
    for value in df.value.to_list():
        welford.update(value)
        means.append(welford.mean)

## Anomaly detection in streams with extreme value theory

A more detailed page is available at Streaming anomaly detection with Extreme Value Theory.

### SPOT

An example with streamad’s SPOT detector, which is available as streamad.model.SpotDetector.

    from streamad.model import SpotDetector
    from streamad.util import StreamGenerator, UnivariateDS, plot

    ds = UnivariateDS()
    stream = StreamGenerator(ds.data)
    model = SpotDetector()

    scores = []

    for x in stream.iter_item():
        score = model.fit_score(x)
        # Record 0 when the detector returns no score.
        if score:
            scores.append(score)
        else:
            scores.append(0)

    data, label, date, features = ds.data, ds.label, ds.date, ds.features

## Half-Space Trees

See Half-Space Trees.