intrplate

B-splines

A B-spline is a configurable spline function typically used for curve fitting.

A spline of order \(n\) is a piecewise polynomial function of degree \(n-1\) in a variable \(x\) . The values of \(x\) where the pieces of polynomial meet are known as knots, denoted \(t_{0},t_{1},t_{2},\ldots ,t_{n}\) and sorted into non-decreasing order. When the knots are distinct, the first \(n-2\) derivatives of the polynomial pieces are continuous across each knot. When \(r\) r knots are coincident, then only the first \(n-r-1\) derivatives of the spline are continuous across that knot.

For a given sequence of knots, there is, up to a scaling factor, a unique spline \(B_{i,n}(x)\) satisfying:

\[\begin{split} B_{i,n}(x)=\left\{\begin{array}{ll}0&\mathrm {if} \quad x<t_{i}\quad \mathrm {or} \quad x\geq t_{i+n}\\\mathrm {nonzero} &\mathrm {otherwise} \end{array}\right. \end{split}\]

If we add the additional constrain that \(\sum _{i}B_{i,n}(x)=1\) for all \(x\) between the first and last knot, then the scaling factor of \(B_{i,n}(x)\) becomes fixed. The resulting \(B_{i,n}(x)\) spline functions are called B-splines.

Examples

Age and income

Let’s create a \(n=3\) B-spline for a dataset. Let’s assume we age a age and income dataset we want to simulate, and we want to provide the following points:

  • Age 18, income 50k

  • Age 31, income 100k

  • Age 47, income 200k

  • Age 60, income 300k

import numpy as np
import intrplate

data = np.array([[18, 50e3], [31, 100e3], [47, 200e3], [60, 300e3]])
import matplotlib.pyplot as plt

plt.scatter(data[:,0], data[:,1], c='k')
plt.title("Income per age")
plt.xlabel("Age")
plt.ylabel("Income")
plt.show()
_images/intrplate_2_0.png

Let’s create a simple interpolation:

interpolated = intrplate.basis_spline(data=data, degree=3)
plt.scatter(interpolated[:,0], interpolated[:,1], s=2, c='red')
plt.title("Income per age (interpolated)")
plt.xlabel("Age")
plt.ylabel("Income")
plt.scatter(data[:,0], data[:,1], c='k')
plt.show()
_images/intrplate_5_0.png

The dataset, however, does not contain any type of randomness right now. We can use the noisy_basis_spline which adds random noise according to a normal distribution

\[ p^{\prime} = p + \epsilon,\quad\epsilon \sim \mathcal{N}\left(0,\sigma^2\right) \]
interpolated_noisy = intrplate.noisy_basis_spline(data=data, degree=3, std=5.0)
plt.scatter(interpolated_noisy[:,0], interpolated_noisy[:,1], s=2, c='red')
plt.scatter(data[:,0], data[:,1], c='k')
plt.title("Income per age (interpolated)")
plt.xlabel("Age")
plt.ylabel("Income")
plt.show()
_images/intrplate_8_0.png

Categorical variables

For this example, let’s take a categorical variable. Let’s assume as an index the week number and for the categorical variable the possible temperature categories cold , mild , hot .

temperatures = np.array([[4, 'cold'], [18, 'mild'], [32, 'hot'], [37, 'mild'], [49, 'cold']])
plt.scatter(temperatures[:,0], temperatures[:,1], c='k')
plt.title("Temperature category by week number")
plt.xlabel("Week")
plt.ylabel("Temperature")
plt.show()
_images/intrplate_11_0.png
interpolated_temperatures = intrplate.interpolate_categorical(temperatures[:,0].astype(np.float), temperatures[:,1], size=52)
from matplotlib.ticker import FormatStrFormatter
import matplotlib

fig, ax = plt.subplots()

plt.scatter(interpolated_temperatures[:,0], interpolated_temperatures[:,1], s=2, c='red')
plt.title("Temperature category by week number")
plt.xlabel("Week")
plt.ylabel("Temperature")
plt.scatter(temperatures[:,0].astype(np.float), temperatures[:,1], c='k')
ax.xaxis.set_major_formatter(FormatStrFormatter('%.0f'))
ax.xaxis.set_major_locator(matplotlib.ticker.MaxNLocator(nbins=16))
plt.show()
_images/intrplate_13_0.png