Time Series Forecasting

Time series forecasting on a weather time series dataset, applying machine learning and neural network algorithms to prediction tasks.

Introduction

Weather forecasting has always been a critical aspect of our daily lives, influencing various sectors such as agriculture, transportation, and disaster management. Accurate predictions of weather parameters, particularly temperature and humidity, play a vital role in planning and decision-making processes. As climate change continues to introduce variability and uncertainty in weather patterns, the need for reliable forecasting methods becomes increasingly important.

In recent years, advancements in machine learning and data analytics have opened new avenues for improving prediction models. Traditional statistical methods, while useful, often fall short in capturing complex non-linear relationships within the data. By leveraging time series analysis and modern computational techniques, we can enhance the accuracy of forecasts and provide more timely insights into atmospheric conditions.

This project focuses on developing a time series forecasting model for predicting temperature, humidity levels and other factors based on historical weather data from the dataset. We aim to explore various machine learning algorithms, including regression models, neural networks like CNN and RNN etc, to identify the most effective approach for our prediction task.

Configuration

jupyter notebook
Python 3.10.2
IPython==7.34.0
matplotlib==3.7.1
numpy==1.26.4
pandas==2.1.4
seaborn==0.13.1
tensorflow==2.17.0

Exploratory Data Analysis

Set up

import os
import datetime

import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf

Dataset

We will use a weather dataset containing 14 features, such as air temperature, atmospheric pressure, and humidity, to make hourly predictions. The raw observations were collected every 10 minutes, beginning in 2003.

zip_path = tf.keras.utils.get_file(
    origin='https://storage.googleapis.com/tensorflow/tf-keras-datasets/jena_climate_2009_2016.csv.zip',
    fname='jena_climate_2009_2016.csv.zip',
    extract=True)
csv_path, _ = os.path.splitext(zip_path)
df = pd.read_csv(csv_path)
# Slice [start:stop:step], starting from index 5 take every 6th record.
df = df[5::6]

date_time = pd.to_datetime(df.pop('Date Time'), format='%d.%m.%Y %H:%M:%S')

Use df.head() to take a glance at the data:
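A one-line sketch (the output table is omitted here):

df.head()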

Inspect and clean up

df.describe().transpose()

One thing that should stand out is the min value of the wind velocity (wv (m/s)) and maximum wind velocity (max. wv (m/s)) columns: this -9999 is likely erroneous.

There’s a separate wind direction column, so the velocity should be greater than or equal to zero (>=0). Replace the erroneous values with zeros:

wv = df['wv (m/s)']
bad_wv = wv == -9999.0
wv[bad_wv] = 0.0

max_wv = df['max. wv (m/s)']
bad_max_wv = max_wv == -9999.0
max_wv[bad_max_wv] = 0.0

# The above inplace edits are reflected in the DataFrame.
df['wv (m/s)'].min()

Feature engineering

The last column of the data, wd (deg), gives the wind direction in units of degrees. Angles do not make good model inputs: 360° and 0° should be close to each other and wrap around smoothly, and direction shouldn’t matter if the wind is not blowing.

Right now the distribution of wind data looks like this:

plt.hist2d(df['wd (deg)'], df['wv (m/s)'], bins=(50, 50), vmax=400)
plt.colorbar()
plt.xlabel('Wind Direction [deg]')
plt.ylabel('Wind Velocity [m/s]')

But this will be easier for the model to interpret if we convert the wind direction and velocity columns to a wind vector:

wv = df.pop('wv (m/s)')
max_wv = df.pop('max. wv (m/s)')

# Convert to radians.
wd_rad = df.pop('wd (deg)')*np.pi / 180

# Calculate the wind x and y components.
df['Wx'] = wv*np.cos(wd_rad)
df['Wy'] = wv*np.sin(wd_rad)

# Calculate the max wind x and y components.
df['max Wx'] = max_wv*np.cos(wd_rad)
df['max Wy'] = max_wv*np.sin(wd_rad)

The distribution of wind vectors is much simpler for the model to correctly interpret:

plt.hist2d(df['Wx'], df['Wy'], bins=(50, 50), vmax=400)
plt.colorbar()
plt.xlabel('Wind X [m/s]')
plt.ylabel('Wind Y [m/s]')
ax = plt.gca()
ax.axis('tight')

Similarly, the Date Time column is very useful, but not in this string form. Start by converting it to seconds:

timestamp_s = date_time.map(pd.Timestamp.timestamp)

However, the time in seconds is not a useful model input by itself. Being weather data, it has clear daily and yearly periodicity, so we can get usable signals by using sine and cosine transforms to create “Time of day” and “Time of year” features:

day = 24*60*60
year = (365.2425)*day

df['Day sin'] = np.sin(timestamp_s * (2 * np.pi / day))
df['Day cos'] = np.cos(timestamp_s * (2 * np.pi / day))
df['Year sin'] = np.sin(timestamp_s * (2 * np.pi / year))
df['Year cos'] = np.cos(timestamp_s * (2 * np.pi / year))
plt.plot(np.array(df['Day sin'])[:25])
plt.plot(np.array(df['Day cos'])[:25])
plt.xlabel('Time [h]')
plt.title('Time of day signal')

Split the data

We will use a (70%, 20%, 10%) split for the training, validation, and test sets. Note that the data is not randomly shuffled before splitting, for two reasons:

  1. It ensures that chopping the data into windows of consecutive samples is still possible.
  2. It ensures that the validation/test results are more realistic, being evaluated on the data collected after the model was trained.
column_indices = {name: i for i, name in enumerate(df.columns)}

n = len(df)
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]

num_features = df.shape[1]

Normalise the data

The features need to be scaled before training a neural network. Normalisation is a common way of doing this: subtract the mean and divide by the standard deviation of each feature.

train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std

Now, peek at the distribution of the features. Some features do have long tails, but there are no obvious errors like the -9999 wind velocity value.
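One way to take that peek (a minimal sketch using seaborn, not shown in the original notes) is to plot the normalised distribution of every feature:

# Normalise the full frame with the training statistics and plot violins per feature.
df_std = (df - train_mean) / train_std
df_std = df_std.melt(var_name='Column', value_name='Normalized')
plt.figure(figsize=(12, 6))
ax = sns.violinplot(x='Column', y='Normalized', data=df_std)
_ = ax.set_xticklabels(df.keys(), rotation=90)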

Data windowing

The main features of the input windows are:

  • The width (number of time steps) of the input and label windows.
  • The time offset between them.
  • Which features are used as inputs, labels, or both.

This section focuses on implementing the data windowing so that it can be reused for all of those models.

Here are two examples of data windowing:

  1. To make a single prediction 24 hours into the future, given 24 hours of history, we define a window with 24 input time steps, 1 label time step, and a shift of 24 (a total window size of 48 hours).

  2. A model that makes a prediction one hour into the future, given six hours of history, needs a window with 6 input time steps, 1 label time step, and a shift of 1 (a total window size of 7 hours).

The rest of this section defines a WindowGenerator class. This class can:

  1. Handle the indexes and offsets as shown in the diagrams above.
  2. Split windows of features into (features, labels) pairs.
  3. Plot the content of the resulting windows.
  4. Efficiently generate batches of these windows from the training, evaluation, and test data, using tf.data.Datasets.

Indexes and offsets

Start by creating the WindowGenerator class. The __init__ method includes all the necessary logic for the input and label indices.

It also takes the training, evaluation, and test DataFrames as input. These will be converted to tf.data.Datasets of windows later.

class WindowGenerator():
  def __init__(self, input_width, label_width, shift,
               train_df=train_df, val_df=val_df, test_df=test_df,
               label_columns=None):
    # Store the raw data.
    self.train_df = train_df
    self.val_df = val_df
    self.test_df = test_df

    # Work out the label column indices.
    self.label_columns = label_columns
    if label_columns is not None:
      self.label_columns_indices = {name: i for i, name in
                                    enumerate(label_columns)}
    self.column_indices = {name: i for i, name in
                           enumerate(train_df.columns)}

    # Work out the window parameters.
    self.input_width = input_width
    self.label_width = label_width
    self.shift = shift

    self.total_window_size = input_width + shift

    self.input_slice = slice(0, input_width)
    self.input_indices = np.arange(self.total_window_size)[self.input_slice]

    self.label_start = self.total_window_size - self.label_width
    self.labels_slice = slice(self.label_start, None)
    self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

  def __repr__(self):
    return '\n'.join([
        f'Total window size: {self.total_window_size}',
        f'Input indices: {self.input_indices}',
        f'Label indices: {self.label_indices}',
        f'Label column name(s): {self.label_columns}'])

Here is code to create the two windows described at the start of this section:

w1 = WindowGenerator(input_width=24, label_width=1, shift=24,
                     label_columns=['T (degC)'])
w1
#w1
Total window size: 48
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [47]
Label column name(s): ['T (degC)']
w2 = WindowGenerator(input_width=6, label_width=1, shift=1,
                     label_columns=['T (degC)'])
w2
#w2
Total window size: 7
Input indices: [0 1 2 3 4 5]
Label indices: [6]
Label column name(s): ['T (degC)']

Split window

Given a list of consecutive inputs, the split_window method will convert them to a window of inputs and a window of labels.

The example w2 defined earlier will be split like this:

def split_window(self, features):
  inputs = features[:, self.input_slice, :]
  labels = features[:, self.labels_slice, :]
  if self.label_columns is not None:
    labels = tf.stack(
        [labels[:, :, self.column_indices[name]] for name in self.label_columns],
        axis=-1)

  # Slicing doesn't preserve static shape information, so set the shapes
  # manually. This way the `tf.data.Datasets` are easier to inspect.
  inputs.set_shape([None, self.input_width, None])
  labels.set_shape([None, self.label_width, None])

  return inputs, labels

WindowGenerator.split_window = split_window

Test it out:

# Stack three slices, the length of the total window.
example_window = tf.stack([np.array(train_df[:w2.total_window_size]),
                           np.array(train_df[100:100+w2.total_window_size]),
                           np.array(train_df[200:200+w2.total_window_size])])

example_inputs, example_labels = w2.split_window(example_window)

print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'Labels shape: {example_labels.shape}')
All shapes are: (batch, time, features)
Window shape: (3, 7, 19)
Inputs shape: (3, 6, 19)
Labels shape: (3, 1, 1)

The code above took a batch of three 7-time step windows with 19 features at each time step. It splits them into a batch of 6-time step 19-feature inputs, and a 1-time step 1-feature label. The label only has one feature because the WindowGenerator was initialized with label_columns=['T (degC)'].

Plot

Here is a plot method that allows a simple visualization of the split window:

def plot(self, model=None, plot_col='T (degC)', max_subplots=3):
  inputs, labels = self.example
  plt.figure(figsize=(12, 8))
  plot_col_index = self.column_indices[plot_col]
  max_n = min(max_subplots, len(inputs))
  for n in range(max_n):
    plt.subplot(max_n, 1, n+1)
    plt.ylabel(f'{plot_col} [normed]')
    plt.plot(self.input_indices, inputs[n, :, plot_col_index],
             label='Inputs', marker='.', zorder=-10)

    if self.label_columns:
      label_col_index = self.label_columns_indices.get(plot_col, None)
    else:
      label_col_index = plot_col_index

    if label_col_index is None:
      continue

    plt.scatter(self.label_indices, labels[n, :, label_col_index],
                edgecolors='k', label='Labels', c='#2ca02c', s=64)
    if model is not None:
      predictions = model(inputs)
      plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                  marker='X', edgecolors='k', label='Predictions',
                  c='#ff7f0e', s=64)

    if n == 0:
      plt.legend()

  plt.xlabel('Time [h]')

WindowGenerator.plot = plot

Test it out:

w2.plot()
w2.plot(plot_col='p (mbar)')

Create tf.data.Datasets

Finally, this make_dataset method will take a time series DataFrame and convert it to a tf.data.Dataset of (input_window, label_window) pairs using the tf.keras.utils.timeseries_dataset_from_array function:

def make_dataset(self, data):
  data = np.array(data, dtype=np.float32)
  ds = tf.keras.utils.timeseries_dataset_from_array(
      data=data,
      targets=None,
      sequence_length=self.total_window_size,
      sequence_stride=1,
      shuffle=True,
      batch_size=32,)

  ds = ds.map(self.split_window)

  return ds

WindowGenerator.make_dataset = make_dataset

The WindowGenerator object holds training, validation, and test data. Also, we add a standard example batch for easy access and plotting:

@property
def train(self):
  return self.make_dataset(self.train_df)

@property
def val(self):
  return self.make_dataset(self.val_df)

@property
def test(self):
  return self.make_dataset(self.test_df)

@property
def example(self):
  """Get and cache an example batch of `inputs, labels` for plotting."""
  result = getattr(self, '_example', None)
  if result is None:
    # No example batch was found, so get one from the `.train` dataset.
    result = next(iter(self.train))
    # And cache it for next time.
    self._example = result
  return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example

Now, the WindowGenerator object gives you access to the tf.data.Dataset objects, so we can easily iterate over the data.

# Each element is an (inputs, label) pair.
w2.train.element_spec
(TensorSpec(shape=(None, 6, 19), dtype=tf.float32, name=None),
TensorSpec(shape=(None, 1, 1), dtype=tf.float32, name=None))
for example_inputs, example_labels in w2.train.take(1):
  print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
  print(f'Labels shape (batch, time, features): {example_labels.shape}')
Inputs shape (batch, time, features): (32, 6, 19)
Labels shape (batch, time, features): (32, 1, 1)

MLflow

Before diving into the next section, we will set up MLflow (an open-source MLOps platform) for tracking and logging the parameters and metrics of the models to be implemented, so we can find the model with the best performance on the time series data.

Install mlflow and pyngrok

!pip install mlflow pyngrok --quiet

Configure the pyngrok tunnel and set up the MLflow UI

from pyngrok import ngrok
from getpass import getpass
import mlflow

# Terminate open tunnels if any exist.
ngrok.kill()

# Sign up for an ngrok account to get an auth token if you don't have one.
NGROK_AUTH_TOKEN = getpass('Your_AUTH_TOKEN:')
ngrok.set_auth_token(NGROK_AUTH_TOKEN)

# Open an HTTPS tunnel on port 5000 for http://localhost:5000
ngrok_tunnel = ngrok.connect(addr="5000", proto="http", bind_tls=True)
print("MLflow Tracking UI:", ngrok_tunnel.public_url)
get_ipython().system_raw("mlflow ui --port 5000 &")
mlflow.set_experiment("mlflow_[weather_tsf]_exp_[v1]")

Access the MLflow UI to test it out; the MLflow Tracking UI URL will look something like https://17fe-34-70-201-166.ngrok-free.app/.

Configure model params and metrics

# Collect params and metrics for MLflow.
def mlflow_metrics_eval_model(model_used, model_type, window_used, model_name, metric_result, metric_type):
  from mlflow.models import infer_signature
  params = {
      'model_used': model_used,
      'window_used': window_used,
      'model_name': model_name,
      'max_epochs': 20,
      'batch_size': 32,
      'loss': 'mean_squared_error',
      'optimizer': 'adam',
      'patience': 2,
      'input_width': window_used.input_width,
      'label_width': window_used.label_width,
      'shift': window_used.shift,
      'label_columns': ['T (degC)']
  }
  metrics = {}
  metrics_key_mapping = {
      'mean_absolute_error': 'mae',
      'mean_absolute_percentage_error': 'mape',
      'root_mean_squared_error': 'rmse',
      'loss': 'mse'
  }
  for key, value in metric_result.items():
    if key == 'mean_absolute_percentage_error' and value > 1:
      value = value / 100
    metrics_short_key = metrics_key_mapping.get(key, key)
    metrics_short_key = f"{metrics_short_key}{metric_type}"
    metrics[metrics_short_key] = value
  with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.log_metrics(metrics)
    mlflow.set_tag(model_type, model_name)
    signature = infer_signature(window_used.example[0].numpy(), model_used(window_used.example[0]).numpy())
    model_info = mlflow.sklearn.log_model(
        sk_model=model_used,
        artifact_path=f"models/{model_type}/{model_name}",
        signature=signature,
        input_example=window_used.example[0].numpy(),
        registered_model_name=model_name,
        conda_env={
            'name': f"{model_type}_{model_name}",
            'dependencies': [
                'python=3.10.3',
                'IPython==7.34.0',
                'matplotlib==3.7.1',
                'numpy==1.26.4',
                'pandas==2.1.4',
                'seaborn==0.13.1',
                'tensorflow==2.17.0',
                'mlflow',
                'pyngrok',
                'scikit-learn'
            ]
        }
    )

Configure an MLflow plot function for plotting model metrics

# Plot MLflow metrics for models.
def plot_mlflow_metrics(model_type):
  runs = mlflow.search_runs()

  metrics_list = []

  for index, row in runs.iterrows():
    run_id = row['run_id']

    metrics = mlflow.get_run(run_id).data.metrics

    tags = mlflow.get_run(run_id).data.tags
    tag_value = tags.get(model_type, run_id)

    metrics_list.append({'tag': tag_value, **metrics})

  metrics_df = pd.DataFrame(metrics_list)

  metrics_df_melted = metrics_df.melt(id_vars='tag', var_name='Metric', value_name='Value')

  plt.figure(figsize=(12, 6))
  ax = sns.barplot(data=metrics_df_melted, x='Metric', y='Value', hue='tag')

  plt.title(f'Metrics with Val_df and Test_df for {model_type} Models')
  plt.xlabel('Metrics')
  plt.ylabel('Values')
  plt.legend(title=model_type)
  plt.tight_layout()
  plt.show()

Single step models

This blog introduces two sorts of models: single step models that predict one hour ahead, and multi step models that predict one day ahead.

A single step model predicts a single feature’s value one time step (one hour) into the future, based only on the current conditions.

So, start by building models to predict the T (degC) value one hour into the future.

Configure a WindowGenerator object to produce these single-step (input, label) pairs:

single_step_window = WindowGenerator(
    input_width=1, label_width=1, shift=1,
    label_columns=['T (degC)'])

We will create six different models and evaluate their performance on this forecasting task.

Baseline

Before building a trainable model it would be good to have a performance baseline as a point for comparison with the later more complicated models.

class Baseline(tf.keras.Model):
  def __init__(self, label_index=None):
    super().__init__()
    self.label_index = label_index

  def call(self, inputs):
    if self.label_index is None:
      return inputs
    result = inputs[:, :, self.label_index]
    return result[:, :, tf.newaxis]

Instantiate and evaluate this model:

baseline = Baseline(label_index=column_indices['T (degC)'])

baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                 metrics=[tf.keras.metrics.MeanAbsoluteError()])

val_performance = {}
performance = {}
single_step_performance_key = ['loss', 'mean_absolute_error']

def handle_performance(model_used, model_type, model_name, window_used, val_perf, perf, perf_key_list):
  metrics_val = model_used.evaluate(window_used.val, return_dict=True)
  metrics_test = model_used.evaluate(window_used.test, verbose=0, return_dict=True)
  mlflow_metrics_eval_model(model_used, model_type, window_used, model_name, metrics_val, '_val')
  mlflow_metrics_eval_model(model_used, model_type, window_used, model_name, metrics_test, '_test')
  val_perf[model_name] = {key: metrics_val[key] for key in perf_key_list if key in metrics_val}
  perf[model_name] = {key: metrics_test[key] for key in perf_key_list if key in metrics_test}

handle_performance(baseline, 'single_step', 'Baseline', single_step_window, val_performance, performance, single_step_performance_key)
1/439 ━━━━━━━━━━━━━━━━━━━━ 2:27 337ms/step - loss: 0.0075 - mean_absolute_error: 0.0657
439/439 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - loss: 0.0121 - mean_absolute_error: 0.0769

That printed some performance metrics, but those don’t give you a feeling for how well the model is doing.

The WindowGenerator has a plot method, but the plots won’t be very interesting with only a single sample.

So, create a wider WindowGenerator that generates windows of 24 hours of consecutive inputs and labels at a time. The new wide_window variable doesn’t change the way the model operates. The model still makes predictions one hour into the future based on a single input time step. Here, the time axis acts like the batch axis: each prediction is made independently with no interaction between time steps:

wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1,
    label_columns=['T (degC)'])

This expanded window can be passed directly to the same baseline model without any code changes. This is possible because the inputs and labels have the same number of time steps, and the baseline just forwards the input to the output:
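As a quick sanity check (a sketch; the output is not shown in the original notes), the baseline called on the wider window returns one prediction per input time step:

print('Input shape:', wide_window.example[0].shape)
print('Output shape:', baseline(wide_window.example[0]).shape)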

wide_window.plot(baseline)

In the above plots of three examples the single step model is run over the course of 24 hours. This deserves some explanation:

  • The blue Inputs line shows the input temperature at each time step. The model receives all features, this plot only shows the temperature.
  • The green Labels dots show the target prediction value. These dots are shown at the prediction time, not the input time. That is why the range of labels is shifted 1 step relative to the inputs.
  • The orange Predictions crosses are the model’s predictions for each output time step. If the model were predicting perfectly, the predictions would land directly on the Labels.

Linear model

A tf.keras.layers.Dense layer with no activation set is a linear model. The layer only transforms the last axis of the data from (batch, time, inputs) to (batch, time, units); it is applied independently to every item across the batch and time axes.

linear = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1)
])

The compile and fit function:

MAX_EPOCHS = 20

def compile_and_fit(model, window, patience=2):
  early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                    patience=patience,
                                                    mode='min')

  model.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=tf.keras.optimizers.Adam(),
                metrics=[tf.keras.metrics.MeanAbsoluteError()])

  history = model.fit(window.train, epochs=MAX_EPOCHS,
                      validation_data=window.val,
                      callbacks=[early_stopping])
  return history

Train the model and evaluate its performance:

history = compile_and_fit(linear, single_step_window)

handle_performance(linear,'single_step','linear',single_step_window,val_performance,performance,single_step_performance_key)

Here is a plot of its example predictions on the wide_window. In many cases the prediction is clearly better than just returning the input temperature, but in a few cases it’s worse:

wide_window.plot(linear)

One advantage to linear models is that they’re relatively simple to interpret. We can pull out the layer’s weights and visualise the weight assigned to each input:

plt.bar(x=range(len(train_df.columns)),
        height=linear.layers[0].kernel[:, 0].numpy())
axis = plt.gca()
axis.set_xticks(range(len(train_df.columns)))
_ = axis.set_xticklabels(train_df.columns, rotation=90)

Sometimes the model doesn’t even place the most weight on the input T (degC). This is one of the risks of random initialisation.
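To dig into this, one could rank the inputs by the absolute magnitude of their learned weights (a small helper sketch, not part of the original notebook):

# Rank input features by absolute weight magnitude (largest first).
weights = linear.layers[0].kernel[:, 0].numpy()
ranked = sorted(zip(train_df.columns, weights), key=lambda item: -abs(item[1]))
for name, weight in ranked[:5]:
  print(f'{name:20s} {weight:+.3f}')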

Dense

To check the performance of deeper, more powerful single-input-step models, here’s a model similar to the linear model, except it stacks a few Dense layers between the input and the output:

dense = tf.keras.Sequential([
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=1)
])

history = compile_and_fit(dense, single_step_window)

handle_performance(dense, 'single_step', 'Dense', single_step_window, val_performance, performance, single_step_performance_key)

Multi step dense

A single-time-step model has no context for the current values of its inputs. It can’t see how the input features are changing over time. To address this issue the model needs access to multiple time steps when making predictions:

The baseline, linear, and dense models handled each time step independently. Here the model will take multiple time steps as input to produce a single output.

Create a WindowGenerator that will produce batches of three-hour inputs and one-hour labels:
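The exact definition of this window isn’t shown in the original notes; below is a minimal sketch consistent with the three-hour inputs and one-hour labels described (the CONV_WIDTH name is assumed, and is reused by the convolution model later):

CONV_WIDTH = 3
conv_window = WindowGenerator(
    input_width=CONV_WIDTH,
    label_width=1,
    shift=1,
    label_columns=['T (degC)'])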

You can train a dense model on a multiple-input-step window by adding a tf.keras.layers.Flatten as the first layer of the model:

multi_step_dense = tf.keras.Sequential([
    # Shape: (time, features) => (time*features)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
    # Add back the time dimension.
    # Shape: (outputs) => (1, outputs)
    tf.keras.layers.Reshape([1, -1]),
])
history = compile_and_fit(multi_step_dense, conv_window)

handle_performance(multi_step_dense,'single_step','Multi step dense',conv_window,val_performance,performance,single_step_performance_key)
print('Input shape:', conv_window.example[0].shape)
print('Output shape:', multi_step_dense(conv_window.example[0]).shape)
Input shape: (32, 3, 19)
Output shape: (32, 1, 1)
conv_window.plot(multi_step_dense)

The main down-side of this approach is that the resulting model can only be executed on input windows of exactly this shape.

print('Input shape:', wide_window.example[0].shape)
try:
  print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
except Exception as e:
  print(f'\n{type(e).__name__}:{e}')
Input shape: (32, 24, 19)

ValueError:Exception encountered when calling Sequential.call().

Input 0 of layer "dense_4" is incompatible with the layer: expected axis -1 of input shape to have value 57, but received input with shape (32, 456)

Arguments received by Sequential.call():
• inputs=tf.Tensor(shape=(32, 24, 19), dtype=float32)
• training=None
• mask=None

The convolution models in the next section fix this problem.

Convolution neural network

A convolution layer (tf.keras.layers.Conv1D) also takes multiple time steps as input to each prediction.

Below is the same model as multi_step_dense, re-written with a convolution.

conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=32,
                           kernel_size=(CONV_WIDTH,),
                           activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
])

Compile and fit the model, and handle the performance:

history = compile_and_fit(conv_model, conv_window)

IPython.display.clear_output()

handle_performance(conv_model, 'single_step', 'Conv', conv_window, val_performance, performance, single_step_performance_key)

Check out the input and output tensor shapes:

print("Conv model on `conv_window`")
print('Input shape:', conv_window.example[0].shape)
print('Output shape:', conv_model(conv_window.example[0]).shape)
Conv model on `conv_window`
Input shape: (32, 3, 19)
Output shape: (32, 1, 1)

Check out the input and output tensor shapes on the wide_window:

print("Wide window")
print('Input shape:', wide_window.example[0].shape)
print('Labels shape:', wide_window.example[1].shape)
print('Output shape:', conv_model(wide_window.example[0]).shape)
Wide window
Input shape: (32, 24, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 22, 1)

Note that the output is shorter than the input. To make training or plotting work, we need the labels and predictions to have the same length, so build a WindowGenerator that produces wide windows with a few extra input time steps so the label and prediction lengths match:

LABEL_WIDTH = 24
INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)
wide_conv_window = WindowGenerator(
    input_width=INPUT_WIDTH,
    label_width=LABEL_WIDTH,
    shift=1,
    label_columns=['T (degC)'])

wide_conv_window
Total window size: 27
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25]
Label indices: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]
Label column name(s): ['T (degC)']
print("Wide conv window")
print('Input shape:', wide_conv_window.example[0].shape)
print('Labels shape:', wide_conv_window.example[1].shape)
print('Output shape:', conv_model(wide_conv_window.example[0]).shape)
Wide conv window
Input shape: (32, 26, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 24, 1)

Every prediction here is based on the 3 preceding time steps:

wide_conv_window.plot(conv_model)

Recurrent neural network

A Recurrent Neural Network (RNN) is a type of neural network well-suited to time series data. RNNs process a time series step-by-step, maintaining an internal state from time-step to time-step. And we will use an RNN layer called Long Short-Term Memory (tf.keras.layers.LSTM).

An important constructor argument for all Keras RNN layers, such as tf.keras.layers.LSTM, is the return_sequences argument. This setting can configure the layer in one of two ways:

  1. If False, the default, the layer only returns the output of the final time step, giving the model time to warm up its internal state before making a single prediction.
  2. If True, the layer returns an output for each input. This is useful for:

    • Stacking RNN layers.
    • Training a model on multiple time steps simultaneously.
lstm_model = tf.keras.models.Sequential([
    # Shape [batch, time, features] => [batch, time, lstm_units]
    tf.keras.layers.LSTM(32, return_sequences=True),
    # Shape => [batch, time, features]
    tf.keras.layers.Dense(units=1)
])
history = compile_and_fit(lstm_model, wide_window)

IPython.display.clear_output()

handle_performance(lstm_model,'single_step','LSTM',wide_window,val_performance,performance,single_step_performance_key)
print('Input shape:', wide_window.example[0].shape)
print('Output shape:', lstm_model(wide_window.example[0]).shape)
Input shape: (32, 24, 19)
Output shape: (32, 24, 1)
wide_window.plot(lstm_model)

Performance

Log in to the MLflow UI and check out the metrics.

Use the MLflow plot function to present and analyse the metrics:

plot_mlflow_metrics('single_step')

From the plot we can see that the LSTM model’s metrics are better than the other models’ on both the validation and test datasets, according to the commonly used metrics for time series prediction: mean_absolute_error, mean_squared_error, mean_absolute_percentage_error, symmetric_mean_absolute_percentage_error and root_mean_squared_error.

Diving deeper into a comparison of the MAE metric across the models:

cm = lstm_model.metrics[1]
cm.metrics

[<MeanAbsoluteError name=mean_absolute_error>]
val_performance
{'Baseline': {'loss': 0.012845644727349281,
'mean_absolute_error': 0.07846628874540329},
'Linear': {'loss': 0.008695926517248154,
'mean_absolute_error': 0.06866316497325897},
'Dense': {'loss': 0.006793886888772249,
'mean_absolute_error': 0.05716359242796898},
'Multi step dense': {'loss': 0.007616413291543722,
'mean_absolute_error': 0.06059327721595764},
'Conv': {'loss': 0.006222909316420555,
'mean_absolute_error': 0.05451442673802376},
'LSTM': {'loss': 0.0056776562705636024,
'mean_absolute_error': 0.05233458802103996} }
x = np.arange(len(performance))
width = 0.3
metric_name = 'mean_absolute_error'
val_mae = [v[metric_name] for v in val_performance.values()]
test_mae = [v[metric_name] for v in performance.values()]

plt.ylabel('mean_absolute_error [T (degC), normalized]')
plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=performance.keys(),
           rotation=45)
_ = plt.legend()
for name, value in performance.items():
  print(f'{name:12s}: {value[metric_name]:0.4f}')
Baseline    : 0.0852
Linear : 0.0663
Dense : 0.0584
Multi step dense: 0.0633
Conv : 0.0543
LSTM : 0.0533

With this dataset, each model typically does slightly better than the one before it. The LSTM-based model performs best, with a test mean absolute error of 0.0533 when predicting a single time step.

Multi-step models

This section looks at how to expand these models to make multiple time step predictions.

In a multi-step prediction, the model needs to learn to predict a range of future values. Thus, unlike a single step model, where only a single future point is predicted, a multi-step model predicts a sequence of the future values.

There are two rough approaches to this:

  1. Single shot predictions where the entire time series is predicted at once.
  2. Autoregressive predictions where the model only makes single step predictions and its output is fed back as its input.

For the multi-step model, the training data again consists of hourly samples. However, here, the models will learn to predict 24 hours into the future, given 24 hours of the past.

Here is a Window object that generates these slices from the dataset:

OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)

multi_window.plot()
multi_window
Total window size: 48
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Label column name(s): None

Baselines

A simple baseline for this task is to repeat the last input time step for the required number of output time steps:

class MultiStepLastBaseline(tf.keras.Model):
  def call(self, inputs):
    return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                      metrics=[tf.keras.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

multi_step_performance_key = ['loss', 'mean_absolute_error']

handle_performance(last_baseline, 'multi_step', 'Last', multi_window, multi_val_performance, multi_performance, multi_step_performance_key)

multi_window.plot(last_baseline)

Since this task is to predict 24 hours into the future, given 24 hours of the past, another simple approach is to repeat the previous day, assuming tomorrow will be similar:

class RepeatBaseline(tf.keras.Model):
  def call(self, inputs):
    return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                        metrics=[tf.keras.metrics.MeanAbsoluteError()])

handle_performance(repeat_baseline, 'multi_step', 'Repeat', multi_window, multi_val_performance, multi_performance, multi_step_performance_key)

multi_window.plot(repeat_baseline)

Linear

A simple linear model based on the last input time step does better than either baseline, but is underpowered. The model needs to predict OUT_STEPS time steps from a single input time step with a linear projection. It can only capture a low-dimensional slice of the behavior, likely based mainly on the time of day and time of year.

multi_linear_model = tf.keras.Sequential([
    # Take the last time-step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

IPython.display.clear_output()

handle_performance(multi_linear_model, 'multi_step', 'Linear', multi_window, multi_val_performance, multi_performance, multi_step_performance_key)

multi_window.plot(multi_linear_model)

Dense

Adding a tf.keras.layers.Dense between the input and output gives the linear model more power:

multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

IPython.display.clear_output()

handle_performance(multi_dense_model, 'multi_step', 'Dense', multi_window, multi_val_performance, multi_performance, multi_step_performance_key)

multi_window.plot(multi_dense_model)

CNN

A convolutional model makes predictions based on a fixed-width history, which may lead to better performance than the dense model since it can see how things are changing over time:

CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=(CONV_WIDTH)),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

IPython.display.clear_output()

handle_performance(multi_conv_model, 'multi_step', 'Conv', multi_window, multi_val_performance, multi_performance, multi_step_performance_key)

multi_window.plot(multi_conv_model)

RNN

A recurrent model can learn to use a long history of inputs, if it’s relevant to the predictions the model is making. Here the model will accumulate internal state for 24 hours before making a single prediction for the next 24 hours. In this single-shot format, the LSTM only needs to produce an output at the last time step, so set return_sequences=False in tf.keras.layers.LSTM.

multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units].
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features].
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros()),
    # Shape => [batch, out_steps, features].
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

IPython.display.clear_output()

handle_performance(multi_lstm_model, 'multi_step', 'LSTM', multi_window, multi_val_performance, multi_performance, multi_step_performance_key)

multi_window.plot(multi_lstm_model)

Autoregressive RNN model

The above models all predict the entire output sequence in a single step.

In some cases it may be helpful for the model to decompose this prediction into individual time steps. Then, each model’s output can be fed back into itself at each step and predictions can be made conditioned on the previous one.

The model will have the same basic form as the single-step LSTM models from earlier: a tf.keras.layers.LSTM layer followed by a tf.keras.layers.Dense layer that converts the LSTM layer’s outputs to model predictions.

tf.keras.layers.LSTM is a tf.keras.layers.LSTMCell wrapped in the higher level tf.keras.layers.RNN that manages the state and sequence results for you (Check out the Recurrent Neural Networks (RNN) with Keras guide for details).

In this case, the model has to manually manage the inputs for each step, so it uses tf.keras.layers.LSTMCell directly for the lower level, single time step interface.

class FeedBack(tf.keras.Model):
  def __init__(self, units, out_steps):
    super().__init__()
    self.out_steps = out_steps
    self.units = units
    self.lstm_cell = tf.keras.layers.LSTMCell(units)
    # Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
    self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
    self.dense = tf.keras.layers.Dense(num_features)

feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)

The first method this model needs is a warmup method to initialize its internal state based on the inputs. Once trained, this state will capture the relevant parts of the input history. This is equivalent to the single-step LSTM model from earlier:

def warmup(self, inputs):
  # inputs.shape => (batch, time, features)
  # x.shape => (batch, lstm_units)
  x, *state = self.lstm_rnn(inputs)

  # predictions.shape => (batch, features)
  prediction = self.dense(x)
  return prediction, state

FeedBack.warmup = warmup

This method returns a single time-step prediction and the internal state of the LSTM:

prediction, state = feedback_model.warmup(multi_window.example[0])
prediction.shape
TensorShape([32, 19])

With the RNN’s state and an initial prediction, you can now continue iterating the model, feeding the prediction at each step back in as the input.

One of the simplest approaches for collecting the output predictions is to use a Python list and a tf.stack after the loop.

def call(self, inputs, training=None):
  # Use a Python list to capture the dynamically unrolled outputs.
  predictions = []
  # Initialize the LSTM state.
  prediction, state = self.warmup(inputs)

  # Insert the first prediction.
  predictions.append(prediction)

  # Run the rest of the prediction steps.
  for n in range(1, self.out_steps):
    # Use the last prediction as input.
    x = prediction
    # Execute one lstm step.
    x, state = self.lstm_cell(x, states=state,
                              training=training)
    # Convert the lstm output to a prediction.
    prediction = self.dense(x)
    # Add the prediction to the output.
    predictions.append(prediction)

  # predictions.shape => (time, batch, features)
  predictions = tf.stack(predictions)
  # predictions.shape => (batch, time, features)
  predictions = tf.transpose(predictions, [1, 0, 2])
  return predictions

FeedBack.call = call
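As a quick check before training (a sketch; the output is not shown in the original notes), running the untrained feedback model on the example batch should produce 24 output time steps, one per hour of the forecast:

print('Output shape (batch, time, features):',
      feedback_model(multi_window.example[0]).shape)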
history = compile_and_fit(feedback_model, multi_window)

IPython.display.clear_output()

handle_performance(feedback_model,'multi_step','AR LSTM',multi_window,multi_val_performance,multi_performance,multi_step_performance_key)

multi_window.plot(feedback_model)

Performance

MLflow metrics

plot_mlflow_metrics('multi_step')

The LSTM model generally performs well, except on mean_absolute_percentage_error.

x = np.arange(len(multi_performance))
width = 0.3

metric_name = 'mean_absolute_error'
val_mae = [v[metric_name] for v in multi_val_performance.values()]
test_mae = [v[metric_name] for v in multi_performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=multi_performance.keys(),
           rotation=45)
plt.ylabel('MAE (average over all times and outputs)')
_ = plt.legend()
for name, value in multi_performance.items():
  print(f'{name:8s}: {value[metric_name]:0.4f}')
Last    : 0.5157
Repeat : 0.3774
Linear : 0.2980
Dense : 0.2765
Conv : 0.2732
LSTM : 0.2767
AR LSTM : 0.2910

The gains achieved going from a dense model to convolutional and recurrent models are only a few percent (if any), and the autoregressive model performed clearly worse. So these more complex approaches may not be worthwhile on this problem, but there was no way to know without trying.

Finally, we will use the LSTM model to implement our time series forecasting. Check out the GitHub repo below for the complete LSTM implementation with the configs above, and the follow-up blog: Deploy the model on Kubeflow.
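As a final illustration (a minimal sketch, not the repo's implementation), here is how a 24-hour temperature forecast could be read off the trained multi-step LSTM and converted back to degrees Celsius:

# Run the trained multi-step LSTM on one example batch.
example_inputs, example_labels = multi_window.example
predictions = multi_lstm_model(example_inputs)  # shape: (batch, 24, num_features)
t_index = column_indices['T (degC)']
predicted_temps = predictions[0, :, t_index].numpy()
# Undo the normalisation applied earlier to get degrees Celsius.
predicted_temps = predicted_temps * train_std['T (degC)'] + train_mean['T (degC)']
print(predicted_temps)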

GitHub

https://github.com/PaddyZz/Time_Series_Forecasting

Conclusion

We have finished:

• Set up the environment and dependencies
• Performed exploratory data analysis
• Configured MLflow for model evaluation and metrics visualisation
• Compiled, fit, trained and evaluated the single step models
• Compiled, fit, trained and evaluated the multi step models
• Compared the metrics to choose the best-performing model

References

time series forecasting

Author

Paddy

Posted on

01-10-2024

Updated on

24-10-2024

Categories
projects