High temporal resolution precipitation data is required for the design and simulation of building components and environments. Over the 1990s and into the early 2000s, Bureau of Meteorology (BoM) weather stations transitioned from read-daily rain gauges to tipping-bucket rain gauges that record precipitation at high temporal resolution.
Climate files composed by concatenating the 12 most typical (or most extreme) months selected from three decades often include months during which the site did not yet have a tipping-bucket rain gauge, and so only daily precipitation totals to 9 AM local clock time are available.
For this situation, we have resumed our quest to disaggregate precipitation data using a Long Short-Term Memory (LSTM) model that exploits associations between precipitation and the other weather elements measured hourly or better over those same three decades. The diagram below gives a brief overview of how we implement the model.
We use an LSTM model, a type of recurrent neural network (RNN). In theory, LSTMs handle long-term dependencies and sequence disaggregation tasks using special memory cells, each with three gates: input, output, and forget gates. These gates regulate information, allowing the network to learn important details while discarding irrelevant information.
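The gating behaviour described above is conventionally written as the standard LSTM update equations (these are the textbook formulation, not code from our implementation):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t &&\text{(memory cell update)}\\
h_t &= o_t \odot \tanh(c_t) &&\text{(hidden state)}
\end{aligned}
```

The forget gate $f_t$ decides what to discard from the memory cell, the input gate $i_t$ decides what new information to store, and the output gate $o_t$ controls what the cell exposes to the rest of the network.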
Additionally, our architecture includes hidden layers for processing information, fully connected layers to integrate learned features, and a SoftMax layer that outputs probabilities of rainfall for each half-hour period.
An interesting fact: despite the name ‘short-term memory’, LSTMs are used for learning long-term patterns. The ‘short-term’ in the name refers to the network’s ability to capture dependencies and patterns over short sequences, which it extends to longer sequences using its built-in memory-cell mechanism.
Below, we present a comprehensive approach to disaggregating half-hourly precipitation using an LSTM-based neural network model. The ability of the LSTM architecture to capture temporal dependencies has helped us achieve an overall average RMSE of 0.05. Alongside this accuracy, the training time required has dropped significantly because of the simplicity of our model.
The Approach
Step 1 – Pre-Processing.
- Merge all rainfall (RF) data and produce half-hourly interpolations.
- Scale the columns [‘ERH’, ‘ERN’, ‘DNI’, ‘DIF’, ‘CLOUD’, ‘DBT’, ‘RH’, ‘PRESSURE’, ‘WD’, ‘WS’] using the QuantileTransformer so that the features follow a uniform distribution.
- ERH: Extraterrestrial Radiation Horizontal
- ERN: Extraterrestrial Radiation Normal
- DNI: Direct Normal Irradiance
- DIF: Diffuse Horizontal Irradiance
- CLOUD: Cloud Cover
- DBT: Dry Bulb Temperature
- RH: Relative Humidity
- PRESSURE: Atmospheric Pressure
- WD: Wind Direction
- WS: Wind Speed
- Drop duplicates and check for missing values.
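The scaling step above can be sketched with scikit-learn's `QuantileTransformer`; the array here is random toy data standing in for the merged climate columns, and the quantile count is an illustrative choice:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Toy stand-in for three of the climate columns (e.g. DBT, RH, PRESSURE);
# the real pipeline operates on the merged BoM station records.
rng = np.random.default_rng(0)
X = rng.normal(loc=1010.0, scale=8.0, size=(500, 3))

# Map each feature onto a uniform [0, 1] distribution, as in Step 1.
qt = QuantileTransformer(output_distribution="uniform",
                         n_quantiles=100, random_state=0)
X_scaled = qt.fit_transform(X)

print(X_scaled.min(), X_scaled.max())  # all values lie in [0, 1]
```

The uniform output distribution keeps every feature on the same bounded scale, which stabilises LSTM training when inputs such as pressure and wind speed differ by orders of magnitude.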
Step 2 – Modeling the LSTM architecture.
- Date Embedding: Linear layer that transforms date information into a 32-dimensional embedding.
- Climate Data Encoder: LSTM layer that encodes climate statistics with 10 features into hidden states.
- Decoder: LSTM layer that decodes combined inputs (date embedding, hidden state from climate encoder, and total precipitation) into hidden states.
- Output Layer: Linear layer that generates disaggregated precipitation proportions and uses SoftMax activation to output values.
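The architecture in Step 2 can be sketched end-to-end in plain NumPy. Everything here is illustrative: the layer sizes, the crude 3-number date encoding, and the random weights are assumptions, and a real implementation would use a deep learning framework with trained parameters. The sketch only shows how the pieces connect: date embedding, climate encoder, decoder, and softmax output over 48 half-hour slots.

```python
import numpy as np

rng = np.random.default_rng(42)
EMB, HID, N_FEAT, N_SLOTS = 32, 16, 10, 48  # hypothetical sizes; 48 half-hours per day

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def lstm_step(x, h, c, W, U, b):
    """One LSTM step; W, U, b stack the input/forget/cell/output gates."""
    z = W @ x + U @ h + b
    H = h.size
    i, f = sigmoid(z[:H]), sigmoid(z[H:2*H])
    g, o = np.tanh(z[2*H:3*H]), sigmoid(z[3*H:4*H])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def init(shape):
    return rng.normal(scale=0.1, size=shape)

# Date Embedding: linear layer from a 3-number date vector to 32 dimensions.
W_emb = init((EMB, 3))
# Climate Data Encoder: LSTM over the 10 scaled climate features.
W_enc, U_enc, b_enc = init((4*HID, N_FEAT)), init((4*HID, HID)), np.zeros(4*HID)
# Decoder: LSTM over [date embedding | encoder hidden state | daily total].
DEC_IN = EMB + HID + 1
W_dec, U_dec, b_dec = init((4*HID, DEC_IN)), init((4*HID, HID)), np.zeros(4*HID)
# Output Layer: linear + softmax -> proportions for the 48 half-hour periods.
W_out = init((N_SLOTS, HID))

def disaggregate(date_vec, climate_seq, daily_total):
    emb = W_emb @ date_vec
    h = c = np.zeros(HID)
    for x in climate_seq:                      # encode the day's climate readings
        h, c = lstm_step(x, h, c, W_enc, U_enc, b_enc)
    dec_in = np.concatenate([emb, h, [daily_total]])
    hd, _ = lstm_step(dec_in, np.zeros(HID), np.zeros(HID), W_dec, U_dec, b_dec)
    props = softmax(W_out @ hd)                # proportions sum to 1
    return props * daily_total                 # half-hourly amounts (mm)

date_vec = np.array([2001.0 / 2000, 7 / 12, 15 / 31])   # crudely normalised date
climate_seq = rng.random((48, N_FEAT))
half_hourly = disaggregate(date_vec, climate_seq, daily_total=12.4)
```

Because the output layer produces proportions via softmax, the 48 disaggregated values always sum exactly to the daily total, which is what makes this formulation convenient for disaggregation.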
Step 3 – Validation of Results.
- The model is validated by computing the MSE loss on the validation set at each epoch.
- After training, the saved model is loaded to generate disaggregated half-hourly precipitation values, which are then evaluated for RMSE loss on the test data split.
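The RMSE used in Step 3 is straightforward to compute; the two arrays below are hypothetical predicted and observed half-hourly values for a single day, not results from our model:

```python
import numpy as np

# Hypothetical predicted vs. observed half-hourly precipitation (mm).
predicted = np.array([0.0, 0.10, 0.40, 0.20, 0.0, 0.05])
observed  = np.array([0.0, 0.15, 0.35, 0.25, 0.0, 0.00])

mse = np.mean((predicted - observed) ** 2)   # validation loss per epoch
rmse = np.sqrt(mse)                          # test metric
print(round(float(rmse), 4))                 # ≈ 0.0408
```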
Results
The test results achieved are highlighted below:
The results above highlight the model's learning capacity. All half-hourly disaggregated values fall within a 0.30 mm range, with the bulk of values averaging just below 0.1 root mean squared error.
These results showcase the potential of using LSTMs for further analysis. We are now looking into clustering techniques to further improve on the current results, and will announce the updated results later.