Progress on Precipitation Data – Disaggregation of 1990s Daily Data

In the May edition of Exemplary Advances, we introduced our research into disaggregating precipitation data from daily to half-hourly resolution using a long short-term memory (LSTM) model. Since then, we’ve experimented with a number of other architectures, including foundation models designed specifically for time-series applications. Our most promising results so far have come from a modified LSTM.

High-temporal-resolution precipitation data is necessary for designing and simulating building components. In most localities, tipping-bucket rain gauges were only installed from the 1990s or early 2000s onwards. As a result, climate files created by concatenating the twelve most typical months selected from three decades often include months for which the site only has daily precipitation data, measured at 9am local time. Our research aims to provide high-quality half-hourly precipitation data that is consistent with both the measured daily total and the simultaneous hourly values of the other weather elements.

An LSTM is a type of recurrent neural network that handles long-term dependencies using gated memory cells, which allow the network to retain important details from a sequence of data while discarding irrelevant information. The ‘short-term memory’ in its name refers to the activations the network carries from one time step to the next; the gated memory cell lets this short-term memory persist over long sequences. A diagram of our new architecture can be seen below.
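To make the gating mechanism concrete, here is a minimal NumPy sketch of one standard LSTM cell unrolled over a day of 48 half-hourly steps. It is a textbook cell with random weights, not our modified architecture; the gate ordering, the sizes and the function names `lstm_step` and `lstm_forward` are illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """Advance a standard LSTM cell by one time step.

    x: (n_in,) inputs at this step; h_prev, c_prev: (n_hidden,) previous states.
    W: (4*n_hidden, n_in), U: (4*n_hidden, n_hidden), b: (4*n_hidden,).
    The gate ordering (input, forget, candidate, output) is an arbitrary convention.
    """
    n = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:n])            # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])       # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * n:3 * n])   # candidate values that could be written
    o = sigmoid(z[3 * n:])        # output gate: how much of the cell state to expose
    c = f * c_prev + i * g        # cell state: the memory carried along the sequence
    h = o * np.tanh(c)            # hidden state: this step's output
    return h, c


def lstm_forward(xs, W, U, b, n_hidden):
    """Run the cell over xs of shape (T, n_in); return all hidden states and the final cell state."""
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
        hs.append(h)
    return np.stack(hs), c


# Random weights and inputs: one day of 48 half-hourly steps with 3 input variables.
rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 4, 48
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
xs = rng.normal(size=(T, n_in))

hs, c_final = lstm_forward(xs, W, U, b, n_hidden)
print(hs.shape)  # (48, 4)
```

The forget gate is what lets information from early in the day survive to later steps: when `f` is close to 1, the cell state `c` is passed along almost unchanged.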

The three major changes we undertook were:

1) Adding a differentiable normalization layer so that the model itself enforces the daily-total constraint, rather than enforcing it later during post-processing. Previously, we used the SoftMax activation function to make the outputs sum to 1 and then multiplied them by the daily total when generating the results. However, this approach had two main issues:

  • First, SoftMax is non-linear: it weights each period by the exponential of its output, which distorts the contributions of the time periods to the total.
  • Second, the daily total was not accounted for at all during training.

Our new layer enforces this constraint effectively during training, leading to more accurate results.

2) Re-shaping the inputs and outputs. We now estimate each half-hour period one at a time and squeeze these estimates into a single tensor, rather than producing a full 24-hour estimate at each timestep and taking the last estimate as the output tensor.

3) Further feature engineering. We have more combinations of input variables to integrate into the new model, but each new feature adds significant training time. We are currently testing combinations, selected based on their correlation with precipitation, to assess what each additional feature contributes.
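The exact form of the daily-total layer in change 1 is not specified here. As a minimal sketch of the idea, assuming the layer rescales non-negative (ReLU) outputs in proportion to their sum and then multiplies by the measured daily total, the NumPy code below contrasts it with the previous SoftMax-then-multiply approach. The function name `daily_total_layer`, the ReLU choice and the `eps` guard are illustrative assumptions.

```python
import numpy as np


def softmax(x):
    """Row-wise SoftMax (shifted by the row max for numerical stability)."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def daily_total_layer(scores, daily_total, eps=1e-8):
    """Hypothetical daily-total normalization layer.

    scores: (batch, n_periods) raw network outputs, n_periods = 48 for a full day
    daily_total: (batch,) measured daily precipitation in mm
    Returns half-hourly precipitation in mm whose rows sum to daily_total
    (up to eps), provided each row has at least one positive score.
    """
    r = np.maximum(scores, 0.0)                          # ReLU: non-negative, dry periods can be exactly 0
    shares = r / (r.sum(axis=-1, keepdims=True) + eps)   # weights proportional to r, not to exp(score)
    return shares * daily_total[:, None]                 # built only from (almost everywhere) differentiable ops


# Four periods instead of 48, to keep the numbers readable.
scores = np.array([[1.0, 1.0, 2.0, 0.0]])
total = np.array([8.0])

print(daily_total_layer(scores, total))    # approximately [[2. 2. 4. 0.]]
print(softmax(scores) * total[:, None])    # the dry period gets rain and the peak is exaggerated
```

Because the output is already in millimetres, a training loss computed against observed half-hourly rainfall sees the daily total directly. A day whose scores are all non-positive would collapse to zero under this sketch, so a practical layer would need a fallback (for example softplus instead of ReLU, or a uniform split).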
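The output re-shaping in change 2 can be illustrated at the level of tensor shapes. This is a sketch under one reading of the description: random arrays stand in for the LSTM's hidden states, and the dense heads `W_day` and `w_step` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, steps, hidden = 2, 48, 8     # 48 half-hour periods per day; batch and hidden sizes are arbitrary

# Stand-in for the LSTM's hidden states, one vector per half-hour step.
hidden_states = rng.normal(size=(batch, steps, hidden))

# Previous layout: a dense head produces a full day (48 values) at every step,
# and only the estimate from the last step is kept as the output.
W_day = rng.normal(size=(hidden, steps))
day_estimates = hidden_states @ W_day            # (batch, 48 steps, 48 periods)
old_output = day_estimates[:, -1, :]             # (batch, 48)

# New layout: one shared head estimates a single half-hour value at each step,
# and the per-step outputs are squeezed into one tensor.
w_step = rng.normal(size=(hidden, 1))
per_step = hidden_states @ w_step                # (batch, 48, 1)
new_output = per_step.squeeze(-1)                # (batch, 48)

print(old_output.shape, new_output.shape)        # (2, 48) (2, 48)
```

Both layouts end with a (batch, 48) tensor, but in the new one the value for each half hour comes from the hidden state at that half hour, rather than from a single final state.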
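Screening candidate inputs by correlation, as in change 3, can be sketched as follows, assuming Pearson correlation (the measure used is not stated). The feature names and data are synthetic placeholders, not project data, and `rank_features_by_correlation` is a hypothetical helper.

```python
import numpy as np


def rank_features_by_correlation(features, target):
    """Rank candidate features by the absolute Pearson correlation with the target.

    features: dict mapping feature name -> 1-D array with the same length as target
    target: 1-D array, e.g. observed precipitation
    Returns a list of (name, r) pairs sorted by decreasing |r|.
    """
    scored = []
    for name, values in features.items():
        r = np.corrcoef(values, target)[0, 1]   # off-diagonal entry of the 2x2 correlation matrix
        scored.append((name, float(r)))
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Synthetic toy data, constructed for illustration only.
rng = np.random.default_rng(1)
n = 500
precip = rng.gamma(shape=0.5, scale=2.0, size=n)
candidates = {
    "noise": rng.normal(size=n),                                 # unrelated by construction
    "feature_a": precip + rng.normal(scale=1.0, size=n),         # strongly related by construction
    "feature_b": -0.5 * precip + rng.normal(scale=1.0, size=n),  # weaker, negative relation
}

ranking = rank_features_by_correlation(candidates, precip)
for name, r in ranking:
    print(f"{name:10s} r = {r:+.2f}")
```

Pearson correlation only captures linear association, so a ranking like this is a cheap first filter; whether a feature actually helps still has to be confirmed by retraining with it.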

We’ve seen significant improvement in several key metrics.

We are now exceeding the performance of Exemplary’s previous hourly precipitation disaggregation method, which used a Markov Chain Monte Carlo (MCMC) approach, despite estimating at a finer resolution. For example, the LSTM’s absolute error in the total number of rainfall periods is ~13%, which is within the range recorded in our colleagues’ previous work despite the much finer time series. Further, the LSTM correctly detects 64.8% of rainfall periods with no temporal error. This is double the previous model’s rate, and still five percentage points higher than the previous model’s rate with a ±2-hour window. Although our new model matches the daily precipitation totals precisely, we still need to improve how it distributes those totals across the half-hourly periods.
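Exact metric definitions are not given above. The sketch below assumes a period is "wet" when it has any recorded rainfall and shows one plausible way to compute the error in the number of rainfall periods and the detection rate with an optional timing window. The function names and toy sequences are illustrative.

```python
import numpy as np


def count_error(pred_wet, obs_wet):
    """Absolute error in the number of wet periods, relative to the observed count.

    Assumes at least one observed wet period.
    """
    n_pred = int(np.sum(pred_wet))
    n_obs = int(np.sum(obs_wet))
    return abs(n_pred - n_obs) / n_obs


def detection_rate(pred_wet, obs_wet, window=0):
    """Fraction of observed wet periods with a predicted wet period within +/- `window` steps.

    window=0 requires exact timing; for half-hourly data a +/-2-hour window is window=4.
    """
    obs_idx = np.flatnonzero(obs_wet)
    pred_idx = np.flatnonzero(pred_wet)
    if obs_idx.size == 0 or pred_idx.size == 0:
        return 0.0
    # Distance from each observed wet period to its nearest predicted wet period.
    nearest = np.min(np.abs(obs_idx[:, None] - pred_idx[None, :]), axis=1)
    return float(np.mean(nearest <= window))


# Toy wet/dry sequences (True = rain in that period), for illustration only.
obs = np.array([0, 1, 1, 0, 0, 0, 1, 0], dtype=bool)
pred = np.array([0, 1, 0, 0, 0, 0, 0, 1], dtype=bool)

print(count_error(pred, obs))                 # 0.333... (2 predicted vs 3 observed)
print(detection_rate(pred, obs))              # 0.333... (only period 1 matches exactly)
print(detection_rate(pred, obs, window=1))    # 1.0
```

A timing window makes the detection rate more forgiving, which is why an exact-timing result at half-hourly resolution is a stricter test than a ±2-hour result at hourly resolution.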

The graph above shows our current model’s synthesised half-hourly precipitation against observed data. The model performs well for low precipitation values but shows large discrepancies at high values. It achieves 57% precision and 67% recall; we aim to raise both above 80% in the next iteration of the code.
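Assuming the quoted precision and recall refer to classifying each half-hour period as wet or dry, they can be computed as below. The zero-millimetre threshold, the function name and the toy values are illustrative assumptions.

```python
import numpy as np


def precision_recall(pred_mm, obs_mm, threshold=0.0):
    """Precision and recall of wet-period detection.

    A period counts as wet when its rainfall exceeds `threshold` mm.
    """
    pred_wet = np.asarray(pred_mm) > threshold
    obs_wet = np.asarray(obs_mm) > threshold
    tp = int(np.sum(pred_wet & obs_wet))     # wet in both
    fp = int(np.sum(pred_wet & ~obs_wet))    # predicted wet, observed dry
    fn = int(np.sum(~pred_wet & obs_wet))    # predicted dry, observed wet
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy half-hourly values in mm, for illustration only.
obs = [0.0, 0.2, 1.4, 0.0, 0.0, 3.1]
pred = [0.1, 0.3, 0.0, 0.2, 0.0, 2.5]
print(precision_recall(pred, obs))   # (0.5, 0.6666666666666666)
```

Low precision means the model places rain in periods that were actually dry; low recall means it misses periods that were actually wet.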

We are working on fine-tuning the hyperparameters to improve these results, and on a testing setup to evaluate the model’s performance for a number of Australian cities. Updated results will be announced in future editions of “Exemplary Advances”.


Project Team: Hong Gic Oh (leader), Nayan Aroroa (graduate) and Harrison Oates (intern)
