In the realm of weather analysis and prediction, the significance of accurate and reliable data cannot be overstated. We continuously endeavour to improve our in-house software “Climate Cypher” to generate more reliable weather and climate files. Reducing uncertainty requires a range of verification and analysis techniques; however, with more than 33 years of hourly data, it is not feasible to re-check all raw data manually given limited time and resources.
Currently, we are verifying our latest Australian weather and climate data, which covers observations from the period 1990-2022. Before releasing the data for client application, we are reviewing the results through a rigorous Quality Assurance (QA) process for 11 high-priority sites – the eight capital cities plus three additional locations that provide full coverage across the NCC climate zones.
In this article we share a few examples from this work, demonstrating how we are enhancing and improving our data. Parts of our QA process are outlined below:
1. Output visualisation
Following the creation of files in the TMY2 format containing hourly data across the full 33 years, we generate visualisations of statistics (in most cases, monthly maxima, minima and averages) for the following parameters:
- Dry Bulb Temperature (DBT)
- Dew Point Temperature (DPT)
- Relative Humidity (RH)
- Global Horizontal Irradiation (GHI)
- Direct Normal Irradiation (DNI)
- Diffuse Horizontal Irradiation (DIF)
- Cloud Cover (CC)
- Wind Speed (WS)
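The monthly statistics behind these visualisations can be sketched with pandas. The synthetic dry bulb temperature series below is purely illustrative and is not Climate Cypher’s actual schema or data:

```python
import numpy as np
import pandas as pd

# Illustrative hourly record: 33 years of dry bulb temperature (DBT),
# with a rough seasonal cycle plus noise (synthetic data only).
rng = np.random.default_rng(0)
idx = pd.date_range("1990-01-01", "2022-12-31 23:00", freq="h")
dbt = pd.Series(
    20 + 8 * np.sin(2 * np.pi * idx.dayofyear / 365)
    + rng.normal(0, 2, len(idx)),
    index=idx,
    name="DBT",
)

# Monthly maxima, minima and averages across the full record.
monthly = dbt.resample("MS").agg(["max", "min", "mean"])
print(monthly.head())
```

The resulting frame has one row per month (396 rows for 1990-2022), ready for plotting as a seasonal or long-term chart.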
2. Trend Analysis for Error Detection
By analysing these visualisations, we can swiftly identify trends including seasonal profiles and long-term climatic shifts. This makes it easier to spot unusual patterns and target suspicious trends for thorough investigation. A few examples of these seasonal (monthly) and longer-term charts are shown below:
Yearly Trend

Monthly Trend

3. Raw Data and Code Examination
Where we identify an unexpected result, we review the raw data (observations and instrument measurements) along with relevant sections of our code. This helps to uncover the source of any aberrant output.
4. Improvement
Once the underlying issue is understood, we enhance our code or communicate with our data providers (in most cases this is the Australian Bureau of Meteorology, BoM) to review the source and/or extraction of raw data.
5. Verification of Updated Output
Upon completion of improvements, we execute an updated version of Climate Cypher and compare its output with the previous version’s results. If the new output aligns with expectations, it’s deemed suitable for release. Otherwise, we revisit the code and raw data examination process.
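At its simplest, this comparison diffs the summary statistics produced by the two versions. The tolerance and values below are illustrative, not our actual acceptance criteria:

```python
import numpy as np

def changed_parameters(old: dict, new: dict, tol: float = 0.5) -> list:
    """List parameters whose statistics moved by more than `tol` anywhere."""
    flagged = []
    for param, old_vals in old.items():
        diff = np.abs(np.asarray(new[param]) - np.asarray(old_vals))
        if diff.max() > tol:
            flagged.append((param, round(float(diff.max()), 3)))
    return flagged

# Monthly mean cloud cover (CC, tenths) and dry bulb temperature (DBT)
# from two hypothetical software versions.
old = {"CC": [5.0, 5.1, 4.9], "DBT": [22.0, 21.5, 20.0]}
new = {"CC": [5.0, 6.8, 4.9], "DBT": [22.0, 21.5, 20.1]}
print(changed_parameters(old, new))
```

Here only CC exceeds the tolerance, so it would be the parameter sent back for code and raw-data re-examination.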
Case Study: Cloud Cover
One recent enhancement addressed a challenge with Cloud Cover, for which we use ground ceilometer observations wherever available in preference to the more readily accessible estimates from geostationary satellite observations. Here we detail our improvement process for illustration.
Case 1 Unexpected trend in Cloud Cover for a specific period in Sydney
The TMY2 format reports Cloud Cover as tenths of sky covered in cloud. When reviewing our Sydney data, our concerns were raised by a prolonged period of low average Cloud Cover, indicating relatively clear skies, which did not fit the seasonal profiles of the years before and after.
Our investigations revealed that the average cloud cover reported in the source Meteorological Aerodrome Reports (MetAR data) between 2001 and 2005 was significantly lower than expected. This coincided with abnormal flags in the MetAR data which we had interpreted as “clear sky” but which can also indicate missing values arising from (for example) failures in the observation or recording equipment. The flag means only that no clouds were detected below 12,500 ft, while actual cloud can reach significantly higher.
We adjusted our code to consider other reports of the same observation, lowering the priority of the MetAR data in these instances; this produced a more typical average cloud cover result for 2001 to 2005.
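The reinterpretation can be sketched as below. The flag codes and fallback logic are illustrative assumptions for this sketch, not the actual MetAR encoding or the Climate Cypher implementation:

```python
from typing import Optional

# Hypothetical sky-condition codes. An automatic "no cloud detected" report
# only means nothing was seen below the instrument's range limit (about
# 12,500 ft here), so it must not be read as a guaranteed clear sky.
GENUINE_CLEAR = {"SKC"}       # observer-confirmed clear sky (assumed code)
NO_CLOUD_DETECTED = {"NCD"}   # ambiguous: clear sky OR sensor saw nothing

def cloud_cover_tenths(code: str, fallback: Optional[int]) -> Optional[int]:
    """Interpret a sky-condition code as cloud cover in tenths of sky.

    Ambiguous 'no cloud detected' flags defer to another report of the
    same observation (fallback) instead of being taken as 0 tenths.
    """
    if code in GENUINE_CLEAR:
        return 0
    if code in NO_CLOUD_DETECTED:
        return fallback           # may still be None, i.e. missing
    return None                   # unrecognised: leave to other sources

print(cloud_cover_tenths("NCD", fallback=6))
```

With this prioritisation, an ambiguous flag inherits the value from a lower-priority report rather than forcing cloud cover to zero.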
The charts below refer to software version numbers which trace the following update history:
v.3.12.2 : Add linear interpolation for missing values.
v.3.12.4 : Adjust source (MetAR) data interpretation.
v.3.12.5 : Adjust MetAR clear sky flag interpretation.
v.3.12.6 : Update code for the format change (null to 0).
Before

After

Case 2 Unexpected trend in Cloud Cover for a specific period in Darwin
Similar concerns emerged in Darwin, where Cloud Cover appeared to drop significantly after 2001 and remain low. The period 2001-2017 seems to have been affected by the same issue described above for Sydney, but further investigation revealed a different issue arising after 2017: null values in the MetAR source data were encoded as 0. Because the BoM data generally eschews nil values for cloud cover, our code was not prepared to distinguish them.
In the end, an updated version of Climate Cypher (v3.12.6) was created to revise our interpretation, and a more accurate and reasonable trend of cloud cover was produced.
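The revised interpretation can be sketched as follows. The quality-flag field is an illustrative assumption about how genuine zeros might be told apart from encoded nulls, not the actual BoM record layout:

```python
import math

MISSING = float("nan")

def decode_cloud_cover(value: float, quality_flag: str) -> float:
    """Return cloud cover in tenths, treating flagged records as missing.

    Where the source encodes null cloud cover as 0, a 0 that carries a
    'missing' quality flag is no longer read as a clear sky.
    """
    if quality_flag == "missing":
        return MISSING
    return value

# Illustrative records: (value, quality flag).
records = [(0, "missing"), (0, "ok"), (7, "ok")]
decoded = [decode_cloud_cover(v, q) for v, q in records]
```

Only the first record becomes missing; a genuine 0 tenths with a clean flag still counts as clear sky.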
Before

After

