As part of the Ofwat-funded innovation project River Deep Mountain AI (RDMAI), we are releasing our open-source anomaly detection model, built to enhance the value of continuous water quality monitoring.
In response to the critical condition of UK rivers, the government has strengthened its commitment to environmental protection and made early detection of water pollution a top priority. Strengthened regulation, including the Environment Act 2021 (UK Parliament, 2021), is driving the water sector towards robust monitoring approaches. This will transform how the sector identifies, interprets, and responds to environmental risks and incidents. Under Section 82 of the Environment Act 2021, water companies in England will be required to continuously monitor water quality upstream and downstream of sewerage outfalls. It is estimated that 40,000 multi-parameter sondes will need to be deployed across England to meet this requirement. These sondes will record data at high frequency (every 15 minutes at high-risk times and every hour at other times), generating millions of data points annually.
Traditional, low-frequency water quality monitoring has relied on time-intensive manual review. These review processes often fail to distinguish between natural variations (trends and seasonality) and pollution-driven anomalies. Moreover, because each river has its own hydrological pattern, anomalies are difficult to identify in the complex data readings. For high-frequency Section 82 sonde data, the sheer volume produced will make manual interpretation and timely identification of anomalies impossible.
Our open-source anomaly detection model is designed to automatically distinguish pollution-driven anomalies from natural variations, and it can be applied at scale to high-frequency water quality monitoring data.
Leveraging AI for Anomaly Detection
We’re proud to be publishing the first iteration of our AI-based anomaly detection model for river water quality monitoring. Built on high-frequency, multi-parameter datasets, including pH, dissolved oxygen, turbidity, ammonia, temperature and electrical conductivity, the model is designed to handle high volumes of data with minimal human intervention.
At its core, the framework uses advanced time-series decomposition techniques, blending Multiple Seasonal-Trend decomposition using Loess (MSTL), harmonic regression, and Butterworth filtering to separate trends, seasonal cycles, and residual patterns in each parameter. This step is crucial because trends and seasonality, if left in the data, can mask potential anomalies.
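To illustrate the decomposition step, here is a minimal sketch that combines two of the techniques named above: a low-pass Butterworth filter to estimate the trend and a harmonic regression to estimate the daily cycle. It is not the released RDMAI implementation, and MSTL is left out for brevity. The function names, the 72-hour cutoff, the number of harmonics and the synthetic series are all assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt


def harmonic_design(t_hours, periods_hours, n_harmonics=2):
    """Build sine/cosine regressors for each assumed seasonal period."""
    columns = []
    for period in periods_hours:
        for k in range(1, n_harmonics + 1):
            angle = 2.0 * np.pi * k * t_hours / period
            columns.append(np.sin(angle))
            columns.append(np.cos(angle))
    return np.column_stack(columns)


def decompose(values, t_hours, periods_hours=(24.0,), cutoff_period_hours=72.0):
    """Split a regularly sampled series into trend, seasonal and residual parts."""
    sampling_rate = 1.0 / (t_hours[1] - t_hours[0])  # samples per hour
    # Low-pass Butterworth filter: keeps variations slower than the cutoff period.
    b, a = butter(N=2, Wn=1.0 / cutoff_period_hours, btype="low", fs=sampling_rate)
    trend = filtfilt(b, a, values)  # forward-backward pass, so no phase shift
    detrended = values - trend
    # Harmonic regression: least-squares fit of sines and cosines to the cycle.
    design = harmonic_design(t_hours, periods_hours)
    coefficients, *_ = np.linalg.lstsq(design, detrended, rcond=None)
    seasonal = design @ coefficients
    residual = detrended - seasonal
    return trend, seasonal, residual


# Synthetic two-week record at 15-minute resolution (illustrative data only).
rng = np.random.default_rng(42)
t_hours = np.arange(0.0, 24.0 * 14, 0.25)
values = (
    8.0
    + 0.01 * t_hours                          # slow drift
    + np.sin(2.0 * np.pi * t_hours / 24.0)    # daily cycle
    + rng.normal(0.0, 0.05, t_hours.size)     # sensor noise
)
spike_index = 600
values[spike_index] += 5.0                    # injected anomaly

trend, seasonal, residual = decompose(values, t_hours)
```

In this sketch the injected spike is hard to separate from the daily cycle in the raw series but stands out clearly in the residual, which is the property the downstream detector relies on.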
The residual signals are fed into an unsupervised Isolation Forest model to automatically detect anomalies across multiple parameters simultaneously. Isolation Forest is an anomaly detection algorithm that learns the underlying structure of the data without labelled examples and isolates observations that significantly deviate from normal patterns.
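As a rough sketch of this detection step, the example below fits scikit-learn's IsolationForest to a synthetic matrix of residuals with one column per parameter. The data, the three-column layout, the contamination rate of 1% and the number of trees are assumptions for illustration, not settings taken from the released model.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic residuals: rows are timestamps, columns are three hypothetical
# parameters (for example turbidity, dissolved oxygen and ammonia).
rng = np.random.default_rng(0)
residuals = rng.normal(0.0, 1.0, size=(2000, 3))

# Inject a short multi-parameter event: one column drops while another rises.
event_rows = np.arange(1000, 1010)
residuals[event_rows] += np.array([0.0, -8.0, 8.0])

# Unsupervised fit: no labels are supplied. The contamination rate is an
# assumed share of anomalous rows and would need tuning per site.
model = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = model.fit_predict(residuals)          # -1 = anomaly, 1 = normal
scores = model.decision_function(residuals)    # lower = more anomalous

anomaly_rows = np.flatnonzero(labels == -1)
```

Because all parameters are passed in together, a row can be flagged for an unusual combination of values even when no single parameter is extreme on its own.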
Once anomalous observations are identified, K-Means clustering is applied to group these anomalies into clusters of events with similar signatures. Each cluster represents a distinct pattern of abnormal behaviour, often corresponding to specific types of events in the river system.
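The grouping step might look like the following sketch, which applies scikit-learn's KMeans to a small synthetic set of anomalous rows. The two event signatures, the parameter columns and the choice of two clusters are assumptions for illustration. In practice the number of clusters would have to be chosen, for example with a silhouette or elbow analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic anomalous residuals with three hypothetical columns:
# dissolved oxygen, turbidity, ammonia (illustrative only).
rng = np.random.default_rng(1)
oxygen_sag = rng.normal([-6.0, 0.0, 5.0], 0.5, size=(15, 3))      # DO drop, ammonia rise
turbidity_spike = rng.normal([0.0, 7.0, 0.0], 0.5, size=(12, 3))  # turbidity rise only
anomalies = np.vstack([oxygen_sag, turbidity_spike])

# Standardise so that no parameter dominates the distance calculation.
scaled = StandardScaler().fit_transform(anomalies)

# Group the anomalies into clusters of events with similar signatures.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(scaled)

# Mean signature of each cluster in the original units, for interpretation.
signatures = np.array(
    [anomalies[cluster_ids == c].mean(axis=0) for c in range(kmeans.n_clusters)]
)
```

Inspecting `signatures` shows which parameters move together in each cluster, which is what allows a cluster to be associated with a type of event.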
To enhance interpretability, PCA (Principal Component Analysis) biplots are used to visualise the clustered anomalies in a reduced-dimensional space. These biplots not only display the separation of event clusters but also illustrate how individual water quality parameters contribute to each anomaly pattern.
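A biplot of this kind can be sketched as follows: the principal component scores give the position of each anomaly, and the loadings give one arrow per parameter. The helper names `biplot_coordinates` and `draw_biplot`, the synthetic data and the parameter names are hypothetical. The plotting function assumes matplotlib is installed and imports it only when called.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def biplot_coordinates(anomalies):
    """Return 2-D scores (points) and loadings (parameter arrows) for a biplot."""
    scaled = StandardScaler().fit_transform(anomalies)
    pca = PCA(n_components=2)
    scores = pca.fit_transform(scaled)
    # Scale each principal axis by the standard deviation it explains, so that
    # arrow length reflects how strongly a parameter drives that component.
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    return scores, loadings, pca.explained_variance_ratio_


def draw_biplot(scores, loadings, cluster_ids, parameter_names):
    """Scatter the anomalies by cluster and overlay one arrow per parameter."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.scatter(scores[:, 0], scores[:, 1], c=cluster_ids)
    for (x, y), name in zip(loadings, parameter_names):
        ax.arrow(0.0, 0.0, x, y, head_width=0.05, length_includes_head=True)
        ax.text(x * 1.1, y * 1.1, name)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    return fig


# Synthetic anomalies from two hypothetical event types (illustrative only).
rng = np.random.default_rng(2)
parameter_names = ["dissolved oxygen", "turbidity", "ammonia"]
anomalies = np.vstack([
    rng.normal([-6.0, 0.0, 5.0], 0.5, size=(15, 3)),
    rng.normal([0.0, 7.0, 0.0], 0.5, size=(12, 3)),
])
cluster_ids = np.array([0] * 15 + [1] * 12)

scores, loadings, variance_ratio = biplot_coordinates(anomalies)
```

Calling `draw_biplot(scores, loadings, cluster_ids, parameter_names)` would then render the figure. Arrows pointing towards a cluster indicate the parameters that are elevated in those events.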