Cognizant Blog

As part of River Deep Mountain AI, we are now releasing the first iteration of our orthophosphate AI-model on GitHub. Our AI-driven model will transform how we predict and understand river water quality.

The UK’s water environment is under huge pressure from population growth, climate change and pollution, with only 14% of English rivers achieving good or better ecological status. Confronted with challenges of this scale and complexity, the water sector needs innovative solutions to protect the health of our rivers and ensure clean water for future generations.

Phosphorus concentrations are critical indicators of water quality and ecological health, with high values signalling nutrient pollution that threatens aquatic life and freshwater ecosystems. However, traditional sampling methods for measuring phosphorus compounds such as orthophosphate are time-intensive and costly. Consequently, historical measurements of phosphorus compounds are sparse across both time and space.


Image: Map of orthophosphate concentrations across England for 2023. Values were averaged first to a daily value for each monitoring point and then to a single value for each hydrological area (EA Hydrological Boundaries). The classification shown broadly reflects the boundaries of the UKTAG standards for phosphorus.
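
The two-stage averaging described in the caption can be sketched as follows. This is a minimal illustration, not the project's code: the record layout (monitoring point ID, ISO timestamp, value), the point and area names, and the sample values are hypothetical, and the UKTAG classification step is left out because its thresholds vary by site.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical record layout: (monitoring_point_id, ISO-8601 timestamp, value).
Record = tuple[str, str, float]


def daily_means(records: list[Record]) -> dict[tuple[str, str], float]:
    """Stage 1: average every sample taken at one point on one calendar day."""
    buckets: dict[tuple[str, str], list[float]] = defaultdict(list)
    for point, timestamp, value in records:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        buckets[(point, day)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def area_means(daily: dict[tuple[str, str], float],
               point_to_area: dict[str, str]) -> dict[str, float]:
    """Stage 2: average the daily values of all points in each hydrological area."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for (point, _day), value in daily.items():
        buckets[point_to_area[point]].append(value)
    return {area: mean(values) for area, values in buckets.items()}


# Toy data: two samples on one day at P1, so that day is averaged before stage 2.
records = [
    ("P1", "2023-03-01T09:00", 0.10),
    ("P1", "2023-03-01T15:00", 0.30),
    ("P1", "2023-03-02T09:00", 0.20),
    ("P2", "2023-03-01T10:00", 0.05),
]
print(area_means(daily_means(records), {"P1": "Area A", "P2": "Area A"}))
```

Averaging to daily values first means a day with many samples carries the same weight as a day with a single sample when the area value is computed.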

Current monitoring systems and limited data availability make it difficult to observe fluctuations in orthophosphate over short periods, which hinders investigations into the sources of nutrient pollution.

To address this gap, we have leveraged advanced artificial intelligence and machine learning (AI/ML) to develop a model capable of predicting orthophosphate concentrations reliably. While this model is not a conceptual representation of the underlying processes, it captures patterns and correlations in data that can supplement traditional scientific approaches. Its predictive capabilities enable both short-term forecasting and retrospective estimation, reducing dependence on time-intensive laboratory testing. Furthermore, the relationships identified by the model may serve as a useful cross-check against conceptual water quality models, potentially revealing new hypotheses or validating existing assumptions about phosphorus dynamics.
 

Unlocking insights from decades of data

Our model harnesses 24 years of water quality data from the Environment Agency Water Quality Archive, encompassing over 70 million observations from more than 23,000 sampling points across 55 rivers. The dataset includes more than 3,000 physicochemical parameters – from pH levels to nutrient concentrations – and provides a holistic view of catchment dynamics. By training AI/ML models on these data, we can establish the relationships between these parameters and orthophosphate concentration.
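
Because the archive holds one entry per individual measurement, relating parameters to orthophosphate first means pivoting those observations into one feature row per sampling visit. The sketch below is a minimal version of that step. It assumes a hypothetical (sampling point, date, determinand, value) layout and uses placeholder determinand names rather than the archive's actual schema or codes.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical long-format layout: (sampling_point, date, determinand, value).
Observation = tuple[str, str, str, float]


def to_training_rows(
    observations: list[Observation], target: str, features: list[str]
) -> tuple[list[list[Optional[float]]], list[float]]:
    """Pivot long-format observations into a feature matrix and a target vector.

    One row is produced per (sampling point, date) visit that has a target
    measurement; features not measured on that visit are left as None so a
    later imputation step can decide how to fill them.
    """
    visits: dict[tuple[str, str], dict[str, float]] = defaultdict(dict)
    for point, date, determinand, value in observations:
        visits[(point, date)][determinand] = value  # a later duplicate overwrites an earlier one

    X: list[list[Optional[float]]] = []
    y: list[float] = []
    for key in sorted(visits):
        readings = visits[key]
        if target not in readings:
            continue  # no label for this visit, so it cannot be used for training
        X.append([readings.get(name) for name in features])
        y.append(readings[target])
    return X, y


obs = [
    ("P1", "2023-01-01", "orthophosphate", 0.12),
    ("P1", "2023-01-01", "pH", 7.8),
    ("P1", "2023-01-01", "nitrate", 3.1),
    ("P1", "2023-01-02", "pH", 7.6),  # no orthophosphate that day: dropped
    ("P2", "2023-01-01", "orthophosphate", 0.05),
    ("P2", "2023-01-01", "pH", 7.2),
]
X, y = to_training_rows(obs, target="orthophosphate", features=["pH", "nitrate"])
print(X, y)
```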

To ensure accuracy, we have collaborated closely with domain experts from across the water sector to conduct extensive exploratory data analysis (EDA), uncovering patterns such as seasonal trends, rainfall impacts and catchment-specific behaviours. Agglomerative clustering and K-Means++ segmentation identified groups of determinands influencing orthophosphate levels, while principal component analysis (PCA), SelectKBest, LassoCV and f_regression streamlined feature selection.
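
To show what univariate selection with SelectKBest and f_regression does, here is a dependency-free sketch of the same idea. It ranks each candidate feature by the F statistic F = r² / (1 − r²) · (n − 2), where r is the feature's Pearson correlation with the target, and then keeps the top k. The feature names and values are invented toy data, and the project's own pipeline presumably uses the scikit-learn implementations directly.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def f_scores(columns: dict[str, list[float]], y: list[float]) -> dict[str, float]:
    """Univariate F statistic per feature: F = r^2 / (1 - r^2) * (n - 2)."""
    dof = len(y) - 2
    scores = {}
    for name, x in columns.items():
        r2 = pearson_r(x, y) ** 2
        scores[name] = r2 / (1.0 - r2) * dof if r2 < 1.0 else float("inf")
    return scores


def select_k_best(columns: dict[str, list[float]], y: list[float], k: int) -> list[str]:
    """Keep the names of the k features with the highest F scores."""
    scores = f_scores(columns, y)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data only: hypothetical determinands measured alongside orthophosphate.
orthophosphate = [0.10, 0.15, 0.22, 0.30, 0.41]
candidates = {
    "temperature": [8.0, 10.0, 13.0, 15.0, 19.0],
    "ph": [7.1, 7.4, 7.0, 7.3, 7.2],
    "conductivity": [400.0, 420.0, 450.0, 480.0, 520.0],
}
print(select_k_best(candidates, orthophosphate, k=2))
```

A univariate score like this only sees one feature at a time, which is why it is typically paired with multivariate methods such as LassoCV or PCA that account for redundancy between features.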
 
The modelling phase tested a suite of algorithms: traditional regression models (linear regression) for baseline insights; tree-based approaches (Random Forest, XGBoost, LightGBM) to capture non-linear relationships; artificial neural networks (ANN) for high-dimensional pattern recognition; and language models such as ChemBERTa (built on RoBERTa), alongside the NIH global chemical database, ChemDataExtractor and K-Means++ for segmentation.
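
A baseline is useful because it sets the floor that the more complex models have to beat. The sketch below is a deliberately reduced version of such a baseline: it assumes a single hypothetical predictor and toy values, whereas the real models draw on many features. It compares an ordinary least-squares line against a constant predictor that always returns the training mean, both scored by MAE on held-out points.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: one hypothetical predictor against orthophosphate.
x_train, y_train = [1.0, 2.0, 3.0, 4.0, 5.0], [0.11, 0.19, 0.31, 0.39, 0.50]
x_test, y_test = [6.0, 7.0], [0.61, 0.69]

slope, intercept = fit_line(x_train, y_train)
linear_pred = [slope * x + intercept for x in x_test]
mean_pred = [sum(y_train) / len(y_train)] * len(y_test)  # constant "predict the average" floor

print(f"mean baseline MAE:   {mae(y_test, mean_pred):.3f}")
print(f"linear baseline MAE: {mae(y_test, linear_pred):.3f}")
```

If a tree-based model or neural network cannot clearly improve on both of these numbers, its extra complexity is hard to justify.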

We also used bidirectional transformers: AI models (such as BERT) that interpret context by reading a sequence in both directions, applied here to molecular features rather than words. This allowed us to study how molecules, such as certain environmental pollutants (determinands) and phosphates, are similar at a molecular level. Although this analysis produced segmented results, it did not define clear boundaries between the segments. We therefore extended it with agglomerative clustering, which gave clearer separation between groups in the similarity analysis. For this step, we used the NIH global chemical database, RDKit and the Cambridge ChemDataExtractor library. Together, these tools helped reveal which molecules are chemically related or compatible with each other. This understanding allowed us to improve our feature engineering and better connect large-scale environmental data with detailed molecular chemistry.
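
Agglomerative clustering works bottom-up. Each molecule starts in its own cluster, and the two most similar clusters are merged repeatedly until no remaining pair is similar enough. The sketch below applies this to Tanimoto similarity between molecular fingerprints. In practice, the fingerprints would come from a cheminformatics toolkit such as RDKit. Here they are hand-written toy bit sets, and the average-linkage rule and 0.4 threshold are assumptions, since the post does not specify either.

```python
from itertools import combinations


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def average_linkage(c1: set[str], c2: set[str], fps: dict[str, frozenset[int]]) -> float:
    """Mean pairwise similarity between the members of two clusters."""
    sims = [tanimoto(fps[a], fps[b]) for a in c1 for b in c2]
    return sum(sims) / len(sims)


def agglomerate(fps: dict[str, frozenset[int]], threshold: float) -> list[set[str]]:
    """Merge the most similar pair of clusters until none reaches the threshold."""
    clusters = [{name} for name in fps]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], fps),
        )
        if average_linkage(clusters[i], clusters[j], fps) < threshold:
            break  # the remaining clusters are too dissimilar to merge
        clusters[i] |= clusters[j]
        del clusters[j]  # safe: combinations() yields i < j, so index i is unaffected
    return clusters


# Toy fingerprints (hypothetical bit indices, not real RDKit output).
fingerprints = {
    "orthophosphate": frozenset({1, 2, 3, 4}),
    "phosphate_ester": frozenset({1, 2, 3, 5}),
    "nitrate": frozenset({10, 11, 12}),
    "nitrite": frozenset({10, 11, 13}),
}
groups = agglomerate(fingerprints, threshold=0.4)
print(groups)
```

Unlike a flat segmentation, the merge threshold gives an explicit rule for where one group ends and the next begins.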

The Open Orthophosphate Model achieves a Mean Absolute Error (MAE) of 0.01, a Mean Squared Error (MSE) of 0.00123 and a Normalised Root Mean Squared Error (NRMSE) of 8%.
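
The three metrics are related. MSE averages squared errors, RMSE is its square root (in the target's own units), and NRMSE divides RMSE by a normalising quantity so the error can be expressed as a percentage. For scale, the reported MSE of 0.00123 corresponds to an RMSE of about 0.035, so an NRMSE of 8% implies a normalising quantity of roughly 0.44 in the same units. The post does not say which normaliser was used. The sketch below assumes the observed range, which is one common convention, and uses toy values.

```python
from math import sqrt


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """MAE, MSE and range-normalised RMSE for a set of predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # ASSUMPTION: NRMSE normalised by the observed range of y_true; other
    # conventions divide by the mean or standard deviation instead.
    nrmse = sqrt(mse) / (max(y_true) - min(y_true))
    return {"mae": mae, "mse": mse, "nrmse": nrmse}


print(regression_metrics([0.0, 0.5, 1.0], [0.1, 0.5, 0.8]))
```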
 

A collaborative and open-source approach

The overarching objective of River Deep Mountain AI is to bring together key stakeholders involved in waterbody health to collaboratively develop open-source AI/ML models that can inform effective action to tackle waterbody pollution.

All our models will be released into the public domain to democratise artificial intelligence and benefit the entire water sector. The first iterations of our models are being released in July 2025, and the second iterations will be released in November 2025.

Access the first iteration of our Open Orthophosphate Model via GitHub.

River Deep Mountain AI is funded by the Ofwat Innovation Fund and brings together the following core partners: Northumbrian Water, Cognizant Ocean, Xylem Inc, Water Research Centre Limited, The Rivers Trust and ADAS. The project is further supported by 4 water companies across the United Kingdom.
