Machine Learning applications in Hydrology

Zanoni, Maria Grazia

doi:10.15168/11572_384010

This work focuses on the use of Artificial Intelligence (AI), and in particular Machine Learning (ML), to tackle quality and quantity aspects of both surface water and groundwater. Traditionally, river water quality modeling and groundwater contaminant transport studies rely on the solution of physically based (PB) equations, which aim to define a conceptual model of reality. The complexity of the processes involved, in some cases unknown or indiscernible, calls for careful parameterization by the modeler. For this reason, PB models can be limited by the complexity of the system, the availability of data, and the resulting need for simplifying assumptions. ML models, on the other hand, are data-driven and rely on algorithms that identify patterns in data. These techniques aim to extract a surrogate representation of reality by learning the correlations present in the data. They can handle complex, non-linear relationships between variables and can be more flexible and adaptable to new environments. However, they are directly affected by the quality and quantity of the available data and require larger datasets than PB models.
To explore the potential of these methods for surface water and groundwater problems, we experimented with different algorithms in three distinct applications. First, we compared two ML techniques in a catchment-scale water quality model; the better-performing one was then employed to fill gaps in environmental time series and to enhance the predictions of a PB model in a groundwater context.
Accordingly, the first part of this work presents and discusses a water quality model of the Adige River Basin. Random Forest (RF) and a Dense feed-forward Neural Network (DNN) were applied and compared with a standard linear regression (LR) approach, and an Importance Features Assessment (IFA) of the drivers was performed. DNN proved more flexible than RF and more effective at detecting non-linear relationships.
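The abstract does not specify how the Importance Features Assessment (IFA) of the drivers is computed. One widely used, model-agnostic way to rank drivers is permutation importance: shuffle one driver at a time and measure how much the prediction error grows. The sketch below is only an illustration under that assumption; the function names `mse` and `permutation_importance`, the mean squared error metric, and the toy data are hypothetical and not taken from the thesis.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observations and predictions."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: List[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each driver (column of X) is shuffled.

    Hypothetical stand-in for an IFA step; works with any fitted model
    (RF, DNN, LR) exposed as a row-wise predict callable.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between driver j and the target
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            total_increase += mse(y, [predict(row) for row in X_perm]) - baseline
        importances.append(total_increase / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy data: the target depends on the first driver only.
    rng = random.Random(42)
    X = [[rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)] for _ in range(200)]
    y = [3.0 * row[0] for row in X]
    fitted = lambda row: 3.0 * row[0]  # placeholder for a trained model
    print(permutation_importance(fitted, X, y))
```

A driver whose shuffling leaves the error unchanged receives zero importance; in the toy run the second driver scores exactly 0.0, while the first driver, which carries all the signal, scores well above zero.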
LR performed at a satisfactory level, comparable to RF and DNN, only when drivers linearly correlated with the observed variable were used, and in those cases only a limited fraction of the variability was explained. Important drivers that are non-linearly related to the water quality variables of interest, however, yielded a significant gain when DNN was used. Among the variables investigated, water temperature and dissolved oxygen were modeled accurately with either RF or DNN, and sufficient accuracy was already achieved with the minimum information available, represented here by the Julian day of the measurements, which embodies seasonality. The other variables were instead influenced more evenly by the complete set of drivers, as shown by the IFA for both DNN and RF, and both a geogenic origin and anthropogenic disturbances were confirmed for the chemical contaminants. The proposed analysis, combining ML algorithms with the IFA of the drivers, can be applied to predict the spatial and temporal variability of contaminant concentrations and physical parameters, and to identify the external forcings with the greatest impact on the dynamics of water quality variables.
The second part of the thesis investigated the use of the DNN algorithm to gap-fill time series of daily flow rate and daily water temperature measured at different sites downstream of the Careser glacier, in the Pejo valley (northeastern Italy). By reconstructing the 1976-2019 time series of the flow rate measured at a gauging station downstream of the glacier, an in-depth analysis was carried out of the streamflow response to alterations in the hydrological regime of the glacier. The water temperature time series, in turn, were correlated with macroinvertebrate population statistics over the same period at four sites along the Careser stream, from the glacier to the reservoir immediately downstream of the Careser Baia gauging station.
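The gap-filling step for the Careser series follows a simple pattern: fit a model on the days where the target is observed, then predict the missing days from drivers that are always available, such as the Julian day and air temperature. The sketch below illustrates only that pattern. For brevity it swaps the DNN for an ordinary least-squares fit; the seasonal basis (sine and cosine of the Julian day), the helper names `features`, `solve` and `gap_fill`, and the convention of marking missing days with `None` are all assumptions, not the thesis setup.

```python
import math
from typing import List, Optional, Sequence


def features(julian_day: int, air_temp: float) -> List[float]:
    """Intercept, assumed annual harmonics of the Julian day, and air temperature."""
    angle = 2.0 * math.pi * julian_day / 365.25
    return [1.0, math.sin(angle), math.cos(angle), air_temp]


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gap_fill(
    days: Sequence[int],
    air_temp: Sequence[float],
    water_temp: Sequence[Optional[float]],
) -> List[float]:
    """Fit least squares on observed days, then predict the days marked None."""
    rows = [features(d, t) for d, t in zip(days, air_temp)]
    observed = [i for i, w in enumerate(water_temp) if w is not None]
    k = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, built from observed days only.
    xtx = [[sum(rows[i][a] * rows[i][c] for i in observed) for c in range(k)] for a in range(k)]
    xty = [sum(rows[i][a] * water_temp[i] for i in observed) for a in range(k)]
    beta = solve(xtx, xty)
    # Keep observations as they are; fill only the gaps with model predictions.
    return [
        w if w is not None else sum(coef * f for coef, f in zip(beta, rows[i]))
        for i, w in enumerate(water_temp)
    ]
```

Observed values are returned unchanged and only the `None` entries are replaced. A DNN would take the place of `features` and `solve`, but the fit-on-observed, predict-on-missing structure stays the same.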
Water temperature was first modeled using only the Julian day and air temperature; precipitation, reconstructed flow rate, and evapotranspiration were then introduced for a sensitivity analysis of the features. Driven by air temperature projections, the DNN water temperature model was also used to simulate future scenarios up to 2050 under different emission pathways. DNN proved to be a reliable tool for gap-filling the observational time series, even for series with many gaps. The reconstructed water temperature series made it possible to estimate the delay between air and water temperature warming, as well as its effect on the invertebrate species in the glacier streams. The feature sensitivity analysis was again key in highlighting the contributions of the available forcings, revealing the combined effect of rising air temperature and declining flow rate on the increase in water temperature. The in-depth analysis of the flow rate revealed, besides a dramatic reduction in streamflow, an earlier summer peak and a negligible influence of precipitation on these alterations.
Lastly, a framework for a hybrid ML-PB model of contaminant transport in groundwater was presented. In this procedure, the contaminant concentrations at several sampling locations were associated with physical parameters characterizing the aquifer. In a synthetic case, a DNN model was employed to predict the physical parameters, and a simplified PB equation was used to project the concentration into the future. The analysis demonstrated the ability of DNN to predict physical parameters by exploiting the information contained in the available concentration measurements.
The thesis is organized into seven chapters. Chapter 1 presents a broad overview of Machine Learning and its specific applications in water sciences, followed by the motivations and objectives of this research. Chapter 2 clarifies the main basic concepts of Machine Learning, laying the groundwork for the subsequent chapters in which ML is applied to surface and subsurface hydrology. Chapter 3 covers the Machine Learning and statistical algorithms employed for modeling in this research. Chapter 4 presents and discusses the Adige catchment case study. Chapter 5 presents the time series gap-filling procedure for the Careser case study, for both variables investigated. Chapter 6 presents the results of the hybrid Machine Learning/Physics-Based groundwater model applied to synthetic data. Finally, Chapter 7 summarizes the remarks and conclusions and outlines perspectives for future work on these applications.
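The hybrid ML-PB framework of Chapter 6 couples a data-driven estimate of aquifer parameters with a simplified transport equation. The sketch below shows only the physically based half, under a loudly stated assumption: the abstract does not name the simplified equation, so we use the leading term of the classical Ogata-Banks solution of the one-dimensional advection-dispersion equation for a continuous source, C(x, t) = (C0 / 2) erfc((x - v t) / (2 sqrt(D t))), with seepage velocity v and dispersion coefficient D. The values of `v_hat` and `D_hat` are hypothetical placeholders for what a DNN would predict, and the function names are illustrative.

```python
import math
from typing import List


def concentration(x: float, t: float, v: float, D: float, c0: float = 1.0) -> float:
    """Leading term of the Ogata-Banks solution: continuous source c0 at x = 0."""
    if t <= 0.0:
        return 0.0  # no solute has entered the domain yet
    return 0.5 * c0 * math.erfc((x - v * t) / (2.0 * math.sqrt(D * t)))


def project(x: float, times: List[float], v: float, D: float, c0: float = 1.0) -> List[float]:
    """Projected concentrations at sampling location x for future times."""
    return [concentration(x, t, v, D, c0) for t in times]


if __name__ == "__main__":
    # Hypothetical parameters standing in for DNN estimates (m/day and m^2/day).
    v_hat, D_hat = 0.5, 0.1
    print(project(50.0, [50.0, 100.0, 150.0, 200.0], v_hat, D_hat))
```

With these placeholder values the front centre reaches x = 50 m at t = 100 days, where the sketch returns exactly half the source concentration; earlier times give nearly zero and later times approach the source concentration.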

Machine Learning applications in Hydrology / Zanoni, Maria Grazia. - (2023 Jul 24), pp. 1-171. [10.15168/11572_384010]

Defended on: 24-Jul-2023
Cycle: XXXV
Academic year: 2022-2023
Department: Civil, Environmental and Mechanical Engineering (29/10/12-)
Doctoral programme: Civil, Environmental and Mechanical Engineering
Unitn internal supervisor: Bellin, Alberto; Majone, Bruno
External co-supervisor: de Barros, Felipe P.J.
Bi-nationally supervised doctoral thesis: no
Country of the institution (for bi-nationally supervised PhD theses or other international collaborations): United States of America
DOI: https://dx.doi.org/10.15168/11572_384010
Language: English
Item type: 08.1 Doctoral Thesis

Files in this item:

File	Size	Format
phd_unitn_Zanoni_MariaGrazia.pdf (Open Access from 24/07/2025; Description: Doctoral thesis - Maria Grazia Zanoni; Type: Doctoral Thesis; License: All rights reserved)	77.27 MB	Adobe PDF