yProv4ML: Effortless provenance tracking for machine learning systems / Padovani, G.; Anantharaj, V.; Fiore, S. - In: SOFTWAREX. - ISSN 2352-7110. - 31 (September 2025), 102298. [10.1016/j.softx.2025.102298]
yProv4ML: Effortless provenance tracking for machine learning systems
Padovani, G. (first author); Fiore, S.
2025-01-01
Abstract
The rapid growth of interest in deep learning, and in foundation models (FMs) in particular, has attracted a diverse range of researchers, thanks to the generalization ability of these models. However, the advent of these techniques has also exposed a lack of transparency and rigor in how models are developed. In particular, the number of epochs and other hyperparameters cannot be determined in advance, which makes it difficult to identify the best model. Machine learning frameworks such as MLflow address this challenge by automating the collection of this type of information. However, these tools capture data in proprietary formats and pay little attention to lineage. This paper proposes yProv4ML, a framework that captures the provenance information generated during machine learning processes in PROV-JSON format, with minimal code modification.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| 1-s2.0-S235271102500264X-main.pdf | Open access | Publisher's version (Publisher's layout) | Creative Commons | 1.33 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



