Implementation and Evaluation of a MLaaS for Document Classification with Continuous Deep Learning Models / Walter-Tscharf, Franz Frederik Walter Viktor. - Electronic. - (2022), pp. 229-239. [10.1007/978-3-031-11232-4_20]
Implementation and Evaluation of a MLaaS for Document Classification with Continuous Deep Learning Models
Walter-Tscharf, Franz Frederik Walter Viktor
2022-01-01
Abstract
This paper presents an approach to a continuous training pipeline for enhancing deep learning models and assesses its feasibility through an evaluation. The purpose of this research is to analyze how a continuously learning neural network algorithm affects the quality of document classification when user feedback is taken into account. The hypothesis is that user feedback through active learning increases precision and thus makes document classification more efficient. To this end, the available technologies are identified by means of a utility analysis, and the necessary ones are selected to design a software concept. TensorFlow is used as the deep learning framework, Tesseract as the OCR engine, and Apache Airflow for life cycle management and for orchestrating the elements of the continuous training pipeline. The resulting machine learning as a service (MLaaS) prototype makes it possible to explore the synergistic effect between active learning, in the form of user feedback, and the quality of document classification achieved by deep learning. In an experiment, the implemented service is used to analyze the model's behavior in three different states. These states comprise synthetic data and active learning in the form of user feedback, supplied through data from data augmentation and through simulated realistic data. The results show that models enhanced by active learning achieve a higher accuracy than models trained on artificially generated data. The evaluation experiment confirms the hypothesis that continuously learning models with user feedback generalize better in document classification. In conclusion, the paper demonstrates the technical requirements for implementing machine learning as a service and affirms that active learning can be integrated into existing industrial systems.
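The feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it uses no TensorFlow, Tesseract, or Apache Airflow, and the toy word-count classifier, the function names (`retrain_with_feedback`, `accuracy`), and all sample documents are hypothetical stand-ins chosen only to show how user-corrected labels might be folded back into training.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for cleaned OCR output."""
    return text.lower().split()


class CentroidClassifier:
    """Toy bag-of-words classifier (hypothetical, not the paper's model)."""

    def __init__(self):
        self.samples = []  # (text, label) pairs the model was fitted on
        self.centroids = defaultdict(Counter)

    def fit(self, samples):
        # Refit from scratch: sum the word counts of all documents per label.
        self.samples = list(samples)
        self.centroids = defaultdict(Counter)
        for text, label in self.samples:
            self.centroids[label].update(tokenize(text))
        return self

    def predict(self, text):
        tokens = tokenize(text)

        def score(label):
            counts = self.centroids[label]
            total = sum(counts.values()) or 1
            # Share of the label's word mass that matches the document.
            return sum(counts[t] for t in tokens) / total

        # Sorting the labels makes tie-breaking deterministic.
        return max(sorted(self.centroids), key=score)


def retrain_with_feedback(model, feedback):
    """Add user-corrected (text, label) pairs and refit the model."""
    return model.fit(model.samples + list(feedback))


def accuracy(model, labelled):
    hits = sum(model.predict(text) == label for text, label in labelled)
    return hits / len(labelled)


# Hypothetical data: an initial training set, user feedback, and a held-out set.
seed = [
    ("invoice total amount due", "invoice"),
    ("dear sir kind regards", "letter"),
]
feedback = [
    ("payment reminder amount overdue", "invoice"),
    ("regards from your landlord about the lease", "letter"),
]
held_out = [
    ("reminder payment overdue", "invoice"),
    ("lease letter from landlord", "letter"),
]

model = CentroidClassifier().fit(seed)
accuracy_before = accuracy(model, held_out)

retrain_with_feedback(model, feedback)
accuracy_after = accuracy(model, held_out)

print(f"accuracy before feedback: {accuracy_before:.2f}, after: {accuracy_after:.2f}")
```

In a pipeline like the one the abstract outlines, the refit step would presumably run as a scheduled or triggered task rather than inline; the sketch only shows the data flow from feedback to retrained model.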
| File | Type | License | Size | Format |
|---|---|---|---|---|
| 978-3-031-11232-4.pdf (archive managers only) | Publisher's version (publisher's layout) | All rights reserved | 1.39 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



