The availability of systems able to process and analyse large amounts of data has boosted scientific advances in several fields. Workflows provide an effective tool to define and manage large sets of processing tasks. In the big data analytics area, the Ophidia project provides a cross-domain big data analytics framework for the analysis of scientific, multi-dimensional datasets. The framework exploits a server-side, declarative, parallel approach for data analysis and mining. It also features a complete workflow management system to support the execution of complex scientific data analyses, schedule task submission, manage operator dependencies and monitor job execution. The workflow management engine allows users to perform a coordinated execution of multiple data analytics operators (both single and massive, i.e. parameter sweep) in an effective manner. For the definition of big data analytics workflows, a JSON schema has been designed and implemented. To aid the definition of the workflows, a visual design language consisting of several symbols, named Data Analytics Workflow Modelling Language (DAWML), has also been defined.
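As a rough illustration of what a JSON-style workflow with operator dependencies and a parameter sweep involves, the following minimal sketch orders tasks by their dependencies and expands one task over a list of values. It is not the Ophidia JSON schema: the field names (`tasks`, `operator`, `dependencies`, `arguments`), the operator names, and the helper functions are all hypothetical and chosen only for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow description. The field and operator names are
# illustrative assumptions and do not reproduce the Ophidia JSON schema.
workflow = {
    "name": "example_workflow",
    "tasks": [
        {"name": "import", "operator": "import_data", "dependencies": []},
        {"name": "reduce", "operator": "reduce_data", "dependencies": ["import"]},
        {"name": "export", "operator": "export_data", "dependencies": ["reduce"]},
    ],
}


def execution_order(wf):
    """Return task names in an order that respects the declared dependencies."""
    graph = {task["name"]: set(task["dependencies"]) for task in wf["tasks"]}
    # static_order() raises graphlib.CycleError if the dependencies are cyclic.
    return list(TopologicalSorter(graph).static_order())


def expand_sweep(task, parameter, values):
    """Expand one task into several, one per value of the swept parameter."""
    return [
        {**task, "name": f"{task['name']}_{i}", "arguments": {parameter: value}}
        for i, value in enumerate(values)
    ]


if __name__ == "__main__":
    print(execution_order(workflow))
    for swept in expand_sweep(workflow["tasks"][1], "operation", ["max", "min"]):
        print(swept["name"], swept["arguments"])
```

A real workflow engine would also submit the tasks, track their status, and run independent tasks in parallel; the sketch covers only dependency ordering and sweep expansion.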

A workflow-enabled big data analytics software stack for escience / Palazzo, C.; Mariello, A.; Fiore, S.; D'Anca, A.; Elia, D.; Williams, D. N.; Aloisio, G. - (2015), pp. 545-552. (Paper presented at the 13th International Conference on High Performance Computing and Simulation, HPCS 2015, held in Amsterdam, the Netherlands, in 2015) [10.1109/HPCSim.2015.7237088].

A workflow-enabled big data analytics software stack for escience

Mariello A.; Fiore S.
2015-01-01

2015
Proceedings of the 2015 International Conference on High Performance Computing and Simulation, HPCS 2015
Piscataway (New Jersey)
Institute of Electrical and Electronics Engineers Inc.
978-1-4673-7812-3
978-1-4673-7813-0
Palazzo, C.; Mariello, A.; Fiore, S.; D'Anca, A.; Elia, D.; Williams, D. N.; Aloisio, G.


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/331708