Generation of Action Recognition Training Data Through Rotoscoping and Augmentation of Synthetic Animations / Covre, N.; Nunnari, F.; Fornaser, A.; De Cecco, M. - 11614:(2019), pp. 23-42. (Paper presented at the 6th International Conference on Augmented Reality, Virtual Reality and Computer Graphics, SALENTO AVR 2019, held in Italy in 2019) [10.1007/978-3-030-25999-0_3].
Generation of Action Recognition Training Data Through Rotoscoping and Augmentation of Synthetic Animations
Covre N.;Fornaser A.;De Cecco M.
2019-01-01
Abstract
In this paper, we present a method to synthetically generate the training material needed by machine learning algorithms to perform human action recognition from 2D videos. As a baseline pipeline, we consider a 2D video stream passing through a skeleton extractor (OpenPose), whose 2D joint coordinates are analyzed by a random forest. Such a pipeline is trained and tested using real live videos. As an alternative approach, we propose to train the random forest using automatically generated 3D synthetic videos. For each action, given a single reference live video, we edit a 3D animation (in Blender) using the rotoscoping technique. This prior animation is then used to produce a full training set of synthetic videos via perturbation of the original animation curves. Our tests, performed on live videos, show that our alternative pipeline leads to comparable accuracy, with the advantage of drastically reducing both the human effort and the computing power needed to produce the live training material.
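The augmentation step, in which one rotoscoped animation is turned into many synthetic variants by perturbing its animation curves, can be illustrated with a minimal sketch. The abstract does not specify the perturbation model, so everything below is an assumption made for illustration: the curve representation (lists of `(frame, value)` keyframes), the Gaussian noise on keyframe values, the single random speed change per variant, and the names `perturb_curve`, `augment_animation`, `value_sigma`, and `time_scale_range` are all hypothetical. It uses only the Python standard library and does not call Blender.

```python
import random


def perturb_curve(keyframes, time_scale, value_sigma, rng):
    """Return a perturbed copy of one animation curve.

    keyframes: list of (frame, value) pairs sorted by frame.
    time_scale: factor stretching or compressing the timing (assumed model).
    value_sigma: std-dev of Gaussian noise added to each value (assumed model).
    """
    first_frame = keyframes[0][0]
    return [
        (first_frame + (frame - first_frame) * time_scale,
         value + rng.gauss(0.0, value_sigma))
        for frame, value in keyframes
    ]


def augment_animation(curves, n_variants, value_sigma=0.05,
                      time_scale_range=(0.9, 1.1), seed=0):
    """Generate n_variants perturbed copies of {curve_name: keyframes}.

    One time scale is drawn per variant so that all curves of the same
    variant stay synchronized; values are perturbed independently.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        time_scale = rng.uniform(*time_scale_range)
        variants.append({
            name: perturb_curve(keyframes, time_scale, value_sigma, rng)
            for name, keyframes in curves.items()
        })
    return variants


# Hypothetical reference animation: two joint-rotation curves (radians).
reference = {
    "elbow_rotation": [(0, 0.0), (12, 0.8), (24, 0.0)],
    "wrist_rotation": [(0, 0.1), (12, 0.4), (24, 0.1)],
}
variants = augment_animation(reference, n_variants=5, seed=42)
```

In a pipeline like the one described, each variant would presumably be rendered to a synthetic video and passed through the skeleton extractor to obtain 2D joint coordinates for training; that part is not sketched here.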