Exploring Multi-Descriptor Disentangled Representations of Acoustic Instrument Notes / Dal Rì, Francesco; Giudici, Gregorio Andrea; Turchet, Luca; Conci, Nicola. - (2025), pp. 416-420. (EUSIPCO, Palermo, IT, 8th-12th September 2025) [10.23919/eusipco63237.2025.11226170].
Exploring Multi-Descriptor Disentangled Representations of Acoustic Instrument Notes
Francesco Ardan Dal Ri;Gregorio Andrea Giudici;Luca Turchet;Nicola Conci
2025-01-01
Abstract
A comprehensive representation is key to improving controllability in generative neural networks. We present an approach for learning disentangled latent representations of individual instrumental notes, leveraging a Variational Autoencoder-based architecture designed to operate on spectrograms and explicitly capture four musical descriptors: timbre, pitch, dynamics, and duration. To achieve a structured and interpretable latent space, we exploit a combination of Gaussian Mixture priors, adversarial training, and auxiliary supervised clustering, promoting both compactness and semantic coherence in the learned representations while preserving the ability to reconstruct the original spectrograms accurately. Experimental results and latent space explorations on the TinySol dataset demonstrate the effectiveness of the proposed approach, which outperforms baseline models and existing methods on key metrics of reconstruction quality and classification accuracy.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Exploring_Multi-Descriptor_Disentangled_Representations_of_Acoustic_Instrument_Notes.pdf | Archive managers only | Publisher's layout (Versione editoriale) | All rights reserved | 9.27 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



