Comprehensive representation is key for improving controllability in generative neural networks. We present an approach for learning disentangled latent representations of individual instrumental notes, leveraging a Variational Autoencoder-based architecture designed to operate on spectrograms and explicitly capture four musical descriptors: timbre, pitch, dynamics, and duration. To achieve a structured and interpretable latent space, we exploit a combination of Gaussian Mixture priors, adversarial training, and auxiliary supervised clustering, promoting both compactness and semantic coherence in the learned representations while preserving the ability to accurately reconstruct the original spectrograms. Experimental results and latent space explorations on the TinySol dataset show the effectiveness of the proposed approach, which outperforms baseline models and existing methods in key metrics of reconstruction quality and classification accuracy.

Exploring Multi-Descriptor Disentangled Representations of Acoustic Instrument Notes / Dal Rì, Francesco; Giudici, Gregorio Andrea; Turchet, Luca; Conci, Nicola. - (2025), pp. 416-420. (EUSIPCO, Palermo, IT, 8th-12th September 2025) [10.23919/eusipco63237.2025.11226170].

Exploring Multi-Descriptor Disentangled Representations of Acoustic Instrument Notes

Francesco Ardan Dal Rì; Gregorio Andrea Giudici; Luca Turchet; Nicola Conci
2025-01-01

2025
2025 33rd European Signal Processing Conference (EUSIPCO)
New York City, New York, USA
IEEE
978-9-4645-9362-4
Dal Rì, Francesco; Giudici, Gregorio Andrea; Turchet, Luca; Conci, Nicola
Files in this item:
Exploring_Multi-Descriptor_Disentangled_Representations_of_Acoustic_Instrument_Notes.pdf

Access restricted to archive managers

Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 9.27 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/473834