WEISS: Wasserstein efficient sampling strategy for LLMs in drug design / Tedoldi, R., Li, J., Engkvist, O., Passerini, A., Westerlund, A.M., Tibo, A. - In: MACHINE LEARNING: SCIENCE AND TECHNOLOGY. - ISSN 2632-2153. - 6:2(2025). [10.1088/2632-2153/addc33]

WEISS: Wasserstein efficient sampling strategy for LLMs in drug design

Passerini A.;
2025-01-01

Abstract

Autoregressive models have gained popularity in drug design because they can efficiently sample novel molecules from a vast chemical space. Efficiently sampling novel and diverse molecules is crucial for downstream tasks such as reinforcement learning, which aim to identify novel molecules with predefined desired properties. Existing sampling strategies such as multinomial sampling and beam search often suffer from mode collapse or are computationally inefficient, respectively. To address these limitations, we introduce WEISS (Wasserstein efficient sampling strategy), a framework that enables autoregressive models to efficiently sample diverse molecules. Our approach, which draws inspiration from the Wasserstein autoencoder, is compatible with any encoder–decoder-based autoregressive model. We show that WEISS effectively mitigates mode collapse while maintaining a token sampling speed 25 times faster than beam search. We also demonstrate the efficacy of the proposed method on several drug design tasks, such as molecular property optimization and single-step retrosynthesis prediction.
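
The two baseline strategies named in the abstract can be illustrated with a minimal sketch. This is not WEISS and not the authors' code: the four-token vocabulary, the `NEXT_PROBS` table, and the function names are hypothetical, and the toy "model" conditions only on the previous token. It is meant only to show why repeated multinomial draws tend to return the same high-probability strings, and why beam search, which returns distinct sequences, has to score `beam_width * len(VOCAB)` expansions at every step.

```python
import math
import random

# Hypothetical toy "model": next-token probabilities depend only on the previous token.
VOCAB = ["C", "N", "O", "<eos>"]
NEXT_PROBS = {
    "<bos>": [0.7, 0.2, 0.1, 0.0],
    "C": [0.5, 0.2, 0.1, 0.2],
    "N": [0.6, 0.1, 0.1, 0.2],
    "O": [0.6, 0.1, 0.0, 0.3],
}


def multinomial_sample(rng, max_len=6):
    """Draw one sequence by sampling each token from the next-token distribution."""
    seq, prev = [], "<bos>"
    for _ in range(max_len):
        tok = rng.choices(VOCAB, weights=NEXT_PROBS[prev], k=1)[0]
        if tok == "<eos>":
            break
        seq.append(tok)
        prev = tok
    return "".join(seq)


def beam_search(beam_width, max_len=6):
    """Return the beam_width highest-scoring sequences kept by beam search."""
    beams = [(0.0, [], False)]  # (log-probability, tokens, finished)
    for _ in range(max_len):
        candidates = []
        for logp, toks, done in beams:
            if done:
                # Finished hypotheses are carried over unchanged.
                candidates.append((logp, toks, True))
                continue
            prev = toks[-1] if toks else "<bos>"
            for tok, p in zip(VOCAB, NEXT_PROBS[prev]):
                if p == 0.0:
                    continue
                if tok == "<eos>":
                    candidates.append((logp + math.log(p), toks, True))
                else:
                    candidates.append((logp + math.log(p), toks + [tok], False))
        # Keep only the best beam_width hypotheses after scoring all expansions.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return ["".join(toks) for _, toks, _ in beams]


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [multinomial_sample(rng) for _ in range(200)]
    print("multinomial: unique sequences out of 200 draws:", len(set(samples)))
    print("beam search (width 5):", beam_search(5))
```

In this toy setting the multinomial draws are cheap but contain many repeats, while the beam search output is duplicate-free at a higher per-step cost.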
Tedoldi, R.; Li, J.; Engkvist, O.; Passerini, A.; Westerlund, A. M.; Tibo, A.
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/472611