K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback of K-means is its inability to identify non-linearly separable clusters, which may lead to inaccurate solutions in certain cases. Kernel K-means is a variant of classical K-means that can find non-linearly separable clusters. However, it scales quadratically with respect to the size of the dataset, taking several minutes to cluster even mediumsized datasets on traditional CPU-based machines.In this paper, we present a formulation of Kernel K-means using sparse-dense matrix multiplication (SpMM) and sparse matrix-vector multiplication (SpMV), and we show that our formulation enables the rapid implementation of a fast GPU-based version of Kernel K-means with little programming effort. Our implementation, named Popcorn, is the first opensource GPU-based implementation of Kernel K-means.Popcorn achieves a speedup of up to 123.8x over a CPU implementation of Kernel K-means and a speedup of up to 2.6x over a GPU implementation of Kernel K-means that does not use sparse matrix computations. Our results support the effectiveness of sparse matrices as tools for efficient parallel programming.

K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback of K-means is its inability to identify non-linearly separable clusters, which may lead to inaccurate solutions in certain cases. Kernel K-means is a variant of classical K-means that can find non-linearly separable clusters. However, it scales quadratically with respect to the size of the dataset, taking several minutes to cluster even medium-sized datasets on traditional CPU-based machines. In this paper, we present a formulation of Kernel K-means using sparse-dense matrix multiplication (SpMM) and sparse matrix-vector multiplication (SpMV), and we show that our formulation enables the rapid implementation of a fast GPU-based version of Kernel K-means with little programming effort. Our implementation, named Popcorn, is the first open-source GPU-based implementation of Kernel K-means. Popcorn achieves a speedup of up to 123.8× over a CPU implementation of Kernel K-means and a speedup of up to 2.6× over a GPU implementation of Kernel K-means that does not use sparse matrix computations. Our results support the effectiveness of sparse matrices as tools for efficient parallel programming.

Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra / Bellavita, J.; Pasquali, T.; Del Rio Martin, L.; Vella, F.; Guidi, G.. - (2025), pp. 426-440. ( 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025 usa 2025) [10.1145/3710848.3710887].

Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra

Pasquali T.
Co-primo
;
Del Rio Martin L.;Vella F.
Co-ultimo
;
2025-01-01

Abstract

K-means is a popular clustering algorithm with significant applications in numerous scientific and engineering areas. One drawback of K-means is its inability to identify non-linearly separable clusters, which may lead to inaccurate solutions in certain cases. Kernel K-means is a variant of classical K-means that can find non-linearly separable clusters. However, it scales quadratically with respect to the size of the dataset, taking several minutes to cluster even mediumsized datasets on traditional CPU-based machines.In this paper, we present a formulation of Kernel K-means using sparse-dense matrix multiplication (SpMM) and sparse matrix-vector multiplication (SpMV), and we show that our formulation enables the rapid implementation of a fast GPU-based version of Kernel K-means with little programming effort. Our implementation, named Popcorn, is the first opensource GPU-based implementation of Kernel K-means.Popcorn achieves a speedup of up to 123.8x over a CPU implementation of Kernel K-means and a speedup of up to 2.6x over a GPU implementation of Kernel K-means that does not use sparse matrix computations. Our results support the effectiveness of sparse matrices as tools for efficient parallel programming.
2025
Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES
Association for Computing Machinery
Settore INF/01 - Informatica
Bellavita, J.; Pasquali, T.; Del Rio Martin, L.; Vella, F.; Guidi, G.
Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra / Bellavita, J.; Pasquali, T.; Del Rio Martin, L.; Vella, F.; Guidi, G.. - (2025), pp. 426-440. ( 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025 usa 2025) [10.1145/3710848.3710887].
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/473910
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 1
  • OpenAlex 1
social impact