Clustering of cell populations in flow cytometry data using a combination of Gaussian mixtures

We propose a supervised learning approach to the automatic quantification of cell populations in flow cytometric samples. One sample contains up to millions of measurement vectors with a dimensionality between 10 and 20. Normally, each measurement vector corresponds to a single cell in the biological sample. Identifying biologically meaningful cell populations is essentially a clustering problem; however, standard clustering methods are impractical because the size, shape and location of corresponding clusters may vary strongly between samples, mainly due to phenotypic differences and inter-laboratory variations. In our holistic approach, we implicitly employ structural information (such as the relative locations and shapes of sub-populations). A new input sample is reconstructed as a linear combination of artificial reference samples, each represented by a Gaussian Mixture Model (GMM) in which the class label of the cluster of observations corresponding to each Gaussian component is known. The reference samples are calculated from a larger set of training samples by non-negative matrix factorization and can be regarded as the basis of a lower-dimensional feature space in which input samples are reconstructed. We show a method for calculating the feature space transformation based on minimizing the L2 distance defined between two GMMs. The feature space representation of the sample is then used to assign each observation to one of the specified sub-populations by a Bayes decision. We present classification results on a database of about 170 patients with Acute Lymphoblastic Leukemia (ALL), where high accuracy in the prediction of relatively small leukemic populations is crucial. The approach is not limited to our application: it can be employed wherever large, multi-dimensional, numerical data from a specific class of samples with related structure have to be analyzed.
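Two ingredients named in the abstract, the L2 distance between two GMMs and the Bayes decision over labelled components, can be illustrated with a short sketch. This is not the authors' implementation: it assumes diagonal covariances for brevity (the abstract does not state the covariance structure), represents a mixture as a list of hypothetical `(weight, mean, variance)` triples, and uses made-up function names and class labels. It relies only on the standard fact that the integral of a product of two Gaussian densities is itself a Gaussian density evaluated at the difference of the means.

```python
import math

def gauss_overlap(m1, v1, m2, v2):
    """Integral of the product of two diagonal-covariance Gaussian densities.

    In closed form this equals N(m1; m2, v1 + v2), a product over dimensions.
    """
    result = 1.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        s = va + vb  # the variances add under the integral
        result *= math.exp(-0.5 * (a - b) ** 2 / s) / math.sqrt(2.0 * math.pi * s)
    return result

def cross_term(f, g):
    """Integral of f(x) * g(x) for mixtures given as (weight, mean, var) triples."""
    return sum(wf * wg * gauss_overlap(mf, vf, mg, vg)
               for wf, mf, vf in f
               for wg, mg, vg in g)

def l2_distance_sq(f, g):
    """Squared L2 distance between two mixtures: <f,f> - 2<f,g> + <g,g>."""
    return cross_term(f, f) - 2.0 * cross_term(f, g) + cross_term(g, g)

def density(x, mean, var):
    """Density of a diagonal-covariance Gaussian at the point x."""
    result = 1.0
    for xi, mi, vi in zip(x, mean, var):
        result *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2.0 * math.pi * vi)
    return result

def bayes_assign(x, mixture, labels):
    """Return the class label with the largest posterior for observation x.

    Components that share a label are summed before taking the maximum.
    """
    scores = {}
    for (w, m, v), label in zip(mixture, labels):
        scores[label] = scores.get(label, 0.0) + w * density(x, m, v)
    return max(scores, key=scores.get)

# Hypothetical two-component mixtures in two dimensions.
reference = [(0.7, [0.0, 0.0], [1.0, 1.0]), (0.3, [4.0, 4.0], [0.5, 0.5])]
shifted = [(0.7, [0.5, 0.0], [1.0, 1.0]), (0.3, [4.0, 4.0], [0.5, 0.5])]
labels = ["normal", "blast"]  # illustrative class names, one per component

distance = l2_distance_sq(reference, shifted)
label = bayes_assign([3.8, 4.1], reference, labels)
```

Because the distance is available in closed form, it can be evaluated without sampling; this property is what makes it usable as an objective when fitting the combination weights of reference mixtures, although the fitting step itself is not shown here.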

Clustering of cell populations in flow cytometry data using a combination of Gaussian mixtures / Reiter, M., Rota, P., Kleber, F., Diem, M., Groeneveld-Krentz, S., Dworzak, M.. - In: PATTERN RECOGNITION. - ISSN 0031-3203. - 60:(2016), pp. 1029-1040. [10.1016/j.patcog.2016.04.004]

Reiter M.; Rota P.; Kleber F.; Diem M.; Groeneveld-Krentz S.; Dworzak M.

2016-01-01

Date of publication: 2016
Journal title: PATTERN RECOGNITION
DOI: https://dx.doi.org/10.1016/j.patcog.2016.04.004
Scopus identifier: 2-s2.0-84971234016
WOS identifier: WOS:000383525600081
Publication type: 03.1 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/251848
