Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many areas of application require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component; therefore, the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally non-trivial due to the difficulties in justifying the exact number of components to be used and due to the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and smaller false non-discovery rate. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.
Bayesian analysis for mixtures of discrete distributions with a non-parametric component / Alhaji, B. B.; Dai, H.; Hayashi, Y.; Vinciotti, V.; Harrison, A.; Lausen, B.. - In: JOURNAL OF APPLIED STATISTICS. - ISSN 0266-4763. - 43:8(2016), pp. 1369-1385. [10.1080/02664763.2015.1100594]
Bayesian analysis for mixtures of discrete distributions with a non-parametric component
Vinciotti V.;
2016-01-01
Abstract
Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many areas of application require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component; therefore, the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally non-trivial due to the difficulties in justifying the exact number of components to be used and due to the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and smaller false non-discovery rate. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione