A Design-of-Experiments-Based Approach for Efficient Estimation of Bimodal Gaussian Mixture Weights / Leal, G. S.; Bessegato, L. F.; Xavier, Y. S. M.; Melgani, F.; Balestrassi, P. P. - In: IEEE ACCESS. - ISSN 2169-3536. - 13:(2025), pp. 168322-168334. [10.1109/ACCESS.2025.3614023]
A Design-of-Experiments-Based Approach for Efficient Estimation of Bimodal Gaussian Mixture Weights
Melgani F.
2025-01-01
Abstract
Normal mixture models are widely used to represent data arising from latent subpopulations. We propose a Design-of-Experiments (DOE) and Response Surface Methodology (RSM) framework to estimate the weights of a bimodal Gaussian mixture when component families are known. The procedure is non-iterative: rather than alternating Expectation-Maximization (EM) steps, it performs a two-stage method (fit a quadratic response surface to the sample log-likelihood over the weight simplex, then solve one constrained optimization), followed by a final Maximum Likelihood re-estimation of means and variances. This yields predictable runtime (driven by design size) and reduced sensitivity to initialization. The pipeline 1) uses k-medians to obtain preliminary component parameters and 99% confidence intervals (CIs) for component proportions; 2) builds a simplex-lattice mixture design within those CI bounds; 3) fits a quadratic response surface to the log-likelihood; and 4) optimizes this surface under sum-to-one constraints. We validate the method in 27 Monte Carlo scenarios (n = 100, 500, 1000; low/medium/high differentiation; and three weight settings). Under medium/high separation, it attains likelihoods comparable to EM while achieving more favorable BIC in multiple scenarios and indistinguishable AIC in many, whereas EM is preferable under low separation. Two real data sets, Old Faithful (Waiting variable) and Photovoltaic Energy (Production variable), further confirm applicability, with lower AIC/BIC in Old Faithful and lower BIC in PV; clustering agreement is high (κ ≈ 0.99 to 1.00). Overall, DOE-RSM offers a simple, interpretable, and often more parsimonious method, and constitutes a non-iterative alternative for mixture-weight estimation.
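
The four pipeline steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`two_medians`, `rsm_weight`), the Wald form of the 99% CI, the seven-point design, and the Scheffé quadratic model are all assumptions made for illustration, and the final Maximum Likelihood re-estimation of means and variances is omitted.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def log_likelihood(x, w1, mu, sigma):
    # Sample log-likelihood of a two-component Gaussian mixture with weights (w1, 1 - w1).
    dens = w1 * normal_pdf(x, mu[0], sigma[0]) + (1 - w1) * normal_pdf(x, mu[1], sigma[1])
    return np.sum(np.log(dens))

def two_medians(x, n_iter=50):
    # Hypothetical helper: 1-D k-medians with k = 2, started at the quartiles.
    c = np.percentile(x, [25, 75])
    for _ in range(n_iter):
        labels = (np.abs(x - c[1]) < np.abs(x - c[0])).astype(int)
        new_c = np.array([np.median(x[labels == k]) for k in (0, 1)])
        if np.allclose(new_c, c):
            break
        c = new_c
    return labels

def rsm_weight(x, n_design=7, z=2.576):
    x = np.asarray(x, dtype=float)
    # Step 1: preliminary parameters and a 99% CI for the first proportion
    # (Wald interval assumed here; the source does not state the CI type).
    labels = two_medians(x)
    mu = np.array([x[labels == k].mean() for k in (0, 1)])
    sigma = np.array([x[labels == k].std(ddof=1) for k in (0, 1)])
    p = np.mean(labels == 0)
    half = z * np.sqrt(p * (1 - p) / x.size)
    lo, hi = max(p - half, 1e-3), min(p + half, 1 - 1e-3)
    # Step 2: equally spaced design points for w1 inside the CI (w2 = 1 - w1),
    # standing in for a two-component simplex-lattice design.
    w1 = np.linspace(lo, hi, n_design)
    y = np.array([log_likelihood(x, w, mu, sigma) for w in w1])
    # Step 3: Scheffe quadratic surface y = b1*w1 + b2*w2 + b12*w1*w2 by least squares.
    X = np.column_stack([w1, 1 - w1, w1 * (1 - w1)])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Step 4: maximize the fitted surface on [lo, hi]; the sum-to-one
    # constraint is built in through w2 = 1 - w1.
    def surface(w):
        return b[0] * w + b[1] * (1 - w) + b[2] * w * (1 - w)
    candidates = [lo, hi]
    if b[2] > 0:  # fitted surface is concave in w1, so a stationary maximum exists
        candidates.append(float(np.clip((b[0] - b[1] + b[2]) / (2 * b[2]), lo, hi)))
    w_hat = max(candidates, key=surface)
    return float(w_hat), mu, sigma
```

With two components the weight simplex is one-dimensional, so the constrained optimization in this sketch reduces to comparing the fitted surface at the interval endpoints and at its stationary point; the runtime is set by `n_design` log-likelihood evaluations rather than by a convergence criterion.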



