
Feature Selection Algorithm based on PDF/PMF Area Difference / Danilov, V. V.; Skirnevskiy, I. P.; Manakov, R. A.; Gerget, O. M.; Melgani, F.. - In: BIOMEDICAL SIGNAL PROCESSING AND CONTROL. - ISSN 1746-8094. - 57:(2020), pp. 10168101-10168115. [10.1016/j.bspc.2019.101681]

Feature Selection Algorithm based on PDF/PMF Area Difference

F. Melgani
2020-01-01

Abstract

Purpose: Feature selection has been studied extensively in recent years, and important results have been obtained. It nevertheless remains a complex problem, because its outcomes must be reliable and accurate under different constraints. At present, no feature selection algorithm consistently outperforms all others. Our research focuses on developing a feature selection algorithm that searches for the most reliable features, and on comparing it with other well-known feature selection techniques. Although we applied the proposed algorithm to a specific case in our study (a catheter detection task), it is general-purpose and can be adapted to find the relevant features of the region of interest in other tasks.
Methods: The proposed algorithm is based on the difference between probability density functions (PDFs) or probability mass functions (PMFs). We introduced a score value to assess how stable a given feature is. The score value measures the degree of intersection of the areas under the density/mass distribution functions. For catheter detection, we described the region of interest with a 20-feature set divided into several groups: morphometric, statistical, intensity-based, textural and geometric. To evaluate the proposed algorithm, we compared its results with those of state-of-the-art feature selection techniques. The feature sets selected by the different algorithms were used to train and test a linear SVM classifier. As an additional estimator, we used a cascade-forward backpropagation (CFB) neural network.
Results: The proposed Area Difference Feature Selection (ADFS) algorithm obtained the following accuracy on the intracardiac catheter dataset: 78.7 ± 3.2 % for a 3-feature subset, 82.3 ± 2.8 % for a 6-feature subset and 86.7 ± 1.5 % for a 12-feature subset. By accuracy, ADFS ranked among the three best algorithms. Additionally, we tested ADFS on the breast cancer dataset from the UCI machine learning repository and observed that it outperformed the other compared algorithms on the 3-feature and 12-feature subsets. On the 6-feature subset, ADFS was inferior in accuracy only to MRMR and MCFS, by approximately 1.8 % and 0.3 %, respectively. In the processing time assessment, the most time-consuming algorithms were FSCM, MRMR, UDFS, and RFFS. The fastest algorithms were INFS, CFS, and ADFS, with processing times of 2 ± 4, 2 ± 5 and 6 ± 16 ms, respectively.
Conclusion: Testing and comparison against the other feature selection algorithms show that the proposed algorithm is accurate and effective for various tasks, including medical imaging and visualization.
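The abstract does not give the exact ADFS formula, so the following is only a minimal sketch of one plausible reading of the Methods: each feature is scored by how little its two class-conditional distributions intersect. The function names (`histogram`, `area_difference_score`, `rank_features`), the equal-width binning, and the definition of the score as one minus the overlap area are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> List[float]:
    """Normalized histogram (a discrete PMF estimate) over [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp the index so that v == hi falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def area_difference_score(pos: Sequence[float], neg: Sequence[float], bins: int = 10) -> float:
    """Assumed score: 1 minus the overlap of the two class-conditional PMFs.

    0.0 means the two distributions coincide; 1.0 means they do not intersect.
    """
    lo = min(min(pos), min(neg))
    hi = max(max(pos), max(neg))
    if hi == lo:  # constant feature: the classes are indistinguishable
        return 0.0
    p = histogram(pos, lo, hi, bins)
    q = histogram(neg, lo, hi, bins)
    overlap = sum(min(a, b) for a, b in zip(p, q))
    return 1.0 - overlap


def rank_features(
    pos_rows: Sequence[Sequence[float]],
    neg_rows: Sequence[Sequence[float]],
    bins: int = 10,
) -> Tuple[List[int], List[float]]:
    """Rank feature indices from most to least separable under the assumed score."""
    n_features = len(pos_rows[0])
    scores = [
        area_difference_score([r[j] for r in pos_rows], [r[j] for r in neg_rows], bins)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

Under this reading, a k-feature subset such as the 3-, 6- and 12-feature subsets mentioned in the Results would simply be the first k indices of the returned ranking. The number of bins is a free parameter here, and a kernel density estimate could replace the histogram for continuous features.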
2020
Danilov, V. V.; Skirnevskiy, I. P.; Manakov, R. A.; Gerget, O. M.; Melgani, F.
File in this record:
BSPC-2020-Catheter-Feature Selection.pdf (Adobe PDF, 3.44 MB)
Access: archive managers only
Type: Publisher's version (publisher's layout)
License: All rights reserved

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/287559