One difficulty with applying machine learning to information extraction is the high cost of collecting labeled examples. Active learning can make more efficient use of the annotator's time by asking them to label only the instances that are most useful to the learner. In the random sampling approach, unlabeled data is selected for annotation at random and thus may fail to yield the desired results. In contrast, active learning selects the most useful data for the classifier from a large pool of unlabeled data. The selection strategies in common use often assign corpus tokens (or data points) to the wrong classes: the classifier is confused between two categories when a token lies near the margin. We develop a method for solving this problem and show that it improves performance. Our approach is based on a supervised machine learner, the Conditional Random Field (CRF). The proposed approach is applied to the problem of named entity extraction in the biomedical domain. Results show that the proposed active learning based technique indeed improves the performance of the system.
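The idea of a token lying "near the margin" can be illustrated with a generic margin-based uncertainty sampling sketch. This is not the method proposed in the paper, whose selection criterion the abstract does not spell out. It assumes that a trained sequence labeler (for example a CRF) already supplies, for every token, a dictionary of label probabilities. The function names `token_margin` and `select_for_annotation`, the label names, and the probability values are all hypothetical.

```python
from typing import Dict, List

# Assumption: each sentence is a list of per-token label distributions,
# e.g. the marginal probabilities produced by some trained CRF.
Sentence = List[Dict[str, float]]


def token_margin(marginals: Dict[str, float]) -> float:
    """Gap between the two most probable labels for one token.

    A small gap means the classifier is torn between two categories.
    """
    top = sorted(marginals.values(), reverse=True)
    if not top:
        return 1.0  # no information: treat as maximally confident
    if len(top) == 1:
        return top[0]
    return top[0] - top[1]


def select_for_annotation(pool: List[Sentence], k: int) -> List[int]:
    """Return indices of the k sentences whose least certain token
    has the smallest margin (illustrative selection rule only)."""
    scored = [
        (min(token_margin(tok) for tok in sent), idx)
        for idx, sent in enumerate(pool)
        if sent  # skip empty sentences
    ]
    scored.sort()
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    # Hypothetical pool of three unlabeled sentences.
    pool = [
        [{"B-GENE": 0.9, "O": 0.1}],
        [{"B-GENE": 0.55, "O": 0.45}, {"O": 0.99, "B-GENE": 0.01}],
        [{"B-GENE": 0.6, "O": 0.4}],
    ]
    print(select_for_annotation(pool, 2))
```

In contrast with random sampling, sentences selected this way are those the current model finds hardest to decide, which is the usual motivation for sending them to the annotator first.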
Title: | Active learning technique for biomedical named entity extraction |
Authors: | S., Saha; A., Ekbal; M., Verma; U., Sikdar; Poesio, Massimo |
Unitn authors: | |
Title of the volume containing the paper: | International Conference on Advances in Computing, Communications and Informatics, ICACCI |
Place of publication: | United States |
Publisher: | ACM |
Year of publication: | 2012 |
Scopus identifier: | 2-s2.0-84866133469 |
ISBN: | 9781450311960 |
Handle: | http://hdl.handle.net/11572/99722 |
Appears in types: | 04.1 Paper in proceedings |