Rayid Ghani, Rosie Jones, Tom Mitchell and Ellen Riloff
20th International Conference on Machine Learning (ICML 2003), August 21-24, 2003, Washington, DC.
Abstract: Active learning seeks to make efficient use of a labeler's time by asking for labels based on the anticipated value of that label to the learner.
We consider active learning approaches for information extraction problems where each example is described by two distinct sets of features, either of which is sufficient to approximate the function; that is, they fit the co-training problem setting. We discuss a range of active learning algorithms and show that using feature set disagreement to select examples for active learning leads to improvements in extraction performance regardless of the choice of initially labeled examples. The result is an active learning approach to multiple view feature sets in general, and noun phrase extraction in particular, that significantly reduces training effort and compensates for errors in initially labeled data.
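The idea of selecting examples by feature set disagreement can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes each example is a (noun phrase, context) pair, and that each view has its own classifier that returns the probability that an example is positive. Disagreement is measured here as the absolute difference between the two views' probabilities, which is just one simple choice among many. The function name `select_by_disagreement` and the toy probabilities are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

Example = TypeVar("Example")


def select_by_disagreement(
    unlabeled: Sequence[Example],
    view1_score: Callable[[Example], float],
    view2_score: Callable[[Example], float],
    k: int,
) -> List[Example]:
    """Return the k unlabeled examples on which the two views disagree most.

    Hypothetical sketch: each score function stands in for a classifier
    trained on one feature set only, and disagreement is the absolute
    difference of their positive-class probabilities.
    """
    ranked = sorted(
        unlabeled,
        key=lambda x: abs(view1_score(x) - view2_score(x)),
        reverse=True,  # largest disagreement first
    )
    return list(ranked[:k])


# Toy, made-up per-view probabilities that an example names a location.
np_prob = {"Jordan": 0.8, "the company": 0.1, "Paris": 0.95, "Springfield": 0.6}
ctx_prob = {"<X> scored": 0.1, "<X> announced": 0.2,
            "traveled to <X>": 0.9, "lives in <X>": 0.95}

pool = [
    ("Jordan", "<X> scored"),          # noun phrase says location, context says person
    ("the company", "<X> announced"),  # both views agree: not a location
    ("Paris", "traveled to <X>"),      # both views agree: location
    ("Springfield", "lives in <X>"),   # noun-phrase view unsure, context confident
]

picked = select_by_disagreement(
    pool, lambda e: np_prob[e[0]], lambda e: ctx_prob[e[1]], k=2
)
print(picked)
```

In this toy pool the two examples sent to the labeler are the ambiguous "Jordan" and "Springfield" cases. On the other two examples the views already agree, so a label for them would add little new information.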