Active Learning For Information Extraction With Multiple View Feature Sets
Rayid Ghani, Rosie Jones, Tom Mitchell and Ellen Riloff
20th International Conference on Machine Learning (ICML 2003), August 21-24, 2003, Washington, DC.
Abstract: A major problem with machine
learning approaches to information extraction is the high cost of collecting
labeled examples. Active learning seeks to make efficient use of a labeler's
time by asking for labels based on the anticipated value of that label to the
learner.
We consider active learning approaches for information
extraction problems where each example is described by two distinct sets of
features, either of which is sufficient to approximate the target function; that is,
they fit the cotraining problem setting. We discuss a range of active learning
algorithms and show that using feature set disagreement to select examples for
active learning leads to improvements in extraction performance regardless of
the choice of initially labeled examples. The result is an active learning
approach to multiple view feature sets in general, and noun phrase extraction
in particular, that significantly reduces training effort and compensates for
errors in initially labeled data.
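
As an illustration of selecting examples by feature set disagreement, here is a minimal sketch. It is not the algorithm from the paper. It assumes each view has its own scorer returning an estimated probability that an example is positive, and it measures disagreement as the absolute difference between the two scores. The function name, the scorers, the toy scores, and the choice of a noun phrase and its context as the two views are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# An example is described by two feature sets (views). Treating the views as
# (noun phrase, surrounding context) is an assumption made for illustration.
Example = Tuple[str, str]
# A scorer maps one view of an example to an estimated P(positive).
Scorer = Callable[[str], float]


def select_by_disagreement(
    pool: Sequence[Example],
    score_view1: Scorer,
    score_view2: Scorer,
    k: int = 1,
) -> List[Example]:
    """Return the k unlabeled examples on which the two views disagree most.

    Disagreement is taken to be |p1 - p2|, one simple choice among many.
    Ties are broken by position in the pool.
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: (-abs(score_view1(pool[i][0]) - score_view2(pool[i][1])), i),
    )
    return [pool[i] for i in ranked[:k]]


# Toy scores standing in for two classifiers trained on the labeled data
# (made-up numbers, not results from the paper).
VIEW1_SCORES = {"the company": 0.1, "red": 0.5, "Paris": 0.9}
VIEW2_SCORES = {"headquartered in": 0.9, "offices in": 0.6, "traveled to": 0.85}

POOL: List[Example] = [
    ("the company", "headquartered in"),
    ("red", "offices in"),
    ("Paris", "traveled to"),
]

# Examples to send to the human labeler next, most contested first.
queries = select_by_disagreement(
    POOL,
    lambda phrase: VIEW1_SCORES[phrase],
    lambda context: VIEW2_SCORES[context],
    k=2,
)
```

In a full active learning loop, the selected examples would be labeled, added to the training set, and both view classifiers retrained before the next round of selection.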
Download the full report [PDF, 248K]