ICML 2003 Workshop


The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining

The 20th International Conference on Machine Learning (ICML 2003) will be held in Washington, DC, August 21-24, 2003. It will be co-located with the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2003) and the Conference on Learning Theory (COLT 03).

Important Dates
Papers Due: May 5
Notification: May 25
Final Version Due: June 26
Workshop Date: August 21

Organizers:
Rayid Ghani
Accenture Technology Labs

Rosie Jones
Overture Services

Chuck Rosenberg
Carnegie Mellon University

Workshop Description
There is a spectrum of ways to use data in machine learning and data mining. At one end is completely unsupervised learning or clustering, and at the other is supervised learning, where the target output is known for every example. This workshop aims to explore the space between these two extremes. Techniques that have been proposed include learning from unlabeled data with hints, learning from unlabeled and positive-only labeled data, learning from distantly and noisily labeled data, combining labeled and unlabeled data with cotraining, EM and other semi-supervised techniques, and transductive learning, where the test data is added as an additional source of unlabeled data. The possible sources of labels and hints are also broad: systematic hand-labeling, labels acquired through active learning, and hints derived from domain knowledge are among those that may be used.
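As a purely illustrative aside, one point in the middle of this spectrum can be sketched in a few lines of Python. The sketch below shows self-training, a simple semi-supervised scheme: a classifier trained on a handful of labeled examples labels the unlabeled examples it is most confident about, adds them to its training set, and repeats. The nearest-centroid classifier, the one-dimensional data, and the names self_train and min_margin are assumptions made for this example; they are not taken from the workshop or from any particular submission, and the code assumes at least two classes are present in the labeled data.

```python
from statistics import mean


def centroids(labeled):
    """Return {label: mean feature value} for 1-D labeled points."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(cents, x):
    """Return (label, margin): the nearest centroid's label and how much
    closer it is than the runner-up (assumes at least two classes)."""
    ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - cents[second]) - abs(x - cents[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and retrain.

    Returns the final centroids and the points that were never labeled.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            y, margin = predict(cents, x)
            if margin >= min_margin:
                confident.append((x, y))   # trust this pseudo-label
            else:
                remaining.append(x)        # too ambiguous, keep unlabeled
        if not confident:
            break                          # nothing new was learned
        labeled.extend(confident)
        pool = remaining
    return centroids(labeled), pool


# Two hand-labeled examples plus a larger pool of unlabeled ones.
labeled = [(0.0, "neg"), (10.0, "pos")]
unlabeled = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 5.0]
model, leftover = self_train(labeled, unlabeled)
```

In this toy run the point 5.0 sits exactly between the two classes, so it is never pseudo-labeled and is returned in leftover. Raising or lowering min_margin moves the procedure along the continuum: a very high value ignores the unlabeled data entirely, while a value of zero accepts every pseudo-label, including wrong ones.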

The goal of this workshop is to bring together researchers from different fields to talk about their different perspectives on this intersection and to share their latest ideas. We see the workshop as a venue not only for the presentation of papers focusing on exploiting unlabeled data, but also as a forum for sharing ideas across different application domains. In particular, it is an opportunity for discussion of techniques that are applicable to multiple types of datasets, and of experiments across many points in the continuum from unsupervised to supervised learning. The use of domain knowledge as a source of partial supervision, and the generation of examples to be labeled by domain experts through active learning, are of particular significance in the data mining context. We are also interested in promoting discussion to develop diagnostic techniques that can inform the user whether unlabeled data is helping or hurting the performance of the underlying learner.

We see this as a unique opportunity due to the co-location of ICML with KDD. With this workshop co-located with KDD, we aim to attract researchers from both academia and industry who are involved in data mining. For many data mining problems, large amounts of data have been collected and the labels are either not known or are expensive to obtain. Examples include security applications (intrusion detection, anomaly detection), CRM (customer interactions, transactional data, call center applications), the financial industry (fraud detection, loan defaults, banking), targeted marketing, and retail applications (supply chain optimization). Most of these applications capture large amounts of unlabeled data but rarely use it. We encourage the participation of people working on practical applications where some form of unlabeled data can be beneficial.

Workshop Format
The workshop will consist of both regular paper presentations and debates.

Regular Papers
Papers addressing novel types of data, methods of diagnosing when unlabeled data will help and when it will hinder, and the application of techniques across multiple application domains and multiple levels of supervision are particularly encouraged. Papers discussing the acquisition of labels from real-world experts in real-world data mining problems are also encouraged. Data mining practitioners working on real-world problems with large amounts of captured or stored data but a high-cost labeling process are encouraged to submit problem descriptions and possible solutions.

Regular papers can be up to eight pages, and may address work in progress. Papers should be in the format required for ICML submissions.

Problem Descriptions from Machine Learning/Data Mining Practitioners
We solicit papers, one to two pages in length, describing a problem domain you have encountered or dealt with where training data and/or labels are very expensive or hard to obtain. The paper should present a problem statement, give background on the domain, and list the sources and amount of available training data. We hope these papers will encourage participation from people working on practical applications where unlabeled data can potentially be valuable but is not currently utilized. We hope to devote a session in the workshop to discussing these problems and brainstorming possible solutions and ways to use unlabeled data for the problems posed in these papers.

Debate Position Papers
Position papers, one to two pages in length, on either side of the following topics are solicited.

Accepted papers will be published in the workshop proceedings, and authors will be expected to debate their position. Topics not on this list are also acceptable, if you can coherently argue both sides, or can encourage a colleague to submit the opposing position.

Schedule
To be decided later

Organizers
Rayid Ghani
Accenture Technology Labs, 161 N. Clark St, Chicago, IL 60601
+1 (312) 693-6653

Rosie Jones
Overture Services, 74 N. Pasadena Ave 3F, Pasadena, CA 91107
rosie.jones@overture.com
+1 (626) 229-8536

Chuck Rosenberg
Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
chuck@cs.cmu.edu
+1 (412) 268-8078

Program Committee
Kristin Bennett, Rensselaer Polytechnic Institute
Mark Craven, University of Wisconsin
Zoubin Ghahramani, Gatsby Computational Neuroscience Unit, UCL
Sally Goldman, Washington University, St. Louis
Tony Jebara, Columbia University
Thorsten Joachims, Cornell University
Stefan Kremer, University of Guelph
Bing Liu, National University of Singapore
Andrew McCallum, University of Massachusetts
Ray Mooney, University of Texas, Austin
Ion Muslea, University of California, Irvine
Kamal Nigam, IntelliSeek
Ellen Riloff, University of Utah
Dale Schuurmans, University of Waterloo
Martin Szummer, Microsoft Research, Cambridge
Sarah Zelikovitz, City University of New York
Tong Zhang, IBM Research, Yorktown Heights
