-
Attribute Extraction from Product Descriptions (with Andrew Fano, Marko Krema and Katharina Probst): We are developing tools and algorithms for extracting product attributes and values, such as size, material and other specifications, from product descriptions. The tools are intended as enablers for a variety of applications including assortment planning, brand management and catalog mapping.
Our methodology includes machine learning techniques for labeled and unlabeled data, natural language processing techniques and an active learning feedback loop that interactively refines the inferred hypotheses.
-
Individual Consumer Modeling (with Andrew Fano, Marko Krema and Katharina Probst): This project uses customer purchase data from retailers to create individual consumer models that detect and predict customers' shopping behavior. These consumer models enable the retailer to provide customers with individual and personalized interactions as they navigate through the retail store.
Instead of using traditional personalization approaches, such as clustering or segmentation, we learn separate classifiers and statistical models for each customer using historical transactional data only from that customer. This allows us to make fine-grained, accurate predictions about an individual customer during a shopping trip. The research challenges include learning and evaluation metrics with a very large number of categories, noisy data sources and concept drift over time. See the papers on Predicting Shopping Lists at KDD 2004 and Intelligent Shopping Assistants at IUI 2005.
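As a rough illustration of the per-customer idea, the sketch below fits one very simple statistical model per customer, using only that customer's own trips, and predicts a shopping list from it. It is not the method from the cited papers: the frequency-based model, the function names (fit_customer_models, predict_shopping_list), the 0.5 threshold and the toy data are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def fit_customer_models(transactions):
    """Fit one frequency model per customer (illustrative assumption).

    transactions: iterable of (customer_id, basket) pairs, one per trip,
    where basket is a set of product categories.
    Returns {customer_id: {category: fraction of that customer's trips
    in which the category was bought}}.
    """
    trips = Counter()
    counts = defaultdict(Counter)
    for customer_id, basket in transactions:
        trips[customer_id] += 1
        counts[customer_id].update(set(basket))
    return {
        cid: {cat: n / trips[cid] for cat, n in cat_counts.items()}
        for cid, cat_counts in counts.items()
    }

def predict_shopping_list(models, customer_id, threshold=0.5):
    """Predict the categories this customer buys on at least `threshold` of trips."""
    model = models.get(customer_id, {})
    return sorted(cat for cat, p in model.items() if p >= threshold)

# Hypothetical purchase history: each customer is modeled in isolation.
history = [
    ("alice", {"milk", "bread"}),
    ("alice", {"milk", "eggs"}),
    ("alice", {"milk", "bread", "coffee"}),
    ("bob", {"coffee"}),
    ("bob", {"coffee", "bread"}),
]
models = fit_customer_models(history)
alice_list = predict_shopping_list(models, "alice")
bob_list = predict_shopping_list(models, "bob")
```

No information is shared between customers here, which is the contrast with clustering or segmentation drawn above; a real system would replace the frequency table with a learned classifier per customer.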
-
Intelligent Promotion Planning (with Andrew Fano, Marko Krema and Katharina Probst): Intelligent Promotion Planning is a prototype that enables retailers and product manufacturers to industrialize the use of individual consumer models in offering personalized promotions. It offers an environment to visualize and interact with past data about product promotions as well as tools to explore and implement future promotions.
It includes an optimization capability, which selects optimal promotion parameters based on high-level goals set by the retailer, and a simulation environment in which many what-if scenarios can be considered and the results of a potential promotion evaluated before implementation.
-
Price Prediction and Insurance for Online Auctions (with Hillery Simmons): Online auctions are producing very fine-grained data about online transactions that lends itself to a variety of applications and services that can be provided to both buyers and sellers in online marketplaces. We have developed machine learning techniques to use data from online auctions to predict various attributes of the auction, including the probable end prices of online auction items.
This capability enables a new service that we call Auction Price Insurance. We define Price Insurance as the capability to offer insurance to auction sellers that guarantees a price for their goods, for an appropriate premium. If the item sells for less than the insured price, the seller is reimbursed for the difference. Our prototype can be used to automatically analyze historical data about auctions and, when given a new auction listing, can offer price insurance to the seller.
The insurance premium and the price at which the item is insured are determined automatically and are fine-tuned to the characteristics of the seller, the item up for auction and the features of the auction environment. In addition to offering price insurance, the price prediction tool can also be used to offer a variety of services, such as helping sellers write "optimal" auction item descriptions and optimizing auction parameters (starting time, duration, starting price, reserve option, etc.).
The research challenges include learning classifiers with limited training data and skewed class distributions in the presence of dynamic environments. See paper at ECML 2004 Workshop for details.
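The reimbursement rule described above can be written down directly. The sketch below only encodes the payout rule as stated (the seller is reimbursed for the difference when the item sells below the insured price); the function names and the example numbers are hypothetical, and how the premium and insured price are actually chosen is not modeled.

```python
def insurance_payout(insured_price, sale_price):
    """Reimbursement owed to the seller: the shortfall below the insured price, if any."""
    return max(0.0, insured_price - sale_price)

def seller_net_proceeds(insured_price, sale_price, premium):
    """What the seller ends up with after the sale, the payout and the premium."""
    return sale_price + insurance_payout(insured_price, sale_price) - premium

# Hypothetical listing insured at 100.0 for a premium of 5.0.
low_sale = seller_net_proceeds(insured_price=100.0, sale_price=80.0, premium=5.0)
high_sale = seller_net_proceeds(insured_price=100.0, sale_price=120.0, premium=5.0)
```

With insurance, the seller's proceeds never fall below the insured price minus the premium, while any upside above the insured price is kept.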
-
Bayesian Reasoning from Large, Redundant and Multimodal Sensor Networks (with Anatole Gershman, Valery Petrushin and Gang Wei): As various forms of sensor networks proliferate throughout physical environments around us, the need to reason and make inferences from these redundant, but noisy information sources becomes critical in many domains.
This project defines a Bayesian framework that uses noisy but redundant data from multiple sensor streams and combines it with the contextual and domain knowledge provided both by the physical constraints of the local environment where the sensors are located and by the people involved in the surveillance tasks.
We are currently applying the Bayesian framework to the people localization problem in indoor environments using a sensor network that consists of video cameras, infrared ID badges and fingerprint scanners. See paper at AAAI Spring Symposium 2005 for more details.
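To make the idea of fusing redundant, noisy sensors concrete, here is a minimal sketch of a discrete Bayesian update over candidate locations. It is not the framework from the paper: it assumes a small fixed set of locations, a prior that stands in for contextual knowledge, and sensor readings that are conditionally independent given the true location. The function name, location names and probabilities are all made up for illustration.

```python
def fuse_sensor_readings(prior, likelihoods):
    """Combine a prior over locations with per-sensor likelihoods via Bayes' rule.

    prior: {location: P(location)}, e.g. encoding contextual knowledge.
    likelihoods: list of {location: P(observed reading | location)}, one per sensor.
    Assumes sensors are conditionally independent given the location.
    """
    unnormalized = dict(prior)
    for likelihood in likelihoods:
        for location in unnormalized:
            unnormalized[location] *= likelihood.get(location, 0.0)
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("readings are inconsistent with every location")
    return {location: p / total for location, p in unnormalized.items()}

# Hypothetical example: where is a person, given a camera and a badge reading?
prior = {"office": 0.5, "hallway": 0.3, "lab": 0.2}
camera = {"office": 0.6, "hallway": 0.3, "lab": 0.1}
badge = {"office": 0.7, "hallway": 0.2, "lab": 0.1}
posterior = fuse_sensor_readings(prior, [camera, badge])
most_likely = max(posterior, key=posterior.get)
```

Each additional sensor that agrees sharpens the posterior, which is why redundancy helps even when every individual sensor is noisy.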
-
Product Profiler: Applying text learning techniques and combining labeled and unlabeled data to infer semantic attributes from product descriptions. The inferred attributes can then be used to enhance current product databases and improve the effectiveness of existing data mining algorithms. See papers at ICDM 2002 and RPEC 2002 for details.
-
Improving Knowledge Worker Productivity - the ACTIVE integrated approach
P. Warren, N. Kings, I. Thurlow, J. Davies, T. Buerger, E. Simperl, C. Ruiz, J. M. Gomez-Perez, V. Ermolayev, R. Ghani, M. Tilly, T. Bösser, A. Imtiaz
BT Technology Journal (2009)
-
ACTIVE - Enabling the Knowledge-Powered Enterprise: Semantic Technology for Knowledge Worker Productivity.
Warren, P., Thurlow, I., Ghani, R., Probst, K., Jentzsch, E., Ermolayev, V.
In Proc 2nd European Semantic Technology Conference (ESTC 2008), Vienna, Austria, Sep. 29 - Oct. 3, 2008
-
Proceedings of KDD Workshop on Data Mining for Business Applications.
Rayid Ghani, Carlos Soares, Francoise Soulie-Fogelman, Editors.
KDD 2008.
-
Maximizing Privacy Under Data Distortion Constraints in Noise Perturbation Methods.
Yaron Rachlin, Katharina Probst, Rayid Ghani.
The Second ACM SIGKDD International Workshop on Privacy, Security, and Trust in KDD held with ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008). 2008
-
Data Mining for Consumer Modeling and Personalized Promotions.
Rayid Ghani, Chad Cumby, Andrew Fano and Marko Krema.
Book Chapter - Data Mining Methods and Applications. Kenneth D. Lawrence, Stephan Kudyba, Ronald K. Klimberg (Eds.). Auerbach Publications. 2008
-
Trade-offs in the Use of Bayesian Filtering for Sensor Fusion.
Anatole Gershman, Rayid Ghani, Damian Roqueiro and Gang Wei.
International Workshop on Knowledge Discovery from Sensor Data (Sensor-KDD'07) – held with ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2007).
-
Towards Interactive Active Learning in MultiView Feature Sets for Information Extraction.
Katharina Probst, Rayid Ghani.
European Conference on Machine Learning (ECML/PKDD 2007).
-
Extracting and using Attribute-Value pairs from product descriptions on the web.
Katharina Probst, Rayid Ghani, Yan Liu, Marko Krema, and Andrew Fano.
Book chapter – Web Mining. 2007
-
Semi-supervised Learning of Attribute-Value Pairs from Product Descriptions
Katharina Probst, Rayid Ghani, Marko Krema, Andy Fano, Yan Liu.
Proceedings of the International Joint Conference on Artificial Intelligence 2007 (IJCAI-07).
-
Data Mining for Business Applications: KDD 2006 Workshop Report.
Rayid Ghani, Carlos Soares.
SIGKDD Explorations December 2006 Vol 8 Issue 2 (2006).
-
Data Mining for Business Applications.
Rayid Ghani and Carlos Soares.
SIGKDD Explorations 2006 Vol 8, Issue 2 (2006).
-
Data Mining for Business Applications.
Rayid Ghani, Carlos Soares, Editors.
Proceedings of KDD Workshop on Data Mining for Business Applications (2006).
-
Text Mining to Extract Product Attributes.
Rayid Ghani, Katharina Probst, Yan Liu, Marko Krema and Andrew Fano.
SIGKDD Explorations (2006).
[Paper (PDF, 99K)]
-
Using Bayesian Reasoning From Sensor Network for Indoor Surveillance.
Valery Petrushin, Gang Wei, Rayid Ghani and Anatole Gershman.
Workshop on Pervasive Technology Applied: Real-World Experiences with RFID and Sensor Networks.
-
Learning from Partially Classified Data.
M. Amini, O. Chapelle, R. Ghani, Editors.
Proceedings of ICML Workshop on Learning from Partially Classified Data (2005).
-
Learning Individual Consumer Models for Personalized Promotions: A Data Mining Case Study.
Chad Cumby, Andrew Fano, Rayid Ghani and Marko Krema.
Workshop on Data Mining for Business — held with the European Conference on Machine Learning (ECML/PKDD 2005).
-
Multiple Sensor Integration for Indoor Surveillance.
Valery Petrushin, Gang Wei, Rayid Ghani and Anatole Gershman.
Multimedia Data Mining Workshop – held with 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2005)
-
Price Prediction and Insurance for Online Auctions
Rayid Ghani
11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2005
Chicago, IL
[Paper (PDF, 70K)]
-
A Bayesian Framework for Robust Reasoning from Sensor Networks
V.A. Petrushin, R. Ghani and A.V. Gershman
2005 AAAI Spring Symposium on AI Technologies for Homeland Security
March 21-23, 2005
Stanford University
[Abstract for A Bayesian Framework for Robust Reasoning from Sensor Networks] [Paper (PDF)]
-
Building Intelligent Shopping Assistant Using Individual Consumer Models
C. Cumby, A. Fano, R. Ghani and M. Krema
Proceedings of the 2005 International Conference on Intelligent User Interfaces
January 9-12, 2005
San Diego, California
[Abstract] [Paper (PDF)]
-
Predicting the End-price of Online Auctions
R. Ghani and H. Simmons
International Workshop on Data Mining and Adaptive Modelling Methods for Economics and Management held in conjunction with the 15th European Conference on Machine Learning (ECML/PKDD 2004)
Pisa, Italy
[Abstract for Predicting the End-price of Online Auctions] [Paper (PDF)]
-
Mining the Web to Add Semantics to Retail Data Mining
R. Ghani
Invited Paper. Web Mining: From Web to Semantic Web.
Springer Lecture Notes in Artificial Intelligence, Vol. 3209. Berendt, B.; Hotho, A.; Mladenic, D.; van Someren, M.; Spiliopoulou, M.; Stumme, G. (Eds.)
2004
[Paper (PDF)]
-
Predicting Customer Shopping Lists from Point-of-sale Purchase Data
Chad Cumby, Andy Fano, Rayid Ghani and Marko Krema
10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2004
Seattle, Washington
[Abstract for Predicting the Customer Shopping Lists from Point-of-sale Purchase Data] [Paper PDF]
-
Building Minority Language Corpora by Learning to Generate Web Search Queries
Rayid Ghani, Rosie Jones and Dunja Mladenic
Journal of Knowledge and Information Systems (KAIS), 2003
-
Active Learning for Information Extraction with Multiple View Feature Sets
Rayid Ghani, Rosie Jones, Tom Mitchell and Ellen Riloff
Workshop on Adaptive Text Extraction & Mining at the European Conference on Machine Learning (ECML 2003), Dubrovnik, Croatia
[Abstract for Active Learning for Information Extraction with Multiple View Feature Sets] [Paper PDF]
-
Combining Labeled and Unlabeled Data for MultiClass Text Categorization
Rayid Ghani
International Conference on Machine Learning (ICML 2002), 8-12 July 2002, Sydney, Australia
-
Using Text Mining to Infer Semantic Attributes for Retail Data Mining
Rayid Ghani and Andrew E. Fano
IEEE International Conference on Data Mining, December 9-12, 2002. Maebashi, Japan
[Abstract for Using Text Mining to Infer Semantic Attributes for Retail Data Mining] [Paper PDF]
-
Building Recommender Systems Using a Knowledge Base of Product Semantics
Rayid Ghani and Andrew Fano
Workshop on Recommendation and Personalization in ECommerce (RPEC 2002) at the Second International Conference on Adaptive Hypermedia and Adaptive Web-based Systems (AH 2002), 28 May 2002, Malaga, Spain
[Abstract for Building Recommender Systems Using a Knowledge Base of Product Semantics] [Paper PDF]
-
Automatic Training Data Collection For Semi-Supervised Learning of Information Extraction Systems
Rayid Ghani and Rosie Jones
Accenture Technology Labs Technical Report (2002)
-
Data Mining on Symbolic Knowledge Extracted from the Web
Rayid Ghani, Rosie Jones, Dunja Mladenic, Kamal Nigam and Sean Slattery
Workshop on Text Mining at the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), 2000
[Abstract for Data Mining on Symbolic Knowledge Extracted from the Web] [Paper PDF]
-
Using the Web to Create Minority Language Corpora
Rayid Ghani, Rosie Jones and Dunja Mladenic
Tenth International Conference on Information and Knowledge Management (CIKM 2001), 2001
[Abstract for Using the Web to Create Minority Language Corpora] [Paper PDF]
-
Online Learning for Query Generation: Finding Documents Matching a Minority Concept on the Web
Rayid Ghani, Rosie Jones and Dunja Mladenic
First Asia Pacific Conference on Web Intelligence, 2000
[Abstract for Online Learning for Query Generation: Finding Documents Matching a Minority Concept on the Web] [Paper PDF]
-
Automatic Web Search Query Generation to Create Minority Language Corpora
Rayid Ghani, Rosie Jones, and Dunja Mladenic
Poster Paper in the Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001)
-
Learning a Monolingual Language Model from a Multilingual Text Database
Rayid Ghani & Rosie Jones
Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM 2000)
-
Automatically Building a Corpus for a Minority Language from the Web
Rosie Jones & Rayid Ghani
Proceedings of the Student Workshop at the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000)