Accenture 161 North Clark St Chicago IL 60601 USA
Research Interests
- Machine Learning & Data Mining: Classification with a large number of categories, individual consumer modeling for retail data mining, probabilistic reasoning with large, redundant sensor networks.
- Semi-Supervised & Active Learning: Combining labeled and unlabeled data, multi-view algorithms (co-training, co-EM), combining multi-view algorithms with active learning.
- Text Learning/Mining: Text classification, information extraction, inferring semantic attributes from free-text for data mining.
- Web Mining: Hypertext classification, Web search query generation for collecting language or topic-specific corpora.
Recent Projects
I'm currently co-organizing a KDD 2006 Workshop on Data Mining for Business Applications.
I am actively involved in the Machine Learning and Data-Mining Group.
Attribute Extraction from Product Descriptions (with Andrew Fano, Marko Krema and Katharina Probst): We are developing tools and algorithms for extracting product attributes and values, such as size, material and other specifications, from product descriptions. The tools are intended to enable a variety of applications, including assortment planning, brand management and catalog mapping. Our methodology includes machine learning techniques for labeled and unlabeled data, natural language processing techniques and an active learning feedback loop that interactively refines the inferred hypotheses.
Individual Consumer Modeling (with Chad Cumby, Andrew Fano and Marko Krema): This project uses customer purchase data from retailers to create individual consumer models that detect and predict customers' shopping behavior. These consumer models enable the retailer to provide customers with individual, personalized interactions as they navigate through the retail store. Instead of using traditional personalization approaches, such as clustering or segmentation, we learn separate classifiers and statistical models for each customer using historical transactional data only from that customer. This allows us to make fine-grained, accurate predictions about a particular customer during a shopping trip. The research challenges include learning and evaluation metrics with a very large number of categories, noisy data sources and concept drift over time. See the papers on Predicting Shopping Lists at KDD 2004 and Intelligent Shopping Assistants at IUI 2005.
Intelligent Promotion Planning (with Chad Cumby, Andrew Fano and Marko Krema): Intelligent Promotion Planning is a prototype that enables retailers and product manufacturers to industrialize the use of individual consumer models in offering personalized promotions. It offers an environment to visualize and interact with past data about product promotions, as well as tools to explore and implement future promotions. It includes an optimization capability, where optimal parameters of the promotion are selected based on high-level goals set by the retailer, and a simulation environment where many what-if scenarios can be considered and the results of a potential promotion evaluated before implementation.
Price Prediction and Insurance for Online Auctions (with Hillery Simmons): Online auctions produce very fine-grained transaction data that lends itself to a variety of applications and services for both buyers and sellers in online marketplaces. We have developed machine learning techniques that use data from online auctions to predict various attributes of an auction, including the probable end prices of auction items. This capability enables a new service that we call Auction Price Insurance. We define Price Insurance as the capability to offer insurance to auction sellers that guarantees a price for their goods, for an appropriate premium. If the item sells for less than the insured price, the seller is reimbursed for the difference. Our prototype automatically analyzes historical data about auctions and, when given a new auction listing, can offer price insurance to the seller. The insurance premium and the price at which the item is insured are determined automatically and are fine-tuned to the characteristics of the seller, the item up for auction and the features of the auction environment. In addition to offering price insurance, the price prediction tool can also be used to offer a variety of services, such as helping sellers write "optimal" auction item descriptions and optimizing auction parameters (starting time, duration, starting price, reserve option, etc.). The research challenges include learning classifiers with limited training data and skewed class distributions in the presence of dynamic environments. See the paper at the ECML 2004 Workshop for details.
Bayesian Reasoning from Large, Redundant and Multimodal Sensor Networks (with Anatole Gershman, Valery Petrushin, and Gang Wei): As various forms of sensor networks proliferate throughout the physical environments around us, the need to reason and make inferences from these redundant but noisy information sources becomes critical in many domains. This project defines a Bayesian framework that uses noisy but redundant data from multiple sensor streams and combines it with the contextual and domain knowledge provided both by the physical constraints of the local environment where the sensors are located and by the people who are involved in the surveillance tasks. We are currently applying the Bayesian framework to the people localization problem in indoor environments using a sensor network that consists of video cameras, infrared ID badges and fingerprint scanners. See the paper at the AAAI Spring Symposium 2005 for more details.
Product Profiler: Applying text learning techniques and combining labeled and unlabeled data to infer semantic attributes from product descriptions. The inferred attributes can then be used to enhance current product databases and improve the effectiveness of existing data mining algorithms. See the papers at ICDM 2002 and RPEC 2002 for details.
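As an illustration of the Price Insurance definition given under Price Prediction and Insurance for Online Auctions, the payout rule (the seller is reimbursed for the difference when the item sells below the insured price) can be sketched in a few lines of Python. This is only a minimal sketch, not the prototype itself: the function names are hypothetical, and treating the premium as a flat amount deducted from the seller's proceeds is an assumption made for the example. How the premium and insured price are actually chosen is not modeled here.

```python
def seller_reimbursement(insured_price: float, sale_price: float) -> float:
    """Shortfall paid to the seller when the item sells below the insured price."""
    # Nothing is paid out when the sale price meets or exceeds the insured price.
    return max(0.0, insured_price - sale_price)


def seller_net_proceeds(insured_price: float, sale_price: float, premium: float) -> float:
    """Sale price plus any reimbursement, minus the premium (assumed flat, paid by the seller)."""
    return sale_price + seller_reimbursement(insured_price, sale_price) - premium


if __name__ == "__main__":
    # Hypothetical listing insured at 100 for a premium of 8.
    for sale_price in (80.0, 120.0):
        print(sale_price, seller_net_proceeds(100.0, sale_price, 8.0))
```

Under these assumptions the seller never nets less than the insured price minus the premium, which is the guarantee the definition describes.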
Publications: Text Mining & Data Mining
- Data Mining for Individual Consumer Models and Personalized Retail Promotions. Rayid Ghani, Chad Cumby, Andrew Fano and Marko Krema. Book Chapter, Data Mining Methods and Applications (2007).
- Trade-offs in the Use of Bayesian Filtering for Sensor Fusion. Anatole Gershman, Rayid Ghani, Damian Roqueiro and Gang Wei. International Workshop on Knowledge Discovery from Sensor Data (Sensor-KDD'07), held with ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2007).
- Data Mining for Business Applications. Rayid Ghani and Carlos Soares. SIGKDD Explorations, Vol. 8, Issue 2 (2006).
- Data Mining for Business Applications. Rayid Ghani, Carlos Soares, Editors. Proceedings of KDD Workshop on Data Mining for Business Applications (2006).
- Text Mining to Extract Product Attributes. Rayid Ghani, Katharina Probst, Yan Liu, Marko Krema and Andrew Fano. SIGKDD Explorations (2006).
- Using Bayesian Reasoning From Sensor Network for Indoor Surveillance. Valery Petrushin, Gang Wei, Rayid Ghani and Anatole Gershman. Workshop on Pervasive Technology Applied: Real-World Experiences with RFID and Sensor Networks.
- Learning from Partially Classified Data. M. Amini, O. Chapelle, R. Ghani, Editors. Proceedings of ICML Workshop on Learning from Partially Classified Data (2005).
- Learning Individual Consumer Models for Personalized Promotions: A Data Mining Case Study. Chad Cumby, Andrew Fano, Rayid Ghani and Marko Krema. Workshop on Data Mining for Business, held with the European Conference on Machine Learning (ECML/PKDD 2005).
- Multiple Sensor Integration for Indoor Surveillance. Valery Petrushin, Gang Wei, Rayid Ghani and Anatole Gershman. Multimedia Data Mining Workshop, held with 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2005).
- Price Prediction and Insurance for Online Auctions. Rayid Ghani. 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2005, Chicago, IL.
- A Bayesian Framework for Robust Reasoning from Sensor Networks. V.A. Petrushin, R. Ghani and A.V. Gershman. 2005 AAAI Spring Symposium on AI Technologies for Homeland Security, March 21-23, 2005, Stanford University.
- Building Intelligent Shopping Assistant Using Individual Consumer Models. C. Cumby, A. Fano, R. Ghani and M. Krema. Proceedings of the 2005 International Conference on Intelligent User Interfaces, January 9-12, 2005, San Diego, California.
- Predicting the End-price of Online Auctions. R. Ghani and H. Simmons. International Workshop on Data Mining and Adaptive Modelling Methods for Economics and Management, held in conjunction with the 15th European Conference on Machine Learning (ECML/PKDD 2004), Pisa, Italy.
- Mining the Web to Add Semantics to Retail Data Mining. R. Ghani. Invited Paper. Web Mining: From Web to Semantic Web. Springer Lecture Notes in Artificial Intelligence, Vol. 3209. Berendt, B.; Hotho, A.; Mladenic, D.; van Someren, M.; Spiliopoulou, M.; Stumme, G. (Eds.), 2004.
- Predicting Customer Shopping Lists from Point-of-sale Purchase Data. Chad Cumby, Andy Fano, Rayid Ghani and Marko Krema. 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2004, Seattle, Washington.
- Active Learning for Information Extraction with Multiple View Feature Sets. Rayid Ghani, Rosie Jones, Tom Mitchell and Ellen Riloff. Workshop on Adaptive Text Extraction & Mining at the European Conference on Machine Learning (ECML 2003), Dubrovnik, Croatia.
- Using Text Mining to Infer Semantic Attributes for Retail Data Mining. Rayid Ghani and Andrew E. Fano. IEEE International Conference on Data Mining, December 9-12, 2002, Maebashi, Japan.
- Building Recommender Systems Using a Knowledge Base of Product Semantics. Rayid Ghani and Andrew Fano. Workshop on Recommendation and Personalization in ECommerce (RPEC 2002) at the Second International Conference on Adaptive Hypermedia and Adaptive Web-based Systems (AH 2002), 28 May 2002, Malaga, Spain.
- Data Mining on Symbolic Knowledge Extracted from the Web. Rayid Ghani, Rosie Jones, Dunja Mladenic, Kamal Nigam and Sean Slattery. Workshop on Text Mining at the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), 2000.
Publications: Semi-Supervised Learning (Labeled and Unlabeled Data)
- Semi-Supervised Learning to Extract Attribute-Value pairs from Product Descriptions on the Web. Katharina Probst, Rayid Ghani, Yan Liu, Marko Krema, and Andrew Fano. Workshop on Web Mining, held with the European Conference on Machine Learning (ECML/PKDD 2006).
- Combining Labeled and Unlabeled Data for MultiClass Text Categorization. Rayid Ghani. International Conference on Machine Learning (ICML 2002), 8-12 July 2002, Sydney, Australia.
- Combining Labeled and Unlabeled Data for Text Classification with a Large Number of Categories. Rayid Ghani. First IEEE International Conference on Data Mining, 2001.
- Analyzing the Effectiveness and Applicability of Co-Training. Kamal Nigam and Rayid Ghani. Ninth International Conference on Information and Knowledge Management (CIKM 2000), 2000.
- Understanding the Behavior of Co-Training. Kamal Nigam & Rayid Ghani. Proceedings of the Workshop on Text Mining at the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000).
Publications: Text & Hypertext Classification
- Hypertext Categorization using Hyperlink Patterns and Meta Data. Rayid Ghani, Sean Slattery and Yiming Yang. 18th International Conference on Machine Learning (ICML 2001), 2001.
- A Study of Approaches for Hypertext Categorization. Yiming Yang, Sean Slattery and Rayid Ghani. Journal of Intelligent Information Systems, Special Issue on Automatic Text Categorization, 2001.
- Using Error-Correcting Codes for Efficient Text Classification with a Large Number of Categories. Rayid Ghani. Masters Thesis. Center for Automated Learning & Discovery, Carnegie Mellon University (2001).
- Using Error-Correcting Codes for Text Classification. Rayid Ghani. 17th International Conference on Machine Learning (ICML 2000), 2000.
Publications: Information Extraction
- Towards Interactive Active Learning in Multi-View Feature Sets for Information Extraction. Katharina Probst and Rayid Ghani. European Conference on Machine Learning (ECML/PKDD 2007).
- Learning to Extract Attributes from Product Descriptions. Katharina Probst, Rayid Ghani, Yan Liu, Marko Krema, and Andrew Fano. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 2007).
- A Comparison of Efficacy and Assumptions of Bootstrapping Algorithms for Training Information Extraction Systems. Rayid Ghani and Rosie Jones (Carnegie Mellon University). Workshop on Linguistic Knowledge Acquisition and Representation: Bootstrapping Annotated Data at the Linguistic Resources and Evaluation Conference (LREC 2002), 27 May 2002, Las Palmas, Spain.
Publications: Web Mining
- Building Minority Language Corpora by Learning to Generate Web Search Queries. Rayid Ghani, Rosie Jones and Dunja Mladenic. Journal of Knowledge and Information Systems (KAIS), 2003.
- Using the Web to Create Minority Language Corpora. Rayid Ghani, Rosie Jones and Dunja Mladenic. Tenth International Conference on Information and Knowledge Management (CIKM 2001), 2001.
- Online Learning for Query Generation: Finding Documents Matching a Minority Concept on the Web. Rayid Ghani, Rosie Jones and Dunja Mladenic. First Asia Pacific Conference on Web Intelligence, 2000.
- Automatic Web Search Query Generation to Create Minority Language Corpora. Rayid Ghani, Rosie Jones, and Dunja Mladenic. Poster Paper in the Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2001).
- Learning a Monolingual Language Model from a Multilingual Text Database. Rayid Ghani & Rosie Jones. Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM 2000).
- Automatically Building a Corpus for a Minority Language from the Web. Rosie Jones & Rayid Ghani. Proceedings of the Student Workshop at the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000).
For a complete list of publications, visit http://www.cs.cmu.edu/~rayid.
Recent Research Community Activities
- Co-organizer, Workshop on Data Mining for Business Applications, held with ACM Conference on Knowledge Discovery & Data Mining (KDD 2006).
- Co-organizer, Workshop on Learning from Partially Classified Data, held with International Conference on Machine Learning, 2005.
- Co-organizer, Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, held with International Conference on Machine Learning, 2003.
- Co-organizer, Workshop on Operational Text Classification, held with Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.
- Invited Speaker: Semantically-Enabled Knowledge Technologies (SEKT) Advisory Board Meeting, 2005.
- Invited Speaker: Workshop on "Learning with Multiple Views" at International Conference on Machine Learning (ICML 2005).
- Member of the Advisory Board for the European Union Project on Semantically Enabled Knowledge Technologies, 2004-2006.
- Invited Speaker: European Web Mining Forum, held with European Conference on Machine Learning & Principles of Data Mining (ECML/PKDD 2003).
Program Committee Member:
- AAAI 2006
- World Wide Web Conference (WWW) 2006
- ACM Conference on Knowledge Discovery & Data Mining (KDD) 2005
- ACM Conference on Research and Development in Information Retrieval (SIGIR) 2004, 2005
- International Conference on Machine Learning (ICML) 2003, 2004, 2006
- Workshop on Learning from Multiple Views, held with International Conference on Machine Learning, 2005
- IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2004)
- Link Discovery Workshop (LinkKDD), held with Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
- Web Mining Workshop (WebKDD), held with Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
- Adaptive Text Extraction & Mining Workshop at ECML 2003
- Text Learning Workshop at the International Conference on Machine Learning (ICML 2002)
- Operational Text Classification Workshop at the ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2002)
- Text Mining Workshop at the IEEE Conference on Data Mining, 2001
Reviewer:
- Machine Learning Journal (MLJ)
- Journal of Artificial Intelligence Research (JAIR)
- Journal of Machine Learning Research (JMLR)
- Journal of Neurocomputing
- ACM Transactions on Information Systems (TOIS)
Education
- M.S., Knowledge Discovery and Data Mining, Carnegie Mellon University
- B.S., Computer Science, Mathematics, University of the South
Personal Interests
Traveling, playing squash, volleyball and digital arts.