E-mail: hanj at cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj
List of Supported Students, Staff, and Collaborators
1. Jiawei Han, PI
2. Xiaoxin Yin, Ph.D. student, Department of Computer Science,
3. Philip S. Yu, manager of the Software Tools and Techniques group,
4. Jiong Yang, Schroeder Assistant Professor, Electrical Engineering and Computer Science Department,
Project Summary
Most of the structured information in the
world is stored in relational databases. Different relations in a database are
interconnected according to the database schema created during
database design, and the linkages between relations indicate semantic
relationships between different objects. The structural information and
linkages in relational databases provide a rich source of information for data
mining. Unfortunately, most data mining techniques today can only be applied to
data stored in single "flat" tables. The scope of this project
includes a variety of tasks in data mining and knowledge discovery from
relational databases. It focuses on discovering structural information and linkages
in databases, and on using such information in tasks such as
classification, clustering, and outlier detection. Our methodology includes
designing efficient and scalable methods for exploring multi-relational data,
and using such methods to discover inherent properties and linkages among such
data.
This study will
contribute to the development of principles and new approaches in knowledge
discovery in multi-relational data, which are of essential importance in a
variety of strategic applications including financial decision support,
customer-relationship analysis, and bioinformatics.
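To make the "flat table" limitation mentioned above concrete, here is a minimal sketch in Python. It assumes a hypothetical two-relation schema (a target relation of customers with a class label, and a linked relation of orders); every name and value is invented for illustration and does not come from the project's data or software. It contrasts naively joining the relations into one table with keeping one row per target tuple and summarizing the linked tuples.

```python
from collections import Counter

# Hypothetical toy schema (assumption): a target relation with a class label
# and a linked relation that references it through customer_id.
customers = [
    {"customer_id": 1, "label": "loyal"},
    {"customer_id": 2, "label": "churned"},
    {"customer_id": 3, "label": "churned"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 40.0},
    {"order_id": 11, "customer_id": 1, "amount": 25.0},
    {"order_id": 12, "customer_id": 1, "amount": 60.0},
    {"order_id": 13, "customer_id": 2, "amount": 15.0},
]


def flatten_by_join(customers, orders):
    """Inner-join the two relations into one flat table (one row per order)."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**by_id[o["customer_id"]], **o}
        for o in orders
        if o["customer_id"] in by_id
    ]


def aggregate_per_customer(customers, orders):
    """Keep one row per target tuple and summarize its linked orders."""
    rows = []
    for c in customers:
        linked = [o["amount"] for o in orders if o["customer_id"] == c["customer_id"]]
        rows.append({**c, "n_orders": len(linked), "total_amount": sum(linked)})
    return rows


flat = flatten_by_join(customers, orders)
per_customer = aggregate_per_customer(customers, orders)

# Compare the class distribution in the target relation with the joined table.
original_counts = Counter(c["label"] for c in customers)
flat_counts = Counter(r["label"] for r in flat)
print("target relation:", dict(original_counts))
print("joined table:   ", dict(flat_counts))
```

In this toy data the join repeats the customer with three orders three times and drops the customer with no orders, so the class distribution of the flat table no longer matches the target relation. The per-customer summary avoids that particular distortion, but it is only one simple alternative and should not be read as the approach taken in the publications listed below.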
Publications and Products
1. X. Li, J. Han, X. Yin, and D. Xin,
Mining Evolving
Customer-Product Relationships in Multi-Dimensional Space, Proc. 2005 Int.
Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005.
2. X. Yan, X. J. Zhou, J. Han,
Mining Closed
Relational Graphs with Connectivity Constraints, Proc. 2005 Int. Conf. on
Data Engineering (ICDE'05), Tokyo, Japan, April 2005.
3. X. Yin, J. Han, J. Yang, and P. S. Yu, CrossMine: Efficient Classification across Multiple
Database Relations, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04),
4. X. Yin and J. Han, CPAR:
Classification based on Predictive Association Rules, Proc. 2003 SIAM Int. Conf. on Data Mining (SDM'03),
Project Impact
1. Research Progress: A set of new algorithms and methods (as well as software packages) has been developed for mining multi-relational databases. Many of these methods can be used by industry and other agencies.
2. Education: Parts of this research are used in a Data Mining graduate course taught at the
3. Collaborations: For this project we have established a cooperation with
Current and Future Activities
The following are some of the highlights of our ongoing work. Please refer to the
Publications and Products section for related references.
1. Development of efficient and scalable
multi-relational clustering approaches, based on our work on CrossMine published at ICDE'04.
2. Development of efficient and accurate record linkage
approaches based on multi-relational data.
3. Further development of efficient and accurate methods
for multi-relational classification, based on
our work on CrossMine published at ICDE'04.
Area Background
Multi-relational
data mining is a new topic proposed a few years ago. It is related to Inductive
Logic Programming, which aims at finding hypotheses by induction based on
knowledge that may be represented in relational form. Multi-relational data
mining explores a much broader scope in both methodologies and applications,
including various data mining tasks such as classification, clustering, outlier
detection, temporal analysis, etc.
Area References
· [1] H. Blockeel, L. De Raedt, and J. Ramon. Top-down induction of logical decision trees. In Proc. 1998 Int. Conf. Machine Learning,
· [2] S. Dzeroski, N. Lavrac (editors). Relational data mining. Springer,
· [3].
· [4] S. Muggleton. Inductive Logic Programming. Academic Press,
· [5] S. Muggleton and C. Feng. Efficient induction of logic programs. In Proc. 1990 Conf. Algorithmic Learning Theory,
· [6] J. Neville, D. Jensen, L. Friedland, and M. Hay. Learning Relational Probability Trees. In Proc. 2003 Int. Conf. Knowledge Discovery and Data Mining,
· [7] J. R. Quinlan and R. M. Cameron-Jones. FOIL: A midterm report. In Proc. 1993 European Conf. Machine Learning,
· [8] B. Taskar, E. Segal, and D. Koller. Probabilistic classification and clustering in relational data. In Proc. 2001 Int. Joint Conf. Artificial Intelligence,
Potential Related Projects
The project is closely related to many
research projects on knowledge discovery in databases and their applications,
such as homeland security, bioinformatics, etc.
Project Web site URL: http://www.cs.uiuc.edu/~hanj/projs/dbmine.html
Online software: Online software related to this project can be
downloaded at Software Downloads
Online resources: Research publications related to this project can be
downloaded at Selected Publications