Project Title: ITR: Automatic On-The-Fly Detection, Characterization, Recovery and Correction of Software Bugs in Production Runs
National Science Foundation Award Number: CCR-0325603 (Sept. 2003 - Aug. 2007)
Investigators:
Josep Torrellas, torrella@cs.uiuc.edu (Principal Investigator)
Jiawei Han (Co-Principal Investigator)
Yuanyuan Zhou (Co-Principal Investigator)
Samuel P. Midkiff (Co-Principal Investigator)
Jiawei Han, Co-PI
Department of Computer Science, University of Illinois, Urbana-Champaign
1304 West Springfield Ave., Urbana, Illinois 61801 U.S.A.
Office: (217) 333-6903, Fax: (217) 244-6500
E-mail: hanj@cs.uiuc.edu, URL: http://www.cs.uiuc.edu/~hanj
List of Supported Students and Research Scientists:
Chao Liu, Ph.D. student, Department of Computer Science
Xifeng Yan, Ph.D. student, Department of Computer Science
Keywords: Software debugging, on-the-fly stream data mining, scalable algorithms, online mining algorithms, data mining applications
Project Description: We propose to develop a comprehensive system for automatic on-the-fly debugging of production runs. The system addresses all aspects of debugging: bug detection, characterization, recovery, and correction. It tightly integrates innovations in computer hardware, operating systems, data mining, and compiler support. More specifically, the proposed system hinges on the following innovations: (1) low-overhead compiler-directed checking to detect bugs and pass information to the data miner, operating system, and hardware to characterize the bugs; (2) low-overhead data mining algorithms that build and use models for bug detection, characterization, and correction; (3) novel hardware support to roll back and deterministically re-execute buggy sections of code with very low overhead and transparently to the user; and (4) operating system support to roll back and re-execute code sections that cannot be supported by hardware. These four layers are tightly integrated in a software prototype. Our work directly aims at improving what has historically been the dark spot of the IT revolution: poor programmer productivity. We address the software debugging challenge by focusing on production runs, which is made practical by very low-overhead bug characterization support in hardware. The additional compiler, data miner, and operating system layers enhance the power of the hardware once a bug is detected. We hope the work will have a broad impact, since expediting the debugging process can lead to dramatic increases in the productivity of IT professionals and students. In addition, the ideas developed can be applied to other anomaly detection problems, such as detecting intrusions and security attacks, which are very important to our society.
Publications and Products:
Journal articles
1. Jian Pei, Jiawei
Han, Hongjun Lu, Shojiro Nishio, Shiwei Tang, and Dongqing Yang, “H-Mine: Fast and Space-Preserving
Frequent Pattern Mining in Large Databases”, IIE Transactions,
39:593-605, 2007.
2. Chulyun Kim, Sangkyum
Kim, Russell Dorer, Dan Xie,
Jiawei Han, and Sheng Zhong, “TagSmart: Analysis
and Visualization for Yeast Mutant Fitness Data Measured by Tag Microarrays”, BMC Bioinformatics, 8:128, April 2007.
(http://www.biomedcentral.com/1471-2105/8/128)
3.
4. Jiawei Han, Hong Cheng, Dong Xin, and Xifeng Yan, “Frequent Pattern Mining: Current Status and
Future Directions”, Data Mining and Knowledge Discovery, 14, 2007.
(Online version published on January 27, 2007, DOI 10.1007/s10618-006-0059-1 SpringerLink).
5. Dong Xin, Jiawei Han, Xifeng Yan, and Hong Cheng, “On Compressing Frequent
Patterns”, Data and Knowledge Engineering (Special issue on Intelligent
Data Mining), 60(1): 5-29, 2007.
6. Dong Xin, Jiawei Han, Xiaolei Li, Zheng Shao, and Benjamin W. Wah, “Computing Iceberg Cubes by Top-Down and
Bottom-Up Integration: The StarCubing
Approach”, IEEE Transactions on Knowledge and Data Engineering, 19(1):
111-126, 2007.
7. Chao Liu, Long Fei,
Xifeng Yan, Jiawei Han, and Samuel P. Midkiff,
“Statistical Debugging: A Hypothesis Testing-based Approach”, IEEE
Transactions on Software Engineering, 32(10):831-848, 2006.
8. Yixin Chen, Guozhu
Dong, Jiawei Han,
9. Xifeng Yan, Feida Zhu, Philip S. Yu, and Jiawei
Han, “Feature-based Substructure Similarity Search”, ACM
Transactions on Database Systems, 31(4): 1418-1453, 2006.
10. Deng Cai, Xiaofei He, Jiawei Han and Hong-Jiang Zhang, “Orthogonal Laplacianfaces for Face Recognition”, IEEE
Transactions on Image Processing, 15(11): 3608-3614, 2006.
11. F. Pan, K. Kamath,
K. Zhang, S. Pulapura, A. Achar,
J. Nunez-Iglesias, Y. Huang, X. Yan,
J. Han, H. Hu, M. Xu, X. J.
Zhou. “Integrative Array Analyzer: A software package for analysis of
cross-platform and cross-species microarray
data”, Bioinformatics, 22(13): 1665-1667, 2006.
12. J. Wang, J. Han, and J. Pei, “Closed Constrained-Gradient Mining in Retail
Databases”, IEEE Transactions on Knowledge and Data Engineering,
18(6): 764-769, 2006.
13. X. Yin, J. Han, J. Yang, and P. S. Yu, “Efficient Classification
across Multiple Database Relations: A CrossMine
Approach”, IEEE Transactions on Knowledge and Data Engineering,
18(6): 770-783, 2006.
14. Charu Aggarwal, Jiawei Han, Jianyong Wang,
and Philip S. Yu, “A Framework for
On-Demand Classification of Evolving Data Streams”, IEEE Transactions
on Knowledge and Data Engineering, 18(5): 577-589, 2006.
15. Hwanjo Yu, Jiong
Yang, Jiawei Han, and Xiaolei
Li, “Making SVM Scalable to Large
Data Sets Using Hierarchical Indexing”, Data Mining and Knowledge
Discovery, 11(3): 295-321, 2005.
16. Jiawei Han, Yixin
Chen, Guozhu Dong, Jian
Pei, Benjamin W. Wah, Jianyong
Wang, and Y. Dora Cai, “Stream Cube: An Architecture for Multi-Dimensional Analysis of Data
Streams”, Distributed and Parallel Databases, 18(2): 173-197, 2005.
17. Xifeng Yan, Philip Yu, and Jiawei Han, “Graph Indexing Based on Discriminative
Frequent Structure Analysis”, ACM Transactions on Database Systems,
30(4): 960-993, 2005.
18. Deng Cai, Xiaofei He and Jiawei Han,
“Document Clustering Using Locality
Preserving Indexing”, IEEE Transactions on Knowledge and Data
Engineering, 17(12):1624-1637, 2005.
19. C. Aggarwal,
J. Han, J. Wang, and P. S. Yu, “On
Efficient Algorithms for High Dimensional Projected Clustering of Data Streams”,
Data Mining and Knowledge Discovery,
10:251-272, 2005.
20. Petre Tzvetkov, Xifeng Yan, and Jiawei Han, “TSP: Mining Top-K Closed Sequential
Patterns”, Knowledge and Information Systems,
7(4): 438-457, 2005.
21. J. Wang, J. Han, Y. Lu, and P. Tzvetkov, “TFP: An Efficient Algorithm for Mining
Top-K Frequent Closed Itemsets”, IEEE
Transactions on Knowledge and Data Engineering, 17(5): 652-664, 2005.
22. K. Wang, Y. Jiang, J. X. Yu, G.
Dong, and J. Han, “Divide-and-Approximate:
A Novel Constraint Push Strategy for Iceberg Cube Mining”, IEEE Transactions on Knowledge and Data
Engineering, 17(3):354-368, 2005.
23. H. Yu, J. Han, and K. C.-C. Chang, “PEBL: Web Page Classification
Without Negative Examples”, IEEE Transactions on Knowledge
and Data Engineering (Special Issue on Mining and Searching the Web), 16(1):
70-81, 2004.
24. G. Dong, J. Han, J. Lam, J. Pei, K.
Wang, and W. Zou, “Mining Constrained
Gradients in Multi-Dimensional Databases”, IEEE Transactions on
Knowledge and Data Engineering, 16(6), 2004.
Book and Book Chapters
3. H. Yu, A. Doan, and J. Han, "Mining for Information Discovery on the
Web: Overview and Illustrative Research,"
4. P. Bajcsy, J. Han, L. Liu, and J. Yang, "A Survey of
Bio-Data Analysis from Data Mining Perspective," in D. Shasha, et al.
(eds.), Data Mining in Bioinformatics, Springer Verlag,
2004, pp. 9-39.
Refereed Conference Publications (Refereed Workshop Publications
are omitted due to limited space)
1. Chao Liu, Xiangyu
Zhang, Jiawei Han, Yu Zhang and Bharat K. Bhargava, “Failure Indexing: A
Dynamic Slicing Based Approach”, in Proc. 2007 IEEE Int. Conf. on
Software Maintenance (ICSM'07),
2. Deng Cai, Xiaofei He, and Jiawei Han,
“A Unified
Subspace Learning Framework for Content-Based Image Retrieval”, in
Proc. 2007 Int. Conf. on ACM Multimedia (ACM-MM'07), Augsburg, Germany, Sept.
2007.
3. Tianyi Wu, Yuguo
Chen and Jiawei Han, “Association Mining in
Large Databases: A Re-Examination of Its Measures”, in Proc. 2007
Int. Conf. on Principles and Practice of Knowledge Discovery in Databases
(PKDD'07), Warsaw, Poland, Sept. 2007.
4. Chen Chen,
Xifeng Yan, Philip S. Yu, Jiawei Han, DongQing Zhang, and Xiaohui Gu, “Towards Graph Containment
Search and Indexing”, in Proc. 2007 Int. Conf. on Very Large Data
Bases (VLDB'07), Vienna, Austria, Sept. 2007.
5. Hector Gonzalez, Jiawei
Han, Xiaolei Li, Margaret Myslinska,
and John Paul Sondag, “Adaptive Fastest
Path Computation on a Road Network: A Traffic Mining Approach”, in
Proc. 2007 Int. Conf. on Very Large Data Bases (VLDB'07),
6. Xiaolei Li and Jiawei
Han, “Mining
Approximate Top-K Subspace Anomalies in Multi-Dimensional Time-Series Data”,
in Proc. 2007 Int. Conf. on Very Large Data Bases (VLDB'07), Vienna, Austria,
Sept. 2007.
7. Tianyi Wu, Xiaolei Li, Dong Xin, Jiawei Han, Jacob
Lee, and Ricardo Redder, “DataScope:
Viewing Database Contents in Google Maps' Way”, in Proc. 2007 Int.
Conf. on Very Large Data Bases (VLDB'07), Vienna, Austria, Sept. 2007 (system
demo).
8. Xiaoxin Yin, Jiawei
Han, and Philip S. Yu, “Truth Discovery with
Multiple Conflicting Information Providers on the Web”, in Proc. 2007
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'07), San
Jose, CA, Aug. 2007.
9. Xiaolei Li, Jiawei
Han, Jae-Gil Lee, and Hector Gonzalez, “Traffic Density-based
Discovery of Hot Routes in Road Networks”, in Proc. 2007 Int. Symp. on Spatial and Temporal
Databases (SSTD'07),
10. Deng Cai, Xiaofei He and Jiawei Han,
“Isometric
Projection”, in Proc. 2007 AAAI Conf. on Artificial Intelligence
(AAAI-07), Vancouver, B. C., Canada, July 2007.
11. Wen Jin, Anthony K.H. Tung, Martin
Ester, and Jiawei Han, “On Efficient
Processing of Subspace Skyline Queries on High Dimensional Data”, in
Proc. 2007 Int. Conf. on Scientific and Statistical Database Management
(SSDBM'07),
12. Deng Cai, Xiaofei He, Yuxiao Hu, Jiawei
Han, and Thomas Huang, “Learning a Spatially
Smooth Subspace for Face Recognition”, in Proc. 2007 IEEE Conf. on
Computer Vision and Pattern Recognition (CVPR'07),
13. Jae-Gil Lee, Jiawei
Han, and Kyu-Young Whang,
“Trajectory
Clustering: A Partition-and-Group Framework”, in Proc. 2007 ACM
SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing, China, June 2007.
14. Dong Xin, Jiawei Han, and Kevin C.-C. Chang, “Progressive and
Selective Merge: Computing Top-K with Ad-hoc Ranking Functions”, in
Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'07), Beijing,
China, June 2007.
15. Feida Zhu, Xifeng
Yan, Jiawei Han, and Philip
S. Yu, “gPrune: A Constraint Pushing Framework for Graph Pattern
Mining”, in Proc. 2007 Pacific-Asia Conf. on Knowledge Discovery and
Data Mining (PAKDD'07), Nanjing, China, May 2007. (Best
Student Paper Award)
16. Jiawei Han, Hong Cheng, Dong Xin, and Xifeng Yan, “Frequent
Pattern Mining: Current Status and Future Directions”, Data Mining
and Knowledge Discovery, 14(1), 2007. (Online version published on January 27,
2007, DOI 10.1007/s10618-006-0059-1 SpringerLink).
17. Jing Gao, Wei
Fan, and Jiawei Han, “A General Framework for
Mining Concept-Drifting Data Streams with Skewed Distributions”, in
Proc. 2007 SIAM Int. Conf. on Data Mining (SDM'07), Minneapolis, MN, April
2007.
18. Xiaolei Li, Jiawei
Han, Sangkyum Kim, and Hector Gonzalez, “ROAM: Rule- and
Motif-Based Anomaly Detection in Massive Moving Object Data Sets”, in
Proc. 2007
19. Hong Cheng, Xifeng
Yan, Jiawei Han, and Chih-Wei Hsu, “Discriminative
Frequent Pattern Analysis for Effective Classification”, in Proc.
2007 Int. Conf. on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007.
20. Feida Zhu, Xifeng
Yan, Jiawei Han, Philip S.
Yu, and Hong Cheng, “Mining Colossal
Frequent Patterns by Core Pattern Fusion”, in Proc. 2007 Int. Conf.
on Data Engineering (ICDE'07), Istanbul, Turkey, April 2007. (Best Student Paper Award)
21. Hector Gonzalez, Jiawei
Han, and Xuehua Shen,
“Cost-conscious
Cleaning of Massive RFID Data Sets”, in Proc. 2007 Int. Conf. on Data
Engineering (ICDE'07),
22. Xiaoxin Yin, Jiawei
Han, and Philip S. Yu, “Object Distinction:
Distinguishing Objects with Identical Names by Link Analysis”, in
Proc. 2007 Int. Conf. on Data Engineering (ICDE'07),
23. Wen Jin, Martin Ester, Zengjian Hu, and Jiawei Han, “The Multi-Relational
Skyline Operator”, in Proc. 2007 Int. Conf. on Data Engineering
(ICDE'07), Istanbul, Turkey, April 2007.
24. Deng Cai, Xiaofei He, Kun Zhou, Jiawei Han
and Hujun Bao, “Locality
Sensitive Discriminant Analysis”, in
Proc. 2007 Int. Joint Conf. on Artificial Intelligence (IJCAI'07), Hyderabad,
India, Jan. 2007.
25. Chao Liu, Zeng
Lian, and Jiawei Han,
“How
Bayesians Debug?”, in Proc. 2006 Int. Conf.
on Data Mining (ICDM'06),
26. Hong Cheng, Philip S. Yu, and Jiawei Han, “AC-Close: Efficiently
Mining Approximate Closed Itemsets by Core Pattern
Recovery”, in Proc. 2006 Int. Conf. on Data Mining (ICDM'06),
27. Chao Liu and Jiawei
Han, “Failure
Proximity: A Fault Localization-Based Approach”, in Proc. 14th ACM
SIGSOFT Symposium on the Foundations of Software Engineering (FSE'06),
28. Hector Gonzalez, Jiawei
Han, and Xiaolei Li, “Mining Compressed Commodity
Workflows From Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on
Information and Knowledge Management (CIKM'06), Arlington, VA, Nov. 2006.
29. Xiaoxin Yin, Jiawei
Han, and Philip Yu, “LinkClus:
Efficient Clustering via Heterogeneous Semantic Links”, in Proc. 2006
Int. Conf. on Very Large Data Bases (VLDB'06),
30. Hector Gonzalez, Jiawei
Han, and Xiaolei Li, “FlowCube:
Constructing RFID FlowCubes
for Multi-Dimensional Analysis of Commodity Flows”, in Proc. 2006
Int. Conf. on Very Large Data Bases (VLDB'06),
31. Dong Xin,
Chen Chen, and Jiawei Han, “Towards
Robust Indexing for Ranked Queries”, in Proc. 2006 Int. Conf. on Very
Large Data Bases (VLDB'06),
32. Dong Xin, Jiawei Han, Hong Cheng, and Xiaolei
Li, “Answering
Top-k Queries with Multi-Dimensional Selections: The Ranking Cube Approach”,
in Proc. 2006 Int. Conf. on Very Large Data Bases (VLDB'06),
33. Dong Xin,
Hong Cheng, Xifeng Yan, and
Jiawei Han, “Extracting
Redundancy-Aware Top-K Patterns”, in Proc. 2006 ACM SIGKDD Int. Conf.
on Knowledge Discovery and Data Mining (KDD'06), Philadelphia, PA, Aug. 2006.
34. Qiaozhu Mei, Dong Xin,
Hong Cheng, ChengXiang Zhai,
and Jiawei Han, “Generating
Semantic Annotations for Frequent Patterns with Context Analysis”, in
Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'06), Philadelphia, PA, Aug. 2006. (Best Student
Paper Runner-Up Award)
35. Chao Liu, Chen Chen,
Jiawei Han, and Philip Yu, “GPLAG:
Detection of Software Plagiarism by Procedure Dependency Graph Analysis”,
in Proc. 2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'06),
36. Dong Xin, Xuehua Shen, Qiaozhu
Mei, and Jiawei Han, “Discovering
Interesting Patterns Through User's Interactive Feedback”, in Proc.
2006 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'06),
Philadelphia, PA, Aug. 2006.
37. Deng Cai, Xiaofei He and Jiawei Han,
“Tensor
Space Model for Document Analysis”, in Proc. 2006 Int. ACM SIGIR
Conf. on Research & Development on Information Retrieval (SIGIR'06),
Seattle, WA, Aug. 2006.
38. Hongyan Liu, Ying Lu, Jiawei
Han, and Jun He, “Error-Adaptive and
Time-Aware Maintenance of Frequency Counts over Data Streams”, in
Proc. 2006 Int. Conf. on Web-Age Information Management (WAIM'06),
39. Kaushik Chakrabarti,
Venkatesh Ganti, Jiawei Han, and Dong Xin, “Ranking Objects Based
on Relationships”, in Proc. 2006 ACM SIGMOD Int. Conf. on Management
of Data (SIGMOD'06), Chicago, IL, June 2006.
40. Xiaolei Li, Jiawei
Han, and Sangkyum Kim, “Motion-Alert:
Automatic Anomaly Detection in Massive Moving Objects”, Proc. 2006
IEEE Int. Conf. on Intelligence and Security Informatics (ISI'06), San Diego,
CA, May 2006.
41. Wen Jin, Anthony K. H. Tung, Jiawei Han, and Wei Wang, “Ranking Outliers
Using Symmetric Neighborhood Relationship,” in Proc. 2006
Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD'06),
Singapore, April 2006.
42. Hongyan Liu, Jiawei
Han, Dong Xin, and Zheng Shao, “Mining Interesting
Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach,”
in Proc. 2006
43. Chao Liu, Xifeng
Yan, and Jiawei Han,
“Mining
Control Flow Abnormality for Logic Error Isolation,” in Proc. 2006
SIAM Int. Conf. on Data Mining (SDM'06), Bethesda, MD, April 2006.
44. Charu Aggarwal,
Chen Chen, and Jiawei Han,
“On the
Inverse Classification Problem and its Applications”, in Proc. 2006
Int. Conf. on Data Engineering (ICDE'06),
45. Hector Gonzalez, Jiawei Han, Xiaolei Li, and Diego Klabjan,
“Warehousing
and Analysis of Massive RFID Data Sets”, in Proc. 2006 Int. Conf. on
Data Engineering (ICDE'06),
46. Hongyan Liu, Jiawei
Han, Dong Xin, and Zheng Shao, “Top-Down Mining of
Interesting Patterns from Very High Dimensional Data”, in Proc. 2006
Int. Conf. on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.
47. Dong Xin, Jiawei Han, Zheng Shao, and Hongyan Liu, “C-Cubing: Efficient
Computation of Closed Cubes by Aggregation-Based Checking”, in Proc.
2006 Int. Conf. on Data Engineering (ICDE'06),
48. Xifeng Yan, Feida Zhu, Jiawei Han, and Philip
Yu, “Searching
Substructures with Superimposed Distance”, in Proc. 2006 Int. Conf.
on Data Engineering (ICDE'06), Atlanta, Georgia, April 2006.
49. Deng Cai, Zheng Shao, Xiaofei
He, Xifeng Yan, Jiawei Han, “Community Mining from
Multi-Relational Networks”, in Proc. 2005 European Conf. on
Principles and Practice of Knowledge Discovery in Databases (PKDD'05), Porto,
Portugal, Oct., 2005.
50. Wen Jin, Martin Ester and Jiawei Han, “Efficient
Processing of Ranked Queries with Sweeping Selection”, in Proc. 2005 European Conf. on Principles
and Practice of Knowledge Discovery in Databases (PKDD'05), Porto, Portugal,
Oct., 2005.
51. Xiaoxin Yin and Jiawei
Han, “Efficient
Classification from Multiple Heterogeneous Databases”, in Proc. 2005
European Conf. on Principles and Practice of Knowledge Discovery in Databases
(PKDD'05), Porto, Portugal, Oct., 2005.
52. C. Liu, X. Yan,
L. Fei, J. Han, and S. Midkiff,
“SOBER: Statistical
Model-based Bug Localization”, Proc. 2005 ACM SIGSOFT Symp. on the Foundations of
Software Engineering (FSE 2005),
53. D. Xin, J.
Han, X. Yan and H. Cheng, “Mining Compressed
Frequent-Pattern Sets”, Proc. 2005 Int. Conf. on Very Large Data
Bases (VLDB'05),
54. X. Yan, H.
Cheng, J. Han, and D. Xin, “Summarizing Itemset Patterns: A Profile-Based Approach”,
Proc. 2005 Int. Conf. on Knowledge Discovery and Data Mining (KDD'05),
55. X. Yan, X.
J. Zhou, and J. Han, “Mining Closed
Relational Graphs with Connectivity Constraints”, Proc. 2005 Int.
Conf. on Knowledge Discovery and Data Mining (KDD'05),
56. X. Yin, J. Han, and P.S. Yu, “Cross-Relational
Clustering with User's Guidance”, Proc. 2005 Int. Conf. on Knowledge
Discovery and Data Mining (KDD'05),
57. S. Cong, J. Han, and D.
58. D. Cai and X. He. “Orthogonal
Locality Preserving Indexing”, Proc. 2005 Int. Conf. on Research and
Development in Information Retrieval (SIGIR'05), Salvador, Brazil, Aug. 2005.
59. X. Yin, J. Han, and J. Yang, “Searching for Related
Objects in Relational Databases”, in Proc. 2005 Int. Conf. on
Scientific and Statistical Database Management (SSDBM'05), Santa Barbara, CA,
June 2005.
60. H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou, “Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional
Discovery”, in Proc. 2005 Int. Conf. on Intelligent Systems for
Molecular Biology (ISMB 2005), Ann Arbor, MI, June 2005.
61. X. Yan, P.
S. Yu, and J. Han, “Substructure
Similarity Search in Graph Databases”, in Proc. 2005 ACM-SIGMOD Int.
Conf. on Management of Data (SIGMOD'05), Baltimore, Maryland, June 2005.
62. C. Liu, X. Yan,
H. Yu, J. Han, and P. S. Yu, “Mining Behavior
Graphs for Backtrace of Noncrashing
Bugs”, in Proc. 2005 SIAM Int. Conf. on Data Mining (SDM'05), Newport
Beach, CA, April 2005.
63. H. Cheng, X. Yan,
and J. Han, “SeqIndex: Indexing Sequences by Sequential Pattern Analysis”,
in Proc. 2005
64. X. Li, J. Han, X. Yin, and D. Xin, “Mining Evolving
Customer-Product Relationships in Multi-Dimensional Space”, in Proc.
2005 Int. Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005.
65. X. Yan, X.
J. Zhou, J. Han, “Mining Closed
Relational Graphs with Connectivity Constraints”, in Proc. 2005 Int.
Conf. on Data Engineering (ICDE'05), Tokyo, Japan, April 2005.
66. S. Cong, J. Han and D. Padua,
“A
Sampling-based Framework for Parallel Data Mining,” in Proc. 2005 ACM
SIGPLAN Symp. on Principles
& Practice of Parallel Programming (PPOPP'05),
67. W. Jin, J. Han, and M. Ester,
“Mining
Thick Skylines over Large Databases”, Proc. 2004 European Conf. on
Principles and Practice of Knowledge Discovery in Databases
(PKDD'04), Pisa, Italy, Sept. 2004.
68. C. Aggarwal, J. Han,
J. Wang, and P. S. Yu, “A Framework for
Projected Clustering of High Dimensional Data Streams”, Proc. 2004
Int. Conf. on Very Large Data Bases (VLDB'04), Toronto, Canada, Aug. 2004.
69. X. Li, J. Han, and H. Gonzalez,
“High-Dimensional
OLAP: A Minimal Cubing Approach”, Proc. 2004 Int. Conf. on
Very Large Data Bases (VLDB'04),
70. C. Aggarwal,
J. Han, J. Wang, and P. S. Yu, “On Demand
Classification of Data Streams”, Proc. 2004 Int. Conf. on Knowledge
Discovery and Data Mining (KDD'04),
71. H. Cheng, X. Yan, and J. Han,
“IncSpan: Incremental
Mining of Sequential Patterns in Large Database”, Proc. 2004
Int. Conf. on Knowledge Discovery and Data Mining (KDD'04),
72. B. He, K.C.-C. Chang, and J. Han,
“Discovering
Complex Matchings across Web Query Interfaces: A
Correlation Mining Approach”, Proc.
2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04),
73. Y. Li, J. Han, and J. Yang, “Clustering Moving
Objects”, Proc. 2004 Int. Conf. on Knowledge Discovery and Data
Mining (KDD'04), Seattle, WA, Aug. 2004.
74. A. Wu, M. Garland, and J. Han,
“Mining
Scale-Free Networks using Geodesic Clustering”, Proc. 2004 Int. Conf.
on Knowledge Discovery and Data Mining (KDD'04),
75. J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C.
Hsu, “Mining
Sequential Patterns by Pattern-Growth: The PrefixSpan
Approach”, IEEE Transactions on Knowledge and Data Engineering,
16(10), 2004.
76. J. Han, J. Pei, and X. Yan, “From Sequential
Pattern Mining to Structured Pattern Mining: A Pattern-Growth Approach,”
Journal of Computer Science and Technology, 19(3): 257-279, 2004.
77. Z. Shao,
J. Han, and D. Xin, “MM-Cubing:
Computing Iceberg Cubes by Factorizing the Lattice Space”, Proc. 2004
Int. Conf. on Scientific and Statistical Database Management (SSDBM'04),
78. Y. Li, J. Yang, and J. Han, “Continuous K-Nearest
Neighbor Search for Moving Objects”, Proc. 2004 Int. Conf. on
Scientific and Statistical Database Management (SSDBM'04), Santorini
Island, Greece, June 2004.
79. J. Han, J. Pei, Y. Yin and R. Mao,
“Mining Frequent
Patterns without Candidate Generation: A Frequent-Pattern Tree Approach”,
Data Mining and Knowledge Discovery, 8(1):53-87, 2004.
80. X. Yan, P.
S. Yu, and J. Han, “Graph Indexing: A
Frequent Structure-based Approach”, Proc. 2004 ACM-SIGMOD Int. Conf.
on Management of Data (SIGMOD'04), Paris, France, June 2004.
81. Y. D. Cai,
D. Clutter, G. Pape, J. Han, M. Welge,
and L. Auvil, “MAIDS: Mining
Alarming Incidents from Data Streams”, (system demonstration), Proc.
2004 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'04), Paris, France, June
2004.
82. W.-Y. Kim, Y.-K. Lee, and J. Han,
“CCMine: Efficient Mining of Confidence-Closed Correlated
Patterns”, Proc. 2004 Pacific-Asia Conf. on Knowledge Discovery and
Data Mining (PAKDD'04),
83. X. Yin, J. Han, J. Yang, and P. S.
Yu, “CrossMine: Efficient Classification across Multiple
Database Relations”, Proc. 2004 Int. Conf. on Data Engineering (ICDE'04),
84. J. Wang and J. Han, “BIDE: Efficient Mining
of Frequent Closed Sequences”, Proc. 2004 Int. Conf. on Data
Engineering (ICDE'04),
85. X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns”,
Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining
(KDD'03),
86. H. Yu, J. Yang, and J. Han, “Classifying Large Data Sets Using SVM with
Hierarchical Clusters”, Proc. 2003 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'03),
87. J. Wang, J. Han, and J. Pei, “CLOSET+: Searching for the Best Strategies for Mining
Frequent Closed Itemsets”, Proc. 2003
ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03),
88. H. Wang, W. Fan, P. S. Yu, and J.
Han, “Mining Concept-Drifting Data Streams
using Ensemble Classifiers”, Proc. 2003 ACM SIGKDD Int. Conf.
on Knowledge Discovery and Data Mining (KDD'03), Washington, D.C., Aug.
2003.
89. C. Aggarwal, J. Han, J. Wang, and P. S. Yu, “A
Framework for Clustering Evolving Data Streams”, Proc. 2003
Int. Conf. on Very Large Data Bases (VLDB'03),
90. D. Xin, J. Han, X. Li, and B. W. Wah, “Star-Cubing: Computing Iceberg Cubes by Top-Down and
Bottom-Up Integration”, Proc. 2003 Int. Conf. on Very Large
Data Bases (VLDB'03),
Project Impact:
1. Education:
Parts of the new research results will be used in the Data Mining courses
(CS397Han, CS412) taught to both undergraduate and graduate students in
the Department of Computer Science, the
2. Collaborations: For this project we have established a collaboration with
Goals, Objectives, and Targeted Activities:
This proposal researches methods that support efficient and scalable data mining operations on stream data. We will focus on mining the dynamics of data streams in multi-dimensional space and its applications in industry (such as mining unusual patterns and outliers), Internet (such as network intrusi