DiscKNet: Discovering Knowledge from Scientific Research Networks
The goal of this EAGER project is to develop an infrastructure to collect and mine the enriched scientific research network for scientific impact, emerging trends, new research areas, etc. The development of this infrastructure entails:
- Building a platform to automatically collect scientific data: publications, discussion forums, mailing lists, reports from research blogs, conference pages, and common-interest groups on social media such as Facebook, Twitter, and LinkedIn;
- Designing a sub-system to manage and clean the massive collected data, which mainly includes an efficient author name disambiguation sub-system and a missing-data inference sub-system;
- Constructing a scientific research network and designing, developing, and applying data mining techniques on it, which will lead to a scientific discovery process through the identification of high-impact institutions, tools and techniques, trends and usage patterns, and common issues with software tools;
- Providing a platform for scientists, experimentalists, and research centers to discover the dynamics of scientific progress and new trends in scientific topics over the scientific network.
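As a rough illustration of the network-construction step in the list above, the following minimal Python sketch builds a co-authorship network from author lists and ranks authors by their number of distinct collaborators. The function names and the ranking criterion are assumptions made for this example and are not the project's released code; the sample author lists are taken from three publications listed on this page.

```python
from collections import Counter
from itertools import combinations


def build_coauthor_network(papers):
    """Map each author to a Counter of {coauthor: number of shared papers}.

    `papers` is a list of author-name lists, one list per publication.
    (Hypothetical helper for illustration only.)
    """
    network = {}
    for authors in papers:
        unique = sorted(set(authors))  # ignore duplicate names within one paper
        for name in unique:
            network.setdefault(name, Counter())
        for a, b in combinations(unique, 2):
            network[a][b] += 1
            network[b][a] += 1
    return network


def rank_by_collaborators(network):
    """Order authors by number of distinct coauthors, ties broken by name."""
    return sorted(network, key=lambda name: (-len(network[name]), name))


# Author lists copied from three entries in the publication list on this page.
papers = [
    ["Yu Cheng", "Ankit Agrawal", "Huan Liu", "Alok Choudhary"],
    ["Kathy Lee", "Ankit Agrawal", "Alok Choudhary"],
    ["Ankit Agrawal", "Alok Choudhary"],
]

network = build_coauthor_network(papers)
ranking = rank_by_collaborators(network)
print(ranking)
```

Counting distinct collaborators is only the simplest possible measure; it assumes author names have already been disambiguated, which is the job of the data-cleaning sub-system described above.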
- Sep. 2014: a project-related paper, "Social Role Identification via Dual Uncertainty Minimization Regularization", has been accepted at the 14th IEEE International Conference on Data Mining (ICDM 2014).
- April 2014: a project-related paper, "Batch Mode Active Learning with Hierarchical-Structured Embedded Variance", has been accepted at the 2014 SIAM International Conference on Data Mining (SDM 2014).
- Nov. 2013: a project-related paper, "Bootstrapping Active Name Disambiguation with Crowdsourcing", has been accepted at the 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013).

Download Source Codes
Download Sample Datasets
Yu Cheng, Ankit Agrawal, Huan Liu, and Alok Choudhary. Social Role Identification via Dual Uncertainty Minimization Regularization.
In the IEEE International Conference on Data Mining (ICDM 2014), December 2014, Shenzhen, China.
Yusheng Xie, Zhengzhang Chen, Diana Palsetia, Ankit Agrawal, and Alok Choudhary. Indexing Bipartite Memberships in Web Graphs.
In the International Conference on Advances in Social Network Analysis and Mining, August 2014, Beijing, China.
Yu Cheng, Zhengzhang Chen, Lu Liu, Jiang Wang, Ankit Agrawal, and Alok Choudhary. Feedback-Driven Multiclass Active Learning for Data Streams.
In the 22nd ACM International Conference on Information and Knowledge Management, October 2013.
Yu Cheng, Zhengzhang Chen, Jiang Wang, Ankit Agrawal, and Alok Choudhary. Bootstrapping Active Name Disambiguation with Crowdsourcing.
In the 22nd ACM International Conference on Information and Knowledge Management, October 2013.
Lu Liu, Jie Tang, Yu Cheng, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. Mining Diabetes Complication and Treatment Patterns for Clinical Decision Support.
In the ACM International Conference on Information and Knowledge Management (CIKM 2013).
Yu Cheng, Yusheng Xie, Zhengzhang Chen, Ankit Agrawal, Alok Choudhary, and Songtao Guo. JobMiner: A Real-time System for Mining Job-related Patterns from Social Media (Demo).
In the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1450–1453, August 2013.
Kathy Lee, Ankit Agrawal, and Alok Choudhary. Real-time Disease Surveillance Using Twitter Data: Demonstration on Flu and Cancer (Demo).
In the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 1474–1477, August 2013.
Kathy Lee, Ankit Agrawal, and Alok Choudhary. Real-Time Digital Flu Surveillance using Twitter Data.
In the SDM Workshop on Data Mining for Medicine and Healthcare (DMMH), pp. 19–27, May 2013.
Yu Cheng, Yusheng Xie, Kunpeng Zhang, Ankit Agrawal, Dan Honbo, and Alok Choudhary. CluChunk: Clustering Large Scale User-generated Content Incorporating Chunklet Information. Workshop paper at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 12-16, 2012, Beijing, China.
Yu Cheng, Yusheng Xie, Kunpeng Zhang, Ankit Agrawal, Dan Honbo, and Alok Choudhary. How Online Content is Received by Users in Social Media: A Case Study on Facebook.com Posts. Workshop paper at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 12-16, 2012, Beijing, China.
Diana Palsetia, Md. Mostofa Ali Patwary, Kunpeng Zhang, Kathy Lee, Christopher Moran, Yves Xie, Daniel Honbo, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. User-Interest based Community Extraction in Social Networks. Workshop paper at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 12-16, 2012, Beijing, China.
Yusheng Xie, Yu Cheng, Daniel Honbo, Kunpeng Zhang, Ankit Agrawal, and Alok Choudhary. Crowdsourcing Recommendations From Social Sentiment. Workshop paper at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 12-16, 2012, Beijing, China.
Yusheng Xie, Daniel Honbo, Kunpeng Zhang, Yu Cheng, Ankit Agrawal, and Alok Choudhary. VOXSUP: A Social Engagement Framework. Workshop paper at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 12-16, 2012, Beijing, China.
Kunpeng Zhang, Yu Cheng, Yusheng Xie, Ankit Agrawal, Alok Choudhary at Special Interest Group on Information Retrieval (SIGIR), August 12-16, 2012 in Portland, Oregon.
Ankit Agrawal and Alok Choudhary. Identifying HotSpots in Lung Cancer Data Using Association Rule Mining. In the Workshop on Biological Data Mining and its Applications in Healthcare: Prediction, Extremes, and Impacts, held in conjunction with the IEEE International Conference on Data Mining, December 2011. (pdf)
Lalith Polepeddi, Ankit Agrawal, and Alok Choudhary. Poll: A Citation-Text-Based System for Identifying High-Impact Contributions of an Article. In the Workshop on Data Mining in Networks, held in conjunction with the IEEE International Conference on Data Mining, December 2011. (pdf)
William Hendrix, Isaac Tetteh, Ankit Agrawal, Fredrick Semazzi, Wei-keng Liao, and Alok Choudhary. Community Dynamics and Analysis of Decadal Trends in Climate Data. In the Workshop on Knowledge Discovery from Climate Data: Prediction, Extremes, and Impacts, held in conjunction with the IEEE International Conference on Data Mining, December 2011. (pdf)
Kunpeng Zhang, Yu Cheng, Yusheng Xie, Ankit Agrawal, Diana Palsetia, Kathy Lee, Wei-keng Liao, and Alok Choudhary. SES: Sentiment Elicitation System for Social Media Data. In the Workshop on Sentiment Elicitation from Natural Text for Information Retrieval and Extraction, held in conjunction with the IEEE International Conference on Data Mining, December 2011. (pdf)
Kathy Lee, Diana Palsetia, Md. Mostofa Ali Patwary, Ankit Agrawal, Alok Choudhary, and Ramanathan Narayanan. Twitter Trending Topic Classification. In the Workshop on Optimization Based Methods for Emerging Data Mining Problems, held in conjunction with the IEEE International Conference on Data Mining, December 2011. (pdf)
Yu Cheng, Kunpeng Zhang, Yusheng Xie, Ankit Agrawal, Wei-keng Liao, and Alok Choudhary. Learning to Group Web Text Incorporating Prior Information. In the Workshop on Optimization Based Methods for Emerging Data Mining Problems, held in conjunction with the IEEE International Conference on Data Mining, December 2011. (pdf)
Related Links
- Understanding, Analyzing, and Retrieving Knowledge from Social Media
- Microsoft Academic Search
- The DBLP Computer Science Bibliography
This work is supported by the National Science Foundation, Office of CyberInfrastructure (OCI), under award number OCI-1144061.