CIS5930: Advanced Data Mining (Spring 2021)
Instructor: Peixiang Zhao
Assignment Information
- There will be four assignments, each designed to test your understanding of the course material. An assignment may involve programming, written analysis, or both.
- All students are expected to follow the FSU Academic Honor Code.
- All assignments follow a "no-late" policy: assignments received after the due time will receive zero credit.
- Assignment 1
- Topic: Data, Data Preprocessing
- Type: written analysis and programming
- Due time/date: 11:59pm Wednesday 2/10
- Submission: one submission package (zip or tar.gz format) including the written homework (pdf) and the software package (including C/C++ source code, makefile, and readme). Submit via Canvas.
- Assignment 2
- Topic: Data Preprocessing, Frequent Pattern Mining
- Type: written analysis
- Due time/date: 11:59pm Friday 3/5
- Submission: one pdf file submitted via Canvas.
- Assignment 3
- Topic: Classification
- Type: programming
- Due time/date: 11:59pm Sunday 3/28
- Submission: one .zip package file submitted via Canvas.
- Assignment 4
- Topic: Classification, Clustering
- Type: written analysis
- Due time/date: 11:59pm Friday 4/16
- Submission: one pdf file submitted via Canvas.
Project Information
- The semester-long project involves a systematic study of a data mining research topic: reading and understanding scientific publications on the topic, and writing a survey-style summary of it;
- The project must be completed individually;
- The deliverables are (1) a project proposal (1-2 pages): 10%; (2) a project presentation (15-20 minute video): 30%; (3) a project report (5-8 pages, single column, LaTeX preferred): 60%.
- Some recommended topics (and readings) are as follows:
- Tree-based ensemble learning
- XGBoost: A Scalable Tree Boosting System. KDD'16
- LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS'17
- CatBoost: Unbiased Boosting with Categorical Features. NeurIPS'18
- Frequent graph pattern mining
- gSpan: Graph-based Substructure Pattern Mining. ICDM'03
- A Quickstart in Frequent Structure Mining Can Make a Difference. KDD'04
- Flexible and Feasible Support Measures for Mining Frequent Patterns in Large Labeled Graphs. SIGMOD'17
- Generative Adversarial Nets (GAN)
- Generative Adversarial Nets. NeurIPS'14
- Wasserstein GAN. arXiv'17
- Are GANs Created Equal? NeurIPS'18
- Graph Embedding
- DeepWalk: Online Learning of Social Representations. KDD'14
- LINE: Large-scale Information Network Embedding. WWW'15
- node2vec: Scalable Feature Learning for Networks. KDD'16
- Data Sketching for Data Streams
- Mergeable Summaries. TODS'13
- Efficient Frequent Directions Algorithm for Sparse Matrices. KDD'16
- A High-Performance Algorithm for Identifying Frequent Items in Data Streams. IMC'17