CAP5778: Advanced Data Mining (Fall 2024)
Instructor: Peixiang Zhao
Assignment Information
- There will be four assignments, each designed to test your understanding of the material taught. Each assignment may be either a programming task or a written analysis.
- All students are expected to follow the FSU Academic Honor Code.
- All assignments follow a "no-late" policy: assignments received after the due time will receive zero credit.
Project Information
- The semester-long project is a systematic study of a data mining research topic: reading and understanding scientific publications, then writing a survey-style summary of that topic.
- Projects must be done in groups of at most 3 members.
- The deliverables are (1) a project proposal (1-2 pages): 5 points; (2) a project presentation (15-20 minutes, in class): 10 points; and (3) a project report (5-7 pages, single column, LaTeX preferred): 15 points.
- Some recommended topics (and readings) are as follows:
  - Similarity Search
    - A unified framework for string similarity search with edit-distance constraint. (VLDB Journal'16)
    - Adaptive Top-k Overlap Set Similarity Joins. (ICDE'20)
    - Indexing Metric Spaces for Exact Similarity Search. (ACM Computing Surveys'22)
    - Similarity query processing for high-dimensional data. (VLDB'20)
    - MinSearch: An Efficient Algorithm for Similarity Search under Edit Distance. (KDD'20)
    - A Two-Level Signature Scheme for Stable Set Similarity Join. (VLDB'23)
    - A scalable index for top-k subtree similarity queries. (SIGMOD'19)
    - Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. (CACM'08)
  - Data Streams
    - What is Data Sketching, and Why Should I Care? (CACM'17)
    - Network Applications of Bloom Filters: A Survey. (Internet Mathematics'03)
    - Mergeable Summaries. (TODS'13)
    - Efficient Frequent Directions Algorithm for Sparse Matrices. (KDD'16)
    - Cuckoo filter: Practically better than Bloom. (CoNEXT'14)
  - PageRank
    - Estimating Single-Node PageRank in Õ(min{d_t, √m}) Time. (VLDB'23)
    - Efficient Algorithms for Personalized PageRank Computation: A Survey. (TKDE'24)
    - Massively parallel algorithms for personalized pagerank. (VLDB'21)
  - Graph Embedding
    - DeepWalk: Online Learning of Social Representations. (KDD'14)
    - LINE: Large-scale Information Network Embedding. (WWW'15)
    - node2vec: Scalable Feature Learning for Networks. (KDD'16)
  - Generative Adversarial Nets (GAN)
    - Generative Adversarial Nets. (NeurIPS'14)
    - Wasserstein GAN. (arXiv'17)
    - Are GANs Created Equal? (NeurIPS'18)