CAP5778: Advanced Data Mining (Fall 2024)
Instructor: Peixiang Zhao
Data in the information era is accumulating at an incredible rate due to a host of technological advances. Electronic data capture has become inexpensive and ubiquitous as a by-product of innovations such as the Internet, e-commerce, point-of-sale devices, bar-code readers, and intelligent machines. Such data is often stored in databases or on the Web specifically intended for decision support and business intelligence. Data mining, also known as knowledge discovery from data, is a rapidly growing field that is concerned with designing and developing principles and algorithms to make intelligent use of big data repositories. A number of successful applications have been reported in a wide range of real-world areas such as credit rating, fraud detection, database marketing, customer relationship management, and social network analysis, to name a few.
This advanced course introduces the key concepts, principles, algorithms, and systems of data mining, including, but not limited to, (1) foundations of data mining, (2) similarity search and approximate query processing, (3) dimensionality reduction, (4) link analysis, (5) mining social-network graphs, (6) mining frequent patterns and association analysis, (7) deep learning, (8) clustering, and other advanced topics. The course primarily serves graduate students interested in data science, data mining, machine learning, and knowledge discovery and management. It may also attract students from other disciplines who want to understand, develop, and use data mining techniques and systems to analyze massive amounts of data.
Administrivia
- Time: Tuesday/Thursday 9:45am - 11:00am
- Venue: EOA 1044 (map)
- Instructor: Peixiang Zhao
- Office: 361 James Love building
- Email: zhao AT cs DOT fsu DOT edu
- Office hours: Tuesday/Thursday right after classes
- Teaching Assistant: Kedarnath Ravi Shankar Gubbi
- Office: LOV 025 (CS Majors Lab)
- Email: kg23k AT cs DOT fsu DOT edu
- Office hours: Monday 4:30pm - 5:30pm
- Web site: http://www.cs.fsu.edu/~zhao/cap5778/main.html
Textbook
- Mining of Massive Datasets, 3rd edition, by Jure Leskovec, Anand Rajaraman, Jeff Ullman. ISBN: 9781108476348, Cambridge University Press, 2020.
Reference
- Introduction to Data Mining, 2nd edition, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. ISBN: 978-0134080284, Pearson, 2019.
- Data Mining: Concepts and Techniques, 4th edition, by Jiawei Han, Jian Pei, and Hanghang Tong. ISBN: 978-0128117606, Morgan Kaufmann Publishers, 2023.
- Data Mining: The Textbook, by Charu Aggarwal. ISBN: 978-3319141411, Springer Inc., 2015.
- The Elements of Statistical Learning, 2nd edition, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. ISBN: 978-0387848587, Springer, 2009.
- Pattern Recognition and Machine Learning, by Christopher Bishop. ISBN: 978-0387310732, Springer, 2006.
Prerequisites
COP3330 and COP4530 (or equivalent) are required. Students should come with good programming skills and basic knowledge in probability and linear algebra. If you are not sure whether you have the right background, please contact the instructor.
Note: Students need to be familiar with a mainstream programming language such as C, C++, Java, or Python. We will not cover programming-specific issues in this course.
This course will draw materials primarily from the textbook as well as the data mining and machine learning literature. Students will study the materials, complete both programming and written assignments, carry out a group-based, semester-long project, and take a final exam.
Lectures and reading: We encourage students to attend classes (and appreciate it when they do!), because effective lectures rely on students' participation to raise questions and contribute to discussions. We will provide lecture notes and related readings before class, which will be posted on the schedule page.
Complete the required textbook reading before each lecture, and study it more carefully after class. All required readings are fair material for exams, even if they are not fully covered in lectures. Our lectures are intended to motivate and provide a road map for your reading; with limited lecture time, we may not be able to cover everything in the readings.
Questions: We encourage students to discuss their questions and problems first with peers and classmates. This way, you can get immediate help and also learn to communicate "professionally" with your classmates. For more thorough discussion, come to the office hours of the TA and the instructor. Any announcement will be posted on the announcement page or Canvas. Make sure to check both frequently enough to stay informed.
Assignments: There will be four homework assignments, including both written and programming problems, spaced out over the course of the semester. All assignments must be done individually. Assignments must be submitted via Canvas on or before the designated due date and time.
Research: There will be a semester-long, group-based research project. Students in the same group are required to choose a research problem or topic in data mining and study the milestone scientific articles on it in order to gain a deep understanding of it. Each group needs to prepare a presentation and a research survey on the selected topic.
Exam: There will be a final exam at the end of the semester.
General Policy
- University Attendance Policy: Excused absences include documented illness, deaths in the family and other documented crises, call to active military duty or jury duty, religious holy days, and official University activities. These absences will be accommodated in a way that does not arbitrarily penalize students who have a valid excuse. Consideration will also be given to students whose dependent children experience serious illness.
- Academic Honor Policy: The Florida State University Academic Honor Policy outlines the University's expectations for the integrity of students' academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process. Students are responsible for reading the Academic Honor Policy and for living up to their pledge to "...be honest and truthful and... [to] strive for personal and institutional integrity at Florida State University." (Florida State University Academic Honor Policy, found here.)
- Syllabus Change Policy: Except for changes that substantially affect implementation of the evaluation (grading) statement, this syllabus is a guide for the course and is subject to change with advance notice.
- You are allowed to discuss written and programming assignments. However, any such discussion must be clearly acknowledged on the submitted solution or write-up. Your solution should be stapled together and neatly prepared.
- You are expected to attend all lectures unless you notify the instructor in advance with a reasonable excuse.
Collaboration/Academic Honesty
All course participants must adhere to the academic honor code of FSU, which is available in the student handbook. All instances of academic dishonesty will be reported to the university. Every student must write his/her own homework and code (unless you are in the same group for the project). Showing your code or homework solutions to others is a violation of academic honesty. It is your responsibility to ensure that others cannot access your code or homework solutions. Consulting related textbooks, papers, and information available on the Internet for your assignments and homework is fine. However, copying a large portion of such information will be considered academic dishonesty. If you borrow a small piece of any such information, please acknowledge that in your assignment. Please see the following web site for a complete explanation of the Academic Honor Code.
Late Policy and Make-up Exams
- Late submission of homework or other course work will not ordinarily be accepted. If, for some compelling reason, you cannot hand in an assignment on time, please contact the TA or instructor as far in advance as possible. Any notification received after the submission deadline will typically not be considered.
- No credit will be given to late submissions of assignments.
- No make-up exams (except under extremely unusual circumstances).
Students with Disabilities
Americans With Disabilities Act: Students with disabilities needing academic accommodation should: (1) register with and provide documentation to the Student Disability Resource Center; (2) bring a letter to the instructor indicating the need for accommodation and what type.
This syllabus and other class materials are available in alternative format upon request. For more information about services available to FSU students with disabilities, contact the Student Disability Resource Center: 874 Traditions Way, 108 Student Services Building, Florida State University, Tallahassee, FL 32306-4167. (850) 644-9566 (voice), (850) 644-8504 (TDD), sdrc@admin.fsu.edu, http://www.disabilitycenter.fsu.edu/.
Grading Policy
The course grade breaks down as follows:
- Homework: 40%
- Research project: 30%
  - Project proposal: 5%
  - Project presentation: 10%
  - Survey report: 15%
- Final exam: 30%
Any regrading request should be submitted to the instructor or the TA within one week after the graded deliverables are returned to students.
Your final grade will be assigned as follows,
- A: 100 - 90; A-: 90 - 85;
- B+: 85 - 80; B: 80 - 70; B-: 70 - 60;
- F: 60 - 0.
This table indicates minimum guaranteed grades. Under certain limited circumstances (e.g., an unreasonably hard exam), we may select more generous ranges or scale the scores to adjust.
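As an illustration of how the weights and cutoffs above combine, here is a minimal Python sketch. It is not an official grade calculator: the function names, the 0-100 scale for each component, and the rule that a score exactly on a boundary (for example, 90) earns the higher grade are all assumptions, since the table lists overlapping endpoints. It also ignores any scaling the instructor may apply.

```python
# Weights (percent of the course grade) taken from the breakdown above.
WEIGHTS = {
    "homework": 40,
    "project_proposal": 5,
    "project_presentation": 10,
    "survey_report": 15,
    "final_exam": 30,
}

# Minimum guaranteed cutoffs from the table above.
# Assumption: a total exactly on a boundary earns the higher grade.
CUTOFFS = [(90, "A"), (85, "A-"), (80, "B+"), (70, "B"), (60, "B-")]


def weighted_total(scores):
    """Combine per-component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Map a course total to the minimum guaranteed letter grade."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


# Hypothetical scores, for illustration only.
example = {
    "homework": 90,
    "project_proposal": 80,
    "project_presentation": 85,
    "survey_report": 80,
    "final_exam": 75,
}
total = weighted_total(example)
print(total, letter_grade(total))
```

With these hypothetical scores the weighted total is 83.0, which falls in the B+ range.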
Last updated: Jul.10th, 2024