CIS5930: Advanced Data Mining (Spring 2021)
Instructor: Peixiang Zhao
COVID-19 Update: Advanced Data Mining will be offered online in Spring 2021. To accommodate students studying in different places or time zones, lectures will be recorded and posted on Canvas (under the "Course Media" tab) every Tuesday and Thursday, unless otherwise announced in advance because of public holidays or scheduling conflicts for the instructor. Lecture slides will be posted on the course "Schedule" page before class begins, and assignments will be posted on the course "Assignment" page.
Communication: 1. Please check the course "Announcement" page periodically for updates throughout the semester; 2. The instructor holds weekly office hours via Zoom (Meeting ID: 616 053 9964. Password: data); 3. Always feel free to drop me a line if you have any questions about this class.
Data in the information era is accumulating at an incredible rate due to a host of technological advances. Electronic data capture has become inexpensive and ubiquitous as a by-product of innovations such as the Internet, e-commerce, point-of-sale devices, bar-code readers, and intelligent machines. Such data is often stored in databases or on the Web specifically intended for decision support and business intelligence. Data mining, also known as knowledge discovery from data, is a rapidly growing field that is concerned with designing and developing principles and algorithms to make intelligent use of big data repositories. A number of successful applications have been reported in a wide range of real-world areas such as credit rating, fraud detection, database marketing, customer relationship management, and social network analysis, to name a few.
As an advanced course in data mining, this course introduces the key concepts, principles, algorithms, and systems of data mining, including, but not limited to, (1) what data mining is, (2) getting to know your data, (3) data preprocessing, integration, and transformation, (4) mining frequent patterns and associations, (5) classification, (6) cluster analysis, (7) similarity search, and other advanced topics. The course will primarily serve graduate students interested in data science, data mining, machine learning, and knowledge discovery and management. It may also attract students from other disciplines who want to understand, develop, and use data mining techniques and systems to analyze massive amounts of data.
Administrivia
Textbook
Reference
- Introduction to Data Mining, 2nd edition, by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. ISBN: 978-0134080284, Pearson, 2019.
- Data Mining: The Textbook, 1st edition, by Charu Aggarwal. ISBN: 978-3319141428, Springer, 2015.
- Data Mining and Machine Learning, 2nd edition, by Mohammed Zaki and Wagner Meira Jr. ISBN: 978-1108473989, Cambridge University Press, 2020.
- The Elements of Statistical Learning, 2nd edition, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. ISBN: 978-0387848587, Springer, 2009.
- Pattern Recognition and Machine Learning, by Christopher Bishop. ISBN: 978-0387310732, Springer, 2006.
Prerequisites
COP3330: Object-oriented Programming and COP4530: Data Structures and Algorithms, or equivalent courses, are required. Students should come with good programming skills and basic knowledge of probability and linear algebra. If you are not sure whether you have the right background, please contact the instructor.
Note: Students need to be familiar with at least one programming language, such as C/C++, Java, or Python. We will not cover programming-specific issues in this course.
This course will draw material from the textbook as well as from the data mining and machine learning literature. Students will study the materials, complete both programming and written assignments, take a series of in-class quizzes, carry out a research project, and take a final exam.
Lectures and reading: we encourage students to attend classes (and appreciate it when they do!), because effective lectures rely on students' participation in raising questions and contributing to discussions. We will provide lecture notes and related readings before class, which will be posted on the schedule page.
Complete the required textbook reading before lectures, and study it more carefully after class. All required readings are fair material for exams, even if they are not fully covered in lectures. Our lectures are intended to motivate the material and provide a road map for your reading; with limited lecture time, we may not be able to cover everything in the readings.
Questions: We encourage students to discuss their questions and problems with peers and classmates first. This way, you can get immediate help and also learn to communicate "professionally" with your classmates. For more thorough discussion, come to the office hours of the TAs or the instructor. All announcements will be posted on the announcement page; check it frequently to stay informed.
Assignments: There will be four homework assignments, including both written and programming problems, spaced out over the semester. All assignments must be done individually. Assignments should be submitted before class begins on the due date.
Quizzes: There will be a series of in-class quizzes aimed at testing basic understanding of key concepts and encouraging class attendance.
Research: There will be a semester-long research project. Students are required to choose a research problem or topic in data mining and study the milestone scientific articles on it in order to gain a deep understanding of it. Students need to prepare a video presentation and a research survey on the selected topic.
Exam: There will be a final exam at the end of the semester.
General Policy
- University Attendance Policy: Excused absences include documented illness, deaths in the family and other documented crises, call to active military duty or jury duty, religious holy days, and official University activities. These absences will be accommodated in a way that does not arbitrarily penalize students who have a valid excuse. Consideration will also be given to students whose dependent children experience serious illness.
- Academic Honor Policy: The Florida State University Academic Honor Policy outlines the University's expectations for the integrity of students' academic work, the procedures for resolving alleged violations of those expectations, and the rights and responsibilities of students and faculty members throughout the process. Students are responsible for reading the Academic Honor Policy and for living up to their pledge to "...be honest and truthful and... [to] strive for personal and institutional integrity at Florida State University." (Florida State University Academic Honor Policy, found here.)
- Syllabus Change Policy: Except for changes that substantially affect implementation of the evaluation (grading) statement, this syllabus is a guide for the course and is subject to change with advance notice.
- You are allowed to discuss written and programming assignments. However, any such discussion must be clearly acknowledged in the submitted solution or write-up. Your solutions should be neatly prepared and stapled together;
- You are expected to attend all lectures unless you notify the instructor in advance with a reasonable excuse.
Collaboration/Academic Honesty
All course participants must adhere to the FSU academic honor code, which is available in the student handbook. All instances of academic dishonesty will be reported to the university. Every student must write his/her own homework and code (unless you are in the same group for the programming project). Showing your code or homework solutions to others is a violation of academic honesty, and it is your responsibility to ensure that others cannot access them. Consulting related textbooks, papers, and information available on the Internet for your assignments is fine; however, copying a large portion of such information will be considered academic dishonesty. If you borrow a small piece of any such information, please acknowledge it in your assignment. Please see the following web site for a complete explanation of the Academic Honor Code.
Late Policy and Make-up Exams
- Late assignments and paper summaries will not ordinarily be accepted. If, for some compelling reason, you cannot hand in an assignment on time, please contact the TA or instructor as far in advance as possible. Written and programming assignments are due at the beginning of class on the due date;
- No credit will be given to late submissions of assignments;
- No make-up exams (except under extremely unusual circumstances).
Students with Disabilities
Americans With Disabilities Act: Students with disabilities needing academic accommodation should: (1) register with and provide documentation to the Student Disability Resource Center; (2) bring a letter to the instructor indicating the need for accommodation and the type of accommodation required.
This syllabus and other class materials are available in alternative formats upon request. For more information about services available to FSU students with disabilities, contact the Student Disability Resource Center: 874 Traditions Way, 108 Student Services Building, Florida State University, Tallahassee, FL 32306-4167. (850) 644-9566 (voice), (850) 644-8504 (TDD), sdrc@admin.fsu.edu, http://www.disabilitycenter.fsu.edu/.
Grading Policy
The course grade will break down as follows:
- Quizzes: 10%;
- Homework: 40%;
- Research project: 20%;
- Final exam: 30%.
Any regrading request should be submitted to the instructor or the TA(s) within one week after the graded deliverables are returned to students.
Your final grade will be assigned as follows:
- A: 100 - 90; A-: 90 - 85;
- B+: 85 - 80; B: 80 - 70; B-: 70 - 60;
- F: 60 - 0.
This table indicates minimum guaranteed grades. Under certain limited circumstances (e.g., an unreasonably hard exam), we may select more generous ranges or scale the scores to adjust.
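As an illustration of how the weights and the grade table combine, the sketch below computes a weighted course total and looks up the guaranteed minimum letter grade. It is not an official grading tool: the function names and sample scores are hypothetical, each component is assumed to be scored on a 0-100 scale, and a total that lands exactly on a cutoff (for instance 90) is assumed to earn the higher grade, since the table lists each boundary in two ranges.

```python
# Hypothetical helper; weights copied from the grading breakdown above.
WEIGHTS = {"quiz": 0.10, "homework": 0.40, "project": 0.20, "final": 0.30}

# Minimum total for each letter, highest first. Assumption: a total exactly
# on a boundary (e.g. 90) receives the higher letter.
CUTOFFS = [(90, "A"), (85, "A-"), (80, "B+"), (70, "B"), (60, "B-"), (0, "F")]


def weighted_total(scores: dict) -> float:
    """Combine component scores (each assumed 0-100) using WEIGHTS."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)


def letter_grade(total: float) -> str:
    """Return the first letter whose minimum the total meets."""
    for minimum, letter in CUTOFFS:
        if total >= minimum:
            return letter
    return "F"


# Made-up example scores for illustration only.
scores = {"quiz": 90, "homework": 85, "project": 88, "final": 80}
total = weighted_total(scores)
print(round(total, 2), letter_grade(total))
```

Because the table gives minimum guaranteed grades, the letter returned here is a floor; any curve or more generous ranges chosen by the instructor would only raise it.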
Last updated: July 11, 2020