HIF 539: Big Data and Data Mining

Course Description

This course provides an accessible introduction to the core principles and techniques of data mining and big data analytics. Students will learn how to explore, preprocess, and analyze data to extract meaningful patterns and insights. The course introduces fundamental concepts such as classification, clustering, and association rule mining using real-world examples. It covers the basics of big data platforms like Hadoop and Spark to demonstrate how data mining scales in modern distributed environments. This course emphasizes intuitive understanding and practical applications over mathematical rigor, making it ideal for students with little or no prior background in data science. (3 credits)

Prerequisite

  • None

Student Learning Outcomes (SLOs)

Upon successful completion of the course, the student will be able to:

  1. Describe the fundamental concepts of data mining and big data and their role in modern analytics.
  2. Apply data preprocessing techniques such as cleaning, normalization, and transformation.
  3. Perform basic classification tasks using intuitive methods like decision trees and k-nearest neighbors.
  4. Apply clustering techniques to group data based on similarity and interpret the results.
  5. Generate simple association rules from transactional data and explain their practical uses.
  6. Explain the architecture and role of big data tools such as Hadoop, MapReduce, and Spark.
  7. Analyze large datasets using scalable data mining techniques in distributed environments.
  8. Generate insights and findings from real-world data through small projects or case studies.

Course Activities and Grading

Activity                    Points  Weight
Discussions (Weeks 1-7)         70     20%
Assignments (Weeks 1-7)        325     50%
Presentations (Week 8)          30      5%
Final Project (Weeks 6-8)      100     25%
Total                          525    100%

Required Textbooks

Available through Charter Oak State College's Book Bundle

  • Erl, T., Khattak, W., & Buhler, P. (2016). Big data fundamentals: Concepts, drivers & techniques. Pearson. ISBN 978-0134291079.
  • Han, J., Pei, J., & Tong, H. (2022). Data mining: Concepts and techniques (4th ed.). Morgan Kaufmann. ISBN 978-0128117606.

Additional Resource

Course Schedule

Each week below lists the week number, the SLOs addressed, the topic, the readings and exercises, and the graded assignments.

Week 1 (SLOs 1, 2)

Topic: Introduction to Data Mining and Big Data

  • Read and Review:
    • Readings: BDF Ch. 1; Data Mining Ch. 1; MMDS Ch. 1
    • Tools intro: Hadoop, Spark (overview only)
  • Read assigned material
  • Review lecture material
  • Discussion: Data Mining (10 points)
  • Assignment: Local Data (25 points)

Week 2 (SLOs 2, 3)

Topic: Cleaning and Organizing Data

  • Read and Review:
    • Readings: BDF Ch. 2; Data Mining Ch. 2
  • Read assigned material
  • Review lecture material
  • Discussion: Impact of Preprocessing Decisions on Data Mining Results (10 points)
  • Assignment: Data Cleaning (50 points)

Week 3 (SLO 3)

Topic: Classification – Fundamentals

  • Read and Review:
    • Readings: Data Mining Ch. 8
  • Read assigned material
  • Review lecture material
  • Discussion: Simple Decision Tree (10 points)
  • Download Weka
  • Assignment: Understanding Decision Trees (50 points)

Week 4 (SLOs 3, 4)

Topic: Classification – Advanced Topics

  • Read and Review:
    • Readings: IDM Ch. 5 (skip complex math sections)
  • Read assigned material
  • Review lecture material
  • Discussion: Accuracy Isn’t Everything (10 points)
  • Assignment: Comparing Classification Models (50 points)

Week 5 (SLOs 4, 5)

Topic: Clustering

  • Read and Review:
    • Readings: IDM Ch. 7 (focus on illustrations and applications)
  • Read assigned material
  • Review lecture material
  • Discussion: Cluster Interpretation (10 points)
  • Assignment: Clustering – K-Means and Hierarchical (100 points)

Week 6 (SLO 5)

Topics: Big Data Fundamentals & Ecosystem

  • Read and Review:
    • Readings: BDF Ch. 3-5
  • Read assigned material
  • Review lecture material
  • Discussion: Need for Big Data (10 points)
  • Assignment: Big Data Ecosystem Overview and Tool Comparison (25 points)
  • Final Project – Proposal (100 points)

Week 7 (SLOs 6, 7)

Topics: Hadoop & Spark Architecture

  • Read and Review:
    • Readings: BDF Ch. 5, 6
  • Read assigned material
  • Review lecture material
  • Discussion: Which Wins (10 points)
  • Assignment: Designing a pipeline (25 points)

Week 8 (SLOs 4-8)

Topics: Big Data Integration & Final Project Presentation

  • Read and Review:
    • Readings: BDF Ch. 7, 8
  • Read assigned material
  • Review lecture material
  • Final project presentation (30 points)
  • Complete Course Evaluation

COSC Accessibility Statement

Charter Oak State College encourages students with disabilities, including non-visible disabilities such as chronic diseases, learning disabilities, head injury, attention-deficit/hyperactivity disorder, or psychiatric disabilities, to discuss appropriate accommodations with the Office of Accessibility Services at OAS@charteroak.edu.

COSC Policies, Course Policies, Academic Support Services and Resources

Students are responsible for knowing all Charter Oak State College (COSC) institutional policies, course-specific policies and procedures, and available academic support services and resources. See COSC Policies for institutional policies, and review the policies specific to this course. See COSC Resources for information about available academic support services and resources.