ITE 204: Data Preparation and Processing

Course Description

Data Preparation and Processing focuses on the critical steps involved in preparing raw data for analysis and ensuring its quality and usability. The course covers techniques for cleaning, transforming, and organizing data, including handling missing values, outlier detection, normalization, and feature engineering. Students will explore methods for working with various data types and formats, such as structured, unstructured, and semi-structured data. Additionally, the course emphasizes the importance of data preprocessing for ensuring accurate and reliable results in data analysis and machine learning applications. By the end of the course, students will be able to effectively prepare datasets for analysis in real-world scenarios. (3 credits)

Prerequisite

  • DAT 201: Principles of Data Science

Student Learning Outcomes (SLOs)

Students who successfully complete this course will be able to:

  1. Identify common data quality issues, including missing values, outliers, and inconsistencies, and explain their potential impact on data analysis and machine learning outcomes.
  2. Apply data cleaning techniques to handle missing values, correct data inconsistencies, and manage outliers, ensuring data accuracy and reliability.
  3. Transform raw data using normalization, scaling, and encoding techniques, preparing datasets for machine learning algorithms and advanced analysis.
  4. Perform feature engineering, including feature selection and extraction, to enhance model performance and improve the interpretability of data insights.
  5. Distinguish between structured, unstructured, and semi-structured data, demonstrating effective methods for processing and organizing each type for analysis.
  6. Prepare datasets for real-world analytical tasks by implementing a complete data preprocessing pipeline, from data cleaning through feature engineering, that supports accurate analysis and predictive modeling.
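
To make outcome 6 concrete, the following minimal Python sketch chains three of the steps named in the outcomes above: mean imputation of a missing value, min-max scaling, and one-hot encoding. It assumes a hypothetical toy dataset and uses only the standard library. The column names and helper functions (impute_mean, min_max_scale, one_hot_encode, preprocess) are illustrative assumptions, not drawn from the course materials, and coursework may use different tools.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
rows = [
    {"age": 20.0, "city": "Hartford"},
    {"age": None, "city": "New Britain"},
    {"age": 40.0, "city": "Hartford"},
]


def impute_mean(rows, column):
    """Replace missing (None) values in a numeric column with the column mean."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def min_max_scale(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero for a constant column
    return [{**r, column: (r[column] - low) / span} for r in rows]


def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


def preprocess(rows):
    """Run cleaning, scaling, and encoding in a fixed order."""
    rows = impute_mean(rows, "age")
    rows = min_max_scale(rows, "age")
    return one_hot_encode(rows, "city")


prepared = preprocess(rows)
```

The order matters in this sketch: imputation runs first so that the scaling step never encounters a missing value.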

Course Activities and Grading

Assignments                                  Weight
Discussions (Weeks 1-7)                      10%
Homework Assignments (Weeks 1-3 & 5-7)       40%
Midterm Project (Week 4)                     17%
Final Project: Dataset Selection (Week 8)    8%
Final Project: Jupyter Notebook (Week 8)     25%
Total                                        100%

Required Textbook

  • This course uses Open Educational Resources (OER). OER are openly licensed educational resources that can be used for teaching, learning, and research. They may include textbooks, videos, and software, and are available at no cost to students.

Course Schedule

Week / SLOs / Readings and Exercises / Assignments

Week 1 (SLOs 1, 2)

Topic: Data Cleaning and Imputation

  • Read assigned links
  • Review optional resources
  • Review lecture material
  • Participate in discussions
  • Submit Week 1 Homework - Data Cleaning
  • Begin working on Midterm Project

Week 2 (SLO 4)

Topic: Mutual Information and Feature Selection

  • Read assigned links
  • Review lecture material
  • Participate in discussions
  • Submit Week 2 Homework - Titanic Feature Selection
  • Continue working on Midterm Project

Week 3 (SLO 3)

Topic: Feature Extraction, Feature Scaling, Encoding, and Binning

  • Read assigned links
  • Review lecture material
  • Participate in discussions
  • Submit Week 3 Homework - Ames Housing
  • Continue working on Midterm Project

Week 4 (SLOs 1, 2, 3, 4)

Topic: Midterm Project

  • Participate in discussions
  • Submit Week 4 Midterm Project

Week 5 (SLO 4)

Topic: Clustering

  • Read assigned links
  • Review lecture material
  • Participate in discussions
  • Submit Week 5 Homework - K-Means Clustering
  • Begin working on Final Project

Week 6 (SLO 4)

Topic: Principal Component Analysis

  • Read assigned links
  • Review lecture material
  • Participate in discussions
  • Submit Week 6 Homework - PCA
  • Continue working on Final Project

Week 7 (SLOs 5, 6)

Topic: Data Types & Preprocessing Pipelines

  • Read assigned links
  • Review lecture material
  • Participate in discussions
  • Submit Week 7 Homework - Data Pipelines
  • Continue working on Final Project

Week 8 (SLOs 1, 2, 3, 4, 5, 6)

Topic: Final Project

  • Submit Week 8 - Final Project: Dataset Selection
  • Submit Week 8 - Final Project: Jupyter Notebook
  • Complete Course Evaluation

COSC Accessibility Statement

Charter Oak State College encourages students with disabilities, including non-visible disabilities such as chronic diseases, learning disabilities, head injuries, attention-deficit/hyperactivity disorder, or psychiatric disabilities, to discuss appropriate accommodations with the Office of Accessibility Services at OAS@charteroak.edu.

COSC Policies, Course Policies, Academic Support Services and Resources

Students are responsible for knowing all Charter Oak State College (COSC) institutional policies, course-specific policies, procedures, and available academic support services and resources. Please see COSC Policies for COSC institutional policies, and see also specific policies related to this course. See COSC Resources for information regarding available academic support services and resources.