DATA 202 Module 11: Capstone Project

Introduction

The capstone project is the culmination of DATA 202. In it, you integrate what you have learned across DATA 201 and DATA 202 into a substantial, original data science project. Unlike the DATA 201 project, this capstone challenges you to work with advanced data types, modern AI tools, and production-quality engineering practices.


Part 1: Capstone Expectations

What Distinguishes a Capstone

A capstone project should demonstrate:

Technical Depth: Use of advanced techniques from DATA 202

Originality: Not a tutorial reproduction

Completeness: Full lifecycle coverage

Reflection: Critical analysis

Scope Guidelines

Too Small:

Too Large:

Just Right:


Part 2: Project Categories

Category A: Novel Data Applications

Apply advanced data science to a domain or dataset not typically explored:

Examples:

Requirements:

Category B: Multi-Modal Systems

Combine multiple data types or modalities:

Examples:

Requirements:

Category C: Deployed Applications

Build a working application that serves predictions:

Examples:

Requirements:

Category D: Research Replication and Extension

Replicate and extend a published paper:

Examples:

Requirements:


Part 3: Project Phases

Phase 1: Proposal (Week 1-2)

Deliverable: 2-3 page proposal including:

Review Process: Instructor feedback and approval

Phase 2: Data and Exploration (Week 3-4)

Deliverable: Progress report with:

Checkpoint: Verify feasibility and scope

Phase 3: Implementation (Week 5-8)

Work: Core technical development

Mid-Point Check-in: Brief progress presentation

Phase 4: Documentation and Presentation (Week 9-10)

Deliverables:


Part 4: Deliverables in Detail

Final Report

Structure:

  1. Abstract: One-paragraph summary
  2. Introduction: Problem, motivation, contributions
  3. Related Work: Prior approaches and context
  4. Data: Sources, collection, description, limitations
  5. Methods: Approach, techniques, architecture
  6. Results: Findings, evaluations, comparisons
  7. Discussion: Interpretation, limitations, implications
  8. Conclusion: Summary and future work
  9. References: Proper citations

Quality Expectations:

Code Repository

Requirements:

Optional but Valued:

Presentation

Components:

Delivery:


Part 5: Evaluation

Rubric

Criterion             Weight  Description
Originality           15%     Novel question, approach, or application
Technical Quality     25%     Correct methodology, appropriate techniques
Data Work             15%     Acquisition, preparation, documentation
Implementation        20%     Code quality, reproducibility, engineering
Results and Analysis  10%     Meaningful findings, honest evaluation
Communication         10%     Report quality, presentation delivery
Reflection            5%      Limitations, ethics, future directions
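
The rubric weights sum to 100%, so an overall score can be read as a weighted average of the per-criterion scores. The sketch below shows that arithmetic. It assumes each criterion is scored on a 0-100 scale, which the rubric does not state. The function name `weighted_score` and the example scores are hypothetical and used only for illustration. This is not necessarily how grades are computed in the course.

```python
# Rubric weights copied from the rubric (percent of the overall grade).
RUBRIC_WEIGHTS = {
    "Originality": 15,
    "Technical Quality": 25,
    "Data Work": 15,
    "Implementation": 20,
    "Results and Analysis": 10,
    "Communication": 10,
    "Reflection": 5,
}


def weighted_score(scores):
    """Combine per-criterion scores into one weighted average.

    Assumption: every criterion is scored on the same 0-100 scale.
    """
    missing = set(RUBRIC_WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(RUBRIC_WEIGHTS.values())
    weighted_sum = sum(RUBRIC_WEIGHTS[name] * scores[name] for name in RUBRIC_WEIGHTS)
    return weighted_sum / total_weight


# Hypothetical scores for illustration only; not real grading data.
example_scores = {
    "Originality": 80,
    "Technical Quality": 90,
    "Data Work": 85,
    "Implementation": 70,
    "Results and Analysis": 90,
    "Communication": 80,
    "Reflection": 100,
}

print(weighted_score(example_scores))  # prints 83.25
```

Because Technical Quality and Implementation together carry 45% of the weight, a change in either score moves the total more than the same change in Reflection or Communication.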

Excellence Markers

A-Level Work:

B-Level Work:

C-Level Work:


Part 6: Project Ideas

Data Acquisition Focus

NLP and Language

Computer Vision

Audio and Speech

Networks and Graphs

Foundation Models

Deployed Systems


Part 7: Resources and Support

Office Hours

Computing Resources

Ethical Review


Final Thoughts

The capstone is your opportunity to synthesize everything you’ve learned and create something meaningful. The best projects come from genuine curiosity—questions you actually want to answer, problems you want to solve, tools you want to build.

Start early. Iterate often. Ask for help when stuck. And remember: the goal is learning, not perfection. A project that encountered challenges and documented them honestly is more valuable than one that hid difficulties behind polished output.

Welcome to the final challenge of DATA 202. Make it count.


Module 11 structures the capstone project—the culminating experience of DATA 202 where students integrate advanced techniques into substantial, original work. From proposal to presentation, the capstone demonstrates mastery of modern data science.