# Course Syllabus

This curriculum introduces students to the foundations of data science through hands-on projects and real-world applications. Each module connects computing skills with domain expertise through interdisciplinary collaboration.

DATA 201: Introduction to Data Science I

Module 1
"Understanding the Data Revolution"
Topics
  • History and impact of data collection
  • Structured vs. unstructured data
  • Basics of data science
  • Role of statistics and computing
Computing
Python, Jupyter, Pandas, Basic Visualization
Module 2
"Telling Stories with Data"
Topics
  • Data from various disciplines
  • Lists, matrices, vectors, images, video
  • Visualization techniques
  • Interactive graphics
Computing
Interactive visualizations with Matplotlib, Seaborn, Plotly
Module 3
"Probabilistic Thinking"
Topics
  • Real-world statistical examples
  • Probability theory
  • Statistical inference
  • Correlations and causation
Computing
Simulations and statistical analysis in Python (scipy, statsmodels)
Module 4
"Optimization and Model Fitting"
Topics
  • Optimization problems
  • Linear programming
  • Gradient descent
  • Supervised learning basics
Computing
Python implementations of basic optimization (scipy.optimize, sklearn)
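To give a flavor of the gradient descent topic in Module 4, here is a minimal sketch in plain Python. The function name `gradient_descent`, the quadratic objective, the learning rate, and the step count are all assumptions chosen for illustration; the module itself lists scipy.optimize and sklearn as its tools.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Minimize a one-dimensional function given its derivative `grad`."""
    x = x0
    for _ in range(steps):
        # Step against the gradient, scaled by the learning rate.
        x -= learning_rate * grad(x)
    return x


# Hypothetical objective: f(x) = (x - 3)^2, whose derivative is 2 * (x - 3)
# and whose minimum sits at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)
```

With this step size each iteration shrinks the distance to the minimum by a constant factor, so the estimate converges toward 3. A learning rate that is too large would make the iterates diverge instead.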
Module 5
"Discovering Patterns"
Topics
  • Customer segmentation
  • Clustering methods (K-means, GMM)
  • Dimensionality reduction (PCA, t-SNE)
  • Anomaly detection
Computing
Applying clustering techniques using sklearn, visualization with UMAP
Module 6
"Machines that Speak"
Topics
  • Text/language datasets
  • Natural language processing
  • Tokenization and embeddings
  • Transformers
Computing
Hands-on NLP with NLTK, spaCy, Hugging Face transformers
Module 7
"Machines that See"
Topics
  • Image datasets
  • Computer vision fundamentals
  • Neural network basics
  • CNN architectures
Computing
Deep neural networks with PyTorch, image processing with OpenCV
Module 8
"Machines that Hear"
Topics
  • Raw audio signals
  • Digital signal processing
  • Spectral analysis
  • Speech recognition
Computing
Fast Fourier Transform, librosa, speech synthesis
Module 9
"Measuring Things in Time"
Topics
  • Sensors and measurement devices
  • Forecasting methods
  • Chaos theory and stochastic processes
  • Trend analysis
Computing
Python-based forecasting (statsmodels, Prophet)
Module 10
"Learning from Examples"
Topics
  • Regression datasets
  • Classification datasets
  • Linear and logistic regression
  • Feature engineering
Computing
Model training and evaluation in Python (sklearn, cross-validation)
Module 11
"Advanced Neural Networks"
Topics
  • Real-world AUB datasets
  • Research-driven projects
  • Integrating ML, visualization, optimization
  • Team collaboration
Computing
Transfer learning, fine-tuning pre-trained models (PyTorch, Hugging Face)
Module 12
"Responsible Innovation"
Topics
  • AI ethics case studies
  • Ethical frameworks
  • Bias in ML
  • Fairness and accountability
Computing
Evaluating models for fairness (Fairlearn, AIF360)
Module 13
"Real-World Impact"
Topics
  • Real-world AUB datasets
  • Research or product-driven project
  • Integrating learned techniques
  • Teamwork and problem formulation
Module 14
"Sharing Your Insights"
Topics
  • Final presentations
  • Peer feedback
  • Synthesis and reflection
  • Learning outcomes assessment

DATA 202: Introduction to Data Science II

Module 1
"Transforming Raw Data into Insights"
Topics
  • Web scraping
  • API integration
  • Handling missing data
  • Handling biased data
Computing
Automated data cleaning (pandas, BeautifulSoup, requests)
Module 2
"Working with Large and Complex Datasets"
Topics
  • Handling big datasets
  • Data cleaning pipelines
  • Storage strategies
  • Real-world scalability
Computing
Distributed processing with Spark, SQL databases
Module 3
"Forecasting and Pattern Discovery in Temporal Data"
Topics
  • Real-time data streams
  • Streaming architectures
  • Online learning
  • Edge computing
Computing
Implementing real-time time-series models, streaming pipelines
Module 4
"Bringing Text from the Physical to the Digital World"
Topics
  • Scanned documents
  • Handwritten texts
  • Historical archives
  • Image-to-text processing
  • Deep learning OCR models
Computing
OCR implementation with Tesseract, deep learning approaches
Module 5
"Analyzing and Generating Sound with AI"
Topics
  • Audio signal processing
  • Speech recognition
  • Music generation
  • Voice synthesis
Computing
Speech-to-text, music analysis with librosa, generative audio models
Module 6
"Processing Spatial and Temporal Visual Data"
Topics
  • Video processing
  • Spatial field data
  • 3D data representations
  • Scientific visualization
Computing
Video analysis with OpenCV, 3D data processing
Module 7
"Understanding Networks in Datapoints"
Topics
  • Social networks
  • Citation graphs
  • Biological networks
  • Centrality measures
  • Community detection
Computing
NetworkX, graph algorithms, Dijkstra's algorithm
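Module 7 names Dijkstra's algorithm among its computing topics. The sketch below shows one possible implementation using only the Python standard library rather than NetworkX. The function name `dijkstra`, the dictionary-of-dictionaries graph format, and the toy graph are assumptions made for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from `source` in a weighted graph.

    `graph` maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    distances = {source: 0}
    queue = [(0, source)]  # priority queue of (distance so far, node)
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > distances.get(node, float("inf")):
            continue  # stale entry: a shorter path to this node was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return distances


# Hypothetical toy graph with four nodes.
toy_graph = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
shortest = dijkstra(toy_graph, "A")
print(shortest)
```

Nodes that cannot be reached from the source are simply absent from the returned dictionary. The algorithm assumes non-negative edge weights.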
Module 8
"How Big Data and Big Models Work"
Topics
  • Text datasets (news, medical, legal, scientific)
  • Language models
  • N-gram models
  • Search and retrieval
Computing
Training LLMs (GPT), prompt engineering, RAG
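The N-gram models listed under Module 8 can be illustrated with a very small bigram model built from word counts. This is a sketch under stated assumptions: the helper names `train_bigram_model` and `next_word_probability` and the toy sentence are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probability(model, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev); 0.0 if `prev` was never seen."""
    total = sum(model[prev].values())
    return model[prev][nxt] / total if total else 0.0


# Hypothetical toy corpus.
tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram_model(tokens)
p = next_word_probability(model, "the", "cat")
print(p)
```

In this corpus "the" is followed by "cat" twice and "mat" once, so the estimate is 2/3. Larger language models replace these raw counts with learned parameters, but the underlying task of predicting the next token is the same.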
Module 9
"Taking AI from Research to Production"
Topics
  • Large-scale datasets for training
  • Foundation models
  • Model optimization
  • GPU parallelization
Computing
Fine-tuning models with APIs, GPU programming basics, model serving
Module 10
"Ensuring Responsible AI Development"
Topics
  • Case studies on biased datasets
  • Ethical frameworks
  • Fairness constraints
  • Interpretability
Computing
Evaluating bias in AI models, explainability tools (SHAP, LIME)
Module 11
"Solving a Large-Scale Data Challenge"
Topics
  • Dataset selection (music, healthcare, astronomy, finance)
  • Complete data processing pipeline
  • AI pipeline design
  • Research-style report and presentation
