# Course Syllabus
This curriculum introduces students to the foundations of data science through hands-on projects and real-world applications. Each module connects computing skills with domain expertise through interdisciplinary collaboration.
## DATA 201: Introduction to Data Science I
Topics
- History and impact of data collection
- Structured vs. unstructured data
- Basics of data science
- Role of statistics and computing
Computing
Python, Jupyter, Pandas, Basic Visualization
Topics
- Data from various disciplines
- Lists, matrices, vectors, images, video
- Visualization techniques
- Interactive graphics
Computing
Interactive visualizations with Matplotlib, Seaborn, Plotly
Topics
- Real-world statistical examples
- Probability theory
- Statistical inference
- Correlation and causation
Computing
Simulations and statistical analysis in Python (scipy, statsmodels)
Topics
- Optimization problems
- Linear programming
- Gradient descent
- Supervised learning basics
Computing
Python implementations of basic optimization (scipy.optimize, sklearn)
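As a rough illustration of the gradient descent topic in this module, the sketch below minimizes a one-dimensional quadratic in plain Python. The function name `gradient_descent`, the objective f(x) = (x - 3)^2, the learning rate, and the step count are all assumptions chosen for illustration; they are not taken from the course material.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    # Derivative of the assumed objective f(x) = (x - 3)^2.
    return 2.0 * (x - 3.0)


# The minimizer of f is x = 3, so the iterate should end up close to 3.
x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)
```

With this learning rate, each step shrinks the distance to the minimizer by a constant factor, which is why a fixed number of steps is enough for this toy problem.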
Topics
- Customer segmentation
- Clustering methods (K-means, GMM)
- Dimensionality reduction (PCA, t-SNE)
- Anomaly detection
Computing
Applying clustering techniques using sklearn, visualization with UMAP
Topics
- Text/language datasets
- Natural language processing
- Tokenization and embeddings
- Transformers
Computing
Hands-on NLP with NLTK, spaCy, Hugging Face transformers
Topics
- Image datasets
- Computer vision fundamentals
- Neural network basics
- CNN architectures
Computing
Deep neural networks with PyTorch, image processing with OpenCV
Topics
- Raw audio signals
- Digital signal processing
- Spectral analysis
- Speech recognition
Computing
Fast Fourier Transform, librosa, speech synthesis
Topics
- Sensors and measurement devices
- Forecasting methods
- Chaos theory and stochastic processes
- Trend analysis
Computing
Python-based forecasting (statsmodels, Prophet)
Topics
- Regression datasets
- Classification datasets
- Linear and logistic regression
- Feature engineering
Computing
Model training and evaluation in Python (sklearn, cross-validation)
Topics
- Real-world AUB datasets
- Research-driven projects
- Integrating ML, visualization, optimization
- Team collaboration
Computing
Transfer learning, fine-tuning pre-trained models (PyTorch, Hugging Face)
Topics
- AI ethics case studies
- Ethical frameworks
- Bias in ML
- Fairness and accountability
Computing
Evaluating models for fairness (Fairlearn, AIF360)
Topics
- Real-world AUB datasets
- Research or product-driven project
- Integrating learned techniques
- Teamwork and problem formulation
Topics
- Final presentations
- Peer feedback
- Synthesis and reflection
- Learning outcomes assessment
## DATA 202: Introduction to Data Science II
Topics
- Web scraping
- API integration
- Handling missing data
- Handling biased data
Computing
Automated data cleaning (pandas, BeautifulSoup, requests)
Topics
- Handling large datasets
- Data cleaning pipelines
- Storage strategies
- Real-world scalability
Computing
Distributed processing with Spark, SQL databases
Topics
- Real-time data streams
- Streaming architectures
- Online learning
- Edge computing
Computing
Implementing real-time time-series models, streaming pipelines
Topics
- Scanned documents
- Handwritten texts
- Historical archives
- Image-to-text processing
- Deep learning OCR models
Computing
OCR implementation with Tesseract, deep learning approaches
Topics
- Audio signal processing
- Speech recognition
- Music generation
- Voice synthesis
Computing
Speech-to-text, music analysis with librosa, generative audio models
Topics
- Video processing
- Spatial field data
- 3D data representations
- Scientific visualization
Computing
Video analysis with OpenCV, 3D data processing
Topics
- Social networks
- Citation graphs
- Biological networks
- Centrality measures
- Community detection
Computing
NetworkX, graph algorithms, Dijkstra's algorithm
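To make the Dijkstra's algorithm item concrete, here is a minimal sketch using only the Python standard library rather than NetworkX, so the mechanics stay visible. The function name `dijkstra`, the adjacency-dictionary representation, and the toy graph are hypothetical choices for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest-path distances from source to every reachable node.

    graph maps each node to a dict of {neighbor: non-negative edge weight}.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        # Skip stale heap entries left over from earlier, longer paths.
        if d > dist.get(node, float("inf")):
            continue
        for neighbor, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist


# Hypothetical undirected toy graph with four nodes.
toy_graph = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
distances = dijkstra(toy_graph, "A")
print(distances)
```

In the toy graph, the cheapest route from A to D goes through B and C rather than using the direct B-to-D edge, which is the kind of case the priority queue handles.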
Topics
- Text datasets (news, medical, legal, scientific)
- Language models
- N-gram models
- Search and retrieval
Computing
Training LLMs (GPT), prompt engineering, RAG
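The N-gram models topic in this module can be sketched with a bigram model built from raw counts. This is a minimal example under stated assumptions: the function name `train_bigram_model`, the whitespace tokenization, and the tiny corpus are invented for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Estimate P(next word | previous word) from bigram counts."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        # Normalize counts into conditional probabilities.
        model[prev] = {word: c / total for word, c in following.items()}
    return model


# Hypothetical toy corpus, tokenized by splitting on whitespace.
corpus = "the cat sat on the mat the cat ran".split()
model = train_bigram_model(corpus)
print(model["the"])
```

Here "the" is followed by "cat" twice and "mat" once, so the estimated probabilities for the words after "the" are two thirds and one third.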
Topics
- Large-scale datasets for training
- Foundation models
- Model optimization
- GPU parallelization
Computing
Fine-tuning models with APIs, GPU programming basics, model serving
Topics
- Case studies on biased datasets
- Ethical frameworks
- Fairness constraints
- Interpretability
Computing
Evaluating bias in AI models, explainability tools (SHAP, LIME)
Topics
- Dataset selection (music, healthcare, astronomy, finance)
- Complete data processing pipeline
- AI pipeline design
- Research-style report and presentation