Area(s) of Expertise
- Scalable AI Systems
- Machine Learning Platforms
- Time-Series Modeling
- Applied Deep Learning
- Generative AI
Area(s) of Expertise
- Data Management and Database Systems
- Machine Learning Systems
Area(s) of Expertise
- Data Management
- Causal Inference
- Algorithmic Fairness
- Explainable AI
DSC 208R: Data Management for Analytics
Course Information
Course Type
Course Description
(Prereq. DSC 207R)
Principles, techniques, and tools for organizing, storing, querying, transforming, and using data for analytics and machine learning computations at scale, including the basics of data storage, acquisition, governance, and organization; principles of the relational data model; relational algebra and its relationship to DataFrames; the Structured Query Language (SQL); relational database system features for faster querying and analytics; and the basics of non-relational data systems. Coverage of major data quality issues and methodologies for cleaning data. An introduction to cluster and cloud computing, MapReduce, and Spark, and the use of these tools and SQL to transform data at scale for ML feature engineering. Methodologies for critically evaluating analytics results, including debugging and reasoning about bias and fairness in the data science pipeline.
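The description pairs relational algebra with SQL and DataFrames. As an illustration only (not course material), here is a minimal Python sketch using hypothetical `students` and `enrolled` tables and hand-written, hypothetical `select`, `project`, and `natural_join` helpers. It shows that one relational-algebra expression and one SQL query return the same answer, using only the standard library's `sqlite3` module.

```python
import sqlite3

# Hypothetical toy relations (not from the course): each tuple is a dict.
students = [
    {"sid": 1, "name": "Ana", "major": "DSC"},
    {"sid": 2, "name": "Ben", "major": "CSE"},
    {"sid": 3, "name": "Cy", "major": "DSC"},
]
enrolled = [
    {"sid": 1, "course": "DSC 208R"},
    {"sid": 3, "course": "DSC 208R"},
    {"sid": 2, "course": "DSC 207R"},
]

def select(rel, pred):
    """Selection (sigma): keep only the tuples that satisfy pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection (pi): keep only attrs and drop duplicate tuples (set semantics)."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):
    """Natural join: combine tuple pairs that agree on every shared attribute."""
    shared = (set(r[0]) & set(s[0])) if r and s else set()
    return [{**a, **b} for a in r for b in s if all(a[k] == b[k] for k in shared)]

# pi_name( sigma_{major='DSC'}(students) JOIN sigma_{course='DSC 208R'}(enrolled) )
ra_result = project(
    natural_join(
        select(students, lambda t: t["major"] == "DSC"),
        select(enrolled, lambda t: t["course"] == "DSC 208R"),
    ),
    ["name"],
)

# The same query in SQL, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (sid INTEGER, name TEXT, major TEXT)")
conn.execute("CREATE TABLE enrolled (sid INTEGER, course TEXT)")
conn.executemany("INSERT INTO students VALUES (:sid, :name, :major)", students)
conn.executemany("INSERT INTO enrolled VALUES (:sid, :course)", enrolled)
sql_result = conn.execute(
    """SELECT DISTINCT s.name
       FROM students AS s JOIN enrolled AS e ON s.sid = e.sid
       WHERE s.major = 'DSC' AND e.course = 'DSC 208R'
       ORDER BY s.name"""
).fetchall()

print(sorted(t["name"] for t in ra_result))  # ['Ana', 'Cy']
print([row[0] for row in sql_result])        # ['Ana', 'Cy']
```

In pandas, the same pipeline would roughly correspond to boolean-mask filtering (selection), column selection followed by `drop_duplicates()` (projection), and `DataFrame.merge` on `sid` (join).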