Core definition
Data science is an interdisciplinary field that combines statistics, programming, and domain knowledge to extract meaningful insights and actionable knowledge from data, often using machine learning and data analysis techniques.
Core components
- Statistics & Mathematics: Foundational methods for analyzing data patterns and probability
- Programming: Tools and languages (Python, SQL, R) to process and manipulate data
- Domain Knowledge: Understanding the specific business context or field where data exists
- Machine Learning: Building predictive models from data
Key activities
- Data collection and preparation (cleaning, transforming)
- Exploratory data analysis (understanding patterns)
- Feature engineering (creating, transforming, and selecting the variables a model will use)
- Model building and training
- Evaluation and interpretation
- Communication of findings to stakeholders
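The activities above can be sketched end to end in a few lines of Python. This is a minimal illustration using only the standard library, not a recommended production workflow. The housing records, the column meanings, and every function name (`clean`, `add_features`, `fit_line`, `mean_absolute_error`) are made up for the example.

```python
from statistics import mean

# Hypothetical raw records: (area_m2, rooms, price_in_thousands).
# None marks a missing value. All numbers are invented for illustration.
raw_rows = [
    (50, 2, 150),
    (80, 3, 240),
    (None, 2, 170),  # missing area, dropped during cleaning
    (100, 4, 310),
    (65, 2, 200),
    (120, 5, 350),
    (90, 3, 275),
]


def clean(rows):
    """Data preparation: drop any row that has a missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def add_features(rows):
    """Feature engineering: derive area per room from the raw columns."""
    return [
        {"area": area, "area_per_room": area / rooms, "price": price}
        for area, rooms, price in rows
    ]


def fit_line(xs, ys):
    """Model training: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_error(actual, predicted):
    """Evaluation: average absolute gap between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Collection and preparation
rows = add_features(clean(raw_rows))

# Exploratory analysis: a quick summary of the cleaned data
summary = {
    "rows": len(rows),
    "mean_price": mean(r["price"] for r in rows),
    "mean_area_per_room": mean(r["area_per_room"] for r in rows),
}

# Model building: train on the first four rows, hold out the rest.
# Only "area" is used as the predictor to keep the sketch short.
train, test = rows[:4], rows[4:]
slope, intercept = fit_line(
    [r["area"] for r in train], [r["price"] for r in train]
)

# Evaluation on the held-out rows
predictions = [slope * r["area"] + intercept for r in test]
mae = mean_absolute_error([r["price"] for r in test], predictions)

# Communication: a plain-language summary for stakeholders
print(f"Cleaned rows: {summary['rows']}, mean price: {summary['mean_price']:.1f}k")
print(f"Each extra square meter adds about {slope:.2f}k to the predicted price")
print(f"Mean absolute error on held-out rows: {mae:.1f}k")
```

In real projects each step is usually handled by dedicated libraries and far more data; the point here is only the order of the steps and how the output of one feeds the next.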
Distinction from related fields
- vs. ML: ML is a technique within data science; data science is broader and includes analysis, statistics, and business interpretation
- vs. Analytics: Analytics focuses on describing and explaining past data; data science also builds models that predict future outcomes
- vs. AI: Data science provides methods and insights; AI is the broader goal of intelligent systems
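The analytics distinction can be made concrete with a small sketch: the same data answers a descriptive question (what happened?) and a predictive one (what is likely next?). The sales figures and variable names below are hypothetical, and the straight-line trend is only one simple way to forecast. In practice the boundary between the two fields is less sharp than this example suggests.

```python
from statistics import mean

# Hypothetical monthly sales (units sold), oldest month first.
monthly_sales = [100, 110, 125, 130, 145, 150]

# Analytics-style view: summarize what already happened.
description = {
    "total": sum(monthly_sales),
    "average": mean(monthly_sales),
    "best_month_index": monthly_sales.index(max(monthly_sales)),
}

# Data-science-style view: fit a trend and extrapolate one month ahead.
months = list(range(len(monthly_sales)))
m_bar, s_bar = mean(months), mean(monthly_sales)
slope = sum(
    (m - m_bar) * (s - s_bar) for m, s in zip(months, monthly_sales)
) / sum((m - m_bar) ** 2 for m in months)
intercept = s_bar - slope * m_bar

# Month index 6 is the first month not present in the data.
next_month_forecast = slope * len(monthly_sales) + intercept

print("Past:", description)
print(f"Forecast for next month: {next_month_forecast:.1f} units")
```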

Output
Insights, predictions, models, and recommendations that drive business decisions
AWS context
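Common AWS tools for data scientists include Amazon SageMaker (building, training, and deploying ML models), AWS Glue (data preparation and ETL), and Amazon QuickSight (dashboards and business intelligence).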