Core definition

Data science is an interdisciplinary field that combines statistics, programming, and domain expertise to extract meaningful insights and actionable knowledge from data, often using machine learning and data analysis techniques.

Core components

  • Statistics & Mathematics: Foundational methods for analyzing data patterns and probability
  • Programming: Tools and languages (Python, SQL, R) to process and manipulate data
  • Domain Knowledge: Understanding the specific business context or field where data exists
  • Machine Learning: Building predictive models from data

Key activities

  1. Data collection and preparation (cleaning, transforming)
  2. Exploratory data analysis (understanding patterns)
  3. Feature engineering (creating, transforming, and selecting relevant variables)
  4. Model building and training
  5. Evaluation and interpretation
  6. Communication of findings to stakeholders
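
The six activities above can be sketched end to end on a toy dataset. This is a minimal illustration using only the Python standard library, not a production workflow: the records, the field names (ad_spend, sales, store_id), and the helper functions fit_line and mean_absolute_error are all hypothetical, and a one-variable least-squares line stands in for whatever model a real project would choose.

```python
from statistics import mean

# 1. Data collection and preparation.
# Hypothetical raw records; None marks a missing value.
raw = [
    {"store_id": "A", "ad_spend": 10.0, "sales": 26.0},
    {"store_id": "B", "ad_spend": 20.0, "sales": 44.0},
    {"store_id": "C", "ad_spend": None, "sales": 30.0},
    {"store_id": "D", "ad_spend": 30.0, "sales": 66.0},
    {"store_id": "E", "ad_spend": 40.0, "sales": 84.0},
    {"store_id": "F", "ad_spend": 50.0, "sales": None},
    {"store_id": "G", "ad_spend": 60.0, "sales": 127.0},
    {"store_id": "H", "ad_spend": 70.0, "sales": 143.0},
]
# Cleaning: drop rows with a missing ad_spend or sales value.
clean = [r for r in raw if r["ad_spend"] is not None and r["sales"] is not None]

# 2. Exploratory data analysis: simple summary statistics.
mean_spend = mean(r["ad_spend"] for r in clean)
mean_sales = mean(r["sales"] for r in clean)

# 3. Feature engineering: keep ad_spend as the single feature, drop store_id.
points = [(r["ad_spend"], r["sales"]) for r in clean]
train, test = points[:4], points[4:]  # simple hold-out split


# 4. Model building and training: least-squares line y = slope * x + intercept.
def fit_line(pairs):
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(train)


# 5. Evaluation: mean absolute error on the held-out rows.
def mean_absolute_error(pairs, slope, intercept):
    return mean(abs(y - (slope * x + intercept)) for x, y in pairs)


mae = mean_absolute_error(test, slope, intercept)

# 6. Communication: a plain-language summary for stakeholders.
print(f"Rows kept after cleaning: {len(clean)} of {len(raw)}")
print(f"Average ad spend {mean_spend:.1f}, average sales {mean_sales:.1f}")
print(f"Each extra unit of ad spend is associated with about {slope:.2f} more sales")
print(f"Typical error on unseen stores: {mae:.2f}")
```

In practice each step is usually handled by dedicated libraries and far more data; the point of the sketch is only the order of the steps and how the output of one feeds the next.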

Related fields

  • vs. ML: ML is a technique within data science; data science is broader and includes analysis, statistics, and business interpretation
  • vs. Analytics: Analytics mainly describes and explains past data; data science also builds models to predict future outcomes
  • vs. AI: Data science provides methods and insights; AI is the broader goal of intelligent systems
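
The analytics versus data science distinction can be made concrete with a small sketch. The monthly sales figures below are invented for illustration, and the forecast uses a naive drift method (last value plus the average past change), chosen here only because it is simple, not because it is a recommended model.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative numbers only).
monthly_sales = [100.0, 110.0, 125.0, 130.0, 150.0]

# Descriptive view (analytics): summarize what already happened.
average_sales = mean(monthly_sales)
changes = [b - a for a, b in zip(monthly_sales, monthly_sales[1:])]
best_month = monthly_sales.index(max(monthly_sales)) + 1  # 1-based month number

# Predictive view (data science): estimate a value not yet observed.
# Naive drift forecast: last observation plus the average month-over-month change.
average_change = mean(changes)
forecast_next = monthly_sales[-1] + average_change

print(f"Average monthly sales so far: {average_sales:.1f}")
print(f"Best month so far: month {best_month}")
print(f"Forecast for next month: {forecast_next:.1f}")
```

The first three values answer questions about the past; the last one is a claim about the future and would need to be checked against new data.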

Output

Insights, predictions, models, and recommendations that drive business decisions

AWS context

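On AWS, data scientists commonly use Amazon SageMaker (building, training, and deploying ML models), AWS Glue (data integration and ETL), and Amazon QuickSight (dashboards and business intelligence).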