Syllabus

  1. Basic Probability Theory
    1. Probability Spaces
    2. Conditional Probability
    3. Independent and Dependent Variables
  2. Random Variables
    1. What are random variables?
    2. Multivariate random variables
    3. Discrete random variables
    4. Continuous random variables
    5. Functions of random variables
    6. Creating random variables
  3. Expectation
    1. Expectation operator
    2. Mean and Variance
    3. Covariance
    4. Conditional Expectation
  4. Random Processes
    1. What are random processes?
    2. Mean and autocovariance functions
    3. Independent identically distributed sequences
    4. Gaussian process
    5. Random walk
  5. Convergence of Random Processes
    1. Types of convergence
    2. Law of large numbers
    3. Central limit theorem
    4. Monte Carlo Simulation
  6. Descriptive Statistics
    1. Histogram
    2. Sample mean and variance
    3. Order statistics
    4. Sample covariance
  7. Frequentist Statistics
    1. Independent identically distributed sampling
    2. Mean square error
    3. Consistency
    4. Confidence Intervals
    5. Nonparametric model estimation
    6. Parametric model estimation
  8. Bayesian Statistics
    1. Bayesian parametric models
    2. Conjugate prior
    3. Bayesian estimators
  9. Hypothesis Testing
    1. What is hypothesis testing?
    2. Parametric testing
    3. Nonparametric testing
    4. Multiple Testing
  10. Linear Regression
    1. Linear Models
    2. Least-squares estimation
    3. Underfitting and Overfitting
    4. Correlation
    5. Regression

If you are a data science student, these are the key topics in statistics to focus on:

  1. Descriptive Statistics: Learn about measures of central tendency (mean, median, mode) and measures of dispersion (variance, standard deviation, range, interquartile range).
  2. Probability Theory: Understand basic probability concepts, conditional probability, Bayes’ theorem, and common probability distributions such as the binomial, normal, and Poisson.
  3. Inferential Statistics: Learn hypothesis testing (t-tests, chi-square tests, ANOVA), confidence intervals, and p-values to make data-driven decisions.
  4. Regression Analysis: Study linear regression, multiple regression, and logistic regression for predicting outcomes and understanding relationships between variables.
  5. Bayesian Statistics: Build a good foundation in Bayesian reasoning, which is increasingly used in machine learning.
  6. Sampling and Resampling Methods: Learn about sampling techniques and resampling methods such as the bootstrap and cross-validation, which are essential for model evaluation.
  7. Time Series Analysis: If you're dealing with sequential data (like stock prices or sales), learn about time series models, trend analysis, and seasonal decomposition.
  8. Dimensionality Reduction: Study techniques like Principal Component Analysis (PCA) and t-SNE for reducing the complexity of data while preserving its structure.
  9. Non-parametric Methods: Understand non-parametric tests, such as the Mann-Whitney U test or the Kruskal-Wallis test, which are used when the data do not satisfy normality assumptions.
  10. Multivariate Statistics: Explore techniques for analyzing data with more than one variable, such as MANOVA and cluster analysis.
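
As a starting point for topics 1 and 6, the descriptive measures and the bootstrap can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the sample values, the helper name `bootstrap_mean_ci`, and its parameters (`n_resamples`, `alpha`, `seed`) are all assumptions made up for this example, and the interval shown is the simple percentile bootstrap, which is only one of several bootstrap variants.

```python
import random
import statistics

# Hypothetical sample, invented purely for illustration.
data = [2, 4, 4, 5, 7, 9, 10, 12]

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion (variance and stdev use the n - 1 sample formulas).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)
value_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1


def bootstrap_mean_ci(sample, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


ci_lower, ci_upper = bootstrap_mean_ci(data)
print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.3f}, std_dev={std_dev:.3f}, range={value_range}, IQR={iqr}")
print(f"95% bootstrap CI for the mean: ({ci_lower:.3f}, {ci_upper:.3f})")
```

With a sample this small the bootstrap interval is wide and only roughly calibrated; the point of the sketch is the mechanics of resampling with replacement, not the specific numbers.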