- Basic Probability Theory
- Probability Spaces
- Conditional Probability
- Independent and Dependent Variables
- Random Variables
- What are random variables?
- Multivariate random variables
- Discrete random variables
- Continuous random variables
- Functions of random variables
- Creating random variables
- Expectation
- Expectation operator
- Mean and Variance
- Covariance
- Conditional Expectation
- Random Processes
- What are random processes?
- Mean and autocovariance functions
- Independent identically distributed sequences
- Gaussian process
- Random walk
- Convergence of Random Processes
- Types of convergence
- Law of large numbers
- Central limit theorem
- Monte Carlo Simulation
- Descriptive Statistics
- Histogram
- Sample mean and variance
- Order statistics
- Sample covariance
- Frequentist Statistics
- Independent identically distributed sampling
- Mean square error
- Consistency
- Confidence Intervals
- Nonparametric model estimation
- Parametric model estimation
- Bayesian Statistics
- Bayesian parametric models
- Conjugate prior
- Bayesian estimators
- Hypothesis Testing
- What is hypothesis testing?
- Parametric testing
- Nonparametric testing
- Multiple Testing
- Linear Regression
- Linear Models
- Least-squares estimation
- Underfitting and Overfitting
- Correlation
- Regression
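
To make two of the outline entries concrete (the law of large numbers and Monte Carlo simulation), here is a minimal sketch using only Python's standard library. It simulates fair die rolls and tracks the running sample mean, which should drift toward the expected value of 3.5 as the number of rolls grows. The function name `running_sample_means`, the seed, and the roll counts are illustrative assumptions, not part of any referenced course material.

```python
import random


def running_sample_means(num_rolls, seed=0):
    """Simulate fair six-sided die rolls; return the sample mean after each roll."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    total = 0
    means = []
    for n in range(1, num_rolls + 1):
        total += rng.randint(1, 6)  # randint includes both endpoints
        means.append(total / n)
    return means


means = running_sample_means(100_000)

# The expected value of a fair die is (1 + 2 + ... + 6) / 6 = 3.5.
for n in (10, 1_000, 100_000):
    print(f"n={n:>7}: sample mean = {means[n - 1]:.4f}")
```

The early sample means can be far from 3.5, while the later ones cluster tightly around it. That shrinking spread is what the law of large numbers describes.
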

As a data science student, you should focus on these key topics in statistics:
- Descriptive Statistics: Learn about measures of central tendency (mean, median, mode) and measures of dispersion (variance, standard deviation, range, interquartile range).
- Probability Theory: Understand basic probability concepts, conditional probability, Bayes’ theorem, and different probability distributions like binomial, normal, Poisson, etc.
- Inferential Statistics: Learn hypothesis testing (t-tests, chi-square tests, ANOVA), confidence intervals, and p-values to make data-driven decisions.
- Regression Analysis: Study linear regression, multiple regression, and logistic regression for predicting outcomes and understanding relationships between variables.
- Bayesian Statistics: Learn how prior beliefs are combined with observed data to form posterior distributions. Bayesian reasoning is increasingly used in machine learning, so a solid foundation here is valuable.
- Sampling and Resampling Methods: Learn about different sampling techniques and methods like bootstrap and cross-validation, which are essential for model evaluation.
- Time Series Analysis: If you're dealing with sequential data (like stock prices or sales), learn about time series models, trend analysis, and seasonal decomposition.
- Dimensionality Reduction: Study techniques like Principal Component Analysis (PCA) and t-SNE for reducing the complexity of data while preserving its structure.
- Non-parametric Methods: Understand non-parametric tests, such as the Mann-Whitney U test and the Kruskal-Wallis test, which are used when the data do not satisfy normality assumptions.
- Multivariate Statistics: Explore techniques for analyzing data with more than one variable, such as MANOVA and cluster analysis.
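
As a rough illustration of two items from the list above (descriptive statistics and the bootstrap), the sketch below uses only Python's standard library. The data in `sample` are made up for the example, and the helper names `describe` and `bootstrap_mean_ci` are hypothetical. The interval is a simple percentile bootstrap, which is one of several bootstrap variants and not necessarily the best choice for every dataset.

```python
import random
import statistics


def describe(data):
    """Return common measures of central tendency and dispersion."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "variance": statistics.variance(data),  # sample variance (n - 1 denominator)
        "std_dev": statistics.stdev(data),      # sample standard deviation
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }


def bootstrap_mean_ci(data, num_resamples=5000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean (a simple sketch)."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    resampled_means = sorted(
        statistics.mean(rng.choices(data, k=n)) for _ in range(num_resamples)
    )
    alpha = 1 - confidence
    low_index = int((alpha / 2) * num_resamples)
    high_index = int((1 - alpha / 2) * num_resamples) - 1
    return resampled_means[low_index], resampled_means[high_index]


# Made-up measurements, used only to exercise the functions.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]

summary = describe(sample)
ci_low, ci_high = bootstrap_mean_ci(sample)

for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
print(f"95% bootstrap CI for the mean: ({ci_low:.3f}, {ci_high:.3f})")
```

With real data, `sample` would be replaced by the measurements under study. Libraries such as NumPy or SciPy offer faster equivalents, but the standard-library version keeps each step visible.
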