Learning statistics is a core part of your journey toward becoming a data scientist, data analyst, or even an AI engineer. The majority of the machine learning models used in modern technology are statistical models. So, having a strong understanding of statistics will make it easier for you to learn and build advanced AI technologies.
In this blog, we will explore 10 GitHub repositories to help you master statistics. These repositories include code examples, books, Python libraries, guides, documentation, and visual learning materials.
1. Practical Statistics for Data Scientists
Repository: gedeck/practical-statistics-for-data-scientists
This repository offers practical examples and code snippets from the book “Practical Statistics for Data Scientists” that cover essential statistical techniques and concepts. It is a great starting point for data scientists who want to apply statistical methods in real-world scenarios.
The book’s code repository contains R and Python code examples. If you prefer the Jupyter Notebook style of coding, it also provides similar examples as Jupyter notebooks for both Python and R.
2. Probabilistic Programming and Bayesian Methods for Hackers
Repository: CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
This repository provides an interactive, hands-on introduction to Bayesian methods using Python. The content is presented as Jupyter notebooks rendered with nbviewer, making it easy to follow both the theory and the Python code behind Bayesian models and probabilistic programming.
The interactive book consists of an introduction to Bayesian methods, getting started with Python’s PyMC library, Markov Chain Monte Carlo, the law of large numbers, loss functions, and more.
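To give a feel for one of the topics listed above, here is a minimal sketch of the law of large numbers in plain Python. It is not taken from the book; the function name `running_proportion_of_heads`, the seed, and the flip counts are assumptions chosen for illustration.

```python
import random


def running_proportion_of_heads(n_flips, p_heads=0.5, seed=0):
    """Simulate coin flips and return the proportion of heads after each flip."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    proportions = []
    for i in range(1, n_flips + 1):
        # rng.random() is uniform on [0, 1), so this is True with probability p_heads
        if rng.random() < p_heads:
            heads += 1
        proportions.append(heads / i)
    return proportions


proportions = running_proportion_of_heads(100_000)

# The sample proportion tends to settle near the true probability as n grows.
for n in (10, 1_000, 100_000):
    print(f"after {n:>7} flips: proportion of heads = {proportions[n - 1]:.4f}")
```

With few flips the proportion can wander far from 0.5; with many flips it tends to stay close to it, which is the behavior the law of large numbers describes.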
3. Statsmodels: Statistical Modeling and Econometrics in Python
Repository: statsmodels/statsmodels
Statsmodels is a powerful library for statistical modeling and econometrics in Python. This repository includes comprehensive documentation and examples for running statistical tests, fitting linear models, and more. We can use the examples in the documentation to learn how to perform many kinds of statistical analysis, including linear regression, time series analysis, survival analysis, and multivariate analysis.
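As a rough illustration of what a linear regression fit computes, the sketch below implements simple ordinary least squares in plain Python so that it runs without extra packages. The helper `fit_simple_ols` and the data are made up for this example and are not part of Statsmodels. With Statsmodels installed, a comparable fit would be `sm.OLS(y, sm.add_constant(x)).fit()` after `import statsmodels.api as sm`, which also reports standard errors and test statistics.

```python
def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squares of x and cross-products of x and y around their means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variance in y explained by the fitted line
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared


# Made-up data that roughly follows y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope, r_squared = fit_simple_ols(x, y)
print(f"intercept={intercept:.3f}, slope={slope:.3f}, R^2={r_squared:.4f}")
```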
4. TensorFlow Probability
Repository: tensorflow/probability
TensorFlow Probability is a library for probabilistic reasoning and statistical analysis in TensorFlow. It extends the TensorFlow core library with tools for building and training probabilistic models, making it an excellent resource for those interested in combining deep learning with statistical modeling.
The documentation contains examples of linear mixed effects models, hierarchical linear models, probabilistic principal components analysis, Bayesian neural networks, and more.
5. The Probability and Statistics Cookbook
Repository: mavam/stat-cookbook
This repository is a collection of recipes for solving common statistical problems, serving as a helpful reference for finding quick solutions and examples for various statistical tasks. It provides concise guidance on probability and statistics, including concepts such as probability theory, random variables, continuous distributions, expectation, variance, and inequalities. You can either build the cookbook locally with the make command or download the PDF file. The repository also includes the LaTeX source files for the various statistical concepts.
6. Seeing Theory
Repository: seeingtheory/Seeing-Theory
Seeing Theory is a visual introduction to probability and statistics. This repository includes interactive visualizations and explanations that make complex statistical concepts more accessible and easier to understand, especially for visual learners.
It is a highly interactive book for beginners and covers topics such as basic probability, compound probability, probability distributions, frequentist inference, Bayesian inference, and regression analysis.
7. Stats Maths with Python
Repository: tirthajyoti/Stats-Maths-with-Python
This repository contains scripts and Jupyter notebooks covering general statistics, mathematical programming, and scientific computing using Python. It is a valuable resource for anyone looking to strengthen their statistical and mathematical programming skills.
It includes examples on Bayes' rule, Brownian motion, hypothesis testing, linear regression, and more.
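For a taste of one of these topics, here is a minimal sketch of Bayes' rule applied to a diagnostic test. It is not taken from the repository; the function name `posterior_given_positive` and all the numbers (prevalence, sensitivity, false positive rate) are assumptions chosen for illustration.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule.

    prior               -- P(condition)
    sensitivity         -- P(positive | condition)
    false_positive_rate -- P(positive | no condition)
    """
    # Total probability of a positive result, over both groups
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives
posterior = posterior_given_positive(
    prior=0.01, sensitivity=0.95, false_positive_rate=0.05
)
print(f"P(condition | positive test) = {posterior:.3f}")
```

Even with an accurate test, the posterior stays well below the sensitivity because the condition is rare in this made-up scenario, which is the kind of counterintuitive result that makes Bayes' rule worth practicing.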
8. Python for Probability, Statistics, and Machine Learning
Repository: unpingco/Python-for-Probability-Statistics-and-Machine-Learning
This repository includes code examples and Jupyter notebooks from the book “Python for Probability, Statistics, and Machine Learning” that cover a wide range of topics, from basic probability and statistics to advanced machine learning techniques.
Within the “chapters” folder, there are three subfolders containing Jupyter notebooks on statistics, probability, and machine learning. Each notebook includes code, output, and a description explaining the methodology, code, and results.
9. Probability and Statistics VIP Cheatsheets
Repository: shervinea/stanford-cme-106-probability-and-statistics
This repository contains VIP cheatsheets for Stanford’s Probability and Statistics for Engineers course. The cheatsheets provide concise summaries of key concepts and formulas, making them a handy reference for students and professionals.
It is a popular set of cheatsheets that covers topics such as conditional probability, random variables, parameter estimation, hypothesis testing, and more.
10. Basic Mathematics for Machine Learning
Repository: hrnbot/Basic-Mathematics-for-Machine-Learning
Understanding the mathematical foundations is crucial for mastering machine learning and statistics. This repository aims to demystify mathematics and help you learn the basics of algebra, calculus, statistics, probability, vectors, and matrices through Python Jupyter Notebooks.
Final Thoughts
Learning resources shared on GitHub are created by experts and the open-source community, who share their knowledge to pave an easier path for beginners in data science and statistics. You will learn statistics by reading theory, working through code examples, understanding mathematical concepts, building projects, performing various analyses, and exploring popular statistical tools. All of these are covered in the GitHub repositories mentioned above. These resources are free, and anyone can contribute to improve them. So, keep learning and keep building amazing things.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.