Image by Author | Midjourney & Canva
KDnuggets’ sister site, Statology, hosts a wide range of statistics-related content written by experts, accumulated over just a few years. To help our readers discover this resource for statistical, mathematical, data science, and programming content, we are organizing and sharing some of its tutorials with the KDnuggets community.
Learning statistics can be hard. It can be frustrating. And more than anything, it can be confusing. That’s why Statology is here to help.
This latest collection of tutorials focuses on visualizing data. No statistical analysis is complete without visualizing the data. A variety of tools exist to help us better understand our data through visualization, and these tutorials cover several of them. Learn these techniques, then continue reading Statology’s archives for more gems.
Boxplots
A boxplot (sometimes called a box-and-whisker plot) is a plot that shows the five-number summary of a dataset.
The five-number summary includes:
- The minimum
- The first quartile
- The median
- The third quartile
- The maximum
A boxplot allows us to easily visualize the distribution of values in a dataset using one simple plot.
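As a rough illustration of the five-number summary behind a boxplot, here is a minimal Python sketch. The data values and the helper names `five_number_summary` and `draw_boxplot` are made up for this example and are not taken from the Statology tutorial. Quartile conventions also vary between tools, so Q1 and Q3 may differ slightly elsewhere.

```python
import statistics

# Hypothetical data, made up for illustration (not from the Statology tutorial).
values = [4, 7, 8, 10, 12, 13, 15, 18, 21, 30]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    ordered = sorted(data)
    # "inclusive" interpolates between data points; other quartile
    # conventions exist and can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def draw_boxplot(data):
    """Draw a boxplot whose whiskers reach the minimum and maximum."""
    # matplotlib is third-party; it is imported here so the summary
    # above still runs where matplotlib is not installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.boxplot(data, whis=(0, 100))  # whiskers at the 0th and 100th percentiles
    ax.set_ylabel("Value")
    return fig


summary = five_number_summary(values)
print(summary)  # (4, 8.5, 12.5, 17.25, 30)
```

If matplotlib is available, `draw_boxplot(values).savefig("boxplot.png")` should write the plot to a file. The `whis=(0, 100)` setting is a choice made here so the whiskers match the summary's minimum and maximum; matplotlib's default instead limits whiskers to 1.5 times the interquartile range and draws points beyond them as outliers.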
Stem-and-Leaf Plots: Definition & Examples
A stem-and-leaf plot displays data by splitting up each value in a dataset into a “stem” and a “leaf.”
This tutorial explains how to create and interpret stem-and-leaf plots.
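One way to see the stem/leaf split in action is a short Python sketch. It assumes non-negative whole numbers and uses the ones digit as the leaf, which is a common convention but only an assumption here; the data and the `stem_and_leaf` helper are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens and above) to its sorted list of leaves (ones digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 24 -> stem 2, leaf 4
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical data, made up for illustration.
data = [12, 15, 21, 24, 24, 29, 33, 35, 48]
plot = stem_and_leaf(data)

# Print every stem from smallest to largest, including stems with no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Reading the printed row `2 | 1 4 4 9` back gives the original values 21, 24, 24, and 29, which is how such a plot keeps every data point visible while still showing the overall shape.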
Scatterplots
Scatterplots are used to display the relationship between two variables.
Suppose we have the following dataset that shows the weight and height of players on a basketball team:
The two variables in this dataset are height and weight.
To make a scatterplot, we place the height along the x-axis and the weight along the y-axis. Each player is then represented as a dot on the scatterplot:
Scatterplots help us see relationships between two variables. In this case, we see that height and weight have a positive relationship. As height increases, weight tends to increase as well.
Relative Frequency Histogram: Definition + Example
Often in statistics you will encounter tables that display information about frequencies. Frequencies simply tell us how many times a certain event has occurred.
For example, the following table shows how many items a particular shop sold in a week based on the price of the item:
This type of table is known as a frequency table. In one column we have the “class” and in the other column we have the frequency of the class.
Often we use frequency histograms to visualize the values in a frequency table since it’s typically easier to gain an understanding of data when we can visualize the numbers.
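Turning a frequency table into relative frequencies only requires dividing each class frequency by the total. The sketch below shows that step on a hypothetical table (the price classes and counts are invented for illustration), along with an illustrative `draw_relative_frequency_histogram` helper.

```python
# Hypothetical frequency table: price class -> number of items sold.
frequency_table = {
    "$1 - $10": 15,
    "$11 - $20": 20,
    "$21 - $30": 9,
    "$31 - $40": 4,
    "$41 - $50": 2,
}

# Relative frequency = class frequency / total frequency.
total = sum(frequency_table.values())
relative_frequencies = {
    price_class: count / total for price_class, count in frequency_table.items()
}

for price_class, share in relative_frequencies.items():
    print(f"{price_class:>10}: {share:.2f}")


def draw_relative_frequency_histogram(rel_freqs):
    """Draw one bar per class, with bar height equal to the relative frequency."""
    # matplotlib is third-party; imported here so the table
    # arithmetic above still runs where matplotlib is not installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # width=1.0 makes neighboring bars touch, as in a histogram.
    ax.bar(list(rel_freqs), list(rel_freqs.values()), width=1.0, edgecolor="black")
    ax.set_xlabel("Price class")
    ax.set_ylabel("Relative frequency")
    return fig
```

The relative frequencies always sum to 1, so the bar heights can be read directly as the share of all items sold in each price class.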
What are Density Curves? (Explanation & Examples)
A density curve is a curve on a graph that represents the distribution of values in a dataset. It’s useful for three reasons:
- A density curve gives us a good idea of the “shape” of a distribution, including whether or not a distribution has one or more “peaks” of frequently occurring values and whether or not the distribution is skewed to the left or the right.
- A density curve lets us visually see where the mean and the median of a distribution are located.
- A density curve lets us visually see what percentage of observations in a dataset fall between different values.
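The second and third points can be sketched with Python's built-in `statistics.NormalDist`. A normal curve is only one example of a density curve, and the mean of 100 and standard deviation of 15 used here are arbitrary assumptions chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical distribution: a normal curve with mean 100 and SD 15.
dist = NormalDist(mu=100, sigma=15)

# For a symmetric curve such as this one, the mean and median coincide.
print(dist.mean, dist.median)

# The area under the curve between two values is the share of
# observations expected to fall in that range.
share = dist.cdf(115) - dist.cdf(85)
print(f"{share:.1%}")  # 68.3%

# Points on the curve (x, density) that could be passed to a plotting library.
xs = list(range(55, 146))
ys = [dist.pdf(x) for x in xs]
peak_x = xs[ys.index(max(ys))]  # the single peak sits at the mean
```

For a skewed distribution the mean and median would separate, with the mean pulled toward the longer tail; the normal curve used here cannot show that, so treat this as a sketch of the symmetric case only.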
For more content like this, keep checking out Statology, and subscribe to their weekly newsletter to make sure you don’t miss anything.
Matthew Mayo (@mattmayo13) holds a master’s degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.