We just witnessed three wonderful weeks of sport as the 2024 Olympic Games unfolded in Paris, with millions of people watching the streams and rooting for their favorites. Along the way, we saw household names crushing it again, while the next generation of star athletes emerged.
As a curious data scientist, I started to wonder how the collective opinion behind these Games evolves: which sports are the most popular, and which athletes are on the rise? To put data behind this, I collected an extensive dataset from Wikipedia, containing both Wikipedia profiles and page view counts, and then compared the popularity of different sports as well as top athletes. Below, please find my results and all the Python code needed to reproduce them.
All images were created by the author.
First, let’s visit the Wikipedia page of the 2024 Summer Olympics and download it using the requests library. Then, I use the BeautifulSoup package to extract information from the HTML, namely the list of all summer sports and the links to their Wikipedia pages. To extract the sports, additionally, I follow my manual…
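To make the download-and-parse step concrete, here is a minimal sketch of how it could look. It is not necessarily the exact code used for the results in this article: the page URL, the `User-Agent` string, the helper names `fetch_html` and `extract_wiki_links`, and especially the title-based filter are assumptions chosen for illustration.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

WIKI_BASE = "https://en.wikipedia.org"
# Assumed URL of the English Wikipedia page for the Games.
OLYMPICS_URL = WIKI_BASE + "/wiki/2024_Summer_Olympics"


def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text."""
    # A descriptive User-Agent is good practice; this value is a placeholder.
    headers = {"User-Agent": "olympics-popularity-sketch/0.1 (educational example)"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def extract_wiki_links(html: str, keyword: str = "at the 2024 Summer Olympics") -> dict[str, str]:
    """Collect internal wiki links whose title attribute contains `keyword`.

    The keyword filter is a hypothetical heuristic for finding per-sport
    pages; the real selection may need manual curation on top of it.
    """
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    for anchor in soup.find_all("a", href=True):
        title = anchor.get("title", "")
        href = anchor["href"]
        # Keep only internal article links that match the keyword.
        if href.startswith("/wiki/") and keyword in title:
            links[title] = urljoin(WIKI_BASE, href)
    return links


if __name__ == "__main__":
    sports = extract_wiki_links(fetch_html(OLYMPICS_URL))
    print(f"Found {len(sports)} candidate sport pages")
    for title, url in sorted(sports.items())[:5]:
        print(title, "->", url)
```

Keeping the download and the parsing in separate functions means the parsing logic can be tried on a saved copy of the page without hitting Wikipedia again.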