Learning to play the piano is a challenging and fun process, but every novice faces the same dilemma: what kind of instrument to buy? The market offers a wide choice, from tiny two-octave instruments like the one in the photo above to 70 kg full-size keyboards in wooden cabinets. On the one hand, extra keys never hurt; on the other hand, there is always a trade-off between size, weight, and price.
The obvious option is to ask a teacher for a recommendation, but most musicians have probably never looked at music from that perspective. Is there a more quantitative way to find an answer? It turns out there is: we can easily build a distribution of musical notes using Python.
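The idea is to collect the notes played in real pieces and check how many of them fit on a given keyboard. Below is a minimal sketch of that second step. It assumes the pitches are already available as MIDI note numbers (integers 0 to 127); the `coverage` helper and the sample data are hypothetical and only illustrate the calculation. The 88-key range A0 to C8 is the standard acoustic piano range, while the 61-key range C2 to C7 is a common layout that varies between models.

```python
from typing import Iterable

# Key ranges as inclusive MIDI note numbers. 88 keys (A0-C8) is the standard
# piano range; 61 keys (C2-C7) is a common but model-dependent layout.
KEYBOARDS = {
    "88 keys (A0-C8)": (21, 108),
    "61 keys (C2-C7)": (36, 96),
}


def coverage(notes: Iterable[int], low: int, high: int) -> float:
    """Return the fraction of played notes inside the key range [low, high]."""
    notes = list(notes)
    if not notes:
        return 0.0
    inside = sum(1 for n in notes if low <= n <= high)
    return inside / len(notes)


# Hypothetical sample of note numbers; real values would come from MIDI files.
sample = [24, 36, 48, 60, 62, 64, 67, 72, 84, 100]
for name, (low, high) in KEYBOARDS.items():
    print(f"{name}: {coverage(sample, low, high):.0%} of notes playable")
```

Counting played notes rather than distinct pitches weights the result by how often each key is actually used, which is closer to the practical question of whether a smaller keyboard is "enough".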
This tutorial may be useful for beginners in data science: it requires no complex math or libraries, and the results are easy to interpret. It can also help those who want to learn to play music but have not yet decided what kind of instrument to buy.
Let’s get into it!
Data Source
For data analysis, I will use MIDI files. It’s a fairly old format: the first MIDI (Musical Instrument Digital Interface) specification was published in 1983. The key feature of MIDI files is that they store music not as raw audio but as a sequence of note events, close to the “original” notation. The messages in a MIDI file specify the instrument, pitch, timing, and other parameters. For example, I can open Beethoven’s Bagatelle in C minor in the free MuseScore application and see something like this:
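In a MIDI message, the pitch is stored as an integer note number from 0 to 127, not as a note name. To make these numbers readable, here is a minimal sketch that converts them into names and frequencies. It assumes the widespread convention where note 60 is middle C (C4) and note 69 is A4 tuned to 440 Hz in equal temperament. Some software labels middle C as C3 instead, so the octave digit may differ from what another application shows. The helper names are illustrative, not part of any library.

```python
# Pitch classes in order, starting from C (MIDI note numbers divisible by 12).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(note: int) -> str:
    """Return a name like 'C4' for a MIDI note number (60 = C4 convention)."""
    if not 0 <= note <= 127:
        raise ValueError(f"MIDI note numbers are 0-127, got {note}")
    octave = note // 12 - 1  # note 60 -> octave 4
    return f"{NOTE_NAMES[note % 12]}{octave}"


def note_frequency(note: int, a4: float = 440.0) -> float:
    """Equal-temperament frequency in Hz; each semitone multiplies by 2**(1/12)."""
    return a4 * 2 ** ((note - 69) / 12)


# Lowest piano key, middle C, concert A, highest piano key.
for n in (21, 60, 69, 108):
    print(n, note_name(n), round(note_frequency(n), 2))
```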
Let’s open the same file in Python and dump its content:
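import mido  # pip3 install mido

# clip=True clamps data bytes above 127 instead of raising an error
mid = mido.MidiFile("Beethoven/Bagatelle.mid", clip=True)
# Print the first 10 messages of every track
for ind, track in enumerate(mid.tracks):
    print(f"Track {ind}")
    for item in track[:10]:
        print(item)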
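To turn such a dump into a note distribution, it is enough to count the `note_on` messages. Here is a minimal sketch, assuming each message exposes `.type`, `.note` and `.velocity` attributes, as mido’s note messages do. A `note_on` with velocity 0 is skipped because MIDI allows it as a substitute for `note_off`. The `count_notes` helper and the demo messages are hypothetical stand-ins so the sketch runs without a MIDI file.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def count_notes(messages: Iterable) -> Counter:
    """Count how many times each MIDI note number is struck."""
    counts = Counter()
    for msg in messages:
        # Check the type first: meta messages such as set_tempo have no .note.
        # A note_on with velocity 0 means "release", so it is not counted.
        if msg.type == "note_on" and msg.velocity > 0:
            counts[msg.note] += 1
    return counts


# Hypothetical stand-in messages mimicking a short MIDI track.
demo_track = [
    SimpleNamespace(type="set_tempo"),
    SimpleNamespace(type="note_on", note=60, velocity=64),
    SimpleNamespace(type="note_on", note=60, velocity=0),  # releases note 60
    SimpleNamespace(type="note_on", note=67, velocity=70),
    SimpleNamespace(type="note_off", note=67, velocity=0),
    SimpleNamespace(type="note_on", note=60, velocity=80),
]
print(count_notes(demo_track))
```

With a real file, the per-track counters could be merged with something like `sum((count_notes(t) for t in mid.tracks), Counter())` to get a histogram for the whole piece.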
The output looks like this (the full output is much longer; here, I print only the first ten messages of each track):