Music streaming services have grown to be an essential part of our digital landscape. One of the major challenges in music streaming is differentiating between instrumental music, i.e., music without vocals, and vocal music. This distinction matters for a variety of uses, such as building playlists for particular purposes like concentration or relaxation, and even as a first step in identifying the language of the singing, which is crucial in markets with many languages.
For context, there is a sizable body of academic literature devoted to scalable content-based algorithms for automatic music tagging. These techniques typically feed low-level content features, derived from audio data or other data modalities, into supervised multi-class, multi-label models. Such models have demonstrated strong performance across many applications, such as predicting a track's genre, mood, instrumentation, or language.
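To make this concrete, below is a minimal, hypothetical sketch of such a multi-label tagger: pre-computed low-level audio features go into a small feed-forward network whose sigmoid outputs represent independent tags. The architecture, feature dimension, and tag set are illustrative assumptions, not the models used in the paper.

```python
import torch
import torch.nn as nn

# Hypothetical tag vocabulary; real systems use hundreds of tags.
TAGS = ["rock", "jazz", "happy", "sad", "guitar", "piano", "instrumental"]

class MusicTagger(nn.Module):
    """A minimal multi-label tagger: audio features in, one sigmoid score per tag."""

    def __init__(self, feature_dim: int = 128, num_tags: int = len(TAGS)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_tags),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Sigmoid (not softmax) so several tags can be active at once.
        return torch.sigmoid(self.net(features))

model = MusicTagger()
# Multi-label training typically uses binary cross-entropy per tag.
loss_fn = nn.BCELoss()
features = torch.randn(4, 128)                       # batch of 4 feature vectors
targets = torch.randint(0, 2, (4, len(TAGS))).float()  # 0/1 label per tag
loss = loss_fn(model(features), targets)
```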
In recent research, a team of researchers from Amazon has addressed the issue of automatic instrumental music detection. The researchers contend that the conventional tagging approach yields less-than-ideal results for this task. Applied to instrumental music identification specifically, these models produce low recall, i.e., the proportion of relevant instances that are correctly identified, when operated at high levels of precision (the proportion of instances flagged as relevant that are actually relevant).
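In plainer terms, precision and recall can be computed from a model's thresholded scores as follows; the scores and labels here are made up purely to illustrate why pushing the decision threshold up to reach high precision tends to drive recall down.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for binary 'instrumental' predictions at a given threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))          # true positives
    fp = sum(p and not y for p, y in zip(predicted, labels))      # false positives
    fn = sum((not p) and y for p, y in zip(predicted, labels))    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy scores from a hypothetical tagger and made-up ground truth labels.
scores = [0.95, 0.80, 0.60, 0.70, 0.30]
labels = [True, True, True, False, False]   # is the track actually instrumental?

print(precision_recall(scores, labels, threshold=0.5))  # (0.75, 1.0): all instrumentals found
print(precision_recall(scores, labels, threshold=0.9))  # (1.0, ~0.33): precise, but most are missed
```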
To address this challenge, the team has proposed a unique multi-stage method for instrumental music detection. This method consists of three main stages, which are as follows.
- Source Separation Model: In the first stage, the audio recording is separated into two parts: the vocals and the accompaniment, i.e., the background music. This separation is essential because instrumental music should not, in theory, contain any vocal components.
- Quantification of Singing Voice: In the second stage, the amount of singing voice in the separated vocal signal is quantified. This quantification makes it possible to tell whether a track contains vocals: if the measured singing voice falls below a predetermined threshold, the track becomes a candidate for being instrumental.
- Background Track Analysis: The background track, which represents the song's instrumental components, is also examined. When the amount of singing voice falls below the threshold, a binary classifier, a neural network trained to separate sounds into instrumental and non-instrumental categories, is applied to this background track. Its job is to determine whether the background recording actually contains musical instruments, and hence whether the track is instrumental. (A minimal code sketch of the full pipeline follows this list.)
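Below is a minimal sketch of how these three stages could fit together. The separation model, the vocal-content measure, the threshold value, and the background classifier are all placeholders with hypothetical names, since the paper does not publish reference code; the structure simply mirrors the decision flow described above.

```python
import numpy as np

VOCAL_ENERGY_THRESHOLD = 0.05   # illustrative value, not taken from the paper

def separate_sources(audio: np.ndarray):
    """Stage 1 (placeholder): split a waveform into (vocals, accompaniment).
    In practice this would be a trained source-separation model."""
    raise NotImplementedError

def vocal_content(vocals: np.ndarray) -> float:
    """Stage 2: quantify how much singing voice the vocal stem contains.
    A simple root-mean-square energy stands in here for a learned measure."""
    return float(np.sqrt(np.mean(vocals ** 2)))

def background_is_instrumental(accompaniment: np.ndarray) -> bool:
    """Stage 3 (placeholder): binary classifier that checks whether the
    background track actually contains musical instruments."""
    raise NotImplementedError

def is_instrumental(audio: np.ndarray) -> bool:
    vocals, accompaniment = separate_sources(audio)         # Stage 1
    if vocal_content(vocals) >= VOCAL_ENERGY_THRESHOLD:     # Stage 2
        return False                                        # audible singing voice -> vocal track
    return background_is_instrumental(accompaniment)        # Stage 3
```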
By employing this multi-stage approach, the method reaches a firm conclusion about whether a given track is instrumental. It draws on both the presence of singing voice and the characteristics of the background music to make this decision. To verify the method's efficacy, the researchers also provide a comparative evaluation against several state-of-the-art models for instrumental music detection.
Precision and recall metrics for the method are reported. By contrasting its results with those of existing models, the research shows that the approach achieves both high precision and high recall in identifying instrumental music within a large-scale music catalog. In conclusion, this work is a meaningful step toward addressing the challenges of automatic instrumental music detection in the context of music streaming services.
Check out the Paper. All credit for this research goes to the researchers on this project.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.