Investigation of methods of vocal extraction in mixed records
DOI:
https://doi.org/10.34185/1562-9945-3-128-2020-05
Keywords:
blind signal separation, demixing, digital signal processing
Abstract
Blind signal separation is a pressing problem for musicians and audio industry professionals. The task is to isolate a source signal from a mix; in music, this means extracting a single instrument track from a finished mix. Although many signal processing methods exist, the demixing problem remains unsolved: current attempts yield output signals so distorted that they cannot be used further. The purpose of this research is to identify the characteristics of the vocal signal on the basis of existing methods and software.
Because each voice is unique, there is no universal way to extract a vocal track from a finished mix. Depending on the arrangement and the voice, different methods produce different results.
The following methods of vocal isolation are described in this paper:
1. Frequency filtering of vocals;
2. Phase subtraction method;
3. Methods using artificial intelligence.
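As a rough illustration of the first method, the sketch below band-passes a synthetic mix by zeroing DFT bins outside an assumed vocal band. The 300 to 3400 Hz band, the sample rate, the three test tones and all function names are assumptions chosen for the demo, not values from the paper; a practical implementation would use an FFT library and a properly designed filter.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform, adequate for a short demo."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Inverse of dft(); returns the real part, since the input was real."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def band_pass(samples, sample_rate, low_hz, high_hz):
    """Keep only the DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 hold the mirrored (negative) frequencies.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0j
    return inverse_dft(spectrum)


def tone(freq_hz, amplitude, sample_rate, n):
    """Sine tone used as a stand-in for one source in the mix."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000  # Hz, demo value (assumption)
N = 256             # gives a bin spacing of 31.25 Hz

bass = tone(125, 1.0, SAMPLE_RATE, N)    # below the assumed vocal band
voice = tone(1000, 0.5, SAMPLE_RATE, N)  # stand-in for the vocal
keys = tone(2000, 0.5, SAMPLE_RATE, N)   # instrument inside the vocal band
mix = [b + v + k for b, v, k in zip(bass, voice, keys)]

filtered = band_pass(mix, SAMPLE_RATE, 300.0, 3400.0)

# The bass is removed, but the in-band instrument passes through with the voice.
leak = max(abs(f - (v + k)) for f, v, k in zip(filtered, voice, keys))
print(f"deviation from voice + keys: {leak:.2e}")
```

The filter removes the out-of-band bass but leaves the in-band instrument mixed with the voice, which is why frequency filtering alone cannot isolate a vocal from instruments that share its range.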
A comparative analysis was conducted to evaluate the effectiveness of these methods. A set of test examples was prepared from compositions in different musical styles and with arrangements of varying instrumental density. The selected compositions were processed with the phase subtraction method and with two software products based on artificial intelligence: Spleeter and iZotope RX7.
The methods based on artificial intelligence give very similar results, although which one performs better varies from composition to composition. In no case is the vocal line extracted without distortion: either the timbre is distorted or sounds from other instruments remain. The phase subtraction method produces a mono signal, which is a major drawback, and it cannot separate vocals from instruments that occupy the same frequency range and the same position in the stereo panorama. A common disadvantage of all the methods is that they do not adapt to the voice in a particular composition. A method is therefore needed that determines the timbre characteristics of the voice in a particular composition and extracts the track with that timbre.
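The two drawbacks of phase subtraction can be seen in a small sketch. It assumes one common reading of the method, in which the right channel is added to the left with inverted phase (L - R) so that everything panned to the center cancels; the paper's exact procedure may differ. The tones, panning and names below are hypothetical demo values.

```python
import math


def phase_subtract(left, right):
    """Add the right channel to the left with inverted phase (L - R), halved."""
    return [(l_s - r_s) / 2 for l_s, r_s in zip(left, right)]


def tone(freq_hz, amplitude, sample_rate, n):
    """Sine tone used as a stand-in for one source in the mix."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


SAMPLE_RATE = 8000  # Hz, demo value (assumption)
N = 64

vocal = tone(440, 0.5, SAMPLE_RATE, N)   # panned to the center
bass = tone(110, 0.5, SAMPLE_RATE, N)    # also panned to the center
guitar = tone(660, 0.3, SAMPLE_RATE, N)  # panned hard left

left = [v + b + g for v, b, g in zip(vocal, bass, guitar)]
right = [v + b for v, b in zip(vocal, bass)]

# Two input channels collapse into one output channel: the result is mono.
side = phase_subtract(left, right)

# Only the off-center guitar survives; the centered vocal and the centered
# bass cancel together, so the subtraction cannot tell them apart.
residual = max(abs(s - g / 2) for s, g in zip(side, guitar))
print(f"deviation from guitar / 2: {residual:.2e}")
```

Because the cancellation depends only on position in the stereo panorama, any instrument sharing the vocal's position is removed or retained along with it.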