Learning from Audio: Wave Forms
An introduction to wave forms and dealing with null data.
Related Articles:
- Learning from Audio: Time Domain Features
- Learning from Audio: Fourier Transformations
- Learning from Audio: Spectrograms
- Learning from Audio: The Mel Scale, Mel Spectrograms, and Mel Frequency Cepstral Coefficients
- Learning from Audio: Pitch and Chromagrams
Introduction:
Audio is an extremely rich data source. Depending on the sample rate (the number of points sampled per second to quantify the signal), one second of data could contain thousands of points. Scale this up to hours of recorded audio, and you can see how Machine Learning and Data Science nicely intertwine with signal processing techniques.
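To put rough numbers on that, here is a minimal sketch of the arithmetic. The sample rates listed are common illustrative values chosen for this example (22,050 Hz is the default that librosa resamples to when loading audio), and the helper name `num_samples` is hypothetical, not something from the article's repository.

```python
# Common sample rates in Hz. These are illustrative choices for this sketch,
# not values taken from the article's data.
SAMPLE_RATES = {
    "telephone": 8_000,
    "librosa default": 22_050,
    "CD audio": 44_100,
}


def num_samples(sample_rate: int, seconds: float) -> int:
    """Number of amplitude values needed to store `seconds` of mono audio."""
    return int(round(sample_rate * seconds))


for name, sr in SAMPLE_RATES.items():
    per_second = num_samples(sr, 1)
    per_hour = num_samples(sr, 3600)
    print(f"{name:>16}: {per_second:>6,} samples/s, {per_hour:>12,} samples/hour")
```

Even at the lowest rate shown, a single hour of mono audio already runs to tens of millions of points.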
This article aims to break down what exactly wave forms are and to use librosa in Python for analysis and visualization, alongside numpy and matplotlib.
Wave Forms:
Waves are repeating signals that oscillate and vary in amplitude, depending on their complexity. In the real world, sound waves are continuous and mechanical, which is quite different from computers, which are discrete and digital.
So, how do we translate something continuous and mechanical into something that is discrete and digital?
This is where the sample rate defined earlier comes in. Say, for example, the sample rate of the recorded audio is 100 Hz. This means that for every recorded second of audio, the computer will place 100 points along the signal in an attempt to best “trace” the continuous curve. Once all the points are in place, a smooth curve joins them together so that humans can visualize the sound. Since the recorded audio is expressed in terms of amplitude and time, we can intuitively say that the wave form operates in the time domain.
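The sampling idea described above can be sketched with numpy. This is a minimal illustration, assuming a pure 5 Hz sine wave stands in for the continuous signal; the function `sample_sine` and its parameters are hypothetical names made up for this example, not part of librosa or the article's code.

```python
import numpy as np


def sample_sine(frequency_hz, sample_rate, duration_s, amplitude=1.0):
    """Sample a continuous sine wave at evenly spaced points in time."""
    n_points = int(sample_rate * duration_s)
    # Sample number n lands at time n / sample_rate seconds.
    times = np.arange(n_points) / sample_rate
    amplitudes = amplitude * np.sin(2 * np.pi * frequency_hz * times)
    return times, amplitudes


# One second of a 5 Hz wave "traced" with a sample rate of 100.
times, amplitudes = sample_sine(frequency_hz=5, sample_rate=100, duration_s=1.0)

print(len(times))           # one point per sample: sample_rate * duration_s
print(times[1] - times[0])  # spacing between neighbouring points: 1 / sample_rate
```

Passing `times` and `amplitudes` to matplotlib's `plt.plot` would draw the time-domain picture described above: time on the horizontal axis, amplitude on the vertical axis, with straight segments joining the sampled points.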
To better understand what something like this sounds like, we will look at three sounds: a kick drum, a guitar, and a snare drum. The code and data can be found in my GitHub repository for this article.