mlearnere

Summary

The website content provides an introduction to audio signal processing, focusing on wave forms, null data handling, and visualization using Python libraries like librosa, numpy, and matplotlib.

Abstract

The article "Learning from Audio: Wave Forms" serves as a primer on understanding wave forms in audio data analysis. It explains the concept of wave forms as oscillating signals that are mechanical in the real world but must be digitized for computer processing through sampling. The author uses the Python library librosa alongside numpy and matplotlib to demonstrate audio signal analysis and visualization, illustrating the process with examples of a kick drum, a guitar, and a snare drum. The article also addresses the challenge of dealing with null data in audio signals by setting a threshold to filter out insignificant amplitude points, thereby enhancing the visualization and interpretation of audio wave forms. The conclusion invites readers to explore further advanced topics in audio analysis.

Opinions

  • The author emphasizes the richness of audio as a data source and the importance of machine learning and data science in audio signal processing.
  • The use of librosa, numpy, and matplotlib is recommended for audio analysis and visualization, indicating a preference for these tools within the data science community.
  • The article suggests that handling null data by applying a threshold to the amplitude of signals is a straightforward and effective method.
  • By providing code examples and visualizations, the author conveys a hands-on approach to learning, encouraging readers to engage with the material actively.
  • The mention of related articles on topics like time domain features, Fourier transformations, and spectrograms implies a comprehensive series aimed at a thorough understanding of audio signal processing.
  • The author's note at the end, promoting an AI service, indicates a belief in the value of cost-effective, high-performance AI tools for tasks similar to those discussed in the article.

Learning from Audio: Wave Forms

An introduction to wave forms and dealing with null data.

Photo by Jonathan Velasquez on Unsplash

Introduction:

Audio is an extremely rich data source. Depending on the sample rate (the number of points sampled per second to quantify the signal), one second of data could contain thousands of points. Scale this up to hours of recorded audio, and you can see how Machine Learning and Data Science nicely intertwine with signal processing techniques.
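
To put rough numbers on this, the short sketch below counts the points in a mono recording of a given length. The two sample rates are assumptions chosen for illustration: 44,100 Hz is the CD-audio rate, and 22,050 Hz is the rate librosa resamples to by default when loading a file.

```python
def num_samples(sample_rate: int, seconds: float) -> int:
    """Number of amplitude points in a mono recording of the given length."""
    return int(sample_rate * seconds)

# Illustrative sample rates (assumed, not taken from the article's data)
for sr in (22_050, 44_100):
    print(f"{sr} Hz: {num_samples(sr, 1)} points per second, "
          f"{num_samples(sr, 3600)} points per hour")
```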

This article breaks down what wave forms are and uses librosa in Python, alongside numpy and matplotlib, to analyze and visualize them.

Wave Forms:

Waves are repeated signals that oscillate and vary in amplitude, depending on their complexity. In the real world, waves are continuous and mechanical, whereas computers are discrete and digital.

So, how do we translate something continuous and mechanical into something that is discrete and digital?

This is where the sample rate defined earlier comes in. Say, for example, the sample rate of the recorded audio is 100. This means that for every recorded second of audio, the computer places 100 points along the signal in an attempt to best “trace” the continuous curve. Once all the points are in place, a smooth curve joins them together so that humans can visualize the sound. Since the recorded audio is expressed in terms of amplitude and time, we can intuitively say that the wave form operates in the time domain.
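
As a rough illustration of the paragraph above, the sketch below samples a continuous sine wave at 100 points per second using numpy. The 5 Hz frequency and the one-second duration are arbitrary choices for the example, not values from the article's recordings.

```python
import numpy as np

sample_rate = 100   # points per second, as in the example above
duration = 1.0      # seconds (assumed for illustration)
frequency = 5.0     # Hz (assumed for illustration)

# Time stamps of the samples: 0.00, 0.01, ..., 0.99 seconds
t = np.arange(int(sample_rate * duration)) / sample_rate

# Discrete "trace" of the continuous curve sin(2*pi*f*t)
signal = np.sin(2 * np.pi * frequency * t)

print(len(signal))   # number of points placed along the signal
print(t[1] - t[0])   # spacing between neighbouring points, 1 / sample_rate
```

Raising `sample_rate` places the points closer together, so the discrete trace follows the continuous curve more closely.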

To better understand what something like this sounds like, we will look at three sounds: a kick drum, a guitar, and a snare drum. The code and data can be found in my GitHub repository for this article.

Now that the data is loaded in, let’s visualize these sounds.
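
One possible way to draw such a plot with matplotlib is sketched below. The helper `plot_waveforms` is a hypothetical name, and the signal is a synthetic stand-in (a decaying tone followed by silence) so that the sketch runs without the audio files; it is not the author's plotting code.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_waveforms(sounds):
    """Plot amplitude against time for each (signal, sample_rate) pair."""
    fig, axes = plt.subplots(len(sounds), 1,
                             figsize=(10, 2.5 * len(sounds)), squeeze=False)
    for ax, (name, (y, sr)) in zip(axes[:, 0], sounds.items()):
        t = np.arange(len(y)) / sr  # sample index -> seconds
        ax.plot(t, y, linewidth=0.5)
        ax.set_title(name)
        ax.set_xlabel("Time (s)")
        ax.set_ylabel("Amplitude")
    fig.tight_layout()
    return fig

# Stand-in signal: a decaying 60 Hz tone followed by half a second of silence
sr = 8000
t = np.arange(sr) / sr
tone = np.exp(-8 * t) * np.sin(2 * np.pi * 60 * t)
demo = {"synthetic tone": (np.concatenate([tone, np.zeros(sr // 2)]), sr)}

fig = plot_waveforms(demo)
# plt.show()  # uncomment in an interactive session
```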

Figure 1

From the get-go, we see some issues with the visualization.

While we can easily tell some differences between the visualizations, they are not as distinct as we would like them to be. We also know that audio signals do not suddenly disappear; they fade out until they are impossible to perceive. In audio terms, this imperceptible tail constitutes null data.

Null Data in Audio:

There are many ways to treat null audio data in the time domain. However, the following approach is often the simplest.

Given the signal and a minimum threshold for the amplitude of the signal:

  • Take the absolute value of each point in the signal.
  • If the absolute value is greater than the threshold, keep the point. Otherwise, remove it.

Figure 2
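
The two thresholding steps above can be sketched with a numpy boolean mask. The function name `remove_null_data` and the threshold value 0.005 are assumptions made for this illustration; the author's repository may implement the step differently.

```python
import numpy as np

def remove_null_data(signal, threshold):
    """Keep only the points whose absolute amplitude exceeds the threshold."""
    signal = np.asarray(signal)
    mask = np.abs(signal) > threshold   # True where the point is kept
    return signal[mask]

# Synthetic example: two audible points surrounded by near-silence
y = np.array([0.0, 0.001, -0.4, 0.3, -0.002, 0.0])
trimmed = remove_null_data(y, threshold=0.005)
print(trimmed)  # only the points with |amplitude| > 0.005 remain
```

Note that a mask like this drops quiet points anywhere in the signal, not only in the fading tail, so the sample positions in the result no longer line up with the original time axis.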

You can think of thresholds as a sort of parameter for the recordings. Different thresholds work differently for various sounds. Playing around with the threshold is a good way to see how and why this visualization changes.

Now that the null data has been removed from these recordings, it is much easier to see the personality in each sound. The guitar is much more uniform in shape, fading out gradually with time. The kick drum hits hard at the beginning and quickly fades, with some remnants of sound remaining. The snare drum is loud and raucous, something you will not want to listen to repeatedly.

Conclusion:

This concludes the basics of dealing with audio signals in Python with librosa. Stay tuned for more articles that delve into more advanced topics of how to learn from audio!

Thank you for reading.

Note: all figures without a source are by the author.
