Autocorrelation – Frequency Estimation

Pitch Detection – It’s, in my opinion, one of the coolest things you can do with Digital Signal Processing (DSP). I was pretty boring because all I’ve really used this algorithm for so far is Musician’s Kit. But in reality the applications for this algorithm are endless.

The Autocorrelation Algorithm takes a segment of audio sampels, and compares it against itself at different intervals. At first this concept seems pointless, but if you think about it its kind of ingenious. Humans perceive “pitch” as a result of periodic vibrations in the air/environment around them, so all the autocorrelation algorithm tries to do is detect if there is any periodic characteristics in the segment of audio.

A nice way to think about it: take two transparencies and draw identical sine waves on each. Now place them on top of each other, and this is where the autocorrelation algorithm starts its magic. It looks at how similar the two transparencies are (at first they are, obviously, identical), and then shifts one to the right. This process, analyzing the similarity and shifting one step to the right, continues until the transparencies no longer overlap. After this process is done, you have a list of numbers – these represent the similarity of the transparencies at each position of the transparencies. Using this result you can determine if there was any specific position of high similarity past the first identical analysis. The position of this peak in the resulting list gives you enough information to determine period of the predominant periodic tone in the segment of audio.

If I haven’t completely confused you yet, there’s still some math involved with this algorithm. The overarching idea is fairly simple, but I didn’t explain how you compare the “similarity” of the segment of audio against itself. Every segment of digital audio (by segment I mean say a little interval of audio coming in from the microphone) is a list of numbers that represent the air pressure over time sampled at the sampling rate – for the most part on iOS that’s 44100 Hz (meaning the iPhone microphone takes 44100 measurements of the air pressure each second). This may be redundant but here’s the math, and I want to be as clear as my crappy writing allows.

To determine the similarity of two lists of numbers you can take the “dot product”. This is a vector (list of numbers) operation, and all it means is you multiply each element by the corresponding element (same index) in the other vector, and then add all of those products up. So say you have a segment of audio represented by:

Audio = {1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1}

Make a copy so you have two (remember the two-identical-transparencies analogy?) and get the dot product!

Audio • Audio = {1*1 + 3*3 + 5*5 + 2*2 + -1*-1 + -3*-3 + -4*-4 + -2*-2 + 3*3 + 6*6 + 2*2 + 0*0 + -3*-3 + -5*-5 + -2*2 + 1*1}

= {1 + 9 + 25 + 4 + 1 + 9 + 16 + 4 + 9 + 36 + 4 + 0 + 9 + 25 + 4 + 1} = 206

So lets shift the second copy and see if our numbers are less similar (and if we shift it one audio sample, it should be less similar). In the name of simplicity I’m just going to wrap around the ends.

Audio • Audio(lag=1) = {1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1}
• {3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1, 1}

= {1*3 + 3*5 + 5*2 + 2*-1 + -1*-3 + -3*-4 + -4*-2 + -2*3 + 3*6 + 6*2 + 2*0 + 0*-3 + -3*-5 + -5*-2 + -2*1 + 1*1}
= 3 + 15 + 10 – 2 + 3 + 12 + 8 – 6 + 18 + 12 + 0 + 0 + 15 + 10 – 2 + 1
= 97

As you can see, the resulting number was less at one shift (one lag). Lets do two more dot products, one at a lag of 4 and one of 8:

Audio • Audio(lag=4) = {1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1}
• {-1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1, 1, 3, 5, 2}
= {1*-1 + 3*-3 + 5*-4 + 2*-2 + -1*3 + -3*6 + -4*2 + -2*0 + 3*-3 + 6*-5 + 2*-2 + 0*1 + -3*1 + -5*3 + -2*5 + 1*2}
= -1 – 9 – 20 – 4 – 3 – 18 – 8 – 0 – 9 – 30 – 4 – 0 – 3 – 15 – 10 + 2
= -132

At a lag of 4, the signal isn’t very periodic. Let’s try it with an offset of 8 samples.

Audio • Audio(lag=8) = {1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1}
• {3, 6, 2, 0, -3, -5, -2, 1, 1, 3, 5, 2, -1, -3, -4, -2}
= {1*3 + 3*6 + 5*2 + 2*0 + -1*-3 + -3*-5 + -4*-2 + -2*1 + 3*1 + 6*3 + 2*5 + 0*2 + -3*-1 + -5*-3 + -2*-4 + 1*-2}
= 3 + 18 + 10 + 2 + 3 + 15 + 8 – 2 + 3 + 18 + 10 + 3 + 15 + 8 – 2
= 112

As you can see, from lags 0 – 4, the similarity decreases, and then as the lag approaches 8 the similarity increases which means we will probably have a peak in similarity around an offset of 8 samples.

This is how autocorrelation algorithms detect pitch. They essentially take a string of audio samples and compare it against offset versions of the same string using a dot product.

Downloads:

  •  Autocorrelation - An xCode project that performs a simple autocorrelation algorithm and shows you the output frequency.
  • PitchDetector - A simple Objective-C class that performs the autocorrelation algorithm shown above.

iOS Audio I/O Example – xCode Project

Audio handling on iOS can be pretty tricky, but once you get past the first few speed bumps that deal with setting up the Audio Session, Audio Units, and Audio I/O configuration. The concept is relatively simple though.

When your app wants to play audio, and Audio Session is initialized. There are a bunch of different categories of audio sessions that describe if your audio is playing, recording, or both and how it should behave in relation to other sounds (ring tones, music player, etc…).

Next you need Audio Units. An audio unit is kind of like a plug in for low-level audio handling. For example, there is one unit called the Remote IO unit that facilitates the routing of audio input and output between the app and device. Other audio units can add reverberation or filters (processing units), mix different channels of audio (mixing units, kind of like a real life mixing board), or play different samples of audio based on MIDI input (Sampler unit). In Mac OS X you can write custom audio units for your program, but on iOS you’re restricted to what Apple has provided.

iOS Audio I/O Example Screen Shot

Anyways, below are links to an example audio project (both full xCode project and class). I sort of taught myself all of the concepts that are involved through the use of other online blogs about iPhone audio and if you look closely at my code you might see them peek through.

The project is fairly small, I just wanted to show an example implementation of Audio I/O on the iPhone. The AudioController object is very similar to the one used in my own personal projects. I’ve written comments throughout most of the project to help guide any beginners through the audio setup process. The end result is a small app that can generate and play Sine/Saw/Triangle/Square waves on the go, give some UI feedback on the SPL of the microphone (“MeterView.h”), and play AUSampler notes using a simple MIDI-esque abstraction.

Downloads

PS – I sort of have this burning desire to keep working on this AudioController project and possibly make an easier Objective-C interface for controlling low-latency audio signals, so if there’s any desire out there to see it happen, please tell.

Triangle and Sawtooth Wave Generation

I’ll keep this short because Triangle, Sawtooth, and Square waves all share something in common: they’re non-sinusoidal. They are all made up of their own unique (and infinite) summation of sinusoids.

A triangle wave can be represented as:

Triangle Wave Equationand a sawtooth wave:

Sawtooth Wave Equation where = 2πf (f being the frequency of the generated wave).

If the words “non-sinusoidal” or “infinite summation of sinusoids” are confusing, it might be helpful to look at the following animations. As you can see in the equations above, they all end with “…”, this means that the previous pattern continues to infinity. It never ends, so its quite impossible for a computer to represent a square, triangle, or sawtooth wave as an infinite summation of sinusoids. But it can get close! Take a look at these animations that I grabbed from wikipedia:

But I digress, when we have to generate these waves the on the go, it’s not absolutely necessary to have a perfect understanding of the math in the past two posts. However, it is very useful if you ever go into Digital Signal Processing or any music tech field for that matter.

So to get back on topic, here’s the two snippets of code for generating triangle and sawtooth waves:


int triIndex = 0;
void generateTriangle(SInt16 *sampleBuffer, int numFrames, float sampleRate, float frequency, float amp) {
    if(amp>1) amp=1;
    if(amp<0) amp=0;
    amp = amp * SHRT_MAX;

    float samplesPerCycle = sampleRate/frequency;
    for(int i = 0; i<numFrames; i++) {
        if(fmodf(triIndex, samplesPerCycle)/samplesPerCycle>0.5) {
            sampleBuffer[i] = (SInt16) amp * ((2-2*((fmodf(triIndex, samplesPerCycle)/samplesPerCycle-0.5)/0.5))-1);
        } else {
            sampleBuffer[i] = (SInt16) amp * ((2*((fmodf(triIndex, samplesPerCycle)/samplesPerCycle)/0.5))-1);
        }
        triIndex = triIndex+1;
    }
}
int sawIndex = 0;
void generateSawtooth(SInt16 *sampleBuffer, int numFrames, float sampleRate, float frequency, float amp) {
    if(amp>1) amp=1;
    if(amp<0) amp=0;
    amp = amp * SHRT_MAX;

    float samplesPerCycle = sampleRate/frequency;
    for(int i = 0; i<numFrames; i++) {
        sampleBuffer[i] = (SInt16) amp * (2*(fmodf(sawIndex, samplesPerCycle)/samplesPerCycle)-1);
        sawIndex = sawIndex+1;
        if(sawIndex>=samplesPerCycle) sawIndex-=samplesPerCycle;
    }
}

Square Wave Generation

This is somewhat of a continuation of my last post on Sine Wave Generation, but instead we’ll be producing a square wave! Square waves are really annoying to listen to. So if you’re going to test this at all – make sure you turn down the volume (it can be very unpleasant).

Square waves are a bit more complex than sine waves. Sine waves all have their energy concentrated on one frequency. Square waves are, in reality, somewhat theoretical – it is the result of an infinite summation of sine waves, each weighted differently. Therefore, in real life, it is non-sinusoidal. But it’s still very relevant, so to be thorough – a square wave is:

Square(t) = 4/PI(sin(2(PI)ft) + 1/3sin(6(PI)ft) + ...)

Obviously, it would be impractical to add a bunch of different sin(C), where C is some number determined by the arguments and equation above, every time we need to compute just one sample. There are a few other solutions. One solution (unnecessarily complicated, but no more difficult than other common methods of audio synthesis) is to use the inverse FFT. Without diving into a lecture about the Fast Fourier Transform (FFT), it’s main use is to take a chunk of audio, and transform it into the frequency domain so that we can see how energy is spread across the frequency spectrum (20Hz – 20,000Hz for humans). Our solution would employ the Inverse FFT (or IFFT) – this does the opposite, you begin with an ideal frequency spectrum and end up with a chunk of audio. It’s very useful, you can essentially draw the frequency spectrum (drawing sound, eh??).

But this is far too complicated for now, because we just want to generate a square wave on the go, and that’s all it is, a square. For every cycle of a square wave, it is either high or low. Therefore its incredibly easy to just loop through the buffer and insert the constant amp, changing the sign every half cycle. My C++ implementation is below.

C++:

int squareIndex = 0;
int amp = SHRT_MAX * 0.8;
void generateSquare(SInt16 *sampleBuffer, int numFrames, float sampleRate, float frequency) {
    float samplesPerCycle = sampleRate/frequency;
    for(int i = 0; i < numFrames; i++) {
        if(fmodf(squareIndex, samplesPerCycle)/samplesPerCycle > 0.5) {
            sampleBuffer[i] = amp;
        } else {
            sampleBuffer[i] = -1*amp;
        }
        squareIndex = squareIndex+1;
        if(squareIndex >= samplesPerCycle) squareIndex-=samplesPerCycle;
    }
}

This is a faux-implementation of how I would fill up a callback buffer:

#define sample_rate 44100.0
#define desired_tone 440.0 //A440
static void playbackCallback(..., int inNumberFrames, SInt16 *ioData) {
    generateSine(ioData, inNumberFrames, sample_rate, desired_tone);
}

Sine Wave Generation

Generating a sampled sine wave

Generating basic waveforms is a fundamental concept which all audio DSP engineers should understand. Basic waveforms can be generated on the go, without considering the harmonic properties that define each wave shape. Since these waves are periodic, they can also be produced once, and then stored as in an array or a wavetable (or multiple waves can be produced and stored in a matrix). Wavetables are used very often in audio synthesis (wavetable synthesis) where multiple waveforms are stored, modulated, and post-processed to create new sounds.

I have provided code below that outlines how to produce the simplest of the basic waveforms: The Sine Wave. Because most of my audio developing experience has been on the iPhone, I’ve posted the code in C++.

C++:

#define M_PI 3.14; //M_PI is a pi macro used by many compilers
double theta = 0;
void generateSine(SInt16 *sampleBuffer, int numFrames, float sampleRate, float frequency) {
    float deltaTheta = 2*M_PI*(frequency/sampleRate);
    for(int i = 0; i2*M_PI) theta = theta - 2*M_PI;
    }
}

The idea for this function/method is that you call this method when you have a buffer that must be populated with a sine wave of a given frequency. To use this function, you call it with the following arguments:

  • SInt16 *sampleBuffer: a reference of the buffer to be populated.
  • int numFrames: the number of samples in *sampleBuffer.
  • float sampleRate: the device’s current sampleRate – usually 44100.0 on the iPhone (although you can usually set it yourself, more on this in a few weeks).
  • float frequency: the desired output frequency in Hz.

So, to give an example of how I would use these snippets in real life I will leave you with an example. This is a mockup callback for an Audio I/O Framework (based off of iOS). A callback is a function that is called every time the device is about to run out of audio samples to play from the speaker, and alas, must ask you to feed it some more.

#define sample_rate 44100.0
#define desired_tone 440.0 //A440
static void playbackCallback(..., int inNumberFrames, SInt16 *ioData) {
    generateSine(ioData, inNumberFrames, sample_rate, desired_tone);
}

Yep, its Another Tech Related Blog

Happy February, internet!

Seeing as I am quickly closing in on my last few months at NYU, I have decided to migrate to WordPress so I can provide documentation for some projects. Furthermore, I now have a place to publish tech related malarkey. Hooray!

My first few posts will probably be very specific to iPhone Development and Audio Processing and I/O. But I hope to move onto other things like Android, OpenGL graphics, and Web Development (PHP/MySQL n’ such).

Hopefully this will get interesting at some point.

-Kevin Murphy