Pitch Detection – it's, in my opinion, one of the coolest things you can do with Digital Signal Processing (DSP). I've been pretty boring with it, because all I've really used this algorithm for so far is Musician's Kit. But in reality the applications for this algorithm are endless.

The **Autocorrelation Algorithm** takes a segment of audio samples and compares it against itself at different intervals. At first this concept seems pointless, but if you think about it, it's kind of ingenious. Humans perceive "pitch" as a result of periodic vibrations in the air/environment around them, so all the autocorrelation algorithm tries to do is detect whether there are any periodic characteristics in the segment of audio.

A nice way to think about it: take two transparencies and draw identical sine waves on each. Now place them on top of each other, and this is where the autocorrelation algorithm starts its magic. It looks at how similar the two transparencies are (at first they are, obviously, identical), and then shifts one to the right. This process, analyzing the similarity and shifting one step to the right, continues until the transparencies no longer overlap. After this process is done, you have a list of numbers, one per shift, representing how similar the transparencies were at each offset. Using this result you can determine whether there was any position of high similarity past the first, identical, one. The location of that peak in the list gives you enough information to determine the period of the predominant periodic tone in the segment of audio.

If I haven't completely confused you yet, there's still some math involved with this algorithm. The overarching idea is fairly simple, but I didn't explain how you compare the "similarity" of the segment of audio against itself. Every segment of digital audio (by segment I mean a little interval of audio coming in from the microphone) is a list of numbers that represent the air pressure over time, sampled at the sampling rate. For the most part on iOS that's 44100 Hz (meaning the iPhone microphone takes 44100 measurements of the air pressure each second). This may be redundant, but here's the math, and I want to be as clear as my crappy writing allows.

To determine the similarity of two lists of numbers you can take the "dot product". This is a vector (list of numbers) operation, and all it means is that you multiply each element by the corresponding element (same index) in the other vector, and then add all of those products up. So say you have a segment of audio represented by:

Audio = {1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1}

Make a copy so you have two (remember the two-identical-transparencies analogy?) and get the dot product!

= {1 + 9 + 25 + 4 + 1 + 9 + 16 + 4 + 9 + 36 + 4 + 0 + 9 + 25 + 4 + 1} = **157**
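Working these products out by hand is tedious, so here's a quick Python sketch to check the arithmetic (just the math, not the Objective-C class from the downloads):

```python
# The audio segment from the example above.
audio = [1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1]

# Dot product of the segment with an unshifted copy of itself:
# multiply corresponding elements, then add all the products up.
dot = sum(a * b for a, b in zip(audio, audio))
print(dot)  # 157
```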

So let's shift the second copy and see if our numbers are less similar (and if we shift it by one audio sample, they should be). In the name of simplicity I'm just going to wrap around the ends.

• {3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1, 1}

= {1*3 + 3*5 + 5*2 + 2*-1 + -1*-3 + -3*-4 + -4*-2 + -2*3 + 3*6 + 6*2 + 2*0 + 0*-3 + -3*-5 + -5*-2 + -2*1 + 1*1}

= 3 + 15 + 10 – 2 + 3 + 12 + 8 – 6 + 18 + 12 + 0 + 0 + 15 + 10 – 2 + 1

= **97**
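Wrapping around the ends like this is just a modulo on the index. Here's a small Python helper (my own illustrative function, not part of the downloadable PitchDetector) that computes the dot product at any lag:

```python
# The audio segment from the example above.
audio = [1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1]

def circular_autocorr(samples, lag):
    """Dot product of the segment with a copy of itself shifted by
    `lag` samples, wrapping around the ends as in the worked example."""
    n = len(samples)
    return sum(samples[i] * samples[(i + lag) % n] for i in range(n))

print(circular_autocorr(audio, 1))  # 97
```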

As you can see, the resulting number was smaller at a shift of one sample (a lag of one). Let's do two more dot products, one at a lag of 4 and one at a lag of 8:

Audio • Audio(lag=4) = {1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1}

• {-1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1, 1, 3, 5, 2}

= {1*-1 + 3*-3 + 5*-4 + 2*-2 + -1*3 + -3*6 + -4*2 + -2*0 + 3*-3 + 6*-5 + 2*-2 + 0*1 + -3*1 + -5*3 + -2*5 + 1*2}

= -1 – 9 – 20 – 4 – 3 – 18 – 8 – 0 – 9 – 30 – 4 – 0 – 3 – 15 – 10 + 2

= **-132**

At a lag of 4, the dot product is strongly negative, so the signal definitely doesn't repeat every 4 samples. Let's try it with an offset of 8 samples.

• {3, 6, 2, 0, -3, -5, -2, 1, 1, 3, 5, 2, -1, -3, -4, -2}

= {1*3 + 3*6 + 5*2 + 2*0 + -1*-3 + -3*-5 + -4*-2 + -2*1 + 3*1 + 6*3 + 2*5 + 0*2 + -3*-1 + -5*-3 + -2*-4 + 1*-2}

= 3 + 18 + 10 + 0 + 3 + 15 + 8 – 2 + 3 + 18 + 10 + 0 + 3 + 15 + 8 – 2

= **110**

As you can see, the similarity decreases from lag 0 to lag 4, and then as the lag approaches 8 the similarity increases again, which means we probably have a peak in similarity around an offset of 8 samples.
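You can sweep every lag at once and watch that dip-then-peak shape appear. This sketch reuses the wrap-around dot product from before (again, just illustrative Python, not the shipped class):

```python
audio = [1, 3, 5, 2, -1, -3, -4, -2, 3, 6, 2, 0, -3, -5, -2, 1]

def circular_autocorr(samples, lag):
    n = len(samples)
    return sum(samples[i] * samples[(i + lag) % n] for i in range(n))

# Similarity at every possible lag of the segment against itself.
values = [circular_autocorr(audio, lag) for lag in range(len(audio))]
print(values)

# Skip lag 0 (the segment is always perfectly similar to itself) and
# find the lag where the similarity peaks.
peak_lag = max(range(1, len(audio)), key=lambda lag: values[lag])
print(peak_lag)  # 8
```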

This is how autocorrelation algorithms detect pitch. They essentially take a string of audio samples and compare it against offset versions of the same string using a dot product.
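To turn that peak lag into a pitch, remember the peak lag is the period of the tone in samples, so you just divide the sampling rate by it. With the toy 16-sample segment above and iOS's usual 44100 Hz rate, the conversion looks like this (a real segment would of course be much longer):

```python
sample_rate = 44100  # Hz, the typical iOS microphone sampling rate
peak_lag = 8         # period in samples, from the toy example above

# One cycle every 8 samples at 44100 samples per second:
frequency = sample_rate / peak_lag
print(frequency)  # 5512.5 (Hz)
```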

**Downloads:**

- Autocorrelation - An Xcode project that performs a simple autocorrelation algorithm and shows you the output frequency.
- PitchDetector - A simple Objective-C class that performs the autocorrelation algorithm shown above.