Author Topic: Question on how audio is stored in SoundBuffer (Read 8349 times)

Joshua Flynn · « **on:** October 29, 2013, 12:44:29 am »

This assumes a SoundBuffer set with a channel count of 2 and a sample rate of 44100.

I'm trying to understand how audio is stored in SoundBuffer. So how are the numbers interpreted?

For example, if I had a sample array that contained 1000 Int16s, how do I know which is pitch, volume, channel, etc?

Tuffywub · « **Reply #1 on:** October 29, 2013, 04:04:34 am »

An array of Int16s only stores the amplitude of the sound at that point in time. It does not store the pitch, or any sort of effect. You can think of the numbers as the position of the speaker at that point in time.
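On the channel part of the question: with more than one channel, the samples are interleaved, so a 2-channel array runs left, right, left, right, and so on. Below is a minimal sketch of pulling the channels apart, assuming you already hold the pointer and count that `sf::SoundBuffer::getSamples()` and `getSampleCount()` give you. `splitChannels` and `durationSeconds` are made-up helper names for illustration, not part of SFML.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: split interleaved samples (L0, R0, L1, R1, ...)
// into one vector per channel.
std::vector<std::vector<std::int16_t>> splitChannels(
    const std::int16_t* samples, std::size_t sampleCount, unsigned channelCount)
{
    assert(channelCount > 0);
    std::vector<std::vector<std::int16_t>> channels(channelCount);
    for (std::size_t i = 0; i < sampleCount; ++i)
        channels[i % channelCount].push_back(samples[i]);
    return channels;
}

// Hypothetical helper: playing time of the array in seconds.
// sampleCount is the total across all channels.
double durationSeconds(std::size_t sampleCount, unsigned channelCount, unsigned sampleRate)
{
    assert(channelCount > 0 && sampleRate > 0);
    return static_cast<double>(sampleCount / channelCount) / sampleRate;
}
```

For the 1000-sample stereo array from the question, that works out to 500 samples per channel, or roughly 11 ms of audio at 44100 Hz.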

A good example of working with lower level sound can be found on the SFML wiki:
https://github.com/SFML/SFML/wiki/Tutorial%3A-Play-Sine-Wave

Joshua Flynn · « **Reply #2 on:** October 31, 2013, 11:52:08 am »

Okay, thank you for the help so far.

So if I wanted to work out the pitch of a given piece of sound (starting simple, let's say a basic 440 Hz sine wave loaded from a .wav file), how would I go about doing that?

wintertime · « **Reply #3 on:** October 31, 2013, 02:45:33 pm »

You could calculate the Fourier transform and look at the data in frequency form.

Nexus · « **Reply #4 on:** October 31, 2013, 10:33:59 pm »

If you know that it's a sine wave, you can also measure the distance between two zero-crossings in the same direction (i.e. one period). This period is then the inverse of the frequency.
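A rough sketch of the zero-crossing idea for a clean, single-channel tone. It counts only upward crossings, so the gap between two neighbouring ones is one full period, and it averages over all the periods it finds. `makeSine` and `estimateFrequencyByZeroCrossings` are hypothetical names for illustration. Noise or a DC offset will throw this off.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: generate a mono sine tone as 16-bit samples.
std::vector<std::int16_t> makeSine(double frequency, unsigned sampleRate, std::size_t count)
{
    const double pi = 3.14159265358979323846;
    std::vector<std::int16_t> samples(count);
    for (std::size_t i = 0; i < count; ++i)
        samples[i] = static_cast<std::int16_t>(
            10000.0 * std::sin(2.0 * pi * frequency * static_cast<double>(i) / sampleRate));
    return samples;
}

// Estimate the frequency of a clean periodic mono signal from its
// upward zero-crossings. Returns 0.0 if fewer than two crossings exist.
double estimateFrequencyByZeroCrossings(const std::vector<std::int16_t>& mono, unsigned sampleRate)
{
    assert(sampleRate > 0);
    std::size_t first = 0, last = 0, crossings = 0;
    for (std::size_t i = 1; i < mono.size(); ++i)
    {
        // Upward crossing: previous sample below zero, current one at or above it.
        if (mono[i - 1] < 0 && mono[i] >= 0)
        {
            if (crossings == 0)
                first = i;
            last = i;
            ++crossings;
        }
    }
    if (crossings < 2)
        return 0.0;

    // Average period in samples between the first and last crossing.
    const double periodInSamples =
        static_cast<double>(last - first) / static_cast<double>(crossings - 1);
    return sampleRate / periodInSamples;
}
```

Fed one second of `makeSine(440.0, 44100, 44100)`, the estimate should land within a fraction of a hertz of 440.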

Joshua Flynn · « **Reply #5 on:** November 01, 2013, 12:02:56 am »

I'm looking to enable it to deal with more complicated things (such as verbal recordings) in order to detect pitch for the slightly less musically gifted (i.e. myself). The sine wave is just a starting point to figure out the theory behind pitch detection (which I know is possible, as I toyed with such software in the Windows 2000/XP days).

Obviously the proper wav file won't be so clean (pops, crackles, low-level background noise, inconsistent pitch, silences etc).

I'm hoping to somehow 'average out' a specified section so it can give me the closest approximation to the pitch it thinks I'm attempting to 'sing'. Obviously I don't anticipate very noisy recordings or poor signal-to-noise ratios, but it won't be a clear-cut sine wave either.

Based on what I've learnt, I'm assuming the variables in the equation would be distance between the mid-range of two points and speed (closer the points, higher the speed, higher the pitch, further the points, slower the speed, lower the pitch).

Nexus · « **Reply #6 on:** November 01, 2013, 12:20:54 am »

Looking at the time domain is not a good idea for this kind of analysis. To investigate the distribution of different frequencies (i.e. pitch) across your signal, transform it into frequency domain -> Fourier Transform, as already mentioned by wintertime. You can use an existing FFT implementation from a C or C++ library.
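To make "frequency domain" concrete, here is a naive discrete Fourier transform that reports the strongest frequency in a short mono block. This is deliberately not an FFT: it is O(N^2) and only sensible for a few thousand samples, whereas a library FFT computes the same magnitudes far faster. The function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Magnitude of DFT bins k = 0 .. N/2 for a block of N samples.
// Bin k corresponds to the frequency k * sampleRate / N.
std::vector<double> magnitudeSpectrum(const std::vector<std::int16_t>& mono)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = mono.size();
    std::vector<double> magnitudes(n / 2 + 1, 0.0);
    for (std::size_t k = 0; k < magnitudes.size(); ++k)
    {
        double re = 0.0, im = 0.0;
        for (std::size_t t = 0; t < n; ++t)
        {
            const double angle = 2.0 * pi * static_cast<double>(k)
                               * static_cast<double>(t) / static_cast<double>(n);
            re += mono[t] * std::cos(angle);
            im -= mono[t] * std::sin(angle);
        }
        magnitudes[k] = std::sqrt(re * re + im * im);
    }
    return magnitudes;
}

// Frequency in Hz of the strongest bin, ignoring bin 0 (the DC offset).
// Returns 0.0 if the block is too short to contain any other bin.
double dominantFrequency(const std::vector<std::int16_t>& mono, unsigned sampleRate)
{
    assert(sampleRate > 0);
    const std::vector<double> mags = magnitudeSpectrum(mono);
    std::size_t best = 0;
    for (std::size_t k = 1; k < mags.size(); ++k)
        if (best == 0 || mags[k] > mags[best])
            best = k;
    if (best == 0)
        return 0.0;
    return static_cast<double>(best) * sampleRate / static_cast<double>(mono.size());
}
```

The answer is only as fine as the bin spacing, `sampleRate / N`. With 4096 samples at 44100 Hz that is about 10.8 Hz per bin, so a 440 Hz tone is reported as the nearest bin rather than exactly 440.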

Joshua Flynn · « **Reply #7 on:** November 21, 2013, 05:10:06 pm »

Quote from: Nexus on November 01, 2013, 12:20:54 am

Looking at the time domain is not a good idea for this kind of analysis. To investigate the distribution of different frequencies (i.e. pitch) across your signal, transform it into frequency domain -> Fourier Transform, as already mentioned by wintertime. You can use an existing FFT implementation from a C or C++ library.

I'm not smart enough to grasp Fourier Transform or other mathematical effects.

I find it easier if I devise my own solutions that I understand.

Nexus · « **Reply #8 on:** November 21, 2013, 05:48:54 pm »

It is of course your decision, whether you spend time reinventing a wheel that is not halfway as accurate as existing solutions, or whether you read some theory about signal processing. Since you are actually doing signal processing here, the second option doesn't look like a bad idea to me.

It's not that you have to understand all the mathematical concepts behind Fourier Transform, but you should know what it represents and how it can make your task easier (not to say, possible at all). I have no idea how you would want to solve the problem in time domain -- after all, tools like Fourier Transforms were researched to make frequency-related operations simpler, not more complicated. Given some high-level knowledge, applying an FFT library shouldn't be very difficult.

Joshua Flynn · « **Reply #9 on:** November 22, 2013, 01:43:39 pm »

Quote from: Nexus on November 21, 2013, 05:48:54 pm

It is of course your decision, whether you spend time reinventing a wheel that is not halfway as accurate as existing solutions, or whether you read some theory about signal processing. Since you are actually doing signal processing here, the second option doesn't look like a bad idea to me.

It's not that you have to understand all the mathematical concepts behind Fourier Transform, but you should know what it represents and how it can make your task easier (not to say, possible at all). I have no idea how you would want to solve the problem in time domain -- after all, tools like Fourier Transforms were researched to make frequency-related operations simpler, not more complicated. Given some high-level knowledge, applying an FFT library shouldn't be very difficult.

Have you seen the Fourier Transform Wikipedia page? Or any page on it, for that matter? I'm looking at mathematical symbols I can't grasp, and that, to me, is not simplification of a process. It doesn't make my task simple if I don't understand how it works. If your rebuttal is 'read up about it', then my rebuttal is 'it's not a simple solution then'. I shouldn't have to master a segment of mathematics and physics just to calculate what is a three-variable sum: time, speed, distance.

As far as I can tell, it's pretty straightforward to build a way to work it out: Isolate a segment of the sound wave, work out min/max, calculate mid-point using min/max, detect two min/max points (one after the initial peak), measure the difference in distance between the two min/max points, get speed (sample rate).

Then use distance and speed to work out frequency. Probably by comparing it to a baseline distance in 440 Hz (stored, not calculated) at 44100 sample rate by working out differences to arrive at an approximate frequency.
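The arithmetic behind 'distance and speed' can be written down directly, so a stored 440 Hz baseline should not be needed. Assuming $N$ is the measured distance between two matching points of the wave, in samples, and $f_s$ is the sample rate in samples per second:

```latex
f = \frac{f_s}{N}
\qquad\text{e.g.}\qquad
N = \frac{f_s}{f} = \frac{44100}{440} \approx 100.2 \ \text{samples per period at } 440\,\mathrm{Hz}
```

So two peaks measured about 100 samples apart at 44100 Hz point to a pitch of roughly 440 Hz, and peaks 200 samples apart to roughly 220 Hz.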

Or I could force myself to learn a huge segment of mathematics, physics, learn to read mathematical symbols I will never again use, figure out what a formula is actually saying in its obscure format, spend hours in a library revising other sub-topics ('wiki-walk') in order to grasp the main topic, spend hours searching for a compatible code library that won't install properly anyway, only to discover when I try to run the code it throws up numerous compile time errors because I didn't include another library before finally discovering it has an incompatible copy-left license that requires I make my entire code base public on demand allowing for an unethical company to subvert my entire work by stealing it then just disobeying the copy-left altogether and keeping it hidden (doesn't happen? What open source OS is Apple, Microsoft and Android based on and around?).

Even if said library is successfully installed, usually the calling function has a bizarre behavioural set under a certain set of conditions (such as always returning void and that a certain phase of the moon's cycle prevents it from working correctly) or just crashes 'just because'. And then maybe that library isn't cross-compatible, and hates 64 bit architecture, and AMD processors.

But apparently I should go out of my ridiculous way not to 'reinvent the wheel' (same justification used for trying to convince me to use std::string: std::string implemented two of my ideas in C++11, so ner-ni-ner: shrink to fit and always null-appended strings), even if said wheel is 10 tonne of stone and square.

Better question: why do I need to import an entire library for what should be, in effect, one function?

zsbzsb · « **Reply #10 on:** November 22, 2013, 01:53:10 pm »

Quote from: Joshua Flynn on November 22, 2013, 01:43:39 pm

As far as I can tell, it's pretty straightforward to build a way to work it out: Isolate a segment of the sound wave, work out min/max, calculate mid-point using min/max, detect two min/max points (one after the initial peak), measure the difference in distance between the two min/max points, get speed (sample rate).

Then use distance and speed to work out frequency. Probably by comparing it to a baseline distance in 440 Hz (stored, not calculated) at 44100 sample rate by working out differences to arrive at an approximate frequency.

That "should" work in theory as long as you are guaranteed to be working with sine waves exclusively. The problem is that if you are trying to analyze recorded data, it could be coming in as really any waveform.

Quote from: Joshua Flynn

Better question: why do I need to import an entire library for what should be, in effect, one function?

One function for determining the frequency of an audio sample? Well that wouldn't work too well, because even after you implement an FFT there can still be multiple pitches playing at the same time. So you would need to look at the FFT and determine which is your fundamental pitch. Think of just simple chords on the piano/keyboard, a major chord is generally the 1st, 3rd, and 5th notes. So if your program applied the FFT to a recording of a C major chord you need some way to distinguish between the C, E, and G notes.

Note the fundamental note/pitch is not always the lowest, so this is impossible to wrap into one function.
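A small sketch of that extra step, assuming you already have a vector of spectrum magnitudes from whatever transform you used. It only lists the candidate peaks. Deciding which one is the note you care about is still up to you. `findSpectralPeaks`, `binToFrequency` and the relative `threshold` parameter are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the indices of bins that are local maxima and reach at least
// `threshold` (0..1) times the largest magnitude in the spectrum.
std::vector<std::size_t> findSpectralPeaks(const std::vector<double>& magnitudes, double threshold)
{
    assert(threshold >= 0.0 && threshold <= 1.0);
    double largest = 0.0;
    for (double m : magnitudes)
        if (m > largest)
            largest = m;

    std::vector<std::size_t> peaks;
    for (std::size_t k = 1; k + 1 < magnitudes.size(); ++k)
    {
        const bool localMax = magnitudes[k] > magnitudes[k - 1]
                           && magnitudes[k] >= magnitudes[k + 1];
        if (localMax && magnitudes[k] >= threshold * largest)
            peaks.push_back(k);
    }
    return peaks;
}

// Convert a bin index into Hz for a transform computed over `transformSize` samples.
double binToFrequency(std::size_t bin, std::size_t transformSize, unsigned sampleRate)
{
    assert(transformSize > 0);
    return static_cast<double>(bin) * sampleRate / static_cast<double>(transformSize);
}
```

For a recorded C major chord you would expect at least three strong peaks (plus overtones of each note), which is why a single "give me the pitch" function has to make a judgment call about which peak to report.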

Nexus · « **Reply #11 on:** November 22, 2013, 10:09:37 pm »

Quote from: Joshua Flynn on November 22, 2013, 01:43:39 pm

Have you seen the Fourier Transform Wikipedia page? Or any page on it, for that matter?

The Wikipedia article begins as follows: "The Fourier transform [...] is a mathematical transformation employed to transform signals between time (or spatial) domain and frequency domain [...]. The new function is then known as the Fourier transform and/or the frequency spectrum of the function f." Frequency spectrum is the important term here, it tells you how the signal is represented over different frequencies (instead of different times). It's very simple to understand the concept of a frequency spectrum without deep mathematical backgrounds.

Quote from: Joshua Flynn on November 22, 2013, 01:43:39 pm

I'm looking at mathematical symbols I can't grasp, and that to me, is not simplification of a process. It doesn't make my task simple if I don't understand how it works.

You use the compiler, yet you don't know how it works internally. You use SFML, although you don't know the libraries behind it and the interaction with the hardware. This concept is abstraction: Using things as a black box without caring about the implementation details. It's the same with FFT, you won't need to compute the integral or sum yourself. But you have to understand the different signal representations, in order to know what input and output mean.

Quote from: Joshua Flynn on November 22, 2013, 01:43:39 pm

As far as I can tell, it's pretty straightforward to build a way to work it out: Isolate a segment of the sound wave, work out min/max, calculate mid-point using min/max, detect two min/max points (one after the initial peak), measure the difference in distance between the two min/max points, get speed (sample rate).

As soon as your signal contains the slightest bit of noise, min/max (and thus mid-point) become effectively useless. These metrics only take into account extreme values, which is not a very good representation of the signal.

Quote from: Joshua Flynn on November 22, 2013, 01:43:39 pm

[Libraries are worse than hell]

I won't discuss that with you, since success at using libraries depends a lot on personal experience and motivation. What I can say however is that FFT is very commonly used functionality (not only in scientific fields, but in virtually all media-related applications, especially music players), so you can be optimistic that there are well-tested and optimized implementations out there. But you seem to have made your choice already without even looking at a single library, that's a pity.

Quote from: Joshua Flynn on November 22, 2013, 01:43:39 pm

same justification used for trying to convince me to use std::string

No idea what you mean, but the reason "it's heavy" on its own -- without a situation where this actually affects performance -- doesn't justify a massive additional effort. At least not at a point where you want to be productive and don't have the time for experiments.

Joshua Flynn · « **Reply #12 on:** November 23, 2013, 06:04:16 pm »

Quote from: Nexus on November 22, 2013, 10:09:37 pm

You use the compiler, yet you don't know how it works internally. You use SFML, although you don't know the libraries behind it and the interaction with the hardware. This concept is abstraction: Using things as a black box without caring about the implementation details. It's the same with FFT, you won't need to compute the integral or sum yourself. But you have to understand the different signal representations, in order to know what input and output mean.

Ah, to the contrary! In the end we're forced to learn about the things we're using one way or another. I'm aware of how compilers convert C's high-level language into assembly code, and how different architectures have different machine instructions and how a program built for one won't work on another, because ignorance in coding is surely not bliss (not that knowledge thereforth brings much happiness: just better preparation).

Big endian, little endian. Subjects I'd gladly prefer not to have to deal with, or to remain ignorant of, but they are fundamental to the inner workings of a machine. And likewise, if there's a problem with the FFT, how does one propose I diagnose it?

Copy-paste programming very rarely works. Lots of little cogs have to be internally shifted before things run together smoothly. If I feed in my noise, I have to know what to do with the output.

Frequency spectrum, straightforward to understand, yes!

But this mathematical function encoded into those funny symbols akin to another language: Chinese room problem. I don't understand them. I'd have to learn what each symbol means, even though I don't know their names or where to start. Ever tried to find the name of a song for which you don't know the lyrics or the band name? It's like that.

Quote from: Nexus on November 22, 2013, 10:09:37 pm

As soon as your signal contains the slightest bit of noise, min/max (and thus mid-point) become effectively useless. These metrics only take into account extreme values, which is not a very good representation of the signal.

Not true! 'Tis only supposed to be an average (you can't work out the 'pitch' of two competing noises, as it's a harmony and you're recording input wrong). Background noise should register lower than the peak (noted by the max, which denotes the most dominant noise source), and by approximating a fuzzy-logic area for the peak, you know when you've crossed the main crest between a given frequency before searching for the mid-point.

Maybe it's slightly inaccurate. But if I wanted accurate, I'd not be a cheapskate and go out and buy the real thing. Or maybe I would become a music maestro and learn pitch-perfect hearing. But I'm musically and therefore pitch oblivious. I just want a tool that takes my voice, extrapolates what pitch it thinks I'm singing so then I can find the right note and manually script the music to each note, maybe in series, to form a song.

Quote from: Nexus on November 22, 2013, 10:09:37 pm

I won't discuss that with you, since success at using libraries depends a lot on personal experience and motivation. What I can say however is that FFT is very commonly used functionality (not only in scientific fields, but in virtually all media-related applications, especially music players), so you can be optimistic that there are well-tested and optimized implementations out there. But you seem to have made your choice already without even looking at a single library, that's a pity.

Well, you'd think graphics, which is used in every piece of software, would have a straightforward library for it. But alas, here we are at SFML, which is in effect an indie development project, because other libraries fail to meet the task: licensing (the main reason I use SFML is Laurent's permission for commercialisation, although if I ever get lucky he can expect recompense for his efforts), implementation, or simply obtuse coding language.

But it doesn't. DirectX and its copyright infringement rage-quit. OpenGL and the license-stranglehold minefield some libraries use or bad implementations.

FFT is no different. I'm lucky in that I was shown SFML and, later, Code::Blocks (the latter of which includes things like posix threads and cross-platform directory headers installed along with the IDE, both extremely useful).

Licensing and installing libraries installs dread and fear in a way it shouldn't (I've wondered for years why people accept the status quo of difficulty adding libraries and haven't written programs that 'auto-configure' IDEs, compilers and the like to get them to work). Such that losing the entire IDE setup is the main fear and not so much the codebase itself.

I was hoping to make a Code::Blocks only OS of Linux on a LiveCD that could be duplicated (with stuff like SFML etc added) so that if the IDE was ever lost, I could just run another copy of it. That way I could boot the IDE, and thus all libraries, from any machine, anywhere, ever.

(Custom LiveCDs however require tons of space just to modify, and a lot of time. Sigh.)

Why couldn't adding libraries be as simple as 'sudo apt-get SFML' or something for an IDE?
