Author Topic: SoundStream Latency

Zcool31

« on: January 09, 2011, 05:09:31 pm »
I'm trying to write an audio synthesizer that uses SoundStream to continuously generate and output sound. I've tested most of my code fairly thoroughly, and I know I can generate audio more than fast enough to keep providing the SoundStream with new data whenever it needs it.

The issue I'm encountering is that with small buffers (I fill each chunk with only 1000 samples) there is significant stuttering. With larger buffers (5000 - 10000 samples) the stuttering goes away, but then there is a very large latency between when I generate the audio and when it is output.

Knowing that internally SoundStream copies the data right after OnGetData returns, and that it keeps three buffers so it can queue two for playback while OnGetData fills the third, I figured that making each buffer small would speed up the copying and decrease the latency.
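
To put rough numbers on it (assuming the default 44100 Hz, mono): three queued chunks of 1000 samples correspond to a worst-case latency of about 3 × 1000 / 44100 ≈ 68 ms, while three chunks of 10000 samples push it to roughly 680 ms.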

TL;DR:
Is there another way to output sound in real time with lower latency? Am I doing something wrong?

Here's some sample code:


Code:

#include <SFML/Audio.hpp>
#include <climits>  // SHRT_MAX
#include <cstddef>  // size_t

class Generator{
public:
    void SetFrequency(double freq);
    double GetSample();
    /* Outputs sine wave samples at any frequency using a
     * pre-computed lookup table of a very high sample rate
     * sine wave. Also takes care of incrementing the position
     * and wrapping around. */
};

class MySynth : public sf::SoundStream{
public:
    MySynth(){
        myBufferSize = 4096; // smallest buffer without stuttering
        myBuffer = new short[myBufferSize];
    }
    ~MySynth(){
        delete[] myBuffer;
    }
private:
    virtual bool OnGetData(Chunk& chunk){
        for(size_t i = 0; i < myBufferSize; ++i){
            myBuffer[i] = static_cast<short>(SHRT_MAX * myGen.GetSample());
        }
        chunk.NbSamples = myBufferSize;
        chunk.Samples   = myBuffer;
        return true;
    }
    short*    myBuffer;
    size_t    myBufferSize;
    Generator myGen;
};

Zcool31

« Reply #1 on: January 12, 2011, 04:52:10 pm »
It's been three days, and no answer. This is the one issue that's holding up my project.

Any help would be greatly appreciated.

Laurent

« Reply #2 on: January 12, 2011, 07:19:29 pm »
Sorry, I really have no time (I'm working on my new apartment 10 hours a day), but I'll try to test your code as soon as possible.
Laurent Gomila - SFML developer

Zcool31

« Reply #3 on: January 12, 2011, 10:30:51 pm »
Sorry, Laurent, didn't mean to use up your free time. Would just appreciate a little help.

According to this article, the best way to implement a low-latency synth is to have a small ring buffer where you keep track of the current playing position and place new data just before it (replacing the samples that have just been played).

The most obvious solution to this problem is to have a looping sound with one soundbuffer, and to overwrite parts of this soundbuffer right after the sound plays them.

Unfortunately, SFML doesn't support this (I'm pretty sure raw OpenAL won't either).
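
Just to illustrate what I mean, here is a rough sketch of that idea written against a made-up API (UpdateSamples and the playPos query don't exist in SFML or plain OpenAL):

Code:

#include <vector>
#include <climits>
#include <cstddef>

// Conceptual sketch only: assumes a hypothetical API that lets you overwrite
// part of a sound buffer while it is being played.
class RingBufferSynth{
public:
    RingBufferSynth(size_t size) : buffer(size, 0), writePos(0) {}

    // playPos = "where is playback right now?" (also hypothetical);
    // Generator is the sine-table class described below
    void FillUpTo(size_t playPos, Generator& gen){
        // Overwrite the samples that have just been played, stopping
        // just short of the playback position
        while((writePos + 1) % buffer.size() != playPos){
            buffer[writePos] = static_cast<short>(SHRT_MAX * gen.GetSample());
            writePos = (writePos + 1) % buffer.size();
        }
        // sound.UpdateSamples(&buffer[0], buffer.size()); // made-up call
    }
private:
    std::vector<short> buffer;
    size_t writePos;
};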

My next option was SoundStream. However, this has the limitation that each chunk in OnGetData can't be too small. I can come up with a way to deal with this problem while retaining temporal resolution, but I can't eliminate the latency.

Here's a simple implementation of my Generator:




Code:

#include <math.h>
#include <stddef.h> // size_t

class Generator{
public:
    Generator() : phaseAccumulator(0), frequency(440){
        lookupTableSize = 8820;
        lookupTable = new double[lookupTableSize];
        for(size_t i = 0; i < lookupTableSize; ++i){
            lookupTable[i] = sin((2*M_PI*i)/lookupTableSize);
        }
    }
    ~Generator(){
        delete [] lookupTable;
    }
    void SetFrequency(double freq){
        frequency = freq;
    }
    double GetSample(){
        double cycles; // whole cycles, discarded
        // assume output is at 44100 Hz
        phaseAccumulator = modf(phaseAccumulator + frequency/44100, &cycles);
        double position = phaseAccumulator*lookupTableSize;
        // interpolate between the two nearest values in the lookup table
        size_t index = (size_t)floor(position);
        position -= index;
        return (1.0-position)*lookupTable[index] + position*lookupTable[(index+1)%lookupTableSize];
    }
private:
    double* lookupTable;
    size_t  lookupTableSize;
    double  phaseAccumulator;
    double  frequency;
};


It's a bit slower than just calling sin(t), but it has the benefit that you don't lose accuracy as t becomes large, and you can easily store any waveform you want in the lookup table. However, I'm sure you'll agree that it's easily fast enough to generate 44100 samples per second.

You can use MySynth in my original post as is with no changes. If there are any compilation errors, these are probably spelling mistakes on my part and should be obvious.

Any help is greatly appreciated.

Laurent

« Reply #4 on: January 12, 2011, 11:08:15 pm »
Can I have your actual code please? The one you gave is full of errors.
Laurent Gomila - SFML developer

Laurent

« Reply #5 on: January 12, 2011, 11:14:37 pm »
OK, never mind: the compiler errors were easy to fix, and the runtime one was fixed by calling Initialize().
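
For reference, that roughly means adding something like this to MySynth's constructor (mono at 44100 Hz is assumed here, to match the generator):

Code:

MySynth(){
    myBufferSize = 4096;
    myBuffer = new short[myBufferSize];
    Initialize(1, 44100); // channel count and sample rate; without this the stream has no format to play
}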

The reason SFML is that bad with small buffers is simple (and stupid): in the streaming loop, there's a 100ms sleep to leave some CPU for other threads. It has been reduced to 10ms in SFML 2, which makes it possible to use buffers of 1500 samples (at least on my computer) with perfect output.

You can change it in SFML 1.6 and recompile, if you don't want to switch to SFML 2: it's at line 262 of src/SFML/Audio/SoundStream.cpp.
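
If you're curious, the loop is structured roughly like this (paraphrased and simplified, not the exact source; source, format and sampleRate stand for the stream's internals):

Code:

while (isStreaming)
{
    // Ask OpenAL how many of the queued buffers have finished playing
    ALint processed = 0;
    alGetSourcei(source, AL_BUFFERS_PROCESSED, &processed);

    while (processed--)
    {
        // Recycle each finished buffer: unqueue it, refill it with the
        // samples returned by OnGetData, and queue it again
        ALuint buffer;
        alSourceUnqueueBuffers(source, 1, &buffer);

        Chunk chunk;
        if (OnGetData(chunk))
        {
            alBufferData(buffer, format, chunk.Samples,
                         chunk.NbSamples * sizeof(short), sampleRate);
            alSourceQueueBuffers(source, 1, &buffer);
        }
    }

    // The sleep in question: leave some CPU to the other threads
    sf::Sleep(0.1f); // 100 ms in 1.6, 10 ms in SFML 2
}
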
Laurent Gomila - SFML developer

Zcool31

« Reply #6 on: January 12, 2011, 11:58:48 pm »
Laurent, thank you so much! You're brilliant. I saw that line when I originally looked through your code, but wasn't quite sure what it was about.

If I read your code correctly, you simply have a loop that runs every 100ms and checks if any of the buffers have been used up, and refills the used buffers with new data and re-queues them. You need to put a delay in this loop so it doesn't consume the entire CPU.

Assuming the above is correct, I would like to ask you this question: Instead of constantly looping and checking, could you not instead set up some sort of interrupt or callback that happens when OpenAL finishes with one of these buffers? If I understand correctly, something like this might be platform dependent or impossible.

Alternatively, could you not create a thread with low priority and remove the delay? Again, this might be platform dependent or impossible, even though I'm sure almost every platform has a scheduler that respects priorities.

Finally, and this is the most novel solution I could come up with: could you not choose a delay for the loop based on the size of the chunks returned by OnGetData? A good delay, in my opinion, is half the duration of the most recently filled buffer (the most recent chunk); a small sketch follows below. I believe this is close to optimal, because it has several benefits:
The delay in the loop grows with the size of the chunks provided by the user. You don't need to check for processed buffers ten times per second if you know each buffer is one second long; conversely, if chunks are small, you will check frequently enough to get new data in time.
CPU usage then becomes inversely proportional to the length of the chunks. Long chunks mean long delays but low CPU usage; short chunks mean short delays and higher CPU usage. Either way, only as much CPU is used as is necessary.
Naturally, this approach might fail if the chunk size varies wildly, or if it takes longer to fill a buffer than it does to play it. However, in such cases little can be done.
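
In code, the idea amounts to something like this (just a sketch; AdaptiveSleep is a hypothetical helper, not anything in SFML):

Code:

#include <SFML/System.hpp>
#include <cstddef>

// Hypothetical helper: sleep for half the duration of the chunk that was
// just filled, so the polling rate follows the chunk size automatically.
void AdaptiveSleep(std::size_t chunkSamples, unsigned int sampleRate)
{
    float chunkDuration = static_cast<float>(chunkSamples) / sampleRate; // seconds
    sf::Sleep(chunkDuration / 2.f); // e.g. 4096 samples at 44100 Hz -> ~46 ms
}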

I would be grateful if you could evaluate these ideas and tell me if they are good or bad, and why they would or wouldn't work.

I ask you these questions because you must have amassed a significant amount of knowledge in creating SFML and would know how to correctly answer them.

Zcool31

« Reply #7 on: January 13, 2011, 02:19:25 am »
Since posing my last question, I've gone out and done some research on my own. It turns out that OpenAL is not very well suited to outputting audio that's generated on the fly.

However, I have come across an open-source, cross-platform library for audio output called PortAudio. It is a lower-level library that does not support features like 3D positional audio with multiple sources, but it does provide a simple and direct means of sending raw data to your sound card. It uses a callback paradigm: a function you supply is called every time the library needs more data.

For me, the greatest benefit of this library is its relatively fixed latency and insensitivity to buffer sizes. It can work with buffers ranging from 32 to 7000 samples and has a fixed latency that is dependent only on the audio interface you are using.

The only downside of this library is that it does not provide pre-compiled binaries for the supported platforms, so I had to compile it from source. However, this is fairly straightforward if you're using GCC-based tools.
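
To show what the callback paradigm looks like, here is a minimal output-only sketch (error handling omitted; the 64-frame buffer is just an example, and Generator is the class from my earlier post, assumed to be visible here):

Code:

#include <portaudio.h>
#include <climits>

// PortAudio calls this whenever it needs frameCount more samples
static int synthCallback(const void* /*input*/, void* output,
                         unsigned long frameCount,
                         const PaStreamCallbackTimeInfo* /*timeInfo*/,
                         PaStreamCallbackFlags /*statusFlags*/,
                         void* userData)
{
    Generator* gen = static_cast<Generator*>(userData);
    short* out = static_cast<short*>(output);
    for(unsigned long i = 0; i < frameCount; ++i)
        out[i] = static_cast<short>(SHRT_MAX * gen->GetSample());
    return paContinue;
}

int main()
{
    Generator gen;
    Pa_Initialize();

    PaStream* stream = 0;
    // mono, 16-bit output at 44100 Hz, 64 frames per callback
    Pa_OpenDefaultStream(&stream, 0, 1, paInt16, 44100, 64, synthCallback, &gen);
    Pa_StartStream(stream);
    Pa_Sleep(5000);        // play the default 440 Hz tone for 5 seconds
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();
}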

With all that being said, I would still like to hear the answer to my previous question.

Other than that, feel free to mark this topic as SOLVED.

Thank you very much for your help!

Laurent

« Reply #8 on: January 13, 2011, 07:40:48 am »
Quote
If I read your code correctly, you simply have a loop that runs every 100ms and checks if any of the buffers have been used up, and refills the used buffers with new data and re-queues them. You need to put a delay in this loop so it doesn't consume the entire CPU.

This is correct.

Quote
Assuming the above is correct, I would like to ask you this question: Instead of constantly looping and checking, could you not instead set up some sort of interrupt or callback that happens when OpenAL finishes with one of these buffers? If I understand correctly, something like this might be platform dependent or impossible.

This is indeed impossible; it's the first thing I wanted to do ;)

Quote
Alternatively, could you not create a thread with low priority and remove the delay? Again, this might be platform dependent or impossible, even though I'm sure almost every platform has a scheduler that respects priorities.

I'm not sure that a low-priority thread running at 100% is really lightweight. And what happens if you have other threads (graphics, input, AI, ...) running at 100% in parallel? Will the stream's thread get enough time from the scheduler? This seems too random and unreliable.

Quote
Finally, and this is the most novel solution I could come up with, could you not choose a delay for the loop that is affected by the size of the chunks from OnGetData? A good delay, in my opinion, is half the length of the most recently filled buffer (the most recent chunk).

I thought about such a solution yesterday, but it doesn't work. The scheduler is not as accurate as we might want (a time slice may last no less than 20 ms), so what could happen with this approach is that a loop iteration runs slightly before the last buffer is finished, so it doesn't fill a new one, and the next iteration doesn't happen for a long time (500 ms if we have 1 s buffers), which creates a huge gap in the sound output.

Quote
It turns out that OpenAL is not very well suited to outputting audio that's generated on the fly.

OpenAL is very low-level; in fact, the entire streaming algorithm is written in SFML. So I think we can always improve it until it's close to perfect.

Quote
However, I have come across an open-source, cross-platform library for audio output called PortAudio. It is a lower-level library that does not support features like 3D positional audio with multiple sources, but it does provide a simple and direct means of sending raw data to your sound card. It uses a callback paradigm: a function you supply is called every time the library needs more data.

I know this library (I think I know every single low-level audio library :lol:), and unfortunately it lacks standard features like spatialization, so I can't replace OpenAL with it. But what I could do is have a look at their streaming code ;)
Laurent Gomila - SFML developer

l0calh05t

« Reply #9 on: January 13, 2011, 09:53:13 am »
Quote from: "Laurent"
OpenAL is very low-level; in fact, the entire streaming algorithm is written in SFML. So I think we can always improve it until it's close to perfect.


Although this is true, OpenAL, unlike PortAudio, is NOT designed for low latency. PortAudio allows you to use ASIO drivers, which allow for far lower latencies (down to 32 samples with my RME sound card, for example).

Zcool31

« Reply #10 on: January 13, 2011, 05:58:02 pm »
Quote from: "Laurent"
I thought about such a solution yesterday, but it doesn't work. The scheduler is not as accurate as we might want (a time slice may last no less than 20 ms), so what could happen with this approach is that a loop iteration runs slightly before the last buffer is finished, so it doesn't fill a new one, and the next iteration doesn't happen for a long time (500 ms if we have 1 s buffers), which creates a huge gap in the sound output.


I considered that as well. However, is it not true that you have three buffers queued to play, and you replace one as soon as it's been processed by OpenAL? In that case, if we have 3 one-second buffers, and we sleep past when one buffer finishes, wouldn't we still be not quite halfway done with the second buffer, and have a third untouched buffer waiting to be played?
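
(Concretely: with three 1 s buffers, if the loop wakes 500 ms after the first buffer ends, the second buffer is only about half played and the third is untouched, so roughly 1.5 s of queued audio remains, which leaves plenty of time to refill the first buffer.)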

Quote from: "Laurent"
I know this library (I think I know every single low-level audio library :lol:), and unfortunately it lacks standard features like spatialization, so I can't replace OpenAL with it. But what I could do is have a look at their streaming code ;)


I would never ask you to replace OpenAL in SFML, nor would I want something like that to ever happen. OpenAL is the superior library in terms of ease of use and features.

However, comparing OpenAL to PortAudio is like comparing OpenGL to writing directly to the framebuffer. OpenAL, like OpenGL, is faster most of the time, provides lots of advanced features, and is easier to use in most cases. However, there are times when you simply want to write the pixel values yourself, just like I simply want to write the samples myself.

In any case, Laurent, I really appreciate all of your replies. You've been very helpful.

Laurent

« Reply #11 on: January 13, 2011, 10:22:48 pm »
Quote
Although this is true, OpenAL, unlike PortAudio, is NOT designed for low latency. PortAudio allows you to use ASIO drivers, which allow for far lower latencies (down to 32 samples with my RME sound card, for example).

Ok I see. But you're talking about latency in general, right? We're not in the specific context of streaming with small buffers?

Quote
I considered that as well. However, is it not true that you have three buffers queued to play, and you replace one as soon as it's been processed by OpenAL? In that case, if we have 3 one-second buffers, and we sleep past when one buffer finishes, wouldn't we still be not quite halfway done with the second buffer, and have a third untouched buffer waiting to be played?

Ah, this is true. In fact I guess I would have to play with this idea to see what happens :)
Laurent Gomila - SFML developer

l0calh05t

« Reply #12 on: January 13, 2011, 10:36:28 pm »
Quote from: "Laurent"
Quote
Although this is true, OpenAL, unlike PortAudio, is NOT designed for low latency. PortAudio allows you to use ASIO drivers, which allow for far lower latencies (down to 32 samples with my RME sound card, for example).

Ok I see. But you're talking about latency in general, right? We're not in the specific context of streaming with small buffers?


Buffer size in fact determines the (minimum) latency. Say you are using three 2048-sample buffers in OpenAL; that alone creates 128 ms of latency (at a 48 kHz sampling rate), in addition to any internal buffers used by the sound drivers. PortAudio (when using ASIO) writes pretty much directly to the sound card's output buffer, and those 32 samples result in 2/3 of a millisecond of latency (again at 48 kHz).
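
In other words, the queued audio alone sets a floor on latency: latency ≥ (number of buffers × samples per buffer) / sample rate, which for 3 × 2048 samples at 48 kHz gives the 128 ms above.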

For synths and other real-time audio programs this is pretty much a necessity; for games, not so much, although it really depends on the type of game: a music-based game might very well benefit from <10 ms latency.

Laurent

« Reply #13 on: January 13, 2011, 10:41:44 pm »
Quote
Say you are using three 2048-sample buffers in OpenAL; that alone creates 128 ms of latency (at a 48 kHz sampling rate)

Why? I understand that OpenAL has more latency than PortAudio/ASIO because it doesn't write directly to the sound card, but why do you translate these 2048 samples directly into latency? 128 ms is the duration of these 3 × 2048 samples; I don't see how you end up with latency.
Laurent Gomila - SFML developer

l0calh05t

« Reply #14 on: January 13, 2011, 10:47:32 pm »
Quote from: "Laurent"

Why? I understand that OpenAL has more latency than PortAudio/ASIO because it doesn't write directly to the sound card, but why do you translate these 2048 samples directly into latency? 128 ms is the duration of these 3 × 2048 samples; I don't see how you end up with latency.


To start playing, a buffer first needs to be filled; that's the whole reason, and also why latency can never be lower than the sound card's minimum buffer size. So a 2048-sample buffer directly results in about 42.7 ms of latency. Using multiple buffers is basically the same as one big buffer as far as latency is concerned: before the third buffer starts playing, the first needs to be filled (42.7 ms), then it plays while the second is filled (42.7 ms), and then the second needs to play before the third starts (42.7 ms), for a total of about 128 ms.

It's just the same thing with graphics: when using triple buffering, you get one extra frame of latency compared to double buffering (~17 ms at 60 Hz).