
Author Topic: Sanity Check: Sound documentation & request for doc example of spatial audio


SuperApe

The "help" I'm looking for is really a request to double-check I used SFML as it was intended to be used for my case, and then the request would be to make this clearer in the documentation in one or two places, if so.

I post this having just gotten everything to work as expected / intended, but the route to get here was painful, so I'm hoping to see changes to the documentation that help others in a similar situation.

The subject is using the static Listener and setting properties on Sound object instances to achieve spatial audio in a 2D environment, where the Listener is to match the View position, and sounds would therefore really only register as "to the left" or "to the right" in relation to the center of the view, with proper attenuation.

All audio used is mono, so spatialization is possible. Some of the audio in the project is not meant to be spatial (music, UI SFX), so different configurations were needed.

References used in this effort include the SoundSource doc https://www.sfml-dev.org/documentation/2.5.1/classsf_1_1SoundSource.php and static Listener doc https://www.sfml-dev.org/documentation/2.5.1/classsf_1_1Listener.php.

Additionally, the Spatialization tutorial https://www.sfml-dev.org/tutorials/2.5/audio-spatialization.php and this forum thread https://en.sfml-dev.org/forums/index.php?topic=14566.0 helped highlight where I was having trouble understanding the API's intent, so ultimately it helps to reference all of these.

With all that said, and understanding that we're all careful, that we (largely) read documentation before making assumptions, and that we all make mistakes worth improving on from time to time, I humbly suggest we revisit how this kind of effort is documented. There are a couple of things to point out in how the documentation presents the available tools.

The Spatialization tutorial says this:
Quote
(in General) By default, sounds and music are played at full volume in each speaker; they are not spatialized. ... (for Audio Sources) The main property is the position of the audio source. ... This position is absolute by default, but it can be relative to the listener if needed. ... This can be useful for sounds emitted by the listener itself (like a gun shot, or foot steps). It also has the interesting side-effect of disabling spatialization if you set the position of the audio source to (0, 0, 0). Non-spatialized sounds can be required in various situations: GUI sounds (clicks), ambient music, etc.
The concern here is that the sound results are described in two states, absolute and relative (to listener); however, reading carefully, you hear "default = absolute = full volume in both speakers" and then "relative = emitted by the listener itself = examples: footsteps, GUI sounds, ambient music". These actually describe the same result: full volume in both speakers. The documentation could be carefully cleaned up to avoid confusion by clearly defining the two states of a sound source: absolute and relative.

I suggest that without clearing that up, "absolute" by default apparently means the sound seems emitted at the listener's location. But that conflicts with what the SoundSource class documentation has to say:
Quote
void sf::SoundSource::setRelativeToListener(bool relative)   
Make the sound's position relative to the listener or absolute.

Making a sound relative to the listener will ensure that it will always be played the same way regardless of the position of the listener. This can be useful for non-spatialized sounds, sounds that are produced by the listener, or sounds attached to it. The default value is false (position is absolute).

Parameters
relative   True to set the position relative, false to set it absolute
Here is where the solution to my particular setup fell into place, but only after realizing that the word "relative", as in "this sound source's position relative to the listener's position", is actually used with an inverse meaning. Based on the explanation, what is really meant is that the SoundSource's position will seem to follow the Listener's position, as if it were at the same position as the Listener: no difference, no delta.

"Relative", I'm suggesting, actually infers a difference between two, with a delta that in this case would make sense for spatialization. In other words, if a sound is relative to the listener, the suggested meaning is that this enables spatialization between sound and listener. But in practice, the opposite is true.

So I'm suggesting that a) the word "relative" is used in a misleading (if not opposite) way, and b) the behavior described for relative = true matches what the Spatialization tutorial says is already the default (absolute) behavior; in other words, the documentation again conflicts with itself.

That was my first point: the use of the word "relative" in the method and property names conflicts with the actual results. My suggestion is that the documentation call this out more clearly, to help others in the future. (I'm not suggesting you change the API to fit the language; I know that's too much to ask at this point.)
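To make the contrast concrete, here is a minimal sketch of the two configurations as I now understand them (the buffer and file name are just placeholders):
    sf::SoundBuffer buffer;
    buffer.loadFromFile("step.wav"); // placeholder mono asset

    // Spatialized: position is absolute in the world; the listener's position matters.
    sf::Sound worldSound(buffer);
    worldSound.setRelativeToListener(false); // the default
    worldSound.setPosition(200.f, 300.f, 0.f);

    // Non-spatialized: position follows the listener; always full volume in both speakers.
    sf::Sound uiSound(buffer);
    uiSound.setRelativeToListener(true);
    uiSound.setPosition(0.f, 0.f, 0.f); // zero offset from the listener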

The second point has to do with the practical use of audio spatialization in 2D. Without a clear example, development was more painful than necessary; a short example in the documentation would be excellent. So this calls for another sanity check: did I do this as the API expects? It seemed as intuitive as I could get it.

Again, there is a brief mention in the Spatialization tutorial:
Quote
If you have a 2D world you can just use the same Y value everywhere, usually 0.
So, to be honest, that threw me off for a while. Converting my moving view center position to (x, 0, y) for the 3D vector needed by the listener was not intuitive and did not yield the expected results. I tried setting SoundSource locations to match this space in many different attempts to follow this guidance. None of it turned out to be necessary in the end.

In fact, here is what worked:
    // Default listener orientation: facing negative Z, up vector positive Y
    sf::Listener::setDirection(0.f, 0.f, -1.f);
    sf::Listener::setUpVector(0.f, 1.f, 0.f);
    while (gameLoop) {
        // Keep the listener glued to the center of the view
        sf::Listener::setPosition(myView.getCenter().x, myView.getCenter().y, 0.f);
    }
This configuration is what worked for a simple 2D-view game, where the view travels with my local player in the 2D x,y space, and all sound-emitting objects likewise live in that same space. So, when a sound object was set up for spatialization, it was configured as:
    mySound.setRelativeToListener(false);     // absolute: the sound lives in the world
    mySound.setMinDistance(50.f);             // full volume within 50 units
    mySound.setAttenuation(0.1f);             // gentle falloff beyond that
    mySound.setPosition(sPos.x, sPos.y, 0.f); // same x,y as the object, z = 0
(Individual values for minDistance and attenuation will vary.) But notice two things here: first, the space simply uses z = 0, with x and y unchanged; second, to make the sound spatialized, it is set to absolute, not relative (the opposite of what "relative to listener" suggests). Following this new understanding, the non-attenuated / non-spatialized sounds needed for this project are defined like this:
    mySound.setPosition(0.f, 0.f, 0.f);  // zero offset from the listener
    mySound.setRelativeToListener(true); // pinned to the listener
    mySound.setMinDistance(1.f);
    mySound.setAttenuation(1.f);
(While the position, minDistance, and attenuation are irrelevant if the sound is not spatialized, I set them this way for tidiness.)

So, my second point is: a clear example like this, with a one-to-one relationship to the 2D visual space a simple project works in, would be very helpful to have in the documentation, and it would have saved a lot of time. Clearing up the "y = 0" idea would help as well, as that was not the intuitive way to go.

The results with the above example are as one would expect of a simple 2D audio space: sounds from the left are heard on the left, and so on. Note that the Listener's direction points toward negative Z, the up vector is positive Y, and all positions use the 2D space with a Z of zero.

With all this, I'm hoping the two points are well taken in the spirit of making the documentation better for the next person.
  • All documentation surrounding relativity to the listener ("isRelativeToListener", "setRelativeToListener", etc.) needs clarification to ensure the reader understands the two states of a sound: absolute means it lives in a spatialized sound space where the listener's position matters (even though, with everything left at the default origin, doing nothing to the listener means it plays at full volume in both speakers), while relative actually means the sound will emit from the listener regardless of the listener's position or the sound source's position, and will always play at full volume in both speakers.
  • A clear example of a simple 2D spatialized audio setup would be extremely helpful: translate sound positions into the 3D sound space as (x, y, 0), position the listener likewise as (x, y, 0), set the listener's direction toward negative Z, and set the listener's up vector to positive Y.
I do hope that helps. And of course, if I've misunderstood any or all of this, I think it would also be helpful to be corrected.

(love SFML btw, thanks!)

Hapax

With regard to the absolute/relative usage, the documentation seems to use it rather clearly.
I would like to clarify that "relative to listener" does not necessarily mean that a sound cannot be spatialized. If its position is (0, 0, 0) when relative to the listener, the sound's absolute position is identical to the listener's. However, if the sound's position is (10, 0, 0) when relative to the listener, it will be "next to" the listener: 10 units away.
Such sounds aren't technically "emitted from the listener"; their positions are just linked to the listener. That is why "relative" is used.
Remember that, even when relative to the listener, the sound's position alters where the sound comes from.
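A quick sketch of what I mean (assuming 'buffer' is a loaded mono sf::SoundBuffer):
    sf::Sound sound(buffer);
    sound.setRelativeToListener(true);
    sound.setPosition(10.f, 0.f, 0.f); // stays 10 units to the listener's right, wherever the listener moves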



For slight clarity, in case someone reading this finds that either of us has confused the issue slightly, I shall use an analogy.
First, it helps to think in 3D space for it to make sense.

Imagine a car. It's running but it isn't moving; it's making a sound but its position is not changing. This position is its absolute position.
Then there's you, the listener. You can hear this sound. However, you want to hear it better so you get closer. The car's absolute position has not changed but as you get closer, it gets louder.

Then, imagine you are carrying a phone. Your phone is playing music. You hold your phone out in front of you. The difference between the phone's position and yours is its relative position. When you walk (away from the car?), including turning, the phone stays in the same relative position to you, the listener, so it sounds - to you - the same.
If you then put your phone to your right ear, its relative position has changed; it's now just to the right of you. However, it's still relative to you because it stays there as you continue to walk.

If you then lie down on your left side with your phone still at your right ear, its relative position is still the same but the listener has changed orientation. The phone is still on your right but it's technically above you.

If you decide to put down the phone, its new absolute position is simply your (the listener's) position + its relative position.
If it is at that absolute position and not relative to the listener, it will sound the same until the listener moves away.
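In code, that "putting down" step is simply this (phoneSound being a hypothetical sf::Sound):
    // New absolute position = listener's position + the sound's relative position
    sf::Vector3f absolute = sf::Listener::getPosition() + sf::Vector3f(10.f, 0.f, 0.f);
    phoneSound.setRelativeToListener(false);
    phoneSound.setPosition(absolute); // e.g. (100, 50, 0) + (10, 0, 0) = (110, 50, 0)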



I'll check the 2D stuff to see if it does what I expect, so we can see whether we have discrepancies in our implementations or whether I agree there's an error in the documentation. :)

EDIT: After an initial check, using x and z positions for sound and listener (while keeping y at 0) seems to work "okay" for 2D positioning.
It seems that z acts as actual depth though since it ducks in volume when "behind". So, with that said, the listener's orientation should be modified if x and z are to be used.
With the default orientation of the listener, the listener and sources should use x and y and keep z at zero. So yes, I agree that the documentation is either incorrect or unclear for this point.

It's also worth noting that with the listener on the same plane as the sound sources, a sound source at the listener's position will be slightly louder regardless of any attenuation settings. The solution to this is to raise the listener away from the 2D plane where the sound sources originate (as with your head and the display). This (the listener's z position) should be a positive value; otherwise all sounds would be behind the listener and therefore quieter when near it.
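As a sketch of that adjustment (the height value is arbitrary, reusing the minDistance of 50 from the example above):
    const float listenerHeight = 100.f; // distance "back" from the 2D plane; must be positive
    sf::Listener::setPosition(myView.getCenter().x, myView.getCenter().y, listenerHeight);
    // Grow the minimum distance by the same amount so a sound directly
    // "beneath" the listener still plays at full volume:
    mySound.setMinDistance(50.f + listenerHeight);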

Laurent

Hapax is right about the relative mode: sound sources can still have a position, but in this case their origin is the listener rather than the origin of the "3D world". If that's not the case, then it's a huge bug.

I think there's a major misunderstanding about the 3D world coordinates here. Using Y as the up vector, and then XY for your sound sources, will produce sounds that can be on the left/right/top/bottom of the listener. Therefore it is suitable for a side-scrolling game.
The tutorial/documentation rather describes a game viewed from the top, with sounds being on the left/right/front/back of the listener, and in this case, the up vector should really be perpendicular to the plane where sound sources are moving.
The documentation should clearly describe these two examples (with code), as both are different and equally likely to occur in real life.
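Something like these two sketches, perhaps (x, y, z and sx, sy, sz being hypothetical world coordinates of the listener and a sound source):
    // Side-scrolling game: sources move in XY; the default orientation already fits.
    sf::Listener::setDirection(0.f, 0.f, -1.f); // default
    sf::Listener::setUpVector(0.f, 1.f, 0.f);   // default
    sf::Listener::setPosition(x, y, 0.f);
    sound.setPosition(sx, sy, 0.f);

    // Top-down game: sources move in XZ; the up vector is perpendicular to that plane.
    sf::Listener::setDirection(0.f, 0.f, -1.f); // facing "forward" within the plane
    sf::Listener::setUpVector(0.f, 1.f, 0.f);   // perpendicular to the XZ plane
    sf::Listener::setPosition(x, 0.f, z);
    sound.setPosition(sx, 0.f, sz);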

Hapax

I think the confusion with the 2D plane in the tutorial is that it suggests using the XZ plane but doesn't mention that the listener should be oriented differently.
If it shows examples of both, as you suggested, it should show how the listener's orientation is linked with these setups.

Also, the XY plane can be used "more simply", since the up vector and direction are already prepared for this setup. So, just leave z at zero.

One thing I touched on above is that if the listener is located "on that 2D plane", the sound can be unreliable around the listener's position. If I were to hazard a guess at the cause, it could be floating point errors and the fact that it might be unsure whether something is considered in front or behind.
That said, it seems to be more as if it treats the sound source as if it's not infinitely small and as if it comes from all directions around the listener.
As also mentioned above, the clear solution would be to move the listener "back" from the plane - as if looking at it instead of being in it. The minimum distance can be increased by the distance from the listener to the plane so that it still treats the point at the listener's position as full volume. Attenuation would need to be adjusted (stronger).

SuperApe

All good information, and I appreciate the responses. The analogy makes sense and helps clarify the names used.

Just for those keeping score at home: an even simpler (and kind of obvious) setup for the Listener orientation is not up vector +Y and facing direction -Z; of course, it makes a lot more sense to align the orientation with the 'default' 2D world: up vector -Y and facing direction +Z. (That is, in our default 2D orientation using 'top' and 'height', negative Y is up, so positive Z can remain forward.)
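In other words, a two-line sketch of that flipped orientation:
    // SFML's 2D convention has y growing downward, so flip the up vector
    // and face "into" the screen along positive Z.
    sf::Listener::setUpVector(0.f, -1.f, 0.f);
    sf::Listener::setDirection(0.f, 0.f, 1.f);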

(I realized this small point a little after posting; sorry, I come from a 3D background with positive Y up and just forgot to make that point clearer until now.)
