Yes, and that's a valid point. What I meant by "not clearly" is that both approaches have their advantages and disadvantages; it's not as if the compound approach blatantly outperformed separate methods. There are multiple design considerations, which can carry different weight depending on the library or context where they are used.
Yep, this is true, which is why it isn't the crucial part of the discussion for me. I could live with such an API even if I myself would have done it differently. The issue at hand, the broken spatialisation, is more important, and as long as that is fixed, it would be enough for me.
You mentioned the skewed coordinate system. Out of curiosity, do you know what practical implications this has on sound? Will sounds simply appear in the "wrong" place?
The analogy to the human head sounds straightforward at first, because the front and top of our head point in different directions separated by 90°. But when you only consider the eyes, which can look at different heights without the head moving, a non-orthogonal system would make sense. I think the eyes are quite a good example: you cannot look straight up or down, so you'll never encounter linearly dependent vectors. Sorry for the off-topic.
Yeah, I am not sure how it works at the most detailed level, but as far as I've understood, the "listener" is merely a coordinate system defined by three vectors (at, up, right), where "right" is calculated to be orthogonal to the other two using a simple cross product. These vectors are analogous to how a normal coordinate system is defined by the vectors (1,0,0), (0,1,0) and (0,0,1). If you have done 3D rendering, these concepts will be familiar to you; it is pretty much how cameras work. Anyway, sound source positions are calculated relative to this listener space, so they end up "oriented" according to it. So if the space is skewed, the sounds' positions end up skewed as well.
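To make that a bit more concrete, here is a rough sketch in Python of the idea as I understand it. It is not taken from any audio library: the function names (`listener_basis`, `to_listener_space`) are made up for illustration, and I am assuming the listener looks down -Z and that positions are mapped into listener space by taking dot products with the three axes, which is only exact when the axes are orthogonal.

```python
import math

def cross(a, b):
    # Standard 3D cross product.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Divides by zero if v has zero length, which is what happens to
    # cross(at, up) when at and up are parallel (linearly dependent).
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)

def listener_basis(at, up):
    # Hypothetical helper: "right" comes from the cross product, so it is
    # orthogonal to both at and up, but up is NOT re-orthogonalised against at.
    at, up = normalize(at), normalize(up)
    right = normalize(cross(at, up))
    return right, up, at

def to_listener_space(point, basis):
    # Coordinates along (right, up, at), assuming a plain dot-product mapping.
    return tuple(dot(point, axis) for axis in basis)

source = (0.0, 0.0, -1.0)  # one unit straight ahead of the listener

ortho = listener_basis(at=(0, 0, -1), up=(0, 1, 0))
skewed = listener_basis(at=(0, 0, -1), up=(0, 1, -1))  # up tilted 45 degrees towards at

print(to_listener_space(source, ortho))
print(to_listener_space(source, skewed))
```

With the orthogonal pair the source comes out straight ahead, with nothing along "right" or "up". With the tilted up vector the same source picks up a component of about 0.71 along "up", so under these assumptions it would be treated as sitting above the listener even though it is directly in front.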
I am not sure if I managed to explain it well, but depending on how large the skew is (it grows as the angle between the "at" and "up" vectors shrinks), the sounds would be positioned a bit wrong. It would probably go unnoticed in most cases, since sound locations are hard to verify with two speakers, but they would be off nonetheless. The equivalent in 3D graphics would be trying to render a square in a skewed 2D space; it would look something like this on the screen: