Author Topic: Minor bug: U+FFFF not encoded properly in UTF-16 (Read 6203 times)

deadc0der · « **on:** October 29, 2015, 12:17:57 am »

Hi, I think I found a minor bug. When using sf::Utf16 to encode the Unicode character U+FFFF, it is incorrectly encoded as a surrogate pair, while it should be encoded as a single code unit. For confirmation, RFC 2781, which specifies the UTF-16 encoding, states in section 2.1:

Quote

If U < 0x10000, encode U as a 16-bit unsigned integer and terminate.

U+FFFF (0xFFFF) fulfills this requirement, so it should be encoded as a single code unit. However, include/SFML/System/Utf.inl at line 325 encodes a code point into a single code unit only if it is strictly less than 0xFFFF, whereas the comparison should be less than or equal in order to account for the edge case that is U+FFFF.
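To make the boundary concrete, here is a minimal sketch of the RFC 2781 section 2.1 rule. It is not SFML's actual code: `encodeUtf16` is a hypothetical helper written for illustration, and it assumes the caller passes a valid scalar value (no surrogates, nothing above U+10FFFF).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of sf::Utf16): encodes one code point
// following the steps of RFC 2781, section 2.1.
std::vector<std::uint16_t> encodeUtf16(std::uint32_t codePoint)
{
    // Assumption: the input is a valid Unicode scalar value.
    assert(codePoint <= 0x10FFFF);
    assert(codePoint < 0xD800 || codePoint > 0xDFFF);

    // "If U < 0x10000, encode U as a 16-bit unsigned integer and terminate."
    // This is the same test as codePoint <= 0xFFFF, so U+FFFF lands here.
    if (codePoint < 0x10000)
        return { static_cast<std::uint16_t>(codePoint) };

    // Otherwise split the 20-bit offset into a high and a low surrogate.
    const std::uint32_t offset = codePoint - 0x10000;
    return {
        static_cast<std::uint16_t>(0xD800 + (offset >> 10)),
        static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF))
    };
}
```

With this sketch, `encodeUtf16(0xFFFF)` yields the single code unit 0xFFFF, and `encodeUtf16(0x10000)` is the first input that yields a pair (0xD800, 0xDC00).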

Now, since U+FFFF is barely ever used by applications, this is not going to end the world, but I thought it might be good to report it nonetheless.

NOTE: This is my first post on this forum, so I apologize if this is not the right place to post this.

Laurent · « **Reply #1 on:** October 29, 2015, 08:32:00 am »

Thanks for reporting it.

The mistake is even more obvious if we look at what happens when we encode the value as a surrogate pair: we start by subtracting 0x10000 from it. So it's clear that surrogate pairs should start at 0x10000 and not 0xFFFF.
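A small sketch of that arithmetic, for anyone curious about what comes out. This is not copied from Utf.inl: `surrogateArithmetic` and `SurrogatePair` are hypothetical names, and the sketch assumes 32-bit unsigned arithmetic with the results truncated to 16 bits.

```cpp
#include <cstdint>

// Hypothetical type, for illustration only.
struct SurrogatePair
{
    std::uint16_t first;
    std::uint16_t second;
};

// Hypothetical helper: applies the surrogate-pair formula unconditionally,
// even to inputs below 0x10000 where it is not meant to be used.
SurrogatePair surrogateArithmetic(std::uint32_t codePoint)
{
    // For codePoint < 0x10000 this unsigned subtraction wraps around.
    const std::uint32_t offset = codePoint - 0x10000;
    return {
        static_cast<std::uint16_t>((offset >> 10) + 0xD800),
        static_cast<std::uint16_t>((offset & 0x3FF) + 0xDC00)
    };
}
```

Under those assumptions, 0xFFFF wraps to 0xFFFFFFFF after the subtraction and the result is (0xD7FF, 0xDFFF). The first unit is not even in the high-surrogate range 0xD800 to 0xDBFF, whereas 0x10000 gives the expected (0xD800, 0xDC00).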

Should we create a PR for this '=' to add?

Nexus · « **Reply #2 on:** October 29, 2015, 08:44:37 am »

Quote from: Laurent on October 29, 2015, 08:32:00 am

Should we create a PR for this '=' to add?

I'm also missing the time where such things could be fixed directly, without bureaucracy and delay

I don't see any point in long reviewing of this, either.

eXpl0it3r · « **Reply #3 on:** October 29, 2015, 09:24:21 am »

Quote from: Nexus on October 29, 2015, 08:44:37 am

I'm also missing the time where such things could be fixed directly, without bureaucracy and delay

There's not really much bureaucracy for small changes and if there's bureaucracy it's rarely about getting something into master and more about styling, API or other discussions.
And the delay doesn't really matter. Whether something gets into master today or within a week or so mostly doesn't matter.

Also, there's no more work to be done than if you pushed directly to master. Update master, switch branch, apply changes, commit, push to branch, create PR.

Tank · « **Reply #4 on:** October 29, 2015, 09:36:39 am »

That "bureaucracy" helps maintain a consistent workflow and minimizes errors. Whether it's a small or big change, errors can happen anywhere. Let's just avoid those "exceptions", otherwise we would have to define the difference between a small/trivial change and a big one.

deadc0der · « **Reply #5 on:** October 29, 2015, 02:29:49 pm »

Quote from: Laurent on October 29, 2015, 08:32:00 am

Should we create a PR for this '=' to add?

That's probably overkill, a single ~~commit~~ push would do

Hiura · « **Reply #6 on:** October 30, 2015, 08:27:52 am »

https://github.com/SFML/SFML/pull/997

Actually, it doesn't take much more time than a direct commit to master: for a simple task like that, you can create a branch/commit/PR directly in your browser in less than 2 minutes.

@deadc0der: `commit` was the right word
