
Topic: Minor bug: U+FFFF not encoded properly in UTF-16


deadc0der

  • Newbie
  • Posts: 3
Minor bug: U+FFFF not encoded properly in UTF-16
« on: October 29, 2015, 12:17:57 am »
Hi, I think I found a minor bug. When sf::Utf16 is used to encode the Unicode character U+FFFF, it is incorrectly encoded as a surrogate pair, whereas it should be encoded as a single code unit. For confirmation, RFC 2781, which specifies the UTF-16 encoding, states in section 2.1:

Quote
If U < 0x10000, encode U as a 16-bit unsigned integer and terminate.

U+FFFF (0xFFFF) fulfills this requirement, so it should be encoded as a single code unit. However, include/SFML/System/Utf.inl at line 325 encodes a code point as a single code unit only if it is strictly less than 0xFFFF, whereas the test should be "less than or equal" in order to account for the edge case of U+FFFF.
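For illustration, here is a minimal sketch of an encoder that follows RFC 2781 section 2.1 with the corrected `<=` comparison. It is a standalone function, not SFML's actual `sf::Utf<16>::encode` template. The name `encodeUtf16`, the `std::vector` output and the replacement-character handling are assumptions made for this example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical standalone encoder (not SFML's API): appends the UTF-16
// encoding of one code point to `out`, following RFC 2781 section 2.1.
// Invalid input (surrogate range or > U+10FFFF) is replaced by
// `replacement`, or dropped entirely if `replacement` is 0.
void encodeUtf16(std::uint32_t input, std::vector<std::uint16_t>& out,
                 std::uint16_t replacement = 0)
{
    if (input <= 0xFFFF)  // '<=' so that U+FFFF stays a single code unit
    {
        if (input >= 0xD800 && input <= 0xDFFF)
        {
            // Surrogate code points cannot be encoded on their own
            if (replacement)
                out.push_back(replacement);
        }
        else
        {
            out.push_back(static_cast<std::uint16_t>(input));
        }
    }
    else if (input > 0x10FFFF)
    {
        // Beyond the last valid Unicode code point
        if (replacement)
            out.push_back(replacement);
    }
    else
    {
        // U' = U - 0x10000 fits in 20 bits: high 10 bits go into the
        // first (high) surrogate, low 10 bits into the second (low) one
        input -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 + (input >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 + (input & 0x3FF)));
    }
}
```

With this sketch, encoding 0xFFFF yields the single unit 0xFFFF, and 0x10000 yields the pair 0xD800 0xDC00.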

Now, since U+FFFF is barely ever used by applications, this is not going to end the world, but I thought it might be good to report it nonetheless.

NOTE: This is my first post on this forum, so I apologize if this is not the right place to post this.

Laurent

  • Administrator
  • Hero Member
  • Posts: 32498
Re: Minor bug: U+FFFF not encoded properly in UTF-16
« Reply #1 on: October 29, 2015, 08:32:00 am »
Thanks for reporting it.

The mistake is even more obvious if we look at what happens when we encode the value as a surrogate pair: we start by subtracting 0x10000 from it. So it's clear that surrogate pairs should start at 0x10000 and not 0xFFFF.
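To make this concrete, the sketch below isolates the surrogate-pair arithmetic from RFC 2781, assuming the usual unsigned 32-bit computation. The helper name `surrogatePair` is hypothetical and this is not SFML code. For 0x10000 it gives 0xD800 0xDC00. For 0xFFFF, which the `< 0xFFFF` test sent into this branch, the subtraction wraps around to 0xFFFFFFFF and the result is 0xD7FF followed by 0xDFFF. That is not a surrogate pair: 0xD7FF is an ordinary (non-surrogate) BMP value, and 0xDFFF is then a lone low surrogate.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper: only the surrogate-pair step of RFC 2781 section 2.1,
// with no range check. It is meaningful only for 0x10000 <= cp <= 0x10FFFF;
// calling it with 0xFFFF shows what the buggy '< 0xFFFF' threshold led to.
std::pair<std::uint16_t, std::uint16_t> surrogatePair(std::uint32_t cp)
{
    std::uint32_t u = cp - 0x10000;  // wraps around to 0xFFFFFFFF when cp == 0xFFFF
    auto high = static_cast<std::uint16_t>(0xD800 + (u >> 10));   // top 10 bits
    auto low  = static_cast<std::uint16_t>(0xDC00 + (u & 0x3FF)); // bottom 10 bits
    return {high, low};
}
```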

Should we create a PR for this '=' to add? ;D
Laurent Gomila - SFML developer

Nexus

  • SFML Team
  • Hero Member
  • Posts: 6287
  • Thor Developer
Re: Minor bug: U+FFFF not encoded properly in UTF-16
« Reply #2 on: October 29, 2015, 08:44:37 am »
Quote
Should we create a PR for this '=' to add? ;D
I'm also missing the time where such things could be fixed directly, without bureaucracy and delay ;)

I don't see the point of a lengthy review for this, either.
Zloxx II: action platformer
Thor Library: particle systems, animations, dot products, ...
SFML Game Development:

eXpl0it3r

  • SFML Team
  • Hero Member
  • Posts: 11033
Re: Minor bug: U+FFFF not encoded properly in UTF-16
« Reply #3 on: October 29, 2015, 09:24:21 am »
Quote
I'm also missing the time where such things could be fixed directly, without bureaucracy and delay ;)
There's not really much bureaucracy for small changes, and when there is, it's rarely about getting something into master; it's more about styling, API or other discussions.
As for the delay, it doesn't really matter: whether something gets into master today or within a week or so makes little difference.

Also, there's no more work involved than if you pushed directly to master: update master, switch branch, apply changes, commit, push to branch, create PR.
Official FAQ: https://www.sfml-dev.org/faq.php
Official Discord Server: https://discord.gg/nr4X7Fh
——————————————————————
Dev Blog: https://duerrenberger.dev/blog/

Tank

  • SFML Team
  • Hero Member
  • Posts: 1486
Re: Minor bug: U+FFFF not encoded properly in UTF-16
« Reply #4 on: October 29, 2015, 09:36:39 am »
That "bureaucracy" helps maintain a consistent workflow and minimizes errors. Whether a change is small or big, errors can happen anywhere. Let's just avoid such "exceptions"; otherwise we would have to define where the line between a small/trivial change and a big one lies.

deadc0der

  • Newbie
  • Posts: 3
Re: Minor bug: U+FFFF not encoded properly in UTF-16
« Reply #5 on: October 29, 2015, 02:29:49 pm »
Quote
Should we create a PR for this '=' to add? ;D

That's probably overkill; a single commit push would do :)
« Last Edit: October 29, 2015, 11:55:53 pm by deadc0der »

Hiura

  • SFML Team
  • Hero Member
  • Posts: 4321
Re: Minor bug: U+FFFF not encoded properly in UTF-16
« Reply #6 on: October 30, 2015, 08:27:52 am »
https://github.com/SFML/SFML/pull/997

Actually, it doesn't take much more time than a direct commit to master: for a simple task like that, you can create a branch/commit/PR directly in your browser in less than 2 minutes.  ;)

@deadc0der: `commit` was the right word  :P

SFML / OS X developer

 
