
Author Topic: On Windows, WM_CHAR is being used instead of WM_UNICHAR  (Read 3952 times)


nullpointer

  • Newbie
  • *
  • Posts: 8
« on: March 13, 2013, 04:30:06 am »
I noticed that in the 2.0 source code, at WindowImplWin32.cpp line 515, WM_CHAR is used to set event.text.unicode. The SFML documentation says that event.text.unicode is supposed to be UTF-32 (i.e. the library user always gets a plain, unencoded code point). However, the MSDN documentation for WM_CHAR says that the application receives UTF-16, and notes that "The WM_UNICHAR message is the same as WM_CHAR, except it uses UTF-32."

Shouldn't SFML be using WM_UNICHAR?

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
Re: On Windows, WM_CHAR is being used instead of WM_UNICHAR
« Reply #1 on: March 13, 2013, 08:03:17 am »
WM_UNICHAR has only been available since Windows XP.

And for all code points in the Basic Multilingual Plane (in range [0 .. 65535]), a single UTF-16 code unit has the same value as the UTF-32 code point. So it could only be a problem for code points outside the BMP, which require two UTF-16 "characters" (a surrogate pair) and therefore two WM_CHAR messages. I haven't tested it though.

But characters outside the BMP are rarely used (almost all characters of today's languages are in the BMP), and I don't even know how they would be "typed" -- probably with some kind of virtual keyboard.

So... let's say that the day someone complains about SFML not being able to process non-BMP characters on Windows, I'll work on a fix (which will be to convert UTF-16 to UTF-32, not to use WM_UNICHAR) ;)
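A minimal sketch of the UTF-16 to UTF-32 conversion described above, assuming each WM_CHAR message delivers one UTF-16 code unit and that a non-BMP character arrives as a high surrogate followed by a low surrogate. Utf16Combiner and feed() are hypothetical names for illustration; they are not part of SFML or the Win32 API.

```cpp
#include <cstdint>

// Hypothetical helper (not SFML code): combines UTF-16 code units, fed one at
// a time, into UTF-32 code points.
class Utf16Combiner
{
public:
    // Feeds one UTF-16 code unit.
    // Returns true and writes `codepoint` when a full code point is ready.
    // Returns false when the unit was stored (high surrogate) or dropped
    // (low surrogate with no preceding high surrogate).
    bool feed(std::uint16_t unit, std::uint32_t& codepoint)
    {
        if (unit >= 0xD800 && unit <= 0xDBFF)
        {
            // High surrogate: remember it and wait for the low surrogate.
            m_high = unit;
            return false;
        }

        if (unit >= 0xDC00 && unit <= 0xDFFF)
        {
            if (m_high == 0)
                return false; // unpaired low surrogate, ignored

            // Standard surrogate pair arithmetic.
            codepoint = 0x10000
                      + ((static_cast<std::uint32_t>(m_high) - 0xD800) << 10)
                      + (static_cast<std::uint32_t>(unit) - 0xDC00);
            m_high = 0;
            return true;
        }

        // BMP code unit: its value is the code point.
        // Any pending unpaired high surrogate is discarded.
        m_high = 0;
        codepoint = unit;
        return true;
    }

private:
    std::uint16_t m_high = 0; // pending high surrogate, 0 if none
};
```

In a window procedure, the idea would be to pass the low 16 bits of wParam from each WM_CHAR to feed() and only emit a text event when it returns true. Whether Windows always delivers the two halves back to back is exactly the kind of behaviour that would need testing.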
Laurent Gomila - SFML developer

nullpointer

  • Newbie
  • *
  • Posts: 8
« Reply #2 on: March 13, 2013, 08:26:14 am »
Thanks for the reply. I recently read up on real-world Unicode usage in software after wanting to make sure my software would have the broadest potential audience. Apparently supporting code points outside of the BMP is important to some people, particularly those who use Asian languages on their computers.

Obviously the cost/benefit decision is up to you, but my research led me to believe that code points outside the BMP may not be as obscure as it might seem to a speaker of Western languages. And technically, assuming UTF-16 is UTF-32 is a bug. Also, converting from UTF-16 to UTF-32 is like 3 lines of code, I think.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
« Reply #3 on: March 13, 2013, 08:45:43 am »
Quote
Apparently supporting code points outside of the BMP is important to some people, particularly those who use Asian languages on their computers.
As far as I know, Asian languages are in the BMP. The SMP rather contains dead languages and things like emoticons.

Quote
And technically, assuming UTF-16 is UTF-32 is a bug
Ok, I should mention that only the BMP is supported.

Quote
Also, converting from UTF-16 to UTF-32 is like 3 lines of code, I think
I know. I have written all these conversions in the sf::Utf classes.
And although it only requires 3 lines of code, I wouldn't implement it without deep testing. I don't trust the Win32 API, and prefer to check its behaviour ;)
« Last Edit: March 13, 2013, 09:23:08 am by Laurent »
Laurent Gomila - SFML developer

nullpointer

  • Newbie
  • *
  • Posts: 8
« Reply #4 on: March 13, 2013, 09:18:21 am »
Quote
As far as I know, Asian languages are in the BMP. The SMP rather contains dead languages and things like emoticons.

That is not true. For instance, the CJK Unified Ideographs Extension B contains many characters widely used in Hong Kong, and many of the characters used for Chinese and Japanese names are not in the BMP. I did my research, and realized if I didn't pay attention to these facts, people would hate me.
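To make the point concrete, here is a small sketch that encodes a code point as UTF-16. The function name encodeUtf16 is made up for this example and input validation is left out. U+20000, the first code point of CJK Unified Ideographs Extension B, comes out as two code units, whereas a BMP ideograph such as U+4E2D fits in one.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
// Assumes a valid code point (not a surrogate, not above U+10FFFF).
std::vector<std::uint16_t> encodeUtf16(std::uint32_t codepoint)
{
    std::vector<std::uint16_t> units;

    if (codepoint < 0x10000)
    {
        // BMP: one code unit with the same value as the code point.
        units.push_back(static_cast<std::uint16_t>(codepoint));
    }
    else
    {
        // Outside the BMP: split the 20-bit offset into a surrogate pair.
        std::uint32_t offset = codepoint - 0x10000;
        units.push_back(static_cast<std::uint16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<std::uint16_t>(0xDC00 + (offset & 0x3FF)));
    }

    return units;
}
```

Calling encodeUtf16(0x20000) yields the pair 0xD840, 0xDC00, so code that treats each UTF-16 unit as a full code point would report two bogus characters instead of one.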
« Last Edit: March 13, 2013, 09:23:16 am by Laurent »

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
« Reply #5 on: March 13, 2013, 09:27:34 am »
Thanks for the details. You're right, these things are often ignored by us European/American developers, and it's not the right attitude. You should add an issue in the tracker for that, and I'll try to get this fixed soon :)
Laurent Gomila - SFML developer

nullpointer

  • Newbie
  • *
  • Posts: 8
« Reply #6 on: March 13, 2013, 10:00:18 am »
Cool! Issue tracker post! ;D

I also think opening files in SFML with Unicode paths will not work on Windows, but that's an issue that I will look into on another day, and I will forgive you for not getting enthusiastic about it. :P Windows makes handling Unicode paths in a cross-platform way really evil: UTF-8 works by default for file paths on every other OS, I believe, but on Windows the standard C/C++ file functions do not accept UTF-8, and the platform provides no way to use Unicode paths that conforms with the C/C++ standard.
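One common workaround, sketched here under the assumption that the application keeps all paths in UTF-8: on Windows, convert the path to UTF-16 with MultiByteToWideChar and call the non-standard _wfopen; elsewhere, pass the bytes straight to fopen. The names widen and openUtf8 are hypothetical, and this is not how SFML or its third-party loaders open files.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Converts a UTF-8 string to UTF-16 using the Win32 API.
// Returns an empty string if the input is empty or the conversion fails.
static std::wstring widen(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();

    // First call: ask how many UTF-16 code units are needed.
    int length = MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                                     static_cast<int>(utf8.size()), nullptr, 0);
    if (length <= 0)
        return std::wstring();

    // Second call: perform the conversion into the sized buffer.
    std::wstring wide(static_cast<std::size_t>(length), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(),
                        static_cast<int>(utf8.size()), &wide[0], length);
    return wide;
}
#endif

// Hypothetical helper: opens a file whose path is given in UTF-8.
// Returns nullptr on failure, like fopen.
std::FILE* openUtf8(const std::string& path, const std::string& mode)
{
#ifdef _WIN32
    // _wfopen is a Microsoft CRT extension, not standard C or C++.
    return _wfopen(widen(path).c_str(), widen(mode).c_str());
#else
    // Other platforms are assumed to take UTF-8 paths as plain bytes.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

This only helps code that opens files itself. A third-party loader that calls fopen internally would still need its own entry point for wide paths or for loading from memory.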

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
« Reply #7 on: March 13, 2013, 10:29:53 am »
Yep... Unicode filenames are a problem that will be harder to solve. I use third-party libraries to load files, and I have no idea if they support it.
Laurent Gomila - SFML developer