Author Topic: Unicode with cout command...  (Read 8650 times)

Pixel_Outlaw

  • Jr. Member
  • **
  • Posts: 50
« on: June 20, 2011, 05:22:37 am »
Hello,

I understand that SFML provides some Unicode functionality.

How would one output Unicode characters to a console using the functionality SFML provides?

My project is terminal based for the Linux platform and I'm writing software to convert images to Unicode characters for image viewing in text only environments.

Help is appreciated, thank you. :D

Disch

  • Full Member
  • ***
  • Posts: 220
« Reply #1 on: June 20, 2011, 06:05:54 am »
It depends on the system you're running.  On windows, outputting Unicode characters to the console is unreasonably difficult.

If your only concern is Linux, then you should be able to output UTF-8 to cout just fine.  It worked for me on Ubuntu.  I can't imagine that other distros would be different.

Pixel_Outlaw

  • Jr. Member
  • **
  • Posts: 50
« Reply #2 on: June 20, 2011, 06:11:53 am »
This is indeed for Linux. I'm using the GCC compiler with Code::Blocks.
How do I add a Unicode character to an sf::String and then output the value of that string to the console window?  :?

Could you provide a simple C++ code example please?

Disch

  • Full Member
  • ***
  • Posts: 220
« Reply #3 on: June 20, 2011, 06:57:49 am »
Looks like sf::String doesn't have a ToUTF8 function (Wtf?  Seriously?)

So you'll have to go through the sf::Utf utilities.

Note: this code is untested:


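Code:

#include <SFML/System.hpp>
#include <string>

// Converts an sf::String (UTF-32 internally) to a UTF-8 encoded std::string.
std::string ToUTF8(const sf::String& original)
{
    std::string str;                        // the final UTF-8 string
    str.resize(original.GetSize() * 4);     // worst case scenario:  4 bytes per codepoint

    // Encode into the buffer; the returned iterator marks the end of the bytes written
    std::string::iterator last = sf::Utf<32>::ToUtf8( original.GetData(), original.GetData() + original.GetSize(), str.begin() );
    str.resize(last - str.begin());         // trim the unused tail

    return str;
}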



So then if you want to put a unicode codepoint in the string and print it....

(again untested):
Code:

#include <iostream>

int main()
{
  // Construct from a single UTF-32 codepoint (sf::String has no push_back)
  sf::String example(static_cast<sf::Uint32>(0x304D)); // U+304D = き  (Hiragana Ki)

  std::cout << ToUTF8(example);
}
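For readers without SFML at hand, here is a rough sketch of what a UTF-32 to UTF-8 conversion does, using only the standard library. The names appendUtf8 and utf32ToUtf8 are hypothetical, std::u32string assumes a C++11 compiler, and this is not SFML's implementation; it only illustrates why four bytes per codepoint is the worst case.

```cpp
#include <cstdint>
#include <string>

// Appends one Unicode codepoint to `out` as 1 to 4 UTF-8 bytes.
// Surrogates and values above U+10FFFF are replaced by U+FFFD.
// (Hypothetical helper, for illustration only.)
void appendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        cp = 0xFFFD; // replacement character for invalid input

    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts a whole UTF-32 string to UTF-8.
std::string utf32ToUtf8(const std::u32string& text)
{
    std::string result;
    result.reserve(text.size() * 4); // worst case: 4 bytes per codepoint
    for (char32_t cp : text)
        appendUtf8(result, static_cast<std::uint32_t>(cp));
    return result;
}
```

With this sketch, U+304D comes out as the three bytes E3 81 8D, which a UTF-8 terminal renders as き.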

Hiura

  • SFML Team
  • Hero Member
  • *****
  • Posts: 4321
« Reply #4 on: June 20, 2011, 10:38:46 am »
I don't think you need this ToUTF8 conversion function. The following works fine for me (on OS X):
Code:
#include <SFML/System.hpp>
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>

int main (int argc, const char * argv[])
{
    sf::String example("き");
    std::cout << example.ToAnsiString();
    std::wcout << example.ToWideString();

    return EXIT_SUCCESS;
}
SFML / OS X developer

Disch

  • Full Member
  • ***
  • Posts: 220
« Reply #5 on: June 20, 2011, 04:20:54 pm »
Quote
ToAnsiString()


Then that function is poorly named.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
« Reply #6 on: June 20, 2011, 04:33:37 pm »
Quote
Then that function is poorly named.

I know. "ANSI string" is commonly used, but I don't know if it has a clear definition (does it refer to non-Unicode strings? to ASCII strings? to strings made of chars? ...).

Maybe this function should be named ToLocaleString()?
Laurent Gomila - SFML developer

Disch

  • Full Member
  • ***
  • Posts: 220
« Reply #7 on: June 20, 2011, 06:44:48 pm »
Well, what does the function actually convert to?  And how does it do it?  (I can't check the source right now because I'm at work.)


Personally I fail to see the point in doing anything besides Unicode, as Unicode is all encompassing.  Especially since sf::String seems to be UTF-32 internally.  Code-page based glyph mapping is kind of like a legacy thing.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
« Reply #8 on: June 20, 2011, 07:06:27 pm »
The function converts to std::string using the encoding defined by the current locale.

Non-Unicode encodings are still very important. Linux has moved to UTF-8, but Windows is still stuck with extended-ASCII codepages.
Laurent Gomila - SFML developer
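To illustrate what "the encoding defined by the current locale" means in practice, here is a small standard-library sketch. The function name toLocaleString is hypothetical and this is not how SFML implements ToAnsiString; it only shows a locale-dependent wide-to-narrow conversion via std::wcstombs.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts a wide string to a narrow string in the encoding of the current
// C locale. Returns an empty string if some character cannot be represented
// in that encoding. (Hypothetical helper, for illustration only.)
std::string toLocaleString(const std::wstring& wide)
{
    // MB_CUR_MAX is the maximum number of bytes per character in this locale.
    std::vector<char> buffer(wide.size() * MB_CUR_MAX + 1);

    std::size_t written = std::wcstombs(buffer.data(), wide.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return std::string(); // conversion failed for this locale

    return std::string(buffer.data(), written);
}
```

The result depends on the active locale: in the default "C" locale only basic characters are guaranteed to convert, while after std::setlocale(LC_ALL, "") (from <clocale>) on a UTF-8 Linux system the same call would typically produce UTF-8 bytes.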

Disch

  • Full Member
  • ***
  • Posts: 220
« Reply #9 on: June 20, 2011, 07:12:55 pm »
Quote
The function converts to std::string using the encoding defined by the current locale.


In that event, the function is not poorly named, it's just ambiguous because it's platform dependent.

Personally, I would remove it altogether, as platform dependent behavior kind of defeats the point of using a crossplatform lib like SFML.

Quote
Linux has moved to UTF-8, but Windows is still stuck with extended-ASCII codepages.


Not true.  Windows has been on UTF-16 since at least NT 4.0 (1997-ish?)

Granted, the C++ standard libs are oblivious to this and tend to not support UTF-8 when printed to cout (or passed to fopen, etc), but that's not an issue that SFML needs to concern itself with, IMO.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
« Reply #10 on: June 20, 2011, 07:17:27 pm »
Quote
Personally, I would remove it altogether, as platform dependent behavior kind of defeats the point of using a crossplatform lib like SFML.

I disagree. I think platform-specific behaviour is exactly what we want here. Having a std::string with an encoding different from the system's would be nearly useless. Many people and many functions work with a system-locale-specific std::string. Because that's the only way to use std::string without having to care about the encoding.

Quote
Not true. Windows has been on UTF-16 since at least NT 4.0 (1997-ish?)

The Windows API can work with UTF-16, true. But when you print to the console or read a text file, there's no UTF-16 anymore (the default on my machine is CP-1252, which is a superset of Latin-1, which is a subset of UCS-4).
Laurent Gomila - SFML developer

Disch

  • Full Member
  • ***
  • Posts: 220
« Reply #11 on: June 20, 2011, 07:32:55 pm »
Quote
Many people and many functions work with a system-locale-specific std::string. Because that's the only way to use std::string without having to care about the encoding.


Point taken.

Quote
The Windows API can work with UTF-16, true. But when you print to the console or read a text file, there's no UTF-16 anymore (the default on my machine is CP-1252, which is a superset of Latin-1, which is a subset of UCS-4).


Reading text from a file is dependent on the encoding of the text file.  Text files can certainly be UTF-8 encoded on any platform.  Maybe I'm missing your point there?

As for printing to the console, you can print Unicode to the console if you go through WinAPI functions (like you said), it's just that the standard lib doesn't cover it for whatever reason.  Of course I'm not suggesting that SFML try to fix Windows' version of the standard lib.

But yeah, I get your point.  For working with the standard lib, you need a specific locale encoded string (yuk).  So yeah, I'm wrong.  You're right.


I still think sf::String should have conversion to Unicode functions.  I was a little surprised when I didn't see any.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
    • View Profile
    • SFML's website
    • Email
Unicode with cout command...
« Reply #12 on: June 20, 2011, 08:05:56 pm »
Quote
Reading text from a file is dependent on the encoding of the text file. Text files can certainly be UTF-8 encoded on any platform. Maybe I'm missing your point there?

Yes, sorry I was not clear enough. I was talking about the default encoding of default text editors on Windows -- which is *not* Unicode (even in Visual Studio, you don't get Unicode by default).

Quote
As for printing to the console, you can print Unicode to the console if you go through WinAPI functions (like you said)

It's difficult to change the console encoding, so the only solution is to convert to its default encoding (CP-1252 for example) -- and this is not easy.

Quote
it's just that the standard lib doesn't cover it for whatever reason.

The SL works with the system's locale, it doesn't need to handle more than this.

Quote
I still think sf::String should have conversion to Unicode functions. I was a little surprised when I didn't see any.

The problem is the return type: std::string is not suitable for multi-byte encodings such as UTF-8 or UTF-16. So what should I return?
Laurent Gomila - SFML developer

Disch

  • Full Member
  • ***
  • Posts: 220
« Reply #13 on: June 20, 2011, 08:11:20 pm »
Quote
The problem is the return type: std::string is not suitable for multi-byte encodings such as UTF-8 or UTF-16. So what should I return?


std::string works just fine for UTF-8.  The only thing is that length() is not really going to be the length in codepoints.  But that's a minor issue.

As for UTF-16, std::wstring would work even though wchar_t is sometimes larger than 16 bits.
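A short sketch of the length() caveat, assuming the string holds valid UTF-8. The function name countCodePoints is hypothetical; it counts every byte that is not a continuation byte (continuation bytes have the bit pattern 10xxxxxx).

```cpp
#include <cstddef>
#include <string>

// Counts the codepoints in a UTF-8 encoded std::string by skipping
// continuation bytes. Assumes valid UTF-8; does no validation.
// (Hypothetical helper, for illustration only.)
std::size_t countCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) // not a 10xxxxxx continuation byte
            ++count;
    }
    return count;
}
```

For the three-byte string "\xE3\x81\x8D" (き), length() reports 3 while countCodePoints reports 1.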

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
« Reply #14 on: June 20, 2011, 08:35:33 pm »
Quote
std::string works just fine for UTF-8. The only thing is that length() is not really going to be the length in codepoints. But that's a minor issue.

It works with UTF-8 as a container, not as a string class. Anything that requires interpreting the characters will be broken -- so basically, anything that std::string has and that std::vector doesn't have ;)

Quote
As for UTF-16, std::wstring would work even though wchar_t is sometimes larger than 16 bits.

The standard defines nothing regarding size or encoding for wchar_t/std::wstring. Some implementations might define an 8-bit wchar_t; and even a 32-bit wchar_t (Unix) would not be suitable (why use UTF-16 if it's stored in 32-bit units?).
Laurent Gomila - SFML developer
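One way around the wchar_t size question, sketched under the assumption that a C++11 compiler is available: std::u16string stores char16_t code units, which are at least 16 bits wide on every conforming implementation. The helper name appendUtf16 is hypothetical and is not an SFML function.

```cpp
#include <cstdint>
#include <string>

// Appends one Unicode codepoint to `out` as UTF-16 code units.
// Codepoints above U+FFFF become a surrogate pair.
// Assumes `cp` is a valid scalar value (no validation is done).
// (Hypothetical helper, for illustration only.)
void appendUtf16(std::u16string& out, std::uint32_t cp)
{
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 | (cp >> 10));   // high surrogate
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF)); // low surrogate
    }
}
```

U+304D fits in a single code unit, while U+1F600 becomes the pair D83D DE00, so the size() of such a string is again a count of code units rather than codepoints.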