SFML community forums
Topic started by: Pixel_Outlaw on June 20, 2011, 05:22:37 am
-
Hello,
I understand that SFML provides some Unicode functionality.
How would one output Unicode characters to a console using the provided SFML functionality?
My project is terminal-based for the Linux platform, and I'm writing software to convert images to Unicode characters for image viewing in text-only environments.
Any help is appreciated, thank you. :D
-
It depends on the system you're running. On Windows, outputting Unicode characters to the console is unreasonably difficult.
If your only concern is Linux, then you should be able to output UTF-8 to cout just fine. It worked for me on Ubuntu, and I can't imagine that other distros would be different.
-
This is indeed for Linux. I'm using the GCC compiler with Code::Blocks.
How do I add a Unicode character to an sf::String and then output the value of that string to the console window? :?
Could you provide a simple C++ code example, please?
-
Looks like sf::String doesn't have a ToUTF8 function (WTF? Seriously?)
So you'll have to go through the sf::Utf utilities.
Read: This code is untested:
#include <SFML/System.hpp>
#include <string>
// Converts an sf::String (stored as UTF-32) to a UTF-8 encoded std::string
std::string ToUTF8(const sf::String& original)
{
    std::string str; // the final UTF-8 string
    str.resize(original.GetSize() * 4); // worst case scenario: 4 bytes per codepoint
    std::string::iterator last = sf::Utf<32>::ToUtf8(original.GetData(), original.GetData() + original.GetSize(), str.begin());
    str.resize(last - str.begin()); // trim to the bytes actually written
    return str;
}
So then, if you want to put a Unicode codepoint in the string and print it...
(again untested):
#include <iostream>
int main()
{
    sf::String example;
    example += sf::String(sf::Uint32(0x304D)); // append one UTF-32 codepoint: U+304D = き (Hiragana Ki)
    std::cout << ToUTF8(example) << std::endl;
}
-
I think you don't need this ToUTF8 conversion function. The following works fine for me (on OS X):
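#include <SFML/System.hpp>
#include <cstdlib>  // EXIT_SUCCESS
#include <iostream>
int main(int argc, const char* argv[])
{
    sf::String example("き");                 // char* input is treated as locale-encoded ("ANSI")
    std::cout << example.ToAnsiString();      // narrow std::string in the current locale's encoding
    // Note: mixing std::cout and std::wcout on the same stream may not work on every platform
    std::wcout << example.ToWideString();     // wide std::wstring
    return EXIT_SUCCESS;
}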
-
ToAnsiString()
Then that function is poorly named.
-
Then that function is poorly named.
I know. "ANSI string" is commonly used, but I don't know if it has a clear definition (does it refer to non-Unicode strings? to ASCII strings? to strings made of chars? ...).
Maybe this function should be named ToLocaleString()?
-
Well what does the function actually convert to? And how does it do it? (can't check the source right now because I'm at work).
Personally, I fail to see the point in doing anything besides Unicode, as Unicode is all-encompassing, especially since sf::String seems to be UTF-32 internally. Code-page-based glyph mapping is kind of a legacy thing.
-
The function converts to std::string using the encoding defined by the current locale.
Non-Unicode encodings are still very important. Linux has moved to UTF-8, but Windows is still stuck with extended-ASCII codepages.
-
The function converts to std::string using the encoding defined by the current locale.
In that case, the function is not poorly named; it's just ambiguous because it's platform-dependent.
Personally, I would remove it altogether, as platform-dependent behavior kind of defeats the point of using a cross-platform lib like SFML.
Linux has moved to UTF-8, but Windows is still stuck with extended-ASCII codepages.
Not true. Windows has been on UTF-16 since at least NT 4.0 (1997-ish?)
Granted, the C++ standard libs are oblivious to this and tend not to support UTF-8 when it's printed to cout (or passed to fopen, etc.), but that's not an issue that SFML needs to concern itself with, IMO.
-
Personally, I would remove it altogether, as platform-dependent behavior kind of defeats the point of using a cross-platform lib like SFML.
I disagree. I think platform-specific behaviour is exactly what we want here. A std::string with an encoding different from the system's would be nearly useless. Many people and many functions work with a system-locale-specific std::string, because that's the only way to use std::string without having to care about the encoding.
Not true. Windows has been on UTF-16 since at least NT 4.0 (1997-ish?)
The Windows API can work with UTF-16, true. But when you print to the console or read a text file, there's no UTF-16 anymore (the default on my machine is CP-1252, which is a superset of Latin-1, which is a subset of UCS-4).
-
Many people and many functions work with a system-locale-specific std::string, because that's the only way to use std::string without having to care about the encoding.
Point taken.
The Windows API can work with UTF-16, true. But when you print to the console or read a text file, there's no UTF-16 anymore (the default on my machine is CP-1252, which is a superset of Latin-1, which is a subset of UCS-4).
Reading text from a file depends on the encoding of the text file. Text files can certainly be UTF-8 encoded on any platform. Maybe I'm missing your point there?
As for printing to the console, you can print Unicode to the console if you go through WinAPI functions (like you said), it's just that the standard lib doesn't cover it for whatever reason. Of course I'm not suggesting that SFML try to fix Windows' version of the standard lib.
But yeah, I get your point. To work with the standard lib, you need a string in the locale's specific encoding (yuk). So yeah, I'm wrong. You're right.
I still think sf::String should have conversion to Unicode functions. I was a little surprised when I didn't see any.
-
Reading text from a file depends on the encoding of the text file. Text files can certainly be UTF-8 encoded on any platform. Maybe I'm missing your point there?
Yes, sorry I was not clear enough. I was talking about the default encoding of default text editors on Windows -- which is *not* Unicode (even in Visual Studio, you don't get Unicode by default).
As for printing to the console, you can print Unicode to the console if you go through WinAPI functions (like you said)
It's difficult to change the console encoding, so the only solution is to convert to its default encoding (CP-1252 for example) -- and this is not easy.
it's just that the standard lib doesn't cover it for whatever reason.
The SL works with the system's locale, it doesn't need to handle more than this.
I still think sf::String should have conversion to Unicode functions. I was a little surprised when I didn't see any.
The problem is the return type: std::string is not suitable for multi-byte encodings such as UTF-8 or UTF-16. So what should I return?
-
The problem is the return type: std::string is not suitable for multi-byte encodings such as UTF-8 or UTF-16. So what should I return?
std::string works just fine for UTF-8. The only thing is that length() is not going to be the length in codepoints. But that's a minor issue.
As for UTF-16, std::wstring would work even though wchar_t is sometimes larger than 16-bits.
-
std::string works just fine for UTF-8. The only thing is that length() is not going to be the length in codepoints. But that's a minor issue.
It works with UTF-8 as a container, not as a string class. Anything that requires interpreting the characters will be broken -- so basically, anything that std::string has and that std::vector doesn't have ;)
As for UTF-16, std::wstring would work even though wchar_t is sometimes larger than 16-bits.
The standard defines nothing regarding the size or encoding of wchar_t/std::wstring. Some implementations might define an 8-bit wchar_t, and even a 32-bit wchar_t (Unix) would not be suitable (why use UTF-16 if it's stored in 32-bit units?).
-
It looks like I can easily type the character I need directly in my editor.
I really hope they standardize Unicode soon. While I don't completely understand all of the posts here, I feel that Unicode really is superior to ASCII.
Thanks for all the help.
-
I really hope they standardize Unicode soon
Unicode is a standard. It's just C++ that poorly supports it (will be much better in the next C++ standard).
I feel that Unicode really is superior to ASCII
ASCII defines 128 characters. Unicode defines more than 100,000 characters. So yes, it is superior ;)