SFML community forums
Help => Network => Topic started by: xarxer on December 18, 2010, 03:30:19 pm
-
Hello fellow programmers! :)
I'm using SFML2
I've been extending the functionality of MrX's irc-bot and I have a problem regarding the sf::TcpSocket::Receive(), or at least so I think..
I'm reading the incoming data from the server with
sf::TcpSocket::Receive(Buffer, 0, 4096)
where Buffer is a char Buffer[4096];
I then convert all the incoming data to a std::string and then continue to process what it contains etc.
The problem lies within the encoding of the text in the string. In the event of a name-list being received from the server I add them to a vector containing a custom class CUser.
But it seems that when the usernames contain the letters å, ä & ö, it fails totally.
If I try to output a username containing å, ä or ö into the console, the letter shows up as � and I'm not really sure what to do to help this.
Any ideas are welcome :)
-
What is the encoding of the strings that you receive? You have to convert from it to the encoding which is used by whatever displays your text.
-
Hmm is there any way to find out you think?
Or would I have to ask the server operator?
-
You can check the first bytes: some Unicode encodings include a specific marker (BOM). But I don't think it is used here, so you should rather ask someone who knows.
If I had to guess, I'd say it's UTF-8.
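For reference, the BOM check mentioned above could be sketched like this. `detectBom` is a hypothetical helper, not an SFML function, and it only recognises the standard UTF-8, UTF-16 and UTF-32 marks. IRC traffic normally carries no BOM, so an empty result is the likely outcome and proves nothing about the encoding.

```cpp
#include <cstddef>
#include <string>

// Returns the encoding suggested by a leading byte-order mark,
// or an empty string if the data does not start with one.
// Hypothetical helper for illustration; not part of SFML.
std::string detectBom(const unsigned char* data, std::size_t size)
{
    // Test the 4-byte marks first: the UTF-32LE mark starts with the UTF-16LE mark.
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "";
}
```

It would be called on the raw receive buffer, cast to `const unsigned char*`, together with the number of bytes actually received.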
-
If it's not UTF-8, just try decoding using a variety of formats until something works. If you know what text to expect, it makes your life very easy since you can just run through the various encoding schemes until you find one that produces the correct output text.
http://xchat.org/encoding/
-
If it's not UTF-8, just try decoding using a variety of formats until something works
It's not that easy, many characters have the same code in different encodings (especially 8-bit non-Unicode encodings; check all the Latin-1 variants). So it may work many times and then one day simply fail.
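One check that is less fragile than trial and error is a structural UTF-8 test. Multi-byte UTF-8 sequences follow a strict lead-byte/continuation-byte pattern, so Latin-1 text containing å, ä or ö will almost always fail it. The sketch below is a simplified heuristic under that assumption: `looksLikeUtf8` is a made-up name, and it does not reject overlong forms, surrogates or code points above U+10FFFF. It also cannot tell the 8-bit encodings apart from each other, which is exactly the problem described above.

```cpp
#include <cstddef>
#include <string>

// Returns true if every byte of `s` belongs to a structurally valid UTF-8 sequence.
// Simplified heuristic: lead and continuation bytes are checked, nothing more.
bool looksLikeUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t trailing = 0;

        if (lead < 0x80)                trailing = 0; // ASCII
        else if ((lead & 0xE0) == 0xC0) trailing = 1; // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) trailing = 2; // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) trailing = 3; // 11110xxx
        else return false;                            // stray continuation or invalid lead byte

        // The sequence must not run past the end of the data.
        if (i + trailing >= s.size())
            return false;

        // Every trailing byte must have the form 10xxxxxx.
        for (std::size_t k = 1; k <= trailing; ++k)
        {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;
        }
        i += trailing + 1;
    }
    return true;
}
```

Pure ASCII input passes the test, which is harmless because ASCII is encoded identically in UTF-8 and in the Latin-1 family.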
-
But the bottom line is I should put what I receive in a string, and then change the encoding of the string?
EDIT: I think the server is sending ISO-8859-1 characters.. is it possible to convert it to UTF-8 when I receive the message?
-
You can use sf::Utf8::FromAnsi, if you can manage to get a std::locale which has ISO-8859-1 encoding.
If you don't want to do it using locales, you'll probably have to use a more complete conversion library (like libiconv) since SFML only handles Unicode encodings and locales.
-
Well it seems this will be tricky..
But if what I receive is ISO-8859-1, and my OS locale uses UTF-8, wouldn't all letters be unreadable? and not just å, ä and ö?
I can't see a simple solution to this, lol :)
-
But if what I receive is ISO-8859-1, and my OS locale uses UTF-8, wouldn't all letters be unreadable? and not just å, ä and ö?
No, UTF-8 and ISO-8859-1 have a lot in common. They are identical over the ASCII range (0 - 127), so only characters outside it, such as å, ä and ö, are affected.
I can't see a simple solution to this, lol
Another conversion library :P
-
I did it like this and thought it would help, obviously I was wrong.. :o
void Process_string(std::string& str)
{
QString myString = QString::fromStdString(str);
myString.ToUtf8();
str = myString.ToStdString();
}
Any idea why this doesn't work?
The receiving part looks somewhat like this (not complete code):
sf::TcpSocket Communicator;
char Buffer[4096];
size_t length = 0;
std::string temp1;
Communicator.Receive(Buffer, 4096, length);
temp1 = Buffer;
Process_string(temp1);
So I figured now temp1 would be converted into utf8, no?
EDIT::
Function was faulty, now looks like this, still doesn't do the job though:
void Bot::Process_string(std::string& str)
{
QString myString = QString::fromStdString(str);
str = myString.toUtf8().constData();
}
EDIT 2 ::
Completely ignore this post, I did a major blunder :)
Everything works perfectly now :)
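A side note on the receiving snippet above: `temp1 = Buffer;` copies characters up to the first NUL byte, but data read from a TCP socket is not NUL-terminated. Unless the buffer is cleared before each call, the string can pick up leftovers from an earlier, longer message. A minimal sketch of the usual remedy is to build the string from the byte count. Here `received` stands in for the `length` value filled in by Receive, `bufferToString` and `simulateSecondReceive` are hypothetical names, and the socket is simulated with `std::memcpy` so the example stays self-contained.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a string from exactly the bytes that were received.
// `received` plays the role of the `length` output of Receive().
std::string bufferToString(const char* buffer, std::size_t received)
{
    return std::string(buffer, received);
}

// Simulates two consecutive receives into the same, reused buffer.
std::string simulateSecondReceive()
{
    char buffer[4096];
    std::memset(buffer, 0, sizeof(buffer));

    // First "receive": a long message.
    const char first[] = "PRIVMSG #chan :a long first message";
    std::memcpy(buffer, first, sizeof(first) - 1);

    // Second "receive": a short message overwrites only the start of the buffer.
    const char second[] = "PING";
    std::memcpy(buffer, second, sizeof(second) - 1);

    // Using the byte count keeps the stale tail of the first message out.
    return bufferToString(buffer, sizeof(second) - 1);
}
```

With plain assignment from the buffer, the second message in this simulation would come out as "PINGSG #chan :a long first message".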
-
Hum, you're using Qt and still use sfml-network? Why don't you use QtNetwork? It would be much better integrated into your app, if it's a Qt one.
By the way, I just remembered that ISO-8859-1 is exactly the first 256 code points of the Unicode standard, which means that conversion from/to UTF-X is straightforward. So I should add it to SFML.
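Because each ISO-8859-1 byte value is also its Unicode code point, the conversion to UTF-8 needs no lookup table. The following is a minimal sketch, assuming the input really is ISO-8859-1 (not Windows-1252 or ISO-8859-15); `latin1ToUtf8` is a hypothetical helper, not an existing SFML or Qt function.

```cpp
#include <string>

// Converts ISO-8859-1 bytes to UTF-8.
// Bytes below 0x80 are ASCII and are copied unchanged; bytes 0x80..0xFF
// map to code points U+0080..U+00FF, which UTF-8 encodes as two bytes.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size());

    for (char ch : latin1)
    {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80)
        {
            utf8 += static_cast<char>(c);
        }
        else
        {
            utf8 += static_cast<char>(0xC0 | (c >> 6));   // lead byte: 110000xx
            utf8 += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte: 10xxxxxx
        }
    }
    return utf8;
}
```

For example, the Latin-1 byte 0xE4 (ä) becomes the two bytes 0xC3 0xA4, while a nickname made only of ASCII letters comes out unchanged.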