Author Topic: Text encoding problem  (Read 6391 times)

xarxer

  • Newbie
  • *
  • Posts: 35
    • View Profile
Text encoding problem
« on: December 18, 2010, 03:30:19 pm »
Hello fellow programmers!  :)

I'm using SFML2

I've been extending MrX's IRC bot and I have a problem with sf::TcpSocket::Receive(), or at least I think so.

I'm reading the incoming data from the server with
Code: [Select]

sf::TcpSocket::Receive(Buffer, 0, 4096)

where Buffer is a char Buffer[4096];

I then convert all the incoming data to a std::string and then continue to process what it contains etc.

The problem lies in the encoding of the text in the string. When a name list is received from the server, I add each name to a vector of CUser objects (a custom class).
But when a username contains the letters å, ä or ö, it fails completely.

If I try to output a username containing å, ä or ö to the console, the letter shows up as � and I'm not really sure how to fix this.

Any ideas are welcome :)

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Text encoding problem
« Reply #1 on: December 18, 2010, 03:39:37 pm »
What is the encoding of the strings that you receive? You have to convert from it to the encoding which is used by whatever displays your text.
Laurent Gomila - SFML developer

xarxer

  • Newbie
  • *
  • Posts: 35
    • View Profile
Text encoding problem
« Reply #2 on: December 18, 2010, 03:55:28 pm »
Hmm, is there any way to find out, do you think?
Or would I have to ask the server operator?

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Text encoding problem
« Reply #3 on: December 18, 2010, 04:26:29 pm »
You can check the first bytes: some Unicode encodings start with a specific marker (the byte order mark, BOM). But I don't think it is used here, so you should rather ask someone who knows.

If I had to guess, I'd say it's UTF-8.
Laurent Gomila - SFML developer
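
For reference, the BOM check mentioned above could look roughly like the sketch below. detectBom is a hypothetical helper written for this example, not an SFML function, and it only recognises the standard Unicode BOMs. IRC servers normally send no BOM at all, so a result of "none" says nothing about the actual encoding.

```cpp
#include <cstddef>
#include <string>

// Returns a label for the byte order mark found at the start of `data`,
// or "none" when no known BOM is present.
// detectBom is a hypothetical helper, not part of SFML.
std::string detectBom(const unsigned char* data, std::size_t size)
{
    // UTF-32 is tested before UTF-16 because FF FE 00 00 also starts with FF FE.
    // (FF FE 00 00 could in theory be a UTF-16LE BOM followed by a NUL character.)
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "none";
}
```

It would be called on the raw receive buffer, for example detectBom(reinterpret_cast<const unsigned char*>(Buffer), length), before the bytes are copied into a std::string.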

Spodi

  • Full Member
  • ***
  • Posts: 150
    • View Profile
    • http://www.netgore.com/
Text encoding problem
« Reply #4 on: December 18, 2010, 06:05:40 pm »
If it's not UTF-8, just try decoding using a variety of formats until something works. If you know what text to expect, it makes your life very easy, since you can just run through the various encoding schemes until you find one that produces the correct output text.

http://xchat.org/encoding/

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Text encoding problem
« Reply #5 on: December 18, 2010, 06:13:13 pm »
Quote
If it's not UTF-8, just try decoding using a variety of formats until something works

It's not that easy: many characters have the same code in different encodings (especially 8-bit non-Unicode encodings; check all the Latin-1 variants). So it may work many times and then one day simply fail.
Laurent Gomila - SFML developer
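
One check that is at least partly reliable is testing whether the bytes form valid UTF-8, since a lone Latin-1 byte such as 0xE5 (å) is not a legal UTF-8 sequence. Below is a minimal sketch of such a check. isValidUtf8 is a hypothetical helper written for this example. A failed check rules UTF-8 out, but a passed check does not prove the text is UTF-8, and it cannot tell the 8-bit Latin variants apart, which is exactly the limitation described above.

```cpp
#include <cstddef>
#include <string>

// Returns true when `bytes` is structurally valid UTF-8.
// isValidUtf8 is a hypothetical helper written for this example.
bool isValidUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;       // number of continuation bytes expected
        unsigned long minValue = 0;  // smallest code point allowed for this length
        unsigned long code = 0;      // decoded code point

        if (lead < 0x80)                { extra = 0; code = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; code = lead & 0x1F; minValue = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; code = lead & 0x0F; minValue = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; code = lead & 0x07; minValue = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (bytes.size() - i - 1 < extra)
            return false;  // sequence is cut off at the end of the buffer

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;  // not a continuation byte
            code = (code << 6) | (c & 0x3F);
        }

        // Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if (code < minValue || code > 0x10FFFF || (code >= 0xD800 && code <= 0xDFFF))
            return false;

        i += extra + 1;
    }
    return true;
}
```

Note that a TCP receive can end in the middle of a multi-byte character, so a check like this is safer on complete lines than on a raw buffer.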

xarxer

  • Newbie
  • *
  • Posts: 35
    • View Profile
Text encoding problem
« Reply #6 on: December 18, 2010, 06:59:22 pm »
But the bottom line is that I should put what I receive in a string, and then change the encoding of the string?

EDIT: I think the server is sending ISO-8859-1 characters. Is it possible to convert them to UTF-8 when I receive the message?

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Text encoding problem
« Reply #7 on: December 18, 2010, 07:21:09 pm »
You can use sf::Utf8::FromAnsi, if you can manage to get a std::locale which has ISO-8859-1 encoding.

If you don't want to do it using locales, you'll probably have to use a more complete conversion library (like libiconv) since SFML only handles Unicode encodings and locales.
Laurent Gomila - SFML developer

xarxer

  • Newbie
  • *
  • Posts: 35
    • View Profile
Text encoding problem
« Reply #8 on: December 18, 2010, 08:02:29 pm »
Well it seems this will be tricky..

But if what I receive is ISO-8859-1, and my OS locale uses UTF-8, wouldn't all letters be unreadable, and not just å, ä and ö?

I can't see a simple solution to this, lol :)

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Text encoding problem
« Reply #9 on: December 18, 2010, 08:25:52 pm »
Quote
But if what I receive is ISO-8859-1, and my OS locale uses UTF-8, wouldn't all letters be unreadable, and not just å, ä and ö?

No, UTF-8 and ISO-8859-1 have a lot in common. At least the ASCII range (0 - 127).

Quote
I can't see a simple solution to this, lol

Another conversion library :P
Laurent Gomila - SFML developer
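
To illustrate the shared ASCII range mentioned above: any byte in 0 - 127 means the same character in both encodings, so only text containing bytes above 127 is affected. A minimal sketch, where isPureAscii is a hypothetical helper written for this example:

```cpp
#include <string>

// Returns true when every byte is in the ASCII range (0 - 127).
// Such text is byte-for-byte identical in ISO-8859-1 and UTF-8.
// isPureAscii is a hypothetical helper written for this example.
bool isPureAscii(const std::string& bytes)
{
    for (std::string::size_type i = 0; i < bytes.size(); ++i)
    {
        if (static_cast<unsigned char>(bytes[i]) > 0x7F)
            return false;
    }
    return true;
}
```

A nickname like "xarxer" passes this check and displays correctly either way. The letter å does not: it is the single byte 0xE5 in ISO-8859-1 but the two bytes 0xC3 0xA5 in UTF-8, which is why only those letters come out wrong.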

xarxer

  • Newbie
  • *
  • Posts: 35
    • View Profile
Text encoding problem
« Reply #10 on: December 19, 2010, 11:54:53 pm »
I did it like this and thought it would help; obviously I was wrong.  :o

Code: [Select]


void Process_string(std::string& str)
{
    QString myString = QString::fromStdString(str);

    // toUtf8() returns a new QByteArray; the result is discarded here
    myString.toUtf8();

    str = myString.toStdString();
}



Any idea why this doesn't work?

The receiving part looks somewhat like this (not complete code):

Code: [Select]


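    sf::TcpSocket Communicator;
    char Buffer[4096];
    size_t length = 0;
    std::string temp1;

    // Receive up to sizeof(Buffer) bytes; length is set to the number of bytes actually read
    Communicator.Receive(Buffer, sizeof(Buffer), length);

    // Buffer is not null-terminated, so copy exactly the bytes that were received
    temp1.assign(Buffer, length);

    Process_string(temp1);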



So I figured temp1 would now be converted to UTF-8, no?


EDIT:

The function was faulty. It now looks like this, but it still doesn't do the job:

Code: [Select]

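void Bot::Process_string(std::string& str)
{
    // Decoding depends on the Qt version: Qt 4 goes through fromAscii(), Qt 5 and later assume UTF-8
    QString myString = QString::fromStdString(str);

    // Re-encode as UTF-8 and copy the bytes back into str
    str = myString.toUtf8().constData();
}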


EDIT 2:

Completely ignore this post, I made a major blunder :)
Everything works perfectly now :)

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Text encoding problem
« Reply #11 on: December 20, 2010, 07:49:59 am »
Hmm, you're using Qt and still use sfml-network? Why don't you use QtNetwork? It would be much better integrated into your app, if it's a Qt one.

By the way, I just remembered that ISO-8859-1 is exactly the first 256 code points of the Unicode standard, which means that conversion from/to UTF-X is straightforward. So I should add it to SFML.
Laurent Gomila - SFML developer
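
Because each ISO-8859-1 byte value equals its Unicode code point, the conversion to UTF-8 needs no lookup table. The sketch below shows the idea in plain C++ without Qt or SFML. latin1ToUtf8 is a hypothetical helper written for this example, and it assumes the input really is ISO-8859-1 (not Windows-1252 or ISO-8859-15, which differ for some bytes).

```cpp
#include <string>

// Converts ISO-8859-1 bytes to UTF-8.
// latin1ToUtf8 is a hypothetical helper, not part of SFML.
std::string latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2);  // worst case: every byte becomes two

    for (std::string::size_type i = 0; i < latin1.size(); ++i)
    {
        const unsigned char c = static_cast<unsigned char>(latin1[i]);
        if (c < 0x80)
        {
            // ASCII range: identical in both encodings
            utf8 += static_cast<char>(c);
        }
        else
        {
            // Code points 0x80 - 0xFF take two bytes: 110xxxxx 10xxxxxx
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
}
```

For example, the ISO-8859-1 byte 0xE5 (å) becomes the two bytes 0xC3 0xA5, while ASCII names pass through unchanged. The function must be applied only once per string: running it on text that is already UTF-8 would double-encode the non-ASCII characters.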