
Author Topic: UTF-8 and stuff  (Read 6791 times)


Danetta

  • Newbie
  • *
  • Posts: 11
    • View Profile
UTF-8 and stuff
« on: December 15, 2016, 03:52:07 am »
Hello.

EDIT: It seems like I resolved almost all the issues.

EDIT2: How many bytes could I possibly need when converting from sf::String to std::string or std::wstring?
I mean the case where I need to resize empty strings so that the converted one fits. Probably an easy question, but I can't find the answer on Google.
« Last Edit: December 15, 2016, 04:49:36 am by Danetta »

eXpl0it3r

  • SFML Team
  • Hero Member
  • *****
  • Posts: 11033
    • View Profile
    • development blog
    • Email
UTF-8 and stuff
« Reply #1 on: December 15, 2016, 11:06:22 am »
Just use the conversion functions that SFML provides; see the documentation.
Official FAQ: https://www.sfml-dev.org/faq.php
Official Discord Server: https://discord.gg/nr4X7Fh
——————————————————————
Dev Blog: https://duerrenberger.dev/blog/

Danetta

  • Newbie
  • *
  • Posts: 11
    • View Profile
Re: UTF-8 and stuff
« Reply #2 on: December 15, 2016, 04:01:51 pm »
Almost all of them take three iterators: the beginning of the original string, the end of the original string, and the beginning of the output string.
The output string is empty by default, and most conversion functions then fail with
string iterator + offset out of range
unless I set the size of the output string manually, like this:

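#include <SFML/System.hpp>
#include <locale>
#include <string>

std::string ANSI_to_UTF8(const sf::String& original)
{
        // Pre-size the output: UTF-8 needs at most 4 bytes per character.
        std::string utf8;
        utf8.resize(original.getSize() * 4);

        // fromAnsi returns an iterator just past the last byte written.
        std::string::iterator last = sf::Utf<8>::fromAnsi(original.begin(), original.end(), utf8.begin(), std::locale("Russian"));
        utf8.resize(last - utf8.begin()); // trim the unused tail
        return utf8;
}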
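
A note on the factor of 4 used above: UTF-8 encodes one Unicode code point in 1 to 4 bytes, so 4 bytes per character is a safe upper bound when sizing the output by hand. Below is a minimal sketch of that arithmetic using only the standard library. The names utf8ByteCount and utf8Length are hypothetical helpers made up for illustration; they are not SFML functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes UTF-8 uses for one Unicode code point (1 to 4).
// Hypothetical helper for illustration; not part of SFML.
inline std::size_t utf8ByteCount(char32_t codepoint)
{
    assert(codepoint <= 0x10FFFF); // largest valid Unicode code point
    if (codepoint < 0x80)    return 1; // ASCII
    if (codepoint < 0x800)   return 2; // e.g. Cyrillic letters
    if (codepoint < 0x10000) return 3; // rest of the Basic Multilingual Plane
    return 4;                          // supplementary planes
}

// Exact UTF-8 size of a UTF-32 string; never more than 4 * text.size().
inline std::size_t utf8Length(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t codepoint : text)
        total += utf8ByteCount(codepoint);
    return total;
}
```

For text that is mostly Cyrillic, the exact size is usually about 2 bytes per character, so the 4x buffer is generous but never too small.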

eXpl0it3r

  • SFML Team
  • Hero Member
  • *****
  • Posts: 11033
    • View Profile
    • development blog
    • Email
Re: UTF-8 and stuff
« Reply #3 on: December 15, 2016, 05:35:58 pm »
Maybe describe your problem in detail, because I don't really understand the issue. Or maybe you just misunderstood the API?

If you have a sf::String that is UTF and want to convert to an ANSI string, all you have to do is call toAnsiString() (and toWideString() for wstring).

If you want to do it manually, you could also take a look at how SFML does it internally for toAnsi.

dabbertorres

  • Hero Member
  • *****
  • Posts: 505
    • View Profile
    • website/blog
Re: UTF-8 and stuff
« Reply #4 on: December 15, 2016, 05:50:09 pm »
Quote
string iterator + offset out of range
if I do not set the size of output string manually like that
You want std::back_inserter
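
To illustrate the idea: std::back_inserter wraps a container in an output iterator that calls push_back on every write, so the destination can start empty and grows as needed. The sketch below uses only the standard library. encodeUtf8 is a hypothetical stand-in with the same "begin, end, output" shape as the sf::Utf<8> functions, not SFML's implementation, and utf32ToUtf8 is likewise an invented name.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Encode UTF-32 code points as UTF-8 and write the bytes through `output`.
// Hypothetical stand-in for the sf::Utf<8> conversion functions.
template <typename In, typename Out>
Out encodeUtf8(In begin, In end, Out output)
{
    for (; begin != end; ++begin)
    {
        const char32_t cp = static_cast<char32_t>(*begin);
        assert(cp <= 0x10FFFF); // must be inside the Unicode range
        if (cp < 0x80)
        {
            *output++ = static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            *output++ = static_cast<char>(0xC0 | (cp >> 6));
            *output++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            *output++ = static_cast<char>(0xE0 | (cp >> 12));
            *output++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *output++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            *output++ = static_cast<char>(0xF0 | (cp >> 18));
            *output++ = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            *output++ = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            *output++ = static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return output;
}

// The destination starts empty; std::back_inserter appends each byte with
// push_back, so no manual resize is needed before or after the call.
inline std::string utf32ToUtf8(const std::u32string& text)
{
    std::string utf8;
    encodeUtf8(text.begin(), text.end(), std::back_inserter(utf8));
    return utf8;
}
```

With SFML itself, the same pattern should amount to passing std::back_inserter(output) as the third argument of the sf::Utf<8> function instead of output.begin().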

Danetta

  • Newbie
  • *
  • Posts: 11
    • View Profile
Re: UTF-8 and stuff
« Reply #5 on: December 15, 2016, 08:20:16 pm »
Quote
Maybe describe your problem in detail, because I don't really understand the issue. Or maybe you just misunderstood the API?

If you have a sf::String that is UTF and want to convert to an ANSI string, all you have to do is call toAnsiString() (and toWideString() for wstring).

If you want to do it manually, you could also take a look at how SFML does it internally for toAnsi.
What if my sf::String is widestring and I want to convert it to UTF-8?


Quote
string iterator + offset out of range
if I do not set the size of output string manually like that
You want std::back_inserter
That seems right; I will try to figure it out.
« Last Edit: December 15, 2016, 08:22:53 pm by Danetta »

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Re: UTF-8 and stuff
« Reply #6 on: December 15, 2016, 08:28:42 pm »
Quote
What if my sf::String is widestring and I want to convert it to UTF-8?
sf::String::toUtf8(). And you don't have to care about what is stored internally in sf::String (it's UTF-32, by the way).

You still have said nothing about your real problem. This conversation can last for days if you don't tell us what you really want to do.
« Last Edit: December 15, 2016, 08:31:01 pm by Laurent »
Laurent Gomila - SFML developer

Danetta

  • Newbie
  • *
  • Posts: 11
    • View Profile
Re: UTF-8 and stuff
« Reply #7 on: December 15, 2016, 08:39:59 pm »
I find sf::String::toUtf8 too hard to use because its output is std::basic_string, while sf::Utf<8> allows me to get the result almost directly in a single string. I tried it and got what I need, but with some problems, such as having to resize the declared strings manually so the converted ones would fit into them.

Answering your question about what I am trying to do:
The client terminal displays strings in CP1251, the client GUI (labels and edit boxes) uses wide strings (it seems?), and the server processes database requests using UTF-8 strings as arguments.

So, when I receive a wide string from the GUI, I need to handle all the encoding and decoding from client to server and back. (Or is it UTF-32? I am still not sure. A wide string works well when I pass it as a parameter, though I would rather not; the TGUI documentation only says it's "sf::String". Also, UTF-8 doesn't display well with TGUI unless I convert it to a wide string.)


Edited.
« Last Edit: December 15, 2016, 09:09:57 pm by Danetta »

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Re: UTF-8 and stuff
« Reply #8 on: December 15, 2016, 09:07:49 pm »
Quote
I find sf::String::toUtf8 too hard to use because output is std::basic_string
?
std::string is just a typedef of std::basic_string<char>, so what you get from this function is basically a std::string with something other than char inside.
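
If a plain std::string is needed, the bytes can be copied across with the iterator-range constructor. A minimal sketch, assuming that in SFML 2.x toUtf8() returns std::basic_string<sf::Uint8>. To stay free of SFML headers it uses a std::vector<unsigned char> as a stand-in for that result; bytesToStdString and sampleUtf8Bytes are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Copy UTF-8 code units stored as unsigned bytes into a std::string.
// Works for any container exposing begin()/end(); with SFML the argument
// would be the result of sf::String::toUtf8() (assumption, see above).
template <typename ByteContainer>
std::string bytesToStdString(const ByteContainer& bytes)
{
    // The iterator-range constructor converts each element to char.
    std::string result(bytes.begin(), bytes.end());
    assert(result.size() == bytes.size()); // one char per byte, nothing re-encoded
    return result;
}

// Stand-in for a toUtf8() result so the sketch needs no SFML headers.
inline std::vector<unsigned char> sampleUtf8Bytes()
{
    return {0xD0, 0x96, 0x21}; // U+0416 followed by '!' in UTF-8
}
```

No re-encoding happens here; only the element type changes, so the std::string still holds UTF-8.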

Quote
I tried it and got what I need, but with some problems like having to resize declared strings manually so converted ones would fit into them.
As already said, std::back_inserter is the solution. Look at sf::Utf source code for examples.

Danetta

  • Newbie
  • *
  • Posts: 11
    • View Profile
Re: UTF-8 and stuff
« Reply #9 on: December 16, 2016, 03:20:59 am »
std::string String::toAnsiString(const std::locale& locale) const
{
    // Prepare the output string
    std::string output;
    output.reserve(m_string.length() + 1);

    // Convert
    Utf32::toAnsi(m_string.begin(), m_string.end(), std::back_inserter(output), 0, locale);

    return output;
}

Why would you use string::reserve in this case? Isn't it unnecessary when you use std::back_inserter?

eXpl0it3r

  • SFML Team
  • Hero Member
  • *****
  • Posts: 11033
    • View Profile
    • development blog
    • Email
UTF-8 and stuff
« Reply #10 on: December 16, 2016, 03:28:02 am »
I'm not certain that this is the primary reason, but by reserving the needed memory up front, you can do a single memory allocation, as opposed to the string incrementally allocating more and more space.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
Re: UTF-8 and stuff
« Reply #11 on: December 16, 2016, 08:40:22 am »
Yes, it's just an optimization to avoid many small memory re-allocations. You can remove this line, the function will still work as expected.
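
The effect can be observed with a small standard-library experiment. This is only a sketch: countReallocations is a hypothetical measuring helper, and it treats every change in capacity() as one reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Append `count` characters through std::back_inserter and count how often
// the string's capacity changes, i.e. how often it had to reallocate.
// Hypothetical measuring helper; `useReserve` toggles the optimization.
inline std::size_t countReallocations(std::size_t count, bool useReserve)
{
    std::string output;
    if (useReserve)
        output.reserve(count);

    std::size_t reallocations = 0;
    std::size_t capacity = output.capacity();
    auto inserter = std::back_inserter(output);
    for (std::size_t i = 0; i < count; ++i)
    {
        *inserter++ = 'x';
        if (output.capacity() != capacity)
        {
            capacity = output.capacity();
            ++reallocations;
        }
    }
    assert(output.size() == count); // the result is the same either way
    return reallocations;
}
```

With reserve the count stays at zero for the whole loop; without it the string grows several times, and the exact number depends on the standard library's growth policy.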