
Author Topic: New unicode / string / text handling in SFML 2  (Read 19128 times)


Tank

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1486
New unicode / string / text handling in SFML 2
« Reply #15 on: February 25, 2010, 09:48:21 am »
Thanks for the link Nexus, it's an interesting article.

I didn't mean that sf::String should be a clone of std::string. I meant rather that it should cover similar functionality, which doesn't mean it has to be implemented the same way.

But you're absolutely right. I took another look at the sf::String API docs and discovered Begin() and End(). With those we can of course use a lot of STL algorithms, like you said (especially find() and search()). Is there a way to insert an iterator range (or at least single values) at an iterator? I didn't find such an algorithm and can't think of another way to do it. If it's possible, then sf::String would be almost feature-complete. ;)
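
As a rough illustration of the point about Begin() and End(), the sketch below searches a UTF-32 sequence using nothing but iterators and std::search. It is an assumption-laden stand-in: std::u32string replaces sf::String's internal buffer, and findWithIterators is a hypothetical helper name, not part of SFML.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Stand-in for the UTF-32 storage discussed in the thread (assumption: the
// real sf::String exposes a comparable iterator range via Begin()/End()).
typedef std::u32string Utf32;

// Returns the index of the first occurrence of `needle` in `haystack`,
// or Utf32::npos if there is none. Only iterators and std::search are used.
std::size_t findWithIterators(const Utf32& haystack, const Utf32& needle)
{
    Utf32::const_iterator it = std::search(haystack.begin(), haystack.end(),
                                           needle.begin(), needle.end());

    // An empty needle matches at position 0, even in an empty haystack.
    if (it == haystack.end() && !needle.empty())
        return Utf32::npos;

    return static_cast<std::size_t>(it - haystack.begin());
}
```

Searching works because it only reads the range. Inserting is a different matter, since it has to change the size of the container itself.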

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
« Reply #16 on: February 25, 2010, 09:54:53 am »
No, it's not possible. I think I'll have to add a few member functions.
Laurent Gomila - SFML developer

Tank

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1486
« Reply #17 on: February 25, 2010, 10:18:02 am »
Okay, then what about giving access to the underlying std::basic_string to the outside?

Seriously, I don't think this is a bad idea. Even when I modify the contents of the internal std::basic_string of sf::String, the state of sf::String doesn't get changed, does it? As far as I can see, sf::String is mostly for conversions.

But having a (non-const!) reference to the internal std::basic_string, we'd have all the methods available for std::basic_string and STL algorithms.

When you write new methods like Insert() and Find(), then those are nothing more than pure wrappers. I don't see a good reason to write them.

Edit:
However, being forced to write code like
Code: [Select]
mystring.GetString() += sf::String( "foo" );
is also not the best solution IMHO.

Maybe going back to the old design is a better solution. My proposal would be a
Code: [Select]
typedef std::basic_string<sf::Uint32> String; // declared inside namespace sf
and a bunch of global conversion functions (maybe implicit conversions are still possible).

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6287
  • Thor Developer
« Reply #18 on: February 25, 2010, 03:17:40 pm »
Quote from: "Tank"
Is there a way to insert an iterator range (or at least single values) at an iterator?
No, it's not possible, just as it's not possible to remove a value from an iterator range (std::remove() and std::remove_if() just reorder and copy parts of the sequence; the total number of elements remains the same). The problem is that the container has to be known to perform such operations. But an important property of an iterator is that it abstracts from containers.
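
A small sketch of that behaviour, using std::string as the container (the two helper names are made up for illustration): std::remove alone leaves the size untouched, and only the container's own erase() can actually shrink it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// std::remove only sees iterators, so it cannot shrink the container. It
// moves the kept elements to the front and returns the new logical end.
std::size_t sizeAfterRemoveOnly(std::string text, char unwanted)
{
    std::string::iterator newEnd = std::remove(text.begin(), text.end(), unwanted);
    (void)newEnd;       // deliberately unused in this demonstration
    return text.size(); // unchanged: the algorithm cannot resize the string
}

// The erase-remove idiom: the container itself drops the leftover tail.
std::string removeAndErase(std::string text, char unwanted)
{
    text.erase(std::remove(text.begin(), text.end(), unwanted), text.end());
    return text;
}
```

Insertion has the same limitation in the other direction: growing a sequence needs a member function of the container, such as insert().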

Another very interesting link to a presentation by Andrei Alexandrescu: Iterators Must Go. He describes some problems of iterators and shows a nice alternative.

Back to topic:
Quote from: "Tank"
When you write new methods like Insert() and Find(), then those are nothing more than pure wrappers. I don't see a good reason to write them.
The adapter approach has the advantage of cutting off some hardly ever used functionality like find_last_not_of() or get_allocator() and therefore keeping the interface simple. I understand that the thought that SFML has its own string class isn't extremely comfortable. But given the lack of a general-purpose Unicode string class in the standard library, this is probably not a bad decision. Note that the typedef std::basic_string<sf::Uint32> isn't really better in this respect. On the contrary: one could no longer write simple statements like
Code: [Select]
myText.SetString("hello");
since an explicit conversion function would be required. And it would be the only class in SFML using a different naming convention (under_scores). Those points might confuse several beginners. By contrast, a dedicated sf::String class is more flexible and probably easier to use.
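
To make the difference concrete, here is a minimal sketch under stated assumptions. Utf32String, Utf32Typedef, ToUtf32 and the two SetString... functions are hypothetical names, std::u32string stands in for std::basic_string<sf::Uint32>, and the conversion simply widens each byte, which is only correct for ASCII/Latin-1 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a dedicated string class (not SFML's real code).
class Utf32String
{
public:
    // Non-explicit constructor: enables implicit conversion from literals.
    Utf32String(const char* ansi)
    {
        for (; *ansi != '\0'; ++ansi)
            myData += static_cast<char32_t>(static_cast<unsigned char>(*ansi));
    }

    std::size_t GetSize() const { return myData.size(); }

private:
    std::u32string myData;
};

// With the class, a caller can pass "hello" directly.
std::size_t SetStringWithClass(const Utf32String& text) { return text.GetSize(); }

// With a plain typedef there is no constructor to hook a conversion into...
typedef std::u32string Utf32Typedef;

// ...so a free conversion function has to be called explicitly.
Utf32Typedef ToUtf32(const char* ansi)
{
    Utf32Typedef result;
    for (; *ansi != '\0'; ++ansi)
        result += static_cast<char32_t>(static_cast<unsigned char>(*ansi));
    return result;
}

std::size_t SetStringWithTypedef(const Utf32Typedef& text) { return text.size(); }
```

SetStringWithClass("hello") compiles as written, while SetStringWithTypedef("hello") does not, because nothing converts a const char* to a std::u32string; the call has to be SetStringWithTypedef(ToUtf32("hello")).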

About returning a reference to the internal basic_string: Hmm, I don't know whether it's a good idea. What would you need that for? If Laurent overloaded the Insert() functions, the class would already be quite useful in my opinion. I also think, one Find() would be appropriate (the most general one):
Code: [Select]
unsigned int sf::String::Find(const sf::String& substring, unsigned int startPosition = 0);
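
For a sense of how thin such wrappers would be, here is a hedged sketch. WrappedString, InvalidPos and GetData are hypothetical names, and std::u32string stands in for the internal UTF-32 buffer; this is not SFML's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical thin wrapper that forwards to the standard string.
class WrappedString
{
public:
    static const std::size_t InvalidPos; // returned by Find() on failure

    WrappedString(const std::u32string& utf32) : myData(utf32) {}

    // Inserts `other` before the character at `position`.
    // Like std::basic_string::insert, throws std::out_of_range if
    // `position` is greater than the current size.
    void Insert(std::size_t position, const WrappedString& other)
    {
        myData.insert(position, other.myData);
    }

    // Returns the index of the first match at or after `start`, or InvalidPos.
    std::size_t Find(const WrappedString& other, std::size_t start = 0) const
    {
        return myData.find(other.myData, start);
    }

    const std::u32string& GetData() const { return myData; }

private:
    std::u32string myData;
};

const std::size_t WrappedString::InvalidPos = std::u32string::npos;
```

Each member is a one-line forward, which keeps the public interface small while leaving the rest of basic_string hidden.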
Zloxx II: action platformer
Thor Library: particle systems, animations, dot products, ...
SFML Game Development:

Tank

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1486
« Reply #19 on: February 25, 2010, 06:52:43 pm »
Quote from: "Nexus"
The problem is that the container has to be known to perform such operations. But an important property of an iterator is that it abstracts from containers.

Exactly, that's why I suggested exposing the underlying basic_string to the outside. Not that I like this idea very much, but it would eliminate the problem.

Quote
He describes some problems of iterators and shows a nice alternative.

I've read the papers and like what's being shown. But I think it will take a very long time (if it happens at all) for it to be available in a regular C++ environment.

Quote
and therefore keeping the interface simple.

True. Personally, I think mixing method naming styles alone would be a break in the design.

Quote
I understand that the thought that SFML has its own string class isn't extremely comfortable.

I don't agree with that, since I'd be happy to have a well written string class in SFML. SFML takes care of so many details in multi-platform environments, so why not a basic one like Unicode strings? Also, SFML itself makes use of them, so it should at least bring in an external Unicode library or collection of classes, or provide its own. The latter is my favorite, because it keeps the code style consistent with the rest.


Quote
About returning a reference to the internal basic_string: Hmm, I don't know whether it's a good idea. What would you need that for?

I only suggested this because of the iterator problem and not being able to modify the string. If Laurent is leaning towards writing his own methods, that would of course be completely irrelevant, and I'd be happy to use those instead of basic_string's.

Quote
I also think, one Find() would be appropriate (the most general one)

I agree with that -- but don't forget a ReverseFind(), it's often very useful. ;)

However, Laurent, get your pants on and write those methods. ;) Seriously: If you don't have the time for it, I'd be glad to help you out on this -- following all your coding styles/guidelines. I mean it's not really hard to implement some wrapper methods for basic_string. So this could save you (and especially me ;)) some time.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
« Reply #20 on: February 25, 2010, 07:10:00 pm »
Don't worry, I have much more spare time than you think ;)

I'll write those functions as soon as I finish my current task.

Tank

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1486
« Reply #21 on: February 25, 2010, 07:15:20 pm »
Okay, thanks a lot. :)

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6287
  • Thor Developer
« Reply #22 on: February 25, 2010, 08:35:22 pm »
Quote from: "Tank"
I've read the papers and like what's being shown. But I think it will take a very long time (if it happens at all) for it to be available in a regular C++ environment.
Boost has a Range library. But regarding acceptance and adoption, I think the same. C++0x might also be a challenge for a big part of the C++ community. Probably several developers won't use the new features because they don't see any advantages or can't switch to new compilers.

Quote from: "Tank"
I don't agree with that, since I'd be happy to have a well written string class in SFML. SFML takes care of so many details in multi-platform environments, so why not a basic one like Unicode strings? Also, SFML itself makes use of them, so it should at least bring in an external Unicode library or collection of classes, or provide its own. The latter is my favorite, because it keeps the code style consistent with the rest.
Actually, I thought you would prefer a standard solution, obviously I misunderstood you. ;)

Of course, it would not be bad if the C++ standard library supported Unicode in a better way. Considering the status quo, I find sf::String a good solution, too. :)

Tank

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1486
« Reply #23 on: February 26, 2010, 09:52:14 am »
Quote from: "Nexus"
C++0x might also be a challenge for a big part of the C++ community. Probably several developers won't use the new features because they don't see any advantages or can't switch to new compilers.

Exactly. Also, there's so much library code already out there that even follows standards older than the current one. However, I'd be very happy to see the new features natively built into the compiler.

Quote
Actually, I thought you would prefer a standard solution, obviously I misunderstood you. ;)

You didn't. I was digging for quick solutions for the current situation. But as this discussion went on, I changed my mind. :)

Quote
Of course, it would not be bad if the C++ standard library supported Unicode in a better way. Considering the status quo, I find sf::String a good solution, too. :)

In a better way, or at all? ;) It's a fact that C++ is still rather low-level. Of course there are many libraries out there that try to close the gaps. But other languages are just better in certain areas, and strings (or string processing) are definitely one of them. However, it's getting better, and I'm really looking forward to the new features.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
« Reply #24 on: February 26, 2010, 10:42:03 am »
I added Insert and Find (no ReverseFind at the moment ;)), as well as constructors for converting from single characters. That makes all functions taking a String work for single characters as well, without writing 4 overloads for each.
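
A hedged sketch of why the single-character constructors save overloads (the class name Text and its members are made up here, and this is not SFML's source): one operator+= taking the class type is enough, because every character type converts to it implicitly. Characters are widened one-to-one, which is only correct for ASCII/Latin-1 input and ignores that wchar_t is UTF-16 on some platforms.

```cpp
#include <cstddef>
#include <string>

// Hypothetical string class with converting constructors.
class Text
{
public:
    Text(char ansiChar)
        : myData(1, static_cast<char32_t>(static_cast<unsigned char>(ansiChar))) {}
    Text(wchar_t wideChar) : myData(1, static_cast<char32_t>(wideChar)) {}
    Text(char32_t utf32Char) : myData(1, utf32Char) {}

    Text(const char* ansiString)
    {
        for (; *ansiString != '\0'; ++ansiString)
            myData += static_cast<char32_t>(static_cast<unsigned char>(*ansiString));
    }

    // One overload is enough: single characters and literals convert to
    // Text implicitly through the constructors above.
    Text& operator+=(const Text& right)
    {
        myData += right.myData;
        return *this;
    }

    const std::u32string& GetData() const { return myData; }

private:
    std::u32string myData;
};
```

With this, text += 'c', text += L'd', text += U'e' and text += "fg" all go through the same operator.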

I'm still not satisfied with this solution. For example, using STL generic algorithms cannot be done because none of the member functions of sf::String take iterators. Maybe I'll have to define more overloads to handle that...

I hate this class :lol:

Tank

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1486
« Reply #25 on: February 26, 2010, 01:58:40 pm »
Seriously, I love you. ;) This moves a project of mine forward a lot.

You hate it? Well I think there isn't a solution that satisfies all problems. Either go with the STL and lose consistency and the easy nature or go with the current design and write some boring overloads. ;)

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6287
  • Thor Developer
« Reply #26 on: September 15, 2010, 01:46:07 am »
In my opinion, sf::String member functions taking iterators are quite important. This includes the constructor, so that one can easily create sf::String from iterator ranges, independent of the underlying container.
Code: [Select]
std::string buffer;
sf::String subString(buffer.begin(), buffer.begin() + 5);

Substrings are easy to implement given an iterator interface. At the moment, there is no way to add characters except using Insert(), which is not helpful for substrings, because it only accepts complete sf::String objects. The workaround is rather ugly:
Code: [Select]
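// Copies `length` characters of `source`, starting at index `begin`.
sf::String SubString(const sf::String& source, unsigned int begin, unsigned int length)
{
    // Build a temporary UTF-32 buffer from the iterator range, then convert it back.
    std::basic_string<sf::Uint32> buffer(source.Begin() + begin, source.Begin() + begin + length);
    return sf::String(buffer);
}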

I personally find the single-character constructors rather questionable, but I understand they are a simple way to allow concatenations without masses of + and += overloads. However, the constructors could be extended to take a second parameter (similar to std::string, but in reverse order), allowing construction of fillers like whitespaces or dots:
Code: [Select]
sf::String(char fillCharacter, size_t length = 1);
Finally, I don't see much benefit in the coexistence of To...() methods and conversion operators. Are the two following operators really required? I wouldn't provide them just for convenience. Multiple functions doing the same thing often confuse users, as do some implicit conversions.
Code: [Select]
operator std::string()
operator std::wstring()

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
« Reply #27 on: September 15, 2010, 08:13:59 am »
Quote
In my opinion, sf::String member functions taking iterators are quite important. This includes the constructor, so that one can easily create sf::String from iterator ranges, independent of the underlying container.

I don't know if we can find the character type from an iterator; without the character type, sf::String can't deduce the source encoding and can't convert to UTF-32.

Quote
At the moment, there is no way to add characters except using Insert(), which is not helpful for substrings, because it only accepts complete sf::String objects. The workaround is rather ugly

I agree.

Quote
Finally, I don't see much benefit in the coexistence of To...() methods and conversion operators

Implicit cast operators alone are never enough. There are always situations where people want to convert explicitly, for example if they don't store the result directly in a std::string / std::wstring or whatever. And using static_cast<target_type>(string)... well, I prefer string.ToXxx(), it feels more natural :)
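
One such situation, sketched under assumptions (ConvertibleString, LengthOf and GenericLength are hypothetical names, and the class stores a plain std::string instead of UTF-32 to keep the example short): template argument deduction never applies user-defined conversions, so inside generic code the implicit operator does not help and a named function or a cast is needed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical class offering both a named and an implicit conversion.
class ConvertibleString
{
public:
    ConvertibleString(const std::string& ansi) : myData(ansi) {}

    // Explicit, named conversion.
    std::string ToAnsiString() const { return myData; }

    // Implicit conversion, implemented on top of the named one.
    operator std::string() const { return ToAnsiString(); }

private:
    std::string myData; // simplification: stored as-is, no UTF-32 conversion
};

// The implicit operator is enough when the target type is spelled out.
std::size_t LengthOf(const std::string& s) { return s.size(); }

// Here T is deduced as ConvertibleString itself, which has no size(),
// so the caller has to convert explicitly before the call.
template <typename T>
std::size_t GenericLength(const T& value) { return value.size(); }
```

LengthOf(str) compiles thanks to the implicit operator, while GenericLength(str) does not; GenericLength(str.ToAnsiString()) is the readable way out.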

Quote
Are the two following operators really required? I wouldn't provide them just for convenience. Multiple functions doing the same thing often confuse users, as do some implicit conversions.

This class is a special case. It's provided entirely for convenience, so its API is designed to make the user's life easier. There are potentially so many conversions between sf::String and other string types that I really need something that operates silently.

Silvah

  • Guest
« Reply #28 on: September 15, 2010, 09:32:07 am »
Quote from: "Laurent"

I don't know if we can find the character type from an iterator; without the character type, sf::String can't deduce the source encoding and can't convert to UTF-32.
Are you looking for std::iterator_traits<>::value_type?
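
A minimal sketch of how that trait could select the decoding step for an iterator-range constructor. FromRange and DecodeRange are hypothetical names, std::u32string stands in for the UTF-32 buffer, and the char branch just widens bytes (correct for ASCII/Latin-1 only); a real implementation would decode the actual source encoding.

```cpp
#include <iterator>
#include <string>

// Decoder for ranges of char: widens each byte one-to-one.
template <typename It>
std::u32string DecodeRange(It first, It last, char)
{
    std::u32string result;
    for (; first != last; ++first)
        result += static_cast<char32_t>(static_cast<unsigned char>(*first));
    return result;
}

// Decoder for ranges of char32_t: already UTF-32, copy as-is.
template <typename It>
std::u32string DecodeRange(It first, It last, char32_t)
{
    return std::u32string(first, last);
}

// Picks the decoder from the iterator's value_type, independent of the
// container the iterators came from. Other character types would need
// their own DecodeRange overload.
template <typename It>
std::u32string FromRange(It first, It last)
{
    typedef typename std::iterator_traits<It>::value_type CharType;
    return DecodeRange(first, last, CharType());
}
```

This works for raw pointers as well, since std::iterator_traits is specialized for them: FromRange(buffer.begin(), buffer.begin() + 5) and FromRange(raw, raw + 2) both resolve to the char decoder.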

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
« Reply #29 on: September 15, 2010, 09:39:12 am »
Quote
Are you looking for std::iterator_traits<>::value_type?

Absolutely.