SFML community forums

General => General discussions => Topic started by: Laurent on November 26, 2009, 09:16:05 am

Title: New unicode / string / text handling in SFML 2
Post by: Laurent on November 26, 2009, 09:16:05 am
Hi

I've done many changes regarding strings and text:

- The low-level Unicode handling now happens in the sf::Utf<X> classes (Utf<8>, Utf<16>, Utf<32> -- typedef'd to Utf8, Utf16, Utf32), with static functions. There are more functions, and the API is more consistent

- The sf::Unicode::Text class was replaced with sf::String. It's still implemented with UTF-32, and still automatically handles the conversions from/to ANSI and wide standard strings. It now contains more operators and functions, so that it is directly usable without having to cast it to another string type. However, I'm not a 100% fan of this design and it might change in the future. If you have interesting ideas regarding this class, feel free to share them with me :)

- The sf::String graphics class was renamed to sf::Text. GetText and SetText were renamed to GetString and SetString, respectively

Enjoy :)
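To illustrate the "one template per encoding, static functions only" layout described above, here is a minimal sketch. It is not SFML's actual code: the names UtfSketch, Decode and ToUtf32 are assumptions made up for this example, and the decoder skips overlong-sequence and surrogate checks.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the idea behind sf::Utf<N>: one template,
// specialized per encoding, exposing only static functions.
template <unsigned int N> struct UtfSketch;

template <> struct UtfSketch<8>
{
    // Decodes one code point starting at begin (precondition: begin != end)
    // and returns the advanced iterator. Malformed or truncated input
    // yields U+FFFD. Overlong forms and surrogates are NOT rejected.
    template <typename In>
    static In Decode(In begin, In end, std::uint32_t& output)
    {
        const unsigned char lead = static_cast<unsigned char>(*begin++);
        int trailing = 0;
        if      (lead < 0x80)           { output = lead;        trailing = 0; }
        else if ((lead & 0xE0) == 0xC0) { output = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { output = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { output = lead & 0x07; trailing = 3; }
        else                            { output = 0xFFFD; return begin; }

        for (; trailing > 0; --trailing)
        {
            // Every trailing byte must look like 10xxxxxx.
            if (begin == end || (static_cast<unsigned char>(*begin) & 0xC0) != 0x80)
            {
                output = 0xFFFD;
                return begin;
            }
            output = (output << 6) | (static_cast<unsigned char>(*begin++) & 0x3F);
        }
        return begin;
    }

    // Decodes a whole UTF-8 range into a sequence of UTF-32 code points.
    template <typename In>
    static std::vector<std::uint32_t> ToUtf32(In begin, In end)
    {
        std::vector<std::uint32_t> result;
        while (begin != end)
        {
            std::uint32_t codepoint = 0;
            begin = Decode(begin, end, codepoint);
            result.push_back(codepoint);
        }
        return result;
    }
};

typedef UtfSketch<8> Utf8Sketch;
```

Calling Utf8Sketch::ToUtf32(s.begin(), s.end()) on a std::string holding UTF-8 bytes gives one 32-bit element per character, which is the representation the post says sf::String uses internally.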
Title: New unicode / string / text handling in SFML 2
Post by: Dominator on November 26, 2009, 12:53:48 pm
I'm using VS2008 and trying to recompile SFML2-SVN, but the linker complains about unresolved external symbols regarding sf::String in csfml-network.

The other projects compiled fine though.
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on November 26, 2009, 01:57:54 pm
I fixed this bug this morning, but I think I forgot to commit the changes.
Title: New unicode / string / text handling in SFML 2
Post by: Tank on November 26, 2009, 08:01:31 pm
I was really looking forward to this feature. :) I'll check it out and drop you some comments then.
Title: New unicode / string / text handling in SFML 2
Post by: Tank on December 02, 2009, 01:18:32 am
Okay, I took a quick look into it. Looks good so far, I think. I'm not so happy that it couldn't be implemented by sticking with the STL, but it's nearly impossible because of the other bindings, I guess. This way it's a lot more portable.

However some methods are missing, like Insert(), SubString() etc. Also, Erase() should accept an Iterator, too (overloaded method).

I will try to use the new sf::String in one of my bigger projects that's using SFML2. But before I can actually do that, I need at least Insert(). ;)
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on December 02, 2009, 09:05:48 am
Quote
However some methods are missing, like Insert(), SubString() etc. Also, Erase() should accept an Iterator, too (overloaded method).

Absolutely ;)
Like I said, this solution is kind of temporary. I will implement all the missing functions only when I'm 100% sure that I'll keep it.
Of course I'll implement everything that is in std::string, although I don't like this solution. I guess it would have been much easier if string functions in the STL were generic algorithms operating on a pair of iterators.

Quote
but it's nearly impossible because of the other bindings, I guess

sf::String is not implemented in any binding, actually. Each language already has its own string class supporting unicode natively.
The reason is that I simply couldn't find a clean and easy way to mix the features I needed (mainly automatic conversions between types/encodings) and std::basic_string. Every solution I found involved using a more verbose syntax for every string manipulation.
Title: New unicode / string / text handling in SFML 2
Post by: panithadrum on December 02, 2009, 12:37:53 pm
So what will be the final product? Will sf::String stand, or will you finally come up with a std::string solution?
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on December 02, 2009, 12:43:21 pm
sf::String is the current final product, it is the best solution I've found so far. But I'm not really satisfied with it, so if I find a better solution it might change again.

Don't forget that you can still use std::string and std::wstring instead of sf::String for string manipulations. Or even a better external string class. You can also write your own, with the sf::Utf low-level functions.
Title: New unicode / string / text handling in SFML 2
Post by: Tank on December 02, 2009, 02:08:21 pm
Quote
I guess it would have been much easier if string functions in the STL were generic algorithms operating on a pair of iterators.

Yep, that's absolutely true.

I remember you once talked about a templated version of a character, like QChar in Qt. This way you keep the STL algorithms but can still do conversions for specified templates.

However, I think you've already thought about that. The current sf::String is fine and will work as intended. Indeed, directly using the STL would be great, but if that complicates things a lot, then just keep the current design.
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on December 02, 2009, 02:31:26 pm
Quote
I remember you once talked about a templated version of a character, like QChar in Qt. This way you keep the STL algorithms but can still do conversions for specified templates.

There are two problems with a generic Char class:
1. It cannot conveniently manipulate UTF-8 and UTF-16, where a single character can be represented by multiple elements (what would Char::ToUtf8() return?).
2. It doesn't help with the "automatic conversion" feature: I still need a class on top of standard strings for that (I don't want to use non-member functions, in order to keep conversions implicit; I don't want something like text.SetString(sf::Utf32("blah"))).
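The first point can be made concrete with a small sketch: a single code point turns into one to four UTF-8 elements, so a per-character ToUtf8() has no natural single-element return type. EncodeUtf8 is a hypothetical helper written for this illustration, not an SFML function, and it does not validate its input (surrogates and values above 0x10FFFF are not rejected).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encodes ONE code point as UTF-8.
// The result holds between 1 and 4 char elements.
std::string EncodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Here the return type had to be a whole string, which is exactly the awkwardness a generic Char class would run into.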
Title: New unicode / string / text handling in SFML 2
Post by: Tank on December 02, 2009, 02:36:57 pm
Okay I see, those are good reasons. Well, then the current implementation seems to be alright. :)
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 24, 2010, 07:10:13 pm
Are there any new thoughts on this? It would be very helpful if the sf::String class were more feature-complete. ;)
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on February 24, 2010, 07:16:01 pm
No, sorry. I'm working on something totally different now.

But if you have new ideas (or at least requests), don't hesitate to tell me ;)
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 24, 2010, 09:59:20 pm
Not really new ideas, still the old ones.

I think the current design is fine (the discussion happened already). What it lacks are a few methods. The important ones should be: Insert, Find, SubString, At...uhm, probably you better take a look at the std::string documentation. ;)
Title: New unicode / string / text handling in SFML 2
Post by: Nexus on February 24, 2010, 10:25:28 pm
Quote from: "Tank"
At...uhm, probably you better take a look at the std::string documentation. ;)
Probably it's better not to repeat the design mistake of the standard library and write 103 member functions. Especially the 24 versions of find() should be given a thought; I don't think a find_last_not_of() is necessary. Be aware that a lot of functionality can be achieved by using STL algorithms on the iterator range.

Here (http://www.gotw.ca/gotw/084.htm) is an interesting article concerning std::string's design on Guru of the Week (GotW).
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 25, 2010, 09:48:21 am
Thanks for the link Nexus, it's an interesting article.

I didn't mean that sf::String should be a clone of std::string. I rather meant that it should cover similar functionality -- which doesn't mean that it has to be implemented the same way.

But you're absolutely right. I took another look at the sf::String API docs and discovered Begin() and End(). Having those we're of course able to use a lot of STL algos, like you said (especially find() and search()). Is there a way to insert an iterator range (or at least single values) at an iterator? I didn't find such an algorithm and can't think of another way to do it. If it's possible, well then sf::String would be almost feature-complete. ;)
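As a rough sketch of what "using STL algorithms on the iterator range" can look like, the function below searches any pair of ranges with std::search. The names FindInRange and NotFound are assumptions for this example, and std::u32string (C++11) merely stands in for a UTF-32 buffer; with sf::String one would pass whatever its Begin()/End() return.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical "not found" marker, in the spirit of std::string::npos.
const std::size_t NotFound = static_cast<std::size_t>(-1);

// Returns the index of the first occurrence of [subBegin, subEnd)
// inside [begin, end), or NotFound. Works for any forward iterators,
// so no string member function is involved.
template <typename Iter>
std::size_t FindInRange(Iter begin, Iter end, Iter subBegin, Iter subEnd)
{
    if (subBegin == subEnd)
        return 0; // an empty needle matches at the start

    const Iter pos = std::search(begin, end, subBegin, subEnd);
    return pos == end ? NotFound
                      : static_cast<std::size_t>(std::distance(begin, pos));
}
```

This only covers read-only operations. Inserting or erasing still needs the container itself, which is what the rest of the discussion is about.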
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on February 25, 2010, 09:54:53 am
No it's not possible, I think I have to add a few member functions.
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 25, 2010, 10:18:02 am
Okay, then what about giving access to the underlying std::basic_string to the outside?

Seriously, I don't think this is a bad idea. Even when I modify the contents of the internal std::basic_string of sf::String, the state of sf::String doesn't get changed, does it? As far as I see, sf::String is mostly for conversions.

But having a (non-const!) reference to the internal std::basic_string, we'd have all the methods available for std::basic_string and STL algorithms.

When you write new methods like Insert() and Find(), then those are nothing more than pure wrappers. I don't see a good reason to write such.

Edit:
However, being forced to write code like
Code:
mystring.GetString() += sf::String( "foo" );
is also not the best solution IMHO.

Maybe going back to the old design is a better solution. My proposal would be a
Code:
namespace sf { typedef std::basic_string<Uint32> String; }
and a bunch of global conversion functions (maybe implicit conversions are still possible).
Title: New unicode / string / text handling in SFML 2
Post by: Nexus on February 25, 2010, 03:17:40 pm
Quote from: "Tank"
Is there a way to insert an iterator range (or at least single values) at an iterator?
No, it's not possible, just as it's not possible to remove a value from an iterator range (std::remove() and std::remove_if() just reorder and copy parts of the sequence; the total number of elements remains the same). The problem is, the container has to be known to perform such operations. But an important property of an iterator is, that it abstracts from containers.
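A short sketch of that point, using std::string as the container (the function names are made up for this example): std::remove alone leaves the size untouched, and both shrinking and growing go through something that knows the container, either a member function or an insert iterator bound to it.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// std::remove only moves the kept elements to the front and returns the
// new logical end; text.size() is unchanged afterwards.
std::string::iterator ShiftOutSpaces(std::string& text)
{
    return std::remove(text.begin(), text.end(), ' ');
}

// Actually shrinking the string needs the container's own erase().
void EraseSpaces(std::string& text)
{
    text.erase(std::remove(text.begin(), text.end(), ' '), text.end());
}

// Inserting a range needs the container too: std::inserter wraps a
// reference to it and calls text.insert() for every copied element.
template <typename Iter>
void InsertRange(std::string& text, std::size_t index, Iter first, Iter last)
{
    std::copy(first, last,
              std::inserter(text, text.begin() + static_cast<std::string::difference_type>(index)));
}
```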

Another very interesting link to a presentation of Andrei Alexandrescu: Iterators Must Go (http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/iterators-must-go.pdf). He describes some problems of iterators and shows a nice alternative.

Back to topic:
Quote from: "Tank"
When you write new methods like Insert() and Find(), then those are nothing more than pure wrappers. I don't see a good reason to write such.
The adapter approach has the advantage of cutting off some hardly ever used functionality like find_last_not_of() or get_allocator() and therefore keeping the interface simple. I understand that the thought that SFML has its own string class isn't extremely comfortable. But due to the lack of a general-purpose Unicode string class in the standard library, this is probably not a bad decision. Note that the typedef std::basic_string<sf::Uint32> isn't really better in this respect. Quite the opposite: one can no longer write simple statements like
Code:
myText.SetString("hello");
since an explicit conversion function would be required. It would also be the only class in SFML using a different naming convention (under_scores). Those points might confuse beginners. By contrast, a dedicated sf::String class is more flexible and probably easier to use.
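The difference can be sketched with a stripped-down wrapper. Everything below is hypothetical (StringSketch, TextSketch and TypedefString are names invented for this example, std::u32string stands in for the UTF-32 storage, and the ANSI conversion is simplified to a byte-per-character widening that ignores the locale):

```cpp
#include <string>
#include <type_traits>

// Wrapper approach: non-explicit constructors give implicit conversions.
class StringSketch
{
public:
    StringSketch(const char* ansi)
    {
        for (; *ansi; ++ansi)
            myData += static_cast<char32_t>(static_cast<unsigned char>(*ansi));
    }
    StringSketch(const std::string& ansi)
    {
        for (std::string::size_type i = 0; i < ansi.size(); ++i)
            myData += static_cast<char32_t>(static_cast<unsigned char>(ansi[i]));
    }
    StringSketch(const std::u32string& utf32) : myData(utf32) {}

    const std::u32string& GetData() const { return myData; }

private:
    std::u32string myData;
};

// A consumer that only ever sees the wrapper type.
class TextSketch
{
public:
    void SetString(const StringSketch& string) { myString = string.GetData(); }
    const std::u32string& GetString() const { return myString; }

private:
    std::u32string myString;
};

// Typedef approach: the "string class" is just the standard container.
typedef std::u32string TypedefString;

// A narrow literal converts to the wrapper, but not to the typedef'd type,
// so SetString("hello") would need an explicit conversion call there.
static_assert(std::is_convertible<const char*, StringSketch>::value,
              "wrapper accepts narrow literals implicitly");
static_assert(!std::is_convertible<const char*, TypedefString>::value,
              "typedef'd basic_string does not");
```

With the wrapper, text.SetString("hello") and text.SetString(std::string("hi")) both compile without any extra syntax at the call site.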

About returning a reference to the internal basic_string: Hmm, I don't know whether it's a good idea. What would you need that for? If Laurent overloaded the Insert() functions, the class would already be quite useful in my opinion. I also think, one Find() would be appropriate (the most general one):
Code:
unsigned int sf::String::Find(const sf::String& substring, unsigned int startPosition = 0);
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 25, 2010, 06:52:43 pm
Quote from: "Nexus"
The problem is, the container has to be known to perform such operations. But an important property of an iterator is, that it abstracts from containers.

Exactly, that's why I jumped in with opening the underlying basic_string for the outside. Not that I like this idea very much, but it would eliminate the problem.

Quote
He describes some problems of iterators and shows a nice alternative.

I've read the papers and like what's being shown. But I think it will take a very long time (if it happens at all) for it to be available in a regular C++ environment.

Quote
and therefore keeping the interface simple.

True. Personally I think mixing method names alone would be a design break.

Quote
I understand that the thought that SFML has its own string class isn't extremely comfortable.

I don't agree with that, since I'd be happy to have a well written string class in SFML. SFML takes care of so many facts in multi-platform environments, so why not a basic one like Unicode strings? Also, SFML itself makes use of them, so it should at least bring in an external Unicode library or classes collection or provide its own. The latter is my favorite, because it keeps the code style consistent with the rest.


Quote
About returning a reference to the internal basic_string: Hmm, I don't know whether it's a good idea. What would you need that for?

I only proposed this with regard to the iterator problem and not being able to modify the string. If Laurent is leaning towards writing his own methods, that'd of course be completely irrelevant and I'd be happy to use those instead of basic_string's.

Quote
I also think, one Find() would be appropriate (the most general one)

I agree with that -- but don't forget a ReverseFind(), it's often very useful. ;)

However, Laurent, get your pants on and write those methods. ;) Seriously: If you don't have the time for it, I'd be glad to help you out on this -- following all your coding styles/guidelines. I mean it's not really hard to implement some wrapper methods for basic_string. So this could save you (and especially me ;)) some time.
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on February 25, 2010, 07:10:00 pm
Don't worry, I have much more spare time than you think ;)

I'll write those functions as soon as I finish my current task.
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 25, 2010, 07:15:20 pm
Okay, thanks a lot. :)
Title: New unicode / string / text handling in SFML 2
Post by: Nexus on February 25, 2010, 08:35:22 pm
Quote from: "Tank"
I've read the papers and like what's being shown. But I think it will take a very long time (if it happens at all) for it to be available in a regular C++ environment.
Boost has a Range library. But regarding acceptance and adoption, I think similarly. C++0x might also be a challenge for a big part of the C++ community. Probably several developers won't use the new features because they don't see any advantages or can't switch to new compilers.

Quote from: "Tank"
I don't agree with that, since I'd be happy to have a well written string class in SFML. SFML takes care of so many facts in multi-platform environments, so why not a basic one like Unicode strings? Also, SFML itself makes use of them, so it should at least bring in an external Unicode library or classes collection or provide its own. The latter is my favorite, because it keeps the code style consistent with the rest.
Actually, I thought you would prefer a standard solution, obviously I misunderstood you. ;)

Of course, it would not be bad if the C++ standard library supported Unicode in a better way. Considering the status quo, I find sf::String a good solution, too. :)
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 26, 2010, 09:52:14 am
Quote from: "Nexus"
C++0x might also be a challenge for a big part of the C++ community. Probably several developers won't use the new features because they don't see any advantages or can't switch to new compilers.

Exactly. Also, there's so much library code already out there that is even written against standards older than the current one. However, I'd be very happy to see the new features natively built into the compiler.

Quote
Actually, I thought you would prefer a standard solution, obviously I misunderstood you. ;)

You didn't. I was digging for quick solutions for the current situation. But as this discussion went on, I changed my mind. :)

Quote
Of course, it would not be bad if the C++ standard library supported Unicode in a better way. Considering the status quo, I find sf::String a good solution, too. :)

In a better way, or at all? ;) It's a fact that C++ is still rather low-level. Of course there are many libraries out there which try to close the gaps. But other languages are just better in certain areas, and string processing is definitely one of them. However, it's getting better, and I'm really looking forward to the new features.
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on February 26, 2010, 10:42:03 am
I added Insert and Find (no ReverseFind at the moment ;)), as well as constructors for converting from single characters. That makes all functions taking a String work for single characters as well, without writing 4 overloads for each.

I'm still not satisfied with this solution. For example, using STL generic algorithms cannot be done because none of the member functions of sf::String take iterators. Maybe I'll have to define more overloads to handle that...

I hate this class :lol:
Title: New unicode / string / text handling in SFML 2
Post by: Tank on February 26, 2010, 01:58:40 pm
Seriously, I love you. ;) This brings a project of mine a lot forward.

You hate it? Well I think there isn't a solution that satisfies all problems. Either go with the STL and lose consistency and the easy nature or go with the current design and write some boring overloads. ;)
Title: New unicode / string / text handling in SFML 2
Post by: Nexus on September 15, 2010, 01:46:07 am
In my opinion, sf::String member functions taking iterators are quite important. This includes the constructor, so that one can easily create sf::String from iterator ranges, independent of the underlying container.
Code:
std::string buffer;
sf::String subString(buffer.begin(), buffer.begin() + 5);

Substrings are easy to implement provided the iterator interface. At the moment, there is no way to add characters except using Insert(), which is not helpful for substrings, because it only accepts complete sf::String objects. The workaround is rather ugly:
Code:
sf::String SubString(const sf::String& source, unsigned int begin, unsigned int length)
{
    std::basic_string<sf::Uint32> buffer(source.Begin() + begin, source.Begin() + begin + length);
    return sf::String(buffer);
}

I personally find the single-character constructors rather questionable, but I understand they are a simple way to allow concatenations without masses of + and += overloads. However, the constructors could be extended to take a second parameter (similar to std::string, but in reverse order), allowing construction of fillers like whitespaces or dots:
Code:
sf::String(char fillCharacter, size_t length = 1);
Finally, I don't see much benefit in the coexistence of To...() methods and conversion operators. Are the two following operators really required? I wouldn't provide them just for convenience. Multiple functions doing the same thing often confuse users, as do some implicit conversions.
Code:
operator std::string() const;
operator std::wstring() const;
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on September 15, 2010, 08:13:59 am
Quote
In my opinion, sf::String member functions taking iterators are quite important. This includes the constructor, so that one can easily create sf::String from iterator ranges, independent of the underlying container.

I don't know if we can find the character type from an iterator; without the character type, sf::String can't deduce the source encoding and can't convert to UTF-32.

Quote
At the moment, there is no way to add characters except using Insert(), which is not helpful for substrings, because it only accepts complete sf::String objects. The workaround is rather ugly

I agree.

Quote
Finally, I don't see much benefit in the coexistence of To...() methods and conversion operators

Implicit cast operators alone are never enough. There are always situations where people want to convert explicitly, for example if they don't store the result directly into a std::string / std::wstring or whatever. And using static_cast<target_type>(string)... well, I prefer string.ToXxx(), it feels more natural :)

Quote
Are the two following operators really required? I wouldn't provide them just for convenience. Multiple functions doing the same thing often confuse users, as do some implicit conversions.

This class is particular. It's entirely provided for convenience, so its API is designed to make the user's life easier. There are potentially so many conversions between sf::String and other string types, that I really need something that operates silently.
Title: New unicode / string / text handling in SFML 2
Post by: Silvah on September 15, 2010, 09:32:07 am
Quote from: "Laurent"

I don't know if we can find the character type from an iterator; without the character type, sf::String can't deduce the source encoding and can't convert to UTF-32.
Are you looking for std::iterator_traits<>::value_type?
Title: New unicode / string / text handling in SFML 2
Post by: Laurent on September 15, 2010, 09:39:12 am
Quote
Are you looking for std::iterator_traits<>::value_type?

Absolutely.
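A minimal sketch of how the element type of an iterator pair could select the source encoding. The enum and function names are assumptions for this example, and mapping char to "ANSI" and wchar_t to "wide" simply mirrors the conversions mentioned in the first post; a real implementation would go on to call the matching conversion routine.

```cpp
#include <iterator>
#include <string>

// Hypothetical tags for the source encodings discussed in this thread.
enum SourceEncoding { AnsiEncoding, WideEncoding, Utf32Encoding };

// One overload per character type; overload resolution picks the encoding.
inline SourceEncoding EncodingOf(char)     { return AnsiEncoding; }
inline SourceEncoding EncodingOf(wchar_t)  { return WideEncoding; }
inline SourceEncoding EncodingOf(char32_t) { return Utf32Encoding; }

// Deduces the encoding from the iterators alone, whatever the container.
// std::iterator_traits also handles raw pointers such as const char*.
template <typename Iter>
SourceEncoding DeduceEncoding(Iter, Iter)
{
    typedef typename std::iterator_traits<Iter>::value_type CharType;
    return EncodingOf(CharType());
}
```

A range constructor along the lines of sf::String(begin, end) could dispatch on this result, so the caller would not have to name the character type.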
Title: New unicode / string / text handling in SFML 2
Post by: Nexus on September 15, 2010, 10:33:16 am
Quote from: "Laurent"
Implicit cast operators alone are never enough.
I rather meant the conversion functions would be enough. :P

I just think the need to provide explicit functions in addition to the conversion operators is a hint that the conversion can be ambiguous and that the operators are not appropriate. But okay, let's make an exception for the often used string conversions. I'll probably benefit from them, too :)

(By the way, you can also call them explicitly. Very beautiful ;))
Code:
myString.operator std::string()
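For comparison, here are the different spellings of the same conversion side by side, on a toy class. ConvertibleSketch and ToAnsiString are hypothetical names standing in for the To...() methods mentioned above; this is only a sketch of the coexistence being debated, not sf::String itself.

```cpp
#include <string>

// Toy class offering both a named conversion and a conversion operator.
class ConvertibleSketch
{
public:
    explicit ConvertibleSketch(const std::string& text) : myText(text) {}

    // Named, explicit conversion (the To...() style).
    std::string ToAnsiString() const { return myText; }

    // Implicit conversion operator, forwarding to the named function.
    operator std::string() const { return ToAnsiString(); }

private:
    std::string myText;
};

// 1. Implicit: the operator is chosen silently by the compiler.
inline std::string ViaImplicit(const ConvertibleSketch& s) { return s; }

// 2. Explicit through a cast.
inline std::string ViaCast(const ConvertibleSketch& s) { return static_cast<std::string>(s); }

// 3. Explicit through the named function.
inline std::string ViaNamed(const ConvertibleSketch& s) { return s.ToAnsiString(); }

// 4. Explicit call of the operator itself, as in the line above.
inline std::string ViaOperatorCall(const ConvertibleSketch& s) { return s.operator std::string(); }
```

All four produce the same string; the disagreement in the thread is only about how many of these spellings an API should offer.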