SFML community forums
General => Feature requests => Topic started by: Haze on November 16, 2011, 01:58:04 am
-
SFML 2.0 introduces sf::String, which is really convenient for handling strings.
Since this class already has string manipulation methods (Erase, Insert, Find and iterators), it would be very useful if sf::String provided more of them.
I am thinking about basic operations such as:
- Removing leading and trailing whitespace:
sf::String Trim() const;
And why not LTrim (leading only) and RTrim (trailing only)
- Extracting a sub-string:
sf::String Substr(size_t Index, size_t Length) const;
- Replacing occurrences:
sf::String Replace(const sf::String& LookFor, const sf::String& ReplaceBy) const;
- Converting to lowercase:
sf::String ToLowerCase() const;
- Converting to uppercase:
sf::String ToUpperCase() const;
-
First, I have to say that I hate sf::String, and I don't think that I'll make it a super powerful string class.
sf::String Trim() const;
And why not LTrim (leading only) and RTrim (trailing only)
What can be considered a whitespace? In the Unicode world, there are probably more than the usual {tab, space, line feed} set.
- Extracting a sub-string:
sf::String Substr(size_t Index, size_t Length) const;
It is already pending in the task tracker.
- Replacing occurrences:
sf::String Replace(const sf::String& LookFor, const sf::String& ReplaceBy) const;
Why not.
- Converting to lowercase:
sf::String ToLowerCase() const;
- Converting to uppercase:
sf::String ToUpperCase() const;
Definitely not. These operations are super complicated when dealing with the full Unicode range.
In conclusion, I may add more functions in the future, but none that require interpreting the characters.
You can search for the task that already exists in the issue tracker, and add your suggestions to it (only Replace please, so that I won't have to duplicate my answer there :D).
Thanks for your feedback.
-
What can be considered a whitespace? In the Unicode world, there are probably more than the usual {tab, space, line feed} set.
When I made this for libMy, I simply copied the default arguments of the PHP trim function, found here: http://se2.php.net/manual/en/function.trim.php
Also, I made it a single function, by replacing LTrim and RTrim with an extra optional argument that could be either my::Both, my::Left or my::Right (defaulting to both).
Why not.
If you do this, you might also want to add a "recursive" bool option, to decide whether to replace only once or to keep replacing until the needle isn't found anymore. This has proven very useful for me when I wrote and used my own functions for this.
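A minimal sketch of what such a "recursive" option could look like, written against std::u32string rather than sf::String. The name replaceAll and the maxPasses parameter are assumptions made for illustration; they are not part of SFML or libMy. The pass limit is there because repeated replacement is not guaranteed to terminate for every needle/replacement pair.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not an SFML function. Replaces each occurrence of
// `needle` with `replacement`. With `recursive` set, passes are repeated until
// nothing changes or `maxPasses` is reached.
std::u32string replaceAll(std::u32string text,
                          const std::u32string& needle,
                          const std::u32string& replacement,
                          bool recursive = false,
                          std::size_t maxPasses = 16)
{
    assert(!needle.empty()); // an empty needle would match at every position

    // If the replacement contains the needle, repeated passes would never run
    // out of matches, so fall back to a single pass.
    if (replacement.find(needle) != std::u32string::npos)
        recursive = false;

    const std::size_t passes = recursive ? maxPasses : 1;
    for (std::size_t pass = 0; pass < passes; ++pass)
    {
        bool changed = false;
        std::size_t pos = 0;
        while ((pos = text.find(needle, pos)) != std::u32string::npos)
        {
            text.replace(pos, needle.size(), replacement);
            pos += replacement.size(); // resume after the inserted text
            changed = true;
        }
        if (!changed)
            break;
    }
    return text;
}
```

With four spaces between "a" and "b", replacing two spaces by one leaves two spaces after a single pass, and one space when recursive is true.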
-
The definition of "whitespace" is not clear at all; for example, the '\n' character may or may not be included in the definition.
http://en.wikipedia.org/wiki/Whitespace_character
If we extend it to the full Unicode range, it gets a lot more complicated.
http://en.wikipedia.org/wiki/Space_(punctuation)#Spaces_in_Unicode
(sorry about the last link, I can't get it right with BBCode)
-
First, I have to say that I hate sf::String, and I don't think that I'll make it a super powerful string class.
Sure, I bet you won't turn it into something like QString.
But std::string objects are so limited...
What can be considered a whitespace? In the Unicode world, there are probably more than the usual {tab, space, line feed} set.
I understand your concern when dealing with Unicode, that's why I suggest we keep it fast & simple (:wink:) and just use the good old standard isspace function:
int isspace ( int c ); (http://www.cplusplus.com/reference/clibrary/cctype/isspace/)
Let's face it, that would cover most of the cases people care about.
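A rough sketch of that isspace-based approach on UTF-32 data, using std::u32string as a stand-in for sf::String. The names isAsciiSpace and trim are assumptions for illustration. One detail worth noting: std::isspace has undefined behaviour for values that do not fit in an unsigned char, so 32-bit code points have to be range-checked before the call.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: applies std::isspace to ASCII code points only.
bool isAsciiSpace(char32_t c)
{
    // std::isspace is undefined for values that do not fit in an unsigned
    // char (other than EOF), so larger code points are rejected first.
    return c <= 0x7F && std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical Trim working on UTF-32 code points.
std::u32string trim(const std::u32string& text)
{
    std::size_t first = 0;
    std::size_t last = text.size();
    while (first < last && isAsciiSpace(text[first]))
        ++first;
    while (last > first && isAsciiSpace(text[last - 1]))
        --last;
    return text.substr(first, last - first);
}

// Usage sketch.
void trimExample()
{
    assert(trim(U" \t hello world\n") == U"hello world");
    // U+00A0 (no-break space) is outside ASCII, so it is left in place.
    assert(trim(U"\u00A0x ") == U"\u00A0x");
}
```

This covers the usual ASCII cases only; the other Unicode spaces are deliberately ignored.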
You can search for the task that already exists in the issue tracker, and add your suggestions to it (only Replace please, so that I won't have to duplicate my answer there :D).
Done. Sorry for not notifying the github issue (https://github.com/SFML/SFML/issues/21) first!
-
Sure, I bet you won't turn it into something like QString.
But std::string objects are so limited...
My main concern in SFML was initially to allow easy conversions between encodings and character types. That's all I needed. But then the only solution that came up was to rewrite a string class, and thus I had to duplicate a lot of functionality -- that's why I don't like sf::String.
I'd be a lot happier if people could just use their preferred string class and use SFML only to convert from/to the desired encoding. So no, I won't make something better than std::string.
just use the good old standard isspace function
A trim function would already be a step in the direction of QString. If I implement this function, why not implement truncate, chop, left, mid, right, justified, ...? :P
-
I get your point; I guess it's all about finding the right balance.
Thanks for your time by the way, I always enjoy reading those brainstorming discussions.
-
But std::string objects are so limited...
std::string's interface is already heavily bloated. It is even a classic example of a monolith class, see here (http://www.gotw.ca/gotw/084.htm). I don't think it would be clever to add even more member functions.
If the functionality were added in the form of free, generic functions operating on iterator ranges, they could also be applied to containers other than std::string. You should take a look at the Boost.StringAlgorithm (http://www.boost.org/doc/libs/1_48_0/doc/html/string_algo.html) library. If you make sf::String accessible via Boost.Range, you can apply any Boost algorithm to it, just like to std::string. That's generic programming :)
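To illustrate the idea without pulling in Boost, here is a minimal sketch of a free, generic trim built only on the standard library. The name trimCopy is hypothetical and this is not how Boost.StringAlgorithm is implemented; it only shows that an algorithm written against iterators works on any container with bidirectional iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical generic algorithm: returns a copy of `input` without the
// leading and trailing elements for which `isSpace` returns true.
template <typename Container, typename Predicate>
Container trimCopy(const Container& input, Predicate isSpace)
{
    using Iterator = decltype(std::begin(input));
    using Reverse = std::reverse_iterator<Iterator>;

    // First element from the front that is not "space".
    const Iterator first = std::find_if_not(std::begin(input), std::end(input), isSpace);
    // Same search from the back; base() points one past the last kept element.
    const Iterator last = std::find_if_not(Reverse(std::end(input)), Reverse(first), isSpace).base();
    return Container(first, last);
}

// Usage sketch: the same function works on a string and on a vector.
void trimCopyExample()
{
    auto isBlank = [](char32_t c) { return c == U' ' || c == U'\t'; };
    assert(trimCopy(std::u32string(U"  hi there\t"), isBlank) == U"hi there");

    auto isZero = [](int x) { return x == 0; };
    const std::vector<int> trimmed = trimCopy(std::vector<int>{0, 0, 1, 0, 2, 0}, isZero);
    assert((trimmed == std::vector<int>{1, 0, 2}));
}
```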
-
Just a suggestion: wouldn't it be simpler to still use std::string and add a copy of the Boost string algorithms in an SFML namespace? That way you don't lose people who rely on std::string, you get algorithms for free by stripping out the Boost algorithm functions, and you don't have to bother anymore about this whole string thing.
I don't know if it's a good solution, but it feels like it might be a good compromise.
-
What's the point of moving Boost functions to the SFML namespace?
As I mentioned, Boost.StringAlgorithms are generic. Therefore, they can also work with sf::String, as long as you define the interface for Boost.Range (e.g. free begin() and end() functions). So, there's no sense in copying the algorithms.
Apart from that, std::string is not suitable for SFML because it lacks Unicode support.
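As a sketch of the free begin() and end() idea, using only the standard library: the class demo::Text below is a hypothetical stand-in for a custom string class (it is not sf::String), and countOccurrences is a made-up generic function. What Boost.Range itself requires for extension is not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

namespace demo
{
// Hypothetical stand-in for a custom string class; it is not sf::String.
class Text
{
public:
    explicit Text(std::u32string data) : m_data(std::move(data)) {}
    const std::u32string& data() const { return m_data; }

private:
    std::u32string m_data;
};

// Free functions in the class's namespace, found by argument-dependent lookup.
inline std::u32string::const_iterator begin(const Text& text) { return text.data().begin(); }
inline std::u32string::const_iterator end(const Text& text) { return text.data().end(); }
} // namespace demo

// Generic algorithm written once against begin()/end().
template <typename Range, typename Value>
std::size_t countOccurrences(const Range& range, const Value& value)
{
    using std::begin; // fallback for standard containers
    using std::end;
    return static_cast<std::size_t>(std::count(begin(range), end(range), value));
}

// Usage sketch: the custom class and std::string go through the same code.
void countExample()
{
    assert(countOccurrences(demo::Text(U"banana"), U'a') == 3);
    assert(countOccurrences(std::string("banana"), 'n') == 2);
}
```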
-
Simply to avoid adding Boost as a dependency. It wouldn't be the first project that takes the code of a Boost library and adds it to its own (Ogre and CppCms are two good examples).
About Unicode, does sf::String manage anything Unicode or does it just ask the user to make sure the data encoding is Unicode? AFAIK it's the second, so I don't see what Unicode support would be required. Assuming that every string entering SFML is UTF-8 is good enough and makes things simple.
-
SFML works with UTF-32, so std::string is not a good candidate. And SFML does convert automatically between encodings, that's exactly why sf::String exists.
-
Oh, OK, I didn't know that.
These days experts tend to say that, in the end, using UTF-8 makes everything simpler than other Unicode encodings, for various reasons, so I thought it was the same with SFML.
I tend to use UTF8-CPP (http://utfcpp.sourceforge.net/) to manage conversions between encodings, as it works with std::basic_string and I know I'll have only UTF-N text anyway in my projects.
-
Using std::basic_string for UTF-8 or UTF-16 is really a bad idea in my opinion. Many functions of its public API don't make sense and cannot be used, because it considers that each item of the container represents one character -- which is false with these encodings.
A real UTF-8 string class must be able to hide the fact that some characters may be made of several bytes.
These days experts tend to say that, in the end, using UTF-8 makes everything simpler than other Unicode encodings, for various reasons
I think it's true when you consider size only. English strings, for example, only use the ASCII range and so a UTF-8 English string will be equivalent to an ASCII string -- no space wasted.
But for other languages, UTF-8 makes things more complicated to handle because one character can be composed of multiple bytes, and you have to hide this from the end user. So it makes your code more complicated and more expensive for the CPU.
UTF-32 is very easy to handle (one uint32 is always one character), it just costs a little more memory but this is not a problem at all for SFML's target platforms and applications.
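A small illustration of that difference, assuming valid UTF-8 input. The function countCodePointsUtf8 is a hypothetical helper, not an SFML function. Note that "one element" here means one code point; user-perceived characters built from combining marks are a separate question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 byte string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t countCodePointsUtf8(const std::string& bytes)
{
    std::size_t count = 0;
    for (unsigned char byte : bytes)
    {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Usage sketch: the word "hello" with an e-acute as its second letter.
void encodingExample()
{
    const std::string utf8 = "h\xC3\xA9llo";    // e-acute takes two bytes
    const std::u32string utf32 = U"h\u00E9llo"; // e-acute takes one element

    assert(utf8.size() == 6);               // size() counts bytes
    assert(countCodePointsUtf8(utf8) == 5); // counting needs a scan
    assert(utf32.size() == 5);              // one element per code point
    assert(utf32[1] == U'\u00E9');          // direct indexing by position
}
```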
-
Using std::basic_string for UTF-8 or UTF-16 is really a bad idea in my opinion. Many functions of its public API don't make sense and cannot be used, because it considers that each item of the container represents one character -- which is false with these encodings.
A real UTF-8 string class must be able to hide the fact that some characters may be made of several bytes.
I agree, there were a lot of discussions about this problem at the beginning of this year on the Boost mailing list. Some solutions have been proposed, but I guess until someone proposes a working solution the problem will remain "to be solved" for a long time :/
These days experts tend to say that, in the end, using UTF-8 makes everything simpler than other Unicode encodings, for various reasons
I think it's true when you consider size only. English strings, for example, only use the ASCII range and so a UTF-8 English string will be equivalent to an ASCII string -- no space wasted.
But for other languages, UTF-8 makes things more complicated to handle because one character can be composed of multiple bytes, and you have to hide this from the end user. So it makes your code more complicated and more expensive for the CPU.
UTF-32 is very easy to handle (one uint32 is always one character), it just costs a little more memory but this is not a problem at all for SFML's target platforms and applications.
Well, most Boost devs have other arguments about that and suggest that in the end, even for Asian languages, UTF-8 is still the best compromise.
That being said, I'm not expert enough to have a clear opinion about it. One thing they said, if I remember correctly, is that the algorithms to... err... "read"? UTF-8 should be simpler to implement than for other encodings. Or something like that.
I'm using UTF-8 mainly because it's a simple solution for me, but I never had a problem with the sf::String class, so I guess it doesn't matter since sf::String will do the conversion automatically, right?
-
Could you give me a link to the related thread(s) on the boost mailing list? I'd like to read that.
-
Here is the most recent discussion on the subject; it's a review of a Google Summer of Code project: http://boost.2283326.n4.nabble.com/gsoc-Request-Feedback-for-Boost-Ustr-Unicode-String-Adapter-td3725600.html
They get into the same arguments again but don't talk a lot about the benefits of UTF-8 in this one.
I tried to find all the discussions that were spawned at the beginning of the year:
- I think this one was the root from which other discussions spawned : http://boost.2283326.n4.nabble.com/general-What-will-string-handling-in-C-look-like-in-the-future-was-Always-treat-tt3224967.html
- http://boost.2283326.n4.nabble.com/string-proposal-tt3229406.html?bcsi_scan_0FB23122DF0CD2C6=0&bcsi_scan_filename=string-proposal-tt3229406.html
- http://boost.2283326.n4.nabble.com/string-gt-text-tt3243373.html
- http://boost.2283326.n4.nabble.com/string-Realistic-API-proposal-tt3244173.html
- http://boost.2283326.n4.nabble.com/UTF-String-UTF-String-library-1-5-ready-for-perusal-tt3297381.html
- http://boost.2283326.n4.nabble.com/string-Yet-another-Unicode-string-class-tt3300220.html
- http://boost.2283326.n4.nabble.com/UTF-String-Feedback-on-UTF-String-library-please-tt3301346.html
It's in chronological order of the first posting of each thread.
That being said, the boost::locale review involved a lot of thinking about Unicode, because Artyom (the Locale author) says he's a specialist in the localization domain and that in the end UTF-8 is the only long-term solution (he tends to say things in a harsh way...)
I think a lot of the Boost.Locale documentation (http://www.boost.org/doc/libs/1_48_0/libs/locale/doc/html/index.html) gives information about the subject, like http://www.boost.org/doc/libs/1_48_0/libs/locale/doc/html/index.html
Personally, I just wish they would provide a std::text or something that is encoding-aware and lets me forget about this very subject :x
(but it's fascinating to see a lot of big brains having headache-inducing discussions about such a subject)
-
Nice, thank you :)
-
No problem, looks like I'm not finished :D
Here is the root of the one I said was the root discussion:
http://boost.2283326.n4.nabble.com/General-Treat-narrow-strings-as-UTF-8-compilation-flag-tt3646453.html#a3686141
It refers to yet another root discussion... I'll let you find it XD
Also, some other discussions related to this (certainly referred to in previous links):
- http://boost.2283326.n4.nabble.com/UTF-String-UTF-String-library-1-5-ready-for-perusal-tp3297381p3297381.html
- http://boost.2283326.n4.nabble.com/GSoC-Proposal-Preparation-For-Encoding-Awared-String-tp3381149p3381149.html << first post about the summer of code project
- http://boost.2283326.n4.nabble.com/GSoC-Proposal-Preparation-For-Encoding-Awared-String-tp3381149p3381149.html << abandoned implementation