...and I totally forgot point 5:
5) If you intend to work seriously with Unicode, UTF-8 and UTF-16 in your next C++ or Java application, then your best choice is probably the ICU library from IBM (http://www.icu-project.org/), which is open source, feature-rich, mature and, probably most important, considered to work correctly. Yes, it's a huge monster of a library, but if you want to build a truly international application, you need a library that big.
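As a small illustration of why the built-in "Unicode support" is not the end of the story: a Java String is UTF-16, so length() counts UTF-16 code units, not characters, and the UTF-8 byte count is different again. A minimal sketch using only the JDK (no ICU; the class name is just for this example):

```java
import java.nio.charset.StandardCharsets;

public class CodeUnits {
    public static void main(String[] args) {
        // 'a', e with acute accent (U+00E9), and U+1F600 GRINNING FACE,
        // which lies outside the BMP and needs a surrogate pair in UTF-16.
        String s = "a\u00E9\uD83D\uDE00";
        System.out.println(s.length());                                 // 4 UTF-16 code units
        System.out.println(s.codePointCount(0, s.length()));           // 3 code points
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 7 UTF-8 bytes
    }
}
```

And even code points are not what a user would call characters (an accented letter can also be written as two code points), which is where things like ICU's break iterators come in.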
(Ever asked yourself what an invisible, zero-width, text-direction-changing character will do to your text rendering engine? Ever asked yourself how selecting text with the mouse should work in a text editor that seamlessly mixes left-to-right and right-to-left text? Ever asked yourself how to compare two visually identical strings that contain several different control characters? No? Oh, what a pity, but welcome to the wonderful world of Unicode...)
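The last question has no single right answer, but here is a rough sketch of one approach, using only the JDK's java.text.Normalizer rather than ICU: strip invisible format characters (Unicode general category Cf) and normalize to NFC before comparing. The helper names canonical and looksEqual are made up for this example, and dropping every Cf character is lossy (it also removes things like the zero width joiner that some scripts and emoji sequences rely on), so treat the result as a comparison key only, never as text to display.

```java
import java.text.Normalizer;

public class VisualCompare {
    // Hypothetical helper: drop invisible "format" characters (category Cf,
    // e.g. U+200B ZERO WIDTH SPACE, U+200E LEFT-TO-RIGHT MARK), then normalize
    // to NFC so precomposed and decomposed accents compare equal.
    static String canonical(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> Character.getType(cp) != Character.FORMAT)
         .forEach(sb::appendCodePoint);
        return Normalizer.normalize(sb, Normalizer.Form.NFC);
    }

    // Hypothetical helper: compare two strings by their canonical keys.
    static boolean looksEqual(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        String plain = "caf\u00E9";               // e-acute as one precomposed code point
        String tricky = "ca\u200Bfe\u0301\u200E"; // zero width space, e + combining acute, LRM
        System.out.println(plain.equals(tricky));      // false
        System.out.println(looksEqual(plain, tricky)); // true
    }
}
```

For real lookalike detection (say Latin "a" vs Cyrillic "а") this is nowhere near enough; as far as I know ICU's spoof-checking support is the kind of tool meant for that.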
P.S.: Have fun selecting the text below (it's from an Arabic news site, probably something about soccer):
English left to right text here ديفيد فيا يسطع في سماء يورو 2008 وصدمة لليونان في بداية رحلة الدفاع عن again some left to right text
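If you wonder why selecting that line feels so odd: the text is stored in logical order, but the Arabic part is displayed right to left, so a selection that is contiguous in memory can end up visually split. A minimal sketch with the JDK's java.text.Bidi class (again not ICU; the class and constant names are just for this example), using only the first two Arabic words (ديفيد فيا, "David Villa"), written as escapes so the source file stays plain ASCII:

```java
import java.text.Bidi;

public class BidiRuns {
    // "David Villa" in Arabic script: 9 UTF-16 code units, including the space.
    static final String ARABIC = "\u062F\u064A\u0641\u064A\u062F \u0641\u064A\u0627";

    static final String LINE =
            "English left to right text here " + ARABIC + " again some left to right text";

    public static void main(String[] args) {
        // Force a left-to-right paragraph direction, as in an English UI.
        Bidi bidi = new Bidi(LINE, Bidi.DIRECTION_LEFT_TO_RIGHT);
        System.out.println("mixed: " + bidi.isMixed());
        for (int i = 0; i < bidi.getRunCount(); i++) {
            int start = bidi.getRunStart(i);
            int limit = bidi.getRunLimit(i);
            int level = bidi.getRunLevel(i); // even level = LTR, odd level = RTL
            System.out.printf("run %d [%d, %d) level %d %s%n",
                    i, start, limit, level, (level % 2 == 0) ? "LTR" : "RTL");
        }
    }
}
```

With the paragraph direction set to left to right, the line splits into three runs: two LTR runs at level 0 around one RTL run at level 1 that covers the Arabic, including the space between the two Arabic words. A text editor has to map mouse positions across those run boundaries, which is exactly where selection gets strange.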