Converting to UTF-8
Posted: 22 February 2007 09:51 AM
Research Assistant
Total Posts:  489
Joined  06-16-2006

Does anyone know of an easy way to encode all files in a directory to UTF-8? Or can someone provide me with a UTF-8 encoded CI 1.5.2?

Could CI be released in UTF-8 (too(?)) in the future?

 Signature 

Best regards, Zacharias.
Matchbox - Modular Separation | Wick - Controller Loader

Posted: 22 February 2007 11:36 AM   [ # 1 ]
Grad Student
Total Posts:  62
Joined  05-28-2006
Zawk - 22 February 2007 09:51 AM

Could CI be released in UTF-8 (too(?)) in the future?

As long as CI is coded with English characters only, it’s ASCII encoded, and thus valid UTF-8 too grin
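To make the point above concrete, here is a minimal Python sketch (the sample string is arbitrary, not taken from CI). It shows that English-only text produces identical bytes in ASCII and UTF-8, and that the encodings only diverge once a non-ASCII character appears.

```python
# English-only text: the ASCII and UTF-8 encodings are byte-for-byte identical.
source_line = "echo 'Hello from CodeIgniter';"  # arbitrary sample text
ascii_bytes = source_line.encode("ascii")
utf8_bytes = source_line.encode("utf-8")
identical_for_ascii = ascii_bytes == utf8_bytes

# A non-ASCII character is where the encodings start to differ.
latin1_bytes = "é".encode("iso-8859-1")  # a single byte, 0xE9
utf8_accented = "é".encode("utf-8")      # two bytes, 0xC3 0xA9

print(identical_for_ascii, latin1_bytes, utf8_accented)
```

So a file only needs converting once it contains characters outside the ASCII range.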

Posted: 22 February 2007 11:44 AM   [ # 2 ]
Grad Student
Total Posts:  62
Joined  05-28-2006
Zawk - 22 February 2007 09:51 AM

Does anyone know of an easy way to encode all files in a directory to UTF-8?

The problem is more your editor’s ability to store a file in UTF-8, I guess. But to answer your question, I’d say: it depends on your OS and your tools grin

If you’re on GNU/Linux, just a

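# bash: NUL-separated names keep paths containing spaces intact
find . -type f ! -name '*.tmp' -print0 | while IFS= read -r -d '' f; do
  # work from a temporary copy so iconv never reads and writes the same file
  cp "$f" "$f.tmp" && iconv -f ISO-8859-1 -t UTF-8 "$f.tmp" > "$f" && rm "$f.tmp"
done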

would suffice. This will convert all files recursively from the current directory to UTF-8 (if files are already in UTF-8, I don’t know the result wink).

On Windows, I guess you should rely on an external tool.
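For anyone not on GNU/Linux, the same idea can be sketched in Python, which runs on Windows as well. This is only a sketch under stated assumptions: the function names `convert_file` and `convert_tree` are made up for illustration, the source encoding is assumed to be ISO-8859-1 as in the loop above, and the list of file suffixes is a guess at what counts as a text file in a CI install. Files that already decode as UTF-8 (pure ASCII included) are left untouched, so running it twice should not double-encode anything.

```python
import os


def convert_file(path, source_encoding="iso-8859-1"):
    """Re-encode one file to UTF-8. Return True if the file was rewritten."""
    with open(path, "rb") as handle:
        raw = handle.read()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (or pure ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    converted = raw.decode(source_encoding).encode("utf-8")
    with open(path, "wb") as handle:
        handle.write(converted)
    return True


def convert_tree(root, suffixes=(".php", ".html", ".css", ".js"),
                 source_encoding="iso-8859-1"):
    """Walk root recursively and convert matching text files to UTF-8."""
    changed = []
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            # Skip anything that is not a known text type (images, archives, ...)
            if not name.lower().endswith(suffixes):
                continue
            path = os.path.join(folder, name)
            if convert_file(path, source_encoding):
                changed.append(path)
    return changed
```

Calling `convert_tree("system")` on a backup copy of the CI directory would return the list of files it rewrote. Working on a copy first is advisable, since a wrong guess about the source encoding cannot be detected automatically.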

Posted: 22 February 2007 03:14 PM   [ # 3 ]
Research Assistant
Total Posts:  489
Joined  06-16-2006

I’m not on GNU/Linux grin But what do you mean about it being UTF-8 too? I put a special character in one of the ASCII files and set the character encoding to UTF-8 in a meta tag, and it showed a question mark. I can, though, change a file’s encoding in my editor, one file at a time, and it works perfectly; I’m just not in the mood to go through all the files raspberry

 Signature 

Best regards, Zacharias.
Matchbox - Modular Separation | Wick - Controller Loader

Posted: 22 February 2007 04:35 PM   [ # 4 ]
Research Assistant
Total Posts:  895
Joined  07-10-2006

If you do a search on the forums for UTF-8, you should get multiple threads about CI’s existing support for UTF-8. Jozef wrote up a cheat sheet at one time with guidelines for using CI with UTF-8. It may be posted on the Wiki. As I recall, CI can support UTF-8 with some limitations: one or more libraries did not provide adequate support (validation comes to mind). Some CI developers appear to be using one of two methods for handling multilanguage sites (based on the wiki) with UTF-8 and other encodings.

Riri misread your post. He was stating that CI was written using plain ASCII, which is the base character set of UTF-8 (every ASCII file is also valid UTF-8).

There are some tools on SourceForge that language translators use to convert documents from one encoding to another (maybe the babel toolkit or something like that). You might try doing a separate search there. If you can’t find them, leave me a PM and I’ll try to find them here.

The only framework that I know of that claims to support UTF-8 using PHP native libraries is Akelos (a PHP implementation of Ruby on Rails). I could be wrong, but I believe that Bernie (the Akelos author) checks phpinfo to determine whether mbstring is installed and, if it is, uses Unicode encodings for the current language; otherwise it falls back on UTF-8 in conjunction with Harry Fuecks’s UTF-8 string handling functions for PHP (designed originally for the WACT framework).

Posted: 23 February 2007 04:45 AM   [ # 5 ]
Grad Student
Total Posts:  62
Joined  05-28-2006
Zawk - 22 February 2007 03:14 PM

I’m not on GNU/Linux grin But what do you mean about it being UTF-8 too? I put a special character in one of the ASCII files and set the character encoding to UTF-8 in a meta tag, and it showed a question mark. I can, though, change a file’s encoding in my editor, one file at a time, and it works perfectly; I’m just not in the mood to go through all the files raspberry

That’s the default ‘save’ format of your editor, I guess, not a CI problem. But look at the other answer below grin
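The question mark can be reproduced in a few lines of Python. This is a sketch of the likely mechanism, assuming the editor saved the file as ISO-8859-1 while the meta tag told the browser to read it as UTF-8: the single byte for "é" is not a valid UTF-8 sequence, so the decoder substitutes the replacement character, which browsers typically draw as a question mark.

```python
# What the editor wrote to disk when its default save format was ISO-8859-1.
saved_by_editor = "é".encode("iso-8859-1")  # the single byte 0xE9

# What a reader gets when told (by the meta tag) that the bytes are UTF-8.
shown_in_browser = saved_by_editor.decode("utf-8", errors="replace")

# What the reader gets once the file itself is saved as UTF-8.
saved_as_utf8 = "é".encode("utf-8")
shown_after_fix = saved_as_utf8.decode("utf-8")

print(repr(shown_in_browser), repr(shown_after_fix))
```

The declared encoding and the bytes on disk have to agree; changing only the meta tag is not enough.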

Posted: 23 February 2007 04:50 AM   [ # 6 ]
Grad Student
Total Posts:  62
Joined  05-28-2006
esra - 22 February 2007 04:35 PM

If you do a search on the forums for UTF-8, you should get multiple threads about CI’s existing support for UTF-8. Jozef wrote up a cheat sheet at one time with guidelines for using CI with UTF-8. It may be posted on the Wiki. As I recall, CI can support UTF-8 with some limitations: one or more libraries did not provide adequate support (validation comes to mind). Some CI developers appear to be using one of two methods for handling multilanguage sites (based on the wiki) with UTF-8 and other encodings.

Riri misread your post. He was stating that CI was written using plain ASCII, which is the base character set of UTF-8 (every ASCII file is also valid UTF-8).

That’s what I stated grin

I didn’t have in mind the problems with some libraries, but it’s only about character recognition, am I right? For example, validation might fail to validate a form because some UTF-8-specific characters aren’t treated like classic ones, even though they’re valid.

I’m sure that’s not what he had in mind (more of a file save format issue), but it’s certainly a real problem grin I’ll check the forums for UTF-8 issues too.

Posted: 23 February 2007 11:01 AM   [ # 7 ]
Research Assistant
Total Posts:  895
Joined  07-10-2006

Correct. I believe that it is a string handling problem, with the Validator and possibly some helpers not being fully UTF-8 aware. Jozef or someone else identified some of the framework problems with UTF-8 string handling in a post back around August or September of last year. I printed the thread for future reference and will try to get back with a URL. If you search for internationalization and localization, or i18n or l10n, you might find the thread yourself.
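As a rough illustration of what "not fully UTF-8 aware" can mean, here is a small Python sketch. It does not reproduce CI's validation code; the sample name and the two regular expressions are assumptions chosen to show the general class of problem: byte-oriented length checks and ASCII-only letter patterns reject input that is perfectly valid.

```python
import re

name = "Zoë"  # valid input containing one non-ASCII letter

# A byte-oriented length check counts 4, a character-aware one counts 3.
byte_length = len(name.encode("utf-8"))
char_length = len(name)

# An ASCII-only "alpha" rule rejects the name ...
rejected_by_ascii_rule = re.fullmatch(r"[A-Za-z]+", name) is None

# ... while a Unicode-aware rule (word characters minus digits and underscore)
# accepts it.
accepted_by_unicode_rule = re.fullmatch(r"[^\W\d_]+", name) is not None

print(byte_length, char_length, rejected_by_ascii_rule, accepted_by_unicode_rule)
```

In PHP the corresponding choices would be byte-based string functions versus the mbstring family mentioned earlier in the thread.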

Posted: 11 June 2007 04:18 PM   [ # 8 ]
Summer Student
Total Posts:  30
Joined  05-03-2007

A strict HTML 4.01 page, which the latest Dreamweaver (CS) validated with no objections, rendered perfectly both in Dreamweaver’s internal rendering engine and in the latest Internet Explorer (7).

When run through CI MVC, the page showed occasional AE (Euro signs).  At first, I thought it might be an escaping issue; but, after a day of reading forum postings, I tried changing the page DocType from UTF-8 to Western European, as follows:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

Bingo ... no more weird AE signs !

I don’t know if this constitutes a UTF-8 support bug ... or not?
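Stray characters that include a Euro sign are often a symptom of an encoding mismatch rather than a framework bug. The Python sketch below shows one common way such sequences arise, assuming UTF-8 bytes are being read as Windows-1252 (Western European). Whether that is what happened on this particular page cannot be determined from the post; the curly apostrophe is just a typical character that editors insert.

```python
original = "\u2019"  # a typographic (curly) apostrophe

# The three UTF-8 bytes 0xE2 0x80 0x99, misread one byte at a time as cp1252.
garbled = original.encode("utf-8").decode("cp1252")

# Reversing the mistake recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled, repaired == original)
```

The middle byte 0x80 is the Euro sign in Windows-1252, which is why it shows up in the garbled output.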

I’m just posting fyi ...

I’ve read that PHP6 will support UTF-8; maybe the issue will disappear, automagically, with PHP6?

 Signature 

Mike
