ColdFusion Internationalisation
Using ColdFusion, it's fairly straightforward to make web pages appear in more than one language.
There are three steps: detect which language the page should be in, load the strings you want to display from a resource file, and display the page in the required language.
There are a couple of ways you can detect which language to display the page in:
- Via a URL parameter, e.g. index.cfm?lang=en for English
- Detect the user's/browser's language preference.
With the first method above it is better to use friendly URLs, e.g. index.cfm/en for English or index.cfm/fr for French.
To detect the user's language preference you can look at the HTTP_ACCEPT_LANGUAGE CGI variable. It consists of a list of languages separated by commas, each optionally carrying a quality value (q) between 0 and 1 that indicates preference, e.g. en-us,en;q=0.7,fr;q=0.3. A language without a q value has q=1. Generally you only want the first two characters of each language, compared against a list of supported languages like so:
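for (i=1; i lte listlen(CGI.HTTP_ACCEPT_LANGUAGE); i=i+1) {
    // first two characters of each entry, e.g. "en" from "en-us" or "en;q=0.7"
    userlang = left(trim(listgetat(CGI.HTTP_ACCEPT_LANGUAGE,i)),2);
    if (listfind(variables.supported, userlang) gt 0) {
        // use first supported language found
        lang = userlang;
        break;
    }
}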
You need to be careful setting the language this way, as it may not be the language the user actually wants. If you do this, you should also give a link to the main language or to all supported languages. Also be careful not to make your code depend on this CGI variable having a value, as most robots do not set it.
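The loop above takes the first supported language in list order and ignores the q values. Browsers usually send the list in preference order, but that is not guaranteed. Below is a minimal sketch of a variant that picks the supported language with the highest q value. The function name pickLanguage and its arguments are hypothetical and are not part of the component shown later in this article.

```cfml
<cfscript>
// Hypothetical helper: returns the supported language with the highest
// q value in an Accept-Language style list, or fallback if none match.
function pickLanguage(acceptLanguage, supported, fallback) {
    var best = arguments.fallback;
    var bestQ = 0;
    var i = 0;
    var entry = "";
    var code = "";
    var q = 1;

    for (i = 1; i lte listLen(arguments.acceptLanguage); i = i + 1) {
        entry = trim(listGetAt(arguments.acceptLanguage, i));
        // "en-us;q=0.7" -> two-letter code "en"
        code = lCase(left(trim(listFirst(entry, ";")), 2));
        // an entry without a q parameter counts as q=1
        q = 1;
        if (listLen(entry, ";") gt 1) {
            q = val(trim(listLast(listGetAt(entry, 2, ";"), "=")));
        }
        // strict "gt" keeps the earlier entry on ties and skips q=0 entries
        if (q gt bestQ and listFind(arguments.supported, code) gt 0) {
            best = code;
            bestQ = q;
        }
    }
    return best;
}
</cfscript>
```

For example, pickLanguage("da,en-gb;q=0.8,fr;q=0.9", "en,fr", "en") returns fr, whereas taking the first supported entry would return en.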
Once you have determined the language you want, you need to load the strings that will be displayed on the page. One way of doing this is to store all of the strings in XML files (one for each page and language) and read them into a structure like so:
XML file en.xml
<resources>
    <company1>Class</company1>
    <company2>Software</company2>
    <firstname>Justin</firstname>
    <lastname>Mclean</lastname>
    <title>Consultant/Director</title>
    <email>Email</email>
    <phone>Phone</phone>
    <skype>Skype</skype>
    <mobile>Mobile</mobile>
    <office>Office</office>
    <street>18 Norton St</street>
    <suburb>Surry Hills</suburb>
    <state>NSW</state>
    <postcode>2010</postcode>
    <webtech>Web technologies and applications</webtech>
</resources>
The ColdFusion code to read the above XML file and parse the XML into a structure is:
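<!--- xmlfile is assumed to already hold the contents of the XML file, e.g. read with cffile --->
<cfscript>
    resources = structNew();
    xml = XMLParse(xmlfile);

    // copy the text of each child element into the structure, keyed by element name
    for (child in xml.resources) {
        resources[child] = xml.resources[child].XmlText;
    }
</cfscript>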
To improve performance, the structures from the XML can be cached on a page and language basis in an application scope singleton.
Putting all of the above together we have:
Language component that is a singleton - language.cfc
<!--- super.init() below implies that this component extends a singleton base component (not shown); set the matching extends attribute --->
<cfcomponent>
    <cfset variables.supported = "en,fr,ja,de,it,es,id,zh,ko,sv,is">
    <cfset variables.default = "en">
    <cfset variables.page = structnew()>

    <cffunction name="init" access="public">
        <cfscript>
            super.init();

            return;
        </cfscript>
    </cffunction>

    <cffunction name="GetLanguage" access="public" returntype="string">
        <cfargument name="languages" type="string" required="yes">

        <cfscript>
            var lang = variables.default;
            // var-scope the locals so concurrent requests do not share them through the singleton
            var i = 0;
            var userlang = "";

            // stop robots from erroring and give them the default language
            if (arguments.languages is "") {
                return lang;
            }

            // find if supported languages are in users list
            for (i=1; i lte listlen(arguments.languages); i=i+1) {
                // trim because entries may be separated by ", "
                userlang = left(trim(listgetat(arguments.languages,i)),2);
                if (listfindnocase(variables.supported, userlang) gt 0) {
                    // use first supported language found
                    lang = lcase(userlang);
                    break;
                }
            }

            return lang;
        </cfscript>

    </cffunction>

    <cffunction name="LoadPage" access="public" returntype="struct">
        <cfargument name="Language" type="string" required="yes">
        <cfargument name="Page" type="string" required="yes">

        <cfscript>
            var xmlfile = "";
            var child = "";
            var xml = "";

            // return the cached strings if this page and language were loaded before
            if (structkeyExists(variables, language) and structkeyExists(variables[language], page)) {
                return variables[language][page];
            }
        </cfscript>

        <cffile action="read" file="#ExpandPath('.')#\languages\#page#\#language#.xml" variable="xmlfile">

        <cfscript>
            xml = XMLParse(xmlfile);

            if (not structkeyExists(variables, language)) {
                variables[language] = structnew();
            }
            if (not structkeyExists(variables[language], page)) {
                variables[language][page] = structnew();
            }

            // set strings based on language
            for (child in xml.resources) {
                variables[language][page][child] = xml.resources[child].XmlText;
            }

            return variables[language][page];
        </cfscript>
    </cffunction>

</cfcomponent>
Code to go in your Application.cfm or onRequest method of Application.cfc
<cfscript>
    // function to get an instance of a singleton
    function getInstance(name) {
        if (not isdefined("application._singletons.#name#")) {
            // init() is assumed to register the new instance in application._singletons
            // (via the singleton base component, which is not shown)
            createobject("component","com.classsoftware.utils.#name#").init();
        }

        return application._singletons[name];
    }

    // function to remove a singleton
    function removeInstance(name) {
        if (isdefined("application._singletons.#name#")) {
            application._singletons[name].remove();
        }
    }

    // if asked remove the language component
    if (isdefined("url.init")) {
        removeInstance('language');
    }

    // get the language component (creates it if it doesn't exist)
    languageComponent = getInstance("language");

    // Parse user languages
    languages = "";
    if (structkeyexists(cgi,"HTTP_ACCEPT_LANGUAGE")) {
        languages = cgi.HTTP_ACCEPT_LANGUAGE;
    }
    request.language = languageComponent.getlanguage(languages);

    // convert friendly URLs to URL variables
    if (listlast(cgi.PATH_INFO,".") neq "cfm")
    {
        // some web servers put the script name at the start of PATH_INFO and others do not,
        // so strip it only when it is present
        rhs = cgi.PATH_INFO;
        if (left(rhs,len(cgi.SCRIPT_NAME)) eq cgi.SCRIPT_NAME) {
            rhs = removechars(rhs,1,len(cgi.SCRIPT_NAME));
        }

        // if xxx.cfm/yy assume yy is language
        if (listLen(rhs,"/") is 1) {
            url.lang = listfirst(rhs,"/");
        }

        if (rhs neq "") {
            noitems = int(listLen(rhs,"/")/2)*2;

            for (i=1; i lte noitems; i = i + 2) {
                item = listgetat(rhs,i,"/");
                url[item] = listgetat(rhs,i+1,"/");
            }
        }
    }

    // Check if language is passed in URL
    if (isdefined("url.lang")){
        // url.lang is user input and ends up in a file path, so only accept
        // supported languages (anything else falls back to the default)
        request.language = languageComponent.getlanguage(url.lang);
    }

</cfscript>
Finally you need to set the strings on the page(s) you want to display in multiple languages. Here's a simple example:
<body>
...
    <div class="name">
        <h2 class="fn n"><span class="given-name">#lang.firstname#</span> <span class="family-name">#lang.lastname#</span></h2>
        <h2 class="title">#lang.title#</h2>
    </div>
...
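The page snippet assumes a lang structure already exists. A minimal sketch of how it might be filled before the output is below. It relies on the getInstance function and request.language set up in the Application code above, and the page name "index" is a hypothetical example that would map to languages\index\en.xml for English.

```cfml
<cfscript>
    // "index" is a hypothetical page name; LoadPage reads and caches
    // the XML strings for this page in the requested language
    lang = getInstance("language").LoadPage(request.language, "index");
</cfscript>

<cfoutput>
    <h2 class="title">#lang.title#</h2>
</cfoutput>
```

Because LoadPage caches by language and page, the XML file is only read on the first request for each combination.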
your xml approach is non-standard. while XLIFF (a real-world xml standard used to store localizable data) is gaining popularity (i think the icu project might swap to it in the future for internal use), most i18n apps still rely on resource bundles (a recent example, flex 2 added them). just for instance, how do you think you'll manage very large, very complex localization data? it's one thing for a simple blog but pretty much anything beyond that is going to kill you sooner or later. what tools do you have to manage the translation process? rb have several well worn tools like icu4j's rbManager, jason sheedy's cf-based application, etc. there's a whole eco-system that's grown up around rb that can help.
Personally I think that relying on location ie trying to map IP address to country causes just as many issues eg not everyone in Australia speaks English as their first language. At least with the HTTP_ACCEPT_LANGUAGE CGI variable the browser user has some control over it eg in Firefox, IE and Safari you can edit your language preferences. Any reason why you prefer geo location?
You could use my method to match on the whole language/locale by removing the left(xxx,2).
Thanks for the pointer to XLIFF I'll have a look at it.
I have used resource bundles before but as far as I can see it's equivalent to what I've done here. What advantages do you see resource bundles having over XML formatted data?
I "invented" this method 6-7 years ago but we no longer use it, too unreliable on it's own. Pretty much nobody else does either, for instance if you've d/l anything from source forge lately you'll see IP lookup in action. Closer to home, farcry uses it on their logs. There are more reliable methods, last year there were rumors wafting out of the academic community about using packet travel time triangulation to exactly pinpoint users. But right now, IP lookup+http_accept_language works "best". but of course, not as good as simply asking users.
If you've used rb before for anything complex, we wouldn't be having this discussion ;-) there's decades of tools built up around rb, as i said before it's a whole eco-system. Answer the questions i asked in my previous comment, the answers will tell you why we use rb.
If the http_accept_language and the geo location differ which do you use? eg Say I'm on holiday with my laptop in Italy I'd rather the web site appear in English (what my http_accept_language is set to) than Italian (the geo location I'm in).
I've not seen any issues with Mac http_accept_language on Safari or Firefox. Can you give any examples? Is it just an issue with older browsers/versions of the mac OS?
Re using resource bundles I see your point with the large number of tools available. I'll convert my code so it can read rb format rather than XML (fairly easy as ColdFusion has functions to read ini files which are a similar format). I had a quick look at XLIFF but it seems overly complex esp if you had to generate the XML by hand.
I'm at home w/out access to our stats but imagine http_accept_language w/50-60 languages long, all quality (the q value) equal (which is supposed to mean any of those languages would be appropriate to serve content in). OSx tended (or used to anyway) to load up this info on some boxes. throw in a whole bunch w/klingon or tengwar as the first language. If you keep tracking these you'll see enough bizarre ones.
You can't use ini files. The cf methods that read ini files don't handle unicode. Grab the javaRB or rbJava CFC (which one you use depends on whether you want to store the rb on classpath) from my shop's freebie page:
http://www.sustainablegis.com/things.cfm
Have a look at your blog s/w. it (in ray's original form anyway) uses a simpler (and older) utf-8 based method, which is a bit akin to your xml method. It's ok for something like this but once you get really big apps w/lots of locales to handle managing stuff starts getting to be more work than the actual development.
If you like xml, there are a bunch of free XLIFF tools & i think rbManager can import/export it.
The vacation example may be contrived but there are a large number of people in any given country at any one time whose language of choice isn't the language a geo location system would give. The issue I see is you have no way of knowing if the geo location locale should be used or the http_accept_language should be used. It's possible for either of them to be incorrect so I guess it's a matter of which one is correct more often?
I do agree with you that you should enable the user to select their own language if the one displayed isn't their preferred choice.
Re the quality value on OSX from my testing with Firefox only one language will have a q=1 and with Safari it only sends one language. Perhaps it was an issue with older browsers?
Thanks for the heads up on CF and ini files and unicode support and the link to the resources on your site. I'll take a look and see what I can do to improve my code.
"large number of people"? Can I guess that's not something you actually have any stats for? I have no idea what market your apps are targeted at but from my experience this isn't generally something to worry much about. A manual fallback choice will handle the vast majority of these rather rare cases.
The OSx thing is an issue w/multi-lingual users & the way it adds this at the drop of a hat. If you keep looking long enough, you'll start to see this sort of thing & as I said I'm at home & can't access our stats.
Of course it depends on who the audience of your site is, if you're running a tourist site or a university you would have a higher non English preference audience than a blog on ColdFusion.
Out of curiosity I had a look at the stats for my blog at browser language vs geo location and about 93% of browsers had their preferences set to English, and 81% came from English speaking countries via Geo Location.