ColdFusion Internationalisation

Using ColdFusion, it's fairly straightforward to make web pages appear in more than one language.

There are three steps: detect which language the page should be in, load the strings to display from a resource file, and display the page in that language.

There are a couple of ways to detect which language the page should display in:

  1. Via a URL parameter, eg index.cfm?lang=en for English.
  2. Detect the user's (browser's) language preference.

With the first method it is better to use friendly URLs, eg index.cfm/en for English or index.cfm/fr for French.

To detect the user's language preference you can look at the HTTP_ACCEPT_LANGUAGE CGI variable. It consists of a comma-separated list of languages, each optionally followed by a quality value (q) between 0 and 1, eg en-us,en;q=0.7,fr;q=0.3. A language with no q value has a quality of 1. Generally you only want the first two characters of each language, which you compare against a list of supported languages like so:

    for (i = 1; i lte listlen(CGI.HTTP_ACCEPT_LANGUAGE); i = i + 1) {
        // trim in case the client puts a space after each comma
        userlang = left(trim(listgetat(CGI.HTTP_ACCEPT_LANGUAGE, i)), 2);
        if (listfindnocase(variables.supported, userlang) gt 0) {
            // use first supported language found
            lang = userlang;
            break;
        }
    }
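
The loop above takes the first supported language in list order and ignores the q values. A minimal sketch that honours them might look like the following; pickLanguage and its three arguments are hypothetical names used only for illustration, and ties are resolved in favour of the entry that appears first.

```cfml
<cfscript>
// Sketch only: return the supported language with the highest q value.
// header    - the raw Accept-Language value, eg "en-us,en;q=0.7,fr;q=0.3"
// supported - comma-separated list of two-letter codes the site offers
// fallback  - code to use when nothing in the header is supported
function pickLanguage(header, supported, fallback) {
    var best = fallback;
    var bestQ = 0;
    var i = 0;
    var entry = "";
    var code = "";
    var q = 1;

    for (i = 1; i lte listLen(header); i = i + 1) {
        entry = trim(listGetAt(header, i));
        // "en-us;q=0.7" -> two-letter code "en"
        code = lcase(left(trim(listFirst(entry, ";")), 2));
        // a missing q parameter means q=1
        q = 1;
        if (listLen(entry, ";") gt 1 and findNoCase("q=", entry)) {
            q = val(trim(listLast(entry, "=")));
        }
        // entries with q=0 are never selected because bestQ starts at 0
        if (q gt bestQ and listFindNoCase(supported, code)) {
            best = code;
            bestQ = q;
        }
    }
    return best;
}
</cfscript>
```

With a supported list of "fr,de" and a fallback of "en", the header fr;q=0.3,de;q=0.8 would select de rather than fr, and an empty header would return the fallback.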

You need to be careful setting the language this way, as it may not be the language the user actually wants. If you do this, you should also provide a link to the main language or to all supported languages. Also be careful not to make your code dependent on this CGI variable having a value, as most robots do not set it.
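
One way to provide those links is to loop over the supported languages and emit a friendly URL for each. This is only a sketch: the supported list here is a shortened, hypothetical one, and it assumes the index.cfm/xx URL scheme described earlier.

```cfml
<!--- Sketch only: "supported" is a hypothetical, shortened language list --->
<cfset supported = "en,fr,ja">
<cfoutput>
<ul class="languages">
<cfloop list="#supported#" index="code">
    <!--- one link per language, using the friendly URL form --->
    <li><a href="index.cfm/#code#">#code#</a></li>
</cfloop>
</ul>
</cfoutput>
```

In practice you would probably show each language's own name (English, Français and so on) instead of the bare code.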

Once you have determined the language you want, you need to load the strings that will be displayed on the page. One way of doing this is to store all of the strings in XML files (one for each page and language) and read them into a structure like so:

XML file en.xml

    <?xml version="1.0" encoding="utf-8"?>
    <resources>
        <company1>Class</company1>
        <company2>Software</company2>
        <firstname>Justin</firstname>
        <lastname>Mclean</lastname>
        <title>Consultant/Director</title>
        <email>Email</email>
        <phone>Phone</phone>
        <skype>Skype</skype>
        <mobile>Mobile</mobile>
        <office>Office</office>
        <street>18 Norton St</street>
        <suburb>Surry Hills</suburb>
        <state>NSW</state>
        <postcode>2010</postcode>
        <webtech>Web technologies and applications</webtech>
    </resources>

The ColdFusion code to read the above XML file and parse the XML into a structure is:

    <!--- charset matches the encoding declared in the XML file --->
    <cffile action="read" file="#ExpandPath('.')#\languages\#page#\#language#.xml" variable="xmlfile" charset="utf-8">

    <cfscript>
        xml = XMLParse(xmlfile);
        resources = structnew();

        // copy each child element of <resources> into the structure
        for (child in xml.resources) {
            resources[child] = xml.resources[child].XmlText;
        }
    </cfscript>
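
If you prefer not to rely on looping over the XML element as if it were a structure, the same result can be had by walking the element's XmlChildren array. The sketch below parses a small inline string, a cut-down and purely illustrative version of en.xml, so that it can be run on its own.

```cfml
<cfscript>
// Sketch only: a shortened, illustrative copy of en.xml
xml = xmlParse("<resources><email>Email</email><phone>Phone</phone></resources>");
resources = structNew();

// XmlChildren is an array of the child elements of <resources>
children = xml.resources.XmlChildren;
for (i = 1; i lte arrayLen(children); i = i + 1) {
    // element name becomes the key, element text becomes the value
    resources[children[i].XmlName] = children[i].XmlText;
}
</cfscript>
```

After this runs, resources.email holds "Email" and resources.phone holds "Phone".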

To improve performance, the structures from the XML can be cached on a page and language basis in an application scope singleton.

Putting all of the above together we have:

Language component that is a singleton (language.cfc):

    <cfcomponent displayname="language" extends="singleton">
        <cfset variables.supported = "en,fr,ja,de,it,es,id,zh,ko,sv,is">
        <cfset variables.default = "en">
        <cfset variables.page = structnew()>

        <cffunction name="init" access="public">
            <cfscript>
                super.init();
                return;
            </cfscript>
        </cffunction>

        <cffunction name="GetLanguage" access="public" returntype="string">
            <cfargument name="languages" type="string" required="yes">

            <cfscript>
                // var-scope everything: this component is shared across requests
                var lang = variables.default;
                var i = 0;
                var userlang = "";

                // stop robots from erroring and give them the default language
                if (arguments.languages is "") {
                    return lang;
                }

                // find if supported languages are in user's list
                for (i = 1; i lte listlen(arguments.languages); i = i + 1) {
                    // trim in case the client puts a space after each comma
                    userlang = left(trim(listgetat(arguments.languages, i)), 2);
                    if (listfindnocase(variables.supported, userlang) gt 0) {
                        // use first supported language found
                        lang = userlang;
                        break;
                    }
                }

                return lang;
            </cfscript>
        </cffunction>

        <cffunction name="LoadPage" access="public" returntype="struct">
            <cfargument name="Language" type="string" required="yes">
            <cfargument name="Page" type="string" required="yes">

            <cfscript>
                var xmlfile = "";
                var child = "";
                var xml = "";

                // return the cached strings if this page and language are already loaded
                if (structkeyExists(variables, language) and structkeyExists(variables[language], page)) {
                    return variables[language][page];
                }
            </cfscript>

            <!--- charset matches the encoding declared in the XML file --->
            <cffile action="read" file="#ExpandPath('.')#\languages\#page#\#language#.xml" variable="xmlfile" charset="utf-8">

            <cfscript>
                xml = XMLParse(xmlfile);

                if (not structkeyExists(variables, language)) {
                    variables[language] = structnew();
                }
                if (not structkeyExists(variables[language], page)) {
                    variables[language][page] = structnew();
                }

                // set strings based on language
                for (child in xml.resources) {
                    variables[language][page][child] = xml.resources[child].XmlText;
                }

                return variables[language][page];
            </cfscript>
        </cffunction>

    </cfcomponent>

Code to go in your Application.cfm or the onRequest method of Application.cfc:

    <cfscript>
        // function to get an instance of a singleton
        function getInstance(name) {
            var instance = "";

            if (not isdefined("application._singletons.#name#")) {
                // init() in the singleton base component is expected to
                // register the new instance in application._singletons
                instance = createobject("component","com.classsoftware.utils.#name#").init();
            }

            return application._singletons[name];
        }

        // function to remove a singleton
        function removeInstance(name) {
            if (isdefined("application._singletons.#name#")) {
                application._singletons[name].remove();
            }
        }

        // if asked, remove the language component
        if (isdefined("url.init")) {
            removeInstance('language');
        }

        // get the language component (creates it if it doesn't exist)
        request.languageService = getInstance("language");

        // parse user languages
        languages = "";
        if (structkeyexists(cgi,"HTTP_ACCEPT_LANGUAGE")) {
            languages = cgi.HTTP_ACCEPT_LANGUAGE;
        }
        // keep the language code separate from the component itself
        request.language = request.languageService.getlanguage(languages);

        // convert friendly URLs to URL variables
        if (listlast(cgi.PATH_INFO,".") neq "cfm")
        {
            rhs = right(cgi.PATH_INFO,len(cgi.PATH_INFO)-len(cgi.SCRIPT_NAME));

            // if xxx.cfm/yy assume yy is language
            if (listLen(rhs,"/") is 1) {
                url.lang = listfirst(rhs,"/");
            }

            // otherwise treat the path as name/value pairs
            if (rhs neq "") {
                noitems = int(listLen(rhs,"/")/2)*2;

                for (i=1; i lte noitems; i = i + 2) {
                    item = listgetat(rhs,i,"/");
                    url[item] = listgetat(rhs,i+1,"/");
                }
            }
        }

        // check if language is passed in URL
        // (url.lang ends up in a file path in LoadPage, so it is worth
        // checking it against the supported list before trusting it)
        if (isdefined("url.lang")){
            request.language = url.lang;
        }
    </cfscript>

Finally you need to set the strings on the page(s) you want to display in multiple languages. Here's a simple example:

    <cfset lang = request.languageService.LoadPage(request.language,"home")>
    <body>
    ...
                <div class="name">
                    <h2 class="fn n"><span class="given-name">#lang.firstname#</span> <span class="family-name">#lang.lastname#</span></h2>
                    <h2 class="title">#lang.title#</h2>
                </div>
    ...

Comments
relying on CGI.HTTP_ACCEPT_LANGUAGE alone is asking for trouble, we haven't solely used that technique for more than 5 years. your technique also ignores locale (which is often found in the language list in that CGI var), which is very important for the rest of the app. and you also fail to mention that you should *always* have a way for the user to swap language/locales manually. you might look at one of the geolocation technologies, like geoLocator CFC or one of the db based ones that do an IP lookup mixed w/CGI.HTTP_ACCEPT_LANGUAGE.

your xml approach is non-standard. while XLIFF (a real-world xml standard used to store localizable data) is gaining popularity (i think the icu project might swap to it in the future for internal use), most i18n apps still rely on resource bundles (a recent example, flex 2 added them). just for instance, how do you think you'll manage very large, very complex localization data? it's one thing for a simple blog but pretty much anything beyond that is going to kill you sooner or later. what tools do you have to manage the translation process? rb have several well worn tools like icu4j's rbManager, jason sheedy's cf-based application, etc. there's a whole eco-system that's grown up around rb that can help.
# Posted By PaulH | 4/29/07 5:29 PM
Yep agree with that solely relying on CGI.HTTP_ACCEPT_LANGUAGE is asking for trouble. Which is why I said in the article you should include links to other languages.

Personally I think that relying on location ie trying to map IP address to country causes just as many issues eg not everyone in Australia speaks English as their first language. At least with the HTTP_ACCEPT_LANGUAGE CGI variable the browser user has some control over it eg in Firefox, IE and Safari you can edit your language preferences. Any reason why you prefer geo location?

You could use my method to match on the whole language/locale by removing the left(xxx,2).

Thanks for the pointer to XLIFF I'll have a look at it.

I have used resource bundles before but as far as I can tell it's equivalent to what I've done here. What advantages do you see resource bundles having over XML formatted data?
# Posted By Justin Mclean | 4/29/07 7:18 PM
We use IP lookup+http_accept_language & provide manual swapping. your technique is based on one thing, much of the time it's not set or sometimes is bizarre (if you see enough mac http_accept_language you'll know what i mean). The geoLocation technique does transparent locale selection from more than 1 source of info.

I "invented" this method 6-7 years ago but we no longer use it, too unreliable on its own. Pretty much nobody else does either, for instance if you've d/l anything from source forge lately you'll see IP lookup in action. Closer to home, farcry uses it on their logs. There are more reliable methods, last year there were rumors wafting out of the academic community about using packet travel time triangulation to exactly pinpoint users. But right now, IP lookup+http_accept_language works "best". but of course, not as good as simply asking users.

If you've used rb before for anything complex, we wouldn't be having this discussion ;-) there's decades of tools built up around rb, as i said before it's a whole eco-system. Answer the questions i asked in my previous comment, the answers will tell you why we use rb.
# Posted By PaulH | 4/29/07 9:34 PM
Paul. Curious to know why geo location is seen as "better". Surely something that takes the user's preference rather than guessing based on their location would be better? I can see it would be useful if the http_accept_language was blank.

If the http_accept_language and the geo location differ which do you use? eg Say I'm on holiday with my laptop in Italy I'd rather web site appear in English (what my http_accept_language is set to) than Italian (the geo location I'm in).

I've not seen any issues with Mac http_accept_language on Safari or Firefox. Can you give any examples? Is it just an issue with older browsers/versions of the Mac OS?

Re using resource bundles I see your point with the large number of tools available. I'll convert my code so it can read rb format rather than XML (fairly easy as ColdFusion has functions to read ini files which are a similar format). I had a quick look at XLIFF but it seems overly complex esp if you had to generate the XML by hand.
# Posted By Justin Mclean | 4/29/07 10:29 PM
You're missing the point. it uses *both*. you can weigh one or the other depending on your app's preferences. Your vacation example is too contrived to worry about much. that's what the manual fallback method is for.

I'm at home w/out access to our stats but imagine http_accept_language w/50-60 languages long, all quality (the q value) equal (which is supposed to mean any of those languages would be appropriate to serve content in). OSx tended (or used to anyway) to load up this info on some boxes. throw in a whole bunch w/klingon or tengwar as the first language. If you keep tracking these you'll see enough bizarre ones.

You can't use ini files. The cf methods that read ini files don't handle unicode. Grab the javaRB or rbJava CFC (which one you use depends on whether you want to store the rb on classpath) from my shop's freebie page:

http://www.sustainablegis.com/things.cfm

Have a look at your blog s/w. it (in ray's original form anyway) uses a simpler (and older) utf-8 based method, which is a bit akin to your xml method. It's ok for something like this but once you get really big apps w/lots of locales to handle managing stuff starts getting to be more work than the actual development.

If you like xml, there are a bunch of free XLIFF tools & i think rbManager can import/export it.
# Posted By PaulH | 4/29/07 11:32 PM
But how do you weight one or the other? If someone puts their first preference as Klingon and your application/web site supports it isn't that what you should show?

The vacation example may be contrived but there are a large number of people in any given country at any one time whose language of choice isn't the language a geo location system would give. The issue I see is you have no way of knowing if the geo location locale should be used or the http_accept_language should be used. It's possible for either of them to be incorrect so I guess it's a matter of which one is correct more often?

I do agree with you that you should enable the user to select their own language if the one displayed isn't their preferred choice.

Re the quality value on OSX from my testing with Firefox only one language will have a q=1 and with Safari it only sends one language. Perhaps it was an issue with older browsers?

Thanks for the heads up on CF and ini files and unicode support and the link to the resources on your site. I'll take a look and see what I can do to improve in my code.
# Posted By Justin Mclean | 4/30/07 12:08 AM
We chose one way (we try to honor users choices tempered by IP lookup), anybody can choose another weight but the idea is to have more than just the one bit of info to "guess" what locale the user wants & we can provide. If the user's browser says "klingon" but he's somewhere in Perth by IP, he should get en_AU, if that's something the app can provide.

"large number of people"? Can I guess that's not something you actually have any stats for? I have no idea what market your apps are targeted at but from my experience this isn't generally something to worry much about. A manual fallback choice will handle the vast majority of these rather rare cases.

The OSx thing is an issue w/multi-lingual users & the way it adds this at the drop of a hat. If you keep looking long enough, you'll start to see this sort of thing & as I said I'm at home & can't access our stats.
# Posted By PaulH | 4/30/07 3:35 AM
Re "large number of people" I don't have exact numbers but do have ball park figures for Australia. At any one time 1/2 million visitors of which 1/3 don't have English as their first language and 20% of the population don't speak English as their first language. So even at a conservative estimate you have 10% population wanting to view a web site in a language other than English (in Australia). ie Geo location would give the wrong answer around 10% of the time. Is http_accept_language wrong 10% of the time? That I have no idea on.

Of course it depends on who the audience of your site is, if you're running a tourist site or a university you would have a higher non English preference audience than a blog on ColdFusion.

Out of curiosity I had a look at the stats for my blog at browser language vs geo location and about 93% of browsers had their preferences set to English, and 81% came from English speaking countries via Geo Location.
# Posted By Justin Mclean | 4/30/07 8:37 AM