Beluga - Your Localization Partner

This blog is about our company Beluga Linguistics, our daily work in the localization business, web services, Web 2.0 and language. Thanks for reading us.

Sunday, December 7, 2008

Quick Tips for Web Internationalization

In many of our projects we stumble over technical issues in the source text or code. We try to fix them before the first words get translated, to avoid re-adjustments later on, once translation teams are on board and the project is running.

This preparation is commonly called internationalization (or i18n, for the 18 letters between the first and the last letter of the word). I18n is the way software should be prepared so that it can be translated into different languages and successfully localized (L10n) for local target audiences.

Internationalization is nothing new, and fortunately a lot has already been written about it on the web. One of the best places to get started is the W3C website about Internationalization and Localization.

The article Internationalization Quick Tips for the Web is especially interesting for those of you thinking about going international with your website or application, as it summarizes very well the key concepts of a good technical design for future translation and localization. Use it as a checklist and see how fit your backend is ;-) In the following lines I will try to give you some real-life feedback about what those concepts mean for translators and for the users of your website:

Encoding. Use Unicode wherever possible for content, databases, etc. Always declare the encoding of content.

It's quite frustrating to see those little squares or other strange symbols on your screen caused by wrong encoding (if you have a Flash player, don't forget to set it to UTF-8 encoding as well ;-)). And don’t start with i18n thinking that you can take care of this issue later on. Changing the encoding when you already have 6 languages on weekly updates gets more and more complex, time-consuming and expensive. A good basis is key to successfully targeting international markets.
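As a small illustration of where those strange symbols come from, here is a Python sketch that decodes the same bytes once with the wrong charset and once with the right one. The sample word and the variable names are only chosen for this example.

```python
# The bytes a UTF-8 page would send for the word "Español".
raw = "Español".encode("utf-8")

# Read with the wrong charset, the single letter ñ turns into two junk characters.
wrong = raw.decode("latin-1")

# Read with the charset that was actually used, the text survives intact.
right = raw.decode("utf-8")

print(wrong)  # EspaÃ±ol
print(right)  # Español
```

The bytes never change; only the declared encoding decides how they are read. That is why the encoding should be declared everywhere the content passes through.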

Escapes. Use characters rather than escapes (e.g. &#xE1; &#225; or &aacute;) whenever you can.

If you are going to translate a 40,000-word website in an online string editor and need to change all your characters to escapes, you will simply go crazy... There are quite a lot of them. Have a look at this example in Czech:

Jako efektivnější se nám jeví pořádání tzv. Road Show prostřednictvím našich autorizovaných dealerů v Čechách a na Moravě, které proběhnou v průběhu září a října.

would be:



Do you really want the translator to take care of this? And what about the proofreader: how would they check whether the accents are set correctly? The price per word would definitely increase, and you would most probably have mistakes in the translation later on. So better set it right from the very beginning, so that the translation teams can focus on what really counts: correct, fluent and natural language.
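To get a feel for what an escaped version of the Czech sentence looks like, it can be generated with a few lines of Python. This is only a sketch: it assumes decimal numeric character references (other escape forms exist), and the helper name to_numeric_escapes is made up for this example.

```python
import html


def to_numeric_escapes(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


czech = (
    "Jako efektivnější se nám jeví pořádání tzv. Road Show prostřednictvím "
    "našich autorizovaných dealerů v Čechách a na Moravě, které proběhnou "
    "v průběhu září a října."
)

escaped = to_numeric_escapes(czech)
print(escaped)

# The escaped text still decodes to the original, but nobody can proofread it by eye.
print(html.unescape(escaped) == czech)
```

The word "efektivnější" alone becomes "efektivn&#283;j&#353;&#237;", which shows how quickly escaped text becomes unreadable for a translator.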

Language. Declare the text-processing language of documents and indicate any internal language changes.

This is mainly a question of accessibility and search. Feel free to read the tutorial and make sure your language header uses one of the solutions described there.
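A quick way to check existing pages is to look for language declarations automatically. The following Python sketch assumes the language is declared with the lang attribute, which is one of the possible solutions. The class name, function name and sample page are invented for this example.

```python
from html.parser import HTMLParser


class LangDeclarationFinder(HTMLParser):
    """Collects the lang attribute of the html element and of any inner elements."""

    def __init__(self):
        super().__init__()
        self.document_lang = None   # language declared on <html>
        self.inner_langs = []       # (tag, language) pairs for internal language changes

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        if lang is None:
            return
        if tag == "html":
            self.document_lang = lang
        else:
            self.inner_langs.append((tag, lang))


def find_language_declarations(markup: str):
    finder = LangDeclarationFinder()
    finder.feed(markup)
    return finder.document_lang, finder.inner_langs


page = (
    '<html lang="en"><body><p>The German word '
    '<span lang="de">Gruppe</span> means group.</p></body></html>'
)
print(find_language_declarations(page))  # ('en', [('span', 'de')])
```

A page for which the first value comes back as None has no document-level declaration and deserves a closer look.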

Presentation vs. content. Use style sheets for presentational information. Restrict markup to semantics.

The separation is needed so that the translator can focus on the actual text strings and not on tagging. Tagging is not their job, and it makes it much more likely that they introduce errors which crash the site or mess up its design.

However, it is important not to delete all “code” from the text. Translators should be able to see variables, change their position, or delete them if they are not needed in their language. The following examples show how translators need to work with variables and syntax:

EN source: A request has been made to [Name] to add [him/her] as a friend.
Turkish: “[Name] adlı kişiye arkadaş olarak ekleme isteği gönderildi.”

In Turkish we can't use him/her, and the translator has to add "person" after [Name] because Turkish grammar requires it. Turkish word order is Subject+Object+Verb, whereas English is Subject+Verb+Object.

Russian: [Name] получит запрос и [должен/должна] будет подтвердить свое включение в твой список друзей.

[him/her] is replaced here with gender-dependent versions of "must" (as in "must confirm his inclusion in your friends list"). This sounds more natural to Russians than a stricter rendering that uses [him/her] = [его/ее].
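On the technical side, this means giving each language its own complete sentence with named placeholders, so that translators can move a variable, replace it, or leave it out. Here is a minimal Python sketch using the three sentences above. The dictionary layout, the placeholder names and the function name are assumptions made for this example.

```python
# One full sentence per language; placeholders are named, not positional.
TEMPLATES = {
    "en": "A request has been made to {name} to add {pronoun} as a friend.",
    "tr": "{name} adlı kişiye arkadaş olarak ekleme isteği gönderildi.",
    "ru": "{name} получит запрос и {must} будет подтвердить свое включение в твой список друзей.",
}

# Gender-dependent words live with the language that needs them.
GENDER_FORMS = {
    "en": {"male": {"pronoun": "him"}, "female": {"pronoun": "her"}},
    "tr": {"male": {}, "female": {}},  # Turkish drops the pronoun entirely
    "ru": {"male": {"must": "должен"}, "female": {"must": "должна"}},
}


def friend_request_message(locale: str, name: str, gender: str) -> str:
    forms = GENDER_FORMS[locale][gender]
    return TEMPLATES[locale].format(name=name, **forms)


print(friend_request_message("en", "Anna", "female"))
print(friend_request_message("tr", "Anna", "female"))
print(friend_request_message("ru", "Anna", "female"))
```

Because the sentence is never assembled from fragments in code, the English word order does not leak into the other languages.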

Images, animations & examples. Check for translatability and inappropriate cultural bias.

Most of the biggest social networks today avoid pictures altogether and work with neutral icons instead. A good localizer should always come up with examples made for the target audience (e.g. “Rock am Ring” is a famous German festival that most Germans will have heard of, but once you cross the border it becomes less and less known).

Forms. Use an appropriate encoding on both form and server. Support local formats of names/addresses, times/dates, etc.

There is a funny item in today’s social networks, which is “X time ago”. In Spanish it is “hace X tiempo”. As you can see, the word order changes and has to follow the syntax of the local language. The time variable stands for seconds, minutes, hours, days and so on, and for each of them you have to bear in mind that there are singular and plural forms (1 hour vs. 2 hours). In Polish, for example, you need to distinguish between the following groups of numbers for plurals:

- one: 1
- few: 2-4, 22-24, 32-34, 42-44...
- other: 0, 5-21, 25-31, 35-41, 45-51...; 1.31, 2.31, 5.31...

More about this can be found here: Language Plural Rules
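Here is a rough Python sketch of how the “X time ago” case could be handled. It uses the three Polish categories named above: "few" covers whole numbers ending in 2-4, except those ending in 12-14. All function names, dictionaries and the choice of "hours" as the unit are assumptions for illustration; a real project would normally take the rules from a plural-rules library instead of hand-coding them.

```python
def polish_plural_category(n: int) -> str:
    """Plural category for whole numbers in Polish: one, few or other."""
    if n == 1:
        return "one"
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "other"


def plural_category(locale: str, n: int) -> str:
    if locale == "pl":
        return polish_plural_category(n)
    # English and Spanish only distinguish singular and plural.
    return "one" if n == 1 else "other"


# Word forms per language and plural category (Polish forms as used with "temu").
HOUR_FORMS = {
    "en": {"one": "hour", "other": "hours"},
    "es": {"one": "hora", "other": "horas"},
    "pl": {"one": "godzinę", "few": "godziny", "other": "godzin"},
}

# Each language decides where the number and the unit go.
AGO_TEMPLATES = {
    "en": "{count} {unit} ago",
    "es": "hace {count} {unit}",
    "pl": "{count} {unit} temu",
}


def hours_ago(locale: str, n: int) -> str:
    unit = HOUR_FORMS[locale][plural_category(locale, n)]
    return AGO_TEMPLATES[locale].format(count=n, unit=unit)


for n in (1, 2, 5, 22):
    print(hours_ago("en", n), "|", hours_ago("es", n), "|", hours_ago("pl", n))
```

The important design point is that both the word order and the number of plural forms are data that belongs to each language, not logic hard-wired for English.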

Text authoring. Use simple, concise text. Use care when composing sentences from multiple strings.

It’s very important to think about language scalability before you build a tool that, for example, renders activity feeds. The best approach is to keep your messages simple. Translation will add enough extra complexity. The simpler, the better.

Another common error we observe in i18n projects is the re-use of certain terms in different places of the website (e.g. “group” is translated once and then used as a variable across the website in different contexts). This might work in English but not in other languages.

The best way is to set up a binding term list and use each string only in a clearly defined area, so that the translator knows in which context the word turns up and can adjust the text accordingly. Bear this in mind and you will create a much better reading experience for your international users and won’t have problems with more complicated languages (e.g. Finnish has around 15 grammatical cases).
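One way to keep strings tied to a clearly defined area is to look them up by context as well as by source text. Below is a small Python sketch of that idea with the “group” example. The catalog layout, the context names and the translate helper are hypothetical.

```python
# Hypothetical German catalog: the same English term, keyed by where it is used.
CATALOG_DE = {
    ("navigation.heading", "Group"): "Gruppe",      # noun: a group of users
    ("toolbar.button", "Group"): "Gruppieren",      # verb: group the selected items
}


def translate(catalog: dict, context: str, source: str) -> str:
    """Look up a term in its context; fall back to the source text if it is missing."""
    return catalog.get((context, source), source)


print(translate(CATALOG_DE, "navigation.heading", "Group"))  # Gruppe
print(translate(CATALOG_DE, "toolbar.button", "Group"))      # Gruppieren
```

With a single context-free entry for "Group", one of these two places would inevitably show the wrong word.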

Navigation. On each page include clearly visible navigation to localized pages or sites, using the target language.

Your language changer should always name each language in that language (Español, Deutsch, Italiano, Polski…), and to avoid cultural conflicts, please don’t use flags! Spanish, for example, is the official language of about 20 countries, so it is not wise to show only the Spanish flag. The same goes for Portuguese: many Brazilians don’t feel comfortable with the Portuguese flag on a website targeting the Brazilian market. Swiss and Austrians might feel the same about seeing the German flag. Don’t get yourself into trouble and simply use text to indicate languages in your menu.

The language changer should sit in the upper right corner. That’s where most people expect this feature.
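A text-only language changer needs very little: a mapping from language codes to the language names written in their own language. Here is a minimal Python sketch. The URL pattern and the function name are assumptions for this example, not a recommendation for a particular URL scheme.

```python
# Language names in their own language, no flags involved.
ENDONYMS = {
    "es": "Español",
    "de": "Deutsch",
    "it": "Italiano",
    "pl": "Polski",
}


def language_menu(current: str, url_pattern: str = "/{code}/"):
    """Return (label, url) pairs for every language except the one being shown."""
    return [
        (name, url_pattern.format(code=code))
        for code, name in ENDONYMS.items()
        if code != current
    ]


print(language_menu("de"))
# [('Español', '/es/'), ('Italiano', '/it/'), ('Polski', '/pl/')]
```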

Right-to-left text. For XHTML, add dir="rtl" to the html tag. Only re-use it to change directionality.

This is important if you are considering targeting Hebrew- and Arabic-speaking users with your website. A great example of good “rtl” localization is the IKEA website.
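If the html tag is generated by your backend, the direction can be derived from the language code. The Python sketch below shows the idea. The list of right-to-left languages is deliberately short and incomplete, and the function names are invented for this example.

```python
# A few common right-to-left languages; this set is illustrative, not complete.
RTL_LANGUAGES = {"ar", "he", "fa", "ur"}


def text_direction(language_tag: str) -> str:
    """Return "rtl" or "ltr" based on the primary language subtag (e.g. "ar" in "ar-EG")."""
    primary = language_tag.split("-")[0].lower()
    return "rtl" if primary in RTL_LANGUAGES else "ltr"


def html_open_tag(language_tag: str) -> str:
    """Build the opening html tag, adding dir="rtl" only where it is needed."""
    if text_direction(language_tag) == "rtl":
        return f'<html lang="{language_tag}" dir="rtl">'
    return f'<html lang="{language_tag}">'


print(html_open_tag("he"))     # <html lang="he" dir="rtl">
print(html_open_tag("de"))     # <html lang="de">
```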

Check your work. Validate! Use techniques, tutorials, and articles at http://www.w3.org/International/

