Internationalisation function calls: anti-patterns and best practices

Localization is the process of translating products into different languages. To localize a software application, you wrap UI texts in function calls that retrieve a carefully crafted translation of the UI text into the currently active language. Because this function is typically provided by so-called I18N (internationalization) libraries, let’s refer to these calls as i18n calls.

At first glance, writing i18n calls seems trivial. For example, for a German language user, the i18n call:

i18n('Share')

retrieves the string Teilen (German for Share) from the following translation object (gettext .po file format):

msgid "Share"
msgstr "Teilen"

However, after analyzing the i18n calls in our applications, we discovered a set of anti-patterns that lead to low-quality translations and sometimes even posed a threat to the stability of our apps.

This blog post discusses these anti-patterns and explains how to avoid them.

I’m writing this blog post with JavaScript and gettext (the arguably most popular software translation system) in mind, but some of the aspects I discuss apply independently of the exact technologies you use.

What poorly written i18n calls do

Every time you write i18n calls, ask yourself if they do at least one of the following things.

  • Can they potentially break your app?
  • Do you use them instead of program/business logic?
  • Are they biased in favor of the language(s) you know?
  • Don’t they provide enough context for the translators?

If that’s the case, refactor the calls to prevent the worst.

Breaking your app with text localization

Yes, you can. For example, if you wrap a template string (of whatever template your are using) into an i18n call.

Imagine you use an obscure template syntax that uses square brackets. Let’s say your i18n call is:

i18n('[You have {count} new notifications.]')

First, your program resolves the call. Then, the template engine evaluates the result.

If a translator forgets to add the closing ], the template engine can’t evaluate the result of the i18n call and fails - at runtime.

Instead, first make the i18n call and then pass it to the template. Make sure you always use the same construct for interpolation (replacement).

const message =
  i18n('You have __count__ new notifications.');
const templateString = `[${message}]`;

Using text localization for logic

Programmers sometimes abuse i18n calls for functionality that should be part of the program logic. This is more likely to happen if your i18n initiative starts with (multilingual) developers managing the translations into other languages themselves.

For example, developers might handle ID-to-name mappings via the translation framework. This can lead to i18n calls like i18n(categoryTypeId). The i18n framework is supposed to look up the value of categoryTypeId and return different translations depending on the value, for example like this:

Value of categoryTypeId Result of i18n(categoryTypeId)
9b4e_org Organization
a13d_role Role

Typically, you want to create the translation files automatically using a script that extracts all parameters of your i18n calls. Once you pass a variable into an i18n call, you can no longer ensure all values the variable can have are available as translation IDs. Instead, you should handle such ID-to-name mappings within your program logic, for example like this:

const categoryTypeTranslationMapping = {
  '9b4e_org': i18n('Organization'),
  'a13d_role': i18n('Role')
};

In your UI component, you can then reference the text as categoryTypeTranslationMapping[categoryTypeId].

Using text localization that is biased in favor of the language(s) you know

Of course, you can’t possibly know the details of all languages your app will be translated to, but you should at least consider the following two points:

  • Sentence structure isn’t the same in all languages. That’s why you should never concatenate i18n calls like this:
  i18n('You have') + ' ' __count__ + ' ' + i18n('new notifications.')

Better use interpolation:

  i18n('You have __count__ new notifications.')
  .replace('__count__', count)
  • Many languages use more than just two grammatical numbers. For example, a common additional number is the dual , which applies only to the count distinction two. That’s why you shouldn’t handle plurals in your program logic, for example like this:
  let message;
  if(count === 1) {
    message = i18n('You have one new notification.');
  } else {
    message = i18n('You have __count__ new notifications.')
      .replace('__count__', count);
  }

Once you translate your app into Arabic or Slovenian, the translators won’t be able to translate the message correctly, if count equals 2.

That’s why you should use gettext’s generic pluralization support.

Then you can write the following i18n call:

  message = i18n('You have __count__ new notifications.', count)
    .replace('__count__', count);

In the translation file, the translator can add translations for all grammatical numbers. Slovenian translators need to fill in translations for all six Slovenian grammatical numbers, in a predefined order they either know or need to learn.

msgid "You have one new notification."
msgid_plural "You have __count__ new notifications."
msgstr[0] "Unfortunately, I don't know Slovenian."
msgstr[1] "I hope you get my point."
msgstr[2] "Note that msgid_plural indicates"
msgstr[3] "that this message"
msgstr[4] "needs to be "
msgstr[5] "pluralized."

Providing not enough context for the translators

Sometimes, you have one-word translation strings, for example for button labels. The naming conventions for button labels differ between languages. In German, it’s commonly the bare infinitive (without zu / to), while in Norwegian it’s the imperative.

English German Norwegian
Share Teilen (infinitive: to share) Del (imperative: share it!)

When they see the string Share, how can German translators know they should use the infinitive (Teilen as in to share) while the Norwegian translators know they should apply the imperative (Del as in Share it!)?

gettext provides msgctxt (message context) to address this problem. Together with the message ID, you may pass a message context to the i18n call - like this:

i18n('Share', 'button label')

In the .po file, the message will look like this:

msgctxt "button label"
msgid "Share"
msgstr "Teilen"

Now, the translators understand they need to apply their convention for button labels - in the example above, they choose to use the bare infinitive. If the message ID Share appears in another context, the translators can translate them differently.

Forcing the translators to understand a complex markup language

If you can avoid it, don’t force your translators to use HTML or any other complex markup language. It is, however, not always possible to completely get rid of all markup in an i18n call without sacrificing translatability. That’s why you should ideally support a lightweight markup language like Markdown in your i18n calls, for example using this library.

Like this, the i18n call:

i18n('You find more information about our <strong>licenses</strong> on our <a href="http://signavio.com">website</a>.')

becomes:

markdown.toHTML(i18n('You find more information about our **licenses** on our [website](http://signavio.com).'))

Conclusion

The anti-patterns above help you to avoid writing i18n calls that are ugly hacks. Concepts like interpolation, message context and simplified markup solve the most common problems your translators encounter. Finally, to really ensure your translators can do their work, ensure they come to you to complain whenever they see an i18n message ID that looks strange or is impossible to map to a decent translation.

Photo: goodmami / CC BY-SA 2.0