I originally posted the following piece on software internationalization (i18n) internally at Automattic a while ago; I’m now publishing it for whomever it may serve.
Consider the following piece of UI:
You don’t need to know exactly which part of your application this is in order to understand the label Trash. Translators and the translation engine, however, will only see this:
This is a problem. Forget figure 1: now you just have that one word, with no other context. Do you still really know what Trash means? Definitions in US English include:
- Useless things to be discarded; rubbish; refuse.
- A container into which things are discarded.
- To discard.
- To free from trash, or worthless matter; hence, to lop; to crop.
I am excluding the more figurative or obscure meanings (“To disrespect someone or something”; “A collar, leash, or halter used to restrain a dog in pursuing game”) because no sane person would expect a piece of our UI to mean either of those.
We all work with assumptions that we are not aware of. For instance:
1. The same word, trash, can refer both to a container and to the things it contains.
2. The same word, trash, can refer both to a thing and to the action of putting things into that thing.
3. The same word, trash, can refer to any quantity or kind of “stuff” (i.e., it is both a countable and an uncountable noun).
These assumptions become a problem in the tricky world of internationalization.
In Portuguese (pt-PT), assumptions #1 and #3 still hold, but #2 doesn’t, so the string Trash can’t be translated correctly without knowing which sense is meant. In GlotPress, a translator may well interpret it as the noun and translate it accordingly (Lixo). If the developer actually meant the verb, the UI suddenly makes no sense: imagine seeing Recycling bin on a button. Are you still sure it means “Move to trash”, or could it perhaps mean “Go to trash”?
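To make the failure concrete, here is a minimal sketch assuming a hypothetical pt-PT catalog keyed only by the English source string. The catalog and `translateWithoutContext` are illustrative names, not a real API. Because a plain lookup table holds one translation per key, whichever reading the translator picked ends up everywhere the string appears.

```javascript
// Hypothetical pt-PT catalog keyed only by the English source string.
// One key means one translation: the translator chose the noun reading.
const catalogWithoutContext = {
	Trash: 'Lixo',
};

// Hypothetical lookup: returns the source string when no translation exists.
function translateWithoutContext( text ) {
	return catalogWithoutContext[ text ] ?? text;
}

// The folder label (noun) reads correctly...
const folderLabel = translateWithoutContext( 'Trash' );

// ...but the button, where the developer meant the verb, gets the noun too.
const buttonLabel = translateWithoutContext( 'Trash' );

console.log( folderLabel, buttonLabel ); // Lixo Lixo
```

Nothing in the lookup can tell the two call sites apart, which is exactly the information a translator is missing.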
Likes are another fun case. Like can mean the action of liking (I like this) or the countable unit of social credit (I got a like). The same applies to Likes (she likes this; I got two likes). Again, our feed reader’s Portuguese translation was wrong because of that duality until it was fixed.
Another testimonial by my colleague Marek provides a different take:
Same for “add” in Czech — the best example is the sidebar of My Sites. It’s there for many items and we don’t have a single word in Czech that would make sense in all those contexts.
So what can I do?
Firstly, always consider strings on their own and ask yourself: “How else could this be interpreted?” As builders, we’re used to thinking in terms of “How can I break this?”, so just apply that mindset to i18n. We all make numerous assumptions that are very difficult to recognize without external help. Speaking multiple languages helps, but, short of going out and learning a new language, there are resources¹ out there that make a good starting point.
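One way to make that question routine is a small review helper that flags strings known to have several readings whenever they are used without any context. This is a hypothetical sketch: the word list, the `{ text, context }` shape of each call, and `findUnclearStrings` are all assumptions, and a list like this only catches cases someone has already noticed. It complements human review rather than replacing it.

```javascript
// Hypothetical list of source strings known to have several readings
// (noun vs. verb, container vs. contents, ...). Grow it as translators report issues.
const AMBIGUOUS_STRINGS = new Set( [ 'Trash', 'Like', 'Likes', 'Add' ] );

// Each entry describes one translation call found in the codebase (hypothetical shape).
// Returns the calls that use a known-ambiguous string without any context.
function findUnclearStrings( calls ) {
	return calls.filter(
		( { text, context } ) => AMBIGUOUS_STRINGS.has( text ) && ! context
	);
}

const calls = [
	{ text: 'Trash', context: 'verb' },
	{ text: 'Like' },
	{ text: 'Save' },
];

console.log( findUnclearStrings( calls ) ); // [ { text: 'Like' } ]
```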
Secondly, prefer syntactic context over thematic context. That is:
```javascript
// this is way more helpful:
translate( 'Trash', { context: 'verb' } );

// than:
translate( 'Trash', { context: 'deletion of posts' } );
```
Not only is the former typically clearer for i18n, it also allows reuse of the string across the screens of our product, something which the very specific thematic context doesn’t.
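To see why a syntactic context can be shared across screens, here is a minimal sketch of a context-aware lookup. It borrows the gettext convention of joining the context and the source string with the EOT character (`\u0004`) to form the catalog key. The catalog, its pt-PT strings, and this `translate` function are hypothetical stand-ins, not the real implementation behind the calls above.

```javascript
// Separator used by gettext-style catalogs between context and source string.
const CONTEXT_GLUE = '\u0004';

// Hypothetical pt-PT catalog: each (context, string) pair gets its own entry.
const catalog = {
	[ `verb${ CONTEXT_GLUE }Trash` ]: 'Mover para o lixo',
	[ `noun${ CONTEXT_GLUE }Trash` ]: 'Lixo',
};

// Hypothetical translate(): builds the key from the optional context and
// returns the source string when no translation is found.
function translate( text, { context } = {} ) {
	const key = context ? `${ context }${ CONTEXT_GLUE }${ text }` : text;
	return catalog[ key ] ?? text;
}

// A syntactic context maps to one entry, reused by every screen that needs the verb.
const postsButton = translate( 'Trash', { context: 'verb' } );
const commentsButton = translate( 'Trash', { context: 'verb' } );
const trashFolderLabel = translate( 'Trash', { context: 'noun' } );

// A thematic context creates a separate key that needs its own translation;
// until someone translates it, the English source shows through.
const themed = translate( 'Trash', { context: 'deletion of posts' } );
```

Both buttons resolve to the same catalog entry, so translators only have to handle the verb once, while the thematic key sits untranslated until someone finds it.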
Lastly, don’t be afraid to ask when in doubt. i18n challenges go way beyond string context: how you compose your strings matters just as much. Also, I’m available for pings anytime; languages are fun!
Photo credit: the Rosetta Stone, photographed by Hans Hillewaert. Via Wikimedia Commons.
1. This originally pointed to an internal resource. If you know of an adequate public resource, let me know.