Translating

From ENIGMA
Revision as of 15:00, 15 October 2011 by IsmAvatar (talk | contribs) (→‎Testing)

Internationalization is important for both ENIGMA and LateralGM. As a result, translating is usually as simple as copying and editing a text file, and does not require a full recompile.

Rationale

Due to some minor controversy over the way translation files are done, here is the rationale for the way LateralGM does it.

Suppose that you want the string "Manual" to appear in your program twice: once in the menu, indicating the help file, and once in some settings (say Sprite), indicating that the settings will be tweaked manually rather than set automatically (such as the Bounding Box). This is an actual use case in LateralGM. To translate that string, you would simply wrap it in a translation function, say _T (or in Java's case, Messages.getString), so it becomes _T("Manual") in both cases. This is convenient because if a translation file does not exist, the program simply defaults to its native string "Manual". To translate to, say, German, you'd simply create a translation file, strings_de.txt, with the contents: Manual= Manuelle.

A problem arises, however, especially for German-speaking people who open up the Menu and see "Manuelle", which, in context, means "to do something manually". What are we doing manually? The German word for a help file manual is "Handbuch" (very similar to the English "handbook"), but obviously it doesn't make sense to set the Sprite's settings to Handbuch. We have an ambiguous English word that can be translated multiple ways depending on the context. Enter Context Keys.

For example, one string would be "Menu Manual", and the other would be "Manual Settings", or perhaps simply "Menu.MANUAL" and "Sprite.MANUAL". However, while this helps the German translation to show simple things like "Handbuch" versus "Manuelle", it doesn't help the native English version, which now shows these clunky context clues along with the string. To resolve this, we provide an English Translation, which we refer to as the #Native File. This file maps "Menu.MANUAL" to "Manual" and "Sprite.MANUAL" to "Manual".
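The lookup described above can be sketched in a few lines of Java. This is only an illustration, not LateralGM's actual code: the class name, the method signature, and the "!key!" fallback for unknown keys are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from LateralGM's source.
final class ContextKeyDemo {
    // Look the context key up in the translation first, then in the native file.
    // If neither has it, return the key wrapped in '!' so the gap is easy to spot
    // (an assumed convention, chosen only for this example).
    static String getString(Map<String, String> translation,
                            Map<String, String> nativeFile, String key) {
        String value = translation.get(key);
        if (value == null) {
            value = nativeFile.get(key);
        }
        return value != null ? value : "!" + key + "!";
    }

    public static void main(String[] args) {
        // The native file maps both context keys to the same English word.
        Map<String, String> nativeFile = new HashMap<>();
        nativeFile.put("Menu.MANUAL", "Manual");
        nativeFile.put("Sprite.MANUAL", "Manual");

        // The German translation can now tell the two contexts apart.
        Map<String, String> german = new HashMap<>();
        german.put("Menu.MANUAL", "Handbuch");
        german.put("Sprite.MANUAL", "Manuelle");

        System.out.println(getString(german, nativeFile, "Menu.MANUAL"));   // Handbuch
        System.out.println(getString(german, nativeFile, "Sprite.MANUAL")); // Manuelle

        // With no translation available, the native string is used.
        Map<String, String> none = new HashMap<>();
        System.out.println(getString(none, nativeFile, "Menu.MANUAL"));     // Manual
    }
}
```

The code only ever mentions context keys, so an ambiguity discovered later is fixed in the text files rather than in the source.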

Another problem arises when we have hundreds of strings running around throughout the program and we want to translate the whole program. Does that mean we have to pick through the entire program by hand, hunting down each and every string, and copying them into our translation file? To resolve this, we simply populate the Native file with every translatable string in our program.

Since it can be hard to see when an ambiguity will arise in the native language, we would have to keep going back in, editing the source code with a context key, and recompiling every time an ambiguity does arise. To resolve this problem once and for all, we provide a context key for every string. This isn't that big of a deal since we already have every string in a native file.

This has a few minor downsides, but they mostly affect the programmer working in the native language, not the translator. Probably the most notable is that the program's source ends up with these clunky context keys floating around, and finding a specific string usually consists of opening the native file, searching for the string, looking up the context key, and then finding the context key in code. Conversely, to look up a context key's string, you'd have to open up the native file and hunt down the key. Decent IDEs usually alleviate this by doing the lookup for you if you mouse over a context key wrapped in a translation function (Eclipse does this). Recycling keys is difficult (you'd have to look up the string to get the key), but is generally discouraged - either the new string has its own context and should get a new context key, or the system hosting the key should be made more modular so that the string (or display owner) can be fetched without performing another key lookup.

Locale

First, you must understand your locale. This will usually consist of 2 pieces of information: your language and your country. See #Translating LateralGM for how Java represents them.

Native File

The program that you are translating should provide a "native file", or a file with all the strings available in the program in the "native language of the program" (usually US English). This will provide you with a base reference file that should always be up-to-date and accurate.

In LateralGM, the native file is inside the LateralGM bundle, org/lateralgm/messages/messages.properties

In the Plugin, the native file is inside the enigma.jar bundle, org/enigma/messages/messages.properties

In ENIGMA, currently you have to hunt down every string in the program and do it the hard way. If an ambiguity comes up, you have to edit the source code and recompile. See #Rationale (which explains why they're doing it wrong).

Translating LateralGM

The Java Locale, which is usually the same as your system locale, detects your language and country from whatever your system is set to (which is usually whatever you chose when installing it). It represents each aspect using the alpha-2 form - that is, using 2 characters, such that English is represented as "en", Spanish as "es", United States as "US", Spain as "ES", and Mexico as "MX". It then separates them with an underscore "_". For example, US English would be "en_US", and Mexican Spanish would be "es_MX".
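As a small illustration of the naming scheme, the sketch below builds a Locale from alpha-2 codes and prints the resulting string. The class and method names are made up for the example; Locale.Builder is part of the standard Java library.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class LocaleStringDemo {
    // Combine an alpha-2 language code and an alpha-2 country code
    // into the "language_COUNTRY" string that Java uses for a Locale.
    static String localeString(String language, String country) {
        Locale locale = new Locale.Builder()
                .setLanguage(language)
                .setRegion(country)
                .build();
        return locale.toString();
    }

    public static void main(String[] args) {
        System.out.println(localeString("en", "US")); // en_US
        System.out.println(localeString("es", "MX")); // es_MX
    }
}
```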

LateralGM's translation file is located inside LateralGM's bundle (which is usually a jar, which can be treated as an archive/zip file), at org/lateralgm/messages/messages.properties, and is natively written in US English (en_US). That file should not be changed for translations; instead, copy it to another filename and translate the copy. Translations of it will be in the same directory, with names like messages_xx_XX.properties or messages_xx.properties, where xx_XX and xx represent the Locale string.
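To see which file names Java would consider for a given locale, the standard ResourceBundle.Control class can list them. This is a sketch under two assumptions: that LateralGM loads its strings through Java's usual ResourceBundle mechanism, and that the bundle's base name is org.lateralgm.messages.messages (inferred from the path above). The class name CandidateFilesDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical helper; assumes the standard ResourceBundle lookup rules apply.
final class CandidateFilesDemo {
    // List the .properties resources Java would try for a locale,
    // from most specific to least specific.
    static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            String bundleName = control.toBundleName(baseName, candidate);
            files.add(control.toResourceName(bundleName, "properties"));
        }
        return files;
    }

    public static void main(String[] args) {
        Locale mexicanSpanish = new Locale.Builder().setLanguage("es").setRegion("MX").build();
        // Assumed base name, derived from org/lateralgm/messages/messages.properties.
        for (String file : candidateFiles("org.lateralgm.messages.messages", mexicanSpanish)) {
            System.out.println(file);
        }
        // org/lateralgm/messages/messages_es_MX.properties
        // org/lateralgm/messages/messages_es.properties
        // org/lateralgm/messages/messages.properties
    }
}
```

The last entry is the native file, which is why a missing translation falls back to US English.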

The translation file follows a very simple format: KEY: value (the Java properties format accepts either ":" or "=" as the separator). The KEY, which usually takes the format ClassFile.PROP_NAME, should not be changed, because that is what LateralGM uses to look up individual strings. The value is what you want to change/translate. Any line beginning with the hash symbol "#" is a comment, and will not appear in the actual program - so you may translate them if you wish, or leave them be. They generally help identify sections of strings, so it might be helpful to translate them.
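The format can be tried out with Java's standard java.util.Properties parser. This is a sketch that assumes LateralGM reads the file with the standard properties rules; the sample keys and the class name are only illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo of the properties format, not LateralGM code.
final class PropertiesFormatDemo {
    // Parse properties-format text into key/value pairs.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "# Menu strings (comment lines are ignored by the parser)",
                "Menu.MANUAL: Handbuch",
                "Sprite.MANUAL=Manuelle");
        Properties props = parse(text);
        System.out.println(props.getProperty("Menu.MANUAL"));   // Handbuch
        System.out.println(props.getProperty("Sprite.MANUAL")); // Manuelle
        System.out.println(props.size());                       // 2
    }
}
```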

Example

If I wanted to translate LateralGM from its native language (US English) into Mexican Spanish, I would first look up the alpha-2 Language and Country codes for Mexican Spanish, which combine to "es_MX". I would then look inside LateralGM's org/lateralgm/messages/ to see if a "messages_es.properties" or "messages_es_*.properties" file already exists. If one does, it is simply a matter of porting that file. If "messages_es_MX.properties" already exists, the language has already been ported, but feel free to look through it and make sure that everything is correct and up to date.

Otherwise, pick a base file that you would like to translate from, copy it to org/lateralgm/messages/messages_es_MX.properties, and start editing your new copy with a text editor of your choice (if your language has special characters, please make sure your text editor supports them).

At any point, you may save your work and test it in LateralGM by ensuring that your file is inside the LateralGM bundle in the proper location and that your system/Java locale is correct, and then simply running LateralGM (no recompile necessary). It should automatically start using your strings.

Don't forget to mention your revision and base file at the top of the file in comments (#revision 123 \n #based off messages_es_ES).

If you are writing the first translation for both your language and your country, it might be helpful to make a copy of your file - one for just the language, and one for the language_country - so that other countries that speak the same language can also benefit from it. Alternatively, simply omit the country from the filename. Either way, be sure to specify the country in the tags/comments of the file that doesn't mention it in the filename (it doesn't hurt to specify it in both files).

Testing

To test out your changes, they need to be copied into LateralGM's bundle (again, probably a JAR, treat it like a zip). Navigate inside LateralGM, to the location org/lateralgm/messages/ and then copy your translation file(s) inside. No recompile is necessary. Then, if your Locale is set up correctly, you should be able to just run and see the changes. Otherwise, see #Overriding Locale.

Overriding Locale

Since Java automatically picks up your locale, it will automatically use the correct translation file associated with your System/Java locale.

To override the System/Java locale, you should usually go into your Operating System's settings and change it, but if you need to just override it for Java, you can use command-line parameters: java -Duser.language=en -Duser.region=US -jar lgm16b4.jar. Other methods may be documented by Java, such as creating a properties file, or setting environment variables.
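To confirm which locale Java actually picked up, a tiny checker can be run with the same -D parameters. The class name LocaleCheck is hypothetical; Locale.getDefault() is the standard call that reports the locale the JVM started with.

```java
import java.util.Locale;

// Hypothetical checker, for illustration only.
final class LocaleCheck {
    // Describe a locale by the two pieces the translation file names depend on.
    static String describe(Locale locale) {
        return "language=" + locale.getLanguage() + " country=" + locale.getCountry();
    }

    public static void main(String[] args) {
        // Prints the locale the JVM started with, including any
        // -Duser.language / -Duser.region override given on the command line.
        System.out.println(describe(Locale.getDefault()));
    }
}
```

For example, after compiling it, running java -Duser.language=es -Duser.region=MX LocaleCheck shows whether the override took effect before you start LateralGM with the same parameters.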

Translating Plugin

The Plugin is set up exactly the same as LateralGM, so refer to #Translating LateralGM above, but note that the #Native File is inside the enigma.jar bundle rather than the LateralGM bundle.

When translating the Plugin, do be cautious that some of the strings are error messages or information messages which may be useful to not only the user, but people trying to help the user as well. Obviously, the developers would be able to help a great deal, but most of the developers don't speak the translated language, so a translated error message could cause problems with getting help. We're working on ways to deal with this on the internal side of things, but still have a long way to go. A few considerations:

  • As long as the developer has the translation file, they could simply search for the string in it, and find the Context Key.
  • Another possibility is to use universal Error Codes: a simple numeric code which the user would report and which a developer would immediately recognize. Sequences of string characters could also be used (instead of entirely numeric) or even possibly the Context Key itself.
    • Since there are no formally defined error codes at this point (other than the Context Keys), there's no definitive answer yet for how to handle this.

Overriding the Locale would be done LateralGM-side, and the Plugin would simply detect and use the same locale. There's no reason to have a separate locale for each, although it could happen if e.g. LateralGM has the translation file for your language and the Plugin does not, or vice versa, in which case, whichever program is lacking in a translation file would likely just revert to the Native File.

Testing

Testing plugin translations is achieved in a similar way to LateralGM. You simply copy your translation file into the Plugin (it's probably a JAR named enigma.jar, so just treat it like a zip) to the location org/enigma/messages/. Then make sure your Locale is correct, and go ahead and run.

Translating ENIGMA

Unknown.

Tagging/Comments

It is important to tag your translation files with comments at least indicating the revision that you used and the base file that you based it off of. Feel free to also include your name and anything else you want to say about the translation. This is helpful to anybody else in the future who wishes to look over your translation file to update/correct it. The revision will help identify what has changed between the last time it was translated and the latest revision. The base file will help correct any errors that may have occurred either because the base file was out of date or something got lost between the stages of translation (information inevitably gets lost when you translate from English to Spanish, and then Spanish to French).

If you are writing a language file for a language without a country, it is important to also mention the country that your language is generally intended for, or at least your language background.

Updating

Translation files become out of date quickly, since they don't update themselves every time a string is added to, changed in, or removed from the program. One of the most useful tools for getting files back up to date is diff. First, look at the translation file's revision and base file (which should be in a comment inside the file, near the top, see #Tagging/Comments). If the base file is the native file, simply do a diff on the native file between the revision specified and the latest revision to see what has changed. If the base file is not the native file, you might need to do some tree-climbing. Possibly do a diff on the base file to see what has changed there, although again, it could be out of date. Look at the base file to see what revision and base file it specifies, and repeat.
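As a complement to diff (not a replacement for it), a quick key comparison shows which context keys a translation is missing and which ones no longer exist in the native file. The sketch below is hypothetical and uses only java.util.Properties. It cannot detect strings whose native text changed while the key stayed the same, so the revision diff described above is still needed for those.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class KeyDiffDemo {
    // Keys present in 'from' but absent from 'other', sorted for stable output.
    static Set<String> keysOnlyIn(Properties from, Properties other) {
        Set<String> result = new TreeSet<>(from.stringPropertyNames());
        result.removeAll(other.stringPropertyNames());
        return result;
    }

    // Parse properties-format text held in memory.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Made-up sample contents standing in for the real files.
        Properties nativeFile = parse("Menu.MANUAL=Manual\nSprite.MANUAL=Manual\nMenu.EXIT=Exit\n");
        Properties translation = parse("Menu.MANUAL=Handbuch\nMenu.OLD=Alt\n");

        // Strings that still need translating.
        System.out.println(keysOnlyIn(nativeFile, translation)); // [Menu.EXIT, Sprite.MANUAL]
        // Strings that were removed from the program.
        System.out.println(keysOnlyIn(translation, nativeFile)); // [Menu.OLD]
    }
}
```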

Lost in translation

Translation files based on another translated file (other than the native file) could have translation problems/errors that propagated across the multiple translations. Because of this, consider re-translating them directly from the native file instead.

On the other hand, sometimes a smart translator will recognize a better word/translation for something that English (the native language) simply cannot express. In these cases, it might be useful to translate from that language instead.

When in doubt, think about where the string appears in the program, the functionality of that particular aspect of the program, and what the string is trying to convey.