We often tell our clients about the benefits of using a translation memory (TM). A TM is a searchable database of segments of source text and their target translations. However, like any database, the value of a TM depends on the effort spent maintaining it.
TMs are most valuable for clients whose localization needs include periodic updates of technical materials such as sales information, manuals, and software. We also maintain TMs for clients in healthcare, particularly insurance, where plan documents are constantly being generated and translated in order to meet regulatory requirements. A third type of client includes global businesses whose marketing and sales material needs to remain consistent across several platforms (print materials, websites, mobile apps, video).
TM maintenance is part of our regular service to these clients. For new clients who have been either handling localization in-house or spreading it across several vendors, TM cleanup and maintenance can be long overdue, and will require an investment of focused effort.
How is a TM created?
At the start of a translation project, our project managers use a computer-assisted translation (CAT) tool to break the source text down into segments. A segment is a discrete chunk of text. It might be a short sentence, a clause, or a page heading. The CAT tool displays the segments in a two-column interface so that a linguist can view the text in the source column and enter the translation in the target column. When the job is completed, the paired source-target segments are saved in the TM.
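As a rough illustration of segmentation and source/target pairing, here is a minimal Python sketch. The `segment` function and its splitting rule are assumptions made for this example. Real CAT tools apply much more careful segmentation rules, and the Spanish translations are illustrative only.

```python
import re

def segment(text: str) -> list[str]:
    """Naive segmentation: split after ., ! or ? followed by whitespace, and on line breaks."""
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    return [p.strip() for p in parts if p.strip()]

source = "Safety instructions\nUnplug the device. Wait five minutes before opening the cover."
segments = segment(source)

# Translations entered by the linguist, one per source segment (illustrative).
translations = [
    "Instrucciones de seguridad",
    "Desenchufe el dispositivo.",
    "Espere cinco minutos antes de abrir la tapa.",
]

# The TM stores each source segment paired with its target translation.
tm = list(zip(segments, translations))
for src, tgt in tm:
    print(f"{src} | {tgt}")
```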
A TM can also be created after the fact, by uploading source and target documents and running the alignment function to create source/target pairs. This works best with formulaic, structured documents like contracts, policies, manuals, and the like, and requires human review to ensure accuracy. This can be a helpful first step if a new client needs updates on a document that another linguist or agency had translated in the past.
How does a translator use a TM?
When a previously translated document is updated with new content, a translator uploads the new source document into the CAT tool with the TM. The CAT tool scans the new document for matching source segments and auto-populates the corresponding target translations. Although every match still needs to be checked by the translator, the TM speeds the process and reduces the cost.
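The lookup described above can be sketched in a few lines of Python. This is a simplified model that handles exact matches only. The `TMEntry` class and `pretranslate` function are hypothetical names for this example, not part of any CAT tool.

```python
from dataclasses import dataclass

@dataclass
class TMEntry:
    source: str
    target: str

def pretranslate(segments, tm):
    """Return (source, suggested target or None) for each segment of the new document."""
    lookup = {entry.source: entry.target for entry in tm}
    return [(seg, lookup.get(seg)) for seg in segments]

tm = [
    TMEntry("Press the power button.", "Pulse el botón de encendido."),
    TMEntry("Close the cover.", "Cierre la tapa."),
]
new_doc = ["Press the power button.", "Replace the filter every month."]

for source, suggestion in pretranslate(new_doc, tm):
    # A suggestion is only a starting point; the translator still reviews it.
    print(source, "->", suggestion or "(no match, translate from scratch)")
```

Only the second segment needs a fresh translation, which is where the time and cost savings come from.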
Problem: Source text quality
In order for a TM to be useful, a client’s source text needs to be consistent. In the best-case scenario, the source text was created using controlled language and technical writing tools. The writers will have followed the company style guide and used approved technical and brand terminology consistently. If all of the departments work with the same language partner, and everyone follows these guidelines, a single TM can cover all the types of documents that need to be localized.
In the real world, the departments of a global business don’t always coordinate with each other with respect to style guides. In this case, if one department uses the term “employee” and another uses “team member” to talk about the same people in the same type of source material, the CAT tool won’t be able to use the TM to recognize a match. Instead it will serve up a partial match and save yet another version of the same segment after the translation is complete. Over time, if the source text is always different, the TM will end up saving multiple translations of what is essentially the same phrase.
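To see why a one-word difference costs you an exact match, consider this small Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for a match score. CAT tools use their own scoring methods, so the numbers here are illustrative only, and `match_score` is a hypothetical helper.

```python
from difflib import SequenceMatcher

def match_score(new_segment: str, tm_segment: str) -> float:
    """Similarity between 0.0 and 1.0; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, new_segment.lower(), tm_segment.lower()).ratio()

stored = "Every employee must complete the safety training."
same = "Every employee must complete the safety training."
variant = "Every team member must complete the safety training."

print(match_score(same, stored))     # identical text: a perfect match
print(match_score(variant, stored))  # below 1.0: only a partial match
```

The second sentence means the same thing to a human reader, but it no longer qualifies as a perfect match, so it has to be reviewed and is then stored as yet another segment.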
Problem: Variable target texts
It’s not uncommon for businesses to localize in a piecemeal fashion, with each department (or individual) sourcing their own translations from a different language partner. All the translations may be accurate, but they vary in structure and vocabulary. A simple piece of source content like a company history might have been translated differently with each project. When the business decides to centralize and streamline the localization function, there will be multiple TMs (or multiple versions of the same segment in one TM) and it will remain unclear which is the “right” one.
Problem: Incomplete multilingual glossaries / termbases
A flawed termbase is both a cause and an effect of variable target and source texts. A company may have established a glossary for English language publications, but may not have approved a translation for each source term.
In these situations, a TM needs to be cleaned up in order to maximize its value.
Cleaning a TM
TM maintenance can be done efficiently by automating some of the steps and engaging human translators in a systematic manner.
Step one: reduce volume
TMs store more than simple source/target pairs. They can also store metadata about each segment: when it was added to the TM, a history of who edited it and when, and when it was last used and for which jobs. Before starting the substantive part of TM maintenance, the first step is to prune the TM. Setting aside the segments that haven’t been used in years is a logical place to start, and the process can be automated.
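A minimal Python sketch of this kind of automated pruning follows. It assumes each entry carries a last-used date, and the three-year threshold, the `TMEntry` fields, and the `split_by_age` function are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class TMEntry:
    source: str
    target: str
    last_used: date

def split_by_age(entries, today, max_age_years=3):
    """Separate entries used within the last max_age_years from older ones."""
    cutoff = today - timedelta(days=365 * max_age_years)
    keep = [e for e in entries if e.last_used >= cutoff]
    stale = [e for e in entries if e.last_used < cutoff]
    return keep, stale

entries = [
    TMEntry("Close the cover.", "Cierre la tapa.", date(2023, 6, 1)),
    TMEntry("Insert the floppy disk.", "Inserte el disquete.", date(2015, 3, 10)),
]
keep, stale = split_by_age(entries, today=date(2024, 1, 1))
print(len(keep), "kept;", len(stale), "set aside")
```

The sketch returns the stale entries rather than discarding them, so they can be archived or spot-checked before anything is removed.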
Step two: remove inconsistencies in source segments
CAT tools are designed to identify both perfect matches and “fuzzy” matches between the segments in a new source document and the segments stored in the TM. When the tool suggests a segment that’s a 90% match, the translator needs to edit the target text, and then either overwrite the stored segment or save the result as a new segment. If the same type of document is being translated in different departments by different language partners, too many versions begin clogging up the TM. After stripping out segments that haven’t been used in a while, the next step is to normalize the segments that remain. Segments that are identical apart from slight differences (“did not” vs. “didn’t,” for example) can be identified and consolidated using automated processes. Metadata can help determine which version to keep. For example, you can prioritize by recency or by translator.
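As a sketch of how such consolidation might be automated, the Python example below normalizes each source segment and keeps only the most recently modified entry for each normalized form. The contraction list, the `TMEntry` fields, and the "newest wins" rule are assumptions for this example. A real cleanup would use a fuller set of normalization rules and might prioritize by translator instead.

```python
import re
from dataclasses import dataclass
from datetime import date

@dataclass
class TMEntry:
    source: str
    target: str
    modified: date

# A deliberately tiny, illustrative list of contractions to expand.
CONTRACTIONS = {"didn't": "did not", "don't": "do not", "can't": "cannot"}

def normalize(source: str) -> str:
    """Lowercase, straighten apostrophes, expand contractions, collapse whitespace."""
    text = source.lower().replace("\u2019", "'")
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    return re.sub(r"\s+", " ", text).strip()

def consolidate(entries):
    """Keep the most recently modified entry for each normalized source segment."""
    newest = {}
    for entry in entries:
        key = normalize(entry.source)
        if key not in newest or entry.modified > newest[key].modified:
            newest[key] = entry
    return list(newest.values())

entries = [
    TMEntry("The device did not start.", "El dispositivo no arrancó.", date(2019, 5, 2)),
    TMEntry("The device didn\u2019t start.", "El dispositivo no se encendió.", date(2022, 11, 14)),
    TMEntry("Close the cover.", "Cierre la tapa.", date(2021, 1, 8)),
]
for entry in consolidate(entries):
    print(entry.source, "|", entry.target)
```

Here the two "did not" / "didn’t" variants collapse into a single entry, and the 2022 version is the one that survives.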
Step three: correct inconsistent terminology
After the TM has been cleaned using automated processes, the next step is the terminology review. Using a list of approved corporate terminology loaded into a termbase, any of several QA tools (we use Xbench) can generate a terminology consistency analysis. After reviewing the mismatch report to exclude false positives, translators can begin editing target texts to bring term use into alignment. With particularly large volumes, a team of translators can work together in a cloud-based tool to ensure consistency.
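The idea behind a terminology consistency check can be illustrated with a short Python sketch. This is not how Xbench or any other QA tool works internally. It is a simplified stand-in that flags any segment whose source contains a termbase term but whose target lacks the approved translation, and `find_term_mismatches` is a hypothetical function name.

```python
def find_term_mismatches(entries, termbase):
    """entries: (source, target) pairs; termbase: source term -> approved target term."""
    mismatches = []
    for source, target in entries:
        for src_term, approved in termbase.items():
            term_in_source = src_term.lower() in source.lower()
            approved_in_target = approved.lower() in target.lower()
            if term_in_source and not approved_in_target:
                mismatches.append((source, target, src_term, approved))
    return mismatches

termbase = {"employee": "empleado"}
entries = [
    ("Each employee receives a badge.", "Cada empleado recibe una credencial."),
    ("The employee handbook is online.", "El manual del trabajador está en línea."),
]

for source, target, term, approved in find_term_mismatches(entries, termbase):
    print(f"'{term}' should be '{approved}': {target}")
```

Plain substring matching like this will flag legitimate inflected forms as errors, which is one reason the mismatch report needs human review before any edits are made.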
When this process is complete, a final check should be made to eliminate possible errors introduced during the editing process (because nobody is perfect).
Step four: keep it clean!
Now that your translation memories are clean, you should be enjoying better TM discounts and faster turnaround times. Your language services partner should be following best practices for TM maintenance, but they’ll need your help:
- Don’t make preferential changes to approved texts. “Preferential changes” are edits that “improve” readability without changing the meaning. Preferential changes cause more harm than good because they create inconsistencies which will require additional review.
- Maintain consistency in source documents by using controlled language and glossaries and following style guides.
- If you are using external reviewers for new localized content, always submit the reviewers’ changes for reconciliation by linguists and incorporation into the TM. If the changes are made over the objections of the linguists, the TM should include metadata to ensure that the segments are identified as client preferences, so the same issue doesn’t arise during the next project.
Many communications professionals are well aware of the inconsistencies floating around in their libraries of localized content, but they can’t always make the case for investing in a TM cleanup project. In many cases, this is because the localization function is seen as a cost center rather than a revenue generator.
However, if your business localizes high volumes with some regularity, and your localization function has not been centralized, the costs are going to be harder to estimate and contain. TM cleaning and consolidation allows you to make the best use of your existing multilingual content, and paves the way toward streamlining the process in the future.
One way to proceed is to start with one language, then assess the ROI of the effort in terms of higher discounts, faster turnaround times, and overall brand consistency. You can then decide whether to move forward with TM cleanup across all the languages you use in your global business.