3 Tips for Navigating the World of Foreign Language Data

By John Del Piero on August 8, 2016

Rarely does a review project shape up exactly the way we predict. Litigation support teams need agility and flexibility to be prepared for everything e-discovery can and will throw their way.

Growing data volumes are an obvious contributor to this reality, but so is today’s international landscape. Globalization means more foreign language documents are finding their way into company data stores, and that results in added complications during e-discovery for both litigation and investigations.

If you’re starting to see that foreign language data is becoming a bigger part of everyday e-discovery, here’s how to get ahead of the complexity.

1. Think multilingually.
It is important to always be prepared for foreign language data that may appear in your collections. Odds are good that your business—or your client’s business—involves some dealings in another country, whether via product sales, outsourced services, or recruiting efforts. Modern business means foreign language documents are always a possibility, if not likely.

For example, our team recently kicked off a relatively small internal investigation involving five custodians. After initial strategizing with the client, we knew we might need to handle foreign language data. Even though we didn’t know what languages or volumes to expect, we were fortunate to have prepared the right technological workflows, including tapping a specialized translation plugin for our review workspace, in advance. It turned out that this small investigation became a big one, and more than 10 million documents involving English, Russian, and several Middle Eastern languages were collected when all was said and done.

Bonus Tip: You can also use early case assessment workflows to perform analytics on your case and identify which foreign languages are used in which documents.

2. Home in on foreign language insights with the right technology.
The days of setting aside individual documents with foreign language content during a manual, linear review so they can be attended to separately by native speakers are more or less behind us. Case teams can now take advantage of text analytics to identify those documents at the very start of the review. The benefit here is that, while still requiring a separate workflow, these documents can undergo a first-pass review simultaneously alongside the English documents—instead of being flagged and funneled into a separate process as reviewers churn through the entire data set manually.
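As a rough illustration of that first routing step, here is a minimal Python sketch that sorts documents into review queues by their dominant writing script. It is not how any particular review platform does it: the function names and document IDs are hypothetical, and script is only a proxy for language (Latin script covers English, German, Spanish, and many others), so a real workflow would rely on the platform’s own language identification.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script among the letters in text.

    Relies on Unicode character names starting with the script,
    e.g. "CYRILLIC SMALL LETTER PE" or "ARABIC LETTER AIN".
    """
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def route_by_script(docs):
    """Group document IDs into one queue per dominant script."""
    queues = {}
    for doc_id, text in docs.items():
        queues.setdefault(dominant_script(text), []).append(doc_id)
    return queues


# Hypothetical sample documents, for illustration only.
docs = {
    "DOC-001": "Please review the attached contract.",
    "DOC-002": "Пожалуйста, проверьте договор.",
    "DOC-003": "يرجى مراجعة العقد",
}
queues = route_by_script(docs)
```

Each queue can then go to the appropriate reviewers at the same time as the English first pass, which is the point of identifying these documents up front.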

Working with foreign languages in your e-discovery software also means identifying the right stop words—common terms that the system will ignore, such as “the” or “it”—for searching and analytics, so be sure to have a proper understanding of those dictionaries from the start. You can also get creative during searching by looking into slang or other regional terms that could be present in your data set.
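To see why stop word lists need to be language-specific, consider this small Python sketch. The word lists here are tiny, made-up samples rather than any product’s real dictionaries, and `search_terms` is a hypothetical helper. The thing to notice is that “die” is a throwaway article in German but a meaningful word in English.

```python
import re

# Tiny illustrative stop word lists (assumptions, not real dictionaries).
STOP_WORDS = {
    "en": {"the", "it", "a", "of", "and"},
    "de": {"der", "die", "das", "und", "es"},
}


def search_terms(text, lang):
    """Lowercase and tokenize text, then drop stop words for the given language."""
    tokens = re.findall(r"\w+", text.lower())
    stops = STOP_WORDS.get(lang, set())
    return [t for t in tokens if t not in stops]


english = search_terms("The patient may die", "en")
german = search_terms("Die Rechnung und das Angebot", "de")
```

With these sample lists, `english` keeps “die” as a searchable term while `german` drops it, leaving only “rechnung” and “angebot.” Applying the wrong list to a document would either hide meaningful words or clutter the index with noise.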

Creating a unique analytics index for each language is a good way to ensure you’re making the most of your system’s conceptual analysis of the data. Additionally, work closely with foreign language experts to identify any foreign names or terms that could be translated but should not be, such as “Deutsche Telekom.” Those experts can also help you refine foreign keyword search criteria, which may uncover the most important files by helping to create clusters (conceptually related groups of documents that the system organizes automatically).
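One common way to keep names like “Deutsche Telekom” from being translated is to swap them for placeholder tokens before translation and restore them afterward. The Python sketch below shows the idea under some assumptions: the helper names and the placeholder format are hypothetical, and whether a given translation engine leaves such placeholders untouched needs to be verified for that engine.

```python
def protect_terms(text, protected_terms):
    """Replace each protected term with a placeholder; return text and mapping."""
    mapping = {}
    # Handle longer terms first so a short term never splits a longer one.
    for i, term in enumerate(sorted(protected_terms, key=len, reverse=True)):
        if term in text:
            token = f"[[KEEP{i}]]"  # hypothetical placeholder format
            text = text.replace(term, token)
            mapping[token] = term
    return text, mapping


def restore_terms(text, mapping):
    """Put the original protected terms back in place of their placeholders."""
    for token, term in mapping.items():
        text = text.replace(token, term)
    return text


original = "Die Rechnung von Deutsche Telekom ist fällig."
masked, mapping = protect_terms(original, ["Deutsche Telekom"])
# ... the masked text would be sent for translation here ...
restored = restore_terms(masked, mapping)
```

The list of protected terms is exactly what your foreign language experts can help you build.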

Bonus Tip: Email threading and other analytics features can organize this data with minimal human input. Just take note of any special considerations that apply when you use them on foreign languages.

3. Know you have options for translation.
All of those technology options mean that a slow linear review by native speakers is no longer necessary—at least not to the full extent it once was. However, once you’ve identified potentially relevant materials via these workflows, you still need to get the data into the hands of the experts on your project. You can’t build a convincing case strategy based on second-hand reports of the stories the documents are telling—at some point you’ll need accurate document translation to provide evidence.

Fortunately, even translation is a different animal when you have the right technology and workflows in place. Machine translation is a very low-cost option, but you must be careful. It can convey the gist of a document, but it is unreliable for the precise meaning of any given sentence. While convenient and fast, machine translation may produce misleading output, and some of it may be simply incomprehensible. For reliable accuracy, consider human revision of the machine’s results.

For instance, on that same case of 10 million documents, our team ended up with more than 70,000 files that required translation—and the task seemed daunting. Working closely with Linguistic Systems, a Relativity developer partner, we were able to identify a collaborative, hybrid workflow that utilized post-editing of the machine translation to split the difference between the cost-effectiveness of machine translation and the refined accuracy of human translation. In the end, it cost 65 percent less than we anticipated for a manual translation—and we gathered all the insight we needed, easily within the time allowed.

Bonus Tip: Specialized tools that can be added directly to your review workspace support translation workflows in real time, so you don’t have to move data around. Discovia worked with Relativity developer partner Linguistic Systems, Inc., which performs this translation work through its proprietary LSI Translation Plug-in, an application in the Relativity Ecosystem.

When it comes down to it, tackling foreign language data is yet another example of how modern e-discovery requires a healthy balance of technology, expertise, and collaboration. How do you ensure you’re sticking the landing on feats like these? Let us know in the comments.

Foreign Language Data in eDiscovery Now the Norm

By Paige Hunt Wojcik on August 1, 2016

In his bestselling book, The World is Flat, the award-winning New York Times columnist Thomas Friedman explains how the “flattening of the world” happened at the dawn of the 21st century, leaving us with a truly global economy in which the playing field has been leveled for international commerce.

One of the many implications of this macro-trend has been an acceleration in cross-border business deals, which has inevitably led to an increase in cross-border litigation. Since there are an estimated 7,000 languages spoken in the world, there is often more than one language used in documents that are subject to litigation discovery.

The reality of the eDiscovery industry in 2016 is that the presence of foreign language data in our collections is no longer the exception – it’s now the norm. And since this reality isn’t going to reverse course any time soon, litigation teams need to understand their options for addressing foreign language data:

• Ignore It. You can always do nothing and hope for the best. This is obviously not recommended, but does tend to be the default choice when the foreign language data volume is small and intermixed with English documents.
• Use Machine-Assisted Translations. Our company was recently involved in a matter for a multinational technology client that faced a large investigation in Russia with 159 relevant custodians. The total initial collection scope was over 8 TB of custodian data, including 1.3 TB of user-generated data, or 10,080,503 documents in English, Russian and other foreign languages. By using machine-assisted translation, we were able to yield $2.75 million in cost savings for the client.
• Document Reviewers. A network of U.S.-based, licensed attorneys can be hired on an hourly basis to review foreign language documents. The hourly rates will vary based on the language needs and the location.
• Certified Translations. This is the most expensive and time-consuming option of all and is typically only utilized for key documents, such as deposition and trial exhibits.

Regardless of the plan or strategy for dealing with foreign language data, Discovia has a solution for clients. We have deep partnerships with fully vetted companies to provide machine-assisted translation, document reviewers and certified translations.

A new trans-Atlantic data-transfer framework, nicknamed Privacy Shield, has just been approved by EU and U.S. officials. After the European Court of Justice declared the original Safe Harbor framework invalid in 2015, Privacy Shield was crafted to make it easier to move personal information from Europe to the U.S.

In spite of the aggressive efforts by the framers to guarantee strict levels of protection for Europeans’ data when companies move that information to servers in the U.S., Privacy Shield is expected to face immediate scrutiny from European privacy watchdogs and the courts.

Of course, this implementation is complicated by the recent Brexit vote and its aftermath. While Theresa May, the new British Prime Minister, has made it clear that she will push ahead and make a success of Brexit, the first of many expected preemptive legal challenges to the decision to withdraw from the EU is being brought in London. This will likely prolong an already complex process, and the impact on data transfers between the U.S., the UK and the EU remains very hard to predict.

So even if Britain were to negotiate a way to remain a part of Privacy Shield or get a standalone deal with the U.S., businesses will have to be prepared for the possibility of dealing with a myriad of ring-fenced IT systems that can no longer move data freely across borders or to the cloud.

Nothing will happen quickly in this space. Discovia will continue to work with clients on a case-by-case basis to navigate privacy and data transfer issues. The good news in the eDiscovery world is that our option sets are constantly widening. For example, the ever-increasing adoption of third-party cloud solutions, along with advances in the scalability of mobile processing and hosting applications, makes “every corner of the world” highly accessible.