The Evolution of Translation Services in eDiscovery

In a global economy, lawsuits and government investigations frequently involve foreign language documents. Case teams can face thousands or even millions of documents written in multiple different languages. Lawyers in eDiscovery and litigation support professionals must quickly arrange for the translation of these documents so the case review can proceed.

With the advent of eDiscovery, many translation services emerged to do data translation. These businesses offered human translators with some legal knowledge. The manual translation process proved to be very costly and time-consuming. Review and production could be significantly delayed when large volumes of foreign language documents had to be translated.

Translation services have evolved over the years, with a tech revolution leading the way. However, a few things have remained constant: the importance of translation accuracy, document chain of custody, security, and confidentiality. Let’s take a look at how technology has revolutionized translation in eDiscovery and where it’s going…

On-the-Fly Translation Apps for eDiscovery

About twenty years ago, publicly available translation Apps started to arrive on the scene. Google Translate is probably the best-known. Introduced in 2006, it’s a free tool that translates text and other media from one language into another. The translation is done “on the fly” – you drag and drop your documents/text into the App and the translation immediately appears.

But for eDiscovery professionals, data security and confidentiality concerns are big red flags with on-the-fly translators like Google Translate. Uploading your documents to an app poses security concerns – do you know for sure what happens to your data after you upload it to a third-party website for translation? If a hack occurs, will your client’s sensitive information be compromised? There is also a risk that your document chain of custody could be challenged if you use an on-the-fly translator.

Early Machine Translation Apps

Early machine translation Apps were rule-based. The machine translation software analyzed the source text, word for word, using a set of rules.

Early versions of Google Translate used the statistical machine translation (SMT) method. This method leverages the most common previous translations to translate a document or file phrase by phrase.
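To make the phrase-by-phrase idea concrete, here is a toy sketch in Python. It is not how Google Translate or any real SMT engine is implemented (real systems use probabilistic models trained on very large corpora). The phrase table, its counts, and the `translate` function are invented for illustration only.

```python
from collections import Counter

# Hypothetical phrase table: for each source phrase, how often each target
# phrase appeared in previously translated reference texts (made-up counts).
PHRASE_TABLE = {
    "guten morgen": Counter({"good morning": 42, "good day": 3}),
    "der vertrag": Counter({"the contract": 30, "the treaty": 12}),
    "ist unterschrieben": Counter({"is signed": 18, "has been signed": 25}),
}


def translate(sentence, table=PHRASE_TABLE, max_len=3):
    """Translate phrase by phrase, picking the most common prior translation."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in table:
                output.append(table[phrase].most_common(1)[0][0])
                i += n
                break
        else:
            # Nothing in the reference data matches: pass the word through, flagged.
            output.append(f"[{words[i]}?]")
            i += 1
    return " ".join(output)


print(translate("Der Vertrag ist unterschrieben"))
print(translate("Guten Morgen Kollegen"))
```

The second call shows what happens when a word is missing from the reference data: the sketch cannot translate it and simply flags it for a human.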

SMT Shortcomings for eDiscovery

Since then, legal tech companies and eDiscovery service providers began offering SMT-based machine translation in eDiscovery. These tools often came with a plug-in for eDiscovery platforms.

However, there are shortcomings with SMT technology for data translation in eDiscovery. With SMT, a translation is only as good as the tool’s existing reference texts. Words or phrases in eDiscovery documents that aren’t in the SMT translator’s library won’t be found, or will be garbled or translated out of context. SMT translators can give you the general gist of the text. However, they can fall short when translating grammatically incorrect language, idioms, cultural nuances, and context.

Anyone using SMT technology for legal data translation will want a human to review the translation results. In some instances, you may be able to limit the human review to key documents or documents from particular custodians.
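As a rough illustration of limiting human review, the sketch below filters a document list down to key documents and documents from chosen custodians. The `Doc` fields, the custodian names, and the `select_for_human_review` helper are all hypothetical; a real review platform would expose this through its own tagging and search features.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    custodian: str
    is_key: bool = False  # hypothetical flag set by the case team


def select_for_human_review(docs, priority_custodians):
    """Return machine-translated docs that should also get a human check."""
    priority = set(priority_custodians)
    return [d for d in docs if d.is_key or d.custodian in priority]


docs = [
    Doc("DOC-001", "A. Schmidt"),
    Doc("DOC-002", "B. Tanaka", is_key=True),
    Doc("DOC-003", "C. Rossi"),
]
for doc in select_for_human_review(docs, ["A. Schmidt"]):
    print(doc.doc_id)
```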

From a practical perspective, your translation tool should ideally be well integrated with your eDiscovery solution. If you use standalone solutions for data translation, you’ll have to migrate the translation results to your eDiscovery review platform. With a massive number of foreign language documents, this will be time-consuming. Also, there is always a risk that data gets dropped during the migration.
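One simple safeguard against dropped data is to reconcile document IDs before and after the migration. The sketch below assumes you can export a list of document IDs from both the translation tool and the review platform; the `reconcile` function and the ID format are illustrative assumptions, not features of any particular product.

```python
def reconcile(translated_ids, imported_ids):
    """Compare IDs sent for migration against IDs that arrived in review."""
    translated = set(translated_ids)
    imported = set(imported_ids)
    return {
        "missing": sorted(translated - imported),     # dropped on the way
        "unexpected": sorted(imported - translated),  # arrived but never sent
    }


# Hypothetical ID lists exported from each system.
report = reconcile(
    ["DOC-001", "DOC-002", "DOC-003"],
    ["DOC-001", "DOC-003"],
)
print(report)
```

An empty `missing` list does not prove the content is intact, only that every document arrived, so a spot check of the translated text is still worthwhile.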

Big Step Forward in Accuracy: Neural Machine Translation

Neural Machine Translation (NMT) uses deep learning, a form of artificial intelligence built on networks loosely modeled on human neurons, to teach a machine to use patterns to generate a correct translation. Neural translation systems try to work similarly to the human brain, constantly looking for the right patterns and making decisions.

The NMT software is “trained” to find the most accurate translation based on examples of translations for specific text. Think of a search for patterns in the source language that match patterns in the target language.

The machine learns to recognize patterns in the source material that help it interpret the context, leading to more accurate predictions of likely word sequences. NMT technology constantly learns from feedback throughout a translation, improving its ability to translate your foreign language documents more precisely.

Due to the ongoing learning capability, NMT is more accurate than SMT and other translation technologies. eDiscovery data translations using NMT will typically require less human editing/review. These factors are moving machine translation in eDiscovery towards the NMT model.

As of 2020, NMT could instantaneously translate texts with 60-90% accuracy. Ten years after releasing Google Translate based on SMT, Google modified the translator to use NMT. Google reported that its NMT system reduced translation errors by approximately 60% compared to its earlier SMT version. For now, though, you will still need to do some quality checks on your NMT translation results.

It should be noted that a 2023 translator service analysis of Google Translate warned that due to its limitations, the tool shouldn’t be used for official or important business documents.

So, is the Future Generative AI (GenAI) Translation?

GenAI, still in its infancy in terms of being proven for use in legal matters, will likely turn out to be a staple for ESI translation in eDiscovery down the road. ChatGPT is the most famous example of GenAI. GenAI goes beyond translating an input into an output. This technology can create new translations because it learns from sequences of sentences. GenAI tools appear to be better at recognizing context and producing translations that more precisely present the tone and meaning of the original content.

NB – Keeping Your Data Secure and Compliant During Translation

With sensitive client information in legal matters, assessing your eDiscovery translation service provider’s data security and regulatory compliance protocols is important. You’ll want to confirm your data translation provider uses secure file transfer protocols and high-grade security protocols to protect your client data stored in their systems. Or better yet, use a provider whose translation tool is completely integrated into their eDiscovery solution.

Look for eDiscovery Systems with Machine Translation built in!

One of the best solutions to consider for data translation is integrating a machine translation engine into your eDiscovery platform, so you won’t have to worry about the security of your data stored at a translation service’s repository or data in transit. You’ll also eliminate the cumbersome task of moving translations done in a standalone translation tool into your review system, as discussed above. This makes for a smoother workflow. The risks of data loss or corruption on the move also go away. Stay open to change and carefully evaluate eDiscovery solutions that embed machine translation into your systems. Translation services are constantly evolving!


Data Collection in eDiscovery: Are Field Collections a Thing of the Past?

Digital transformation (DX) is a core pillar of successful businesses today. Corporations continue to invest in technology to improve their productivity, enable secure remote data access, and provide a seamless online experience for customers and users. As a by-product, DX has also dramatically changed how eDiscovery data collection is conducted. The once exclusively manual data collection in eDiscovery has increasingly given way to remote collection.

Traditional Field Collection in eDiscovery

Traditionally, companies involved in lawsuits or investigations have hired eDiscovery providers to collect electronically stored information (ESI) that may be relevant to the case. The digital data was often only available on an individual custodian’s computer(s) or on on-premises network servers.

Manual ESI collection was time-consuming and costly. A skilled professional had to perform the collection to properly preserve the metadata and establish a chain of custody that met evidence admissibility standards.

A consultant would travel to the company network locations and custodians to extract the needed data from their laptop, desktop, or mobile phone. You’d often see a parade of people entering and exiting a conference room where the consultant was trapped for hours, performing the collections. Of course, sometimes you got lucky and the consultant could collect email data from a company’s central email server.

Manual ESI collection methods also involve expensive fees for the skilled forensic experts who conduct the collections. On top of that, the travel expenses add up, especially with multiple (global) company sites to visit.

The other negative of manual data collection in eDiscovery is that it takes time. If you have a sensitive case that you want to settle quickly, you will be frustrated with the weeks of travel time collecting ESI manually.

The Shift to the Cloud and Remote Work

Digital transformation has shifted much of our corporate data to the Cloud. Using cloud-based software, productivity and collaboration Apps like Slack and MS Teams at work means data is now centralized on Cloud servers. The same applies to mobile phone data, including SMS data – stored on the carrier’s central repositories.

This means that the bulk of corporate legal case data is no longer restricted to individual employee devices, where collecting it meant hiring an expert for manual eDiscovery data collection. Cloud data storage and secure access technologies have opened the door to more efficient, cost-effective eDiscovery collections.

The explosion of remote workers has also played a role in the decline of field collection in eDiscovery. Today, when employees work from a home office or travel office, their work is typically saved to a Cloud server or Virtual Machine. So, manual data extraction from their devices is no longer necessary.

Summary – Cloud is effectively escorting manual collection methods out the door

  • Now that businesses store the bulk of their data in a centralized location or Cloud, law departments can reduce collection time and costs. Emerging technology allows you to collect data from the Cloud using secure channels or APIs. The technology ensures the metadata remains intact and maintains the chain of custody.
  • For example, businesses using “Microsoft 365 as a Software as a Service (SaaS) offering in the Microsoft cloud” have one central place to collect custodians’ emails digitally. All the documents, Slack notes, and Teams data are readily available in the Cloud for any eDiscovery collection. Similarly, businesses that use Gmail and Google Docs will have access to custodian data via Google Accounts.
  • Employees may also use various communication Apps to store data in the Cloud. The shift to the Cloud is effectively escorting manual collection methods out the door.
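One common way to support the chain of custody mentioned above is to record a cryptographic hash and basic metadata for every collected file, then re-check the hashes later. The sketch below is a minimal illustration using only Python's standard library. The manifest fields and function names are assumptions for this example, not the format of any specific collection tool, and it works on files already exported to a local folder rather than on a cloud API.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder, collector):
    """Record hash, size, and timestamps for every file under `folder`."""
    entries = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            entries.append({
                "file": str(path.relative_to(folder)),
                "sha256": sha256_of(path),
                "size_bytes": stat.st_size,
                "modified_utc": modified.isoformat(),
                "collected_by": collector,  # hypothetical field
                "collected_utc": datetime.now(timezone.utc).isoformat(),
            })
    return entries


def verify(folder, manifest):
    """Return the files whose current hash no longer matches the manifest."""
    return [
        e["file"] for e in manifest
        if sha256_of(Path(folder) / e["file"]) != e["sha256"]
    ]
```

Running `verify` at each hand-off gives a simple, repeatable check that nothing changed between collection and review.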

Hold on… manual collections are not completely gone yet!

Finally, don’t overlook instances where you may still need to use manual data collection in eDiscovery. For example,

  1. Not all legacy computers and legacy servers may be connected to the central data repositories, and these will need to be manually catalogued.
  2. There may be a damaged custodian laptop containing important case data. Here, you could hire a forensic expert to try to recover data or fragments from the hard drive or elsewhere on the computer network.
  3. Also, you could bump into a key custodian-employee who only works on an offline computer and isn’t tech-comfortable enough to do a Self-Collection using an imager kit. In this scenario, sending someone to do a field collection in eDiscovery can solve the problem.
  4. There are also scenarios where the volume of data is simply too large and the only way to transfer the data sensibly is by using portable physical drives. This can also help overcome any speed or data connectivity issues.
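A quick back-of-the-envelope calculation shows why physical drives can still win for very large volumes. The figures below (50 TB of data, a 100 Mbps link, and an 80% usable-bandwidth factor) are illustrative assumptions, not measurements from any real matter.

```python
def transfer_days(terabytes, megabits_per_second, efficiency=0.8):
    """Estimate days needed to move data over a network link.

    `efficiency` is an assumed fraction of the nominal bandwidth that is
    actually usable after protocol overhead and contention.
    """
    bits = terabytes * 1e12 * 8  # decimal terabytes to bits
    usable_bits_per_second = megabits_per_second * 1e6 * efficiency
    return bits / usable_bits_per_second / 86_400  # seconds per day


# Hypothetical scenario: 50 TB over a 100 Mbps connection.
print(f"{transfer_days(50, 100):.1f} days")
```

Under these assumptions the transfer takes roughly two months, which is far longer than shipping encrypted drives by courier.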

“A thing of the past, with occasional appearances”

With so many businesses moving their data to the Cloud, most of the potentially relevant legal data nowadays resides in a central cloud or network location. This means eDiscovery teams can go to a single or a few central sources to collect the bulk of legal case data using remote technology. Still, it’s important to remember that there are situations where ESI data collection can only be done physically or in person on a custodian’s machine or network location. These cannot be ignored. For the vast majority of collections, however, manual ESI collection in eDiscovery is becoming a thing of the past, with occasional “flavor of the old school” appearances as needed!