eDiscovery – finding the needle in an ever-growing haystack

in Advisory, 05.04.2016

In the recent KPMG global survey, two-thirds of the respondents stated that they had not undergone an assessment of their litigation readiness. Nevertheless, 54% of the organizations feel confident in their policies and procedures. For those who still wonder what eDiscovery really entails, this article offers some further insight and clarity.

Vast amount of electronic documents

From the simple alleged breach of contractual conditions to the far more severe breaches of regulation or law, every company will, at some point, have to face an investigation and disclose a set of documents requested by the opposing party or by government bodies. In the past, evidence provided in the form of digital documents was becoming increasingly common. Today, one can observe cases where only electronically stored information (ESI) is taken into account and paper documents are absent altogether. Electronic documents are very easy to copy, and their volume goes far beyond what can be managed manually.

The increase in the generation of new data is such that, already in 2013, SINTEF stated that “A full 90% of all the data in the world has been generated over the last two years.” EMC reckons that, between 2013 and 2020, the quantity of data will multiply by 10, from 4.4 ZB to 44 ZB (a zettabyte being a billion terabytes). Providing the small subset of responsive documents relating to an investigation in such a situation is a daunting task, and the global survey shows that 36% of the respondents spent more than $1 million on eDiscovery.

The biggest challenge of an eDiscovery exercise arises from the number of documents to analyze. In fact, manually reviewing a million documents would take a single person several years. The second challenge is to maintain the chain of custody for each of those documents, as not knowing the source of a relevant document would jeopardize its validity as evidence. Finally, the third challenge is monitoring the documents during the review itself and being able to classify the information.
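To give a feel for the "several years" figure, here is a rough back-of-the-envelope sketch. The review rate (50 documents per hour), the 8-hour day and the 220 working days per year are assumptions chosen for illustration only; they do not come from the survey.

```python
def review_years(num_documents, docs_per_hour=50, hours_per_day=8,
                 working_days_per_year=220):
    """Estimate the working years one reviewer needs for a document set.

    All default rates are illustrative assumptions, not measured figures.
    """
    docs_per_year = docs_per_hour * hours_per_day * working_days_per_year
    return num_documents / docs_per_year


years = review_years(1_000_000)
print(f"{years:.1f} years")  # prints "11.4 years" under the assumed rates
```

Even with a much more optimistic review rate, the result stays in the range of years for a single reviewer.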

To meet those challenges, we can leverage existing software solutions. The standard work stream for an eDiscovery exercise comprises the following steps:

  1. Identification of the data sources (and discussion with the stakeholders);
  2. Data collection and preservation;
  3. Indexing, analysis and review;
  4. Production (i.e. exports of the documents from the system);
  5. Presentation to the involved parties.

This standard model is called the Electronic Discovery Reference Model, or EDRM.


(Figure: the EDRM diagram. Source: edrm.net)


The first task of an eDiscovery solution is to index the provided data. This indexing step achieves the following:

  • The deduplication of the documents: the deduplication rate sometimes exceeds 80 percent;
  • The computation, for each file, of a digital fingerprint that changes radically even if the content of the document has changed by only a single character. This is how we know whether a document has been altered;
  • The creation of a search index (hence the name of the step) allowing efficient searches on the document content as well as on file metadata such as author or creation date.
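The three outcomes of indexing can be sketched in a few lines. This is a minimal illustration, not how any particular eDiscovery product works: SHA-256 is assumed as the fingerprint, documents are assumed to be plain UTF-8 text, and the names `fingerprint` and `index_documents` are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(content: bytes) -> str:
    """Digital fingerprint of a file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def index_documents(documents):
    """Deduplicate documents and build a word -> document names index.

    `documents` maps a document name to its raw bytes (assumed UTF-8 text).
    """
    unique = {}                      # fingerprint -> first name seen
    duplicates = []                  # (duplicate name, original name)
    search_index = defaultdict(set)  # lowercase word -> document names
    for name, content in documents.items():
        digest = fingerprint(content)
        if digest in unique:
            duplicates.append((name, unique[digest]))
            continue  # exact duplicates are not indexed twice
        unique[digest] = name
        for word in re.findall(r"\w+", content.decode("utf-8").lower()):
            search_index[word].add(name)
    return unique, duplicates, search_index


docs = {
    "a.txt": b"Payment due on Friday",
    "b.txt": b"Payment due on Friday",   # exact copy of a.txt
    "c.txt": b"Payment due on Friday.",  # differs by a single character
}
unique, duplicates, search_index = index_documents(docs)
print(duplicates)                      # b.txt is flagged as a copy of a.txt
print(sorted(search_index["payment"]))  # names of the documents kept
```

Here `c.txt` differs from `a.txt` by one full stop, so its fingerprint is completely different and it is kept as a separate document.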

Once the data indexing step has been completed, the data volume has usually decreased significantly. However, this is often not enough. The next step is to remove all the documents that are obviously irrelevant, such as newsletters or documents created or modified outside the relevant timeline. This set of operations, often considered trivial even though it greatly decreases the number of documents, is called Early Case Assessment or ECA. Then, the eDiscovery solutions allow us to perform very specific searches by applying keywords, alone or in combination, in order to further decrease the number of documents that will have to be manually reviewed. Reviewers and litigation support staff with a good knowledge of the case background craft those searches so as to select a minimum number of documents without discarding the relevant ones.
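The ECA filter and the keyword searches can be sketched as two small functions. This is only an illustration under stated assumptions: documents are represented as dictionaries with hypothetical `sender`, `modified` and `text` fields, and the function names are invented for this example.

```python
from datetime import date


def early_case_assessment(documents, start, end, excluded_senders):
    """Keep documents modified within [start, end] and not sent by an
    obviously irrelevant source (for example a newsletter address)."""
    return [
        d for d in documents
        if start <= d["modified"] <= end
        and d["sender"] not in excluded_senders
    ]


def keyword_search(documents, all_of=(), any_of=()):
    """Keep documents containing every `all_of` keyword and, if `any_of`
    is given, at least one of its keywords (case-insensitive)."""
    hits = []
    for d in documents:
        text = d["text"].lower()
        has_all = all(k.lower() in text for k in all_of)
        has_any = not any_of or any(k.lower() in text for k in any_of)
        if has_all and has_any:
            hits.append(d)
    return hits


documents = [
    {"id": 1, "sender": "news@example.com", "modified": date(2015, 3, 1),
     "text": "Weekly newsletter: contract tips"},
    {"id": 2, "sender": "alice@example.com", "modified": date(2015, 4, 10),
     "text": "The contract penalty clause applies"},
    {"id": 3, "sender": "bob@example.com", "modified": date(2012, 1, 5),
     "text": "Contract penalty draft"},
    {"id": 4, "sender": "alice@example.com", "modified": date(2015, 5, 2),
     "text": "Lunch on Friday?"},
]

after_eca = early_case_assessment(
    documents, date(2015, 1, 1), date(2015, 12, 31), {"news@example.com"})
to_review = keyword_search(
    after_eca, all_of=("contract",), any_of=("penalty", "breach"))
print([d["id"] for d in to_review])  # prints [2]
```

The newsletter and the out-of-period draft are removed by the ECA filter, and the keyword combination then leaves a single document for manual review.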

Together, the first three steps (document identification, collection and indexing), ECA and the searches decrease the number of documents while increasing the relevance of the remainder.

The costliest step now has to take place. The remainder is often close to one percent of the original set of documents, but this can still represent hundreds of thousands of items. The solutions help us to review them manually by providing a user-friendly interface that allows a case manager to set up the review and the reviewers to tag the documents. The interface also allows the administrators to monitor the progress of the overall review and to set up a quality assurance process. This review step demands strong cohesion between the team members, who must all have a good understanding of the topic under investigation.

Once the manual review has ended, several scenarios can take place. The company may want to keep the documents to itself and wait for the opposing party to state that they have found something (in case the original dataset has been provided to the opposing party); having found the same document before the opposing party allows it to prepare its defense. The company may also want to provide exculpatory evidence, or may be obliged to provide the complete set of relevant documents.

Redaction and production are the last steps of an eDiscovery exercise. Most of the solutions offering a review interface provide the same interface for the redaction phase, but with tailored tools, such as single-word or entire-page redaction tools, or find-and-redact tools that attach a note or a redaction ID mapping to an individual identity.
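A find-and-redact tool with redaction IDs can be sketched as follows. This is a simplified illustration on plain text only; real tools also have to handle page images and file formats. The function name `find_and_redact` and the `R001`-style ID format are assumptions made for this example.

```python
import re


def find_and_redact(text, terms):
    """Replace every occurrence of each term with a redaction marker.

    Returns the redacted text and a key mapping each redaction ID to the
    identity it hides; the key stays with the producing party.
    """
    if not terms:
        return text, {}
    key = {}  # redaction ID -> original term
    ids = {}  # lowercase term -> redaction ID
    for number, term in enumerate(terms, start=1):
        redaction_id = f"R{number:03d}"
        key[redaction_id] = term
        ids[term.lower()] = redaction_id
    # Longest terms first so that a long name wins over a shorter one.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    redacted = pattern.sub(
        lambda m: f"[REDACTED {ids[m.group(0).lower()]}]", text)
    return redacted, key


text = "Jane Doe emailed John Smith. Later, jane doe called."
redacted, key = find_and_redact(text, ["Jane Doe", "John Smith"])
print(redacted)
# prints: [REDACTED R001] emailed [REDACTED R002]. Later, [REDACTED R001] called.
```

Because the same individual always receives the same ID, a reader of the produced document can still follow who did what without learning the identity.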

Emerging trends

On the analytical side, the global survey shows that cost is the main concern for 29% of the respondents. It is therefore on that point that the latest technological developments and vendors’ marketing campaigns have focused. Terms such as Predictive Coding, Technology Assisted Review or Automated Review have become trendy. Those technologies all have the same goal: reducing the number of documents to review manually. To reach that goal, they try to determine the relevance of documents automatically, based on examples given by an experienced reviewer. Once the tool has analyzed these examples (between a few dozen and a few thousand), it builds a model of a relevant document. This model is then used to determine the relevance of the remaining documents.
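The idea of learning from a reviewer's examples can be illustrated with a toy text classifier. The sketch below uses a small multinomial naive Bayes model with Laplace smoothing, which is one possible technique chosen for illustration; commercial predictive coding tools may use entirely different methods. The class name `RelevanceModel` and the sample sentences are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class RelevanceModel:
    """Toy multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, examples):
        """`examples` is a list of (text, is_relevant) pairs tagged by a
        reviewer; both classes need at least one example."""
        for text, relevant in examples:
            self.doc_counts[relevant] += 1
            self.word_counts[relevant].update(tokenize(text))

    def _log_score(self, words, label, vocabulary_size):
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocabulary_size))
        return score

    def is_relevant(self, text):
        vocabulary = set(self.word_counts[True]) | set(self.word_counts[False])
        words = tokenize(text)
        return (self._log_score(words, True, len(vocabulary))
                > self._log_score(words, False, len(vocabulary)))


model = RelevanceModel()
model.train([
    ("Please delete the contract amendment before the audit", True),
    ("The penalty clause in the contract was changed", True),
    ("Team lunch on Friday at noon", False),
    ("Weekly newsletter: office parking update", False),
])
print(model.is_relevant("Was the contract penalty discussed?"))  # True
print(model.is_relevant("Friday lunch at the office"))           # False
```

With only four examples the model is of course fragile; the point is simply that the tagged examples, not hand-written rules, decide how the next documents are classified.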

Regarding operational ergonomics, the time spent online increases daily, and on-the-go review platforms on mobile devices (e.g. tablets) are becoming more popular. In situations where the internet connection is not fully reliable, the offline feature provided by some solutions allows the review to continue after the initial download of a catalog of documents.

