Thursday, April 3, 2014

Predictive Coding – A Look at the State of the Technology, the Impact on Review and Review Attorneys, Case Law, and What the Future Holds

PART I of IV: How Far Have We Come and Where Do We Stand

This is the first of four blog posts on predictive coding that I will be publishing over the next month.  This first entry focuses on how the technology and use of predictive coding have changed and where they stand in the industry today.  The second will discuss the impact predictive coding has actually had on reviews and review attorneys, compared with what was predicted.  The third will cover case law on the topic.  Finally, the fourth will offer some predictions about what the future holds for predictive coding (yes, more predictions).

I am writing these blog entries in part because of my participation in the upcoming ACEDS conference, where I will be speaking on a panel about Information Governance.  This four-part series will appear in the conference material as a part of that panel.  If you are interested in the field of eDiscovery and in pragmatic discussions about eDiscovery issues framed in the context of real-life situations involving real people, I suggest you consider attending the conference, which will be held in Hollywood Beach, Florida, April 27-29.  Additionally, I believe the ACEDS eDiscovery certification is a worthwhile endeavor.  If you would like more information about it, you can find it on the ACEDS website, or feel free to contact me.  Now, enough with the long-winded introduction and on to the actual substance of the entry:

For the past several years, predictive coding has been the topic du jour in the eDiscovery industry.  It was discussed at every conference, and software vendors were scrambling to add a predictive coding module or functionality to their tools and clamoring to show its ROI and impact, often overstating both.  Predictive coding no longer dominates every conference, as the industry has moved on to Bring Your Own Device (“BYOD”) as the current hot topic.  However, this is not because predictive coding was a fad or no longer matters.  Rather, it is because it has matured as a concept within the industry.  When you mention predictive coding in the legal industry now, there is a general understanding of the concept and paradigm, and you will receive nods of general awareness rather than blank stares from those you are talking to.  Among those intimately familiar with predictive coding, distrust of the technology itself has subsided.  The distrust now tends to focus on the process employed to run it, and on whether there has been sufficient training, rather than on the concept or idea itself.

Predictive coding, also known as technology assisted review (“TAR”) or computer assisted review (“CAR”), among other names, has itself grown and expanded as a term.  Although people differ on which name is the most accurate, in a very broad sense it is understood within the industry to refer to a technology and process whereby advanced mathematics is leveraged in combination with human input (generally coding) to cull or group a population of documents.  The exact workflow and technology differ by platform and matter, but generally the idea is to leverage technology to reduce the amount of material that is reviewed by humans, in an accurate and defensible manner.  The concept itself has gained enough traction that EDRM developed a framework for it known as the Computer Assisted Review Reference Model (“CARRM”).

You can read more about it on the EDRM website.

The growing knowledge base and understanding of the concept (as opposed to mere knowledge of the term) have not necessarily translated into vastly increased use of the technology.  I was at a recent industry event where the presenter engaged in a bit of ad hoc polling.  One question he asked was how many people knew what predictive coding is.  The response was universally yes.  He then asked how many had actually used predictive coding on a live project.  About 60% of the audience indicated they had.  However, when asked how many times they had used the technology, most answered only once, or at most twice.  This can be attributed to many things, including the fact that the industry has only started to accept predictive coding relatively recently, so many companies have not yet had many opportunities to use it more than once.  But while lack of opportunity explains this number in part, there is more to the story.  Most people I have spoken to have not only used predictive coding relatively few times, but have done so despite having multiple opportunities to use it that they chose to pass up.  I would say many companies use the technology in one in ten or one in fifteen cases, by choice, after considering the predictive coding option.

So why are companies choosing not to use predictive coding in every matter or project?  There are a number of reasons, a few of which include:

  • The Cost of the Technology – predictive coding is often an extra or add-on expense that is not included in standard technology licensing, or even available for use on an ad hoc basis.  Even if a company or attorney would like to use the technology, there simply may not be budget to purchase it.  If the cost comes down, not surprisingly, it will be used more often.
  • The Technology is not Viewed as Being Effective Enough – not all predictive coding tools are created equal, even if the core technology they rely on is very similar or, at times, even the same.  Whether it is the base technology, the user interface, or the transparency and reporting of a particular tool, perceived deficiencies in some or all of these aspects can turn a user off of a particular tool.  If the tool a company spent large amounts to license and work into its processes has sub-par predictive coding functionality, the company is not likely to abandon the tool just to utilize predictive coding, at least not quickly.  I work with one client who has an ECA tool and was given the predictive coding module for that tool as a part of the license.  Nevertheless, after testing, the client is hesitant to use the module on live data because it lacks transparency, and therefore trustworthiness, regarding how it makes its decisions, and the developers and sales people are unable or unwilling to explain it better.  This is not a judgment on predictive coding as a whole, but rather on the particular tool available to this client.  For them it poses too great a risk to use outside of testing.
  • The Human Cost to Use the Tool is Too High For the Matter – even if you can afford to purchase or license the technology of your choice, it takes human time and expertise to use and train it.  Often the person training the system is the one who is most knowledgeable about a matter.  At the beginning of the matter this is often a partner or high-level associate, both of whom bill at a higher rate than a junior associate and certainly more than a review attorney.  While the technology generally works the same on small and big cases, the mathematics of sampling means there is a minimum amount of training and sampling that must normally be done in a predictive coding project regardless of population size.  If the document count falls below a certain threshold, normally 50,000 documents give or take, it can often cost more for the higher-priced lawyers to complete that training than will be gained by reducing the review population via the technology.  A related consideration is finding the time, and applying the pressure, to get that partner or high-level associate to actually review and train the system on several thousand documents, which can take days.  This is not an activity they typically perform, and it can be like pulling teeth to get them to do so.  Thus, a predictive coding project can be difficult to coordinate and implement.
  • The Matter or Material is Too Sensitive – there are simply some matters that are so important to a client, perhaps because the matter threatens the very core and existence of the business, or because the money and negative PR at stake are so great, that the client wants eyes-on review of every document.  While you could still use predictive coding technology to group and organize such a review, given that all documents will be looked at anyway, decision makers often feel the time and expense of predictive coding are simply not worth it in such a case.
  • The Party Receiving the Data Does Not Agree – this is most applicable when the government is requesting something.  If the DOJ “suggests” you not use predictive coding, most people listen.  That is not to say that the DOJ is necessarily opposed to predictive coding.  In fact, it has agreed to its use previously, including in the high-profile Anheuser-Busch InBev/Modelo merger (more on that in part three of this series).  But it does not always agree to its use as a matter of course, which will obviously affect the responding party.
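
To make the sampling and break-even point from the human-cost item above a bit more concrete, here is a rough sketch in Python.  It is not how any particular predictive coding tool works.  It assumes a standard statistical sample-size formula (Cochran's formula with a finite population correction) at 95% confidence and a 2% margin of error, and every billing rate, review speed, cull rate, and training volume in it is a hypothetical number chosen purely for illustration.

```python
import math


def sample_size(population, margin=0.02, z=1.96, p=0.5):
    """Cochran's sample-size formula with a finite population correction.

    Assumed defaults: 95% confidence (z = 1.96), a +/-2% margin of error,
    and p = 0.5 (the most conservative guess at the proportion).
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))


def net_savings(population, cull_rate, training_docs,
                senior_rate, reviewer_rate, docs_per_hour):
    """Review cost avoided by culling, minus the cost of senior-attorney training.

    Every input is a hypothetical figure, not an industry benchmark.
    """
    training_cost = training_docs / docs_per_hour * senior_rate
    review_saved = population * cull_rate / docs_per_hour * reviewer_rate
    return review_saved - training_cost


# Hypothetical inputs: 50% of documents culled, 3,000 training documents,
# $600/hour senior attorney, $60/hour reviewer, 50 documents reviewed per hour.
assumptions = dict(cull_rate=0.5, training_docs=3000,
                   senior_rate=600, reviewer_rate=60, docs_per_hour=50)

for docs in (50_000, 500_000):
    print(docs, sample_size(docs), net_savings(docs, **assumptions))
```

Under these assumptions the required sample barely moves with population size (2,292 documents for a population of 50,000 versus 2,396 for 1,000,000), which is why the up-front effort is roughly fixed.  With the made-up rates above, the 50,000-document matter comes out slightly negative while the 500,000-document matter comes out clearly positive.  Different rates would shift the break-even point, so treat the output as an illustration of the shape of the trade-off rather than a rule.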

What my experience has taught me is that, at the end of the day, the clients and companies footing the bill like predictive coding because it saves costs.  Most are hands off when it comes to the process and details and are only generally aware that it is being used.  While they want to comply with their duties to produce, it is the cost savings, not the arguably more accurate and consistent results, that drive their adoption of the technology.  If the cost savings are not there, they are not using it, and often that decision is being made on a matter-by-matter basis.  Even when the cost savings are there, they only provide the motivation to push counsel to agree to its use, which is not always easy either.  Just as there are attorneys who still prefer to review paper, there are many more who are unwilling to skip review of documents that hit on a search term just because a computer indicated they could.  So on any given project there are multiple hurdles to clear before utilizing predictive coding, even when the technology itself works well and the economics make sense.

What does all this mean?  I suggest it means the concept is stable and accepted (even if many understand it only on a superficial level, it is nevertheless understood now), and it will continue to be used; in fact, its use should increase.  Similarly, software developers will continue to develop the technology because their customers will demand they do so.  Despite that, predictive coding will not be used in all matters, and may not even be used in most matters, and it will not spell the end of document review or of the position of document reviewer.

For a further discussion of, and thoughts on, predictive coding’s impact on document review and document reviewers, please read part two of this series, which will be posted in the coming days.
