Sunday, April 27, 2014

Predictive Coding Part III – A Look at the State of the Technology, the Impact on Review and Review Attorneys, Case Law, and What the Future Holds

PART III of IV: A High-Level Overview of Predictive Coding Case Law and Its Impact on the Use of Predictive Coding

This is the third of four blog entries on predictive coding that I will be posting. The first entry focused on how the technology and use of predictive coding have changed and where exactly it stands in the industry today. The second entry discussed the impact predictive coding has had on contract review attorneys. This third installment covers some of the case law on the topic. Finally, the fourth will provide predictions about what the future holds for predictive coding.

I am writing these blog entries in part due to my participation in the upcoming ACEDS conference, where I will be speaking on a panel about Information Governance. If you are interested in the field of eDiscovery and pragmatic discussions about eDiscovery issues framed in the context of real-life situations involving real people, I suggest you consider attending the conference, which will be held in Hollywood Beach, Florida, April 27-29. Additionally, I believe the ACEDS eDiscovery certification is a worthwhile endeavor. More information about it is available on the ACEDS website (www.aceds.org), or feel free to contact me as well.

In many ways, 2012 was the peak of the predictive coding buzz. It was being discussed at every eDiscovery conference; software vendors were scrambling to develop their own predictive coding technology, to link to other predictive coding platforms, or to spin the capabilities of their product so that it appeared to contain predictive coding functionality; and most blogs had a thing or two to say about the topic, even if only in token comments. Case law, or at least the case law analyzed and discussed by legal blogs and publications, also seemed to address the technology in some breadth and depth. The most notable example was the Da Silva Moore case, in which Judge Peck played a leading, if somewhat controversial and, frankly, overstated, role. But there were others as well.

Since 2012, cases discussing predictive coding have been few and far between, and even the 2012 decisions heralded for bestowing judicial approval and endorsement on predictive coding lacked the finality and power many predicted they would have; although important at the time, their lasting impact on the outcome of their matters has been relatively small. Arguably, we still do not have a seminal predictive coding case, although Da Silva Moore is probably the closest thing we have to one thus far.

Some of the better known cases thus far include:
  • Da Silva Moore: The Judge Peck case from 2012 in which he infamously endorsed predictive coding and was subsequently attacked by the plaintiffs (often personally and unnecessarily) for doing so. Although Judge Peck endorsed predictive coding, he actually did no more than acquiesce to a plan submitted by both parties to use the technology. Since 2012, the predictive coding aspect of the case has been fairly quiet, and Judge Andrew Carter recently denied the plaintiffs' motion for class certification.
  • Global Aerospace Inc., et al. v. Landow Aviation, L.P. dba Dulles went a step further than Da Silva Moore. In Global Aerospace, the defendants wanted to use predictive coding themselves, but the plaintiffs objected. Loudoun County (Virginia) Circuit Judge James H. Chamblin ordered that the defendants could use predictive coding to review documents. As in Da Silva Moore, the court did not impose the use of predictive coding; rather, it allowed a party to use it upon request. In 2013, it became the first case in which a court approved the results of predictive coding. Although the approval of the results is a success for proponents of predictive coding, the impact of this decision and its power to influence others will likely be limited, as the details and results will not transfer to other matters.
  • Kleen Prods., LLC v. Packaging Corp. of Am. went further still: the plaintiffs in Kleen asked the court to force the defendants to use predictive coding when reviewing their own material. Although the question is an interesting one, the court never ultimately answered it, as the parties agreed on a protocol leveraging key terms instead.
  • Fed. Hous. Fin. Agency v. HSBC, 2014 WL 584300 (S.D.N.Y. Feb. 14, 2014), is another matter from the Southern District of New York (like Da Silva Moore), in which the court, without much fanfare or publicity (unlike Da Silva Moore), approved one defendant's use of predictive coding despite the plaintiff's objections, noting that the technology had a “better track record in the production of responsive documents than the human review.” As an aside, there are certainly studies that suggest this, and at times it is probably true, but not all predictive coding technologies are created equal, and not all implementations of them are created equal. Just because you use predictive coding does not mean you will be accurate or achieve better precision and recall than key terms. You must look beyond the fact that it is predictive coding if you want to know whether it is being used properly and will lead to solid results; a Ferrari will only get you somewhere faster than a minivan, and without getting lost, if its driver knows where they are going and knows how to drive.
  • EORHB, Inc., et al. v. HOA Holdings, LLC, C.A. No. 7409-VCL (Del. Ch. Oct. 15, 2012): a matter in which a Delaware judge ordered both parties to use predictive coding.
  • The Anheuser-Busch InBev and Grupo Modelo Second Request is an example of governmental endorsement of predictive coding. This matter involved the merger of these beverage industry giants, who obtained the DOJ's agreement to use predictive coding on the documents requested in the DOJ's Second Request. That agreement likely saved the companies the cost of reviewing millions of documents.
  • Gabriel Techs. Corp. v. Qualcomm, Inc., 2013 WL 410103 (S.D. Cal. Feb. 1, 2013), suggests predictive coding fees may be recoverable. In this matter, the court awarded the defendants attorney's fees under 35 U.S.C. § 285 based on the plaintiff's bad faith. Those fees included approximately $3 million paid for using predictive coding. In addition to suggesting those fees were recoverable, the court's silence on whether the technology should have been used in the first place is indicative of the trend that predictive coding is de facto accepted by the industry and courts, and that its use is not worth arguing before the court.
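The precision and recall comparison raised in the FHFA v. HSBC entry can be made concrete with a little arithmetic. Precision is the share of documents a method flags that are actually responsive; recall is the share of all responsive documents the method finds. The Python sketch below is illustrative only: the document IDs and the set of truly responsive documents are hypothetical, and in a real matter the responsive set would itself be estimated from a human-reviewed sample rather than known in full.

```python
def precision_recall(retrieved: set[str], responsive: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for the documents a method flagged as responsive."""
    true_positives = len(retrieved & responsive)  # flagged AND actually responsive
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(responsive) if responsive else 0.0
    return precision, recall


# Hypothetical example: four documents are truly responsive.
responsive = {"DOC-1", "DOC-2", "DOC-3", "DOC-4"}
key_term_hits = {"DOC-1", "DOC-2", "DOC-5", "DOC-6", "DOC-7"}
predictive_coding_hits = {"DOC-1", "DOC-2", "DOC-3", "DOC-8"}

print(precision_recall(key_term_hits, responsive))           # (0.4, 0.5)
print(precision_recall(predictive_coding_hits, responsive))  # (0.75, 0.75)
```

The hypothetical numbers happen to favor predictive coding, but a poorly trained model or a careless implementation could just as easily score below well-crafted key terms; the point is that the comparison is measurable rather than assumed.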
Why are there so few opinions about predictive coding at this point? Has it simply become too common a practice to litigate, absent unusual circumstances? Are the parties finding it not worth fighting over because it does produce a reasonable result? I would say yes to both. Additionally, the use of predictive coding depends very much on the details of the matter: which technology is used, how it is used, and, just as importantly, how antagonistic the parties are. All of that combined means it is difficult for any court to say predictive coding is acceptable across the board, except at such a high level that the statement is virtually meaningless as a guide or precedent for other matters. So, while we are bound to see more cases and some opinions in which it is approved or endorsed, those opinions will carry little weight, as there will almost always be differences between cases that may or may not warrant the use of predictive coding. Instead, future decisions will likely focus on the details and defensibility of the implementation and its results.

Regardless of case law or court or government endorsement and approval, the reality is that predictive coding is being used even without opinions discussing or approving it. This use, often by agreement or at least with the knowledge of both parties, but at times covertly, will certainly continue despite the paucity of opinions touching on it or specifically endorsing it.

Thursday, April 10, 2014

Predictive Coding Part II – A Look at the State of the Technology, the Impact on Review and Review Attorneys, Case Law, and What the Future Holds

PART II of IV: The Impact of Predictive Coding on Contract Review Attorneys

This is the second of four blog entries on predictive coding that I will be posting. The first entry focused on how the technology and use of predictive coding have changed and where exactly it stands in the industry today. This second entry discusses the impact predictive coding has had on contract review attorneys. The third will cover case law on the topic. Finally, the fourth will provide some predictions about what the future holds for predictive coding.

I am writing these blog entries in part due to my participation in the upcoming ACEDS conference, where I will be speaking on a panel about Information Governance. This four-part blog series will appear in the conference material as part of that panel. If you are interested in the field of eDiscovery and pragmatic discussions about eDiscovery issues framed in the context of real-life situations involving real people, I suggest you consider attending the conference, which will be held in Hollywood Beach, Florida, April 27-29. Additionally, I believe the ACEDS eDiscovery certification is a worthwhile endeavor. More information about it is available on the ACEDS website (www.aceds.org), or feel free to contact me as well.

About two years ago, I succumbed to the notion that predictive coding was the future of the eDiscovery industry (which was actually fairly accurate) and that this potentially meant trouble for contract review attorneys and their jobs (which has not proven to be true thus far). I wrote an article on the subject at the time. I was not the only one to succumb to this notion, and even those outside the eDiscovery industry picked up on it, including the New York Times in a 2011 article by John Markoff, but I must now admit that I was wrong and was being too shortsighted.

Two years later and two years wiser (I hope, at least!), I see that document review and contract review attorneys are still common in the eDiscovery industry and arguably have not been impacted much by predictive coding. Why is this? In part, it is because of when and whether predictive coding is used: predictive coding, although much more accepted and utilized than it was two years ago, is still not universally used. In my post last week, I briefly discussed some of the reasons for this (cost, trust, human time, sensitivity/importance of the material, objection from the opposing party). It is also in part because of how it is used. In practice, predictive coding is often employed as a method to prioritize or cull documents, but not as a complete replacement for review; common predictive coding workflows either prioritize the responsive material to the front of the review without removing any documents from the review population, or cull and remove some, but not all, of the data, leaving the remainder still to be reviewed.
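The two workflows described above can be sketched in a few lines of Python, assuming each document already carries a relevance score from some predictive coding model. The scores, document IDs, and the 0.5 cutoff are hypothetical. Prioritization only reorders the review queue, while culling sets aside documents below a cutoff and leaves the rest for human review.

```python
def prioritize(scores: dict[str, float]) -> list[str]:
    """Prioritization workflow: every document stays in review, highest scores first."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def cull(scores: dict[str, float], cutoff: float) -> tuple[list[str], list[str]]:
    """Culling workflow: return (still_to_review, set_aside) split at a score cutoff."""
    ordered = prioritize(scores)
    still_to_review = [doc_id for doc_id in ordered if scores[doc_id] >= cutoff]
    set_aside = [doc_id for doc_id in ordered if scores[doc_id] < cutoff]
    return still_to_review, set_aside


# Hypothetical model scores between 0 (likely non-responsive) and 1 (likely responsive).
scores = {"DOC-001": 0.91, "DOC-002": 0.12, "DOC-003": 0.67, "DOC-004": 0.05}
print(prioritize(scores))  # ['DOC-001', 'DOC-003', 'DOC-002', 'DOC-004']
print(cull(scores, 0.5))   # (['DOC-001', 'DOC-003'], ['DOC-002', 'DOC-004'])
```

In either workflow, review attorneys still have documents in front of them, which is part of why the technology has not displaced them.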

Despite this, there is no doubt that predictive coding is used, and reasonably often. Since the advent and adoption of predictive coding, the underlying framework of litigation in US courts has not changed: litigants and subpoena recipients still need to produce material pursuant to their discovery obligations, and hence the need to cull that data in a defensible and reasonable manner still exists. Predictive coding technology has become an entrenched part of this process and is viewed by the industry as a reliable and acceptable tool now more than ever.

So if the tool that was designed to reduce document review is viewed as viable and is being used, why isn't document review being reduced? The answer is that it is in fact being reduced, but not relative to what it was; rather, it is being reduced relative to what it would be absent the technology. Year-over-year review volumes may not decrease, but if the technology were not being used, current volumes would be greater than last year's and greater than they are with predictive coding.
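A toy calculation illustrates the point. The figures below are purely hypothetical, not drawn from any client, and assume that collected volume grows 25% in a year while predictive coding culls 20% of the new, larger population.

```python
def documents_reviewed(collected: int, culled_pct: int) -> int:
    """Documents left for human review after predictive coding culls culled_pct percent."""
    return collected * (100 - culled_pct) // 100


# Hypothetical volumes for illustration only.
last_year = documents_reviewed(collected=1_000_000, culled_pct=0)            # no predictive coding
this_year_with_pc = documents_reviewed(collected=1_250_000, culled_pct=20)    # with predictive coding
this_year_without_pc = documents_reviewed(collected=1_250_000, culled_pct=0)  # counterfactual

print(last_year, this_year_with_pc, this_year_without_pc)  # 1000000 1000000 1250000
```

Year over year, the review volume is flat, yet predictive coding has still avoided 250,000 documents of review relative to what would have been reviewed without it.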

The fact is that data continues to grow exponentially. One of my clients, who is very proactive in their approach to eDiscovery, sophisticated and knowledgeable, and using predictive coding, is still reviewing the same amount of data per custodian as in previous years, if not more, because the amount of data they preserve, collect, and search continues to grow rapidly; the new technology, although effective, is only allowing them to maintain the status quo, if that. Without the technology, they would be faced with an unmanageable amount of data to review and produce (at least from a cost perspective). As an aside, another way people are tackling big data is through information governance, including storing less, collecting less (searching pre-collection is an increasingly hot topic), and getting data off legal holds. Big data and information governance are the topics my panel will address at the annual ACEDS conference at the end of April.

What this glut of data means for contract review attorneys is that even when there is approval, budget, and acceptance for predictive coding technology and its use, there is still a place and need for document review and review attorneys. Moreover, the technology has not changed document reviewers' role or required skill set much either. Predictive coding generally adds a step to the process that takes place before contract review attorneys are involved, and thus, by the time a review attorney is brought in, what they are doing is much the same as it has been of late.

I fully expect predictive coding to continue to push the envelope and gain more and more acceptance and traction in the industry; however, I do not see that translating into the extinction of document review or review attorneys. I continue to think the larger and more real threats to document review come from law schools producing a greater supply of attorneys than the market demands, and from the proposed changes to the FRCP and the sentiment those changes embody: corporations are saying enough is enough, we need to scale back eDiscovery, and the FRCP amendments are a way to do that.

Others and I were not wrong that predictive coding would have an impact on the legal industry; we just underestimated how much data growth would negate that impact and, in turn, overestimated the impact of predictive coding on contract review attorneys, who, as it turns out, are not going anywhere for the time being.

Thursday, April 3, 2014

Predictive Coding – A Look at the State of the Technology, the Impact on Review and Review Attorneys, Case Law, and What the Future Holds

PART I of IV: How Far Have We Come and Where Do We Stand


This is the first of four blog entries on predictive coding that I will be posting in the next month. This first entry will focus on how the technology and use of predictive coding have changed and where exactly it stands in the industry today. The second will analyze and discuss the impact predictive coding has had on reviews and review attorneys compared to predictions regarding the same. The third will cover case law on the topic. Finally, the fourth will provide some predictions about what the future holds for predictive coding (yes, more predictions).

I am writing these blog entries in part due to my participation in the upcoming ACEDS conference, where I will be speaking on a panel about Information Governance. This four-part blog series will appear in the conference material as part of that panel. If you are interested in the field of eDiscovery and pragmatic discussions about eDiscovery issues framed in the context of real-life situations involving real people, I suggest you consider attending the conference, which will be held in Hollywood Beach, Florida, April 27-29. Additionally, I believe the ACEDS eDiscovery certification is a worthwhile endeavor. More information about it is available on the ACEDS website (www.aceds.org), or feel free to contact me as well. Now, enough with the long-winded introduction and on to the actual substance of the entry:

For the past several years, predictive coding has been the topic du jour in the eDiscovery industry. It was discussed at every conference, and software vendors were scrambling to add a predictive coding module or functionality to their tools and clamoring to show its ROI and impact, often unrealistically overstating both. The former is no longer the case, as the industry has moved on to Bring Your Own Device (“BYOD”) as the current hot topic. However, this is not because predictive coding was a fad or no longer matters; rather, it is because it has matured as a concept within the industry. When you mention predictive coding in the legal industry now, there is a general understanding of the concept and paradigm, and you will receive nods of general awareness rather than blank stares. Among those intimately familiar with predictive coding, distrust of the technology has subsided, and whatever distrust remains now tends to focus on the process employed to run it and whether there has been sufficient training, rather than on the concept itself.

The term predictive coding, also known as technology assisted review (“TAR”) or computer assisted review (“CAR”), among other names, has itself grown and expanded. Although people differ on which term is most accurate, in a very broad sense it is understood within the industry to refer to a technology and process whereby advanced mathematics is leveraged in combination with human input (generally coding) to cull or group a population of documents. The exact workflow and technology differ by platform and matter, but generally the idea is to leverage technology to reduce, in an accurate and defensible manner, the amount of material that humans review. The concept itself has gained enough traction that EDRM developed a framework for it known as the Computer Assisted Review Reference Model (“CARRM”).

You can read more about CARRM on the EDRM website at http://www.edrm.net/resources/carrm
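To make the "advanced mathematics plus human input" description a little more concrete, here is a deliberately minimal Python sketch of the general idea: documents coded by a knowledgeable attorney (a seed set) train a model, which then scores and ranks unreviewed documents. The choice of a naive Bayes text classifier is an assumption made for illustration; commercial platforms use a variety of techniques along with far more careful text processing, iterative training rounds, and validation, and nothing here reflects any particular vendor's method. The seed documents and IDs are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train(seed_set: list[tuple[str, bool]]) -> dict:
    """Fit a toy naive Bayes model on human-coded (text, responsive) pairs.

    The seed set must contain at least one responsive and one non-responsive document.
    """
    token_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, responsive in seed_set:
        token_counts[responsive].update(tokenize(text))
        doc_counts[responsive] += 1
    vocab = set(token_counts[True]) | set(token_counts[False])
    return {"tokens": token_counts, "docs": doc_counts, "vocab": vocab}


def score(model: dict, text: str) -> float:
    """Return the log-odds that a document is responsive; higher means more likely."""
    tokens, docs, vocab = model["tokens"], model["docs"], model["vocab"]
    log_odds = math.log(docs[True]) - math.log(docs[False])  # prior from the seed set
    denom = {label: sum(tokens[label].values()) + len(vocab) for label in (True, False)}
    for token in tokenize(text):
        if token not in vocab:
            continue  # ignore words never seen during training
        # Add-one (Laplace) smoothing so a word seen in only one class does not zero out.
        log_odds += math.log((tokens[True][token] + 1) / denom[True])
        log_odds -= math.log((tokens[False][token] + 1) / denom[False])
    return log_odds


def rank(model: dict, unreviewed: dict[str, str]) -> list[str]:
    """Order unreviewed document IDs from most to least likely responsive."""
    return sorted(unreviewed, key=lambda doc_id: score(model, unreviewed[doc_id]), reverse=True)


# Hypothetical seed set coded by a knowledgeable attorney.
seed_set = [
    ("pricing agreement with competitor", True),
    ("meeting to fix prices next quarter", True),
    ("holiday party schedule", False),
    ("cafeteria menu update", False),
]
model = train(seed_set)
print(rank(model, {"DOC-A": "party menu", "DOC-B": "draft pricing agreement"}))  # ['DOC-B', 'DOC-A']
```

The quality of the ranking depends heavily on the seed set, which is why the knowledge and time of whoever does the training matter so much in practice.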

The growing knowledge base and understanding of the concept (as opposed to mere familiarity with the term) has not necessarily translated into vastly increased use of the technology. I was at a recent industry event where the presenter engaged in a bit of ad hoc polling. One question he asked was how many people knew what predictive coding is. The response was universal: every participant knew what predictive coding is. He then asked how many had actually used predictive coding on a live project. About 60% of the audience indicated they had. However, when asked how many times they had used the technology, most answered only once or, at most, twice. This can be attributed to many things, including the fact that the industry has only recently started to accept predictive coding technology, and hence many companies have not had many opportunities to use it multiple times. But while lack of opportunity explains this number in part, there is more to the story. Most people I have spoken to have not only used predictive coding technology relatively few times, but have done so despite having had multiple other opportunities to use it that they chose to pass up. I would estimate that many companies, after considering the predictive coding option, choose to use the technology in only one in ten or one in fifteen cases.

So why are companies choosing not to use predictive coding in every matter or project?  There are a number of reasons, a few of which include:

  • The Cost of the Technology – predictive coding is often an extra, add-on expense that is not included in standard technology licensing, or even in ad hoc use of a platform. Even if a company or attorney would like to use the technology, there simply may not be budget to purchase it. If the cost comes down, it will, not surprisingly, be used more often.
  • The Technology is not Viewed as Being Effective Enough – not all predictive coding tools are created equal, even if the core technology they rely on is very similar, or at times even the same. Whether it is the base technology, the user interface, or the transparency and reporting of a particular tool, perceived deficiencies in some or all of these aspects can turn a user off a particular tool. If the tool a company spent large amounts to license and work into its processes has sub-par predictive coding functionality, the company is not likely to abandon the tool just to utilize predictive coding, at least not quickly. I work with one client who has an ECA tool and was given the predictive coding module for that tool as part of their license. Nevertheless, after testing, they are hesitant to use the module on live data because it lacks transparency, and therefore trustworthiness, regarding how it makes its decisions, and the developers and salespeople are unable or unwilling to explain and clarify further. This is not a judgment or decision on predictive coding as a whole, but rather on the particular tool available to this client. For them, it poses too great a risk to use outside of testing.
  • The Human Cost to Use the Tool is Too High for the Matter – even if you can afford to purchase or license the technology of your choice, it takes human time and expertise to use and train the predictive coding technology. Often the person training the system is the one most knowledgeable about a matter. At the beginning of a matter, this is often a partner or high-level associate, both of whom bill at a higher rate than a junior associate and certainly more than a review attorney. While the technology generally works the same on small and big cases, due to the mathematics of sampling there is a minimum amount of training and sampling that must normally be done in a predictive coding project regardless of population size. If the document count falls below a certain threshold, normally 50,000 documents give or take, it can often cost more for the higher-priced lawyers to complete that training than will be saved by reducing the review population via the technology. A related consideration is finding the time, and applying the pressure, to make that partner or high-level associate actually review several thousand documents to train the system, which can take days. This is not an activity they typically perform, and getting them to do so can be like pulling teeth. Thus, a predictive coding project can be difficult to coordinate and implement.
  • The Matter or Material is Too Sensitive – there are simply some matters so important to a client, perhaps because the matter threatens the very core and existence of their business, or the money and negative PR at issue are just so great, that they want human eyes on every document. While you could still use predictive coding technology to group and organize such a review, given that all documents will be looked at, decision makers often feel the time and expense of predictive coding are simply not worth it in such a case.
  • The Party Receiving the Data Does Not Agree – this is most applicable when the government is requesting something. If the DOJ “suggests” you not use predictive coding, most people listen. That is not to say that the DOJ is necessarily opposed to predictive coding; in fact, it has agreed to its use previously, including in the high-profile Anheuser-Busch InBev/Modelo merger (more on that in part three of this series), but it does not always agree to its use as a matter of course, which will obviously impact the responding party.
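The "mathematics of sampling" point in the human-cost item above is easy to see with a standard textbook formula for the sample size needed to estimate a proportion: n0 = z^2 * p * (1 - p) / e^2, adjusted with the finite population correction n = n0 / (1 + (n0 - 1) / N). The Python sketch below assumes simple random sampling, 95% confidence (z = 1.96), a margin of error of plus or minus 2%, and the conservative p = 0.5; these are illustrative choices, not a standard any court or vendor necessarily requires, and individual tools may compute their samples differently.

```python
import math


def sample_size(population: int, z: float = 1.96, margin: float = 0.02, p: float = 0.5) -> int:
    """Simple-random-sample size for estimating a proportion, with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # sample size for an effectively infinite population
    n = n0 / (1 + (n0 - 1) / population)       # finite population correction
    return math.ceil(n)


for population in (10_000, 50_000, 1_000_000):
    print(population, sample_size(population))
# 10000 1937
# 50000 2292
# 1000000 2396
```

Under these assumptions, a population 100 times larger needs only about 460 more sampled documents, so a fixed sampling and training effort weighs far more heavily on a small matter than on a large one.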

What my experience has taught me is that, at the end of the day, the clients and companies footing the bill like predictive coding because it saves costs. Most are hands-off regarding the process and details and are only generally aware that it is being used. While they want to comply with their duties to produce, it is the cost savings, not the arguably more accurate and consistent results, that drive their adoption of the technology. If the cost savings are not there, they are not using it, and often that decision is made on a matter-by-matter basis. Even when the cost savings are there, they only provide the motivation to push counsel to agree to its use, which is not always easy either. Just as there are attorneys who still prefer to review paper, there are many more who are unwilling to skip review of documents that hit on a search term just because a computer indicated they could. So on any given project, there are multiple hurdles to clear before utilizing predictive coding, even when the technology itself works well and the economics of it make sense.

What does all this mean? I suggest it means the concept is stable and accepted (even if many understand it only on a superficial level, it is nevertheless understood now), and that it will continue to be used; in fact, its use should increase. Similarly, software developers will continue to develop the technology because their customers will demand they do so. Even so, predictive coding will not be used in all matters, may not even be used in most matters, and will not spell the end of document review or the position of document reviewer.

For a further discussion of, and thoughts on, predictive coding’s impact on document review and document reviewers, please read part two of this series, which will be posted in the coming days.



Sunday, February 16, 2014

Olympics of eDiscovery – One Can Dream

Most evenings over the past week and a half, my wife and I have managed to catch some of the Olympic competition currently taking place in Sochi, Russia, something we tune into with some interest every few years. The Olympics really are a great concept: men and women athletes of diverse backgrounds and cultures converging for a few weeks of competition and sport, putting aside differences, history, and politics (for the most part) to prove who is the best at various disciplines.

The competition has inspired me (no, not to compete; that would be too cliché) to wonder: what if we could have an Olympics of eDiscovery? In a geeky eDiscovery way, wouldn't that be great? I imagine a competition among the various software and tool providers to determine who is best at different tasks: collection, processing, culling, review, TAR, and production, to name a few. This would not be a Gartner-style report (which I do find helpful and a must-read, by the way); instead, the tools would go head to head at the same time and place using the same data set and hardware horsepower. Everything would be transparent and there would be a level playing field: no marketing or PR-spun statistics, and no closed-door exercise where only the “results” are presented.

Medals would be given in each category for different aspects such as speed, accuracy, efficiency, cost, and ease of use. The end result would be bragging rights for the software producers and real, useful knowledge for consumers like you and me, who would finally have some objective data points for making apples-to-apples comparisons (to the extent that is possible in this industry), and hopefully a little fun as well.

I invite all software vendors, big and small, to consider this idea and throw your hat in the ring. If you agree to participate, we, the users, will come. So kCura, Symantec, Ipro, Kroll, Lexis, FTI, and any others, are you up for it? I for one would love to see this, and I think it would be of great interest to the eDiscovery community.

Sunday, February 9, 2014

Quality Control in eDiscovery – The Difference Between Luck and Repeatable Success

As an eDiscovery project manager and Director of Client Services responsible for ensuring the successful management of my clients' eDiscovery needs, I find that having solid, repeatable, and defensible processes and procedures in place is key to my success, my team's success, and, most importantly, the success of my clients' projects, individually and collectively. Quality Control (“QC”) efforts are a crucial component of my processes and of the success of any project, and I strongly encourage you to build them into your eDiscovery processes and procedures so that you and your clients can have full confidence in your eDiscovery.

Price, reputation, and plans are all important things to question your eDiscovery vendor about, but so is QC, and it is not something you should wait until the end of a project to discuss. All too often at the beginning of a project, people focus on things like search terms and deadlines and only turn to QC once the project is ready to wrap up, but QC should be considered from the start and built into any eDiscovery process, whether preservation, collection, review, or production (or any other). QC will have its greatest impact and save you the most time and money the sooner you start it. While it can serve as a cleanup tool at any point in the process, if started early in a project it can also prevent further error, because its results can be used to identify points of misunderstanding or deficiency in your training or process. Particularly in review (although not exclusively), the lessons learned during QC can become examples for retraining your team, preventing future error and minimizing the amount of recoding or other rework needed at the end of a project, which could blow budget and deadlines.

How much QC you perform and how you carry it out are secondary to the fact that you are performing it; the amount QC'd and the method of QC are only means to the end, which is accuracy. If you are correcting the mistakes and have a clean product, that is ultimately what matters. That being said, there is no one universal QC method to employ in all cases or situations. My teams have certain standard QC processes that we perform across clients and projects, but for each project we also devise QC procedures unique to that project's purpose and idiosyncrasies.

My team's familiarity with our clients, the tools we use, and our eDiscovery subject matter expertise allow us to craft these properly. However, more and more eDiscovery tools are building in methods and applications that help even non-savvy users with QC. One such feature that many document review platforms are starting to incorporate is a way to create random samples, either by front-end users or on the back end by administrators. But even if your program does not offer this capability, you could use Excel to create a random sample of your material for QC; QC is not limited to those who are technologically sophisticated or can afford expensive eDiscovery software.
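For teams without a built-in sampling feature, a reproducible random QC sample takes only a few lines of code. The Python sketch below uses hypothetical helper names and document IDs and is not a feature of any review platform: it draws a simple random sample using a recorded seed, so the same sample can be regenerated and documented later, and computes the coding error rate found in the sampled documents.

```python
import random


def qc_sample(doc_ids: list[str], sample_size: int, seed: int = 2014) -> list[str]:
    """Draw a reproducible simple random sample of documents for QC."""
    rng = random.Random(seed)  # recorded seed: the same sample can be re-drawn and documented
    return sorted(rng.sample(doc_ids, sample_size))


def error_rate(qc_findings: dict[str, bool]) -> float:
    """Share of QC'd documents whose original coding was found to be wrong (True = error)."""
    return sum(qc_findings.values()) / len(qc_findings)


# Hypothetical review population of 10,000 coded documents.
population = [f"DOC-{n:05d}" for n in range(1, 10_001)]
sample = qc_sample(population, 50)
print(len(sample), sample[:3])

# Hypothetical QC findings for four sampled documents: one coding error found.
print(error_rate({"DOC-00017": False, "DOC-00342": True, "DOC-01288": False, "DOC-09001": False}))  # 0.25
```

Because the seed is recorded, a sample drawn early in a review can be reproduced later, which supports the repeatable, defensible process this post advocates; tracking the error rate across successive samples also shows whether retraining is working.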

To close out this article, I would like to stress again that while how you QC is important, doing it, and doing it early in your project, are what matter most. Although QC will still have utility if you start it late in a project (and at times that may be unavoidable), in most instances the sooner you start, the better, so you can identify issues and correct them before they perpetuate and potentially blow your budget or deadlines at the end. That is not to say that performing QC at the start of a project alleviates the need for QC at the end; rather, QC at the beginning sets up a successful, succinct, and efficient QC at the end of a project.

QC may not make your product perfect, and it does not mean mistakes will not happen or that every mistake will be caught, but it will minimize those risks while also lending an air of reasonableness to your actions, so that if something does go wrong you can stand behind your efforts to avoid the error and point to your repeatable, defensible process.