The 4 myths used to justify not using email conversation analysis (and why they’re faulty)

Email conversation threading in DISCO

Why isn’t every review team using email threading for their reviews? Email continues to be the dominant form of business communication, and most legal cases will contain a substantial amount of email to review for the foreseeable future. Since review teams still need to review email, why not do it in the most efficient way possible: that is to say, why not use email threading and conversation analysis?

We hear four major reasons from the industry for not using the best available technology for email review, but those reasons don’t withstand any real scrutiny, at least not when you use DISCO as your ediscovery tool. Some might even call them myths.

Myth 1: Analytics are expensive

Let’s start with the term “analytics.” What does that even mean? In an ediscovery context, people have taken it to mean predictive coding, email threading, near-duplicate detection, and language detection, among others. Legacy software providers (e.g., Relativity, Ringtail, Recommind) group these features together, give them a single name, and charge extra for them at the beginning of the case. If you want any one of these features but not the others, you have to buy them all, much like a cable TV package. The end result is that people skip email threading, even when it would have been very helpful, simply because they did not want predictive coding.

DISCO busts this myth by providing email threading and conversation analysis for every matter at no additional cost: no upcharge and no cost-benefit analysis necessary. You have access to the features but are not required to use them. In fact, with DISCO’s flat-fee pricing model, all of our advanced features, such as machine learning, visualization, and near-duplicate detection, are included in the standard per-gig price.

Myth 2: The case is too small for analytics

Some may consider this a variation of the first myth, but I think it deserves to be addressed separately. The claim is that the case contains only a few thousand or tens of thousands of documents (or whatever number of documents you consider to be a small data set), so analytics would not be helpful.

Setting aside the overbroad naming convention, almost no data set is too small to benefit from email threading and conversation analysis. An email by itself may not make much sense. An email in the context of the larger thread makes dramatically more sense. Think about how you read emails in your daily life: aren’t they always presented in a threaded manner so you have context, rather than as standalone messages? Similarly, the ability to review a single email, or possibly two emails, out of a thread of many, many emails, while still seeing everything contained in the entire thread, makes even more sense during a review. While it is true that the benefits increase dramatically with data size, reviewers save significant amounts of time by not having to look at every single document, even on very small data sets.

Myth 3: Email analysis only works on natives and not on produced (non-native) images or scanned emails

The problem here is that most legacy software uses the metadata from the natives or load files to create the threads. If documents or metadata are missing, the software cannot build the threads or determine which email is inclusive of all others. Some technology also uses textual analysis to identify inclusive emails, that is, emails containing unique content not found in other emails in the conversation. Even when the software uses both metadata and textual content analysis to thread, inclusive email identification can often be incorrect.

In Relativity, one example of this is called the “dreaded inferred” inclusive email. Incorrect inclusive email identification occurs when an email system inserts confidentiality footers or email signatures into the bottom of an email chain. This unique footer content confuses the system because it is comparing the original email to an email down the line that contains the confidentiality footer or email signature. The footer content, which is irrelevant to the conversation text, prevents the system from correctly identifying inclusive emails. Most legacy platforms claim this gap is a “conservative” measure and will mark both the original and the subsequent email(s) as inclusive, negating the benefit of inclusive email identification for that thread.
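To make the footer problem concrete, here is a minimal Python sketch of a naive text-containment check for inclusiveness. It is not how Relativity, DISCO, or any other platform actually works; the function names and the sample emails are made up for illustration. The sketch assumes an email is non-inclusive only when its entire normalized text appears inside another email in the thread, so a footer that does not survive into the quoted copy is enough to make both emails look inclusive.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def inclusive_emails(thread: list[str]) -> list[int]:
    """Return indexes of emails whose text is not fully contained in another email.

    Hypothetical helper: a naive containment test, for illustration only.
    """
    cleaned = [normalize(t) for t in thread]
    inclusive = []
    for i, text in enumerate(cleaned):
        covered = False
        for j, other in enumerate(cleaned):
            if i == j:
                continue
            if text == other:
                # Exact duplicates: treat only the first copy as inclusive.
                if j < i:
                    covered = True
                    break
            elif text in other:
                covered = True
                break
        if not covered:
            inclusive.append(i)
    return inclusive


# Made-up sample data.
FOOTER = "CONFIDENTIAL: This message is intended only for the named recipient."
question = "Can you send the Q3 contract?"
reply = "Attached.\n> Can you send the Q3 contract?"

# Without a footer, only the reply (index 1) is inclusive.
print(inclusive_emails([question, reply]))

# With a footer on the original that the quoted copy lacks, both are flagged.
print(inclusive_emails([question + "\n" + FOOTER, reply]))
```

In the second call the reviewer would have to read both emails even though the reply already contains the entire conversation, which is exactly the lost benefit described above.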

DISCO doesn’t rely exclusively on the metadata of a document to thread or identify inclusivity. DISCO uses the native metadata when available but can determine some metadata by simply looking at the information on the page. For example, an image of an email still contains much of the metadata, such as to, from, date, and subject. DISCO can use this information, just like a human reviewer would, to make sure emails are placed into the correct threads. DISCO also has the ability to ignore Bates numbers and signature footers when determining the unique content in a thread. Finally, DISCO unifies conversation threads where the emails exist in different formats, e.g., client-collected native emails and the opposing party’s produced images.
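The idea of reading header fields off the page and ignoring Bates numbers and footers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DISCO’s implementation: the regular expressions, the footer markers, the function names, and the sample page are all hypothetical, and real produced documents vary far more than this.

```python
import re

# Assumed header layout: "Name: value" at the start of a line in the OCR text.
HEADER_RE = re.compile(
    r"^(From|To|Date|Sent|Subject):[ \t]*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)
# Assumed Bates format: a short uppercase prefix followed by 6 to 10 digits.
BATES_RE = re.compile(r"\b[A-Z]{2,10}[-_ ]?\d{6,10}\b")
# Assumed phrases that mark a boilerplate footer line.
FOOTER_MARKERS = ("confidentiality notice", "this message is intended only")


def extract_headers(page_text: str) -> dict[str, str]:
    """Pull to/from/date/subject values out of the text on the page."""
    headers: dict[str, str] = {}
    for name, value in HEADER_RE.findall(page_text):
        # Keep the first occurrence, which belongs to the top-most email.
        headers.setdefault(name.lower(), value.strip())
    return headers


def strip_noise(page_text: str) -> str:
    """Drop Bates stamps and footer lines before comparing content."""
    kept = []
    for line in page_text.splitlines():
        line = BATES_RE.sub("", line).strip()
        if not line:
            continue
        if any(marker in line.lower() for marker in FOOTER_MARKERS):
            continue
        kept.append(line)
    return "\n".join(kept)


# Made-up OCR text from a produced image.
page = """From: Pat Lee
To: Sam Roy
Date: March 3, 2017
Subject: Q3 contract

Can you send the Q3 contract?
CONFIDENTIALITY NOTICE: This message is intended only for the named recipient.
ABC-0001234"""

print(extract_headers(page))
print(strip_noise(page))
```

With headers recovered this way, an imaged email can be matched to a thread using the same fields a native file would supply, and the cleaned text can be compared without the stamp or footer counting as unique content.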

In a nutshell, DISCO can create correct threads and identify most-inclusive emails both from the native documents and from produced/imaged documents that contain no metadata fields.

Myth 4: An inclusive-only workflow is too complicated and risky

The complaint here is that setting up a workflow that looks only at the inclusive documents in a thread is hard to do, and risky because a person is not putting eyes on every single document. Again, that may be true in other platforms, but it is not true in DISCO.

Workflows are very simple to set up in DISCO. In an ad hoc review, a reviewer can easily filter documents to view only the inclusive emails and start reviewing them. In a managed review context where more controls on reviewers are necessary, with a simple toggle you can limit the documents to only the inclusive emails and attachments and their parents.
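For readers who want to see what such a filter amounts to, here is a minimal Python sketch. It is not DISCO’s code or API: the `Doc` record and `review_set` function are hypothetical, and the sketch assumes one reading of the toggle, namely that the review set keeps inclusive emails, every attachment, and any email that is the parent of an attachment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record; the field names are assumptions."""
    doc_id: str
    is_email: bool
    inclusive: bool = False
    parent_id: Optional[str] = None  # set only for attachments


def review_set(docs: list[Doc]) -> list[Doc]:
    """Keep inclusive emails, attachments, and the parents of attachments."""
    parent_ids = {d.parent_id for d in docs if d.parent_id is not None}
    keep = []
    for d in docs:
        if d.is_email and d.inclusive:
            keep.append(d)  # inclusive email
        elif d.parent_id is not None:
            keep.append(d)  # attachment
        elif d.doc_id in parent_ids:
            keep.append(d)  # parent email of an attachment
    return keep


# Made-up example: e1 is a non-inclusive email with no attachment,
# e2 is a non-inclusive email carrying attachment a1, e3 is inclusive.
docs = [
    Doc("e1", is_email=True),
    Doc("e2", is_email=True),
    Doc("a1", is_email=False, parent_id="e2"),
    Doc("e3", is_email=True, inclusive=True),
]

print([d.doc_id for d in review_set(docs)])
```

Keeping the parent alongside each attachment means a reviewer never sees an attachment stripped of the email that sent it, even when that email is not the most inclusive one in its thread.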

The end result is that reviewers in either context are then presented with only the most inclusive emails for each conversation for review. They review a much smaller subset of the documents while still seeing all of the content contained in every document in the database.

It behooves every case team to ignore the myths and adopt the technology that allows them to complete the review in the fastest, most accurate, and most cost-effective fashion.

Interested in learning more?

Read this case study to learn how a firm saved 4 weeks of review and over $40,000 by using DISCO's email conversation analysis.


Kent Radford

Kent is the co-founder and general counsel for CS Disco, Inc., a legal technology company based in Houston, Texas. Prior to going in-house, Kent worked for both litigation boutiques and Am Law 200 firms. Kent represented Fortune 500 corporations, privately held companies, international defense contractors, major oil and gas companies, famous individuals, and the world’s largest aviation company in a wide range of litigation, including commercial, intellectual-property, labor and employment, mass-tort, and environmental matters.

Kent earned his J.D. with honors from the University of Texas School of Law in 2000. Before going to law school, Kent taught communication at the University of New Mexico and Illinois State University.