Many lawyers ask: what is Technology Assisted Review (TAR)? While it seems like a relatively new invention, it's actually far older than most would ever imagine. This week we take a little time to look back at the history behind the technology with BIA's Brian Schrader.
First things first though, we should take a moment to talk about terminology. The eDiscovery industry, like the government or military, seems to enjoy creating new names and abbreviations whenever the opportunity arises, and the growth of TAR in the eDiscovery realm presented an inescapable opportunity to do just that. From Suggestive Coding to Predictive Coding to Computer Assisted Review (CAR) to Technology Assisted Review (TAR), the industry has no shortage of terms for what is essentially the application of statistical theories to document review. For purposes here, we'll stick to the most popular term used today: TAR.
Believe it or not, TAR really isn't the new invention it's made out to be. It sounds new, but the theories behind it go back 200 years or more. Indeed, one technique, the Naive Bayes classifier, rests on probability theory first published in 1812. That's not a typo – we really mean it: 1812. And that's only the publication date; the underlying work was likely developed more than 50 years before that.
So, while the algorithms and theories have clearly evolved and improved over time, TAR is not the youngster it is often made out to be. Had that fact been shared more clearly with courts from the outset – specifically that TAR isn't some new aberration but is built on hundreds of years of research and development – it likely would have gained acceptance much earlier. We now seem to have turned the corner, and TAR is becoming far more accepted and routine, but a better explanation of the science's history could have brought that acceptance years sooner.
At its heart, TAR is based on mathematical and statistical models that look at document contents, word usage and placement, how words are associated with other words, and so on. It can get quite complex, of course, but it all rests on statistical algorithms that industry after industry has used for centuries now.
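To make that concrete, here is a minimal sketch of the kind of statistical classification involved – a tiny multinomial Naive Bayes classifier, the very technique with roots in 1812. This is purely illustrative: the training documents and the "responsive"/"non-responsive" labels are hypothetical, and no commercial TAR product is implemented this simply.

```python
from collections import Counter
import math

# Hypothetical reviewed documents, labeled by an attorney (for illustration only).
train = [
    ("merger agreement draft attached", "responsive"),
    ("board approved the merger terms", "responsive"),
    ("lunch menu for friday", "non-responsive"),
    ("office closed for holiday", "non-responsive"),
]

def fit(docs):
    """Count word frequencies per class (multinomial Naive Bayes training)."""
    word_counts, class_counts = {}, Counter()
    for text, label in docs:
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, class_counts

def predict(text, word_counts, class_counts):
    """Return the class with the highest log-probability, using add-one smoothing."""
    vocab = {w for wc in word_counts.values() for w in wc}
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, wc in word_counts.items():
        score = math.log(class_counts[label] / total)  # class prior
        denom = sum(wc.values()) + len(vocab)
        for w in text.split():
            score += math.log((wc[w] + 1) / denom)  # smoothed word likelihood
        if score > best_score:
            best, best_score = label, score
    return best

wc, cc = fit(train)
print(predict("draft merger agreement", wc, cc))  # responsive
```

The classifier simply learns which words appear more often in each category and, for a new document, multiplies those word probabilities together – exactly the kind of "word usage" statistics described above, just scaled up enormously in real TAR systems.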
What's more, almost everyone reading this article has likely had first-hand experience with this technology long before it entered the eDiscovery realm. A clear and simple example goes back to the earliest days of the Internet. Most of us have, at some point, shopped for a book on Amazon. In that process, you probably saw a list of similar books or a "More Like This" link, which took you to books similar to the one you were considering. Amazon uses the same basic technology behind TAR to recommend books – and has been doing it for nearly two decades – with pretty convincing results.
But it doesn't stop there – you've probably been using some form of this technology in eDiscovery since long before it drew today's spotlight. Many of the near-duplicate detection, conceptual grouping, discussion-threading and similar technologies that have been in use for much longer than TAR are based on versions of the same science. The difference lies in how the results of the statistical document analysis are applied.
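One common building block behind near-duplicate detection is a document-similarity measure such as cosine similarity over word counts. The sketch below is a simplified illustration with made-up example documents, not any specific vendor's implementation:

```python
from collections import Counter
import math

def cosine_similarity(a, b):
    """Cosine similarity between two texts as bags of words (0 = unrelated, 1 = identical)."""
    wa, wb = Counter(a.split()), Counter(b.split())
    dot = sum(wa[w] * wb[w] for w in wa)
    norm = math.sqrt(sum(v * v for v in wa.values())) * math.sqrt(sum(v * v for v in wb.values()))
    return dot / norm if norm else 0.0

doc1 = "please review the attached merger agreement draft"
doc2 = "please review the attached draft of the merger agreement"
doc3 = "the office will be closed friday"

print(cosine_similarity(doc1, doc2))  # high score: likely near-duplicates
print(cosine_similarity(doc1, doc3))  # low score: unrelated documents
```

Score two documents against each other and you get near-duplicate detection; score every document against a cluster center and you get conceptual grouping; score unreviewed documents against attorney-coded examples and you get TAR. Same statistics, different application – which is exactly the point.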
So, while the current application of this technology is relatively new, we should not lose sight of the fact that the underlying science is centuries old, and that, in fact, the technology in one form or another had been in use in eDiscovery long before the term “predictive coding” was first coined.
When you start from that perspective, TAR seems much less imposing and much more like what it really is: the application of an existing and developed area of science that, when used properly, helps to amplify human knowledge and expertise, resulting in significantly more efficient and accurate outcomes.