In the Practical Uses of Brainspace webinar, Charles Duff from Brainspace, the leader in eDiscovery analytics, discusses the many features that Brainspace incorporates to facilitate robust document analysis. BIA’s Lisa Prowse and Barry Schwartz discuss how BIA and our clients use Brainspace to improve review, gain better and deeper insights into their data, reduce the overall review time, and ultimately save money.
The webinar panel also discusses topics around advanced review analytics in a clear and understandable manner, while providing real-world examples of its everyday use, thereby demonstrating its value, its efficiency and its dramatic cost savings in tackling the ever-expanding quantity of documents that today’s matters and cases bring to the table.
In the Practical Uses of Brainspace webinar, you will learn about:
- The many different modules of Brainspace and how they work to help you – including the infamous cluster wheel, concept search, thread analysis, communication pattern starburst, and dashboard options.
- Use cases, including investigating IP theft, intelligence & data mining, early case assessment and more.
- A walkthrough of specific case studies that detail how the platform was used to save both time and money.
Watch the Webinar
Michael: Hello everyone, and welcome to the webinar channel of the Association of Certified E-Discovery Specialists. My name is Mike Quartararo, president of ACEDS. Today we are joined by our fabulous partner BIA for a webinar entitled, “Practical Uses of Brainspace”. Before we get started, please know that we love questions, and are always happy to take questions. You can submit your questions in the Q and A widget located on the bottom of your screen. All questions are anonymous. Also, if you’d like a copy of the slide deck you can download it from the resource widget also at the end of the page at the bottom of your screen. Without further delay, I’m very pleased and super excited to introduce our presenters and for that, I’ll hand it over to Mark MacDonald at BIA. Mark, take it away.
Mark: Thank you very much Michael, and welcome everyone to the “Practical Uses of Brainspace” webinar. On behalf of the presenters, thanks everybody for attending. We hope you enjoy the discussion. There’s going to be a lot of information so I’m not going to waste too much time here. But as Michael said, feel free to shoot questions in. I’ll do my best to prioritize them and get them answered at the end. Any questions we don’t get to will be part of our BIA blog and all the attendees will receive a link to that next week. Today we’re joined by Barry Schwartz. Barry is the SVP of advisory services here at BIA, with 35+ years of experience across many, many, many matters. He is also a resident expert in machine learning technologies like Brainspace. Lisa Prowse runs our docucentre. She’s the SVP of legal services here at BIA. This is Lisa’s world, so we’re really excited to have her here, with Charles Duff as well, senior solutions architect from Brainspace. So before we kick off, guys, tomorrow’s Halloween, so does anybody know what a zombie’s favorite technology might be? It’s braaaaainspace. I hope somebody laughed out there. Thank you very much. It only gets better from here, so Charles, go right ahead. Thank you.
Charles: Yeah, so let’s flip over to the actual slides. So the first slide shows the people who are on this webinar. The topics of discussion today: the benefits of Brainspace. Of course, BIA has run hundreds if not thousands of projects through Brainspace, and they are experts in the various analytical technologies that can be applied anywhere from document analysis of the unstructured text to improving review workflow and cutting down on the number of non-responsive documents. We’re looking to accelerate and create efficiencies wherever possible, and of course, we will be able to provide some real-world examples today. So the overall flow of the webinar is to talk about what Brainspace is first. I’ll be going through a few slides and I’ll take the slides from here.
And so at the end of the day, Brainspace can be summed up as an innovative augmented intelligence technology that seamlessly connects a human expert to world-class machine learning and interactive visual analytics designed to enable users to reduce legal risk and collect actionable intelligence. There’s a lot of intelligence there. And you know, we all know the big problem that we all run into, right? Data is ever-growing but the teams are not, the budget is not, and the numbers are staggering, and the data will keep growing, not only from the current platforms that are used, but of course new platforms are introduced all the time. So with Brainspace, we focus on what we call augmented intelligence, right? You see on the side here the investigators, the attorneys, the litigation consultants. So we deploy many technologies within Brainspace, primarily various forms of machine learning. Although machine learning is technically a subset of artificial intelligence, any data scientist will tell you that, from a practical standpoint, that is not how people tend to actually use those terms. People tend to focus on the word intelligence, where they assume there is some level of creativity or abstract, above-all thinking, whereas machine learning, in people’s heads and in practice, is about gaining knowledge through the use of various algorithms. So think about typing something on a typewriter and sending it out via snail mail; then we move into email and to chat and to text, and now people are throwing in acronyms all over the place. Now we go to social media and throw in some emojis, right? That is the evolution of the way people communicate, not only in direct communicative ways like mail and text but really in any unstructured forms like Word documents and PowerPoint presentations like this one. So it’s a very difficult task for humans to analyze all of that data and try to define what is actually in there. 
You know, there are various forms of AI, of course, as it applies to unstructured text, but the actual intelligence piece of that, the computer making decisions, we’re not really there yet, right? So at Brainspace, we believe that because of this, we need to combine the best of both the human and the current technologies, which are ever-changing as well. That includes visually representing the results of, let’s say, the multiple patented algorithms that Brainspace provides so that it is easy and intuitive for the human. We call that augmented intelligence. We call that Brainspace.
So as noted previously, Brainspace has a variety of technologies that, when used together in creative ways, help drive all areas of post-processing, and we’ll get into these things later in the webinar. They range from uncovering the unknown unknowns to identifying overly broad search terms, which helps reduce the number of non-responsive documents that make it into the review population in the first place. And of course, once you begin the review, you’re at some point going to have to read something, you’re going to want to accelerate that process, and we have some stories there for you as well. So Brainspace clients, of course including BIA, have provided metrics back to us, and you see that “up to 90% faster.” That is a real number that we have received back from our clients. And you know, although Brainspace plays heavily in the e-Discovery realm, we are primarily an unstructured data analytics platform. We are used to analyze data in a variety of ways, in intelligence agencies around the world and in aerospace and engineering and patents. So really the focus is unstructured data, yet the culmination of all of the visuals and the patented algorithms and just the flow and ease of use of Brainspace, and what it provides to other tools (you’ll actually hear about that later, we have many connectors), all of that plays very nicely, certainly for e-Discovery professionals.
So I’ll just run through these two slides really quickly. There’s a lot to talk about so I won’t cover all of these in detail, but there is a lot you can do in Brainspace, from data visualization and concept search to predictive analytics, integration with other platforms, and even putting PSTs and emails directly into Brainspace to take action and learn from that very quickly. And you see the last one, portable learning; that’s a very interesting topic that we will cover towards the end of the webinar. Of course, BIA has case studies, and we have case studies as well. You can see from here various forms of investigation, from government inquiries to various forms of litigation. You can apply the technologies in different ways. It’s not sort of a linear step one, step two, step three scenario. What we really like to do is give you the ability to pick and choose what works best for you for that particular case so that you can gain the most accuracy and efficiency and move the line as quickly as possible. Get the answers, reduce non-responsive data, and so on. So Brainspace, as you can see, and we’ll get into some of the visuals later on, but that sort of sums up what we do and who we are, and how we approach the foundational problems of not only e-Discovery, but investigations and other markets as well. So now that you have gotten a glimpse of why there is a Brainspace, I will turn that over to Barry and Lisa and myself, and we’ll talk about how BIA is using those technologies.
Barry: Thank you, Charles. First, what we want to do is give you a couple of examples of how we at BIA have used Brainspace to rapidly and accurately review documents in several cases. Then we’ll go into the nitty-gritty of actually how we use it and the implications of its various features. So this first example is a real example from about a year ago, where a client of ours was faced with 100,000+ documents and a limited budget. Through the use of Brainspace, the single attorney was able to review fewer than 4,000 documents and have a high rate of precision and recall, 88%. These are TAR terms and I’m not really going to go into a discussion of them necessarily, but they speak to the level of accuracy of the review and the amount of documents that come back that should be produced. In this instance, the client saved over 127,000 dollars. In our second example, not reflected here, there were actually over 4 million documents in the data set. Using Brainspace and the CMML model, the continuous multimodal learning model, which Lisa will go into a lot of detail on in just a minute or two, only 180,000 documents were reviewed. We actually, with further refinement, probably could have gotten that review set to under 100,000 documents. And the result was a 400,000 dollar savings in the cost of review with a review team of about ten people. The overall cost for that project was less than $100,000. One thing I should point out is that in using this type of model, a hybrid review, coupling Brainspace with human eyes, we’ve developed protocols that have been widely accepted, including several times with the DOJ. That’s an important note to make because the use of these types of tools now is becoming and has become widely accepted. That’s a very important point to make here.
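For readers new to the two measures Barry mentions: precision is the share of documents marked responsive that truly are responsive, and recall is the share of truly responsive documents that the review actually found. The following is a minimal sketch only. The function name and the counts are hypothetical and are not figures from the matters described in this webinar.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from validation-sample counts.

    true_pos:  documents marked responsive that really are responsive
    false_pos: documents marked responsive that are not
    false_neg: responsive documents that the review missed
    """
    marked = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / marked if marked else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r = precision_recall(true_pos=880, false_pos=120, false_neg=120)
print(f"precision={p:.2f} recall={r:.2f}")
```

In practice these counts come from a human-coded validation sample, so both figures are estimates with a margin of error.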
Some of the real-world uses of Brainspace, and this is where I’ll turn it over to Lisa in a moment, include email threading, so that only the most inclusive email and its branches need to be reviewed, and that cuts down a lot of the review time that’s typically spent on a linear review. Brainspace also provides near-duplicate analysis so that similar documents can be put together for the expediency of review. It also can do exact text duplicate analysis, and as I mentioned earlier, continuous multimodal learning, which Lisa will go into great detail on. Overall, as we’ve been saying, both Charles and myself, it drastically increases the efficiency and cost-effectiveness of a document review, and it allows for insight into the metadata. It allows for robust QC review and correction if there are any issues, and it promotes efficient batching to allow for review prioritization. Lisa, if you want to take it from there?
Mark: I wanted to ask a question on your point about using this for government regulatory investigations. Is it fair to say that the agencies that you’ve dealt with who have approved Brainspace actually prefer it because it means fewer documents for them to receive?
Barry: That’s actually a good point, Mark. Yes. We were discussing things with outside counsel for our client, so we did not have direct communication with the DOJ; however, the feedback was that they appreciated getting fewer documents that were more on point rather than receiving, not even a data dump, but a less culled responsive set that would have happened with the normal eyes-on human review. Coupling that with the Brainspace technologies, a very focused review set was provided and a responsive set was provided. Lisa?
Lisa: Yep. At BIA we try to use Brainspace. I would say we probably use Brainspace in every single one of the matters that we actually have our document review team reviewing. But we also try to encourage our clients to use it even if we’re not doing a review, just because we see so much value, especially for the cost of it, which is significantly lower than it was years ago. It’s gotten to the point where it just doesn’t make sense not to run the analytics on your data set. It beats price points now. So assuming we’ve got a client, and we still have a lot of clients who are wary about the process and don’t want to go full force into a full TAR project, we still try and get all the other benefits out of it, including using that process, just not using it in the way that the client is expecting. One of the first things that we do, though, is go through and run the email threading and the near-dupe analysis, and then, assuming that the client has said they just don’t want a TAR workflow, we still use those pieces for review prioritization. It allows us to quickly get a group of documents that seem to be very similar to what the client is looking for. Rarely does the client have a lot of seed documents, for example, or example documents for us to use, so a lot of times, even if they’ve only got one or two, we can go in and find all documents that are very similar to those, pull those, and start to build from that. We kind of build out. So it basically allows us to focus on documents very quickly and start to get those responsive documents, as opposed to just starting with your key custodian, or just one section of documents, and trying to move your way through. Otherwise you really won’t have all your responsive documents until you’re done reviewing. We’re trying to get the most responsive documents upfront so we can start rolling productions and those sorts of things.
Barry: Hey Lisa?
Lisa: Yep go ahead.
Barry: It’s interesting, we switched over from government agencies accepting particular methods, and of course many government agencies purchase Brainspace as well for some of the same reasons you have. They are becoming more accepting of the different methods and technologies that are used. And from what you see in the market, if I were, as you said, to run all of my projects through Brainspace, because there is more often than not going to be some outcome that is more efficient and accurate than the traditional methods, keywords, key custodians, and so on: if you relay that back to your clients and show them the results, are they more accepting of that? Is there sort of an evolution in that?
Lisa: They definitely are more accepting of it. Sometimes that client isn’t entirely sure that it’s going to work, and we basically will just run it to show them: this is what we can do if you want us to run all the data. And they’re usually pretty impressed by what we can identify, especially near dupes and textual dupes, textual exact duplicates that don’t have the same hashes. That’s a huge piece. Where I think we’re at now is that clients who are working with agencies, government agencies, probably don’t have such an issue with the analytics. It’s more of an issue if the client themselves don’t have a lot of technical background or the opposing side is very non-technical, and it’s just in the realm of: it’s not something we want to try and explain to the other side, who simply can’t figure out what email threading means. To that end, what we try and tell the clients is we can still run email threading, and we can still run all of these processes; it doesn’t change the outcome. We still wouldn’t be propagating coding. We’re still going to look at all the documents, you just look at them in a different order, and you’re going to get a lot more benefit out of it.
Charles: In that sense, it’s interesting when you talk about trying to defend a method. When you think about that traditionally, you have to defend why you didn’t look at something.
Charles: Either it’s sort of attribute-based, which is de-NIST, date period, standard procedures, or it’s agreed-upon search terms where everything else doesn’t get reviewed. Or you start to get into statistical methods like predictive coding, where the review population is defined by a group of documents above a cut-off, and for the other half or whatever, you may do some sampling, but you’ve sort of agreed that below a certain level you’re not going to review it. So I think when people think about CMML, or really any TAR in general, they’re sort of thinking about it in that way: that it can be used to help define some sort of cut-off, whether that is recall or precision, of course, with CMML or predictive learning in general. But it’s interesting because if I am using predictive analytics to define a cut-off, I can use those same predictive analytics to sort, and I don’t have to defend anything at that point. I’m just using it as a way to put the good documents at the top.
Lisa: And that’s exactly it. For clients who don’t want to do a traditional predictive coding, TAR-type model where you’re propagating coding for a certain section of documents, that’s exactly what we do. We will still use the CMML model, as you can see on the screen. We basically will go in and take the documents and we’ll run them and get scores for them. We will use those scores to batch. Just because a document has a low score doesn’t mean it’s not going to get reviewed, though. So if a client has specifically said they need all documents reviewed, then it just keeps grouping those documents, and like on this slide, we rescore at regular intervals, so we’re probably rescoring a couple of times a day usually. We’ll review 500 to 1,000 documents per rescore, and then go back and dump the earlier batches and re-batch with a new score. And all that does is, it’s almost like a funnel where it just keeps bringing the most relevant documents to the top for us to look at. So once we get to a certain point we can, based on the model and based on what we’re seeing with our eyes, tell the client: look, we pretty much have gone through as many of the responsive documents as we can get, and the model says everything else is probably going to be non-responsive. We can still go through that, but we can also go through it very quickly at that point, because at that point we’re really looking for an outlier, and not just a coin flip as far as whether this document is going to be responsive or not responsive, or privileged or not privileged. Because of the way things are grouped, we’re looking for outliers at that point and they will stand out more easily. If we’ve got one responsive document in the middle of 50 non-responsives, that’s going to stand out. I’m sorry, go ahead.
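The rescore-and-rebatch cycle Lisa describes can be sketched roughly as follows. This is a toy illustration, not Brainspace’s CMML algorithm: the scoring function is a simple term-count ratio chosen for brevity, and the names `score`, `review_in_batches`, and `batch_size` are assumptions made for the example. Every document is still reviewed; the scores only change the order in which reviewers see them.

```python
from collections import Counter


def score(text, responsive_terms, nonresponsive_terms):
    """Toy relevance score in (0, 1); 0.5 means no evidence either way."""
    tokens = text.lower().split()
    pos = sum(responsive_terms[t] for t in tokens)
    neg = sum(nonresponsive_terms[t] for t in tokens)
    return (pos + 1) / (pos + neg + 2)


def review_in_batches(docs, reviewer, batch_size=2):
    """docs: {doc_id: text}; reviewer: callable(doc_id) -> bool (responsive?).

    After every batch the remaining documents are rescored and re-sorted,
    so likely-responsive documents keep rising to the top of the queue.
    Returns the coding decisions and the order in which documents were seen.
    """
    coded, order = {}, []
    responsive_terms, nonresponsive_terms = Counter(), Counter()
    while len(coded) < len(docs):
        remaining = [d for d in docs if d not in coded]
        remaining.sort(
            key=lambda d: score(docs[d], responsive_terms, nonresponsive_terms),
            reverse=True,
        )
        for doc_id in remaining[:batch_size]:
            is_responsive = reviewer(doc_id)
            coded[doc_id] = is_responsive
            order.append(doc_id)
            terms = responsive_terms if is_responsive else nonresponsive_terms
            terms.update(docs[doc_id].lower().split())
    return coded, order


# Hypothetical mini-corpus and reviewer decisions.
docs = {
    "a": "pricing agreement with competitor",
    "b": "lunch menu friday",
    "c": "competitor pricing call notes",
    "d": "parking garage closed",
    "e": "pricing update for competitor meeting",
}
truth = {"a": True, "b": False, "c": True, "d": False, "e": True}
coded, order = review_in_batches(docs, reviewer=truth.get)
print(order)
```

Once the first responsive document has been coded, the two other pricing documents move ahead of the parking notice, which is the funnel effect described above.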
Mike: And that’s the ah-ha moment for clients.
Mike: When they see we’ve identified everything but the outliers and we can find the outliers typically.
Lisa: And quite often that is where we can kind of finally get a client to go, oh, I actually do agree, this makes sense, the way that we’ve done it, and it works. And even if a client has approval and could use TAR but has decided they don’t want to use a propagation model, a lot of times we can still get to that point where we’ve literally found everything that we can find that’s responsive, and everything else pretty much looks like it’s not responsive, and we can do a lot of sampling sessions and stuff. Quite often the client will be pretty okay and open to go ahead and propagate the coding just for those very low score documents. Now, we also take very good care to exclude documents that would unnecessarily get a low score, so if it’s a document that simply doesn’t have text, like a logo, or an image, or a screenshot, we’re going to pull that out. We’re not going to run all of our Excels that don’t have a lot of text, because clearly the whole thing is based on the text and the metadata. If it doesn’t have good text, we’re not going to artificially say it has a low score and therefore it can’t be responsive. We feel pretty confident that the things we keep in and that have low scores are probably going to be non-responsive at that point. Even if you’re not going to propagate coding, it’s a very good way to go through the documents, especially if you continue to have that scoring process and batch. You’re constantly bringing the good docs up. Frankly, you need the good docs, you need the high-responsive-rate documents, in the front end of that review, because if a review team sees a lot of not responsive documents in the beginning, their view of the case starts to skew a little bit and you start to try to find responsive things in a non-responsive document. 
So because you’ve gone through 200 documents and you feel like well, something should have been responsive in 200 documents, so it seems important to get the most important documents in front of the reviewers first. Leave the junk, basically, for the end.
Charles: To shape that a different way as well, for the same reason: even if I’m going to review the entire population, if I put the most likely to be responsive at the top, and I’m batching out nice rich sets and looking at a lot of relevant information, then I can start to gain intelligence on what else would be relevant. So there are certainly cases where something looks not relevant and is tagged not relevant, but had you known the context that that information was provided in, and you would have known that had you seen other relevant information first, you would have tagged it differently. Having that method can give you a better understanding of what relevancy truly means. In conjunction with how people speak on that topic, nobody comes out and says a particular thing, right, like “I’m committing fraud”; nobody does that. So if you can garner that information quicker upfront, you have a better understanding of the case, and that continues to accelerate the finding of good information.
Lisa: And another thing that we notice sometimes is that it helps us in particular if a client is pretty sure they know what they are going to find in the documents, and that’s what they are telling us to look for, and we just aren’t seeing that. They’re sure that there are going to be spreadsheets that say this, and we’re just not finding that information like they’re expecting. The fact that we’re using the TAR process and, you know, the CMML model and stuff to push the relevant docs up tells us that either A, we haven’t found any of those and we haven’t been able to train the model, or B, they’re just not there. And that’s when we can go back to the client and tell them that maybe they didn’t get the right custodian, or maybe they didn’t get the right collection, because we’re not seeing those documents and we should have seen them at this point in the review.
Another huge area where we use it is QC, quality control, for the review as well as for redactions and other things. In the past, QC has really been that you would take a random sample of documents and you would basically re-review them to make sure that they were coded correctly the first time. Or if you knew you had a bad reviewer, you might go through their documents to make sure that they are coding correctly. There’s obviously a place for narrowing in if you’ve got a bad reviewer, but if you’ve just got a team of people that you expect are reviewing pretty well, a random sample really doesn’t get you a whole lot of information other than, yeah, we looked at a couple of thousand documents and they looked good so we’re good to go. We use Brainspace heavily for QC. We will take a set of documents and we will pull in the entire email threads and see how consistently they’re coded across the email thread, knowing that at some point in the thread it may have changed from responsive to not responsive, but it’s never going to go not responsive, responsive, not responsive in that chain. If there’s a break in the chain, it’s going to be consistent and stay there. We also will bring in the textual duplicates, obviously, because those are documents that we couldn’t de-dupe out with hash deduplication, but they still have the exact same text. For example, a Word doc that was printed to PDF, or a lot of times we’ll have emails that are on vastly different systems, and you know, some of them have been collected from a Mac and others collected from a PC, and at some point they just didn’t de-dupe against each other. This will allow us to identify those because it’s identifying exact text. And obviously there’s no reason why two documents with the exact same text shouldn’t have the same coding. We can also expand that down to the near duplicates. Near duplicates obviously at some point could vary. 
You could have a draft contract that is not responsive or is privileged up to a certain point, and then once it gets released to the public it’s no longer privileged, or now it becomes responsive or something.
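Two of the QC checks described here, consistent coding across textual duplicates and at most one coding change along an email thread, can be sketched as below. This is an illustrative approximation only. Hashing whitespace-normalized, lower-cased text is a simple stand-in for Brainspace’s exact-text duplicate detection, and the field names (`id`, `text`, `coding`) are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def text_fingerprint(text):
    """Hash of the extracted text with whitespace and case normalized,
    so a Word file and its PDF printout can match even though their
    file-level hashes differ."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def inconsistent_text_duplicates(docs):
    """Return groups of document IDs that share identical text but
    carry different coding decisions."""
    groups = defaultdict(list)
    for doc in docs:
        groups[text_fingerprint(doc["text"])].append(doc)
    return [
        sorted(d["id"] for d in group)
        for group in groups.values()
        if len({d["coding"] for d in group}) > 1
    ]


def thread_flips(codings_in_thread_order):
    """Count coding changes along a thread. One change can be legitimate;
    two or more (e.g. NR, R, NR) is worth a second look."""
    pairs = zip(codings_in_thread_order, codings_in_thread_order[1:])
    return sum(1 for earlier, later in pairs if earlier != later)


docs = [
    {"id": "A", "text": "Quarterly  forecast\nattached", "coding": "responsive"},
    {"id": "B", "text": "quarterly forecast attached", "coding": "not responsive"},
    {"id": "C", "text": "Lunch?", "coding": "not responsive"},
]
print(inconsistent_text_duplicates(docs))
print(thread_flips(["not responsive", "responsive", "not responsive"]))
```

Near-duplicates need a similarity measure rather than an exact hash, and, as noted above, their coding can legitimately differ, so a mismatch there is a prompt to look rather than an error.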
If we take a document that we can see has been coded, say, responsive, we can pull all near-duplicates and pretty much look down the list and see if there is an aberration that we want to hone in on for QC. Then we will also QC what we call overturns, both system and reviewer overturns. So an overturn is where, let’s say, Brainspace gave the document a very high score as far as relevancy and a document reviewer tagged it not responsive; that’s one in particular that we want to look at. And the opposite too: if Brainspace gave it a very low score and we said it’s responsive, we want to look at that, because that could indicate either that we miscoded something else that we need to correct so that we correct the CMML model, or it could mean that we’ve got some bad text and we maybe need to look at that or pull those documents out. It just gives us a lot of areas that we can QC and a lot more bang for our buck versus just a random sample of 2,000 documents, where they look good and we can move on. A random sample set of documents for QC never really shows you a whole lot of information that you need to go back and look at. I really like using the analytics boards for QC.
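The overturn check can be sketched as a simple comparison between the model score and the reviewer’s call. This is a hedged illustration: the 0.8 and 0.2 thresholds, the 0-to-1 score scale, and the field names are assumptions made for the example, not Brainspace settings.

```python
def find_overturns(docs, high=0.8, low=0.2):
    """Split disagreements between model score and reviewer coding.

    docs: list of dicts with 'id', 'score' (0..1) and 'responsive' (bool).
    Returns two lists of document IDs:
      - scored high by the model but coded not responsive
      - scored low by the model but coded responsive
    """
    high_score_coded_nr = [
        d["id"] for d in docs if d["score"] >= high and not d["responsive"]
    ]
    low_score_coded_r = [
        d["id"] for d in docs if d["score"] <= low and d["responsive"]
    ]
    return high_score_coded_nr, low_score_coded_r


# Hypothetical scored and coded documents.
docs = [
    {"id": "D1", "score": 0.93, "responsive": False},  # candidate overturn
    {"id": "D2", "score": 0.91, "responsive": True},
    {"id": "D3", "score": 0.07, "responsive": True},   # candidate overturn
    {"id": "D4", "score": 0.50, "responsive": False},
]
print(find_overturns(docs))
```

Each flagged document is a prompt for a human look: it may be a reviewer error, a training example that misled the model, or a document with poor extracted text.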
Charles: On the note of what we call the consistency report, or the overturns: it’s really interesting. These predictive analytics are of course based on text, and the algorithms that Brainspace provides for the unstructured text sort of feed into that, so we use the technologies all together to try to get the best out of all of them. And one of the interesting things is, as you said, if there’s a document tagged as non-responsive and it has a high score, I can go look at it, and maybe there is some reviewer education that needs to happen there. But what if the document is really not responsive? Or vice versa. Another interesting thing you can do: we sort of show you what made that document relevant in the background, and you can take that and find all other documents with that same attribute and mass tag, so we’ve actually turned this seemingly negative thing into a review accelerator. I just wanted to point out another creative way to do that.
Lisa: Yeah, I don’t even consider those negative anymore. But exactly what you’re saying. If we’ve got a document that the system has scored very low, and the reviewer looks at it or the outside counsel looks at it and says “oh, this is a key document that we really want,” and it just doesn’t happen to have any of the keywords that they are looking for or something, the only reason we are seeing it is because of the conceptual search and the clustering and stuff. We will take that and we will see what terms it has in it, find all documents that are very similar to that, take a look at those in particular, see whether we need to code those as responsive too, and find anything similar. With that, I’m going to turn it over to Barry to talk about how we use this for privilege reviews.
Barry: Thank you, Lisa and Charles. We clicked at the same time. Lisa did touch on some of the points with respect to the privilege review, but as we have here on the slide, it allows us to group those documents that are under consideration for privilege or not but do have privilege themes in them, as Charles was just saying, to make sure that the privilege calls are consistent, so that the language that makes a document privileged is flagged and identified in all other similar documents, so that we don’t leave any documents off the privilege log that should be on that log. And we do produce those that shouldn’t be privileged. It’s also an important consideration, as Lisa was saying, to identify where, if at all, an email thread breaks privilege, so that the more recent parts of a thread or even a lower part of the thread will have non-privileged information, as it was shared with outside parties, the opposing parties, somebody other than counsel or internal to the company or the organization, where privilege is indeed broken.
A couple of the examples that Lisa mentioned, and I’ll go over again, are draft emails. Typically draft emails between attorneys reflecting comments from clients and so forth are almost always considered privileged. But the final version may not be privileged. Sometimes the draft emails or the drafts distributed through email have privilege broken, so in some instances, a document is sometimes privileged and sometimes not privileged, and those all have to be accounted for. One thing to remember with Brainspace is that each document stands on its own, so we’re looking at the four corners of the document, not necessarily the families, during the analysis using Brainspace. When we have a family of documents and there’s a responsive document in it, the way Brainspace primarily works is it’s a yes/no determination. It’s responsive/not responsive, it’s privileged/not privileged, it’s hot/not hot. If other documents are brought along in the family, those are coded, but they’re not coded as responsive or yes or privileged or whatever. They’re coded as responsive family, or if it’s a non-responsive document it’s coded as non-responsive family, so that those determinations don’t affect the calculus of Brainspace.
Charles: Just to add in there, anytime you’re dealing with predictive analytics, you need to be aware that when you’re talking about supervised learning, in the sense that you need to tag a document so that predictions can be made on the remainder of the population, it will take whatever you give it. And if you say an email that says “Hi Bob” is responsive because the attached Word document is a key hot document, then “Hi Bob” now looks as if it’s responsive. So you want to make that distinction, and you can tag, as you said, because of family or not because of family, but for predictive analytics in general, you want to make sure you stay within the four corners of the document.
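One way to picture the family distinction: only documents coded on their own four corners become training examples, while documents carried along by a family member are kept out of training. The sketch below is illustrative only; the tag strings and field names are hypothetical and do not reflect Brainspace’s actual coding fields.

```python
# Tags that reflect a judgment about the document's own text.
TRAINING_LABELS = {"responsive": True, "not responsive": False}


def training_examples(docs):
    """Keep only documents coded on their own merits.

    Family-only tags such as 'responsive family' are deliberately left
    out, so a 'Hi Bob' cover email does not teach the model that
    'Hi Bob' is responsive language.
    """
    return [
        (d["text"], TRAINING_LABELS[d["tag"]])
        for d in docs
        if d["tag"] in TRAINING_LABELS
    ]


# Hypothetical family: a hot attachment and its cover email.
docs = [
    {"text": "Hi Bob", "tag": "responsive family"},
    {"text": "Draft side agreement on pricing", "tag": "responsive"},
    {"text": "Cafeteria hours", "tag": "not responsive"},
]
print(training_examples(docs))
```

The cover email still travels with its family for production purposes; it is simply excluded from what the model learns from.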
Lisa: And I would add to that. For somebody who’s new at reviewing for a TAR or predictive-type process, it actually really helps to just switch over to the extracted text view. I know it’s not anywhere near as pretty as looking at emails and spreadsheets and PowerPoints, but if you look at the text view, you get in the mindset that all the analytics engine can see is what is right here within these four corners. Anything else should not cloud or impact my judgment as to whether the document is responsive or privileged or whatever. Once you code like that for even just a few hundred documents and you get the feeling that nothing else matters, just what’s on my page, then it starts to make sense and it gets easier.
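As a rough illustration of the family-coding point above, here is a minimal Python sketch of keeping family-only tags out of the training data. The tag names (`responsive`, `non_responsive`, `responsive_family`, `non_responsive_family`) and the `CodedDoc` type are hypothetical and are not Brainspace or Relativity field names; the only idea taken from the discussion is that a document tagged because of its family should not teach the model that its own text is responsive.

```python
from dataclasses import dataclass


@dataclass
class CodedDoc:
    doc_id: str
    text: str  # extracted text only, which is all the engine can see
    tag: str   # hypothetical tag values, see TRAINING_LABELS below


# Only tags decided on the document's own four corners become training labels.
# Family-only tags such as "responsive_family" are deliberately left out.
TRAINING_LABELS = {"responsive": 1, "non_responsive": 0}


def training_examples(docs):
    """Return (text, label) pairs, skipping documents coded only because of family."""
    return [(d.text, TRAINING_LABELS[d.tag]) for d in docs if d.tag in TRAINING_LABELS]


docs = [
    CodedDoc("A1", "Hi Bob, see attached.", "responsive_family"),
    CodedDoc("A2", "Pricing agreement draft with key terms.", "responsive"),
    CodedDoc("A3", "Lunch on Friday?", "non_responsive"),
]
examples = training_examples(docs)
```

In this sketch the “Hi Bob” cover email still travels with its family for production purposes, but it never reaches the list of labeled examples.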
Barry: Exactly correct. A couple of other things that we use Brainspace for, you can see them on the slide here. Many times counsel changes over the course of a matter, and the previous counsel may have produced documents, and the only method of getting those documents into the current review platform is to OCR the imaged production. So we can take those productions, run their text against the text of other documents that are in the current document set, and see what was previously produced and what is in our current set of documents that we don’t necessarily need to review again because they’ve already been reviewed or produced. It speeds up that part of the document review process rather significantly. It also identifies holes and misinformation from prior productions.
And the same, and I’ll skip to opposing party productions first, goes for opposing party productions. When we’re looking at documents that are received from opposing counsel, we want to know, and we use Brainspace for this, what documents they produced that we already had, and what new documents they produced that we didn’t know about. And then we also identify gaps, where they should have produced documents that we know they had but they didn’t produce, because there is a possibility that they didn’t do a full production, that there’s spoliation of evidence, and other considerations. Those are key points when you’re dealing with opposing counsel, to verify the completeness of their productions. Likewise, on third-party productions, many times our clients have agreements with their vendors, their consultants, their experts, and so forth. We want to know what they’ve produced that we knew about and what they produced that we didn’t know about. Using these tools, Brainspace in particular, makes for a much more informed review of all the documents in the matter: our documents, their documents, and third-party documents. Lisa?
Lisa: Yeah, I was just gonna add that we’ve got a lot of clients who just didn’t see any value in using any type of analytics on their own data set, usually because it was a small data set and they may have already literally just agreed to produce everything they had collected. Then they come back to us and say, okay, now we’re getting our opposing productions in and there’s a million, a million five, a ton of documents that they did not anticipate, and they certainly didn’t anticipate the cost of reviewing a million documents from the other side, especially when very few of them are probably going to be helpful. We’ve found using analytics on the opposing production to be just a vastly superior way of going through that information than anything else that we’ve ever come up with. Even if we’re talking about just email threading, just so that they can look at the most inclusive email, because they don’t need to look at the lower parts. You’re not coding for responsiveness or privilege; these are opposing productions. There’s absolutely no need to look at anything lower in the chain or higher in the chain other than that final email. And just like Barry said, there’s being able to remove documents that you’ve already seen in your set, even though they have different control numbers and such, so you can’t match on those. But that brings to light: okay, so you looked at 200,000 documents and opposing just gave you a million, and once we look at what’s unique, here’s 50,000 that you’ve never seen before and that don’t have anything to do with the documents you’ve already looked at. This is where you want to spend your time; don’t spend your time with the other 500,000 documents that are already sitting there.
Barry: And it’s a huge leg up when you have a data dump produced to your side. It just allows for zeroing in on those documents that you absolutely, positively need to look at.
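The production comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it matches documents by a hash of their normalized text, which is a simplified stand-in for the near-duplicate and similarity analysis the speakers describe (a real OCR'd production would need fuzzier matching). The function names and the control numbers in the example are invented.

```python
import hashlib
import re


def fingerprint(text):
    """Hash the text after lowercasing and collapsing punctuation and whitespace."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def compare_sets(ours, theirs):
    """Compare two {control_number: text} sets by text, ignoring control numbers.

    Returns (already_seen, new_to_us, ours_only) as sorted lists of control numbers.
    """
    our_prints = {fingerprint(t): doc_id for doc_id, t in ours.items()}
    their_prints = {fingerprint(t): doc_id for doc_id, t in theirs.items()}
    already_seen = sorted(their_prints[p] for p in their_prints if p in our_prints)
    new_to_us = sorted(their_prints[p] for p in their_prints if p not in our_prints)
    ours_only = sorted(our_prints[p] for p in our_prints if p not in their_prints)
    return already_seen, new_to_us, ours_only


ours = {"BIA-001": "Re: Q3 pricing.  Approved!", "BIA-002": "Board minutes, March"}
theirs = {"OPP-9001": "RE: Q3 Pricing. Approved!", "OPP-9002": "Supplier rebate schedule"}
already_seen, new_to_us, ours_only = compare_sets(ours, theirs)
```

The `new_to_us` list is where review time would go first, and `ours_only` is a starting point for asking whether the other side's production has gaps.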
Charles: Hey Lisa?
Charles: Quick question, and I ask this on behalf of the audience: do the reviewers themselves have to be experts in Brainspace or have some working knowledge of the tool? How many people on the review team need to have this expertise?
Lisa: Not at all. Brainspace is fairly easy to learn. But actually, for most of our clients, we just keep them in Relativity anyway. We’ve got Brainspace hooked up with our Relativity and everything kind of goes back and forth. Anything that we run in Brainspace is populated in Relativity. We have a few people that act more like an admin and go in and run certain searches and stuff, but to the extent that we need to batch out documents for 20 people or something, all that is done in Relativity. Everything looks exactly like it does in Relativity; there’s nothing new or fun to learn or anything. It literally is just reviewing through Relativity, but where normally in the relational pane you can bring up a family, now you can also bring up the email threads from Brainspace and the near dupes. At the same time, we use all of the same workflows that we have for productions and our production QC and everything else. We incorporate Brainspace into all of that, because all of that field data is already in Relativity, it stays there, and we can use it for the life of the case.
Barry: I’ll chime in on that too. There are a lot of types of users that can get into Brainspace. We’re asked that question a lot, like who actually uses Brainspace, meaning the title or role, and really there is no definition for that. You can have partners, analysts, paralegals, contract reviewers. It really depends on how much information you have, what you’re looking for, and the type of human being you are.
Lisa: Yeah I was gonna say the type of person you are.
Barry: Yeah, yeah. Some people like to get into Brainspace, some people don’t. So when you talk about a learning curve, it’s interesting: with a lot of the visuals, which I’ll show here in a minute, you can get in and find information really quickly. But if you’re looking to use primarily predictive analytics, the rank that we provide is automatically written back into Relativity, and the reviewer on the Relativity end who has just gotten a batch to review may have no idea that Brainspace was used on it. They just know that they’re getting a highly rich set of relevant documents in their batch.
Charles: Yeah, so with that I will take back over the slides. We want to talk about some of the visual analytics that Brainspace provides and sort of sum all of this up. We started with the foundational level of Brainspace and what the intent is, and then, of course, BIA’s uses, and now we want to show you some particular things. We’re called Brainspace for a reason. We create a brain when we pull in the text and metadata. We massage some of the text, we strip out some things that are unnecessary, and we take the rest and say, show me multiple concepts. We have a patent on multiple-concept extraction and also phrase detection. A document is not about one thing, and that one thing should not be one word. If I tell you a document is about “the agreement,” that doesn’t help me a whole lot. What is it really about? Give me full phrases as concepts that actually help me determine quickly the nature of that document. When we extract all of those things, we create this, as you can see on the slide, this sort of hypersphere of related concepts. We extract them from the documents and we relate them to each other. And when you go to a concept search in Brainspace, you’re actually accessing the brain. Whether you’re involved in patents or education or engineering or aerospace, searching for hot documents and using your keywords as concepts is what you’re doing in Brainspace. And the aim is that you may not know exactly what you need, right? You may know what you’re looking for, but you don’t know what that thing actually looks like, and you’re hoping that your keyword will get you there. You don’t want to have to take the iterative process of getting a population via keywords, then reading some documents and learning more about what is actually there, and some new key terms and maybe some hidden phrases pop up, and you go search for those documents. We want to keep you from doing that. So we try to surface those things at the very beginning.
That’s the foundation of what the brain is. It’s uncovering what is actually in the data versus what you think is in the data. And that tends to surface good information pretty quickly.
So a couple of other things about Brainspace. We do employ various open source technologies. You guys on the webinar are probably familiar with some of these, but then, of course, we put our secret sauce on top of those and create the brain and sort of accelerate everything from there. There is a particular slide that I like. From a very high level: I get the data into Relativity, I process it, I push a button, and the text and metadata go to Brainspace. What happens? First, and you don’t necessarily have to start here, we can get creative, but primarily people start with what they know: keywords, example documents, and so on. They put that in our visual analytics, find key information very quickly, and identify large amounts of non-responsive data so we can get that out of the way. We tag those documents and we feed that to CMML (Continuous Multi-Modal Learning); it’s our version of CAL, but that name was taken. And so we apply a ranking very quickly, it takes only a few seconds, to millions of documents. And I’ll let Lisa talk about this again, but I think a lot of the success has come from this particular workflow.
Lisa: Yeah, it has. This is exactly what we do. We have searches set up in Relativity for anything that’s .9 to .99, anything that’s .8 to .89, so that once we’ve rescored, we just grab those same searches that are already there, dynamically repopulating with the documents with new scores, and we batch it up that way. And this is how we try to consistently give the reviewers the most responsive, or most whatever-we’re-looking-for, documents at a time.
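The score-band workflow Lisa describes can be sketched roughly as follows. This is plain Python, not a Relativity saved search, and the function names, batch size, and document IDs are assumptions made for the example. Scores are rounded to two decimals before banding, so the top band covers .90 and up and the next covers .80 to .89.

```python
from collections import defaultdict


def band_of(score):
    """Map a 0-1 rank to a band index: 9 covers .90 and up, 8 covers .80-.89, and so on."""
    hundredths = int(round(score * 100))
    return min(hundredths // 10, 9)


def batches_by_band(scores, batch_size):
    """Group documents by score band, highest band first, then cut each band into batches."""
    bands = defaultdict(list)
    for doc_id, score in scores.items():
        bands[band_of(score)].append(doc_id)
    batches = []
    for band in sorted(bands, reverse=True):
        # Within a band, highest score first; ties broken by document ID.
        ids = sorted(bands[band], key=lambda d: (-scores[d], d))
        for start in range(0, len(ids), batch_size):
            batches.append((band, ids[start:start + batch_size]))
    return batches


scores = {"D1": 0.97, "D2": 0.91, "D3": 0.85, "D4": 0.42, "D5": 0.93}
batches = batches_by_band(scores, batch_size=2)
```

Re-running `batches_by_band` after each rescoring plays the role of the dynamically repopulating searches: the bands stay fixed while the documents inside them change.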
Charles: And there is this next slide, on what we call diverse active. Lisa mentioned that earlier. There are two primary reasons for diverse active. Let me just preface this: diverse active, on a high level, is meant to give you 200 to 250 documents that are the best possible representation of what some people call the null set, the untagged documents. And so we do that in three ways. First of all, we can do random. But random, although it’s great for statistical measurement purposes, is not necessarily all that great for seeing what is really out there, and diverse active does a very good job at that. So we look at it in three ways. Density is the first one. When you first put the data from Relativity into Brainspace, we do a lot of analysis. We know which documents will have an impact on predictions for other documents. If I am trying, from a density standpoint, to give you a good document, I want to give you one where, if you tag it, I can make more accurate decisions on a lot of other documents as opposed to only one or two other documents. That accelerates learning. Diversity says to do that, but look all around Brainspace. We want to be very diverse. We don’t want similar documents; random can give you fairly similar documents and we don’t want to do that. That even further accelerates learning. And then uncertainty says, you know, we’ve already had an initial predictive rank and we know which documents we haven’t really seen, so they’re sort of in the .5 range, .4, .6 maybe, and we want to give you those so that we can learn from them. When you combine all three of those along with the results of the brain and the other algorithms that we use, it’s a very complicated query, but it gets you 200 documents where, if you review those, the accuracy of your predictive rank will skyrocket, and we’ve seen that over and over. BIA uses it, many people use it.
And for that same reason, even if you’re not using CMML, let’s say at the end of the review you have a set where you’re sure no relevant information is out there. You can target that population for a diverse active round and it will give you a very good representation of what is actually in there, so you can feel more confident while you’re taking this sort of very low-effort, high-value approach to being sure. More sure than you currently are.
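To make the three ingredients above concrete, here is a toy Python sketch that combines density, diversity, and uncertainty in a greedy selection. It is not Brainspace's algorithm, which Charles describes only as a very complicated query; the equal weighting, the cosine-similarity measure, the tiny two-dimensional vectors, and the function names are all assumptions chosen to keep the example small.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_training_round(vectors, scores, k):
    """Greedily pick k untagged documents to review next.

    density:     average similarity to the rest of the set (tagging it informs many others)
    uncertainty: highest when the current predictive rank is near 0.5
    diversity:   low if the document resembles something already picked this round
    """
    ids = list(vectors)
    density = {
        i: sum(cosine(vectors[i], vectors[j]) for j in ids if j != i) / max(len(ids) - 1, 1)
        for i in ids
    }
    uncertainty = {i: 1.0 - 2.0 * abs(scores[i] - 0.5) for i in ids}
    chosen = []
    while len(chosen) < min(k, len(ids)):
        def value(i):
            nearest = max((cosine(vectors[i], vectors[c]) for c in chosen), default=0.0)
            return density[i] + uncertainty[i] + (1.0 - nearest)
        chosen.append(max((i for i in ids if i not in chosen), key=value))
    return chosen


vectors = {"a1": (1.0, 0.0), "a2": (1.0, 0.0), "b1": (0.0, 1.0)}
scores = {"a1": 0.5, "a2": 0.5, "b1": 0.6}
picked = pick_training_round(vectors, scores, k=2)
```

In the example, `a1` and `a2` are identical, so after `a1` is chosen the diversity term pushes the second pick to `b1` rather than the duplicate, which is the behavior a purely random draw would not guarantee.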
So let’s run through some of the Brainspace visual analytics and then we’ll open it up for questions. I’m cognizant of time here. So, the cluster wheel. We’re not going through a live demo today, by the way, just going through some slides here. Our cluster wheel is essentially a map of your data. This happens automatically, and we cluster based on similarity, not concept. I want to point that out, it’s not concept. Call it a similarity wheel if you want, but we call it the cluster wheel. You can see the blue clusters; those are roughly talking about the same thing from a high level. Yellow is talking about yellow and green is talking about green. And these are particular topics in your data, right? It’s the context of things being used, people speaking to each other about different things. And the further you go out, if you’ve seen a demo of Brainspace or if you’ve only used it for a little bit, you can click on these and they sort of drill out and get more distinct about what they’re talking about. So maybe you have a cluster around sports, and inside there you have a cluster of football and basketball. Within basketball you have NCAA, WNBA, and NBA, so you get more specific. A lot of people use the cluster wheel just to see a map of their data. And an interesting thing is you can also create a new wheel on a subset of data. So if I have my entire data set in Brainspace and I want to see a map of a particular person, I can create a new wheel on that person and get a feel, without having to read a bunch of documents, for what that person is involved in and what they are communicating about.
So next up is communication analysis. This is essentially a link analysis. We automatically thread the email and draw the visualization for you. So the first time you get into Brainspace, when the initial analysis is done, you’re looking at all of these things. That’s the goal of the schematic. And you can drill into particular persons and domains, and query using the visual. The circle, the big yellow circle, represents a particular person. I can click on that person and say only show me communications to and from them, and then maybe see where their documents lie on the cluster wheel if you want to use all of these technologies together.
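The "only show me communications to and from them" filter can be illustrated with a small Python sketch. The message structure (a sender plus a list of recipients) and the addresses are invented for the example, and the function is a simplification of link analysis rather than a description of how Brainspace builds its visualization.

```python
from collections import Counter


def communications_for(messages, person):
    """Count how many messages `person` exchanged with each counterpart.

    messages: iterable of (sender, recipients) pairs.
    """
    counterparts = Counter()
    for sender, recipients in messages:
        if sender == person:
            counterparts.update(r for r in recipients if r != person)
        elif person in recipients:
            counterparts[sender] += 1
    return counterparts


messages = [
    ("ann@a.example", ["bob@b.example", "cy@a.example"]),
    ("bob@b.example", ["ann@a.example"]),
    ("cy@a.example", ["dee@c.example"]),
]
links = communications_for(messages, "ann@a.example")
```

Each entry in `links` corresponds to one line drawn from the selected person's circle, with the count suggesting how heavy that line would be.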
Then the next one, the dashboard. The dashboard is more metadata-driven. Lisa was talking about exact duplicate detection and even timelines and other things. We visually represent that to you so that you don’t have to go build a query. That takes time and it’s not very efficient, and how many closed parentheses do I have? We want to stop that. So we give you visuals that you can click on, and it doesn’t feel like you’re searching anymore. It feels like you’re clicking on things that are important to you. If you look at the blue squares, the different shades of blue, we actually call that a term heat map. We take our concept search and make your search terms better, and then we display that over time. We can tell you in a very easy visual manner in what time period your relevant documents are most likely to be, and we highlight that for you.
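A term heat map of the kind described above boils down to counting term hits per time period. The following Python sketch assumes documents arrive as (ISO date, text) pairs and uses simple case-insensitive substring matching; the real dashboard draws on concept-search-improved terms, which this example does not attempt to reproduce.

```python
from collections import Counter


def term_heat_map(docs, terms):
    """Count, per term, how many documents in each month mention it.

    docs: iterable of (iso_date, text) pairs with dates like "2019-03-04".
    Returns {term: Counter({"YYYY-MM": document_count})}.
    """
    grid = {term: Counter() for term in terms}
    for iso_date, text in docs:
        month = iso_date[:7]
        lowered = text.lower()
        for term in terms:
            if term.lower() in lowered:
                grid[term][month] += 1
    return grid


docs = [
    ("2019-03-04", "Rebate schedule attached"),
    ("2019-03-18", "Updated rebate and pricing"),
    ("2019-07-02", "Pricing call notes"),
]
grid = term_heat_map(docs, ["rebate", "pricing"])
hottest_month = grid["rebate"].most_common(1)[0][0]
```

Each cell in `grid` would map to one square on the heat map, with higher counts shown as darker shades.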
Mark: Since we’re at the top of the hour and we’ve actually been responding to questions in real time, I think we can devote the next couple of minutes, even if we go over by a couple of minutes, to just hitting the highlights of our final slide. Then, like I said before, we’ll send the questions and answers to all the attendees. I just wanted to say that so you can at least have a couple of minutes of extension time here before the session ends.
Charles: Absolutely, thank you for the time check. Our supervised learning control set includes predictive coding and CMML. We talked a lot about that, so we represent that visually as well. There is also conversation analysis. This is a different look at communications. We can actually put text, social media chat, bank accounts, phone numbers, and so on all on the same graph. How people are communicating about the topic over time, you can get pretty deep with that. Our concept search I mentioned earlier: you put in what you know, we will tell you what is actually there. And typically people use that to make their search terms better and actually get to a better starting place even for CMML. If it’s junk in, junk out, let’s give it even better information upfront. Lisa mentioned that a lot of the results of Brainspace are written into Relativity, and email threads are written into Relativity as well, automatically. So it’s primarily used on the Relativity side, with some nuances, but there is a thread viewer within Brainspace as well. Multi-language support: as far as the creation of the brain and running algorithms, we can actually conduct the same type of analysis on, I believe, 28 major business languages around the world and over 300 languages in general. And the last point that I’ll get to is what we call portable learning. If you think about the flow that we just ran through, we use the visuals and concept search to find some good information right off the bat, right? And then we said, because we tagged that, Brainspace, go predict for me what else is out there so I don’t have to go look for it, right? Now, because we know what a particular type of fraud looks like, or harassment and so on, and we’ve encapsulated that within CMML, we can save that into what we call portable models. We have different libraries’ worth.
And if another case comes in, it doesn’t have to be the same custodian, it doesn’t have to be the same data; it has to have roughly the same general meaning of relevancy. We’re not taking the documents from one case to another; we take the attributes that made those documents relevant and apply them to another case. And that can be molded over time. So you would end up bringing in a case from Relativity to Brainspace and immediately applying predictive ranks and batching out, instead of even at that point having to go look. You may still apply search terms and such.
So with that the last slide I’ll actually skip over because it sort of sums up the same thing. That is Brainspace, and how BIA uses it.
Mark: Excellent Charles. Thank you so much. Barry, Lisa, same to you. We are four minutes over and there was a lot of information there. We welcome all your questions, and like I said we’ll do a follow-up blog responding to them all. We’re gonna sign off here and again say thanks to everybody who attended. Have a great day.