In the Data Breach Discovery: Regain Control and Save Your Reputation webinar, we quickly learn that the number of reported corporate data breaches continues to soar higher and higher every year.
The topic of data breaches is becoming more commonplace, as we hear and see the headlines for the big corporations – like Google, Facebook, and Quora – that are affected. But it impacts organizations of all sizes. It may surprise you, in fact, that the majority of data breaches affect smaller organizations.
Even worse news: The average cost of a data breach is $3.86 million.
In the Data Breach Discovery: Regain Control and Save Your Reputation webinar, you will learn about:
- Cyber risk and what to do if your organization is affected
- Definition and types of cyber threats
- How to respond to a data breach
- Requirements for reporting data breaches
- How to use eDiscovery methods and technologies to quantify the issue and notify affected individuals
- Data Breach Discovery™ workflow
In this webinar, you’ll also learn how to use Data Breach Discovery in an efficient, cost-effective and complete manner, so you can regain control of your data and notify those whose personally identifiable information (PII) may have been compromised.
Watch the Webinar
Webinar Q&A Session
We had a wonderful Q&A session at the end of the webinar, and we wanted to share our answers here as many of our online readers may have the same questions.
Is there a push in the US to merge the state regulations into a federal arena to limit the currently confusing regulatory environment, especially for businesses that work online and store their data in the cloud instead of physical data centers?
We think there is definitely momentum to standardize US privacy laws, whether at the federal level or by agreement among the states. GDPR might provide a roadmap for this. Note that privacy laws are generally dictated by where the individual resides, not where the data resides.
Can the existence and maintenance of a documented response plan mitigate breach liability? Are insurers requiring one?
A response plan likely will not mitigate breach liability, but it will almost certainly help contain direct costs and breach costs. Many insurers assist their clients with basic incident response plans.
According to a 2018 study by the Ponemon Institute, the #1 way to reduce the cost of a data breach is to have an incident response team, which is just one aspect of having a properly maintained incident response plan. Insurance companies are beginning to ask for a copy of an insured’s incident response plan during underwriting, but I have not heard of any insurance company refusing to issue a policy based solely on the lack of an incident response plan (likely because insureds can hastily prepare and submit a basic plan during the underwriting process just to “check the box” on that requirement). Please note that the mere existence of a plan does not necessarily mitigate any breach liability. Plenty of people draft a plan and then just leave it on the shelf without any further implementation or training. If employees are not trained on how to spot an issue or respond to an incident, or if there are no response team members identified and in place prior to a breach, then simply having a policy accomplishes very little towards mitigating breach liability.
Question for Chris Dix: When helping a business associate respond to a PHI-related data breach, do you typically review all of the related business associate agreements (BAAs) to ensure breach notifications are provided to each covered entity per the contractual requirements of each BAA?
In an ideal world, the BAAs are reviewed immediately after a breach, rather than during “data breach discovery.” A business associate should keep track of the notification requirements and other obligations under all of its BAAs – before a breach occurs. BAAs should be reviewed immediately after a breach, rather than later, because the timeframes for notification applicable under most BAAs are much shorter than the statutory time frames for notification generally applicable in the absence of a contract (e.g., hours rather than days). If the business associate has not reviewed its BAAs until after the “data breach discovery” effort is underway, then the contractual notification obligations under the BAAs are likely already past due at that point.
Have you seen any trends in the insurance industry as to efforts to establish standard forms and coverage policies? Given the associated costs and liability of a breach, it seems like the natural approach would be via & in cooperation with insurance companies that already insure existing businesses and firms.
There is a move towards standardization in the insurance industry, but it is still relatively early in the life cycle of cyber breach insurance. We think we will see significant changes to this business in the next 5-10 years.
Please discuss “dark and unstructured data” and how it’s important to know what you have and where it is. Also the simple best practice of checking your backups. Are you able to return to business from your backups? You’d be amazed at the number of companies that think they are okay because they back up everything only to find out they don’t know what the process would be to start up from the backup or how to get it in their system.
All company data that can be compromised is within the scope of notification requirements. If a bad actor can find and exploit it, then the company is responsible for it. Investing in regular data mapping, defensible data disposition and information governance efforts are all ways this can be addressed.
What are your thoughts about the security of third party vendors? It seems like more and more of the large breaches are coming via third parties. Why aren’t more organizations requiring their vendors to have stringent security in order to do business with them?
We ARE seeing a move toward security requirements and security audits for third-party vendors, and high-cost data breaches are one of the drivers of this change.
When you get to the capture / review phase – who actually does that work?
Data capture and review can be done by outside counsel (in the case of small matters) or by employees of the company. However, we believe the most efficient, cost-effective and quickest way to perform the review is by using contract staffing with appropriate guidance, oversight and quality control measures.
What is a good estimate of the time it takes from when you receive the data until you have a completed Notification List?
Data Breach Discovery projects can take from a week to two months, depending on the volume of data, clarity of mission and quality of execution. A mid-sized breach (less than 500GB), where the company, counsel and consultants are on the same page, is likely to take about six weeks from the receipt of data.
For the duration of a Data Breach Discovery project – who is in charge? Who is giving the orders and making the decisions?
Great question! Ultimately, the company must be the decision-maker (likely the general counsel), but hopefully with outside counsel and expert consultants providing meaningful input and support for those decisions.
Mark: Good afternoon, and welcome everybody, and thank you for attending today’s webinar on data breach discovery sponsored by our friends at ACEDS and hosted by BIA. My name is Mark MacDonald. I’m the senior vice president of business development here at BIA. Today we’re thrilled to have Ryan Bilbrey, disputes and investigations expert from Reckoning Consulting Partners, and Chris Dix, a founding and valued member of the ACEDS Jacksonville chapter and partner at Smith Hulsey & Busey. Ryan and Chris will be defining the issues and presenting best practices and solutions for how to handle a data breach. We’re also going to talk a little bit about the people, the processes, and the technology that are needed to help mitigate a breach. These days it’s well known that the top five issues for CIOs and CISOs include external threats, change in business infrastructure, data loss, and an ever-changing market. External threats are the number one concern for most organizations, and we know that organized crime is a driving factor in major attacks. The reason for this is purely financial. Attacks can be sophisticated or simple, but two things are for sure – you will be attacked, and nothing is secure. So with that, I’m going to turn the conversation over to Ryan and Chris. We look forward to a great discussion. Please forward your questions through the UI, and we’re going to reserve 10 or 15 minutes at the end. Any questions we can’t get to, we will answer on our blog, and we’ll send everybody attending today access to that as well as a copy of the presentation.
Chris: Before we get down into the weeds today and talk about these eDiscovery issues within the breach context, I want to provide a bit of a backdrop for the listeners on what exactly we are talking about when we talk about a data breach. The first slide that we have here talks a little bit about the unauthorized access, use, or retrieval of data by an individual, group, application, or service. That’s a baseline kind of definition. The reality for companies and people these days is that we live in a country with 50 states and a federal government, and all the states and the federal government have a variety of different agencies, all of which are interested in protecting their constituents. So there’s a wide variety of different laws and regulations that are increasingly defining the terms breach and data breach. One difference that we see is that some states, like Florida, consider unauthorized access to data to be a breach, whereas in some other states and jurisdictions, someone has to actually use or take the data, exfiltrate the data, in order for it to be considered a breach. And so you’ve got to understand the definition of a breach in the jurisdictions that are applicable to you in order to determine whether you’ve actually had a breach.
The other big issue I wanted to touch on before we move much further relates to ransomware. I think in the early days of hearing about ransomware, a lot of people considered ransomware not to be a breach. The reason was that in the early days you’d click on an email and open up a computer program that would begin to encrypt your data, you’d pay the ransom, which was a relatively small amount of money, and the program would decrypt your data. You’d be right back to hopefully being able to continue on. These days the ransomware tends to be the tail end of a much larger nefarious problem, kind of as Mark suggested just a minute ago. These days you’ll click on an email that has a PDF or a Word document which deploys a very small and innocent-looking program onto your computer or system. That program is then used to deploy a secondary set of programs and software which then takes a look at your network and either finds where the data might be stored, or collects information like usernames and passwords. Once the usernames and passwords have been collected, a third or even fourth wave of bad software programs is deployed on your system, and that is usually when the ransomware finally shows up, deployed in a strategic and intentional way by the bad actors. They are targeting particular companies, or people, or systems. By the time they finally get to triggering the ransomware attack and disabling your system, they know all about those systems. Many times, the ransomware is intended to cover their tracks or to divert attention away from the fact that they may have taken information from you. While you’re dealing with trying to retrieve your data, the bad guys are getting away, never to be found again. And so I just wanted to cover that at the beginning here because I think there’s a misperception that ransomware is not actually a breach.
The other little trick for people that are covered by HIPAA is that the HIPAA Security Rule requires covered entities and business associates to ensure the confidentiality, integrity, and availability of electronic protected health information. So if you have ransomware and your data is encrypted, even though it remains confidential and its integrity is intact, the availability isn’t there. You may have what is ultimately determined not to be a data breach because the protected health information didn’t get taken by the bad actors, but you may have a HIPAA breach or violation because the data of your patients was no longer available, or was unavailable during a critical time. With medical providers, say a hospital that needs access to people’s medical records, losing that access even for a short period of time could have drastic effects. And one last thing: increasingly, people are obtaining cyber insurance. A lot of companies these days don’t have that coverage. Many people are figuring that out and getting the coverage, and it’s important to think about – what does your insurance company consider to be a breach? Because if you don’t follow what is in that definition, or you don’t report an event to them that falls within that category, then even though you might have purchased insurance, you might not be able to take advantage of it. People don’t often think about insurance when they have a breach, but it’s something to consider.
Ryan: I think those are very good points, Chris. I’d like to echo and expand on a couple of them. One is that it’s not necessarily just an external party, and you kind of hit on this, but one of the big new things is insider threats. People inside a company who are stealing data, exfiltrating it in some way, are as much a risk as anything else. You’ve also got employee errors and negligence. This ties into the HIPAA concerns you mentioned: other things that people could do that could accidentally expose data. This happens on a pretty regular basis, and that is still considered a data breach and something to think about. Intentional, unintentional, inside, outside, it’s all over the place, and as you mentioned, the definitions are so broad-ranging state by state, jurisdiction by jurisdiction, that a lot of what we’re going to talk about today is a composite picture of what you do in a data breach, not necessarily any one particular case in any one jurisdiction but some general rules that bring a lot of this together. I guess, to sum up this slide before we move on, it says at the bottom of the slide that breaches are hugely impactful to companies. They cost money, reputation, and time. A data breach can lead to a loss of customer base, which can lead to a measurable loss in revenue. So it’s very significant: not just the direct costs of dealing with the data breach, but the downstream costs to your business as you move on from there.
So let’s move on to the next slide here, which is a little bit of discussion about the scope of data breaches. On the screen are some numbers from 2017: over 1,500 data breaches and 158 million social security numbers exposed. That was a big jump in the number of breaches from 2016. You also see a number there, the $3.86 million typical data breach cost. That is worldwide. The US is actually the most expensive place in the world to have a data breach and to deal with a data breach. The costs of dealing with that are much higher here. And it’s actually kind of interesting: the number of breaches went down slightly from 2017 to 2018, but the number of PII records that were exposed was up pretty dramatically, like 126%. Also, the cost of a breach is going up. I think there’s a lot more focus on it and more scrutiny about what you do post-breach.
Mark: Do you think that’s because hackers are getting smarter and more targeted in their attacks?
Ryan: I think that’s definitely the case. 2018 saw some very specific incidents. Marriott, and I am sorry if anybody out there is from Marriott, I’m not meaning to throw you under the bus, but the Marriott data breach in 2018 had 383 million people worldwide affected, with some serious information exposed. Facebook had a pretty significant breach, and Google had two different breaches, each of over 50 million accounts. So those numbers add up very quickly with these large-scale incidents. I think attackers definitely are focused, because everyone knows where the data is right now. Chris, what are you seeing in your practice?
Chris: I think it was said earlier, but we’re shifting from a mindset where we take measures to prevent an attack to one where we take measures to mitigate the attack, assuming it’s going to happen. The numbers are such that it’s irresponsible in some ways not to assume that you’re going to have a problem and to go ahead and take steps to plan for and deal with the remediation before it happens. And so I tell people regularly that I would rather talk to you before you have an incident than afterward, because we can plan some things and do some things and put some procedures in place so that if and when there is an incident, and there will be, we’re not scrambling to figure out what we’re going to do. Everyone has a plan, and you follow it. And that substantially reduces the impact and the cost. There are a lot of studies that have been done about the cost of a breach and how to mitigate that. And one of the big things, it seems obvious, is to have a plan and know what to do when a breach occurs. It’s even as simple as: do the people in your company understand what a breach is? If you were just to ask someone in your company – how would you know when there’s a data breach or a cyber-attack going on? Unless people have had training and prepared for that, you get a variety of answers. In some cases, you might have employees who are noticing things and not thinking that there’s a problem, or not reporting them, and ultimately that exacerbates the damage and the impact once someone finally figures out that you have a problem. That’s kind of what I’m seeing on the ground as things are happening, unfortunately.
Mark: It’s funny, some of what you just said rings true for eDiscovery. For the last 20 years, we’ve been talking about building a process and having a plan in place, and now here we are facing a new challenge – data breach discovery. The same rings true. Have a plan, have it in place, and be prepared for when it happens.
Ryan: Yeah, and we’re going to get to that a little bit later. I think that’s the core of what we’re talking about here, is that exact topic.
Chris: I wanted to add one thing on, I wanted to give a nod to all the information governance people if there are any that are listening here. If you’ve got a breach and you have been neglecting to get rid of all your data that you don’t need any more at your company, then that exacerbates the problem in terms of cost and in time. And so as best you can, there’s a good benefit to controlling your data so that you can reduce your eDiscovery cost by not having as much data that you need to collect, review, process, and produce. The same goes for data breaches. If you’re not having the large footprint, then it can substantially help you contain and deal with the problem once it happens.
Mark: Good points.
Ryan: So moving on because I think we’ve already had some discussion about some of the elements of what we have here on the screen, do you want to expand a little bit about some of the different types of private data that are out there, as well as revisit some of the disclosure requirements?
Chris: Our slide here talks about PII, PHI, FERPA, PCI, and GDPR. That’s a lot of different letters. The reality I see is that, unfortunately, companies don’t necessarily know what regulations they’re subject to before they have an incident. In an ideal world, you would know who you’re supposed to report to before you have an incident. You’d know what kind of regulations you’re subject to so you can plan for and address those things. PCI is a good example: a lot of companies don’t know or focus on the fact that when they take credit card payments, they’re governed by a set of rules that the payment card industry came up with. The consequences of not complying with those rules are that credit card companies can charge your business a lot more per transaction. In a lot of cases, depending on your margins on each sale that you make, it could be the difference in whether you’re making a profit or not. I know of small businesses that have gone out of business because they couldn’t afford to pay the higher payment card fees that were imposed on them after they didn’t comply with the rules. So I encourage anyone listening who doesn’t know what regulations they’re subject to: now is the time. Figure that out, and that way, when you do have an incident, you don’t have to figure this out on the fly as you’re trying to deal with the other issues that we talked about earlier, like the business interruptions and the potential reputational harm. You’d rather be focusing on getting things back up and running than trying to figure out who is going to be showing up on your doorstep to start asking a bunch of questions.
Another thing I wanted to mention before we move on is the regulators, and I think it’s fairly universal regardless of which jurisdiction you’re in. I see this regularly, where a company will say – well, I don’t know how many people were affected by this incident. And I think companies are hoping that the regulators will say that’s okay! How about your best guess? Or just round it down and assume it’s not that bad. No. The reality is, and I’ve seen it, that almost uniformly a regulator will say that if you can’t prove which data and which customers were affected by an incident, you have to assume that it’s everyone. That means you have to tell all those people. In many jurisdictions, 500 people is the threshold for whether you have to notify everybody. If that’s the deciding factor, assuming everyone was affected in most cases puts you over that 500 for sure. Whether it’s 501, or 5,000, or 5 million, it’s a number that you have to assume is as big as possible. And that kind of takes us into what we’re going to talk about for the balance of this presentation. If you’re in a situation where you have to assume that all of your customers or all of the data that you had might have been accessed, how do you go about identifying who those people are and how to contact them? And so with that, let’s move to the next slide on data breaches.
Ryan: Yeah, Chris. Just to follow up, one of the things that I’ve seen with that whole question of at what threshold do we notify: I find that clients have said, why don’t we just notify everyone? On the one hand, I’m sure that’s cheaper, but it comes with a couple of downsides. You are opening yourself wide up to downstream litigation. Also, invariably there is data that almost every company has in its systems that isn’t just their customers, that isn’t just their people. You get data from all kinds of strange places, and it would be rare to be able to say ‘just my customers.’ That is an answer, in my opinion, but it’s not necessarily a very good answer.
Chris: I’d like to ask back, well, what do you mean by everyone? It’s not just customers; how do you know who everyone is? And sometimes once you figure out who everyone is, maybe there’s a different opinion as to whether we have to contact all of those people.
Ryan: That’s exactly right. One other point here, and we’ll move on. We’ve talked about the different laws, the different jurisdictions, etc. Right here on the screen, we have just five of the different types of private data. Well, as it turns out, data breaches don’t stay in just one of those lanes. You don’t necessarily have a breach of just one type or another. It would be very rare not to have at least a couple of those types involved. I mean, even in a PII project, I’ve almost always seen PHI in there. We’ve been moving into a brand new world since last year with GDPR, so it’s a complicated, interwoven and interconnected web of problems that we’re trying to address.
With that, let’s move on to the response. The term data breach response is pretty commonly used and pretty commonly known, though maybe not precisely defined. We usually think about it in terms of the actual investigation. What happened? How did it happen? How are we going to fix it? What was exposed? It also gets into proactive planning and investigations. And then there’s this gap of what happens in the middle, and then you notify everybody. This has struck me for at least a couple of years now: it’s an area where, magically, you get all these names, and you know how to notify them. I’m sure almost everyone on this call, including you, including me, has gotten that letter from somebody that said apologies, but your data was exposed. How did they get to that point? That’s what we’re going to get to here in a couple of minutes. Almost every verified breach of any size is going to require some sort of notification, whether a notification to law enforcement, state authorities, a regulatory body, or the individuals. This is happening all day every day, and we will focus mostly on the US, but it’s in every state, it’s everywhere. This is a very big, broad, wide-ranging problem. Chris, we’re going to talk today about how we get that data and find those people, but do you work with the companies that actually do the incident response type things? We’ll touch on this and then move on, but do you have experience working with those kinds of companies?
Chris: Yeah, there are several. I won’t go into individual names, and I don’t want to promote one over the other. But there are a couple that dominate the industry. They do a very good job, but they’re very expensive. They come in with a team of people, the hourly rate for each member of their team typically exceeds my hourly rate, and they’re there for weeks. And so that’s an area where cyber insurance comes in. If you have insurance to pay for those kinds of efforts, that can be huge, that can save your company. You have to do those things and remediate them. If your insurance company is paying for it rather than the company having to pay for it, that can be the difference between surviving and not. The other thing that comes up as we’re in this magical area here is how quickly do we need to tell the people that we have to tell? The time frames are getting a lot shorter. HIPAA has a 60-day notification time frame. OCR has a list of all the breaches that have been reported to it. You can go on that list and sort it by your state, and you can search by name – we used to call it the wall of shame. The Florida data breach notification rules have a 30-day time frame. GDPR has, I believe, 72 hours before you have to tell regulators, maybe not everyone at the 72-hour mark, but you’re making disclosures 72 hours after the first time you realize you have an incident. At seventy-two hours, normally your hair is still on fire, and you’re running around trying to recover. It’s not the time you want to go and make disclosures to regulators, and then have them show up and start asking more questions while you’re still trying to get back on your feet and manage client relationships and restore order.
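The time frames Chris quotes can be turned into concrete calendar dates with simple date arithmetic. The following is a minimal sketch, not legal advice: the `NOTIFICATION_WINDOWS` table and the `notification_deadlines` function are hypothetical names introduced here, the windows are simply the figures mentioned in the discussion, and the event that starts each clock is assumed to be the moment the incident is discovered, which may not match how a given rule or contract defines it.

```python
from datetime import datetime, timedelta

# Windows as quoted in the discussion above (hypothetical lookup table).
# Contractual deadlines, such as those in BAAs, may be much shorter.
NOTIFICATION_WINDOWS = {
    "GDPR (regulator)": timedelta(hours=72),
    "Florida": timedelta(days=30),
    "HIPAA": timedelta(days=60),
}


def notification_deadlines(discovered_at):
    """Return (regime, deadline) pairs, earliest deadline first."""
    deadlines = [
        (regime, discovered_at + window)
        for regime, window in NOTIFICATION_WINDOWS.items()
    ]
    return sorted(deadlines, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Example discovery time chosen purely for illustration.
    for regime, deadline in notification_deadlines(datetime(2019, 3, 1, 9, 0)):
        print(f"{regime}: notify by {deadline:%Y-%m-%d %H:%M}")
```

Sorting by the earliest deadline puts the tightest obligation at the top of the list, which is the one a response plan has to be built around.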
Ryan: That’s right on the money.
Chris: The key I’ve found is that, with the variety of people and things we just talked about that have to get done, you have to have at least a plan and a list of people that are on your call list. Make it a list that isn’t stored only on a computer that you’re not going to be able to get to when you have a ransomware attack. If you print one thing, print your data breach response plan out and have it in a hard copy somewhere so that you know what to do when you can’t get to your computer. Have that team in place, figure out who those people are, and have that set up before you have an incident.
Ryan: Yeah, that response time has always been fast, it keeps getting faster, and it’s not going backward. If anything, it’s going to be moving towards tighter and tighter requirements on that time frame. So Chris, let’s move on. We’ve led up to this, so let’s just lay it out right here. What’s the challenge we’re facing here? We will talk about structured data, i.e. data in databases that are exposed. The real problem, as I see it, is the huge amount of loose email files that can often be exposed. If you have an email server that’s exposed, with attachments, it gets very big, very quickly. We’re stating this in general terms, and there are almost infinite variations on what the challenges can be, depending on location and whatnot. We have this huge amount of unstructured data, we have to quantify the issue for law enforcement and regulatory reporting, and then most likely we have to build a notification list for the affected individuals. It’s a huge thing. I have a stat up here, and this is just my back-of-the-envelope calculation. In my mind, 25 email boxes, standard email boxes that people use every day, could yield as many as one million documents. That sounds like a lot. When you’re sitting down at the computer to try and get through a million documents, that’s a daunting number, and a really expensive one.
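Ryan’s back-of-the-envelope figure works out to roughly 40,000 documents per mailbox (1,000,000 divided by 25), and it can be extended into a rough review-effort estimate. The sketch below is illustrative only: the function names are hypothetical, the per-mailbox figure is an upper bound derived from the stat above, and the review rate, team size and hours per day are placeholder assumptions that do not come from the webinar.

```python
def estimate_documents(mailboxes, docs_per_mailbox=40_000):
    """Upper-bound document count; 40,000 per mailbox follows from 1,000,000 / 25."""
    return mailboxes * docs_per_mailbox


def estimate_review_days(documents, docs_per_reviewer_hour, reviewers, hours_per_day=8):
    """Working days needed to review every document once.

    All three rate parameters are assumptions to be replaced with real
    project figures; they are not taken from the discussion.
    """
    reviewer_hours = documents / docs_per_reviewer_hour
    return reviewer_hours / (reviewers * hours_per_day)


if __name__ == "__main__":
    documents = estimate_documents(25)
    # Placeholder staffing: 20 reviewers at 50 documents per hour each.
    days = estimate_review_days(documents, docs_per_reviewer_hour=50, reviewers=20)
    print(f"{documents:,} documents, about {days:.0f} working days of review")
```

Numbers like these are why the processing, deduplication and analytics steps discussed in the workflow matter: every document culled before human review reduces the estimate directly.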
Chris: Some of these documents can have multiple people’s PII on them, so it’s not like a typical eDiscovery type of process. It’s very specialized and requires special processes.
Ryan: That’s right, and we’ll get into that in two more slides, we’re going to get to the actual workflow. But you know, there are some commonalities with the eDiscovery workflow, and we certainly leverage a lot of that, but it’s a very different process. If the company has never been through that, it’s a task that is terrifying. Just like Chris said, have a plan because the time to invent and write the plan is not when you have a breach and all the notification and everything else hanging over your head.
Mark: Like changing a tire at a hundred miles per hour.
Ryan: That’s exactly right. Chris, anything from you there?
Chris: I do think that, for the eDiscovery people that aren’t data security people, or vice versa, we’re at the part here where there is a decent amount of overlap. You have an incident that is causing you to have to look at all the data that you’ve got. Whether you’re trying to find names and contact information, or trying to find documents or things that are related to a lawsuit, it’s a similar exercise, just a specialized form of it when you’re dealing with data security. But the process, I think, is something that is easier for people that have done eDiscovery before to understand and do. It’s harder for the data security folks, who are used to putting the fire out and getting back to order, to understand what comes next. So if you’re an eDiscovery person, I think you’re in a good place to learn what we’re about to talk about.
Ryan: With that, let’s move on to the solution and start talking about how we actually do this. I think that’s what the meat of this is about. We call the solution data breach discovery. It’s part of a data breach, but it’s not the investigation or the notification, it’s the bridge process between them. So data breach discovery is a quick, efficient, cost-effective review of all the stuff that we’ve been talking about. How do we do this? What are the keys to success? I have some bullet points up on the slide. First, eDiscovery principles and tools. We use almost all of these. If we’re talking about unstructured data, we’re going to go through that eDiscovery data processing and deduplication process. We very likely will use keywords to a certain extent, and we’re almost certainly going to leverage a review platform. There are a lot of options out there. I think probably everybody on this call knows what the market leader is, and that’s probably what a lot of people are using. But all those eDiscovery tools are the foundation of all this. Next, advanced text analytics: things like named entity recognition, finding names, finding different pieces of information that might be a pointer to PII, using regular expression searching to look for things like social security numbers, and, to a certain extent, clustering. It does certainly work, especially if you’re in the early stage of the exploratory process. Next, streamlined workflows, and we’ve kind of talked about that; this goes with having a plan. Even just this part is incredibly complicated, and you need to have a plan going into it. What is step one? What is step two? What is step three? You need to be flexible with it, because if you’re too rigid, your system is going to break. You need to be flexible, but having a plan going in is so important. Then database analysis, where a few things come into play. One is that you might get a structured data source that was breached, and you might get a database.
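The regular expression searching Ryan mentions for social security numbers can be sketched as follows. This is a minimal illustration rather than the approach used by any particular review platform: `SSN_PATTERN` and `find_ssn_candidates` are hypothetical names, and the filter relies only on the publicly documented rule that SSNs are never issued with area 000, 666 or 900-999, group 00, or serial 0000. Hits are candidates for human review, not confirmed PII.

```python
import re

# Nine digits grouped 3-2-4, optionally separated by hyphens or spaces,
# and not embedded inside a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")


def find_ssn_candidates(text):
    """Return strings in `text` that look like plausible SSNs."""
    candidates = []
    for match in SSN_PATTERN.finditer(text):
        area, group, serial = match.groups()
        # Skip number ranges that are never issued as SSNs.
        if area in ("000", "666") or area.startswith("9"):
            continue
        if group == "00" or serial == "0000":
            continue
        candidates.append(match.group(0))
    return candidates


if __name__ == "__main__":
    sample = "Employee SSN: 123-45-6789, desk phone 904-555-0100."
    print(find_ssn_candidates(sample))
```

A pattern this loose will still flag unrelated nine-digit numbers such as account or order numbers, which is one reason pattern hits are usually combined with keywords, named entity recognition and reviewer quality control.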
In my mind that’s relatively easy to deal with because you already have the data, it’s searchable and queryable, and you can see it. You might also be able to get a list of names from your client, and we talked earlier about how it would be great to get a customer list, the employee list from HR, things like that. You can use that either as a base or as a QA/QC backstop, saying, hey, I found these people, but do they actually exist? Things like that. And then at the end, the deduplication process. You’re going to get a lot of duplicate individuals, and deduplicating them at the end is largely going to rely on the ability to do some pretty good database analysis. And then finally, Chris, if you would weigh in on this, the detailed understanding of privacy issues. We’ve talked a lot about it, but put a bow on that for us, would you?
Chris: Sure. I think that the key is that you have to understand enough about what you’re going to tell people before you tell them. I see this often. I think Equifax had a series of problems with this, where they’re saying things and telling people things, and it turns out to be different or only part of the full picture. I think you’ve got to go through and do all these things and come up with that deduplicated, good-quality final product before you can intelligently interact with the regulators, with customers, with people that are being notified, even shareholders and owners of the company. If you start saying things and then have to change your tune, especially with regulators, their impression of whether you’re doing or did a good job, or did what you could before the incident happened, goes negative, and you don’t want that to happen. So it is critical that all these steps take place and get done correctly so that you’ve got the best information that you can provide. It’s much the same in an eDiscovery case: if you produced a bunch of junk to the other side and hadn’t really done a good job going through it and eliminating things that weren’t relevant, or were duplicates, or were corrupted or password-protected, they’re going to start thinking that you didn’t do a good job, or didn’t know what you were doing, or were trying to hide something. With a data breach situation, you don’t want the regulators thinking that you’re trying to hide something.
Ryan: Good. I normally wouldn’t read a bullet, but I want to read this one because I think it’s important: critical success factor – company, counsel, and consultants working together as a team. This is difficult. I mean, it’s a high-stress situation. There are different priorities, even within the company. The CIO might have a different priority than the CEO, or the chief marketing officer may be thinking about messaging. So there are a lot of different things that are pulling people in different directions, but it’s really important to the overall success to get everybody going in the same direction and also to have someone in charge, someone that you can ultimately look at and say, we can debate this, and at the end of it someone is going to make a decision. If there’s one thing we’ve talked about over and over, it’s that there’s no single answer to this. There’s no bright and shining path that you can walk down without fail, and having someone that is going to ultimately make the call on how to address these issues is absolutely vital.
Chris: I completely agree with you on having someone in charge. I think part of what makes a good person in charge is making sure there’s communication between the different groups that are working together, so they understand how they fit within the bigger picture and what new things are happening. I think a lot of times, people are in the dark about what other people are doing, and that can lead to delays or confusion, or like I mentioned before, things not being accurate or incomplete. And that’s the worst kind of scenario. So sometimes having a PR firm or a company that’s experienced with messaging, even if it’s internal messaging depending on the scope of the project, can be a really helpful tool to make sure everybody is doing the right thing in the right direction at the same time.
Ryan: That’s a great point. So very briefly, look at this slide. This is how we conceptualize this process, the data breach discovery process, at the very highest level. It’s kind of a simple mnemonic – detect, assess, capture. Detect the breach, find the data, assess the data, capture the data. I don’t need to spend a whole lot of time on it, because we really want to spend the rest of our time here talking about this, which is our detailed workflow slide. There is a lot going on here, so let me set this up really quickly. The upper area in the dark green is the incident response. These are the firms we were talking about before that are doing the actual investigation into the incident and evaluating the results. They are the ones that are probably going to make the call on whether there was actually a breach. They may actually be doing the collection of those compromised data sources. As I mentioned before, you might have 25 mailboxes, a file server, maybe a database. They’re likely the ones that would collect that. There’s kind of a crossover point from that to the ‘data breach discovery’ part of it. It could be that you come in further downstream, but let’s just say it’s there. We really want to focus on this lower part in the light green. Within that lower part, the data breach discovery section, the upper path of the workflow is database sources. As I mentioned, I personally think that dealing with databases is a lot easier. I’ve come from a database background for a long time, so it’s easy to deal with. You can see it, search it, query it. There shouldn’t be anything that you can’t really find. So just real quick: you get the source, you assess that source, you normalize the data so you make sure you’re looking at the right things, and you extract key names and data from it. You have to make the determination at that point, of course – are we looking for PII? Are we looking for PHI? What exactly are the criteria?
I think it’s a fairly straightforward process. Sorry Chris, go ahead.
Chris: Just to add one thing. I think it’s certainly key that if you have a database, you have to have the people there that know how to operate it. I’ve seen it in bankruptcy contexts where the people are gone, and even though you have good data, you don’t have someone that understands how that data works or where it comes from, and what should be a relatively easy task becomes increasingly difficult. So you want to make sure you have the right personnel to deal with those databases. The other thing, and we see this in eDiscovery too, is that database files don’t make for a good paper exhibit. If you’ve ever tried to print and convert an Excel spreadsheet to paper, database files are like that on steroids. If you don’t think about what you really want at the end of the day and make sure you’re doing your database queries in a way that gets you what you want, you can end up with something that isn’t all that helpful. And even if you get what you want, I feel like that can be difficult to explain to people that aren’t like you or me and that don’t understand databases and don’t understand where the data came from. If you can’t show them a piece of paper that has what you’re looking for, then that causes confusion or concern. And so it’s very important to have the right people to help you deal with that data because it’s so powerful. It is easier to get what you’re looking for, but the devil is in the details. I just see it as: I have all the data, it’s just sitting there, and I’m having a hard time turning it into something that can be easily presented to a judge or a regulator to explain that situation.
Ryan: Yeah, I’ve sadly been through numerous projects in my past where somebody insisted that we print out a database. I feel like we may have destroyed the better part of the North American forest and definitely used the better part of an office to store all of it. So you are absolutely right. What’s the end game? What’s the end goal? Let’s do our analysis in that direction and certainly have the right technical resources and the right subject matter expertise coming together to do this process.
Chris: One other thing, Ryan. Sometimes a little bit of dialogue with the person that you’re giving the information to can be helpful in that regard. We’ve found that it helps to go to whoever we’re producing that information for and say, what are you looking for? What do you need? If you begin with the end in mind, you can go back and get what they’re looking for, and you don’t end up producing something that isn’t what was wanted. You don’t waste time getting information and wondering if it is helpful or potentially harmful, depending on what you have to produce.
Ryan: I think it’s kind of analogous to a meet and confer. You know, get on the same page and don’t waste a lot of time, because it is the client’s money that you’re wasting at the end of the day. The purpose isn’t to do this for no reason. The purpose is to realistically notify the individuals that had their data compromised, so keep that in mind and work toward that end. If you can do it cooperatively, I think that is certainly a positive. So let’s talk about the lower branch of the workflow, which is unstructured data. We’re talking about email and office files, and it’s broader than that as well. You could have PDFs, scanned PDFs or PDFs that were saved and are searchable. You can have pictures, you can have diagrams, and all manner of documents that can get emailed around, depending on who the client is. So you need to get an understanding of that, and it could really vary.
Step one is process and filter. We talked about this before; it’s really an eDiscovery principle. The whole concept is that you’ve got a funnel. You start with a bunch of data going into the top of the funnel, and you want the bottom of that funnel to be as small as possible, in a defensible and reasonable manner, because at the end of the day humans are going to be putting their eyes on this. Humans are going to be documenting what you find and typing in the names and the different PII elements or privacy elements. That is generally the most expensive part of this process. It’s the same thing that’s really always been the case in eDiscovery. Human beings sitting in front of a computer screen, reviewing document after document and typing in information, is by far the most expensive part of this process. So we do whatever we can to make that population smaller in a defensible manner, but also give them tools to do that in a good way. One of the things that we do is the assessment or the prospecting process – that middle chevron box on the lower branch there. We want to understand and organize our documents. What do I have? Do I have 500-page PDFs that are semi-searchable or not searchable? Do I have a large population of things that are not really relevant and couldn’t possibly work? Do I have a lot of other non-searchable documents like photographs? I’ve almost always seen passport or driver’s license photos show up pretty regularly, and it doesn’t matter what the document population is. Generally, they don’t OCR properly. You can’t tell anything, and you won’t get a keyword hit out of that. So it’s about understanding what those documents look like using analytics. We had the Legaltech show, Legalweek, here in New York, and everywhere you went, what were they talking about? AI and analytics still.
The predictive coding process, TAR, whatever you want to call it, doesn’t really work for this, because you’re not trying to classify documents but trying to identify specific data. Still, there are a lot of analytics elements that can be used at this phase. I talked earlier about regular expression searching, but one of the big things that is part of this process is understanding where the traps are. If you run a keyword search, what if you have handwritten documents that didn’t OCR well? What if you have multi-generation photocopies where it looks like it’s okay, but the quality or the results aren’t good enough? If you’re doing a regular expression search for a social security number, you have to be pretty precise. If the OCR interprets one of those characters as a dollar sign, suddenly you’re not going to find a hit. So it’s about designing different ways to get at that data. And there’s really a period of time, before you turn the documents over to the reviewers, where you have humans assessing them to understand, to cull, and most importantly, to organize them appropriately before you turn them over to those endpoint reviewers who are actually going to do that work.
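To illustrate the OCR trap Ryan describes, here is a minimal Python sketch, not the presenters’ actual tooling. It runs a strict social security number pattern alongside a looser one that tolerates a few look-alike characters, and marks the loose-only hits for human review. The character substitutions in `OCR_FIXES` and the function name `find_ssn_candidates` are hypothetical choices made for this example; a real matter would tune them to the document population.

```python
import re

# Strict pattern: 3 digits, 2 digits, 4 digits, with optional dash or space separators.
STRICT_SSN = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Hypothetical table of common OCR look-alikes (an assumption for illustration only).
OCR_FIXES = str.maketrans(
    {"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "$": "5", "B": "8"}
)

# Loose pattern: same 3-2-4 shape, but each position may be a digit or a look-alike.
LOOSE_CHAR = r"[0-9OolIS$B]"
LOOSE_SSN = re.compile(
    rf"(?<![0-9A-Za-z$]){LOOSE_CHAR}{{3}}[- ]?{LOOSE_CHAR}{{2}}[- ]?{LOOSE_CHAR}{{4}}"
    r"(?![0-9A-Za-z$])"
)


def find_ssn_candidates(text):
    """Return (raw_match, normalized_guess, needs_human_check) for each candidate."""
    results = []
    for match in LOOSE_SSN.finditer(text):
        raw = match.group(0)
        # Map look-alike characters to digits and drop separators.
        digits = re.sub(r"[- ]", "", raw.translate(OCR_FIXES))
        normalized = f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"
        # Anything the strict pattern would have missed goes to a human.
        needs_check = STRICT_SSN.fullmatch(raw) is None
        results.append((raw, normalized, needs_check))
    return results


if __name__ == "__main__":
    sample = "SSN 123-45-6789 and 12$-4S-67B9"
    for hit in find_ssn_candidates(sample):
        print(hit)
```

The normalized value is only a guess, which is why loose hits are flagged rather than trusted. This kind of search widens the net; it does not replace the human assessment step.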
Chris: Ryan, can I make a point right there?
Ryan: You may.
Chris: I sometimes think that it’s so obvious that we don’t think about it, but you just described how you’re going to have a team of people looking through different documents. In the normal eDiscovery manner, you might have something that is sensitive and some that is not as sensitive, but in this case, we’re dealing with something that is entirely sensitive. Every single document, ideally that you are looking through, has something that is confidential in one or more ways. So it’s critical that the people that are going to be doing these reviews not only sign an agreement and agree that they’re not going to take what they look at and learn and do something with it, but also you make that clear upfront. You’re going to be looking at important and sensitive data here, so setting that tone and that expectation, and then in some ways, you have to monitor people as well to make sure that even the reviewers are being careful. You don’t want to have a breach while you’re trying to mitigate and figure out who you have to notify about the breach. I just mentioned that because I think it gets assumed or dealt with in a cursory kind of way, and I really think it helps to set that expectation for all the team members upfront that we’re really trying to protect things here and not make it worse. If you’re not committed to that, we don’t want you on our team.
The other thing I wanted to mention: you talked a little bit, Ryan, about photos. You’re right. Photos like passports and driver’s licenses, but also in the medical industry, there are a lot of pictures that have health information in them. It can be difficult to try and extract data from a photograph, or even more difficult from a video. Sometimes there’s text on those records and also a picture, so there’s a combination of different pieces of information. You really have to understand different data types and maybe deal with them in different ways. A couple of other problem areas are password-protected files and encrypted files. You should be getting some sort of error report that identifies the files you couldn’t get to. If they were password-protected and encrypted, maybe that means they weren’t breached, right, and you can rule those out. But you have to know that that happened or exists. The other thing is non-English documents. If you’re searching only in the English language and you get records in Spanish or any other language, you could be overlooking what you’re looking for and only getting part of the population. I know there are great tools that do all of that, that kind of normalize the database and the language that you’re using to search. In some cases, if you realize you have documents in a different language, getting a review team that speaks that language is a better way to have a more accurate understanding of the data you’re looking at.
Ryan: Yeah, definitely, that is the case. I think to sum all of that up, and the bottom line is that keywords alone are not the answer to this problem. They’re a partial answer and guideline, but there is a lot more to this than just that.
Mark: Absolutely, it’s a multi-pronged approach. The one point I was going to make about Chris’ comment on password-protected files is that, if you’re a bad actor, a password-protected file is a flag that the Holy Grail could be inside there. A computer forensics company like BIA can crack passwords, and it’s not something that is uncommon in our industry, so don’t think that because it’s a password-protected file, you can put it aside. This is also a good opportunity for the lawyers and corporations on this webinar to understand that there is an opportunity to consult with your custodians and employees. I’ve been at BIA for 12 years now, and I’m one of those people that have personal things in my corporate emails. Purchases of land, apartments, back-and-forth with doctors about medical matters. I’m one of those people where, if my work email should ever get compromised, there’s a treasure trove there. That’s not an open invitation for any bad actors to hack my email! But more and more, especially if you have longevity at a company, our emails are not just corporate. It’s not just personal stuff; it’s not just corporate stuff; it’s all part of the puzzle. Ryan drew a bullseye and put me right in the middle of it, thanks for that. Mr. Snowden, if you’re out there, please have mercy.
Ryan: It’s an excellent point. The stuff you find in your own and other people’s mailboxes, just read the news, it’s amazing what you find in email. Okay. So let’s talk about capture. Chris, you made a great point a minute ago. Everyone that works on these things has to be part of that team, with the understanding that this is very sensitive information. Hopefully, you can trust your team. A lot of the time, though, it’s contract reviewers that may be doing this, so you’re absolutely right, I think there’s probably an equal or greater degree of sensitivity in this kind of matter. In the kind of things I’ve seen, people are not allowed to have cell phones at their workstation, USB is disabled, and all these kinds of things are done to keep things private. So when they’re doing that, we’re in the capture phase, and we’re actually getting the data now. We want to focus on quality. This is a fairly mundane task, people having to go through these documents and type in the information. This is different from eDiscovery review, as people actually have to capture names, social security numbers, and all kinds of things. One letter typed wrong leads to a problem downstream. Fat fingering or just a lack of attention. So focus on quality. Reviewer metrics are important. If people are going too slow, they’re digging in too much. If they’re going too fast, they’re not focusing well enough, or maybe they’re just a super reviewer and you want to clone them. All these are things to pay attention to.
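As a rough illustration of the reviewer metrics point, the following Python sketch flags reviewers whose pace is far from the team median. The thresholds (half and one and a half times the median) and the function name `flag_reviewer_pace` are assumptions made for this example, not figures from the webinar; a flag is only a prompt to look closer, since a fast reviewer may simply be very good.

```python
from statistics import median


def flag_reviewer_pace(docs_per_hour, low=0.5, high=1.5):
    """Flag reviewers whose documents-per-hour rate is far from the team median.

    docs_per_hour maps a reviewer name to that reviewer's rate.
    low and high are illustrative multipliers of the median, not recommendations.
    """
    team_median = median(docs_per_hour.values())
    flags = {}
    for reviewer, rate in docs_per_hour.items():
        if rate < low * team_median:
            flags[reviewer] = "slow"  # possibly digging in too much
        elif rate > high * team_median:
            flags[reviewer] = "fast"  # possibly skimming, or a super reviewer
    return flags


if __name__ == "__main__":
    rates = {"reviewer_a": 40, "reviewer_b": 42, "reviewer_c": 38,
             "reviewer_d": 15, "reviewer_e": 90}
    print(flag_reviewer_pace(rates))
```

Pace alone says nothing about accuracy, so in practice it would sit alongside a quality check such as second-pass sampling of captured fields.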
Also, we talked about the end goal before. It’s really imperative to work with the company, and especially with counsel, to understand what data is important on this particular matter. What are we going to capture? How are we going to capture it? If you jump into a matter and just go ‘oh I’m going to create these fields that we’re going to capture’ and counsel isn’t part of that, you may realize 50 thousand documents in that you missed key information and need to double back and rework and you probably don’t have the time or the budget to do that. So these are all things that are really important. I think it was mentioned earlier that one of the big challenges here is that you might have multiple individuals on a document listing people with social security numbers, very common in emails or in a document, and planning on how to get those.
So the last thing, deduplication. No matter what you do, and I’m going to go out on a limb here, I would say that it is impossible to get through this process and get a unique list at the end of the data capture. There’s going to be a deduplication process. Data comes from different points – different documents have, you know, John Smith, and John Smith with a social security number, and John Smith with a username and password, and John Smith with an account number. How do you pull all those together? It’s a complicated process. You need thinkers, you need people that have database skills, and you need to work with the client and understand the plan going in and what you’re going to do. At the end, hopefully, you have a unique list of affected individuals, and I should probably put quotes around unique, that goes to company and counsel, and at that point, it’s up to them to do the actual notification. Wash your hands, and your job is done.
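The John Smith problem can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the workflow the presenters use: it groups captured records by a normalized name, merges whatever identifiers were captured, and flags any group with conflicting values for human review. Matching on name alone is an assumption made for brevity and would over-merge in practice, which is exactly why the conflict flag exists. The field names `ssn`, `account`, and `username` are hypothetical.

```python
from collections import defaultdict


def normalize_name(name):
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def merge_captured_records(records):
    """Merge records by normalized name and flag conflicting identifiers.

    Each record is a dict with a "name" key plus any captured fields.
    """
    grouped = defaultdict(lambda: defaultdict(set))
    for record in records:
        key = normalize_name(record["name"])
        for field, value in record.items():
            if field != "name" and value:
                grouped[key][field].add(value)

    merged = []
    for key, fields in grouped.items():
        # Two different values for one field may mean two different people.
        conflict = any(len(values) > 1 for values in fields.values())
        merged.append({
            "name": key,
            "fields": {field: sorted(values) for field, values in fields.items()},
            "needs_review": conflict,
        })
    return merged


if __name__ == "__main__":
    captured = [
        {"name": "John Smith", "ssn": "123-45-6789"},
        {"name": "john  smith", "account": "A-100"},
        {"name": "John Smith", "ssn": "987-65-4321"},
        {"name": "Jane Doe", "username": "jdoe"},
    ]
    for person in merge_captured_records(captured):
        print(person)
```

In the sample data, the two different social security numbers under one name are flagged rather than silently merged, leaving the call to the people with database skills and knowledge of the client’s data.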
Mark: So look, we’ve got a few minutes left here and so much information. Both you and Chris, thank you so much. A couple of questions that I think are relevant and important and I think will be beneficial for everybody. First one is, what happens if you do this wrong?
Chris: Well, eventually, the people that you’ve produced the information to will figure it out. And so it’s important to make sure you get it right, but also, if there are things you aren’t sure of, you should make that clear in your representation, whether it’s to customers or regulators. I always stress to clients that, obvious as it seems, we need to be accurate with what we’re saying. If we can’t be accurate, we should explain what it is that we don’t know. I think that will go a long way. If you do have something that turns out to be totally different or wrong, and you weren’t so adamant about being 100% right and knowing everything, then it gives you a little bit of grace if you have to go back and correct or adjust what you said before. Does that answer your question?
Mark: It does, yeah. We have a couple minutes left here. Like I said, thank you, everybody. We’re going to do a wrap up with some final thoughts. There were a lot of questions that were submitted, but unfortunately, we couldn’t get to most of them. We will reply to those and post them on the BIA blog and make sure everybody has a copy of those. Ryan, any final thoughts before we conclude here?
Ryan: Yeah. Just really quick, got three points just to reiterate. Teamwork and cooperation are vital. Communication between and amongst the team and clarity in that communication is absolutely mandatory. And then having a planned workflow but having some flexibility in that workflow will predispose you towards success.
Mark: Excellent. Chris, any final thoughts on your end?
Chris: Yeah. I think we used to say an ounce of prevention is worth a pound of cure. I think we’re past the point where we can prevent a breach from happening, it’s inevitable. I like to say now that an ounce of preparation is worth a pound of cure. The more you can prepare and understand what you’ve got and what you’re dealing with before it happens, and even simulate that through tabletop exercises or other things, then the better you can be. The more prepared, the easier it is to deal with when the data breach occurs.
Mark: Excellent. So we’re going to end our data breach webinar on time. I would like to thank ACEDS and every single one of you for attending and being part of this today. Thanks to Ryan Bilbrey, thank you to Chris Dix. And we will see you, hopefully, not when you have your next data breach, but certainly on our next webinar. Thanks for attending everybody, have a great day.