6 Data Collection Inefficiencies That Waste Time & Money

To Fix Your Inefficient Data Collection Process, Stop Doing These 6 Things.

Inefficient data collection practices can slow your progress and send your eDiscovery costs soaring. Even worse—if data is not collected in a defensible manner, you may have to repeat the entire process or face sanctions if the original data is no longer retrievable with its original metadata intact.

So, how do you defensibly and efficiently collect all potentially responsive data, protect yourself against wasted costs or sanctions, and not break the bank? Start by avoiding these 6 data collection inefficiencies:

1. Stop over-collecting data.

Over-collection is a big inefficiency trap. Not all data you preserve gets collected—nor should it. As discussed in our Inefficient Data Preservation blog, we recommend casting a wide net and preserving any data that is potentially relevant or responsive to a given legal matter. But just because you take steps to preserve everything doesn’t necessarily mean you will collect all that data on hold.

Once you’ve acted to preserve data (by issuing legal holds, turning on automated preservation functions, or disabling system auto-deletion options), try to focus collections on a subset of that data. Here’s where an eDiscovery expert and a truly effective Meet & Confer process really come into play; both will help you narrow down what you truly need to collect for your case. Depending on the demands of your case, you might end up collecting some, all, or none of the data you initially preserved.

Once you’ve more narrowly defined the data to collect, use targeted data collection methods that let you filter your collection efforts by the sources, data types, primary custodians, date ranges and other criteria that you’ve worked out during those discussions and negotiations. That will drastically reduce the volume of data actually collected (and thereby the costs) throughout your litigation. Some consider those initial conferences onerous, but quite the opposite is true. Used correctly, those negotiations will help you control the entire eDiscovery budget from the very outset.
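To make the filtering idea concrete, here is a minimal Python sketch of a targeted collection. Everything in it is hypothetical and for illustration only: the `Item` and `CollectionScope` names, the sample custodians, and the date range are assumptions, not part of any particular collection tool. It simply shows how a negotiated scope can be applied to a larger preserved set.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """One preserved item (hypothetical record layout)."""
    custodian: str
    source: str      # e.g. "email", "onedrive"
    file_type: str   # e.g. "msg", "docx"
    modified: date


@dataclass(frozen=True)
class CollectionScope:
    """Filters agreed on during Meet & Confer (hypothetical)."""
    custodians: frozenset
    sources: frozenset
    file_types: frozenset
    start: date
    end: date

    def includes(self, item: Item) -> bool:
        # An item is in scope only if it passes every filter.
        return (
            item.custodian in self.custodians
            and item.source in self.sources
            and item.file_type in self.file_types
            and self.start <= item.modified <= self.end
        )


def targeted_collection(preserved, scope):
    """Return only the preserved items that fall inside the scope."""
    return [item for item in preserved if scope.includes(item)]


# Everything below stays preserved; only the in-scope subset is collected.
preserved = [
    Item("alice", "email", "msg", date(2021, 3, 1)),
    Item("alice", "onedrive", "docx", date(2019, 6, 15)),
    Item("bob", "email", "msg", date(2021, 4, 10)),
    Item("carol", "slack", "json", date(2021, 5, 2)),
]
scope = CollectionScope(
    custodians=frozenset({"alice", "bob"}),
    sources=frozenset({"email", "onedrive"}),
    file_types=frozenset({"msg", "docx"}),
    start=date(2021, 1, 1),
    end=date(2021, 12, 31),
)
collected = targeted_collection(preserved, scope)
print(len(collected), "of", len(preserved), "preserved items collected")
```

Because the preserved set is left untouched, widening the scope later is just a matter of changing the filter values and running the collection again.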


If opposing counsel tries to talk you out of narrowing your collection by warning about potentially missing data, don’t fall for it! Your preservation steps will remain in place regardless. If the data collection needs to change as the legal matter develops, you can easily go back and collect the data you need using those targeted data collection techniques. It’s a delicate balance, but as long as you don’t remove the preservation measures, and you’re careful to collect any data sources that might be fragile, targeted data collections are the single most effective way to limit your overall costs, timelines and efforts from the very outset.

2. Stop collecting data based on search terms.

While narrowing your data collections, start by focusing on primary custodians, data types, primary sources and date ranges. These are good methods to reduce the data you collect from the outset.  It’s not a great idea to collect based on search terms – especially not very complex search terms. There are several reasons for that recommendation.

First, many enterprise systems simply don’t handle search at the point of collection well (for a good discussion of this issue, see Greg Buckles’ write-up of M365 search issues: M365 eDiscovery Search Alert – eDiscovery Journal). Sure, some enterprise systems are better than others, but unless you’re incredibly certain – and willing to put your IT team on the stand to back that up – it’s best not to rely on those systems. (Better to wait until their efficacy is better documented and more widely accepted.)

Second, and more importantly, if you use search terms as a basis for collection, you will have to repeat that process and re-collect data each time search terms change (and you know they will). That quickly becomes inefficient, expensive, and burdensome to your IT teams. So, we recommend not using search terms at the point of collection (or, if you do, using only very broad terms). Once the collected data is ingested, you can always run more complex searches and work with opposing counsel to tweak search terms based on actual search results. It’s much more effective and efficient (both in terms of timing and costs) to do that more granular searching and culling after collection.
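As a rough illustration of why searching after collection is cheaper, the Python sketch below runs two rounds of search terms against the same already-collected set. The document IDs, text, and the `search` helper are all made up for this example; a real review platform would use its own indexing and search syntax.

```python
def search(collected, terms):
    """Return the IDs of collected documents containing any of the terms.

    A simple case-insensitive substring match, used here only to
    illustrate the workflow (hypothetical helper).
    """
    lowered = [term.lower() for term in terms]
    return sorted(
        doc_id
        for doc_id, text in collected.items()
        if any(term in text.lower() for term in lowered)
    )


# Collected once, using broad filters rather than search terms.
collected = {
    "DOC-001": "Quarterly pricing update for the Acme contract",
    "DOC-002": "Lunch order for Friday",
    "DOC-003": "Revised rebate schedule sent to Acme",
}

# First round of negotiated terms.
first_pass = search(collected, ["pricing"])

# The terms change after reviewing results; no re-collection is needed.
second_pass = search(collected, ["pricing", "rebate"])

print(first_pass)
print(second_pass)
```

When the terms change, only the search is repeated. The collection itself, and the burden on the IT team, stays the same.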

3. Stop relying on self-collections.

To re-phrase a common refrain… JUST DON’T DO IT.  Of the many reasons for this, here are our top three:

  • Self-collections will never have the defensibility, let alone the reliability or accountability that is essential in eDiscovery and for ultimate admissibility of any of that data. With custodian self-collections, you will almost NEVER know where the data came from – which computer, server, application, folder structure, etc.
  • Nearly all courts view custodian self-collection very poorly. Courts actively discourage such efforts with a primary assumption that custodians will not collect any data that reflects badly on themselves, others, or their organization.
  • You can be nearly certain that the metadata will not be preserved. This is simply because custodians don’t know how to find and collect data properly. When they simply drag and drop files somewhere, critical metadata will be destroyed during that self-collection. Intentional or not, this taints your entire collection process. Identifying/documenting all data collection sources and properly preserving all metadata are critical aspects of any defensible data collection process. Lacking these steps you may find it impossible to lay a foundation for the admissibility of that evidence.
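The metadata point is easy to demonstrate. The Python sketch below is only an analogy for drag-and-drop behavior, not a forensic collection method: it backdates a temporary file, then copies it twice using the standard library. `shutil.copy` does not carry over the last-modified time, while `shutil.copy2` attempts to preserve it. The file names and the one-year offset are arbitrary choices for the example.

```python
import os
import shutil
import tempfile
import time
from pathlib import Path

# Create a sample file and backdate it by roughly one year.
workdir = Path(tempfile.mkdtemp())
original = workdir / "memo.txt"
original.write_text("draft memo")
one_year_ago = time.time() - 365 * 24 * 3600
os.utime(original, (one_year_ago, one_year_ago))

# A naive copy: content is copied, but the modified time is reset to now.
naive = workdir / "naive_copy.txt"
shutil.copy(original, naive)

# A metadata-aware copy: timestamps are carried over where possible.
careful = workdir / "careful_copy.txt"
shutil.copy2(original, careful)

orig_mtime = original.stat().st_mtime
naive_drift = abs(naive.stat().st_mtime - orig_mtime)
careful_drift = abs(careful.stat().st_mtime - orig_mtime)

print(f"naive copy modified-time drift:   {naive_drift:,.0f} seconds")
print(f"careful copy modified-time drift: {careful_drift:,.0f} seconds")
```

Even `shutil.copy2` does not capture everything (creation times, ownership, and access control lists may still be lost on some platforms), which is one reason purpose-built, forensically sound collection tools are used instead of ad hoc copying.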

The bottom line: Ask custodians about data as much as you can (the custodian questionnaire, or CQ, is your best friend), but don’t let people collect their own data. Asking custodians to go it alone is a losing strategy no matter how it goes.


A tool like our DiscoveryBOT solution allows for the involvement of custodians in the process, but it is designed to be fully defensible and to collect certain data regardless of the custodian’s actions.  In other words, there’s nothing wrong with using proper data collection tools to get input from the custodian – as long as those tools don’t rely solely on the custodian to identify and collect data in a forensically sound manner.

4. Stop collecting data without leveraging Custodian Questionnaires.

A well-formed custodian questionnaire (CQ) goes a long way toward keeping your data collection process defensible and effective. First, a good questionnaire helps you target important sections of resources you’ve already identified. For example, you’ll learn from general discussions with the IT team which data storage locations (be that servers or cloud storage solutions) the identified custodians access and routinely use. What the CQs reveal are which sections (folders, groups, etc.) of those storage assets are relevant to the legal matter. Having that information enables you to target data collection efforts on those larger resources to just the most relevant portions. You’d be surprised how much that can reduce your overall data collection sizes.

Don’t assume IT knows where all the potentially responsive or relevant data is stored anymore – those days are over. Maybe a certain team uses Slack or some other online or cloud-based system unbeknownst to anyone outside the team, including IT. Unless you talk to the custodians themselves, you may never uncover clearly relevant and critical data sources. This is even more the case in the COVID era, where many more digital chats and data sources exist in work-from-home (WFH) and other remote work situations. The CQs are about making sure you don’t miss any potential data sources, and their effectiveness in doing so should not be underestimated.

In a perfect world you will have gathered intel from custodian interviews and questionnaires BEFORE any data collection efforts really begin. However, sometimes your case timeline will necessitate collecting data before you get the CQs back. (Keep in mind that it takes people time to complete CQs, and you can’t always sit around and wait.)

Even before CQs are completed, you can start collecting data from clearly relevant systems and resources like email. While you’re collecting and processing email, you can use and re-use the CQs to help define and narrow data collections from all other resources.

5. Stop doing physical, onsite data collections.

In today’s modern, often cloud-based computing environments, fewer and fewer situations require any data collections onsite. Even when our experts collect from physical laptops, desktops, and servers, they often use remote data collection tools that complete those tasks in a fraction of the time – and at a fraction of the cost – required by the old-school, boots-on-the-ground methods. Many of those solutions operate fully remotely and securely transfer all collected data in a fully encrypted manner over the internet. (No need to ship hard drives anymore.) If you’re still doing collections on-site, or sending out hard drive-based collection kits through the mail, this data collection inefficiency is costing you valuable time and money.

In fact, the best way to collect email is not from a custodian’s individual computer, but rather from a cloud resource or centralized server. Collecting from the source – like M365 (a.k.a. O365) – is the more definitive and complete method, as many such systems only store a portion of a custodian’s email on the local machine. A custodian’s device might keep only the past year’s email, whereas the cloud or server hosts the entire email account.

The same is true for organizations that use OneDrive (or similar services), as all the data is synchronized and stored in easily accessible cloud resources. Sometimes this entirely eliminates the need to collect from individual custodians’ computers.

Truly remote collections are fully defensible and much more efficient than onsite collections or even putting hard drives in the mail. Leveraging these modern, forensically sound data collection techniques and tools does more than save time and money. It minimizes business disruptions to the organization as a whole and to custodians’ individual daily routines. 


Don’t forget to ask custodians if they have email archives or other data on their computers that are not synchronized or part of the cloud systems. Such practices are becoming less and less common as businesses move to O365 and other centralized, cloud-based solutions, but it’s still an important area to consider. Even if custodians do have local data that needs to be collected, that data can still be collected in a fully remote and defensible manner.

6. Stop doing full forensic imaging!

Full forensic imaging is nearly always overkill. These days it is much more efficient to collect targeted data. Full forensic imaging is the best way to vastly – and we mean vastly – increase costs. You only need full drive imaging if there are concerns about data deletion, criminal acts, and the like. Otherwise, a full forensic image only creates far larger amounts of data – data that you will have to pay someone to store, process and cull. If you are still doing full forensic imaging for routine legal matters, unfortunately you are checking all the data collection inefficiency boxes. You’re wasting incredible amounts of time and money while creating substantial (and entirely avoidable) business and custodian disruptions, all for nothing.

Defensible Data Collections

The most important element of any data collection effort is defensibility. If opposing parties or their eDiscovery experts can poke holes in your data collection methods, then your evidence – and thereby your case – can quickly crumble.

Think of your favorite detectives and forensics experts on CSI, SVU, Dexter or The Wire, with their gloves and tweezers and measuring tapes. Their job is to handle physical evidence carefully so that it doesn’t get changed or tampered with. The same care is required for digital evidence. You must be able to prove that you’ve maintained evidentiary integrity from the initial touch to final production. Otherwise very bad things (very expensive things) can happen.
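One common way to show that digital evidence has not changed is to record a cryptographic hash of each file at collection time and re-check it later. The Python sketch below illustrates that idea under simple assumptions: the folder, file names, and the `build_manifest` and `changed_files` helpers are hypothetical, and a real chain-of-custody process would also record who collected what, when, and from where.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file name in the folder to its SHA-256 hash."""
    return {
        path.name: sha256_of(path)
        for path in sorted(Path(folder).iterdir())
        if path.is_file()
    }


def changed_files(folder, manifest):
    """List files whose current hash differs from the recorded one."""
    current = build_manifest(folder)
    return sorted(
        name for name in manifest if current.get(name) != manifest[name]
    )


# Record hashes at the point of collection (sample files for illustration).
evidence = Path(tempfile.mkdtemp())
(evidence / "email_export.txt").write_text("original message body")
(evidence / "contract.txt").write_text("signed contract text")
manifest = build_manifest(evidence)

# Later check: nothing has changed.
unchanged_check = changed_files(evidence, manifest)

# Simulate an alteration, then check again.
(evidence / "contract.txt").write_text("altered contract text")
tampered_check = changed_files(evidence, manifest)

print(unchanged_check)
print(tampered_check)
```

If the hashes recorded at collection still match at production, you have a simple, repeatable way to show the files are the same ones you started with.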

If you’ve ever tried to tackle data collections ad hoc, putting out fire after whack-a-mole fire, or experienced data integrity or admissibility issues, you already know how bad it can get. Don’t let the desire to save a few pennies end up costing you real dollars or, worse, your entire case. If you follow the steps above and take some time to consult an eDiscovery expert early in your process, you can avoid immense costs, burdens and frustrations later. We invite you to reach out today for lasting, defensible solutions to the data collection inefficiencies thwarting your eDiscovery process.