Hash values act as a document’s unique digital fingerprint.
A hash value is an identifier, unique for practical purposes, that results from applying a checksum algorithm to a file or data set. Most industry usage follows NIST policy and guidelines for using hash values in computer forensics and cybersecurity. In the legal technology field, hash values act as a digital fingerprint for each document or email message in the data set. Digital evidence, like any other type of evidence, requires identification, collection, chain of custody, examination, analysis and authentication, usually during a legal proceeding.
How Do Hash Values Relate to eDiscovery?
In eDiscovery, several checksums have become standard for identifying duplicate documents such as spreadsheets, word processing files and emails. Running data through a hash process, such as an MD5 or SHA-1 checksum calculation, establishes a unique identifier that is stored and used throughout the document lifecycle during litigation, regulatory proceedings or any other legal matter where that digital evidence is relevant.
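As a rough illustration of how duplicate detection by hash can work, the sketch below hashes a few in-memory documents with Python's standard `hashlib` module and groups names that share a digest. The function names (`hash_bytes`, `group_duplicates`) and the sample file names and contents are made up for this example; real processing platforms read files from disk and apply their own rules.

```python
import hashlib
from collections import defaultdict


def hash_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of `data` using a hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()


def group_duplicates(documents: dict, algorithm: str = "md5") -> dict:
    """Map each digest shared by two or more documents to their names."""
    groups = defaultdict(list)
    for name, content in documents.items():
        groups[hash_bytes(content, algorithm)].append(name)
    # Keep only digests that occur more than once (the duplicates).
    return {digest: names for digest, names in groups.items() if len(names) > 1}


# Hypothetical sample data: two byte-identical spreadsheets and one memo.
documents = {
    "budget_v1.xlsx": b"Q1 totals: 100, 200, 300",
    "budget_copy.xlsx": b"Q1 totals: 100, 200, 300",
    "memo.docx": b"Please review the attached budget.",
}

duplicates = group_duplicates(documents)
for digest, names in duplicates.items():
    print(digest, names)
```

Passing `algorithm="sha1"` switches the digest to SHA-1. Either way, only byte-for-byte identical content produces the same value, which is why the two spreadsheets are grouped and the memo is not.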
In eDiscovery, the way hash values have been used over time has created industry-wide challenges: hash values are often calculated from, or combined with, other data, which makes them hard to compare across tools. For example, every ESI processing platform has its own formula for hashing emails. If you were to process the exact same email in LAW, Nuix, Relativity Processing and TotalDiscovery, for example, you would get a different hash value from each, even if they all used MD5 or SHA-1.
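To make the point concrete, here is a minimal sketch of two invented email-hashing recipes. Neither `recipe_a` nor `recipe_b` is the formula of any named platform; the field choices, normalization and separators are assumptions chosen only to show that the same email and the same algorithm (MD5) can still yield different values when the input is assembled differently.

```python
import hashlib

# Hypothetical email, represented as a plain dictionary of fields.
email = {
    "from": "alice@example.com",
    "to": "bob@example.com",
    "subject": "Quarterly report",
    "sent": "2020-01-15T09:30:00Z",
    "body": "Please see the attached report.",
}


def recipe_a(msg: dict) -> str:
    """Invented recipe A: sender, recipient, subject and body, newline-joined."""
    fields = [msg["from"], msg["to"], msg["subject"], msg["body"]]
    return hashlib.md5("\n".join(fields).encode("utf-8")).hexdigest()


def recipe_b(msg: dict) -> str:
    """Invented recipe B: also includes the sent date, lowercases the
    subject, and joins the fields with a pipe character."""
    fields = [msg["from"], msg["to"], msg["sent"],
              msg["subject"].lower(), msg["body"]]
    return hashlib.md5("|".join(fields).encode("utf-8")).hexdigest()


hash_a = recipe_a(email)
hash_b = recipe_b(email)
print("recipe A:", hash_a)
print("recipe B:", hash_b)
print("match:", hash_a == hash_b)
```

Each recipe is deterministic on its own, so a tool deduplicates consistently within its own output. The mismatch appears only when values from different recipes are compared against each other.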