Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees,
Lillis, David; Breitinger, Frank; Scanlon, Mark
Publication Date: November 2018
Publication Name: Digital Forensics and Cyber Crime. ICDF2C 2017,
Abstract: Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of “known-illegal” files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way. In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2. A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness.,
Download Paper:
BibTeX Entry:
@incollection{lillis2018bloomfiltertrees,
author = {Lillis, David and Breitinger, Frank and Scanlon, Mark},
editor = {Matou{\v{s}}ek, Petr and Schmiedecker, Martin},
title = {{Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees}},
booktitle = {Digital Forensics and Cyber Crime. ICDF2C 2017},
year = 2018,
publisher = {Springer},
pages = {144--157},
isbn = {978-3-319-73697-6},
doi = {10.1007/978-3-319-73697-6_11},
series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering},
volume = {216},
abstract = {Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of ``known-illegal'' files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way. In recent years, approximate matching algorithms have shown significant promise in the detection of files that have a high bytewise similarity. This paper focuses on MRSH-v2. A number of experiments were conducted using Hierarchical Bloom Filter Trees to dramatically reduce the quantity of pairwise comparisons that must be made between known-illegal files and files on the seized disk. The experiments demonstrate substantial speed gains over the original MRSH-v2, while maintaining effectiveness.},
}