Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations
Michelet, Gaëtan; Henseler, Hans; van Beek, Harm; Scanlon, Mark; Breitinger, Frank
Publication Date: July 2025
Publication Name: ACM Digital Threats: Research and Practice
Abstract: Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored. This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, in which we evaluate multiple fine-tuned models to assess their performance. The study concludes by sharing the lessons learned from the case study.
BibTeX Entry:
@article{Michelet2025Fine-TuningLLMDF,
title = {Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations},
journal = {ACM Digital Threats: Research and Practice},
pages = {3748264},
month = jul,
year = {2025},
issn = {2576-5337},
doi = {10.1145/3748264},
author = {Michelet, Ga\"{e}tan and Henseler, Hans and van Beek, Harm and Scanlon, Mark and Breitinger, Frank},
keywords = {Digital Forensics Investigation, Fine-tuning, Local Large Language Models (LLM), Chat Logs Summarization, Reporting Automation},
abstract = {Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored. This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, in which we evaluate multiple fine-tuned models to assess their performance. The study concludes by sharing the lessons learned from the case study.}
}
