Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations

Michelet, Gaëtan; Henseler, Hans; van Beek, Harm; Scanlon, Mark; Breitinger, Frank

Publication Date:  July 2025

Publication Name:  ACM Digital Threats: Research and Practice

Abstract:  Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored. This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, where we evaluate multiple fine-tuned models to assess their performance. The study concludes by sharing the lessons learned from the case study.

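Illustrative Sketch:  The paper's topic, fine-tuning a locally run LLM for chat summarization, can be roughly sketched in code. The snippet below is a minimal, hypothetical example and not the authors' actual pipeline or recommendations: it assumes LoRA-style parameter-efficient fine-tuning with the Hugging Face transformers, peft, and datasets libraries, and the model name, prompt template, toy chat/summary pair, and hyperparameters are placeholders chosen only for illustration.

from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Hypothetical base model; any locally hosted causal LM with a compatible tokenizer would do.
MODEL_NAME = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# LoRA: train only small low-rank adapter matrices instead of all model weights.
lora_cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)

# Placeholder training pair: a chat log and its reference summary (invented data).
examples = [{"chat": "alice: meet at 9?\nbob: ok, I will bring the usb drive",
             "summary": "Alice and Bob agree to meet at 9; Bob will bring a USB drive."}]

def to_features(ex):
    # Prompt format is an assumption; the paper's actual template may differ.
    text = (f"Summarize the following chat for a forensic report.\n\n"
            f"Chat:\n{ex['chat']}\n\nSummary:\n{ex['summary']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=1024)

train_ds = Dataset.from_list(examples).map(to_features,
                                           remove_columns=["chat", "summary"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-4, logging_steps=1),
    train_dataset=train_ds,
    # Causal-LM collator (mlm=False) copies input_ids into labels for next-token loss.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("out/lora-chat-summarizer")  # saves the adapter weights only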

BibTeX Entry:


@article{Michelet2025Fine-TuningLLMDF,
  title    = {Fine-Tuning Large Language Models for Digital Forensics: Case Study and General Recommendations},
  journal  = {ACM Digital Threats: Research and Practice},
  volume   = {},
  pages    = {3748264},
  month    = jul,
  year     = {2025},
  issn     = {2576-5337},
  doi      = {10.1145/3748264},
  author   = {Michelet, Ga\"{e}tan and Henseler, Hans and van Beek, Harm and Scanlon, Mark and Breitinger, Frank},
  keywords = {Digital Forensics Investigation, Fine-tuning, Local Large Language Models (LLM), Chat Logs Summarization, Reporting Automation},
  abstract = {Large language models (LLMs) have rapidly gained popularity in various fields, including digital forensics (DF), where they offer the potential to accelerate investigative processes. Although several studies have explored LLMs for tasks such as evidence identification, artifact analysis, and report writing, fine-tuning models for specific forensic applications remains underexplored. This paper addresses this gap by proposing recommendations for fine-tuning LLMs tailored to digital forensics tasks. A case study on chat summarization is presented to showcase the applicability of the recommendations, where we evaluate multiple fine-tuned models to assess their performance. The study concludes by sharing the lessons learned from the case study.}
}