Digital Forensics: Leveraging Deep Learning Techniques in Facial Images to Assist Cybercrime Investigations

Anda, Felix

Publication Date: May 2021

Publication Name: PhD Thesis, School of Computer Science, University College Dublin

Abstract: We are living in a digital era where most transactions are contactless, social media platforms are commonplace and part of our daily life is recorded either in a permissive or surreptitious manner. Whether we are present in an online meeting, a daily social media feed, a peer-connected calendar, or a live gaming or video stream, hundreds of bytes of our information are sent through a network to a server. The exponential growth of storage capacity also enables thousands of multimedia files to be stored locally on digital devices. At the same time, it challenges digital investigations, which are hampered by the accumulation of such devices in forensic laboratories awaiting timely processing by an expert. The size and amount of information requiring analysis is increasing, leading to an unmanageable digital forensic backlog. Smartphone users are able to produce original content such as audio, images and videos and, thanks to the internet, to broadcast it worldwide in a matter of seconds. Digital forensic practitioners have become overwhelmed by the amount of data they encounter and increasingly require artificial intelligence tools and techniques to aid investigations by discovering, gathering and analysing records swiftly. To address the digital forensic backlog, the creation of age estimation models to assist digital forensic investigations has been proposed. Although some models perform well across the entire age range, they perform wholly inadequately for certain age ranges, such as the underage group. Factors influencing underage age estimation have been evaluated, and certain elements have been found to have strong, mild or weak correlations with the machine-predicted performance. These considerations are key to the curation of datasets and will yield better results for future trained models.
The largest underage dataset with age and gender labels has been collected, and several models have been trained and compared using different image pre-processing techniques, neural network architectures and other settings. Hyper-parameter optimisation was introduced, and the best score for facial age estimation was obtained. The scores were evaluated on a chosen test dataset containing faces that can be detected by well-known face detectors, such as Viola-Jones. A novel facial embedding approach was proposed, and a distribution-based evaluation metric was introduced in place of a single-value metric. The performance achieved surpasses that of state-of-the-art facial age estimators for subjects under the age of 25.


BibTeX Entry:


@phdthesis{anda2021PhD.DeepLearningFacialImage,
title="{Digital Forensics: Leveraging Deep Learning Techniques in Facial Images to Assist Cybercrime Investigations}",
author={Anda, Felix},
school={School of Computer Science, University College Dublin},
month=may,
year=2021,
address={Dublin, Ireland},
abstract={We are living in a digital era where most transactions are contactless, social media platforms are commonplace and part of our daily life is recorded either in a permissive or surreptitious manner. Whether we are present in an online meeting, a daily social media feed, a peer-connected calendar, or a live gaming or video stream, hundreds of bytes of our information are sent through a network to a server. The exponential growth of storage capacity also enables thousands of multimedia files to be stored locally on digital devices. At the same time, it challenges digital investigations, which are hampered by the accumulation of such devices in forensic laboratories awaiting timely processing by an expert. The size and amount of information requiring analysis is increasing, leading to an unmanageable digital forensic backlog. Smartphone users are able to produce original content such as audio, images and videos and, thanks to the internet, to broadcast it worldwide in a matter of seconds. Digital forensic practitioners have become overwhelmed by the amount of data they encounter and increasingly require artificial intelligence tools and techniques to aid investigations by discovering, gathering and analysing records swiftly. To address the digital forensic backlog, the creation of age estimation models to assist digital forensic investigations has been proposed. Although some models perform well across the entire age range, they perform wholly inadequately for certain age ranges, such as the underage group. Factors influencing underage age estimation have been evaluated, and certain elements have been found to have strong, mild or weak correlations with the machine-predicted performance. These considerations are key to the curation of datasets and will yield better results for future trained models.
The largest underage dataset with age and gender labels has been collected, and several models have been trained and compared using different image pre-processing techniques, neural network architectures and other settings. Hyper-parameter optimisation was introduced, and the best score for facial age estimation was obtained. The scores were evaluated on a chosen test dataset containing faces that can be detected by well-known face detectors, such as Viola-Jones. A novel facial embedding approach was proposed, and a distribution-based evaluation metric was introduced in place of a single-value metric. The performance achieved surpasses that of state-of-the-art facial age estimators for subjects under the age of 25.}
}