Explaining the technology for detecting child sexual abuse online

 

This piece explains the main technologies for detecting known and new/unknown child sexual abuse material, as well as grooming, in the context of the EU draft Regulation to prevent and combat child sexual abuse.

 
 

EU policymakers are about to decide on a very significant legislative proposal: the draft Regulation to prevent and combat child sexual abuse. While the aim of protecting children from sexual violence online is uncontroversial, critics have warned that the proposal is technologically unsound.

Some of the key disagreements and challenges regarding technology arise in the context of measures to detect known (previously identified) child sexual abuse material, new (or not yet identified) child sexual abuse material, and grooming. This piece aims to provide some clarity around the capability of technology to detect abuse.

The detection of known child sexual abuse material

Technologies like perceptual hashing are crucial in identifying known child sexual abuse material, but policymakers should be aware of and take into account their limitations around accuracy and security.

A crucial way to detect known child sexual abuse material is via a process called “hashing”. The detection tool creates a hash - a unique digital fingerprint - for an image. This hash is then compared against a database of hashes of known child sexual abuse material to find matches. “Cryptographic hashing” can be used to identify exact matches, while “perceptual hashing” can be used to determine whether the content is similar enough to constitute a match, for example even where the image has been resized, cropped or rotated. Hashing can also be used for other multimedia content, such as videos.
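To make the distinction concrete, the sketch below contrasts a cryptographic hash (which only matches byte-identical files) with a toy perceptual “average hash” (which tolerates small visual changes). It is a minimal illustration of the general principle in Python, not a description of PhotoDNA or any production tool; the file names and matching threshold are hypothetical.

```python
import hashlib
from PIL import Image  # pip install pillow

def cryptographic_hash(path):
    """Exact fingerprint: changing a single byte of the file yields a completely different hash."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def average_hash(path, size=8):
    """Toy perceptual hash: shrink the image to 8x8 greyscale and record which pixels
    are brighter than the average. Visually similar images produce similar bit patterns."""
    pixels = list(Image.open(path).convert("L").resize((size, size)).getdata())
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]

def hamming_distance(hash_a, hash_b):
    """Number of bits that differ between two perceptual hashes."""
    return sum(a != b for a, b in zip(hash_a, hash_b))

# A match against a database entry is typically declared when the distance falls
# below some threshold; the value 5 here is arbitrary and purely illustrative.
def is_match(hash_a, hash_b, threshold=5):
    return hamming_distance(hash_a, hash_b) <= threshold

# Hypothetical usage:
# is_match(average_hash("database_entry.jpg"), average_hash("uploaded_image.jpg"))
```

The tolerance that makes perceptual hashing robust to resizing or cropping is also what opens the door to the accuracy and security problems described below.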

Regarding these technologies, experts mainly disagree on their accuracy (how well a tool is able to detect child sexual abuse material) and security (resistance to attacks). For example, PhotoDNA, one of the main tools for perceptually hashing photos and videos, has been said to have a false positive rate of 1 in 50 billion. However, there is no independent review of this technology. Some experts have questioned this rate. For all perceptual hash tools, efficient attacks have been described that created false negatives (where the tool did not catch known child sexual abuse images because small changes had been made to them), as well as false positives (where the tool wrongly identified non-abuse images as known child sexual abuse material). Scientific literature has also shown that, if parts of the perceptual hashing tools are moved to user devices, the tools could be reverse engineered. This means that they could be decoded in order to extract sensitive information from them, in some cases even identifying specific people in the original image.

The detection of new/unknown child sexual abuse material

The detection of new/unknown child sexual abuse material and grooming poses more difficult problems, particularly regarding the accuracy of the technologies. The review needed to separate actual child sexual abuse material from “false positives” would require substantial human resources.

The detection of new or unknown child sexual abuse material poses greater technical challenges than detecting already known images and cannot be achieved through “hashing”, which requires an already identified image. The identification of unknown child sexual abuse material can only be done through artificial intelligence, based on machine learning. Classifiers (algorithms that sort data into classes based on pattern recognition) can be trained to detect nudity, faces, colours and so on. It is particularly challenging to estimate the age of a person shown in the content, especially to determine whether they are a teenager or a young adult.
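As a rough illustration of the general approach, the sketch below defines a small binary image classifier that outputs a probability for an input image and flags it if the probability crosses a threshold. It is a generic stand-in written in PyTorch, not Thorn’s Safer classifier or any tool referenced in the proposal, and the 0.99 threshold is an arbitrary assumption.

```python
import torch
import torch.nn as nn

class BinaryImageClassifier(nn.Module):
    """Generic convolutional classifier: maps an image to a probability for the target class."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.features(x)))

model = BinaryImageClassifier().eval()   # in reality, trained on a large labelled data set
with torch.no_grad():
    image = torch.rand(1, 3, 224, 224)   # stand-in for a preprocessed image
    score = model(image).item()          # probability between 0 and 1

# Raising the threshold makes flags more precise but misses more material (lower recall).
flagged_for_review = score >= 0.99
```

The choice of threshold is what trades precision against recall, which is why the figures discussed below are always quoted as a pair.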

The tools used to identify unknown material are automated, but verifying whether child sexual abuse material (CSAM) has been accurately identified requires human oversight. Assessing the impact of the introduction of these measures requires an analysis of the “precision”, “recall” and “false positive error rate” of the relevant tools.

 
 

Precision

The percentage of material identified by the tool as child sexual abuse material that actually depicts children being sexually abused.

The EU Commission, in its Impact Assessment of the proposal, refers to Thorn’s Safer tool as one example of a machine learning tool that can be used to detect new/unknown CSAM. According to the Commission’s Impact Assessment, Thorn’s CSAM Classifier can be set at a 99.9% precision rate. So out of 1000 images that the tool identifies as CSAM, 999 would actually be CSAM and one image would not be CSAM. The Commission refers to “data from bench tests”, which appear to be from Thorn’s own tests, though they have not been independently verified.
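The arithmetic behind that example, using only the figures quoted in the Impact Assessment:

```python
precision = 0.999             # rate quoted in the Commission's Impact Assessment
flagged_as_csam = 1_000       # images the classifier flags as CSAM

actually_csam = precision * flagged_as_csam          # 999 images correctly identified
wrongly_flagged = flagged_as_csam - actually_csam    # 1 image that is not CSAM
print(int(actually_csam), int(wrongly_flagged))      # 999 1
```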

Recall

The percentage of messages - out of all messages that contain child sexual abuse material - that the tool is able to detect.

In the Commission’s Impact Assessment, it is stated that, when set at a 99.9% precision rate, the Thorn CSAM Classifier identifies 80% of the total CSAM in the data set.

The EU Parliament’s Complementary Impact Assessment uses the following scenario: if the Safer tool is applied to a platform on which 1 billion messages are exchanged daily, and 10,000 of those messages contain CSAM, the tool will be able to correctly identify 80% of the 10,000 messages containing CSAM. Based on this recall rate, it would identify 8,000 messages containing CSAM, but fail to identify the other 2,000.
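The same scenario as a short calculation:

```python
daily_messages = 1_000_000_000   # messages exchanged on the platform per day
csam_messages = 10_000           # of which actually contain CSAM
recall = 0.80                    # share of CSAM the tool detects at a 99.9% precision setting

detected = int(csam_messages * recall)   # 8,000 messages correctly flagged
missed = csam_messages - detected        # 2,000 messages that go undetected
print(detected, missed)                  # 8000 2000
```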

False positive error rate

The percentage of messages incorrectly identified as containing child sexual abuse material, out of all the messages exchanged.

The Commission’s Impact Assessment does not provide specific false positive error rates for Thorn’s classifier or the other tools it mentions.

The Parliament’s Impact Assessment notes: “If 0.1% [an optimistic estimate] of all messages would be falsely flagged as CSAM (i.e., a false positive) and this percentage is applied to one billion messages exchanged each day, it results in 1 million false positives per day. It takes one person approximately 10 seconds to classify whether reported content is indeed CSAM or whether it is a false positive. This means that one person could classify about 2500 messages per day. Per 1 billion messages, this would require 400 people on a permanent basis to classify those images. Taking into account training, holidays and weekends, it is more likely that a team of 800 people would be required to classify those images. This workload is deemed not feasible, regardless of whether it is the responsibility of [an EU-wide] Centre or law enforcement.”
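The reviewer estimate follows directly from the figures quoted in the Assessment (1 billion messages, a 0.1% false positive rate and roughly 2,500 reviews per person per day):

```python
daily_messages = 1_000_000_000
false_positive_rate = 0.001                   # the "optimistic" 0.1% estimate

false_positives = daily_messages * false_positive_rate      # 1,000,000 wrongly flagged per day

reviews_per_person_per_day = 2_500            # ~10 seconds per item over a working day
reviewers_needed = false_positives / reviews_per_person_per_day   # 400 people
with_training_and_leave = reviewers_needed * 2                    # ~800 people, per the Assessment

print(int(false_positives), int(reviewers_needed), int(with_training_and_leave))
# 1000000 400 800
```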

So even with a very low false positive rate, separating the true positives from the false positives would be very resource-intensive, and a very high number of people whose messages were falsely flagged would be affected.
 
 

The detection of grooming

The detection of grooming requires the analysis of text through machine learning. Technologies for grooming detection look for patterns in conversations that point to possible elements of suspicion and produce an estimated probability that a conversation constitutes grooming. Flagged conversations are then subject to human review.
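A minimal sketch of what such a text-based detector looks like in practice, using a generic bag-of-words model from scikit-learn; the training data here are placeholders and the 0.9 threshold is an arbitrary assumption. This is not Project Artemis or any deployed system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: a real system would be trained on a large labelled corpus.
conversations = [
    "placeholder benign conversation one",
    "placeholder benign conversation two",
    "placeholder conversation labelled as grooming one",
    "placeholder conversation labelled as grooming two",
]
labels = [0, 0, 1, 1]    # 0 = benign, 1 = grooming

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(conversations, labels)

new_conversation = "placeholder incoming conversation"
probability = model.predict_proba([new_conversation])[0, 1]   # estimated probability of grooming

if probability >= 0.9:    # arbitrary illustrative threshold
    print("flag conversation for human review")
```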

Regarding the accuracy of these technologies, the EU Commission states that Microsoft has reported that its tool developed under Project Artemis has an accuracy of 88%. However, Microsoft itself “recommends against reliance on this figure when discussing EU policy”, adding that it relates to “a single English-language technique trained on a small data set of known instances of [grooming]”. Moreover, there is no independent review of this accuracy level. Some technology experts have said that with text alone, “it can be difficult to get error rates significantly below 5 - 10%, depending on the nature of the material being searched for”. For 1 billion messages exchanged daily, this would mean between 50 and 100 million false positives every day. At that scale, human review is completely unfeasible.
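Scaling those error rates to the volumes discussed above, and reusing the Parliament's figure of roughly 2,500 reviews per person per day (quoted for image review and applied to text here purely for illustration), shows why:

```python
daily_messages = 1_000_000_000
error_rates = [0.05, 0.10]        # the 5-10% range cited for text-only detection

for rate in error_rates:
    false_positives = int(daily_messages * rate)      # 50-100 million per day
    reviewers = int(false_positives / 2_500)          # at ~2,500 reviews per person per day
    print(rate, false_positives, reviewers)
# 0.05 -> 50,000,000 false positives per day, 20,000 reviewers
# 0.10 -> 100,000,000 false positives per day, 40,000 reviewers
```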

 
 

 
 

Read CRIN’s piece on the EU draft Regulation to prevent and combat child sexual abuse.

Read CRIN and defenddigitalme’s research on a children’s rights approach to encryption.
