DT
David Thiel
@det.bsky.social
Harm reduction in technology; tech, data science and T&S research at io.stanford.edu & tsjournal.org
Engineering lead, AI censorship Death Star
651 followers · 287 following · 289 posts
We used a combination of methods to determine this: perceptual hashing, cryptographic hashing, and k-nearest neighbors (KNN) analysis using the image embeddings. Starting from a small subset of the dataset, PhotoDNA identified hundreds of instances, the URLs of which were reported to NCMEC.
These URLs were also passed to the Canadian Centre for Child Protection's Arachnid API for verification by human analysts. We then used the image embeddings of the confirmed CSAM instances to run KNN queries, which identified hundreds of thousands of potential CSAM candidates.
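For readers unfamiliar with the KNN step, here is a minimal sketch of the general idea: take the embedding vectors of a few confirmed seed items and pull out the nearest unconfirmed items for human review. This is not the pipeline described above. The function name `knn_candidates`, the use of cosine similarity, and the brute-force NumPy search are all assumptions made for illustration; the thread does not say which distance metric or index was used, and a dataset of this size would likely need an approximate nearest-neighbor index rather than a dense matrix product.

```python
import numpy as np


def knn_candidates(embeddings, seed_indices, k=5):
    """Return sorted indices of the k nearest non-seed items to each seed.

    Hypothetical helper: cosine similarity over a dense embedding matrix.
    `embeddings` has shape (n_items, dim); `seed_indices` lists confirmed rows.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    seed_indices = list(seed_indices)

    # Never ask for more neighbors than there are non-seed items.
    k = min(k, embeddings.shape[0] - len(seed_indices))
    if k <= 0:
        return np.array([], dtype=int)

    # Normalize rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    # Similarity of every seed to every item: shape (n_seeds, n_items).
    sims = unit[seed_indices] @ unit.T

    # Seeds are already confirmed, so exclude them from the results.
    sims[:, seed_indices] = -np.inf

    # Top-k columns per seed (unordered within the top k).
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]

    # Merge neighbors across all seeds into one de-duplicated candidate set.
    return np.unique(top)
```

The output is only a list of candidates: nearness in embedding space indicates similarity, not confirmation, which is why results like these would still go to verification.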