David Thiel
@det.bsky.social
Harm reduction in technology; tech, data science and T&S research at io.stanford.edu & tsjournal.org. Engineering lead, AI censorship Death Star
@det.bsky.social

As a follow-up to our work on computer-generated CSAM, we took a closer look at the datasets used to train various generative models (most prominently Stable Diffusion 1.5) to see to what degree CSAM itself might be present in them.

@det.bsky.social

We used a combination of methods to determine this: perceptual hashing, cryptographic hashing, and k-nearest neighbors analysis of the image embeddings. Starting from a small subset of the dataset, PhotoDNA identified hundreds of instances, and their URLs were reported to NCMEC (the National Center for Missing & Exploited Children).