David Thiel

@det.bsky.social

Harm reduction in technology; tech, data science and T&S research at io.stanford.edu & tsjournal.org. Engineering lead, AI censorship Death Star

@det.bsky.social · Dec 20, 2023 5:16pm

As a follow-up to our work on computer-generated CSAM, we took a closer look at the data used to train various generative models, most prominently Stable Diffusion 1.5, to see to what degree CSAM itself might be present in it.

We used a combination of methods to determine this: perceptual hashing, cryptographic hashing, and k-nearest neighbors analysis using the image embeddings. Seeded from a small subset of the dataset, PhotoDNA identified hundreds of instances; their URLs were reported to NCMEC.