ponder.ooo

figured it out. the backward process is properly defined as: prediction = (latent - sigma * noise) / sqrt(alpha_bar), where sigma is sqrt(1 - alpha_bar). so the diffusers library is combining sigma & the denominator & calling the whole result sigma, scaling the latents by that denominator elsewhere
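the formula above can be sketched and sanity-checked in a few lines of numpy. this is a minimal sketch assuming standard DDPM notation, where alpha_bar is the cumulative product of the per-step alphas; the function name is made up for illustration:

```python
import numpy as np

def predict_x0(latent, noise_pred, alpha_bar):
    """Recover the clean-sample prediction from a noised latent.

    prediction = (latent - sigma * noise) / sqrt(alpha_bar),
    with sigma = sqrt(1 - alpha_bar).
    """
    sigma = np.sqrt(1.0 - alpha_bar)
    return (latent - sigma * noise_pred) / np.sqrt(alpha_bar)

# round trip: noise a known x0 the DDPM way, then recover it exactly
rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))
eps = rng.standard_normal((4, 8, 8))
alpha_bar = 0.7
latent = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
assert np.allclose(predict_x0(latent, eps, alpha_bar), x0)
```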

ai.ponder.ooo

pretty sure the two lumps are on opposite sides of the origin, & this means the model is generally choosing to subtract less and less from the value of the latents as time goes on. but really i should figure out how to add labels to my charts

ai.ponder.ooo

this is ultimately just affecting the tendency to add and subtract values from the latents. a sign reversal of a vector component is probably not ideal except when trying to induce severe distortion. an imbalance between the scaling of positive and negative values might have more interesting effects, idk

ai.ponder.ooo

hmm

[image: partly-denoised spongebob latents; something has gone wrong and the latents are blowing up, deep fried]
[image: decoded result image, blood red]
ai.ponder.ooo

too low and the result is boring and incoherent, as in the case where only one noise prediction is used. too high and the latents "blow up" and the result image is a mess: oversaturated, blown out, excessive black and white without midtones, and full of vae artifacts
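the "only one noise prediction" vs "too high" behavior is what you'd expect from classifier-free guidance, which blends an unconditional and a conditional noise prediction. a minimal sketch, assuming the standard CFG formula (the function name is illustrative):

```python
import numpy as np

def cfg(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the conditional one.

    scale 1.0 reduces to the conditional prediction alone; large scales
    extrapolate far past it, which can blow up the latents.
    """
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```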

nafnlaus.bsky.social

next layer, and so on, layer after layer. The term for this is "fuzzy logic". With Transformers, we also have latents... I'm not sure if you care enough for me to get into latents and attention, so I'll stop here.

ponder.ooo

..image is a different kind of representation of an image altogether. another pair of models, an encoder and a decoder, have been trained to map images to smaller "latent" representations and then decode those representations as faithfully as possible. the prompt embeddings and these latents are

nafnlaus.bsky.social

And you got the psychedelic Google "Deep Dreaming" images. But with iterating over small increments, in latent space instead of pixel space, with text latents crossprod'ed with image latents... voila, diffusion image generators.

ai.ponder.ooo

[image: raw latent representation of an image. four channels, since this is sdxl 0.9; the first channel almost looks like a grayscale rendition of the resulting image]
[image: a "preview" of the decoded image, produced by a single matmul. each color channel is a simple linear combination of latent channels, with the coefficients picked by manual tinkering; computationally way cheaper than actually decoding the latents]
[image: actual decoded image]
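the single-matmul preview described above can be sketched in numpy. the coefficient matrix here is made up for illustration; the post says the real ones were picked by manual tinkering:

```python
import numpy as np

# hypothetical (3 RGB, 4 latent channels) mixing matrix -- placeholder
# values, not the tuned coefficients from the post
COEFFS = np.array([
    [ 0.3,  0.2,  0.1, -0.2],
    [ 0.3,  0.1, -0.1, -0.1],
    [ 0.3, -0.1,  0.2,  0.1],
])

def preview(latents):
    """Cheap preview: latents of shape (4, H, W) -> rough RGB (3, H, W).

    Each output channel is a linear combination of the latent channels;
    the whole thing is one matmul over the channel axis, far cheaper
    than running the VAE decoder.
    """
    return np.einsum("rc,chw->rhw", COEFFS, latents)
```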
svateboje.bsky.social

turning off the masses while lighting up the extremists and also inviting the latents to come on out.
