figured it out. the backward process is properly defined as: prediction = (latent - sigma * noise) / sqrt(alpha_bar), where sigma = sqrt(1 - alpha_bar). so the diffusers library is folding the sigma and the denominator together, calling that whole result sigma, and scaling the latents by the denominator elsewhere
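that formula is easy to sanity-check numerically. a minimal sketch (my own numpy, not the diffusers code): noise a known x0 with the standard forward process, then apply the prediction formula above and confirm it recovers x0 exactly when the noise estimate is perfect.

```python
import numpy as np

# the x0-prediction formula from above:
# pred_x0 = (latent - sigma * noise) / sqrt(alpha_bar),
# with sigma = sqrt(1 - alpha_bar)
def predict_x0(latent, predicted_noise, alpha_bar):
    sigma = np.sqrt(1.0 - alpha_bar)
    return (latent - sigma * predicted_noise) / np.sqrt(alpha_bar)

# round-trip check: build a noisy latent from a known x0, then recover it
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
noise = rng.standard_normal(4)
alpha_bar = 0.7
latent = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
print(np.allclose(predict_x0(latent, noise, alpha_bar), x0))  # True
```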
pretty sure the two lumps are on opposite sides of the origin, which means the model is generally choosing to subtract less and less from the latents as time goes on. but really i should figure out how to add labels to my charts
this ultimately just affects the tendency to add or subtract value from the latents. a sign reversal of a vector component is probably not ideal except when trying to induce severe distortion. an imbalance between the scaling of positive and negative values might have more interesting effects, idk
too low and the result is boring and incoherent, as in the case where only one noise prediction is used. too high and the latents "blow up" and the result image is a mess: oversaturated, blown out, all black and white with no midtones, and full of vae artifacts
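that too-low/too-high behavior is what you'd expect from the guidance scale in classifier-free guidance. a sketch of the standard CFG combination (function name and values are mine, just for illustration):

```python
import numpy as np

# classifier-free guidance: combine the unconditional and conditional
# noise predictions; `scale` is the guidance scale
def cfg_combine(noise_uncond, noise_cond, scale):
    # scale = 1.0 reduces to the conditional prediction alone;
    # large scales push the latents far along the (cond - uncond)
    # direction, which is where the blown-out look comes from
    return noise_uncond + scale * (noise_cond - noise_uncond)

u = np.array([0.1, -0.2])
c = np.array([0.3, 0.1])
print(cfg_combine(u, c, 1.0))  # same as c
print(cfg_combine(u, c, 7.5))  # the difference, heavily amplified
```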
next layer, and so on, layer after layer. (People sometimes loosely call this "fuzzy logic", though strictly that's a separate field.) With Transformers, we also have latents... I'm not sure if you care enough for me to get into latents and attention, so I'll stop here.
..the image is a different kind of representation altogether. another pair of models, an encoder and a decoder, has been trained to map images to smaller "latent" representations and then decode those representations back as faithfully as possible. the prompt embeddings and these latents are
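to give a sense of how much smaller those latent representations are, here's a shape-only sketch. the 8x spatial downscale and 4 latent channels are typical Stable Diffusion VAE values (my assumption, not stated above):

```python
# shape-only sketch of a typical SD-style VAE mapping:
# 8x spatial downscale, 4 latent channels (typical SD values,
# an assumption on my part)
def latent_shape(h, w, downscale=8, latent_channels=4):
    return (latent_channels, h // downscale, w // downscale)

img = (3, 512, 512)                 # RGB image, C x H x W
lat = latent_shape(img[1], img[2])  # -> (4, 64, 64)
pixels = img[0] * img[1] * img[2]
latents = lat[0] * lat[1] * lat[2]
print(lat, pixels // latents)       # (4, 64, 64) 48 -- ~48x fewer values
```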
And you got the psychedelic Google "Deep Dreaming" images. But iterate over small increments, in latent space instead of pixel space, with text latents cross-attended with image latents... voila, diffusion image generators.
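the "small increments in latent space" part can be sketched as a bare denoising loop. everything here is a stand-in (the fake model, the schedule, the step count); it's just the loop structure, roughly DDIM-shaped, built from the same x0-prediction formula as above:

```python
import numpy as np

# stand-in for the real noise-prediction network (UNet / Transformer)
def fake_noise_model(latent, t):
    # pretend the model says "the noise is a scaled copy of the latent";
    # enough to exercise the loop, nothing more
    return 0.1 * latent

def denoise(latent, alpha_bars):
    # DDIM-style loop: walk from the noisiest timestep back toward t=0
    for t in reversed(range(1, len(alpha_bars))):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        eps = fake_noise_model(latent, t)
        # predict x0, then re-noise to the previous (less noisy) level
        x0 = (latent - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        latent = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return latent

rng = np.random.default_rng(0)
alpha_bars = np.linspace(0.999, 0.05, 20)  # index 0 = least noisy
z = rng.standard_normal(8)                 # start from pure noise
out = denoise(z, alpha_bars)
print(out.shape)  # (8,)
```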
turning off the masses while lighting up the extremists and also inviting the latents to come on out.