figured it out. the backward process is properly defined as: prediction = (latent - sigma * noise) / sqrt(alpha_bar), where sigma = sqrt(1 - alpha_bar). so the diffusers library is folding the sigma and the denominator together, calling that whole result sigma, and scaling the latents by the denominator elsewhere
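that formula is easy to sanity-check numerically. a minimal sketch (my own numpy, not the diffusers code): noise a known x0 with the standard forward process, then apply the prediction formula above and confirm it recovers x0 exactly when the noise estimate is perfect.

```python
import numpy as np

# the x0-prediction formula from above:
# pred_x0 = (latent - sigma * noise) / sqrt(alpha_bar),
# with sigma = sqrt(1 - alpha_bar)
def predict_x0(latent, predicted_noise, alpha_bar):
    sigma = np.sqrt(1.0 - alpha_bar)
    return (latent - sigma * predicted_noise) / np.sqrt(alpha_bar)

# round-trip check: build a noisy latent from a known x0, then recover it
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4)
noise = rng.standard_normal(4)
alpha_bar = 0.7
latent = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
print(np.allclose(predict_x0(latent, noise, alpha_bar), x0))  # True
```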
pretty sure the two lumps are on opposite sides of the origin, which means the model is generally choosing to subtract less and less from the latents as time goes on. but really i should figure out how to add labels to my charts
this ultimately just affects the tendency to add or subtract value from the latents. a sign reversal of a vector component is probably not ideal except when trying to induce severe distortion. an imbalance between the scaling of positive and negative values might have more interesting effects, idk
too low and the result is boring and incoherent, as in the case where only one noise prediction is used. too high and the latents "blow up" and the result image is a mess: oversaturated, blown out, all black and white with no midtones, and full of vae artifacts
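that too-low/too-high behavior is what you'd expect from the guidance scale in classifier-free guidance. a sketch of the standard CFG combination (function name and values are mine, just for illustration):

```python
import numpy as np

# classifier-free guidance: combine the unconditional and conditional
# noise predictions; `scale` is the guidance scale
def cfg_combine(noise_uncond, noise_cond, scale):
    # scale = 1.0 reduces to the conditional prediction alone;
    # large scales push the latents far along the (cond - uncond)
    # direction, which is where the blown-out look comes from
    return noise_uncond + scale * (noise_cond - noise_uncond)

u = np.array([0.1, -0.2])
c = np.array([0.3, 0.1])
print(cfg_combine(u, c, 1.0))  # same as c
print(cfg_combine(u, c, 7.5))  # the difference, heavily amplified
```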
next layer, and so on, layer after layer. (People sometimes loosely call this "fuzzy logic", though strictly that's a separate field.) With Transformers, we also have latents... I'm not sure if you care enough for me to get into latents and attention, so I'll stop here.
..the image is a different kind of representation altogether. another pair of models, an encoder and a decoder, has been trained to map images to smaller "latent" representations and then decode those representations back as faithfully as possible. the prompt embeddings and these latents are
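to give a sense of how much smaller those latent representations are, here's a shape-only sketch. the 8x spatial downscale and 4 latent channels are typical Stable Diffusion VAE values (my assumption, not stated above):

```python
# shape-only sketch of a typical SD-style VAE mapping:
# 8x spatial downscale, 4 latent channels (typical SD values,
# an assumption on my part)
def latent_shape(h, w, downscale=8, latent_channels=4):
    return (latent_channels, h // downscale, w // downscale)

img = (3, 512, 512)                 # RGB image, C x H x W
lat = latent_shape(img[1], img[2])  # -> (4, 64, 64)
pixels = img[0] * img[1] * img[2]
latents = lat[0] * lat[1] * lat[2]
print(lat, pixels // latents)       # (4, 64, 64) 48 -- ~48x fewer values
```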
And you got the psychedelic Google "Deep Dreaming" images. But iterate over small increments, in latent space instead of pixel space, with text latents cross-attended with image latents... voila, diffusion image generators.
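the "small increments in latent space" part can be sketched as a bare denoising loop. everything here is a stand-in (the fake model, the schedule, the step count); it's just the loop structure, roughly DDIM-shaped, built from the same x0-prediction formula as above:

```python
import numpy as np

# stand-in for the real noise-prediction network (UNet / Transformer)
def fake_noise_model(latent, t):
    # pretend the model says "the noise is a scaled copy of the latent";
    # enough to exercise the loop, nothing more
    return 0.1 * latent

def denoise(latent, alpha_bars):
    # DDIM-style loop: walk from the noisiest timestep back toward t=0
    for t in reversed(range(1, len(alpha_bars))):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]
        eps = fake_noise_model(latent, t)
        # predict x0, then re-noise to the previous (less noisy) level
        x0 = (latent - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        latent = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return latent

rng = np.random.default_rng(0)
alpha_bars = np.linspace(0.999, 0.05, 20)  # index 0 = least noisy
z = rng.standard_normal(8)                 # start from pure noise
out = denoise(z, alpha_bars)
print(out.shape)  # (8,)
```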
turning off the masses while lighting up the extremists and also inviting the latents to come on out.