P
..image is a different kind of representation of an image altogether. another pair of models, an encoder and a decoder, have been trained to map images to smaller "latent" representations and then decode those representations as faithfully as possible. the prompt embeddings and these latents are
..both provided as inputs to the model that does the denoising steps; neither form of encoding is really trivially fully human-interpretable
P