LSTM, RNN, convolution(al), cross(-)entropy, entropy, logit, logistic, softmax, sampling, contrastive, CUDA, NCCL, QKV, FlashAttention, stochastic, SVM, Gauss, Gaussian, augmentation, backprop(agation), hessian, jacobian, optimizer, GAN, reinforcement learning, RLHF, layernorm, rmsnorm
this is AlignProp, which is reinforcement learning and is based on human feedback, but is not 'RLHF' in the explicit algorithmic sense that most people mean. arxiv.org/abs/2310.03739
Text-to-image diffusion models have recently emerged at the forefront of image generation, powered by very large-scale unsupervised or weakly supervised text-to-image training datasets. Due to their u...
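the core AlignProp trick, roughly: run the denoising chain, detach the graph for all but the last few steps, and backprop a differentiable reward straight through. a minimal toy sketch of that idea in PyTorch; the denoiser, reward model, and step count here are all made up for illustration, not from the paper's code:

```python
import torch
import torch.nn as nn

# Toy stand-in for a diffusion denoiser (a real one would be a UNet).
class ToyDenoiser(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Linear(dim + 1, dim)

    def forward(self, x, t):
        t_emb = torch.full((x.shape[0], 1), float(t))
        return self.net(torch.cat([x, t_emb], dim=-1))

denoiser = ToyDenoiser()
reward_model = nn.Linear(8, 1)  # stand-in for a differentiable reward model
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

x = torch.randn(4, 8)
steps, K = 10, 3  # truncated backprop: gradients only flow through the last few steps
for t in reversed(range(steps)):
    if t == K:
        x = x.detach()  # cut the graph so earlier denoising steps get no gradient
    x = x - 0.1 * denoiser(x, t)  # toy "denoising" update

loss = -reward_model(x).mean()  # maximize reward end-to-end through the sampler
loss.backward()
opt.step()
```

the detach is the whole point: without it you'd hold the full sampling chain in memory and backprop through every step.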
my guess is that it's something like AlignProp where RLHF steps are intercalated into traditional fine-tuning, rather than just plain-vanilla RLHF. also if i were doing it i would freeze almost the entire model and just RLHF the cross-attention layers.
much more familiar with LMs, where there's a lot of work on preventing RLHF from forcing mode collapse (you have to KL-penalize it, etc.). given that "vanilla" finetuning works nicely on the earlier SD releases, the swap to explicit RLHF feels risky
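what "KL-penalize it" means concretely in the LM setting: subtract a KL term to a frozen reference policy from the reward, so the policy can't drift arbitrarily far. an illustrative sketch (function name and `beta` value are mine, not from any specific RLHF codebase):

```python
import torch
import torch.nn.functional as F

def kl_penalized_reward(reward, policy_logits, ref_logits, beta=0.1):
    """Reward minus a KL penalty to a frozen reference policy,
    the standard RLHF-for-LMs trick against mode collapse."""
    logp = F.log_softmax(policy_logits, dim=-1)
    ref_logp = F.log_softmax(ref_logits, dim=-1)
    # KL(policy || ref), summed over the vocab dimension
    kl = (logp.exp() * (logp - ref_logp)).sum(-1)
    return reward - beta * kl.mean()
```

when the policy matches the reference, the penalty is zero and you get the raw reward back; as it drifts, `beta` controls how hard it gets pulled home.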
what was the underlying technical concern here? the weird training instability issues you get with RLHF over image models? you can resolve some of that by using some sort of PEFT (e.g., LoRA) or by using something like AlignProp during training, although i have never gotten AlignProp to work myself.
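for reference, the LoRA flavor of PEFT: freeze the base weight and learn a low-rank additive update, which shrinks the trainable-parameter count and tends to tame RLHF instability. a minimal from-scratch sketch (not the `peft` library's API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: W x + (alpha/r) B A x."""
    def __init__(self, base: nn.Linear, rank=4, alpha=4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init: no-op at start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

base = nn.Linear(8, 8)
layer = LoRALinear(base, rank=4)
x = torch.randn(2, 8)
```

because `B` starts at zero, the wrapped layer is exactly the base layer at initialization, so training starts from the pretrained behavior.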
i agree that they are using it to tag data but i am real iffy on any sourcing that they're using RLHF specifically, which is a different thing than just generic finetuning
huh, you are correct, sometimes they are. this is an oddity to me. are there any models besides SDXL that are known to use RLHF, as opposed to finetuning directly on some more curated set?
One of the things that drives me crazy: a hole in the current RLHF literature. Human preferences: high noise, low bias. LLM as a judge: low noise, high bias. We don't know what this bias is. Drives me crazy. A lot of it seems like LLMs are better, but there is something we haven't measured.
my RLHF experience is LLMs, very different domain but same issue: your smaller curated dataset ends up defining your house style
RLHF is not that but you get the basic idea