BLUE
segyges.bsky.social

LSTM, RNN, convolution(al), cross(-)entropy, entropy, logit, logistic, softmax, sampling, contrastive, CUDA, NCCL, QKV, FlashAttention, stochastic, SVM, Gauss, Gaussian, augmentation, backprop(agation), hessian, jacobian, optimizer, GAN, reinforcement learning, RLHF, layernorm, rmsnorm

theophite.bsky.social

my guess is that it's something like AlignProp where RLHF steps are intercalated into traditional fine-tuning, rather than just plain-vanilla RLHF. also if i were doing it i would freeze almost the entire model and just RLHF cross-attention.

segyges.bsky.social

much more familiar with LMs where there's a lot of work on preventing rlhf from forcing mode collapse, you have to kl penalty it, etc. given that "vanilla" finetuning works nicely on the earlier SD releases the swap to explicit rlhf feels risky

theophite.bsky.social

what was the underlying technical concern here? the weird training instability issues you get with RLHF over image models? you can resolve some of that by using some sort of PEFT (e.g., LoRA) or by using something like AlignProp during training although i have never gotten AlignProp to work myself.

segyges.bsky.social

i agree that they are using it to tag data but i am real iffy on any sourcing that they're using rlhf specifically, which is a different thing than just generically finetuning

segyges.bsky.social

huh, you are correct, sometimes they are. this is an oddity to me. are there any models besides sdxl that are known to use rlhf as opposed to finetuning directly on some more curated set?

natolambert.bsky.social

One of the things that drives me crazy as a hole in current RLHF literature. Human preferences: high noise, low bias. LLM as a judge: low noise, high bias. We don’t know what this bias is. Drives me crazy. A lot of it seems like LLMs are better, but there is something we haven't measured.

segyges.bsky.social

rlhf is LLMs, very different domain but same issue where your smaller curated dataset ends up defining your house style

theophite.bsky.social

RLHF is not that but you get the basic idea
