Pekka Lund (@pekka.bsky.social):

E.g. 1.875*1.875 seems to be among the worst-case scenarios with fp8 for the freshly proposed algorithm and results in >18% error (fp16 can have something like 24% errors). The even older ApproxLP algorithm discussed in that older paper, which similarly removes multiplication, is <0.5% off for that case.
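For reference, a minimal sketch of the kind of multiplication-free product being discussed, assuming it works like the L-Mul-style approximation x*y ≈ (1 + xm + ym + 2^-l) * 2^(xe+ye) with offset l = 3 for an fp8 (e4m3) mantissa. The function name and the offset choice are illustrative assumptions, not from the thread:

```python
import math

def lmul_approx(x: float, y: float, offset_bits: int) -> float:
    """Approximate x*y by adding mantissa fractions and exponents
    (positive normal inputs only; a sketch, not a full implementation)."""
    xm, xe = math.frexp(x)             # x = xm * 2**xe, with 0.5 <= xm < 1
    ym, ye = math.frexp(y)
    fx, fy = 2 * xm - 1, 2 * ym - 1    # rewrite as (1 + f) * 2**(e - 1)
    return (1 + fx + fy + 2 ** -offset_bits) * 2 ** (xe - 1 + ye - 1)

exact  = 1.875 * 1.875                      # 3.515625
approx = lmul_approx(1.875, 1.875, 3)       # assumed fp8 e4m3 offset -> 2.875
print(approx, abs(approx - exact) / exact)  # relative error ~18.2%
```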


nafnlaus.bsky.social:

I don't have time to dig today, but what's the error for downsampling to fp8_e4m3 and doing the multiplications in it? That's the real comparison. There are already huge errors vs. staying at high precision. And I just don't see the errors mattering much in inference (in training, definitely).
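A rough sketch of that comparison for the same input pair, using a hand-rolled round-to-nearest with 3 stored mantissa bits rather than a real e4m3 library; it ignores subnormals and the e4m3 max-value clamp, and the round_e4m3 name is illustrative:

```python
import math

def round_e4m3(x: float) -> float:
    """Round to the nearest fp8 e4m3 value (3 stored mantissa bits).
    Back-of-the-envelope only: ignores subnormals, NaN and saturation."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))          # abs(x) lies in [2**(e-1), 2**e)
    step = 2.0 ** (e - 4)              # value spacing within that binade
    return math.copysign(round(abs(x) / step) * step, x)

x, y = 1.875, 1.875
xq, yq = round_e4m3(x), round_e4m3(y)   # both happen to be exactly representable
exact = x * y                           # keep the multiply at high precision
quant = round_e4m3(xq * yq)             # multiply, then store the result as e4m3
print(quant, abs(quant - exact) / exact)
```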

harjunmaa.bsky.social:

The ChatGPT versions marked with "o" use these approximate values.
