Pekka Lund (@pekka.bsky.social):

E.g. 1.875*1.875 seems to be among the worst-case scenarios with fp8 for the freshly proposed algorithm and results in >18% error (fp16 can have something like 24% errors). The even older ApproxLP algorithm discussed in that older paper, which similarly removes multiplication, is <0.5% off for that case.
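For reference, a minimal sketch of the kind of multiplication-free product being discussed, assuming it works like the L-Mul-style approximation x*y ≈ (1 + xm + ym + 2^-l) * 2^(xe+ye) with offset l = 3 for an fp8 (e4m3) mantissa. The function name and the offset choice are illustrative assumptions, not from the thread:

```python
import math

def lmul_approx(x: float, y: float, offset_bits: int) -> float:
    """Approximate x*y by adding mantissa fractions and exponents
    (positive normal inputs only; a sketch, not a full implementation)."""
    xm, xe = math.frexp(x)             # x = xm * 2**xe, with 0.5 <= xm < 1
    ym, ye = math.frexp(y)
    fx, fy = 2 * xm - 1, 2 * ym - 1    # rewrite as (1 + f) * 2**(e - 1)
    return (1 + fx + fy + 2 ** -offset_bits) * 2 ** (xe - 1 + ye - 1)

exact  = 1.875 * 1.875                      # 3.515625
approx = lmul_approx(1.875, 1.875, 3)       # assumed fp8 e4m3 offset -> 2.875
print(approx, abs(approx - exact) / exact)  # relative error ~18.2%
```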


nafnlaus.bsky.social:

I don't have time to dig today, but what's the error for downsampling to fp8_e4m3 and doing the multiplications in it? That's the real comparison. There are already huge errors vs. staying at high precision. And I just don't see the errors mattering much in inference (in training, definitely).
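A rough sketch of that comparison for the same input pair, using a hand-rolled round-to-nearest with 3 stored mantissa bits rather than a real e4m3 library; it ignores subnormals and the e4m3 max-value clamp, and the round_e4m3 name is illustrative:

```python
import math

def round_e4m3(x: float) -> float:
    """Round to the nearest fp8 e4m3 value (3 stored mantissa bits).
    Back-of-the-envelope only: ignores subnormals, NaN and saturation."""
    if x == 0.0:
        return 0.0
    _, e = math.frexp(abs(x))          # abs(x) lies in [2**(e-1), 2**e)
    step = 2.0 ** (e - 4)              # value spacing within that binade
    return math.copysign(round(abs(x) / step) * step, x)

x, y = 1.875, 1.875
xq, yq = round_e4m3(x), round_e4m3(y)   # both happen to be exactly representable
exact = x * y                           # keep the multiply at high precision
quant = round_e4m3(xq * yq)             # multiply, then store the result as e4m3
print(quant, abs(quant - exact) / exact)
```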

harjunmaa.bsky.social:

The ChatGPT versions marked with "o" use these approximate values.
