Agreed. This conversation has me convinced not only that this isn't an ideal solution, but that ideal solutions are probably already being implemented as improved FP8 ops in new hardware. FP8s don't need to be accurate. But there are better ways to do this.
E.g. 1.875*1.875 seems to be among the worst cases for the freshly proposed algorithm in fp8, yielding >18% error (fp16 can have errors around 24%). The much older ApproxLP algorithm discussed in that earlier paper, which similarly removes multiplication, is <0.5% off for the same inputs.
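That worst case is easy to reproduce with a quick sketch. This assumes the proposal's "add the mantissas" rule, i.e. replacing the mantissa product with a constant offset 2^-l (l = 3 here, matching e4m3's 3-bit mantissa); that's my reading of the idea, not necessarily the exact hardware rule:

```python
import math

def lmul_approx(x: float, y: float, l: int = 3) -> float:
    """Approximate x*y by adding exponents and fractional mantissas,
    with a constant offset 2**-l standing in for the mantissa product."""
    fx, ex = math.frexp(x)          # x = fx * 2**ex, fx in [0.5, 1)
    fy, ey = math.frexp(y)
    mx = fx * 2 - 1                 # fractional mantissa in [0, 1)
    my = fy * 2 - 1
    mant = 1 + mx + my + 2 ** -l    # offset term replaces mx*my
    exp = (ex - 1) + (ey - 1)
    if mant >= 2:                   # carry spills into the exponent
        mant /= 2
        exp += 1
    return mant * 2 ** exp

exact = 1.875 * 1.875               # 3.515625
approx = lmul_approx(1.875, 1.875)  # 2.875
rel_err = abs(exact - approx) / exact
print(f"approx={approx}, exact={exact}, rel. error={rel_err:.1%}")  # ~18.2%
```

With both mantissas near their maximum (0.875), the dropped mx*my term is at its largest, which is why this input sits near the worst case.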
But according to their results there's minimal difference between this approximation (in fp8?) and regular fp16, which seems surprising. It also makes me wonder whether they chose to publish only the tests where the results were favorable.
I wouldn't trust something like this for anything higher than FP8. But FP8 and lower are great for inference. Just pretty useless for training!
The error for the same values would still be something like 5.5% for FP8_e4m3, I think. And while they mostly talk about FP8, there are all kinds of comparisons to FP16, and the claimed differences seem surprisingly small.
This paper is talking about FP8 and below. That's only useful for inference, not training. You need high precision for training.
The base model is FLUX.1 dev fp8.
First I picture the image in my head and write a sentence. I run it through DeepL. I tidy up the wording a bit. I decide on portrait or landscape, keeping in mind how much the image size pulls the result around. I hit the generate button and in about a minute it comes out like this. For details, see the ALT field or my posts on Tensor Art.
It's local, right? About the model: the FLUX model I'm using is flux1-dev-fp8, and according to Monmon-san, NSFW is restricted and can't be generated. Sure enough, even if I prompt for sheer panties, it stops at lace panties. I'm also looking for a FLUX model that can handle NSFW. No, no, if Yusao-san got serious, I'd definitely lose. (I've somehow turned this into a competition. Sorry.)