Fabian Giesen


Pinned Tweet

Fabian Giesen @rygorous

18 Nov 2022

-> mastodon.gamedev.place/@rygo…

Fabian Giesen (@rygorous@mastodon.gamedev.place)

12K Posts, 73 Following, 4.22K Followers · Abstraction maker, abstraction breaker. FUN FACT: things I prefix with FUN FACT are sometimes fun and sometimes factual, but very rarely both.

mastodon.gamedev.place

Fabian Giesen @rygorous

24 Dec 2024

New blog post: "UNORM and SNORM to float, hardware edition" fgiesen.wordpress.com/2024/1…

UNORM and SNORM to float, hardware edition

I mentioned in a previous post that doing exact UNORM or SNORM conversions to float in hardware was not particularly expensive, but didn’t go into detail how. Let’s rectify that! (If yo…

fgiesen.wordpress.com


Fabian Giesen @rygorous

26 Nov 2024

We just released Oodle 2.9.13. Significantly increased BC7 encoding speed (about 20-25% encode time reduction for non-RDO on typical content, 25-30% encode time reduction for RDO) at slightly increased quality. Also several bug fixes and experimental WASM 64-bit support.


Fabian Giesen @rygorous

7 Nov 2024

New blog post: "Exact UNORM8 to float" fgiesen.wordpress.com/2024/1… a satisfying solution to a problem that, quite possibly, nobody has

Exact UNORM8 to float

GPUs support UNORM formats that represent a number inside [0,1] as an 8-bit unsigned integer. In exact arithmetic, the conversion to a floating-point number is straightforward: take the integer and…

fgiesen.wordpress.com
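
The excerpt above is cut off, so the following is only a minimal reference sketch and not the method from the post. It assumes the usual UNORM8 definition (integer divided by 255) and emulates float32 rounding with `struct`, since Python floats are doubles. The helper names `to_float32` and `unorm8_to_float` are hypothetical.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def unorm8_to_float(u: int) -> float:
    """Reference UNORM8 -> float32 conversion (assumed definition: u / 255)."""
    if not 0 <= u <= 255:
        raise ValueError("UNORM8 input must be in [0, 255]")
    # Divide in double precision first, then round the quotient to float32.
    return to_float32(u / 255.0)


# Full lookup table for all 256 possible inputs.
table = [unorm8_to_float(u) for u in range(256)]
```

A 256-entry table like `table` is a handy baseline for checking any faster conversion exhaustively, because the input space is so small.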


Fabian Giesen @rygorous

4 Nov 2024

New blog post: "BC7 optimal solid-color blocks" fgiesen.wordpress.com/2024/1… clearing out my "I should write this up" queue, this technique is from... *checks git logs* May 2017. Oh my. (I have quite the backlog.)

BC7 optimal solid-color blocks

That’s right, it’s another texture compression blog post! I’ll keep it short. By “solid-color block”, I mean a 4×4 block of pixels that all have the same color. A…

fgiesen.wordpress.com


Fabian Giesen @rygorous

26 Oct 2024

New blog post: "Why those particular integer multiplies?" fgiesen.wordpress.com/2024/1… some explanation and some speculation on the integer SIMD multiplies offered in x86, along with some history

Why those particular integer multiplies?

The x86 instruction set has a somewhat peculiar set of SIMD integer multiply operations, and Intel’s particular implementation of several of these operations in their headline core designs ha…

fgiesen.wordpress.com


Fabian Giesen @rygorous

25 Oct 2024

New blog post: "Inserting a 0 bit in the middle of a value" fgiesen.wordpress.com/2024/1… I guess it's 2-for-1 bit hacks week.

Inserting a 0 bit in the middle of a value

This one originally came up for me in Oodle Texture’s BC7 decoder. In the BC7 format, each pixel within a 4×4 block can choose from a limited set of between 4 to 16 colors (ignoring some…

fgiesen.wordpress.com
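
As a rough illustration of the operation named in the title, here is a sketch of two ways to insert a 0 bit at a given position of a non-negative integer. The function names and the `pos` parameter are hypothetical, and this is not claimed to be the exact code from the post. The second variant relies on the fact that adding the high part of `x` to `x` doubles that part, which shifts it left by one bit.

```python
def insert_zero_bit_shift(x: int, pos: int) -> int:
    """Insert a 0 bit at bit index `pos` using explicit shifts and masks."""
    low_mask = (1 << pos) - 1
    low = x & low_mask            # bits below `pos` stay where they are
    high = x >> pos               # bits at or above `pos` move up by one
    return (high << (pos + 1)) | low


def insert_zero_bit_add(x: int, pos: int) -> int:
    """Same result with one mask and one add: x + (bits of x at or above pos)."""
    low_mask = (1 << pos) - 1
    return x + (x & ~low_mask)


print(bin(insert_zero_bit_add(0b1111, 2)))  # 0b11011
```

Both variants assume `x >= 0` and `pos >= 0`. In a fixed-width language the top bit of the input must be clear, or it is shifted out.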


Fabian Giesen @rygorous

24 Oct 2024

New blog post: "Zero or sign extend" fgiesen.wordpress.com/2024/1…

Zero or sign extend

A while back I had to deal with a bit-packed format that contained a list of integer values encoded in one of a pre-defined sets of bit widths, where both the allowed bit widths and the signed-ness…

fgiesen.wordpress.com
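
For context, a minimal sketch of the two operations in the title, applied to a field of a given bit width. It uses the well-known XOR-and-subtract form of sign extension. The function names and the `bits` parameter are hypothetical, and the post itself may use a different formulation.

```python
def zero_extend(x: int, bits: int) -> int:
    """Keep the low `bits` bits of x and treat them as an unsigned value."""
    return x & ((1 << bits) - 1)


def sign_extend(x: int, bits: int) -> int:
    """Interpret the low `bits` bits of x as a two's complement value."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    sign_bit = 1 << (bits - 1)
    x = zero_extend(x, bits)
    # Flipping the sign bit and then subtracting it maps [2^(bits-1), 2^bits)
    # onto the negative range and leaves smaller values unchanged.
    return (x ^ sign_bit) - sign_bit


print(sign_extend(0b100, 3))  # -4
print(zero_extend(0b100, 3))  # 4
```

Because the only difference between the two paths is the XOR and subtract with `sign_bit`, a decoder can select signedness per field by passing either `sign_bit` or 0 into the same expression.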


Fabian Giesen @rygorous

30 Jan 2024

We released Oodle 2.9.12 last week: radgametools.com/oodlehist.h… Some SDK/compiler updates and bug fixes. Also, max texture size limit bumped from 16384x16384 to 2097152x2097152, which should be good for at least the next 4 months or so.


Fabian Giesen @rygorous

30 Oct 2023

New blog post: "Entropy decoding in Oodle Data: x86-64 6-stream Huffman decoders" fgiesen.wordpress.com/2023/1…

Entropy decoding in Oodle Data: x86-64 6-stream Huffman decoders

It’s been a while! Last time, I went over how the 3-stream Huffman decoders in Oodle Data work. The 3-stream layout is what we originally went with. It gives near-ideal performance on the las…

fgiesen.wordpress.com


Fabian Giesen @rygorous

9 Oct 2023

We just released Oodle 2.9.11 (website isn't updated yet, soon!) This one is focused on Oodle Texture improvements.
* Faster end-to-end latency in multi-threaded encoding especially on 24 core machines, most noticeable for BC[145].


Fabian Giesen @rygorous

9 Oct 2023

* Re-designed mode/partition selection logic in baseline BC7 encoder (no RDO). Roughly 2x faster at all encoding effort levels at typically same or better result quality. For BC7 RDO, works out to around a 1.2x speed-up typically.


Fabian Giesen @rygorous

9 Oct 2023

* AVX-512 paths are significantly faster on AMD Zen 4 CPUs (avoid memory-destination forms on VPCOMPRESSD). This affects mainly BC1 and 3. Plus a bunch of SDK/compiler version updates and smaller fixes!


Fabian Giesen @rygorous

14 Jun 2023

Oodle Texture PSA: if you're on a Zen 4 machine, in current releases, encode textures with OodleTex_BCNFlag_AvoidWideVectors (disables usage of AVX-512 instructions). Some of the hot AVX-512 loops heavily use the store forms of VPCOMPRESSD which are quite slow on Zen 4.


Fabian Giesen @rygorous

14 Jun 2023

Workaround (VPCOMPRESSD to reg + separate store) will ship in the next release, is essentially perf-neutral on Intel. FWIW, even with the fix, the AVX-512 kernels are pretty much tied on speed with AVX2 on Zen4 anyway. (Although I suspect they are a bit more power-efficient.)


Fabian Giesen retweeted

REVISION @revision_party

12 May 2023

The world has dimmed for us. With sadness and his loved ones in our hearts, we say farewell to our friend and fellow main orga - @acrydfr. More than 25 years of demoparties wouldn't have been the same without him. And will never be again. We miss you, Ben. pouet.net/topic.php?which=12…


Fabian Giesen @rygorous

7 May 2023

New blog post: "A very brief BitKnit retrospective" fgiesen.wordpress.com/2023/0… Small codec for a special-purpose application that was only interesting by itself for a relatively short time, but ended up influencing LZNA, Kraken, Mermaid and Leviathan

A very brief BitKnit retrospective

UPDATE May 7, 2023: I wrote this post yesterday somewhat in a huff (for reasons not worth going into) and the original post contained several inaccuracies. These have been corrected in this version…

fgiesen.wordpress.com


Fabian Giesen @rygorous

21 Apr 2023

Oodle 2.9.10b was released earlier this week.
- Data: Mermaid Optimal1 and higher levels compress much faster (>2x is typical in our tests)
- Data: Selkie, Kraken, Leviathan Optimal1 also compress faster, but less drastically so


Fabian Giesen @rygorous

21 Apr 2023

Disclaimer on the Mermaid speedup: the ~2x speedup is for 256k chunked encoding in a throughput-bound scenario (e.g. going wide on many chunk encodes at once, our typical use case). Results will vary with larger chunks or no chunking (more bottlenecked on match finding) or


Fabian Giesen @rygorous

21 Apr 2023

when encoding a single continuous stream and measuring latency. (Bottlenecked by critical path latency not overall time spent in optimal parse portion of encoder.)
