Does finetuning on arXiv dramatically increase benchmark contamination? Surely all the ablation studies and examples in paper appendices are a source of leakage.
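
One common heuristic for spotting this kind of leakage is word n-gram overlap between benchmark items and training documents. Below is a minimal sketch. The function names are hypothetical, and the default n=13 is an assumed example value, not a standard you have to use.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of a text (punctuation is dropped)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def is_contaminated(benchmark_item: str, training_doc: str, n: int = 13) -> bool:
    """Flag the benchmark item if it shares any word n-gram with the training doc."""
    return bool(ngrams(benchmark_item, n) & ngrams(training_doc, n))
```

Run it over every benchmark item against each arXiv paper, appendix included. Any hit is a candidate leak that still needs a manual look.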

Vipitis

Vipitis @Vipitis

16 Jul 2024

New LLM benchmark idea: blind debugging. The model can request specific inputs and receives the corresponding outputs. Based on that information alone, can the model 1. predict the algorithm and 2. propose a bug fix for it?
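
Here is a rough sketch of what a harness for this could look like, assuming integer-to-integer black boxes and a fixed query budget. The class and method names are hypothetical. Step 1 (naming the algorithm) would still need a human or judge model. This only scores step 2 by comparing the proposed fix against a hidden reference on held-out inputs.

```python
from typing import Callable


class BlindDebugTask:
    """Hypothetical task: a hidden buggy black box plus a hidden correct reference."""

    def __init__(self, buggy: Callable[[int], int], reference: Callable[[int], int],
                 max_queries: int = 10):
        self._buggy = buggy
        self._reference = reference
        self.max_queries = max_queries
        self.log: list[tuple[int, int]] = []  # (input, output) pairs the model has seen

    def query(self, x: int) -> int:
        """The model requests an input and only ever sees the buggy output."""
        if len(self.log) >= self.max_queries:
            raise RuntimeError("query budget exhausted")
        y = self._buggy(x)
        self.log.append((x, y))
        return y

    def score_fix(self, candidate: Callable[[int], int], held_out: list[int]) -> float:
        """Fraction of held-out inputs on which the proposed fix matches the reference."""
        hits = sum(candidate(x) == self._reference(x) for x in held_out)
        return hits / len(held_out)


# Example: the hidden algorithm is "sum of 1..n" with an off-by-one bug.
task = BlindDebugTask(lambda n: sum(range(n)), lambda n: sum(range(n + 1)))
seen = [task.query(x) for x in range(1, 5)]  # what the model gets to look at
print(seen, task.score_fix(lambda n: n * (n + 1) // 2, list(range(20))))
```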

Vipitis @Vipitis

2 Jul 2024

"see less often" should instead be "show me never again"

Vipitis @Vipitis

17 Jun 2024

Battlemage before December 👀

Vipitis @Vipitis

16 Jun 2024

Shadertoy buffers are RGBA with 32-bit floats per channel, so a single pixel can easily hold the position (two channels) and velocity (two channels) of a particle in a 2D simulation. Experiments with 1 particle: shadertoy.com/view/lXK3WV
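
The real thing is a GLSL buffer pass. This Python sketch only emulates the idea for one particle: each frame reads last frame's RGBA pixel, integrates, and writes it back, with values rounded to 32-bit precision like a float buffer channel. The channel layout (rg = position, ba = velocity), the edge-bounce rule and the function names are assumptions for illustration, not what the linked shader does.

```python
import struct


def to_f32(v: float) -> float:
    """Round a Python float (64-bit) to 32-bit precision, like a float buffer channel."""
    return struct.unpack("f", struct.pack("f", v))[0]


def store_pixel(x: float, y: float, vx: float, vy: float) -> tuple[float, ...]:
    """Pack one particle into one RGBA pixel: rg = position, ba = velocity."""
    return tuple(to_f32(c) for c in (x, y, vx, vy))


def step(pixel: tuple[float, ...], dt: float, width: float, height: float) -> tuple[float, ...]:
    """One frame: read the previous pixel, integrate, bounce off the edges, write back."""
    x, y, vx, vy = pixel
    x += vx * dt
    y += vy * dt
    if not 0.0 <= x <= width:   # left/right wall: flip x velocity, clamp position
        vx = -vx
        x = min(max(x, 0.0), width)
    if not 0.0 <= y <= height:  # top/bottom wall: flip y velocity, clamp position
        vy = -vy
        y = min(max(y, 0.0), height)
    return store_pixel(x, y, vx, vy)


pixel = store_pixel(320.0, 180.0, 90.0, -45.0)
for _ in range(120):  # two seconds at 60 fps
    pixel = step(pixel, 1 / 60, 640.0, 360.0)
print(pixel)
```

Scaling up to N particles would mean N pixels, one particle each. In Shadertoy the buffer feeding back into itself plays the role of the loop.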

Vipitis @Vipitis

15 Jun 2024

print(f"{s=}") > printf(s)
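
For anyone who hasn't used it: since Python 3.8, a trailing = inside an f-string replacement field prints the expression text followed by the value's repr. That is what makes it handier than a bare printf for quick debugging:

```python
s = "hello"
print(f"{s=}")         # s='hello'   (expression text, '=', then repr of the value)
print(f"{len(s) = }")  # len(s) = 5  (whitespace around '=' is kept as written)
print(f"{s=!s}")       # s=hello     (!s switches the value to str() instead of repr())
```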

Vipitis @Vipitis

8 Jun 2024

Stochastic binary search where you sample against a normal distribution?
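
One possible reading, which is an assumption since the post doesn't say what gets sampled: instead of always probing the midpoint, draw the probe index from a normal distribution centred on the midpoint of the current range, then clamp it back into range. The function name and the sigma_frac parameter are hypothetical.

```python
import random
from typing import Optional, Sequence


def stochastic_binary_search(items: Sequence[int], target: int,
                             sigma_frac: float = 0.25,
                             rng: Optional[random.Random] = None) -> int:
    """Return an index of target in sorted items, or -1 if it is absent.

    The probe index is drawn from N(mid, (sigma_frac * (hi - lo))^2), rounded
    and clamped to [lo, hi], instead of always being the midpoint.
    """
    rng = rng or random.Random()
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) / 2
        sigma = sigma_frac * (hi - lo)
        probe = round(rng.gauss(mid, sigma)) if sigma > 0 else round(mid)
        probe = min(max(probe, lo), hi)  # never probe outside the live range
        if items[probe] == target:
            return probe
        if items[probe] < target:
            lo = probe + 1
        else:
            hi = probe - 1
    return -1
```

It always terminates, because every probe removes at least one index from [lo, hi]. With sigma_frac = 0 it reduces to ordinary binary search. Larger values trade away some of the halving guarantee for randomness.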

Vipitis @Vipitis

27 Apr 2024

The worst solution is always the one you aren't using.

Vipitis @Vipitis

22 Apr 2024

Did you know that Intel's ExtraSS implementation is open source? It seemingly contains the scripts to generate the dataset, the model architecture, and the training code... 👀 github.com/ltkong218/IFRNet/…

Vipitis @Vipitis

21 Apr 2024

It works on my other machine.

Vipitis @Vipitis

9 Apr 2024

In times when everything is an "LLM", where is the love for the lLM, the little language model?

Vipitis @Vipitis

8 Apr 2024

When is Melville Sound?

Vipitis @Vipitis

5 Mar 2024

Any alarmist blog post or sentient-machine fiction will end up in training data and cause language models to act that way. If you want models to be friendlier, write about "helpful AI"; then a system prompt telling a model 'you are a helpful AI assistant' will actually make it better.

Vipitis @Vipitis

2 Mar 2024

Intel seems to be dropping some Gaudi (3?) news in the coming weeks. Just in time for GDC 🤔

Vipitis @Vipitis

27 Feb 2024

where does the dependency spiral begin?

Vipitis @Vipitis

22 Feb 2024

We need an evaluation benchmark for overdone safety or overdone diversity tuning. It should specifically target API products, where you pay for access and don't know what preprocessing is done on your inputs.

Vipitis @Vipitis

16 Feb 2024

Will replacing the multi-billion-dollar movie industry help Sam get the $7T?