"Superoptimization has emerged as a promising paradigm for automatically discovering fast tensor programs without relying on manually specified rules."
1
1
2
243
Replying to @SpudSecurity
Correct. Gonna "upgrade" Lunduke from a mute to a block. #IAmIntel but still recognize ARM architecture and superoptimization (among countless other things) as massively important to the field as a whole and in the latter case personally to me. Whole generations have been ...
1
4
318
Replying to @muke1010101
i doubt that superoptimization is really viable for these use cases. the vast majority of the hand-rolled assembly functions are too long considering the exponential time complexity. formally verifying the existing impl or just giving up the performance are your real options.
1
4
416
Apr 1
Replying to @muke1010101
would a richer type system help *that* much for their purposes? from my understanding they're usually just working with arrays of numbers. lifetime stuff would help, but i have to imagine there's a bit more to what they're doing in the optimized versions than __restrict__ annotations would fix. obviously superoptimization would prob be great here though, yeah
1
8
372
Replying to @stylewarning
Harvesting the fruits of superoptimization mostly. GCC has quite a few intelligent people working on it as well. I'm sure LLVM does too, but they don't seem to be touching its overall design much.
2
914
Replying to @justalexoki
I can, tho. x.com/WarrenInTheBuff/status… typefuckery functor endofunctor bifunctor profunctor natural-transformation dinatural-transformation adjunction monad comonad applicative alternative arrow kleisli-category cofree free-monad free-object initial-algebra terminal-coalgebra algebra coalgebra f-algebra catamorphism anamorphism hylomorphism paramorphism apomorphism zygomorphism recursion-scheme fixpoint least-fixpoint greatest-fixpoint domain-theory cpo complete-lattice galois-connection yoneda-lemma yoneda-embedding representable-functor kan-extension left-kan-extension right-kan-extension limit colimit pullback pushout equalizer coequalizer product coproduct exponential object classifier subobject-topos topos cartesian-closed-category monoidal-category symmetric-monoidal-category braided-monoidal-category enriched-category higher-kinded-type kind-polymorphism type-constructor algebraic-data-type generalized-algebraic-data-type dependent-type inductive-type coinductive-type refinement-type linear-type affine-type session-type effect-system type-inference hindley-milner polymorphism parametricity ad-hoc-polymorphism typeclass coherence normalization strong-normalization weak-normalization confluence subject-reduction progress preservation curry-howard correspondence lambda-calculus simply-typed-lambda-calculus polymorphic-lambda-calculus system-f system-f-omega dependent-lambda-calculus pi-calculus actor-model operational-semantics denotational-semantics axiomatic-semantics big-step-semantics small-step-semantics abstract-machine cek-machine cesk-machine continuation continuation-passing-style defunctionalization closure conversion ssa-form control-flow-graph dataflow-analysis liveness-analysis dominance frontier register-allocation graph-coloring instruction-selection partial-evaluation supercompilation staging metaprogramming hygienic-macros macro-expansion type-safety memory-model borrow-checker ownership variance subtyping covariance contravariance 
invariance unification higher-order-unification constraint-solving abstract-interpretation widening narrowing bisimulation logical-relations step-indexing model-checking theorem-proving proof-assistant sequent-calculus natural-deduction intuitionistic-logic linear-logic modal-logic homotopy-type-theory univalence cubical-type-theory rewriting-system term-rewriting confluence-critical-pair nominal-techniques debruijn-index alpha-equivalence beta-reduction eta-expansion thunk laziness strictness evaluation-strategy call-by-value call-by-name call-by-need normalization-by-evaluation deforestation fusion transformation-pipeline intermediate-representation bytecode jit-compilation ahead-of-time-compilation garbage-collection tracing-collector generational-collector escape-analysis closure-lifting monomorphization specialization parametric-polymorphism algebraic-effects effect-handler delimited-continuation trampolining tail-call-optimization stack-safety guarded-recursion sized-types categorical-semantics fibrations comprehension-category presheaf sheaf adjoint-functor-theorem beck-chevalley-condition distributive-law initial-object terminal-object comma-category slice-category coslice-category profunctor-optics lens prism traversal isomorphism end monoid monoid-object group-object ring-object semiring-object enriched-functor natural-isomorphism equivalence-of-categories duality contravariant-functor representability skolemization lambda-lifting closure-analysis strict-positivity positivity-check universe-polymorphism impredicativity predicativity strong-induction structural-induction guarded-corecursion productivity canonicity normalization-proof categorical-logic doctrine hyperdoctrine tripos realizability gluing-construction logical-framework module-system separate-compilation incremental-compilation whole-program-optimization effect-polymorphism row-polymorphism kind-inference bidirectional-typechecking elaboration proof-irrelevance extensionality intensionality 
observational-equivalence contextual-equivalence compiler-correctness logical-bisimulation refinement-calculus weakest-precondition strongest-postcondition partial-correctness total-correctness syntactic-sugar desugaring core-language bootstrapping self-hosting cross-compilation link-time-optimization dead-code-elimination common-subexpression-elimination constant-folding loop-invariant-code-motion strength-reduction escape-continuation algebraic-subtyping bounded-quantification f-bounded-polymorphism higher-rank-types impure-semantics purity referential-transparency equational-reasoning free-theorem parametric-models categorical-combinatorics string-diagram adjoint-equivalence oplax-functor lax-functor pseudofunctor bicategory 2-category infinity-category operad monoidal-closed-category traced-monoidal-category compact-closed-category dagger-category profunctor-composition yoneda-reduction ends coends dinaturality algebraic-compactness recursive-domain fixed-point-combinator y-combinator z-combinator logical-consistency cut-elimination proof-normalization categorical-duality dual-object dual-functor morphism epimorphism monomorphism isomorphism automorphism naturality-square commutative-diagram coherence-law associativity-unit-law triangle-identity pentagon-identity whiskering horizontal-composition vertical-composition functoriality naturality-condition type-erasure runtime-system foreign-function-interface undefined-behavior memory-safety soundness completeness decidability undecidability halting-problem rice-theorem church-encoding scott-encoding tagless-final initial-encoding final-encoding algebraic-presentation categorical-model syntactic-category semantic-domain normalization-strategy evaluator interpreter compiler-pipeline parser combinator-parser context-free-grammar dependent-pattern-matching total-language partial-language proof-search constraint-kind-polymorphism categorical-product categorical-coproduct exponential-adjunction curry-uncurry 
adjoint-transpose unit-counit triangular-identity universal-property factorization-system orthogonality reflective-subcategory coreflective-subcategory monadicity comonadicity algebraic-theory lawvere-theory kleene-algebra cartesian-fibration opfibration indexed-category reindexing substitution weakening contraction exchange structural-rule metavariable higher-inductive-type path-type identity-type transport substitution-lemma canonicity-theorem progress-theorem preservation-theorem logical-framework lf-twelf agda coq idris lean redex semantics-engine abstract-syntax concrete-syntax parsing-expression-grammar scope resolution name-binding type-environment kind-system proof-term normalization-by-reduction logical-encoding categorical-structure compiler-backend compiler-frontend intermediate-language core-calculus effect-tracking region-inference borrow-semantics memory-layout calling-convention register-spilling code-generation pipeline-parallelism speculative-optimization devirtualization inline-expansion dead-store-elimination alias-analysis points-to-analysis shape-analysis abstract-domain fixpoint-iteration widening-operator narrowing-operator constraint-graph saturation saturation-based-prover rewriting-logic categorical-rewriting higher-category-theory syntactic-monoid algebraic-effects-and-handlers effect-row polymorphic-recursion impredicative-polymorphism predicative-polymorphism proof-by-reflection dependent-elimination universe-hierarchy cumulative-universes type-level-programming kind-level-programming datatype-generic-programming generic-deriving structural-recursion guardedness-check termination-check productivity-check semantic-preservation bisimulation-up-to contextual-refinement compiler-verification translation-validation proof-carrying-code certified-compilation superoptimization equality-saturation e-graph congruence-closure SMT-solving SAT-solving constraint-propagation incremental-solving symbolic-execution abstract-machine-semantics 
categorical-semantics-of-computation lambda-encoding combinatory-logic sk-combinator bckw-combinators cartesian-combinator linear-combinator resource-semantics geometry-of-interaction game-semantics realizability-semantics domain-equation inverse-limit bilimit colimit-preservation limit-preservation functor-category natural-transformation-category exponential-object-adjunction adjoint-triple categorical-quantifier existential-type universal-type recursive-type iso-recursive-type equi-recursive-type type-equality definitional-equality propositional-equality judgmental-equality normalization-by-hereditary-substitution hereditary-substitution logical-predicate reducibility-candidate proof-irrelevant-proposition sigma-type pi-type w-type m-type container-type polynomial-functor analytic-functor combinatorial-species algebraic-ornament ornamentation universe-level cumulativity coherence-condition abstraction-barrier module-functor applicative-functor generative-functor separate-typechecking incremental-typechecking proof-automation tactic-engine elaborator reflection reification quotation splicing staging-annotation effect-capability capability-safety object-capability-model region-polymorphism linear-logic-semantics categorical-abstract-machine semantics-preserving-transformation control-operator shift-reset callcc prompt-control delimited-control algebraic-control-operator continuation

Replying to @justalexoki
Chi... nah I can't do it
1
3
293
The original post of mine was a thinly disguised "limits of superoptimization" post. I've been working off and on on a series of superoptimizers for years and at some stage will go all-in on that. There are definitely elements of recursive self-improvement in code synthesis ...
1
6
605
Replying to @doodlestein
You are not using the term "superoptimization", so I just want to ensure you know about it. en.wikipedia.org/wiki/Supero…
2
105
12 Dec 2025
Replying to @NoahChrein
Throw in a formal specification for the hardware to further constrain the search space, then maybe hierarchical superoptimization. Make worst case execution time the ceiling and you eliminate jitter.
1
2
78
Look at superoptimization; if you throw genetic algos or other general search methods at trying to find the fastest possible machine code for solving your problem, you can find stuff that's 2-5x faster than the code LLVM vomits up. And a superoptimizer might only be ~10kloc rather than ~28Mloc like LLVM. The tradeoff is time; superoptimizers are SLOW and might take minutes to optimize a single line of code. But you can cache the results! Run the optimizer on your code overnight, or over the weekend, and come back to a ridiculously fast codebase. Whenever you compile you pull from the cache, and only the not-yet-optimized newly written code is slow.
1
4
129
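A minimal sketch of the caching workflow described in the post above, assuming results are keyed by a hash of the source snippet. The `OptimizationCache` class, its method names, and `fake_slow_optimizer` are hypothetical stand-ins for illustration, not part of any real superoptimizer.

```python
import hashlib


def snippet_key(code: str) -> str:
    """Stable cache key for a piece of source text."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


class OptimizationCache:
    """Hypothetical cache: fast lookups at compile time, slow search offline."""

    def __init__(self):
        self.results = {}   # key -> optimized code
        self.pending = {}   # key -> code still waiting for the slow search

    def compile(self, code: str) -> str:
        """Return the cached optimized form, or the original code if none exists yet."""
        key = snippet_key(code)
        if key in self.results:
            return self.results[key]
        self.pending[key] = code  # queue it for the next offline run
        return code               # newly written code stays unoptimized for now

    def run_overnight(self, slow_optimizer) -> None:
        """Drain the queue with the expensive optimizer (minutes per snippet in practice)."""
        while self.pending:
            key, code = self.pending.popitem()
            self.results[key] = slow_optimizer(code)


def fake_slow_optimizer(code: str) -> str:
    # Placeholder for a real search; it only rewrites one pattern for the demo.
    return code.replace("x * 2", "x << 1")


cache = OptimizationCache()
first_build = cache.compile("y = x * 2")    # cache miss: unoptimized
cache.run_overnight(fake_slow_optimizer)
second_build = cache.compile("y = x * 2")   # cache hit: optimized
```

Only snippets that changed since the last offline run miss the cache, which is the tradeoff the post describes.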
1 Nov 2025
In SF for the next week. Let's chat if you care about type theory, matrix multiplication, constrained decoding, superoptimization, finite model theory, or discrete program synthesis. Broadly interested in accelerating GOFAI with TCS and massively parallel algorithms.
3
1
6
670
29 Oct 2025
I just recalled "HieraSynth: A Parallel Framework for Complete Super-Optimization with Hierarchical Space Decomposition", lsrcz.github.io/files/OOPSLA… (a bit different, but also attacking the superoptimization complexity problem; John Regehr even mentioned it on mastodon.social/@regehr/1152…)

1
2
61
24 Oct 2025
Replying to @shwestrick
Congrats! On a side note, chunking up and melding to alleviate exponential complexity sounds like it may be an interesting idea for superoptimization in general, wonder how broadly applicable it could be... // cc @geofflangdale
2
3
100
1 Oct 2025
superoptimization mentioned !
1 Oct 2025
HieraSynth: A Parallel Framework for Complete Super-Optimization with Hierarchical Space Decomposition lsrcz.github.io/files/OOPSLA… Sirui Lu, Rastislav Bodík @splashcon #OOPSLA 2025
3
215
I agree and I think that their welfare should be considered, at least in no small part because there may actually be something in common with their survival and human survival against generic superoptimization to Make Number Go Up.
2
30
A major point of this is that it would be true superoptimization - i.e. the sequences produced would be genuinely optimal. If the superoptimizer fails at length N, you at least learned something.
6
294
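A toy sketch of the "genuinely optimal" idea in the post above, assuming a made-up single-register 8-bit machine with six instructions (the instruction set, `superoptimize`, and `target` are all illustrative assumptions). Programs are enumerated in order of increasing length and checked against every 8-bit input, so the first match is the shortest over this instruction set, and a `None` result shows that no program of length at most `max_len` exists.

```python
from itertools import product

MASK = 0xFF  # toy 8-bit register, an assumption for illustration

# Hypothetical instruction set: each op maps the register value to a new value.
OPS = {
    "inc": lambda x: (x + 1) & MASK,
    "dec": lambda x: (x - 1) & MASK,
    "neg": lambda x: (-x) & MASK,
    "not": lambda x: (~x) & MASK,
    "shl": lambda x: (x << 1) & MASK,
    "shr": lambda x: x >> 1,
}


def run(program, x):
    """Execute a sequence of op names on the single register."""
    for name in program:
        x = OPS[name](x)
    return x


def superoptimize(target, max_len):
    """Return the shortest program equal to target on all 8-bit inputs, or None."""
    for length in range(max_len + 1):
        # len(OPS) ** length candidates: this is the exponential blow-up.
        for program in product(OPS, repeat=length):
            if all(run(program, x) == target(x) for x in range(MASK + 1)):
                return program
    return None  # exhaustive failure: nothing of length <= max_len works


def target(x):
    return (2 * x + 2) & MASK


best = superoptimize(target, max_len=3)       # ("inc", "shl")
too_short = superoptimize(target, max_len=1)  # None
```

The candidate count grows as `len(OPS) ** length`, which is why this exhaustive approach only scales to short sequences.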
I swear, soon someone is going to use deep learning to generate machine code. They'll beat LLVM by some modest amount but still fall very short of standard superoptimization techniques while using 10,000x more compute, and then will herald it as a gigantic breakthrough.
1
5
255
11 Jun 2025
Google published an LLM code superoptimization agent called AlphaEvolve. The paper comes from DeepMind and shows an agent that orchestrates an autonomous pipeline of LLMs using an evolutionary approach to improve itself. Pretty cool because they combine LLM creativity with code execution and automatic evaluation to come up with completely new algorithms that actually work. They tested it on multiple things across Google's stack and it:
- Developed a more efficient scheduling algorithm for compute jobs on Google's data centers
- Simplified the circuit design of hardware accelerators in their TPU circuits
- Sped up a Gemini kernel by 23% with a new heuristic, reducing Gemini's overall training time.
It's one of the first AI systems that genuinely advances the frontier of human knowledge instead of just automating what we already know.
1
1
6
617