Corey Hackworth

@CorwinTheGrey

Jun 9

FastMath is such an important tool for my own kid's progress. It has me wanting a version that does Calculus content for myself.

matheo @SauvageMatheo

Jun 7

Replying to @FastBonus_

Fastmath

Tanya Argibay

@TeamUp2Lead

May 31

Congratulations to our Intermediate Math Teachers & Stallions who went for the GOLD and reached FAST Math Levels 4 & 5!🏅🐎 Watching our Stallions grow in confidence and achievement has been incredible. So proud of each and every one of you! 💚🎉 @browardschools @BCPS_South @ElementaryBCPS @StirlingElem #GoForTheGold #StallionsLeadTheWay #FASTMath #AcademicExcellence #ProudMathCoach

applesorce @applesorce

May 31

Replying to @gasshi47

It's fastmath, nice.

Kevin John Parrish

@kparrish51

Apr 26

Replying to @Riazi_Cafe_en

Advanced Numerical Continuous Fourier Transform in Python (2025–2026 State-of-the-Art)

Recent academic work (2025) emphasizes Clenshaw–Curtis product-integration (exponential quadrature) for highly oscillatory or decaying integrals like Fourier transforms, delivering spectral accuracy and stability far beyond simple Riemann sums. Complementary libraries like FINUFFT (Non-Uniform FFT) combine arbitrary high-order quadrature with O(N log N) evaluation at millions of frequencies, outperforming naive FFT or direct summation for continuous FTs (especially non-uniform or singular functions).

On the performance side, GitHub's fastest implementations replace NumPy/SciPy FFT with:
• pyFFTW (v0.15.1, Oct 2025): direct FFTW wrapper (the gold-standard C library), multi-threaded, 2–10× faster than NumPy.fft with minimal code changes.
• Rocket-FFT (Dec 2025): Numba JIT support for numpy.fft/scipy.fft, ideal for small-to-medium arrays in loops.
• FINUFFT (pip install finufft): CPU (or GPU via cuFINUFFT) for true continuous FT approximation at production speed.

1. Fastest Drop-in FFT Upgrade (pyFFTW + Rocket-FFT + Numba)

Replace the original scaled-FFT demo with this production-ready version. It uses pyFFTW for raw speed and Rocket-FFT + Numba for JIT in custom workflows.

    import numpy as np
    import pyfftw
    import numba as nb
    import rocket_fft  # noqa: F401  (pip install rocket-fft; lets Numba compile np.fft calls)
    import matplotlib.pyplot as plt

    # Enable multi-threading and the pyFFTW plan cache for repeated calls
    pyfftw.interfaces.cache.enable()
    pyfftw.config.NUM_THREADS = 8  # adjust to your cores

    def f(t):
        return np.exp(-np.pi * t**2)  # Gaussian test (exact FT known)

    # Parameters
    N = 4096
    T = 20.0
    dt = T / N
    t = np.linspace(-T/2, T/2, N, endpoint=False)  # centered grid
    y = f(t)

    # === PYFFTW (fastest CPU FFT) ===
    def continuous_ft_pyfftw(y, dt):
        # ifftshift puts the t = 0 sample first; without it the centered
        # grid multiplies the spectrum by an alternating (-1)^k phase
        yf = np.fft.fftshift(
            pyfftw.interfaces.numpy_fft.fft(np.fft.ifftshift(y))
        ) * dt
        freq = np.fft.fftshift(np.fft.fftfreq(len(y), dt))
        return freq, yf

    freq, yf_pyfftw = continuous_ft_pyfftw(y, dt)

    # === NUMBA ROCKET-FFT (JIT for repeated calls / loops) ===
    @nb.njit(fastmath=True)
    def continuous_ft_numba(y, dt, N):
        # manual shifts for speed (N is even, so roll matches ifftshift/fftshift)
        yf = np.fft.fft(np.roll(y, -(N // 2))) * dt
        yf = np.roll(yf, N // 2)
        freq = np.fft.fftfreq(N, dt)  # freq calc is cheap
        freq = np.roll(freq, N // 2)
        return freq, yf

    # First call compiles; subsequent calls are blazing fast
    freq_nb, yf_nb = continuous_ft_numba(y.astype(np.complex128), dt, N)

    # Exact reference
    exact = np.exp(-np.pi * freq**2)

    # Plot (error typically < 1e-10 for this Gaussian)
    plt.figure(figsize=(9, 5))
    plt.plot(freq, np.real(exact), 'k-', lw=2, label='Exact')
    plt.plot(freq, np.real(yf_pyfftw), 'r--', label='pyFFTW (FFT)')
    plt.plot(freq, np.real(yf_nb), 'b:', label='Numba Rocket-FFT')
    plt.xlim(-5, 5)
    plt.xlabel(r'Frequency $\nu$')
    plt.ylabel(r'Real part of $\hat{f}(\nu)$')  # raw string so \n in \nu is not a newline
    plt.legend()
    plt.title('Continuous FT – 2026 Fastest Implementations')
    plt.grid(True)
    plt.show()

Performance gain: pyFFTW routinely beats NumPy by 3–10×; Rocket-FFT + Numba shines for arrays < 10k or inside loops (compilation ~200 ms).

2. Advanced Quadrature (Clenshaw–Curtis Exponential from 2025 Paper)

The original tfquad (left Riemann) is replaced by Clenshaw–Curtis exponential quadrature for superior accuracy on oscillatory/decaying functions.
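The scaled-FFT approximation described in this reply can be sanity-checked with NumPy alone before switching to a faster backend. The following is a minimal sketch, not the poster's code: the function name `continuous_ft` and the variable `max_err` are illustrative, and the `ifftshift` on the input is an assumption about how the centered time grid should be handled so that no extra phase factor appears.

```python
import numpy as np


def continuous_ft(y, dt):
    """Approximate the continuous Fourier transform of samples y taken on a
    centered, uniform grid with spacing dt (illustrative helper)."""
    n = y.shape[0]
    # Move the t = 0 sample to index 0, transform, scale by dt,
    # then reorder the output so frequencies run from negative to positive.
    spectrum = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(y))) * dt
    freq = np.fft.fftshift(np.fft.fftfreq(n, dt))
    return freq, spectrum


# Same test setup as the reply: a Gaussian whose transform is known exactly.
N, T = 4096, 20.0
dt = T / N
t = np.linspace(-T / 2, T / 2, N, endpoint=False)

freq, spectrum = continuous_ft(np.exp(-np.pi * t**2), dt)
exact = np.exp(-np.pi * freq**2)

# Largest absolute deviation from the analytic transform.
max_err = float(np.max(np.abs(spectrum - exact)))
print(f"max abs error: {max_err:.2e}")
```

Having a plain-NumPy baseline like this makes it easier to confirm that a pyFFTW or Numba variant returns the same spectrum, not just a faster one.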

shikihuiku @shikihuiku

Apr 16

Replying to @dgtanaka @ProjectAsura @iwasakiCGtech

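On a separate note: inside shaders, much as with fastmath, there are basically no clear rules on operation order or fused operations, and on top of that values are rounded with FlushToZero, so please keep that in mind.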
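Why unspecified operation order and flush-to-zero matter can be shown with ordinary double-precision arithmetic. This is a small sketch in Python rather than shader code: the values are chosen for illustration, and `flush_to_zero` is a hypothetical helper that only emulates the hardware behavior.

```python
import math
import sys

# Floating-point addition is not associative, so a compiler that is free to
# reorder operations (as under fast-math) can change the result.
a, b, c = 1e17, -1e17, 1.0
left_to_right = (a + b) + c  # a and b cancel first, then c is added
reassociated = a + (b + c)   # c is absorbed into b before the cancellation


def flush_to_zero(x):
    """Emulate flush-to-zero: subnormal magnitudes become a signed zero."""
    if 0.0 < abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x


# A subnormal double: smaller than the smallest normal value, but not zero.
tiny = sys.float_info.min / 4

print(left_to_right, reassociated)
print(tiny, flush_to_zero(tiny))
```

The two sums differ because `b + c` rounds back to `b` at that magnitude, and the subnormal value survives in default IEEE arithmetic but vanishes once flushing is applied.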

matheo @SauvageMatheo

Mar 26

Replying to @FastBonus_ @Roobet

Fastmath

Rachel

@rachelgoodlad

Mar 10

My son told me today his new favorite subject is Math, but “when I say that, I *really* mean FastMath.”

matheo @SauvageMatheo

Mar 9

Replying to @FastBonus_ @Roobet

Fastmath

matheo @SauvageMatheo

Mar 8

Replying to @FastBonus_ @Roobet

Fastmath

Bluelight

@bluelightct

Mar 4

Replying to @sarah_cone

Hi Sarah, love to hear that it got your daughter excited about math! I made one of those new FastMath apps (the one with the ghosties). More coming from us at @PlaycademyEdu / @Superbuilders !

Kevin D. Keck

@kdkeck

Mar 4

Replying to @sarah_cone

Which FastMath?

matheo @SauvageMatheo

Feb 26

Replying to @FastBonus_ @Roobet

Fastmath

matheo @SauvageMatheo

Feb 12

Replying to @FastBonus_ @Roobet

Fastmath

matheo @SauvageMatheo

Feb 11

Replying to @FastBonus_ @Roobet

Fastmath @0xxghost

matheo @SauvageMatheo

Feb 10

Replying to @FastBonus_ @Roobet

Fastmath @Apiii52

GCC - GNU Toolchain

@gnutools

Feb 9

GNU Tools Weekly News Week 24 (February 8, 2026)

Release updates for GNU toolchain:
* binutils 2.46 was released
  * inbox.sourceware.org/binutil…
* gcc 16.0 regression status
  * P1: 29
  * P1 + P2 + P3 total: 860

General/big GNU toolchain news:
* GCC easy issue to tackle of the week:
  * New feature this week. This is a highlight of one bug report that would be a good issue for someone new to GCC to fix
  * If others want to sponsor an issue please let me know and I can add that one for the week (gdb, binutils and glibc issues welcome too)
  * lower mempcpy to memcpy when result is unused
  * gcc.gnu.org/bugzilla/show_bu…
  * Reach out to Andrew Pinski for mentoring on this issue

GCC commits:
* note most of the commits from now until the release are bug fixes which have not been listed here normally
* doc: Move parameter docs to the GCC internals manual
* Removal of CONST_CAST and related macros (C++ification cleanup)

GCC discussion:
* GCC bugzilla stats
  * 118 new issues filed
  * 136 issues closed

glibc commits:
* math: Order signed zeros in f{max,min}{,mag}{f,l,f128}
* math: Optimize f{max,min}imum{,_num,_mag,_mag_num}{f,l,f128}
* AArch64: Optimize memcpy for Kunpeng 950 processor
* AArch64: Add if('fastmath') to math-vector-fortran.h

binutils/gdb commits:
* PowerPC: Support for Elliptic Curve Cryptography Instructions (RFC02669)
* [gold] Note gold and dwp deprecation in NEWS
* bpf: add may_goto instruction

matheo @SauvageMatheo

Feb 9

Replying to @FastBonus_ @Roobet

Fastmath

Quant Beckman

@quantbeckman

Jan 3

About measuring latency correctly (inspired by a follower's question):
- Fix CPU/OS variability before measuring (performance governor, pinned core, stable frequency).
- Disable/limit background noise (other processes, updates, indexing, antivirus, cloud agents).
- Separate hot-path timing from setup/validation/logging.
- Use realistic inputs (sizes, dtypes, contiguity) and the real access pattern.
- Control randomness (fixed seeds) and log the exact environment (CPU, threads, versions).
- Run enough repetitions to see tails; report p50/p90/p99/p999 and max.
- Use wall-time and CPU-time when useful (to detect blocking/preemption).
- Measure allocations and temporaries (profile memory traffic, not just runtime).
- Check for hidden multithreading and pin/set thread counts explicitly.
- Compare one big call vs. many small calls (overhead vs. throughput tradeoff).
- Benchmark end-to-end (queue → compute → output), not only the kernel in isolation.
- Track regression baselines (store past results; detect performance drift).
- Validate correctness under optimization (fastmath, fused kernels) with numerical tolerances.
- Use flame graphs/profilers only as guidance; confirm with microbenchmarks and counters.
- Correlate latency spikes with system events (context switches, IRQs, page faults).
- Use hardware counters to classify bottlenecks (cache misses, branch misses, stalled cycles).
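Several of the points above (warm up first, time only the hot path, fix the seed, report tail percentiles and the maximum) can be combined into a small harness. This is a rough sketch under simple assumptions: `measure_latency` is a hypothetical helper, the repeat and warm-up counts are arbitrary, and `np.sort` on a 4096-element array merely stands in for a real workload.

```python
import time

import numpy as np


def measure_latency(fn, arg, repeats=2000, warmup=100):
    """Time fn(arg) repeatedly and summarize the latency distribution in ns."""
    # Warm-up runs are excluded so caches and lazy setup do not skew results.
    for _ in range(warmup):
        fn(arg)

    samples = np.empty(repeats, dtype=np.int64)
    for i in range(repeats):
        start = time.perf_counter_ns()  # monotonic, nanosecond resolution
        fn(arg)                         # only the hot path is timed
        samples[i] = time.perf_counter_ns() - start

    p50, p90, p99, p999 = np.percentile(samples, [50, 90, 99, 99.9])
    return {
        "p50": float(p50),
        "p90": float(p90),
        "p99": float(p99),
        "p999": float(p999),
        "max": float(samples.max()),
    }


# Fixed seed so every run measures the same input.
rng = np.random.default_rng(42)
payload = rng.standard_normal(4096)

stats = measure_latency(np.sort, payload)
for name, value in stats.items():
    print(f"{name:>4}: {value:,.0f} ns")
```

Reporting the full set of percentiles rather than a mean keeps rare slow calls visible, which is usually what a latency budget is about.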

Quant Beckman

@quantbeckman

29 Dec 2025

Some thoughts about JIT / native code (what actually gets you closer to micros):
- Numba:
  a. @njit(cache=True, fastmath=True) (use fastmath carefully due to accuracy).
  b. Stick to native types (NumPy arrays) and avoid Python objects inside.
  c. parallel=True only if you know it helps; for latency it can hurt due to scheduling overhead.
- Cython:
  a. Type everything (cdef), disable boundscheck/wraparound, use memoryviews.
- C/C++/Rust extension:
  a. For the true hot core: SIMD (AVX2/AVX-512), intrinsics, and careful alignment/cache use.
- PyPy isn't usually the path to "nano," and mixing with NumPy isn't ideal.
- ctypes/cffi can help, but the Python↔C boundary still costs; do big calls, not lots of tiny ones.
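The Numba advice can be paired with a tolerance check so that fastmath's relaxed floating-point rules do not go unnoticed. The sketch below makes a few assumptions: `sum_of_squares` is an illustrative kernel, the no-op `njit` fallback exists only so the example still runs where Numba is not installed, and `cache=True` is left out here because on-disk caching generally needs the function to live in a source file.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba (plain Python, much slower).
    def njit(*args, **kwargs):
        def wrap(fn):
            return fn
        return wrap


@njit(fastmath=True)
def sum_of_squares(x):
    """Typed loop over a NumPy array; no Python objects inside the kernel."""
    total = 0.0
    for i in range(x.shape[0]):
        total += x[i] * x[i]
    return total


# Fixed seed for a reproducible input.
rng = np.random.default_rng(0)
data = rng.standard_normal(100_000)

# Compare against a NumPy reference with a relative tolerance, since
# fastmath may reorder the additions and change the last few bits.
reference = float(np.dot(data, data))
result = float(sum_of_squares(data))
rel_err = abs(result - reference) / abs(reference)
print(f"relative error vs NumPy: {rel_err:.2e}")
```

A relative tolerance is used instead of exact equality because any reordering of the summation is expected to perturb the result slightly.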
