Inspired by @InstLatX64, today I'm introducing the #SiliconGang Microarchitecture Cheat Sheet: bit.ly/2JTplfJ
It is viewable by everyone and collects CPU μarch design details in one place: caches, buffers, instruction width, etc.
Some notes below:
This is a nice trick called LoopFrog
dl.acm.org/doi/full/10.1145/…
Using an LLVM-based compiler to insert hints, we achieve a geometric mean loop speedup of 43%, translating to whole-program speedups of 9.5% on SPEC CPU 2017 benchmarks, with only modest area and power overheads.
1/3
LoopFrog runs multiple loop iterations from a single thread in parallel within the microarchitecture. The core can spawn future loop iterations as new microarchitectural threadlets based on compiler-inserted hints, which can leapfrog execution beyond the parent thread’s...
2/3
...instruction window, exposing a new, medium-grained parallelism, orthogonal to traditional ILP and TLP. LoopFrog monitors data dependencies between executing threadlets, forwards data for true dependencies and squashes speculative threadlets on ordering violations.
3/3
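A toy sketch of the squash-on-violation idea described above, not the paper's mechanism. Everything here is an assumption for illustration: the names (`run_iteration`, `run_speculative`, `width`), the batch-style execution, and the memory model (a dict defaulting to 0). It leaves out the data forwarding LoopFrog does for true dependencies and only models detecting a violation and re-running the threadlet.

```python
def run_iteration(body, i, state):
    """Run iteration i against `state`; return (read_set, write_buffer)."""
    reads, writes = set(), {}

    def load(addr):
        if addr in writes:          # an iteration sees its own earlier stores
            return writes[addr]
        reads.add(addr)
        return state.get(addr, 0)

    def store(addr, val):
        writes[addr] = val          # buffered, not yet visible to others

    body(i, load, store)
    return reads, writes


def run_sequential(body, n, mem):
    """Reference: plain in-order execution of n iterations."""
    mem = dict(mem)
    for i in range(n):
        _, writes = run_iteration(body, i, mem)
        mem.update(writes)
    return mem


def run_speculative(body, n, mem, width=4):
    """Run `width` iterations per batch as speculative 'threadlets'."""
    mem = dict(mem)
    squashes = 0
    for start in range(0, n, width):
        batch = range(start, min(start + width, n))
        snapshot = dict(mem)        # every threadlet starts from the same state
        results = {i: run_iteration(body, i, snapshot) for i in batch}
        written = set()             # addresses committed earlier in this batch
        for i in batch:             # commit strictly in program order
            reads, writes = results[i]
            if reads & written:     # read a stale value: ordering violation
                squashes += 1
                reads, writes = run_iteration(body, i, mem)  # squash and redo
            mem.update(writes)
            written.update(writes)
    return mem, squashes


def independent(i, load, store):    # b[i] = a[i] + 1, no cross-iteration deps
    store(("b", i), load(("a", i)) + 1)


def prefix_sum(i, load, store):     # s[i] = s[i-1] + a[i], loop-carried dep
    store(("s", i), load(("s", i - 1)) + load(("a", i)))


mem0 = {("a", i): i for i in range(8)}
for body in (independent, prefix_sum):
    spec_mem, squashes = run_speculative(body, 8, mem0)
    assert spec_mem == run_sequential(body, 8, mem0)
    print(body.__name__, "squashes:", squashes)
```

The independent loop commits every threadlet untouched, while the prefix sum keeps getting squashed, which is the case where forwarding would matter in the real design.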
dl.acm.org/doi/epdf/10.1145/…
"SHADOW: Simultaneous Multi-Threading Architecture with Asymmetric Threads"
"dynamically balances ILP and TLP by executing out-of-order and in-order threads simultaneously on the same core"
Could this be what AheadComputing is working on?
#CPU #μarch
dl.acm.org/doi/epdf/10.1145/…
"ATR: Out-of-Order Register Release Exploiting Atomic Regions"
Interesting μarch idea here. Instructions that have finished execution but have not yet been committed, and that do not contain conditional branches, can free up their PRF entry early.
#CPU #μarch
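A rough sketch of how one might spot early-release candidates in a straight-line instruction trace. This is my reading of the one-line summary above, not the paper's actual mechanism: it assumes a value is a candidate when its architectural register is redefined before any conditional branch appears. `Instr` and `early_release_candidates` are hypothetical names, and the sketch ignores exceptions and whether consumers have actually read the value yet.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    dst: Optional[str]              # architectural destination, None if no result
    srcs: Tuple[str, ...] = ()      # architectural sources
    is_branch: bool = False         # conditional branch ends a branch-free region


def early_release_candidates(prog):
    """Indices of instructions whose result register might be freed early.

    Assumption: a result qualifies if the same architectural register is
    redefined later with no conditional branch in between; otherwise the
    register waits for the usual release at commit.
    """
    candidates = []
    for d, ins in enumerate(prog):
        if ins.dst is None:
            continue
        for later in prog[d + 1:]:
            if later.is_branch:
                break               # region boundary reached first
            if later.dst == ins.dst:
                candidates.append(d)
                break
    return candidates


prog = [
    Instr("r1", ("r2", "r3")),          # 0: r1 defined
    Instr("r4", ("r1", "r1")),          # 1: r4 defined, consumes r1
    Instr("r1", ("r4", "r2")),          # 2: r1 redefined, no branch since 0
    Instr(None, ("r1",), is_branch=True),  # 3: conditional branch
    Instr("r4", ("r1",)),               # 4: r4 redefined, but after the branch
]
print(early_release_candidates(prog))
```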
Apple A19 Pro die shot
Die Size 98.68mm²
P-Core 2.966mm²
P-Core with L2 & Shared Logic 5.486mm²
E-Core 0.782mm²
E-Core with L2 & Shared Logic 2.217mm²
SLC 11.026mm²
Somehow looks to be smaller than the A18 Pro (~104mm²)
#Apple #iPhone #CPU #A19 tieba.baidu.com/p/1020632072…
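For a sense of scale, here is a quick calculation of each block's share of the die, using only the areas listed above. The helper name `share_of_die` is made up for illustration, and the A18 Pro figure is the approximate ~104mm² quoted above, so the shrink percentage is approximate too.

```python
DIE_MM2 = 98.68          # A19 Pro die size from the die shot
A18_PRO_MM2 = 104.0      # approximate figure quoted in the post

areas_mm2 = {
    "P-Core": 2.966,
    "P-Core with L2 & Shared Logic": 5.486,
    "E-Core": 0.782,
    "E-Core with L2 & Shared Logic": 2.217,
    "SLC": 11.026,
}


def share_of_die(area_mm2, die_mm2=DIE_MM2):
    """Fraction of the total die taken by one block."""
    return area_mm2 / die_mm2


for name, area in areas_mm2.items():
    print(f"{name}: {share_of_die(area):.1%} of die")

# Relative shrink versus the approximate A18 Pro die size.
shrink = 1 - DIE_MM2 / A18_PRO_MM2
print(f"A19 Pro vs A18 Pro: ~{shrink:.1%} smaller")

# Size ratio of a bare P-core to a bare E-core.
print(f"P-core / E-core area: {areas_mm2['P-Core'] / areas_mm2['E-Core']:.2f}x")
```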
ARM's new μarches added to the block diagram repo
Info is scarce this year; these diagrams will be updated and mistakes corrected once more devices are released
μarch Block Diagrams: bit.ly/32qLLew
μarch Cheat Sheet: bit.ly/2JTplfJ #ARM #CPU #MicroArchitecture
.@Apple A19 SoC chip analysis based on images by @chipwise_tech: 2 P-cores with 8MB shared L2$, 4 E-cores with 4MB shared L2$, an 8-core NPU (Apple calls it 16-core), 2x 6MB System Level Cache (SLC) and a 5-core GPU. All on TSMC's N3P and smaller than the previous A18.
Geekerwan's Initial Review of the A19 is out - bilibili.com/video/BV1cBp4zQ…
P Core has ~12% more perf and a 6% increase in IPC
E Core has ~25% more perf and a 17% increase in IPC
If these results are accurate then the E core is simply a monster at this point.
#iPhone #A19 #CPU #Apple
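A back-of-the-envelope split of those gains, assuming single-thread perf scales as IPC × clock (a simplification; the review may measure things differently). `implied_clock_gain` is a hypothetical helper, and the inputs are just the rounded percentages quoted above.

```python
def implied_clock_gain(perf_gain, ipc_gain):
    """Clock gain implied by perf = IPC * clock.

    Both arguments are fractions, e.g. 0.12 for +12%.
    """
    return (1 + perf_gain) / (1 + ipc_gain) - 1


# (perf gain, IPC gain) as quoted for the A19
a19 = {"P core": (0.12, 0.06), "E core": (0.25, 0.17)}

for name, (perf, ipc) in a19.items():
    clock = implied_clock_gain(perf, ipc)
    print(f"{name}: implied clock gain ~{clock:.1%}")
```

Gains compound multiplicatively rather than add, so the implied clock bump is slightly less than the simple difference between the perf and IPC percentages.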
Geekerwan review for A18 is out
P core is confirmed as the same core present in the M4, but clocked at ~4.04GHz
E core is the same as the E core in the A17, but clocked at ~2.4GHz
Power efficiency seems improved as well
youtu.be/QK_t1LfEmBA?si=t9dk… #iPhone #A18 #Review #CPU
So P core is ~15% faster compared to A17 (7% clock speed, 7% IPC)
E core is ~14% faster compared to A17 (all from clock speed)
All in all this seems like a solid generational improvement; total power consumption has dropped as well