hey validators,
let's make solana faster by enabling xdp.
xdp improves turbine packet handling by cutting packet overhead, copies and context switches. more shred propagation headroom = better validator performance under load and more room for solana to scale.
add:
--experimental-poh-pinned-cpu-core <core> \
--experimental-retransmit-xdp-cpu-cores <core> \
--experimental-retransmit-xdp-zero-copy
for bnxt_en (copy mode, no zero-copy flag), add instead:
--experimental-poh-pinned-cpu-core <core> \
--experimental-retransmit-xdp-cpu-cores <core> \
--experimental-retransmit-xdp-interface <physical_interface>
extend your systemd service:
AmbientCapabilities=CAP_NET_RAW CAP_NET_ADMIN CAP_BPF CAP_PERFMON
CapabilityBoundingSet=CAP_NET_RAW CAP_NET_ADMIN CAP_BPF CAP_PERFMON
LimitMEMLOCK=2000000000
notes:
- don't run poh and xdp on the same physical core
- keep poh out of the L3 / numa domain that the xdp cores share
- validate your actual topology with lscpu -e/hwloc or solanahcl/solanahcl/blob/main/public/topology/
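a rough way to check the first note is to compare the smt sibling lists that linux exposes in sysfs. this is only a sketch: `same_physical_core` and `CPU_SYSFS` are names made up for this example, and it only catches the same-core case, not the shared L3 / numa case.

```shell
# Root of the CPU sysfs tree; overridable so the logic can be tried on a fake tree.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# Succeeds (exit 0) when logical CPUs $1 and $2 report the same SMT sibling list,
# i.e. they are threads of one physical core. Returns 2 if sysfs cannot be read.
same_physical_core() {
  a=$(cat "$CPU_SYSFS/cpu$1/topology/thread_siblings_list") || return 2
  b=$(cat "$CPU_SYSFS/cpu$2/topology/thread_siblings_list") || return 2
  [ "$a" = "$b" ]
}
```

call it with your poh core and each xdp core, e.g. `same_physical_core 2 26 && echo "same physical core, move one"` (the core numbers here are placeholders).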
easy rule of thumb:
- 7965wx: 24c across 4x ccds -> 6 cores/ccd. first ccd: cores 0–5, next ccd starts at core 6
- 9375f: 32c across 8x ccds -> 4 cores/ccd. first ccd: cores 0–3, next ccd starts at core 4
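the rule of thumb above is just integer division. a minimal sketch, assuming cores are numbered linearly ccd by ccd (it ignores smt sibling numbering, so still confirm with lscpu -e); `ccd_of_core` is a made-up helper name:

```shell
# Print the CCD index for a core, assuming linear core numbering per CCD.
# $1 = core index, $2 = cores per CCD (6 on the 7965wx, 4 on the 9375f)
ccd_of_core() {
  echo $(( $1 / $2 ))
}

ccd_of_core 5 6   # 7965wx: core 5 is still on the first ccd (index 0)
ccd_of_core 6 6   # 7965wx: core 6 starts the next ccd (index 1)
ccd_of_core 4 4   # 9375f: core 4 starts the next ccd (index 1)
```

if the poh core and the xdp cores come out with different ccd indexes, they should not be sharing an L3 under this assumption.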
common xdp-capable drivers:
i40e, ixgbe, ice, igc,
mlx5_core, mlx4_core,
bnxt_en
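to see which driver your nic uses and whether it is on the list above, something like this should work. it is a sketch: `iface_driver` and `is_xdp_driver` are helper names invented here, and the list is only the one from this message, not a complete list of xdp-capable drivers.

```shell
# Print the kernel driver name for interface $1, as reported by ethtool -i.
iface_driver() {
  ethtool -i "$1" | awk '/^driver:/ {print $2}'
}

# Succeeds when the driver name in $1 is one of the drivers listed above.
is_xdp_driver() {
  case "$1" in
    i40e|ixgbe|ice|igc|mlx5_core|mlx4_core|bnxt_en) return 0 ;;
    *) return 1 ;;
  esac
}
```

usage: `is_xdp_driver "$(iface_driver <physical_interface>)" && echo "on the list"`.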
don't use a bonded interface with xdp.
for broadcom / bnxt_en: don't enable zero-copy. copy mode is the safe path.
if you run into "huge page alloc failed", remove the zero-copy flag and set the xdp interface to the physical interface:
--experimental-retransmit-xdp-interface <physical_interface>
check the ring sizes: they need to be a power of 2
IFACE=$(ip route get 1.1.1.1 | awk '{for(i=1;i<=NF;i++) if($i=="dev") print $(i+1)}')
sudo ethtool -g "$IFACE"
# if you see values like rx: 511 or tx: 511, that's likely the issue.
sudo ethtool -G "$IFACE" rx 512 tx 512
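if you want to script the power-of-2 check instead of eyeballing it, here is a small sketch. `is_pow2` and `next_pow2` are made-up helper names, and rounding up is an assumption: the result still has to fit under the "Pre-set maximums" that ethtool -g prints for your nic.

```shell
# Succeeds when $1 is a positive power of two (exactly one bit set).
is_pow2() {
  [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Print the smallest power of two that is >= $1.
next_pow2() {
  n=1
  while [ "$n" -lt "$1" ]; do n=$(( n * 2 )); done
  echo "$n"
}

is_pow2 511 || echo "511 is not a power of 2, try $(next_pow2 511)"
```

feed it the current rx/tx values from ethtool -g and pass the suggested size to ethtool -G.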
restart and voila, you are not a potato anymore.