🛠️ DevLog: Gateway Capacity Tracking Is Now in Testing
A quick update on the capacity-aware gateway work from the last devlog.
🔹 Current progress
The first version is now implemented and being tested across the hosted API path.
The gateway can now coordinate shared router-session usage through Redis, so multiple gateway instances have better awareness of which sessions are already busy before routing more traffic.
🔹 What changed
Core gateway updates:
- Redis-backed session capacity tracking
- per-session concurrency limits
- existing routing flow preserved
- capacity/runtime status exposed through management endpoints
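For readers who want a feel for the idea, here is a minimal sketch of lease-based capacity tracking with a per-session concurrency limit. It is not the gateway's actual code: the class `SessionCapacityTracker`, the function `pick_session`, and every parameter name are hypothetical, and an in-memory dict stands in for Redis so the example runs on its own. In a real multi-replica setup the lease table would live in the shared store, and the check-then-add step would need to be atomic there.

```python
import time
import uuid


class SessionCapacityTracker:
    """Hypothetical tracker; a dict stands in for the shared Redis state."""

    def __init__(self, max_concurrency, lease_ttl, clock=time.monotonic):
        self.max_concurrency = max_concurrency  # per-session concurrency limit
        self.lease_ttl = lease_ttl              # seconds until a lease expires
        self.clock = clock                      # injectable for testing
        self.leases = {}                        # session_id -> {lease_id: expiry}

    def _prune(self, session_id):
        # Drop expired leases so a crashed gateway cannot hold a slot forever.
        now = self.clock()
        held = self.leases.setdefault(session_id, {})
        for lease_id in [l for l, expiry in held.items() if expiry <= now]:
            del held[lease_id]
        return held

    def try_acquire(self, session_id):
        # Return a lease id if the session has room, otherwise None.
        held = self._prune(session_id)
        if len(held) >= self.max_concurrency:
            return None
        lease_id = uuid.uuid4().hex
        held[lease_id] = self.clock() + self.lease_ttl
        return lease_id

    def release(self, session_id, lease_id):
        # Releasing an unknown or already expired lease is a no-op.
        self.leases.get(session_id, {}).pop(lease_id, None)

    def occupancy(self, session_id):
        # Number of live (unexpired) leases on the session.
        return len(self._prune(session_id))


def pick_session(tracker, session_ids):
    """Try the least-busy session first; return (session_id, lease_id) or None."""
    for session_id in sorted(session_ids, key=tracker.occupancy):
        lease_id = tracker.try_acquire(session_id)
        if lease_id is not None:
            return session_id, lease_id
    return None
```

A gateway using something like this would call `pick_session` before forwarding a request and `release` once the response completes. The lease TTL acts as a safety net for requests that never release their slot.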
Ops UI updates:
- admin view for capacity mode, tracked sessions, lease TTL, and Redis state
- dedicated Sessions view for live Cortensor session usage
- visibility into live occupancy, recent traffic, labels, latency, token usage, and gateway hit distribution
🔹 Why this matters
This helps the hosted API path handle bursts more safely.
With multiple gateway replicas, traffic can now be coordinated against the same shared session pool instead of blindly overloading the same router session.
This gives us:
- better load protection
- clearer real-time session visibility
- easier debugging during stress tests
- more predictable hosted inference traffic distribution
🔹 What comes next
This is not full queue-based admission control yet.
Next, we'll look into the queue layer so that overload behavior is safer when individual sessions are busy or every session is at capacity.
The goal is:
- stronger backpressure
- clearer overload handling
- more predictable burst behavior
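To make "backpressure" concrete, here is one possible shape for a bounded admission queue. This is purely illustrative and not a description of the planned design: `AdmissionQueue`, `max_waiting`, and the three outcome strings are all assumptions made for this sketch.

```python
from collections import deque


class AdmissionQueue:
    """Hypothetical bounded queue: dispatch, hold, or reject each request."""

    def __init__(self, max_waiting):
        self.max_waiting = max_waiting  # cap on queued requests (assumed knob)
        self.waiting = deque()

    def admit(self, request_id, has_capacity):
        # Dispatch immediately only when capacity is free and nobody is
        # already waiting, so queued requests keep their place in line.
        if has_capacity and not self.waiting:
            return "dispatch"
        # Otherwise hold the request if the queue still has room.
        if len(self.waiting) < self.max_waiting:
            self.waiting.append(request_id)
            return "queued"
        # Queue is full: fail fast instead of piling up more work.
        return "rejected"

    def next_ready(self):
        # Called when a session slot frees up; returns the oldest waiter.
        return self.waiting.popleft() if self.waiting else None
```

The bounded queue is the point of the sketch: once `max_waiting` is reached, callers get an explicit rejection rather than an ever-growing backlog, which is one common way to keep burst behavior predictable.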
#Cortensor #AIInfrastructure #DevLog #Inference #Gateway #DecentralizedAI