𝗧𝗼𝗽 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗲𝘀 𝘁𝗼 𝗜𝗺𝗽𝗿𝗼𝘃𝗲 𝗔𝗣𝗜 𝗣𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲
𝟭. 𝗖𝗮𝗰𝗵𝗶𝗻𝗴
A cache hit never touches the database. On a miss, you query the DB and write to cache so the next caller doesn't pay the same cost.
The part that engineers usually get wrong is invalidation. TTL is easy to implement and will absolutely serve stale data at the worst moment. Event-driven invalidation is accurate, but now you have a new thing that can break.
𝟮. 𝗖𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻 𝗣𝗼𝗼𝗹𝗶𝗻𝗴
When you open a new connection, for each request, a few things happen: TCP handshake, TLS, Auth, etc. This takes 50–200ms, or even more. Pool your connections.
𝟯. 𝗔𝘃𝗼𝗶𝗱 𝗡 𝟭 𝗤𝘂𝗲𝗿𝗶𝗲𝘀
Every slow codebase I've worked in had this problem. You fetch a list of records, then loop through them and query related data for each. And it works fine locally with 10 rows, but in production with 2,000, it's 2,001 database round-trips per request.
We can fix this with one JOIN. Also, we need to index the columns in your WHERE clause.
Before you change anything, run a profiler and verify this is actually the problem. I've assumed N 1 before and been wrong.
𝟰. 𝗣𝗮𝘆𝗹𝗼𝗮𝗱 𝗖𝗼𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝗼𝗻
This is something usually forgotten. A 120 KB JSON response becomes roughly 18 KB. If you're not doing this, you're sending the client unnecessary work.
You can choose Brotli or gzip.
𝟱. 𝗔𝘀𝘆𝗻𝗰 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴
Operations that take seconds don't belong inside an HTTP response. Return 202, put the job on a queue, process it in the background, fire a webhook when it's done. Your p99 will thank you.
𝟲. 𝗛𝗧𝗧𝗣/𝟮
HTTP/1.1 runs one request at a time per connection. That made sense in 1997. HTTP/2 multiplexes everything over a single TCP connection, allowing all requests to be in flight at once, with header compression on top. If your infrastructure supports it and you haven't switched, worth looking at why not.
𝟳. 𝗕𝗮𝘁𝗰𝗵𝗶𝗻𝗴
Ten API calls are 10 round-trips, but also 10 times the latency cost. Let clients bundle operations into one request and process them in parallel on the server.
In REST: POST /batch or GET /users?ids=1,2,3.
GraphQL handles this without you having to think about it.
𝟴. 𝗠𝗲𝗮𝘀𝘂𝗿𝗲 𝗙𝗶𝗿𝘀𝘁
Set up OpenTelemetry and look at actual traces before touching anything. I've watched teams spend weeks optimizing the wrong layer.
Here is a real example: API handler: 12ms, network: 113ms, DB query: 680ms. Everyone was looking at the API layer, but the problem was that it was sitting in the database the whole time.
An afternoon of instrumentation would have shown them that almost immediately.
𝟵. 𝗣𝗮𝗴𝗶𝗻𝗮𝘁𝗶𝗼𝗻
Never return 10,000 rows. Return 20 with a cursor for the next page.
Offset pagination scans and discards rows on every call, at page 500 you're scanning 10,000 rows to show 20. Cursor-based picks up exactly where it left off.