Pushing Python to 20,000 Requests/Second
October 7, 2025 · tjaycodes
Benchmarking Python's HTTP stack to its limits — comparing aiohttp, httpx, and custom connection pooling strategies to achieve 20,000 requests per second from a single process.
Originally published at https://tjaycodes.com/pushing-python-to-20000-requests-second/
Can you send 20,000 requests per second from a single Python application? That’s 1.2 million a minute and roughly 1.7 billion a day. Most developers would say no. Python is great, but it isn’t known for that kind of raw network performance.
I wanted to test that assumption. This article shows how I combined an async Python script with a Rust-based library and deep OS-level tuning to achieve that number. The full code and test setup are available on GitHub.
If you prefer to watch a video, I made a 3-minute walkthrough covering the same material.
The Right Tool for the Job: rnet
Standard Python libraries are great, but for extreme throughput, you need something designed for it. For this test, I used rnet, a Python networking library built on top of the Rust library wreq.
This hybrid approach gives you the best of both worlds: Python’s simple asyncio ergonomics for writing the test, and Rust’s raw speed for the networking underneath.
A key advantage of rnet is its robust TLS configuration, which is effective at bypassing Web Application Firewalls (WAFs) like Cloudflare that often challenge standard Python clients.
The Code: A Simple Async Worker
The client script itself is straightforward. It uses asyncio to create a pool of concurrent workers that send requests as fast as possible. The main logic involves creating rnet clients and gathering the tasks.
Python
# From send_request/rnet_test.py

async def run_load_test(wid, clients, counter, total_requests):
    local_success = 0
    local_fail = 0
    local_statuses = {}  # status code -> count
    local_errors = {}    # error type -> count
    i = 0

    while i < total_requests:
        try:
            # Round-robin each request across the pool of clients
            resp = await clients[i % len(clients)].get(url)
            # ... process response status into local_statuses ...
            local_success += 1
        except Exception as e:
            # ... record the error in local_errors ...
            local_fail += 1
        finally:
            i += 1
    return [local_success, local_fail, local_statuses, local_errors]

# ... main function sets up the asyncio loop and runs the test ...
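The main function is only sketched above, but the fan-out it performs is plain asyncio. Here is a self-contained sketch of that worker-pool pattern, with a stub client standing in for rnet so the example runs anywhere; the stub, worker count, and request total are illustrative, not the repo’s actual values:

```python
import asyncio

class StubClient:
    """Stands in for an rnet client; any object with an async get() works."""
    async def get(self, url):
        return {"status": 200}

async def worker(wid, clients, total_requests, url):
    success = fail = 0
    for i in range(total_requests):
        try:
            # Round-robin requests across this worker's client pool
            await clients[i % len(clients)].get(url)
            success += 1
        except Exception:
            fail += 1
    return success, fail

async def main(workers=4, clients_per_worker=2, total_requests=100):
    per_worker = total_requests // workers
    tasks = [
        worker(wid, [StubClient() for _ in range(clients_per_worker)],
               per_worker, "http://localhost:8080/")
        for wid in range(workers)
    ]
    # gather() runs every worker concurrently on one event loop
    results = await asyncio.gather(*tasks)
    return sum(s for s, _ in results), sum(f for _, f in results)

print(asyncio.run(main()))  # (100, 0) with the always-succeeding stub
```

In the real script each worker holds several rnet clients rather than one, which spreads requests over multiple connection pools.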
But the code is only a small part of the story. The real performance gains come from tuning the machines themselves.
The Secret Sauce: OS and Server Tuning
You cannot achieve this level of concurrency with default system settings. Your OS will start dropping connections long before your code breaks a sweat. Both the client machine sending the requests and the server receiving them needed significant tuning.
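One concrete example of those limits: every in-flight connection consumes a file descriptor, and the default per-process cap is often just 1024. From Python you can inspect the cap and raise the soft limit up to the hard limit without root (a small sketch; raising the hard limit itself requires root or the kind of tuning scripts shown next):

```python
import resource

# Each open socket costs one file descriptor; RLIMIT_NOFILE caps them.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"before: soft={soft} hard={hard}")

# Raise the soft limit as far as the hard limit allows (no root needed).
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"after:  soft={soft} hard={hard}")
```

At 20,000 requests per second, even brief connection reuse gaps can leave thousands of sockets open at once, so this ceiling is one of the first things to hit.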
Client-Side Tuning
This script configures the client machine to handle a massive number of outgoing connections.
client/tune_server.sh
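The script itself isn’t reproduced here, but client-side tuning for outgoing load typically targets the ephemeral port range, TIME_WAIT reuse, and the file-descriptor ceiling. An illustrative fragment in that spirit (values and exact knobs are examples, not the repo’s script):

```shell
#!/usr/bin/env bash
# Illustrative client-side tuning -- example values, not the repo's script.

# Widen the ephemeral port range so outgoing connections don't exhaust ports.
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Allow sockets stuck in TIME_WAIT to be reused for new outgoing connections.
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# Raise the system-wide and per-process file-descriptor ceilings.
sudo sysctl -w fs.file-max=1000000
ulimit -n 1000000
```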
Server-Side Tuning
The server needs to be ready to accept and process this flood of traffic.
remote/startup_script.sh
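Again, the actual script lives in the repo; on the receiving side the usual pressure points are the TCP accept queue and, as on the client, file descriptors. A sketch of typical knobs (example values, not the repo’s exact script):

```shell
#!/usr/bin/env bash
# Illustrative server-side tuning -- example values, not the repo's script.

# Lengthen the accept and SYN backlogs so bursts of new connections
# aren't silently dropped before the application can accept them.
sudo sysctl -w net.core.somaxconn=65535
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535

# Raise the file-descriptor ceiling for the server process.
ulimit -n 1000000
```

Note that the application’s own listen backlog must also request a large queue; `somaxconn` is only the upper bound the kernel will honor.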
The Results
I ran these tests using Vultr cloud servers.
Interestingly, even during the 10 million request test, the CPU usage was not maxed out. This suggests the bottleneck wasn’t the CPU, and there’s still more performance to gain by investigating other factors like the network fabric or kernel scheduler.
Conclusion: Python Isn’t Slow
This experiment shows that when it comes to I/O-bound tasks, Python’s perceived “slowness” is often a myth. Performance is a full-stack problem. By choosing the right library and tuning the underlying operating system, Python can handle enormous network loads, putting it in the same league as traditionally “faster” languages for this kind of work.
So next time you need to build a high-performance scraper, a load testing tool, or a real-time data ingestion service, don’t count Python out.
Take a look at my Projects or Contact me if you want us to work on something cool! The consultation is Free!