A common misconception about Python is that it’s a low-performance language unsuitable for server development. However, this notion is contradicted by renowned large language model serving systems such as vLLM, SGLang, and JetStream, whose serving layers are written predominantly in Python. All of them use a high-level server framework called `fastapi`.
`fastapi` enables programmers to define handler functions that process user requests to different server paths. `uvicorn`, a server library, calls these handlers asynchronously. `uvicorn`’s performance relies on `asyncio`, a standard library module introduced in Python 3.4, together with the `async`/`await` syntax added in Python 3.5. On Linux, `asyncio` exposes the system call `epoll` to Python programmers. `epoll` is also the mechanism underlying Nginx’s high-performance asynchronous serving, where each worker handles many connections in a single OS thread.
The way `asyncio` exposes `epoll` is not straightforward.
The `epoll` API itself is quite simple. We create an `epoll` file descriptor with `epoll_create1` and associate one or more sockets with it by calling `epoll_ctl`. Each subsequent call to `epoll_wait` then returns the sockets that are ready to be read from or written to. If you’re curious about how a single-threaded C program can use `epoll` to handle concurrent user requests, please refer to this example.
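Python’s `select.epoll` (Linux-only) is a thin wrapper over these same calls, so the three steps can be sketched without C. This is a minimal illustration, assuming Linux; `epoll_demo` is a hypothetical name, and a `socket.socketpair()` stands in for a real client connection.

```python
import select
import socket


def epoll_demo():
    # A connected socket pair stands in for a client connection (illustrative assumption).
    client, server_side = socket.socketpair()

    ep = select.epoll()  # epoll_create1(): a new epoll file descriptor
    # epoll_ctl(EPOLL_CTL_ADD): watch server_side for readability.
    ep.register(server_side.fileno(), select.EPOLLIN)

    # epoll_wait with a zero timeout: nothing has been sent yet, so no socket is ready.
    ready_before = ep.poll(0)

    client.sendall(b"ping")
    # epoll_wait with a 1-second timeout: server_side should now be readable.
    ready_after = [fd for fd, events in ep.poll(1.0) if events & select.EPOLLIN]
    became_ready = ready_after == [server_side.fileno()]
    data = server_side.recv(1024)  # does not block: epoll reported the socket readable

    ep.unregister(server_side.fileno())  # epoll_ctl(EPOLL_CTL_DEL)
    ep.close()
    client.close()
    server_side.close()
    return ready_before, became_ready, data


if __name__ == "__main__":
    print(epoll_demo())
```

On Linux this prints `([], True, b'ping')`. The link to `asyncio` goes through the `selectors` module: on Linux, `selectors.DefaultSelector` resolves to `selectors.EpollSelector`, and `asyncio`’s default event loop is built on that selector.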
The `asyncio` package abstracts the `epoll` concepts into the event loop and the coroutine. My basic understanding is that an event loop is linked to an `epoll` file descriptor. Typically, we use `asyncio.run` to create an event loop, and with it the `epoll` file descriptor, and then use the loop to execute a coroutine. Since the given coroutine may spawn other coroutines, all of which share the same `epoll` file descriptor, it makes sense for the event loop to repeatedly check for ready-to-run coroutines and execute each one until it completes or reaches an `await` on an operation that is not yet ready, which suspends it until that operation can make progress.
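This mental model can be checked with a small sketch: two coroutines started by `asyncio.run` take turns on one OS thread, each running until it awaits something that is not ready yet. The names `worker` and `main` are hypothetical, and `asyncio.sleep` stands in for a socket operation.

```python
import asyncio
import threading


async def worker(name, log):
    for step in range(2):
        # Record which coroutine ran, which step, and on which OS thread.
        log.append((name, step, threading.get_ident()))
        # Awaiting an unfinished operation suspends this coroutine and
        # hands control back to the event loop, which runs another ready one.
        await asyncio.sleep(0.01)


async def main():
    log = []
    # gather wraps both coroutines in tasks on the loop created by asyncio.run.
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


if __name__ == "__main__":
    for name, step, thread_id in asyncio.run(main()):
        print(name, step, thread_id)
```

The log starts with `a 0` followed by `b 0`: `b` gets to run before `a` finishes, and every entry carries the same thread id.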
With this likely over-simplified mental model of `asyncio`, I was able to write a single-threaded Python server that handles client connections concurrently. The program structure closely resembles the C version. Moreover, the calls to `loop.sock_accept`, `loop.sock_recv`, and `loop.sock_sendall` show that the event loop is linked to sockets, similar to how the `epoll` C API associates `epoll` file descriptors with sockets.
```python
import asyncio
import socket


async def start_server(client_handler, host="127.0.0.1", port=8888):
    # Step 1: create a socket and bind it to the given port.
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_socket.bind((host, port))
    server_socket.listen(100)  # Set the backlog to 100
    server_socket.setblocking(False)  # Non-blocking mode for asyncio
    print(f"Server listening on {host}:{port}")

    # Step 2: run a loop to accept client connections.
    await accept_clients(server_socket, client_handler)


# Keep references to running handler tasks so they are not garbage-collected mid-execution.
background_tasks = set()


async def accept_clients(server_socket, client_handler):
    loop = asyncio.get_running_loop()
    while True:
        # Step 1: accept the client connection.
        client_socket, client_address = await loop.sock_accept(server_socket)
        print(f"Connection from {client_address}")

        # Step 2: process the connection asynchronously, so the loop continues without waiting.
        client_socket.setblocking(False)
        task = asyncio.create_task(client_handler(client_socket, client_address))
        background_tasks.add(task)
        task.add_done_callback(background_tasks.discard)


async def handle_client(client_socket, client_address):
    try:
        loop = asyncio.get_running_loop()  # Get the event loop created by asyncio.run
        while True:
            # Step 1: read data from the client
            data = await loop.sock_recv(client_socket, 1024)
            if not data:
                break  # Client disconnected
            print(f"Received from {client_address}: {data.decode()}")

            # Step 2: send a response back to the client
            http_response = (
                "HTTP/1.0 200 OK\r\n"
                "Content-Type: text/plain; charset=utf-8\r\n"
                "Content-Length: 14\r\n"  # Length of "Hello, Client!" in bytes
                "\r\n"
                "Hello, Client!"
            )
            await loop.sock_sendall(client_socket, http_response.encode())
    except Exception as e:
        print(f"Error with client {client_address}: {e}")
    finally:
        print(f"Closing connection to {client_address}")
        client_socket.close()


if __name__ == "__main__":
    try:  # Run the server
        asyncio.run(start_server(handle_client, "127.0.0.1", 8888))
    except KeyboardInterrupt:
        print("Server shut down")
```
Running the above program in a terminal session brings up a server listening on local port 8888. In another terminal session, we can run curl to access the server. The server then prints the HTTP request sent by curl and responds with the string `Hello, Client!`.
The following curl command sends an HTTP GET request:
```shell
curl http://127.0.0.1:8888
```
The server prints the HTTP request:
```
Received from ('127.0.0.1', 55980): GET / HTTP/1.1
Host: 127.0.0.1:8888
User-Agent: curl/8.7.1
Accept: */*
```
The following command sends an HTTP POST request:
```shell
curl -H 'Content-Type: application/json' \
  -d '{ "title":"foo","body":"bar", "id": 1}' \
  -X POST http://127.0.0.1:8888
```
The server prints the HTTP request and the posted data:
```
Received from ('127.0.0.1', 55977): POST / HTTP/1.1
Host: 127.0.0.1:8888
User-Agent: curl/8.7.1
Accept: */*
Content-Type: application/json
Content-Length: 38

{ "title":"foo","body":"bar", "id": 1}
```
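The curl checks need two terminals. As an alternative, here is a self-contained sketch that runs a one-shot server coroutine and a client coroutine on the same event loop. It uses the same `loop.sock_accept`, `loop.sock_recv`, and `loop.sock_sendall` calls, plus `loop.sock_connect` for the client side (which the server above does not need). `serve_once` and `fetch` are hypothetical names. Binding to port 0 asks the OS for a free port. The sketch assumes the short request arrives in a single `sock_recv`, which holds in practice on loopback but is not guaranteed in general.

```python
import asyncio
import socket


async def serve_once(server_sock):
    # Accept one connection and answer one request, mirroring handle_client.
    loop = asyncio.get_running_loop()
    conn, _ = await loop.sock_accept(server_sock)
    conn.setblocking(False)
    request = await loop.sock_recv(conn, 1024)
    body = b"Hello, Client!"
    response = (
        b"HTTP/1.0 200 OK\r\n"
        b"Content-Length: " + str(len(body)).encode() + b"\r\n"
        b"\r\n" + body
    )
    await loop.sock_sendall(conn, response)
    conn.close()
    return request


async def fetch(port):
    # Minimal client: connect, send a GET request, read until the server closes.
    loop = asyncio.get_running_loop()
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)  # loop.sock_* calls expect non-blocking sockets
    await loop.sock_connect(sock, ("127.0.0.1", port))
    await loop.sock_sendall(sock, b"GET / HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")
    chunks = []
    while (chunk := await loop.sock_recv(sock, 1024)):
        chunks.append(chunk)
    sock.close()
    return b"".join(chunks)


async def main():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    server_sock.setblocking(False)
    port = server_sock.getsockname()[1]
    # Server and client run concurrently on the same event loop and OS thread.
    request, response = await asyncio.gather(serve_once(server_sock), fetch(port))
    server_sock.close()
    return request, response


if __name__ == "__main__":
    request, response = asyncio.run(main())
    print(request.decode())
    print(response.decode())
```

Because both sides share one loop, this also shows that a single thread can drive the accepting and connecting ends of a connection at the same time.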