A common misconception about Python is that it’s a low-performance
language unsuitable for server development. This notion is
contradicted by the fact that well-known large language model serving
systems are predominantly written in Python, including vLLM,
SGLang,
and JetStream,
all of which use a high-level server framework called
fastapi.
fastapi enables programmers to define handler functions
that process user requests to different server paths.
uvicorn, an ASGI server library, calls these handlers
asynchronously. uvicorn’s performance relies on asyncio,
a standard library package introduced in Python 3.4, with the async/await
syntax following in Python 3.5. On Linux, asyncio exposes the system
call epoll to Python programmers. epoll is also the underlying
mechanism behind Nginx’s high-performance asynchronous serving, which
uses a single OS thread per worker process.
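The link between asyncio and epoll runs through the standard selectors module. On Linux, the event loop that asyncio.run creates is selector-based. The sketch below only checks which selector class the platform defaults to. On Linux this should be EpollSelector, while other platforms pick a different mechanism (for example, kqueue on macOS). It then runs a trivial coroutine on an explicitly constructed SelectorEventLoop. The helper names are hypothetical and chosen for illustration.

```python
import asyncio
import selectors
import sys


def default_selector_name() -> str:
    # selectors.DefaultSelector is an alias for the most efficient selector
    # class available on this platform (EpollSelector on Linux).
    return selectors.DefaultSelector.__name__


def make_loop_with_default_selector() -> asyncio.AbstractEventLoop:
    # asyncio.SelectorEventLoop accepts an explicit selector; here we pass the
    # platform default, which is also what it uses when given no selector.
    return asyncio.SelectorEventLoop(selectors.DefaultSelector())


if __name__ == "__main__":
    print(f"{sys.platform}: {default_selector_name()}")
    loop = make_loop_with_default_selector()
    try:
        print(loop.run_until_complete(asyncio.sleep(0, result="loop ran")))
    finally:
        loop.close()
```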
The way asyncio exposes epoll is not straightforward, even though
the epoll API itself is quite simple. We create an epoll file descriptor
with epoll_create1 and associate one or more sockets with it by calling
epoll_ctl. Subsequently, each call to
epoll_wait returns the sockets that are ready to be
read from or written to. If you’re curious about how a single-threaded C
program can use epoll to handle concurrent user requests,
please refer to this
example.
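Python’s standard library also wraps these calls directly in select.epoll (Linux only), so the three-step API can be tried without writing C. The sketch below uses a connected socket pair as a stand-in for a client connection, which is an assumption made for brevity. It registers one end, which corresponds to epoll_ctl, then polls before and after writing to the other end, which corresponds to epoll_wait. The function name is hypothetical.

```python
import select
import socket


def poll_before_and_after_write():
    """Return (fd, events_before_write, events_after_write) for one end of a socket pair."""
    reader, writer = socket.socketpair()
    ep = select.epoll()  # Like epoll_create1(): a new epoll file descriptor
    try:
        fd = reader.fileno()
        # Like epoll_ctl(EPOLL_CTL_ADD): watch `reader` for readability.
        ep.register(fd, select.EPOLLIN)
        # Like epoll_wait() with a zero timeout: nothing has been sent yet.
        before = ep.poll(0)
        writer.sendall(b"ping")
        # Like epoll_wait() with a 1-second timeout: `reader` is now readable.
        after = ep.poll(1.0)
        return fd, before, after
    finally:
        ep.close()
        reader.close()
        writer.close()


if __name__ == "__main__":
    fd, before, after = poll_before_and_after_write()
    print(f"fd {fd}: before={before}, after={after}")
```

A server built this way would register the listening socket and every accepted client socket, then loop over the (fd, events) pairs that each poll call returns.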
The package asyncio abstracts the epoll
concepts into event loops and coroutines. My basic
understanding is that an event loop is linked to an epoll
file descriptor. Typically, we call asyncio.run, which creates
an event loop (and with it an epoll file descriptor) and uses it to execute a
coroutine. Since the given coroutine may spawn other coroutines, and all
of them share the same event loop and epoll file descriptor, it makes
sense for the event loop to repeatedly check for ready-to-run coroutines
and execute each one until it completes or reaches an await
on something that is not ready yet, which puts it back into the waiting state.
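This mental model can be probed with a small experiment. In the sketch below, two hypothetical worker coroutines run as tasks on the loop that asyncio.run creates. Each records a step and then awaits asyncio.sleep(0), which suspends the task without waiting on any I/O. The log shows the steps interleaving, all on one thread and one event loop.

```python
import asyncio
import threading


async def worker(name: str, steps: int, log: list) -> None:
    for i in range(steps):
        # Record which step ran, on which thread, and under which event loop.
        log.append((f"{name}{i}", threading.get_ident(), id(asyncio.get_running_loop())))
        # asyncio.sleep(0) suspends this task so the event loop can run
        # whatever else is ready.
        await asyncio.sleep(0)


async def main() -> list:
    log = []
    # create_task schedules both coroutines on the loop created by asyncio.run.
    tasks = [asyncio.create_task(worker(n, 2, log)) for n in ("A", "B")]
    await asyncio.gather(*tasks)
    return log


if __name__ == "__main__":
    for step, thread_id, loop_id in asyncio.run(main()):
        print(step, thread_id, loop_id)
```

The steps come out as A0, B0, A1, B1. Each await hands control back to the loop, which then resumes the next ready task.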
With this likely over-simplified mental model of
asyncio, I was able to write a single-threaded Python
server that handles client connections concurrently. The program
structure closely resembles the C version. Moreover, the calls to
loop.sock_accept, loop.sock_recv, and
loop.sock_sendall show that the event loop is linked
to sockets, much as the epoll C API associates
epoll file descriptors with sockets.
```python
import asyncio
import socket

# Keep strong references to client tasks: the event loop only holds weak
# references, so an unreferenced task could be garbage-collected mid-execution.
background_tasks = set()


async def start_server(client_handler, host="127.0.0.1", port=8888):
    # Step 1: create a socket and bind it to the given port.
    server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_socket.bind((host, port))
    server_socket.listen(100)  # Set the backlog to 100
    server_socket.setblocking(False)  # Non-blocking mode for asyncio
    print(f"Server listening on {host}:{port}")

    # Step 2: run a loop to accept client connections.
    await accept_clients(server_socket, client_handler)


async def accept_clients(server_socket, client_handler):
    loop = asyncio.get_running_loop()
    while True:
        # Step 1: accept the client connection.
        client_socket, client_address = await loop.sock_accept(server_socket)
        print(f"Connection from {client_address}")

        # Step 2: process the connection in its own task, so this loop can
        # accept the next client without waiting.
        client_socket.setblocking(False)
        task = asyncio.create_task(client_handler(client_socket, client_address))
        background_tasks.add(task)
        task.add_done_callback(background_tasks.discard)


async def handle_client(client_socket, client_address):
    try:
        loop = asyncio.get_running_loop()  # The event loop created by asyncio.run
        while True:
            # Step 1: read data from the client.
            data = await loop.sock_recv(client_socket, 1024)
            if not data:
                break  # Client disconnected
            print(f"Received from {client_address}: {data.decode(errors='replace')}")

            # Step 2: send a response back to the client.
            body = "Hello, Client!".encode()
            http_response = (
                "HTTP/1.0 200 OK\r\n"
                "Content-Type: text/plain; charset=utf-8\r\n"
                f"Content-Length: {len(body)}\r\n"  # 14 bytes for "Hello, Client!"
                "\r\n"
            ).encode() + body
            await loop.sock_sendall(client_socket, http_response)
    except Exception as e:
        print(f"Error with client {client_address}: {e}")
    finally:
        print(f"Closing connection to {client_address}")
        client_socket.close()


if __name__ == "__main__":
    try:  # Run the server
        asyncio.run(start_server(handle_client, "127.0.0.1", 8888))
    except KeyboardInterrupt:
        print("Server shut down")
```

Running the above program in a terminal session brings up a server
listening on local port 8888. In another terminal session, we can run
curl to access the server. The server then prints the HTTP request
sent by curl and responds with the string
Hello, Client!.
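Besides using curl, the concurrency claim can be checked from Python in a single process. The sketch below is a condensed variant of the server above with some simplifying assumptions. It binds to port 0 so the OS picks a free port, and it serves exactly one request per connection before closing it. It then drives several clients at once from the same event loop. The names demo, serve_one_request, and fetch are hypothetical.

```python
import asyncio
import socket

BODY = b"Hello, Client!"
RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/plain; charset=utf-8\r\n"
    f"Content-Length: {len(BODY)}\r\n"
    "\r\n"
).encode() + BODY


async def serve_one_request(loop, conn):
    # Simplification: read one chunk, answer it, then close the connection.
    try:
        await loop.sock_recv(conn, 1024)
        await loop.sock_sendall(conn, RESPONSE)
    finally:
        conn.close()


async def fetch(loop, port):
    # A tiny client built from the same low-level loop.sock_* calls.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        await loop.sock_connect(sock, ("127.0.0.1", port))
        await loop.sock_sendall(sock, b"GET / HTTP/1.0\r\nHost: 127.0.0.1\r\n\r\n")
        chunks = []
        # Read until the server closes the connection (sock_recv returns b"").
        while chunk := await loop.sock_recv(sock, 1024):
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        sock.close()


async def demo(n_clients=3):
    loop = asyncio.get_running_loop()
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # Port 0: let the OS choose a free port
    server.listen()
    server.setblocking(False)
    port = server.getsockname()[1]
    handlers = set()  # Strong references to per-connection tasks

    async def accept_forever():
        while True:
            conn, _ = await loop.sock_accept(server)
            conn.setblocking(False)
            task = asyncio.create_task(serve_one_request(loop, conn))
            handlers.add(task)
            task.add_done_callback(handlers.discard)

    acceptor = asyncio.create_task(accept_forever())
    try:
        # All clients run concurrently on the same thread and event loop.
        return await asyncio.gather(*(fetch(loop, port) for _ in range(n_clients)))
    finally:
        acceptor.cancel()
        try:
            await acceptor
        except asyncio.CancelledError:
            pass
        server.close()


if __name__ == "__main__":
    for response in asyncio.run(demo()):
        print(response.decode())
```

Binding to port 0 also avoids clashing with a server that is already running on port 8888.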
The following curl command sends an HTTP GET request:
```shell
curl http://127.0.0.1:8888
```
The server prints the HTTP request:
```
Received from ('127.0.0.1', 55980): GET / HTTP/1.1
Host: 127.0.0.1:8888
User-Agent: curl/8.7.1
Accept: */*
```
The following command sends an HTTP POST request:
```shell
curl -H 'Content-Type: application/json' \
  -d '{ "title":"foo","body":"bar", "id": 1}' \
  -X POST http://127.0.0.1:8888
```
The server prints the HTTP request and the posted data:
```
Received from ('127.0.0.1', 55977): POST / HTTP/1.1
Host: 127.0.0.1:8888
User-Agent: curl/8.7.1
Accept: */*
Content-Type: application/json
Content-Length: 38

{ "title":"foo","body":"bar", "id": 1}
```
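The POST output shows the parts of an HTTP/1.1 request: a request line, header lines, a blank line, and a body whose size is given by Content-Length (38 bytes here). To dispatch requests to handler functions the way fastapi does, a server would first have to split the received bytes along these boundaries. The sketch below uses a hypothetical parse_request helper. It assumes the whole request arrived in a single sock_recv call, which is not guaranteed in general. A real server would keep reading until the headers and all Content-Length bytes had arrived.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP/1.x request into (method, path, version, headers, body)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete request: no blank line after the headers")
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    method, path, version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    length = int(headers.get("content-length", "0"))
    if len(body) < length:
        raise ValueError(f"incomplete body: got {len(body)} of {length} bytes")
    return method, path, version, headers, body[:length]


if __name__ == "__main__":
    raw = (
        b"POST / HTTP/1.1\r\n"
        b"Host: 127.0.0.1:8888\r\n"
        b"User-Agent: curl/8.7.1\r\n"
        b"Accept: */*\r\n"
        b"Content-Type: application/json\r\n"
        b"Content-Length: 38\r\n"
        b"\r\n"
        b'{ "title":"foo","body":"bar", "id": 1}'
    )
    method, path, version, headers, body = parse_request(raw)
    print(method, path, headers["content-type"], body)
```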