Implement streaming web server from scratch

These days, chatbots based on LLMs are widely used. Their outputs are generated token by token and are typically displayed to users in a streaming fashion. Today, let’s implement a server in Python that serves an HTML page with streaming text, from scratch and without relying on any third-party library or framework.

Server-Sent Events

The Server-Sent Events (SSE) protocol is probably the simplest and most straightforward way to implement a server that streams text to a web page. Here are some reference pages if you are interested in the technical details.

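The server-client communication with SSE goes as follows:

  1. The client sends a request to the server.
  2. The server responds with 200 OK and an HTML page that creates an EventSource object.
  3. The EventSource sends a new request to the server with an "Accept: text/event-stream" header.
  4. The server responds with 200 OK and a "Content-Type: text/event-stream" header, but keeps the connection open instead of sending a complete body.
  5. The server repeatedly sends streaming text data over that connection, each chunk as a "data:" line followed by a blank line.
  6. The client receives and processes the data as it arrives.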
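The wire format behind steps 4 and 5 is plain text. As a minimal sketch (the helper names sse_response_head and format_sse_event are hypothetical, not part of any library), the response head and a single event could be built like this. The full code below only needs the single-line case, because the streamed text never contains newlines.

```python
def sse_response_head() -> str:
    # Step 4: status line and headers only; the body is left open for events
    return ('HTTP/1.1 200 OK\r\n'
            'Content-Type: text/event-stream\r\n'
            'Cache-Control: no-cache\r\n'
            '\r\n')


def format_sse_event(data: str) -> str:
    # Step 5: every line of the payload gets its own "data:" field,
    # and an empty line marks the end of the event
    return ''.join(f'data: {line}\n' for line in data.split('\n')) + '\n'


print(repr(format_sse_event('hello world')))
# 'data: hello world\n\n'
```

If the payload did contain a newline, the browser would join the "data:" lines back together with a newline before handing them to onmessage.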

HTML page

Below is a minimal HTML page that uses an EventSource object to subscribe to SSE from the server. Every time the server sends an event, the page replaces the contents of the sseData div with the new text.

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div id="sseData"></div>
<script>
if(typeof(EventSource)!=="undefined") {
    var eSource = new EventSource("/");
    eSource.onmessage = function(event) {
        document.getElementById("sseData").innerHTML = event.data;
    };
}
else {
    document.getElementById("sseData").innerHTML="Your browser doesn't support SSE.";
}
</script>
</body>
</html>
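To see what event.data contains, it helps to mimic what the browser does with the bytes it receives. The sketch below is a simplified, hypothetical parser (parse_sse is not a browser or library API): it handles only "data" fields and comment lines, and ignores the "event", "id", and "retry" fields. Because the page overwrites the div with each event's data, what you see is always the most recent event.

```python
def parse_sse(raw: str):
    """Yield the data of each complete event in an SSE stream (simplified)."""
    data_lines = []
    for line in raw.replace('\r\n', '\n').split('\n'):
        if line == '':
            # A blank line dispatches the event, if it carried any data
            if data_lines:
                yield '\n'.join(data_lines)
            data_lines = []
        elif line.startswith(':'):
            continue  # comment line, ignored
        else:
            field, _, value = line.partition(':')
            if value.startswith(' '):
                value = value[1:]  # a single leading space is stripped
            if field == 'data':
                data_lines.append(value)
            # other fields (event, id, retry) are ignored in this sketch


stream = 'data: Rust\n\ndata: Rust is\n\n'
for text in parse_sse(stream):
    print(text)
# Rust
# Rust is
```

An event that is not yet terminated by a blank line is not dispatched, which matches how the browser waits for the end of each event.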

Server side

The server does the usual: it creates a socket object and listens for incoming requests. To emulate chatbot output, we will use a simple generator that reveals a given text word by word, yielding the text accumulated so far at each step.

import time


def streamer(text):
    # Yield the first 0, 1, 2, ... words of text, pausing between steps
    tokens = text.split()
    for end in range(len(tokens) + 1):
        yield ' '.join(tokens[:end])
        time.sleep(0.1)
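Here is what the generator produces. This is the same streamer, except that the pause is made configurable through a delay parameter, which is an addition for illustration only, so the output can be checked instantly.

```python
import time


def streamer(text, delay=0.1):
    # Same logic as above; `delay` is a hypothetical parameter added here
    tokens = text.split()
    for end in range(len(tokens) + 1):
        yield ' '.join(tokens[:end])
        time.sleep(delay)


print(list(streamer('Rust is a programming language', delay=0)))
# ['', 'Rust', 'Rust is', 'Rust is a', 'Rust is a programming', 'Rust is a programming language']
```

Each step yields the full prefix rather than only the newest word, which is why the page can simply overwrite its div on every event. Note that the first event is an empty string, and that split() discards newlines, so no event ever spans multiple lines.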

The logic follows the steps described in the SSE section, but there is a catch.

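When the stream finishes, the server can close the connection. However, EventSource automatically reconnects whenever its connection closes, so the browser will keep asking for more event-stream data. If the server has nothing more to send, it should respond with 204 No Content, which tells the browser to stop reconnecting.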
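The catch can be summarized as a small decision rule. The sketch below is hypothetical: route and its stream_pending flag are illustrative names, and the function returns an action name instead of writing to a socket. Unlike the full code below, which tells the two cases apart by whether the event-stream request arrives on the same connection as the page, it remembers whether a stream is still owed, so it would also work if the browser opened a new connection for the EventSource. It assumes a single client.

```python
def route(request: str, stream_pending: bool):
    """Return (action, new_stream_pending) for one request (hypothetical helper)."""
    # The request line looks like "GET /path HTTP/1.1"
    request_line = request.split('\r\n', 1)[0]
    parts = request_line.split(' ')
    path = parts[1] if len(parts) >= 2 else ''
    if path != '/':
        return 'not_found', stream_pending  # e.g. /favicon.ico
    if 'Accept: text/event-stream' not in request:
        return 'html', True  # page load: the page will open an EventSource
    if stream_pending:
        return 'stream', False  # first event-stream request after the page load
    return 'no_content', False  # reconnect after the stream ended -> 204


page = 'GET / HTTP/1.1\r\nAccept: text/html\r\n\r\n'
events = 'GET / HTTP/1.1\r\nAccept: text/event-stream\r\n\r\n'
pending = False
for req in (page, events, events):
    action, pending = route(req, pending)
    print(action)
# html
# stream
# no_content
```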

Full code

Below is a bare-minimum server that streams text via SSE, with everything implemented manually on top of Python's socket module.

# server.py
import socket
import time


html = '''<!DOCTYPE html>
<html>
<head>
</head>
<body>
<div id="sseData"></div>
<script>
if(typeof(EventSource)!=="undefined") {
    var eSource = new EventSource("/");
    eSource.onmessage = function(event) {
        document.getElementById("sseData").innerHTML = event.data;
    };
}
else {
    document.getElementById("sseData").innerHTML="Your browser doesn't support SSE.";
}
</script>
</body>
</html>'''

text = '''Rust is a multi-paradigm, general-purpose programming language that emphasizes performance, type safety, and concurrency. It enforces memory safety, meaning that all references point to valid memory, without requiring the use of automated memory management techniques, such as garbage collection. To simultaneously enforce memory safety and prevent data races, its "borrow checker" tracks the object lifetime of all references in a program during compilation. Rust was influenced by ideas from functional programming, including immutability, higher-order functions, and algebraic data types. It is popular for systems programming.[13][14][15]

Software developer Graydon Hoare created Rust as a personal project while working at Mozilla Research in 2006. Mozilla officially sponsored the project in 2009. In the years following the first stable release in May 2015, Rust was adopted by companies including Amazon, Discord, Dropbox, Google (Alphabet), Meta, and Microsoft. In December 2022, it became the first language other than C and assembly to be supported in the development of the Linux kernel.

Rust has been noted for its rapid adoption,[16], and has been studied in programming language theory research.[17][18][19]'''


def streamer(text):
    tokens = text.split()
    for end in range(len(tokens) + 1):
        yield ' '.join(tokens[:end])
        time.sleep(0.1)


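# Create a TCP socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow quick restarts without "Address already in use" errors
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)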

# Bind the socket to a port
s.bind(('localhost', 8000))

# Listen for incoming connections
s.listen()

while True:
    # Accept a connection (blocks until a browser connects)
    conn, addr = s.accept()
    try:
        request = conn.recv(1024).decode()
        print(f'------------------request----------------\n{request}')

        if 'Accept: text/event-stream' in request:
            # An event-stream request on a fresh connection is the browser
            # reconnecting after the stream ended; 204 stops EventSource from retrying
            response = 'HTTP/1.1 204 No Content\r\n\r\n'
            conn.sendall(response.encode())
            print(f'------------------response----------------\n{response}')
        else:
            # New page load: serve the HTML
            # (Content-Length must count bytes, not characters)
            body = html.encode()
            head = f'HTTP/1.1 200 OK\r\nContent-Type: text/html; charset=utf-8\r\nContent-Length: {len(body)}\r\n\r\n'
            conn.sendall(head.encode() + body)
            print(f'------------------response----------------\n{head}{html}')

            # The browser is expected to reuse this keep-alive connection
            # for the EventSource request
            request = conn.recv(1024).decode()
            print(f'------------------request----------------\n{request}')
            if 'Accept: text/event-stream' not in request:
                # ignore any other request (e.g. /favicon.ico)
                response = 'HTTP/1.1 204 No Content\r\n\r\n'
                conn.sendall(response.encode())
                print(
                    f'------------------response----------------\n{response}')
            else:
                # SSE; send the headers first and keep the connection open
                response = 'HTTP/1.1 200 OK\r\nContent-Type: text/event-stream\r\nCache-Control: no-cache\r\n\r\n'
                conn.sendall(response.encode())
                print(
                    f'------------------response----------------\n{response}')

                # then send one event per streamer step
                for data in streamer(text):
                    response = f'data: {data}\n\n'
                    conn.sendall(response.encode())
                    print(
                        f'------------------response----------------\n{response}')
    except Exception as e:
        # e.g. BrokenPipeError if the tab is closed mid-stream
        print(e)
    finally:
        # always release the socket, even after an error
        conn.close()
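The full code reads each request with a single conn.recv(1024), which assumes the whole request head arrives in one chunk of at most 1024 bytes. TCP makes no such guarantee. A more defensive sketch (the helper recv_request_head and its limit parameter are hypothetical) keeps reading until the blank line that ends the request head, and could stand in for each conn.recv(1024).decode() call in the loop.

```python
import socket


def recv_request_head(conn: socket.socket, limit: int = 65536) -> str:
    """Read from conn until the blank line that ends the HTTP request head."""
    buf = b''
    while b'\r\n\r\n' not in buf:
        chunk = conn.recv(1024)
        if not chunk:
            break  # peer closed the connection; return what we have
        buf += chunk
        if len(buf) > limit:
            raise ValueError('request head too large')
    # latin-1 maps every byte to a character, so decoding never fails
    return buf.decode('latin-1')


# Quick local check with a connected pair of sockets
client, server_side = socket.socketpair()
client.sendall(b'GET / HTTP/1.1\r\n')
client.sendall(b'Accept: text/event-stream\r\n\r\n')
print(recv_request_head(server_side).splitlines()[1])
# Accept: text/event-stream
client.close()
server_side.close()
```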

To run it, simply execute

python server.py

and open http://127.0.0.1:8000 in a browser. You should see the text stream in word by word!