vLLM Connection Refused Error on Windows (WSL2)

Running the vLLM OpenAI-compatible API server on Windows WSL2 can expose a subtle networking issue: the server appears to start normally, yet every incoming HTTP request immediately fails with a Connection Refused Error. After a few retries, health checks conclude the service is unavailable, even though the process is running.

The root cause is a small but meaningful race condition in the server’s socket setup. A single additional line of code fully resolves the problem.

This article explains the underlying issue, why it is more visible on WSL2 than native Linux, and broader takeaways for socket server reliability.

The Connection Refused Fix for WSL2

Navigate to the following file, located in the installed vllm package.

vllm/entrypoints/openai/api_server.py

Yes, unfortunately you’ll have to edit the library file itself for this. Keep in mind that if you reinstall or update vLLM, this change will be overwritten.

Not to worry though, because the PR for this fix is open on the vLLM GitHub and might be merged by the time you are reading this. So if you are on an older vLLM version (<=11.0) and facing this issue, try updating vLLM first.

PR: https://github.com/vllm-project/vllm/pull/20275

Navigate to line 1833 or so (this might change a little by the time you read this article) and make the following change (the line with the comment):

Python

    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(addr)
    sock.listen(2048)  # <- Close race condition: start listening immediately
    return sock

And that’s it.

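Why does one line matter? Without it, the function returns a socket that is bound but not yet listening, and listen() is only called later, once asyncio starts serving on that socket. On native Linux, the delay between bind() and asyncio invoking listen() is extremely small, typically too small to cause visible failures.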

WSL2, however, introduces additional latency during process and event-loop startup due to its virtualization layer. This increases the window in which the socket exists in a bound-but-not-listening state. During that interval:

every incoming SYN receives an immediate rejection,
clients such as curl, service probes, or health-checkers mark the service as unreachable,
the server appears broken despite eventually reaching the listen state.

Adding sock.listen() immediately after bind() eliminates that vulnerable window.
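
The bound-but-not-listening window can be reproduced with plain sockets, independent of vLLM. The following is a minimal sketch, assuming a loopback address and an OS-assigned port; try_connect is a hypothetical helper written for this illustration, not part of vLLM.

```python
import socket


def try_connect(addr, timeout=1.0):
    """Return True if a TCP connection to addr succeeds, False if it is refused."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except ConnectionRefusedError:
        return False


# Bind to loopback on a port chosen by the OS (port 0), but do not listen yet.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
addr = server.getsockname()

# Bound only: there is no listening socket, so the connection attempt is rejected.
refused_before_listen = not try_connect(addr)

# After listen(), the kernel queues incoming connections even before accept().
server.listen(16)
accepted_after_listen = try_connect(addr)

server.close()
print("refused before listen():", refused_before_listen)
print("accepted after listen():", accepted_after_listen)
```

On Linux, both values are expected to be True: the first connection attempt is refused, and the second succeeds as soon as listen() has been called, even though nothing has called accept() yet.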

Related Causes of “Connection Refused” Errors

While this race condition explains the behavior observed in vLLM under WSL2, ECONNREFUSED can arise for other reasons. Common scenarios include:

1. Server not yet listening

Clients attempt to connect before the server has completed initialization. Verify that the Python process has actually started successfully.
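
On the client side, one way to tolerate a slow startup is to poll the port until it accepts connections instead of failing on the first refusal. This is a rough sketch only; wait_for_port, its default timeout and its polling interval are assumptions made for illustration, not something vLLM provides.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers ConnectionRefusedError as well as connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example call (host and port are placeholders for wherever the server runs):
# ready = wait_for_port("127.0.0.1", 8000)
```

A successful connection only shows that the port is listening, not that the model has finished loading, so a real health check would still need to query the HTTP API afterwards.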

2. Binding to the wrong interface

For example, binding to 127.0.0.1 when clients connect over the LAN, or listening on IPv6 when clients use IPv4. Binding to 0.0.0.0 or keeping the protocol family consistent on both sides addresses this.
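
As a quick sanity check, the standard ipaddress module can tell whether a literal bind address is loopback-only or covers all interfaces. The sketch below is illustrative; bind_scope and its labels are hypothetical names, and it only handles literal IP addresses, not hostnames.

```python
import ipaddress


def bind_scope(host):
    """Classify a literal bind address as 'loopback', 'all-interfaces', or 'specific'."""
    ip = ipaddress.ip_address(host)
    if ip.is_loopback:
        return "loopback"        # e.g. 127.0.0.1 or ::1: local clients only
    if ip.is_unspecified:
        return "all-interfaces"  # 0.0.0.0 or :: : every interface of that family
    return "specific"            # one particular interface address


for host in ("127.0.0.1", "0.0.0.0", "::1", "::"):
    ip = ipaddress.ip_address(host)
    print(f"{host:<10} IPv{ip.version}  {bind_scope(host)}")
```

If the server reports a loopback address while clients connect from another machine, or the IP versions differ between server and client, a refused connection is the expected result.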

3. Port conflicts or TIME_WAIT exhaustion

A previous process may still own the port. Using SO_REUSEADDR and ensuring proper socket shutdown helps prevent lingering conflicts. You can find out which process is occupying a port with the command:

sudo lsof -i :<port>
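
Where lsof is not available, a similar check can be approximated from Python by attempting to bind the port. This is a minimal sketch; port_in_use is a hypothetical helper, and unlike lsof it only reports whether the port is taken, not which process holds it.

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails with EADDRINUSE."""
    # The probe deliberately does not set SO_REUSEADDR, so the check is strict.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. permission denied) is a different problem
    return False
```

Calling port_in_use(8000) before launching the server (8000 is only an example port here) gives an early hint that the bind would fail.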

4. Firewall or network isolation problems

Windows Firewall, WSL2 port forwarding, or container networking rules can block traffic unexpectedly.
