I noticed that quite often my AJAX requests were getting “stuck” and not completing for close to a minute.
Time to first byte (TTFB) for the requests was way too high. I was running a simple single-threaded Flask server, serving only a single GET request from the web app, yet this was happening almost every time.
After finding many other people with the same issue, I started eliminating parts of my stack to figure out what was causing the hang. Eventually I found, as many others had, that the only thing that fixed it was switching the Flask dev server over to a threaded model.
I was only sending one request, so how the heck was the server getting locked up?
Turns out the people who write Chrome are real smart. One of the things they do is preemptively open TCP connections to servers to which you are likely to make subsequent HTTP requests. This makes a lot of sense: when your site asks Chrome to load index.css from your server, there is a decent chance you are going to load another resource after that. To speed things up, Chrome opens a second, speculative TCP connection to the server while the first request is loading.
If you are using a single-threaded server for development, this can be a real problem. Normally, the server would handle your real request, and after that Chrome would close the unused speculative connection. However, if the speculative connection is accepted by your server first, Chrome just holds it open, and the server cannot service the real request in the meantime. Most single-threaded servers will simply sit there waiting for the speculative connection to issue a request. Eventually that wait times out, and the actual request then finishes quickly.
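The blocking described above can be reproduced without a browser. Here is a minimal sketch using only Python's standard library, not Flask's actual server code: a toy server handles one connection at a time, an idle connection stands in for Chrome's speculative one, and the real request has to wait until the idle read times out. The names serve_sequentially and measure_real_request_delay, and the 0.5-second IDLE_TIMEOUT, are made up for illustration.

```python
import socket
import threading
import time

IDLE_TIMEOUT = 0.5  # seconds; hypothetical value, real servers may wait far longer


def serve_sequentially(listener, max_connections):
    """Handle connections strictly one at a time, like a single-threaded dev server."""
    for _ in range(max_connections):
        conn, _addr = listener.accept()
        with conn:
            conn.settimeout(IDLE_TIMEOUT)
            try:
                # Blocks until a request arrives or the timeout fires.
                data = conn.recv(1024)
            except socket.timeout:
                continue  # idle connection: give up and move on to the next one
            if data:
                conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")


def measure_real_request_delay():
    """Return (seconds the real request took, raw response bytes)."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick one
    port = listener.getsockname()[1]
    server = threading.Thread(
        target=serve_sequentially, args=(listener, 2), daemon=True
    )
    server.start()

    # Stand-in for the speculative connection: opened first, never sends anything.
    idle = socket.create_connection(("127.0.0.1", port))

    # The real request, which ends up queued behind the idle connection.
    start = time.monotonic()
    real = socket.create_connection(("127.0.0.1", port))
    real.sendall(b"GET / HTTP/1.0\r\n\r\n")
    response = real.recv(1024)
    elapsed = time.monotonic() - start

    real.close()
    idle.close()
    server.join()
    listener.close()
    return elapsed, response
```

Calling measure_real_request_delay() should report an elapsed time of roughly IDLE_TIMEOUT even though the request itself is trivial, which is the same shape as the stuck AJAX call, just with a much shorter timeout.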
For Flask, the easiest workaround in the dev environment is just to run the dev server with threaded=True, which allows the server to service both connections without blocking. But what about production? It depends on how your app is deployed. If your app is behind a heavy-duty reverse proxy like nginx, you shouldn’t need to worry, since it will properly service both connections simultaneously and proxy the real request to your server. If you are using something like gunicorn on its own, you will need to make sure you use an async worker model, since the sync workers can still get hung up.
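For the dev-server workaround, here is a minimal sketch of what passing threaded=True looks like. The /status route and the run_dev_server helper are placeholders for illustration, not part of my actual app.

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/status")  # hypothetical endpoint standing in for the app's single GET route
def status():
    return jsonify(ok=True)


def run_dev_server():
    # threaded=True makes the development server handle each connection in its
    # own thread, so an idle speculative connection cannot block the real request.
    app.run(threaded=True)
```

Call run_dev_server() from your entry point to start it. As far as I know, Flask 1.0 and later turn threading on by default for the development server, so the explicit flag mostly matters on older versions.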