New Adventures in Comet: polling, long polling or HTTP streaming with AJAX. Which one to choose?
There are currently several techniques available to create highly responsive, event-driven, AJAX-based applications in a browser. The main goal of such applications is to keep clients up to date with data arriving or changing on the server side. The most popular technique used with AJAX currently is called polling. With polling, an AJAX application asks the server for data whenever the content of the page might require an update, typically at a fixed interval. As an example, a chat-based application will poll the server every 10 seconds to see if new chat messages are available. Technically, it means the browser will open a connection to the server every time data is required:
There are several problems with this approach. The first one is scalability. The number of requests made to the server can be extremely high if the polling interval is set to a small value. As an example, if you expect your AJAX application to be deployed on a small server but still support 10 000 simultaneous users, the performance of your application might be extremely bad. Not only the server but also the network can become saturated with all those requests. The second problem is well illustrated by the picture above. Sometimes there is no new data on the server, so the response will not contain any data. Such “void” requests load the server for nothing.
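To put rough numbers on both problems, here is a minimal sketch in TypeScript. The function names and the list of message arrival times are made up for illustration; only the figures of 10 000 users and a 10 second interval come from the text above.

```typescript
// Aggregate request rate produced by clients that poll at a fixed interval.
function requestsPerSecond(clients: number, intervalSeconds: number): number {
  if (intervalSeconds <= 0) {
    throw new Error("intervalSeconds must be positive");
  }
  return clients / intervalSeconds;
}

interface PollingStats {
  requests: number;       // total polls sent by the client
  emptyResponses: number; // polls that came back with no new data ("void" requests)
}

// Simulate a single client polling every `intervalSeconds` for `durationSeconds`.
// `messageTimes` holds the (hypothetical) seconds at which new data shows up on
// the server. Each poll returns whatever arrived since the previous poll.
function simulatePolling(
  messageTimes: number[],
  intervalSeconds: number,
  durationSeconds: number
): PollingStats {
  if (intervalSeconds <= 0) {
    throw new Error("intervalSeconds must be positive");
  }
  let requests = 0;
  let emptyResponses = 0;
  let previousPoll = 0;
  for (let now = intervalSeconds; now <= durationSeconds; now += intervalSeconds) {
    const fresh = messageTimes.filter((t) => t > previousPoll && t <= now);
    requests++;
    if (fresh.length === 0) {
      emptyResponses++;
    }
    previousPoll = now;
  }
  return { requests, emptyResponses };
}
```

With the numbers from the text, `requestsPerSecond(10000, 10)` comes out at 1000 requests per second. For a quiet chat where messages arrive at seconds 12 and 47, `simulatePolling([12, 47], 10, 60)` reports 6 requests, 4 of which are empty.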
Fortunately there is a better technique, called long polling (a.k.a. Comet), that helps solve this problem. With long polling, you open a persistent connection and wait for the server to push data when available:
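The client side of long polling can be sketched as a loop that re-opens the request as soon as the previous one is answered. This is only an illustration: `HoldRequest` is a hypothetical transport (in a browser it would wrap an XMLHttpRequest), and the `maxRequests` cap exists only so the sketch terminates.

```typescript
// Hypothetical transport: sends one request and invokes `onResponse` when the
// server finally answers, with the pushed data, or with null if the server
// closed the request without having anything to send.
type HoldRequest = (onResponse: (data: string | null) => void) => void;

// Long-polling loop: one request is always pending; as soon as it is answered,
// the data (if any) is handed to `onData` and a new request is opened.
function longPoll(
  holdRequest: HoldRequest,
  onData: (data: string) => void,
  maxRequests: number
): void {
  let sent = 0;
  const next = (): void => {
    if (sent >= maxRequests) {
      return; // cap reached, stop re-opening requests
    }
    sent++;
    holdRequest((data) => {
      if (data !== null) {
        onData(data);
      }
      next(); // re-open immediately so the server can push again
    });
  };
  next();
}
```

Compared with plain polling there is no timer: the server decides when a response is sent, and the client only pays for a new request after it has actually received something (or after the server gave up waiting).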
This technique may solve the scalability problems associated with polling. I use ‘may’ here because it really depends on which server you are using. If your server supports asynchronous request processing, then you are probably in good shape. If not, then long polling might give extremely bad results. Why? Because most probably, on the server side, the request will block a thread until new data comes. So if 10 000 AJAX applications each open one long-polled connection, 10 000 threads will block waiting for data to come. All servers that use blocking I/O will suffer from that problem. It can be done, but you will need to make sure your entire stack (server, OS, etc.) can support 10 000 threads (which consume a lot of memory), something that is almost impossible with Java-based servers.
Fortunately for Java users, there are solutions like Jetty, GlassFish and Tomcat. All of those servers use a technique called asynchronous request/response processing, which allows them to park the request object instead of blocking a thread. The request object is resumed only when the server has new data to push back to the client. Which server is the best? Well, Grizzly always performs well, right? 🙂 OK, enough marketing!
Of course, there might be some performance problems associated with the push operation. See this blog for more info. If the server can’t push data fast enough, the AJAX application might not be updated as fast as you expect. Also, if the server receives a lot of updates, you might end up in a situation where you are mostly doing plain polling: your request is never parked because the server always has a push to execute.
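The idea of parking a request instead of blocking a thread can be illustrated with a small sketch. This is not the API of Jetty, GlassFish, Tomcat or Grizzly; `ParkedRequests` and its methods are hypothetical names. It only shows the principle: a waiting request is stored as a callback, which costs a list entry rather than a thread, and is resumed when data arrives.

```typescript
// Callback that completes one waiting request by writing `data` to it.
type Resume<T> = (data: T) => void;

class ParkedRequests<T> {
  private parked: Resume<T>[] = [];

  // A long-poll request arrives and no data is ready: remember how to answer
  // it later and return right away, without holding a thread.
  park(resume: Resume<T>): void {
    this.parked.push(resume);
  }

  // Number of requests currently waiting for data.
  count(): number {
    return this.parked.length;
  }

  // New data arrives on the server: resume every parked request.
  // The list is swapped first, so a client that reconnects while it is being
  // resumed is parked for the next push instead of receiving this one twice.
  publish(data: T): number {
    const waiting = this.parked;
    this.parked = [];
    for (const resume of waiting) {
      resume(data);
    }
    return waiting.length;
  }
}
```

With this structure, 10 000 waiting clients are 10 000 entries in a list. The cost moves to `publish`, which has to write to every parked request, and that is where the push performance problems mentioned above show up.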
The third technique is called HTTP streaming. HTTP streaming is similar to long polling, except the connection is never closed, even after the server pushes data:
Here the AJAX application sends a single request and receives chunked (partial) responses as they come, re-using the same connection forever. This technique significantly reduces network latency, as the browser and server don’t need to open and close the connection. As an example, Gmail uses that technique to update the mail interface in real time. For the purist, HTTP streaming is sometimes considered an abuse of the HTTP protocol, because the connection is kept open for a very long time. Personally I would have no problem abusing the protocol if the performance of my application is extremely good with that technique :-). HTTP streaming suffers from the same problem as long polling: if the server pushes data too often, it might severely impact the performance of the network and of the AJAX application. Why? If your AJAX application gets too many updates, it might not be able to render the page as fast as the updates arrive, potentially losing data on the client side. The server might also have trouble updating 10 000 clients every second, hence you need to make sure you pick the right server (like GlassFish :-)).
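On the client, a streamed response arrives as partial chunks that do not necessarily line up with message boundaries, so the application has to reassemble them. Here is a minimal sketch, assuming the server separates messages with a newline. That framing is an assumption made for this example, not something HTTP streaming prescribes, and `StreamParser` is a hypothetical name.

```typescript
// Reassembles newline-delimited messages from partial response chunks.
class StreamParser {
  private buffer = "";

  // Feed the next chunk received on the open connection.
  // Returns the messages completed by this chunk; an incomplete tail is kept
  // in the buffer until the rest of it arrives.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const parts = this.buffer.split("\n");
    const tail = parts.pop();
    this.buffer = tail === undefined ? "" : tail;
    return parts;
  }
}
```

For example, pushing `"a\nb"` returns `["a"]` and keeps `"b"` buffered. Pushing `"c\n"` afterwards returns `["bc"]`.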
Now which technique is the recommended one? I don’t think normal polling is a candidate for recommendation. Although it is widely used right now, I would never recommend that technique if your server supports long polling or HTTP streaming. So long polling or HTTP streaming? This is a controversial question 🙂 Since those techniques are considered fairly new, things might evolve over the next couple of months, so my recommendation might become wrong (maybe wrong :-)). From all the applications I’ve seen in production over the last few months, I would recommend:
- Use long polling when your AJAX application doesn’t need to be updated every second. Why? Because getting a server push every second (or every few seconds) is mostly the same as polling. I would use long polling for AJAX applications that get updated every 30 seconds or more. Again, I don’t have any performance data (yet!) to prove this blind statement.
- Use HTTP streaming when your AJAX application requires frequent updates. To be consistent with the previous point, that means an AJAX application that needs to be updated every few seconds. I would avoid HTTP streaming if the server normally pushes data every 5 minutes, for example. I would instead use long polling, as the price of re-opening the connection is probably lower than keeping the connection open for such a long time and wasting a resource. Again, a blanket statement based only on my experience. But stay tuned for performance data associated with each technique ;-).
Finally, the HTTP 1.1 specification suggests a limit of two simultaneous connections from a browser to the same server (host). That means that if your AJAX application opens two long polling/HTTP streaming connections to the same server, any other AJAX application or browser tab opened on that server will never reach it, because the browser is blocking the connection (try it with IE). Hence, when designing long polling/HTTP streaming AJAX applications, make sure only one connection uses such a technique, so there is always one available for normal requests. This limitation must also influence which technique to use. If your AJAX application requires more than two such long-lived connections, I would recommend long polling instead of HTTP streaming, independently of the frequency of server pushes. The reason is that you can probably distribute (or load balance) the connections on the client side to make sure all the AJAX components get a chance to connect to the server. That one is not a blanket statement 🙂
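The recommendations above can be condensed into a small decision helper. This is only a sketch of the rules of thumb in this post, not a measured result: the 30 second threshold and the limit of two connections come from the text, while the type and function names are hypothetical. The text leaves open what to do between "every few seconds" and "every 30 seconds"; the sketch picks HTTP streaming for that range, which is an assumption.

```typescript
type Technique = "long polling" | "http streaming";

interface Workload {
  secondsBetweenUpdates: number; // typical gap between two server pushes
  cometConnections: number;      // long-lived connections the page needs to one host
}

function recommend(
  workload: Workload,
  longPollThresholdSeconds: number = 30, // from the post: updates every 30 seconds or more
  maxConnectionsPerHost: number = 2      // from the post: HTTP 1.1 suggested limit
): Technique {
  // More long-lived connections than the browser allows per host: use long
  // polling whatever the push frequency, since the connections can take turns.
  if (workload.cometConnections > maxConnectionsPerHost) {
    return "long polling";
  }
  // Infrequent updates: re-opening a request is cheaper than holding a
  // connection open for a long time.
  if (workload.secondsBetweenUpdates >= longPollThresholdSeconds) {
    return "long polling";
  }
  // Frequent updates: keep one connection open and stream over it.
  return "http streaming";
}
```

A chat page with one connection and an update every 2 seconds gets `"http streaming"`; the same page with updates every 5 minutes, or a page that needs three long-lived connections, gets `"long polling"`.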
As usual, thanks for reading. I’m always surprised to see the number of hits I’m getting. Thanks! A+