Friday’s Trick #3: Preventing memory leaks with WebSocket/Comet applications
WebSocket and Comet applications can easily crash your web server if the resources associated with the upgraded/suspended connections aren’t managed adequately. Today I will explain two pitfalls to avoid when writing asynchronous applications, regardless of the transport used (WebSocket or Comet).
An asynchronous application can crash with an out of memory (OOM) error under the following conditions:
- Too many suspended connections: for every suspended connection (or upgraded connection, for WebSocket), the web server holds some resources associated with it, like byte arrays, buffers, etc. If you suspend/upgrade too many connections, you can easily run into OOM, as the garbage collector cannot reclaim those resources while the connections stay suspended.
- Disconnected connections: web servers that don’t support Comet, like Tomcat 5.5 or all Jetty versions (unfortunately!), don’t detect when a connection gets closed, either by a browser or a proxy. In that case, the resources associated with those connections will never be reclaimed by the garbage collector.
One point to note here is that most, if not all, Comet APIs support a timeout when suspending a connection.
With a timeout of 30 seconds, the connection stays suspended for 30 seconds if there is no activity, and is then resumed. This is used most of the time when the browser is using the long-polling technique, e.g. if a server-side event occurs, you resume and clean the resources associated with that connection. If no event occurs, the resources are cleared after 30 seconds, so the probability of OOM is reduced. You can apply the same solution to WebSocket or HTTP streaming, e.g. by suspending/upgrading for a long but finite period of time.
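The timeout behavior described above can be modeled in plain Java. This is only a minimal sketch, not the API of any Comet framework: the `LongPoll` class, its `Outcome` enum, and the `releaseResources` callback are hypothetical names introduced for illustration. It shows that the cleanup runs exactly once, whether the suspended connection is resumed by a server-side event or by the timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for one suspended long-polling connection. */
final class LongPoll {
    /** Why the suspended connection was resumed. */
    enum Outcome { EVENT, TIMEOUT }

    private final CompletableFuture<Outcome> resumed = new CompletableFuture<>();
    private final CompletableFuture<Outcome> released;

    /** Suspends for at most the given period; releaseResources runs once on resume. */
    LongPoll(long period, TimeUnit unit, Runnable releaseResources) {
        // Register the cleanup first so it fires for both the event and the timeout path.
        this.released = resumed.whenComplete((outcome, error) -> releaseResources.run());
        // Java 9+: completes the future with TIMEOUT if nothing else completed it in time.
        resumed.completeOnTimeout(Outcome.TIMEOUT, period, unit);
    }

    /** Called when a server-side event arrives; returns false if already resumed. */
    boolean onEvent() {
        return resumed.complete(Outcome.EVENT);
    }

    /** Blocks until the connection has been resumed and its resources released. */
    Outcome awaitResume() {
        return released.join();
    }
}
```

A request suspended with `new LongPoll(30, TimeUnit.SECONDS, cleanup)` is resumed either by `onEvent()` or after 30 seconds, and `cleanup` is invoked in both cases.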
Of course, setting a higher timeout increases the probability of OOM, so you need to be extra careful when choosing that value. One solution is to make sure you aren’t suspending too many connections per server by clustering your application and distributing the load among your nodes. Another solution is to monitor the number of suspended connections and resume them using some policy (like FIFO) when a threshold is reached. For example, with Atmosphere, you configure the policy by just doing:
broadcaster.setSuspendPolicy(threshold, POLICY.RESUME); // or POLICY.REJECT
Once the threshold is reached, Atmosphere will either start resuming connections or indicate that the limit has been reached, depending on the policy. If you aren’t using Atmosphere, you must implement a similar mechanism yourself, which can be painful. 🙂
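If you do have to roll your own, the idea can be sketched in a few lines of plain Java. This is not Atmosphere's implementation: `SuspendLimiter`, its `Policy` enum, and the `resume` callback are hypothetical names, and the sketch assumes your code calls `suspend` and `release` whenever a connection is suspended or goes away.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical limiter that caps the number of suspended connections. */
final class SuspendLimiter<C> {
    /** RESUME evicts the oldest connection (FIFO); REJECT refuses the new one. */
    enum Policy { RESUME, REJECT }

    private final int threshold;
    private final Policy policy;
    private final Consumer<C> resume;
    private final Deque<C> suspended = new ArrayDeque<>();

    SuspendLimiter(int threshold, Policy policy, Consumer<C> resume) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be at least 1");
        }
        this.threshold = threshold;
        this.policy = policy;
        this.resume = resume;
    }

    /** Returns true if the connection was suspended, false if it was rejected. */
    synchronized boolean suspend(C connection) {
        if (suspended.size() >= threshold) {
            if (policy == Policy.REJECT) {
                return false;
            }
            // FIFO: resume the connection that has been suspended the longest.
            // The callback runs under the lock, so it should be quick.
            resume.accept(suspended.pollFirst());
        }
        suspended.addLast(connection);
        return true;
    }

    /** Call when a connection is resumed or closed through the normal path. */
    synchronized boolean release(C connection) {
        return suspended.remove(connection);
    }

    synchronized int size() {
        return suspended.size();
    }
}
```

With a threshold of 2 and `Policy.RESUME`, suspending a third connection resumes the first one, so the number of tracked connections never exceeds the threshold.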
The second issue, which I consider more serious, occurs when the web server isn’t able to detect that a connection has been closed by the browser or a proxy. This can happen if you are using Jetty (all versions) or a web server that doesn’t support Comet natively. The effect is extremely bad, as the resources of those suspended connections stay locked in memory forever and are never reclaimed by the garbage collector. You can clean up some of them by using the @Suspend timeout, but that complicates your application logic, e.g. was it a timeout or a disconnection? Worse, with WebSocket and HTTP streaming, which usually never time out (or time out only after a long period), you run a high risk of OOM. With Atmosphere, all you need to do is tell the framework to clean idle resources for you after a certain delay:
<init-param>
    <param-name>org.atmosphere.cpr.CometSupport.maxInactiveActivity</param-name>
    <param-value>30000</param-value>
</init-param>
Using that mechanism, you are guaranteed that if the web server doesn’t detect the closed connection, Atmosphere will emulate the detection and tell your application that a connection has been closed (not resumed, which usually isn’t handled by the same logic).
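To give an idea of what such an emulation involves, here is a minimal plain-Java sketch. It is not Atmosphere's code: `IdleReaper`, `touch`, `sweep`, and the `onDisconnect` callback are hypothetical names, and the sketch assumes the application records activity on every read or write. Connections that stay idle longer than the configured delay are dropped and reported as disconnected rather than resumed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical reaper that treats long-idle connections as disconnected. */
final class IdleReaper<C> {
    private final long maxInactiveMillis;
    private final Consumer<C> onDisconnect;
    private final Map<C, Long> lastActivity = new ConcurrentHashMap<>();

    IdleReaper(long maxInactiveMillis, Consumer<C> onDisconnect) {
        this.maxInactiveMillis = maxInactiveMillis;
        this.onDisconnect = onDisconnect;
    }

    /** Record activity (suspend, read, write) on a connection. */
    void touch(C connection, long nowMillis) {
        lastActivity.put(connection, nowMillis);
    }

    /** Stop tracking a connection that was resumed or closed normally. */
    void forget(C connection) {
        lastActivity.remove(connection);
    }

    /** Drops every connection idle for at least maxInactiveMillis; returns how many. */
    int sweep(long nowMillis) {
        int reaped = 0;
        for (Map.Entry<C, Long> entry : lastActivity.entrySet()) {
            C connection = entry.getKey();
            Long seen = entry.getValue();
            boolean idle = nowMillis - seen >= maxInactiveMillis;
            // Conditional remove: skip the entry if touch() updated it concurrently.
            if (idle && lastActivity.remove(connection, seen)) {
                onDisconnect.accept(connection);
                reaped++;
            }
        }
        return reaped;
    }

    int tracked() {
        return lastActivity.size();
    }
}
```

In a real application, `sweep(System.currentTimeMillis())` could be called periodically, for instance from a `ScheduledExecutorService`; the clock is passed in as a parameter here only to keep the sketch easy to test.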
For any questions, or to download the Atmosphere Client and Server Framework, go to our main site and use our Nabble forum, or follow the team or myself and tweet your questions there! You can also check out the code on GitHub.