
Friday’s Trick #3: Preventing memory leak with websocket/comet applications

WebSocket and Comet applications can easily crash your Web server if the resources associated with the upgraded/suspended connections aren’t managed adequately. Today I will explain two pitfalls to avoid when writing asynchronous applications, independently of the transport used (WebSocket or Comet).

An asynchronous application can crash with an out of memory (OOM) error under the following conditions:

  • Too many suspended connections: for every suspended connection (or upgraded connection, for WebSocket), the Web server always has some resources associated with it, like byte arrays, buffers, etc. If you suspend/upgrade too many connections, you can easily run into OOM because the garbage collector will never be able to reclaim those resources.
  • Disconnected connections: Web servers that don’t support Comet, like Tomcat 5.5, or all Jetty versions (unfortunately!), don’t detect when a connection gets closed, either by a browser or a proxy. In that case, the resources associated with those connections will never be reclaimed by the garbage collector.

One point to note here is that most, if not all, Comet APIs support a timeout when suspending a connection:

@Suspend(period = 30, timeUnit = TimeUnit.SECONDS)

The above means the connection will be suspended for 30 seconds if no activity happens, and then resumed. This is used most of the time when the browser is using the long-polling technique: if a server-side event occurs, you resume the connection and clean up the resources associated with it; if no event occurs, the resources will be cleared after 30 seconds, so the probability of OOM is reduced. You can apply the same solution to WebSocket or HTTP streaming, i.e. suspend/upgrade for a long period of time like:

@Suspend(period = 60, timeUnit = TimeUnit.MINUTES)
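
To put the two snippets above in context, here is a minimal long-polling sketch, assuming Atmosphere’s Jersey module (org.atmosphere.annotation.Suspend and @Broadcast). The /events path, the EventsResource name, and the return values are illustrative only:

import java.util.concurrent.TimeUnit;

import javax.ws.rs.FormParam;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;

import org.atmosphere.annotation.Broadcast;
import org.atmosphere.annotation.Suspend;

@Path("/events")
public class EventsResource {

    // Suspend the GET for at most 30 seconds. If no event arrives in that
    // window, Atmosphere resumes the connection and its resources can be
    // reclaimed, which keeps the OOM risk bounded.
    @GET
    @Suspend(period = 30, timeUnit = TimeUnit.SECONDS, resumeOnBroadcast = true)
    public String subscribe() {
        return "";
    }

    // Broadcasting a message resumes every connection currently suspended on
    // this resource (long polling), so the old connection's resources are
    // released and the client reconnects.
    @POST
    @Broadcast
    public String publish(@FormParam("message") String message) {
        return message;
    }
}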

Of course, setting a higher timeout increases the OOM probability, hence you need to be extra careful when setting that value. One solution is to make sure you aren’t suspending too many connections per server, by clustering your application and distributing the load amongst your nodes. Another solution is to monitor the number of suspended connections, and resume them using some policy (like FIFO) when a threshold is reached. For example, with Atmosphere, you configure the policy by just doing:

broadcaster.setSuspendPolicy(threshold, POLICY.RESUME); // or POLICY.REJECT

Once the threshold is reached, Atmosphere will start resuming connections, or indicate that the limit has been reached. If you aren’t using Atmosphere, you must implement a similar mechanism yourself, which can be painful. :-)
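
As a rough sketch of where that call fits (assuming the org.atmosphere.cpr.Broadcaster and BroadcasterFactory APIs from Atmosphere 1.0; the "/chat" id and the 10000 threshold are illustrative only):

// Look up (or create) the broadcaster your connections are suspended on.
Broadcaster broadcaster = BroadcasterFactory.getDefault().lookup("/chat", true);

// Once 10000 connections are suspended on this broadcaster, Atmosphere starts
// resuming the oldest ones instead of letting them pile up until the JVM runs
// out of heap. Use POLICY.REJECT to refuse new connections instead.
broadcaster.setSuspendPolicy(10000, Broadcaster.POLICY.RESUME);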

The second issue, which I consider more serious, is when the Web server isn’t able to detect that a connection was closed by the browser or a proxy. This can happen if you are using Jetty (all versions) or a Web server that doesn’t support Comet natively. The effect is extremely bad, as the resources of all suspended connections will be locked in memory forever and never reclaimed by the garbage collector. You can clean some of them up by using the @Suspend timeout, but that complicates your application logic (e.g. is it a timeout or a disconnection?). Worse, with WebSocket and HTTP streaming, which usually never time out (or time out after a long period), you are at high risk of OOM. With Atmosphere, all you need to do is tell the framework to clean idle resources for you after a certain delay:

<init-param>
    <param-name>org.atmosphere.cpr.CometSupport.maxInactiveActivity</param-name>
    <param-value>30000</param-value>
</init-param>

Using that mechanism, you are guaranteed that if the Web server doesn’t detect the closed connection, Atmosphere will emulate it and appropriately tell your application that a connection has been closed (not resumed, which usually isn’t implemented using the same logic).
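
If your application needs to react to that emulated close, a minimal sketch looks like this (assuming org.atmosphere.cpr.AtmosphereResourceEventListenerAdapter, with resource standing in for the suspended AtmosphereResource):

resource.addEventListener(new AtmosphereResourceEventListenerAdapter() {

    @Override
    public void onDisconnect(AtmosphereResourceEvent event) {
        // The browser/proxy dropped the connection, or Atmosphere detected an
        // idle one via maxInactiveActivity: release the application state tied
        // to this connection here.
    }

    @Override
    public void onResume(AtmosphereResourceEvent event) {
        // Normal timeout/resume path: a different code path than a
        // disconnection, so the cleanup logic can differ.
    }
});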

For any questions or to download the Atmosphere Client and Server Framework, go to our main site and use our Nabble forum, or follow the team or myself on Twitter and tweet your questions there! You can also check out the code on GitHub.

Categories: Atmosphere, Comet, Websocket
  1. Alberto Biasão
    October 26, 2012 at 11:35 am

    Hello Jean-François,

    I’ve been trying to deploy an application on an IBM WebSphere server, using JSF 2 and PrimeFaces. As we have some asynchronous behaviors defined in the requirements, we decided to go for PrimePush/Atmosphere, which made my life much easier! However, we first struggled with a memory leak due to too many suspended threads, just as you described. So I tried your approach, by setting maxInactiveActivity. Fortunately it solved the memory leak, but it then started to throw the following NPE:

    java.lang.NullPointerException: null
    at com.ibm.ws.webcontainer.srt.SRTServletRequest$SRTServletRequestHelper.access$200(SRTServletRequest.java:2822) ~[com.ibm.ws.webcontainer.jar:na]
    at com.ibm.ws.webcontainer.srt.SRTServletRequest.getAttribute(SRTServletRequest.java:307) ~[com.ibm.ws.webcontainer.jar:na]
    at org.atmosphere.cpr.AtmosphereRequest.getAttribute(AtmosphereRequest.java:572) ~[atmosphere-runtime-1.0.0.jar:1.0.0]
    at org.atmosphere.container.BlockingIOCometSupport.cancelled(BlockingIOCometSupport.java:166) ~[atmosphere-runtime-1.0.0.jar:1.0.0]
    at org.atmosphere.cpr.AsynchronousProcessor$1.run(AsynchronousProcessor.java:119) ~[atmosphere-runtime-1.0.0.jar:1.0.0]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) [na:1.6.0]
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) [na:1.6.0]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) [na:1.6.0]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) [na:1.6.0]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180) [na:1.6.0]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204) [na:1.6.0]
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0]
    at java.lang.Thread.run(Thread.java:736) [na:1.6.0]

    Can you give us any clue as to where the problem might be?

    Thanks,
    Alberto

