Netty Cache Thread Memory Issues

When running Spring Boot on Netty (we ran 4.1.69) with async code, your server might run into memory issues after it has been up for a while. In our case we were using spring-cloud-gateway (we ran 3.0.5), and after two weeks of uptime we started hitting long garbage collection times. To get around this we made several code fixes, one of which was disabling the thread-local caches for non-Netty threads.

To do this, add the following JVM option:

-Dio.netty.allocator.useCacheForAllThreads=false
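
If changing the launch command is not an option, the same property can in principle be set from code. The following is only a minimal sketch: the class and method names are made up for illustration, and it assumes the call runs at the very start of main, before any Netty class has been loaded, because Netty's allocator reads the property when its classes are initialized. A value given on the command line is left untouched.

```java
final class NettyAllocatorFlags {
    // Property name taken from the JVM option above.
    static final String USE_CACHE_FOR_ALL_THREADS =
            "io.netty.allocator.useCacheForAllThreads";

    private NettyAllocatorFlags() {}

    // Sets the flag to "false" unless a value was already supplied
    // (for example with -D on the command line). Returns the effective value.
    static String disableCacheForNonNettyThreads() {
        if (System.getProperty(USE_CACHE_FOR_ALL_THREADS) == null) {
            System.setProperty(USE_CACHE_FOR_ALL_THREADS, "false");
        }
        return System.getProperty(USE_CACHE_FOR_ALL_THREADS);
    }

    public static void main(String[] args) {
        // Assumption: this is the first statement of the application,
        // so no Netty class has been initialized yet.
        String effective = disableCacheForNonNettyThreads();
        System.out.println(USE_CACHE_FOR_ALL_THREADS + "=" + effective);
        // The real application start (e.g. SpringApplication.run) would follow here.
    }
}
```

The JVM option remains the safer choice, since it does not depend on class loading order.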

For an explanation, see: https://github.com/spring-projects/spring-framework/issues/21174

Quote:

The PooledByteBufAllocator from Netty creates ThreadLocal caches even for non-Netty Threads. These caches quickly move to Old Gen and do not get collected during normal G1 collections.

Why is this a problem?

If any operation which is executed with subscribeOn(Schedulers.elastic()) causes Netty ByteBuf allocations (let's say using a TcpClient or returning WebFlux responses), a new ThreadLocal cache is set up by Netty. Whilst the threads from Schedulers.elastic() eventually get reclaimed, the associated cache does not – at least not during normal G1 collections.

Running the application with -Dio.netty.allocator.useCacheForAllThreads=false fixes above issue by only using ThreadLocal caches in Netty Threads. Maybe this should be the default as – whilst it hurts performance a little with above usage – fixes the potential leak.
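
To make the mechanism in the quote a bit more concrete, here is a toy model in plain Java. It is not Netty code: the class, the 1 KB buffer and the boolean parameters are all invented for illustration. It only shows that with "cache for all threads" every worker thread that allocates ends up with its own thread-local cache, while with the flag off only threads marked as event loop threads would get one. It does not reproduce the garbage collection behaviour itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class ThreadLocalCacheModel {
    // Counts how many per-thread caches were created in the current run.
    static final AtomicInteger cachesCreated = new AtomicInteger();

    // Stand-in for an allocator's per-thread cache (hypothetical 1 KB buffer).
    static final ThreadLocal<byte[]> CACHE = ThreadLocal.withInitial(() -> {
        cachesCreated.incrementAndGet();
        return new byte[1024];
    });

    // Models the flag: when it is false, only event loop threads get a cache;
    // every other thread allocates without caching.
    static byte[] allocate(boolean useCacheForAllThreads, boolean isEventLoopThread) {
        if (useCacheForAllThreads || isEventLoopThread) {
            return CACHE.get();
        }
        return new byte[1024];
    }

    // Starts short-lived worker threads (like an elastic scheduler would)
    // and returns how many thread-local caches they created.
    static int runWorkers(int workers, boolean useCacheForAllThreads) {
        cachesCreated.set(0);
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(
                    () -> allocate(useCacheForAllThreads, false), "elastic-" + i);
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cachesCreated.get();
    }

    public static void main(String[] args) {
        System.out.println("caches with flag=true:  " + runWorkers(8, true));
        System.out.println("caches with flag=false: " + runWorkers(8, false));
    }
}
```

With 8 workers the first run reports 8 caches and the second reports 0, which is the difference the flag makes for non-Netty threads in this simplified picture.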

In general, if you want to investigate netty memory leak issues, you can use the following JVM options:

-Dio.netty.leakDetectionLevel=paranoid -Dio.netty.leakDetection.targetRecords=40
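
When flags are passed through wrapper scripts or container settings, it can be worth confirming that they actually reached the JVM. A small sketch of such a check, using only the standard java.lang.management API; the class name and the "-Dio.netty." prefix filter are illustrative choices, not part of Netty:

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

final class NettyFlagReport {
    private NettyFlagReport() {}

    // Keeps only the -Dio.netty.* options from a list of JVM arguments.
    static List<String> nettyFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-Dio.netty."))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Arguments the running JVM reports it was started with.
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Netty flags seen by this JVM: " + nettyFlags(actual));
    }
}
```

Calling this once at startup and logging the result makes it easy to spot a flag that was dropped somewhere along the way.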

If that does not give you enough information, enable DEBUG logging for the package reactor.netty.channel (in Spring Boot, for example, by setting logging.level.reactor.netty.channel=DEBUG).

See also https://netty.io/wiki/reference-counted-objects.html for more information.

Note: after changing the cache flag, we did run some performance tests, and the difference was not really noticeable.

December 18, 2021
