Spring Webflux:
EventLoop vs Thread per Request Model
Spring Webflux Introduction
Spring Webflux was introduced as part of Spring 5, bringing support for reactive programming. It uses an asynchronous, non-blocking programming model, which makes applications highly scalable under sustained high request loads.
What Problem It Solves Compared with Spring MVC
Spring MVC uses a synchronous programming model: each request is mapped to a thread, which is responsible for delivering the response back to the request socket. When the application makes network calls (fetching data from a database, calling another application, reading or writing files, etc.), the request thread has to wait for the response. The thread is blocked and contributes no CPU utilization during this period. That is why this model uses a large thread pool for request processing. This can be fine for applications with a low request rate, but a high request rate can ultimately make the application slow or unresponsive, which certainly affects business in today's market.
For such applications, reactive programming can be used effectively. Here, the request thread sends the required input data to the network and does not wait for the response; instead it registers a callback function, which is executed once the blocking task completes. This way the request thread makes itself available to handle other requests. Used properly, not a single thread is ever blocked and threads utilize the CPU efficiently. "Properly" means that every layer must behave reactively, so database drivers, inter-service communication, the web server, etc. should be non-blocking as well.
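The callback style described above can be sketched with plain Java's CompletableFuture. This is a minimal illustration of the idea rather than Webflux code; the remoteCall method and its 100 ms delay are assumptions for the demo:

```java
import java.util.concurrent.CompletableFuture;

public class CallbackSketch {
    // Simulates a slow network call executed off the calling thread.
    // (Illustrative only: the delay stands in for DB or HTTP latency.)
    static CompletableFuture<String> remoteCall() {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            return "payload";
        });
    }

    public static void main(String[] args) {
        // The caller registers a callback instead of waiting for the result,
        // so the calling thread is free to handle other work meanwhile.
        CompletableFuture<String> response = remoteCall()
                .thenApply(body -> "processed:" + body);
        System.out.println("request thread is free");
        System.out.println(response.join()); // join only to keep the demo alive
    }
}
```

The key point is that thenApply attaches the continuation without blocking; only the final join (here, purely for the demo) waits.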
Embedded Server
Spring Webflux has Netty as its default embedded server. Apart from that, it is also supported on Tomcat, Jetty, Undertow and other Servlet 3.1+ containers. Note that Netty and Undertow are non-servlet runtimes, while Tomcat and Jetty are well-known servlet containers.
Before the introduction of Spring Webflux, Spring MVC had Tomcat as its default embedded server, which uses the 'Thread per Request' model. With Webflux, Netty is preferred as the default, which uses the 'Event Loop' model. Webflux is also supported on Tomcat, but only on versions that implement the Servlet 3.1 (or later) APIs.
Note: Servlet 3 introduced asynchronous request processing, and Servlet 3.1 additionally introduced asynchronous I/O, allowing fully asynchronous handling.
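With Spring Boot, a common way to run Webflux on Tomcat is to exclude the Netty starter and add the Tomcat starter. A typical pom.xml fragment (version management via the Boot parent is assumed):

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-reactor-netty</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-tomcat</artifactId>
</dependency>
```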
EventLoop
An EventLoop is a non-blocking I/O (NIO) thread that runs continuously and takes new requests from a range of socket channels. If there are multiple EventLoops, each EventLoop is assigned a group of SocketChannels, and all EventLoops are managed under an EventLoopGroup.
The following diagram shows how the EventLoop works:
Here the EventLoop is shown to explain its usage on the server side, but it works the same way on the client side, where it sends requests to another server, i.e. for outgoing I/O requests.
a. All requests are received on a unique socket, associated with a channel known as a SocketChannel.
b. A single EventLoop thread is always associated with a range of SocketChannels, so all requests to those sockets/SocketChannels are handed over to the same EventLoop.
c. The request then goes through a channel pipeline, where a number of inbound channel handlers or WebFilters are configured for the required processing.
d. After this, the EventLoop executes the application-specific code.
e. On completion, the EventLoop passes the response through a number of outbound channel handlers for the configured processing.
f. Finally, the EventLoop hands the response back to the same SocketChannel/socket.
g. Steps a to f repeat in a loop.
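Steps a to g map closely onto Java NIO's Selector API. Below is a minimal single-threaded sketch of such a loop that simply echoes each request back on its socket. It illustrates the pattern only; Netty's actual implementation is far more elaborate:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class MiniEventLoop implements Runnable {
    private final Selector selector;
    private final ServerSocketChannel server;

    public MiniEventLoop(int port) throws IOException {
        selector = Selector.open();
        server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(port)); // port 0 = ephemeral
        server.configureBlocking(false);
        // Step a/b: the server channel is registered for accept events;
        // client channels are registered for reads as they connect.
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                selector.select(200); // wait for I/O events on all channels
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {          // new connection arrived
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {     // steps c-f: handle request
                        SocketChannel client = (SocketChannel) key.channel();
                        ByteBuffer buf = ByteBuffer.allocate(256);
                        if (client.read(buf) == -1) {  // peer closed the socket
                            client.close();
                            continue;
                        }
                        buf.flip();
                        client.write(buf);             // respond on the same socket
                    }
                }
                selector.selectedKeys().clear();       // step g: loop again
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A single thread services every connected socket here, which is exactly why blocking it anywhere stalls all of them.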
That is the simple case. A single thread certainly minimizes resource usage, but it does not yet solve the real problem.
What happens if the application blocks the EventLoop for a long time due to any of the following?
- High CPU intensive work
- Database operations like read/write etc.
- File read/write
- Any call to another application over the network.
In these cases, the EventLoop gets blocked at step d, and we are back in the same situation: the application can become slow or unresponsive very quickly. Creating additional EventLoops is not the solution; as already mentioned, a range of sockets is bound to a single EventLoop, so when one EventLoop blocks, no other EventLoop can work on its behalf for the sockets already bound to it.
Note: Multiple EventLoops should be created only if the application runs on a multi-CPU machine, as EventLoops must be kept running and a single CPU cannot keep multiple EventLoops running simultaneously. By default, the application starts with as many EventLoops as it has underlying CPU cores.
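As an illustration, the default loop count can be derived from the available processor count. The max(cores, 4) floor below follows Reactor Netty's default; treat the exact formula as an assumption for this sketch:

```java
public class DefaultLoops {
    // Sketch of Reactor Netty's default sizing: at least 4 I/O workers,
    // otherwise one per available CPU core. (Assumption for illustration.)
    static int defaultIoWorkerCount() {
        return Math.max(Runtime.getRuntime().availableProcessors(), 4);
    }

    public static void main(String[] args) {
        System.out.println("event loops: " + defaultIoWorkerCount());
    }
}
```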
Now, this is where the real power of the NIO EventLoop comes into the picture, with superb simplicity behind it. The application should delegate the blocking work to another thread and return the result asynchronously via a callback, unblocking the EventLoop for new request handling. The updated working steps for the EventLoop then look like this:
a. Steps a to c as above.
b. The EventLoop delegates the request to a new worker thread.
i. The worker thread performs the long-running task.
ii. After completing it, the worker writes the response into a task and adds it to the 'ScheduledTaskQueue'.
c. The EventLoop polls the 'ScheduledTaskQueue'.
i. If a task is present, it performs steps e to f via the task's Runnable#run method.
ii. Otherwise, it continues to poll for new requests on the SocketChannels.
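The updated steps can be sketched with a worker pool and a task queue drained by the loop thread. Class and method names here are illustrative, not Netty's actual API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class DelegatingLoop {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final BlockingQueue<Runnable> scheduledTasks = new LinkedBlockingQueue<>();

    // Step b: the loop hands the blocking work to a worker thread...
    void handleRequest(Callable<String> blockingWork, CompletableFuture<String> response) {
        workers.submit(() -> {
            try {
                String result = blockingWork.call();
                // ...and the worker enqueues the completion as a task (step b.ii),
                // so the response is written back on the loop thread, not here.
                scheduledTasks.add(() -> response.complete(result));
            } catch (Exception e) {
                scheduledTasks.add(() -> response.completeExceptionally(e));
            }
        });
    }

    // Step c: between selects, the loop thread drains the task queue and runs
    // each task's Runnable#run, which writes the response to the socket.
    void drainTasks() {
        Runnable task;
        while ((task = scheduledTasks.poll()) != null) {
            task.run();
        }
    }
}
```

The blocking call never runs on the loop thread; the loop only executes the short completion tasks between polling for new requests.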
These worker threads can be created by the developer, or a 'Scheduler' strategy can be chosen from a reactive library such as Reactor or RxJava. Remember to use these threads on an ad-hoc basis to keep resource utilization to a minimum.
This approach helps when an application has multiple APIs and some of them are slow due to network calls or CPU-intensive work. The application can then remain at least partially available and responsive to user requests.
Ideally, to make your application fully reactive, no thread should ever block. So far we have unblocked the request-handling thread, i.e. the EventLoop, but our worker threads still execute blocking tasks, and we cannot increase their count indefinitely as more such requests arrive: that would lead to the new problem of managing a large number of threads, which can severely affect CPU utilization and memory usage. In such a case, worker threads must be used efficiently and strategically.
For most of the blocking cases mentioned above, it is better to use a fully reactive approach: choose databases that provide reactive drivers for DB calls, and use a reactive HTTP client for calls to other applications over the network, e.g. the Spring Webflux reactive WebClient, which uses the Reactor Netty library and hence the EventLoop model. If the application uses Netty and the reactive WebClient, the EventLoop resources are shared between them.
Advantages:
- Lightweight request-processing threads
- Optimum utilization of hardware resources
- A single EventLoop can be shared between the HTTP client and request processing.
- A single thread can handle requests over many sockets, i.e. from different clients.
- This model supports backpressure handling for infinite stream responses.
Thread per request model
The thread-per-request model has been in practice since the introduction of synchronous Servlet programming, and it was adopted by servlet containers to handle incoming requests. Because request handling is synchronous, this model requires many threads, which increases resource utilization. It takes no account of network I/O, which can block these threads, so scalability generally requires scaling up resources, which increases system cost.
Going reactive with Spring Webflux still gives the option to use servlet containers, provided they implement the Servlet 3.1+ APIs. Tomcat is the most commonly used servlet container, and it supports reactive programming as well.
When you choose Spring Webflux on Tomcat, the application starts with a certain number of request-processing threads in a thread pool (e.g. 10). Requests from sockets are assigned to a thread from this pool. Note that there is no permanent binding of a socket to a thread.
If a request thread blocks on an I/O call, other threads in the pool can still handle requests, but enough such requests can block all request threads. Up to this point it is the same as synchronous processing. The reactive approach, however, lets the user delegate requests to another pool of worker threads, as discussed for the EventLoop. That way, the request-processing threads become available again and the application remains responsive.
Following are the steps executed while handling a request in this model:
a. All requests are received on a unique socket, associated with a channel known as a SocketChannel.
b. The request is assigned to a thread from the thread pool.
c. The request then goes through certain handlers (like filters and servlets) for the necessary pre-processing.
d. The request thread can delegate the request to a worker thread or to the reactive WebClient when executing blocking code in the controller.
e. On completion, the worker thread or WebClient (EventLoop) is responsible for writing the response back to the concerned socket.
Again, by making the application fully reactive with reactive clients (like the reactive WebClient) and reactive DB drivers, a minimal number of threads can provide effective scalability to the system.
Advantages:
- Supports and allows use of Servlet APIs.
- If a request thread blocks, it blocks only a single client socket, not a range of sockets as with an EventLoop.
- Optimum utilization of hardware resources
Performance Comparison
For this comparison, a sample reactive Spring Boot application using Spring Webflux was created with a deliberate delay of 100 ms. Each request is delegated to a worker thread to unblock the request-processing thread.
The following configuration was used for this comparison:
- A single EventLoop configured for the Netty run via the property reactor.netty.ioWorkerCount=1
- A single request-processing thread configured for the Tomcat run via the property server.tomcat.max-threads=1
- An AWS EC2 instance of type t2.micro (1 GB RAM, 1 CPU)
- A JMeter test script with 100 users and a ramp-up of 1 s, executed for 10 minutes
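The two single-thread settings above are typically passed as JVM system properties when starting the application. The jar names below are placeholders, and note that on Spring Boot 2.3+ the Tomcat property was renamed to server.tomcat.threads.max:

```shell
# Netty run: a single EventLoop
java -Dreactor.netty.ioWorkerCount=1 -jar webflux-netty-app.jar

# Tomcat run: a single request-processing thread
java -Dserver.tomcat.max-threads=1 -jar webflux-tomcat-app.jar
```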
Max CPU utilization:
Tomcat (thread-per-request model): 37.6%
Netty (EventLoop): 34.6%
Result: Tomcat shows 8.67% higher CPU utilization.
Performance result:
Tomcat 90th-percentile response time: 114 ms
Netty (EventLoop) 90th-percentile response time: 109 ms
Result: Tomcat shows a 4.59% higher response time.