Who needs this?

This article sheds some light on the performance and memory consumption of several technology stacks that were considered as candidates while architecting a software system destined to run on an ARM device with 1GB of RAM and a dual core processor.

This article is useful for you if:

  • You are into geek-ish stories about how software solutions came to life
  • You are trying to compare the performance of Spring with Undertow / Spring Reactive stack / Node.js / Undertow
  • You are analysing low memory footprint solutions that allow you to construct enterprise tier applications in a flexible way
  • You have a similar requirement, and don’t have the time to benchmark multiple solutions
  • You want to have an unbiased opinion about controversial topics such as “Node JS vs Java”
  • You want to get some insight into the advantages of using reactive technologies

If all you need are the actual benchmarks, please jump to the Results or Conclusions chapters of this article.

We are not at liberty to name the end customer, which is why we shall refer to them simply as The Customer.

Domain description

To give a bit of context, our customer is a global provider of high-end visualisation solutions that are used in the entertainment industry. They rent out the video equipment needed to assemble and run large screens like the ones you see at festivals such as Tomorrowland or Untold.

One of the biggest challenges for the industry is the logistics involved in transporting the immense screens, which get broken up into smaller 500×500 centimeter LED panels that can be interconnected in order to construct full-fledged 4K displays at scale. Usually, a show uses somewhere between 500 and 800 LED panels.


The video signal is routed to these panels by using special signal distribution controllers called distroboxes that multiplex the input signal so that it is correctly distributed among the interconnected panels. Each distrobox has multiple ports on which panels can be chained in a linked list fashion.
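The topology described above (a distrobox with several ports, each port carrying a chain of panels) can be pictured with a small data model. This is only an illustrative sketch: the class names `Panel`, `Port` and `Distrobox` and their fields are hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Panel:
    # Hypothetical fields, chosen only for illustration.
    panel_id: int
    x: int = 0
    y: int = 0


@dataclass
class Port:
    # Panels in chain order: index 0 is wired directly to the distrobox port,
    # and every following panel hangs off the previous one.
    chain: List[Panel] = field(default_factory=list)

    def append(self, panel: Panel) -> None:
        """Attach a panel to the end of this port's chain."""
        self.chain.append(panel)


@dataclass
class Distrobox:
    ports: List[Port] = field(default_factory=list)

    def panel_count(self) -> int:
        """Total number of panels reachable through all ports."""
        return sum(len(port.chain) for port in self.ports)


box = Distrobox(ports=[Port(), Port()])
box.ports[0].append(Panel(1))
box.ports[0].append(Panel(2))
box.ports[1].append(Panel(3))
print(box.panel_count())  # 3
```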

The distroboxes are all controlled by a video signal processor that is used by video engineers in order to manipulate and configure the LED screens constructed of panels.

The signal processor is where our designed software system is intended to run, on top of an ARM processor that was interconnected with other lower level components in order to send the binary signal over to the distribution boxes for multiplexing.


Our goal was to design a system that would allow 3 to 6 video engineers to concurrently interact with our software solution in order to manipulate the video signal that would be routed and multiplexed onto the LED panels. The overall interaction had to be performed in realtime, as during a show or festival any misconfiguration would be witnessed by hundreds or thousands of spectators.

Based on the requirements initially analysed for V1 of the project, the proposed software system was composed of a set of components that would interact with each other in order to allow the video engineers to work with it.

The architecture

As the video engineers required a means of interacting with the solution from either a laptop or a tablet connected to the same network as the image processor, there was a need for a UI SPA component that would provide the engineers with all of the convenience functionality needed to manipulate the state of the LED panels. The SPA is implemented using Angular version 6 and is served by an nginx server installed on the ARM environment.

The UI application consumes a RESTful API app that handles business logic and exposes service operations to both the SPA and other 3rd party actors. The API acts as a proxy and cache on top of an SDK service which takes in high level configuration and converts it to a binary protocol that is sent over to the physical panels.

Because some of the operations that stream data to the panels are performed synchronously, we used Redis as a cache so that the UI could stay responsive in realtime.
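The idea of putting a cache in front of a slow, synchronous SDK call can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: a plain dictionary stands in for Redis, and `PanelStateCache` and `push_to_sdk` are hypothetical names.

```python
from typing import Callable, Dict, Optional

Position = Dict[str, int]


class PanelStateCache:
    """Keeps the latest panel state in a fast store and forwards writes to the SDK."""

    def __init__(self, push_to_sdk: Callable[[int, Position], None]) -> None:
        # Assumption: a dict stands in for Redis to keep the sketch self-contained.
        self._store: Dict[int, Position] = {}
        self._push_to_sdk = push_to_sdk

    def update(self, panel_id: int, position: Position) -> None:
        # Record the new state first so reads see it immediately,
        # then hand it to the (possibly slow) SDK call.
        self._store[panel_id] = dict(position)
        self._push_to_sdk(panel_id, position)

    def get(self, panel_id: int) -> Optional[Position]:
        # Reads never touch the SDK; they are served from the cache.
        return self._store.get(panel_id)


sent = []  # records what would have been sent to the SDK
cache = PanelStateCache(lambda panel_id, pos: sent.append((panel_id, pos)))
cache.update(1, {"x": 144, "y": 34})
print(cache.get(1))  # {'x': 144, 'y': 34}
```

In this sketch the UI would read state through `get`, so it does not have to wait for the synchronous streaming call to finish.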

Hardware Limitations

The ARM device itself was a limitation for realtime data manipulation, as it had only 1GB of RAM, 1 GB of storage and a quad core processor, which implied that any tech stack we used had to have a very low memory and storage footprint.

Another serious limitation in designing the solution was the fact that it had to run on top of an Android 4.4.2 OS, which does not support regular Linux compiled applications and does not offer a means of installing either Java or Node as runtime environments. After scouting around for solutions we found GNURootDebian, which practically installs a Debian instance on top of the original Android.

Although this solution was far from optimal, we still had to evaluate it and demonstrate that building a software system on top of an ancient Android version was a no-go that would seriously limit our ability to add features in the future. To convince a customer that a proposal is not the best solution, you have to show them its actual limitations.

Proposed tech stacks

The technological stack had to be the best tool for the job, so in order to avoid biased decisions we decided to prototype and benchmark several technologies on the target environment.

The proposed technological stacks that would be test candidates were:

  • Java based Spring using spring-mvc on top of a synchronous servlet runtime as a convenience stack (Spring offers a lot of features out of the box)
  • Java based JBoss Undertow server without any other frameworks as a performance stack
  • Node.js as a fast and memory efficient solution
  • Java based Spring WebFlux on top of an asynchronous Spring Reactor runtime

What we wanted to find out

Because the software system was supposed to be developed and enhanced over a 4-year period, and because changing the hardware would have a very high impact on our customer’s business model, we wanted to know the breaking point of the environment.

The things that we wanted to find out were:

  • An approximation of the amount of storage that our solution would require and also the storage footprint of each individual component, in order to figure out if we can add more components to the architecture in the future
  • An approximation of the amount of memory that our solution would require and also the memory footprint of each individual component, in order to figure out if we can add more components to the architecture in the future
  • An approximation of the number of concurrent users performing intense updates or retrievals that the system could sustain
  • The types of costly operations performed by users that may affect system level performance

How we tested

Firstly, we had to assess the state of the environment and see whether we could successfully install and use our tech stacks on top of the ARM device. For this we analysed the system state for memory, processor usage and disk usage, to get a feel for the weight that each technological stack adds on top.

Then we used gatling.io to record user behaviour in order to determine the approximate number of service calls needed for basic user interactions, and to determine how many video engineers could concurrently interact with the environment without facing performance issues. This allowed us to calculate the number of concurrent users that the system could withstand, and it also revealed transactionality issues in the updating of data in Redis.
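As a rough illustration of how a concurrent-user estimate can be derived from such recordings, one option is to divide the request rate the system sustains by the request rate a single user generates. The formula and every number below are assumptions made for the sake of the example; they are not measurements from this project.

```python
import math


def max_concurrent_users(sustained_rps: float,
                         calls_per_interaction: int,
                         interactions_per_second: float) -> int:
    """Estimate how many users fit within a sustained request rate.

    All three parameters are hypothetical inputs: the request rate the server
    sustains without degrading, the number of service calls one UI interaction
    triggers, and how many interactions one user performs per second.
    """
    per_user_rps = calls_per_interaction * interactions_per_second
    if per_user_rps <= 0:
        raise ValueError("a user must generate a positive request rate")
    return math.floor(sustained_rps / per_user_rps)


# Made-up example values, for illustration only.
print(max_concurrent_users(sustained_rps=120.0,
                           calls_per_interaction=4,
                           interactions_per_second=2.0))  # 15
```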

Lastly, because we also wanted to find the breaking point of each technology, we created a basic update payload for a pretty common case: dragging a large set of assembled panels across the screen in the UI. The sample payload for each panel was:

{
  "id": 1,
  "parentId": "parentId",
  "position": {
    "x": 144,
    "y": 34
  }
}

This payload became heavy when used for more than 1000 panels, as it amounted to a JSON file holding 914KB of data.
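A bulk payload of this kind can be generated and measured with a few lines of code. This is a sketch that assumes the bulk request is simply a JSON array of the per-panel object shown above; `build_payload` is a hypothetical helper, and the size it prints need not match the 914KB figure, which depends on the real panel count and fields.

```python
import json
from typing import Any, Dict, List


def build_payload(panel_count: int) -> List[Dict[str, Any]]:
    """Build one update entry per panel, using the sample fields from the article."""
    return [
        {"id": panel_id, "parentId": "parentId", "position": {"x": 144, "y": 34}}
        for panel_id in range(1, panel_count + 1)
    ]


payload = build_payload(1000)
body = json.dumps(payload).encode("utf-8")  # bytes as they would travel over the wire
print(f"{len(payload)} panels -> {len(body) / 1024:.1f} KB")
```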

Benchmarking (bare metal)

Because we wanted a clear picture of the overhead that our solution would add on top of the Android environment, and also to figure out its limitations, we monitored the bare-metal runtime itself. Because Android 4.4.2 lacks a convenient way of monitoring the environment, we collected data using Android Device Monitor and ADB shell commands such as dumpsys or top (there is a more advanced Android profiler feature, but it requires at least Android 5.1 in order to run).


Collecting memory stats for the bare-metal environment led us to the conclusion that the overall memory available on the system when idle would be around 520 MB.


The available storage space, after uninstalling all of the applications shipped by default with Android, was around 892MB. As installing Debian on top of Android takes up around 389MB of storage, we were left with only 503MB free for our environment, which is dangerously low, as we also had to account for multi-component logging, over the wire updates to the system using backup files, and the addition of components in the future.

This is without having GNURootDebian installed which takes up 389MB of space
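The storage budget described above is simple arithmetic, and writing it down makes it easy to re-check when another component is added. The two figures come from the measurements in this article; the helper name `remaining_mb` is hypothetical.

```python
FREE_AFTER_CLEANUP_MB = 892   # free storage after removing the default Android apps
DEBIAN_FOOTPRINT_MB = 389     # storage taken by the Debian install (GNURootDebian)


def remaining_mb(free_mb: int, *component_mb: int) -> int:
    """Storage left after subtracting the footprint of every installed component."""
    return free_mb - sum(component_mb)


left = remaining_mb(FREE_AFTER_CLEANUP_MB, DEBIAN_FOOTPRINT_MB)
print(left)  # 503
```

Additional footprints (runtime, application binaries, log budget) can be passed as extra arguments to see how quickly the remaining space shrinks.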

CPU Usage

The overall CPU usage of the bare-metal environment was around 3%.

The results


In a relaxed state, Node.js seems to have the upper hand, as its runtime has a light memory footprint. The Undertow server has the smallest Java-based footprint, as both Spring Boot WebFlux and Spring MVC with Undertow need roughly twice as much memory.

Lower is better

In a stressed state, Node averages 75MB of memory, while the Java stacks all use roughly double that.

Lower is better

When it comes to memory efficiency with the frameworks used out of the box, it seems that Node has the upper hand.


As far as storage goes, it seems that the custom-built Java JRE has a smaller footprint.

Lower is better

When it comes to binary sizes, Node has the smallest footprint, closely followed by Undertow. The Spring MVC stack backed by Undertow seems to be quite inefficient in terms of overall storage space.

Lower is better

Response time

When performing single requests without concurrency, Node and WebFlux have the best response times, while the Undertow stack and the Spring stack backed by Undertow lag behind.

Lower is better

Things change under high load: WebFlux seems to be the fastest stack overall when serving the 914KB JSON, measured at the 99th percentile.
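For readers unfamiliar with the metric, the 99th percentile is the response time that 99% of requests stay at or below. A minimal sketch using the nearest-rank definition follows; this is one common definition and not necessarily the exact method the load-testing tool applies, and the sample latencies are made up.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("at least one sample is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted samples
    return ordered[rank - 1]


# Made-up latencies of 1..100 ms, for illustration only.
latencies_ms = [float(value) for value in range(1, 101)]
print(percentile(latencies_ms, 99))  # 99.0
```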

Lower is better


As far as throughput goes, WebFlux is clearly the winner. The Java stacks in general perform well with regard to the number of requests processed per second.

Higher is better

Conclusions


If you’re aiming for a low memory footprint on your environment, Node.js is a clear winner for designing RESTful web services that consume NoSQL databases such as Redis.

If you’re aiming for high throughput and fast response times, and the overall memory consumption does not matter that much, then both Spring backed by Undertow and Spring WebFlux are good options for you.

If you need a Java version of Node.js, then opt for Spring WebFlux, as it benefits from a similar concurrency model, with faster serving speeds and a lot of enterprise tier features built in.