Mohit Bajaj


How I Optimized a Spring Boot Application to Handle 1M Requests/Second 🚀

Discover the exact techniques I used to scale a Spring Boot application from handling 50K to 1M requests per second. I’ll share the surprising bottlenecks I uncovered, the reactive programming patterns that made the biggest difference, and the configuration tweaks that unlocked massive performance gains.

Last year, our team faced what seemed like an impossible challenge: our Spring Boot application needed to handle a 20x increase in traffic, from 50,000 requests per second to a staggering 1 million. With only three months to deliver and a limited hardware budget, I wasn’t sure if we could pull it off.

Spoiler alert: we did it. Our application now comfortably handles peak loads of 1.2 million requests per second with sub-100ms response times, running on roughly the same infrastructure cost as before.

In this guide, I’ll walk you through exactly how we accomplished this, sharing the real bottlenecks we found, the optimizations that made the biggest difference, and the surprising lessons we learned along the way.

Measuring the Starting Point ⏱️

Before making any changes, I established clear performance baselines. This step is non-negotiable; without knowing your starting point, you can’t measure progress or identify the biggest opportunities for improvement.

Here’s what our initial metrics looked like:

// Initial Performance Metrics
Maximum throughput: 50,000 requests/second
Average response time: 350ms
95th percentile response time: 850ms
CPU utilization during peak: 85-95%
Memory usage: 75% of available heap
Database connections: Often reaching max pool size (100)
Thread pool saturation: Frequent thread pool exhaustion

I used a combination of tools to gather these metrics:

  • JMeter: For load testing and establishing basic throughput numbers
  • Micrometer + Prometheus + Grafana: For real-time monitoring and visualization
  • JProfiler: For deep-dive analysis of hotspots in the code
  • Flame graphs: To identify CPU-intensive methods

With these baseline metrics in hand, I could prioritize optimizations and measure their impact.
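
As a rough illustration of how the average and 95th-percentile figures above are derived from raw samples, here is a minimal sketch in plain Java. The sample values are made up for the example, and the nearest-rank percentile method is an assumption; load-testing tools may interpolate differently.

```java
import java.util.Arrays;

final class LatencyStats {
    /** Arithmetic mean of the samples, in milliseconds. */
    static double average(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(0.0);
    }

    /** Nearest-rank percentile: the smallest sample with at least p% of all samples at or below it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical response times in milliseconds, not measurements from the article.
        long[] samples = {120, 340, 95, 860, 410, 300, 280, 515, 350, 230};
        System.out.println("avg=" + average(samples) + "ms, p95=" + percentile(samples, 95) + "ms");
    }
}
```

The gap between the two numbers is the point: a handful of slow requests barely moves the average but dominates the tail.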

Uncovering the Real Bottlenecks 🔍

Initial profiling revealed several interesting bottlenecks:

  1. Thread pool saturation: The default Tomcat connector was hitting its limits
  2. Database connection contention: HikariCP configuration was not optimized for our workload
  3. Inefficient serialization: Jackson was consuming significant CPU during request/response processing
  4. Blocking I/O operations: Especially when calling external services
  5. Memory pressure: Excessive object creation causing frequent GC pauses

Let’s tackle each of these systematically.

Reactive Programming: The Game Changer ⚡

The most impactful change was adopting reactive programming with Spring WebFlux. This wasn’t a drop-in replacement; it required rethinking how we structured our application.

I started by identifying services with heavy I/O operations:

// BEFORE: Blocking implementation
@Service
public class ProductService {
    @Autowired
    private ProductRepository repository;
    
    public Product getProductById(Long id) {
        return repository.findById(id)
                .orElseThrow(() -> new ProductNotFoundException(id));
    }
}

And converted them to reactive implementations:

// AFTER: Reactive implementation
@Service
public class ProductService {
    @Autowired
    private ReactiveProductRepository repository;
    
    public Mono<Product> getProductById(Long id) {
        return repository.findById(id)
                .switchIfEmpty(Mono.error(new ProductNotFoundException(id)));
    }
}

The controllers were updated accordingly:

// BEFORE: Traditional Spring MVC controller
@RestController
@RequestMapping("/api/products")
public class ProductController {
    @Autowired
    private ProductService service;
    
    @GetMapping("/{id}")
    public ResponseEntity<Product> getProduct(@PathVariable Long id) {
        return ResponseEntity.ok(service.getProductById(id));
    }
}

// AFTER: WebFlux reactive controller
@RestController
@RequestMapping("/api/products")
public class ProductController {
    @Autowired
    private ProductService service;
    
    @GetMapping("/{id}")
    public Mono<ResponseEntity<Product>> getProduct(@PathVariable Long id) {
        return service.getProductById(id)
            .map(ResponseEntity::ok)
            .defaultIfEmpty(ResponseEntity.notFound().build());
    }
}

This change alone doubled our throughput by making more efficient use of threads. Instead of one thread per request, WebFlux uses a small number of threads to handle many concurrent requests.
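
To make the "few threads, many requests" idea concrete, here is a small JDK-only sketch. It is an analogy, not WebFlux itself: a single scheduler thread stands in for an event loop, and a timer stands in for I/O that completes later. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class NonBlockingDemo {
    /** Simulates `requests` concurrent I/O calls of `ioMillis` each on ONE thread; returns elapsed ms. */
    static long runAll(int requests, long ioMillis) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        try {
            long start = System.nanoTime();
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                CompletableFuture<Integer> response = new CompletableFuture<>();
                // Register a callback instead of parking a thread while the "I/O" is in flight.
                loop.schedule(() -> { response.complete(id); }, ioMillis, TimeUnit.MILLISECONDS);
                pending.add(response);
            }
            // Wait for every simulated response before measuring.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            loop.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("1000 simulated requests took " + runAll(1000, 50) + " ms on one thread");
    }
}
```

With one blocking thread per request, the same work would need either 1000 threads or roughly 1000 x 50 ms of wall-clock time. Here the waits overlap, so the total stays close to a single wait.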

Database Optimization: The Hidden Multiplier 📊

Database interactions were our next biggest bottleneck. I implemented a three-pronged approach:

1. Query Optimization

I used Spring Data’s @Query annotation to replace inefficient auto-generated queries:

// BEFORE: Using derived method name (inefficient)
List<Order> findByUserIdAndStatusAndCreatedDateBetween(
    Long userId, OrderStatus status, LocalDate start, LocalDate end);

// AFTER: Optimized query
@Query("SELECT o FROM Order o WHERE o.userId = :userId " +
       "AND o.status = :status " +
       "AND o.createdDate BETWEEN :start AND :end " +
       "ORDER BY o.createdDate DESC")
List<Order> findUserOrdersInDateRange(
    @Param("userId") Long userId, 
    @Param("status") OrderStatus status,
    @Param("start") LocalDate start, 
    @Param("end") LocalDate end);

I also optimized a particularly problematic N+1 query by using Hibernate’s @BatchSize:

@Entity
public class Order {
    // Other fields
    
    @OneToMany(mappedBy = "order", fetch = FetchType.EAGER)
    @BatchSize(size = 30) // Batch fetch order items
    private Set<OrderItem> items;
}
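
The effect of batching on query count can be estimated with simple arithmetic. The sketch below assumes the idealized case where each batch is filled completely; the exact number Hibernate issues can differ somewhat depending on version and batch-fetch style. The class name and the order count are hypothetical.

```java
final class BatchFetchMath {
    /** N+1 pattern: one query for the orders plus one per order for its items. */
    static long queriesWithoutBatching(long orders) {
        return 1 + orders;
    }

    /** With batching: one query for the orders plus one per full or partial batch of item collections. */
    static long queriesWithBatching(long orders, int batchSize) {
        return 1 + (orders + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        long orders = 100; // hypothetical page of orders
        System.out.println("without batching: " + queriesWithoutBatching(orders));
        System.out.println("with batch size 30: " + queriesWithBatching(orders, 30));
    }
}
```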

2. Connection Pool Tuning

Our existing HikariCP settings, with a maximum pool size of 100, were causing connection contention. After extensive testing, I arrived at this configuration:

spring:
  datasource:
    hikari:
      maximum-pool-size: 30
      minimum-idle: 10
      idle-timeout: 30000
      connection-timeout: 2000
      max-lifetime: 1800000

The key insight was that more connections aren’t always better; we found our sweet spot at 30 connections, which reduced contention without overwhelming the database.
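
One way to sanity-check a pool size is Little's law: the average number of connections in use equals the query arrival rate multiplied by the average time a connection is held. This is a general queueing result, not the method the team used, and the numbers below are hypothetical.

```java
final class PoolSizing {
    /** Little's law: average connections in use = arrival rate x average hold time. */
    static double connectionsInUse(double queriesPerSecond, double avgHoldMillis) {
        return queriesPerSecond * (avgHoldMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 2,500 queries/second, each holding a connection for 8 ms.
        double inUse = connectionsInUse(2_500, 8);
        System.out.println("average connections in use: " + inUse);
    }
}
```

If the estimate sits well below the configured maximum, the extra connections mostly add contention on the database side rather than throughput.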

3. Implementing Strategic Caching

I added Redis caching for frequently accessed data:

@Configuration
@EnableCaching
public class CacheConfig {
    @Bean
    public RedisCacheManager cacheManager(RedisConnectionFactory connectionFactory) {
        RedisCacheConfiguration cacheConfig = RedisCacheConfiguration.defaultCacheConfig()
            .entryTtl(Duration.ofMinutes(10))
            .disableCachingNullValues();
            
        return RedisCacheManager.builder(connectionFactory)
            .cacheDefaults(cacheConfig)
            .withCacheConfiguration("products", 
                RedisCacheConfiguration.defaultCacheConfig()
                    .entryTtl(Duration.ofMinutes(5)))
            .withCacheConfiguration("categories", 
                RedisCacheConfiguration.defaultCacheConfig()
                    .entryTtl(Duration.ofHours(1)))
            .build();
    }
}

Then applied it to appropriate service methods:

@Service
public class ProductService {
    // Other code
    
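    // Caching the value emitted by a Mono (rather than the Mono object itself)
    // relies on the reactive support added to @Cacheable in Spring Framework 6.1.
    @Cacheable(value = "products", key = "#id")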
    public Mono<Product> getProductById(Long id) {
        return repository.findById(id)
            .switchIfEmpty(Mono.error(new ProductNotFoundException(id)));
    }
    
    @CacheEvict(value = "products", key = "#product.id")
    public Mono<Product> updateProduct(Product product) {
        return repository.save(product);
    }
}

This reduced database load by 70% for read-heavy operations.
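
For readers who want to see the mechanics behind the annotations, here is a minimal in-memory sketch of the same cache-aside pattern with a TTL and explicit eviction. It is a stand-in for illustration only: it uses a ConcurrentHashMap instead of Redis, and the TtlCache class is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class TtlCache<K, V> {
    private record Entry<V>(V value, long expiresAtMillis) {}

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Cache-aside read: return a fresh cached value, otherwise load it and store it. */
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> cached = entries.get(key);
        if (cached != null && cached.expiresAtMillis() > now) {
            return cached.value();
        }
        V loaded = loader.apply(key);
        if (loaded != null) { // misses (nulls) are not cached
            entries.put(key, new Entry<>(loaded, now + ttlMillis));
        }
        return loaded;
    }

    /** Counterpart of @CacheEvict: drop the entry so the next read reloads it. */
    void evict(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        TtlCache<Long, String> products = new TtlCache<>(5 * 60_000L, System::currentTimeMillis);
        System.out.println(products.get(42L, id -> "product-" + id)); // loader runs
        System.out.println(products.get(42L, id -> "never called"));  // served from the cache
    }
}
```

The shorter the TTL, the fresher the data and the more often the loader (the database) is hit, which is why volatile products get 5 minutes and stable categories get an hour.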

Serialization Optimization: The Surprising CPU Saver 💾

Profiling showed that 15% of CPU time was spent in Jackson serialization. I switched to a more efficient configuration:

@Configuration
public class JacksonConfig {
    @Bean
    public ObjectMapper objectMapper() {
        ObjectMapper mapper = new ObjectMapper();
        
        // Use afterburner module for faster serialization
        mapper.registerModule(new AfterburnerModule());
        
        // Only include non-null values
        mapper.setSerializationInclusion(Include.NON_NULL);
        
        // Disable features we don't need
        mapper.disable(SerializationFeature.WRITE_DATES_AS_TIMESTAMPS);
        mapper.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
        
        return mapper;
    }
}

For our most performance-critical endpoints, I replaced Jackson with Protocol Buffers:

syntax = "proto3";
package com.example.proto;

message ProductResponse {
  int64 id = 1;
  string name = 2;
  string description = 3;
  double price = 4;
  int32 inventory = 5;
}

@RestController
@RequestMapping("/api/products")
public class ProductController {
    // Jackson-based endpoint
    @GetMapping("/{id}")
    public Mono<ResponseEntity<Product>> getProduct(@PathVariable Long id) {
        // Original implementation
    }
    
    // Protocol buffer endpoint for high-performance needs
    @GetMapping("/{id}/proto")
    public Mono<ResponseEntity<byte[]>> getProductProto(@PathVariable Long id) {
        return service.getProductById(id)
            .map(product -> ProductResponse.newBuilder()
                .setId(product.getId())
                .setName(product.getName())
                .setDescription(product.getDescription())
                .setPrice(product.getPrice())
                .setInventory(product.getInventory())
                .build().toByteArray())
            .map(bytes -> ResponseEntity.ok()
                .contentType(MediaType.APPLICATION_OCTET_STREAM)
                .body(bytes));
    }
}

This change reduced serialization CPU usage by 80% and decreased response sizes by 30%.
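
Part of the size reduction comes from how Protocol Buffers encodes integers. The sketch below hand-rolls the base-128 varint scheme that proto3 uses for int32 and int64 fields, purely to show the idea; real code should rely on the generated classes. The VarintDemo class and the sample value are hypothetical, and a real field also carries a small tag in front of the value.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class VarintDemo {
    /** Encodes a value as a base-128 varint: 7 payload bits per byte, high bit marks "more bytes follow". */
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80)); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write((int) value); // final group, continuation bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long inventory = 300; // hypothetical field value
        int binaryBytes = encodeVarint(inventory).length;
        int jsonBytes = ("\"inventory\":" + inventory).getBytes(StandardCharsets.UTF_8).length;
        System.out.println("varint payload: " + binaryBytes + " bytes, JSON fragment: " + jsonBytes + " bytes");
    }
}
```

JSON repeats the field name as text in every message, while the binary format replaces it with a numeric tag, which is where most of the savings on small messages come from.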

Thread Pool and Connection Tuning: The Configuration Magic 🧰

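With WebFlux, we needed to tune Netty’s event loop settings. Note that Spring Boot does not document these exact keys as built-in properties, so treat the block as a summary of the values we used: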

spring:
  reactor:
    netty:
      worker:
        count: 16  # Number of worker threads (2x CPU cores)
      connection:
        provider:
          pool:
            max-connections: 10000
            acquire-timeout: 5000
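
As a sketch of how the "2x CPU cores" rule could be applied programmatically: Reactor Netty reads its default I/O worker count from the `reactor.netty.ioWorkerCount` system property. The assumption here is that the property is set before the server creates its event loop resources, for example at the top of `main` or via a `-D` JVM flag. The class name is hypothetical.

```java
final class NettyWorkerSizing {
    /** The "2x CPU cores" rule from the configuration above, never less than one worker. */
    static int workerCount(int cpuCores) {
        return Math.max(1, cpuCores * 2);
    }

    public static void main(String[] args) {
        int workers = workerCount(Runtime.getRuntime().availableProcessors());
        // Assumption: this runs before any Reactor Netty server or client is created,
        // because the property is only consulted when the default resources are built.
        System.setProperty("reactor.netty.ioWorkerCount", String.valueOf(workers));
        System.out.println("reactor.netty.ioWorkerCount=" + System.getProperty("reactor.netty.ioWorkerCount"));
    }
}
```

Whether doubling the core count actually helps depends on how much work runs on the event loop; it is worth load-testing against the default before keeping it.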

For the parts of our application still using Spring MVC, I tuned the Tomcat connector:

server:
  tomcat:
    threads:
      max: 200
      min-spare: 20
    max-connections: 8192
    accept-count: 100
    connection-timeout: 2000

These settings allowed us to handle more concurrent connections with fewer resources.

Horizontal Scaling with Kubernetes: The Final Push 🚢

To reach our 1M requests/second target, we needed to scale horizontally. I containerized our application and deployed it to Kubernetes.

FROM openjdk:17-slim
COPY target/myapp.jar app.jar
ENV JAVA_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+ParallelRefProcEnabled"
ENTRYPOINT exec java $JAVA_OPTS -jar /app.jar

Then configured auto-scaling based on CPU utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 5
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
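
To see how the autoscaler reacts to this target, the Kubernetes documentation gives the core formula as desiredReplicas = ceil(currentReplicas x currentMetric / desiredMetric). The sketch below applies it with the min and max bounds from the manifest; it ignores the autoscaler's tolerance band and stabilization windows, and the utilization figures are hypothetical.

```java
final class HpaMath {
    /** Core HPA formula, clamped to the configured replica bounds. */
    static int desiredReplicas(int currentReplicas, double currentUtilization,
                               double targetUtilization, int minReplicas, int maxReplicas) {
        int desired = (int) Math.ceil(currentReplicas * (currentUtilization / targetUtilization));
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // Hypothetical spike: 5 pods averaging 90% CPU against the 70% target.
        System.out.println("scale to " + desiredReplicas(5, 90, 70, 5, 20) + " replicas");
    }
}
```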

We also implemented service mesh capabilities with Istio for better traffic management:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp-vs
spec:
  hosts:
  - myapp-service
  http:
  - route:
    - destination:
        host: myapp-service
    retries:
      attempts: 3
      perTryTimeout: 2s
    timeout: 5s

This allowed us to handle traffic spikes efficiently while maintaining resilience.
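
It is worth checking how the retry and timeout values interact. The sketch below assumes that `attempts` counts retries on top of the initial try and ignores any backoff between tries; under those assumptions the overall `timeout` is what bounds a request, not the retry count. The class name is hypothetical.

```java
import java.time.Duration;

final class RetryBudget {
    /** Upper bound on time spent on one request: initial try plus retries, capped by the overall timeout. */
    static Duration worstCase(int retries, Duration perTryTimeout, Duration overallTimeout) {
        Duration allTries = perTryTimeout.multipliedBy(1L + retries);
        return allTries.compareTo(overallTimeout) < 0 ? allTries : overallTimeout;
    }

    public static void main(String[] args) {
        Duration bound = worstCase(3, Duration.ofSeconds(2), Duration.ofSeconds(5));
        System.out.println("worst-case time per request: " + bound.toMillis() + " ms");
    }
}
```

With a 2s per-try timeout and a 5s overall timeout, only two tries can run to their full per-try limit before the overall deadline cuts the third short, so the configured retry count is only fully used when failures are fast.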

Measuring the Results: The Proof 📈

After all optimizations, our metrics improved dramatically:

// Final Performance Metrics
Maximum throughput: 1,200,000 requests/second
Average response time: 85ms (was 350ms)
95th percentile response time: 120ms (was 850ms)
CPU utilization during peak: 60-70% (was 85-95%)
Memory usage: 50% of available heap (was 75%)
Database queries: Reduced by 70% thanks to caching
Thread efficiency: 10x improvement with reactive programming

The most satisfying result? During our Black Friday sale, the system handled 1.2 million requests per second without breaking a sweat: no alerts, no downtime, just happy customers.

Key Lessons Learned 💡

  1. Measurement is everything: Without proper profiling, I would have optimized the wrong things.
  2. Reactive isn’t always better: We kept some endpoints on Spring MVC where it made more sense, using a hybrid approach.
  3. The database is usually the bottleneck: Caching and query optimization delivered some of our biggest wins.
  4. Configuration matters: Many of our improvements came from simply tuning default configurations.
  5. Don’t scale prematurely: We optimized the application first, then scaled horizontally, which saved significant infrastructure costs.
  6. Test with realistic scenarios: Our initial benchmarks using synthetic tests didn’t match production patterns, leading to misguided optimizations.
  7. Optimize for the 99%: Some endpoints were impossible to optimize further, but they represented only 1% of our traffic, so we focused elsewhere.
  8. Balance complexity and maintainability: Some potential optimizations were rejected because they would have made the codebase too complex to maintain.

Performance optimization isn’t about finding one magic bullet; it’s about methodically identifying and addressing bottlenecks across your entire system. With Spring Boot, the capabilities are there; you just need to know which levers to pull.

What performance challenges are you facing with your Spring applications? Share your thoughts in the comments!
