Wednesday, October 15, 2025

Java Performance Optimization for High-Volume Search Applications


Imagine a search box that must sweep billions of records across more than fifty data sources and still answer in under three seconds while thousands of people are clicking at once. That is the everyday reality for public unclaimed property lookups. Latency here is not a vanity metric. Ten seconds feels like forever, and thirty seconds often means a user gives up and never finds the money that could cover rent, tuition, or medical bills. Java can handle this scale, but large datasets, legacy endpoints, and network drag can slow even well-written code. The question is blunt: how do you deliver Google-like speed on top of upstream systems that were never designed for it? Below is a practical playbook drawn from turning a single-state search that took more than thirty seconds into a fifty-state sweep that lands under three seconds, with lessons you can reuse in any high-volume Java search.

Java performance in a cup: profile, optimize, repeat.

Understanding the Performance Bottlenecks

Database Query Time

This is commonly the most significant slice. The usual culprits are missing or weak indexes, joins that force full scans, overgrown subqueries, and servers starved for CPU, memory, or I/O. Shape access paths to exploit indexes, and verify with execution plans rather than hunches.

Network Latency

Parallel calls help, but round-trip calls to external databases, slow links into legacy data centers, API rate limits that serialize requests, repeat DNS lookups, and SSL or TLS setup costs all add up. Minimize handshakes, coalesce requests, and reuse connections aggressively.

Data Processing Time

Large XML or JSON payloads must be parsed, validated, transformed, deduped, fuzzy-matched for names, and ranked. Streaming parsers, compact payloads, and careful algorithm choices trim this section.

Application Overhead

Heavy object churn, the wrong collections for the job, noisy logging, synchronous waits, and needless copying waste CPU. Favor allocation-light patterns and keep the hot path small.

Measurement is Critical

You cannot optimize what you cannot see. Use profilers like VisualVM, JProfiler, or YourKit, plus APM, to find real hotspots under realistic load. Optimize only what measurements justify.

Common Misconceptions

Developers often assume the database is always at fault. Optimization without measurement can make things worse. Tricks that worked for thousands of rows rarely scale to billions.

Database Optimization Strategies

Indexing Strategy

Indexes move the needle the most. Build composite indexes that mirror user queries, for example, last name, first name, and state. Use covering indexes so the engine reads the needed columns straight from the index. Do not over-index because extra indexes slow writes and bloat storage.

Query Optimization

Reshape queries so the planner can choose indexes. Replace broad ORs with UNION where it improves index use. Remove joins by selectively denormalizing hot read paths. Avoid SELECT * and fetch only the needed columns. Cap transferred rows with LIMIT or TOP for first-page delivery. Use engine hints only when profiling proves a gain.
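As a small JDBC sketch of these ideas, with a hypothetical unclaimed_property table and column names chosen only for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class PropertySearchDao {
    private final DataSource dataSource;

    public PropertySearchDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Fetch only the columns the results page renders, and cap the first page.
    public void findFirstPage(String lastName, String state) throws Exception {
        String sql = "SELECT id, owner_name, amount, holder "
                   + "FROM unclaimed_property "
                   + "WHERE last_name = ? AND state = ? "
                   + "ORDER BY amount DESC LIMIT 25";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, lastName);
            ps.setString(2, state);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // map rows into lightweight result objects here
                }
            }
        }
    }
}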

Connection Pooling

Creating connections is expensive. Use a fast pool such as HikariCP and size it deliberately. A helpful first guess is pool size equals core count times two plus effective spindle count, then refine using production metrics.
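A minimal HikariCP setup along those lines might look like the sketch below; the JDBC URL and credentials are placeholders, and the pool size is only the starting heuristic described above:

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolFactory {
    public static HikariDataSource createPool() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db-host:5432/search"); // placeholder URL
        config.setUsername("search_user");                          // placeholder credentials
        config.setPassword("change-me");

        // First guess: (core count * 2) + effective spindle count, then refine from metrics.
        int cores = Runtime.getRuntime().availableProcessors();
        config.setMaximumPoolSize(cores * 2 + 1);

        config.setConnectionTimeout(2_000); // fail fast instead of queuing indefinitely
        return new HikariDataSource(config);
    }
}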

Read Replicas

Split reads from writes. Direct search traffic to read replicas and keep the primary focused on writes. Read-heavy systems see immediate throughput gains with minimal code changes.

Batch Processing

When scanning many jurisdictions, use batch lookups. One request carrying ten searches can replace ten separate round trips and cut network overhead dramatically.
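A sketch of that batching, assuming a hypothetical upstream batchSearch endpoint that accepts several queries per call:

import java.util.ArrayList;
import java.util.List;

public class BatchSearcher {
    private static final int BATCH_SIZE = 10;

    // Replace ten single round trips with one request carrying ten searches.
    public List<Result> searchAll(List<Query> queries, BatchClient client) {
        List<Result> results = new ArrayList<>();
        for (int i = 0; i < queries.size(); i += BATCH_SIZE) {
            List<Query> batch = queries.subList(i, Math.min(i + BATCH_SIZE, queries.size()));
            results.addAll(client.batchSearch(batch)); // one round trip per batch
        }
        return results;
    }

    // Hypothetical types standing in for the real client and payloads.
    public interface BatchClient { List<Result> batchSearch(List<Query> batch); }
    public record Query(String lastName, String state) {}
    public record Result(String id, String ownerName) {}
}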

Database Caching

Enable query result caching where appropriate and tune it using actual hit rates. Popular names repeat, so cached answers land instantly and reduce load.

Application Level Optimization

Concurrency

Never query fifty states one by one. Use CompletableFuture or virtual threads in Java 21 to issue calls in parallel and then compose results. Total time approaches the slowest upstream, not the sum of all.
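A minimal sketch of that fan-out with CompletableFuture and virtual threads; the stateSearch method and the String result type are placeholders for the real per-state client:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

public class ParallelStateSearch {
    // Virtual threads (Java 21) keep the fan-out cheap even at 50+ upstream calls.
    private final Executor executor = Executors.newVirtualThreadPerTaskExecutor();

    public List<String> searchAllStates(List<String> states, String name) {
        List<CompletableFuture<String>> futures = states.stream()
            .map(state -> CompletableFuture.supplyAsync(() -> stateSearch(state, name), executor))
            .toList();

        // Wait for all upstreams; wall time approaches the slowest one, not the sum.
        return futures.stream().map(CompletableFuture::join).toList();
    }

    // Placeholder for the real per-state client call.
    private String stateSearch(String state, String name) {
        return state + ":" + name;
    }
}

On older JVMs, a bounded platform-thread pool in place of the virtual-thread executor is a reasonable substitute.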

Caching Layers

Adopt a three-tier model. L1 is an in-process cache with Caffeine for microsecond access on hot keys. L2 is a distributed cache with Redis, so instances share hits. L3 is an edge cache or CDN for static payloads and precomputed common results. Choose TTLs based on the upstream refresh cadence. For many public datasets, a daily or weekly refresh is adequate.
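As a sketch of the L1 tier with Caffeine, assuming string keys mapping to lists of result identifiers (both type choices are illustrative):

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.time.Duration;
import java.util.List;
import java.util.function.Function;

public class SearchResultCache {
    // L1: in-process cache for hot keys; L2 (Redis) and L3 (CDN) sit behind it.
    private final Cache<String, List<String>> cache = Caffeine.newBuilder()
            .maximumSize(100_000)                   // bound memory used by hot keys
            .expireAfterWrite(Duration.ofHours(24)) // match the upstream refresh cadence
            .build();

    public List<String> search(String key, Function<String, List<String>> loader) {
        // Compute on miss, reuse on hit.
        return cache.get(key, loader);
    }
}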

Pagination and Lazy Loading

Return the first page immediately and stream further pages. Perceived speed rises even if total work stays the same.

Object Reuse

Pool expensive objects. You already pool connections and threads. Extend that mindset to parsers, mappers, and buffers to cut allocation churn and GC pressure.
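One common instance of this mindset, assuming Jackson is used for JSON mapping, is sharing a single ObjectMapper rather than constructing one per request:

import com.fasterxml.jackson.databind.ObjectMapper;

public class PayloadMapper {
    // ObjectMapper is thread-safe once configured; reuse it instead of allocating per request.
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static <T> T parse(String json, Class<T> type) throws Exception {
        return MAPPER.readValue(json, type);
    }
}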

Garbage Collection Tuning

Favor low-latency collectors like G1GC or ZGC for interactive search. Tune heap size and GC threads guided by profiling under realistic load. The goal is brief, predictable pauses.

Implementing these tactics at scale changed outcomes. Platforms like Claim Notify issue parallel queries across more than fifty state data sources, serve millions of lookups from layered caches, and hold response times under three seconds even across billions of records. This demonstrates that Java can feel consumer-grade while wrangling messy, massive datasets.

Asynchronous Processing

Move expensive enrichment to background workers via Kafka or RabbitMQ. Deliver fast first page results and notify users when deep scans complete.
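A minimal sketch of handing deep scans to a background topic with the Kafka producer client; the topic name and string serialization are assumptions:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EnrichmentDispatcher {
    private final KafkaProducer<String, String> producer;

    public EnrichmentDispatcher(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        this.producer = new KafkaProducer<>(props);
    }

    // Publish the deep-scan request and return immediately; background workers consume the topic.
    public void requestDeepScan(String searchId, String payload) {
        producer.send(new ProducerRecord<>("deep-scan-requests", searchId, payload));
    }
}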

Resource Hygiene

Close streams and sockets with try-with-resources. Track open file descriptors and database handles. Small leaks become production fires under real traffic.
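A small illustrative example of the pattern:

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FeedReader {
    // try-with-resources closes the reader even when an exception is thrown.
    public long countLines(Path feed) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(feed, StandardCharsets.UTF_8)) {
            return reader.lines().count();
        }
    }
}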

Real World Performance Results

Before Optimization

A single state search took thirty to forty-five seconds. A full multi-state pass would have taken more than twenty-five minutes. Concurrency collapsed near a dozen users before timeouts. Database CPU pegged in the high eighties to mid-nineties. The JVM threw intermittent out-of-memory errors. There was no caching.

After Optimization

A single state search takes between half a second and one second. A comprehensive fifty-state search returns in two to three seconds. The system handles more than one thousand concurrent users without degradation. Database CPU averages in the twenties to forties. Memory is stable. Cache hit rate reaches seventy-five to eighty percent, which slashes query volume.

Performance Metrics That Matter

P50 latency sits near 1.2 seconds, P95 near 2.8, and P99 near 4.5. Throughput reaches roughly five hundred searches per second with an error rate under 0.1 percent. Infrastructure cost drops about sixty percent, helped by a seventy percent reduction in database queries due to caching.

Monitoring and Continuous Improvement

What to Track

Watch latency percentiles, slow query logs, cache hit and miss patterns, heap usage, GC pauses, thread counts, upstream API times, and timeout or error rates. Use APM tools, query analyzers, load testing with JMeter or Gatling, and alerts that trigger on percentile shifts rather than averages. Profile production traffic regularly, A/B test changes, and keep current with Java runtime improvements.

Performance as a Feature

Measure first and cut second. Databases deliver the biggest early wins through indexing, query shaping, pooling, replicas, and batching. Parallelism collapses wall time from the sum to the max. Caching multiplies speed and reduces cost. Continuous monitoring preserves the gains. Platforms like ClaimNotify show that disciplined, data-guided engineering lets Java deliver consumer-grade speed on top of complex public data. Start by baselining your system, fix the loudest bottleneck, and repeat. Share the tuning tactics and surprises you discover so the community can push the craft forward.


Saturday, March 8, 2025

Best Java Libraries for XML Data Processing




Developers continue to rely on XML (eXtensible Markup Language) for data exchange and storage across Java applications because it offers flexible structures for representing complex information. XML remains widely used across industries because it is platform independent and human readable.


However, processing XML data brings its own challenges: parsing large files efficiently, preserving data integrity when persisting to a database, and transforming data quickly.

Top Java Libraries for XML Processing



Which library to choose depends on two factors: the complexity of the task and the performance requirements of the system.
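For the large-file challenge in particular, the JDK's built-in StAX streaming API is a sensible baseline before reaching for third-party libraries. The sketch below counts elements without building a full in-memory tree; the record element name is illustrative:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class RecordCounter {
    // Stream the document element by element instead of building a full DOM tree.
    public int countRecords(Path xmlFile) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false); // basic XXE hardening
        int count = 0;
        try (InputStream in = Files.newInputStream(xmlFile)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
        }
        return count;
    }
}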

Saturday, March 1, 2025

Upgrading to Java 21 and Spring Boot 3: A Comprehensive Guide

The transition from Java 17 to Java 21, paired with an upgrade to Spring Boot 3, is a transformative step for modern application development. This blog post shares the detailed insights, challenges, and solutions from upgrading various services and Lambda functions to these latest versions. We’ll explore the evolution of dependencies, dive into troubleshooting steps, discuss resolutions for common issues, and provide actionable takeaways to help you succeed in your own upgrade journey.

Overview of the Upgrade

The upgrade process entailed moving applications from Java 17 to Java 21 and aligning them with Spring Boot 3. This wasn’t a simple plug-and-play operation—it required careful updates to dependencies, adjustments to configurations, and tweaks to codebases to ensure everything worked harmoniously with the new versions. The process touched multiple components, including core services and AWS Lambda functions, each requiring its own set of changes.

Dependency Evolution

The upgrade unfolded in stages, with dependencies evolving iteratively as issues surfaced and were resolved. Here’s how the key components changed over time:

Java: Initially running on version 17, the leap to Java 21 introduced a "Major version 65" issue, signaling bytecode incompatibility with older tools. This necessitated updates to other dependencies to align with Java 21’s requirements.

Gradle: Starting at version 7.2, we upgraded to 8.1 early in the process to leverage its improved features and compatibility with Java 21. However, this shift triggered initial compile errors, which we addressed as part of broader dependency updates.

Spring Boot: We began with version 2.7.3 but quickly encountered limitations. An intermediate step to 2.7.18 resolved some issues, but JUnit and acceptance test failures persisted. The final move to Spring Boot 3.3.3 was essential to fully support Java 21 and handle the significant shift from javax to jakarta namespaces (a representative import change is sketched after this list).

AWS SDK v1: Version 1.12.139 was in use initially, but it proved incompatible with Java 21. We phased it out entirely, relying instead on AWS SDK v2.

AWS SDK v2: Starting at 2.17.162, this remained stable throughout the upgrade, though we later validated its compatibility with the final configuration.

LocalStack: We started with version 1.17.1 for local testing. As issues emerged with Spring Boot 3, we upgraded to 1.20.1 and eventually aligned the LocalStack Docker image to version 3.0.0 for better test reliability.

LocalStack Docker: The initial version, 0.14.0, was outdated for our needs. Upgrading to 3.0.0 ensured compatibility with the updated LocalStack and Spring Boot 3.

Lombok Plugin: Version 6.4.3 caused compile errors with Java 21. Upgrading to 8.10 resolved these issues and ensured smooth integration with the new Java version.

AWS Spring: We began with version 2.4.4, which worked with Spring Boot 2.x. The move to Spring Boot 3 required an update to version 3.1.1 to maintain AWS integration.

Spring Cloud AWS Messaging: Also at 2.4.4 initially, this dependency was ultimately removed as we streamlined our AWS interactions with SDK v2.
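To make the namespace shift concrete, here is a minimal, purely illustrative entity using the jakarta imports that Spring Boot 3 expects; under Spring Boot 2.x the same annotations came from javax.persistence and javax.validation:

// Spring Boot 2.x: import javax.persistence.Entity; import javax.validation.constraints.NotNull;
// Spring Boot 3.x: the same annotations now live in the jakarta.* packages.
import jakarta.persistence.Entity;
import jakarta.persistence.Id;
import jakarta.validation.constraints.NotNull;

@Entity
public class PropertyRecord {
    @Id
    private Long id;         // illustrative field only

    @NotNull
    private String ownerName;
}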

Each step in this evolution addressed specific pain points—whether it was compilation failures, test issues, or runtime errors—bringing us closer to a fully functional Java 21 and Spring Boot 3 setup.

Known Issues and Solutions

Throughout the upgrade, several issues emerged that required targeted solutions. Here’s a detailed look at what we encountered and how we resolved them:

1. PortUnreachableException Spamming Logs

Issue: After the Spring Boot upgrade, logs became inundated with errors like:

java.net.PortUnreachableException: recvAddress(..) failed: Connection refused

Cause: This stemmed from a StatsD configuration mismatch introduced by Spring Boot’s updated metrics handling.

Solution: We updated the configuration key from management.metrics.export.statsd.enabled to management.statsd.metrics.export.enabled and explicitly enabled it in the application’s YAML file:

management:
  statsd:
    metrics:
      export:
        enabled: true

This adjustment silenced the log spam and restored proper metrics behavior.

2. Container Privileged Mode

Issue: Running containers in privileged mode clashed with Docker’s user namespaces, causing failures during testing.

Solution: We disabled Testcontainers’ Ryuk resource reaper by setting an environment variable:


TESTCONTAINERS_RYUK_DISABLED=true

This workaround allowed our tests to run smoothly without requiring privileged mode adjustments.

3. Gradle Job Dependency

Issue: A task responsible for running the application failed because it implicitly relied on the output of a jar task without declaring a dependency. The error message highlighted this misconfiguration:


Task uses this output of another task without declaring an explicit or implicit dependency.

Solution: We modified the build.gradle file to explicitly declare the dependency:

afterEvaluate {
    tasks.named('forkedSpringBootRun') {
        dependsOn ':jar'
    }
}

This ensured tasks executed in the correct order, resolving the build failure.

4. Missing AWSCredentials

Issue: After removing AWS SDK v1, we encountered NoClassDefFoundError: com/amazonaws/auth/AWSCredentials, indicating a lingering dependency mismatch.

Solution: We updated the LocalStack Docker image to localstack/localstack:3.0.0 and refreshed the Testcontainers dependencies in build.gradle:

testImplementation 'org.testcontainers:localstack'
testImplementation 'org.testcontainers:testcontainers'

This aligned our local testing environment with the updated AWS SDK v2 setup.

5. Acceptance Test Failures

Issue: Acceptance tests failed due to a missing AmazonSQSAsync bean, disrupting validation of AWS SQS interactions.

Solution: We added the spring.cloud.aws.sqs.endpoint property to the configuration and updated the Docker entry point to support Java 21. This restored the bean's availability and fixed the tests. The LocalStack container backing those tests is created as follows:

public LocalStackContainer create() {
    // Build and return the configured container; the caller starts and stops it,
    // so avoid try-with-resources here, which would close the container on return.
    return new LocalStackContainer(DockerImageName.parse(IMAGE_NAME))
        .withExposedPorts(EXPOSED_PORT)
        .withServices(DYNAMODB, SNS, SQS)
        .withCopyToContainer(
            forClasspathResource(INIT_LOCALSTACK_SH, 0775),
            "/etc/localstack/init/ready.d/init-localstack.sh")
        .waitingFor(
            Wait.forLogMessage(LOG_MARKER, 1).withStartupTimeout(Duration.ofMinutes(1)));
}

Conclusion

Upgrading to Java 21 and Spring Boot 3 is a complex but rewarding endeavor. By navigating challenges like log spam from PortUnreachableException, Gradle task misconfigurations, and AWS SDK transitions, you can modernize your applications for improved performance and maintainability. This guide offers a detailed roadmap to help you avoid common pitfalls and achieve a successful upgrade.

Tuesday, March 19, 2024

Simplifying Docker Deployment with PM2

As you know, PM2 is a daemon process manager that keeps your applications online. Often, the service you want to manage runs inside a Docker image, and it can be written in any language, such as Node.js or Java. Below is the shell script that can be used to deploy your service; it can be added to your CI/CD pipeline.


Sunday, December 31, 2023

AWS - Get quicksight embed url using JavaScript SDK V3

In the realm of data-driven solutions, AWS QuickSight offers a robust platform for crafting dynamic and insightful dashboards. Embedding these dashboards directly into your applications adds a layer of accessibility and convenience. This guide walks you through the process of obtaining a secure QuickSight embed URL using JavaScript SDK V3, suitable for both a Node.js backend and Lambda functions. Before proceeding, ensure your QuickSight dashboard is created and shared with the intended audience.

Prerequisites:

Make sure you've completed the following preliminary steps:

Dashboard Setup:

Create your QuickSight dashboard and share it with all users in your AWS account. Open the published dashboard, choose Share at the upper right, and then choose Share dashboard.

Domain Whitelisting:

Whitelist the domain where you plan to embed the QuickSight dashboard.