
Optimizing a Spring Boot Backend-for-Frontend Application on AWS EKS

Executive Summary

This report provides a comprehensive analysis and actionable recommendations for optimizing an existing Spring Boot Backend-for-Frontend (BFF) application deployed on AWS Elastic Kubernetes Service (EKS). The assessment focuses on critical areas including external API consumption, data management, asynchronous processing, messaging, comprehensive observability, and performance optimization. By leveraging cloud-native patterns and Spring Boot’s inherent capabilities, the aim is to bolster the application’s resilience, scalability, and operational efficiency, thereby transforming the current setup into a robust, high-performing cloud-native solution.

1. Introduction to the BFF Pattern on AWS EKS

1.1. Understanding the Backend-for-Frontend (BFF) Pattern and its Benefits

The Backend-for-Frontend (BFF) pattern represents a specialized architectural approach within web development, advocating for the creation of distinct backend services tailored to the unique demands of individual frontend applications or client types, such as web browsers, mobile applications, or IoT devices. This design diverges significantly from the traditional monolithic API model, which attempts to serve a diverse array of clients from a single, undifferentiated interface. The core principle of the BFF is to optimize the data and interactions for a specific consumer, thereby enhancing the overall user experience and streamlining frontend development efforts.1
This architectural choice establishes the BFF not merely as a proxy but as a critical strategic abstraction layer. Its function extends beyond simple request forwarding to actively composing and transforming responses to precisely match the consumption patterns of a particular client. For instance, the concept of a “composable Backend for Frontend” (cBFF) further illustrates this, where the BFF itself can integrate and unify multiple disparate internal and external APIs into a cohesive, client-optimized interface.2 This implies that the BFF should incorporate its own domain logic pertaining to presentation and client-specific workflows, moving beyond a purely thin data aggregation role. Such a design also influences organizational structures, often fostering shared ownership between frontend and backend development teams 3, and necessitates a more nuanced approach to API design.
The adoption of the BFF pattern offers several compelling advantages. It significantly simplifies frontend development by providing tailored API responses, allowing frontend teams to concentrate on user interface and experience without excessive data manipulation.1 This optimization extends to the user experience itself, as client-specific requirements, such as mobile bandwidth constraints or reduced payload sizes, can be directly addressed, leading to improved performance and responsiveness.1 Furthermore, the BFF pattern promotes a clear separation of concerns between frontend and backend development, enabling independent development, deployment, and scaling of client-specific services.2 This decoupling streamlines API evolution, as changes can be introduced without inadvertently impacting other client applications.1 From a security standpoint, the BFF layer serves as a centralized point for handling authentication and authorization logic, effectively protecting underlying microservices from direct exposure.1 This positioning makes the BFF a critical security boundary, requiring meticulous design and testing of its authentication and authorization mechanisms. Downstream services can then operate with reduced exposure, potentially simplifying their security posture, though this responsibility is then consolidated within the BFF. Finally, implementing caching at the BFF layer can substantially enhance performance by reducing latency and decreasing the load on downstream services.1
However, the implementation of BFFs is not without its considerations. While the pattern simplifies frontend development and improves user experience, it can introduce additional operational overhead. Each new BFF instance contributes to the overall system complexity and potentially increases costs due to the management of more components.3 This means that while the architectural pattern is inherently sound, its successful deployment heavily relies on robust DevOps practices, extensive automation, and a profound understanding of distributed systems challenges, including fault tolerance and comprehensive monitoring. A critical operational risk, termed the “service fuse” problem, arises if multiple BFFs share a single backend service or database. A failure in such a shared component could lead to a cascading outage affecting all dependent BFFs, necessitating the implementation of robust fault isolation strategies, potentially by deploying services separately for each BFF.3 The aggregation of calls at the BFF layer can also shift scaling challenges from the frontend to the BFF itself, demanding a robust backend infrastructure capable of handling the consolidated load.4

| Aspect | Benefits | Considerations/Challenges |
| --- | --- | --- |
| Frontend Development | Simplified API consumption, reduced data manipulation on the client side | Clear team responsibilities required for BFF ownership |
| User Experience | Tailored responses, optimized for specific client needs (e.g., mobile bandwidth, reduced payloads) | Potential for a “Service Fuse” if shared backend services fail |
| Architecture | Decoupling of concerns, independent development and deployment of client-specific services | Increased backend complexity and operational overhead with more BFFs |
| Performance | Data aggregation and caching reduce latency, offload downstream services | Scaling management shifts to the BFF layer, requiring robust infrastructure |
| API Evolution | Changes can be introduced without breaking other client applications | Discipline in API versioning and contract management is crucial |
| Security | Centralized authentication and authorization at the BFF layer | Requires robust security hardening of the BFF itself |

1.2. Architectural Context: Spring Boot on AWS EKS

The Spring Boot framework provides a robust foundation for developing production-grade Spring applications, offering auto-configuration capabilities that streamline the creation of standalone, deployable units.5 This inherent simplicity makes it an excellent choice for microservices architectures.
For orchestrating these containerized applications, Kubernetes stands as an open-source system designed for automating deployment, scaling, and management. It logically groups containers that constitute an application, facilitating their discovery and management within a cluster.7
AWS Elastic Kubernetes Service (EKS) extends the capabilities of Kubernetes by offering a fully managed service on the Amazon Web Services cloud. EKS simplifies the complexities associated with deploying, managing, and scaling containerized applications, effectively abstracting away many underlying infrastructure details and reducing operational overhead.8 This managed service allows development teams to concentrate more on application logic rather than infrastructure management.
The typical deployment workflow for a Spring Boot application on AWS EKS involves several key stages. Initially, Spring Boot applications are packaged into executable JARs, commonly using ./mvnw clean install.13 These JARs are then containerized, either through traditional Dockerfiles, specifying an OpenJDK base image and adding the application JAR, or more efficiently using Cloud Native Buildpacks via a command like ./mvnw spring-boot:build-image.7 Once containerized, the images are pushed to a container registry, with AWS Elastic Container Registry (ECR) being a common choice within the AWS ecosystem.8 An EKS cluster is then provisioned, often utilizing command-line tools such as eksctl (e.g., eksctl create cluster --name <cluster-name> --region <aws-region>).8
Application deployment within Kubernetes is orchestrated through YAML manifests. These manifest files define Kubernetes objects such as Deployment, which specifies the container image, desired replica count, and associated labels, and Service, which exposes the application and functions as a load balancer for all its instances.6 For exposing services externally, AWS Load Balancers, such as Network Load Balancers, can be automatically provisioned by specifying the LoadBalancer service type and appropriate AWS-specific annotations within the Kubernetes service manifest.8 Finally, the entire deployment process can be automated through Continuous Integration/Continuous Delivery (CI/CD) pipelines, integrating tools like Jenkins, GitHub Actions, or GitOps solutions such as ArgoCD.9
While EKS simplifies the deployment and scaling of applications, it is important to recognize that it is a facilitator of operational focus, not a complete solution that eliminates all complexities. EKS abstracts away underlying infrastructure, but it does not remove the necessity for careful application-level design and Kubernetes-native configurations. This is underscored by the existence of extensive EKS best practices guides covering security, reliability, scalability, and cost optimization.11 The “room for improvements” in a Spring Boot BFF on EKS therefore centers on optimizing the interaction between the application and its EKS environment, requiring a deep understanding of both Spring Boot’s cloud-native features and Kubernetes’ operational primitives.
A critical aspect of operating applications in a cloud-native environment is the interplay between application behavior and infrastructure events, which profoundly impacts performance and reliability. A notable example involves random 500 errors observed when Spring Boot applications fail to shut down gracefully during spot instance termination.16 In such scenarios, Kubernetes gracefully evicts pods, but the application itself must be configured to properly handle
SIGTERM signals and complete any in-flight requests before exiting. This highlights a crucial dependency: performance and reliability are not solely infrastructure concerns or application concerns; they are a direct outcome of how well the application is designed to be “cloud-native” and interact with its orchestrator. This necessitates a holistic approach to optimization, bridging the gap between development and operations teams to ensure seamless integration and robust performance.

2. Optimizing External API Consumption and Resilience

2.1. Leveraging Spring WebClient for Reactive API Calls

In a Backend-for-Frontend (BFF) application, which frequently aggregates data from numerous downstream microservices, the implementation of non-blocking I/O is paramount for achieving optimal performance. Spring WebClient, an integral component of Spring WebFlux, is specifically engineered for reactive, non-blocking HTTP requests, making it an ideal choice for such architectures.17
The WebClient offers a significant advantage over the traditional, blocking RestTemplate (which is now in maintenance mode). By employing a non-blocking approach, WebClient allows the application to continue processing other requests concurrently while awaiting responses from external APIs. This capability markedly improves resource utilization and overall throughput, which is essential for a BFF acting as an aggregation layer.17
Basic usage of WebClient involves simple instantiation or specifying a base URL for common API endpoints. For more sophisticated configurations and customization, the WebClient.builder() method is the recommended approach. Spring Boot further streamlines this process by automatically configuring a default WebClient.Builder bean, providing a convenient starting point for developers.17 When consuming APIs,
WebClient leverages a functional API to chain reactive operators, returning Mono for single elements or Flux for multiple elements. This design facilitates asynchronous handling of responses, aligning with the reactive programming paradigm.17 A common pattern for external API consumption involves defining a service interface and an implementation class that utilizes
WebClient to perform various HTTP methods (GET, POST, PUT, DELETE), map the received responses to Plain Old Java Objects (POJOs), and correctly manage content types.18
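As a concrete illustration, the sketch below follows this pattern; the service name, base URL, endpoint paths, and DTO type are hypothetical rather than taken from the application under review:

```java
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

@Service
public class CatalogClient {

    private final WebClient webClient;

    // Spring Boot auto-configures a WebClient.Builder bean; the base URL is illustrative.
    public CatalogClient(WebClient.Builder builder) {
        this.webClient = builder.baseUrl("https://catalog.internal.example").build();
    }

    // Single element: returns immediately with a Mono; no thread blocks while waiting on I/O.
    public Mono<ProductView> getProduct(String id) {
        return webClient.get()
                .uri("/products/{id}", id)
                .retrieve()
                .bodyToMono(ProductView.class);
    }

    // Multiple elements: the response body is mapped onto a Flux as items arrive.
    public Flux<ProductView> listProducts() {
        return webClient.get()
                .uri("/products")
                .retrieve()
                .bodyToFlux(ProductView.class);
    }

    // Hypothetical response POJO.
    public record ProductView(String id, String name) {}
}
```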
The choice of WebClient and a reactive stack is not merely a modern preference; it represents a fundamental architectural decision for BFFs to achieve high concurrency and throughput. A BFF’s primary function is to consume multiple external APIs and aggregate data 1, which is an inherently I/O-bound workload. If a traditional blocking client were used, each external API call would block a thread, quickly exhausting the thread pool under load and leading to degraded performance and scalability issues. The non-blocking nature of
WebClient ensures that threads are not tied up waiting for I/O operations to complete. This enables the BFF to act as an efficient orchestrator of external calls, directly impacting the overall responsiveness and scalability of the application.

2.2. Advanced WebClient Configuration: Connection Pooling and Timeouts

Proper configuration of connection pooling and timeouts for Spring WebClient is critical for handling high loads, preventing hanging requests, and optimizing performance within microservices environments.18 These settings are not just performance tuners but also crucial resilience mechanisms.
WebClient utilizes Reactor Netty by default, which provides built-in connection pooling capabilities. Connection pools are essential for reducing TCP overhead, such as the repeated handshake process, by reusing established connections for subsequent requests to the same URL.18 For advanced configurations not exposed through standard Spring Boot properties, a
ConnectionProvider bean can be manually configured within a Spring @Configuration class. Key parameters for the ConnectionProvider include maxConnections (the maximum number of active connections in the pool), pendingAcquireTimeout (the maximum duration to wait for acquiring a connection from the pool), maxIdleTime (the maximum time a connection can remain idle in the pool before being closed), maxLifeTime (the maximum duration a connection can stay alive), and evictInBackground (the interval for background eviction of idle connections).18 Employing a LIFO (Last In, First Out) strategy for connection reuse can further mitigate
PrematureCloseException errors, which often occur when connections are prematurely closed due to timeouts between the acquisition of a connection and the actual request dispatch.23
Multiple timeout options are available to prevent requests from hanging indefinitely. These include responseTimeout for the total response duration, connectTimeout (configured via ChannelOption.CONNECT_TIMEOUT_MILLIS) for connection establishment, and ReadTimeoutHandler/WriteTimeoutHandler for read/write operations on the connection.18 It is important to note that
ReadTimeoutHandler and WriteTimeoutHandler can close connections even when no HTTP request is active, which, if not managed carefully, can lead to PrematureCloseException.23 Setting
keepAlive to false in HttpClient can sometimes resolve these PrematureCloseException issues caused by server-side timeouts.23 Furthermore, reusing
WebClient instances is a recommended practice, as they are thread-safe, thereby avoiding reconnection overhead and improving overall performance.18
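The sketch below brings these settings together, assuming the default Reactor Netty transport; every value is illustrative and should be tuned against observed traffic and the network policies (e.g., ALB idle timeouts) of the target environment:

```java
import java.time.Duration;

import io.netty.channel.ChannelOption;
import io.netty.handler.timeout.ReadTimeoutHandler;
import io.netty.handler.timeout.WriteTimeoutHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

@Configuration
public class WebClientConfig {

    @Bean
    public WebClient pooledWebClient(WebClient.Builder builder) {
        ConnectionProvider provider = ConnectionProvider.builder("bff-pool")
                .maxConnections(200)                          // cap concurrent connections
                .pendingAcquireTimeout(Duration.ofSeconds(5)) // max wait for a pooled connection
                .maxIdleTime(Duration.ofSeconds(20))          // close idle connections
                .maxLifeTime(Duration.ofMinutes(5))           // recycle long-lived connections
                .evictInBackground(Duration.ofSeconds(30))    // background eviction interval
                .lifo()                                       // reuse most-recently-used connections
                .build();

        HttpClient httpClient = HttpClient.create(provider)
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5_000) // connection establishment
                .responseTimeout(Duration.ofSeconds(10))             // total response duration
                .doOnConnected(conn -> conn
                        .addHandlerLast(new ReadTimeoutHandler(10))   // per-read timeout (seconds)
                        .addHandlerLast(new WriteTimeoutHandler(10))); // per-write timeout (seconds)

        return builder.clientConnector(new ReactorClientHttpConnector(httpClient)).build();
    }
}
```

Exposing the result as a single bean also encourages reuse of one WebClient instance across the application, in line with the thread-safety recommendation above.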
The proper tuning of connection pools and timeouts is fundamental for the robustness of a BFF. Misconfigured parameters can lead to cascading failures, resource exhaustion, and degraded service availability, particularly in a microservices architecture where the BFF relies on multiple external services. This transforms these configurations from mere performance tweaks into essential components of a robust, fault-tolerant system. The fact that advanced ConnectionProvider settings are not directly exposed by Spring Cloud Gateway 22 highlights that default Spring Boot configurations may not be optimal for all production environments, especially those with strict network policies like ALBs or firewalls. This underscores the importance of a deep technical understanding of underlying libraries like Reactor Netty and the specific network environment within AWS EKS. Generic best practices may not suffice; detailed performance and error analysis are often required to fine-tune these parameters for optimal resilience and efficiency in a given cloud context.

2.3. Implementing Resilience Patterns with Resilience4j (Circuit Breaker, Retry, Bulkhead)

In a microservices architecture, external API calls are inherently susceptible to various failures, including slow responses, temporary unavailability, or complete service outages. Implementing resilience patterns is crucial to prevent cascading failures throughout the system and to maintain the responsiveness of the application.19 Resilience4j is a lightweight, fault-tolerance library that integrates seamlessly with Spring Boot and WebClient, offering a suite of annotations for common resilience patterns.18
Circuit Breaker: This pattern prevents an application from repeatedly attempting to invoke a failing service, thereby giving the failing service time to recover and preventing the calling application from becoming overwhelmed. A circuit breaker transitions through three states:

  • Closed: In this default state, all requests are allowed to pass through, and the circuit breaker actively monitors for failures.

  • Open: If the failure rate exceeds a configured threshold, the circuit transitions to the Open state, immediately rejecting all subsequent requests to the failing service.

  • Half-Open: After a specified waitDurationInOpenState, the circuit transitions to Half-Open, allowing a limited number of requests (permittedNumberOfCallsInHalfOpenState) to pass through to test if the backend service has recovered. If these test calls succeed, the circuit returns to Closed; otherwise, it reverts to Open.24

    Key configurations include failureRateThreshold (the percentage of failed calls required to open the circuit), waitDurationInOpenState (how long the circuit remains open), permittedNumberOfCallsInHalfOpenState (the number of calls allowed in the half-open state), slidingWindowSize (the number of calls considered for failure rate calculation), and minimumNumberOfCalls (the minimum calls before evaluating the failure rate).24 The @CircuitBreaker(name = "serviceName", fallbackMethod = "fallbackMethod") annotation is used to apply this pattern to a service method.24

Retry: This pattern automatically re-attempts failed operations, which is particularly effective for transient errors such as brief network timeouts or momentary service unavailability. By retrying, the system can often succeed without impacting the user experience.19 Configurable properties include maxAttempts (the total number of attempts, including the initial call) and waitDuration (the fixed time delay between retry attempts).24 The @Retry(name = "serviceName", fallbackMethod = "fallbackMethod") annotation is used for this purpose.24
Bulkhead: This pattern limits the number of concurrent calls to a service, preventing resource exhaustion within the calling application itself if a downstream service becomes slow or unresponsive. It isolates failures, ensuring that a problem in one part of the system does not consume all resources and affect other parts.24 The primary configuration is maxConcurrentCalls, which defines the maximum number of concurrent calls allowed at any given time.24 For thread pool isolation, the @Bulkhead(name = "serviceName", type = Bulkhead.Type.THREADPOOL, fallbackMethod = "fallbackMethod") annotation is used.24
In all these patterns, Fallback Methods are crucial. They provide a predefined default response when an operation fails due to circuit breaking, exhaustion of retries, or hitting bulkhead limits. This ensures that the frontend application remains responsive and provides a graceful degradation of service, rather than a complete failure.24
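A sketch of how these annotations compose on a client method; the instance name recommendationService, the endpoint, and the empty-list fallback are hypothetical, and the thresholds themselves live under the resilience4j.* configuration properties rather than in code:

```java
import java.util.List;

import io.github.resilience4j.bulkhead.annotation.Bulkhead;
import io.github.resilience4j.circuitbreaker.annotation.CircuitBreaker;
import io.github.resilience4j.retry.annotation.Retry;
import org.springframework.core.ParameterizedTypeReference;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

@Service
public class RecommendationClient {

    private final WebClient webClient;

    public RecommendationClient(WebClient webClient) {
        this.webClient = webClient;
    }

    // All three patterns share one fallback; by default the aspects apply
    // inside-out as bulkhead, then circuit breaker, then retry.
    @Retry(name = "recommendationService", fallbackMethod = "fallbackRecommendations")
    @CircuitBreaker(name = "recommendationService", fallbackMethod = "fallbackRecommendations")
    @Bulkhead(name = "recommendationService", fallbackMethod = "fallbackRecommendations")
    public Mono<List<String>> fetchRecommendations(String userId) {
        return webClient.get()
                .uri("/recommendations/{userId}", userId)
                .retrieve()
                .bodyToMono(new ParameterizedTypeReference<List<String>>() {});
    }

    // Graceful degradation: an empty list keeps the frontend responsive.
    public Mono<List<String>> fallbackRecommendations(String userId, Throwable cause) {
        return Mono.just(List.of());
    }
}
```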
Integrating Resilience4j should be considered a design principle rather than an add-on. The detailed configuration options for Circuit Breaker and Retry indicate that these are not simple on/off switches but require careful tuning based on the observed behavior of external APIs and the tolerance of the consuming frontend. The objective of “preventing cascading failures” and “maintaining application responsiveness” 24 underscores their foundational role in distributed cloud environments. This approach implies a shift in development mindset from merely making API calls to proactively designing for potential failure and graceful degradation.
Furthermore, the effectiveness of resilience patterns is deeply intertwined with robust observability. Resilience4j metrics can be exposed via Spring Boot Actuator endpoints (e.g., /actuator/metrics/resilience4j.circuitbreaker) and integrated with monitoring systems like Prometheus and Grafana for visualization and detailed insights into circuit states.25 Without proper monitoring, it would be impossible to ascertain if circuit breakers are tripping correctly, if retries are effective, or if bulkheads are preventing resource exhaustion. The ability to observe state transitions, success/failure rates, and the impact of these patterns is essential for validating their efficacy and for proactive issue detection and resolution in production. This reinforces the need for a comprehensive observability strategy that includes application-level resilience metrics.

| Resilience Pattern | Key Configuration Parameters | Description |
| --- | --- | --- |
| Circuit Breaker | failureRateThreshold (e.g., 50%) | Percentage of failed calls to open the circuit. |
| | waitDurationInOpenState (e.g., 60s) | Duration circuit stays open before half-open. |
| | permittedNumberOfCallsInHalfOpenState (e.g., 10) | Calls allowed in half-open state to test recovery. |
| | slidingWindowSize (e.g., 100) | Number of calls in the sliding window for failure rate. |
| | minimumNumberOfCalls (e.g., 100) | Minimum calls before failure rate is evaluated. |
| | automaticTransitionFromOpenToHalfOpenEnabled (e.g., true) | Automatically transitions from open to half-open. |
| Retry | maxAttempts (e.g., 3) | Maximum number of attempts, including initial call. |
| | waitDuration (e.g., 500ms) | Fixed time delay between retry attempts. |
| Bulkhead | maxConcurrentCalls (e.g., 25) | Maximum number of concurrent calls allowed. |

3. Enhancing the Data Layer

3.1. Database Layer Best Practices (AWS RDS/Aurora/DynamoDB)

The selection of a database solution within AWS is a critical decision, heavily influenced by the application’s data model, consistency requirements, and access patterns. AWS offers a range of fully managed services, including relational databases like Amazon RDS and Amazon Aurora, and NoSQL databases such as Amazon DynamoDB. Amazon Aurora, compatible with MySQL and PostgreSQL, is specifically engineered for cloud environments, providing high performance and availability.26 In contrast, Amazon DynamoDB is a serverless NoSQL database known for its extreme scalability, low latency, and flexible schema, making it suitable for diverse use cases.27
Regardless of the database chosen, robust connection security is paramount. Database credentials, such as usernames and passwords, must be stored securely using Kubernetes Secrets, rather than being hardcoded in application code or exposed in ConfigMaps. Kubernetes Secrets encrypt sensitive data and make it accessible only to authorized pods. For instance, a Secret can be defined in YAML with base64 encoded data, and then referenced in the Deployment YAML using valueFrom: secretKeyRef to inject credentials as environment variables.28
Complementing this, non-sensitive database connection URLs and other configuration properties should be externalized using Kubernetes ConfigMaps. This practice promotes portability and allows for dynamic updates without redeploying the application. Spring Boot applications can consume these configurations efficiently through the spring-cloud-starter-kubernetes-client-config dependency, which supports hot reloading of properties when ConfigMaps are updated.29
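On the application side, both sources surface as ordinary properties. A minimal sketch, assuming hypothetical property and environment variable names:

```java
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;

@Configuration
public class DatabaseConnectionConfig {

    // Non-sensitive value, typically sourced from a ConfigMap-backed property source.
    @Value("${app.datasource.url}")
    private String url;

    // Sensitive values, injected as environment variables through the
    // Deployment's valueFrom.secretKeyRef entries (variable names are hypothetical).
    @Value("${DB_USERNAME}")
    private String username;

    @Value("${DB_PASSWORD}")
    private String password;

    // These values would feed the DataSource configuration (see section 3.1.1).
}
```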

3.1.1. Connection Pooling Tuning (HikariCP)

Database connection pooling is a critical component for optimizing performance and ensuring resource efficiency, particularly in a microservices architecture where numerous application instances may concurrently connect to the same database. HikariCP is a widely adopted and highly performant connection pool for Spring Boot applications.33
A common observation is that default HikariCP configurations can be oversized, leading to inefficient resource utilization and potential “noisy neighbor” impacts in multi-tenant cloud environments.33 This means that inefficient connection management can lead to resource contention and degraded performance for other applications sharing the same database. Optimal database connection pooling is not just about the individual application’s performance; it is about the stability and efficiency of the entire ecosystem of services interacting with the database. It represents a key aspect of multi-tenancy and resource governance within a shared cloud environment like EKS.
To mitigate these issues, specific tuning of HikariCP parameters is recommended, as applied in the configuration sketch following this list:

  • maximumPoolSize: It is generally advisable to keep this value under 10 connections per application instance. When considering overall system capacity, scaling application instances should be planned such that the total number of database connections across all instances remains below 1000, especially for databases like Oracle.33
  • idleTimeout: This parameter should be set slightly higher than the average database query execution time. For example, if the average query time is 50ms, an idleTimeout of 100ms would be sensible. The default value of 10 minutes is often excessively high for applications requiring quick responses, leading to connections remaining idle in the pool for too long.33
  • maxLifetime: This value must be set several seconds shorter than any connection time limit imposed by the database or underlying infrastructure (e.g., AWS RDS/Aurora idle timeouts). This ensures that the application’s connection pool proactively closes connections before the infrastructure forcibly terminates them, preventing unexpected errors.33 This emphasizes that “cloud-native” development requires understanding and configuring application components to gracefully handle and anticipate behaviors and limits imposed by managed cloud services, thereby reducing unexpected runtime errors and improving overall system reliability.
  • connectionTimeout: For time-critical applications, a value between 5-10 seconds is often appropriate, as the default of 30 seconds can be too long. Setting this value too low, however, can result in a flood of SQLExceptions in the logs.33
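A configuration sketch applying these recommendations; the environment variable names and concrete values are illustrative, and the same settings can be expressed declaratively as spring.datasource.hikari.* properties:

```java
import javax.sql.DataSource;

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HikariTuningConfig {

    @Bean
    public DataSource dataSource() {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(System.getenv("DB_URL"));        // injected via ConfigMap
        config.setUsername(System.getenv("DB_USERNAME"));  // injected via Secret
        config.setPassword(System.getenv("DB_PASSWORD"));
        config.setMaximumPoolSize(10);       // small per-instance pool
        config.setIdleTimeout(30_000);       // reclaim idle connections quickly
                                             // (note: HikariCP enforces a 10s minimum here)
        config.setMaxLifetime(240_000);      // several seconds below the DB/infra limit
        config.setConnectionTimeout(5_000);  // fail fast for time-critical requests
        return new HikariDataSource(config);
    }
}
```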

Effective monitoring of connection pool metrics is indispensable for identifying performance bottlenecks or potential memory leaks. Tools such as Micrometer, Prometheus, and Grafana can be used to track active, idle, and pending connections, as well as connection usage, acquisition, and creation times.33

| HikariCP Parameter | Recommended Value for EKS | Rationale |
| --- | --- | --- |
| maximumPoolSize | <= 10 connections per app instance | Minimizes “noisy neighbor” impact; total connections across all instances < 1000. |
| idleTimeout | Slightly > average DB query time (e.g., 100ms for a 50ms query) | Reclaims idle connections faster, prevents too many idle connections. |
| maxLifetime | Several seconds < infrastructure/DB connection time limit | Ensures the app times out before the infrastructure, preventing unexpected errors. |
| connectionTimeout | 5s - 10s for time-critical apps (default 30s often too high) | Prevents long waits for connection acquisition; avoid setting it too low to prevent SQLExceptions. |

3.1.2. Read Replicas and Data Access Strategies

For applications characterized by high read volumes, such as a BFF that aggregates data for presentation, the strategic utilization of read replicas can significantly offload the primary database instance. This approach enhances read throughput and reduces latency, directly contributing to improved application responsiveness.34 The ability to route reads to replicas is a direct response to the read-heavy nature of many BFFs. This is not merely about adding a feature; it is a fundamental architectural pattern for scaling database access.
Spring Boot applications can be effectively configured to leverage separate read and write endpoints, particularly with services like AWS Aurora. This integration typically involves:

  • Including the spring-boot-starter-data-jpa dependency in the project.
  • Configuring distinct data sources in application.properties or application.yml for both the write (primary) and read endpoints.34
  • Employing multiple EntityManager instances within services or controllers, explicitly specifying them with @PersistenceContext(unitName = "primary") for write operations and @PersistenceContext(unitName = "read") for read operations, as shown in the sketch below.34
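A minimal sketch of this routing, assuming two persistence units named "primary" (writer endpoint) and "read" (reader endpoint) have been configured, and using a hypothetical Order entity; transaction manager wiring for the two units is omitted:

```java
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {

    @PersistenceContext(unitName = "primary")
    private EntityManager writeEntityManager;

    @PersistenceContext(unitName = "read")
    private EntityManager readEntityManager;

    // Writes go to the primary (writer) endpoint.
    @Transactional
    public void save(Order order) {
        writeEntityManager.persist(order);
    }

    // Reads are routed to the replica endpoint, offloading the primary.
    @Transactional(readOnly = true)
    public Order findById(Long id) {
        return readEntityManager.find(Order.class, id);
    }
}
```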

For use cases demanding extreme scalability, ultra-low latency, and a flexible schema, integrating a NoSQL database like Amazon DynamoDB presents a compelling alternative. This choice represents a strategic decision based on workload characteristics. Spring Cloud AWS simplifies DynamoDB integration through the spring-cloud-aws-starter-dynamodb dependency, abstracting away the complexities of direct AWS SDK interaction.27 Additionally, the
DynamoDbTableNameResolver can be utilized to override default naming conventions for DynamoDB tables, providing greater flexibility in schema management.27
A critical consideration across all database choices is the proper configuration of IAM permissions. Ensuring that the Spring Boot application possesses the correct IAM roles and policies to access the Aurora or DynamoDB clusters is fundamental for secure and operational connectivity.27
While read replicas offer substantial scaling benefits, they often operate on an eventual consistency model, meaning there might be a slight delay before data written to the primary instance becomes available on the replicas. If the BFF also interacts with a NoSQL database like DynamoDB, which often defaults to eventual consistency for reads, it introduces another layer of consistency considerations. For a BFF, minor eventual consistency might be acceptable for some data, but critical user-facing data might require stronger guarantees, impacting the choice and implementation of data access patterns. Therefore, optimizing the data layer means more than just tuning connections; it involves strategically choosing and configuring database architectures (relational with read replicas, or NoSQL) that align with the application’s access patterns and scalability requirements, while carefully considering the implications of data consistency models.

3.1.3. Database Migration Strategies (Flyway/Liquibase)

Managing database schema changes consistently, reliably, and automatically is paramount for applications deployed in dynamic environments like Kubernetes, especially with frequent deployments. Tools such as Flyway and Liquibase are industry standards for this purpose.35
The execution strategy for database migrations is a critical consideration. It is strongly recommended to execute migrations as a distinct pre-deployment step, ensuring they are applied before the application pods start. Running migrations during every application startup can introduce unnecessary overhead, slow initialization, and lead to race conditions in distributed systems where multiple pods might attempt to apply schema changes concurrently.35 This approach also tightly couples schema changes with application deployment, complicating advanced deployment strategies like rolling updates or blue-green deployments. This strong recommendation to run migrations as a “distinct pre-deployment step” is a critical shift in thinking, transforming database migration into a deployment gate rather than merely a database task. It implies that the CI/CD pipeline for the Spring Boot BFF on EKS must be designed to explicitly include a migration phase
before the application deployment phase, ensuring atomicity of schema changes and preventing application startup failures due to schema mismatches.
To ensure controlled execution, mechanisms within the migration tools should be leveraged to prevent concurrent migrations, guaranteeing that only one instance applies changes at a time.35 Furthermore, for enhanced security, a dedicated database user with elevated privileges should be used exclusively for running migrations. The application’s runtime database user should operate with the principle of least privilege, possessing only the minimum necessary permissions.35
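As one possible shape for that pre-deployment step, the sketch below runs Flyway's Java API from a standalone entry point that a CI stage or Kubernetes Job could invoke before the application rollout; the environment variable names are hypothetical:

```java
import org.flywaydb.core.Flyway;

public class MigrationRunner {

    public static void main(String[] args) {
        Flyway flyway = Flyway.configure()
                .dataSource(System.getenv("DB_URL"),
                        System.getenv("DB_MIGRATION_USER"),      // dedicated, privileged migration user
                        System.getenv("DB_MIGRATION_PASSWORD"))
                .locations("classpath:db/migration")             // versioned SQL scripts
                .load();
        // Flyway locks its schema history table, preventing concurrent migrations.
        flyway.migrate();
    }
}
```

Because the runner is separate from the application's startup path, a failed migration stops the rollout before any new pods come up.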
Understanding the database’s support for transactional DDL (Data Definition Language) is also vital. PostgreSQL, for instance, supports transactional DDL, allowing DDL operations to be rolled back if an error occurs during a migration script. However, databases like MySQL, MariaDB, and Oracle (prior to 12c) do not support transactional DDL. In these cases, an error during a migration can leave the schema in an inconsistent state, as DDL operations already executed will not be rolled back.35
Backward compatibility is crucial for minimizing disruption during deployments and enabling zero-downtime updates. This is a fundamental principle for maintaining agility and velocity in a microservices ecosystem. Key guidelines include:

  • Adding non-breaking changes first: New columns should always be added with a default value or allowed to be nullable.

  • Avoiding destructive operations: Directly dropping or renaming columns/tables that the current application might depend on should be avoided. Instead, deprecated elements can be marked and removed in a later, separate migration, or new elements can be added while gradually phasing out the old ones.

  • Maintaining dual reads/writes temporarily: During transitions, data should be written to both the old and new schema.

  • Using database views: For logical changes, views can help reduce the impact on application code that might need to support both old and new schemas.35

    A phased approach, such as “Safe first, clean later,” is recommended for complex schema changes. This involves an initial phase of adding non-breaking changes, followed by a period of dual writes, then updating the application to read from the new schema, and finally, a cleanup phase to remove deprecated elements.35 This disciplined approach to schema evolution ensures that the database can support multiple versions of the application concurrently, which is essential for rapid iteration and safe deployments in EKS.

Structuring migration scripts effectively is key to their manageability and clarity as an application grows. Best practices suggest creating one script per feature or change, ensuring each migration is atomic and modular, which reduces merge conflicts and simplifies rollbacks.35 Consistent sequential versioning is crucial for tools like Flyway, which execute scripts based on their version order. For concurrent development or hotfixes, conventions like timestamp-based versioning or reserving version blocks for specific teams can be adopted.35 Organizing scripts into folders, grouped by version or feature, can further enhance project organization.35
Testing migrations is indispensable for ensuring data integrity and validating the migration process. Testcontainers can be utilized to test migrations in isolated environments with production-like masked data.35 It is also important to ensure that integration and regression tests validate that the application’s entity classes correctly match the database schema.36 For large projects, consolidating historical migrations into a baseline script can significantly speed up the creation of new databases for development or testing environments.35

| Principle | Description | Phased Approach (for complex changes) |
| --- | --- | --- |
| Add Non-Breaking Changes First | Introduce new columns as nullable or with default values. | Phase 1: Safe First, Clean Later - Add new tables/columns that don’t break existing logic. |
| Avoid Destructive Operations | Do not directly drop/rename columns or tables. Mark as deprecated and remove later. | Phase 2: Maintain Dual Writes Temporarily - Write data to both old and new schema (via app code or triggers). |
| Maintain Dual Reads/Writes | During transition, application writes to both old and new columns/tables. | Phase 3: Read from New Source - Update application to read from new tables/columns. |
| Use Views for Logical Changes | Create database views to abstract schema changes from application code. | Phase 4: Clean Up - Remove old columns/tables, triggers, and dual-write logic. |

3.2. Cache Layer Optimization (AWS ElastiCache for Redis)

Caching plays a pivotal role in optimizing application performance, reducing latency, and significantly offloading the primary database, especially for frequently accessed data or computationally intensive operations.1
AWS ElastiCache for Redis is a fully managed, in-memory caching service that is compatible with Redis OSS. It delivers exceptional performance, capable of handling millions of operations per second with microsecond response times, while also ensuring high availability.5 This makes it an excellent choice for a BFF that needs to serve aggregated data quickly.
Spring Framework provides a transparent caching abstraction layer, which simplifies the implementation of caching within a Spring Boot application. Developers can easily add caching to methods using annotations such as @Cacheable (for caching method results), @CachePut (for updating cache entries), and @CacheEvict (for removing cache entries).5 The
spring-boot-starter-data-redis dependency facilitates the seamless integration of Spring Boot’s caching abstraction with Redis.5 A crucial technical requirement for this integration is that any object intended for caching must be
Serializable.37
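For illustration, a sketch of the cache annotations on an aggregation method; the cache name, key expression, and types are hypothetical:

```java
import java.io.Serializable;

import org.springframework.cache.annotation.CacheEvict;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ProductCatalogService {

    // First call hits the downstream APIs; subsequent calls for the same id
    // are served from Redis until the entry expires or is evicted.
    @Cacheable(value = "products", key = "#productId")
    public ProductSummary getProductSummary(String productId) {
        return loadFromDownstreamApis(productId); // slow aggregation path
    }

    @CacheEvict(value = "products", key = "#productId")
    public void evictProductSummary(String productId) {
        // entry removed; the next read repopulates the cache
    }

    private ProductSummary loadFromDownstreamApis(String productId) {
        return new ProductSummary(productId, "example"); // placeholder aggregation
    }

    // Cached values must be Serializable with the default JDK serializer.
    public record ProductSummary(String id, String name) implements Serializable {}
}
```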
ElastiCache for Redis enables distributed caching, a system where the cache is spread across multiple servers. This ensures that cached data remains consistent and accessible across all instances (pods) of the Spring Boot application deployed in EKS, thereby allowing for effective horizontal scaling of the BFF.39 This positions caching not just as a performance optimization but as a critical component for protecting the database from excessive load, especially during traffic spikes. The example of a method taking 2 seconds initially but returning instantly after caching 37 directly illustrates the performance gain, transforming potentially slow, multi-API calls into fast, single-cache lookups.
ElastiCache Configuration and Security Considerations:

  • VPC Restriction: For security and performance, ElastiCache clusters are typically configured to be accessible only from within the same Virtual Private Cloud (VPC) as the application. It is a best practice to avoid connecting an internet gateway to the cache cluster.37
  • Security Groups: Proper configuration of security groups is essential. Inbound rules on the ElastiCache security group must permit traffic on port 6379 (the default Redis port) from the EKS worker nodes (EC2 instances) running the Spring Boot application. Conversely, the application’s outbound rules must also allow communication on this port.37
  • Connection Properties: The Redis host and port should be configured in the application’s application.properties or application.yml file. For ElastiCache Serverless, the specific endpoint address provided by AWS should be used.5
  • Time-to-Live (TTL): Setting spring.cache.redis.time-to-live is crucial for defining the expiration time of cached entries. An infinite default TTL can lead to stale data or memory exhaustion, so careful consideration of data freshness is necessary.5 Effective caching requires a clear strategy for data freshness and invalidation. It is not enough to simply enable caching; developers must carefully consider the consistency requirements of different data types and implement appropriate TTLs or explicit invalidation mechanisms.
  • Serverless Option: ElastiCache Serverless offers a compelling advantage by allowing quick cache creation (under a minute) and automatic scaling based on application traffic patterns, significantly reducing management overhead for the operations team.5
  • Local Development: For local development and testing, it is often convenient to run a Redis instance using Docker Compose.37

This approach to caching significantly improves the perceived performance for the end-user and can substantially reduce the operational cost and load on downstream services. The decision to cache, and how to manage that cache, also impacts the observability strategy, as cache hit/miss ratios become critical metrics for performance analysis.

4. Streamlining Asynchronous Processing and Messaging

4.1. Managing Background Jobs and Asynchronous Tasks in Kubernetes

Spring Boot offers built-in support for asynchronous processing through the @Async annotation. This mechanism allows methods to execute on separate threads, managed by Spring’s task executor, thereby freeing up the main application thread to handle other incoming requests. This improves the overall responsiveness and throughput of the application.40 For tasks that require periodic execution, Spring Boot provides the
@Scheduled annotation.41
To enable asynchronous support, the @EnableAsync annotation must be added to the main application class.41 Furthermore, it is beneficial to configure a custom
Executor bean, typically within an AsyncConfig class, to manage the thread pool for @Async tasks. This custom executor can leverage modern Java features, such as virtual threads, for enhanced efficiency.41
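A sketch of such a setup; the bean, class, and method names are illustrative, and the virtual-thread executor assumes Java 21 or later:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {

    // Virtual threads: cheap, per-task threads well suited to I/O-bound work.
    @Bean(name = "bffTaskExecutor")
    Executor bffTaskExecutor() {
        return Executors.newVirtualThreadPerTaskExecutor();
    }
}

@Service
class AuditService {

    // Runs on the executor above, off the request-handling thread.
    @Async("bffTaskExecutor")
    public CompletableFuture<Void> recordPageView(String userId, String page) {
        // fire-and-forget work, e.g., writing an audit event
        return CompletableFuture.completedFuture(null);
    }
}
```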
When deploying such applications in Kubernetes, the scaling strategy for background jobs and asynchronous tasks requires careful consideration. For CPU-intensive or long-running background jobs, it is often advantageous to deploy them as dedicated worker pods, distinct from the main BFF application. This architectural separation enables independent scaling and resource allocation for the background processing, preventing these tasks from impacting the responsiveness of the primary frontend-facing services.21
For one-off or scheduled batch processes, often implemented with Spring Batch, Kubernetes Job or CronJob resources are highly suitable.42 Several deployment strategies exist for Spring Batch on Kubernetes:

  • Deploying a New Kubernetes Job for Each Batch File: In this strategy, each batch file or processing unit triggers the creation of a new Kubernetes Job. Each job runs in its own isolated environment, dynamically allocating and freeing resources upon completion. This approach offers excellent isolation, scalability, and resource optimization, though it may incur higher startup overhead for each individual job run.42
  • Shared Spring Batch Master and Worker: This approach involves a continuously running set of Kubernetes pods, comprising a master component that coordinates job execution and worker components that handle the actual processing. This setup is efficient for environments with a steady stream of batch jobs, offering reduced startup times and better resource utilization due to long-running processes. However, it entails constant resource consumption and may be less flexible in scaling compared to job-based strategies. Custom scaling can be implemented based on metrics such as worker queue depth.42
  • Spring Cloud Data Flow (SCDF): SCDF provides a comprehensive toolkit for orchestrating data processing pipelines, including Spring Batch jobs, on Kubernetes. It offers robust features for scheduling, monitoring, and scaling, integrating well with Kubernetes for dynamic resource management.42

The distinction between @Async (for in-process tasks) and dedicated worker pods or Kubernetes Jobs (for heavier, decoupled tasks) is crucial. While @Async improves responsiveness for quick, non-critical operations, it does not scale effectively for long-running or resource-intensive processes, as it consumes the main application’s resources. Deploying separate worker pods or Jobs provides isolation, scalability, and resource optimization 42, ensuring the BFF’s primary function (serving frontend requests) scales independently of its background processing.
A critical aspect for any background processing, especially message consumers or batch jobs, is ensuring graceful application shutdown. This is paramount in dynamic Kubernetes environments where pods can be terminated at any time due to scaling events, updates, or spot instance preemption. Spring Boot’s server.shutdown=graceful and spring.lifecycle.timeout-per-shutdown-phase settings are essential for allowing in-progress background tasks to complete before a pod terminates.16 The “Poison Pill” approach for scaling down SQS consumers 43 further illustrates the need for applications to be designed to: 1) acknowledge messages only upon successful processing, 2) handle message re-delivery, and 3) implement robust graceful shutdown logic that allows current work to finish. This is a critical resilience and data integrity concern.

4.2. Robust Messaging with AWS SQS (Queues and Dead-Letter Queues)

Amazon SQS (Simple Queue Service) is a fully managed, distributed messaging system provided by AWS. Its primary function is to decouple application components, enabling them to scale independently. In this model, producers send messages to a queue, and consumers poll or receive messages from that queue.44
SQS offers two main types of queues:

  • Standard Queues: These are the default queue type, supporting a nearly unlimited number of API calls per second. They provide at-least-once message delivery, meaning messages might be delivered more than once, and message order is not guaranteed. Standard queues are suitable when message order is not critical and duplicate processing can be tolerated.45
  • FIFO (First-In-First-Out) Queues: FIFO queues guarantee message order and exactly-once processing within a message group. They are used in scenarios where the order of message arrival and processing is of utmost importance, and duplicate messages are unacceptable.45

Spring Boot applications can integrate seamlessly with SQS through Spring Cloud AWS. The spring-cloud-aws-dependencies module simplifies this integration.44 A default
SqsTemplate instance is automatically configured and injected, providing convenient methods for sending and polling messages.44 For consuming messages, the
@SqsListener annotation is used to define message listener methods, which Spring automatically configures to receive and process messages from a specified SQS queue.44 Configuration properties for AWS credentials (
access-key, secret-key), region, and the SQS queue endpoint are typically defined in application.properties.44
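A sketch of both sides of the integration, assuming Spring Cloud AWS 3.x; the queue name and payload type are hypothetical:

```java
import io.awspring.cloud.sqs.annotation.SqsListener;
import io.awspring.cloud.sqs.operations.SqsTemplate;
import org.springframework.stereotype.Component;

@Component
public class OrderEventsMessaging {

    private final SqsTemplate sqsTemplate;

    public OrderEventsMessaging(SqsTemplate sqsTemplate) {
        this.sqsTemplate = sqsTemplate; // auto-configured by Spring Cloud AWS
    }

    public void publish(OrderEvent event) {
        sqsTemplate.send("order-events", event); // queue name is illustrative
    }

    // The message is acknowledged on successful return; a thrown exception
    // leaves it for redelivery and, after maxReceiveCount attempts, the DLQ.
    @SqsListener("order-events")
    public void onOrderEvent(OrderEvent event) {
        // process idempotently: standard queues may deliver duplicates
    }

    public record OrderEvent(String orderId, String status) {}
}
```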
Dead-Letter Queues (DLQs):
DLQs are a critical component for building robust and fault-tolerant message processing systems. A DLQ is a dedicated queue that receives messages that cannot be successfully processed by the source queue’s consumers. This mechanism isolates problematic messages, preventing them from perpetually failing and blocking the main queue, which would otherwise lead to a cascading failure of the entire consumer system.44 The purpose of DLQs is to isolate unprocessed messages and facilitate the determination of why their processing failed.47
Setting up a DLQ involves creating it as a regular queue (ensuring it is the same type, in the same AWS account and region as the source queue). Subsequently, a “redrive policy” is configured on the source queue. This policy specifies the DLQ’s Amazon Resource Name (ARN) and a maxReceiveCount, which defines the maximum number of times a message can be received and not processed before it is moved to the DLQ.44 Proper DLQ configuration and a defined process for reviewing and reprocessing DLQ messages are essential for production readiness, as their absence can lead to data loss, operational blind spots, and system instability.
Polling Strategies:

  • Short Polling: Involves the consumer continuously calling the receive() method of the SqsTemplate in a loop.
  • Long Polling: Achieved by setting the Receive message wait time to a value greater than zero. With long polling, if there are no messages in the queue, SQS waits for the specified duration for new messages to arrive before returning a response. This approach significantly reduces polling costs and minimizes empty responses, leading to more efficient resource utilization.45

Scaling SQS Consumers on EKS:
For scaling SQS consumers on EKS, event-driven autoscaling is highly effective. The ApproximateNumberOfMessagesVisible metric from SQS can be used to dynamically scale consumer pods.43 A Target Tracking Scaling policy, employing a “Backlog per Task” approach (number of messages in the queue divided by the number of currently running tasks), is often a more suitable fit than a Step Scaling policy for SQS queue-based auto-scaling.43 Kubernetes Event-Driven Autoscaling (KEDA) can be leveraged to scale Kubernetes Deployments based on SQS queue length, providing a robust event-driven autoscaling solution.49 Addressing the challenge of scaling down to zero (or a low baseline) after large bursts or during recovery from downtime requires careful implementation, potentially involving “Poison Pill” messages to gracefully shut down idle workers.43 This demonstrates how deep integration between cloud messaging services and Kubernetes autoscaling mechanisms can lead to significant cost savings and improved responsiveness to variable loads, shifting the scaling paradigm from reactive resource-based to proactive event-driven.

4.3. Event Streaming with Apache Kafka / AWS MSK

Apache Kafka is a distributed streaming platform designed for high-throughput, low-latency message delivery, making it ideal for building real-time data pipelines, streaming applications, and event-driven architectures. It functions as a robust message broker, facilitating the efficient exchange of data between producers and consumers.50
The core components of Kafka include:

  • Producers: Clients that send messages to Kafka topics.
  • Consumers: Clients that read messages from Kafka topics.
  • Topics: Logical channels to which producers publish messages and from which consumers read.
  • Brokers: Servers that store and serve messages within the Kafka cluster.51

Spring Boot applications can integrate with Kafka using the spring-kafka dependency.50 The
KafkaTemplate is provided for producing messages 50, while the
@KafkaListener annotation enables the creation of Message-driven POJOs for consuming messages.50 Configuration involves setting up
ProducerFactory, ConsumerFactory, and KafkaListenerContainerFactory beans. The KafkaAdmin bean can also be used for automatically creating Kafka topics.50 Essential configuration properties include
bootstrap-servers (addresses of Kafka brokers), key/value serializers/deserializers, and group-id for consumer groups.50
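A sketch of a producer and a listener, relying on Spring Boot's auto-configured KafkaTemplate; the topic, key, and group id are hypothetical:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class ProductEventsMessaging {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public ProductEventsMessaging(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Keying by productId keeps events for the same product in one partition (ordered).
    public void publish(String productId, String payload) {
        kafkaTemplate.send("product-events", productId, payload);
    }

    // One consumer group; Kafka assigns partitions across all pod replicas.
    @KafkaListener(topics = "product-events", groupId = "bff-consumers")
    public void onProductEvent(String payload) {
        // react to the event, e.g., invalidate a cached aggregate
    }
}
```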
For a fully managed Kafka experience on AWS, Amazon MSK (Managed Streaming for Kafka) simplifies the deployment, management, and scaling of Apache Kafka clusters. MSK handles the operational complexities, allowing teams to focus on application development.52
Scaling Kafka Consumers on EKS:
Scaling Kafka consumers on EKS is highly efficient due to Kafka’s design around consumer groups and parallelism. Multiple consumers can work in parallel within a consumer group, with each consumer reading from a subset of partitions. This mechanism provides inherent scalability and fault tolerance.53

  • Horizontal Scaling: Increasing throughput can be achieved by adding more consumer instances to a consumer group.

  • Vertical Scaling: Adding more partitions to a topic allows the load to be distributed among more consumers.53

    Similar to SQS, Kafka consumer lag metrics can be utilized for metrics-driven autoscaling of consumer workloads.54 For managing Kafka clusters on Kubernetes, specialized Kubernetes Operators, such as Strimzi, significantly simplify deployment, configuration, security, upgrades, and the management of topics and users.52 Kafka brokers themselves can be scaled horizontally by adding more brokers to the cluster, with Kafka automatically rebalancing partitions across the new brokers.53

Kafka serves as the backbone for event-driven architectures and real-time data processing. While SQS excels at decoupling and asynchronous tasks, Kafka is explicitly designed for “real-time data pipelines,” “streaming applications,” and “event-driven architectures”.52 This implies a need for higher throughput, lower latency, and more complex event processing than typical SQS scenarios. The concepts of topics, partitions, and consumer groups are fundamental to its horizontal scalability. For a BFF that needs to react to real-time events (e.g., product updates, user status changes from other microservices), Kafka (or MSK) provides the necessary infrastructure, shifting the BFF from a purely request-response model to an event-driven one, enabling richer, more dynamic frontend experiences.
Performance Tuning for Kafka:
Optimizing Kafka performance involves tuning several configuration parameters; the producer-side settings are applied in the sketch after this list:

  • Producer Configuration: Increasing batch-size can improve throughput by reducing the number of requests sent to Kafka. Using compression-type (e.g., gzip, snappy) reduces network load, and configuring retries handles transient failures.53
  • Consumer Configuration: Increasing fetch.min.bytes and fetch.max.wait.ms can enhance throughput for consumers. Limiting max-poll-records prevents memory overload in consumers.53
  • Log Retention: Setting appropriate log retention policies is important for managing disk space on brokers.53
  • Replication Factor: A replication factor of 3 is typically recommended for high availability, though this depends on specific fault tolerance requirements.53
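As an illustration, the producer-side settings above might be applied as follows; the broker address and concrete values are placeholders to be tuned against measured throughput:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class KafkaProducerTuning {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "b-1.msk.example:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024);      // larger batches, fewer requests
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy"); // lower network load
        props.put(ProducerConfig.RETRIES_CONFIG, 3);                 // absorb transient failures
        return new DefaultKafkaProducerFactory<>(props);
    }
}
```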

Managing and scaling Kafka clusters is often described as challenging and time-consuming.52 The recommendation of Kubernetes Operators like Strimzi is a direct response to this complexity. Operators encapsulate domain-specific knowledge to manage stateful applications on Kubernetes. Deploying a stateful system like Kafka (or its consumers that maintain state) on EKS requires a different approach than stateless BFFs. Relying on specialized Kubernetes Operators is a best practice to automate the complex lifecycle management (deployment, scaling, upgrades, security) of such systems, ensuring their high availability and performance within the EKS environment.

5. Comprehensive Observability

In a distributed microservices architecture deployed on EKS, comprehensive observability, encompassing logs, metrics, and traces, is paramount. It provides the necessary visibility to understand application behavior, proactively detect issues, efficiently troubleshoot problems, and continuously optimize performance.55

5.1. Structured Logging with CloudWatch Logs

Centralized logging is a fundamental requirement for distributed systems. Amazon CloudWatch Logs offers a centralized service for structured log aggregation, enabling efficient searching, custom metric extraction, alarm generation, and long-term log retention and analysis.55
Spring Boot 3.4.0 and later versions provide built-in support for structured logging, offering various formats such as Logstash, Elastic Common Schema (ECS), and Graylog Extended Log Format (GELF).59 This internal support simplifies the adoption of structured logging. To enable it, properties like
logging.structured.format.console or logging.structured.format.file can be set in application.properties (e.g., logging.structured.format.console=logstash).59 For highly specific requirements, custom
StructuredLogFormatter implementations can also be created.59
For collecting container logs from EKS pods and forwarding them to CloudWatch Logs, FluentBit is a widely used and efficient log processor.55 Log correlation is crucial for debugging in distributed environments. It involves integrating log messages with trace IDs and metrics, allowing for a comprehensive debugging experience.55 This ensures that unstructured logs, which are nearly useless for root cause analysis in a distributed system, are transformed into actionable data. For a BFF on EKS, structured logging is not just a “nice-to-have”; it is a foundational requirement for operational efficiency, enabling rapid identification of issues, trend analysis, and effective collaboration between development and operations teams. Proper log retention policies should also be implemented in CloudWatch Logs (e.g., 7-30 days for debug logs) to manage storage costs effectively.55
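As a minimal illustration of a custom formatter, assuming Logback (Spring Boot's default logging system); the class would be activated by setting logging.structured.format.console to its fully qualified name, and a production format should also escape field values (the built-in formats already do):

```java
import ch.qos.logback.classic.spi.ILoggingEvent;
import org.springframework.boot.logging.structured.StructuredLogFormatter;

// Hypothetical compact JSON format: one JSON object per log line.
public class CompactJsonFormatter implements StructuredLogFormatter<ILoggingEvent> {

    @Override
    public String format(ILoggingEvent event) {
        return "{\"ts\":\"" + event.getInstant()
                + "\",\"level\":\"" + event.getLevel()
                + "\",\"logger\":\"" + event.getLoggerName()
                + "\",\"msg\":\"" + event.getFormattedMessage() + "\"}\n";
    }
}
```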

5.2. Metrics Collection and Visualization (Micrometer, Prometheus, Grafana)

Metrics provide real-time numerical measurements of system performance and behavior, which are essential for detecting issues and optimizing application performance.56
Spring Boot integrates seamlessly with Micrometer, an application metrics facade. Micrometer allows developers to instrument their applications with various types of metrics, including counters, gauges, and timers, and to export these metrics to different monitoring systems.18
Prometheus is an open-source monitoring system designed for collecting, storing, and querying time-series metrics data. Spring Boot applications can expose Prometheus-compatible metrics via the Actuator endpoint, typically at /actuator/prometheus; enabling this requires adding the micrometer-registry-prometheus dependency to the project.33 For a fully managed Prometheus experience on AWS, Amazon Managed Service for Prometheus (AMP) eliminates the need for infrastructure management, offering automatic scaling and high availability.55
Grafana is a popular open-source visualization tool that enables the creation of real-time dashboards based on metrics data collected by Prometheus.33 Beyond standard JVM and Spring Boot metrics, custom application-specific metrics can be exposed. For example, an “HTTP requests per second” metric can be crucial for driving Horizontal Pod Autoscaling (HPA) decisions.60 This capability of defining custom metrics directly links business logic to infrastructure scaling. Metrics are not just for reactive troubleshooting; they are the proactive drivers of efficient resource allocation and cost management in EKS. By instrumenting the BFF with relevant business and technical metrics, teams can implement intelligent autoscaling policies that align with actual workload demands, rather than relying on generic resource utilization.
Out-of-the-box CPU and memory metrics are often insufficient for complex applications; extending HPA beyond these thresholds requires metrics “adapters” that surface custom metrics to Kubernetes.60 This highlights the importance of a metrics strategy that goes beyond basic monitoring. A comprehensive metrics strategy for a BFF on EKS requires identifying key performance indicators (KPIs) relevant to its specific business logic (e.g., external API call success rates, data aggregation times, cache hit ratios) and exposing them as custom metrics. This shifts monitoring from infrastructure-centric to application-centric, enabling more meaningful insights and better operational decisions. To manage costs, intelligent sampling should be applied to high-cardinality metrics.55
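As a sketch of such application-centric instrumentation, the hypothetical component below registers a failure counter for downstream API calls and a latency timer for the aggregation step using Micrometer; the metric names and tags are illustrative assumptions:

```java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.function.Supplier;
import org.springframework.stereotype.Component;

// Hypothetical BFF-level metrics: downstream call failures and aggregation latency.
@Component
public class BffMetrics {

    private final Counter externalApiFailures;
    private final Timer aggregationTimer;

    public BffMetrics(MeterRegistry registry) {
        // Counter for failed downstream calls, tagged by API for per-dependency dashboards
        this.externalApiFailures = Counter.builder("bff.external.api.failures")
                .tag("api", "orders")
                .register(registry);
        // Timer (with percentile histogram) for the data aggregation step
        this.aggregationTimer = Timer.builder("bff.aggregation.duration")
                .publishPercentileHistogram()
                .register(registry);
    }

    public void recordFailure() {
        externalApiFailures.increment();
    }

    public <T> T timeAggregation(Supplier<T> work) {
        return aggregationTimer.record(work);
    }
}
```

Once exposed through /actuator/prometheus, such series can feed Grafana dashboards or, via a metrics adapter, HPA decisions.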

5.3. Distributed Tracing with OpenTelemetry and AWS X-Ray

In modern microservices architectures, a single user request often traverses multiple services. Distributed tracing is essential to link the processing across these disparate services, even as requests cross network boundaries between containers. This provides an end-to-end view of the request flow, which is crucial for understanding system behavior and identifying performance bottlenecks.58
AWS X-Ray is a fully managed service for distributed tracing that assists in analyzing and debugging production applications. It offers an end-to-end view of requests, presenting a map of the application’s underlying components and helping to pinpoint the root causes of performance issues and errors.55 X-Ray transforms a black box of inter-service communication into a transparent, observable flow. This is crucial for identifying performance bottlenecks, understanding dependencies, and rapidly pinpointing the root cause of issues in a highly distributed EKS environment.
OpenTelemetry is a collaborative, open-source effort that provides a common standard for instrumentation. It enables consistent telemetry collection (traces, metrics, and logs) across different languages and libraries.55 The OpenTelemetry Java Agent offers the simplest way to get started with Java applications, as it automatically instruments supported libraries (e.g., Spring, gRPC, MySQL, Redis, AWS SDK) without requiring any code changes.58 The AWS Distro for OpenTelemetry (ADOT) is an open-source distribution that packages OpenTelemetry components, pre-configured to export data to AWS X-Ray and Amazon CloudWatch. ADOT also supports AWS Tracing headers, ensuring complete traces across AWS managed services.55
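Attaching the agent is a deployment-time concern rather than a code change. A hedged pod-spec fragment, in which the agent jar path, service name, and collector address are assumptions for illustration:

```yaml
# Container env fragment: auto-instrument the JVM with the OpenTelemetry/ADOT Java agent
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-javaagent:/otel/aws-opentelemetry-agent.jar"
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://adot-collector.observability:4317"   # assumed ADOT Collector Service
  - name: OTEL_PROPAGATORS
    value: "tracecontext,baggage,xray"                  # propagate AWS X-Ray trace headers
  - name: OTEL_RESOURCE_ATTRIBUTES
    value: "service.name=bff"
```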
Effective implementation of distributed tracing involves instrumenting critical code paths, using consistent trace context propagation across services, and monitoring service dependencies through X-Ray service maps.55 It is also important to implement proper error handling and apply trace sampling strategies for cost optimization.55 Verification that the ADOT Collector is running and that the associated IAM roles have the correct permissions for AWS services is a necessary operational step.55
OpenTelemetry is presented as a common ground for instrumentation and a way to avoid vendor lock-in. This suggests a strategic benefit beyond immediate debugging, as adopting a standardized approach provides the flexibility to change backend observability tools without re-instrumenting applications. Investing in OpenTelemetry for the Spring Boot BFF ensures that the observability strategy is adaptable and scalable, decoupling the application’s instrumentation from specific AWS services or third-party vendors, thus providing long-term agility and reducing technical debt in the observability stack.

5.4. Kubernetes Health Checks (Liveness, Readiness, Startup Probes)

Kubernetes utilizes various probes to manage the lifecycle and availability of containers within pods, which are fundamental for maintaining application reliability and efficient traffic routing.62

  • Liveness Probes: These probes determine if a container is actively running and healthy. If a liveness probe fails, Kubernetes automatically restarts the container. This mechanism is crucial for detecting and recovering from conditions such as deadlocks or unresponsive applications, thereby enhancing application availability.62
  • Readiness Probes: These probes ascertain whether a container is ready to accept incoming traffic. If a container is not deemed ready, its corresponding pod is temporarily removed from the load balancers associated with its Kubernetes Service. This prevents traffic from being routed to unready or unhealthy instances, ensuring that only fully operational pods receive requests.62
  • Startup Probes: These probes are designed to determine if an application inside a container has successfully started. If a startup probe is configured, liveness and readiness probes are deferred until the startup probe succeeds. This prevents them from interfering with applications that have a slow initialization process, avoiding premature restarts or pods never becoming ready due to early probe failures.62 For Spring Boot applications on EKS, implementing a startup probe is a critical optimization, ensuring the application has sufficient time to fully initialize before Kubernetes actively monitors its runtime health or routes traffic to it, preventing “crash loops” during deployment.

Spring Boot integrates effectively with Kubernetes probes via its Actuator endpoints, specifically /actuator/health/liveness and /actuator/health/readiness.63 As of Spring Boot 2.3+, the LivenessStateHealthIndicator and ReadinessStateHealthIndicator classes expose the application’s current health state, and Spring Boot automatically registers these health indicators when deployed in a Kubernetes environment or when management.health.probes.enabled=true is explicitly set.63 The liveness state transitions between CORRECT and BROKEN, while the readiness state transitions between ACCEPTING_TRAFFIC and REFUSING_TRAFFIC.63
Probes are configured with several key parameters that allow for precise control over their behavior:

  • initialDelaySeconds: The initial delay after a container starts before probes are initiated. If a startup probe is defined, this delay for liveness and readiness probes begins only after the startup probe has succeeded.62
  • periodSeconds: How often (in seconds) the probe is performed.62
  • timeoutSeconds: The duration after which the probe times out.62
  • successThreshold: The minimum consecutive successes required for the probe to be considered successful after having failed.62
  • failureThreshold: The number of consecutive failures after which Kubernetes considers the overall check to have failed (leading to container restart for liveness/startup probes, or removal from load balancer for readiness probes).62
  • terminationGracePeriodSeconds: Configures the grace period for the Kubelet to wait between triggering a container shutdown and forcibly stopping it.62

A common pattern for implementing these probes is to use HTTP GET requests to a low-cost endpoint, such as the Actuator health endpoints.62 Properly configured health probes are non-negotiable for any Spring Boot application deployed on EKS. They serve as the primary interface through which Kubernetes understands the application’s runtime state and makes intelligent decisions about traffic routing and restarts, directly impacting the BFF’s reliability and user experience.
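A representative container-spec fragment wiring all three probes to the Actuator endpoints; the port, periods, and thresholds are illustrative and should reflect the application’s measured startup time:

```yaml
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 5
  failureThreshold: 30    # allow up to 30 * 5s = 150s for a slow JVM start
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3     # restart after ~30s of sustained failure
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 10
  failureThreshold: 3     # stop routing traffic after ~30s of failure
```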

| Category | Spring Boot Component | AWS Service/Kubernetes Tool | Purpose/Benefit |
|---|---|---|---|
| Logs | Logback/SLF4J (structured logging) | CloudWatch Logs / FluentBit | Centralized log aggregation, search, and long-term retention; enables comprehensive debugging. |
| Metrics | Micrometer | Amazon Managed Service for Prometheus / Grafana | Real-time performance monitoring, resource utilization tracking, and custom-metric-driven autoscaling. |
| Traces | OpenTelemetry Java Agent | AWS X-Ray / ADOT Collector | End-to-end request visibility across microservices, bottleneck identification, and root cause analysis. |
| Health Checks | Spring Boot Actuator (liveness, readiness, startup probes) | Kubernetes probes | Automated container lifecycle management, traffic routing control, and self-healing capabilities. |

6. Deep Dive into Performance Optimization

Achieving optimal performance in a cloud-native environment like EKS necessitates a comprehensive and multi-faceted approach, considering optimizations at the JVM, Spring Boot application, Kubernetes orchestration, and containerization layers.

6.1. JVM Optimizations (CRaC, GraalVM Native Image)

Traditional Java applications often exhibit longer startup times and higher initial resource consumption due to the inherent overhead of JVM initialization, extensive class loading, and Just-In-Time (JIT) compilation. This characteristic presents a significant challenge in dynamic, containerized environments such as Kubernetes, where rapid startup is crucial for efficient autoscaling and cost management.64
Coordinated Restore at Checkpoint (CRaC):
CRaC is an OpenJDK project designed to address JVM startup challenges by providing fast startup and immediate performance. It operates by allowing a Java application and its JVM to start from a “warmed-up” image. This involves creating a checkpoint of a running JVM process at an arbitrary point in time and saving its state to files.65 The application can then be restored from these checkpoint files, which significantly reduces the startup time and eliminates the typical initial spike in compute resource consumption observed during Java application startup.65 Spring Boot offers full support for CRaC concerning its own dependencies, though external libraries may require additional logic for full compatibility.65 For deployment, these checkpoint files can either be stored as additional layers within the container image or externalized to persistent storage solutions like Amazon EFS or S3.65 Utilizing CRaC-enabled JDK images (e.g., azul/zulu-openjdk:17-jdk-crac) is necessary for this approach.65
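Operationally, a checkpoint/restore cycle follows the pattern sketched below, assuming a CRaC-enabled JDK and a writable checkpoint directory (the paths and jar name are illustrative):

```bash
# 1. Start the application with a checkpoint target and let it warm up
java -XX:CRaCCheckpointTo=/checkpoint -jar bff.jar &

# 2. After warm-up traffic, trigger the checkpoint; the JVM saves its state and exits
jcmd bff.jar JDK.checkpoint

# 3. On subsequent starts (e.g., the container entrypoint), restore from the image
java -XX:CRaCRestoreFrom=/checkpoint
```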
GraalVM Native Image:
Another powerful optimization technique involves compiling Java applications into standalone native executables using GraalVM Native Image. This process eliminates the dependency on a traditional JVM at runtime, resulting in exceptionally fast startup times and a substantially reduced memory footprint. This makes native images ideal for serverless functions, command-line tools, and microservices in containerized environments where rapid scaling and minimal resource consumption are paramount. While offering significant performance benefits, building native images can increase build times and may require more effort to ensure compatibility with all third-party libraries.
The complexity of the JVM and its optimization processes, such as JIT compilation, inherently leads to longer startup times.64 The goal in Kubernetes and serverless environments is to minimize this time-to-start. JIT compilation, while optimizing frequently executed code (hotspots) for better runtime performance, incurs an initial compilation overhead, so there is a trade-off between initial startup speed and long-term runtime performance. CRaC and GraalVM Native Image address this trade-off directly by pre-optimizing or pre-compiling the application, allowing it to bypass or significantly shorten the initial JVM warm-up phase and making Java applications better suited to the ephemeral nature of containerized deployments.

6.2. Spring Boot Specific Performance Enhancements (Lazy Initialization)

Spring Boot provides mechanisms to enhance application startup performance, notably through lazy initialization. This feature defers the creation of Spring beans until they are explicitly required during the application’s runtime.66 This contrasts with the default eager initialization behavior, where all singleton beans are created and fully initialized during the application startup phase, regardless of whether they are immediately needed.66
By shifting the bean creation process, lazy initialization can significantly reduce the initial startup time, especially for applications with a large number of components or those that contain rarely used services. For instance, a service that interacts with a database but is not invoked immediately after startup can have its instantiation delayed until its methods are called for the first time.66 This approach reduces the initial workload during startup, saving memory and processing power.
Lazy initialization can be enabled globally across the entire application by setting spring.main.lazy-initialization=true in the application.properties or application.yml file.66 This configuration is particularly beneficial for large applications with numerous beans and dependencies, or for deployments in environments with limited resources, as it minimizes the number of beans loaded upfront.66
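One practical caveat follows from deferring bean creation: the first request that touches a lazy bean pays its initialization cost. A sketch of opting a critical bean back into eager creation (WarmupCache is a hypothetical component used for illustration):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Lazy;

@Configuration
public class EagerBeansConfig {

    // Hypothetical expensive-to-initialize component, stubbed for illustration.
    public static class WarmupCache {
        public WarmupCache() {
            // e.g., pre-load reference data at startup
        }
    }

    // With spring.main.lazy-initialization=true, @Lazy(false) opts this bean back
    // into eager creation so the first request does not pay its initialization cost.
    @Bean
    @Lazy(false)
    public WarmupCache warmupCache() {
        return new WarmupCache();
    }
}
```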

6.3. Kubernetes Resource Management (Requests, Limits, HPA)

Effective resource management within Kubernetes is crucial for optimizing performance, ensuring stability, and controlling costs for Spring Boot applications deployed on EKS.
Resource Requests and Limits:
Kubernetes uses resources.requests and resources.limits to manage container resource consumption.

  • Requests: resources.requests specify the minimum amount of CPU and memory a container needs. The Kubernetes scheduler uses these values to decide which node to place a pod on, ensuring the node has enough capacity that is not already claimed by other pods’ requests.67
  • Limits: resources.limits define the maximum amount of CPU and memory a container can consume. If a container exceeds its memory limit, it is terminated (OOM-killed); for CPU, exceeding the limit results in throttling. It is often recommended to set limits equal to requests to ensure predictable performance and avoid resource contention, because setting limits significantly higher than requests can leave pods short of their requested resources on an overcommitted node.67 It is therefore essential to set both requests and limits for both CPU and memory, as in the fragment after this list.
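A minimal container fragment following the limits-equal-to-requests guidance; the values are illustrative and should be derived from observed usage:

```yaml
resources:
  requests:
    cpu: "500m"      # average CPU under normal load
    memory: "1Gi"    # average memory plus JVM headroom
  limits:
    cpu: "500m"      # equal to requests for predictable performance
    memory: "1Gi"    # equal to requests so OOM behavior stays predictable
```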

Horizontal Pod Autoscaler (HPA):
The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of pod replicas in a Deployment or StatefulSet based on observed metrics. This ensures that the application can respond dynamically to varying workloads while optimizing resource utilization and cost.52 HPA can scale based on:

  • Resource Metrics: Average CPU utilization or average memory utilization per pod.68
  • Custom Metrics: Any other application-specific or business-level metric (e.g., “HTTP requests per second,” queue length, or consumer lag). These metrics are typically collected via aggregated APIs like custom.metrics.k8s.io (often provided by metrics adapters like Prometheus Adapter) or external.metrics.k8s.io.60 The ability to define custom metrics for autoscaling directly links business logic to infrastructure scaling.

When multiple metrics are specified in an HPA, the controller calculates a desired replica count for each metric and then selects the largest of these counts to ensure adequate scaling.68 The HPA also incorporates flags such as --horizontal-pod-autoscaler-cpu-initialization-period to ignore CPU usage during a pod’s initial startup phase, preventing premature scaling decisions.68
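A sketch of an autoscaling/v2 HPA combining a CPU target with a custom per-pod metric; the http_requests_per_second metric assumes a metrics adapter (such as Prometheus Adapter) is installed, and all names and targets are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: bff
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: bff
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods                         # custom metric served by a metrics adapter
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
```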
Vertical Pod Autoscaler (VPA):
While HPA scales horizontally, the Vertical Pod Autoscaler (VPA) can automatically adjust the CPU and memory requests and limits for individual pods. VPA has different modes:

  • “Initial” Mode: Assigns resources when pods start and does not change them later. This can be problematic if the load suddenly increases after a low-load period.67

  • “Auto” Mode: Recreates pods with adjusted resource requests/limits. This mode is generally not advisable for StatefulSets due to the disruptive nature of pod recreation.67

    VPA recommendations become more accurate over time (days to weeks) as it observes workload patterns, helping to avoid Out-Of-Memory (OOM) kills caused by insufficient resources and to optimize infrastructure costs.67

| Resource Type | Parameter | Best Practice | Rationale |
|---|---|---|---|
| CPU | requests | Set to the average CPU usage under normal load. | Guarantees minimum CPU for scheduling; prevents throttling under average load. |
| CPU | limits | Set equal to requests or slightly higher (e.g., 1.5x requests). | Ensures predictable performance; avoids excessive throttling; prevents “noisy neighbor” issues. |
| Memory | requests | Set to the average memory usage plus a buffer. | Guarantees minimum memory for scheduling; prevents OOM kills during startup. |
| Memory | limits | Set equal to requests or slightly higher (e.g., 1.2x requests). | Prevents memory leaks from consuming all node resources; ensures OOM kills are predictable. |

6.4. Container Image Optimization with Cloud Native Buildpacks

Optimizing the size and layering of container images is crucial for faster deployments, reduced storage costs, and improved security in Kubernetes environments. Cloud Native Buildpacks (CNBs) offer a powerful solution for transforming application source code into optimized container images without the need for manual Dockerfile creation.69
CNBs centralize the knowledge of container build best practices within a specialized team, eliminating the need for individual application developers to maintain their own Dockerfiles. This approach streamlines security and compliance enforcement, and simplifies upgrades with minimal effort.69 For Java applications, CNBs allow developers to focus on application logic rather than intricate Dockerfile configurations, including JVM settings, certificate management, and OS patching.70
A significant benefit of CNBs, particularly for Spring Boot applications, lies in their ability to optimize image layering. Spring Boot 2.3.0.M1 introduced built-in support for buildpacks and layered JARs.14 By enabling the LAYERED_JAR layout, the application’s dependencies (lib) and application code (classes) are split into distinct layers within the container image.14 This layering strategy is highly effective because it separates code based on its likelihood of change:

  • Dependency Layers: Library code, which tends to change less frequently, is placed in its own layers. When an application is updated, if only the application code has changed, Docker can reuse the cached layers for dependencies, significantly speeding up subsequent builds and deployments.14 This also means that if multiple applications are built using the same buildpacks, the base layers are shared and do not need to be downloaded multiple times on Kubernetes, offering a substantial advantage.70
  • Application Code Layer: The application’s custom code, which changes more frequently, is isolated in a separate, top layer. This ensures that only this relatively small layer needs to be rebuilt and pushed/pulled during updates, leading to much faster deployment cycles.14

This standardization provided by CNBs leads to predictability and speed in patching, deployment, and security. For instance, patching security vulnerabilities in underlying components (like OpenSSL) becomes significantly easier; it typically involves rebuilding (or “rebasing”) all applications using a patched buildpack, rather than individually updating Dockerfiles for each application.70 CNBs embrace modern container standards, such as the OCI image format, and leverage capabilities like cross-repository blob mounting and image layer “rebasing” on Docker API v2 registries, further enhancing efficiency.69
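With the Spring Boot build plugin, producing such a layered, buildpack-built image requires no Dockerfile; a sketch using the Maven plugin (the image name is a placeholder):

```bash
./mvnw spring-boot:build-image \
  -Dspring-boot.build-image.imageName=registry.example.com/bff:1.2.3
```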

| Layer Type | Content | Change Frequency | Caching Benefit | Impact on Deployment |
|---|---|---|---|---|
| Dependencies (lib) | Third-party libraries, framework JARs | Low | Highly cacheable across builds and applications | Reduced image download/upload size; faster deployments |
| Application Code (classes) | Custom application logic, resources | High | Only this layer needs to be rebuilt/updated | Rapid deployment cycles; minimal network transfer |
| Base Image (OS/JDK) | Operating system, Java Development Kit | Low | Shared across all applications using the same buildpack | Consistent base environment; simplified patching |

6.5. Graceful Application Shutdown

Ensuring that Spring Boot applications shut down gracefully is a critical aspect of maintaining application reliability and preventing data loss, especially in dynamic Kubernetes environments where pods can be terminated due to scaling events, node rebalancing, or spot instance preemption.16
When an AWS spot instance is preemptively terminated, Kubernetes initiates a graceful eviction of pods. However, if the Spring Boot application itself is not configured to respond appropriately to shutdown signals (e.g., SIGTERM), active HTTP requests or ongoing background tasks can be abruptly cut off, leading to unexpected 500 errors or incomplete processing.16 This situation highlights that the problem is not inherently with Kubernetes infrastructure, but rather with the application’s internal handling of termination signals.
Spring Boot provides built-in support for graceful shutdown, but it must be explicitly configured. The key properties, combined in the sketch after the list, are:

  • server.shutdown=graceful: This property enables graceful shutdown for the embedded web server (e.g., Tomcat, Jetty, Netty). When enabled, the server stops accepting new requests and attempts to complete existing ones within a defined timeout.16
  • spring.lifecycle.timeout-per-shutdown-phase=30s: This property defines the maximum time allowed for each phase of the application context shutdown. Setting an appropriate timeout ensures that the application has sufficient time to complete in-flight requests and clean up resources before being forcibly terminated.16
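Combined in application.yml, with one operational caveat: the pod’s terminationGracePeriodSeconds (30 seconds by default) should exceed the Spring shutdown timeout, for example by raising it to 45 seconds in the pod spec, or Kubernetes may SIGKILL the container mid-drain:

```yaml
# application.yml: stop accepting new requests and drain in-flight ones on SIGTERM
server:
  shutdown: graceful
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # pair with terminationGracePeriodSeconds > 30s
```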

The ability of an application to handle interrupt signals (like SIGTERM) is a fundamental principle of containerization. Any application running in a container should be designed to respond to these signals. While Kubernetes preStop hooks can be used to execute commands before a container is terminated, they do not inherently ensure graceful shutdown if the application itself does not handle the SIGTERM signal properly.16 The application must internally implement logic to finish ongoing requests and release resources upon receiving the shutdown signal. This is a crucial aspect of building resilient cloud-native applications.

7. Conclusion and Continuous Improvement

Optimizing a Spring Boot Backend-for-Frontend (BFF) application on AWS EKS involves a multi-layered approach that transcends basic deployment to embrace cloud-native best practices for resilience, scalability, and operational efficiency. The analysis presented in this report highlights several critical areas for improvement, each offering distinct advantages when properly implemented.
The BFF pattern itself, while simplifying frontend development and tailoring user experiences, introduces inherent complexities in the backend. This necessitates a strategic view of the BFF as a powerful data aggregation and security enforcement layer, rather than a mere proxy. Its success hinges on robust operational practices that mitigate risks like the “service fuse” and manage increased component complexity.
For external API consumption, migrating to Spring WebClient is not just a modernization step but a fundamental shift towards a reactive, non-blocking architecture, which is crucial for handling high concurrency in I/O-bound BFFs. Fine-tuning connection pooling and timeouts with Reactor Netty is essential for both performance and resilience, preventing cascading failures and resource exhaustion. Furthermore, the integration of Resilience4j for circuit breaking, retries, and bulkheads is indispensable for building fault-tolerant systems that gracefully degrade under stress. These resilience patterns must be embedded in the design phase, not as afterthoughts, and their effectiveness must be continuously validated through comprehensive observability.
The data layer demands careful attention, from secure credential management using Kubernetes Secrets to the strategic externalization of configuration via ConfigMaps. Optimizing database connection pooling with HikariCP, by meticulously tuning parameters like maximumPoolSize, idleTimeout, and maxLifetime, is paramount for preventing resource contention and ensuring stable database interactions across multiple application instances. For read-heavy workloads, leveraging AWS Aurora read replicas and adopting appropriate data access strategies are key to scaling database operations. Moreover, disciplined database migration practices using tools like Flyway or Liquibase, executed as a distinct pre-deployment step with backward compatibility in mind, are non-negotiable for agile and safe schema evolution in a continuous delivery pipeline.
Caching with AWS ElastiCache for Redis is a powerful mechanism to significantly boost performance and offload the database. Implementing Spring Cache Abstraction with distributed Redis ensures data consistency across scaled application instances. However, effective caching requires a clear strategy for cache invalidation and Time-to-Live (TTL) management to balance performance gains with data freshness.
Asynchronous processing and messaging are vital for decoupling components and handling background tasks. Spring Boot’s @Async and @Scheduled annotations provide in-application asynchronous capabilities, but for heavier or long-running jobs, deploying dedicated worker pods or utilizing Kubernetes Jobs/CronJobs offers superior isolation and scalability. Robust messaging with AWS SQS, coupled with Dead-Letter Queues (DLQs), is crucial for reliable message processing and error isolation. For real-time event-driven architectures, Apache Kafka (or AWS MSK) provides the necessary high-throughput, low-latency streaming backbone, with scaling managed efficiently through consumer groups and Kubernetes Operators. For all asynchronous workloads, ensuring graceful application shutdown is paramount to prevent data loss during pod termination.
Comprehensive observability, encompassing structured logging, metrics collection, and distributed tracing, forms the bedrock of operational excellence in EKS. Structured logging with CloudWatch Logs, complemented by FluentBit, enables efficient troubleshooting. Metrics collected via Micrometer and exposed to Prometheus (or AWS Managed Service for Prometheus) and visualized in Grafana provide real-time performance insights and drive intelligent autoscaling. Distributed tracing with OpenTelemetry and AWS X-Ray is indispensable for navigating complex microservice interactions and pinpointing bottlenecks across the entire request flow. Kubernetes health checks (liveness, readiness, startup probes) are fundamental for automated container lifecycle management and reliable traffic routing.
Finally, deep performance optimization involves fine-tuning the JVM using technologies like CRaC or GraalVM Native Image for faster startup and lower resource consumption, and leveraging Spring Boot’s lazy initialization for quicker application readiness. Meticulous Kubernetes resource management through requests and limits, coupled with Horizontal Pod Autoscaling (HPA) driven by relevant metrics, ensures efficient resource allocation. Container image optimization with Cloud Native Buildpacks, particularly their intelligent layering capabilities, significantly reduces image size and accelerates build and deployment times.
In conclusion, the path to continuous improvement for this Spring Boot BFF on AWS EKS involves a disciplined adoption of these cloud-native patterns and best practices. Each recommendation contributes synergistically to a more resilient, scalable, cost-efficient, and observable application. Regular monitoring, iterative refinement of configurations, and a strong collaboration between development and operations teams will be key to realizing the full potential of this architecture.

References

  1. Backend for Frontend (BFF) Pattern: Microservices for UX | Teleport, accessed July 22, 2025, https://goteleport.com/learn/backend-for-frontend-bff-pattern/
  2. BFF Patterns: Introduction, accessed July 22, 2025, https://bff-patterns.com/
  3. Backend for Frontend: Understanding the Pattern to Unlock Its Power - OpenLegacy, accessed July 22, 2025, https://www.openlegacy.com/blog/backend-for-frontend
  4. BFF - Is a backend for frontend still necessary or even relevant nowadays? : r/node - Reddit, accessed July 22, 2025, https://www.reddit.com/r/node/comments/117phzn/bff_is_a_backend_for_frontend_still_necessary_or/
  5. Integrate your Spring Boot application with Amazon ElastiCache | AWS Database Blog, accessed July 22, 2025, https://aws.amazon.com/blogs/database/integrate-your-spring-boot-application-with-amazon-elasticache/
  6. Java Microservices and Containers in the Cloud: With Spring Boot, Kafka, Postgresql, Kubernetes, Helm, Terraform and AWS EKS 9798868805547, 9798868805554 - DOKUMEN.PUB, accessed July 22, 2025, https://dokumen.pub/java-microservices-and-containers-in-the-cloud-with-spring-boot-kafka-postgresql-kubernetes-helm-terraform-and-aws-eks-9798868805547-9798868805554.html
  7. Getting Started | Spring Boot Kubernetes, accessed July 22, 2025, https://spring.io/guides/gs/spring-boot-kubernetes
  8. Deploying a Spring Boot Application on Kubernetes with AWS EKS - DEV Community, accessed July 22, 2025, https://dev.to/chanuka/deploying-a-spring-boot-application-on-kubernetes-with-aws-eks-3hjm
  9. Deploying a Spring Boot Bank Application on Amazon EKS: A Step-by-Step Guide - DEV Community, accessed July 22, 2025, https://dev.to/pravesh_sudha_3c2b0c2b5e0/deploying-a-spring-boot-bank-application-on-amazon-eks-a-step-by-step-guide-2gah
  10. Deploying a Spring Boot Application on AWS EKS | by Java Techie | Medium, accessed July 22, 2025, https://medium.com/@javatechie/deploying-a-spring-boot-application-on-aws-eks-fdd7d075f034
  11. Amazon EKS Best Practices Guide, accessed July 22, 2025, https://docs.aws.amazon.com/eks/latest/best-practices/introduction.html
  12. Java, Spring Boot, Docker, AWS & EKS Deployment to Amazon’s Elastic Kubenernetes Service Tutorial - YouTube, accessed July 22, 2025, https://www.youtube.com/shorts/kQTFQJm4qis
  13. Deploying a Spring Boot Application to an EKS cluster with ECR | by Abdullah jaffer, accessed July 22, 2025, https://medium.com/@abdullahjaffer96/deploying-a-spring-boot-application-to-an-eks-cluster-with-ecr-e9ced8707825
  14. Creating a Docker image using Cloud Native Buildpacks in Spring …, accessed July 22, 2025, https://faun.pub/creating-a-docker-image-using-cloud-native-buildpacks-in-spring-boot-19ff81b5209d
  15. Deployment of Spring Boot App on Amazon EKS Using GitHub, Jenkins, Maven, Docker, and Ansible | by Cloudoholic | AWS in Plain English, accessed July 22, 2025, https://aws.plainenglish.io/deployment-of-spring-boot-app-on-amazon-eks-using-github-jenkins-maven-docker-and-ansible-b049d9e06a0b
  16. EKS Auto-Scaling + Spot Instances Caused Random 500 Errors — Here’s What Actually Fixed It : r/aws - Reddit, accessed July 22, 2025, https://www.reddit.com/r/aws/comments/1kcts5o/eks_autoscaling_spot_instances_caused_random_500/
  17. Getting Started | Building a Reactive RESTful Web Service - Spring, accessed July 22, 2025, https://spring.io/guides/gs/reactive-rest-service/
  18. Mastering Spring WebClient. A Comprehensive Guide for… | by …, accessed July 22, 2025, https://medium.com/@pradeepisuru31/mastering-spring-webclient-a693f90447f0
  19. Spring Boot WebClient: Performance Optimization and Resilience - DZone, accessed July 22, 2025, https://dzone.com/articles/spring-boot-webclient-optimizing-performance-and-resilience
  20. Consuming and Testing third party API’s using Spring Webclient …, accessed July 22, 2025, https://dev.to/itscosmas/consuming-and-testing-third-party-apis-using-spring-webclient-26lj
  21. HTTP Client :: Reactor Netty Reference Guide - Spring, accessed July 22, 2025, https://docs.spring.io/projectreactor/reactor-netty/docs/1.2.0-M2/reference/html/http-client.html
  22. Advanced WebClient Pool Configuration via `application.yaml` · Issue #3809 - GitHub, accessed July 22, 2025, https://github.com/spring-cloud/spring-cloud-gateway/issues/3809
  23. Webclient timeout and connection pool Strategy - DEV Community, accessed July 22, 2025, https://dev.to/yangbongsoo/webclient-timeout-and-connection-pool-strategy-2gpn
  24. Resilience4j Circuit Breaker, Retry & Bulkhead Tutorial - Mobisoft Infotech, accessed July 22, 2025, https://mobisoftinfotech.com/resources/blog/microservices/resilience4j-circuit-breaker-retry-bulkhead-spring-boot
  25. Implementing Resilient Microservices with Resilience4j | by Arvind Kumar - Medium, accessed July 22, 2025, https://codefarm0.medium.com/implementing-resilient-microservices-with-resilience4j-ebb5f3c3599b
  26. Guide to AWS Aurora RDS with Java | Baeldung, accessed July 22, 2025, https://www.baeldung.com/aws-aurora-rds-java
  27. Integrating Amazon DynamoDB With Spring Boot Using Spring Cloud AWS - Baeldung, accessed July 22, 2025, https://www.baeldung.com/spring-data-dynamodb
  28. Deploy SpringBoot app with MySQL on Amazon EKS | Kubernetes | AWS load balancer controller - YouTube, accessed July 22, 2025, https://www.youtube.com/watch?v=aXOB4tR0ONU
  29. Securing Spring Boot Applications in Kubernetes | by Arton D. - Medium, accessed July 22, 2025, https://medium.com/@a-dem/securing-spring-boot-applications-in-kubernetes-a3c07725b856
  30. Externalize Configuration to ConfigMap - Kube by Example, accessed July 22, 2025, https://kubebyexample.com/learning-paths/developing-spring-boot-kubernetes/lesson-4-deploying-spring-boot-kubernetes-3
  31. ConfigMaps - Kubernetes, accessed July 22, 2025, https://kubernetes.io/docs/concepts/configuration/configmap/
  32. Using a ConfigMap PropertySource :: Spring Cloud Kubernetes, accessed July 22, 2025, https://docs.spring.io/spring-cloud-kubernetes/reference/property-source-config/configmap-propertysource.html
  33. pbelathur/spring-boot-performance-analysis: How to tune Spring Boot + HikariCP for the cloud - avoiding the common mistakes - GitHub, accessed July 22, 2025, https://github.com/pbelathur/spring-boot-performance-analysis
  34. Spring Boot with AWS Aurora read replica - DEV Community, accessed July 22, 2025, https://dev.to/jackynote/spring-boot-with-aws-aurora-read-replica-35lg
  35. Database Migrations in the Real World | The IntelliJ IDEA Blog, accessed July 22, 2025, https://blog.jetbrains.com/idea/2025/02/database-migrations-in-the-real-world/
  36. Advice on db migrations workflow? : r/SpringBoot - Reddit, accessed July 22, 2025, https://www.reddit.com/r/SpringBoot/comments/1iajpk4/advice_on_db_migrations_workflow/
  37. Using Amazon ElastiCache for Redis To Optimize Your Spring Boot Application, accessed July 22, 2025, https://keyholesoftware.com/using-amazon-elasticache-for-redis-to-optimize-your-spring-boot-application/
  38. AWS ElasticCache with SpringBoot - Dev Genius, accessed July 22, 2025, https://blog.devgenius.io/aws-elastic-redis-cache-with-springboot-98c3ebcc6036
  39. Optimizing Spring Boot Application Performance with Distributed Caching Using Redis, accessed July 22, 2025, https://yashodharanawaka.medium.com/optimizing-spring-boot-application-performance-with-distributed-caching-using-redis-ed611032ffb0
  40. medium.com, accessed July 22, 2025, https://medium.com/@AlexanderObregon/tracking-background-job-status-with-spring-boot-and-a-database-table-5cf184a419c9#:~:text=Spring%20Boot%20supports%20asynchronous%20processing,managed%20by%20Spring’s%20task%20executor.
  41. Handling Background Tasks with Spring Boot - GeeksforGeeks, accessed July 22, 2025, https://www.geeksforgeeks.org/advance-java/spring-boot-handling-background-tasks-with-spring-boot/
  42. Deployment Strategies for Spring Batch on Kubernetes - GitHub Gist, accessed July 22, 2025, https://gist.github.com/srirajk/c7b5eca2b511bf20345c119e7d2dc950
  43. Scaling ECS with SQS : r/aws - Reddit, accessed July 22, 2025, https://www.reddit.com/r/aws/comments/1j6a1ns/scaling_ecs_with_sqs/
  44. AWS SQS with Spring Cloud AWS and Spring Boot 3 - HowToDoInJava, accessed July 22, 2025, https://howtodoinjava.com/spring-cloud/aws-sqs-with-spring-cloud-aws/
  45. Spring Boot Integration With Amazon SQS — A Comprehensive Guide With Best Practices. | by Akeni Promise | Medium, accessed July 22, 2025, https://medium.com/@akeni.promise/spring-boot-integration-with-amazon-sqs-a-comprehensive-guide-with-best-practices-e4d63e5de10d
  46. AWS Standard SQS Queue with Spring Boot | by Mert ÇAKMAK - Medium, accessed July 22, 2025, https://medium.com/@mertcakmak2/aws-standard-sqs-queue-with-spring-boot-974c163e0616
  47. Using Dead Letter Queues in Amazon SQS - AWS SDK for Java 1.x, accessed July 22, 2025, https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/examples-sqs-dead-letter-queues.html
  48. Scaling Microservices with Message Queues, Spring Boot and Kubernetes - LearnKube, accessed July 22, 2025, https://learnkube.com/blog/scaling-spring-boot-microservices
  49. Set up event-driven auto scaling in Amazon EKS by using Amazon EKS Pod Identity and KEDA - AWS Prescriptive Guidance, accessed July 22, 2025, https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/event-driven-auto-scaling-with-eks-pod-identity-and-keda.html
  50. Intro to Apache Kafka with Spring | Baeldung, accessed July 22, 2025, https://www.baeldung.com/spring-kafka
  51. Spring Boot - Integration with Kafka - GeeksforGeeks, accessed July 22, 2025, https://www.geeksforgeeks.org/advance-java/spring-boot-integration-with-kafka/
  52. Deploying and scaling Apache Kafka on Amazon EKS | Containers, accessed July 22, 2025, https://aws.amazon.com/blogs/containers/deploying-and-scaling-apache-kafka-on-amazon-eks/
  53. Scaling Kafka for High Performance | by Bishal Devkota - Dev Genius, accessed July 22, 2025, https://blog.devgenius.io/scaling-kafka-for-high-performance-f322217f7cd8
  54. Autoscale your Kafka consumer workloads | Cloud Run Documentation, accessed July 22, 2025, https://cloud.google.com/run/docs/configuring/workerpools/kafka-autoscaler
  55. Implement monitoring for Amazon EKS with managed services | AWS Architecture Blog, accessed July 22, 2025, https://aws.amazon.com/blogs/architecture/implement-monitoring-for-amazon-eks-with-managed-services/
  56. Monitoring Spring Boot Applications with Prometheus and Grafana - Stackademic, accessed July 22, 2025, https://blog.stackademic.com/monitoring-spring-boot-applications-with-prometheus-and-grafana-99805c27246a
  57. How to generate Prometheus metrics from Spring Boot with Micrometer - Tutorial Works, accessed July 22, 2025, https://www.tutorialworks.com/spring-boot-prometheus-micrometer/
  58. Distributed tracing with OpenTelemetry | AWS Open Source Blog, accessed July 22, 2025, https://aws.amazon.com/blogs/opensource/distributed-tracing-with-opentelemetry/
  59. How To – Structured Logging with Spring Boot - Spring Framework Guru, accessed July 22, 2025, https://springframework.guru/how-to-structured-logging-with-spring-boot/
  60. Kubernetes Horizontal Pod AutoScaler With Custom Metrics - NEX Softsys, accessed July 22, 2025, https://www.nexsoftsys.com/articles/how-to-extend-horizontal-pod-autoScaler-with-custom-metrics-in-kubernetes.html
  61. hariohmprasath/distributed-tracing-for-containers-with-x-ray: This article explains how to enable distributed tracing for a microservice based on Spring Boot using AWS X-Ray - GitHub, accessed July 22, 2025, https://github.com/hariohmprasath/distributed-tracing-for-containers-with-x-ray
  62. Configure Liveness, Readiness and Startup Probes - Kubernetes, accessed July 22, 2025, https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
  63. Liveness and Readiness Probes in Spring Boot | Baeldung, accessed July 22, 2025, https://www.baeldung.com/spring-liveness-readiness-probes
  64. Java on containers: a guide to efficient deployment - Datadog, accessed July 22, 2025, https://www.datadoghq.com/blog/java-on-containers/
  65. Using CRaC to reduce Java startup times on Amazon EKS | Containers, accessed July 22, 2025, https://aws.amazon.com/blogs/containers/using-crac-to-reduce-java-startup-times-on-amazon-eks/
  66. How Spring Boot Optimizes Startup with Lazy Initialization - Medium, accessed July 22, 2025, https://medium.com/@AlexanderObregon/how-spring-boot-optimizes-startup-with-lazy-initialization-5467adb89fa0
  67. Resource management | Best practices for deploying | Spring Boot - werf, accessed July 22, 2025, https://werf.io/guides/java_springboot/300_deployment_practices/030_resource_management.html
  68. Horizontal Pod Autoscaling - Kubernetes, accessed July 22, 2025, https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
  69. Cloud Native Buildpacks, accessed July 22, 2025, https://buildpacks.io/
  70. Optimizing Java Applications on Kubernetes: Beyond the Basics : r …, accessed July 22, 2025, https://www.reddit.com/r/java/comments/1h1lg04/optimizing_java_applications_on_kubernetes_beyond/