In this article:
➤ What are the main cloud solutions for Java development and their analogues for regular infrastructure?
➤ What are the main cloud solutions from Amazon for Java development?
➤ What are the main cloud solutions from Google for Java development?
➤ What are the main cloud solutions from Microsoft for Java development?
➤ How to optimize a cloud solution in terms of costs
➤ How serverless solutions work
➤ How Git LFS (Large File Storage) Works
➤ How to work with Git submodules
➤ What are the main cloud solutions for Java development and their analogues for regular infrastructure?
Amazon Web Services | Google Cloud | Microsoft Azure | Self-hosted analogue |
EC2 (Elastic Compute Cloud) | Google Compute Engine (GCE) | Azure Virtual Machines (VMs) | VMware, KVM, VirtualBox |
EKS/ECS (Kubernetes/Containers) | Google Kubernetes Engine (GKE) | Azure Kubernetes Service (AKS) | Kubernetes on local server |
API Gateway | Google Cloud Endpoints | Azure API Management | Kong, NGINX, Apache |
AWS Lambda | Google Cloud Functions | Azure Functions | Serverless Framework (for local running) |
Amazon RDS | Google Cloud SQL | Azure SQL Database | PostgreSQL, MySQL on local server |
Amazon DynamoDB | Google Cloud Firestore/Bigtable | Azure Cosmos DB | MongoDB, CouchDB, Cassandra |
Amazon SQS | Google Cloud Pub/Sub | Azure Queue Storage | RabbitMQ, Apache Kafka, ActiveMQ |
Amazon SNS | Google Cloud Pub/Sub | Azure Notification Hubs | RabbitMQ with push notifications, email services |
Elastic Load Balancing (ELB) | Google Cloud Load Balancing | Azure Load Balancer | HAProxy, NGINX |
IAM (Identity and Access Management) | Google Cloud IAM | Azure Active Directory (AD) | FreeIPA, OpenLDAP, Kerberos |
CloudWatch | Google Cloud Operations (Stackdriver) | Azure Monitor | Prometheus, Grafana, ELK Stack |
Amazon S3 | Google Cloud Storage | Azure Blob Storage | MinIO, Ceph, FTP servers |
AWS Step Functions | Google Cloud Workflows | Azure Logic Apps | Apache Airflow, Cadence, Conductor |
Secrets Manager | Google Secret Manager | Azure Key Vault | HashiCorp Vault, KeePass |
Amazon Aurora | Google Cloud Spanner | Azure SQL Database Managed Instance | PostgreSQL, MySQL with replication |
➤ What are the main cloud solutions from Amazon for Java development?
Amazon EC2 (Elastic Compute Cloud):
Provides scalable virtual servers for running applications. EC2 lets you create and manage virtual machine instances on which you can deploy applications such as microservices, web servers, databases, and other components.
Amazon Elastic Kubernetes Service (EKS) / Amazon Elastic Container Service (ECS):
Used for container orchestration. If you use Docker for microservices, these services will help automate the deployment, management, and scaling of containers.
Amazon API Gateway:
Provides the ability to create, publish, manage, and secure APIs for microservices. Works in conjunction with AWS Lambda or ECS.
AWS Lambda:
If you need serverless functions for small tasks or individual microservices, Lambda can help you run code without having to manage servers.
Amazon RDS (Relational Database Service):
For managing relational databases such as PostgreSQL, MySQL, or Oracle. AWS takes care of infrastructure management, backups, and scaling.
Amazon DynamoDB:
NoSQL database for microservices that require high speed and flexible data structure. Great for applications with unstructured data.
Amazon SQS (Simple Queue Service):
Asynchronous message queue for communication between microservices. Allows microservices to exchange messages, avoiding data loss and failure under high load.
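The queue-based communication that SQS provides can be sketched locally with a plain in-memory BlockingQueue. This is only an analogue of the idea (a producer that hands off work without waiting for the consumer), not the AWS SDK, and the message payload is made up for illustration:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // A bounded in-memory queue standing in for an SQS queue
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

        // Producer microservice: publishes an order event and moves on
        Thread producer = new Thread(() -> {
            try {
                queue.put("{\"orderId\": 42, \"status\": \"CREATED\"}");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Consumer microservice: blocks until a message arrives, then processes it
        Thread consumer = new Thread(() -> {
            try {
                String message = queue.take();
                System.out.println("Processed: " + message);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        consumer.start();
        producer.start();
        producer.join();
        consumer.join();
    }
}
```

The key property this models is decoupling: the producer never calls the consumer directly, so either side can fail or slow down without data loss (SQS adds durability and retries on top of this).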
Amazon SNS (Simple Notification Service):
Notification distribution service. Used to publish and deliver messages between microservices, or to send push notifications, email or SMS.
Amazon Elastic Load Balancing (ELB):
To balance the load between your microservices, AWS ELB automatically distributes incoming traffic across multiple application instances.
AWS IAM (Identity and Access Management):
Manage access and security policies for AWS services and resources. Useful for microservice architecture with distributed access.
AWS CloudWatch:
For monitoring logs, metrics and alerts of your microservices. Allows you to configure monitoring of the state of services, as well as identify and fix failures.
Amazon S3 (Simple Storage Service):
Object storage for storing static files, data or backups. Used as a centralized storage for microservices if they need to store or share files.
AWS Step Functions:
Orchestration and management of execution processes of several microservices. Allows you to build and manage complex business processes using microservices.
AWS Secrets Manager:
Manage secrets such as API keys, passwords, tokens, etc. This makes it easy to securely store and access secrets required to run microservices.
Amazon Aurora:
A high-performance relational database compatible with MySQL and PostgreSQL, used for high-load applications in microservice architecture.
➤ What are the main cloud solutions from Google for Java development?
Google Kubernetes Engine (GKE):
a managed container orchestration service based on Kubernetes. It is one of the most popular options for microservice architecture, where each microservice runs as a container. GKE allows you to easily manage and scale containerized Java applications, providing automatic deployment and load balancing.
Cloud Run:
a platform for running containerized applications without managing infrastructure. The service is fully managed, and you pay only for the resources you use. Suitable for small microservices that do not require full management of Kubernetes clusters.
App Engine:
a serverless platform (PaaS) designed for automatic application scaling. It supports Java and allows you to develop microservices without worrying about servers and their configuration. This is a great solution for microservices where simplicity and speed of development are important.
Google Cloud Functions:
A serverless Function as a Service (FaaS) service that allows you to run small pieces of Java code in response to events (such as HTTP requests or messages from other services). It is well suited for microservice architectures where you need to quickly and easily implement small components or event handlers.
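The request-in/response-out shape of an HTTP-triggered function can be sketched locally with the JDK's built-in HTTP server. This is a hypothetical stand-in for the managed platform, not the Cloud Functions API; the `/hello` path and message are invented for the example:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class HttpFunctionDemo {
    public static void main(String[] args) throws Exception {
        // The "function": one handler that turns a request into a response
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "Hello from a function".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // The "trigger": an HTTP request that invokes the function
        int port = server.getAddress().getPort();
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create("http://localhost:" + port + "/hello")).build(),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());

        server.stop(0);
    }
}
```

On the real platform, the server, port, and scaling are managed by the provider; your code only implements the handler.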
Cloud Pub/Sub:
An asynchronous messaging service that enables communication between microservices. It supports a publish/subscribe model for message delivery and is ideal for use cases where your microservices need to communicate with each other via messaging.
Cloud SQL:
A managed relational database for MySQL, PostgreSQL, and SQL Server. Provides high availability, automatic backups, and scalability. It’s a great choice for microservices that require traditional relational databases.
Cloud Firestore and Datastore:
NoSQL databases with high availability, designed for storing unstructured data. Firestore can be a great choice for microservices that need flexibility in the data structure.
Google Cloud Storage:
object storage for storing data, files, static resources, and media. Used to store large amounts of data that can be accessed by various microservices.
Google Cloud Spanner:
a scalable, distributed, highly available, transaction-based relational database. Suitable for global microservices that require working with large volumes of data with low latency.
Google BigQuery:
a scalable analytics platform (data warehouse) optimized for working with large volumes of data. Supports SQL queries for analyzing data that can be generated by microservices and is used for processing and analyzing data in real time.
Google Cloud Load Balancing:
a load balancing service that allows you to distribute incoming traffic between multiple microservices or instances deployed in Google Cloud. Supports balancing for both HTTP(S) and TCP/UDP traffic.
Cloud Monitoring (formerly Stackdriver):
a monitoring and logging system that allows you to track the performance of microservices, analyze logs, and set up alerts. This helps control the state of microservices and respond to incidents in a timely manner.
Secret Manager:
a service for securely managing secrets (API keys, passwords, and sensitive data) required for microservices. Supports encryption and strict access control policies.
Anthos:
a hybrid and multi-cloud management platform that enables deployment and management of microservices across different platforms (GKE, other clouds, on-premises infrastructure) with unified management and security policies.
Traffic Director:
A managed service for dynamic load balancing and traffic management at the application level. Supports integration with Envoy for implementing a service mesh.
VPC Service Controls:
a service for protecting microservices from unwanted access, which allows you to configure control boundaries between different Google Cloud services and set access rules.
➤ What are the main cloud solutions from Microsoft for Java development?
Azure Kubernetes Service (AKS):
managed container orchestration service based on Kubernetes. It is an excellent solution for deploying and managing containerized Java applications, providing automated deployment, scaling and management of containers.
Azure App Service:
Platform as a Service (PaaS) that enables you to deploy and scale web applications and APIs. Supports Java and provides automated infrastructure management, including updates and scaling.
Azure Functions:
a serverless computing service (Function as a Service, FaaS) that allows you to run fragments of Java code in response to events. Ideal for implementing small microservices and event handlers.
Azure Service Bus:
a messaging service that supports a publish/subscribe model and message queues. Allows microservices to communicate with each other, ensuring reliable message delivery.
Azure SQL Database:
A managed relational database based on Microsoft SQL Server. Provides high availability, automatic backups, and scalability, making it a good choice for relational data in microservices.
Azure Cosmos DB:
A globally distributed NoSQL database with high availability and support for various data models, including documents and graphs. Great for storing unstructured microservice data.
Azure Blob Storage:
object storage for storing large amounts of unstructured data such as files, images, and backups. Used to store data accessible to microservices.
Azure Synapse Analytics:
an analytics platform that combines data warehousing and analytics capabilities. Suitable for processing and analyzing large volumes of data generated by microservices.
Azure Load Balancer:
A load balancing service that distributes traffic across multiple application instances. Supports load balancing for TCP and UDP traffic and provides high availability and scalability.
Azure Monitor:
a monitoring and logging system that provides tools for tracking application performance, analyzing logs, and setting up alerts. Helps monitor the state of microservices and respond to incidents.
Azure Key Vault:
a service for managing secrets such as keys, passwords, and certificates. Provides secure storage and access control for sensitive data required for microservices to operate.
Azure Arc:
a platform for managing hybrid and multi-cloud environments that enables you to deploy and manage microservices across multiple platforms and clouds with unified management and security policies.
Azure Application Gateway:
managed application-level load balancing service with Web Application Firewall (WAF) support. Allows you to manage and secure application traffic, including microservices.
Azure Virtual Network (VNet):
a service for creating private networks in the cloud. Allows you to set up a network protected from external access and manage connections between different microservices and resources in Azure.
➤ How to optimize a cloud solution in terms of costs
Using Serverless and BaaS solutions:
Use AWS Lambda, Azure Functions, or Google Cloud Functions to run individual microservices or functions that don't need to be running all the time. This lets you pay only for the actual time your code runs, rather than for VMs or containers that are constantly running.
Automatic scaling:
Set up automatic scaling in Kubernetes (using Azure Kubernetes Service (AKS) or Google Kubernetes Engine (GKE)) to manage the number of microservice replicas based on load. This allows you to conserve resources and pay only for the capacity you need during periods of high load.
Optimizing container resources:
Configure the right resources (CPU and memory) for containers in Kubernetes. Define and configure limits and requests for container resources to prevent over-consumption and unnecessary costs.
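As a minimal sketch, requests and limits in a Kubernetes Deployment manifest might look like this (the service name, image, and values are illustrative, not recommendations):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-service        # hypothetical microservice name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: orders-service
  template:
    metadata:
      labels:
        app: orders-service
    spec:
      containers:
        - name: orders-service
          image: example.com/orders-service:1.0   # placeholder image
          resources:
            requests:           # baseline the scheduler reserves for the pod
              cpu: "250m"
              memory: "256Mi"
            limits:             # hard ceiling; exceeding the memory limit kills the pod
              cpu: "500m"
              memory: "512Mi"
```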
Selecting the Right Virtual Machine Types:
In Amazon EC2 or Azure Virtual Machines, choose instance types that are optimized for your needs. For example, use t3.micro or t4g.micro for light-load applications, or c5/m5 for high-performance workloads. Using Reserved Instances or long-term commitments can also reduce costs.
Using Spot Instances and Preemptible VMs:
In AWS, you can use Spot Instances, and in Google Cloud, you can use Preemptible VMs for less critical workloads. These instances are significantly cheaper than regular instances, but can be terminated at any time.
Optimizing data storage:
For storing data, use the appropriate storage depending on the type of data. For example, use Google Cloud Storage or Azure Blob Storage for unstructured data and Amazon S3 for storing backups. Choose storage tiers (e.g. cold storage) for infrequently accessed data.
Setting up caching:
Use application-level caching or services like Redis (Azure Cache for Redis or Google Cloud Memorystore) to reduce database load and query costs.
Cost monitoring and management:
Use monitoring and cost management tools, such as Google Cloud's Operations Suite (formerly Stackdriver) or Azure Cost Management, to track resource usage and analyze costs. These tools help you identify inefficient spending and optimize resource usage.
Reduced number of database requests:
Use caching and data aggregation strategies to minimize the number of database queries and reduce query and storage costs.
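The idea can be sketched without Redis using a small in-process LRU cache in front of a simulated database lookup (the class names, cache size, and "database" are invented for the example):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheDemo {
    // A tiny LRU cache: evicts the least recently used entry beyond maxEntries
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;
        LruCache(int maxEntries) {
            super(16, 0.75f, true); // accessOrder = true makes iteration order LRU
            this.maxEntries = maxEntries;
        }
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    static int databaseCalls = 0;

    // Simulated expensive database query
    static String loadUserFromDb(String id) {
        databaseCalls++;
        return "user-" + id;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(100);

        // First access misses the cache and hits the "database"
        String u1 = cache.computeIfAbsent("42", CacheDemo::loadUserFromDb);
        // Second access is served from the cache: no extra database call
        String u2 = cache.computeIfAbsent("42", CacheDemo::loadUserFromDb);

        System.out.println(u1 + " " + u2 + " dbCalls=" + databaseCalls);
    }
}
```

In a real microservice deployment a shared cache such as Redis is usually preferred over an in-process map, so that all instances of a service see the same cached entries.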
Proper resource lifecycle management:
Configure automatic deletion of unnecessary resources and temporary instances after task completion. For example, using Auto Scaling groups with policies to delete instances after their use or on a schedule.
Scenario:
Let's say you have a microservices architecture on Google Cloud or AWS with multiple microservices that communicate via APIs and handle different amounts of traffic throughout the day. You want to reduce costs by optimizing resource usage.
Optimization steps:
Using Cloud Run for occasional services:
Some microservices may not need to be running all the time, especially if they are only used in response to certain events (such as asynchronous tasks). Instead of deploying these microservices to Kubernetes or virtual machines, you can migrate them to Google Cloud Run or AWS Fargate.
This will allow you to pay only for the time when services are actually running. For example, event handlers or functions can be run in response to HTTP requests or messages from queues (Google Pub/Sub, Amazon SQS).
Using Spot/Preemptible Instances for non-critical tasks:
Some microservices may perform tasks that are not time-critical and can be suspended. For such tasks, you can use AWS Spot Instances or Google Preemptible VMs, which are significantly cheaper than standard virtual machines.
Spot instances or Preemptible VMs are 70-90% cheaper and can be used for background tasks such as data analysis, batch processing, or rendering.
Automatic resource scaling:
Setting up automatic horizontal scaling in Kubernetes (GKE in Google Cloud or AKS in Azure) allows you to spin up more microservice instances only when really needed (for example, when traffic spikes) and automatically reduce the number of instances during periods of low load.
This helps avoid overpaying for computing power during periods of low activity when minimal resources are sufficient.
Optimizing Containers in Kubernetes (GKE, AKS, EKS):
When using Kubernetes, it is important to set resource limits and requests for containers correctly. If containers "ask" for resources (CPU, memory) too much, this leads to excessive costs. If resources are set too low, this can lead to performance degradation.
Properly configuring CPU and memory for each microservice allows you to use resources as efficiently as possible without over-provisioning them.
Using managed databases with automatic scaling:
Use managed databases such as Google Cloud SQL, Amazon RDS, or Azure SQL with autoscaling enabled. This allows the database to automatically scale up or down its processing power based on the load.
You pay only for the resources you actually use and avoid the need to manually configure or maintain database servers.
Setting up cold and hot storage:
Use multiple storage tiers to store data. For example, data that is rarely accessed can be stored in cheaper solutions (cold storage) such as Google Cloud Coldline Storage or Amazon S3 Glacier. Hot data that is frequently accessed can be stored in Google Cloud Storage or Amazon S3 Standard.
This helps reduce data storage costs without affecting the performance of frequently accessed data.
Using Pub/Sub or SQS for asynchronous processing:
Asynchronous processing via message queues (e.g. Google Pub/Sub or AWS SQS) allows microservices to communicate with each other without the need for a permanent active connection. This allows inactive microservices to be temporarily suspended and started only when a new message arrives.
Asynchronous architecture reduces resource consumption because microservices run only when needed, rather than constantly.
Monitoring usage and costs:
Use built-in monitoring tools like Google Cloud Monitoring or AWS CloudWatch to analyze resource usage and costs. They can help you identify unused or underutilized resources that can be optimized or removed.
Regular analysis of metrics helps to quickly identify ineffective resources and optimize their use.
Data caching:
To reduce the load on databases and speed up microservices, implement a caching system using Redis or Memcached (for example, Google Cloud Memorystore or AWS ElastiCache). This allows you to cache frequently accessed data and reduce the number of database calls.
Caching reduces database costs, reduces latency, and improves performance.
Selecting Reserved Instances:
For services that need to be running all the time, consider using Reserved Instances in AWS or similar solutions in other clouds (e.g. Committed Use Contracts in Google Cloud). These can provide discounts of up to 75% compared to regular instances if you are willing to commit to instance usage for a long period (e.g. one or three years).
This significantly reduces the cost of persistent services, especially for mission-critical microservices.
➤ How serverless solutions work
Serverless computing is a cloud computing model in which developers build and run applications without managing physical servers or virtual infrastructure. In this model, servers still exist, but their management and maintenance are handled entirely by the cloud provider, allowing developers to focus only on the code and logic of the application.
How Serverless works:
Automatic infrastructure management:
In the serverless model, the developer does not create and manage servers. The cloud provider (e.g. AWS, Google Cloud, Azure) dynamically manages servers, scales resources depending on the load, and automatically releases resources when the application is not in use.
Example: When a request comes to a serverless function, the provider automatically allocates the necessary computing power and runs the function. Once the request is processed, the resources are released.
Pay as you go:
One of the main advantages of serverless is that you only pay for the actual code execution time and resource volume used. This is different from traditional models, where you pay for dedicated servers regardless of how intensively they are used.
Example: With AWS Lambda or Google Cloud Functions, you pay for the number of requests and the time it takes to execute your code (e.g., milliseconds), rather than for servers that are constantly running.
Event-driven architecture:
Serverless solutions are often built around an event-driven architecture, where functions are executed in response to certain events. Events can be HTTP requests, database changes, messages posted to a queue, or files uploaded to cloud storage.
Example: The function can be called when a user uploads a file to storage (such as Amazon S3 or Google Cloud Storage), and the serverless function automatically processes the file, such as compressing it or changing the format.
Automatic scaling:
Serverless solutions automatically scale to the current load. When multiple users access the application simultaneously, the serverless platform automatically launches additional instances of functions to serve their requests. When the load decreases, resources are automatically released.
Example: If you have a web application powered by Google Cloud Functions and 1000 requests come in at the same time, Google Cloud will automatically create as many containers as needed to handle those requests in parallel.
Multi-language support:
Serverless platforms usually support different programming languages, such as Java, Python, JavaScript (Node.js), Go, and others. This allows developers to choose the most appropriate language for the task, without limiting themselves to infrastructure issues.
Example: AWS Lambda supports Java, Python, Node.js, Go, and other languages, so developers can write functions in the language they're comfortable with.
The main components of serverless:
Functions:
Pieces of code that are executed in response to events. This is the basis of serverless applications. For example, in AWS these are Lambda functions, in Google Cloud – Cloud Functions, in Azure – Azure Functions.
Events (Triggers):
The mechanism that causes the function to be executed. This could be an HTTP request, a change to data in a database, a message in a queue, or another event supported by the provider.
Services (Back-end services):
Serverless applications can use additional services such as databases, message queues, authentication systems, and others that are provided by the cloud provider. These services are also managed by the cloud and are paid for as you use them.
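The relationship between functions and triggers can be modeled as a dispatcher that maps event types to handlers. This is a conceptual sketch, not a provider API; the event names and handlers are invented:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ServerlessModel {
    public static void main(String[] args) {
        // Registry of "functions" keyed by the event type that triggers them
        Map<String, Function<String, String>> functions = new HashMap<>();
        functions.put("http.request", payload -> "200 OK: hello " + payload);
        functions.put("storage.upload", payload -> "compressed " + payload);

        // A "trigger" fires: the platform looks up the function and invokes it,
        // allocating resources only for the duration of the call
        System.out.println(dispatch(functions, "http.request", "alice"));
        System.out.println(dispatch(functions, "storage.upload", "photo.jpg"));
    }

    static String dispatch(Map<String, Function<String, String>> functions,
                           String eventType, String payload) {
        return functions.getOrDefault(eventType, p -> "no handler").apply(payload);
    }
}
```

The real platforms add what this sketch omits: durable event delivery, per-invocation isolation, scaling the number of concurrent function instances, and billing per invocation.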
Examples of serverless solutions:
AWS Lambda:
allows you to run functions in response to events such as HTTP requests (via API Gateway), database changes (e.g. Amazon DynamoDB), postings to message queues (e.g. Amazon SQS), file uploads (Amazon S3), etc. Lambda automatically scales and releases resources when the function completes.
Google Cloud Functions:
works similarly to AWS Lambda and supports many types of events: HTTP requests, changes in databases (Cloud Firestore), publications in Pub/Sub, file uploads to Cloud Storage, etc.
Azure Functions:
also provides a serverless runtime that scales automatically and supports a variety of events, including HTTP requests, changes in Azure Cosmos DB, events in Azure Storage, queue messages, and many others.
Here is an example of a serverless function on AWS Lambda written in Kotlin. The function handles HTTP requests via API Gateway and returns a simple message.
import com.amazonaws.services.lambda.runtime.Context
import com.amazonaws.services.lambda.runtime.RequestHandler
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent

class LambdaHandler : RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {
    override fun handleRequest(
        input: APIGatewayProxyRequestEvent,
        context: Context
    ): APIGatewayProxyResponseEvent {
        // Get the "name" parameter from the request
        val name = input.queryStringParameters?.get("name") ?: "World"
        val responseMessage = "Hello, $name!"

        // Build the response
        val response = APIGatewayProxyResponseEvent()
        response.statusCode = 200
        response.headers = mapOf("Content-Type" to "application/json")
        response.body = """{"message": "$responseMessage"}"""
        return response
    }
}
Description:
LambdaHandler:
This is the main class that implements the RequestHandler interface to handle requests.
APIGatewayProxyRequestEvent:
An object containing the input data of an HTTP request, such as request parameters, body, and headers.
handleRequest:
method that processes the incoming request. It extracts the name parameter from the query string and returns a greeting message.
APIGatewayProxyResponseEvent:
An object that returns a response with HTTP status, headers, and response body in JSON format.
Deploy to AWS Lambda:
Write and compile Kotlin code (use Gradle to build into a .jar file).
In AWS Lambda, create a new function and upload the compiled .jar.
Configure API Gateway to call the function over HTTP.
Advantages of serverless solutions:
Minimal infrastructure management costs: There is no need to manage servers or their configurations.
Pay as you go: You pay only for the time the code is executed and for the resources used.
Automatic scaling: Serverless applications automatically adjust to load.
Rapid development and launch: Developers can focus on writing code rather than configuring servers.
Disadvantages of serverless solutions:
Execution time limits: Serverless functions usually have execution time limits (for example, in AWS Lambda it is 15 minutes).
Cold start delay: Sometimes the first initialization of a function can take longer than subsequent calls (called "cold start").
Limited flexibility: Serverless platforms are limited by the capabilities and configurations offered by cloud providers. Some applications may require specific configuration that is not available in serverless models.
Vendor lock-in: Dependency on a specific cloud provider and its services can make it difficult to migrate an application to another platform.
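The cold-start effect mentioned among the disadvantages can be illustrated with lazy initialization: the first invocation pays a one-time setup cost, while later invocations reuse the warm instance. The 200 ms sleep below is a stand-in for real initialization work (loading a DI container, opening connection pools):

```java
public class ColdStartDemo {
    private static Object heavyRuntime; // e.g. DI container, DB connection pool

    // Simulates a function invocation: initializes the runtime on first call only
    static String invoke(String input) throws InterruptedException {
        if (heavyRuntime == null) {          // "cold start" path
            Thread.sleep(200);               // stand-in for expensive initialization
            heavyRuntime = new Object();
        }
        return "handled " + input;
    }

    public static void main(String[] args) throws InterruptedException {
        long t1 = System.nanoTime();
        invoke("first");
        long cold = (System.nanoTime() - t1) / 1_000_000;

        long t2 = System.nanoTime();
        invoke("second");
        long warm = (System.nanoTime() - t2) / 1_000_000;

        System.out.println("cold=" + cold + "ms warm=" + warm + "ms");
        System.out.println(cold > warm);
    }
}
```

This is also why cold starts hit JVM-based functions harder than scripting runtimes: class loading and JIT warm-up add to the one-time cost.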
Conclusion:
Serverless solutions are ideal for applications with intermittent load or event-driven systems where you don’t need to maintain servers all the time. They greatly simplify application development and deployment, providing automatic scaling and cost savings.
➤ How Git LFS (Large File Storage) Works
Regular Git does not handle very large files well (e.g. .bin, .so, .ckpt, .pt, .jpg, .zip files over 100 MB).
Git LFS solves this problem like this:
What's happening:
When you add a large file (like model.bin) to a repository, Git LFS stores only a small "pointer" – a text file indicating where to find the real file – instead of the file itself.
The actual file is uploaded separately to a dedicated LFS storage (on GitHub, this is GitHub's LFS storage).
When someone else clones the repository, git lfs automatically downloads the large files separately.
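For reference, the pointer file committed in place of model.bin follows the Git LFS pointer format and looks like this (the hash and size here are illustrative):

```
version https://git-lfs.github.com/spec/v1
oid sha256:3b6a8f0f1e4c9d2a7b5e8c1f0a9d4e7b2c5f8a1d4e7b0c3f6a9d2e5b8c1f4a7d
size 104857600
```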
Install Git LFS:
git lfs install
Set up large file tracking:
git lfs track "*.bin"
After this, a .gitattributes file is automatically created, where the tracking rules are written.
How to correctly download a project with LFS files if they did not load:
With normal cloning:
small files are downloaded immediately
instead of the large files, only pointer files (small text files containing lines like oid sha256:...) arrive at first
If after cloning the LFS files were not downloaded automatically (this sometimes happens with older versions of Git), run:
git lfs pull
➤ How to work with Git submodules
Git submodules are a mechanism that allows a repository to include other repositories as subprojects. They are very useful, for example, when a project depends on other libraries that are developed separately.
What is important to remember:
A submodule is tied to a specific commit of the connected repository, not to the latest state of a branch.
If you change something inside a submodule, you need to go into the submodule folder, do git commit there, and push separately.
Add a submodule:
git submodule add <repository URL> <path_to_clone>
This will create a folder at the given path (e.g. libs/library) and clone the external repository into it.
A .gitmodules file will appear – this is where Git stores information about submodules.
Clone a project with submodules:
When a project containing submodules is cloned, they are not pulled in by default! You need to:
git clone <repository URL>
git submodule update --init --recursive
# or, in one step:
git clone --recurse-submodules <repository URL>
Update submodules:
If changes have appeared in the connected project
git submodule update --remote --merge
# or, for a specific submodule:
cd path/to/submodule
git fetch
git merge origin/main