Architecture Considerations for Cloud Native Applications

architecture-considerations.png Diagram showcasing the design considerations for cloud-native applications, such as functional requirements and available resources

Design Considerations for Cloud-Native Applications

Before settling on an architecture, the engineering team evaluates the design considerations for the product, such as its functional requirements and the available resources.

Monoliths and Microservices

Before an organization delivers a product, the engineering team needs to decide on the most suitable application architecture. In most cases, 2 distinct models are referenced: monoliths and microservices.

Regardless of the adopted structure, the main goal is to design an application that delivers value to customers and can be easily adjusted to accommodate new functionalities.

Also, each architecture encapsulates the 3 main tiers of an application: the UI (user interface), the business logic, and the data layer.

Monoliths

In a monolithic architecture, all application tiers are part of the same unit: they are managed in a single repository, developed in one programming language, and released using a single binary.

Monoliths.png Diagram showcasing how a booking application is implemented using a monolithic architecture

A booking application referencing the monolithic architecture

Imagine a team develops a booking application using a monolithic approach. In this case, the UI is the website that the user interacts with. The business logic contains the code that provides the booking functionalities, such as search, booking, payment, and so on. These are written using one programming language (e.g. Java or Go) and stored in a single repository. The data layer contains functions that store and retrieve customer data. All of these components are managed as a unit, and the release is done using a single binary.
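As a minimal sketch (not taken from the course material; the handler names, routes, and port are hypothetical), such a monolith could be a single Go program in which all booking functionalities share one code base and are compiled into one binary:

    // main.go - a hypothetical monolithic booking application:
    // all functionalities share one code base and ship as a single binary.
    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    // Business logic: every functionality is part of the same unit.
    func searchHandler(w http.ResponseWriter, r *http.Request)  { fmt.Fprintln(w, "search results") }
    func bookingHandler(w http.ResponseWriter, r *http.Request) { fmt.Fprintln(w, "booking created") }
    func paymentHandler(w http.ResponseWriter, r *http.Request) { fmt.Fprintln(w, "payment processed") }

    func main() {
        // A single process serves the UI, business logic, and data access.
        http.HandleFunc("/search", searchHandler)
        http.HandleFunc("/booking", bookingHandler)
        http.HandleFunc("/payment", paymentHandler)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }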

Microservices

In a microservice architecture, application tiers are managed independently, as different units: each unit is developed, released, and scaled on its own.

Microservices.png Diagram showcasing how a booking application is implemented using a microservice architecture

A booking application referencing the microservice architecture

Now, let's imagine the team develops a booking application using a microservice approach.

In this case, the UI remains the website that the user interacts with. However, the business logic is split into smaller, independent units, such as login, payment, confirmation, and many more. These units are stored in separate repositories and are written using the programming language of choice (e.g. Go for the payment service and Python for the login service). To interact with other services, each unit exposes an API. Lastly, the data layer contains functions that store and retrieve customer and order data. As expected, each unit is released using its own binary.
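For contrast, here is a minimal sketch of one such unit, a stand-alone payment service (the route, port, and response body are hypothetical); it lives in its own repository, exposes its functionality over an API, and ships as its own binary:

    // payment/main.go - a hypothetical stand-alone payment service.
    // It lives in its own repository and is released as its own binary;
    // other units (login, confirmation, ...) interact with it through its API.
    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    func payHandler(w http.ResponseWriter, r *http.Request) {
        // Only the payment logic lives in this unit.
        w.Header().Set("Content-Type", "application/json")
        json.NewEncoder(w).Encode(map[string]string{"status": "payment accepted"})
    }

    func main() {
        http.HandleFunc("/api/v1/pay", payHandler)
        log.Fatal(http.ListenAndServe(":8081", nil))
    }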

Trade-offs for Monoliths and Microservices

Choosing between a monolithic and a microservice architecture comes with a set of trade-offs. These trade-offs cover development complexity, scalability, time to deploy, flexibility, operational cost, and reliability.

Development Complexity

Development complexity represents the effort required to develop and manage an application.

Monoliths - one programming language; one repository; enables sequential development

Microservice - multiple programming languages; multiple repositories; enables concurrent development

Scalability

Scalability captures how well an application is able to scale up and down, based on the incoming traffic.

Monoliths - replication of the entire stack; hence it's heavy on resource consumption

Microservice - replication of a single unit, providing on-demand consumption of resources

Time to Deploy

Time to deploy captures the delivery pipeline that is built and used to ship features.

Monoliths - one delivery pipeline that deploys the entire stack; more risk with each deployment leading to a lower velocity rate

Microservice - multiple delivery pipelines that deploy separate units; less risk with each deployment leading to a higher feature development rate

Flexibility

Flexibility implies the ability to adapt to new technologies and introduce new functionalities.

Monoliths - low rate, since the entire application stack might need restructuring to incorporate new functionalities

Microservice - high rate, since changing an independent unit is straightforward

Operational Cost

Operational cost represents the cost of necessary resources to release a product.

Monoliths - low initial cost, since only one code base and one pipeline need to be managed. However, the cost increases exponentially when the application needs to operate at scale.

Microservice - high initial cost, since multiple repositories and pipelines require management. However, at scale, the cost remains proportional to the consumed resources at that point in time.

Reliability

Reliability captures practices for an application to recover from failure and tools to monitor an application.

Monoliths - in a failure scenario, the entire stack needs to be recovered. Also, the visibility into each functionality is low, since all the logs and metrics are aggregated together.

Microservice - in a failure scenario, only the failed unit needs to be recovered. Also, there is high visibility into the logs and metrics for each unit.

Best Practices For Application Deployment

These practices are focused on health checks, metrics, logs, tracing, and resource consumption.

Health Checks

Health checks are implemented to showcase the status of an application. These checks report if an application is running and meets the expected behavior to serve incoming traffic. Usually, health checks are represented by an HTTP endpoint such as /healthz or /status. These endpoints return an HTTP response showcasing if the application is healthy or in an error state.

health-checks.png Screenshot showcasing a /status health check that returns an "OK - healthy" response

/status health check that showcases that the application is healthy
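A minimal sketch of such an endpoint in Go is shown below; the /healthz path and the "OK - healthy" response body follow common conventions rather than a required standard:

    // healthz.go - a minimal health check endpoint.
    package main

    import (
        "fmt"
        "log"
        "net/http"
    )

    func healthz(w http.ResponseWriter, r *http.Request) {
        // Report that the application is running and able to serve traffic.
        w.WriteHeader(http.StatusOK)
        fmt.Fprintln(w, "OK - healthy")
    }

    func main() {
        http.HandleFunc("/healthz", healthz)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }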

Metrics

Metrics are necessary to quantify the performance of the application. To fully understand how a service handles requests, it is mandatory to collect statistics on how the service operates.

For example, the number of active users, handled requests, or the number of logins. Additionally, it is paramount to gather statistics on resources that the application requires to be fully operational.

For example, the amount of CPU, memory, and network throughput. Usually, the collected metrics are exposed via an HTTP endpoint such as /metrics, which returns internal metrics such as the number of active users, consumed CPU, network throughput, etc.

metrics.png Screenshot showcasing a list of metrics that count the handled requests by HTTP code

/metrics endpoint listing metrics that count handled requests by the returned HTTP code
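As an illustrative sketch, the handler below exposes a request counter on /metrics using only the Go standard library and the Prometheus text exposition format; in practice a metrics client library would typically generate this output, and the metric name used here is hypothetical:

    // metrics.go - a sketch of a /metrics endpoint exposing a request counter
    // in the Prometheus text exposition format (normally produced by a client library).
    package main

    import (
        "fmt"
        "log"
        "net/http"
        "sync/atomic"
    )

    var handledRequests int64 // total number of handled requests

    func hello(w http.ResponseWriter, r *http.Request) {
        atomic.AddInt64(&handledRequests, 1)
        fmt.Fprintln(w, "hello")
    }

    func metrics(w http.ResponseWriter, r *http.Request) {
        // Expose the internal counter so a monitoring system can scrape it.
        fmt.Fprintln(w, "# TYPE http_requests_total counter")
        fmt.Fprintf(w, "http_requests_total{code=\"200\"} %d\n", atomic.LoadInt64(&handledRequests))
    }

    func main() {
        http.HandleFunc("/hello", hello)
        http.HandleFunc("/metrics", metrics)
        log.Fatal(http.ListenAndServe(":8080", nil))
    }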

Logs

Log aggregation provides valuable insights into what operations a service is performing at a point in time. It is the nucleus of any troubleshooting and debugging process.

For example, it is essential to record if a user logged into a service successfully, or encountered an error while performing a payment.

Usually, the logs are collected from STDOUT (standard out) and STDERR (standard error) through a passive logging mechanism. This means that any output or errors from the application are sent to the shell. Subsequently, these are collected by a logging tool, such as Fluentd or Splunk, and stored in backend storage. However, the application can send the logs directly to the backend storage. In this case, an active logging technique is used, as the log transmission is handled directly by the application, without requiring a logging tool.

There are multiple logging levels that can be attributed to an operation. Some of the most widely used are:

DEBUG - reports fine-grained events, useful when debugging an application

INFO - provides general information on how the application operates

WARN - records an unexpected event that might affect a process, while the application continues to run

ERROR - records an operation that failed, while the rest of the application remains operational

FATAL - records an event that forces the application to stop

As well, it is common practice to associate each log line with a timestamp that records exactly when an operation was invoked.

logs.png Screenshot of multiple INFO log lines recorded when Prometheus service started

Multiple INFO log lines recorded when a Prometheus service started
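A small sketch of passive logging in Go (requires Go 1.21+ for the log/slog package; the messages and fields are illustrative): leveled, timestamped log lines are written to STDOUT, where an agent such as Fluentd can collect them.

    // logging.go - leveled, timestamped log lines written to STDOUT
    // (passive logging), ready to be collected by a logging agent.
    package main

    import (
        "log/slog"
        "os"
    )

    func main() {
        // The JSON handler adds a timestamp and a level to every log line.
        logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))

        logger.Info("user logged in", "user_id", 42)
        logger.Warn("payment retried", "attempt", 2)
        logger.Error("payment failed", "reason", "card declined")
    }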

Tracing

Tracing is capable of creating a full picture of how different services are invoked to fulfill a single request. Usually, tracing is integrated through a library at the application layer, where the developer can record when a particular service is invoked. These records for individual services are defined as spans. A collection of spans defines a trace that recreates the entire lifecycle of a request.
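As a hedged sketch using the OpenTelemetry Go API (span names are illustrative, and the exporter/tracer-provider setup needed to ship data to a backend such as Jaeger is omitted, so the calls fall back to a no-op tracer), each recorded operation becomes a span, and the nested spans together form the trace of a single request:

    // tracing.go - creating spans with the OpenTelemetry Go API.
    // The exporter/tracer-provider configuration is omitted, so the
    // global tracer falls back to a no-op implementation.
    package main

    import (
        "context"

        "go.opentelemetry.io/otel"
    )

    func processPayment(ctx context.Context) {
        // A child span records the payment step within the same trace.
        _, span := otel.Tracer("booking").Start(ctx, "process-payment")
        defer span.End()
        // ... payment logic ...
    }

    func handleBooking(ctx context.Context) {
        // The parent span covers the whole booking request.
        ctx, span := otel.Tracer("booking").Start(ctx, "handle-booking")
        defer span.End()

        processPayment(ctx) // recorded as a child span of handle-booking
    }

    func main() {
        handleBooking(context.Background())
    }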

Resource Consumption

Resource consumption encapsulates the resources an application requires to be fully operational. This usually refers to the amount of CPU and memory that is consumed by an application during its execution. Additionally, it is beneficial to benchmark the network throughput, or how many requests an application can handle concurrently. Having awareness of resource boundaries is essential to ensure that the application is up and running 24/7.

resource.png Screenshot of the graph with the CPU consumption of the coredns container

A graph showcasing the CPU consumption of the coredns container
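As a small illustrative sketch using only the Go runtime (not tied to any particular monitoring stack), an application can inspect its own resource usage, which is often a first step when establishing resource boundaries:

    // resources.go - inspecting the application's own resource usage
    // via the Go runtime.
    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        var m runtime.MemStats
        runtime.ReadMemStats(&m)

        // Heap memory in use, goroutine count, and available CPUs are
        // typical figures to watch when sizing an application.
        fmt.Printf("heap in use: %d bytes\n", m.HeapInuse)
        fmt.Printf("goroutines:  %d\n", runtime.NumGoroutine())
        fmt.Printf("CPUs:        %d\n", runtime.NumCPU())
    }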

Further reading

Health Checks - explore the core reasons to introduce health checks and implementation examples

Prometheus Best Practices on Metrics Naming - explore how to name, label, and define the type of metrics

Application Logging Best Practices - read more on how to define what logs should be collected by an application

Logging Levels - explore possible logging levels and when they should be enabled

Enabling Distributed Tracing for Microservices With Jaeger in Kubernetes - learn what tools can be used to implement tracing in a Kubernetes cluster

Edge Case: Amorphous Applications

Once an application is released, its structure is expected to keep evolving during the maintenance phase. Some of the most commonly encountered operations in this phase are listed below:

Amorphous Applications.png Screenshot showcasing the operations that occur in the maintenance phase, including split, merge, replace, and stale operations

Application operations that occur in the maintenance phase

A split operation - is applied if a service covers too many functionalities and is too complex to manage. Having smaller, manageable units is preferred in this context.

A merge operation - is applied if units are too granular or perform closely interlinked operations, and merging them provides a development advantage. For example, merging 2 separate services for log output and log format into a single service.

A replace operation - is adopted when a more efficient implementation is identified for a service. For example, rewriting a Java service in Go, to optimize the overall execution time.

A stale operation - is performed for services that are no longer providing any business value, and should be archived or deprecated. For example, services that were used to perform a one-off migration process.

Performing any of these operations increases the longevity and continuity of a project. Overall, the end goal is to ensure the application provides value to customers and is easy for the engineering team to manage. But more importantly, it can be observed that the structure of a project is not static. It is amorphous and evolves based on new requirements and customer feedback.

Further reading

Modern Banking in 1500 Microservices - watch how Monzo manages thousands of microservices and evolves its ecosystem.
