What should you not be doing with the Microservice architecture?


If you are working in the booming micro service ecosystem, you might have already seen this. API Gateway to the rescue. It’s like that one single solution that will make your micro-service architecture perfect.

I work as a Software Architect at one of the top 5 Fortune companies, and my job is to build products that can withstand next-generation needs.

I have been in some brutal discussions about this topic with my friends, and I sound like an arrogant b***h having radical opinions. After all, there are a lot of so-called open source projects vouching for this. And what about the cloud products that charge hefty amounts for it? They can’t be crazy to do so, so I must be the mad one. So, I thought, let’s write about it, and let you decide.

The Hard Truth of the IT industry, the new fancy!!!

Technology has recurring cycles, we all know it. A few decades ago, functional programming was in the mainstream, and then we have it again now. Likewise, we had the service-oriented architecture in the past, and it came again with a different name — micro-services.

And let’s get serious, most of these debates and architectural trends are not driven by requirements, but by marketing.

It starts with some fancy product, making buzz about how the world is changing and how their product will save you. We often miss the point of technology and architectural needs. So, let’s take a step back and figure out what micro-services are and why we need them. If we do need them, should we really use API Gateway products?


What the heck is an API Gateway and what problem does it solve?

Modern application architecture demands everything that is new. After all, it’s modern, it’s not conventional! We can’t live in a monolithic world anymore. And yes, of course, service-oriented architecture is a thing of the past. Now, it should ONLY be micro-services. Sarcastic? Obviously!

First, microservices?

The basic premise of a micro-service architecture is that the application should be divided into small services. These services can talk to different data providers and have their own data stores. And these services can talk to each other to fulfill the user requests.

There will be yet another service abstraction for aggregation, talking with multiple micro-services and doing orchestration. They could be doing data transformations, and yeah they will be scalable.

Well, all this is good. But, here comes the play!

These abstract or higher order services now need orchestration. They need cross-cutting concerns like throttling, circuit breaker, business transformations, marshaling. So, let’s use the API Gateway.

Second, API Gateway?

An API gateway is a piece of software that sits between clients and services. It acts as a reverse proxy, routing requests from clients to services. It replaces direct client-to-service communication with a proxy. At a broad level, we can associate the following operations with it:

  • Layer 7 routing and load balancing
  • Response aggregation from many services
  • Business transformations
  • Marshaling and Un-marshaling for the request/response
  • Data format and protocol conversions
  • Authentication and Authorization
  • SSL termination
  • Throttling, Circuit breaker, rate limiting etc

It is essentially a centralized service abstraction that does the heavy lifting of orchestration, transformations and cross-cutting magic.

Example architecture diagram with an API Gateway

Let’s look into this example architectural diagram and evaluate the benefits of the API Gateway.

  1. It provides a layer 7 reverse proxy for routing. The clients are not aware of the presence of 3 microservices underneath. They further do not need to care about the location or the access points of each service.
  2. The internal services could have their own release cycles independent of the client. As long as a contract change does not affect a given client, that client does not need to change.
  3. It can aggregate responses from multiple services and provide a unified single response. It can also parse and return only specific sections of the data. It’s like a mini transformation engine which can transform data elements, data formats, or even protocols.
  4. It removes a lot of chattiness between the client and the server, both in terms of request count and the content of data. It usually requires some server-side agent or a snippet of scripting code to provide the required conversion logic. Some products go further and provide you with a UI tool and a JavaScript sandbox to write or define transformation logic.
  5. It can handle cross-cutting concerns like authentication, authorization, SSL termination, logging, tracing, web application firewalls, cross-site scripting protection etc.
  6. It can do service discovery, response caching, request/response correlation for clients.
  7. It can set up barriers for throttling, rate limits, circuit breakers, quality of service, and health probes.
  8. It can further provide monitoring, alerting and reporting capabilities for the upstream performance.
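
To make the routing and aggregation points above a bit more concrete, here is a minimal in-process sketch. It is not how any real gateway product is implemented; the three service functions, the route prefixes and the `home_screen` method are all hypothetical names I made up for illustration, and plain function calls stand in for network hops.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for three upstream microservices.
def catalog_service(user_id: str) -> dict:
    return {"titles": ["Title A", "Title B"]}

def profile_service(user_id: str) -> dict:
    return {"user": user_id, "plan": "standard"}

def recommendation_service(user_id: str) -> dict:
    return {"recommended": ["Title C"]}

class Gateway:
    """Toy gateway: prefix routing plus response aggregation."""

    def __init__(self) -> None:
        self.routes: Dict[str, Callable[[str], dict]] = {}

    def register(self, prefix: str, handler: Callable[[str], dict]) -> None:
        self.routes[prefix] = handler

    def route(self, path: str, user_id: str) -> dict:
        # Longest-prefix match, so a more specific route wins over a shorter one.
        for prefix in sorted(self.routes, key=len, reverse=True):
            if path.startswith(prefix):
                return self.routes[prefix](user_id)
        raise KeyError(f"no route for {path}")

    def home_screen(self, user_id: str) -> dict:
        # One client call fans out to several services and merges the results.
        merged: dict = {}
        for prefix in ("/profile", "/catalog", "/recommendations"):
            merged.update(self.route(prefix, user_id))
        return merged

gateway = Gateway()
gateway.register("/catalog", catalog_service)
gateway.register("/profile", profile_service)
gateway.register("/recommendations", recommendation_service)
```

The client only ever sees `gateway.home_screen("u1")`. Notice that every domain now flows through the same object, which is exactly the property the rest of this article worries about.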

This is all cool, but What’s wrong with it?

Well, all this is good and in most cases necessary. But there are dark sides to it, and we have to get back to basics to understand them. I would like to base my argument on the reactive design principles. I will use a hypothetical Netflix example for all the arguments.

Let’s get to the bottom with reactive principles

Jonas Bonér, founder and CTO of Lightbend and creator of Akka, says:

“Application requirements have changed dramatically in recent years. Both from a runtime environment perspective, with multicore and cloud computing architectures nowadays being the norm, as well as from a user requirements perspective, with tighter SLAs in terms of lower latency, higher throughput, availability and close to linear scalability. This all demands writing applications in a fundamentally different way than what most programmers are used to.”

The Reactive Manifesto defines the core principles of reactive systems.

Systems built as Reactive Systems are more flexible, loosely-coupled and scalable. This makes them easier to develop and amenable to change. They are significantly more tolerant of failure and when failure does occur they meet it with elegance rather than a disaster. Reactive Systems are highly responsive, giving users effective interactive feedback.

Principle #1: Responsive

The first principle, responsiveness, states that:

“The system responds in a timely manner if at all possible. Responsiveness is the cornerstone of usability and utility, but more than that, responsiveness means that problems may be detected quickly and dealt with effectively. Responsive systems focus on providing rapid and consistent response times, establishing reliable upper bounds so they deliver a consistent quality of service. This consistent behaviour in turn simplifies error handling, builds end user confidence, and encourages further interaction.”

It does not matter if an application is fancy and solves world hunger, if it does not respond in a timely manner all the time. End users have a very limited attention span. Anything taking longer than expected is enough for the users to move away.

An API Gateway is a single access point, potentially doing a monolith’s work across all the domains of your application. It suffers from the noisy neighbour syndrome, where the chattiness of one domain impacts the others. It can drastically impact responsiveness.

Let’s imagine Netflix has an agreement with a business partner to offer a live stream of the next “Yankees game” to its users. They also stream recorded “Game of Thrones” episodes from HBO, delivered the week after the broadcast.

Sadly, the two coincide this time. Streaming “Game of Thrones” is simple, because all the digital conversion was already done. But live game streaming is a different beast. It requires huge real-time processing of the video and is damn slow.

But guess what, they share the same API Gateway, and because the Yankees data feed is slow, no one can watch “Game of Thrones” properly. The audience will tolerate a slight delay or lag in the “Yankees game” because it is live, but not in “Game of Thrones”. (I know some Yankees fans might wanna hit me!).

Anyways, the point is, you can increase the capacity of the entire API Gateway, but good luck with the cost and the operational hassle.

In a typical responsive application, different services follow different service level objectives and agreements. It’s OK for one of them to be slow while another provides immediate responses. That’s where microservices help, by decoupling the resources and allowing selective scaling. You can’t really shut down or slow the entire application because one of the application’s data sources has bogged down the API Gateway.
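
Here is a tiny, deterministic simulation of that noisy neighbour effect. It is only a sketch under made-up assumptions: a gateway modelled as a FIFO pool of workers, four slow “live” requests of 10 seconds each, four fast “on-demand” requests of 1 second each, and the `latencies` helper itself are all hypothetical.

```python
import heapq
from typing import List, Tuple

Request = Tuple[float, float]  # (arrival time, service time), in seconds

def latencies(requests: List[Request], workers: int) -> List[float]:
    """Simulate a FIFO worker pool; return latencies in arrival order."""
    free_at = [0.0] * workers      # when each worker next becomes free
    heapq.heapify(free_at)
    result = []
    for arrival, service in sorted(requests):
        # The request starts when it has arrived AND a worker is free.
        start = max(arrival, heapq.heappop(free_at))
        finish = start + service
        heapq.heappush(free_at, finish)
        result.append(finish - arrival)
    return result

live = [(0.0, 10.0)] * 4   # slow live-stream requests, arriving first
vod = [(1.0, 1.0)] * 4     # fast on-demand requests, arriving a second later

# Shared gateway: one pool of 4 workers for everybody.
shared_vod = latencies(live + vod, workers=4)[len(live):]

# Isolated: the same 4 workers, split 2 + 2 between the two domains.
isolated_vod = latencies(vod, workers=2)

print(max(shared_vod), max(isolated_vod))
```

With these numbers, the on-demand requests wait behind the live ones in the shared pool and take 10 seconds each, while with the split pools the slowest one takes 2 seconds. Same total capacity, very different experience for the “Game of Thrones” crowd.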

Principle #2: Resilient

The second principle, resilience, states that:

“The system stays responsive in the face of failure. This applies not only to highly-available, mission-critical systems — any system that is not resilient will be unresponsive after a failure. Resilience is achieved by replication, containment, isolation, and delegation. Failures are contained within each component, isolating components from each other and thereby ensuring that parts of the system can fail and recover without compromising the system as a whole. Recovery of each component is delegated to another (external) component and high-availability is ensured by replication where necessary. The client of a component is not burdened with handling its failures.”

To my point again, an API Gateway is a single access point, potentially doing a monolith’s work across all the domains of your application. If it breaks, everything breaks.

Let’s imagine that our Netflix has a recommendation engine that suggests the top movies and TV series to watch. Well, it does. Nothing hypothetical about it. But in this case, it sits behind the same API Gateway as our catalog.

Let’s assume that, because of some operational issue, its recommendation APIs are not responding. Worse, they are not failing with a 500; each request is stuck forever, until it’s killed by the client. In a typical scenario, we wouldn’t care if the recommendations are missing, as long as we can explore the catalog and watch something.

Again, from an end-user perspective, different services do not follow the same service level objectives and availability agreements. It is OK for one service to go down, as long as the others are available. Unfortunately, if our clients are very persistent and do not handle failures well, they may clog the system, effectively shutting down everything.

You might be thinking that rate limiting and circuit breakers exist in the API Gateway for exactly this reason.

Yes, but if you remember from the principle, there is no isolation. The API Gateway is yet another piece of software, and if it goes cranky, so do you. You are essentially offloading the resiliency from your component to a third party. You can’t really isolate it and replicate it per service for high availability. It’s all or nothing, in case of failures!
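
To show what isolation could look like when the resiliency stays with your own component, here is a minimal count-based circuit breaker, one instance per dependency. Treat it as a sketch: the class, the `fetch_*` functions and the threshold of 3 failures are hypothetical, and a production breaker would also need a timed “half-open” state to probe for recovery, which I have left out.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

class CircuitBreaker:
    """Count-based breaker: opens after `max_failures` consecutive failures."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open:
            # Fail fast: do not even touch the broken dependency.
            return fallback()
        try:
            result = func()
        except Exception:
            self.failures += 1
            return fallback()
        self.failures = 0  # a success resets the count
        return result

# Hypothetical upstream calls; a real client would raise on its own timeout.
def fetch_recommendations() -> list:
    raise TimeoutError("recommendations did not answer in time")

def fetch_catalog() -> list:
    return ["Title A", "Title B"]

# One breaker per dependency, so a sick service trips only its own breaker.
recommendations_breaker = CircuitBreaker(max_failures=3)
catalog_breaker = CircuitBreaker(max_failures=3)

def home_screen() -> dict:
    return {
        "catalog": catalog_breaker.call(fetch_catalog, lambda: []),
        "recommendations": recommendations_breaker.call(
            fetch_recommendations, lambda: []
        ),
    }
```

After three failed calls the recommendations breaker opens and the page is served with an empty recommendations list, while the catalog breaker never notices. The failure is contained per dependency instead of being all or nothing.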

Principle #3: Elastic

The third principle, elasticity, states that:

“The system stays responsive under the varying workload. Reactive Systems can react to changes in the input rate by increasing or decreasing the resources allocated to service these inputs. This implies designs that have no contention points or central bottlenecks, resulting in the ability to shard or replicate components and distribute inputs among them. Reactive Systems support predictive, as well as Reactive, scaling algorithms by providing relevant live performance measures. They achieve elasticity in a cost-effective way on commodity hardware and software platforms.”

Back to the very first example of the “Yankees game”. Will elasticity help with the slow-moving “Yankees game” stream? Yes! But can it be done selectively? No!

I guess you might have got my point already. An API Gateway must and will be elastic, but can you contain the elasticity requirements and make it cost-effective? I guess not! If one particular service has very demanding elasticity requirements, you still have to scale the entire ecosystem. Not the most cost-effective and efficient way to handle it, I guess!

Principle #4: Message-Driven

The fourth principle, message-driven communication, states that:

“Reactive Systems rely on asynchronous message-passing to establish a boundary between components that ensure loose coupling, isolation and location transparency. This boundary also provides the means to delegate failures as messages. Employing explicit message-passing enables load management, elasticity, and flow control by shaping and monitoring the message queues in the system and applying back-pressure when necessary. Location transparent messaging as a means of communication makes it possible for the management of failure to work with the same constructs and semantics across a cluster or within a single host. Non-blocking communication allows recipients to only consume resources while active, leading to less system overhead.”

We all know Netflix is a data-driven company. Joris Evers, director of global communications at Netflix, says:

“There are 33 million different versions of Netflix.”

Why? Because of the huge analytical use case and the variety of content they offer.

Again, hypothetically, let’s assume a new ambitious goal of Netflix: to collect and analyze millions of real-time data points from the “Yankees game”. All of these, derived from viewers’ usage patterns, are to be processed in real time and sent back to the broadcasters.

Why? So that they can provide better coverage than ever. Better camera angles, commentary styles, cheerleaders, replays and what not. The best possible experience of the “Yankees”, even better than watching it live.

But wait, all these massive feedback channels now pass through an API Gateway. And since that is a lot of unexpected real-time streams, it demands message-driven, back-pressure-aware buffering. Sadly, as far as I know, no API Gateway is message-driven with built-in back-pressure mechanisms. And the last time a loosely coupled, message-driven, back-pressure-aware middleware solution was built, it was called the Enterprise Service Bus.
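
In case “back-pressure” sounds abstract, here is the smallest version of the idea I can think of: a bounded buffer that tells the producer “not now” instead of growing without limit. The `BoundedInbox` class and its capacity are hypothetical; only `queue.Queue` and its `put_nowait`/`get_nowait` calls come from the Python standard library.

```python
import queue
from typing import List

class BoundedInbox:
    """Bounded buffer that pushes back on producers when it is full."""

    def __init__(self, capacity: int) -> None:
        self._q: "queue.Queue[str]" = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def offer(self, event: str) -> bool:
        """Try to enqueue; False is the back-pressure signal to slow down."""
        try:
            self._q.put_nowait(event)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self, n: int) -> List[str]:
        """Consumer side: take up to n events, in arrival order."""
        out: List[str] = []
        for _ in range(n):
            try:
                out.append(self._q.get_nowait())
            except queue.Empty:
                break
        return out

inbox = BoundedInbox(capacity=2)
accepted = [inbox.offer(e) for e in ("e1", "e2", "e3")]
```

Here the third event is refused because the buffer holds only two, and the producer learns about it immediately. What it does next (retry later, sample, drop) is a decision the producer can make, which is the whole point. A plain layer 7 proxy with no such signal can only queue, time out or fall over.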

But what’s wrong with the Enterprise Service Bus?

Well, as we know, it was a failed strategy. It all started with service-oriented architecture, when suddenly everyone started realizing that these orthogonal and reactive needs had to be met. The marketed solution was obviously the Enterprise Service Bus. It went well for a while, until people realized that the basic premise of SOA was decoupling, but ESBs were such monoliths.

It didn’t take long to realize that there is not much gain in creating de-coupled services when all these services are connected via one monolithic ESB.

Every API Gateway starts with the basic premise of reactive message passing, without much awareness of back-pressure or of the load arriving at the front. Plain, simple layer 7 routing. Then, as complexity grows, we start adding context awareness, message protocol translations, and business logic. And there goes the message-driven, elastic, de-coupled, location-transparent communication, right out of the window.

Clemens Vasters, chief architect on the Azure Service Bus team, says:

“API Gateways are the new ESBs. The same companies that were selling the ESBs are now selling the API Gateways”

OK, what’s my point then? And why did I choose Netflix as an example?

Because the API Gateway all started with Netflix, and they are absolutely killing it. They have even open sourced some amazing tools that you can use, and you are potentially using them already.

Should you use API Gateways? Hell yes, definitely! It’s a proven pattern and architecture that giants like Netflix have been using. So should you.

Here are my takeaways for the usage:

#1: Don’t use it everywhere. Explore mesh as well.

It all started with Netflix when, back in 2012, Daniel Jacobson, director of engineering for the Netflix API, published:

“Why REST Keeps Me Up At Night?”

In his article, he described some of the challenges the team at Netflix faced in providing one-size-fits-all APIs to its 800 different device types. Each device type has different data requirements, memory footprint, network bandwidth etc. Unfortunately, with every change in the publisher APIs, the change had to cascade through all the consumer layers (800+). The variance was just huge.

He further added that the individual UI teams were forced to write spaghetti code to provide a different form factor of the APIs. So basically, the publishers’ bounded context was limited to their coarse-grained APIs, while UI teams were creating fine-grained APIs on top of them.

To quote him:

“The basis for our decision is that Netflix’s streaming service is available on more than 800 different device types, almost all of which receive their content from our private APIs.”

In a nutshell, they did this.

API Gateway serving external applications with adapter code

They moved the content delivery and formatting from the UI, or the spaghetti code, to the server side. They developed client adapter code to sit on the server. It absolutely made a lot of sense. There were clear network and ownership boundaries.

However, what does not make sense is to introduce a gateway where that ownership boundary is not clearly established.

API Gateway for internal service orchestration

In the example above, the server microservices do not need API Gateway orchestration if all the services are within the same perimeter. Mesh orchestration is a much more scalable and reliable approach. This is also one of the important reasons why tools like Kubernetes and Mesosphere are flourishing: they offer this mesh communication.

#2: Do it the Netflix way, with autonomy

Another crucial lesson from Netflix’s development was that the consumers had the autonomy to develop and control their adapter code. Furthermore, the adapter code was not written in some dynamic scripting language for flexibility. It was compiled Java code, which offered a secure and fast runtime environment.

Further, the autonomy and SLAs were largely owned by the consumer teams, which is intended. Availability was still handled by a centralized API team, but Netflix’s rigorous engineering practices handled it well.

#3: Don’t forget the service autonomy, and don’t shy away from fine-grained gateways.

One of the motivations of the microservice culture was autonomy. It was meant to improve agility, speed, release cycles, performance, and availability. The web team might not have the same stringent requirements as the mobile team. It might make absolute sense to have separate API Gateways for the mobile team and the web team. You can segregate gateways by line of business or by team.

#4: Define and isolate individual service SLA. Performance and Availability both.

In an enterprise microservice setting, each service has its own SLAs. But with an API Gateway in between, your SLA is now multiplied by that of the API Gateway.

If the gateway is slow because of the other tenants, so are your responses. If the gateway goes down, so does your service. And God knows how crappy the other service or tenant is.

If the other tenant’s service is acting sleazy and very slow, it can slow down yours as well. Plus, there are now added network boundaries and another round trip.

If your SLAs are mission-critical, don’t shy away from isolating your service from the API Gateway, or from running multiple physical gateways. There is an absolute tight coupling between your service and the gateway.

#5: Don’t shy away from writing another microservice abstraction

Susan Fowler, site reliability engineer at Uber, once said:

“Uber had about 1300 microservices”

When Susan started exploring how to standardize microservice practices, one of her objectives was to bring order to them. From a site reliability engineer’s perspective, that didn’t mean bringing down the number, but providing an architectural pattern. It was about establishing a deployment and orchestration model that is scalable.

With the microservice model, it is no longer a problem if a business requires another abstraction. As long as there is an appropriate logical isolation that demands flexibility, creating a physically isolated service is not an issue.

#6: Be aware of the availability impacts

Using a microservice API Gateway introduces a single point of failure. It does not help much that your microservice is highly scalable and has 10 replicas.

The net availability of the APIs will be determined by the availability of both the service and the API Gateway.
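
A quick back-of-the-envelope calculation shows why. For components in series that fail independently, availabilities multiply. The “three nines” figures below are assumptions picked purely for illustration, not measurements of any real service or gateway, and the helper names are mine.

```python
import math

HOURS_PER_YEAR = 365 * 24

def serial_availability(*components: float) -> float:
    """Availability of a request path that needs every component to be up.

    Assumes the components fail independently of each other.
    """
    return math.prod(components)

def downtime_hours_per_year(availability: float) -> float:
    return (1.0 - availability) * HOURS_PER_YEAR

service = 0.999   # assumed: "three nines" for your own service
gateway = 0.999   # assumed: "three nines" for the shared gateway

combined = serial_availability(service, gateway)

print(f"service alone:   {downtime_hours_per_year(service):.2f} h/year down")
print(f"through gateway: {downtime_hours_per_year(combined):.2f} h/year down")
```

Under these assumptions the combined availability drops to roughly 0.998, which is about 17.5 hours of expected downtime a year instead of about 8.8. Your 10 replicas do nothing for the second factor in that product.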

While you can definitely control the availability of your service, you might not be able to control the availability of the gateway. As the application becomes more complex, a centralized infrastructure service like an API Gateway will very likely be owned by a central team.

#7: Keep it simple stupid!

An API Gateway provides a lot of wonderful features which, in general, you cannot do without. However, the moment someone starts adding business logic to the gateway, it begins turning into an ESB. Data aggregation and business transformation require pieces of manually written code to be deployed in the gateway. Each piece of code consumes CPU cycles and usually executes in a sandbox for security reasons. It impacts performance, incurs cross-team communication, and brings additional maintenance and support issues. All of these work against basic micro-service principles. Minimize the business logic in your API Gateways.

“If all you have is a hammer, everything looks like a nail”.

Don’t limit your architectural options with the fancy products out there. In the end, the choice is yours.


Don’t believe me, check out these additional references