The art of simplification when building on AWS

Introduction

AWS has existed for more than a decade and as of today there are more than 200 AWS services (and counting), even after a few “de-prioritizations” in 2024. The landscape of building cloud applications on AWS is big and ever growing, and as builders we need to make hundreds of decisions every day.

One of the most common answers that I have heard from “Cloud Architects” in the past weeks starts with “It depends…” when they are asked how to build something or solve a specific challenge. I personally believe that it has become very complex and difficult to decide on the technology (or service) to use, and that we as the AWS community need to do a better job of explaining consistently how to make specific decisions for a specific application or architecture.

If we add the option of deploying a k8s cluster on AWS, the number of choices becomes even bigger as you can…build “anything” on k8s 🙂

I believe that it is too difficult to make choices and that we need to start looking at the “simplification” of building applications on AWS.

“A good cloud architect” knows when to use which service and which architecture, weighing simplicity, complexity, costs, security footprint and extensibility.

Let’s have a look at the current landscape and the challenges we see.

(This article was written before re:Invent 2024 so some of the details might be outdated by the time you read this 🙂
I’ll try to update this article if there are any related announcements at the conference.)

A few examples of things that could be simpler

In preparation for this blog post, I’ve asked a few AWS Heroes and Community Builders where they think that AWS is too difficult and complex in November 2024. The answers I got vary based on the focus and role of each individual. In this blog post I’ll cluster them by topic.

Upgrading documentation, showcasing best practices

The most common input that I’ve received by far is the ask for more supported and maintained example implementations, best practice documentation and recommendations. Most of the best practices for different services are presented in sessions at re:Invent or re:Inforce, or in AWS blog posts. Some are shared within the service documentation or on GitHub (awslabs or aws). Unfortunately, a lot of them become outdated fast and are not actively maintained.
In our area of business, technology changes rapidly, and thus best practices that are presented today can already be outdated tomorrow.

AWS needs to do a better job at keeping documentation and best practice implementations up to date. This also includes more frequent and better collaboration in open source projects. Some of the AWS-owned open source projects (like AWS CDK or the “containers-roadmap”) are losing momentum because of missing engagement from the service teams in 2024.

When CodeCatalyst was announced in 2022, I had high hopes that the “Blueprints” functionality would become the “go-to” place for best practice implementations – but AWS unfortunately failed to deliver on that promise.
Blueprints are barely maintained, and even though the “genai-chatbot” blueprint has produced a large number of views on my YouTube channel, it feels a bit like they have been abandoned by AWS in the past months.

Simplify costs and cost management

As organizations mature in their usage of AWS and in building applications on AWS, a lot of them focus on understanding and analyzing the costs that are produced by their applications running in the cloud. AWS currently allows you to track costs mainly based on usage and resources consumed.

This often makes it hard to track the costs allocated to a certain business functionality. Especially if you’re building multi-tenant applications on AWS, it can be really hard to understand and verify what each tenant is actually costing you.

We’d love to simplify cost allocation per application or even per transaction to be able to properly understand the consumption of our budget. This also includes examples like Athena, where you’re billed for using Athena, but the same transaction also triggers S3 API calls which are then not allocated correctly to your Athena-based application.

Another example that I recently encountered myself is an EKS cluster that was deployed in a VPC with AWS Network Firewall attached and GuardDuty activated. The EKS cluster itself was only a portion of the total allocated costs: EKS accounted for 20%, but – due to some application deployment challenges – Network Firewall accounted for 60% and GuardDuty for 20%.
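To make the idea of a per-application cost view more concrete, here is a minimal sketch in Python. It assumes you already have cost line items tagged with an application name (for example from a billing export); the `LINE_ITEMS` data, the `shop` tag and the `cost_share_by_service` helper are all hypothetical and only mirror the 20/60/20 split from the example above.

```python
from collections import defaultdict

# Hypothetical line items: (application tag, AWS service, cost in USD).
# The amounts are invented to mirror the 20/60/20 split described above.
LINE_ITEMS = [
    ("shop", "Amazon EKS", 200.0),
    ("shop", "AWS Network Firewall", 600.0),
    ("shop", "Amazon GuardDuty", 200.0),
    ("blog", "AWS Lambda", 50.0),
]


def cost_share_by_service(line_items, application):
    """Return {service: percentage of the application's total cost}."""
    totals = defaultdict(float)
    for app, service, cost in line_items:
        if app == application:
            totals[service] += cost

    grand_total = sum(totals.values())
    if grand_total == 0:
        # Unknown application or no spend: nothing to report.
        return {}

    return {
        service: round(100 * cost / grand_total, 1)
        for service, cost in totals.items()
    }


if __name__ == "__main__":
    for service, share in cost_share_by_service(LINE_ITEMS, "shop").items():
        print(f"{service}: {share}%")
```

The hard part in practice is not this arithmetic but getting every resource (and every shared service) reliably tagged in the first place, which is exactly where more help from AWS would be welcome.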

I wish for AWS to auto-discover my applications (e.g. by using myApplications) and transactions and to output the information that helps me understand the costs of my applications.

k8s and containers

Even in the containers world, AWS has too many options to go with: besides the prominent options like ECS and EKS, we have Beanstalk, App Runner and even Lambda to run containers. I understand that all of these building blocks empower builders to build applications using the service that they want to, but you still need to make choices, and migrating from one to another is often hard, complex and difficult. And – even worse – you need to be an expert in each service to be able to make the right choice for your use case.

I wish for this decision to be simpler, if not to say seamless. Builders potentially don’t want to make decisions about the service; they want their applications to adapt automatically to the changing requirements of what they have built. Having the possibility to switch from one service to another automatically, without (much) human intervention, would empower us to invent and simplify!

AWS EKS – my top challenges

I’ve been experimenting with AWS EKS lately – and to be honest, every time I start a new cluster, it is a real pain.

Everything is “simple” if you are able to work with defaults – like creating a new VPC in a non-enterprise environment. However, the default creation process allows you to create public EKS clusters, which should be forbidden by default. Triaging network challenges for EKS is also still very complicated, and getting support for these kinds of problems can be a painful experience.

I would love to get an “auto-fix” button that solves my networking problems on EKS clusters or verifies for me if my setup is correct.
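Until such a button exists, a small part of the “verify my setup” idea can be scripted. The sketch below is a minimal, hedged example: it only inspects the `resourcesVpcConfig` section of an EKS `DescribeCluster` response and flags a publicly reachable API endpoint. The function name `public_endpoint_findings` and the wording of the findings are my own assumptions, and it does not look at subnets, security groups or routing at all.

```python
def public_endpoint_findings(cluster):
    """Inspect the 'cluster' part of an EKS DescribeCluster response.

    Returns a list of human-readable findings about endpoint exposure.
    An empty list means this (very limited) check found nothing.
    """
    vpc_config = cluster.get("resourcesVpcConfig", {})
    findings = []

    if vpc_config.get("endpointPublicAccess", False):
        cidrs = vpc_config.get("publicAccessCidrs", [])
        if not cidrs or "0.0.0.0/0" in cidrs:
            findings.append("API endpoint is public and open to the whole internet")
        else:
            findings.append(
                "API endpoint is public but restricted to: " + ", ".join(cidrs)
            )

    if not vpc_config.get("endpointPrivateAccess", False):
        findings.append("private endpoint access is disabled")

    return findings


if __name__ == "__main__":
    # Hypothetical response fragment, shaped like DescribeCluster output.
    example_cluster = {
        "name": "demo",
        "resourcesVpcConfig": {
            "endpointPublicAccess": True,
            "endpointPrivateAccess": False,
            "publicAccessCidrs": ["0.0.0.0/0"],
        },
    }
    for finding in public_endpoint_findings(example_cluster):
        print(finding)
```

With boto3 installed and credentials configured, the input could come from `boto3.client("eks").describe_cluster(name="demo")["cluster"]`, where `demo` is a placeholder cluster name.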

In addition to that, now that EKS supports IPv6, it might be the right time to solve the never-ending IP address problem that a lot of organizations have by enabling IPv6 by default and setting up EKS clusters with IPv6 for private subnets and internal networking.

Another thing that EKS Fargate currently doesn’t offer is the full set of k8s API options and scalability. If you want to use something like Karpenter for your workloads, you will always need to fall back to “self-managed” EC2 compute – and this is always painful, because it requires you to start managing your own AMIs and infrastructure. In this case, you also need to take care of the scalability of your cluster infrastructure, which seems to be an outdated thing to do in 2024.

Creating, running and deploying EKS clusters should become a commodity and a “simple thing” – no one should have to worry about it, as it is really only the starting point for building on Kubernetes.

I hope that AWS takes away some of these challenges and helps organizations that are building on Kubernetes to focus on what they want to build – on their business value – instead of managing infrastructure for their clusters.

Investing into cross service integrations for serverless

The serverless landscape has evolved a lot over the past years. We’ve seen new functionalities and integrations become available but, similar to the containers space, the number of choices you can and need to make has increased.

At the same time, the integration between the services has not evolved a lot. The Infrastructure as Code (IaC) landscape is massively fragmented, with AWS CDK, CloudFormation, Serverless Application Model (SAM), Terraform and newer players like Pulumi growing. Lately, I’ve also encountered Crossplane as a “serious” option to write Infrastructure as Code and deploy infrastructure on AWS.

The observability landscape is also big. With OpenTelemetry, AWS X-Ray and missing integrations with other observability tools, it is difficult to build observability into serverless applications that span across a lot of different services. Not all of the services support OpenTelemetry integration out of the box – I believe this would be a great addition. Auto-discovering transactions and giving developers insights into what’s happening within their applications across multiple services would make application development easier.

Another piece of feedback that I got during my conversations was the wish to simplify the setup and definition of API Gateway integrations with Load Balancers. The definition of routes, paths and payloads still seems to be difficult within API Gateway, and the differences between a “REST” and an “HTTP” API endpoint are sometimes confusing. And then, there is AppSync (hosted GraphQL)… I see a lot of potential to simplify this setup and make it easier for developers to build APIs on AWS.

Enterprises & GovCloud

When talking about enterprises in general and enterprises building for GovCloud (and, going forward, the European Sovereign Cloud), users would love to get features and services rolled out to GovCloud environments more frequently than they currently are. They also complain that not all parts of the AWS console and the tooling are aware of the different partitions (“normal” AWS vs. “GovCloud”). This should be improved as well.
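Partition awareness is a concrete, recurring source of bugs: ARNs in GovCloud start with `arn:aws-us-gov:` instead of `arn:aws:`, so any tooling that hardcodes `arn:aws:` breaks there. The following is a minimal sketch of partition-aware ARN construction. The helper names `partition_for_region` and `s3_bucket_arn` are hypothetical, and the prefix-based mapping is a simplifying assumption that only covers the commercial, China and GovCloud (US) partitions.

```python
def partition_for_region(region):
    """Map a region name to its AWS partition.

    Simplified assumption: only three partitions are handled, and any
    region without a known prefix is treated as the commercial partition.
    """
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    if region.startswith("cn-"):
        return "aws-cn"
    return "aws"


def s3_bucket_arn(bucket, region):
    """Build an S3 bucket ARN without hardcoding the 'aws' partition."""
    return f"arn:{partition_for_region(region)}:s3:::{bucket}"


if __name__ == "__main__":
    print(s3_bucket_arn("audit-logs", "eu-central-1"))
    print(s3_bucket_arn("audit-logs", "us-gov-west-1"))
```

In Infrastructure as Code you would normally not derive this yourself but rely on the partition value your tool exposes (for example the `AWS::Partition` pseudo parameter in CloudFormation).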

On the optimization and simplification front, I regularly hear the feedback that switching between different AWS accounts is a big issue. As we call out “multi-account” deployments as a best practice, it becomes increasingly important to make switching between accounts and integrating them simpler.

Interview partners say the same about multi-region deployments, where the console does not support interacting with applications that are deployed in multiple regions. There’s also not a lot of out-of-the-box support for these kinds of deployments within AWS.

When I recently listened to the [AWS Developers Podcast]() episode focused on IAM Identity Center, I heard a lot of very positive things about how to use it and integrate it within your organization’s landscape. I do agree that it makes a lot of things simpler than IAM, but improving the user experience and allowing additional automations to be implemented would be helpful.

General simplifications

Look at the never-ending announcements about new releases focused on Generative AI: Amazon Bedrock, Amazon Q, Amazon Q Developer, Amazon Q Business, … It has become difficult to navigate the landscape, only 18 months after Generative AI became a hype.

From the outside, AWS’ messaging is unclear and distracting. With many teams exploring different options, the confusion will become bigger. We need to clarify which names, technologies and services to use in which use case. And it needs to be clearer what AWS wants to be in the Generative AI landscape: a “building blocks” provider (through Bedrock and the Converse API), or a “player” in the field of users using generative AI – competing with OpenAI and others. This message is not yet clear – at least to me.

Making things simpler – helping architects to make better decisions

If I look at the AWS landscape as a cloud architect, I would love to be able to make decisions better and faster, supported by AWS. A tool or a service that supports making decisions based on business requirements and scalability would be awesome, allowing me to focus on building applications and services instead of making me an expert in choosing the “correct” compute mode for my applications. There are just n possible options to build applications on AWS. [Serverlessland]() is a great step towards making these decisions easier, but we need more than that!

Thanks to the contributors 😉

While some of the participants of my small survey do not want to be named, I can thank Benjamin, Ran, Matt Morgan, Matt Martz and Andres for their contributions to this blog post. Your detailed feedback and input helped me a lot to shape this post – thank you for all you do for the AWS community.

Wrap up – the art of simplification

In November 2024, I believe that being a “great” cloud architect means being able to make smart decisions and knowing when and why to choose specific combinations of services. This is an art that a lot of us still need to learn.

k8s is not always the right answer, but sometimes it might be. AWS Lambda and serverless applications are also not the best choice for everyone.

Simplifying your architectural decision tree makes your role as a cloud architect easier.

What do you think? Where can AWS make your life as a builder and cloud architect simpler and easier?
