I've spent the last week exploring how to run containers on AWS, since I have more experience with GCP, particularly with GKE.

Here's what I've learned.

Your Options

There are basically 4 options. We're excluding AWS Lambda and other PaaS offerings, because those aren't container-based.

Elastic Container Service (ECS) where you manage the EC2 instances
Elastic Container Service (ECS) with AWS Fargate — AWS manages the instances
Elastic Kubernetes Service (EKS) where you manage the node groups (node pools in GKE)
Elastic Kubernetes Service (EKS) with AWS Fargate — AWS manages the instances

The most interesting options are the 2 Fargate options; infra management is clearly moving farther and farther behind the scenes. We probably won't be upgrading node pools in 2 years.

The `ECS` API

Let's go through the ECS API and see how it compares to the Kubernetes API.

A Container Instance is a node: a VM that is part of your ECS cluster. I'm not sure why they didn't call these nodes.

Launch Types

I only named two launch types above, but there are three available.

Fargate - AWS provisions and manages the VMs behind the scenes. No version upgrades, nada. Little more expensive but less to deal with.
EC2 - you have to manage the VMs. This was the original mode of ECS. You do this with an abstraction called a capacity provider.
External - you have on-prem VMs that are registered with your cluster. See AWS ECS Anywhere.

If you don't have some weird compliance requirement that's stopping you, I'd recommend using Fargate.

It's less a question of "which is better", because that's clear. The real question is: "Is Fargate mature enough to replace the EC2 launch type for most use cases?" And I believe the answer is yes.

`TaskDefinition`

This is similar to a Kubernetes Pod
Can have multiple containers, which can communicate with each other via localhost
Can share volumes
You want them to scale up and down together
You deploy them together
"You should create task definitions that group the containers that are used for a common purpose." - the docs.

I would go further and say no more than 1 main container and any additional supporting containers, i.e. the sidecar pattern.

portMappings, a nested field, is very similar to a Service of type NodePort in Kubernetes. It maps a container port to a port on the host container instance, so traffic sent to the host port reaches the container.
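
To make the shape concrete, here's a rough sketch of a task definition with one main container and one sidecar. The interfaces are my own simplified types that mirror a few fields of the task definition JSON (family, containerDefinitions, essential, portMappings); they aren't from an AWS SDK. The log-forwarder image and the mainContainers helper are made up for illustration.

```typescript
// Simplified local types; they mirror a few fields of the ECS task definition JSON.
interface PortMapping {
  containerPort: number;
  hostPort?: number;
  protocol?: "tcp" | "udp";
}

interface ContainerDefinition {
  name: string;
  image: string;
  essential: boolean; // if an essential container stops, the whole task stops
  memory?: number; // MiB
  portMappings?: PortMapping[];
}

interface TaskDefinition {
  family: string;
  containerDefinitions: ContainerDefinition[];
}

// One main container plus one supporting container (the sidecar pattern).
const webTask: TaskDefinition = {
  family: "web",
  containerDefinitions: [
    {
      name: "app",
      image: "ealen/echo-server",
      essential: true,
      memory: 512,
      // Traffic to port 8080 on the container instance reaches port 80 in the container.
      portMappings: [{ containerPort: 80, hostPort: 8080, protocol: "tcp" }],
    },
    {
      name: "log-forwarder", // hypothetical sidecar
      image: "example/log-forwarder:latest", // hypothetical image
      essential: false,
      memory: 128,
    },
  ],
};

// Illustrative helper: names of the containers marked essential.
function mainContainers(task: TaskDefinition): string[] {
  return task.containerDefinitions.filter((c) => c.essential).map((c) => c.name);
}

console.log(mainContainers(webTask));
```

If mainContainers returns more than one name, that's a hint the task may be doing too much.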

`Service`

Not to be confused with a Kubernetes Service (which provides a stable IP, among other things).

Services maintain the availability of your tasks (closer to a Kubernetes ReplicaSet or Deployment). You provide it with a task definition and a launch type.

placementConstraints is similar to node affinity / anti-affinity or taints and tolerations.
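
Here's a small sketch of what a service boils down to, assuming a simplified model. The ServiceSpec type is mine, though launchType, desiredCount and placementConstraints are real field names on an ECS service. The tasksToStart function is a toy version of the "keep N tasks running" loop, not how ECS is actually implemented.

```typescript
type LaunchType = "EC2" | "FARGATE" | "EXTERNAL";

interface PlacementConstraint {
  type: "distinctInstance" | "memberOf";
  expression?: string; // cluster query language, only used with memberOf
}

// Simplified local model of an ECS service.
interface ServiceSpec {
  serviceName: string;
  taskDefinition: string; // family:revision
  launchType: LaunchType;
  desiredCount: number;
  placementConstraints?: PlacementConstraint[];
}

const avocado: ServiceSpec = {
  serviceName: "avocado-service",
  taskDefinition: "avocado:1", // hypothetical family and revision
  launchType: "EC2",
  desiredCount: 3,
  // Roughly a node affinity rule: only place tasks on t2.medium instances.
  placementConstraints: [
    { type: "memberOf", expression: "attribute:ecs.instance-type == t2.medium" },
  ],
};

// Toy reconciliation step: positive means start tasks, negative means stop tasks.
function tasksToStart(spec: ServiceSpec, runningCount: number): number {
  return spec.desiredCount - runningCount;
}

console.log(tasksToStart(avocado, 1));
```

This is the same idea as a ReplicaSet comparing its replicas field against the pods it can see.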

Networking

There are a few different network modes available (the networkMode field on the task definition).

In awsvpc, tasks receive their own elastic network interface and a private IPv4 address. This puts a task on par with an EC2 instance, from a networking perspective.
In bridge, the task uses Docker's built-in virtual network.
In host, the task maps container ports to the ENI of the host EC2 instance. Keep in mind ports on host nodes are finite resources in your cluster.

If you're using the Fargate launch type, you have to use awsvpc.
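
The rules above are easy to encode. This is just a sketch of the constraint as I understand it; the type and function names are mine, and the real validation happens on the AWS side when you register the task definition or create the service.

```typescript
type NetworkMode = "awsvpc" | "bridge" | "host";
type TaskLaunchType = "EC2" | "FARGATE";

// Fargate only supports awsvpc; the EC2 launch type can use any of the three.
function isSupported(launchType: TaskLaunchType, mode: NetworkMode): boolean {
  if (launchType === "FARGATE") {
    return mode === "awsvpc";
  }
  return true;
}

// Only awsvpc gives each task its own elastic network interface.
function hasOwnEni(mode: NetworkMode): boolean {
  return mode === "awsvpc";
}

console.log(isSupported("FARGATE", "bridge"), isSupported("EC2", "bridge"));
```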

This is interesting to compare to Kubernetes, because Kubernetes is like a combination of awsvpc and bridge. Pods are given their own IPs, but they're typically virtual and assigned by the network plugin (kube-proxy, separately, edits the node's iptables rules so that Service IPs route to pods).

Kubernetes networking can also be implemented in many different ways; you have to choose a network plugin. Managed Kubernetes offerings come with good defaults, so you usually don't think about this.

Service Discovery

It's very common for one microservice to want to call another. You don't want to call public endpoints, because that's additional load on your networking infrastructure (e.g. a NAT Gateway, an API Gateway), and it's also going over the public internet.

In ECS, to accomplish this, you use service discovery, which is integrated with Amazon Route 53.

You register the service into a private DNS namespace, and DNS records, which reference the private IP, are created for a service. You can then hit your service at <service discovery service name>.<service discovery namespace>. Good thing we're not overloading the word "service". 😅

A typical workflow would create one "service discovery service" per ECS Service, with an A record for each task's private IP address.
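
From the caller's side, this is roughly what it looks like. The sketch below only builds the hostname and picks one of the returned addresses; the service name, namespace, and IPs are made up, and pickAddress is a naive round-robin I wrote for illustration, not something ECS or Route 53 provides.

```typescript
// <service discovery service name>.<service discovery namespace>
function discoveryHostname(serviceName: string, namespace: string): string {
  return `${serviceName}.${namespace}`;
}

// Naive client-side round-robin over the A records returned for the hostname.
function pickAddress(addresses: string[], requestNumber: number): string {
  if (addresses.length === 0) {
    throw new Error("no A records returned for the service");
  }
  return addresses[requestNumber % addresses.length];
}

// Hypothetical names and addresses, one A record per running task.
const host = discoveryHostname("avocado", "internal.local");
const records = ["10.0.1.12", "10.0.2.7", "10.0.3.41"];

console.log(host, pickAddress(records, 0), pickAddress(records, 4));
```

In a real Node.js client, the addresses would come from a DNS lookup of that hostname from inside the VPC (for example with dns.promises.resolve4) rather than from a hard-coded list.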

This was added in 2018, and is a good example of ECS starting out overly simple, and growing more complicated, towards Kubernetes.

Relationship To Load Balancing

To understand this, we need to go over some of the load balancing abstractions in AWS.

TargetGroup - a set of endpoints, or Targets. We will have one of these per ECS service.
Listener - Listens for requests from clients on a protocol and port. We'll have 1 of these per (protocol, port) combination that we support. In the example below, just one, for HTTP.
ListenerRule - This is what connects the Listener and the TargetGroup.

e.g. if path is /hello, go to this TargetGroup. Or if it's /foo, redirect to /bar.

So, we will have

1 load balancer
1 listener, for HTTP and port 80
1 target group per ECS Service
1 listener rule per ECS Service
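
Before wiring that up for real, here's a toy model of how a listener picks a target group. It's my own simplification: it handles only path conditions and forward actions (no redirects), the names are hypothetical, and it assumes rules are checked in priority order with * and ? as wildcards in path patterns.

```typescript
interface ListenerRule {
  priority: number; // lower numbers are evaluated first
  pathPattern: string; // may contain * (zero or more chars) and ? (exactly one char)
  targetGroup: string;
}

// Turn a path pattern into an anchored regular expression.
function patternToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, ".*").replace(/\?/g, ".");
  return new RegExp(`^${source}$`);
}

// First matching rule wins; otherwise fall back to the listener's default action.
function route(rules: ListenerRule[], path: string, defaultTargetGroup: string): string {
  const ordered = rules.slice().sort((a, b) => a.priority - b.priority);
  for (const rule of ordered) {
    if (patternToRegExp(rule.pathPattern).test(path)) {
      return rule.targetGroup;
    }
  }
  return defaultTargetGroup;
}

// One rule and one target group per ECS Service.
const rules: ListenerRule[] = [
  { priority: 1, pathPattern: "/avocado", targetGroup: "avocado-service" },
  { priority: 2, pathPattern: "/pretzel", targetGroup: "pretzel-service" },
];

console.log(route(rules, "/avocado", "default")); // avocado-service
console.log(route(rules, "/nope", "default")); // default
```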

Here's an example, in Pulumi.

import * as pulumi from "@pulumi/pulumi";
import * as awsx from "@pulumi/awsx";

const vpc = awsx.ec2.Vpc.getDefault();
const cluster = new awsx.ecs.Cluster("main", { vpc });

// Notice we're using the EC2 launch type
const asg = cluster.createAutoScalingGroup("main", {
  /* Why define this field? See this issue - https://github.com/pulumi/pulumi-awsx/issues/289 */
  subnetIds: vpc.publicSubnetIds,
  launchConfigurationArgs: { instanceType: "t2.medium" },
});

const loadBalancer = new awsx.lb.ApplicationLoadBalancer("main", {
  external: true,
});
const httpListener = loadBalancer.createListener("http-listener", { port: 80 });

// Avocado Service
const avocadoServiceTG = loadBalancer.createTargetGroup("avocado-service", {
  port: 80,
});

httpListener.addListenerRule("avocado-service-lr", {
  actions: [
    {
      type: "forward",
      targetGroupArn: avocadoServiceTG.targetGroup.arn,
    },
  ],
  conditions: [
    {
      pathPattern: {
        values: ["/avocado"],
      },
    },
  ],
});

new awsx.ecs.EC2Service("avocado-service", {
  cluster: cluster,
  taskDefinitionArgs: {
    vpc: vpc,
    container: {
      image: "ealen/echo-server",
      memory: 512,
      portMappings: [avocadoServiceTG],
      environment: [
        {
          name: "MESSAGE",
          value: "Avocado Service 🥑",
        },
      ],
    },
  },
});

// Pretzel Service
const pretzelServiceTG = loadBalancer.createTargetGroup("pretzel-service", {
  port: 80,
});
httpListener.addListenerRule("pretzel-service-lr", {
  actions: [
    {
      type: "forward",
      targetGroupArn: pretzelServiceTG.targetGroup.arn,
    },
  ],
  conditions: [
    {
      pathPattern: {
        values: ["/pretzel"],
      },
    },
  ],
});

new awsx.ecs.EC2Service("pretzel-service", {
  cluster: cluster,
  taskDefinitionArgs: {
    vpc: vpc,
    container: {
      image: "ealen/echo-server",
      memory: 512,
      portMappings: [pretzelServiceTG],
      environment: [
        {
          name: "MESSAGE",
          value: "Pretzel Service 🥨",
        },
      ],
    },
  },
});

export const frontendURL = pulumi.interpolate`http://${httpListener.endpoint.hostname}`;

$ curl -s $(pulumi stack output frontendURL)/avocado | jq .environment.MESSAGE
"Avocado Service 🥑"

$ curl -s $(pulumi stack output frontendURL)/pretzel | jq .environment.MESSAGE
"Pretzel Service 🥨"

Autoscaling?

Yeah, ECS autoscales well. You do this by adding a "scaling policy". You've got a few options there.

Target tracking scaling - scale to keep some metric close to a target value
Step scaling - when some alarm goes off, scale up to the next step. When the next alarm goes off, scale to the next step.
Scheduled scaling - scale based on date and time.

These are really good options. Many companies know their system is going to have a lot of traffic at some given time, e.g. 09:00 on Monday morning, and scheduled scaling is simple.

The other two seem a bit more complex to tune, but they're still solid choices.
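
Step scaling is the one I found easiest to misread, so here's a sketch of how I understand the scale-out side. The StepAdjustment field names match the Application Auto Scaling API, but the functions are my own model, the thresholds are invented, and it assumes bounds are relative to the alarm threshold with an inclusive lower bound and an exclusive upper bound.

```typescript
interface StepAdjustment {
  metricIntervalLowerBound?: number; // relative to the alarm threshold; omitted = no lower limit
  metricIntervalUpperBound?: number; // relative to the alarm threshold; omitted = no upper limit
  scalingAdjustment: number; // change in task count
}

// Find the adjustment whose interval contains (metric - threshold).
function stepAdjustmentFor(steps: StepAdjustment[], threshold: number, metric: number): number {
  const diff = metric - threshold;
  for (const step of steps) {
    const lower = step.metricIntervalLowerBound ?? -Infinity;
    const upper = step.metricIntervalUpperBound ?? Infinity;
    if (diff >= lower && diff < upper) {
      return step.scalingAdjustment;
    }
  }
  return 0; // no step applies, so leave the task count alone
}

// Keep the result inside the configured min/max capacity.
function applyAdjustment(current: number, adjustment: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, current + adjustment));
}

// Hypothetical policy: alarm at 60% CPU, +1 task up to 80%, +3 tasks beyond that.
const scaleOut: StepAdjustment[] = [
  { metricIntervalLowerBound: 0, metricIntervalUpperBound: 20, scalingAdjustment: 1 },
  { metricIntervalLowerBound: 20, scalingAdjustment: 3 },
];

console.log(applyAdjustment(2, stepAdjustmentFor(scaleOut, 60, 85), 1, 10));
```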

Additional Notes

ECS does rolling deployments by default, has an option for blue/green (CODE_DEPLOY), and a way to have even finer-grained control (EXTERNAL).
Workloads are a bit slower to start than on Kubernetes. I changed two environment variables across two tasks, and that took 8 minutes for me.
Fargate is especially slow to start, because it can involve scaling up. GKE Autopilot has the same problem.

Fargate on EKS

Fargate on EKS might only be similar to Fargate on ECS in name and ability. Certainly not in implementation or how you use it.

In order to use Fargate on EKS, you have to create a Fargate profile.

You then use selectors (a namespace, plus optional labels) to determine which Fargate profile, if any, applies to a pod. EKS schedules matching pods using its own scheduler, on what is basically its own managed node pool. AWS will handle scaling and upgrading for you.

You just have to think through your memory and vCPU requests, which you should be doing anyway.
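
The profile matching can be sketched like this. It's a simplified model of my own: a selector is a namespace plus optional labels, matching is exact (no wildcards), and when several profiles match I just take the first one. The profile names and labels are hypothetical.

```typescript
interface FargateProfileSelector {
  namespace: string;
  labels?: Record<string, string>;
}

interface FargateProfile {
  name: string;
  selectors: FargateProfileSelector[];
}

interface PodMeta {
  namespace: string;
  labels: Record<string, string>;
}

// A selector matches when the namespace is equal and every selector label is on the pod.
function selectorMatches(selector: FargateProfileSelector, pod: PodMeta): boolean {
  if (selector.namespace !== pod.namespace) {
    return false;
  }
  const wanted = selector.labels ?? {};
  return Object.keys(wanted).every((key) => pod.labels[key] === wanted[key]);
}

// Name of the first profile with a matching selector, or null if the pod
// would not run on Fargate at all.
function matchingProfile(pod: PodMeta, profiles: FargateProfile[]): string | null {
  for (const profile of profiles) {
    if (profile.selectors.some((s) => selectorMatches(s, pod))) {
      return profile.name;
    }
  }
  return null;
}

const profiles: FargateProfile[] = [
  { name: "batch", selectors: [{ namespace: "jobs", labels: { compute: "fargate" } }] },
  { name: "web", selectors: [{ namespace: "frontend" }] },
];

console.log(matchingProfile({ namespace: "frontend", labels: { app: "avocado" } }, profiles));
```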

Fargate on EKS is very similar to GKE Autopilot. It's clear that these Containers as a Service tools are the future of container orchestration. Few people really want to deal with version upgrades and manual scaling.

Running Containers on AWS