Owner: Platform Engineering · Primary Platform: Amazon EKS · Review Cycle: Quarterly
The organisation's container strategy is built on Kubernetes: Amazon EKS (managed) on AWS, RKE2 for on-premises workloads, and AKS for select Azure workloads. All production container workloads must run on an approved Kubernetes platform. Standalone Docker hosts and unmanaged Kubernetes clusters are not permitted in production.
| Platform | Environment | Kubernetes Version | Node OS |
|---|---|---|---|
| Amazon EKS | AWS Production | 1.29 | Bottlerocket |
| Amazon EKS | AWS Non-Production | 1.29 | Bottlerocket |
| RKE2 | On-Prem (vSphere) | 1.28 | RHEL 9 |
| AKS | Azure (select workloads) | 1.29 | Azure Linux |

                         ┌─────────────────────────────┐
                         │    AWS EKS Control Plane    │
                         │        (AWS Managed)        │
                         └─────────────┬───────────────┘
                                       │
               ┌───────────────────────┼───────────────────────┐
               │                       │                       │
    ┌──────────▼──────────┐ ┌──────────▼──────────┐ ┌──────────▼──────────┐
    │  System Node Group  │ │   App Node Group    │ │   Spot Node Group   │
    │ (On-Demand, m6i.xl) │ │ (On-Demand, m6i.2xl)│ │ (Spot, mixed policy)│
    │     min:2 max:4     │ │    min:3 max:20     │ │    min:0 max:30     │
    └─────────────────────┘ └─────────────────────┘ └─────────────────────┘
      CoreDNS, kube-proxy    Production workloads    Batch, CI/CD runners
    AWS Load Balancer Ctrl

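The three node groups in the diagram could be declared roughly as follows. This is a minimal sketch that assumes the cluster is provisioned with eksctl, which this document does not state. The cluster name, region, labels, and the Spot instance-type list are hypothetical, and `m6i.xl` / `m6i.2xl` are read as `m6i.xlarge` / `m6i.2xlarge`.

```yaml
# Sketch only: assumes eksctl is the provisioning tool.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: prod-eks        # hypothetical cluster name
  region: eu-west-1     # hypothetical region
  version: "1.29"

managedNodeGroups:
  # System add-ons: CoreDNS, kube-proxy, AWS Load Balancer Controller
  - name: system
    instanceType: m6i.xlarge
    amiFamily: Bottlerocket
    minSize: 2
    maxSize: 4
    labels:
      node-group: system   # hypothetical label

  # Production application workloads (On-Demand)
  - name: app
    instanceType: m6i.2xlarge
    amiFamily: Bottlerocket
    minSize: 3
    maxSize: 20
    labels:
      node-group: app

  # Batch jobs and CI/CD runners (Spot, several instance types)
  - name: spot
    spot: true
    instanceTypes:         # hypothetical mix; the document only says "mixed policy"
      - m6i.2xlarge
      - m5.2xlarge
      - m5a.2xlarge
    amiFamily: Bottlerocket
    minSize: 0
    maxSize: 30
    labels:
      node-group: spot
```

Listing several instance types in the Spot group widens the capacity pools the group can draw from, which is one common reading of a "mixed" Spot policy.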
| Add-on | Version | Purpose |
|---|---|---|
| AWS Load Balancer Controller | 2.7 | ALB / NLB provisioning |
| EBS CSI Driver | 1.28 | EBS persistent volumes |
| EFS CSI Driver | 1.7 | EFS shared storage |
| Cluster Autoscaler | 9.35 | Node scaling |
| Karpenter | 0.35 | Advanced node provisioning |
| Datadog Agent | 3.x | Observability |
| Falco | 0.37 | Runtime security |
| Cert-Manager | 1.14 | TLS certificate management |
| External-DNS | 0.14 | Route 53 DNS automation |
All application images must use an approved base image. Unapproved base images are blocked by the admission controller.
| Base Image | Tag | Use Case |
|---|---|---|
| `internal-registry/rhel9-ubi` | `9.3-hardened` | Java, Python, Go apps |
| `internal-registry/ubuntu-minimal` | `22.04-hardened` | Debian-compatible apps |
| `internal-registry/alpine-base` | `3.19-hardened` | Small utility containers |
| `internal-registry/dotnet-runtime` | `8.0-rhel9` | .NET applications |
| `internal-registry/node-runtime` | `20-alpine-hardened` | Node.js applications |

Do not use the `latest` tag in production manifests; use an immutable digest or a semver tag.

    # Example namespace configuration
    apiVersion: v1
    kind: Namespace
    metadata:
      name: payments-prod
      labels:
        env: production
        team: payments
        tier: "1"
      annotations:
        contacts/owner: "payments-eng@company.com"
        cost-centre: "CC-2011"

RBAC is managed through GitOps (ArgoCD). Teams are granted:

- `view` access to their own namespaces by default
- `edit` access only for non-production namespaces
- no `cluster-admin`, except for Platform Engineering break-glass accounts

All namespaces have resource quotas applied. Default quotas (overridable via request):

| Tier | CPU Request Limit | Memory Request Limit | Max Pods |
|---|---|---|---|
| Production | 32 cores | 128 Gi | 200 |
| Non-Production | 16 cores | 64 Gi | 100 |
| Development | 8 cores | 32 Gi | 50 |
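
As an illustration, the default Production quota and the default `view` grant might look like the manifests below for the `payments-prod` namespace. This is a sketch, not the platform's actual configuration. It assumes the "Request Limit" columns map to `requests.cpu` and `requests.memory`. The object names and the `payments-eng` group are hypothetical.

```yaml
# Default Production-tier quota (values from the table above)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: default-quota          # hypothetical name
  namespace: payments-prod
spec:
  hard:
    requests.cpu: "32"         # assumed mapping of "CPU Request Limit"
    requests.memory: 128Gi     # assumed mapping of "Memory Request Limit"
    pods: "200"
---
# Default read-only access for the owning team
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: payments-view          # hypothetical name
  namespace: payments-prod
subjects:
  - kind: Group
    name: payments-eng         # hypothetical identity-provider group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: view                   # built-in Kubernetes ClusterRole
  apiGroup: rbac.authorization.k8s.io
```

Binding the built-in `view` ClusterRole through a namespaced RoleBinding keeps the grant scoped to the team's own namespace.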

    Code Push → GitHub Actions / GitLab CI
        │
        ├── Build (Docker BuildKit)
        ├── Unit Tests
        ├── SAST (CodeQL / Semgrep)
        ├── Image Scan (Inspector)
        ├── Image Sign (Cosign)
        ├── Push to ECR
        │
        └── GitOps: Update Helm values → ArgoCD sync
                │
                ├── Dev (auto-sync)
                ├── Staging (auto-sync)
                └── Production (manual gate / approval)

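
The difference between the auto-synced and manually gated stages can be sketched with two ArgoCD `Application` manifests. The repository URL is the platform configuration repository named in this document. The application names, `path`, values file names, target revision, and the `argocd` namespace are hypothetical.

```yaml
# Dev: ArgoCD applies changes automatically when the Helm values change in Git
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-dev             # hypothetical
  namespace: argocd              # hypothetical ArgoCD install namespace
spec:
  project: default
  source:
    repoURL: git@internal:platform/k8s-config.git
    targetRevision: main         # hypothetical branch
    path: apps/payments          # hypothetical chart path
    helm:
      valueFiles:
        - values-dev.yaml        # hypothetical values file
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
# Production: no automated sync policy, so a sync must be triggered
# explicitly after approval
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-prod            # hypothetical
  namespace: argocd
spec:
  project: default
  source:
    repoURL: git@internal:platform/k8s-config.git
    targetRevision: main
    path: apps/payments
    helm:
      valueFiles:
        - values-prod.yaml       # hypothetical values file
  destination:
    server: https://kubernetes.default.svc
    namespace: payments-prod
```

Omitting `syncPolicy.automated` leaves the production application out of sync until someone triggers the sync, which is one way to implement a manual gate.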
- Configuration repository: `git@internal:platform/k8s-config.git`
- `kubectl apply` in production triggers an alert

| Signal | Tool | Retention |
|---|---|---|
| Metrics | Datadog / Prometheus | 15 months |
| Logs | Datadog Logs / OpenSearch | 13 months |
| Traces | Datadog APM / AWS X-Ray | 30 days |
| Events | Kubernetes Events → Datadog | 7 days |
All applications must expose `/healthz` (liveness) and `/readyz` (readiness) endpoints.
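
A Deployment that wires these endpoints into Kubernetes probes could look like the sketch below. The application name, image reference, port, and probe timings are hypothetical. The image uses a semver tag rather than `latest`.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api               # hypothetical application
  namespace: payments-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payments-api
  template:
    metadata:
      labels:
        app: payments-api
    spec:
      containers:
        - name: payments-api
          # hypothetical ECR image, pinned to a semver tag
          image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/payments-api:1.4.2
          ports:
            - name: http
              containerPort: 8080  # hypothetical port
          # Restart the container if /healthz stops responding
          livenessProbe:
            httpGet:
              path: /healthz
              port: http
            initialDelaySeconds: 10
            periodSeconds: 10
          # Remove the pod from Service endpoints until /readyz succeeds
          readinessProbe:
            httpGet:
              path: /readyz
              port: http
            periodSeconds: 5
```

Keeping the two probes on separate paths lets an application report "alive but not ready" during startup or dependency outages without being restarted.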