Right-Sizing Containers: Measuring and Optimizing Resource Usage

Containers often request 2GB and use 200MB. Prometheus queries show actual versus requested usage. Right-sizing saves money, improves density, and reduces environmental impact.

By Jurg van Vliet

Published Dec 5, 2025

The Over-Provisioning Problem

When you deploy a container to Kubernetes, you set resource requests and limits:

resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

Kubernetes uses requests for scheduling (this pod needs 2GB memory, so schedule it on a node with at least 2GB available). It uses limits to prevent runaway processes (kill this pod if it exceeds 4GB memory).

The problem: most teams overestimate. A container might request 2GB memory and consistently use 200MB. This wastes resources at multiple levels:

Cost: You're paying for capacity you don't use. If your nodes have 32GB memory and you request 2GB per pod, you can schedule 16 pods. If those pods actually use 200MB, you could fit 160 pods—10x more.

Density: Kubernetes can't pack workloads efficiently when requests are inflated. You need more nodes, which costs more and uses more energy.

Performance: Badly chosen limits hurt performance, too. A CPU limit that's too low causes throttling, and a memory limit that's too tight causes OOMKills during legitimate spikes.
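
Throttling is straightforward to spot if cAdvisor metrics are already in Prometheus (an assumption; the queries later in this post rely on the same setup). The sketch below shows what fraction of CPU scheduling periods a container was throttled in over the last five minutes; sustained values well above zero suggest the limit is too low.

# Fraction of CPU scheduling periods in which the container was throttled
rate(container_cpu_cfs_throttled_periods_total{container!=""}[5m])
/
rate(container_cpu_cfs_periods_total{container!=""}[5m])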

Measuring Actual Usage

Before changing anything, measure. Prometheus with kube-state-metrics gives you the data.

Memory usage vs request:

# Ratio of actual usage to requested memory, matched on namespace, pod, and container
(
  container_memory_working_set_bytes{container!="",container!="POD"}
  / on(namespace, pod, container) group_left()
  (kube_pod_container_resource_requests{resource="memory"} > 0)
) * 100

If this ratio is consistently below 20%, you're over-provisioned.
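
To judge "consistently" rather than eyeballing a single instant, one option (a sketch, not part of the original workflow) is to take the p95 of working-set memory over a longer window and compare that to the request:

# p95 of working-set memory over the last 7 days, as a percentage of the request
(
  quantile_over_time(0.95,
    container_memory_working_set_bytes{container!="",container!="POD"}[7d]
  )
  / on(namespace, pod, container) group_left()
  (kube_pod_container_resource_requests{resource="memory"} > 0)
) * 100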

CPU usage vs request:

# CPU usage as percentage of request, matched on namespace, pod, and container
(
  rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])
  / on(namespace, pod, container) group_left()
  (kube_pod_container_resource_requests{resource="cpu"} > 0)
) * 100

CPU is trickier because usage is spiky. Look at 95th percentile over a week, not instant values.
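
A subquery makes that concrete. This is a sketch under the same assumptions as the queries above: it takes the p95 of the 5-minute CPU rate across the last week and expresses it as a percentage of the request.

# p95 of 5-minute CPU usage over the last 7 days, as a percentage of the request
(
  quantile_over_time(0.95,
    rate(container_cpu_usage_seconds_total{container!="",container!="POD"}[5m])[7d:5m]
  )
  / on(namespace, pod, container) group_left()
  (kube_pod_container_resource_requests{resource="cpu"} > 0)
) * 100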

Setting Requests Based on Reality

Step 1: Query actual usage over at least a week (preferably 30 days)

Step 2: Find p95 or p99 usage (not average—you need to handle peaks)

Step 3: Add headroom:

  • Memory: +30-50% (memory usage spikes matter)
  • CPU: +50-100% (CPU is burstable, be generous)

Step 4: Set limits:

  • Memory limit: 2x request (catch runaway processes, allow temporary spikes)
  • CPU limit: 2-4x request (or no limit—CPU throttling hurts performance)

Example:

# Before (guessed)
resources:
  requests:
    memory: "2Gi"
    cpu: "1000m"
  limits:
    memory: "4Gi"
    cpu: "2000m"

# After (measured)
# p95 usage: 180MB memory, 120m CPU
resources:
  requests:
    memory: "256Mi"    # 180MB * 1.4 ≈ 256MB
    cpu: "200m"        # 120m * 1.6 ≈ 200m
  limits:
    memory: "512Mi"    # 2x request
    cpu: "1000m"       # 5x request (allow burst)

That's roughly 8x better memory density: instead of 16 pods per 32GB node, the memory math allows about 128, though per-node pod limits and system overhead cap the real number lower.

Quarterly Review Process

Resource usage changes over time. Features are added, traffic patterns shift. Review quarterly:

# Query Prometheus for usage over last 30 days
# Generate recommendations
# Apply changes gradually
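
As a sketch of the first two steps (the exact query is an assumption, not part of the original process), the p95 pattern from earlier extends to a 30-day window; subtracting the request and sorting puts the strongest right-sizing candidates first:

# p95 memory usage over 30 days minus the request, per container (bytes)
# The most negative values are the strongest right-sizing candidates
sort(
  quantile_over_time(0.95,
    container_memory_working_set_bytes{container!="",container!="POD"}[30d]
  )
  - on(namespace, pod, container) group_left()
  kube_pod_container_resource_requests{resource="memory"}
)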

Make changes incrementally. Update one service, monitor for a week, then move to the next. If you see OOMKills, you were too aggressive—add more headroom.
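
kube-state-metrics can flag those OOMKills directly. Assuming its last-terminated-reason metric is enabled, a simple watch query for the rollout week:

# Containers whose most recent termination was an OOM kill
kube_pod_container_status_last_terminated_reason{reason="OOMKilled"}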

Spot Instances for Appropriate Workloads

Not all workloads need guaranteed capacity. Some can tolerate interruption:

CI/CD pipelines: Can restart if preempted. Use spot instances, save 60-80%.

Batch processing: Can checkpoint and resume. Perfect for spot.

Development environments: Interruption is annoying, not critical. We run dev on spot.

Not appropriate for spot:

  • User-facing production applications (interruption affects users)
  • Stateful databases (complex to handle interruption safely)
  • Real-time processing (can't tolerate delays)
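
Steering a workload such as a CI runner onto spot nodes usually comes down to a node selector plus a toleration in the pod template. The snippet below is a sketch with placeholder names: the node-pool label and the spot taint are hypothetical, and cloud providers or node provisioners apply their own (GKE's cloud.google.com/gke-spot label, for example).

# Deployment pod template snippet pinning a CI runner to spot nodes
# (label and taint names are placeholders; use the ones your provider applies)
spec:
  template:
    spec:
      nodeSelector:
        node-pool: spot            # hypothetical label on the spot node pool
      tolerations:
        - key: "spot"              # hypothetical taint keeping other workloads off spot nodes
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"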

Data Hygiene for Storage Efficiency

Storage seems cheap, so data accumulates. Logs, backups, old snapshots—it all adds up.

Define retention policies early:

# Object storage lifecycle
lifecycle_rule:
  enabled: true
  expiration:
    days: 365
  transition:
    days: 90
    storage_class: GLACIER

Actually delete when retention expires. We saved 40% storage costs by implementing retention policies and sticking to them.

Why Efficiency Matters

Cost: Right-sizing our infrastructure saved roughly €300/month. That's €3,600/year.

Performance: After right-sizing, we had more available capacity for bursts. Counter-intuitively, using resources more efficiently improved performance.

Environment: Fewer idle resources mean less energy wasted. Data centers consume about 1.5% of global electricity. Efficiency at scale matters.

Efficiency isn't just environmental virtue. Efficient systems cost less, run better, and are easier to operate.

#efficiency #rightsizing #kubernetes #sustainability #costoptimization