Automatic scaling

3 min read Updated 5 days ago

Automatically scale your application instances based on real-time load to maintain performance during traffic spikes while controlling costs.

How automatic scaling works

When you enable automatic scaling, the number of running instances adjusts to actual demand. The system monitors your application's request queue and worker utilization to decide when to add or remove instances.

Scaling range

Your application scales between:

  • Minimum instances: The base number of instances you configure (your guaranteed capacity)
  • Maximum instances: Up to 3x your base instances (providing headroom for traffic spikes)

For example, if you set 2 base instances, automatic scaling can expand up to 6 instances when needed.
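The range arithmetic can be sketched in a few lines of Python. This is only an illustration of the 3x rule described above; the function name `scaling_range` and its parameters are hypothetical and not part of any platform API.

```python
def scaling_range(base_instances: int, multiplier: int = 3):
    """Return (minimum, maximum) instance counts for a given base.

    Assumption: the maximum is base * 3, matching the default described
    in the text. The function itself is illustrative only.
    """
    if base_instances < 1:
        raise ValueError("base_instances must be at least 1")
    return base_instances, base_instances * multiplier


if __name__ == "__main__":
    for base in (1, 2, 4):
        low, high = scaling_range(base)
        print(f"base={base}: scales between {low} and {high} instances")
```

With 2 base instances, the function returns a range of 2 to 6, which matches the example above.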

Scaling timing

The system scales up quickly and scales down conservatively:

  • Scale up: New instances are added within 30 seconds when demand increases
  • Scale down: Instances are removed after 5 minutes of reduced demand

This asymmetric timing lets your application respond quickly to traffic spikes while avoiding repeated scale-down and scale-up cycles during brief dips in demand.
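To illustrate why the scale-down delay prevents flapping, here is a minimal sketch of a timer that only permits removal after demand has stayed low for 5 continuous minutes. The class `ScaleDownTimer` and its reset-on-busy behavior are assumptions made for illustration; the platform's internal logic is not documented here.

```python
from typing import Optional

SCALE_DOWN_DELAY_S = 5 * 60  # from the text: 5 minutes of reduced demand


class ScaleDownTimer:
    """Tracks how long demand has been continuously low.

    Assumption: any busy sample resets the clock. This is a sketch,
    not the platform's actual implementation.
    """

    def __init__(self, delay_s: float = SCALE_DOWN_DELAY_S) -> None:
        self.delay_s = delay_s
        self.low_since: Optional[float] = None

    def observe(self, now_s: float, demand_is_low: bool) -> bool:
        """Record a sample; return True if an instance may be removed now."""
        if not demand_is_low:
            self.low_since = None  # a busy sample restarts the wait
            return False
        if self.low_since is None:
            self.low_since = now_s
        return now_s - self.low_since >= self.delay_s


if __name__ == "__main__":
    timer = ScaleDownTimer()
    # (time in seconds, is demand low?): a short dip, a burst, then a long lull
    samples = [(0, True), (120, True), (180, False), (200, True), (500, True)]
    for now_s, low in samples:
        print(now_s, low, timer.observe(now_s, low))
```

In this sketch the brief dip at the start never triggers a removal, because the burst at 180 seconds resets the timer. Only the lull that lasts from 200 to 500 seconds qualifies.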

Enabling automatic scaling

To enable automatic scaling:

  1. Navigate to your application's Resources tab
  2. Toggle the "Autoscaling" switch
  3. Adjust the maximum instances slider (defaults to 3x your base instances)
  4. Click "Save changes" to apply

Cost consideration: With automatic scaling enabled, you're charged based on the actual number of running instances. Your costs will vary between your minimum and maximum instance settings depending on your application's load.
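A rough way to reason about the cost range is to compute the lower and upper bounds from your instance settings. The sketch below assumes a flat hourly price per instance and hour-based billing; both are hypothetical, so check your plan for the actual rate and billing granularity.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def monthly_cost_bounds(min_instances, max_instances, price_per_instance_hour):
    """Return (lowest, highest) possible monthly cost.

    Assumption: a flat hourly price per running instance.
    """
    lowest = min_instances * price_per_instance_hour * HOURS_PER_MONTH
    highest = max_instances * price_per_instance_hour * HOURS_PER_MONTH
    return lowest, highest


def billed_cost(usage, price_per_instance_hour):
    """Cost for a list of (running_instances, hours) intervals."""
    instance_hours = sum(instances * hours for instances, hours in usage)
    return instance_hours * price_per_instance_hour


if __name__ == "__main__":
    price = 0.5  # hypothetical price per instance-hour
    print(monthly_cost_bounds(2, 6, price))
    # Mostly at the base of 2 instances, with 30 hours at the maximum of 6
    print(billed_cost([(2, 700), (6, 30)], price))
```

The actual bill falls somewhere between the two bounds, and closer to the lower bound when traffic spikes are short.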

Optimizing for automatic scaling

To get the most from automatic scaling, consider these best practices:

Minimize initialization time

Fast application startup is crucial for responsive scaling:

  • Keep init commands minimal: Heavy operations during startup delay new instances from serving traffic
  • Use build commands instead: Move compilation, asset building, and other intensive tasks to the build phase

Configure appropriate base instances

Set your minimum instances based on your expected baseline traffic:

  • Production applications: Start with at least 2 instances for redundancy
  • High-traffic applications: Set a higher base to reduce scaling frequency. A higher base also absorbs the initial surge in load while new instances are starting.

When automatic scaling triggers

Your application scales up when:

  • Request queues start building up (more than 2 pending requests)
  • Worker processes are heavily utilized (over 80% busy)
  • Traffic suddenly spikes beyond current capacity

Your application scales down when:

  • No requests are queued
  • Worker utilization drops below 30%
  • Load has been consistently low for 5 minutes
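The thresholds above can be expressed as a small decision function. This is a sketch under stated assumptions: it treats any single scale-up trigger as sufficient and requires both scale-down conditions together, which the text does not specify. The names `LoadSample` and `scaling_signal` are hypothetical, and the sustained-low-load requirement is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    queued_requests: int       # pending requests in the queue
    worker_utilization: float  # fraction of workers busy, 0.0 to 1.0


def scaling_signal(sample: LoadSample) -> str:
    """Classify one load sample using the documented thresholds.

    Assumptions: scale-up triggers combine with OR, scale-down
    conditions combine with AND. The real policy may differ.
    """
    if sample.queued_requests > 2 or sample.worker_utilization > 0.80:
        return "scale_up"
    if sample.queued_requests == 0 and sample.worker_utilization < 0.30:
        # Only a candidate: load must also stay low for several minutes.
        return "scale_down_candidate"
    return "hold"


if __name__ == "__main__":
    for sample in (
        LoadSample(queued_requests=3, worker_utilization=0.50),
        LoadSample(queued_requests=0, worker_utilization=0.20),
        LoadSample(queued_requests=1, worker_utilization=0.50),
    ):
        print(sample, "->", scaling_signal(sample))
```

Between the two thresholds the function returns "hold", which reflects the gap between the 30% and 80% utilization marks where no scaling action is described.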

Limitations and considerations

  • Maximum scaling is limited to 3x your base instances
  • New instances need time to initialize before serving traffic
  • Scaling decisions are based on queue depth and worker utilization, not CPU or memory usage