Automatic scaling
Automatically scale your application instances based on real-time load, ensuring optimal performance during traffic spikes while controlling costs.
How automatic scaling works
When you enable automatic scaling, the platform adjusts the number of running instances to match actual demand. It monitors your application's request queue and worker utilization to decide when to add or remove instances.
Scaling range
Your application scales between:
- Minimum instances: The base number of instances you configure (your guaranteed capacity)
- Maximum instances: Up to 3x your base instances (providing headroom for traffic spikes)
For example, if you set 2 base instances, automatic scaling can expand up to 6 instances when needed.
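The arithmetic behind the range can be sketched in a few lines of Python. This is an illustration only: `scaling_range` and `MAX_MULTIPLIER` are hypothetical names, and clamping a custom maximum into the allowed range is an assumption about how the slider behaves, not a documented API.

```python
from typing import Optional, Tuple

# From the text: the maximum is capped at 3x the base instance count.
MAX_MULTIPLIER = 3


def scaling_range(base_instances: int,
                  max_instances: Optional[int] = None) -> Tuple[int, int]:
    """Return (minimum, maximum) instance counts for a given base.

    Hypothetical helper: a custom maximum is clamped so that it never
    drops below the base or exceeds 3x the base (assumed behavior).
    """
    if base_instances < 1:
        raise ValueError("base_instances must be at least 1")
    ceiling = base_instances * MAX_MULTIPLIER
    if max_instances is None:
        max_instances = ceiling  # the default is the full 3x ceiling
    upper = max(base_instances, min(max_instances, ceiling))
    return base_instances, upper


print(scaling_range(2))      # (2, 6)
print(scaling_range(2, 4))   # (2, 4)
```

With 2 base instances the default range is 2 to 6, matching the example above; a lower custom maximum such as 4 narrows the range without changing the guaranteed minimum.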
Scaling timing
The system scales up quickly and scales down conservatively:
- Scale up: New instances are added within 30 seconds when demand increases
- Scale down: Instances are removed after 5 minutes of reduced demand
This asymmetric timing lets your application respond quickly to traffic spikes while avoiding repeated scale-downs and scale-ups during brief lulls in traffic.
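One way to picture the scale-down delay is as a timer that resets whenever demand rises again. The sketch below is a minimal model of that idea, not the platform's implementation; `ScaleDownTimer` and its method names are hypothetical, and only the 5-minute figure comes from the text.

```python
from typing import Optional

# From the text: instances are removed after 5 minutes of reduced demand.
SCALE_DOWN_DELAY_S = 5 * 60


class ScaleDownTimer:
    """Tracks how long demand has been continuously low (hypothetical model)."""

    def __init__(self, delay_s: float = SCALE_DOWN_DELAY_S) -> None:
        self.delay_s = delay_s
        self._low_since: Optional[float] = None

    def observe(self, now_s: float, demand_is_low: bool) -> bool:
        """Record one observation; return True once demand has stayed low
        for at least delay_s seconds without interruption."""
        if not demand_is_low:
            self._low_since = None  # any busy period resets the clock
            return False
        if self._low_since is None:
            self._low_since = now_s
        return now_s - self._low_since >= self.delay_s


timer = ScaleDownTimer()
print(timer.observe(0, True))     # False: low demand has only just started
print(timer.observe(180, False))  # False: a busy moment resets the timer
print(timer.observe(200, True))   # False: the low period restarts at t=200
print(timer.observe(500, True))   # True: 300 seconds of uninterrupted low demand
```

Because a single busy observation resets the timer, a short dip in traffic never removes capacity that may be needed again moments later.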
Enabling automatic scaling
To enable automatic scaling:
1. Navigate to your application's Resources tab
2. Turn on the "Autoscaling" switch
3. Adjust the maximum instances slider (defaults to 3x your base instances)
4. Click "Save changes" to apply
Optimizing for automatic scaling
To get the most from automatic scaling, consider these best practices:
Minimize initialization time
Fast application startup is crucial for responsive scaling:
- Keep init commands minimal: Heavy operations during startup delay new instances from serving traffic
- Use build commands instead: Move compilation, asset building, and other intensive tasks to the build phase
Configure appropriate base instances
Set your minimum instances based on your expected baseline traffic:
- Production applications: Start with at least 2 instances for redundancy
- High-traffic applications: Set a higher base to reduce scaling frequency. A higher base also absorbs the start of a traffic spike better, before new instances have finished initializing.
When automatic scaling triggers
Your application scales up when:
- Request queues start building up (more than 2 pending requests)
- Worker processes are heavily utilized (over 80% busy)
- Traffic suddenly spikes beyond current capacity, which shows up as growing queues or saturated workers
Your application scales down when:
- No requests are queued
- Worker utilization drops below 30%
- Load has been consistently low for 5 minutes
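The thresholds above can be expressed as a small decision function. This is a sketch under stated assumptions: `LoadSample` and `scaling_signal` are hypothetical names, and the way the conditions combine (any scale-up condition is enough, while both scale-down conditions must hold) is an assumption that the text does not confirm.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
QUEUE_SCALE_UP = 2       # scale up with more than 2 pending requests
UTIL_SCALE_UP = 0.80     # scale up when workers are over 80% busy
UTIL_SCALE_DOWN = 0.30   # scale down when utilization drops below 30%


@dataclass
class LoadSample:
    pending_requests: int
    worker_utilization: float  # fraction of busy workers, 0.0 to 1.0


def scaling_signal(sample: LoadSample) -> str:
    """Classify one load sample as 'up', 'down', or 'hold'.

    Assumption: either scale-up condition triggers 'up'; 'down' requires
    an empty queue and low utilization together.
    """
    if (sample.pending_requests > QUEUE_SCALE_UP
            or sample.worker_utilization > UTIL_SCALE_UP):
        return "up"
    if (sample.pending_requests == 0
            and sample.worker_utilization < UTIL_SCALE_DOWN):
        return "down"
    return "hold"


print(scaling_signal(LoadSample(pending_requests=3, worker_utilization=0.50)))  # up
print(scaling_signal(LoadSample(pending_requests=0, worker_utilization=0.10)))  # down
print(scaling_signal(LoadSample(pending_requests=1, worker_utilization=0.50)))  # hold
```

A "down" signal from a single sample would not remove an instance on its own; per the timing rules, the low-load state has to persist before instances are removed.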
Limitations and considerations
- Maximum scaling is limited to 3x your base instances
- New instances need time to initialize before serving traffic
- Scaling decisions are based on queue depth and worker utilization, not CPU or memory usage