Add detailed server provisioning checklist and analysis guide for atop logs

Vincent Verbruggen
2026-03-13 09:42:30 +01:00
parent a4fa80bf74
commit 33c80f4afd
2 changed files with 276 additions and 8 deletions

@@ -1,8 +1,61 @@
Use forge to create server
Tag the ec2 instance and the root storage
After creation add elastic ip
Add monitoring in forge
Update root volume to gp3
Install atop
enable aws backup
Setup forge database backups
# Server Provisioning Checklist
## AWS / Forge Setup
- [ ] Use Forge to create server
- [ ] Tag the EC2 instance and the root storage
- [ ] After creation, attach an Elastic IP
- [ ] Add monitoring in Forge
- [ ] Update root volume to gp3
- [ ] Enable AWS backup
- [ ] Set up Forge database backups
- [ ] Set up SSH key access for team members
## OS Tooling
- [ ] Install atop (`apt install atop`, verify it runs via systemd and writes to `/var/log/atop/`)
- [ ] Install htop (`apt install htop`)
- [ ] Install gdu or ncdu (`apt install gdu` or `apt install ncdu`) for disk usage analysis
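
The atop verification step above could be scripted roughly as follows. This is a minimal sketch, assuming the log directory named in the checklist (`/var/log/atop/`) and daily log files named `atop_YYYYMMDD`; the helper name `count_atop_logs` is hypothetical.

```shell
# Count atop daily log files (atop_YYYYMMDD) in a directory.
# Hypothetical helper; the directory defaults to the path from the checklist.
count_atop_logs() {
    dir="${1:-/var/log/atop}"
    # find prints one line per matching file; wc -l counts them
    find "$dir" -maxdepth 1 -type f -name 'atop_[0-9]*' 2>/dev/null | wc -l | tr -d ' '
}

# Example usage on a provisioned server (not run here):
#   systemctl is-active atop        # expect "active"
#   [ "$(count_atop_logs)" -ge 1 ] && echo "atop is writing logs"
```

A count of zero shortly after installation is not necessarily a failure, since the first log file only appears once the service has started writing.
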
## Redis Hardening
- [ ] Set `maxmemory` to an appropriate limit (e.g. `2gb` on a 16 GB server)
- [ ] Set `maxmemory-policy allkeys-lru`
- [ ] Disable RDB persistence if not needed (`save ""`) to prevent fork-based OOM
- [ ] Persist config: `redis-cli CONFIG REWRITE`
- [ ] Verify config survives reboot: check `/etc/redis/redis.conf` directly
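
The Redis items above can be applied at runtime and then checked on disk. The sketch below assumes the checklist's example value of `2gb` and the config path `/etc/redis/redis.conf`; `redis_conf_value` is a hypothetical helper that does simple line matching and does not follow `include` directives.

```shell
# Runtime changes on a live server (not run here; requires a reachable Redis):
#   redis-cli CONFIG SET maxmemory 2gb
#   redis-cli CONFIG SET maxmemory-policy allkeys-lru
#   redis-cli CONFIG SET save ""
#   redis-cli CONFIG REWRITE

# Print the last value set for a directive in a redis.conf-style file.
# Commented-out lines are skipped because their first field is not the bare key.
redis_conf_value() {
    file="$1"; key="$2"
    awk -v k="$key" '$1 == k { $1 = ""; sub(/^ +/, ""); v = $0 } END { print v }' "$file"
}

# Example usage after CONFIG REWRITE (not run here):
#   redis_conf_value /etc/redis/redis.conf maxmemory-policy
```

Reading the file directly, rather than asking `redis-cli CONFIG GET`, is what confirms the setting will survive a restart.
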
## Laravel / Horizon / Pulse
- [ ] Verify Horizon trim settings in `config/horizon.php` (recent/completed: 60 min or less)
- [ ] If Pulse is enabled, ensure `pulse:work` is running in supervisor
- [ ] If Pulse is not used, disable it entirely (remove provider or `PULSE_ENABLED=false`)
- [ ] Set queue worker memory limits (`--memory=256`) and max jobs (`--max-jobs=500`)
## PHP-FPM
- [ ] Remove unused PHP-FPM pools/versions (only keep the version the site uses)
- [ ] Tune `pm.max_children` based on available RAM and per-worker memory usage
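
One common rule of thumb for `pm.max_children` (an assumption here, not something this checklist prescribes) is to divide the RAM left over after other services by the average memory of one PHP-FPM worker. A minimal sketch, with a hypothetical helper name and purely illustrative numbers:

```shell
# Estimate pm.max_children from RAM figures, all in MB (integer division rounds down).
# Arguments: total RAM, RAM reserved for everything else, average memory per worker.
estimate_max_children() {
    total_mb="$1"; reserved_mb="$2"; per_worker_mb="$3"
    echo $(( (total_mb - reserved_mb) / per_worker_mb ))
}

# Illustrative numbers only: 16 GB server, 6 GB reserved for Redis, the database,
# queue workers and the OS, 80 MB per worker.
estimate_max_children 16384 6144 80   # prints 128
```

The per-worker figure is best measured on the running server, for example from the RSS column of `ps -eo rss,comm | grep php-fpm` (values in kB), rather than guessed.
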
## Swap
- [ ] Verify swap is configured (at least 2 GB for a 16 GB server)
- [ ] Check `vm.swappiness` is set appropriately (default 60 is fine for most cases)
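
The swap checks above can be read straight from `/proc/meminfo` and `sysctl`. A minimal sketch; `swap_total_mb` is a hypothetical helper, and the threshold in the usage comment is set slightly under 2048 because a 2 GB swap file typically reports a little less than its nominal size.

```shell
# Print configured swap in MB, read from a /proc/meminfo-style file (SwapTotal is in kB).
swap_total_mb() {
    meminfo="${1:-/proc/meminfo}"
    awk '$1 == "SwapTotal:" { print int($2 / 1024) }' "$meminfo"
}

# Example usage on the server (not run here):
#   [ "$(swap_total_mb)" -ge 2000 ] || echo "swap is below roughly 2 GB"
#   sysctl -n vm.swappiness      # the checklist treats the default of 60 as fine
```
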
## Security
- [ ] Verify UFW is enabled and only allows necessary ports (22, 80, 443)
- [ ] Disable password-based SSH login (`PasswordAuthentication no`)
- [ ] Verify unattended-upgrades is enabled for security patches
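
The SSH item above is easiest to verify against the effective configuration that `sshd -T` prints, since drop-in files can override the main `sshd_config`. A minimal sketch; `password_auth_disabled` is a hypothetical helper that only inspects text on stdin.

```shell
# Succeeds if "sshd -T"-style output on stdin reports password logins as disabled.
password_auth_disabled() {
    grep -qi '^passwordauthentication no$'
}

# Example usage on the server (not run here; sshd -T needs root):
#   sudo sshd -T | password_auth_disabled && echo "password login disabled"
#   sudo ufw status verbose      # review that only 22, 80 and 443 are allowed
```
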
## Deployment
- [ ] Verify the deployment script does not spawn hundreds of parallel processes (run `unzip`/`rm` steps serially)
- [ ] Cap Node.js build memory: `NODE_OPTIONS=--max-old-space-size=512` in deploy script
- [ ] Test a deploy on the new server before going live
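
To illustrate the serialization item above: a deploy script that backgrounds every cleanup command with `&` can spawn one process per path, whereas a plain loop handles them one at a time. A minimal sketch; `remove_serially` and the paths in the usage comment are hypothetical.

```shell
# Remove each given path in turn, stopping at the first failure.
# Nothing is backgrounded, so at most one rm process runs at a time.
remove_serially() {
    for path in "$@"; do
        rm -rf -- "$path" || return 1
    done
}

# Example usage in a deploy script (not run here; paths are placeholders):
#   remove_serially storage/tmp/build-* public/old-assets
#   NODE_OPTIONS=--max-old-space-size=512 npm run build
```
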
## Monitoring / Alerting
- [ ] Set up memory usage alerting (CloudWatch, Forge, or similar) so OOM situations are caught before they crash the server
- [ ] Set up disk usage alerting (logs and atop files can fill disks over time)
- [ ] Configure atop log retention (`/etc/default/atop`, default keeps 28 days)
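
Until a hosted alert is in place, the disk usage item above can be approximated with a small check run from cron. A minimal sketch, assuming POSIX `df -P` output and mount points without spaces; `disks_over_threshold` is a hypothetical helper and the 80% threshold is only an example.

```shell
# Print mount points whose usage is at or above a percentage threshold,
# reading "df -P"-style output on stdin (header line is skipped).
disks_over_threshold() {
    threshold="$1"
    awk -v t="$threshold" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t + 0) print $6 }'
}

# Example usage (not run here):
#   df -P | disks_over_threshold 80
```
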