# Server Provisioning Checklist
## AWS / Forge Setup
- [ ] Use Forge to create server
- [ ] Tag the EC2 instance and its root volume
- [ ] After creation, attach an Elastic IP
- [ ] Add monitoring in Forge
- [ ] Update root volume to gp3
- [ ] Enable AWS backup
- [ ] Set up Forge database backups
- [ ] Set up SSH key access for team members
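
The tagging, Elastic IP, and gp3 steps can also be done from the AWS CLI. A minimal sketch, assuming the CLI is configured for the right account and region; the instance ID, volume ID, and tag values are hypothetical placeholders:

```shell
# Hypothetical IDs; replace with the real instance and its root volume.
INSTANCE_ID="i-0123456789abcdef0"
VOLUME_ID="vol-0123456789abcdef0"

# Tag the instance and its root volume in one call (tag keys/values are examples).
aws ec2 create-tags \
  --resources "$INSTANCE_ID" "$VOLUME_ID" \
  --tags Key=Project,Value=example Key=Environment,Value=production

# Allocate an Elastic IP and attach it to the instance.
ALLOC_ID="$(aws ec2 allocate-address --domain vpc --query AllocationId --output text)"
aws ec2 associate-address --instance-id "$INSTANCE_ID" --allocation-id "$ALLOC_ID"

# Switch the root volume to gp3.
aws ec2 modify-volume --volume-id "$VOLUME_ID" --volume-type gp3
```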
## OS Tooling
- [ ] Install atop (`apt install atop`, verify it runs via systemd and writes to `/var/log/atop/`)
- [ ] Install htop (`apt install htop`)
- [ ] Install gdu or ncdu (`apt install gdu` or `apt install ncdu`) for disk usage analysis
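
A possible one-pass install and check for the tools above, assuming a Debian/Ubuntu server with `sudo` access (ncdu is chosen here; swap in gdu if preferred):

```shell
sudo apt update
sudo apt install -y atop htop ncdu

# atop should be active as a service and writing raw logs.
systemctl is-active atop
ls -lh /var/log/atop/
```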
## Redis Hardening
- [ ] Set `maxmemory` to an appropriate limit (e.g. `2gb` on a 16 GB server)
- [ ] Set `maxmemory-policy allkeys-lru`
- [ ] Disable RDB persistence if not needed (`save ""`) to prevent fork-based OOM
- [ ] Persist config: `redis-cli CONFIG REWRITE`
- [ ] Verify config survives reboot: check `/etc/redis/redis.conf` directly
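
The Redis steps above can be applied at runtime and then persisted. A sketch, assuming a local Redis without authentication and the example 2 GB limit; adjust the value and add credentials as needed:

```shell
# Apply the limits to the running instance.
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru

# Disable RDB snapshots (only if persistence is not needed).
redis-cli CONFIG SET save ""

# Write the running config back to redis.conf.
redis-cli CONFIG REWRITE

# Confirm the values landed in the file that is read at boot.
grep -E '^(maxmemory|save)' /etc/redis/redis.conf
```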
## Laravel / Horizon / Pulse
- [ ] Verify Horizon trim settings in `config/horizon.php` (recent/completed: 60 min or less)
- [ ] If Pulse is enabled, ensure `pulse:work` is running in supervisor
- [ ] If Pulse is not used, disable it entirely (remove provider or `PULSE_ENABLED=false`)
- [ ] Set queue worker memory limits (`--memory=256`) and max jobs (`--max-jobs=500`)
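
A rough way to check these items from the shell, assuming the commands run from the application root and that supervisor manages the daemons. When workers are started by Horizon rather than `queue:work`, the equivalent limits are expected to live in the supervisor settings in `config/horizon.php` instead of on the command line.

```shell
# Show the trim block from the Horizon config (values are in minutes).
grep -n -A 8 "'trim'" config/horizon.php

# Confirm the expected daemons (e.g. pulse:work) are listed as RUNNING.
sudo supervisorctl status

# Plain queue worker with the limits from the checklist.
php artisan queue:work --memory=256 --max-jobs=500
```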
## PHP-FPM
- [ ] Remove unused PHP-FPM pools/versions (only keep the version the site uses)
- [ ] Tune `pm.max_children` based on available RAM and per-worker memory usage
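
One common rule of thumb for `pm.max_children` is (total RAM minus RAM reserved for everything else) divided by average per-worker memory. A small sketch of that arithmetic; the function name and the example figures are assumptions, not measured values:

```shell
# Estimate pm.max_children from figures in MB.
# Usage: fpm_max_children TOTAL_MB RESERVED_MB PER_WORKER_MB
fpm_max_children() {
  echo $(( ($1 - $2) / $3 ))
}

# Assumed figures: 16 GB server, 4 GB reserved for Redis/MySQL/OS, 64 MB per worker.
fpm_max_children 16384 4096 64   # prints 192
```

Per-worker memory is best measured on the live server, for example by averaging the RSS of the running FPM processes, rather than guessed.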
## Swap
- [ ] Verify swap is configured (at least 2 GB on a 16 GB server)
- [ ] Check `vm.swappiness` is set appropriately (default 60 is fine for most cases)
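
A sketch of the swap check, with a fallback that creates a 2 GB swap file only when none is active. The `/swapfile` path is an assumption; skip the fallback if the server already has swap provisioned elsewhere.

```shell
# Inspect the current state.
swapon --show
free -h
sysctl vm.swappiness

# Create and enable a swap file only if no swap is active.
if [ -z "$(swapon --show --noheadings)" ]; then
  sudo fallocate -l 2G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile
  # Make it persistent across reboots.
  echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
fi
```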
## Security
- [ ] Verify UFW is enabled and only allows necessary ports (22, 80, 443)
- [ ] Disable password-based SSH login (`PasswordAuthentication no`)
- [ ] Verify unattended-upgrades is enabled for security patches
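
These checks can be run as read-only commands. A sketch, assuming Ubuntu with `sudo` access:

```shell
# Firewall state and allowed ports.
sudo ufw status verbose

# Effective sshd setting; should print "passwordauthentication no".
sudo sshd -T | grep -i '^passwordauthentication'

# Unattended upgrades: service enabled and periodic run switched on.
systemctl is-enabled unattended-upgrades
apt-config dump APT::Periodic::Unattended-Upgrade
```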
## Deployment
- [ ] Verify the deployment script does not spawn hundreds of parallel processes (run unzip/rm steps serially)
- [ ] Cap Node build memory: `NODE_OPTIONS=--max-old-space-size=512` in deploy script
- [ ] Test a deploy on the new server before going live
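
A sketch of what the relevant part of a deploy script might look like. The paths, the `npm` commands, and the archive layout are hypothetical; the point is the memory cap and the absence of backgrounded (`&`) jobs.

```shell
# Hypothetical locations; adjust to the real site.
APP_DIR="/home/forge/example.com"
ARCHIVE_DIR="/tmp/deploy"

# Cap the V8 heap for the frontend build.
export NODE_OPTIONS=--max-old-space-size=512
cd "$APP_DIR" || exit 1
npm ci
npm run build

# Unpack archives one at a time instead of launching them all in parallel.
for archive in "$ARCHIVE_DIR"/*.zip; do
  [ -e "$archive" ] || continue
  unzip -oq "$archive" -d "$APP_DIR"
done
```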
## Monitoring / Alerting
- [ ] Set up memory usage alerting (CloudWatch, Forge, or similar) so OOM situations are caught before they crash the server
- [ ] Set up disk usage alerting (logs and atop files can fill disks over time)
- [ ] Configure atop log retention (`/etc/default/atop`, default keeps 28 days)
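
Where CloudWatch or Forge alerting is not yet in place, a cron-driven script can act as a stopgap. A minimal sketch, assuming a Linux host with GNU `df`; the function names and thresholds are illustrative, and the `echo` lines stand in for whatever notification channel is used:

```shell
# Percentage of disk used on the filesystem holding the given path.
disk_pct() { df --output=pcent "$1" | tail -n 1 | tr -dc '0-9'; }

# Percentage of memory in use, based on MemAvailable.
mem_pct() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%d", (t - a) * 100 / t}' /proc/meminfo
}

# True when the value (arg 1) is at or above the limit (arg 2).
over_threshold() { [ "$1" -ge "$2" ]; }

# Thresholds are assumptions; tune per server.
if over_threshold "$(disk_pct /)" 85; then echo "disk usage high on /"; fi
if over_threshold "$(mem_pct)" 90; then echo "memory usage high"; fi
```

For atop retention, the current setting can be inspected with `grep LOGGENERATIONS /etc/default/atop` on versions that ship that variable.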