# Server Provisioning Checklist

## AWS / Forge Setup

- [ ] Use Forge to create server
- [ ] Tag the EC2 instance and its root EBS volume
- [ ] After creation, allocate and attach an Elastic IP
- [ ] Add monitoring in Forge
- [ ] Update root volume to gp3
- [ ] Enable AWS Backup
- [ ] Set up Forge database backups
- [ ] Set up SSH key access for team members

## OS Tooling

- [ ] Install atop (`apt install atop`, verify it runs via systemd and writes to `/var/log/atop/`)
- [ ] Install htop (`apt install htop`)
- [ ] Install gdu or ncdu (`apt install gdu` or `apt install ncdu`) for disk usage analysis

## Redis Hardening

- [ ] Set `maxmemory` to an appropriate limit (e.g. `2gb` on a 16 GB server)
- [ ] Set `maxmemory-policy allkeys-lru`
- [ ] Disable RDB persistence if not needed (`save ""`) to avoid OOM from the fork that each snapshot triggers
- [ ] Persist config: `redis-cli CONFIG REWRITE`
- [ ] Verify config survives reboot: check `/etc/redis/redis.conf` directly
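
The Redis steps above can be sketched as one small function. This is a minimal sketch, not a definitive procedure: the function name `apply_redis_limits` and the `REDIS_CLI` override are hypothetical helpers introduced here, and `2gb` is only the checklist's example value.

```shell
# REDIS_CLI is a hypothetical override hook (not part of the checklist):
# set REDIS_CLI=echo to preview the commands without touching a live server.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# apply_redis_limits [MAXMEMORY]
# Applies the limits at runtime, then asks Redis to write them back to its
# config file. Stops at the first command that fails.
apply_redis_limits() {
  local maxmemory="${1:-2gb}"   # example value; size it for the actual server

  $REDIS_CLI CONFIG SET maxmemory "$maxmemory" || return 1
  $REDIS_CLI CONFIG SET maxmemory-policy allkeys-lru || return 1
  # Only appropriate when RDB snapshots are not needed.
  $REDIS_CLI CONFIG SET save "" || return 1
  # Persist the running configuration to redis.conf.
  $REDIS_CLI CONFIG REWRITE
}
```

Running `REDIS_CLI=echo` first and then `apply_redis_limits 2gb` prints the four commands instead of executing them, which is a cheap way to review them before applying. `CONFIG REWRITE` only works when Redis was started with a config file, so the final check of `/etc/redis/redis.conf` still applies.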

## Laravel / Horizon / Pulse

- [ ] Verify Horizon trim settings in `config/horizon.php` (`recent` and `completed`: 60 minutes or less)
- [ ] If Pulse is enabled, ensure `pulse:work` is running under Supervisor
- [ ] If Pulse is not used, disable it entirely (remove provider or `PULSE_ENABLED=false`)
- [ ] Set queue worker memory limits (`--memory=256`, in MB) and max jobs (`--max-jobs=500`)

## PHP-FPM

- [ ] Remove unused PHP-FPM pools/versions (only keep the version the site uses)
- [ ] Tune `pm.max_children` based on available RAM and per-worker memory usage
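
One rough way to arrive at a `pm.max_children` value is to divide the RAM left after a reserve by the average worker size. The sketch below assumes that approach; the helper names `avg_rss_mb` and `fpm_max_children`, the 4096 MB reserve, and the 64 MB per-worker figure are illustrative assumptions, not values from this checklist.

```shell
# avg_rss_mb: read RSS values in kB (one per line) on stdin and print the
# mean in whole MB. Prints nothing if there is no input.
avg_rss_mb() {
  awk '{ sum += $1; n++ } END { if (n) print int(sum / n / 1024) }'
}

# fpm_max_children TOTAL_MB RESERVED_MB PER_WORKER_MB
# Prints how many workers fit in the RAM left after the reserve
# (integer division, so the result is rounded down).
fpm_max_children() {
  echo $(( ($1 - $2) / $3 ))
}

# Example usage on a live server. The process name is an assumption;
# match it to the PHP version actually installed:
#   ps -o rss= -C php-fpm8.3 | avg_rss_mb
#   fpm_max_children 16384 4096 64
```

With those illustrative numbers (16384 MB total, 4096 MB reserved for Redis, the database, and the OS, 64 MB per worker) the function prints `192`. Measure the per-worker figure under real traffic, since idle workers tend to look smaller than busy ones.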

## Swap

- [ ] Verify swap is configured (at least 2 GB for a 16 GB server)
- [ ] Check `vm.swappiness` is set appropriately (default 60 is fine for most cases)
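
Both swap checks can be read straight from `/proc`. The following is a minimal sketch; the function names are hypothetical, and the optional file argument exists only so the parsing can be tried against a sample file.

```shell
# swap_total_mb [MEMINFO_FILE]
# Prints total configured swap in whole MB, read from /proc/meminfo by default.
swap_total_mb() {
  awk '/^SwapTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# check_swap MIN_MB [MEMINFO_FILE]
# Succeeds (exit 0) when configured swap is at least MIN_MB.
check_swap() {
  local have
  have="$(swap_total_mb "${2:-/proc/meminfo}")"
  [ "${have:-0}" -ge "$1" ]
}

# current_swappiness [FILE]
# Prints the current vm.swappiness value.
current_swappiness() {
  cat "${1:-/proc/sys/vm/swappiness}"
}
```

On the server, `check_swap 2048 || echo "swap below 2 GB"` covers the first item and `current_swappiness` the second. Swap files sized at exactly 2 GiB can report a few kB less than 2048 MB in `/proc/meminfo`, so a slightly lower threshold may be more practical.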

## Security

- [ ] Verify UFW is enabled and only allows necessary ports (22, 80, 443)
- [ ] Disable password-based SSH login (`PasswordAuthentication no`)
- [ ] Verify unattended-upgrades is enabled for security patches
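
The three security checks above can be scripted roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the static `grep` only inspects one file, so it can miss overrides from included drop-in files or `Match` blocks.

```shell
# password_auth_disabled [SSHD_CONFIG]
# Succeeds when the file contains an uncommented "PasswordAuthentication no".
# Static check of a single file only; see verify_security_live for the
# effective configuration.
password_auth_disabled() {
  grep -Eiq '^[[:space:]]*PasswordAuthentication[[:space:]]+no([[:space:]]|$)' \
    "${1:-/etc/ssh/sshd_config}"
}

# verify_security_live
# Prints the live state of each item for manual review. Needs sudo and is
# defined here but not called.
verify_security_live() {
  sudo ufw status verbose
  # sshd -T dumps the effective configuration with lowercase keywords.
  sudo sshd -T | grep -i '^passwordauthentication'
  systemctl is-enabled unattended-upgrades
}
```

Treat `password_auth_disabled` as a quick first pass and the `sshd -T` output as the authoritative answer, since it reflects what the daemon would actually apply.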

## Deployment

- [ ] Verify deployment script does not spawn hundreds of parallel processes (serialize unzip/rm)
- [ ] Cap Node.js build memory: `NODE_OPTIONS=--max-old-space-size=512` (value in MB) in deploy script
- [ ] Test a deploy on the new server before going live
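
One way to keep a deploy script from fanning out is to feed file lists through a loop that runs a single command at a time. The sketch below assumes that pattern; `run_serial` and `build_assets` are hypothetical helpers, and `npm run build` and the paths in the usage comments are placeholders for whatever the real script does.

```shell
# run_serial CMD [ARGS...]
# Reads one path per line on stdin and runs CMD ARGS PATH for each,
# strictly one at a time, stopping at the first failure.
run_serial() {
  while IFS= read -r item; do
    # </dev/null keeps CMD from consuming the remaining list on stdin.
    "$@" "$item" </dev/null || return 1
  done
}

# build_assets
# Runs the front-end build with the heap cap from the checklist.
# "npm run build" is an assumed build command.
build_assets() {
  NODE_OPTIONS=--max-old-space-size=512 npm run build
}

# Example usage (paths are placeholders):
#   find /path/to/archives -name '*.zip' | run_serial unzip -oq
#   find /path/to/stale -mindepth 1 -maxdepth 1 | run_serial rm -rf
```

The loop trades speed for a predictable memory and process footprint, which is the point of this checklist item. Paths containing newlines are not handled.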

## Monitoring / Alerting

- [ ] Set up memory usage alerting (CloudWatch, Forge, or similar) so OOM situations are caught before they crash the server
- [ ] Set up disk usage alerting (logs and atop files can fill disks over time)
- [ ] Configure atop log retention (`LOGGENERATIONS` in `/etc/default/atop`; the default keeps 28 days)
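
Where a hosted alert is not yet in place, a small cron-driven check can act as a stopgap for the disk item above. This is a minimal sketch, not a replacement for CloudWatch or Forge monitoring; the function names and the 90% threshold are assumptions.

```shell
# disk_usage_pct [MOUNT]
# Prints the used-space percentage (without the % sign) for a mount point.
disk_usage_pct() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# over_threshold VALUE LIMIT
# Succeeds when VALUE is greater than or equal to LIMIT.
over_threshold() {
  [ "$1" -ge "$2" ]
}

# check_disk [MOUNT] [LIMIT]
# Prints a warning and fails when usage reaches the limit (default 90).
check_disk() {
  local pct
  pct="$(disk_usage_pct "${1:-/}")"
  if over_threshold "$pct" "${2:-90}"; then
    echo "WARNING: ${1:-/} is at ${pct}% (limit ${2:-90}%)"
    return 1
  fi
}
```

Running `check_disk / 90` from cron and mailing any output gives a basic warning. `du -sh /var/log/atop` shows how much of the usage comes from atop logs.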