# Analysing atop Binary Logs with Python

## Overview

atop writes binary log files (typically `/var/log/atop/atop_YYYYMMDD`) that contain periodic snapshots (for example, one per minute; the interval is configurable) of system-wide and per-process statistics. These files can be copied to another machine and read there with Python, without installing atop on it.

## Prerequisites

```bash
python3 -m venv /tmp/atop_venv
source /tmp/atop_venv/bin/activate
pip install atoparser
```

The `atoparser` package provides struct definitions, but we parse the binary directly for flexibility.

## File Format

| Component | Size (bytes) | Notes |
|---|---|---|
| Raw header | 480 | File-level metadata, magic `0xfeedbeef` |
| Per-record header | 96 | Timestamp, compressed data lengths, process counts |
| System stats (sstat) | variable (compressed length in record header) | zlib-compressed system memory/CPU/disk data |
| Process stats (pstat) | variable (compressed length in record header) | zlib-compressed per-process data; once decompressed, each entry is 840 bytes (TStat) |

The raw header appears once at the start of the file, and records follow it back to back:

`[raw header][rec1 header][rec1 sstat][rec1 pstat][rec2 header][rec2 sstat][rec2 pstat]...`

## Record Header Layout (96 bytes)

```python
curtime = struct.unpack(' … 0 else '?'
vmem = struct.unpack(' … 0 and isproc == 1 and rmem > 0:
    procs.append({
        'pid': pid,
        'ppid': ppid,
        'name': name,
        'nthr': nthr,
        'state': state,
        'vmem_mb': vmem * pagesize / (1024*1024),
        'rmem_mb': rmem * pagesize / (1024*1024),
        'pmem_mb': pmem * pagesize / (1024*1024),
        'vswap_mb': vswap * pagesize / (1024*1024),
    })
return procs

# Iterate through all records
pos = rawheadlen
while pos + rawreclen <= len(data):
    rec = data[pos:pos+rawreclen]
    curtime = struct.unpack(' … 1780000000:
        # Try to find next valid record by scanning forward
        found = False
        for skip in range(4, 500, 4):
            if pos + skip + 4 > len(data):
                break
            ts_val = struct.unpack(' … 8.0f}M PSS={stats['pmem']:>8.0f}M Swap={stats['vswap']:>6.0f}M")
    pos = pstat_start + pcomplen
```

## Notes on RSS vs PSS

- **RSS (Resident Set Size):** Physical RAM mapped into the process. Includes shared libraries and mmap'd files.
  Over-counts shared memory, which is counted in full for every process that maps it.
- **PSS (Proportional Set Size):** Private pages plus each shared page divided by the number of processes sharing it. Summing PSS across processes gives a realistic total, because each shared page is counted only once overall.
- **MySQL RSS is misleading:** InnoDB mmaps its data files, inflating RSS by gigabytes. Actual private memory is much lower (check with `top` or `smem`).
- **Redis RSS is accurate:** Redis keeps its data in heap (anonymous) memory, so RSS closely reflects real usage.
- **PHP-FPM RSS over-counts:** Workers share PHP code pages, so PSS shows the true per-worker cost.

## Gotchas

1. The timestamp validation range needs adjusting for each file, because it must bracket the period that file covers. Use `date -d @TIMESTAMP` to check which date a candidate value corresponds to.
2. Some records have alignment gaps between them. The skip-forward loop handles this by scanning ahead in 4-byte steps, up to about 500 bytes, for a plausible timestamp.
3. The `isproc` byte at offset 64 of each TStat entry distinguishes processes from threads. Filter on `isproc == 1` to avoid double-counting thread memory, since threads share their process's address space.
4. The `name` field is truncated to 15 characters. Names such as `php-fpm8.4` fit, but `amazon-ssm-agent` (16 characters) becomes `amazon-ssm-agen`.
5. All memory values in the TStat struct are in pages (4096 bytes on most Linux systems).
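Gotchas 1 and 2 can be sketched as a standalone helper. This is a minimal version of the skip-forward scan under stated assumptions. The timestamp is read as a 4-byte little-endian unsigned integer at the start of a record header. The `lo`/`hi` window is a hypothetical example that you would adjust per file. `find_next_record` and `to_utc` are illustrative names, with `to_utc` standing in for `date -d @TIMESTAMP`.

```python
import struct
from datetime import datetime, timezone


def find_next_record(data: bytes, pos: int, lo: int, hi: int, max_skip: int = 500):
    """Scan forward from pos in 4-byte steps for a timestamp inside [lo, hi].

    Mirrors the skip-forward loop: tries pos+4, pos+8, ... below pos+max_skip.
    Assumption: the timestamp is a 4-byte little-endian unsigned int.
    Returns the offset of the first plausible timestamp, or None.
    """
    for skip in range(4, max_skip, 4):
        start = pos + skip
        if start + 4 > len(data):
            break
        (ts,) = struct.unpack_from("<I", data, start)
        if lo <= ts <= hi:
            return start
    return None


def to_utc(ts: int) -> str:
    """Python stand-in for `date -d @TIMESTAMP`, rendered in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()


# Hypothetical window; adjust it to the day the log file covers
window = (1_700_000_000, 1_780_000_000)
blob = bytes(12) + struct.pack("<I", 1_750_000_000) + bytes(8)
print(find_next_record(blob, 0, *window))  # 12
print(to_utc(1_750_000_000))
```

Returning `None` instead of raising leaves the caller to decide whether to stop or skip further ahead.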
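Gotchas 3 to 5 fit into three small helpers. The `isproc` offset (64), the 15-character name limit and the 4096-byte page size come from the list above. The offset of the `name` field itself is not given here, so `decode_name` takes the raw field slice from the caller. The helper names are hypothetical. The unit size should be that of the machine that wrote the log, not the one running the analysis, which is why it is a constant rather than a call to `os.sysconf`.

```python
ISPROC_OFFSET = 64  # `isproc` byte within a TStat entry (Gotcha 3)
NAME_MAX = 15       # atop keeps at most 15 characters of a name (Gotcha 4)
UNIT_BYTES = 4096   # bytes per memory unit: one page on most Linux systems (Gotcha 5)


def is_process(entry: bytes) -> bool:
    """True for process-level entries; thread entries return False."""
    return entry[ISPROC_OFFSET] == 1


def decode_name(field: bytes) -> str:
    """Decode a NUL-padded name field, keeping at most NAME_MAX characters.

    The caller supplies the slice; its offset inside TStat is not given above.
    """
    return field.split(b"\0", 1)[0][:NAME_MAX].decode("utf-8", errors="replace")


def to_mb(units: int, unit_bytes: int = UNIT_BYTES) -> float:
    """Convert a TStat memory value to MB (1024*1024 bytes), as the script does."""
    return units * unit_bytes / (1024 * 1024)


entry = bytes(64) + b"\x01" + bytes(775)     # fake 840-byte entry with isproc = 1
print(is_process(entry))                     # True
print(decode_name(b"amazon-ssm-agen\x00"))   # amazon-ssm-agen
print(to_mb(256))                            # 1.0
```

If the values in your logs turn out to be in a unit other than pages (for example KiB), pass the matching `unit_bytes` instead of editing every conversion.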
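The record sequence from the File Format section can be walked with the sketch below. It assumes the magic `0xfeedbeef` is stored as a little-endian 32-bit value at the very start of the raw header. It also leaves the record header's field offsets to a hypothetical caller-supplied `read_lengths` function, since they are not spelled out above and may differ between atop versions. The sizes (480, 96, 840) come from the table.

```python
import struct
import zlib
from typing import Callable, Iterator, List, Tuple

RAW_HEADER_LEN = 480  # file-level header, once per file
REC_HEADER_LEN = 96   # per-record header
TSTAT_LEN = 840       # one decompressed per-process entry
MAGIC = 0xFEEDBEEF


def walk_records(
    data: bytes, read_lengths: Callable[[bytes], Tuple[int, int]]
) -> Iterator[Tuple[bytes, bytes, List[bytes]]]:
    """Yield (record header, decompressed sstat, list of TStat entries).

    read_lengths (hypothetical) must return (scomplen, pcomplen) from a
    96-byte record header for the atop version that wrote the file.
    """
    # Assumption: magic is the first 4 bytes of the raw header, little-endian
    (magic,) = struct.unpack_from("<I", data, 0)
    if magic != MAGIC:
        raise ValueError(f"bad magic 0x{magic:08x}")
    pos = RAW_HEADER_LEN
    while pos + REC_HEADER_LEN <= len(data):
        header = data[pos:pos + REC_HEADER_LEN]
        scomplen, pcomplen = read_lengths(header)
        sstat_start = pos + REC_HEADER_LEN
        pstat_start = sstat_start + scomplen
        sstat = zlib.decompress(data[sstat_start:pstat_start])
        pstat = zlib.decompress(data[pstat_start:pstat_start + pcomplen])
        # Split the decompressed process block into fixed-size TStat entries
        entries = [pstat[i:i + TSTAT_LEN] for i in range(0, len(pstat), TSTAT_LEN)]
        yield header, sstat, entries
        pos = pstat_start + pcomplen  # next record header follows directly


# Usage (read_lengths must match your atop version's record header):
# with open("atop_YYYYMMDD", "rb") as f:
#     for header, sstat, entries in walk_records(f.read(), read_lengths):
#         ...
```

`zlib.decompress` raises `zlib.error` on a truncated final record, so a log that is still being written may need a `try`/`except` around the loop. Records separated by gaps would also need the skip-forward scan before calling `read_lengths`.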
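A per-program summary like the one the script prints (RSS, PSS and swap per name) can be sketched by grouping the `procs` dictionaries on `name`. The function names, the `count` column and the sort order (largest PSS first) are choices made for this sketch, not part of the original script. Summed PSS gives a meaningful per-program total, while summed RSS overstates programs whose processes share pages, such as PHP-FPM workers.

```python
from collections import defaultdict


def summarise_by_name(procs: list[dict]) -> dict[str, dict[str, float]]:
    """Group process dicts by name and sum their count, RSS, PSS and swap (MB)."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"count": 0, "rmem": 0.0, "pmem": 0.0, "vswap": 0.0})
    for p in procs:
        t = totals[p["name"]]
        t["count"] += 1
        t["rmem"] += p["rmem_mb"]
        t["pmem"] += p["pmem_mb"]
        t["vswap"] += p["vswap_mb"]
    return dict(totals)


def print_summary(totals: dict[str, dict[str, float]]) -> None:
    """Print one line per program, largest total PSS first."""
    for name, s in sorted(totals.items(), key=lambda kv: kv[1]["pmem"], reverse=True):
        print(f"{name:<16} n={s['count']:>3} RSS={s['rmem']:>8.0f}M "
              f"PSS={s['pmem']:>8.0f}M Swap={s['vswap']:>6.0f}M")


# Usage, with `procs` as returned by the parsing function:
# print_summary(summarise_by_name(procs))
```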
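The difference between RSS and PSS described in the notes comes down to how shared pages are attributed. The sketch below works through made-up numbers: four PHP-FPM workers, each with 20 MB of private memory, sharing 60 MB of code pages. It also simplifies by treating every shared region as fully resident.

```python
def rss_and_pss(private_mb: float, shared: list[tuple[float, int]]) -> tuple[float, float]:
    """Return (RSS, PSS) in MB for one process.

    `shared` lists (region_size_mb, processes_mapping_it) for each shared region.
    RSS counts every shared region in full; PSS charges this process only
    its proportional share of each region.
    """
    rss = private_mb + sum(size for size, _ in shared)
    pss = private_mb + sum(size / sharers for size, sharers in shared)
    return rss, pss


# Hypothetical example: 4 workers, 20 MB private each, sharing 60 MB of code
rss, pss = rss_and_pss(20, [(60, 4)])
print(rss, pss)          # 80 35.0
print(4 * rss, 4 * pss)  # 320 140.0
```

Summed over the four workers, PSS gives 140 MB (80 MB private plus the 60 MB shared region counted once). Summed RSS reports 320 MB because it counts the shared code four times.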