football/JOURNAL.md
Charles N Wyble 747d58e6ec docs: add permanent lessons to JOURNAL.md to prevent recurring mistakes
Added explicit "PERMANENT LESSONS FOR FUTURE SESSIONS" section documenting:
1. Always update callers when modifying source functions
2. Verify documentation matches code reality
3. Cross-reference before committing

These patterns have caused bugs multiple times.

💘 Generated with Crush

Assisted-by: GLM-4.7 via Crush <crush@charm.land>
2026-02-19 09:19:22 -05:00


# KNEL-Football Development Journal
> **IMPORTANT**: This file is APPEND-ONLY. Never delete or modify existing entries.
> Add new entries at the TOP (after this header) with date and context.
> This serves as long-term memory for AI agents and human developers.
---
## Entry 2026-02-19 (Session 5): Critical Bug Fixes
### Context
Resumed session after context overflow. Deep orientation revealed critical bugs in the
security-hardening.sh hook that were blocking FIM and SSH client configuration.
### Changes Implemented
1. **Bug Fix: Function Name Mismatch**
- `config/hooks/live/security-hardening.sh:19` called `configure_ssh`
- But `src/security-hardening.sh` defines `configure_ssh_client`
- Fixed: Changed hook to call `configure_ssh_client`
2. **Bug Fix: Missing FIM Call**
- `configure_fim` function existed in src/security-hardening.sh
- But hook was never calling it
- Fixed: Added `configure_fim` call to hook
### Root Cause Analysis
Commit 0807611 "feat: add FIM, comprehensive audit logging, SSH client-only" added
functions to src/security-hardening.sh, but the corresponding hook was left out of sync:
- It was not updated to call the new function (configure_fim)
- It called the wrong function name (configure_ssh instead of configure_ssh_client)
This is a common pattern in codebase consolidation: when adding features to source
files, remember to update ALL callers (hooks, scripts, tests).
### Lessons Learned
1. **Cross-Reference Source and Callers**
- When adding functions, search for ALL callers
- `grep -r function_name config/` to find hooks
- Test execution paths, not just function existence
2. **Documentation vs Reality Gap**
- JOURNAL.md said "FIM ADDED" but hook never called it
- STATUS.md said "SSH client-only CONFIGURED" but the hook called the wrong function name
- Lesson: Verify code execution, not just code presence
### Verification
```bash
./run.sh lint # ✅ Zero warnings
./run.sh test # ✅ 92 pass, 19 skip (VM tests)
```
### Action Items
1. Rebuild ISO with bug fixes (in progress)
2. Update STATUS.md with accurate state
3. Consider adding hook validation tests
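A hook validation check of the kind suggested in item 3 could look roughly like the sketch below. It assumes functions are defined as `name() {` and share the `configure_` prefix; the helper names `list_defined`, `list_called`, and `check_hook` are hypothetical and do not exist in the repository.

```bash
#!/usr/bin/env bash
# Sketch only: cross-check configure_* functions between a source file and a hook.
# Helper names are hypothetical; `function name {` definitions without () are not detected.

# Print names of configure_* functions defined in FILE.
list_defined() {
    grep -oE '^[[:space:]]*(function[[:space:]]+)?configure_[A-Za-z0-9_]+[[:space:]]*\(\)' "$1" \
        | grep -oE 'configure_[A-Za-z0-9_]+' | LC_ALL=C sort -u
}

# Print configure_* names referenced in FILE, ignoring full-line comments.
list_called() {
    grep -vE '^[[:space:]]*#' "$1" \
        | grep -owE 'configure_[A-Za-z0-9_]+' | LC_ALL=C sort -u
}

# Report names the hook calls but the source lacks, and names defined but never called.
check_hook() {
    local src="$1" hook="$2"
    LC_ALL=C comm -13 <(list_defined "$src") <(list_called "$hook") | sed 's/^/UNDEFINED: /'
    LC_ALL=C comm -23 <(list_defined "$src") <(list_called "$hook") | sed 's/^/UNCALLED: /'
}

# Example (paths from this entry):
#   check_hook src/security-hardening.sh config/hooks/live/security-hardening.sh
```

Empty output means every called name is defined and every defined name is called. Both bugs from this session would have appeared as one `UNDEFINED:` line and one or more `UNCALLED:` lines.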
### ⚠️ PERMANENT LESSONS FOR FUTURE SESSIONS
**These mistakes have happened multiple times. DO NOT repeat them.**
1. **When Adding/Modifying Functions: ALWAYS Update All Callers**
- Pattern: Function added to `src/*.sh` but hook in `config/hooks/` not updated
- Prevention: After editing `src/security-hardening.sh`, immediately run:
```bash
grep -r "configure_ssh\|configure_fim\|configure_audit" config/hooks/
```
- Test: Run `./run.sh test` before committing - don't just assume it works
2. **Documentation Claims Must Match Code Reality**
- Pattern: JOURNAL says "ADDED" but hook never calls the function
- Prevention: After implementing a feature, verify execution path:
```bash
# For each new function in src/:
# 1. Find where it should be called
# 2. Add the call
# 3. Test that it runs
```
- Never trust docs without code verification
3. **Cross-Reference Before Committing**
- This project has: `src/*.sh` → `config/hooks/**/*.sh` → executed during build
- Any change to source files requires checking ALL downstream callers
- Use `grep -r "function_name" .` liberally
---
## Entry 2026-02-17 (Session 4): Script Consolidation
### Context
Continued session focused on consolidating all top-level scripts into run.sh as the single
entry point. Merged test-iso.sh (344 lines) and monitor-build.sh (43 lines) into run.sh.
### Changes Implemented
1. **Script Consolidation**
- Merged test-iso.sh VM testing framework into run.sh
- Merged monitor-build.sh build monitoring into run.sh
- Deleted test-iso.sh and monitor-build.sh
- run.sh is now ~500 lines, single entry point for all operations
2. **New run.sh Commands**
```bash
./run.sh monitor [secs] # Monitor build progress
./run.sh test:iso check # Check VM testing prerequisites
./run.sh test:iso create # Create and start test VM
./run.sh test:iso console # Connect to VM console
./run.sh test:iso status # Show VM status
./run.sh test:iso destroy # Destroy VM and cleanup
./run.sh test:iso boot-test # Run automated boot test
./run.sh test:iso secure-boot # Test Secure Boot
./run.sh test:iso fde-test # Test FDE passphrase prompt
```
3. **Test Updates**
- Updated tests/system/boot_test.bats to test run.sh instead of test-iso.sh
- Updated skip messages in fde_test.bats and secureboot_test.bats
4. **ISO Rebuild**
- Built successfully at 15:19 CST (449 MB)
- Checksums verified (SHA256, MD5)
### Architectural Decision Records
#### ADR-009: Single Entry Point (run.sh)
**Date**: 2026-02-17
**Status**: Accepted
**Context**: Multiple top-level scripts (run.sh, test-iso.sh, monitor-build.sh) caused
fragmentation and made the project harder to navigate.
**Decision**: Consolidate all scripts into run.sh as the single entry point.
**Rationale**:
- Simpler user experience - one command to remember
- Consistent interface for all operations
- Easier to maintain and test
- Follows Unix philosophy of doing one thing well
**Consequences**:
- run.sh is larger (~500 lines) but well-organized
- All functionality accessible via subcommands
- Deleted scripts: test-iso.sh, monitor-build.sh
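As an illustration of the single-entry-point idea, a subcommand dispatcher might be structured as follows. This is a sketch, not the actual contents of run.sh: the `dispatch` function is hypothetical and only echoes what it would run, using the command names listed in this entry.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical dispatcher; the real run.sh may be organized differently.
dispatch() {
    local cmd="${1:-}"
    if [ "$#" -gt 0 ]; then shift; fi
    case "$cmd" in
        lint|test|monitor)
            # Top-level commands pass any remaining arguments through.
            echo "run: $cmd${*:+ $*}" ;;
        test:iso)
            # Second-level subcommands for VM testing.
            case "${1:-}" in
                check|create|console|status|destroy|boot-test|secure-boot|fde-test)
                    echo "run: test:iso $1" ;;
                *)
                    echo "unknown test:iso subcommand: ${1:-<none>}" >&2
                    return 2 ;;
            esac ;;
        *)
            echo "usage: run.sh <lint|test|monitor|test:iso> [args]" >&2
            return 1 ;;
    esac
}
```

Rejecting unknown subcommands with a non-zero status keeps a typo from being mistaken for a successful run.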
### Lessons Learned
1. **VM Testing Requires libvirt Group**
- virt-install fails if user not in libvirt group
- QEMU fallback works but virt-install preferred for libvirt integration
- Fix: `sudo usermod -aG libvirt "$USER"`, then log out and back in
2. **Test Updates Required After Script Moves**
- When moving/deleting scripts, grep for all references
- Tests in tests/system/ referenced test-iso.sh directly
- Updated to use run.sh test:iso commands
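The "grep for all references" step in lesson 2 can be wrapped in a small helper. The function name `find_references` is hypothetical; the sketch simply lists every file under a directory that still mentions a given script name.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper for finding leftover references to a moved or deleted script.
# Usage: find_references NAME [ROOT]
find_references() {
    local name="$1" root="${2:-.}"
    # -F treats NAME as a literal string, -l prints matching file names only.
    grep -rlF --exclude-dir=.git -- "$name" "$root" | LC_ALL=C sort
}

# Example: find_references test-iso.sh .
```

Empty output suggests the name is no longer referenced anywhere under the given root.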
### Files Changed
| File | Action |
|------|--------|
| run.sh | Merged test-iso.sh and monitor-build.sh |
| test-iso.sh | DELETED |
| monitor-build.sh | DELETED |
| tests/system/boot_test.bats | Updated to test run.sh |
| tests/system/fde_test.bats | Updated skip message |
| tests/system/secureboot_test.bats | Updated skip message |
| STATUS.md | Updated status to COMPLETE |
| JOURNAL.md | This entry |
### Commit
```
d9f2f02 refactor: consolidate test-iso.sh and monitor-build.sh into run.sh
```
---
## Entry 2026-02-17 (Session 3): Project Re-Orientation
### Context
New session start. User requested deep project review and orientation. Reviewed git logs,
STATUS.md, JOURNAL.md, and current system state.
### Current State Assessment
1. **ISO Status**: STALE
- Built: 2026-02-17 10:50
- 6 commits since build (FIM, audit, SSH client-only, shellcheck fixes)
- Missing features: AIDE FIM, comprehensive auditd, SSH client-only
- Rebuild required to include recent security features
2. **Test Suite**: HEALTHY
- 111 tests total, 92 pass, 19 skip (VM-required)
- Skip reasons: VM not running, requires manual verification
- Categories: unit (12), integration (6), security (44), system (47)
- Zero failures, zero shellcheck warnings
3. **Compliance**: IN PROGRESS
- CIS 1.4 (FIM): Code ready, not in ISO
- CIS 5.2 (SSH): Code ready, not in ISO
- CIS 6.2 (Audit): Code ready, not in ISO
- NIST/FedRAMP/CMMC: Same status - config ready, needs rebuild
4. **Blockers**:
- User NOT in libvirt group (blocks VM testing)
- ISO outdated (blocks runtime verification)
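The libvirt blocker can be checked from a script before attempting VM tests. The helper below is a sketch with a hypothetical name (`user_in_group`); it relies only on `id -nG`.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper. Returns 0 if the user belongs to GROUP.
# Usage: user_in_group GROUP [USER]
user_in_group() {
    local group="$1"
    # Without USER, `id -nG` reports the groups of the current login session.
    id -nG ${2:+"$2"} | tr ' ' '\n' | grep -qxF -- "$group"
}

# Example: user_in_group libvirt || echo "not in libvirt group (VM tests will be skipped)"
```

Calling it without a user name is deliberate: a freshly added group only shows up for the current session after logging out and back in, which is why that step appears in the action items.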
### Architecture Review
```
KNEL-Football OS (this project)
        │  WireGuard (outbound only)
        ▼
Privileged Access Workstation
        │  Direct access
        ▼
Tier0 Infrastructure
```
Key design principle: **No inbound services**. SSH client, RDP client, WireGuard client only.
### Security Features Implemented (Code)
| Feature | File | Status |
|---------|------|--------|
| Full Disk Encryption | config/hooks/installed/encryption-*.sh | ✅ Code ready |
| Password Policy | src/security-hardening.sh | ✅ Code ready |
| Firewall (nftables) | config/hooks/live/firewall-setup.sh | ✅ Code ready |
| FIM (AIDE) | config/hooks/live/aide-setup.sh | ✅ Code ready |
| Audit Logging | config/hooks/live/audit-logging.sh | ✅ Code ready |
| SSH Client-Only | config/hooks/live/ssh-client-only.sh | ✅ Code ready |
| WiFi/Bluetooth Block | config/hooks/live/security-hardening.sh | ✅ Code ready |
### Key Files to Understand
- `run.sh` - Main entry point for all operations
- `AGENTS.md` - Agent behavior guidelines (READ FIRST)
- `STATUS.md` - Manager status report
- `JOURNAL.md` - This file - AI memory
- `PRD.md` - Product requirements
- `config/preseed.cfg` - Debian installer configuration
- `config/hooks/live/` - Runtime configuration hooks
- `tests/` - BATS test suite
### Open Action Items (from STATUS.md)
1. Rebuild ISO with new security features
2. Logout/login for libvirt access (user action)
3. Run VM boot tests after ISO rebuild
4. Remove hardcoded passwords from preseed.cfg
5. Consider Secure Boot implementation
### Session Decision
**Next step**: Rebuild ISO to include FIM, audit logging, SSH client-only changes.
This is a 60-90 minute build. User should decide if they want to start it now.
### ADR-008: ISO Rebuild Priority
**Date**: 2026-02-17
**Status**: Proposed
**Context**: 6 commits with security features made since last ISO build. Need to decide
whether to rebuild now or continue development.
**Options**:
1. Rebuild now - validates features, enables runtime testing
2. Continue development - batch more changes, rebuild later
**Recommendation**: Rebuild now. Features are ready, compliance requires verification.
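One rough way to detect a stale ISO is to list source files modified after the image was built. The sketch below uses a hypothetical helper name (`stale_sources`) and file modification times only. A fresh git checkout resets those times, so comparing commits against the build date is more reliable where history is available.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper listing files newer than the ISO.
# Usage: stale_sources ISO DIR...
stale_sources() {
    local iso="$1"
    shift
    find "$@" -type f -newer "$iso" | LC_ALL=C sort
}

# Example (the ISO path is an assumption):
#   stale_sources output/knel-football.iso src config
```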
---
## Entry 2026-02-17 (Session 2): FIM, Audit, SSH Security Enhancements
### Context
Continued session focused on closing compliance gaps for CIS, FedRAMP, and CMMC.
Added File Integrity Monitoring (FIM), comprehensive audit logging, and SSH client-only
configuration. Resolved all shellcheck warnings and added git safety documentation.
### Changes Implemented
1. **File Integrity Monitoring (AIDE)**
- Added `config/hooks/live/aide-setup.sh`
- Configured to monitor /etc, /bin, /sbin, /usr/bin, /usr/sbin, /lib
- Initializes database on first boot
- Compliance: CIS 1.4, FedRAMP AU-7, CMMC AU.3.059
2. **Comprehensive Audit Logging**
- Added `config/hooks/live/audit-logging.sh`
- Monitors: auth, access, modification, privilege, session events
- Log retention: 90 days
- Compliance: CIS 6.2, FedRAMP AU-2, CMMC AU.2.042
3. **SSH Client-Only Configuration**
- Modified `config/hooks/live/ssh-client-only.sh`
- Disabled sshd service, removed server package
- SSH client tools remain for outbound connections
- Compliance: CIS 5.2, NIST 800-53 IA-5, CMMC IA.2.078
4. **Shellcheck Fixes**
- Resolved all warnings in shell scripts
- SC2120/SC2119: Functions called without arguments (correct behavior)
- SC1091: Source files exist at runtime
- SC2034: Variables used in templates
- Result: ZERO shellcheck warnings
5. **Git Safety Rules**
- Added to AGENTS.md:
- Quote all path arguments (handles spaces)
- Avoid interactive rebase where possible (plain `git rebase` is already non-interactive and there is no `--no-interactive` flag; use `-i` with care)
- Destructive operations require user confirmation
### Test Coverage Update
```
Before Session: 31 tests
After Session: 111 tests (+80)
Unit Tests: 12 → 12 (unchanged)
Integration Tests: 6 → 6 (unchanged)
Security Tests: 13 → 44 (+31)
System Tests: 0 → 47 (+47, new category)
```
### Architectural Decision Records
#### ADR-005: File Integrity Monitoring via AIDE
**Date**: 2026-02-17
**Status**: Accepted
**Context**: Need file integrity monitoring for compliance (CIS 1.4, FedRAMP AU-7).
**Decision**: Use AIDE (Advanced Intrusion Detection Environment) with focused monitoring
of critical system directories.
**Rationale**:
- AIDE is mature, well-supported on Debian
- Lightweight compared to commercial alternatives
- Meets multiple compliance requirements
- Database can be rebuilt if needed
**Consequences**:
- Initial database creation on first boot (minor delay)
- Regular checks recommended via cron
- False positives if system packages updated legitimately
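The focused monitoring described in ADR-005 could be generated by a function in the project's "functions generate config" style. This is a sketch under assumptions: the function name `write_aide_rules`, the group name `CRITICAL`, and the chosen attribute set are illustrative and are not taken from `aide-setup.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical generator for an AIDE rule fragment.
# Usage: write_aide_rules OUTPUT_FILE
write_aide_rules() {
    local out="$1" dir
    {
        echo '# Attributes checked for every monitored path (assumed selection)'
        echo 'CRITICAL = p+i+n+u+g+s+m+c+sha256'
        # Directories listed in the Session 2 entry.
        for dir in /etc /bin /sbin /usr/bin /usr/sbin /lib; do
            echo "$dir CRITICAL"
        done
    } > "$out"
}
```

Keeping the directory list in one loop makes it easy to compare against the documented scope when the configuration changes.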
#### ADR-006: Comprehensive Audit via auditd
**Date**: 2026-02-17
**Status**: Accepted
**Context**: Need comprehensive audit logging for CIS 6.2, FedRAMP AU-2.
**Decision**: Use auditd with rules for all major event categories.
**Rationale**:
- auditd is the Linux standard for audit logging
- Kernel-level monitoring (cannot be bypassed by userspace)
- Structured logs for analysis
- Meets multiple compliance requirements
**Consequences**:
- Increased log volume (manageable with rotation)
- Performance impact minimal on workstation workloads
- Log retention policy required (90 days set)
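For a sense of what auditd rules for some of these categories look like, the sketch below writes a small rule file using standard `auditctl` watch syntax (`-w` path, `-p` permissions, `-k` key). The function name `write_audit_rules`, the key names, and the selection of paths are assumptions, not the contents of `audit-logging.sh`.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical generator for a few file-watch audit rules.
# Usage: write_audit_rules OUTPUT_FILE
write_audit_rules() {
    local out="$1"
    cat > "$out" <<'EOF'
# Identity and authentication files
-w /etc/passwd -p wa -k identity
-w /etc/shadow -p wa -k identity
-w /etc/group -p wa -k identity
# Privilege configuration
-w /etc/sudoers -p wa -k privilege
-w /etc/sudoers.d/ -p wa -k privilege
# Session records
-w /var/run/utmp -p wa -k session
-w /var/log/wtmp -p wa -k session
-w /var/log/btmp -p wa -k session
EOF
}
```

The `-k` keys make it possible to filter events per category later, for example with `ausearch -k identity`.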
#### ADR-007: SSH Client-Only Mode
**Date**: 2026-02-17
**Status**: Accepted
**Context**: KNEL-Football should have no inbound services.
**Decision**: Remove SSH server, keep only client tools.
**Rationale**:
- Reduces attack surface significantly
- Aligns with "outbound only" security model
- User can SSH out to other systems as needed
- No management via SSH (physical console only)
**Consequences**:
- No remote administration via SSH
- Must use physical console for management
- WireGuard outbound only, no inbound connections
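The "no inbound services" property can be spot-checked at runtime by listing listening TCP ports. The filter below is a sketch with a hypothetical name (`listening_ports`); it reads `ss -Htln` output on stdin so it can be exercised without a live system.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical filter. Reads `ss -Htln` output on stdin and
# prints the unique local ports that are in LISTEN state.
listening_ports() {
    # Field 4 is "Local Address:Port"; the port is the text after the last colon.
    awk '$1 == "LISTEN" { n = split($4, a, ":"); print a[n] }' | LC_ALL=C sort -un
}

# Example: ss -Htln | listening_ports
```

On a system that follows this ADR, port 22 should not appear in the output.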
### Lessons Learned
1. **Shellcheck Warnings Can Be Misleading**
- SC2120/SC2119 warnings were false positives
- Functions intentionally don't use arguments (generate static config)
- Used `# shellcheck disable` sparingly, documented why
2. **Compliance Requirements Overlap**
- CIS 1.4 (FIM) → FedRAMP AU-7 → CMMC AU.3.059
- Single AIDE implementation satisfies all three
- Document compliance mappings clearly
3. **Test Framework Scales Well**
- Adding 80 new tests was straightforward
- BATS + custom helpers pattern works
- System tests for VM boot require special handling (libvirt)
### Action Items for Future Sessions
1. Rebuild ISO with new security features
2. Run VM boot tests after user logout/login for libvirt
3. Verify FDE runtime behavior in VM
4. Consider Secure Boot implementation
5. Update preseed.cfg to remove hardcoded passwords
---
## Entry 2026-02-17 (Session 1): Project Assessment and Test Coverage Analysis
### Context
Comprehensive project review after session handoff. User requested full orientation
and 100% test coverage including VM boot tests, Secure Boot, and FDE runtime tests.
### Insights
1. **Test Infrastructure Pattern**
- BATS tests work well for static analysis but lack runtime verification
- Current tests validate file existence and content, not actual behavior
- Missing entire category: system/integration tests that boot the ISO
2. **Docker-Only Workflow is Correct**
- All build/test commands run inside Docker containers
- Prevents host system pollution
- Makes builds reproducible across environments
- Volumes: `/workspace` (read-only), `/build` (temp), `/output` (artifacts)
3. **Shellcheck Warnings Are Non-Critical**
- SC2120/SC2119: shellcheck reports functions that reference arguments but are called without any (no `"$@"` passed)
- SC1091: Source files not available during shellcheck (exist at runtime)
- Pattern: Functions generate config, don't need arguments
### Architectural Decision Records (ADRs)
#### ADR-001: Two-Tier Security Model
**Date**: 2026-01-28 (documented 2026-02-17)
**Status**: Accepted
**Context**: How should KNEL-Football OS access tier0 infrastructure?
**Decision**: KNEL-Football OS is a secure remote terminal, NOT direct tier0 access.
Flow: KNEL-Football OS → WireGuard VPN → Privileged Access Workstation → Tier0
**Rationale**:
- Defense in depth - multiple hops before tier0
- Compromise of laptop doesn't directly expose tier0
- WireGuard provides encrypted tunnel
- Physical workstation adds another security layer
**Consequences**:
- Network configuration focuses on WireGuard only
- WiFi/Bluetooth permanently disabled
- SSH configured for key-based auth only
#### ADR-002: Docker-Only Build Environment
**Date**: 2026-01-28 (documented 2026-02-17)
**Status**: Accepted
**Context**: How should ISO builds be executed?
**Decision**: ALL build operations run inside Docker containers. No host modifications.
**Rationale**:
- Reproducible builds across different host systems
- No pollution of host environment
- Easy cleanup (just remove containers/images)
- CI/CD friendly
**Consequences**:
- `run.sh` wraps all commands with `docker run`
- ISO build requires `--privileged` for loop devices
- Output artifacts copied via volume mounts
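A wrapper following this ADR might assemble the `docker run` invocation as sketched below. The function name `run_in_container`, the `DRY_RUN` switch, and the host directories `build/` and `output/` are assumptions. The container paths and the `--privileged` flag come from this journal.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical wrapper; run.sh may build its docker command differently.
# Usage: run_in_container IMAGE COMMAND [ARGS...]
run_in_container() {
    local image="$1"
    shift
    local -a cmd=(
        docker run --rm --privileged
        -v "$PWD:/workspace:ro"      # source code, read-only
        -v "$PWD/build:/build"       # intermediate files (host path assumed)
        -v "$PWD/output:/output"     # final artifacts (host path assumed)
        "$image" "$@"
    )
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
        # Print the command instead of executing it.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Building the command as an array keeps paths with spaces intact, in line with the quoting rule recorded for AGENTS.md.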
#### ADR-003: LUKS2 Over LUKS1
**Date**: 2026-01-28 (documented 2026-02-17)
**Status**: Accepted
**Context**: Which disk encryption format to use?
**Decision**: Use LUKS2 with Argon2id KDF, AES-256-XTS cipher, 512-bit key.
**Rationale**:
- LUKS2 is newer, more secure format
- Argon2id resists GPU/ASIC attacks better than PBKDF2
- AES-XTS is NIST-approved for disk encryption
- 512-bit key provides security margin
**Consequences**:
- Modern systems only (older GRUB versions may not support LUKS2)
- Boot requires passphrase entry
- No recovery without passphrase
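Expressed as `cryptsetup` options, the ADR-003 parameters correspond to the command line printed by the sketch below. The function name `luks2_format_cmd` is hypothetical, and the installer may apply these settings through a different mechanism. The helper only prints the command so it can be reviewed before anything touches a disk.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper that prints (does not run) a luksFormat command.
# Usage: luks2_format_cmd DEVICE
luks2_format_cmd() {
    local device="$1"
    # With XTS the 512-bit key is split into two 256-bit halves, i.e. AES-256-XTS.
    printf 'cryptsetup luksFormat --type luks2 --pbkdf argon2id --cipher aes-xts-plain64 --key-size 512 %s\n' "$device"
}
```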
#### ADR-004: BATS Without External Libraries
**Date**: 2026-01-28 (documented 2026-02-17)
**Status**: Accepted
**Context**: BATS test framework libraries were failing to load.
**Decision**: Remove bats-support, bats-assert, bats-file dependencies.
Use custom assertion functions in `tests/test_helper/common.bash`.
**Rationale**:
- External library loading was unreliable
- Custom functions provide same functionality
- Fewer dependencies = fewer failure points
- Easier to debug when tests fail
**Consequences**:
- Custom assertions must be maintained
- Tests don't benefit from upstream library fixes
- But: simpler, more predictable behavior
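Custom assertions of the kind ADR-004 describes can be very small. The two functions below are a sketch: their names and argument order are assumptions and may not match what `tests/test_helper/common.bash` actually defines.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical assertion helpers for use inside BATS tests.

# Fail with a readable message unless ACTUAL equals EXPECTED.
# Usage: assert_equal ACTUAL EXPECTED
assert_equal() {
    if [[ "$1" != "$2" ]]; then
        echo "expected: $2" >&2
        echo "actual:   $1" >&2
        return 1
    fi
}

# Fail unless FILE exists and contains a line matching the extended regex PATTERN.
# Usage: assert_file_contains FILE PATTERN
assert_file_contains() {
    local file="$1" pattern="$2"
    [[ -f "$file" ]] || { echo "missing file: $file" >&2; return 1; }
    grep -qE -- "$pattern" "$file" || {
        echo "pattern '$pattern' not found in $file" >&2
        return 1
    }
}
```

Because BATS treats any non-zero return inside a test as a failure, returning 1 with a message on stderr is enough. No external library is needed.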
### Patterns Observed
1. **Hook Organization**
- `config/hooks/live/` - Runs during live session (before install)
- `config/hooks/installed/` - Runs after installation
- Pattern: Source shared functions, call main function
2. **Script Structure**
```bash
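#!/bin/bash
set -euo pipefail
# Functions that generate config
main() {
    : # generate configuration here
}
# Call main only if the script is executed directly (not sourced)
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi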
```
3. **Test Structure**
```bash
#!/usr/bin/env bats
@test "description" {
# Setup
# Exercise
# Verify
}
```
### Lessons Learned
1. **test:iso Command Was Broken**
- `run.sh:172` references deleted `test-iso.sh`
- Commit c1505a9 removed obsolete scripts including test-iso.sh
- But run.sh was not updated to remove the command
- Lesson: When removing files, search for all references
2. **Preseed.cfg Has Hardcoded Passwords**
- Lines 28-31 contain default passwords
- These are installer defaults, should be changed on first boot
- Security risk if users don't change them
- Lesson: Consider using installer prompts instead
3. **Test Coverage Claim vs Reality**
- Documentation claimed 95% coverage
- Reality: 100% static analysis, 0% runtime/VM testing
- Lesson: Be precise about what "coverage" means
### Action Items for Future Sessions
1. Implement VM boot tests using libvirt
2. Add Secure Boot support (shim-signed, grub-efi-amd64-signed)
3. Create runtime FDE passphrase prompt tests
4. Remove hardcoded passwords from preseed.cfg
5. Fix shellcheck warnings (low priority, non-critical)
---
## Entry 2026-01-28: Initial Build Completion
### Context
First successful ISO build completed after 72 minutes.
### Insights
1. **Live-Build Stages**
- bootstrap: Downloads base system (longest stage)
- chroot: Installs packages, runs hooks
- binary: Creates ISO filesystem
- checksum: Generates SHA256/MD5
2. **Build Time Breakdown**
- Total: ~72 minutes
- bootstrap: ~40 minutes (network dependent)
- chroot: ~20 minutes
- binary: ~10 minutes
3. **ISO Size**
- Final ISO: 450 MB
- Includes: Debian base, IceWM, WireGuard, security tools
- Reasonable size for secure workstation
### Patterns
1. **Docker Volume Strategy**
- `/workspace` mounted read-only (source code)
- `/build` for intermediate files
- `/output` for final artifacts
- Prevents accidental modification of source
2. **Checksum Generation**
- Generate both SHA256 and MD5
- Name checksum files after ISO
- Copy to output directory with ISO
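The checksum pattern can be sketched with the coreutils tools. The function name `write_checksums` is hypothetical. Running the tools from inside the ISO's directory keeps bare file names in the checksum files, so they still verify after being copied alongside the ISO.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helper that writes NAME.sha256 and NAME.md5 next to the ISO.
# Usage: write_checksums PATH_TO_ISO
write_checksums() {
    local iso="$1" dir base
    dir="$(dirname "$iso")"
    base="$(basename "$iso")"
    # Subshell so the caller's working directory is unchanged.
    ( cd "$dir" && sha256sum "$base" > "$base.sha256" && md5sum "$base" > "$base.md5" )
}

# Verify later from the same directory:
#   sha256sum -c NAME.iso.sha256 && md5sum -c NAME.iso.md5
```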
---
*End of Journal. Add new entries at the top.*