football/JOURNAL.md
Charles N Wyble 747d58e6ec docs: add permanent lessons to JOURNAL.md to prevent recurring mistakes
Added explicit "PERMANENT LESSONS FOR FUTURE SESSIONS" section documenting:
1. Always update callers when modifying source functions
2. Verify documentation matches code reality
3. Cross-reference before committing

These patterns have caused bugs multiple times.

💘 Generated with Crush

Assisted-by: GLM-4.7 via Crush <crush@charm.land>
2026-02-19 09:19:22 -05:00

KNEL-Football Development Journal

Important: This file is APPEND-ONLY. Never delete or modify existing entries. Add new entries at the TOP (after this header) with date and context. This serves as long-term memory for AI agents and human developers.


Entry 2026-02-19 (Session 5): Critical Bug Fixes

Context

Resumed session after context overflow. Deep orientation revealed critical bugs in the security-hardening.sh hook that were blocking FIM and SSH client configuration.

Changes Implemented

  1. Bug Fix: Function Name Mismatch

    • config/hooks/live/security-hardening.sh:19 called configure_ssh
    • But src/security-hardening.sh defines configure_ssh_client
    • Fixed: Changed hook to call configure_ssh_client
  2. Bug Fix: Missing FIM Call

    • configure_fim function existed in src/security-hardening.sh
    • But hook was never calling it
    • Fixed: Added configure_fim call to hook

Root Cause Analysis

Commit 0807611 "feat: add FIM, comprehensive audit logging, SSH client-only" added functions to src/security-hardening.sh but the corresponding hook was either:

  • Not updated to call new functions (configure_fim)
  • Calling wrong function name (configure_ssh vs configure_ssh_client)

This is a common pattern in codebase consolidation: when adding features to source files, remember to update ALL callers (hooks, scripts, tests).

Lessons Learned

  1. Cross-Reference Source and Callers

    • When adding functions, search for ALL callers
    • grep -r function_name config/ to find hooks
    • Test execution paths, not just function existence
  2. Documentation vs Reality Gap

    • JOURNAL.md said "FIM ADDED" but hook never called it
    • STATUS.md said "SSH client-only CONFIGURED" but wrong function name
    • Lesson: Verify code execution, not just code presence
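
The caller search described above can be automated. The following is a minimal sketch, assuming functions are declared as `name() {` at the start of a line; the helper names `list_functions` and `find_uncalled` are hypothetical and not part of the project.

```shell
#!/bin/bash
# Sketch: report functions defined in a source file that no hook mentions.
# Assumption: functions are declared as `name() {` at the start of a line.

# list_functions FILE: print the name of each function defined in FILE.
list_functions() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*()[[:space:]]*{.*/\1/p' "$1"
}

# find_uncalled SRC_FILE HOOK_DIR: print functions with no mention under HOOK_DIR.
find_uncalled() {
    local src_file="$1" hook_dir="$2" fn
    list_functions "$src_file" | while IFS= read -r fn; do
        # -w matches whole words only, so a call to configure_ssh
        # does not count as a call to configure_ssh_client.
        if ! grep -rqw -- "$fn" "$hook_dir"; then
            echo "$fn"
        fi
    done
}
```

A call such as `find_uncalled src/security-hardening.sh config/hooks/` would list candidates to review. The check is textual, so a function name that appears only in a comment still counts as called.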

Verification

./run.sh lint    # ✅ Zero warnings
./run.sh test    # ✅ 92 pass, 19 skip (VM tests)

Action Items

  1. Rebuild ISO with bug fixes (in progress)
  2. Update STATUS.md with accurate state
  3. Consider adding hook validation tests

⚠️ PERMANENT LESSONS FOR FUTURE SESSIONS

These mistakes have happened multiple times. DO NOT repeat them.

  1. When Adding/Modifying Functions: ALWAYS Update All Callers

    • Pattern: Function added to src/*.sh but hook in config/hooks/ not updated
    • Prevention: After editing src/security-hardening.sh, immediately run:
      grep -r "configure_ssh\|configure_fim\|configure_audit" config/hooks/
      
    • Test: Run ./run.sh test before committing - don't just assume it works
  2. Documentation Claims Must Match Code Reality

    • Pattern: JOURNAL says "ADDED" but hook never calls the function
    • Prevention: After implementing a feature, verify execution path:
      # For each new function in src/:
      # 1. Find where it should be called
      # 2. Add the call
      # 3. Test that it runs
      
    • Never trust docs without code verification
  3. Cross-Reference Before Committing

    • This project has: src/*.sh → config/hooks/**/*.sh → executed during build
    • Any change to source files requires checking ALL downstream callers
    • Use grep -r "function_name" . liberally
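
Beyond grep, a hook can check at run time that every function it is about to call actually exists, which turns a name mismatch such as configure_ssh vs configure_ssh_client into an explicit error. This is a sketch only; `require_functions` is a hypothetical helper, and the location the hook sources the shared file from is an assumption.

```shell
#!/bin/bash
# Sketch: fail fast in a hook when an expected function is not defined.

# require_functions NAME...: return 1 and name every missing function.
require_functions() {
    local fn missing=0
    for fn in "$@"; do
        # declare -F succeeds only if NAME is a defined shell function
        if ! declare -F "$fn" >/dev/null; then
            echo "ERROR: function not defined: $fn" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example hook body (the source path is a placeholder):
#   source /path/to/security-hardening.sh
#   require_functions configure_ssh_client configure_fim || exit 1
#   configure_ssh_client
#   configure_fim
```

The guard lists all missing names in one pass, so a single build attempt reveals every mismatch instead of stopping at the first one.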

Entry 2026-02-17 (Session 4): Script Consolidation

Context

Continued session focused on consolidating all top-level scripts into run.sh as the single entry point. Merged test-iso.sh (344 lines) and monitor-build.sh (43 lines) into run.sh.

Changes Implemented

  1. Script Consolidation

    • Merged test-iso.sh VM testing framework into run.sh
    • Merged monitor-build.sh build monitoring into run.sh
    • Deleted test-iso.sh and monitor-build.sh
    • run.sh is now 500+ lines and the single entry point for all operations
  2. New run.sh Commands

    ./run.sh monitor [secs]          # Monitor build progress
    ./run.sh test:iso check          # Check VM testing prerequisites
    ./run.sh test:iso create         # Create and start test VM
    ./run.sh test:iso console        # Connect to VM console
    ./run.sh test:iso status         # Show VM status
    ./run.sh test:iso destroy        # Destroy VM and cleanup
    ./run.sh test:iso boot-test      # Run automated boot test
    ./run.sh test:iso secure-boot    # Test Secure Boot
    ./run.sh test:iso fde-test       # Test FDE passphrase prompt
    
  3. Test Updates

    • Updated tests/system/boot_test.bats to test run.sh instead of test-iso.sh
    • Updated skip messages in fde_test.bats and secureboot_test.bats
  4. ISO Rebuild

    • Built successfully at 15:19 CST (449 MB)
    • Checksums verified (SHA256, MD5)
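
One way to route the subcommands listed above through a single script is a nested `case` dispatch. The sketch below is illustrative and not taken from run.sh: the function names, the echo-only handlers, and the 30-second default interval are all assumptions.

```shell
#!/bin/bash
# Sketch: dispatch "test:iso <action>" style subcommands from one entry point.
# Handlers only echo here; real ones would start or inspect the test VM.

test_iso() {
    local action="${1:-}"
    case "$action" in
        check|create|console|status|destroy|boot-test|secure-boot|fde-test)
            echo "test:iso -> $action"
            ;;
        *)
            echo "Usage: run.sh test:iso {check|create|console|status|destroy|boot-test|secure-boot|fde-test}" >&2
            return 2
            ;;
    esac
}

dispatch() {
    local cmd="${1:-help}"
    if [ "$#" -gt 0 ]; then shift; fi
    case "$cmd" in
        monitor)  echo "monitor every ${1:-30}s" ;;   # default interval is hypothetical
        test:iso) test_iso "$@" ;;
        *)        echo "Unknown command: $cmd" >&2; return 2 ;;
    esac
}
```

With this shape, `dispatch "$@"` at the bottom of the script is the only top-level call, and each new command is one more `case` arm.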

Architectural Decision Records

ADR-009: Single Entry Point (run.sh)

Date: 2026-02-17 Status: Accepted

Context: Multiple top-level scripts (run.sh, test-iso.sh, monitor-build.sh) caused fragmentation and made the project harder to navigate.

Decision: Consolidate all scripts into run.sh as the single entry point.

Rationale:

  • Simpler user experience - one command to remember
  • Consistent interface for all operations
  • Easier to maintain and test
  • Follows Unix philosophy of doing one thing well

Consequences:

  • run.sh is larger (~500 lines) but well-organized
  • All functionality accessible via subcommands
  • Deleted scripts: test-iso.sh, monitor-build.sh

Lessons Learned

  1. VM Testing Requires libvirt Group

    • virt-install fails if the user is not in the libvirt group
    • QEMU fallback works, but virt-install is preferred for libvirt integration
    • Fix: sudo usermod -aG libvirt "$USER", then log out and back in
  2. Test Updates Required After Script Moves

    • When moving/deleting scripts, grep for all references
    • Tests in tests/system/ referenced test-iso.sh directly
    • Updated to use run.sh test:iso commands
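
The libvirt group requirement can be checked up front rather than discovered when virt-install fails. A minimal sketch, assuming hypothetical helper names `has_group` and `check_libvirt_access`; the optional group-list argument exists only to make the check testable.

```shell
#!/bin/bash
# Sketch: check libvirt group membership before attempting virt-install.

# has_group GROUP [GROUP_LIST]: succeed if GROUP appears in GROUP_LIST
# (defaults to the current user's groups as reported by `id -nG`).
has_group() {
    local group="$1" group_list="${2-$(id -nG)}" g
    for g in $group_list; do
        if [ "$g" = "$group" ]; then
            return 0
        fi
    done
    return 1
}

# check_libvirt_access [GROUP_LIST]: print a hint instead of a late failure.
check_libvirt_access() {
    if has_group libvirt "$@"; then
        echo "OK: user is in the libvirt group"
    else
        echo "Missing libvirt group; run: sudo usermod -aG libvirt \"\$USER\" and log out/in"
        return 1
    fi
}
```

Group membership added with usermod only takes effect in new login sessions, which is why the hint repeats the log out/in step.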

Files Changed

| File | Action |
|------|--------|
| run.sh | Merged test-iso.sh and monitor-build.sh |
| test-iso.sh | DELETED |
| monitor-build.sh | DELETED |
| tests/system/boot_test.bats | Updated to test run.sh |
| tests/system/fde_test.bats | Updated skip message |
| tests/system/secureboot_test.bats | Updated skip message |
| STATUS.md | Updated status to COMPLETE |
| JOURNAL.md | This entry |

Commit

d9f2f02 refactor: consolidate test-iso.sh and monitor-build.sh into run.sh

Entry 2026-02-17 (Session 3): Project Re-Orientation

Context

New session start. User requested deep project review and orientation. Reviewed git logs, STATUS.md, JOURNAL.md, and current system state.

Current State Assessment

  1. ISO Status: STALE

    • Built: 2026-02-17 10:50
    • 6 commits since build (FIM, audit, SSH client-only, shellcheck fixes)
    • Missing features: AIDE FIM, comprehensive auditd, SSH client-only
    • Rebuild required to include recent security features
  2. Test Suite: HEALTHY

    • 111 tests total, 92 pass, 19 skip (VM-required)
    • Skip reasons: VM not running, requires manual verification
    • Categories: unit (12), integration (6), security (44), system (47)
    • Zero failures, zero shellcheck warnings
  3. Compliance: IN PROGRESS

    • CIS 1.4 (FIM): Code ready, not in ISO
    • CIS 5.2 (SSH): Code ready, not in ISO
    • CIS 6.2 (Audit): Code ready, not in ISO
    • NIST/FedRAMP/CMMC: Same status - config ready, needs rebuild
  4. Blockers:

    • User NOT in libvirt group (blocks VM testing)
    • ISO outdated (blocks runtime verification)

Architecture Review

KNEL-Football OS (this project)
    │ WireGuard (outbound only)
    ▼
Privileged Access Workstation
    │ Direct access
    ▼
Tier0 Infrastructure

Key design principle: No inbound services. SSH client, RDP client, WireGuard client only.

Security Features Implemented (Code)

| Feature | File | Status |
|---------|------|--------|
| Full Disk Encryption | config/hooks/installed/encryption-*.sh | Code ready |
| Password Policy | src/security-hardening.sh | Code ready |
| Firewall (nftables) | config/hooks/live/firewall-setup.sh | Code ready |
| FIM (AIDE) | config/hooks/live/aide-setup.sh | Code ready |
| Audit Logging | config/hooks/live/audit-logging.sh | Code ready |
| SSH Client-Only | config/hooks/live/ssh-client-only.sh | Code ready |
| WiFi/Bluetooth Block | config/hooks/live/security-hardening.sh | Code ready |

Key Files to Understand

  • run.sh - Main entry point for all operations
  • AGENTS.md - Agent behavior guidelines (READ FIRST)
  • STATUS.md - Manager status report
  • JOURNAL.md - This file - AI memory
  • PRD.md - Product requirements
  • config/preseed.cfg - Debian installer configuration
  • config/hooks/live/ - Runtime configuration hooks
  • tests/ - BATS test suite

Open Action Items (from STATUS.md)

  1. Rebuild ISO with new security features
  2. Logout/login for libvirt access (user action)
  3. Run VM boot tests after ISO rebuild
  4. Remove hardcoded passwords from preseed.cfg
  5. Consider Secure Boot implementation

Session Decision

Next step: Rebuild ISO to include FIM, audit logging, SSH client-only changes. This is a 60-90 minute build. User should decide if they want to start it now.

ADR-008: ISO Rebuild Priority

Date: 2026-02-17 Status: Proposed

Context: 6 commits with security features made since last ISO build. Need to decide whether to rebuild now or continue development.

Options:

  1. Rebuild now - validates features, enables runtime testing
  2. Continue development - batch more changes, rebuild later

Recommendation: Rebuild now. Features are ready, compliance requires verification.


Entry 2026-02-17 (Session 2): FIM, Audit, SSH Security Enhancements

Context

Continued session focused on closing compliance gaps for CIS, FedRAMP, and CMMC. Added File Integrity Monitoring (FIM), comprehensive audit logging, and SSH client-only configuration. Resolved all shellcheck warnings and added git safety documentation.

Changes Implemented

  1. File Integrity Monitoring (AIDE)

    • Added config/hooks/live/aide-setup.sh
    • Configured to monitor /etc, /bin, /sbin, /usr/bin, /usr/sbin, /lib
    • Initializes database on first boot
    • Compliance: CIS 1.4, FedRAMP AU-7, CMMC AU.3.059
  2. Comprehensive Audit Logging

    • Added config/hooks/live/audit-logging.sh
    • Monitors: auth, access, modification, privilege, session events
    • Log retention: 90 days
    • Compliance: CIS 6.2, FedRAMP AU-2, CMMC AU.2.042
  3. SSH Client-Only Configuration

    • Modified config/hooks/live/ssh-client-only.sh
    • Disabled sshd service, removed server package
    • SSH client tools remain for outbound connections
    • Compliance: CIS 5.2, NIST 800-53 IA-5, CMMC IA.2.078
  4. Shellcheck Fixes

    • Resolved all warnings in shell scripts
    • SC2120/SC2119: Functions called without arguments (correct behavior)
    • SC1091: Source files exist at runtime
    • SC2034: Variables used in templates
    • Result: ZERO shellcheck warnings
  5. Git Safety Rules

    • Added to AGENTS.md:
      • Quote all path arguments (handles spaces)
      • Prefer non-interactive rebase (git rebase has no --no-interactive flag, so use -i with care)
      • Destructive operations require user confirmation

Test Coverage Update

Before Session: 31 tests
After Session:  111 tests (+80)

Unit Tests:        12 → 12 (unchanged)
Integration Tests:  6 →  6 (unchanged)
Security Tests:    13 → 44 (+31)
System Tests:       0 → 47 (+47, new category)

Architectural Decision Records

ADR-005: File Integrity Monitoring via AIDE

Date: 2026-02-17 Status: Accepted

Context: Need file integrity monitoring for compliance (CIS 1.4, FedRAMP AU-7).

Decision: Use AIDE (Advanced Intrusion Detection Environment) with focused monitoring of critical system directories.

Rationale:

  • AIDE is mature, well-supported on Debian
  • Lightweight compared to commercial alternatives
  • Meets multiple compliance requirements
  • Database can be rebuilt if needed

Consequences:

  • Initial database creation on first boot (minor delay)
  • Regular checks recommended via cron
  • False positives if system packages updated legitimately

ADR-006: Comprehensive Audit via auditd

Date: 2026-02-17 Status: Accepted

Context: Need comprehensive audit logging for CIS 6.2, FedRAMP AU-2.

Decision: Use auditd with rules for all major event categories.

Rationale:

  • auditd is the Linux standard for audit logging
  • Kernel-level monitoring (cannot be bypassed by userspace)
  • Structured logs for analysis
  • Meets multiple compliance requirements

Consequences:

  • Increased log volume (manageable with rotation)
  • Performance impact minimal on workstation workloads
  • Log retention policy required (90 days set)
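
For illustration, auditd watch rules follow the form `-w <path> -p <permissions> -k <key>`. The sketch below generates a few such rules grouped by event category; the specific paths and key names are assumptions chosen for the example and are not the project's actual rule set.

```shell
#!/bin/bash
# Sketch: emit auditd watch rules grouped by event category.
# Paths and key names are illustrative assumptions.

# watch_rule PATH PERMS KEY: print one rule in audit.rules syntax.
watch_rule() {
    printf -- '-w %s -p %s -k %s\n' "$1" "$2" "$3"
}

generate_audit_rules() {
    watch_rule /etc/passwd   wa auth        # account database changes
    watch_rule /etc/shadow   wa auth        # password hash changes
    watch_rule /etc/sudoers  wa privilege   # sudo policy changes
    watch_rule /var/log/wtmp wa session     # login/logout records
    watch_rule /var/log/btmp wa session     # failed login records
}
```

The output could be redirected to a file under /etc/audit/rules.d/ (file name of your choosing) and loaded with `augenrules --load`.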

ADR-007: SSH Client-Only Mode

Date: 2026-02-17 Status: Accepted

Context: KNEL-Football should have no inbound services.

Decision: Remove SSH server, keep only client tools.

Rationale:

  • Reduces attack surface significantly
  • Aligns with "outbound only" security model
  • User can SSH out to other systems as needed
  • No management via SSH (physical console only)

Consequences:

  • No remote administration via SSH
  • Must use physical console for management
  • WireGuard outbound only, no inbound connections

Lessons Learned

  1. Shellcheck Warnings Can Be Misleading

    • SC2120/SC2119 warnings were false positives
    • Functions intentionally don't use arguments (generate static config)
    • Used # shellcheck disable sparingly, documented why
  2. Compliance Requirements Overlap

    • CIS 1.4 (FIM) → FedRAMP AU-7 → CMMC AU.3.059
    • Single AIDE implementation satisfies all three
    • Document compliance mappings clearly
  3. Test Framework Scales Well

    • Adding 80 new tests was straightforward
    • BATS + custom helpers pattern works
    • System tests for VM boot require special handling (libvirt)
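
As an illustration of the first lesson, a suppression can sit directly above the function or call it applies to, with the reason written next to it. This is a sketch: `generate_banner` and its default value are hypothetical, and whether ShellCheck reports SC2120 for this exact shape may depend on the ShellCheck version.

```shell
#!/bin/bash
# Sketch: a config generator with an optional argument and documented
# ShellCheck suppressions.

# Reason: $1 is optional and every caller relies on the default.
# shellcheck disable=SC2120
generate_banner() {
    local org="${1:-KNEL}"   # default value is a placeholder
    printf 'Authorized use only (%s)\n' "$org"
}

# Reason: intentionally called without arguments.
# shellcheck disable=SC2119
generate_banner
```

Scoping the directive to a single function or call keeps the rest of the file under full lint coverage.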

Action Items for Future Sessions

  1. Rebuild ISO with new security features
  2. Run VM boot tests after user logout/login for libvirt
  3. Verify FDE runtime behavior in VM
  4. Consider Secure Boot implementation
  5. Update preseed.cfg to remove hardcoded passwords

Entry 2026-02-17 (Session 1): Project Assessment and Test Coverage Analysis

Context

Comprehensive project review after session handoff. User requested full orientation and 100% test coverage including VM boot tests, Secure Boot, and FDE runtime tests.

Insights

  1. Test Infrastructure Pattern

    • BATS tests work well for static analysis but lack runtime verification
    • Current tests validate file existence and content, not actual behavior
    • Missing entire category: system/integration tests that boot the ISO
  2. Docker-Only Workflow is Correct

    • All build/test commands run inside Docker containers
    • Prevents host system pollution
    • Makes builds reproducible across environments
    • Volumes: /workspace (read-only), /build (temp), /output (artifacts)
  3. Shellcheck Warnings Are Non-Critical

    • SC2120/SC2119: Functions are called without arguments (no "$@" is passed)
    • SC1091: Source files not available during shellcheck (exist at runtime)
    • Pattern: Functions generate config, don't need arguments

Architectural Decision Records (ADRs)

ADR-001: Two-Tier Security Model

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: How should KNEL-Football OS access tier0 infrastructure?

Decision: KNEL-Football OS is a secure remote terminal, NOT direct tier0 access. Flow: KNEL-Football OS → WireGuard VPN → Privileged Access Workstation → Tier0

Rationale:

  • Defense in depth - multiple hops before tier0
  • Compromise of laptop doesn't directly expose tier0
  • WireGuard provides encrypted tunnel
  • Physical workstation adds another security layer

Consequences:

  • Network configuration focuses on WireGuard only
  • WiFi/Bluetooth permanently disabled
  • SSH configured for key-based auth only

ADR-002: Docker-Only Build Environment

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: How should ISO builds be executed?

Decision: ALL build operations run inside Docker containers. No host modifications.

Rationale:

  • Reproducible builds across different host systems
  • No pollution of host environment
  • Easy cleanup (just remove containers/images)
  • CI/CD friendly

Consequences:

  • run.sh wraps all commands with docker run
  • ISO build requires --privileged for loop devices
  • Output artifacts copied via volume mounts
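
A wrapper along these lines would satisfy the consequences above. It is a sketch that prints the docker command instead of running it; the image name and the host-side `tmp` and `output` directories are placeholders, and the container paths follow the volume layout described in this journal.

```shell
#!/bin/bash
# Sketch: build the docker run command line without executing it.
# IMAGE and the host directory names are placeholders (assumptions).

IMAGE="${IMAGE:-knel-football-builder}"

# docker_cmd [--privileged] COMMAND...: print the docker invocation.
docker_cmd() {
    local args=(docker run --rm)
    if [ "${1:-}" = "--privileged" ]; then
        args+=(--privileged)          # ISO builds need loop devices
        shift
    fi
    args+=(
        -v "$PWD:/workspace:ro"       # source, mounted read-only
        -v "$PWD/tmp:/build"          # intermediate files
        -v "$PWD/output:/output"      # final artifacts
        "$IMAGE" "$@"
    )
    echo "${args[*]}"
}
```

Requesting `--privileged` only for the build step keeps lint and test runs unprivileged.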

ADR-003: LUKS2 Over LUKS1

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: Which disk encryption format to use?

Decision: Use LUKS2 with Argon2id KDF, AES-256-XTS cipher, 512-bit key.

Rationale:

  • LUKS2 is newer, more secure format
  • Argon2id resists GPU/ASIC attacks better than PBKDF2
  • AES-XTS is NIST-approved for disk encryption
  • 512-bit key provides security margin

Consequences:

  • Modern systems only (older GRUB versions may not support LUKS2)
  • Boot requires passphrase entry
  • No recovery without passphrase
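
Expressed as cryptsetup options, the decision above would look roughly like the following. The sketch prints the command rather than running it, because luksFormat destroys existing data; the device path is a placeholder, and how the installer actually applies these settings is not shown in this journal.

```shell
#!/bin/bash
# Sketch: print (not run) a luksFormat command matching ADR-003.
# The device argument is a placeholder; luksFormat is destructive.

luks_format_cmd() {
    local device="$1"
    echo cryptsetup luksFormat \
        --type luks2 \
        --cipher aes-xts-plain64 \
        --key-size 512 \
        --pbkdf argon2id \
        "$device"
}
```

On an installed system, `cryptsetup luksDump <device>` reports the LUKS version, cipher, and PBKDF, which gives a way to confirm the settings at runtime.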

ADR-004: BATS Without External Libraries

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: BATS test framework libraries were failing to load.

Decision: Remove bats-support, bats-assert, bats-file dependencies. Use custom assertion functions in tests/test_helper/common.bash.

Rationale:

  • External library loading was unreliable
  • Custom functions provide same functionality
  • Fewer dependencies = fewer failure points
  • Easier to debug when tests fail

Consequences:

  • Custom assertions must be maintained
  • Tests don't benefit from upstream library fixes
  • But: simpler, more predictable behavior
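
Dependency-free assertions of this kind can be very small. The sketch below shows two; the names `assert_equal` and `assert_file_contains` are assumptions, since the actual contents of tests/test_helper/common.bash are not shown in this journal.

```shell
#!/bin/bash
# Sketch of assertion helpers that need no external BATS libraries.

# assert_equal EXPECTED ACTUAL: fail with a diff-style message on mismatch.
assert_equal() {
    if [ "$1" != "$2" ]; then
        echo "expected: $1" >&2
        echo "actual:   $2" >&2
        return 1
    fi
}

# assert_file_contains FILE STRING: fail if the fixed STRING is absent.
assert_file_contains() {
    if ! grep -qF -- "$2" "$1"; then
        echo "string not found in $1: $2" >&2
        return 1
    fi
}
```

A non-zero return from either helper fails the enclosing BATS test, and the message on stderr shows up in the failure output.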

Patterns Observed

  1. Hook Organization

    • config/hooks/live/ - Runs during live session (before install)
    • config/hooks/installed/ - Runs after installation
    • Pattern: Source shared functions, call main function
  2. Script Structure

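    #!/bin/bash
    set -euo pipefail
    # Functions that generate config
    main() {
        :  # generate config here
    }
    # Call main only when the script is executed directly, not sourced
    if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
        main "$@"
    fi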
    
  3. Test Structure

    #!/usr/bin/env bats
    @test "description" {
        # Setup
        # Exercise (placeholder command)
        run true
        # Verify
        [ "$status" -eq 0 ]
    }
    

Lessons Learned

  1. test:iso Command Was Broken

    • run.sh:172 references deleted test-iso.sh
    • Commit c1505a9 removed obsolete scripts including test-iso.sh
    • But run.sh was not updated to remove the command
    • Lesson: When removing files, search for all references
  2. Preseed.cfg Has Hardcoded Passwords

    • Lines 28-31 contain default passwords
    • These are installer defaults, should be changed on first boot
    • Security risk if users don't change them
    • Lesson: Consider using installer prompts instead
  3. Test Coverage Claim vs Reality

    • Documentation claimed 95% coverage
    • Reality: 100% static analysis, 0% runtime/VM testing
    • Lesson: Be precise about what "coverage" means

Action Items for Future Sessions

  1. Implement VM boot tests using libvirt
  2. Add Secure Boot support (shim-signed, grub-efi-amd64-signed)
  3. Create runtime FDE passphrase prompt tests
  4. Remove hardcoded passwords from preseed.cfg
  5. Fix shellcheck warnings (low priority, non-critical)

Entry 2026-01-28: Initial Build Completion

Context

First successful ISO build completed after 72 minutes.

Insights

  1. Live-Build Stages

    • bootstrap: Downloads base system (longest stage)
    • chroot: Installs packages, runs hooks
    • binary: Creates ISO filesystem
    • checksum: Generates SHA256/MD5
  2. Build Time Breakdown

    • Total: ~72 minutes
    • bootstrap: ~40 minutes (network dependent)
    • chroot: ~20 minutes
    • binary: ~10 minutes
  3. ISO Size

    • Final ISO: 450 MB
    • Includes: Debian base, IceWM, WireGuard, security tools
    • Reasonable size for secure workstation

Patterns

  1. Docker Volume Strategy

    • /workspace mounted read-only (source code)
    • /build for intermediate files
    • /output for final artifacts
    • Prevents accidental modification of source
  2. Checksum Generation

    • Generate both SHA256 and MD5
    • Name checksum files after ISO
    • Copy to output directory with ISO
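
The checksum pattern can be sketched as follows, using the GNU coreutils `sha256sum` and `md5sum` tools. The `.sha256` and `.md5` suffixes and the function names are assumptions; the journal only states that checksum files are named after the ISO.

```shell
#!/bin/bash
# Sketch: write and verify SHA256/MD5 checksum files named after the ISO.

# write_checksums ISO: create ISO.sha256 and ISO.md5 next to the ISO.
write_checksums() {
    local iso="$1" dir name
    dir=$(dirname -- "$iso")
    name=$(basename -- "$iso")
    # Work inside the ISO's directory so the files record a relative name
    # and stay valid after being copied elsewhere together with the ISO.
    (
        cd -- "$dir" || exit 1
        sha256sum -- "$name" > "$name.sha256" &&
        md5sum    -- "$name" > "$name.md5"
    )
}

# verify_checksums ISO: succeed only if both checksum files match.
verify_checksums() {
    local iso="$1" dir name
    dir=$(dirname -- "$iso")
    name=$(basename -- "$iso")
    (
        cd -- "$dir" || exit 1
        sha256sum -c --quiet -- "$name.sha256" &&
        md5sum    -c --quiet -- "$name.md5"
    )
}
```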

End of Journal. Add new entries at the top.