KNEL-Football Development Journal
Important: This file is APPEND-ONLY. Never delete or modify existing entries. Add new entries at the TOP (after this header) with date and context. This serves as long-term memory for AI agents and human developers.
Entry 2026-02-19 (Session 5): Critical Bug Fixes
Context
Resumed the session after a context overflow. Deep orientation revealed critical bugs in the security-hardening.sh hook that were blocking FIM and SSH client configuration.
Changes Implemented
- Bug Fix: Function Name Mismatch
  - config/hooks/live/security-hardening.sh:19 called configure_ssh
  - But src/security-hardening.sh defines configure_ssh_client
  - Fixed: Changed hook to call configure_ssh_client
- Bug Fix: Missing FIM Call
  - configure_fim function existed in src/security-hardening.sh
  - But hook was never calling it
  - Fixed: Added configure_fim call to hook
Root Cause Analysis
Commit 0807611 "feat: add FIM, comprehensive audit logging, SSH client-only" added
functions to src/security-hardening.sh, but the corresponding hook was not kept in sync:
- It was not updated to call the new function (configure_fim)
- It called the wrong function name (configure_ssh instead of configure_ssh_client)
This is a common pattern in codebase consolidation: when adding features to source files, remember to update ALL callers (hooks, scripts, tests).
Lessons Learned
- Cross-Reference Source and Callers
  - When adding functions, search for ALL callers
  - grep -r function_name config/ to find hooks
  - Test execution paths, not just function existence
- Documentation vs Reality Gap
  - JOURNAL.md said "FIM ADDED" but hook never called it
  - STATUS.md said "SSH client-only CONFIGURED" but wrong function name
  - Lesson: Verify code execution, not just code presence
Verification
./run.sh lint # ✅ Zero warnings
./run.sh test # ✅ 92 pass, 19 skip (VM tests)
Action Items
- Rebuild ISO with bug fixes (in progress)
- Update STATUS.md with accurate state
- Consider adding hook validation tests
⚠️ PERMANENT LESSONS FOR FUTURE SESSIONS
These mistakes have happened multiple times. DO NOT repeat them.
- When Adding/Modifying Functions: ALWAYS Update All Callers
  - Pattern: Function added to src/*.sh but hook in config/hooks/ not updated
  - Prevention: After editing src/security-hardening.sh, immediately run:
    grep -r "configure_ssh\|configure_fim\|configure_audit" config/hooks/
  - Test: Run ./run.sh test before committing - don't just assume it works
- Documentation Claims Must Match Code Reality
  - Pattern: JOURNAL says "ADDED" but hook never calls the function
  - Prevention: After implementing a feature, verify execution path:
    # For each new function in src/:
    # 1. Find where it should be called
    # 2. Add the call
    # 3. Test that it runs
  - Never trust docs without code verification
- Cross-Reference Before Committing
  - This project has: src/*.sh → config/hooks/**/*.sh → executed during build
  - Any change to source files requires checking ALL downstream callers
  - Use grep -r "function_name" . liberally
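The grep-based cross-check in these lessons could be automated roughly as follows. This is a minimal sketch, not the project's actual tooling: the helper names list_functions, find_uncalled_functions, and find_undefined_calls are hypothetical, and it assumes functions are declared as `name() {` at the start of a line and that hook-callable functions share the configure_ prefix seen in this entry. Matching is textual, so a mention inside a comment counts as a call.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the repository.

# Print the name of every function declared as `name() {` in file $1.
list_functions() {
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\)[[:space:]]*()[[:space:]]*{.*/\1/p' "$1"
}

# Print functions defined in file $1 that nothing under directory $2 mentions.
# Whole-word matching (-w) keeps configure_ssh from matching
# configure_ssh_client. Returns non-zero if any are found.
find_uncalled_functions() {
    local src_file="$1" hooks_dir="$2" fn uncalled=""
    for fn in $(list_functions "$src_file"); do
        grep -rqw -- "$fn" "$hooks_dir" || uncalled="${uncalled}${fn}
"
    done
    printf '%s' "$uncalled"
    [ -z "$uncalled" ]
}

# Print configure_* names used under directory $2 that file $1 does not define.
# Returns non-zero if any are found.
find_undefined_calls() {
    local src_file="$1" hooks_dir="$2" name undefined=""
    for name in $(grep -rhow 'configure_[A-Za-z0-9_]*' "$hooks_dir" | sort -u); do
        list_functions "$src_file" | grep -qx -- "$name" || undefined="${undefined}${name}
"
    done
    printf '%s' "$undefined"
    [ -z "$undefined" ]
}
```

A possible invocation would be `find_undefined_calls src/security-hardening.sh config/hooks/`. A hook calling configure_ssh while the source defines only configure_ssh_client should be reported by that check.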
Entry 2026-02-17 (Session 4): Script Consolidation
Context
Continued session focused on consolidating all top-level scripts into run.sh as the single entry point. Merged test-iso.sh (344 lines) and monitor-build.sh (43 lines) into run.sh.
Changes Implemented
- Script Consolidation
  - Merged test-iso.sh VM testing framework into run.sh
  - Merged monitor-build.sh build monitoring into run.sh
  - Deleted test-iso.sh and monitor-build.sh
  - run.sh now ~500+ lines, single entry point for all operations
- New run.sh Commands
    ./run.sh monitor [secs]        # Monitor build progress
    ./run.sh test:iso check        # Check VM testing prerequisites
    ./run.sh test:iso create       # Create and start test VM
    ./run.sh test:iso console      # Connect to VM console
    ./run.sh test:iso status       # Show VM status
    ./run.sh test:iso destroy      # Destroy VM and cleanup
    ./run.sh test:iso boot-test    # Run automated boot test
    ./run.sh test:iso secure-boot  # Test Secure Boot
    ./run.sh test:iso fde-test     # Test FDE passphrase prompt
- Test Updates
  - Updated tests/system/boot_test.bats to test run.sh instead of test-iso.sh
  - Updated skip messages in fde_test.bats and secureboot_test.bats
- ISO Rebuild
  - Built successfully at 15:19 CST (449 MB)
  - Checksums verified (SHA256, MD5)
Architectural Decision Records
ADR-009: Single Entry Point (run.sh)
Date: 2026-02-17 Status: Accepted
Context: Multiple top-level scripts (run.sh, test-iso.sh, monitor-build.sh) caused fragmentation and made the project harder to navigate.
Decision: Consolidate all scripts into run.sh as the single entry point.
Rationale:
- Simpler user experience - one command to remember
- Consistent interface for all operations
- Easier to maintain and test
- Follows Unix philosophy of doing one thing well
Consequences:
- run.sh is larger (~500 lines) but well-organized
- All functionality accessible via subcommands
- Deleted scripts: test-iso.sh, monitor-build.sh
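The single-entry-point decision implies some form of subcommand dispatch inside run.sh. The journal does not show how run.sh does this, so the following is only a sketch of one common shape: the function names dispatch, cmd_monitor, and cmd_test_iso are hypothetical, the handlers only echo what they would do, and the 30-second default interval is an assumption.

```shell
#!/bin/bash
# Hypothetical handlers; the real run.sh bodies are not shown in this journal.

cmd_monitor() {
    # Default interval of 30s is an assumption for illustration.
    echo "monitor: polling every ${1:-30}s"
}

cmd_test_iso() {
    case "${1:-}" in
        check|create|console|status|destroy|boot-test|secure-boot|fde-test)
            echo "test:iso $1"
            ;;
        *)
            echo "usage: run.sh test:iso <action>" >&2
            return 2
            ;;
    esac
}

# Route the first argument to its handler and pass the rest through.
dispatch() {
    local cmd="${1:-}"
    [ "$#" -gt 0 ] && shift
    case "$cmd" in
        monitor)  cmd_monitor "$@" ;;
        test:iso) cmd_test_iso "$@" ;;
        *)
            echo "unknown command: ${cmd:-<none>}" >&2
            return 2
            ;;
    esac
}
```

With this layout a script would end in `dispatch "$@"`, and each merged script becomes one handler function rather than a separate file.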
Lessons Learned
- VM Testing Requires libvirt Group
  - virt-install fails if user not in libvirt group
  - QEMU fallback works but virt-install preferred for libvirt integration
  - Fix: sudo usermod -aG libvirt $USER then logout/login
- Test Updates Required After Script Moves
  - When moving/deleting scripts, grep for all references
  - Tests in tests/system/ referenced test-iso.sh directly
  - Updated to use run.sh test:iso commands
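The libvirt group requirement can be checked before attempting VM tests. The sketch below is an illustration only: session_in_group is a hypothetical helper name. It inspects the groups of the current session rather than the group database, which is why it would keep reporting "missing" after usermod until the user logs out and back in.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the current session belongs to group $1.
# `id -nG` with no user argument reports the groups of the running process,
# so a freshly added group only shows up after a new login.
session_in_group() {
    local group="$1"
    id -nG 2>/dev/null | tr ' ' '\n' | grep -qx -- "$group"
}

if session_in_group libvirt; then
    echo "libvirt group: OK"
else
    echo "libvirt group: missing (sudo usermod -aG libvirt \"\$USER\", then log out and back in)"
fi
```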
Files Changed
| File | Action |
|---|---|
| run.sh | Merged test-iso.sh and monitor-build.sh |
| test-iso.sh | DELETED |
| monitor-build.sh | DELETED |
| tests/system/boot_test.bats | Updated to test run.sh |
| tests/system/fde_test.bats | Updated skip message |
| tests/system/secureboot_test.bats | Updated skip message |
| STATUS.md | Updated status to COMPLETE |
| JOURNAL.md | This entry |
Commit
d9f2f02 refactor: consolidate test-iso.sh and monitor-build.sh into run.sh
Entry 2026-02-17 (Session 3): Project Re-Orientation
Context
New session start. User requested deep project review and orientation. Reviewed git logs, STATUS.md, JOURNAL.md, and current system state.
Current State Assessment
- ISO Status: STALE
  - Built: 2026-02-17 10:50
  - 6 commits since build (FIM, audit, SSH client-only, shellcheck fixes)
  - Missing features: AIDE FIM, comprehensive auditd, SSH client-only
  - Rebuild required to include recent security features
- Test Suite: HEALTHY
  - 111 tests total, 92 pass, 19 skip (VM-required)
  - Skip reasons: VM not running, requires manual verification
  - Categories: unit (12), integration (6), security (44), system (47)
  - Zero failures, zero shellcheck warnings
- Compliance: IN PROGRESS
  - CIS 1.4 (FIM): Code ready, not in ISO
  - CIS 5.2 (SSH): Code ready, not in ISO
  - CIS 6.2 (Audit): Code ready, not in ISO
  - NIST/FedRAMP/CMMC: Same status - config ready, needs rebuild
- Blockers:
  - User NOT in libvirt group (blocks VM testing)
  - ISO outdated (blocks runtime verification)
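The "STALE" assessment compares the ISO build time against later commits. A check of that kind might look like the sketch below. The names iso_is_stale and check_iso are hypothetical, the comparison uses the file's modification time as a stand-in for the build time, and `stat -c %Y` assumes GNU coreutils.

```shell
#!/bin/bash
# Hypothetical helpers; not part of run.sh.

# Succeeds when the newest commit (epoch $2) is later than the ISO (epoch $1).
iso_is_stale() {
    local iso_epoch="$1" commit_epoch="$2"
    [ "$commit_epoch" -gt "$iso_epoch" ]
}

# Read the ISO mtime and the HEAD committer time, then report.
check_iso() {
    local iso="$1" iso_epoch commit_epoch
    iso_epoch="$(stat -c %Y "$iso")" || return 2
    commit_epoch="$(git log -1 --format=%ct)" || return 2
    if iso_is_stale "$iso_epoch" "$commit_epoch"; then
        echo "STALE: HEAD is newer than $iso"
    else
        echo "FRESH: $iso is newer than HEAD"
    fi
}
```

A newer modification time does not prove the ISO contains a given feature. It only rules out the case where commits landed after the build.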
Architecture Review
KNEL-Football OS (this project)
│ WireGuard (outbound only)
▼
Privileged Access Workstation
│ Direct access
▼
Tier0 Infrastructure
Key design principle: No inbound services. SSH client, RDP client, WireGuard client only.
Security Features Implemented (Code)
| Feature | File | Status |
|---|---|---|
| Full Disk Encryption | config/hooks/installed/encryption-*.sh | ✅ Code ready |
| Password Policy | src/security-hardening.sh | ✅ Code ready |
| Firewall (nftables) | config/hooks/live/firewall-setup.sh | ✅ Code ready |
| FIM (AIDE) | config/hooks/live/aide-setup.sh | ✅ Code ready |
| Audit Logging | config/hooks/live/audit-logging.sh | ✅ Code ready |
| SSH Client-Only | config/hooks/live/ssh-client-only.sh | ✅ Code ready |
| WiFi/Bluetooth Block | config/hooks/live/security-hardening.sh | ✅ Code ready |
Key Files to Understand
- run.sh - Main entry point for all operations
- AGENTS.md - Agent behavior guidelines (READ FIRST)
- STATUS.md - Manager status report
- JOURNAL.md - This file - AI memory
- PRD.md - Product requirements
- config/preseed.cfg - Debian installer configuration
- config/hooks/live/ - Runtime configuration hooks
- tests/ - BATS test suite
Open Action Items (from STATUS.md)
- Rebuild ISO with new security features
- Logout/login for libvirt access (user action)
- Run VM boot tests after ISO rebuild
- Remove hardcoded passwords from preseed.cfg
- Consider Secure Boot implementation
Session Decision
Next step: Rebuild ISO to include FIM, audit logging, SSH client-only changes. This is a 60-90 minute build. User should decide if they want to start it now.
ADR-008: ISO Rebuild Priority
Date: 2026-02-17 Status: Proposed
Context: 6 commits with security features made since last ISO build. Need to decide whether to rebuild now or continue development.
Options:
- Rebuild now - validates features, enables runtime testing
- Continue development - batch more changes, rebuild later
Recommendation: Rebuild now. Features are ready, compliance requires verification.
Entry 2026-02-17 (Session 2): FIM, Audit, SSH Security Enhancements
Context
Continued session focused on closing compliance gaps for CIS, FedRAMP, and CMMC. Added File Integrity Monitoring (FIM), comprehensive audit logging, and SSH client-only configuration. Resolved all shellcheck warnings and added git safety documentation.
Changes Implemented
- File Integrity Monitoring (AIDE)
  - Added config/hooks/live/aide-setup.sh
  - Configured to monitor /etc, /bin, /sbin, /usr/bin, /usr/sbin, /lib
  - Initializes database on first boot
  - Compliance: CIS 1.4, FedRAMP AU-7, CMMC AU.3.059
- Comprehensive Audit Logging
  - Added config/hooks/live/audit-logging.sh
  - Monitors: auth, access, modification, privilege, session events
  - Log retention: 90 days
  - Compliance: CIS 6.2, FedRAMP AU-2, CMMC AU.2.042
- SSH Client-Only Configuration
  - Modified config/hooks/live/ssh-client-only.sh
  - Disabled sshd service, removed server package
  - SSH client tools remain for outbound connections
  - Compliance: CIS 5.2, NIST 800-53 IA-5, CMMC IA.2.078
- Shellcheck Fixes
  - Resolved all warnings in shell scripts
  - SC2120/SC2119: Functions called without arguments (correct behavior)
  - SC1091: Source files exist at runtime
  - SC2034: Variables used in templates
  - Result: ZERO shellcheck warnings
- Git Safety Rules
  - Added to AGENTS.md:
    - Quote all path arguments (handles spaces)
    - Use non-interactive rebase (git rebase --no-interactive not available, use -i with care)
    - Destructive operations require user confirmation
Test Coverage Update
Before Session: 31 tests
After Session: 111 tests (+80)
Unit Tests: 12 → 12 (unchanged)
Integration Tests: 6 → 6 (unchanged)
Security Tests: 13 → 44 (+31)
System Tests: 0 → 47 (+47, new category)
Architectural Decision Records
ADR-005: File Integrity Monitoring via AIDE
Date: 2026-02-17 Status: Accepted
Context: Need file integrity monitoring for compliance (CIS 1.4, FedRAMP AU-7).
Decision: Use AIDE (Advanced Intrusion Detection Environment) with focused monitoring of critical system directories.
Rationale:
- AIDE is mature, well-supported on Debian
- Lightweight compared to commercial alternatives
- Meets multiple compliance requirements
- Database can be rebuilt if needed
Consequences:
- Initial database creation on first boot (minor delay)
- Regular checks recommended via cron
- False positives if system packages updated legitimately
ADR-006: Comprehensive Audit via auditd
Date: 2026-02-17 Status: Accepted
Context: Need comprehensive audit logging for CIS 6.2, FedRAMP AU-2.
Decision: Use auditd with rules for all major event categories.
Rationale:
- auditd is the Linux standard for audit logging
- Kernel-level monitoring (cannot be bypassed by userspace)
- Structured logs for analysis
- Meets multiple compliance requirements
Consequences:
- Increased log volume (manageable with rotation)
- Performance impact minimal on workstation workloads
- Log retention policy required (90 days set)
ADR-007: SSH Client-Only Mode
Date: 2026-02-17 Status: Accepted
Context: KNEL-Football should have no inbound services.
Decision: Remove SSH server, keep only client tools.
Rationale:
- Reduces attack surface significantly
- Aligns with "outbound only" security model
- User can SSH out to other systems as needed
- No management via SSH (physical console only)
Consequences:
- No remote administration via SSH
- Must use physical console for management
- WireGuard outbound only, no inbound connections
Lessons Learned
- Shellcheck Warnings Can Be Misleading
  - SC2120/SC2119 warnings were false positives
  - Functions intentionally don't use arguments (generate static config)
  - Used # shellcheck disable sparingly, documented why
- Compliance Requirements Overlap
  - CIS 1.4 (FIM) → FedRAMP AU-7 → CMMC AU.3.059
  - Single AIDE implementation satisfies all three
  - Document compliance mappings clearly
- Test Framework Scales Well
  - Adding 80 new tests was straightforward
  - BATS + custom helpers pattern works
  - System tests for VM boot require special handling (libvirt)
Action Items for Future Sessions
- Rebuild ISO with new security features
- Run VM boot tests after user logout/login for libvirt
- Verify FDE runtime behavior in VM
- Consider Secure Boot implementation
- Update preseed.cfg to remove hardcoded passwords
Entry 2026-02-17 (Session 1): Project Assessment and Test Coverage Analysis
Context
Comprehensive project review after session handoff. User requested full orientation and 100% test coverage including VM boot tests, Secure Boot, and FDE runtime tests.
Insights
- Test Infrastructure Pattern
  - BATS tests work well for static analysis but lack runtime verification
  - Current tests validate file existence and content, not actual behavior
  - Missing entire category: system/integration tests that boot the ISO
- Docker-Only Workflow is Correct
  - All build/test commands run inside Docker containers
  - Prevents host system pollution
  - Makes builds reproducible across environments
  - Volumes: /workspace (read-only), /build (temp), /output (artifacts)
- Shellcheck Warnings Are Non-Critical
  - SC2120/SC2119: Functions don't use arguments but called without "$@"
  - SC1091: Source files not available during shellcheck (exist at runtime)
  - Pattern: Functions generate config, don't need arguments
Architectural Decision Records (ADRs)
ADR-001: Two-Tier Security Model
Date: 2026-01-28 (documented 2026-02-17) Status: Accepted
Context: How should KNEL-Football OS access tier0 infrastructure?
Decision: KNEL-Football OS is a secure remote terminal, NOT direct tier0 access. Flow: KNEL-Football OS → WireGuard VPN → Privileged Access Workstation → Tier0
Rationale:
- Defense in depth - multiple hops before tier0
- Compromise of laptop doesn't directly expose tier0
- WireGuard provides encrypted tunnel
- Physical workstation adds another security layer
Consequences:
- Network configuration focuses on WireGuard only
- WiFi/Bluetooth permanently disabled
- SSH configured for key-based auth only
ADR-002: Docker-Only Build Environment
Date: 2026-01-28 (documented 2026-02-17) Status: Accepted
Context: How should ISO builds be executed?
Decision: ALL build operations run inside Docker containers. No host modifications.
Rationale:
- Reproducible builds across different host systems
- No pollution of host environment
- Easy cleanup (just remove containers/images)
- CI/CD friendly
Consequences:
- run.sh wraps all commands with docker run
- ISO build requires --privileged for loop devices
- Output artifacts copied via volume mounts
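The wrapping described in these consequences might look roughly like the sketch below. It is an assumption-laden illustration, not the actual run.sh code: run_in_container is a hypothetical name, as are the IMAGE, BUILD_DIR, OUTPUT_DIR, and DRY_RUN variables and their defaults. The sketch always passes --privileged for brevity, whereas the ADR only requires it for the ISO build.

```shell
#!/bin/bash
# Hypothetical wrapper; image name and host directories are assumptions.
IMAGE="${IMAGE:-knel-football-build}"

# Run "$@" inside the build container with the three volumes from this ADR:
# /workspace read-only, /build for intermediates, /output for artifacts.
# With DRY_RUN=1 the command is printed instead of executed.
run_in_container() {
    set -- docker run --rm --privileged \
        -v "$PWD:/workspace:ro" \
        -v "${BUILD_DIR:-$PWD/tmp}:/build" \
        -v "${OUTPUT_DIR:-$PWD/output}:/output" \
        "$IMAGE" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 run_in_container lb build` prints the full docker command without starting a container. The `lb build` argument here is only illustrative.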
ADR-003: LUKS2 Over LUKS1
Date: 2026-01-28 (documented 2026-02-17) Status: Accepted
Context: Which disk encryption format to use?
Decision: Use LUKS2 with Argon2id KDF, AES-256-XTS cipher, 512-bit key.
Rationale:
- LUKS2 is newer, more secure format
- Argon2id resists GPU/ASIC attacks better than PBKDF2
- AES-XTS is NIST-approved for disk encryption
- 512-bit key provides security margin
Consequences:
- Modern systems only (older grub may not support)
- Boot requires passphrase entry
- No recovery without passphrase
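Expressed as a cryptsetup invocation, the parameters in this decision would look something like the sketch below. The journal does not show how the installer actually applies them, so this is an illustration only: luks_format_cmd is a hypothetical helper that prints the command rather than running it, since luksFormat destroys data on the target device. In XTS mode the 512-bit key is split into two 256-bit halves, which is what makes this AES-256-XTS.

```shell
#!/bin/bash
# Hypothetical helper: print (do not run) a luksFormat command matching ADR-003.
# The device path is supplied by the caller; nothing here touches a disk.
luks_format_cmd() {
    local device="$1"
    printf '%s\n' "cryptsetup luksFormat --type luks2 --cipher aes-xts-plain64 --key-size 512 --pbkdf argon2id $device"
}
```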
ADR-004: BATS Without External Libraries
Date: 2026-01-28 (documented 2026-02-17) Status: Accepted
Context: BATS test framework libraries were failing to load.
Decision: Remove bats-support, bats-assert, bats-file dependencies.
Use custom assertion functions in tests/test_helper/common.bash.
Rationale:
- External library loading was unreliable
- Custom functions provide same functionality
- Fewer dependencies = fewer failure points
- Easier to debug when tests fail
Consequences:
- Custom assertions must be maintained
- Tests don't benefit from upstream library fixes
- But: simpler, more predictable behavior
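Custom assertions of the kind this ADR describes can be quite small. The contents of tests/test_helper/common.bash are not shown in this journal, so the function names and messages below are hypothetical stand-ins for whatever the project actually defines.

```shell
#!/bin/bash
# Hypothetical assertion helpers in the spirit of ADR-004.
# Each returns non-zero and explains the failure on stderr.

# assert_equal ACTUAL EXPECTED
assert_equal() {
    if [ "$1" != "$2" ]; then
        echo "expected: $2" >&2
        echo "actual:   $1" >&2
        return 1
    fi
}

# assert_file_exists PATH
assert_file_exists() {
    [ -f "$1" ] || { echo "missing file: $1" >&2; return 1; }
}

# assert_file_contains PATH PATTERN (basic regular expression)
assert_file_contains() {
    grep -q -- "$2" "$1" || { echo "pattern '$2' not found in $1" >&2; return 1; }
}
```

Because BATS fails a test when a command in it returns non-zero, plain functions like these can be called directly inside an @test block.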
Patterns Observed
- Hook Organization
  - config/hooks/live/ - Runs during live session (before install)
  - config/hooks/installed/ - Runs after installation
  - Pattern: Source shared functions, call main function
- Script Structure
    #!/bin/bash
    set -euo pipefail
    # Functions that generate config
    main() { ... }
    # Call main if script executed directly
- Test Structure
    #!/usr/bin/env bats
    @test "description" {
      # Setup
      # Exercise
      # Verify
    }
Lessons Learned
- test:iso Command Was Broken
  - run.sh:172 references deleted test-iso.sh
  - Commit c1505a9 removed obsolete scripts including test-iso.sh
  - But run.sh was not updated to remove the command
  - Lesson: When removing files, search for all references
- Preseed.cfg Has Hardcoded Passwords
  - Lines 28-31 contain default passwords
  - These are installer defaults, should be changed on first boot
  - Security risk if users don't change them
  - Lesson: Consider using installer prompts instead
- Test Coverage Claim vs Reality
  - Documentation claimed 95% coverage
  - Reality: 100% static analysis, 0% runtime/VM testing
  - Lesson: Be precise about what "coverage" means
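The "search for all references" lesson can be reduced to a one-function check to run before deleting a script. This is a sketch under assumptions: find_stale_references is a hypothetical name, and `--exclude-dir` relies on GNU grep.

```shell
#!/bin/bash
# Hypothetical helper: list every line under $2 (default: current directory)
# that still mentions the file name $1. Exit status is grep's: 0 when
# references remain, 1 when none are found.
find_stale_references() {
    local removed="$1" root="${2:-.}"
    grep -rnF --exclude-dir=.git -- "$removed" "$root"
}
```

For example, `find_stale_references test-iso.sh` run from the repository root would be expected to list any remaining mentions with their line numbers. Documentation files such as this journal will also appear in the output and need to be read past.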
Action Items for Future Sessions
- Implement VM boot tests using libvirt
- Add Secure Boot support (shim-signed, grub-efi-amd64-signed)
- Create runtime FDE passphrase prompt tests
- Remove hardcoded passwords from preseed.cfg
- Fix shellcheck warnings (low priority, non-critical)
Entry 2026-01-28: Initial Build Completion
Context
First successful ISO build completed after 72 minutes.
Insights
- Live-Build Stages
  - bootstrap: Downloads base system (longest stage)
  - chroot: Installs packages, runs hooks
  - binary: Creates ISO filesystem
  - checksum: Generates SHA256/MD5
- Build Time Breakdown
  - Total: ~72 minutes
  - bootstrap: ~40 minutes (network dependent)
  - chroot: ~20 minutes
  - binary: ~10 minutes
- ISO Size
  - Final ISO: 450 MB
  - Includes: Debian base, IceWM, WireGuard, security tools
  - Reasonable size for secure workstation
Patterns
- Docker Volume Strategy
  - /workspace mounted read-only (source code)
  - /build for intermediate files
  - /output for final artifacts
  - Prevents accidental modification of source
- Checksum Generation
  - Generate both SHA256 and MD5
  - Name checksum files after ISO
  - Copy to output directory with ISO
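The checksum pattern can be sketched as two small functions. The names write_checksums and verify_checksums are hypothetical, and the .sha256 and .md5 suffixes are an assumed reading of "name checksum files after ISO". The journal does not state the exact file names the build uses.

```shell
#!/bin/bash
# Hypothetical helpers; file naming (<iso>.sha256, <iso>.md5) is an assumption.

# Write both checksum files next to the ISO. Running from the ISO's directory
# keeps bare file names in the checksum files, so they stay valid after the
# ISO and its checksums are copied elsewhere together.
write_checksums() {
    local iso="$1" dir name
    dir="$(dirname "$iso")"
    name="$(basename "$iso")"
    ( cd "$dir" && sha256sum "$name" > "$name.sha256" && md5sum "$name" > "$name.md5" )
}

# Re-check the ISO against both files; non-zero exit on any mismatch.
verify_checksums() {
    local iso="$1" dir name
    dir="$(dirname "$iso")"
    name="$(basename "$iso")"
    ( cd "$dir" && sha256sum -c "$name.sha256" >/dev/null && md5sum -c "$name.md5" >/dev/null )
}
```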
End of Journal. Add new entries at the top.