Charles N Wyble 747d58e6ec docs: add permanent lessons to JOURNAL.md to prevent recurring mistakes

Added explicit "PERMANENT LESSONS FOR FUTURE SESSIONS" section documenting:
1. Always update callers when modifying source functions
2. Verify documentation matches code reality
3. Cross-reference before committing

These patterns have caused bugs multiple times.

💘 Generated with Crush

Assisted-by: GLM-4.7 via Crush <crush@charm.land>

2026-02-19 09:19:22 -05:00


KNEL-Football Development Journal

Important: This file is APPEND-ONLY. Never delete or modify existing entries. Add new entries at the TOP (after this header) with date and context. This serves as long-term memory for AI agents and human developers.

Entry 2026-02-19 (Session 5): Critical Bug Fixes

Context

Resumed session after context overflow. Deep orientation revealed critical bugs in security-hardening.sh hook that were blocking FIM and SSH client configuration.

Changes Implemented

Bug Fix: Function Name Mismatch
- config/hooks/live/security-hardening.sh:19 called configure_ssh
- But src/security-hardening.sh defines configure_ssh_client
- Fixed: Changed hook to call configure_ssh_client
Bug Fix: Missing FIM Call
- configure_fim function existed in src/security-hardening.sh
- But hook was never calling it
- Fixed: Added configure_fim call to hook

Root Cause Analysis

Commit 0807611 "feat: add FIM, comprehensive audit logging, SSH client-only" added functions to src/security-hardening.sh, but the corresponding hook fell out of sync in two ways:

- Not updated to call the new function (configure_fim)
- Calling the wrong function name (configure_ssh instead of configure_ssh_client)

This is a common pattern in codebase consolidation: when adding features to source files, remember to update ALL callers (hooks, scripts, tests).
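
The two failure modes described above (a hook calling an undefined name, and a defined function never being called) can be checked mechanically. The following is a minimal sketch, not the project's actual tooling: it assumes functions are declared as `name() {` and share the `configure_` prefix, and the helper names `defined_functions`, `called_functions`, and `check_hook_sync` are hypothetical.

```shell
#!/bin/bash
# Sketch: report mismatches between configure_* functions defined in a
# source file and configure_* names referenced in a hook.
# Assumption: functions are declared as `name() {` at the start of a line.

# List configure_* functions defined in a source file.
defined_functions() {
    grep -oE '^[[:space:]]*configure_[A-Za-z0-9_]+[[:space:]]*\(\)' "$1" \
        | grep -oE 'configure_[A-Za-z0-9_]+' | sort -u
}

# List configure_* names referenced in a hook file (comment lines ignored).
called_functions() {
    grep -vE '^[[:space:]]*#' "$1" \
        | grep -oE 'configure_[A-Za-z0-9_]+' | sort -u
}

# Print every mismatch; return non-zero if any were found.
check_hook_sync() {
    local src="$1" hook="$2" status=0 name
    for name in $(called_functions "$hook"); do
        if ! defined_functions "$src" | grep -Fxq "$name"; then
            echo "UNDEFINED: $hook calls $name, not defined in $src"
            status=1
        fi
    done
    for name in $(defined_functions "$src"); do
        if ! called_functions "$hook" | grep -Fxq "$name"; then
            echo "UNCALLED: $src defines $name, never called in $hook"
            status=1
        fi
    done
    return "$status"
}
```

Run as `check_hook_sync src/security-hardening.sh config/hooks/live/security-hardening.sh`, a check like this would have flagged both bugs from this session. Functions that are intentionally left uncalled would need an allow-list.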

Lessons Learned

Cross-Reference Source and Callers
- When adding functions, search for ALL callers
- grep -r function_name config/ to find hooks
- Test execution paths, not just function existence
Documentation vs Reality Gap
- JOURNAL.md said "FIM ADDED" but hook never called it
- STATUS.md said "SSH client-only CONFIGURED" but wrong function name
- Lesson: Verify code execution, not just code presence

Verification

./run.sh lint    # ✅ Zero warnings
./run.sh test    # ✅ 92 pass, 19 skip (VM tests)

Action Items

Rebuild ISO with bug fixes (in progress)
Update STATUS.md with accurate state
Consider adding hook validation tests

⚠️ PERMANENT LESSONS FOR FUTURE SESSIONS

These mistakes have happened multiple times. DO NOT repeat them.

When Adding/Modifying Functions: ALWAYS Update All Callers
- Pattern: Function added to src/*.sh but hook in config/hooks/ not updated
- Prevention: After editing src/security-hardening.sh, immediately run:
```
grep -r "configure_ssh\|configure_fim\|configure_audit" config/hooks/
```
- Test: Run ./run.sh test before committing - don't just assume it works
Documentation Claims Must Match Code Reality
- Pattern: JOURNAL says "ADDED" but hook never calls the function
- Prevention: After implementing a feature, verify execution path:
```
# For each new function in src/:
# 1. Find where it should be called
# 2. Add the call
# 3. Test that it runs
```
- Never trust docs without code verification
Cross-Reference Before Committing
- This project has: src/*.sh → config/hooks/**/*.sh → executed during build
- Any change to source files requires checking ALL downstream callers
- Use grep -r "function_name" . liberally
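
One caveat when grepping for callers: a plain `grep -r "configure_ssh"` also matches `configure_ssh_client`, so a stale call can hide behind a longer name. A small sketch of a whole-word search follows; the helper name `find_callers` is hypothetical, and `--include` assumes GNU or BSD grep.

```shell
#!/bin/bash
# find_callers NAME DIR
# List lines in DIR's *.sh files that reference NAME as a whole word.
# -w treats letters, digits, and underscores as word characters, so
# "configure_ssh" does not match inside "configure_ssh_client".
find_callers() {
    grep -rnw --include='*.sh' -e "$1" "$2"
}

# Example (not executed here):
#   find_callers configure_ssh_client config/hooks/
```

An empty result with exit status 1 means no caller references that exact name.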

Entry 2026-02-17 (Session 4): Script Consolidation

Context

Continued session focused on consolidating all top-level scripts into run.sh as the single entry point. Merged test-iso.sh (344 lines) and monitor-build.sh (43 lines) into run.sh.

Changes Implemented

Script Consolidation
- Merged test-iso.sh VM testing framework into run.sh
- Merged monitor-build.sh build monitoring into run.sh
- Deleted test-iso.sh and monitor-build.sh
- run.sh is now 500+ lines and the single entry point for all operations

New run.sh Commands

./run.sh monitor [secs]          # Monitor build progress
./run.sh test:iso check          # Check VM testing prerequisites
./run.sh test:iso create         # Create and start test VM
./run.sh test:iso console        # Connect to VM console
./run.sh test:iso status         # Show VM status
./run.sh test:iso destroy        # Destroy VM and cleanup
./run.sh test:iso boot-test      # Run automated boot test
./run.sh test:iso secure-boot    # Test Secure Boot
./run.sh test:iso fde-test       # Test FDE passphrase prompt

Test Updates
- Updated tests/system/boot_test.bats to test run.sh instead of test-iso.sh
- Updated skip messages in fde_test.bats and secureboot_test.bats
ISO Rebuild
- Built successfully at 15:19 CST (449 MB)
- Checksums verified (SHA256, MD5)

Architectural Decision Records

ADR-009: Single Entry Point (run.sh)

Date: 2026-02-17 Status: Accepted

Context: Multiple top-level scripts (run.sh, test-iso.sh, monitor-build.sh) caused fragmentation and made the project harder to navigate.

Decision: Consolidate all scripts into run.sh as the single entry point.

Rationale:

Simpler user experience - one command to remember
Consistent interface for all operations
Easier to maintain and test
Follows Unix philosophy of doing one thing well

Consequences:

run.sh is larger (~500 lines) but well-organized
All functionality accessible via subcommands
Deleted scripts: test-iso.sh, monitor-build.sh
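
For illustration, the single-entry-point decision can be sketched as a `case`-based dispatcher. This is a simplified sketch, not the actual contents of run.sh: the handler names, the stub bodies, and the default monitor interval of 5 seconds are assumptions.

```shell
#!/bin/bash
# Sketch of a subcommand dispatcher for a single entry-point script.
# Handlers are stubs that only echo what they would do.

cmd_monitor() {
    # Default interval is an assumption for this sketch.
    echo "monitor: interval=${1:-5}s"
}

cmd_test_iso() {
    case "${1:-}" in
        check|create|console|status|destroy|boot-test|secure-boot|fde-test)
            echo "test:iso: $1" ;;
        *)
            echo "usage: run.sh test:iso <subcommand>" >&2
            return 2 ;;
    esac
}

dispatch() {
    local cmd="${1:-}"
    if [ "$#" -gt 0 ]; then shift; fi
    case "$cmd" in
        monitor)  cmd_monitor "$@" ;;
        test:iso) cmd_test_iso "$@" ;;
        *)
            echo "unknown command: ${cmd:-<none>}" >&2
            return 2 ;;
    esac
}
```

Keeping each subcommand in its own function lets the file grow without the dispatch logic itself becoming harder to read.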

Lessons Learned

VM Testing Requires libvirt Group
- virt-install fails if user not in libvirt group
- QEMU fallback works but virt-install preferred for libvirt integration
- Fix: sudo usermod -aG libvirt $USER then logout/login
Test Updates Required After Script Moves
- When moving/deleting scripts, grep for all references
- Tests in tests/system/ referenced test-iso.sh directly
- Updated to use run.sh test:iso commands
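
The libvirt group requirement can be checked before attempting virt-install. The sketch below is one possible prerequisite check; the function names are hypothetical. Calling `id -nG` without a user name reports the groups of the current session, so it also catches the case where the user was added to the group but has not logged out and back in yet.

```shell
#!/bin/bash
# has_group GROUP "LIST"
# Succeed if GROUP appears in the space-separated LIST.
has_group() {
    local g
    for g in $2; do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Check the current session and print the fix from this entry if needed.
check_libvirt_access() {
    if has_group libvirt "$(id -nG)"; then
        echo "libvirt group: OK"
    else
        echo "libvirt group: missing (sudo usermod -aG libvirt \"\$USER\", then log out and back in)"
        return 1
    fi
}
```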

Files Changed

| File | Action |
|------|--------|
| run.sh | Merged test-iso.sh and monitor-build.sh |
| test-iso.sh | DELETED |
| monitor-build.sh | DELETED |
| tests/system/boot_test.bats | Updated to test run.sh |
| tests/system/fde_test.bats | Updated skip message |
| tests/system/secureboot_test.bats | Updated skip message |
| STATUS.md | Updated status to COMPLETE |
| JOURNAL.md | This entry |

Commit

d9f2f02 refactor: consolidate test-iso.sh and monitor-build.sh into run.sh

Entry 2026-02-17 (Session 3): Project Re-Orientation

Context

New session start. User requested deep project review and orientation. Reviewed git logs, STATUS.md, JOURNAL.md, and current system state.

Current State Assessment

ISO Status: STALE
- Built: 2026-02-17 10:50
- 6 commits since build (FIM, audit, SSH client-only, shellcheck fixes)
- Missing features: AIDE FIM, comprehensive auditd, SSH client-only
- Rebuild required to include recent security features
Test Suite: HEALTHY
- 111 tests total, 92 pass, 19 skip (VM-required)
- Skip reasons: VM not running, requires manual verification
- Categories: unit (12), integration (6), security (44), system (47)
- Zero failures, zero shellcheck warnings
Compliance: IN PROGRESS
- CIS 1.4 (FIM): Code ready, not in ISO
- CIS 5.2 (SSH): Code ready, not in ISO
- CIS 6.2 (Audit): Code ready, not in ISO
- NIST/FedRAMP/CMMC: Same status - config ready, needs rebuild
Blockers:
- User NOT in libvirt group (blocks VM testing)
- ISO outdated (blocks runtime verification)

Architecture Review

KNEL-Football OS (this project)
    │ WireGuard (outbound only)
    ▼
Privileged Access Workstation
    │ Direct access
    ▼
Tier0 Infrastructure

Key design principle: No inbound services. SSH client, RDP client, WireGuard client only.

Security Features Implemented (Code)

| Feature | File | Status |
|---------|------|--------|
| Full Disk Encryption | config/hooks/installed/encryption-*.sh | ✅ Code ready |
| Password Policy | src/security-hardening.sh | ✅ Code ready |
| Firewall (nftables) | config/hooks/live/firewall-setup.sh | ✅ Code ready |
| FIM (AIDE) | config/hooks/live/aide-setup.sh | ✅ Code ready |
| Audit Logging | config/hooks/live/audit-logging.sh | ✅ Code ready |
| SSH Client-Only | config/hooks/live/ssh-client-only.sh | ✅ Code ready |
| WiFi/Bluetooth Block | config/hooks/live/security-hardening.sh | ✅ Code ready |

Key Files to Understand

run.sh - Main entry point for all operations
AGENTS.md - Agent behavior guidelines (READ FIRST)
STATUS.md - Manager status report
JOURNAL.md - This file - AI memory
PRD.md - Product requirements
config/preseed.cfg - Debian installer configuration
config/hooks/live/ - Runtime configuration hooks
tests/ - BATS test suite

Open Action Items (from STATUS.md)

Rebuild ISO with new security features
Logout/login for libvirt access (user action)
Run VM boot tests after ISO rebuild
Remove hardcoded passwords from preseed.cfg
Consider Secure Boot implementation

Session Decision

Next step: Rebuild ISO to include FIM, audit logging, SSH client-only changes. This is a 60-90 minute build. User should decide if they want to start it now.

ADR-008: ISO Rebuild Priority

Date: 2026-02-17 Status: Proposed

Context: 6 commits with security features made since last ISO build. Need to decide whether to rebuild now or continue development.

Options:

Rebuild now - validates features, enables runtime testing
Continue development - batch more changes, rebuild later

Recommendation: Rebuild now. Features are ready, compliance requires verification.

Entry 2026-02-17 (Session 2): FIM, Audit, SSH Security Enhancements

Context

Continued session focused on closing compliance gaps for CIS, FedRAMP, and CMMC. Added File Integrity Monitoring (FIM), comprehensive audit logging, and SSH client-only configuration. Resolved all shellcheck warnings and added git safety documentation.

Changes Implemented

File Integrity Monitoring (AIDE)
- Added config/hooks/live/aide-setup.sh
- Configured to monitor /etc, /bin, /sbin, /usr/bin, /usr/sbin, /lib
- Initializes database on first boot
- Compliance: CIS 1.4, FedRAMP AU-7, CMMC AU.3.059
Comprehensive Audit Logging
- Added config/hooks/live/audit-logging.sh
- Monitors: auth, access, modification, privilege, session events
- Log retention: 90 days
- Compliance: CIS 6.2, FedRAMP AU-2, CMMC AU.2.042
SSH Client-Only Configuration
- Modified config/hooks/live/ssh-client-only.sh
- Disabled sshd service, removed server package
- SSH client tools remain for outbound connections
- Compliance: CIS 5.2, NIST 800-53 IA-5, CMMC IA.2.078
Shellcheck Fixes
- Resolved all warnings in shell scripts
- SC2120/SC2119: Functions called without arguments (correct behavior)
- SC1091: Source files exist at runtime
- SC2034: Variables used in templates
- Result: ZERO shellcheck warnings
Git Safety Rules
- Added to AGENTS.md:
  - Quote all path arguments (handles spaces)
  - Prefer non-interactive rebase (git rebase has no --no-interactive flag; when -i is unavoidable, use it with care)
  - Destructive operations require user confirmation

Test Coverage Update

Before Session: 31 tests
After Session:  111 tests (+80)

Unit Tests:        12 → 12 (unchanged)
Integration Tests:  6 →  6 (unchanged)
Security Tests:    13 → 44 (+31)
System Tests:       0 → 47 (+47, new category)

Architectural Decision Records

ADR-005: File Integrity Monitoring via AIDE

Date: 2026-02-17 Status: Accepted

Context: Need file integrity monitoring for compliance (CIS 1.4, FedRAMP AU-7).

Decision: Use AIDE (Advanced Intrusion Detection Environment) with focused monitoring of critical system directories.

Rationale:

AIDE is mature, well-supported on Debian
Lightweight compared to commercial alternatives
Meets multiple compliance requirements
Database can be rebuilt if needed

Consequences:

Initial database creation on first boot (minor delay)
Regular checks recommended via cron
False positives if system packages updated legitimately
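
As a rough illustration of "focused monitoring of critical system directories", the sketch below prints an AIDE configuration fragment for a list of directories. It is not the contents of aide-setup.sh: the group name `CRITICAL` and the attribute set are assumptions chosen for the example, and the helper `aide_rules` is hypothetical.

```shell
#!/bin/bash
# aide_rules DIR...
# Print an AIDE config fragment: one attribute group definition, then
# one selection line per directory.
# Assumption: the group name CRITICAL and its attribute list are
# illustrative, not taken from the project's hook.
aide_rules() {
    local dir
    echo 'CRITICAL = p+i+n+u+g+s+m+c+sha256'
    for dir in "$@"; do
        echo "$dir CRITICAL"
    done
}

# Directories listed in the Session 2 entry.
aide_rules /etc /bin /sbin /usr/bin /usr/sbin /lib
```

After a legitimate package update, the database has to be regenerated, otherwise the next check reports the false positives noted above.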

ADR-006: Comprehensive Audit via auditd

Date: 2026-02-17 Status: Accepted

Context: Need comprehensive audit logging for CIS 6.2, FedRAMP AU-2.

Decision: Use auditd with rules for all major event categories.

Rationale:

auditd is the Linux standard for audit logging
Kernel-level monitoring (cannot be bypassed by userspace)
Structured logs for analysis
Meets multiple compliance requirements

Consequences:

Increased log volume (manageable with rotation)
Performance impact minimal on workstation workloads
Log retention policy required (90 days set)
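
For readers unfamiliar with auditd rules, file watches use the form `-w PATH -p PERMS -k KEY`. The sketch below generates such lines for a rules file; the helper name, the key `identity`, and the watched paths are illustrative assumptions, not the rules in audit-logging.sh.

```shell
#!/bin/bash
# audit_watch_rules KEY PATH...
# Print one auditd watch rule per path, logging writes (w) and
# attribute changes (a) under the given key.
audit_watch_rules() {
    local key="$1" path
    shift
    for path in "$@"; do
        printf -- '-w %s -p wa -k %s\n' "$path" "$key"
    done
}

# Example: watch account databases (paths chosen for illustration).
audit_watch_rules identity /etc/passwd /etc/group /etc/shadow
```

The key makes events easy to filter later, for example with `ausearch -k identity`.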

ADR-007: SSH Client-Only Mode

Date: 2026-02-17 Status: Accepted

Context: KNEL-Football should have no inbound services.

Decision: Remove SSH server, keep only client tools.

Rationale:

Reduces attack surface significantly
Aligns with "outbound only" security model
User can SSH out to other systems as needed
No management via SSH (physical console only)

Consequences:

No remote administration via SSH
Must use physical console for management
WireGuard outbound only, no inbound connections

Lessons Learned

Shellcheck Warnings Can Be Misleading
- SC2120/SC2119 warnings were false positives
- Functions intentionally don't use arguments (generate static config)
- Used # shellcheck disable sparingly, documented why
Compliance Requirements Overlap
- CIS 1.4 (FIM) → FedRAMP AU-7 → CMMC AU.3.059
- Single AIDE implementation satisfies all three
- Document compliance mappings clearly
Test Framework Scales Well
- Adding 80 new tests was straightforward
- BATS + custom helpers pattern works
- System tests for VM boot require special handling (libvirt)
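
To illustrate the first lesson: ShellCheck reports SC2120 on a function that references positional parameters when no caller passes any, and SC2119 at the call site. A documented suppression might look like the sketch below; the function `generate_banner` and its default value are hypothetical and do not come from the project's scripts.

```shell
#!/bin/bash
# The first argument is optional and has a default, so callers may
# legitimately pass nothing.
# shellcheck disable=SC2120
generate_banner() {
    local title="${1:-KNEL-Football}"
    printf '*** %s ***\n' "$title"
}

# Intentionally called without arguments; "$@" is not forwarded.
# shellcheck disable=SC2119
generate_banner
```

Placing the reason on the comment line directly above each directive keeps the suppression reviewable.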

Action Items for Future Sessions

Rebuild ISO with new security features
Run VM boot tests after user logout/login for libvirt
Verify FDE runtime behavior in VM
Consider Secure Boot implementation
Update preseed.cfg to remove hardcoded passwords

Entry 2026-02-17 (Session 1): Project Assessment and Test Coverage Analysis

Context

Comprehensive project review after session handoff. User requested full orientation and 100% test coverage including VM boot tests, Secure Boot, and FDE runtime tests.

Insights

Test Infrastructure Pattern
- BATS tests work well for static analysis but lack runtime verification
- Current tests validate file existence and content, not actual behavior
- Missing entire category: system/integration tests that boot the ISO
Docker-Only Workflow is Correct
- All build/test commands run inside Docker containers
- Prevents host system pollution
- Makes builds reproducible across environments
- Volumes: /workspace (read-only), /build (temp), /output (artifacts)
Shellcheck Warnings Are Non-Critical
- SC2120/SC2119: Functions reference positional parameters but are called without arguments or "$@"
- SC1091: Source files not available during shellcheck (exist at runtime)
- Pattern: Functions generate config, don't need arguments

Architectural Decision Records (ADRs)

ADR-001: Two-Tier Security Model

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: How should KNEL-Football OS access tier0 infrastructure?

Decision: KNEL-Football OS is a secure remote terminal, NOT direct tier0 access. Flow: KNEL-Football OS → WireGuard VPN → Privileged Access Workstation → Tier0

Rationale:

Defense in depth - multiple hops before tier0
Compromise of laptop doesn't directly expose tier0
WireGuard provides encrypted tunnel
Physical workstation adds another security layer

Consequences:

Network configuration focuses on WireGuard only
WiFi/Bluetooth permanently disabled
SSH configured for key-based auth only

ADR-002: Docker-Only Build Environment

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: How should ISO builds be executed?

Decision: ALL build operations run inside Docker containers. No host modifications.

Rationale:

Reproducible builds across different host systems
No pollution of host environment
Easy cleanup (just remove containers/images)
CI/CD friendly

Consequences:

run.sh wraps all commands with docker run
ISO build requires --privileged for loop devices
Output artifacts copied via volume mounts
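
A sketch of how such a wrapper might assemble the `docker run` invocation is shown below. It only builds and prints the command rather than running it. The image name `knel-football-build`, the host paths, and the inner command `./build.sh` are placeholders; the container mount points come from this journal.

```shell
#!/bin/bash
# build_docker_cmd IMAGE SRC_DIR BUILD_DIR OUT_DIR [COMMAND...]
# Assemble the docker invocation in the DOCKER_CMD array (dry run).
build_docker_cmd() {
    local image="$1" src="$2" build="$3" out="$4"
    shift 4
    DOCKER_CMD=(
        docker run --rm
        --privileged               # required for loop devices during ISO build
        -v "$src:/workspace:ro"    # source code, read-only
        -v "$build:/build"         # intermediate files
        -v "$out:/output"          # final artifacts
        "$image" "$@"
    )
}

# Placeholder values for illustration only.
build_docker_cmd knel-football-build "$PWD" /tmp/knel-build "$PWD/output" ./build.sh
printf '%q ' "${DOCKER_CMD[@]}"
echo
```

Building the command as an array keeps paths with spaces intact once it is executed as `"${DOCKER_CMD[@]}"`.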

ADR-003: LUKS2 Over LUKS1

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: Which disk encryption format to use?

Decision: Use LUKS2 with Argon2id KDF, AES-256-XTS cipher, 512-bit key.

Rationale:

LUKS2 is newer, more secure format
Argon2id resists GPU/ASIC attacks better than PBKDF2
AES-XTS is NIST-approved for disk encryption
512-bit key provides security margin

Consequences:

Modern systems only (older GRUB versions may not support LUKS2)
Boot requires passphrase entry
No recovery without passphrase
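
For reference, the parameters in this decision map onto `cryptsetup luksFormat` options as sketched below. The function only prints the command, since formatting is destructive, and the device path is a placeholder. How the installer actually applies these settings is not shown in this journal, so treat this as an illustration of the parameters rather than the project's mechanism.

```shell
#!/bin/bash
# luks_format_cmd DEVICE
# Print (do not run) a luksFormat command matching ADR-003:
# LUKS2, Argon2id KDF, AES-XTS with a 512-bit key.
luks_format_cmd() {
    echo cryptsetup luksFormat --type luks2 \
        --cipher aes-xts-plain64 --key-size 512 \
        --pbkdf argon2id "$1"
}

# /dev/sdX2 is a placeholder device name.
luks_format_cmd /dev/sdX2
```

On an installed system, `cryptsetup luksDump DEVICE` shows the LUKS version, cipher, and PBKDF, which is one way to verify the result at runtime.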

ADR-004: BATS Without External Libraries

Date: 2026-01-28 (documented 2026-02-17) Status: Accepted

Context: BATS test framework libraries were failing to load.

Decision: Remove bats-support, bats-assert, bats-file dependencies. Use custom assertion functions in tests/test_helper/common.bash.

Rationale:

External library loading was unreliable
Custom functions provide same functionality
Fewer dependencies = fewer failure points
Easier to debug when tests fail

Consequences:

Custom assertions must be maintained
Tests don't benefit from upstream library fixes
But: simpler, more predictable behavior
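
A minimal sketch of what such custom assertions could look like follows. The actual contents of tests/test_helper/common.bash are not shown in this journal, so the two functions below are illustrative; `assert_equal` follows the actual-then-expected argument order used by bats-assert.

```shell
#!/bin/bash
# assert_equal ACTUAL EXPECTED
# Fail with a diagnostic on stderr if the two strings differ.
assert_equal() {
    if [ "$1" != "$2" ]; then
        echo "expected: $2" >&2
        echo "actual:   $1" >&2
        return 1
    fi
}

# assert_file_contains FILE PATTERN
# Fail if FILE has no line matching the (basic regex) PATTERN.
assert_file_contains() {
    if ! grep -q -- "$2" "$1"; then
        echo "pattern '$2' not found in $1" >&2
        return 1
    fi
}
```

Because BATS fails a test when any command returns non-zero, plain functions like these are enough to act as assertions.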

Patterns Observed

Hook Organization
- config/hooks/live/ - Runs during live session (before install)
- config/hooks/installed/ - Runs after installation
- Pattern: Source shared functions, call main function

Script Structure

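#!/bin/bash
set -euo pipefail
# Functions that generate config
main() {
    : # call the config-generating functions here
}
# Call main only if the script is executed directly (not sourced)
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi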

Test Structure

#!/usr/bin/env bats
@test "description" {
    # Setup
    # Exercise (placeholder command)
    run true
    # Verify
    [ "$status" -eq 0 ]
}

Lessons Learned

test:iso Command Was Broken
- run.sh:172 references deleted test-iso.sh
- Commit c1505a9 removed obsolete scripts including test-iso.sh
- But run.sh was not updated to remove the command
- Lesson: When removing files, search for all references
Preseed.cfg Has Hardcoded Passwords
- Lines 28-31 contain default passwords
- These are installer defaults, should be changed on first boot
- Security risk if users don't change them
- Lesson: Consider using installer prompts instead
Test Coverage Claim vs Reality
- Documentation claimed 95% coverage
- Reality: 100% static analysis, 0% runtime/VM testing
- Lesson: Be precise about what "coverage" means

Action Items for Future Sessions

Implement VM boot tests using libvirt
Add Secure Boot support (shim-signed, grub-efi-amd64-signed)
Create runtime FDE passphrase prompt tests
Remove hardcoded passwords from preseed.cfg
Fix shellcheck warnings (low priority, non-critical)

Entry 2026-01-28: Initial Build Completion

Context

First successful ISO build completed after 72 minutes.

Insights

Live-Build Stages
- bootstrap: Downloads base system (longest stage)
- chroot: Installs packages, runs hooks
- binary: Creates ISO filesystem
- checksum: Generates SHA256/MD5
Build Time Breakdown
- Total: ~72 minutes
- bootstrap: ~40 minutes (network dependent)
- chroot: ~20 minutes
- binary: ~10 minutes
ISO Size
- Final ISO: 450 MB
- Includes: Debian base, IceWM, WireGuard, security tools
- Reasonable size for secure workstation

Patterns

Docker Volume Strategy
- /workspace mounted read-only (source code)
- /build for intermediate files
- /output for final artifacts
- Prevents accidental modification of source
Checksum Generation
- Generate both SHA256 and MD5
- Name checksum files after ISO
- Copy to output directory with ISO
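
The checksum pattern above can be sketched as follows, assuming GNU coreutils (`sha256sum`, `md5sum`). The function name `write_checksums` is hypothetical. Checksums are generated from inside the output directory so the files record a bare file name and can be verified wherever the ISO is copied.

```shell
#!/bin/bash
# write_checksums ISO_PATH OUTPUT_DIR
# Copy the ISO to OUTPUT_DIR and write <iso>.sha256 and <iso>.md5 next to it.
write_checksums() {
    local iso="$1" outdir="$2" name
    name=$(basename "$iso")
    mkdir -p "$outdir" || return 1
    cp "$iso" "$outdir/$name" || return 1
    (
        cd "$outdir" || exit 1
        sha256sum "$name" > "$name.sha256" &&
        md5sum "$name" > "$name.md5"
    )
}

# Verify later from the output directory:
#   sha256sum -c <iso>.sha256
#   md5sum -c <iso>.md5
```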

End of Journal. Add new entries at the top.
