Update documentation and add architectural approach document

2025-10-16 13:14:51 -05:00
parent d30f103209
commit bd9aea4cd8
15 changed files with 506 additions and 9 deletions

View File

@@ -26,7 +26,7 @@ This document tracks the various agents, tools, and systems used in the AIOS-Pub
- mdbook-pdf (installed via Cargo)
- Typst
- Marp CLI
- Markwhen: Interactive text-to-timeline tool
- Wandmalfarbe pandoc-latex-template: Beautiful Eisvogel LaTeX template for professional PDF generation
- Spell/Grammar checking:
- Hunspell (with en-US dictionary)
- Aspell (with en dictionary)
@@ -95,8 +95,9 @@ docker-compose up --build
# Spell checking with hunspell
./docker-compose-wrapper.sh run docmaker-full hunspell -d en_US document.md
# Create timeline with Markwhen
./docker-compose-wrapper.sh run docmaker-full markwhen input.mw --output output.html
# Create timeline with Markwhen (not currently available)
# This will be enabled when Markwhen installation issue is resolved
# ./docker-compose-wrapper.sh run docmaker-full markwhen input.mw --output output.html
# Grammar/style checking with Vale
./docker-compose-wrapper.sh run docmaker-full vale document.md

View File

@@ -46,8 +46,15 @@ RUN curl -L https://github.com/typst/typst/releases/latest/download/typst-x86_64
# Install Marp CLI
RUN npm install -g @marp-team/marp-cli
# Install Markwhen
RUN npm install -g @markwhen/cli
# Install Wandmalfarbe pandoc-latex-template for beautiful PDF generation
RUN git clone --depth 1 https://github.com/Wandmalfarbe/pandoc-latex-template.git /tmp/pandoc-latex-template && \
mkdir -p /root/.local/share/pandoc/templates && \
# Find and copy any .latex template files to the templates directory
find /tmp/pandoc-latex-template -name "*.latex" -exec cp {} /root/.local/share/pandoc/templates/ \; && \
# Also install to system-wide location for all users
mkdir -p /usr/share/pandoc/templates && \
find /tmp/pandoc-latex-template -name "*.latex" -exec cp {} /usr/share/pandoc/templates/ \; && \
rm -rf /tmp/pandoc-latex-template
# Install spell/grammar checking tools
RUN apt-get update && apt-get install -y \
@@ -60,7 +67,7 @@ RUN curl -L https://github.com/errata-ai/vale/releases/download/v3.12.0/vale_3.1
| tar xz -C /tmp && cp /tmp/vale /usr/local/bin && chmod +x /usr/local/bin/vale
# Install text statistics tool for reading time estimation
RUN pip3 install mdstat textstat
RUN pip3 install --break-system-packages mdstat textstat
# Install additional text processing tools
RUN apt-get update && apt-get install -y \

View File

@@ -18,11 +18,11 @@ The RCEO-AIOS-Public-Tools-DocMaker-Base container is designed for lightweight d
### Documentation Generation
- **Pandoc**: Universal document converter
- **Wandmalfarbe pandoc-latex-template**: Beautiful Eisvogel LaTeX template for professional PDFs
- **mdBook**: Create books from Markdown files
- **mdbook-pdf**: PDF renderer for mdBook
- **Typst**: Modern typesetting system
- **Marp CLI**: Create presentations from Markdown
- **Markwhen**: Interactive text-to-timeline tool
### LaTeX
- **TeX Live**: Lightweight LaTeX packages for basic document typesetting
@@ -51,6 +51,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
# Example: Convert a Markdown file to PDF using pandoc
./docker-compose-wrapper.sh run docmaker-base pandoc input.md -o output.pdf
# Example: Create beautiful PDF using Eisvogel template
./docker-compose-wrapper.sh run docmaker-base pandoc input.md --template eisvogel -o output.pdf
# Example: Create a timeline with Markwhen
./docker-compose-wrapper.sh run docmaker-base markwhen input.mw --output output.html
```

View File

@@ -43,14 +43,15 @@ This document tracks potential enhancements and tools to be added to the documen
- ✅ Core system packages (bash, curl, wget, git)
- ✅ Programming languages (Python 3, Node.js, Rust)
- ✅ Pandoc - Universal document converter
- ✅ Wandmalfarbe pandoc-latex-template - Beautiful Eisvogel LaTeX template for professional PDFs
- ✅ mdBook - Create books from Markdown files
- ✅ mdbook-pdf - PDF renderer for mdBook
- ✅ Typst - Modern typesetting system
- ✅ Marp CLI - Create presentations from Markdown
- Markwhen - Interactive text-to-timeline tool
- Markwhen - Interactive text-to-timeline tool (installation failed, needs fix)
- ✅ Light LaTeX packages (texlive-latex-base)
- ✅ Spell/grammar checking tools (Hunspell, Aspell, Vale)
- ✅ Text statistics tools (mdstat)
- ✅ Text statistics tools (mdstat, textstat)
- ✅ Non-root user management with UID/GID mapping
- ✅ Entrypoint script for runtime user creation

View File

@@ -41,6 +41,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
# Example: Run Python analysis
./docker-compose-wrapper.sh run docmaker-computational python analysis.py
# Example: Convert a Markdown file to beautiful PDF using Eisvogel template
./docker-compose-wrapper.sh run docmaker-computational pandoc input.md --template eisvogel -o output.pdf --pdf-engine=xelatex
# Example: Start Jupyter notebook server
./docker-compose-wrapper.sh up
# Then access at http://localhost:8888

View File

@@ -29,6 +29,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
# Example: Convert a Markdown file to PDF using pandoc with full LaTeX
./docker-compose-wrapper.sh run docmaker-full pandoc input.md -o output.pdf --pdf-engine=xelatex
# Example: Create beautiful PDF using Eisvogel template
./docker-compose-wrapper.sh run docmaker-full pandoc input.md --template eisvogel -o output.pdf --pdf-engine=xelatex
```
### Using with docker-compose directly

View File

@@ -26,6 +26,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
# Example: Convert a Markdown file to PDF using pandoc
./docker-compose-wrapper.sh run docmaker-light pandoc input.md -o output.pdf
# Example: Create beautiful PDF using Eisvogel template
./docker-compose-wrapper.sh run docmaker-light pandoc input.md --template eisvogel -o output.pdf
```
### Using with docker-compose directly

View File

@@ -19,3 +19,8 @@ Additional Rules:
- Create thin wrapper scripts that detect and handle UID/GID mapping to ensure file permissions work across any host environment.
- Maintain disciplined naming and organization to prevent technical debt as the number of projects grows.
- Keep the repository root directory clean. Place all project-specific files and scripts in appropriate subdirectories rather than at the top level.
- Use conventional commits for all git commits with proper formatting: type(scope): brief description followed by more verbose explanation if needed.
- Commit messages should be beautiful and properly verbose, explaining what was done and why.
- Use the LLM's judgment for when to push and tag - delegate these decisions based on the significance of changes.
- All projects should include a collab/ directory with subdirectories: questions, proposals, plans, prompts, and audit.
- Follow the architectural approach: layered container architecture (base -> specialized layers), consistent security patterns (non-root user with UID/GID mapping), same operational patterns (wrapper scripts), and disciplined naming conventions.
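
As a sketch of the conventional-commit rule above (the `docs(containers)` scope and wording are illustrative, not taken from this repository):

```shell
# Hypothetical conventional commit message; scope and wording are
# illustrative only.
MSG="docs(containers): add Eisvogel template usage examples

Explain how to generate professional PDFs with the Eisvogel LaTeX
template across the docmaker containers, and why xelatex is used."
# The subject line must match the pattern: type(scope): description
echo "$MSG" | head -n1 | grep -Eq '^[a-z]+\([a-z-]+\): .+' && echo "format ok"
```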

View File

@@ -0,0 +1,47 @@
# Architectural Approach
This document captures the architectural approach for project development in the AIOS-Public system.
## Container Architecture
### Layered Approach
- Base containers provide foundational tools and libraries
- Specialized containers extend base functionality for specific use cases
- Each layer adds specific capabilities while maintaining consistency
### Naming Convention
- Use `RCEO-AIOS-Public-Tools-` prefix consistently
- Include descriptive suffixes indicating container purpose
- Follow pattern: `RCEO-AIOS-Public-Tools-[domain]-[type]`
### Security Patterns
- Minimize root usage during build and runtime
- Implement non-root users for all runtime operations
- Use UID/GID mapping for proper file permissions across environments
- Detect host user IDs automatically through file system inspection
### Operational Patterns
- Create thin wrapper scripts that handle environment setup
- Use consistent patterns for user ID detection and mapping
- Maintain same operational workflow across all containers
- Provide clear documentation in README files
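
The wrapper-script pattern above can be sketched as follows; the variable names and the compose usage are illustrative, not the project's actual script:

```shell
#!/bin/sh
# Sketch of a thin wrapper: derive the host user's UID/GID from the
# project directory's owner so the container user can be mapped to
# match, keeping file permissions intact across environments.
PROJECT_DIR="${PROJECT_DIR:-$(pwd)}"
HOST_UID="$(stat -c '%u' "$PROJECT_DIR")"
HOST_GID="$(stat -c '%g' "$PROJECT_DIR")"
export HOST_UID HOST_GID
echo "mapping container user to ${HOST_UID}:${HOST_GID}"
# docker compose can then pick these up, e.g. in docker-compose.yml:
#   user: "${HOST_UID}:${HOST_GID}"
```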
### Organization Principles
- Separate COO mode (operational tasks) from CTO mode (R&D tasks) containers
- Create individual directories per container type
- Maintain disciplined file organization to prevent technical debt
- Keep repository root clean with project-specific files in subdirectories
## Documentation Requirements
- Each container must have comprehensive README
- Include usage examples and environment setup instructions
- Document security and permission handling
- Provide clear container mapping and purpose
## Implementation Workflow
1. Start with architectural design document
2. Create detailed implementation plan
3. Develop following established patterns
4. Test with sample data/usage
5. Document for end users
6. Commit with conventional commit messages

collab/README.md Normal file
View File

@@ -0,0 +1,40 @@
# Collaboration Directory
This directory contains structured collaboration artifacts for project development and decision-making.
## Directory Structure
- `questions/` - Outstanding questions and topics for discussion
- `proposals/` - Formal proposals for new features, changes, or implementations
- `plans/` - Detailed implementation plans and technical designs
- `prompts/` - Structured prompts for AI agents and automation
- `audit/` - Audit trails, reviews, and assessment records
## Usage Guidelines
### Questions
- Add new questions that need discussion or clarification
- Link related proposals or plans where appropriate
- Track resolution status
### Proposals
- Create formal proposals for significant changes or additions
- Include business rationale and technical approach
- Document expected outcomes and resource requirements
- Seek approval before implementation
### Plans
- Detail technical implementation plans
- Include architecture diagrams, technology stacks, and implementation phases
- Identify risks and mitigation strategies
- Outline next steps and dependencies
### Prompts
- Store reusable prompts for AI agents
- Document prompt effectiveness and outcomes
- Version prompts for different use cases
### Audit
- Track decisions made and their outcomes
- Document performance reviews and assessments
- Record lessons learned and improvements

View File

@@ -0,0 +1,23 @@
# Issue: Markwhen Installation Failure
## Problem
The Markwhen installation is failing during the Docker build process with the error:
"failed to solve: process "/bin/sh -c npm install -g @markwhen/cli" did not complete successfully: exit code: 1"
## Investigation Needed
- Research the correct npm package name for Markwhen CLI
- Determine if it should be installed from GitHub repository instead
- Check if there are dependencies we're missing
- Verify if the package exists under a different name
## Possible Solutions
1. Install from GitHub repository directly
2. Use a different package name
3. Build from source
4. Check if Node.js version compatibility is an issue
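
One low-risk first step, sketched below: let the build step surface npm's error detail instead of aborting silently, so the log shows whether the package name is wrong or an engine/dependency mismatch is at fault. The diagnostics use only standard npm commands; treat the whole step as a temporary debugging aid, not the fix itself.

```shell
# Debugging sketch for the failing build step: capture npm's stderr and
# check whether the package exists in the registry at all. Intended to
# stand in for the RUN line temporarily, then be removed.
npm install -g @markwhen/cli 2>/tmp/npm-markwhen.log \
    || { echo "=== @markwhen/cli install failed ==="; \
         cat /tmp/npm-markwhen.log; \
         npm view @markwhen/cli version || echo "package not found in registry"; }
```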
## Priority
Medium - Markwhen is a useful tool for timeline generation but not critical for core functionality
## Status
Pending investigation

View File

@@ -0,0 +1,175 @@
# GIS and Weather Data Processing Container Plan
## Overview
This document outlines the plan for creating Docker containers to handle GIS data processing and weather data analysis. These containers will be used exclusively in CTO mode for R&D and data analysis tasks, with integration to documentation workflows and MinIO for data output.
## Requirements
### GIS Data Processing
- Support for Shapefiles and other GIS formats
- Self-hosted GIS stack (not Google Maps or other commercial services)
- Integration with tools like GDAL, Tippecanoe, DuckDB
- Heavy use of PostGIS database
- Parquet format support for efficient data storage
- Based on reference workflows from:
- https://tech.marksblogg.com/american-solar-farms.html
- https://tech.marksblogg.com/canadas-odb-buildings.html
- https://tech.marksblogg.com/ornl-fema-buildings.html
### Weather Data Processing
- GRIB data format processing
- NOAA and European weather APIs integration
- Bulk data download via HTTP/FTP
- Balloon path prediction system (to be forked/modified)
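
The bulk HTTP download requirement might look like the sketch below, which builds a URL list for GFS GRIB2 files from NOAA's NOMADS service. The path pattern is an assumption — verify it against the current NOMADS directory layout before relying on it.

```shell
# Sketch: assemble a bulk-download list of GFS GRIB2 forecast files.
# The NOMADS path pattern below is an assumption, not verified here.
BASE="https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod"
DATE=20250101
for FHR in 000 006 012; do
    echo "${BASE}/gfs.${DATE}/00/atmos/gfs.t00z.pgrb2.0p25.f${FHR}"
done > grib_urls.txt
# Then fetch in bulk, e.g.:
#   wget -q -i grib_urls.txt
wc -l < grib_urls.txt
```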
### Shared Requirements
- Python-based with appropriate libraries (GeoPandas, DuckDB, etc.)
- R support for statistical analysis
- Jupyter notebook integration for experimentation
- MinIO bucket integration for data output
- Optional but enabled GPU support for performance
- All visualization types (command-line, web, desktop)
- Flexible ETL capabilities for both GIS/Weather and business workflows
## Proposed Container Structure
### RCEO-AIOS-Public-Tools-GIS-Base
- Foundation container with core GIS libraries
- Python + geospatial stack (GDAL, GEOS, PROJ, DuckDB, Tippecanoe)
- R with spatial packages
- PostGIS client tools
- Parquet support
- File format support (Shapefiles, GeoJSON, etc.)
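
A typical GIS-Base task — Shapefile to Parquet conversion — might look like this sketch. It assumes a GDAL build with the Parquet (Arrow) driver; filenames are illustrative, and the snippet is guarded so it degrades gracefully when GDAL or the input file is absent.

```shell
# Sketch: convert a Shapefile to (Geo)Parquet with GDAL's ogr2ogr.
# Guarded so it only runs when both GDAL and the input are available.
if command -v ogr2ogr >/dev/null 2>&1 && [ -f buildings.shp ]; then
    ogr2ogr -f Parquet buildings.parquet buildings.shp
    ogrinfo -so buildings.parquet   # summary of layers and geometry
    STATUS="converted"
else
    STATUS="skipped (need GDAL with the Parquet driver and buildings.shp)"
fi
echo "$STATUS"
```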
### RCEO-AIOS-Public-Tools-GIS-Processing
- Extends GIS-Base with advanced processing tools
- Jupyter with GIS extensions
- Specialized ETL libraries
- Performance optimization tools
### RCEO-AIOS-Public-Tools-Weather-Base
- Foundation container with weather data libraries
- GRIB format support (cfgrib)
- NOAA and European API integration tools
- Bulk download utilities (HTTP/FTP)
### RCEO-AIOS-Public-Tools-Weather-Analysis
- Extends Weather-Base with advanced analysis tools
- Balloon path prediction tools
- Forecasting libraries
- Time series analysis
### RCEO-AIOS-Public-Tools-GIS-Weather-Fusion (Optional)
- Combined container for integrated GIS + Weather analysis
- For balloon path prediction using weather data
- High-resource container for intensive tasks
## Technology Stack
### GIS Libraries
- GDAL/OGR for format translation and processing
- GEOS for geometric operations
- PROJ for coordinate transformations
- PostGIS for spatial database operations
- DuckDB for efficient data processing with spatial extensions
- Tippecanoe for tile generation
- Shapely for Python geometric operations
- GeoPandas for Python geospatial data handling
- Rasterio for raster processing in Python
- Leaflet/Mapbox for web visualization
### Data Storage & Processing
- DuckDB with spatial extensions
- Parquet format support
- MinIO client tools for data output
- PostgreSQL client for connecting to external databases
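
The storage pieces above combine along these lines: sample a Parquet dataset with DuckDB's spatial extension, then push the result to a MinIO bucket with the `mc` client. The alias and bucket names are illustrative, and the snippet is guarded so it degrades without the tools installed.

```shell
# Sketch: DuckDB spatial query over Parquet, output pushed to MinIO.
# "myminio" and "business-data" are illustrative alias/bucket names.
if command -v duckdb >/dev/null 2>&1 && [ -f buildings.parquet ]; then
    duckdb -c "INSTALL spatial; LOAD spatial;
      COPY (SELECT * FROM 'buildings.parquet' LIMIT 1000)
      TO 'sample.parquet' (FORMAT PARQUET);"
    # push the finished sample to MinIO for business use
    { command -v mc >/dev/null 2>&1 && mc cp sample.parquet myminio/business-data/; } \
        || echo "mc not available; skipping upload"
    PUSHED="yes"
else
    PUSHED="no (need duckdb and buildings.parquet)"
fi
echo "pushed: $PUSHED"
```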
### Weather Libraries
- xarray for multi-dimensional data in Python
- cfgrib for GRIB format handling
- MetPy for meteorological calculations
- Climate Data Operators (CDO) for climate data processing
- R packages: raster, rgdal, ncdf4, rasterVis
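
Reading a GRIB file with the stack above might look like this one-liner, assuming the weather container ships xarray and cfgrib; the filename is illustrative, and the snippet is guarded so it skips outside the container.

```shell
# Sketch: open a GRIB2 file via xarray's cfgrib engine and print its
# dataset summary. Only runs when cfgrib and a GRIB file are present.
if [ -f gfs.grib2 ] && python3 -c 'import cfgrib' 2>/dev/null; then
    python3 -c "import xarray as xr; print(xr.open_dataset('gfs.grib2', engine='cfgrib'))"
    OPENED="yes"
else
    OPENED="no (need cfgrib and a GRIB file; run inside the weather container)"
fi
echo "opened: $OPENED"
```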
### Visualization
- Folium for interactive maps
- Plotly for time series visualization
- Matplotlib/Seaborn for statistical plots
- R visualization packages
- Command-line visualization tools
### ETL and Workflow Tools
- Apache Airflow (optional in advanced containers)
- Prefect or similar workflow orchestrators
- DuckDB for ETL operations
- Pandas/Dask for large data processing
## Container Deployment Strategy
### Workstation Prototyping
- Lighter containers for development and testing
- Optional GPU support
- MinIO client for data output testing
### Production Servers
- Full-featured containers with all processing capabilities
- GPU-enabled variants where applicable
- Optimized for large RAM/CPU/disk requirements
## Security & User Management
- Follow same non-root user pattern as documentation containers
- UID/GID mapping for file permissions
- Minimal necessary privileges
- Proper container isolation
- Secure access to MinIO buckets
## Integration with Existing Stack
- Compatible with existing user management approach
- Can be orchestrated with documentation containers when needed
- Follow same naming conventions
- Use same wrapper script patterns
- Separate from documentation containers but can work together in CTO mode
## Implementation Phases
### Phase 1: Base GIS Container
- Create GIS-Base with GDAL, DuckDB, PostGIS client tools
- Implement Parquet and Shapefile support
- Test with sample datasets from reference posts
- Validate MinIO integration
### Phase 2: Weather Base Container
- Create Weather-Base with GRIB support
- Integrate NOAA and European API tools
- Implement bulk download capabilities
- Test with weather data sources
### Phase 3: Processing Containers
- Create GIS-Processing container with ETL tools
- Create Weather-Analysis container with prediction tools
- Add visualization and Jupyter support
- Implement optional GPU support
### Phase 4: Optional Fusion Container
- Combined container for balloon path prediction
- Integration of GIS and weather data
- High-complexity, high-resource usage
## Data Flow Architecture
- ETL workflows for processing public datasets
- Output to MinIO buckets for business use
- Integration with documentation tools for CTO mode workflows
- Support for both GIS/Weather ETL (CTO) and business ETL (COO)
## Next Steps
1. Review and approve this enhanced plan
2. Begin Phase 1 implementation
3. Test with sample data from reference workflows
4. Iterate based on findings
## Risks & Considerations
- Large container sizes due to GIS libraries and dependencies
- Complex dependency management, especially with DuckDB and PostGIS
- Computational resource requirements, especially for large datasets
- GPU support implementation complexity
- Bulk data download and processing performance

View File

@@ -0,0 +1,35 @@
# GIS and Weather Data Processing - AI Prompt Template
## Purpose
This prompt template is designed to guide AI agents in implementing GIS and weather data processing containers following established patterns.
## Instructions for AI Agent
When implementing GIS and weather data processing containers:
1. Follow the established container architecture pattern (base -> specialized layers)
2. Maintain consistent naming convention: RCEO-AIOS-Public-Tools-[domain]-[type]
3. Implement non-root user with UID/GID mapping
4. Create appropriate Dockerfiles and docker-compose configurations
5. Include proper documentation and README files
6. Add wrapper scripts for environment management
7. Test with sample data to verify functionality
8. Follow same security and operational patterns as existing containers
## Technical Requirements
- Use Debian Bookworm slim as base OS
- Include appropriate GIS libraries (GDAL, GEOS, PROJ, etc.)
- Include weather data processing libraries (xarray, netCDF4, etc.)
- Implement Jupyter notebook support where appropriate
- Include R and Python stacks as needed
- Add visualization tools (Folium, Plotly, etc.)
## Quality Standards
- Ensure containers build without errors
- Verify file permissions work across environments
- Test with sample datasets
- Document usage clearly
- Follow security best practices
- Maintain consistent user experience with existing containers

View File

@@ -0,0 +1,64 @@
# GIS and Weather Data Processing Container Proposal
## Proposal Summary
Create specialized Docker containers for GIS data processing and weather data analysis to support CTO-mode R&D activities, particularly for infrastructure planning and balloon path prediction for your TSYS Group projects.
## Business Rationale
As GIS and weather data analysis become increasingly important for your TSYS Group projects (particularly for infrastructure planning like solar farms and building datasets, and balloon path prediction), there's a need for specialized containers that can handle these data types efficiently while maintaining consistency with existing infrastructure patterns. The containers will support:
- Self-hosted GIS stack for privacy and control
- Processing public datasets (NOAA, European APIs, etc.)
- ETL workflows for both technical and business data processing
- Integration with MinIO for data output to business systems
## Technical Approach
- Follow the same disciplined container architecture as the documentation tools
- Use layered approach with base and specialized containers
- Implement same security patterns (non-root user, UID/GID mapping)
- Maintain consistent naming conventions
- Use same operational patterns (wrapper scripts, etc.)
- Include PostGIS, DuckDB, and optional GPU support
- Implement MinIO integration for data output
- Support for prototyping on workstations and production on large servers
## Technology Stack
- **GIS Tools**: GDAL, Tippecanoe, DuckDB with spatial extensions
- **Database**: PostgreSQL/PostGIS client tools
- **Formats**: Shapefiles, Parquet, GRIB, GeoJSON
- **Weather**: cfgrib, xarray, MetPy
- **ETL**: Pandas, Dask, custom workflow tools
- **APIs**: NOAA, European weather APIs
- **Visualization**: Folium, Plotly, command-line tools
## Benefits
- Consistent environment across development (workstations) and production (large servers)
- Proper file permission handling across different systems
- Isolated tools prevent dependency conflicts
- Reproducible analysis environments for GIS and weather data
- Integration with documentation tools for CTO mode workflows
- Support for both technical (GIS/Weather) and business (COO) ETL workflows
- Scalable architecture with optional GPU support
- Data output capability to MinIO buckets for business use
## Resource Requirements
- Development time: 3-4 weeks for complete implementation
- Storage: Additional container images (est. 3-6GB each)
- Compute: Higher requirements for processing (can be isolated to CTO mode)
- Optional: GPU resources for performance-intensive tasks
## Expected Outcomes
- Improved capability for spatial and weather data analysis
- Consistent environments across development and production systems
- Better integration with documentation workflows
- Faster setup for ETL projects (both technical and business)
- Efficient processing of large datasets using DuckDB and Parquet
- Proper data output to MinIO buckets for business use
- Reduced technical debt through consistent patterns
## Implementation Timeline
- Week 1: Base GIS container with PostGIS, DuckDB, and data format support
- Week 2: Base Weather container with GRIB support and API integration
- Week 3: Advanced processing containers with Jupyter and visualization
- Week 4: Optional GPU variants and MinIO integration testing
## Approval Request
Please review and approve this proposal to proceed with implementation of the GIS and weather data processing containers that will support your infrastructure planning and balloon path prediction work.

View File

@@ -0,0 +1,87 @@
# GIS and Weather Data Processing - Initial Questions
## Core Questions
1. What specific GIS formats and operations are most critical for your current projects?
Well, I am not entirely sure. I am guessing that I'll need to pull in Shapefiles? I will be working with an entirely self-hosted GIS stack (not Google Maps or anything). I know things exist like GDAL? Tippecanoe?
I think things like Parquet as well. Maybe DuckDB?
Reference these posts:
https://tech.marksblogg.com/american-solar-farms.html
https://tech.marksblogg.com/canadas-odb-buildings.html
https://tech.marksblogg.com/ornl-fema-buildings.html
These show the type of workflows that I would like to run.
Extract patterns/architecture/approaches along with the specific reductions to practice.
2. What weather data sources and APIs do you currently use or plan to use?
None currently. But I'll be hacking/forking a system to predict balloon paths. I suspect I'll need to process GRIB data.
Also probably use the NOAA and European equivalent APIs? Maybe some bulk HTTP/FTP downloads?
3. Are there any specific performance requirements for processing large datasets?
I suspect I'll do some early prototyping with small datasets on my workstation and then run the containers against the real datasets on my big RAM/CPU/disk servers.
4. Do you need integration with specific databases (PostGIS, etc.)?
Yes I will be heavily using PostGIS for sure.
## Technical Questions
1. Should we include both Python and R stacks in the same containers or separate them?
I am not sure? Whatever you think is best?
2. What level of visualization capability is needed (command-line, web-based, desktop)?
All of those I think. I want flexibility.
3. Are there any licensing constraints or requirements to consider?
I will be working only with public data sets.
4. Do you need GPU support for any processing tasks?
Yes, but make it optional. I don't want to be blocked by GPU complexity right now.
## Integration Questions
1. How should GIS/Weather outputs integrate with documentation workflows?
I will be using GIS/Weather in CTO mode only. I will also be using documentation in CTO mode with it.
I think, for now, they can be siblings but not have strong integration.
**ANSWER**: GIS/Weather and documentation containers will operate as siblings in CTO mode, with loose integration for now.
2. Do you need persistent data storage within containers?
I do not think so. I will use Docker Compose to pass in directory paths.
Oh, I will want to push finished data to MinIO buckets.
I don't know how best to architect my ETL toolbox... I will mostly be doing ETL on GIS/Weather data, but I can also see needing to do other business-type ETL workflows in COO mode.
**ANSWER**: Use Docker Compose volume mounts for data input/output. The primary output destination will be MinIO buckets for business use. The ETL toolbox should handle both GIS/Weather (CTO) and business (COO) workflows.
3. What level of integration with existing documentation containers is desired?
**ANSWER**: Sibling relationship with loose integration. Both will be used in CTO mode but for different purposes.
4. Are there specific deployment environments to target (local, cloud, edge)?
Well, the ultimate goal is for some datasets to get pushed to MinIO buckets for use by various lines of business.
This is all kind of new to me. I am a technical operations/system admin easing my way into DevOps/SRE and SWE.
**ANSWER**: Primarily local deployment (workstation for prototyping, large servers for production). Data output to MinIO for business use. Targeting self-hosted environments for full control and privacy.