From bd9aea4cd845252640a086508d6021ab1046cac2 Mon Sep 17 00:00:00 2001
From: ReachableCEO
Date: Thu, 16 Oct 2025 13:14:51 -0500
Subject: [PATCH] Update documentation and add architectural approach document

---
 AGENTS.md                                   |   7 +-
 .../Dockerfile                              |  13 +-
 .../README.md                               |   5 +-
 .../TODO.md                                 |   5 +-
 .../README.md                               |   3 +
 .../README.md                               |   3 +
 .../README.md                               |   3 +
 GUIDEBOOK/AgentRules.md                     |   5 +
 GUIDEBOOK/ArchitecturalApproach.md          |  47 +++++
 collab/README.md                            |  40 ++++
 collab/audit/markwhen-installation-issue.md |  23 +++
 collab/plans/gis-weather-plan.md            | 175 ++++++++++++++++++
 collab/prompts/gis-weather-prompt.md        |  35 ++++
 collab/proposals/gis-weather-proposal.md    |  64 +++++++
 collab/questions/gis-weather-questions.md   |  87 +++++++++
 15 files changed, 506 insertions(+), 9 deletions(-)
 create mode 100644 GUIDEBOOK/ArchitecturalApproach.md
 create mode 100644 collab/README.md
 create mode 100644 collab/audit/markwhen-installation-issue.md
 create mode 100644 collab/plans/gis-weather-plan.md
 create mode 100644 collab/prompts/gis-weather-prompt.md
 create mode 100644 collab/proposals/gis-weather-proposal.md
 create mode 100644 collab/questions/gis-weather-questions.md

diff --git a/AGENTS.md b/AGENTS.md
index f14c805..eba0255 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -26,7 +26,7 @@ This document tracks the various agents, tools, and systems used in the AIOS-Pub
 - mdbook-pdf (installed via Cargo)
 - Typst
 - Marp CLI
-- Markwhen: Interactive text-to-timeline tool
+- Wandmalfarbe pandoc-latex-template: Beautiful Eisvogel LaTeX template for professional PDF generation
 - Spell/Grammar checking:
   - Hunspell (with en-US dictionary)
   - Aspell (with en dictionary)
@@ -95,8 +95,9 @@ docker-compose up --build
 # Spell checking with hunspell
 ./docker-compose-wrapper.sh run docmaker-full hunspell -d en_US document.md
 
-# Create timeline with Markwhen
-./docker-compose-wrapper.sh run docmaker-full markwhen input.mw --output output.html
+# Create timeline with Markwhen (not currently available)
+# This will be enabled once the Markwhen installation issue is resolved
+# ./docker-compose-wrapper.sh run docmaker-full markwhen input.mw --output output.html
 
 # Grammar/style checking with Vale
 ./docker-compose-wrapper.sh run docmaker-full vale document.md
diff --git a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/Dockerfile b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/Dockerfile
index b6dcb28..58f6337 100644
--- a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/Dockerfile
+++ b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/Dockerfile
@@ -46,8 +46,15 @@ RUN curl -L https://github.com/typst/typst/releases/latest/download/typst-x86_64
 # Install Marp CLI
 RUN npm install -g @marp-team/marp-cli
 
-# Install Markwhen
-RUN npm install -g @markwhen/cli
+# Install Wandmalfarbe pandoc-latex-template for beautiful PDF generation
+RUN git clone --depth 1 https://github.com/Wandmalfarbe/pandoc-latex-template.git /tmp/pandoc-latex-template && \
+    mkdir -p /root/.local/share/pandoc/templates && \
+    # Find and copy any .latex template files to the templates directory
+    find /tmp/pandoc-latex-template -name "*.latex" -exec cp {} /root/.local/share/pandoc/templates/ \; && \
+    # Also install to the system-wide location so the template is available to all users
+    mkdir -p /usr/share/pandoc/templates && \
+    find /tmp/pandoc-latex-template -name "*.latex" -exec cp {} /usr/share/pandoc/templates/ \; && \
+    rm -rf /tmp/pandoc-latex-template
 
 # Install spell/grammar checking tools
 RUN apt-get update && apt-get install -y \
@@ -60,7 +67,7 @@ RUN curl -L https://github.com/errata-ai/vale/releases/download/v3.12.0/vale_3.1 | tar xz -C /tmp && cp /tmp/vale /usr/local/bin && chmod +x /usr/local/bin/vale
 
 # Install text statistics tool for reading time estimation
-RUN pip3 install mdstat textstat
+RUN pip3 install --break-system-packages mdstat textstat
 
 # Install additional text processing tools
 RUN apt-get update && apt-get install -y \
diff --git a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/README.md b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/README.md
index 433046c..2bf1881 100644
--- a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/README.md
+++ b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/README.md
@@ -18,11 +18,11 @@ The RCEO-AIOS-Public-Tools-DocMaker-Base container is designed for lightweight d
 
 ### Documentation Generation
 - **Pandoc**: Universal document converter
+- **Wandmalfarbe pandoc-latex-template**: Beautiful Eisvogel LaTeX template for professional PDFs
 - **mdBook**: Create books from Markdown files
 - **mdbook-pdf**: PDF renderer for mdBook
 - **Typst**: Modern typesetting system
 - **Marp CLI**: Create presentations from Markdown
-- **Markwhen**: Interactive text-to-timeline tool
 
 ### LaTeX
 - **TeX Live**: Lightweight LaTeX packages for basic document typesetting
@@ -51,6 +51,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
 # Example: Convert a Markdown file to PDF using pandoc
 ./docker-compose-wrapper.sh run docmaker-base pandoc input.md -o output.pdf
 
+# Example: Create beautiful PDF using Eisvogel template
+./docker-compose-wrapper.sh run docmaker-base pandoc input.md --template eisvogel -o output.pdf
+
 # Example: Create a timeline with Markwhen
 ./docker-compose-wrapper.sh run docmaker-base markwhen input.mw --output output.html
 ```
diff --git a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/TODO.md b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/TODO.md
index 17efc5f..273efd9 100644
--- a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/TODO.md
+++ b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Base/TODO.md
@@ -43,14 +43,15 @@ This document tracks potential enhancements and tools to be added to the documen
 - ✅ Core system packages (bash, curl, wget, git)
 - ✅ Programming languages (Python 3, Node.js, Rust)
 - ✅ Pandoc - Universal document converter
+- ✅ Wandmalfarbe pandoc-latex-template - Beautiful Eisvogel LaTeX template for professional PDFs
 - ✅ mdBook - Create books from Markdown files
 - ✅ mdbook-pdf - PDF renderer for mdBook
 - ✅ Typst - Modern typesetting system
 - ✅ Marp CLI - Create presentations from Markdown
-- ✅ Markwhen - Interactive text-to-timeline tool
+- ⏳ Markwhen - Interactive text-to-timeline tool (installation failed, needs fix)
 - ✅ Light LaTeX packages (texlive-latex-base)
 - ✅ Spell/grammar checking tools (Hunspell, Aspell, Vale)
-- ✅ Text statistics tools (mdstat)
+- ✅ Text statistics tools (mdstat, textstat)
 - ✅ Non-root user management with UID/GID mapping
 - ✅ Entrypoint script for runtime user creation
diff --git a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Computational/README.md b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Computational/README.md
index 0588d01..cd5c23d 100644
--- a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Computational/README.md
+++ b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Computational/README.md
@@ -41,6 +41,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
 # Example: Run Python analysis
 ./docker-compose-wrapper.sh run docmaker-computational python analysis.py
 
+# Example: Convert a Markdown file to beautiful PDF using Eisvogel template
+./docker-compose-wrapper.sh run docmaker-computational pandoc input.md --template eisvogel -o output.pdf --pdf-engine=xelatex
+
 # Example: Start Jupyter notebook server
 ./docker-compose-wrapper.sh up
 # Then access at http://localhost:8888
diff --git a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Full/README.md b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Full/README.md
index 68b4761..d1dca07 100644
--- a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Full/README.md
+++ b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Full/README.md
@@ -29,6 +29,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
 
 # Example: Convert a Markdown file to PDF using pandoc with full LaTeX
 ./docker-compose-wrapper.sh run docmaker-full pandoc input.md -o output.pdf --pdf-engine=xelatex
+
+# Example: Create beautiful PDF using Eisvogel template
+./docker-compose-wrapper.sh run docmaker-full pandoc input.md --template eisvogel -o output.pdf --pdf-engine=xelatex
 ```
 
 ### Using with docker-compose directly
diff --git a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Light/README.md b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Light/README.md
index 61e10d6..356cd9a 100644
--- a/Docker/RCEO-AIOS-Public-Tools-DocMaker-Light/README.md
+++ b/Docker/RCEO-AIOS-Public-Tools-DocMaker-Light/README.md
@@ -26,6 +26,9 @@ cd /home/localuser/AIWorkspace/AIOS-Public/Docker/RCEO-AIOS-Public-Tools-DocMake
 
 # Example: Convert a Markdown file to PDF using pandoc
 ./docker-compose-wrapper.sh run docmaker-light pandoc input.md -o output.pdf
+
+# Example: Create beautiful PDF using Eisvogel template
+./docker-compose-wrapper.sh run docmaker-light pandoc input.md --template eisvogel -o output.pdf
 ```
 
 ### Using with docker-compose directly
diff --git a/GUIDEBOOK/AgentRules.md b/GUIDEBOOK/AgentRules.md
index c0bab33..1868cd4 100644
--- a/GUIDEBOOK/AgentRules.md
+++ b/GUIDEBOOK/AgentRules.md
@@ -19,3 +19,8 @@ Additional Rules:
 - Create thin wrapper scripts that detect and handle UID/GID mapping to ensure file permissions work across any host environment.
 - Maintain disciplined naming and organization to prevent technical debt as the number of projects grows.
 - Keep the repository root directory clean. Place all project-specific files and scripts in appropriate subdirectories rather than at the top level.
+- Use conventional commits for all git commits, with proper formatting: type(scope): brief description, followed by a more detailed explanation if needed.
+- Commit messages should be well-written and appropriately detailed, explaining what was done and why.
+- Use the LLM's judgment on when to push and tag; delegate these decisions based on the significance of the changes.
+- All projects should include a collab/ directory with subdirectories: questions, proposals, plans, prompts, and audit.
+- Follow the architectural approach: layered container architecture (base -> specialized layers), consistent security patterns (non-root user with UID/GID mapping), the same operational patterns (wrapper scripts), and disciplined naming conventions.
diff --git a/GUIDEBOOK/ArchitecturalApproach.md b/GUIDEBOOK/ArchitecturalApproach.md
new file mode 100644
index 0000000..218c35a
--- /dev/null
+++ b/GUIDEBOOK/ArchitecturalApproach.md
@@ -0,0 +1,47 @@
+# Architectural Approach
+
+This document captures the architectural approach for project development in the AIOS-Public system.
+
+## Container Architecture
+
+### Layered Approach
+- Base containers provide foundational tools and libraries
+- Specialized containers extend base functionality for specific use cases
+- Each layer adds specific capabilities while maintaining consistency
+
+### Naming Convention
+- Use the `RCEO-AIOS-Public-Tools-` prefix consistently
+- Include descriptive suffixes indicating container purpose
+- Follow the pattern: `RCEO-AIOS-Public-Tools-[domain]-[type]`
+
+### Security Patterns
+- Minimize root usage during build and runtime
+- Implement non-root users for all runtime operations
+- Use UID/GID mapping for proper file permissions across environments
+- Detect host user IDs automatically through file system inspection
+
+### Operational Patterns
+- Create thin wrapper scripts that handle environment setup
+- Use consistent patterns for user ID detection and mapping (see the sketch below)
+- Maintain the same operational workflow across all containers
+- Provide clear documentation in README files
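+
+A minimal sketch of the user ID detection pattern described above (the `WORKSPACE_DIR`, `HOST_UID`, and `HOST_GID` names are assumptions for illustration, not a fixed interface; the real wrapper scripts may differ):
+
+```python
+#!/usr/bin/env python3
+"""Detect the host owner of the mounted workspace and hand the IDs to docker compose."""
+import os
+import subprocess
+
+workspace = os.environ.get("WORKSPACE_DIR", ".")  # hypothetical location of the bind-mounted workspace
+
+# File system inspection: whoever owns the workspace is the user to map into the container.
+stat = os.stat(workspace)
+env = {**os.environ, "HOST_UID": str(stat.st_uid), "HOST_GID": str(stat.st_gid)}
+
+# The entrypoint script can then create a matching non-root user at runtime.
+subprocess.run(["docker", "compose", "run", "--rm", "docmaker-base", "bash"], env=env, check=True)
+```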
+
+### Organization Principles
+- Separate COO mode (operational tasks) from CTO mode (R&D tasks) containers
+- Create individual directories per container type
+- Maintain disciplined file organization to prevent technical debt
+- Keep the repository root clean, with project-specific files in subdirectories
+
+## Documentation Requirements
+- Each container must have a comprehensive README
+- Include usage examples and environment setup instructions
+- Document security and permission handling
+- Provide a clear statement of each container's purpose and how it maps to the others
+
+## Implementation Workflow
+1. Start with an architectural design document
+2. Create a detailed implementation plan
+3. Develop following established patterns
+4. Test with sample data/usage
+5. Document for end users
+6. Commit with conventional commit messages
\ No newline at end of file
diff --git a/collab/README.md b/collab/README.md
new file mode 100644
index 0000000..5693182
--- /dev/null
+++ b/collab/README.md
@@ -0,0 +1,40 @@
+# Collaboration Directory
+
+This directory contains structured collaboration artifacts for project development and decision-making.
+
+## Directory Structure
+
+- `questions/` - Outstanding questions and topics for discussion
+- `proposals/` - Formal proposals for new features, changes, or implementations
+- `plans/` - Detailed implementation plans and technical designs
+- `prompts/` - Structured prompts for AI agents and automation
+- `audit/` - Audit trails, reviews, and assessment records
+
+## Usage Guidelines
+
+### Questions
+- Add new questions that need discussion or clarification
+- Link related proposals or plans where appropriate
+- Track resolution status
+
+### Proposals
+- Create formal proposals for significant changes or additions
+- Include business rationale and technical approach
+- Document expected outcomes and resource requirements
+- Seek approval before implementation
+
+### Plans
+- Detail technical implementation plans
+- Include architecture diagrams, technology stacks, and implementation phases
+- Identify risks and mitigation strategies
+- Outline next steps and dependencies
+
+### Prompts
+- Store reusable prompts for AI agents
+- Document prompt effectiveness and outcomes
+- Version prompts for different use cases
+
+### Audit
+- Track decisions made and their outcomes
+- Document performance reviews and assessments
+- Record lessons learned and improvements
\ No newline at end of file
diff --git a/collab/audit/markwhen-installation-issue.md b/collab/audit/markwhen-installation-issue.md
new file mode 100644
index 0000000..79613fd
--- /dev/null
+++ b/collab/audit/markwhen-installation-issue.md
@@ -0,0 +1,23 @@
+# Issue: Markwhen Installation Failure
+
+## Problem
+The Markwhen installation is failing during the Docker build process with the error:
+"failed to solve: process "/bin/sh -c npm install -g @markwhen/cli" did not complete successfully: exit code: 1"
+
+## Investigation Needed
+- Research the correct npm package name for the Markwhen CLI
+- Determine if it should be installed from the GitHub repository instead
+- Check if there are dependencies we're missing
+- Verify if the package exists under a different name
+
+## Possible Solutions
+1. Install from the GitHub repository directly
+2. Use a different package name
+3. Build from source
+4. Check if Node.js version compatibility is an issue
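+
+One quick way to carry out the name research listed above is to probe the public npm registry directly; a rough sketch (the candidate names are guesses, not confirmed package names):
+
+```python
+import urllib.error
+import urllib.parse
+import urllib.request
+
+# Candidate package names to check against the npm registry (guesses only).
+candidates = ["@markwhen/cli", "markwhen", "@markwhen/markwhen"]
+
+for name in candidates:
+    # Scoped package names must be URL-encoded (the "/" becomes "%2F").
+    url = "https://registry.npmjs.org/" + urllib.parse.quote(name, safe="")
+    try:
+        with urllib.request.urlopen(url) as resp:
+            print(name, "exists (HTTP", resp.status, ")")
+    except urllib.error.HTTPError as err:
+        print(name, "not found" if err.code == 404 else f"error {err.code}")
+```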
+
+## Priority
+Medium - Markwhen is a useful tool for timeline generation but not critical for core functionality
+
+## Status
+Pending investigation
\ No newline at end of file
diff --git a/collab/plans/gis-weather-plan.md b/collab/plans/gis-weather-plan.md
new file mode 100644
index 0000000..7f176f0
--- /dev/null
+++ b/collab/plans/gis-weather-plan.md
@@ -0,0 +1,175 @@
+# GIS and Weather Data Processing Container Plan
+
+## Overview
+This document outlines the plan for creating Docker containers to handle GIS data processing and weather data analysis. These containers will be used exclusively in CTO mode for R&D and data analysis tasks, integrating with documentation workflows and using MinIO for data output.
+
+## Requirements
+
+### GIS Data Processing
+- Support for Shapefiles and other GIS formats
+- Self-hosted GIS stack (not Google Maps or other commercial services)
+- Integration with tools like GDAL, Tippecanoe, and DuckDB
+- Heavy use of the PostGIS database
+- Parquet format support for efficient data storage
+- Based on reference workflows from:
+  - https://tech.marksblogg.com/american-solar-farms.html
+  - https://tech.marksblogg.com/canadas-odb-buildings.html
+  - https://tech.marksblogg.com/ornl-fema-buildings.html
+
+### Weather Data Processing
+- GRIB data format processing
+- NOAA and European weather API integration
+- Bulk data download via HTTP/FTP
+- Balloon path prediction system (to be forked/modified)
+
+### Shared Requirements
+- Python-based with appropriate libraries (GeoPandas, DuckDB, etc.)
+- R support for statistical analysis
+- Jupyter notebook integration for experimentation
+- MinIO bucket integration for data output
+- Optional GPU support for performance-sensitive tasks (available, but not required)
+- All visualization types (command-line, web, desktop)
+- Flexible ETL capabilities for both GIS/Weather and business workflows
+
+## Proposed Container Structure
+
+### RCEO-AIOS-Public-Tools-GIS-Base
+- Foundation container with core GIS libraries
+- Python + geospatial stack (GDAL, GEOS, PROJ, DuckDB, Tippecanoe)
+- R with spatial packages
+- PostGIS client tools
+- Parquet support
+- File format support (Shapefiles, GeoJSON, etc.)
+
+### RCEO-AIOS-Public-Tools-GIS-Processing
+- Extends GIS-Base with advanced processing tools
+- Jupyter with GIS extensions
+- Specialized ETL libraries
+- Performance optimization tools
+
+### RCEO-AIOS-Public-Tools-Weather-Base
+- Foundation container with weather data libraries
+- GRIB format support (cfgrib)
+- NOAA and European API integration tools
+- Bulk download utilities (HTTP/FTP)
+
+### RCEO-AIOS-Public-Tools-Weather-Analysis
+- Extends Weather-Base with advanced analysis tools
+- Balloon path prediction tools
+- Forecasting libraries
+- Time series analysis
+
+### RCEO-AIOS-Public-Tools-GIS-Weather-Fusion (Optional)
+- Combined container for integrated GIS + Weather analysis
+- For balloon path prediction using weather data
+- High-resource container for intensive tasks
+
+## Technology Stack
+
+### GIS Libraries
+- GDAL/OGR for format translation and processing
+- GEOS for geometric operations
+- PROJ for coordinate transformations
+- PostGIS for spatial database operations
+- DuckDB for efficient data processing with spatial extensions
+- Tippecanoe for tile generation
+- Shapely for Python geometric operations
+- GeoPandas for Python geospatial data handling
+- Rasterio for raster processing in Python
+- Leaflet/Mapbox for web visualization
+
+### Data Storage & Processing
+- DuckDB with spatial extensions
+- Parquet format support
+- MinIO client tools for data output
+- PostgreSQL client for connecting to external databases
+
+### Weather Libraries
+- xarray for multi-dimensional data in Python
+- cfgrib for GRIB format handling
+- MetPy for meteorological calculations
+- Climate Data Operators (CDO) for climate data processing
+- R packages: raster, rgdal, ncdf4, rasterVis
+
+### Visualization
+- Folium for interactive maps
+- Plotly for time series visualization
+- Matplotlib/Seaborn for statistical plots
+- R visualization packages
+- Command-line visualization tools
+
+### ETL and Workflow Tools
+- Apache Airflow (optional in advanced containers)
+- Prefect or similar workflow orchestrators
+- DuckDB for ETL operations
+- Pandas/Dask for large data processing
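+
+As a rough illustration of how these pieces fit together (file names are placeholders; this assumes GeoPandas and the DuckDB spatial extension are available in the GIS-Base image, and is a sketch rather than a finished workflow):
+
+```python
+import duckdb
+import geopandas as gpd
+
+# Pull a source Shapefile into GeoPandas and persist it as GeoParquet.
+gdf = gpd.read_file("buildings.shp")        # placeholder input dataset
+gdf.to_parquet("buildings.parquet")         # columnar output for later ETL steps
+
+# Query the Parquet file directly with DuckDB and its spatial extension.
+con = duckdb.connect()
+con.execute("INSTALL spatial;")
+con.execute("LOAD spatial;")
+rows = con.execute("SELECT count(*) FROM read_parquet('buildings.parquet')").fetchone()[0]
+print(f"{rows} features staged for PostGIS / MinIO output")
+```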
+
+## Container Deployment Strategy
+
+### Workstation Prototyping
+- Lighter containers for development and testing
+- Optional GPU support
+- MinIO client for data output testing
+
+### Production Servers
+- Full-featured containers with all processing capabilities
+- GPU-enabled variants where applicable
+- Optimized for large RAM/CPU/disk requirements
+
+## Security & User Management
+- Follow the same non-root user pattern as the documentation containers
+- UID/GID mapping for file permissions
+- Minimal necessary privileges
+- Proper container isolation
+- Secure access to MinIO buckets
+
+## Integration with Existing Stack
+- Compatible with the existing user management approach
+- Can be orchestrated with documentation containers when needed
+- Follow the same naming conventions
+- Use the same wrapper script patterns
+- Separate from the documentation containers, but the two can work together in CTO mode
+
+## Implementation Phases
+
+### Phase 1: Base GIS Container
+- Create GIS-Base with GDAL, DuckDB, and PostGIS client tools
+- Implement Parquet and Shapefile support
+- Test with sample datasets from the reference posts
+- Validate MinIO integration
+
+### Phase 2: Weather Base Container
+- Create Weather-Base with GRIB support
+- Integrate NOAA and European API tools
+- Implement bulk download capabilities
+- Test with weather data sources
+
+### Phase 3: Processing Containers
+- Create the GIS-Processing container with ETL tools
+- Create the Weather-Analysis container with prediction tools
+- Add visualization and Jupyter support
+- Implement optional GPU support
+
+### Phase 4: Optional Fusion Container
+- Combined container for balloon path prediction
+- Integration of GIS and weather data
+- High-complexity, high-resource usage
+
+## Data Flow Architecture
+- ETL workflows for processing public datasets
+- Output to MinIO buckets for business use (see the sketch below)
+- Integration with documentation tools for CTO mode workflows
+- Support for both GIS/Weather ETL (CTO) and business ETL (COO)
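+
+A minimal sketch of the MinIO output step referenced above (the endpoint, credential variable names, and bucket name are placeholders; assumes the official `minio` Python client is installed):
+
+```python
+import os
+from minio import Minio
+
+client = Minio(
+    os.environ["MINIO_ENDPOINT"],            # e.g. "minio.internal:9000" (placeholder)
+    access_key=os.environ["MINIO_ACCESS_KEY"],
+    secret_key=os.environ["MINIO_SECRET_KEY"],
+    secure=True,
+)
+
+bucket = "gis-outputs"                       # placeholder bucket name
+if not client.bucket_exists(bucket):
+    client.make_bucket(bucket)
+
+# Push the finished dataset produced by the ETL step to the business-facing bucket.
+client.fput_object(bucket, "buildings.parquet", "buildings.parquet")
+```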
+
+## Next Steps
+1. Review and approve this enhanced plan
+2. Begin Phase 1 implementation
+3. Test with sample data from the reference workflows
+4. Iterate based on findings
+
+## Risks & Considerations
+- Large container sizes due to GIS libraries and dependencies
+- Complex dependency management, especially with DuckDB and PostGIS
+- Computational resource requirements, especially for large datasets
+- GPU support implementation complexity
+- Bulk data download and processing performance
\ No newline at end of file
diff --git a/collab/prompts/gis-weather-prompt.md b/collab/prompts/gis-weather-prompt.md
new file mode 100644
index 0000000..12166ff
--- /dev/null
+++ b/collab/prompts/gis-weather-prompt.md
@@ -0,0 +1,35 @@
+# GIS and Weather Data Processing - AI Prompt Template
+
+## Purpose
+This prompt template is designed to guide AI agents in implementing GIS and weather data processing containers following established patterns.
+
+## Instructions for AI Agent
+
+When implementing GIS and weather data processing containers:
+
+1. Follow the established container architecture pattern (base -> specialized layers)
+2. Maintain the consistent naming convention: RCEO-AIOS-Public-Tools-[domain]-[type]
+3. Implement a non-root user with UID/GID mapping
+4. Create appropriate Dockerfiles and docker-compose configurations
+5. Include proper documentation and README files
+6. Add wrapper scripts for environment management
+7. Test with sample data to verify functionality
+8. Follow the same security and operational patterns as the existing containers
+
+## Technical Requirements
+
+- Use Debian Bookworm slim as the base OS
+- Include appropriate GIS libraries (GDAL, GEOS, PROJ, etc.)
+- Include weather data processing libraries (xarray, netCDF4, etc.)
+- Implement Jupyter notebook support where appropriate
+- Include R and Python stacks as needed
+- Add visualization tools (Folium, Plotly, etc.)
+
+## Quality Standards
+
+- Ensure containers build without errors
+- Verify file permissions work across environments
+- Test with sample datasets
+- Document usage clearly
+- Follow security best practices
+- Maintain a consistent user experience with the existing containers
\ No newline at end of file
diff --git a/collab/proposals/gis-weather-proposal.md b/collab/proposals/gis-weather-proposal.md
new file mode 100644
index 0000000..e7cc80d
--- /dev/null
+++ b/collab/proposals/gis-weather-proposal.md
@@ -0,0 +1,64 @@
+# GIS and Weather Data Processing Container Proposal
+
+## Proposal Summary
+Create specialized Docker containers for GIS data processing and weather data analysis to support CTO-mode R&D activities, particularly infrastructure planning and balloon path prediction for your TSYS Group projects.
+
+## Business Rationale
+GIS and weather data analysis are becoming increasingly important for your TSYS Group projects, particularly for infrastructure planning (solar farms, building datasets) and balloon path prediction. Specialized containers are needed that can handle these data types efficiently while staying consistent with existing infrastructure patterns. The containers will support:
+- A self-hosted GIS stack for privacy and control
+- Processing public datasets (NOAA, European APIs, etc.)
+- ETL workflows for both technical and business data processing
+- Integration with MinIO for data output to business systems
+
+## Technical Approach
+- Follow the same disciplined container architecture as the documentation tools
+- Use a layered approach with base and specialized containers
+- Implement the same security patterns (non-root user, UID/GID mapping)
+- Maintain consistent naming conventions
+- Use the same operational patterns (wrapper scripts, etc.)
+- Include PostGIS, DuckDB, and optional GPU support
+- Implement MinIO integration for data output
+- Support prototyping on workstations and production on large servers
+
+## Technology Stack
+- **GIS Tools**: GDAL, Tippecanoe, DuckDB with spatial extensions
+- **Database**: PostgreSQL/PostGIS client tools
+- **Formats**: Shapefiles, Parquet, GRIB, GeoJSON
+- **Weather**: cfgrib, xarray, MetPy
+- **ETL**: Pandas, Dask, custom workflow tools
+- **APIs**: NOAA, European weather APIs
+- **Visualization**: Folium, Plotly, command-line tools
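+
+To make the weather side of this stack concrete, a rough sketch of the kind of GRIB handling it enables (the file and variable names are placeholders; assumes xarray and cfgrib as listed above):
+
+```python
+import xarray as xr
+
+# Open a downloaded GRIB file through the cfgrib engine.
+ds = xr.open_dataset("gfs_forecast.grib2", engine="cfgrib")  # placeholder file name
+print(list(ds.data_vars))        # inspect which meteorological fields are present
+
+# If a 10 m u-wind component is present (cfgrib commonly exposes it as "u10"), summarize it.
+if "u10" in ds:
+    print(float(ds["u10"].mean()), "m/s mean 10 m u-wind")
+```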
+
+## Benefits
+- Consistent environment across development (workstations) and production (large servers)
+- Proper file permission handling across different systems
+- Isolated tools prevent dependency conflicts
+- Reproducible analysis environments for GIS and weather data
+- Integration with documentation tools for CTO mode workflows
+- Support for both technical (GIS/Weather) and business (COO) ETL workflows
+- Scalable architecture with optional GPU support
+- Data output capability to MinIO buckets for business use
+
+## Resource Requirements
+- Development time: 3-4 weeks for complete implementation
+- Storage: Additional container images (est. 3-6GB each)
+- Compute: Higher requirements for processing (can be isolated to CTO mode)
+- Optional: GPU resources for performance-intensive tasks
+
+## Expected Outcomes
+- Improved capability for spatial and weather data analysis
+- Consistent environments across development and production systems
+- Better integration with documentation workflows
+- Faster setup for ETL projects (both technical and business)
+- Efficient processing of large datasets using DuckDB and Parquet
+- Proper data output to MinIO buckets for business use
+- Reduced technical debt through consistent patterns
+
+## Implementation Timeline
+- Week 1: Base GIS container with PostGIS, DuckDB, and data format support
+- Week 2: Base Weather container with GRIB support and API integration
+- Week 3: Advanced processing containers with Jupyter and visualization
+- Week 4: Optional GPU variants and MinIO integration testing
+
+## Approval Request
+Please review and approve this proposal to proceed with implementation of the GIS and weather data processing containers that will support your infrastructure planning and balloon path prediction work.
\ No newline at end of file
diff --git a/collab/questions/gis-weather-questions.md b/collab/questions/gis-weather-questions.md
new file mode 100644
index 0000000..26f3266
--- /dev/null
+++ b/collab/questions/gis-weather-questions.md
@@ -0,0 +1,87 @@
+# GIS and Weather Data Processing - Initial Questions
+
+## Core Questions
+
+1. What specific GIS formats and operations are most critical for your current projects?
+
+Well, I am not entirely sure. I am guessing that I'll need to pull in shapefiles? I will be working with an
+entirely self-hosted GIS stack (not Google Maps or anything). I know things exist like GDAL? Tippecanoe?
+
+I think things like Parquet as well. Maybe DuckDB?
+
+Reference these posts:
+
+https://tech.marksblogg.com/american-solar-farms.html
+https://tech.marksblogg.com/canadas-odb-buildings.html
+https://tech.marksblogg.com/ornl-fema-buildings.html
+
+for the types of workflows that I would like to run.
+
+Extract patterns/architecture/approaches along with the specific reductions to practice.
+
+2. What weather data sources and APIs do you currently use or plan to use?
+
+None currently. But I'll be hacking/forking a system to predict balloon paths. I suspect I'll need to process GRIB data.
+Also probably use the NOAA and European equivalent APIs? Maybe some bulk HTTP/FTP download?
+
+3. Are there any specific performance requirements for processing large datasets?
+
+I suspect I'll do some early prototyping with small data sets on my workstation and then run the container with the real data sets on my big RAM/CPU/disk servers.
+
+4. Do you need integration with specific databases (PostGIS, etc.)?
+
+Yes, I will be heavily using PostGIS for sure.
+
+## Technical Questions
+
+1. Should we include both Python and R stacks in the same containers or separate them?
+
+I am not sure? Whatever you think is best?
+
+2. What level of visualization capability is needed (command-line, web-based, desktop)?
+
+All of those, I think. I want flexibility.
+
+3. Are there any licensing constraints or requirements to consider?
+
+I will be working only with public data sets.
+
+4. Do you need GPU support for any processing tasks?
+
+Yes, but make it optional. I don't want to be blocked by GPU complexity right now.
+
+## Integration Questions
+
+1. How should GIS/Weather outputs integrate with documentation workflows?
+I will be using GIS/Weather in CTO mode only. I will also be using documentation in CTO mode with it.
+
+I think, for now, they can be siblings but not have strong integration.
+
+**ANSWER**: GIS/Weather and documentation containers will operate as siblings in CTO mode, with loose integration for now.
+
+2. Do you need persistent data storage within containers?
+
+I do not think so. I will use docker compose to pass in directory paths.
+
+Oh, I will want to push finished data to MinIO buckets.
+
+I don't know how best to architect my ETL toolbox.... I will mostly be doing ETL on GIS/Weather data, but I can see also needing to do other business-type ETL workflows in COO mode.
+
+**ANSWER**: Use Docker Compose volume mounts for data input/output. The primary output destination will be MinIO buckets for business use. The ETL toolbox should handle both GIS/Weather (CTO) and business (COO) workflows.
+
+3. What level of integration with existing documentation containers is desired?
+
+**ANSWER**: Sibling relationship with loose integration. Both will be used in CTO mode but for different purposes.
+
+4. Are there specific deployment environments to target (local, cloud, edge)?
+
+Well, the ultimate goal is that some data sets get pushed to MinIO buckets for use by various lines of business.
+
+This is all kind of new to me. I am a technical operations/system admin, easing my way into DevOps/SRE and SWE.
+
+**ANSWER**: Primarily local deployment (workstation for prototyping, large servers for production). Data output to MinIO for business use. Targeting self-hosted environments for full control and privacy.
\ No newline at end of file