diff --git a/collab/README.md b/collab/README.md
deleted file mode 100644
index 5693182..0000000
--- a/collab/README.md
+++ /dev/null
@@ -1,40 +0,0 @@
-# Collaboration Directory
-
-This directory contains structured collaboration artifacts for project development and decision-making.
-
-## Directory Structure
-
-- `questions/` - Outstanding questions and topics for discussion
-- `proposals/` - Formal proposals for new features, changes, or implementations
-- `plans/` - Detailed implementation plans and technical designs
-- `prompts/` - Structured prompts for AI agents and automation
-- `audit/` - Audit trails, reviews, and assessment records
-
-## Usage Guidelines
-
-### Questions
-- Add new questions that need discussion or clarification
-- Link related proposals or plans where appropriate
-- Track resolution status
-
-### Proposals
-- Create formal proposals for significant changes or additions
-- Include business rationale and technical approach
-- Document expected outcomes and resource requirements
-- Seek approval before implementation
-
-### Plans
-- Detail technical implementation plans
-- Include architecture diagrams, technology stacks, and implementation phases
-- Identify risks and mitigation strategies
-- Outline next steps and dependencies
-
-### Prompts
-- Store reusable prompts for AI agents
-- Document prompt effectiveness and outcomes
-- Version prompts for different use cases
-
-### Audit
-- Track decisions made and their outcomes
-- Document performance reviews and assessments
-- Record lessons learned and improvements
\ No newline at end of file
diff --git a/collab/audit/markwhen-installation-issue.md b/collab/audit/markwhen-installation-issue.md
deleted file mode 100644
index 79613fd..0000000
--- a/collab/audit/markwhen-installation-issue.md
+++ /dev/null
@@ -1,23 +0,0 @@
-# Issue: Markwhen Installation Failure
-
-## Problem
-The Markwhen installation fails during the Docker build with the error:
-"failed to solve: process "/bin/sh -c npm install -g @markwhen/cli" did not complete successfully: exit code: 1"
-
-## Investigation Needed
-- Research the correct npm package name for the Markwhen CLI
-- Determine whether it should be installed from the GitHub repository instead
-- Check whether we are missing any dependencies
-- Verify whether the package exists under a different name
-
-## Possible Solutions
-1. Install directly from the GitHub repository
-2. Use a different package name (see the registry probe sketched below)
-3. Build from source
-4. Check whether Node.js version compatibility is the issue
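-
-A minimal probe for the first investigation item, assuming the failure is simply a
-wrong package name: query the public npm registry for a few candidate names before
-touching the Dockerfile. The candidate names are guesses, not confirmed packages.
-
-```python
-# Probe the npm registry for candidate Markwhen CLI package names.
-import urllib.error
-import urllib.parse
-import urllib.request
-
-CANDIDATES = ["@markwhen/cli", "markwhen", "@markwhen/mw"]  # guesses to test
-
-for name in CANDIDATES:
-    # The registry serves package metadata at /<url-encoded-package-name>.
-    url = "https://registry.npmjs.org/" + urllib.parse.quote(name, safe="")
-    try:
-        with urllib.request.urlopen(url) as resp:
-            print(f"{name}: found (HTTP {resp.status})")
-    except urllib.error.HTTPError as err:
-        print(f"{name}: not found (HTTP {err.code})")
-```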
-
-## Priority
-Medium - Markwhen is a useful tool for timeline generation but not critical for core functionality
-
-## Status
-Pending investigation
\ No newline at end of file
diff --git a/collab/plans/gis-weather-plan.md b/collab/plans/gis-weather-plan.md
deleted file mode 100644
index 7f176f0..0000000
--- a/collab/plans/gis-weather-plan.md
+++ /dev/null
@@ -1,175 +0,0 @@
-# GIS and Weather Data Processing Container Plan
-
-## Overview
-This document outlines the plan for creating Docker containers to handle GIS data processing and weather data analysis. These containers will be used exclusively in CTO mode for R&D and data analysis tasks, with integration into documentation workflows and MinIO for data output.
-
-## Requirements
-
-### GIS Data Processing
-- Support for Shapefiles and other GIS formats
-- Self-hosted GIS stack (not Google Maps or other commercial services)
-- Integration with tools like GDAL, Tippecanoe, and DuckDB
-- Heavy use of the PostGIS database
-- Parquet format support for efficient data storage
-- Based on reference workflows from:
-  - https://tech.marksblogg.com/american-solar-farms.html
-  - https://tech.marksblogg.com/canadas-odb-buildings.html
-  - https://tech.marksblogg.com/ornl-fema-buildings.html
-
-### Weather Data Processing
-- GRIB data format processing
-- NOAA and European weather API integration
-- Bulk data download via HTTP/FTP
-- Balloon path prediction system (to be forked/modified)
-
-### Shared Requirements
-- Python-based with appropriate libraries (GeoPandas, DuckDB, etc.)
-- R support for statistical analysis
-- Jupyter notebook integration for experimentation
-- MinIO bucket integration for data output
-- Optional but enabled GPU support for performance
-- All visualization types (command-line, web, desktop)
-- Flexible ETL capabilities for both GIS/weather and business workflows
-
-## Proposed Container Structure
-
-### RCEO-AIOS-Public-Tools-GIS-Base
-- Foundation container with core GIS libraries
-- Python + geospatial stack (GDAL, GEOS, PROJ, DuckDB, Tippecanoe)
-- R with spatial packages
-- PostGIS client tools
-- Parquet support
-- File format support (Shapefiles, GeoJSON, etc.)
-
-### RCEO-AIOS-Public-Tools-GIS-Processing
-- Extends GIS-Base with advanced processing tools
-- Jupyter with GIS extensions
-- Specialized ETL libraries
-- Performance optimization tools
-
-### RCEO-AIOS-Public-Tools-Weather-Base
-- Foundation container with weather data libraries
-- GRIB format support (cfgrib)
-- NOAA and European API integration tools
-- Bulk download utilities (HTTP/FTP)
-
-### RCEO-AIOS-Public-Tools-Weather-Analysis
-- Extends Weather-Base with advanced analysis tools
-- Balloon path prediction tools
-- Forecasting libraries
-- Time series analysis
-
-### RCEO-AIOS-Public-Tools-GIS-Weather-Fusion (Optional)
-- Combined container for integrated GIS + weather analysis
-- For balloon path prediction using weather data
-- High-resource container for intensive tasks
-
-## Technology Stack
-
-### GIS Libraries
-- GDAL/OGR for format translation and processing
-- GEOS for geometric operations
-- PROJ for coordinate transformations
-- PostGIS for spatial database operations
-- DuckDB for efficient data processing with spatial extensions (sketched at the end of this section)
-- Tippecanoe for tile generation
-- Shapely for Python geometric operations
-- GeoPandas for Python geospatial data handling
-- Rasterio for raster processing in Python
-- Leaflet/Mapbox for web visualization
-
-### Data Storage & Processing
-- DuckDB with spatial extensions
-- Parquet format support
-- MinIO client tools for data output
-- PostgreSQL client for connecting to external databases
-
-### Weather Libraries
-- xarray for multi-dimensional data in Python
-- cfgrib for GRIB format handling
-- MetPy for meteorological calculations
-- Climate Data Operators (CDO) for climate data processing
-- R packages: raster, rgdal, ncdf4, rasterVis
-
-### Visualization
-- Folium for interactive maps
-- Plotly for time series visualization
-- Matplotlib/Seaborn for statistical plots
-- R visualization packages
-- Command-line visualization tools
-
-### ETL and Workflow Tools
-- Apache Airflow (optional, in advanced containers)
-- Prefect or similar workflow orchestrators
-- DuckDB for ETL operations
-- Pandas/Dask for large-data processing
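-
-A short sketch of the core flow this stack implies: read a Shapefile with DuckDB's
-spatial extension and write analysis-ready Parquet. The database, table, and file
-paths are hypothetical.
-
-```python
-# Shapefile -> DuckDB (spatial) -> Parquet, the flow described in the requirements.
-import duckdb
-
-con = duckdb.connect("gis.duckdb")
-con.execute("INSTALL spatial;")  # one-time download of the extension
-con.execute("LOAD spatial;")
-
-# ST_Read() goes through GDAL, so most OGR-supported formats work, not just Shapefiles.
-con.execute("CREATE TABLE buildings AS SELECT * FROM ST_Read('data/buildings.shp');")
-
-# Parquet output keeps downstream tools (Pandas, Dask, GeoPandas) fast and cheap.
-con.execute("COPY (SELECT * FROM buildings) TO 'buildings.parquet' (FORMAT PARQUET);")
-```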
-
-## Container Deployment Strategy
-
-### Workstation Prototyping
-- Lighter containers for development and testing
-- Optional GPU support
-- MinIO client for data output testing
-
-### Production Servers
-- Full-featured containers with all processing capabilities
-- GPU-enabled variants where applicable
-- Optimized for large RAM/CPU/disk requirements
-
-## Security & User Management
-- Follow the same non-root user pattern as the documentation containers
-- UID/GID mapping for file permissions
-- Minimal necessary privileges
-- Proper container isolation
-- Secure access to MinIO buckets
-
-## Integration with Existing Stack
-- Compatible with the existing user management approach
-- Can be orchestrated with documentation containers when needed
-- Follow the same naming conventions
-- Use the same wrapper script patterns
-- Separate from documentation containers, but the two can work together in CTO mode
-
-## Implementation Phases
-
-### Phase 1: Base GIS Container
-- Create GIS-Base with GDAL, DuckDB, and PostGIS client tools
-- Implement Parquet and Shapefile support
-- Test with sample datasets from the reference posts
-- Validate MinIO integration
-
-### Phase 2: Weather Base Container
-- Create Weather-Base with GRIB support
-- Integrate NOAA and European API tools
-- Implement bulk download capabilities (sketched after this phase list)
-- Test with weather data sources
-
-### Phase 3: Processing Containers
-- Create GIS-Processing container with ETL tools
-- Create Weather-Analysis container with prediction tools
-- Add visualization and Jupyter support
-- Implement optional GPU support
-
-### Phase 4: Optional Fusion Container
-- Combined container for balloon path prediction
-- Integration of GIS and weather data
-- High complexity, high resource usage
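-
-A hedged sketch of the Phase 2 bulk-download path: fetch a single GRIB2 file over
-HTTP and open it with xarray/cfgrib. The URL is a placeholder, not a real NOAA
-endpoint, and the cfgrib engine assumes the ecCodes system library is present in
-the container.
-
-```python
-# Bulk-download prototype: one GRIB2 file over HTTP, inspected with xarray/cfgrib.
-import urllib.request
-
-import xarray as xr
-
-GRIB_URL = "https://example-weather-host/gfs/forecast.grib2"  # placeholder URL
-LOCAL_PATH = "forecast.grib2"
-
-urllib.request.urlretrieve(GRIB_URL, LOCAL_PATH)
-
-# cfgrib maps GRIB messages onto labelled xarray Datasets (lat/lon/time/level dims).
-ds = xr.open_dataset(LOCAL_PATH, engine="cfgrib")
-print(ds)  # lists the available variables: wind components, temperature, etc.
-```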
-
-## Data Flow Architecture
-- ETL workflows for processing public datasets
-- Output to MinIO buckets for business use
-- Integration with documentation tools for CTO-mode workflows
-- Support for both GIS/weather ETL (CTO) and business ETL (COO)
-
-## Next Steps
-1. Review and approve this enhanced plan
-2. Begin Phase 1 implementation
-3. Test with sample data from the reference workflows
-4. Iterate based on findings
-
-## Risks & Considerations
-- Large container sizes due to GIS libraries and dependencies
-- Complex dependency management, especially with DuckDB and PostGIS
-- Computational resource requirements, especially for large datasets
-- GPU support implementation complexity
-- Bulk data download and processing performance
\ No newline at end of file
diff --git a/collab/prompts/gis-weather-prompt.md b/collab/prompts/gis-weather-prompt.md
deleted file mode 100644
index 12166ff..0000000
--- a/collab/prompts/gis-weather-prompt.md
+++ /dev/null
@@ -1,35 +0,0 @@
-# GIS and Weather Data Processing - AI Prompt Template
-
-## Purpose
-This prompt template guides AI agents in implementing GIS and weather data processing containers that follow the established patterns.
-
-## Instructions for AI Agent
-
-When implementing GIS and weather data processing containers:
-
-1. Follow the established container architecture pattern (base -> specialized layers)
-2. Maintain the consistent naming convention: RCEO-AIOS-Public-Tools-[domain]-[type]
-3. Implement a non-root user with UID/GID mapping
-4. Create appropriate Dockerfiles and docker-compose configurations
-5. Include proper documentation and README files
-6. Add wrapper scripts for environment management
-7. Test with sample data to verify functionality
-8. Follow the same security and operational patterns as the existing containers
-
-## Technical Requirements
-
-- Use Debian Bookworm slim as the base OS
-- Include appropriate GIS libraries (GDAL, GEOS, PROJ, etc.)
-- Include weather data processing libraries (xarray, netCDF4, etc.)
-- Implement Jupyter notebook support where appropriate
-- Include R and Python stacks as needed
-- Add visualization tools (Folium, Plotly, etc.)
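-
-A build-time smoke test makes the first Quality Standard below ("containers build
-without errors") concrete: import the core stack and print versions. A minimal
-sketch, assuming the libraries listed above are installed; extend the list per
-container.
-
-```python
-# Smoke test: fail the image build early if the core geo/weather stack is broken.
-import cfgrib
-import geopandas
-import pyproj
-import shapely
-import xarray
-from osgeo import gdal  # GDAL/OGR Python bindings
-
-for mod in (gdal, geopandas, shapely, pyproj, xarray, cfgrib):
-    print(mod.__name__, getattr(mod, "__version__", "unknown"))
-print("geo/weather stack imports OK")
-```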
-
-## Quality Standards
-
-- Ensure containers build without errors
-- Verify file permissions work across environments
-- Test with sample datasets
-- Document usage clearly
-- Follow security best practices
-- Maintain a consistent user experience with the existing containers
\ No newline at end of file
diff --git a/collab/proposals/gis-weather-proposal.md b/collab/proposals/gis-weather-proposal.md
deleted file mode 100644
index e7cc80d..0000000
--- a/collab/proposals/gis-weather-proposal.md
+++ /dev/null
@@ -1,64 +0,0 @@
-# GIS and Weather Data Processing Container Proposal
-
-## Proposal Summary
-Create specialized Docker containers for GIS data processing and weather data analysis to support CTO-mode R&D activities, particularly infrastructure planning and balloon path prediction for your TSYS Group projects.
-
-## Business Rationale
-GIS and weather data analysis are becoming increasingly important to your TSYS Group projects (particularly infrastructure planning for solar farms and building datasets, and balloon path prediction), so specialized containers are needed that can handle these data types efficiently while staying consistent with existing infrastructure patterns. The containers will support:
-- A self-hosted GIS stack for privacy and control
-- Processing of public datasets (NOAA, European APIs, etc.)
-- ETL workflows for both technical and business data processing
-- Integration with MinIO for data output to business systems
-
-## Technical Approach
-- Follow the same disciplined container architecture as the documentation tools
-- Use a layered approach with base and specialized containers
-- Implement the same security patterns (non-root user, UID/GID mapping)
-- Maintain consistent naming conventions
-- Use the same operational patterns (wrapper scripts, etc.)
-- Include PostGIS, DuckDB, and optional GPU support
-- Implement MinIO integration for data output
-- Support prototyping on workstations and production on large servers
-
-## Technology Stack
-- **GIS Tools**: GDAL, Tippecanoe, DuckDB with spatial extensions
-- **Database**: PostgreSQL/PostGIS client tools
-- **Formats**: Shapefiles, Parquet, GRIB, GeoJSON
-- **Weather**: cfgrib, xarray, MetPy
-- **ETL**: Pandas, Dask, custom workflow tools
-- **APIs**: NOAA, European weather APIs
-- **Visualization**: Folium, Plotly, command-line tools
-
-## Benefits
-- Consistent environments across development (workstations) and production (large servers)
-- Proper file permission handling across different systems
-- Isolated tools that prevent dependency conflicts
-- Reproducible analysis environments for GIS and weather data
-- Integration with documentation tools for CTO-mode workflows
-- Support for both technical (GIS/weather) and business (COO) ETL workflows
-- Scalable architecture with optional GPU support
-- Data output to MinIO buckets for business use
-
-## Resource Requirements
-- Development time: 3-4 weeks for complete implementation
-- Storage: additional container images (est. 3-6 GB each)
-- Compute: higher requirements for processing (can be isolated to CTO mode)
-- Optional: GPU resources for performance-intensive tasks
-
-## Expected Outcomes
-- Improved capability for spatial and weather data analysis
-- Consistent environments across development and production systems
-- Better integration with documentation workflows
-- Faster setup for ETL projects (both technical and business)
-- Efficient processing of large datasets using DuckDB and Parquet
-- Proper data output to MinIO buckets for business use
-- Reduced technical debt through consistent patterns
-
-## Implementation Timeline
-- Week 1: Base GIS container with PostGIS, DuckDB, and data format support
-- Week 2: Base Weather container with GRIB support and API integration
-- Week 3: Advanced processing containers with Jupyter and visualization
-- Week 4: Optional GPU variants and MinIO integration testing
-
-## Approval Request
-Please review and approve this proposal so that implementation of the GIS and weather data processing containers can proceed in support of your infrastructure planning and balloon path prediction work.
\ No newline at end of file
diff --git a/collab/questions/gis-weather-questions.md b/collab/questions/gis-weather-questions.md
deleted file mode 100644
index 26f3266..0000000
--- a/collab/questions/gis-weather-questions.md
+++ /dev/null
@@ -1,87 +0,0 @@
-# GIS and Weather Data Processing - Initial Questions
-
-## Core Questions
-
-1. What specific GIS formats and operations are most critical for your current projects?
-
-Well, I am not entirely sure. I am guessing that I'll need to pull in shapefiles? I will be working with an entirely self-hosted GIS stack (not Google Maps or anything). I know things exist like GDAL? Tippecanoe?
-
-I think things like Parquet as well. Maybe DuckDB?
-
-Reference these posts:
-
-https://tech.marksblogg.com/american-solar-farms.html
-https://tech.marksblogg.com/canadas-odb-buildings.html
-https://tech.marksblogg.com/ornl-fema-buildings.html
-
-for the type of workflows that I would like to run.
-
-Extract patterns/architecture/approaches along with the specific reductions to practice.
-
-2. What weather data sources and APIs do you currently use or plan to use?
-
-None currently. But I'll be hacking/forking a system to predict balloon paths. I suspect I'll need to process GRIB data. Also probably use the NOAA and European equivalent APIs? Maybe some bulk HTTP/FTP download?
-
-3. Are there any specific performance requirements for processing large datasets?
-
-I suspect I'll do some early prototyping with small datasets on my workstation and then run the container with the real datasets on my big RAM/CPU/disk servers.
-
-4. Do you need integration with specific databases (PostGIS, etc.)?
-
-Yes, I will be heavily using PostGIS for sure.
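-
-Given that answer, a small sketch of the PostGIS round trip from Python. The
-connection string, file, and table names are placeholders, and GeoPandas'
-to_postgis()/read_postgis() assume SQLAlchemy and GeoAlchemy2 are installed.
-
-```python
-# Load a shapefile into PostGIS, then read it back as a GeoDataFrame.
-import geopandas as gpd
-from sqlalchemy import create_engine
-
-engine = create_engine("postgresql://user:pass@localhost:5432/gis")  # placeholder DSN
-
-gdf = gpd.read_file("data/solar_farms.shp")          # placeholder input file
-gdf.to_postgis("solar_farms", engine, if_exists="replace")
-
-roundtrip = gpd.read_postgis("SELECT * FROM solar_farms", engine, geom_col="geometry")
-print(len(roundtrip), "features")
-```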
-
-## Technical Questions
-
-1. Should we include both Python and R stacks in the same containers or separate them?
-
-I am not sure? Whatever you think is best?
-
-2. What level of visualization capability is needed (command-line, web-based, desktop)?
-
-All of those, I think. I want flexibility.
-
-3. Are there any licensing constraints or requirements to consider?
-
-I will be working only with public datasets.
-
-4. Do you need GPU support for any processing tasks?
-
-Yes, but make it optional. I don't want to be blocked by GPU complexity right now.
-
-## Integration Questions
-
-1. How should GIS/Weather outputs integrate with documentation workflows?
-
-I will be using the GIS/Weather containers in CTO mode only. I will also be using documentation in CTO mode with them.
-
-I think, for now, they can be siblings but not have strong integration.
-
-**ANSWER**: GIS/Weather and documentation containers will operate as siblings in CTO mode, with loose integration for now.
-
-2. Do you need persistent data storage within containers?
-
-I do not think so. I will use Docker Compose to pass in directory paths.
-
-Oh, I will want to push finished data to MinIO buckets (see the sketch at the end of this file).
-
-I don't know how to best architect my ETL toolbox... I will mostly be doing ETL on GIS/weather data, but I can see also needing to do other business-type ETL workflows in COO mode.
-
-**ANSWER**: Use Docker Compose volume mounts for data input/output. The primary output destination will be MinIO buckets for business use. The ETL toolbox should handle both GIS/weather (CTO) and business (COO) workflows.
-
-3. What level of integration with existing documentation containers is desired?
-
-**ANSWER**: Sibling relationship with loose integration. Both will be used in CTO mode, but for different purposes.
-
-4. Are there specific deployment environments to target (local, cloud, edge)?
-
-Well, the ultimate goal is that some datasets get pushed to MinIO buckets for use by various lines of business.
-
-This is all kind of new to me. I am a technical operations/system admin, easing my way into DevOps/SRE and SWE.
-
-**ANSWER**: Primarily local deployment (workstation for prototyping, large servers for production). Data output to MinIO for business use. Targeting self-hosted environments for full control and privacy.
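-
-Closing the loop on the answers above, a hedged sketch of the "push finished data
-to MinIO" step using the MinIO Python client. The endpoint, credentials, bucket,
-and object names are placeholders.
-
-```python
-# Push a finished Parquet artifact to a MinIO bucket for business use.
-from minio import Minio
-
-client = Minio(
-    "minio.internal:9000",     # placeholder endpoint
-    access_key="ACCESS_KEY",   # placeholder credentials; inject via env/secrets
-    secret_key="SECRET_KEY",
-    secure=False,              # flip to True behind TLS
-)
-
-if not client.bucket_exists("gis-outputs"):
-    client.make_bucket("gis-outputs")
-
-client.fput_object("gis-outputs", "buildings/buildings.parquet", "buildings.parquet")
-print("uploaded to gis-outputs/buildings/buildings.parquet")
-```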