Add core architecture patterns and GIS/weather components from AIOS-Public

2025-10-16 13:14:30 -05:00
parent 782eec63a5
commit 5887f4e729
32 changed files with 1970 additions and 1 deletions

@@ -0,0 +1,175 @@
# GIS and Weather Data Processing Container Plan
## Overview
This document outlines the plan for creating Docker containers for GIS data processing and weather data analysis. The containers will be used exclusively in CTO mode for R&D and data analysis tasks. They integrate with the documentation workflows and write their output to MinIO.
## Requirements
### GIS Data Processing
- Support for Shapefiles and other GIS formats
- Self-hosted GIS stack (not Google Maps or other commercial services)
- Integration with tools like GDAL, Tippecanoe, DuckDB
- Heavy use of PostGIS database
- Parquet format support for efficient data storage
- Based on reference workflows from:
  - https://tech.marksblogg.com/american-solar-farms.html
  - https://tech.marksblogg.com/canadas-odb-buildings.html
  - https://tech.marksblogg.com/ornl-fema-buildings.html
### Weather Data Processing
- GRIB data format processing
- Integration with NOAA and European weather APIs
- Bulk data download via HTTP/FTP
- Balloon path prediction system (to be forked/modified)
### Shared Requirements
- Python-based with appropriate libraries (GeoPandas, DuckDB, etc.)
- R support for statistical analysis
- Jupyter notebook integration for experimentation
- MinIO bucket integration for data output
- GPU support for performance, available in the containers but not required to run them
- All visualization types (command-line, web, desktop)
- Flexible ETL capabilities for both GIS/Weather and business workflows
## Proposed Container Structure
### RCEO-AIOS-Public-Tools-GIS-Base
- Foundation container with core GIS libraries
- Python + geospatial stack (GDAL, GEOS, PROJ, DuckDB, Tippecanoe)
- R with spatial packages
- PostGIS client tools
- Parquet support
- File format support (Shapefiles, GeoJSON, etc.)
### RCEO-AIOS-Public-Tools-GIS-Processing
- Extends GIS-Base with advanced processing tools
- Jupyter with GIS extensions
- Specialized ETL libraries
- Performance optimization tools
### RCEO-AIOS-Public-Tools-Weather-Base
- Foundation container with weather data libraries
- GRIB format support (cfgrib)
- NOAA and European API integration tools
- Bulk download utilities (HTTP/FTP)
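
The bulk download utility listed above could start as small as the following sketch. It uses only the Python standard library, whose `urllib.request.urlopen` handles `http`, `https`, `ftp`, and `file` URLs. The function names, the retry count, and the `.part` suffix are illustrative assumptions, not part of the plan.

```python
import shutil
import time
import urllib.request
from pathlib import Path


def download_all(urls, dest_dir, retries=3, backoff_seconds=1.0):
    """Download each URL into dest_dir, skipping files that already exist.

    Returns the local paths in the same order as `urls`.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Use the last path segment of the URL as the local file name.
        target = dest / url.rstrip("/").rsplit("/", 1)[-1]
        if not target.exists():
            _fetch_with_retry(url, target, retries, backoff_seconds)
        saved.append(target)
    return saved


def _fetch_with_retry(url, target, retries, backoff_seconds):
    partial = target.with_name(target.name + ".part")
    for attempt in range(1, retries + 1):
        try:
            # Stream into a ".part" file so an interrupted transfer never
            # leaves a truncated file under the final name.
            with urllib.request.urlopen(url, timeout=60) as response:
                with open(partial, "wb") as out:
                    shutil.copyfileobj(response, out)
            partial.replace(target)
            return
        except OSError:  # urllib.error.URLError is a subclass of OSError
            if attempt == retries:
                raise
            time.sleep(backoff_seconds * attempt)
```

Because existing files are skipped, rerunning the same URL list after a failure only fetches what is still missing. URLs with query strings would need a different naming rule than the one assumed here.
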
### RCEO-AIOS-Public-Tools-Weather-Analysis
- Extends Weather-Base with advanced analysis tools
- Balloon path prediction tools
- Forecasting libraries
- Time series analysis
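
The plan does not say which balloon path prediction system will be forked, so the following is only a sketch of the core idea such tools share: stepping a position forward through a wind field. It assumes a spherical Earth, ignores vertical motion (ascent, burst, descent), and takes the wind lookup as a caller-supplied function. All names here are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def advect(lat_deg, lon_deg, u_ms, v_ms, dt_s):
    """Move a point by a constant wind for dt_s seconds.

    u_ms is the eastward wind component and v_ms the northward component,
    both in metres per second.
    """
    lat = math.radians(lat_deg)
    dlat = (v_ms * dt_s) / EARTH_RADIUS_M
    # East-west distance per radian of longitude shrinks with cos(latitude);
    # this breaks down at the poles.
    dlon = (u_ms * dt_s) / (EARTH_RADIUS_M * math.cos(lat))
    return lat_deg + math.degrees(dlat), lon_deg + math.degrees(dlon)


def predict_path(start_lat, start_lon, wind_at, steps, dt_s=60.0):
    """Integrate a horizontal track with forward Euler steps.

    wind_at(lat, lon, t_s) -> (u_ms, v_ms) is a caller-supplied lookup,
    for example an interpolation over decoded GRIB wind fields.
    """
    path = [(start_lat, start_lon)]
    lat, lon = start_lat, start_lon
    for step in range(steps):
        u, v = wind_at(lat, lon, step * dt_s)
        lat, lon = advect(lat, lon, u, v, dt_s)
        path.append((lat, lon))
    return path
```

A constant 10 m/s northward wind for one hour, `predict_path(0.0, 0.0, lambda lat, lon, t: (0.0, 10.0), steps=60)`, moves the point 36 km north and leaves the longitude unchanged.
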
### RCEO-AIOS-Public-Tools-GIS-Weather-Fusion (Optional)
- Combined container for integrated GIS + Weather analysis
- For balloon path prediction using weather data
- High-resource container for intensive tasks
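
The "extends" relationships in this structure imply a build order: a base image must exist before anything derived from it. A minimal sketch of that ordering, using `graphlib` from the standard library (Python 3.9+), follows. The parents assigned to the Fusion container are an assumption; the plan only says it combines GIS and weather analysis.

```python
from graphlib import TopologicalSorter

PREFIX = "RCEO-AIOS-Public-Tools-"
OPTIONAL = {"GIS-Weather-Fusion"}

# Container -> containers it extends, taken from the structure above.
# The Fusion entry is an assumption (see lead-in).
EXTENDS = {
    "GIS-Base": [],
    "GIS-Processing": ["GIS-Base"],
    "Weather-Base": [],
    "Weather-Analysis": ["Weather-Base"],
    "GIS-Weather-Fusion": ["GIS-Processing", "Weather-Analysis"],
}


def build_order(include_optional=True):
    """Return full image names ordered so that parents come before children."""
    graph = {
        name: parents
        for name, parents in EXTENDS.items()
        if include_optional or name not in OPTIONAL
    }
    # TopologicalSorter expects node -> predecessors, which matches EXTENDS.
    return [PREFIX + name for name in TopologicalSorter(graph).static_order()]
```

Calling `build_order(include_optional=False)` drops the Fusion container, which matches its optional status in the plan.
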
## Technology Stack
### GIS Libraries
- GDAL/OGR for format translation and processing
- GEOS for geometric operations
- PROJ for coordinate transformations
- PostGIS for spatial database operations
- DuckDB with its spatial extension for efficient data processing
- Tippecanoe for tile generation
- Shapely for Python geometric operations
- GeoPandas for Python geospatial data handling
- Rasterio for raster processing in Python
- Leaflet/Mapbox for web visualization
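
Tile generation and web map display both rely on the XYZ tiling scheme over Web Mercator. As a small illustration of the coordinate math involved, the sketch below converts a WGS 84 longitude/latitude to the index of the tile that contains it. It is standard-library Python for illustration only; the function name is hypothetical, and in the containers this work would be done by Tippecanoe and PROJ.

```python
import math


def lonlat_to_tile(lon_deg, lat_deg, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS 84 point."""
    n = 2 ** zoom  # tiles per axis at this zoom level
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 and latitudes beyond the Web Mercator
    # limit (about 85.05 degrees) stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

At zoom 0 the whole world is one tile, so every point maps to `(0, 0)`; each additional zoom level doubles the tile count along both axes.
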
### Data Storage & Processing
- DuckDB with the spatial extension
- Parquet format support
- MinIO client tools for data output
- PostgreSQL client for connecting to external databases
### Weather Libraries
- xarray for multi-dimensional data in Python
- cfgrib for GRIB format handling
- MetPy for meteorological calculations
- Climate Data Operators (CDO) for climate data processing
- R packages: raster, ncdf4, rasterVis, plus terra and sf in place of rgdal (retired from CRAN in 2023)
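
Before handing downloaded files to cfgrib, it can be useful to check that they really are GRIB and which edition they use. Both GRIB editions begin with the ASCII marker `GRIB` and store the edition number in the eighth byte. The sketch below reads that indicator section with the standard library only. The function name is hypothetical, and it inspects only the first message at offset 0; real files may hold many messages or leading bytes, which cfgrib and ecCodes handle.

```python
import struct


def grib_header(data):
    """Parse the indicator section at the start of a GRIB message.

    Returns (edition, total_message_length_in_bytes).
    """
    if len(data) < 8 or data[:4] != b"GRIB":
        raise ValueError("not a GRIB message: missing 'GRIB' marker")
    edition = data[7]
    if edition == 1:
        # GRIB1: octets 5-7 hold the message length as a 24-bit integer.
        length = int.from_bytes(data[4:7], "big")
    elif edition == 2:
        if len(data) < 16:
            raise ValueError("truncated GRIB2 indicator section")
        # GRIB2: octets 9-16 hold the message length as a 64-bit integer.
        (length,) = struct.unpack(">Q", data[8:16])
    else:
        raise ValueError(f"unsupported GRIB edition {edition}")
    return edition, length
```

A typical use is `grib_header(open(path, "rb").read(16))` as a cheap sanity check right after a bulk download.
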
### Visualization
- Folium for interactive maps
- Plotly for time series visualization
- Matplotlib/Seaborn for statistical plots
- R visualization packages
- Command-line visualization tools
### ETL and Workflow Tools
- Apache Airflow (optional in advanced containers)
- Prefect or similar workflow orchestrators
- DuckDB for ETL operations
- Pandas/Dask for large data processing
## Container Deployment Strategy
### Workstation Prototyping
- Lighter containers for development and testing
- Optional GPU support
- MinIO client for data output testing
### Production Servers
- Full-featured containers with all processing capabilities
- GPU-enabled variants where applicable
- Optimized for large RAM/CPU/disk requirements
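
One way to keep the two deployment targets consistent is to describe each as a small profile and derive the `docker run` resource flags from it. The sketch below does that with the real `--memory`, `--cpus`, and `--gpus` flags. The profile names mirror the headings above, but the numeric limits are placeholders; the plan does not specify any.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    gpu: bool
    memory: str  # value for `docker run --memory`, e.g. "8g"
    cpus: float  # value for `docker run --cpus`


# Placeholder limits for illustration; not taken from the plan.
PROFILES = {
    "workstation": Profile("workstation", gpu=False, memory="8g", cpus=4.0),
    "production": Profile("production", gpu=True, memory="64g", cpus=16.0),
}


def resource_flags(profile, force_gpu=None):
    """Return the `docker run` resource flags for a deployment profile.

    force_gpu overrides the profile default, since GPU use is optional
    on workstations.
    """
    gpu = profile.gpu if force_gpu is None else force_gpu
    flags = ["--memory", profile.memory, "--cpus", str(profile.cpus)]
    if gpu:
        flags += ["--gpus", "all"]
    return flags
```

A wrapper script could splice these flags into its `docker run` command, for example `resource_flags(PROFILES["workstation"], force_gpu=True)` on a workstation that has a GPU.
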
## Security & User Management
- Follow same non-root user pattern as documentation containers
- UID/GID mapping for file permissions
- Minimal necessary privileges
- Proper container isolation
- Secure access to MinIO buckets
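
The non-root and minimal-privilege points above map onto a handful of standard `docker run` flags. The sketch below assembles them in Python, as a wrapper script might. The function name and the `/data` mount point are assumptions, and the documentation containers' actual pattern is not shown in this plan, so treat this as one plausible shape rather than the existing convention.

```python
import os


def hardened_run_args(image, host_dir, uid=None, gid=None):
    """Build a `docker run` command that runs as the calling user.

    Files written to the mounted directory end up owned by uid:gid on the
    host instead of by root.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",               # non-root, host UID/GID
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block setuid escalation
        "--volume", f"{host_dir}:/data",        # /data is a hypothetical mount point
        image,
    ]
```

The list form can be passed directly to `subprocess.run` without shell quoting concerns.
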
## Integration with Existing Stack
- Compatible with existing user management approach
- Can be orchestrated with documentation containers when needed
- Follow same naming conventions
- Use same wrapper script patterns
- Separate from documentation containers but can work together in CTO mode
## Implementation Phases
### Phase 1: Base GIS Container
- Create GIS-Base with GDAL, DuckDB, PostGIS client tools
- Implement Parquet and Shapefile support
- Test with sample datasets from reference posts
- Validate MinIO integration
### Phase 2: Weather Base Container
- Create Weather-Base with GRIB support
- Integrate NOAA and European API tools
- Implement bulk download capabilities
- Test with weather data sources
### Phase 3: Processing Containers
- Create GIS-Processing container with ETL tools
- Create Weather-Analysis container with prediction tools
- Add visualization and Jupyter support
- Implement optional GPU support
### Phase 4: Optional Fusion Container
- Combined container for balloon path prediction
- Integration of GIS and weather data
- High-complexity, high-resource usage
## Data Flow Architecture
- ETL workflows for processing public datasets
- Output to MinIO buckets for business use
- Integration with documentation tools for CTO mode workflows
- Support for both GIS/Weather ETL (CTO) and business ETL (COO)
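
The last step of this flow, publishing ETL output to a MinIO bucket chosen by mode, can be sketched as follows. The bucket names and the `dataset/date/file` key layout are hypothetical. The `client` argument is anything exposing `fput_object(bucket_name, object_name, file_path)`, which is the shape of the method of that name on the `Minio` class in the `minio` Python package.

```python
from pathlib import Path

# Hypothetical bucket names; the plan does not name its buckets.
BUCKETS = {"cto": "cto-research", "coo": "coo-business"}


def object_key(dataset, run_date, filename):
    """Build a key such as 'solar-farms/2025-10-16/farms.parquet'."""
    return f"{dataset}/{run_date}/{filename}"


def publish(client, mode, dataset, run_date, path):
    """Upload a local file to the bucket for `mode` and return (bucket, key)."""
    bucket = BUCKETS[mode]  # raises KeyError for an unknown mode
    key = object_key(dataset, run_date, Path(path).name)
    client.fput_object(bucket, key, str(path))
    return bucket, key
```

Passing the client in, rather than constructing it inside `publish`, keeps credentials out of the ETL code and lets the function be exercised with a stub during workstation prototyping.
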
## Next Steps
1. Review and approve this enhanced plan
2. Begin Phase 1 implementation
3. Test with sample data from reference workflows
4. Iterate based on findings
## Risks & Considerations
- Large container sizes due to GIS libraries and dependencies
- Complex dependency management, especially with DuckDB and PostGIS
- Computational resource requirements, especially for large datasets
- GPU support implementation complexity
- Bulk data download and processing performance