feat(output): add wrapper and failed conversion handling

2025-10-15 13:39:37 -05:00
parent 7afec09482
commit d7682ee87d
11 changed files with 324 additions and 0 deletions

AGENTS.md Normal file

@@ -0,0 +1,14 @@
# Agent Overview

This repository splits work into two coordinated areas:
- `input/`: upstream tooling that prepares role-specific Markdown resumes.
- `output/`: the conversion pipeline that renders those Markdown files into DOCX/PDF deliverables.

Agents should treat these areas independently so changes can be reasoned about and tested in isolation.

## Working Guidelines
- Keep shared instructions in this file minimal; place deeper guidance in `input/AGENTS.md` or `output/AGENTS.md` as appropriate.
- When making automated edits, avoid touching both `input/` and `output/` in the same change set unless the work explicitly spans both pipelines.
- Resume conversion templates live under `input/templates`. Output services mount them read-only; update templates from the input side and verify with a fresh conversion run.
- Use conventional commits (`<type>(scope): <message>`) to signal which side of the system a change targets, e.g., `feat(output): add failed-processing bucket`.
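
As a rough illustration of the commit convention above, a check along these lines could validate a subject line before it is committed. The function name `parse_commit_subject`, the list of allowed types, and the restriction of scopes to `input` and `output` are assumptions made for this sketch; the repository does not define them.

```python
import re

# Hypothetical pattern for `<type>(scope): <message>`. The allowed types and
# scopes are assumptions for illustration, not a list this repository defines.
COMMIT_RE = re.compile(
    r"^(?P<type>feat|fix|docs|chore|refactor|test)"
    r"\((?P<scope>input|output)\): (?P<message>\S.*)$"
)


def parse_commit_subject(subject: str):
    """Return (type, scope, message) for a matching subject line, else None."""
    match = COMMIT_RE.match(subject)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("message")
```

Called with `feat(output): add failed-processing bucket`, the helper returns the three parts as a tuple; a subject without a type and scope yields `None`.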

output/AGENTS.md Normal file

@@ -0,0 +1,18 @@
# Output Agent Guide

## Scope
The `output/` tree houses the delivery pipeline that watches for approved Markdown resumes, converts them to DOCX/PDF using Pandoc, and archives the source material. It is intended to run independently from the `input/` authoring workflow.

## Key Components
- `ForRelease/inbox`: manually populated with a single vetted `.md` resume for conversion.
- `ForRelease/outbox`: timestamped folders containing generated DOCX/PDF pairs ready for sharing.
- `ForRelease/processed`: timestamped archives of Markdown files that converted successfully.
- `ForRelease/failed`: Markdown originals for conversion attempts that Pandoc could not render.
- `Docker/`: container definition, watcher script, and wrapper to run the stack without root-owned outputs.

## Operational Rules
- Always launch the service with `Docker/run-output-processor.sh` so the container inherits the caller's UID/GID.
- Before testing, ensure `ForRelease/inbox` is empty; the watcher expects at most one Markdown file at a time.
- Monitor logs via `Docker/run-output-processor.sh logs -f` while converting to confirm the Markdown leaves `inbox` and the exports appear in `outbox`.
- If Pandoc fails, the Markdown moves to `ForRelease/failed`; fix the content there, then move it back to `inbox` for another run.
- Only remove history from `outbox/` or `processed/` after you are certain the artifacts are no longer needed.
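
The inbox rule above can be checked with a few lines before a test run. This is a minimal sketch: the helper name `inbox_status` and its three labels are hypothetical and do not exist in the watcher.

```python
from pathlib import Path


def inbox_status(inbox: Path) -> str:
    """Classify the inbox as 'empty', 'single', or 'crowded' by its *.md count."""
    count = len(list(inbox.glob("*.md")))
    if count == 0:
        return "empty"
    return "single" if count == 1 else "crowded"
```

A result of `empty` means the inbox is ready for a test file; `crowded` means more than one Markdown file is waiting, which the rule above asks you to avoid.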

output/Docker/Dockerfile Normal file

@@ -0,0 +1,26 @@
FROM debian:bookworm

ENV DEBIAN_FRONTEND=noninteractive \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

RUN apt-get update \
    && apt-get install --yes --no-install-recommends \
        python3 \
        python3-venv \
        gosu \
        pandoc \
        texlive-full \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY watch_and_convert.py entrypoint.sh ./
RUN chmod +x /app/entrypoint.sh /app/watch_and_convert.py

ENV PUID=1000 \
    PGID=1000

ENTRYPOINT ["/app/entrypoint.sh"]


@@ -0,0 +1,19 @@
name: ResumeCustomizer-OutputProcessor

services:
  resumecustomizer-outputprocessor:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ResumeCustomizer-OutputProcessor
    restart: always
    environment:
      PUID: "${LOCAL_UID:-1000}"
      PGID: "${LOCAL_GID:-1000}"
    volumes:
      - ../ForRelease/inbox:/data/inbox
      - ../ForRelease/outbox:/data/outbox
      - ../ForRelease/processed:/data/processed
      - ../ForRelease/failed:/data/failed
      - ../../input/templates:/templates:ro
      - /etc/localtime:/etc/localtime:ro

output/Docker/entrypoint.sh Executable file

@@ -0,0 +1,18 @@
#!/usr/bin/env bash
set -euo pipefail

PUID=${PUID:-1000}
PGID=${PGID:-1000}

if ! command -v gosu >/dev/null 2>&1; then
  echo "gosu is required but not installed" >&2
  exit 1
fi

if [ -d /data ]; then
  chown -R "${PUID}:${PGID}" /data
fi

export HOME=${HOME:-/tmp}

exec gosu "${PUID}:${PGID}" python3 /app/watch_and_convert.py


@@ -0,0 +1,28 @@
#!/usr/bin/env bash
# Wrapper to run docker compose with the caller's UID/GID so generated files stay writable.
set -euo pipefail

if ! command -v docker >/dev/null 2>&1; then
  echo "Error: docker is not installed or not on PATH." >&2
  exit 1
fi

if docker compose version >/dev/null 2>&1; then
  COMPOSE_CMD=(docker compose)
elif command -v docker-compose >/dev/null 2>&1; then
  COMPOSE_CMD=(docker-compose)
else
  echo "Error: docker compose plugin or docker-compose binary is required." >&2
  exit 1
fi

CALLER_UID=$(id -u)
CALLER_GID=$(id -g)
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

# Run docker compose from the Docker directory so it picks up the bundled yaml.
(
  cd "${SCRIPT_DIR}"
  LOCAL_UID="${CALLER_UID}" LOCAL_GID="${CALLER_GID}" "${COMPOSE_CMD[@]}" "$@"
)


@@ -0,0 +1,159 @@
#!/usr/bin/env python3
"""
Monitor the inbox directory for Markdown files and convert them to DOCX/PDF outputs.

The script runs indefinitely inside the container, polling the inbox for new files.
When a Markdown file is found, pandoc generates DOCX and PDF outputs using the
reference templates, places the results in a timestamped outbox path, and moves the
original Markdown file into the processed directory.
"""

import logging
import shutil
import subprocess
import time
from datetime import datetime
from pathlib import Path

INBOX = Path("/data/inbox")
OUTBOX = Path("/data/outbox")
PROCESSED = Path("/data/processed")
FAILED = Path("/data/failed")
TEMPLATES = Path("/templates")

DOCX_TEMPLATE = TEMPLATES / "resume-reference.docx"
TEX_TEMPLATE = TEMPLATES / "resume-template.tex"

POLL_INTERVAL_SECONDS = 5


def ensure_environment() -> None:
    """Verify required files and directories exist before processing starts."""
    missing = []
    for path in (INBOX, OUTBOX, PROCESSED, FAILED, DOCX_TEMPLATE, TEX_TEMPLATE):
        if not path.exists():
            missing.append(str(path))

    if missing:
        raise FileNotFoundError(
            "Required paths are missing inside the container: " + ", ".join(missing)
        )


def run_pandoc(input_md: Path, output_docx: Path, output_pdf: Path) -> None:
    """Invoke pandoc twice to create DOCX and PDF artifacts."""
    subprocess.run(
        [
            "pandoc",
            str(input_md),
            "--from",
            "gfm",
            "--to",
            "docx",
            "--reference-doc",
            str(DOCX_TEMPLATE),
            "--output",
            str(output_docx),
        ],
        check=True,
    )

    subprocess.run(
        [
            "pandoc",
            str(input_md),
            "--from",
            "gfm",
            "--pdf-engine",
            "xelatex",
            "--template",
            str(TEX_TEMPLATE),
            "--output",
            str(output_pdf),
        ],
        check=True,
    )


def build_timestamp_dir(base: Path, timestamp: datetime) -> Path:
    """Create (if needed) and return the timestamped directory under base."""
    subdir = (
        base
        / timestamp.strftime("%Y")
        / timestamp.strftime("%m")
        / timestamp.strftime("%d")
        / timestamp.strftime("%H%M")
    )
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir


def process_markdown(md_file: Path) -> None:
    """Convert the Markdown file and move it into the processed directory."""
    timestamp = datetime.now().astimezone()
    out_dir = build_timestamp_dir(OUTBOX, timestamp)
    processed_dir = build_timestamp_dir(PROCESSED, timestamp)

    stem = md_file.stem
    output_docx = out_dir / f"{stem}.docx"
    output_pdf = out_dir / f"{stem}.pdf"

    logging.info("Processing %s", md_file.name)
    run_pandoc(md_file, output_docx, output_pdf)

    processed_target = processed_dir / md_file.name
    counter = 1
    while processed_target.exists():
        processed_target = processed_dir / f"{stem}_{counter}.md"
        counter += 1

    shutil.move(str(md_file), processed_target)
    logging.info("Completed %s -> %s (processed archived at %s)", md_file.name, out_dir, processed_target)


def move_to_failed(md_file: Path) -> None:
    """Move the markdown file into the failed directory to avoid repeated retries."""
    if not md_file.exists():
        return

    stem = md_file.stem
    failed_target = FAILED / md_file.name
    counter = 1
    while failed_target.exists():
        failed_target = FAILED / f"{stem}_{counter}.md"
        counter += 1

    FAILED.mkdir(parents=True, exist_ok=True)
    shutil.move(str(md_file), failed_target)
    logging.info("Archived %s in failed directory at %s", md_file.name, failed_target)


def main() -> None:
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] %(message)s",
    )

    ensure_environment()
    logging.info("Resume customizer watcher started")

    while True:
        md_files = sorted(INBOX.glob("*.md"))
        if not md_files:
            time.sleep(POLL_INTERVAL_SECONDS)
            continue

        for md_file in md_files:
            try:
                process_markdown(md_file)
            except subprocess.CalledProcessError as exc:
                logging.error("Pandoc failed for %s: %s", md_file.name, exc)
                move_to_failed(md_file)
            except Exception as exc:  # noqa: BLE001
                logging.exception("Unexpected error while processing %s: %s", md_file.name, exc)

        time.sleep(POLL_INTERVAL_SECONDS)


if __name__ == "__main__":
    main()


@@ -0,0 +1 @@


@@ -0,0 +1 @@

output/README.md Normal file

@@ -0,0 +1,40 @@
# Output Pipeline Overview

This directory contains the post-processing side of ResumeCustomizer. It is responsible for taking job-targeted Markdown resumes produced elsewhere in the system and turning them into printable DOCX/PDF artifacts.

## Directory Layout
- `ForRelease/inbox`: drop a single `*.md` file here to trigger conversion.
- `ForRelease/outbox/YYYY/MM/DD/HHMM`: conversion results (paired `.docx` and `.pdf`) organized by timestamp at minute resolution, so runs started in different minutes do not overwrite each other.
- `ForRelease/processed/YYYY/MM/DD/HHMM`: archives of Markdown files that converted successfully.
- `ForRelease/failed`: Markdown files that encountered an error during conversion (contains `.gitkeep` to preserve the directory).
- `Docker/`: container definition, watcher script, and helper wrapper that run the conversion daemon.
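
To make the `YYYY/MM/DD/HHMM` layout above concrete, the sketch below computes the folder for a given moment without touching the disk. The helper name `timestamped_dir` is hypothetical; the watcher's own function also creates the directory.

```python
from datetime import datetime
from pathlib import Path


def timestamped_dir(base: Path, when: datetime) -> Path:
    """Compute base/YYYY/MM/DD/HHMM without creating anything on disk."""
    return (
        base
        / when.strftime("%Y")
        / when.strftime("%m")
        / when.strftime("%d")
        / when.strftime("%H%M")
    )


when = datetime(2025, 10, 15, 13, 39)
print(timestamped_dir(Path("ForRelease/outbox"), when).as_posix())
# prints: ForRelease/outbox/2025/10/15/1339
```

Because the last path component is `HHMM`, two runs within the same minute resolve to the same folder.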

## Running the Output Processor
Use the wrapper so the container writes files with your UID/GID:
```bash
cd output/Docker
./run-output-processor.sh up -d
```
The script detects either the Docker Compose plugin or the legacy `docker-compose` binary and forwards any additional arguments you supply (`down`, `logs`, etc.).

## What the Watcher Does
1. Polls `ForRelease/inbox` every 5 seconds for Markdown files.
2. Runs Pandoc using the shared DOCX and LaTeX templates to generate DOCX/PDF.
3. Drops the exports into the timestamped folder under `ForRelease/outbox`.
4. Moves the original Markdown into the matching timestamp folder under `ForRelease/processed`.
5. If the Pandoc conversion fails, moves the Markdown into `ForRelease/failed` so it can be reviewed without blocking subsequent runs.
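
The success and failure routing in steps 4 and 5 can be sketched as follows. This is a simplified illustration, not the watcher itself: `route` and `unique_target` are hypothetical names, and the `convert` callable stands in for the Pandoc invocation so the flow can be exercised without Pandoc installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable


def unique_target(directory: Path, name: str) -> Path:
    """Pick a non-clashing path in directory by appending _1, _2, ... to the stem."""
    candidate = directory / name
    stem, suffix = Path(name).stem, Path(name).suffix
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def route(md_file: Path, convert: Callable[[Path], None], processed: Path, failed: Path) -> Path:
    """Run convert, then archive the source under processed or, on a converter error, failed."""
    try:
        convert(md_file)
        destination = processed
    except subprocess.CalledProcessError:
        destination = failed
    destination.mkdir(parents=True, exist_ok=True)
    target = unique_target(destination, md_file.name)
    shutil.move(str(md_file), str(target))
    return target
```

Either way the source leaves the inbox, which is what keeps a broken file from blocking later runs; the numeric suffix keeps an earlier archive with the same name from being overwritten.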

## Prerequisites
- Docker Engine with either the Compose plugin (`docker compose`) or standalone `docker-compose`.
- Pandoc templates available under `input/templates` relative to the repo root (mounted read-only into the container).

Stop the service with:
```bash
cd output/Docker
./run-output-processor.sh down
```

Log output is available through `./run-output-processor.sh logs -f`.