feat(output): add wrapper and failed conversion handling
This commit is contained in:
14
AGENTS.md
Normal file
14
AGENTS.md
Normal file
@@ -0,0 +1,14 @@
|
|||||||
|
# Agent Overview
|
||||||
|
|
||||||
|
This repository splits work into two coordinated areas:
|
||||||
|
|
||||||
|
- `input/` – upstream tooling that prepares role-specific Markdown resumes.
|
||||||
|
- `output/` – the conversion pipeline that renders those Markdown files into DOCX/PDF deliverables.
|
||||||
|
|
||||||
|
Agents should treat these areas independently so changes can be reasoned about and tested in isolation.
|
||||||
|
|
||||||
|
## Working Guidelines
|
||||||
|
- Keep shared instructions in this file minimal; place deeper guidance in `input/AGENTS.md` or `output/AGENTS.md` as appropriate.
|
||||||
|
- When making automated edits, avoid touching both `input/` and `output/` in the same change set unless the work explicitly spans both pipelines.
|
||||||
|
- Resume conversion templates live under `input/templates`. Output services mount them read-only; update templates from the input side and verify with a fresh conversion run.
|
||||||
|
- Use conventional commits (`<type>(scope): <message>`) to signal which side of the system a change targets, e.g., `feat(output): add failed-processing bucket`.
|
||||||
18
output/AGENTS.md
Normal file
18
output/AGENTS.md
Normal file
@@ -0,0 +1,18 @@
|
|||||||
|
# Output Agent Guide
|
||||||
|
|
||||||
|
## Scope
|
||||||
|
The `output/` tree houses the delivery pipeline that watches for approved Markdown resumes, converts them to DOCX/PDF using Pandoc, and archives the source material. It is intended to run independently from the `input/` authoring workflow.
|
||||||
|
|
||||||
|
## Key Components
|
||||||
|
- `ForRelease/inbox`: manually populated with a single vetted `.md` resume for conversion.
|
||||||
|
- `ForRelease/outbox`: timestamped folders containing generated DOCX/PDF pairs ready for sharing.
|
||||||
|
- `ForRelease/processed`: timestamped archives of Markdown files that converted successfully.
|
||||||
|
- `ForRelease/failed`: Markdown originals for conversion attempts that Pandoc could not render.
|
||||||
|
- `Docker/`: container definition, watcher script, and wrapper to run the stack without root-owned outputs.
|
||||||
|
|
||||||
|
## Operational Rules
|
||||||
|
- Always launch the service with `Docker/run-output-processor.sh` so the container inherits the caller’s UID/GID.
|
||||||
|
- Before testing, ensure `ForRelease/inbox` is empty; this watcher expects at most one Markdown file at a time.
|
||||||
|
- Monitor logs via `./run-output-processor.sh logs -f` while converting to confirm the Markdown leaves inbox and the exports appear in outbox.
|
||||||
|
- If Pandoc fails, the Markdown moves to `ForRelease/failed`; fix the content there, then move it back to `inbox` for another run.
|
||||||
|
- Only remove history from `outbox/` or `processed/` after you are certain the artifacts are no longer needed.
|
||||||
26
output/Docker/Dockerfile
Normal file
26
output/Docker/Dockerfile
Normal file
@@ -0,0 +1,26 @@
|
|||||||
|
FROM debian:bookworm
|
||||||
|
|
||||||
|
ENV DEBIAN_FRONTEND=noninteractive \
|
||||||
|
PYTHONDONTWRITEBYTECODE=1 \
|
||||||
|
PYTHONUNBUFFERED=1
|
||||||
|
|
||||||
|
RUN apt-get update \
|
||||||
|
&& apt-get install --yes --no-install-recommends \
|
||||||
|
python3 \
|
||||||
|
python3-venv \
|
||||||
|
gosu \
|
||||||
|
pandoc \
|
||||||
|
texlive-full \
|
||||||
|
&& apt-get clean \
|
||||||
|
&& rm -rf /var/lib/apt/lists/*
|
||||||
|
|
||||||
|
WORKDIR /app
|
||||||
|
|
||||||
|
COPY watch_and_convert.py entrypoint.sh ./
|
||||||
|
|
||||||
|
RUN chmod +x /app/entrypoint.sh /app/watch_and_convert.py
|
||||||
|
|
||||||
|
ENV PUID=1000 \
|
||||||
|
PGID=1000
|
||||||
|
|
||||||
|
ENTRYPOINT ["/app/entrypoint.sh"]
|
||||||
19
output/Docker/docker-compose.yml
Normal file
19
output/Docker/docker-compose.yml
Normal file
@@ -0,0 +1,19 @@
|
|||||||
|
name: ResumeCustomizer-OutputProcessor
|
||||||
|
|
||||||
|
services:
|
||||||
|
resumecustomizer-outputprocessor:
|
||||||
|
build:
|
||||||
|
context: .
|
||||||
|
dockerfile: Dockerfile
|
||||||
|
container_name: ResumeCustomizer-OutputProcessor
|
||||||
|
restart: always
|
||||||
|
environment:
|
||||||
|
PUID: "${LOCAL_UID:-1000}"
|
||||||
|
PGID: "${LOCAL_GID:-1000}"
|
||||||
|
volumes:
|
||||||
|
- ../ForRelease/inbox:/data/inbox
|
||||||
|
- ../ForRelease/outbox:/data/outbox
|
||||||
|
- ../ForRelease/processed:/data/processed
|
||||||
|
- ../ForRelease/failed:/data/failed
|
||||||
|
- ../../input/templates:/templates:ro
|
||||||
|
- /etc/localtime:/etc/localtime:ro
|
||||||
18
output/Docker/entrypoint.sh
Executable file
18
output/Docker/entrypoint.sh
Executable file
@@ -0,0 +1,18 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
PUID=${PUID:-1000}
|
||||||
|
PGID=${PGID:-1000}
|
||||||
|
|
||||||
|
if ! command -v gosu >/dev/null 2>&1; then
|
||||||
|
echo "gosu is required but not installed" >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ -d /data ]; then
|
||||||
|
chown -R "${PUID}:${PGID}" /data
|
||||||
|
fi
|
||||||
|
|
||||||
|
export HOME=${HOME:-/tmp}
|
||||||
|
|
||||||
|
exec gosu "${PUID}:${PGID}" python3 /app/watch_and_convert.py
|
||||||
28
output/Docker/run-output-processor.sh
Executable file
28
output/Docker/run-output-processor.sh
Executable file
@@ -0,0 +1,28 @@
|
|||||||
|
#!/usr/bin/env bash
|
||||||
|
# Wrapper to run docker compose with the caller's UID/GID so generated files stay writable.
|
||||||
|
set -euo pipefail
|
||||||
|
|
||||||
|
if ! command -v docker >/dev/null 2>&1; then
|
||||||
|
echo "Error: docker is not installed or not on PATH." >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if docker compose version >/dev/null 2>&1; then
|
||||||
|
COMPOSE_CMD=(docker compose)
|
||||||
|
elif command -v docker-compose >/dev/null 2>&1; then
|
||||||
|
COMPOSE_CMD=(docker-compose)
|
||||||
|
else
|
||||||
|
echo "Error: docker compose plugin or docker-compose binary is required." >&2
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
CALLER_UID=$(id -u)
|
||||||
|
CALLER_GID=$(id -g)
|
||||||
|
|
||||||
|
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
|
||||||
|
|
||||||
|
# Run docker compose from the Docker directory so it picks up the bundled yaml.
|
||||||
|
(
|
||||||
|
cd "${SCRIPT_DIR}"
|
||||||
|
LOCAL_UID="${CALLER_UID}" LOCAL_GID="${CALLER_GID}" "${COMPOSE_CMD[@]}" "$@"
|
||||||
|
)
|
||||||
159
output/Docker/watch_and_convert.py
Executable file
159
output/Docker/watch_and_convert.py
Executable file
@@ -0,0 +1,159 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Monitor the inbox directory for Markdown files and convert them to DOCX/PDF outputs.
|
||||||
|
|
||||||
|
The script runs indefinitely inside the container, polling the inbox for new files.
|
||||||
|
When a Markdown file is found, pandoc generates DOCX and PDF outputs using the
|
||||||
|
reference templates, places the results in a timestamped outbox path, and moves the
|
||||||
|
original Markdown file into the processed directory.
|
||||||
|
"""
|
||||||
|
|
||||||
|
import logging
|
||||||
|
import shutil
|
||||||
|
import subprocess
|
||||||
|
import time
|
||||||
|
from datetime import datetime
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
INBOX = Path("/data/inbox")
|
||||||
|
OUTBOX = Path("/data/outbox")
|
||||||
|
PROCESSED = Path("/data/processed")
|
||||||
|
FAILED = Path("/data/failed")
|
||||||
|
TEMPLATES = Path("/templates")
|
||||||
|
|
||||||
|
DOCX_TEMPLATE = TEMPLATES / "resume-reference.docx"
|
||||||
|
TEX_TEMPLATE = TEMPLATES / "resume-template.tex"
|
||||||
|
|
||||||
|
POLL_INTERVAL_SECONDS = 5
|
||||||
|
|
||||||
|
|
||||||
|
def ensure_environment() -> None:
|
||||||
|
"""Verify required files and directories exist before processing starts."""
|
||||||
|
missing = []
|
||||||
|
for path in (INBOX, OUTBOX, PROCESSED, FAILED, DOCX_TEMPLATE, TEX_TEMPLATE):
|
||||||
|
if not path.exists():
|
||||||
|
missing.append(str(path))
|
||||||
|
|
||||||
|
if missing:
|
||||||
|
raise FileNotFoundError(
|
||||||
|
"Required paths are missing inside the container: " + ", ".join(missing)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def run_pandoc(input_md: Path, output_docx: Path, output_pdf: Path) -> None:
|
||||||
|
"""Invoke pandoc twice to create DOCX and PDF artifacts."""
|
||||||
|
subprocess.run(
|
||||||
|
[
|
||||||
|
"pandoc",
|
||||||
|
str(input_md),
|
||||||
|
"--from",
|
||||||
|
"gfm",
|
||||||
|
"--to",
|
||||||
|
"docx",
|
||||||
|
"--reference-doc",
|
||||||
|
str(DOCX_TEMPLATE),
|
||||||
|
"--output",
|
||||||
|
str(output_docx),
|
||||||
|
],
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
subprocess.run(
|
||||||
|
[
|
||||||
|
"pandoc",
|
||||||
|
str(input_md),
|
||||||
|
"--from",
|
||||||
|
"gfm",
|
||||||
|
"--pdf-engine",
|
||||||
|
"xelatex",
|
||||||
|
"--template",
|
||||||
|
str(TEX_TEMPLATE),
|
||||||
|
"--output",
|
||||||
|
str(output_pdf),
|
||||||
|
],
|
||||||
|
check=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def build_timestamp_dir(base: Path, timestamp: datetime) -> Path:
|
||||||
|
"""Create (if needed) and return the timestamped directory under base."""
|
||||||
|
subdir = (
|
||||||
|
base
|
||||||
|
/ timestamp.strftime("%Y")
|
||||||
|
/ timestamp.strftime("%m")
|
||||||
|
/ timestamp.strftime("%d")
|
||||||
|
/ timestamp.strftime("%H%M")
|
||||||
|
)
|
||||||
|
subdir.mkdir(parents=True, exist_ok=True)
|
||||||
|
return subdir
|
||||||
|
|
||||||
|
|
||||||
|
def process_markdown(md_file: Path) -> None:
|
||||||
|
"""Convert the Markdown file and move it into the processed directory."""
|
||||||
|
timestamp = datetime.now().astimezone()
|
||||||
|
out_dir = build_timestamp_dir(OUTBOX, timestamp)
|
||||||
|
processed_dir = build_timestamp_dir(PROCESSED, timestamp)
|
||||||
|
|
||||||
|
stem = md_file.stem
|
||||||
|
output_docx = out_dir / f"{stem}.docx"
|
||||||
|
output_pdf = out_dir / f"{stem}.pdf"
|
||||||
|
|
||||||
|
logging.info("Processing %s", md_file.name)
|
||||||
|
run_pandoc(md_file, output_docx, output_pdf)
|
||||||
|
|
||||||
|
processed_target = processed_dir / md_file.name
|
||||||
|
counter = 1
|
||||||
|
while processed_target.exists():
|
||||||
|
processed_target = processed_dir / f"{stem}_{counter}.md"
|
||||||
|
counter += 1
|
||||||
|
|
||||||
|
shutil.move(str(md_file), processed_target)
|
||||||
|
logging.info("Completed %s -> %s (processed archived at %s)", md_file.name, out_dir, processed_target)
|
||||||
|
|
||||||
|
|
||||||
|
def move_to_failed(md_file: Path) -> None:
|
||||||
|
"""Move the markdown file into the failed directory to avoid repeated retries."""
|
||||||
|
if not md_file.exists():
|
||||||
|
return
|
||||||
|
|
||||||
|
stem = md_file.stem
|
||||||
|
failed_target = FAILED / md_file.name
|
||||||
|
counter = 1
|
||||||
|
while failed_target.exists():
|
||||||
|
failed_target = FAILED / f"{stem}_{counter}.md"
|
||||||
|
counter += 1
|
||||||
|
|
||||||
|
FAILED.mkdir(parents=True, exist_ok=True)
|
||||||
|
shutil.move(str(md_file), failed_target)
|
||||||
|
logging.info("Archived %s in failed directory at %s", md_file.name, failed_target)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
logging.basicConfig(
|
||||||
|
level=logging.INFO,
|
||||||
|
format="%(asctime)s [%(levelname)s] %(message)s",
|
||||||
|
)
|
||||||
|
|
||||||
|
ensure_environment()
|
||||||
|
logging.info("Resume customizer watcher started")
|
||||||
|
|
||||||
|
while True:
|
||||||
|
md_files = sorted(INBOX.glob("*.md"))
|
||||||
|
if not md_files:
|
||||||
|
time.sleep(POLL_INTERVAL_SECONDS)
|
||||||
|
continue
|
||||||
|
|
||||||
|
for md_file in md_files:
|
||||||
|
try:
|
||||||
|
process_markdown(md_file)
|
||||||
|
except subprocess.CalledProcessError as exc:
|
||||||
|
logging.error("Pandoc failed for %s: %s", md_file.name, exc)
|
||||||
|
move_to_failed(md_file)
|
||||||
|
except Exception as exc: # noqa: BLE001
|
||||||
|
logging.exception("Unexpected error while processing %s: %s", md_file.name, exc)
|
||||||
|
|
||||||
|
time.sleep(POLL_INTERVAL_SECONDS)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
0
output/ForRelease/failed/.gitkeep
Normal file
0
output/ForRelease/failed/.gitkeep
Normal file
1
output/ForRelease/outbox/.gitkeep
Normal file
1
output/ForRelease/outbox/.gitkeep
Normal file
@@ -0,0 +1 @@
|
|||||||
|
|
||||||
1
output/ForRelease/processed/.gitkeep
Normal file
1
output/ForRelease/processed/.gitkeep
Normal file
@@ -0,0 +1 @@
|
|||||||
|
|
||||||
40
output/README.md
Normal file
40
output/README.md
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
# Output Pipeline Overview
|
||||||
|
|
||||||
|
This directory contains the post-processing side of ResumeCustomizer. It is responsible for taking job-targeted Markdown resumes produced elsewhere in the system and turning them into printable DOCX/PDF artifacts.
|
||||||
|
|
||||||
|
## Directory Layout
|
||||||
|
- `ForRelease/inbox`: drop a single `*.md` file here to trigger conversion.
|
||||||
|
- `ForRelease/outbox/YYYY/MM/DD/HHMM`: conversion results (paired `.docx` and `.pdf`) organized by timestamp so repeated runs never overwrite each other.
|
||||||
|
- `ForRelease/processed/YYYY/MM/DD/HHMM`: archives of Markdown files that converted successfully.
|
||||||
|
- `ForRelease/failed`: Markdown files that encountered an error during conversion (contains `.gitkeep` to preserve the directory).
|
||||||
|
- `Docker/`: container definition, watcher script, and helper wrapper that run the conversion daemon.
|
||||||
|
|
||||||
|
## Running the Output Processor
|
||||||
|
Use the wrapper so the container writes files with your UID/GID:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd output/Docker
|
||||||
|
./run-output-processor.sh up -d
|
||||||
|
```
|
||||||
|
|
||||||
|
The script detects either the Docker Compose plugin or the legacy `docker-compose` binary and forwards any additional arguments you supply (`down`, `logs`, etc.).
|
||||||
|
|
||||||
|
## What the Watcher Does
|
||||||
|
1. Polls `ForRelease/inbox` every few seconds for Markdown files.
|
||||||
|
2. Runs Pandoc using the shared DOCX and LaTeX templates to generate DOCX/PDF.
|
||||||
|
3. Drops the exports into the timestamped folder under `ForRelease/outbox`.
|
||||||
|
4. Moves the original Markdown into the matching timestamp folder under `ForRelease/processed`.
|
||||||
|
5. If the Pandoc conversion fails, moves the Markdown into `ForRelease/failed` so it can be reviewed without blocking subsequent runs.
|
||||||
|
|
||||||
|
## Prerequisites
|
||||||
|
- Docker Engine with either the Compose plugin (`docker compose`) or standalone `docker-compose`.
|
||||||
|
- Pandoc templates available under `input/templates` relative to the repo root (mounted read-only into the container).
|
||||||
|
|
||||||
|
Stop the service with:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd output/Docker
|
||||||
|
./run-output-processor.sh down
|
||||||
|
```
|
||||||
|
|
||||||
|
Log output is available through `./run-output-processor.sh logs -f`.
|
||||||
Reference in New Issue
Block a user