ProDG Mainframe — Architecture Document

Version: 1.0.0
Date: 2026-04-27
Author: Hermes Agent (prodg-mainframe deployment)
Classification: Internal — CEO/CIO Eyes Only


1. Executive Summary

The ProDG Mainframe is a self-hosted, containerized infrastructure platform serving as the primary orchestration and data layer for ProDG Studio operations. It runs on a Hetzner VPS (65.109.89.215) and provides:

  • Identity & Network: Self-hosted Headscale Tailnet with Caddy TLS termination
  • Secrets Management: Infisical (self-hosted) + Vaultwarden (password vault)
  • Data Layer: PostgreSQL 16, Redis 7, MinIO S3-compatible object storage
  • Observability: Prometheus, Grafana, Loki, Promtail, node-exporter, cadvisor
  • Agent Orchestration: Hermes API (FastAPI) with 3-tier trust model
  • Offsite Backups: Backblaze B2 via rclone, daily automated

2. Host Specifications

Attribute        Value
---------        -----
Host             mainframe.prodg.studio
Public IP        65.109.89.215
OS               Ubuntu 24.04.3 LTS
Kernel           6.8.0-100-generic
Disk             436 GB RAID (19G used, 396G free)
Docker Network   compose_prodg-internal (172.19.0.0/16)
Tailscale IP     100.64.0.1
SSH Keys         mitch-laptop (ed25519), Mainframe key

3. Service Inventory

3.1 Core Infrastructure

Service      Container   Image                        Internal Port          Host Binding   External Domain
-------      ---------   -----                        -------------          ------------   ---------------
Caddy        caddy       caddy:2                      80, 443, 2019          0.0.0.0        All TLS domains
Headscale    headscale   headscale/headscale:0.23.0   8080, 9090, 3478/udp   0.0.0.0        headscale.prodg.studio
PostgreSQL   postgres    postgres:16-alpine           5432                   127.0.0.1      -
Redis        redis       redis:7-alpine               6379                   127.0.0.1      -

3.2 Application Services

Service       Container     Image                                 Internal Port   Host Binding     External Domain
-------       ---------     -----                                 -------------   ------------     ---------------
Infisical     infisical     infisical/infisical:latest-postgres   8080            127.0.0.1:8082   secrets.mainframe.prodg.studio
Vaultwarden   vaultwarden   vaultwarden/server:latest             80              127.0.0.1:8083   vault.mainframe.prodg.studio
MinIO         minio         minio/minio:latest                    9000, 9001      127.0.0.1        s3.mainframe.prodg.studio
Hermes API    hermes-api    prodg/hermes-api:latest               8000            127.0.0.1        api.mainframe.prodg.studio

3.3 Observability Stack

Service         Container       Image                             Internal Port   Host Binding           External Domain
-------         ---------       -----                             -------------   ------------           ---------------
Prometheus      prometheus      prom/prometheus:latest            9090            none (internal only)   -
Grafana         grafana         grafana/grafana:latest            3000            127.0.0.1              metrics.mainframe.prodg.studio
Loki            loki            grafana/loki:latest               3100            -                      -
Promtail        promtail        grafana/promtail:latest           9080            -                      -
node-exporter   node-exporter   prom/node-exporter:latest         9100            -                      -
cadvisor        cadvisor        gcr.io/cadvisor/cadvisor:latest   8080            -                      -

4. Network Architecture

4.1 Public Ingress (Caddy)

Internet → Cloudflare DNS (grey cloud) → 65.109.89.215:80/443 → Caddy → Internal Docker Network

All public domains terminate TLS at Caddy using Let’s Encrypt ACME HTTP-01 challenges.
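
The ingress path above amounts to a lookup from public hostname to an upstream on the Docker network. The following is a minimal Python sketch of that lookup, not the actual Caddyfile. The upstream addresses are assumptions inferred from the container names and internal ports in Section 3 (MinIO is assumed to serve its S3 API on 9000), and `upstream_for` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical routing table: public hostname -> assumed upstream (container:port).
# Upstreams are inferred from the Section 3 inventory, not read from the Caddyfile.
ROUTES = {
    "headscale.prodg.studio": "headscale:8080",
    "secrets.mainframe.prodg.studio": "infisical:8080",
    "vault.mainframe.prodg.studio": "vaultwarden:80",
    "s3.mainframe.prodg.studio": "minio:9000",
    "api.mainframe.prodg.studio": "hermes-api:8000",
    "metrics.mainframe.prodg.studio": "grafana:3000",
}


def upstream_for(host: str) -> Optional[str]:
    """Return the assumed upstream for a public hostname, or None if unrouted."""
    # Hostnames are case-insensitive and may carry a trailing dot.
    return ROUTES.get(host.strip().lower().rstrip("."))
```

A hostname that is not in the table returns None, which mirrors the idea that only the listed domains are reachable through Caddy.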

4.2 Tailnet (Headscale)

Tailscale Clients → headscale.prodg.studio:443 → Caddy → Headscale:8080
  • Control plane: HTTPS via Caddy reverse proxy
  • DERP/STUN: UDP 3478 direct from Headscale container
  • Metrics: HTTP 9090 (scraped by Prometheus internally)

4.3 Docker Internal Network

  • Network: compose_prodg-internal (bridge, 172.19.0.0/16)
  • Services communicate via container names as hostnames
  • Host-bound ports (127.0.0.1) are NOT exposed to the internet
  • Only Caddy (80, 443) and Headscale (8080, 9090, 3478) bind to 0.0.0.0
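
The binding rules above can be checked mechanically. The sketch below classifies a host binding as loopback-only or public using Python's standard `ipaddress` module. The `BINDINGS` dictionary is an illustrative subset copied from the Section 3 tables, and `is_publicly_bound` is a hypothetical helper, not part of the deployment.

```python
import ipaddress


def is_publicly_bound(host_binding: str) -> bool:
    """Return True if a host binding listens on more than loopback.

    Accepts "IP" or "IP:PORT" (IPv4 only in this sketch).
    """
    ip_text = host_binding.split(":", 1)[0]
    return not ipaddress.ip_address(ip_text).is_loopback


# Illustrative subset of the Host Binding column from Section 3.
BINDINGS = {
    "caddy": "0.0.0.0",
    "headscale": "0.0.0.0",
    "postgres": "127.0.0.1",
    "infisical": "127.0.0.1:8082",
    "vaultwarden": "127.0.0.1:8083",
    "grafana": "127.0.0.1",
}

PUBLIC_SERVICES = sorted(
    name for name, binding in BINDINGS.items() if is_publicly_bound(binding)
)
```

With the bindings listed here, only `caddy` and `headscale` end up in `PUBLIC_SERVICES`, matching the last bullet.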

5. DNS Records (Cloudflare)

Record                     Type   Value           Proxy
------                     ----   -----           -----
headscale.prodg.studio     A      65.109.89.215   DNS-only (grey cloud)
*.mainframe.prodg.studio   A      65.109.89.215   DNS-only (grey cloud)

Critical: Orange-cloud (proxied) mode MUST remain disabled for all infrastructure records. Caddy requires direct IP reachability for ACME HTTP-01 challenges.
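
One way to guard this requirement is to confirm that every infrastructure record resolves to the host's own address. The sketch below is a pure function over already-resolved addresses. `is_dns_only` is a hypothetical name, and the check assumes a proxied record would resolve to Cloudflare edge addresses rather than the host IP.

```python
import ipaddress
from typing import Iterable

HOST_IP = "65.109.89.215"  # public IP from Section 2


def is_dns_only(resolved_ips: Iterable[str], expected_ip: str = HOST_IP) -> bool:
    """True when every resolved A record equals the host's own IP.

    An empty result is treated as a failure, since the record must exist.
    """
    expected = ipaddress.ip_address(expected_ip)
    addresses = [ipaddress.ip_address(ip) for ip in resolved_ips]
    return bool(addresses) and all(addr == expected for addr in addresses)
```

The resolved addresses could come from `socket.gethostbyname_ex(name)[2]` in the standard library. That lookup is left out of the function so the check stays testable offline.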


6. Certificate Management

  • Provider: Let’s Encrypt (staging for test, production for live)
  • Automation: Caddy handles issuance and renewal automatically
  • Storage: /opt/prodg/data/caddy/ (persistent volume)
  • Domains secured: All 7 public endpoints

7. Secrets Architecture

7.1 Current State (Transitional)

Secrets currently live in /opt/prodg/compose/.env (mode 600, readable by root only). The planned migration to Infisical (Phase 9) will eliminate this file.
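
Until that migration happens, the file's permissions are the main safeguard. The sketch below checks mode and ownership with `os.stat`. `env_file_is_locked_down` is a hypothetical helper, and it assumes a POSIX host where "root-only" means uid 0.

```python
import os
import stat
from typing import Optional


def env_file_is_locked_down(path: str, required_uid: Optional[int] = 0) -> bool:
    """True if the file mode is exactly 0o600 and the owner matches.

    Pass required_uid=None to skip the ownership check.
    """
    info = os.stat(path)
    mode_ok = stat.S_IMODE(info.st_mode) == 0o600
    owner_ok = required_uid is None or info.st_uid == required_uid
    return mode_ok and owner_ok
```

On the host this would be called as `env_file_is_locked_down("/opt/prodg/compose/.env")`, for example from a periodic health check.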

7.2 Key Secrets

Secret                     Location   Purpose
------                     --------   -------
POSTGRES_PASSWORD          .env       Database authentication
REDIS_PASSWORD             .env       Redis AUTH
INFISICAL_ENCRYPTION_KEY   .env       Infisical data encryption
INFISICAL_AUTH_SECRET      .env       Infisical session/JWT
VAULTWARDEN_ADMIN_TOKEN    .env       Vaultwarden admin panel
MINIO_ROOT_PASSWORD        .env       MinIO root credentials
GRAFANA_ADMIN_PASSWORD     .env       Grafana login
TELEGRAM_BOT_TOKEN         .env       Alert notifications
HERMES_API_TOKEN           .env       API authentication
B2_KEY_SECRET              .env       Backblaze B2 application key
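
A pre-deploy check can confirm that every secret listed above is present and non-empty. The sketch below parses `.env` text in the simple `KEY=value` form. It is an assumption that the file uses that form without quoting or `export` prefixes, and `missing_secrets` is a hypothetical helper.

```python
from typing import List

# Secret names copied from the table above.
REQUIRED_SECRETS = [
    "POSTGRES_PASSWORD", "REDIS_PASSWORD", "INFISICAL_ENCRYPTION_KEY",
    "INFISICAL_AUTH_SECRET", "VAULTWARDEN_ADMIN_TOKEN", "MINIO_ROOT_PASSWORD",
    "GRAFANA_ADMIN_PASSWORD", "TELEGRAM_BOT_TOKEN", "HERMES_API_TOKEN",
    "B2_KEY_SECRET",
]


def missing_secrets(env_text: str) -> List[str]:
    """Return required names that are absent or empty in the given .env text."""
    present = set()
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.strip():
            present.add(key.strip())
    return [name for name in REQUIRED_SECRETS if name not in present]
```

The function only reports names and never returns or logs values, so its output is safe to include in alerts.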

8. Backup Architecture

8.1 Backup Scope

Target            Method              Frequency         Retention   Destination
------            ------              ---------         ---------   -----------
PostgreSQL        pg_dumpall + gzip   Daily 03:00 UTC   30 days     B2 MainframeBackup/postgres/
MinIO data        tar archive         Daily 03:00 UTC   30 days     B2 MainframeBackup/minio/
Compose configs   tar archive         Daily 03:00 UTC   90 days     B2 MainframeBackup/configs/
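
The retention column can be expressed as a simple age check. The sketch below is illustrative only: this document does not say whether pruning is done by the backup scripts or by a bucket lifecycle rule, and the keys in `RETENTION_DAYS` are hypothetical labels for the three targets.

```python
from datetime import datetime, timedelta

# Retention windows in days, taken from the backup scope table.
RETENTION_DAYS = {"postgres": 30, "minio": 30, "configs": 90}


def is_expired(target: str, created_at: datetime, now: datetime) -> bool:
    """True if a backup of `target` is strictly older than its retention window."""
    return now - created_at > timedelta(days=RETENTION_DAYS[target])
```

A backup that is exactly 30 days old is still kept under this definition. Only archives older than the window count as expired.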

8.2 Automation

  • Tool: rclone v1.60.1
  • Scheduler: Hermes cron job (job_id: 44256d53266e)
  • Notification: Telegram group on success/failure
  • Log: /var/log/prodg-backup.log
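
The success/failure notification could be sent through the Telegram Bot API `sendMessage` method. The sketch below only builds the request. The message format, the `build_telegram_request` helper, and the token and chat ID arguments are all assumptions for illustration, and the actual job may notify differently.

```python
import json
import urllib.request


def build_telegram_request(
    token: str, chat_id: str, ok: bool, detail: str
) -> urllib.request.Request:
    """Build (but do not send) a Bot API sendMessage request."""
    status = "SUCCESS" if ok else "FAILURE"
    # Hypothetical message format; adjust to whatever the group expects.
    payload = {"chat_id": chat_id, "text": f"[prodg-backup] {status}: {detail}"}
    return urllib.request.Request(
        url=f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending is then a single call to `urllib.request.urlopen(request, timeout=10)`. Keeping construction separate from sending makes the message format testable without network access.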

9. Agent Trust Tiers

Tier   Name        Runtime                          Capabilities
----   ----        -------                          ------------
T1     Internal    Host / Docker socket             Full infra, orchestration, dispatch
T2     Trusted     On-box containers                Research, safe inference
T3     Untrusted   Remote Tailscale nodes / Modal   Burst inference, untrusted code
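
The tier table maps naturally onto an enum plus a capability lookup. The sketch below is one possible encoding, not the Hermes API implementation. The capability identifiers are hypothetical names derived from the table, and the sketch assumes tiers do not inherit each other's capabilities, since the document does not say either way.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Trust tiers; a lower number means more trust."""
    T1_INTERNAL = 1
    T2_TRUSTED = 2
    T3_UNTRUSTED = 3


# Capability labels follow the table; the identifiers are hypothetical.
CAPABILITIES = {
    Tier.T1_INTERNAL: {"full_infra", "orchestration", "dispatch"},
    Tier.T2_TRUSTED: {"research", "safe_inference"},
    Tier.T3_UNTRUSTED: {"burst_inference", "untrusted_code"},
}


def is_allowed(tier: Tier, capability: str) -> bool:
    """True if `capability` is listed for `tier` (no inheritance between tiers)."""
    return capability in CAPABILITIES[tier]
```

Keeping the sets disjoint forces untrusted code onto T3 runtimes explicitly, rather than letting a higher-trust tier run it by default.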

10. File System Layout

/opt/prodg/
├── backups/
│   ├── postgres/              # Local PG dumps
│   ├── scripts/
│   │   ├── backup-all.sh
│   │   ├── backup-postgres.sh
│   │   ├── backup-minio.sh
│   │   ├── backup-configs.sh
│   │   └── update-rclone-conf.sh
│   └── .phase*.env             # Phase environment files (legacy)
├── compose/
│   ├── docker-compose.yml      # Stack definition
│   ├── Caddyfile               # Reverse proxy rules
│   ├── .env                    # Consolidated secrets
│   ├── prometheus/
│   │   └── prometheus.yml
│   ├── grafana/
│   │   └── provisioning/
│   │       ├── datasources/
│   │       │   ├── prometheus.yml
│   │       │   └── loki.yml
│   │       └── alerting/
│   │           ├── contactpoints.yml
│   │           ├── notificationpolicies.yml
│   │           └── rules.yml
│   ├── loki/
│   │   └── loki.yml
│   ├── promtail/
│   │   └── promtail.yml
│   └── headscale/
│       └── config/
│           └── config.yaml
├── data/
│   ├── caddy/                  # TLS certificates
│   ├── caddy-config/
│   ├── grafana/                # Dashboards + SQLite
│   ├── loki/                   # Log chunks + index
│   ├── minio/                  # Object storage data
│   ├── postgres/               # PostgreSQL data
│   ├── redis/                  # Redis AOF + RDB
│   └── vaultwarden/            # Password vault data
├── hermes-api/
│   ├── Dockerfile
│   ├── .dockerignore
│   └── app/
│       └── main.py             # FastAPI orchestrator
└── scripts/
    └── postgres-init/          # DB initialization scripts

11. Prometheus Scrape Targets (All UP)

Job                 Target               Endpoint
---                 ------               --------
caddy               caddy:2019           /metrics
cadvisor            cadvisor:8080        /metrics
grafana             grafana:3000         /metrics
headscale-metrics   headscale:9090       /metrics
hermes-api          hermes-api:8000      /metrics
loki                loki:3100            /metrics
node-exporter       node-exporter:9100   /metrics
prometheus          localhost:9090       /metrics
promtail            promtail:9080        /metrics
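
The table above corresponds to a list of static scrape jobs. The sketch below renders it as Python data shaped like the `scrape_configs` section of a Prometheus configuration. It is not a copy of the deployed prometheus.yml, and it assumes plain HTTP inside the Docker network.

```python
from typing import Dict, List

# (job, target) pairs copied from the scrape target table.
SCRAPE_TARGETS = [
    ("caddy", "caddy:2019"),
    ("cadvisor", "cadvisor:8080"),
    ("grafana", "grafana:3000"),
    ("headscale-metrics", "headscale:9090"),
    ("hermes-api", "hermes-api:8000"),
    ("loki", "loki:3100"),
    ("node-exporter", "node-exporter:9100"),
    ("prometheus", "localhost:9090"),
    ("promtail", "promtail:9080"),
]


def scrape_url(target: str, path: str = "/metrics") -> str:
    """URL fetched for a target, assuming plain HTTP inside the network."""
    return f"http://{target}{path}"


def scrape_configs() -> List[Dict]:
    """Render the table as a list shaped like Prometheus `scrape_configs`."""
    return [
        {
            "job_name": job,
            "metrics_path": "/metrics",
            "static_configs": [{"targets": [target]}],
        }
        for job, target in SCRAPE_TARGETS
    ]
```

Generating the list from one table keeps the documented targets and the scrape configuration from drifting apart.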

12. Known Issues & Technical Debt

  1. Caddyfile formatting warning (non-blocking): Caddy logs "Caddyfile input is not formatted; run 'caddy fmt --overwrite'"
  2. Grafana plugin installer errors (non-blocking): permission denied on the bundled plugin directory
  3. Caddy admin API listens on 0.0.0.0:2019: required for Prometheus scraping; mitigated by Docker network isolation
  4. Infisical and Vaultwarden run as root: should migrate to the prodg service user (Phase 9)
  5. Modal dispatch is a stub: requires Modal SDK integration for production use
  6. rclone.conf is stored on the host: auto-generated from .env; will be migrated to Infisical secret injection

13. Firewall (UFW)

Status: active
To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
443/tcp                    ALLOW       Anywhere
8080/tcp                   ALLOW       Anywhere

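Headscale ports 8080/tcp (control plane) and 3478/udp (DERP/STUN) are published directly on the host. No UFW rule for 3478/udp appears above; ports published by Docker are typically reachable anyway, because Docker installs its own iptables rules ahead of UFW's. The Prometheus port (9090) is NOT published to the host.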
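
To compare the firewall rules against the service inventory, the status output can be parsed into a set of allowed ports. The sketch below handles the plain `ufw status` layout shown above. `parse_ufw_allows` is a hypothetical helper and ignores DENY, LIMIT, and other rule types.

```python
from typing import Set


def parse_ufw_allows(status_text: str) -> Set[str]:
    """Return the "To" field of every ALLOW rule in `ufw status` output."""
    allowed = set()
    for line in status_text.splitlines():
        parts = line.split()
        # Rule lines look like: "<port/proto> ... ALLOW <source>".
        if "ALLOW" in parts and parts.index("ALLOW") >= 1:
            allowed.add(parts[0])
    return allowed


UFW_STATUS = """\
Status: active
To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
443/tcp                    ALLOW       Anywhere
8080/tcp                   ALLOW       Anywhere
"""

ALLOWED = parse_ufw_allows(UFW_STATUS)
```

For the rules listed in this section, `ALLOWED` contains the four TCP ports and no entry for 3478/udp.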

