Data Platform Architecture

Hetzner · Medallion Architecture · Multi-Tenant · EU Self-Hosted

● GDPR COMPLIANT
Legend: Sources · Event Stream · Bronze · Silver · Gold · Applications · Infrastructure · GDPR / Security · Alerting · Data Quality · Dashed = Planned
Sources
Simplisan ERP
Primary data source. Crawler extracts patient, business & tenant data.
BATCH · HOURLY
Simplisan API
Future: REST API integration replacing Crawler for direct data fetch.
PLANNED
Crawler
Extracts data from Simplisan, runs on schedule via Dagster.
DAGSTER MANAGED
patients_application
Multi-tenant patient portal with one subdomain per tenant (e.g., praxis-mueller.app.com). Emits events on address change and appointment booking.
SUBDOMAIN · EVENT SOURCE
Ingestion
Hetzner AX52 ×3
#1: Dagster, dbt, Crawler. #2: Redpanda, MinIO. #3: PostHog (ClickHouse + Kafka). All in Germany, GDPR compliant.
EU · GDPR · €225/mo
Redpanda
Kafka-compatible event streaming. Handles patient events in near real time. Includes Schema Registry.
NEAR REAL-TIME
Why Redpanda?
Simpler than Kafka, single binary, built-in Schema Registry, lower ops overhead for small teams.
Schema Registry
Validates event schemas (Avro/JSON). Events with unknown or invalid schemas are rejected and routed to the dead-letter topic.
BUILT INTO REDPANDA
Dead Letter Topic
Failed/rejected events land here. Alerts fire immediately via Grafana.
AUTO-ALERT
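The validate-or-dead-letter routing can be sketched in plain Python, without a Redpanda client. This is only an illustration under stated assumptions: the topic names, the `KNOWN_SCHEMAS` table, the event shape (`type` plus `payload`) and the `route_event` helper are all hypothetical and stand in for what the Schema Registry and the producer would do.

```python
from typing import Any

# Hypothetical registry: event type -> required payload fields and their types.
KNOWN_SCHEMAS: dict[str, dict[str, type]] = {
    "address_changed": {"tenant_id": str, "patient_id": str, "new_address": str},
    "appointment_booked": {"tenant_id": str, "patient_id": str, "starts_at": str},
}

MAIN_TOPIC = "patient-events"              # assumed topic name
DEAD_LETTER_TOPIC = "patient-events-dlt"   # assumed topic name


def route_event(event: dict[str, Any]) -> tuple[str, str]:
    """Return (topic, reason) for an incoming event."""
    schema = KNOWN_SCHEMAS.get(event.get("type", ""))
    if schema is None:
        return DEAD_LETTER_TOPIC, "unknown schema"

    payload = event.get("payload")
    if not isinstance(payload, dict):
        return DEAD_LETTER_TOPIC, "missing payload"

    # Every required field must be present and of the expected type.
    for field, expected in schema.items():
        if not isinstance(payload.get(field), expected):
            return DEAD_LETTER_TOPIC, f"invalid field: {field}"
    return MAIN_TOPIC, "ok"
```

Returning the reason alongside the topic lets the alert carry enough context for the manual review step.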
Schema Diff (Crawler)
Compares incoming Simplisan schema vs last known. Additive → auto-accept + log. Breaking → halt + alert.
DAGSTER SENSOR
Schema change handling:
• New column → auto-accept, alert
• Rename/delete → halt pipeline; manual approval required to proceed
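The additive-versus-breaking rule above can be sketched as a small schema comparison. This is a sketch, not the actual Dagster sensor: schemas are assumed to be plain `column -> type` mappings, the names `SchemaDiff`, `diff_schema` and `decide` are hypothetical, and treating a type change as breaking is an assumption (the rules above only name renames and deletions). A rename shows up as one removed plus one added column, so it is classified as breaking.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)
    retyped: set[str] = field(default_factory=set)

    @property
    def breaking(self) -> bool:
        # Removed columns (including renames) and type changes are breaking.
        return bool(self.removed or self.retyped)


def diff_schema(last_known: dict[str, str], incoming: dict[str, str]) -> SchemaDiff:
    """Compare two column -> type mappings."""
    common = set(last_known) & set(incoming)
    return SchemaDiff(
        added=set(incoming) - set(last_known),
        removed=set(last_known) - set(incoming),
        retyped={col for col in common if last_known[col] != incoming[col]},
    )


def decide(diff: SchemaDiff) -> str:
    """Map a diff to a pipeline action."""
    if diff.breaking:
        return "halt"          # wait for manual approval
    if diff.added:
        return "auto_accept"   # accept, then log/alert
    return "proceed"           # nothing changed
```

For example, `decide(diff_schema({"id": "int"}, {"id": "int", "email": "text"}))` yields `"auto_accept"`, while dropping `id` yields `"halt"`.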
Bronze (Raw)
MinIO (S3-compat)
Object storage on Hetzner. Stores raw data as Apache Iceberg tables (Parquet).
ICEBERG · SCHEMA-ON-READ
Why Iceberg?
Native schema evolution — add, rename, drop columns without breaking readers. Time-travel for audits.
bronz_data
All raw data lands here: Crawler batch + Redpanda events. Append-only, immutable. tenant_id column on every record.
APPEND-ONLY
PII Tagging
Columns containing PII are tagged at bronze level. Feeds GDPR deletion pipeline.
GDPR
Bronze → Silver Quality Gate
dbt Tests + Soda
Schema contracts, null checks, referential integrity, row count anomalies. Fails loudly on violations.
GATE: BRONZE→SILVER
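To make the gate's checks concrete, here is a plain-Python sketch of a null check and a row-count anomaly check. It does not use the dbt or Soda APIs; the function names, the 50% tolerance and the use of a simple historical mean are assumptions chosen for illustration.

```python
from statistics import mean


def null_violations(rows: list[dict], required: list[str]) -> list[str]:
    """Return the required columns that contain at least one NULL (None)."""
    return [col for col in required if any(row.get(col) is None for row in rows)]


def row_count_anomaly(current: int, history: list[int], tolerance: float = 0.5) -> bool:
    """Flag a batch whose size deviates from the historical mean by more than `tolerance`."""
    if not history:
        return False  # nothing to compare against yet
    baseline = mean(history)
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / baseline > tolerance


def gate(rows: list[dict], required: list[str], history: list[int]) -> None:
    """Raise instead of returning a flag, so a violation cannot pass silently."""
    failures = [f"null values in {col}" for col in null_violations(rows, required)]
    if row_count_anomaly(len(rows), history):
        failures.append("row count anomaly")
    if failures:
        raise ValueError("quality gate failed: " + "; ".join(failures))
```

Raising an exception is what lets an orchestrator mark the run as failed and stop downstream models from building on bad data.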
Silver (Clean)
silver_app
Cleaned patient & business data. Deduped, type-cast, normalized. dbt models with contracts enforced.
DBT CONTRACTS
silver_bi
BI-oriented transforms. Aggregations, joins, pre-computed metrics for analytics.
DBT CONTRACTS
Pseudonymization
PII fields hashed/masked in silver. Reversible only with key stored in Vault.
GDPR · SILVER
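A minimal sketch of keyed pseudonymization and masking, using only the Python standard library. It assumes HMAC-SHA256 as the hashing scheme, which the source does not specify. HMAC is one-way, so re-identification in this sketch means recomputing the pseudonym from a known identifier with the key; a design that must decrypt values directly would need encryption or a protected mapping table instead. The `key` argument stands in for a secret fetched from Vault, and both function names are hypothetical.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable keyed pseudonym for a PII value."""
    # Normalize first so the same identifier always maps to the same pseudonym.
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}" if domain else "***"
```

Because the pseudonym is deterministic per key, joins and deduplication in silver keep working on the hashed column without exposing the raw value.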
Silver → Gold Quality Gate
dbt Tests + Soda
Business logic validation, cross-table consistency, tenant data isolation checks.
GATE: SILVER→GOLD
Gold (Business)
Postgres (Ubicloud Managed)
Ubicloud on Hetzner Germany. Gold tables served to applications. Schema-per-tenant isolation. Automated backups, PITR, encryption at rest.
TENANT ISOLATED · MANAGED
App-Serving Gold
gold_app_patients
Patient records, appointments, addresses. Read-only views per tenant schema.
READ ONLY
gold_app_business
Business/practice data per tenant.
READ ONLY
BI-Serving Gold
gold_bi_tennent
Tenant-level BI metrics and KPIs.
READ ONLY
gold_bi_market
Cross-tenant market analytics (anonymized).
READ ONLY · ANON
Applications
patients_application
Shared app, subdomain per tenant (e.g., praxis-mueller.app.com). Filters by tenant_id internally. Emits events to Redpanda on address change / appointment booking.
SUBDOMAIN · MULTI-TENANT · EVENTS
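Resolving the tenant from the request host could look like the sketch below. The base domain comes from the example hostname above; the `tenant_slug` helper and the rule that a slug is a single DNS label are assumptions. How the slug maps to a `tenant_id` is not described in the source and is left out.

```python
import re
from typing import Optional

BASE_DOMAIN = "app.com"  # from the example hostname above

# A single DNS label: letters, digits, inner hyphens, at most 63 characters.
_SLUG = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def tenant_slug(host: str) -> Optional[str]:
    """Extract the tenant subdomain from a Host header value, or None if invalid."""
    host = host.split(":", 1)[0].lower().rstrip(".")  # drop port and trailing dot
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    slug = host[: -len(suffix)]
    return slug if _SLUG.match(slug) else None
```

Rejecting nested subdomains and foreign domains up front means an unknown host never reaches the tenant filter.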
tennent_business_application
Tenant admin/business dashboard. Reads from gold_app_business.
READ ONLY
tennent_bi_application
Tenant BI dashboard. Reads from gold_bi_tennent.
READ ONLY
market_bi_application
Market analytics. Reads anonymized cross-tenant data from gold_bi_market.
READ ONLY · ANON
PostHog (Self-Hosted)
Dedicated Hetzner AX52. Runs ClickHouse + Kafka + Postgres internally. Tracks tenant-level feature adoption and patient-level behavior (anonymized). All events tagged with tenant_id.
SELF-HOSTED · GDPR · AX52 #3
Why self-hosted?
PostHog Cloud keeps data with a third-party processor (US or EU region). Patient behavior data under GDPR must stay in the EU. Self-hosting on Hetzner Germany = full control.
Cross-Cutting Infrastructure
Dagster (Orchestrator)
Manages all pipelines: Crawler schedule, dbt runs, quality gates, schema diff sensors. Native dbt integration. Asset lineage shows blast radius of schema changes.
ORCHESTRATION
Grafana + Alertmanager
Monitoring dashboards. Alerts on: pipeline failures, schema changes, quality gate failures, dead-letter events, SLA breaches.
MONITORING
GDPR Deletion Pipeline
Handles tenant and patient deletion requests. Traverses all layers (bronze → silver → gold) using PII tags. Dagster-orchestrated. Audit log retained.
RIGHT TO ERASURE
Tenant Isolation
Bronze/Silver: tenant_id column + row-level security. Gold: schema-per-tenant in Postgres. Verified by cross-cutting dbt tests.
ACCESS CONTROL
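The kind of isolation check the dbt tests would perform can be illustrated in plain Python. This sketch assumes rows have already been read per tenant schema into a dictionary; the function name and data shape are hypothetical, and the real checks would run as SQL tests.

```python
def isolation_violations(rows_by_tenant: dict[str, list[dict]]) -> list[tuple[str, object]]:
    """Return (schema_tenant, row_tenant) pairs for rows stored under the wrong tenant."""
    violations = []
    for schema_tenant, rows in rows_by_tenant.items():
        for row in rows:
            row_tenant = row.get("tenant_id")
            if row_tenant != schema_tenant:
                violations.append((schema_tenant, row_tenant))
    return violations
```

An empty result means every row sits in the schema of the tenant named in its `tenant_id` column; any entry should fail the gate.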
DataHub / OpenLineage
Data catalog and lineage tracking. Trace any record from Simplisan → bronze → silver → gold → application.
LINEAGE
dbt (Transformations)
All bronze→silver→gold transforms. Column-level contracts enforced. If Simplisan renames/deletes a column, dbt fails loudly — never silently passes NULLs.
SCHEMA CONTRACTS
Data Flow Paths
Batch Path (Hourly)
Simplisan → Crawler → Schema Diff → bronz_data → Quality Gate → silver_app / silver_bi → Quality Gate → gold_* → Applications
DAGSTER SCHEDULED
Event Path (Near Real-Time)
patients_application → Redpanda → Schema Registry validates → bronz_data → Quality Gate → silver_app → Quality Gate → gold_app_patients → patients_application
REDPANDA STREAMED
Failure Path
Schema violation → Dead Letter Topic / Pipeline Halt → Grafana Alert → Team notification (Slack/Email) → Manual review → Approve or fix → Resume
NEVER SILENT
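The halt-and-alert behaviour of the paths above can be sketched as a staged runner. This is not Dagster code: the `run_batch` function, the stage names and the `alert` callback (standing in for the Grafana/Slack notification) are hypothetical.

```python
from typing import Callable, Optional

Rows = list[dict]
Stage = tuple[str, Callable[[Rows], Rows]]


def run_batch(
    rows: Rows, stages: list[Stage], alert: Callable[[str], None]
) -> tuple[list[str], Optional[str]]:
    """Run stages in order; on the first failure, alert and halt.

    Returns (names of completed stages, name of the failed stage or None).
    """
    completed: list[str] = []
    for name, stage in stages:
        try:
            rows = stage(rows)
        except Exception as exc:
            # Never silent: surface the failure, then stop before later stages run.
            alert(f"pipeline halted at {name}: {exc}")
            return completed, name
        completed.append(name)
    return completed, None
```

Returning the failed stage name gives the manual review step a clear resume point once the change is approved or fixed.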
GDPR Deletion Path
Deletion request → Dagster triggers traversal → PII-tagged columns in bronze (Iceberg delete) → silver (recompute) → gold (cascade) → Audit log
RIGHT TO ERASURE
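The traversal order can be sketched as a function that turns a deletion request into an ordered plan. The layer order and per-layer actions come from the path above; the shape of the PII tag catalog, the `request_id` parameter and the `deletion_plan` helper are assumptions, and nothing here touches Iceberg or Postgres.

```python
# Layer order and the action applied in each layer, as listed in the path above.
LAYER_ACTIONS = [
    ("bronze", "iceberg_delete"),
    ("silver", "recompute"),
    ("gold", "cascade"),
]


def deletion_plan(
    request_id: str,
    tenant_id: str,
    patient_id: str,
    pii_tags: dict[str, dict[str, list[str]]],  # layer -> table -> PII columns
) -> list[dict]:
    """Build the ordered steps for one erasure request, ending with the audit entry."""
    plan = []
    for layer, action in LAYER_ACTIONS:
        for table, columns in sorted(pii_tags.get(layer, {}).items()):
            plan.append({
                "layer": layer,
                "table": table,
                "action": action,
                "pii_columns": columns,
                "tenant_id": tenant_id,
                "patient_id": patient_id,
            })
    # The audit entry records that the request ran, not the deleted values.
    plan.append({"layer": "audit", "action": "log", "request_id": request_id,
                 "steps": len(plan)})
    return plan
```

Building the plan separately from executing it makes each request reviewable and easy to record in the audit log.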
Monthly Cost Breakdown
~€440–510/mo total
Component | Service | Cost/mo
AX52 #1 (Compute) | Dagster, dbt, Crawler | €75
AX52 #2 (Streaming + Storage) | Redpanda, MinIO | €75
AX52 #3 (Analytics) | PostHog self-hosted (ClickHouse, Kafka, PG) | €75
Object Storage (3 TB) | Hetzner Object Storage (Bronze/Silver) | €25
Managed Postgres | Ubicloud on Hetzner (Gold layer) | €80–150
All Software | Dagster, dbt, Redpanda, Iceberg, Grafana, Soda, PostHog | €0
DNS + Misc | Domain, wildcard SSL (*.app.com), misc | €10
Claude AI | Claude Pro, €100/mo plan (AI-assisted dev & ops) | €100
Total Monthly Cost | | €440–510
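The total can be checked by summing the line items. The figures below are copied from the table; the `COSTS` dictionary and the `total_range` helper are just one illustrative way to do the sum, and the VAT line applies the 19% rate the document mentions.

```python
# (low, high) monthly cost in EUR per line item, copied from the table above.
COSTS = {
    "AX52 #1 (compute)": (75, 75),
    "AX52 #2 (streaming + storage)": (75, 75),
    "AX52 #3 (analytics)": (75, 75),
    "Object storage (3 TB)": (25, 25),
    "Managed Postgres": (80, 150),
    "All software": (0, 0),
    "DNS + misc": (10, 10),
    "Claude AI": (100, 100),
}


def total_range(costs: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of every line item."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range(COSTS)
print(f"€{low}–{high}/mo excl. VAT")
print(f"€{low * 1.19:.0f}–{high * 1.19:.0f}/mo incl. 19% VAT")
```

The only item with a range is Managed Postgres, so the spread of the total comes entirely from the Ubicloud tier.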
AWS Equivalent: €2,000–4,000/mo
RDS Postgres (~€400), MSK/Kafka (~€500), EC2 ×3 (~€450), S3 + transfer (~€200), managed Airflow (~€300+), PostHog Cloud (~€450). Against €440–510/mo for this stack, that is roughly 4–9x cheaper.
4–9x SAVINGS
Scaling Milestones
• 5+ TB data → add 4th AX52 (+€75/mo)
• Heavy Postgres load → upgrade Ubicloud tier (+€50–100)
• Want managed streaming → Redpanda Cloud (+€100–200)
• DataHub lineage server → small VM (+€10–15)
GROWTH PATH
All prices excl. 19% German VAT
Post-April 2026 Hetzner pricing. Hetzner Object Storage at €6.49 base + pay-as-you-go. Ubicloud Postgres on Hetzner Germany region.
Hypothetical: What If You Had to Hire This Team in Germany?
Role | Source | Gross Salary/yr (incl. employer contrib.)
Head of Product | Glassdoor DE, 251 submissions, Jan 2026 | €88k–140k
Senior Fullstack Engineer | Glassdoor DE + Levels.fyi DE, 2025/26 | €62k–95k
Senior Data / Cloud Engineer | TechPays EU + Glassdoor DE, 2025/26 · +20–40% for modern stack (dbt, Dagster) | €68k–100k
3-person team total | | €218k–335k/yr
Gross salary = full cost to employer, incl. employer social contributions (~21%).
Sources: Glassdoor.de · Levels.fyi/Germany · TechPays.com/europe/germany
For context
Hiring this team at market rate in Germany runs €218k–335k/yr — roughly €18k–28k/mo in payroll alone. The full infra stack is €440–510/mo.
MARKET REFERENCE