Sources
Simplisan ERP
Primary data source. Crawler extracts patient, business & tenant data.
BATCH · HOURLY
Simplisan API
Future: REST API integration replacing Crawler for direct data fetch.
PLANNED
Crawler
Extracts data from Simplisan. Runs on a schedule via Dagster.
DAGSTER MANAGED
patients_application
Multi-tenant patient portal with one subdomain per tenant (e.g., praxis-mueller.app.com). Emits events on address change and appointment booking.
SUBDOMAIN · EVENT SOURCE
Ingestion
Hetzner AX52 ×3
#1: Dagster, dbt, Crawler. #2: Redpanda, MinIO. #3: PostHog (ClickHouse + Kafka). All in Germany, GDPR compliant.
EU · GDPR · €225/mo
Redpanda
Kafka-compatible event streaming. Handles patient events in real-time. Includes Schema Registry.
NEAR REAL-TIME
Why Redpanda?
Simpler than Kafka, single binary, built-in Schema Registry, lower ops overhead for small teams.
Schema Registry
Validates event schemas (Avro/JSON). Events with unknown schemas are rejected and routed to the dead-letter topic.
BUILT INTO REDPANDA
Dead Letter Topic
Failed/rejected events land here. Alerts fire immediately via Grafana.
AUTO-ALERT
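A minimal sketch of the reject-to-dead-letter routing described above, in plain Python with no Kafka client. The topic names, event types, and field sets are hypothetical placeholders, not the real Schema Registry contents, and the check is a simple required-field test rather than full Avro/JSON Schema validation.

```python
# Hypothetical topic names and event schemas, for illustration only.
EVENTS_TOPIC = "patient-events"
DEAD_LETTER_TOPIC = "patient-events.dead-letter"

KNOWN_SCHEMAS = {
    "address_changed": {"tenant_id", "patient_id", "new_address"},
    "appointment_booked": {"tenant_id", "patient_id", "slot_start"},
}


def route_event(event: dict) -> tuple[str, str]:
    """Return (topic, reason) for an incoming event."""
    required = KNOWN_SCHEMAS.get(event.get("type"))
    if required is None:
        # Unknown event type: no registered schema to validate against.
        return DEAD_LETTER_TOPIC, "unknown schema"
    payload = event.get("payload") or {}
    missing = sorted(required - set(payload))
    if missing:
        return DEAD_LETTER_TOPIC, "missing fields: " + ", ".join(missing)
    return EVENTS_TOPIC, "ok"
```

The returned reason string could be attached to the dead-letter record so the alert carries enough context to triage the failure.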
Schema Diff (Crawler)
Compares the incoming Simplisan schema against the last known one. Additive → auto-accept + log. Breaking → halt + alert.
DAGSTER SENSOR
Schema change handling:
• New column → auto-accept, alert
• Rename/delete → halt pipeline
• Requires manual approval to proceed
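The rules above can be sketched as a small schema comparison. This is an illustration only: it assumes a schema is represented as a column-name to type-name mapping, and it treats a type change as breaking, which the rules above do not state explicitly. A rename shows up as one removed plus one added column, so it is classified as breaking. The function and class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    retyped: list[str] = field(default_factory=list)

    @property
    def breaking(self) -> bool:
        # Removed columns cover both deletes and renames.
        # Treating type changes as breaking is an assumption.
        return bool(self.removed or self.retyped)


def diff_schema(last_known: dict[str, str], incoming: dict[str, str]) -> SchemaDiff:
    """Compare two column -> type mappings."""
    added = sorted(c for c in incoming if c not in last_known)
    removed = sorted(c for c in last_known if c not in incoming)
    retyped = sorted(
        c for c in incoming if c in last_known and incoming[c] != last_known[c]
    )
    return SchemaDiff(added, removed, retyped)


def decide(diff: SchemaDiff) -> str:
    """Map a diff to a pipeline action."""
    if diff.breaking:
        return "halt"  # requires manual approval to proceed
    if diff.added:
        return "accept_and_alert"
    return "proceed"
```

In a Dagster sensor, the "halt" outcome would correspond to not launching the downstream run until someone approves the new schema.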
Bronze (Raw)
MinIO (S3-compat)
Object storage on Hetzner. Stores raw data as Apache Iceberg tables (Parquet).
ICEBERG · SCHEMA-ON-READ
Why Iceberg?
Native schema evolution: add, rename, or drop columns without breaking readers. Time travel for audits.
bronz_data
All raw data lands here: Crawler batch + Redpanda events. Append-only, immutable. tenant_id column on every record.
APPEND-ONLY
PII Tagging
Columns containing PII are tagged at bronze level. Feeds GDPR deletion pipeline.
GDPR
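One way the PII tags could feed a deletion pipeline is a per-table tag registry that the pipeline queries when an erasure request arrives. The sketch below assumes such a registry. The table names, column names, and tag values are hypothetical and do not reflect the actual Simplisan schema.

```python
# Hypothetical tag registry: table -> {column: PII category}.
PII_TAGS = {
    "patients": {"full_name": "direct", "address": "direct", "birth_date": "quasi"},
    "appointments": {"notes": "free_text"},
}


def pii_columns(table: str) -> list[str]:
    """Return the tagged PII columns of a table, sorted for stable output."""
    return sorted(PII_TAGS.get(table, {}))


def deletion_plan(tenant_id: str, patient_id: str) -> list[dict]:
    """List, per table, which columns an erasure request has to touch."""
    return [
        {
            "table": table,
            "columns": pii_columns(table),
            "tenant_id": tenant_id,
            "patient_id": patient_id,
        }
        for table in sorted(PII_TAGS)
    ]
```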
bronze
→
quality gate
→
silver
dbt Tests + Soda
Schema contracts, null checks, referential integrity, row count anomalies. Fails loudly on violations.
GATE: BRONZE→SILVER
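To make the gate checks concrete, here is a plain-Python sketch of three of them: null checks, referential integrity, and a row count anomaly test. It is not how dbt or Soda implement these checks. The table shapes, column names, and the 50% tolerance are assumptions for illustration.

```python
from typing import Iterable


class QualityGateError(Exception):
    """Raised when any gate check fails, so the pipeline stops loudly."""


def null_violations(rows: list[dict], required: Iterable[str]) -> int:
    """Count rows where any required column is missing or None."""
    required = list(required)
    return sum(1 for r in rows if any(r.get(c) is None for c in required))


def orphaned_keys(child_rows: list[dict], fk: str, parent_rows: list[dict], pk: str) -> set:
    """Return foreign-key values that have no matching parent row."""
    parents = {r[pk] for r in parent_rows}
    return {r[fk] for r in child_rows if r[fk] not in parents}


def row_count_anomalous(history: list[int], today: int, tolerance: float = 0.5) -> bool:
    """Flag a load whose size deviates from the historical mean by more than tolerance."""
    baseline = sum(history) / len(history)
    return abs(today - baseline) > tolerance * baseline


def run_gate(appointments: list[dict], patients: list[dict], history: list[int]) -> None:
    failures = []
    if null_violations(appointments, ["appointment_id", "patient_id"]):
        failures.append("null values in required columns")
    if orphaned_keys(appointments, "patient_id", patients, "patient_id"):
        failures.append("appointments reference unknown patients")
    if row_count_anomalous(history, len(appointments)):
        failures.append("row count anomaly")
    if failures:
        raise QualityGateError("; ".join(failures))
```

Raising a single exception that lists every failed check keeps the "fails loudly" behavior while still reporting all violations from one run.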
Silver (Clean)
silver_app
Cleaned patient & business data. Deduped, type-cast, normalized. dbt models with contracts enforced.
DBT CONTRACTS
silver_bi
BI-oriented transforms. Aggregations, joins, pre-computed metrics for analytics.
DBT CONTRACTS
Pseudonymization
PII fields hashed/masked in silver. Reversible only with key stored in Vault.
GDPR · SILVER
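A minimal sketch of keyed pseudonymization using HMAC-SHA256 from the Python standard library. This is one possible technique, not necessarily the one used here. HMAC output cannot be decrypted: it only lets a key holder re-link a known value by recomputing the pseudonym. True reversal would need encryption or a protected lookup table. The key below is a placeholder, and fetching it from Vault is out of scope for this sketch.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Return a stable, keyed pseudonym for a PII value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_row(row: dict, pii_columns: list[str], key: bytes) -> dict:
    """Return a copy of the row with the given PII columns replaced by pseudonyms."""
    out = dict(row)
    for col in pii_columns:
        if out.get(col) is not None:
            out[col] = pseudonymize(str(out[col]), key)
    return out


# Placeholder key for illustration; a real key would come from the secret store.
EXAMPLE_KEY = b"not-a-real-key"
```

Because the same key and value always yield the same pseudonym, joins across silver tables keep working without exposing the raw value.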
silver
→
quality gate
→
gold
dbt Tests + Soda
Business logic validation, cross-table consistency, tenant data isolation checks.
GATE: SILVER→GOLD
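A tenant isolation check can be as simple as asserting that a dataset built for one tenant contains no rows from another. The sketch below assumes rows are dictionaries carrying a tenant_id field. The function names are hypothetical.

```python
def foreign_tenant_rows(rows: list[dict], expected_tenant_id: str) -> list[dict]:
    """Return rows whose tenant_id is missing or differs from the expected one."""
    return [r for r in rows if r.get("tenant_id") != expected_tenant_id]


def assert_tenant_isolated(rows: list[dict], expected_tenant_id: str) -> None:
    """Raise if any row belongs to a different tenant."""
    leaked = foreign_tenant_rows(rows, expected_tenant_id)
    if leaked:
        raise ValueError(
            f"{len(leaked)} row(s) do not belong to tenant {expected_tenant_id!r}"
        )
```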
Applications
patients_application
Shared app, subdomain per tenant (e.g., praxis-mueller.app.com). Filters by tenant_id internally. Emits events to Redpanda on address change / appointment booking.
SUBDOMAIN · MULTI-TENANT · EVENTS
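Subdomain-per-tenant routing needs a step that turns the request host into a tenant identifier before any tenant_id filter is applied. A sketch of that step, assuming the base domain is app.com as in the example above and that the subdomain is used as a tenant slug. How a slug maps to a tenant_id is not specified here.

```python
from typing import Optional


def tenant_from_host(host: str, base_domain: str = "app.com") -> Optional[str]:
    """Extract the tenant slug from a Host header value, or None if it does not match."""
    # Drop an optional port, normalize case, and strip a trailing dot.
    hostname = host.split(":")[0].lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    slug = hostname[: -len(suffix)]
    # Reject empty slugs and nested subdomains such as a.b.app.com.
    if not slug or "." in slug:
        return None
    return slug
```

Returning None for anything that is not exactly one label under the base domain lets the caller reject the request instead of falling back to a default tenant.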
tennent_business_application
Tenant admin/business dashboard. Reads from gold_app_business.
READ ONLY
tennent_bi_application
Tenant BI dashboard. Reads from gold_bi_tennent.
READ ONLY
market_bi_application
Market analytics. Reads anonymized cross-tenant data from gold_bi_market.
READ ONLY · ANON
PostHog (Self-Hosted)
Dedicated Hetzner AX52. Runs ClickHouse + Kafka + Postgres internally. Tracks tenant-level feature adoption and patient-level behavior (anonymized). All events tagged with tenant_id.
SELF-HOSTED · GDPR · AX52 #3
Why self-hosted?
PostHog Cloud (US region) stores data on US servers. Patient behavior data under GDPR must stay in the EU. Self-hosting on Hetzner in Germany gives full control.