Technical details

SoTranscribe — the technical write-up.

The jargon-y version. Pipeline, jurisdictional reasoning, hardware sovereignty research, lineage and ethos. Written for technical readers, funders, and journalists.

How it works

One pipeline, end to end.

Recordings        →  Dedupe   →  Whisper Large v3   →  Diarize     →  Markdown + JSON   →  Your storage
(any source,         (hash-      (faster-whisper,      (pyannote,     (timestamps,         (self-host /
 incl. wearable)      based)      word timestamps)      speakers)      speakers, meta)      AT / your NAS)
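
To make the last two stages concrete, here is a minimal sketch of how word timestamps and diarization turns could be merged and rendered. It is an illustration under assumptions, not the SoTranscribe source: the `Word` and `Turn` records, the "largest overlap wins" rule, and the Markdown line layout are all hypothetical choices made for this example.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class Turn:
    speaker: str  # diarization label, e.g. "SPEAKER_00"
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker turn it overlaps most (assumed rule)."""
    labelled = []
    for w in words:
        best = max(turns, key=lambda t: overlap(w.start, w.end, t.start, t.end), default=None)
        hit = best is not None and overlap(w.start, w.end, best.start, best.end) > 0
        labelled.append({**asdict(w), "speaker": best.speaker if hit else "UNKNOWN"})
    return labelled


def fmt_ts(seconds):
    """Format seconds as HH:MM:SS, truncating fractions."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_markdown(labelled):
    """Group consecutive words by speaker into one timestamped line per turn."""
    blocks = []
    for w in labelled:
        if not blocks or blocks[-1]["speaker"] != w["speaker"]:
            blocks.append({"speaker": w["speaker"], "start": w["start"], "words": []})
        blocks[-1]["words"].append(w["text"])
    return "\n".join(
        f"**{b['speaker']}** [{fmt_ts(b['start'])}] {' '.join(b['words'])}" for b in blocks
    )


def to_json(labelled):
    """Serialize the labelled words as the machine-readable companion file."""
    return json.dumps({"words": labelled}, ensure_ascii=False, indent=2)
```

Words that overlap no turn are labelled `UNKNOWN` rather than guessed, so gaps in diarization stay visible in the output.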
      

Continuous mode

Set it and forget it.

  • Detect — new file appears in a watched folder (Drive sync, OMI wearable, voice-memo export, anything).
  • Dedupe — content-hash check; identical files don’t get re-processed.
  • Filter — rules you set (silent files, < 30s clips, screen-share-only meetings) get skipped.
  • Queue — files that pass the filters line up for transcription on your own GPU; no remote API calls.
  • Optional review — flag anything for human dismissal or extra tagging before it’s processed.
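
The detect, dedupe, filter, and queue steps above can be sketched as a single polling pass. This is a hedged illustration using only the Python standard library. The names `scan`, `content_hash`, and `too_small`, the suffix list, and the byte-size filter are assumptions for this example. A real "< 30s" or silence rule would need to decode the audio, which is left out here.

```python
import hashlib
from collections import deque
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever your sources produce.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a", ".flac", ".ogg"}


def content_hash(path, chunk_size=1 << 20):
    """SHA-256 of the file's bytes, read in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def too_small(path, min_bytes=1024):
    """Stand-in filter rule: skip files below a size threshold (hypothetical)."""
    return path.stat().st_size < min_bytes


def scan(folder, seen_hashes, queue, skip_rules=(too_small,)):
    """One pass over a watched folder. Returns how many files were queued."""
    queued = 0
    for path in sorted(Path(folder).iterdir()):
        # Detect: only regular files with an audio extension are candidates.
        if not path.is_file() or path.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        # Filter: any rule returning True skips the file.
        if any(rule(path) for rule in skip_rules):
            continue
        # Dedupe: identical content is skipped even under a different name.
        file_hash = content_hash(path)
        if file_hash in seen_hashes:
            continue
        seen_hashes.add(file_hash)
        # Queue: hand the file to the transcription worker.
        queue.append(path)
        queued += 1
    return queued
```

Calling `scan(folder, seen, deque())` on a timer gives a simple continuous mode. Persisting `seen_hashes` between runs (for example in a small database) would be needed so restarts do not re-process old files.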

Coming next: LLM-based or script-based auto-tagging, seeded by your initial tags — so a year of recordings finds the right project context on its own.

Two ways to run it

Pick the friction you want.

Self-host

Free · EvoBioSys License

  • Run it on your own GPU (RTX 3090-class or better)
  • Source on GitHub; GitLab self-hosted in V2
  • Full control, full responsibility
  • Community support via mail & Matrix (coming)

Hosted

Beta waitlist · Austrian servers (Anexia)

  • We host on EU/AT-sovereign infrastructure we colocate (Anexia, Klagenfurt & Wien)
  • Zero ops — upload audio, get transcripts back
  • Billing in EUR via Mollie (NL) + self-hosted Lago (FR)
  • No US payment rails — ever

Why Austria — and why not Switzerland (anymore)

In 2026, Austria is in materially better jurisdictional shape.

The conventional “privacy provider” choice has been Switzerland for two decades: non-EU, non-CLOUD-Act, constitutional privacy. We held that view too. As of 2026, the picture has shifted enough that we host in Austria only.

What changed in Switzerland

  • The Federal Council’s pending revision of the VÜPF / VIS-NDB ordinance — specifically Article 50a — would obligate covered providers to perform decryption on the provider side when served. That is a structural backdoor mandate, not a court-order regime.
  • Data-retention obligation (6 months of IP logs) kicks in at as few as 5,000 users — an extraordinarily low threshold.
  • Government-ID identification mandate for users of covered services.
  • Critically, this is being done by Federal Council ordinance, not by parliamentary law — which sidesteps the usual Swiss public-debate brake on surveillance expansion.
  • Proton, the Swiss flagship privacy provider, has publicly announced it is relocating infrastructure out of Switzerland into the EU because of this trajectory. When the canary is moving, you don’t move into the cage.

What Austria offers instead

  • § 134 / 135a StPO — provider compulsion to release content data requires an individual court order; bulk requests are not legally available.
  • Verfassungsgerichtshof (G47/2012, 27 June 2014) struck down general data-retention obligations as unconstitutional — Austria has no equivalent of the new Swiss IP-log-retention regime.
  • EU member — GDPR-native, single legal regime, frictionless EU billing.
  • Outside US CLOUD Act reach (the Act asserts jurisdiction over US providers, not over EU member states).

Proton is in the process of leaving Switzerland. We are building directly in Austria.

We host with Anexia in Klagenfurt and Wien — Austrian-owned, named facilities. EU customers get GDPR-native billing. Self-host users get the same source code regardless.

Why no Stripe? Why no US providers?

Because the CLOUD Act exists.

US-jurisdiction providers — Stripe, Google, AWS — can be compelled under the CLOUD Act to hand over data even when it’s stored in the EU. Audio of meetings is some of the most sensitive content a person creates. We don’t want that exposure on our chain, and you shouldn’t either.

Our stack is named, boring, and verifiable: Mollie (NL) for cards. Lago (FR, self-hosted, open-source) for invoicing. GoCardless for SEPA direct debit. Bunny Fonts (SI) instead of Google Fonts. Plausible (EU instance, no cookies) for analytics — if we add any. Open Collective Europe for sponsorship, with a public ledger.

V1 caveats we don’t hide: GitHub Pages and the GitHub repo are still US-hosted. Migration to a German Hetzner instance and self-hosted GitLab is the V2 commitment.

Hardware sovereignty research

We test on multiple architectures.

The transcription pipeline runs on commodity x86 today (AMD or Intel, Linux), but sovereignty-conscious users care about the chip beneath the OS. Our R&D scope includes compatibility testing across multiple CPU architectures so users can pick the platform that fits their threat model:

  • RISC-V — SiFive HiFive Premier P550, Milk-V Megrez, DeepComputing DC-ROMA. Open ISA, partially open firmware. Bootstrapping platform for our hardware-sovereignty work today.
  • Apple Silicon — Mac Studio / Mac Mini. Closed but commercially incentivised against backdoors. Strong inference performance for users who already have one.
  • Intel — mainstream x86 baseline. Familiar, broad compatibility.
  • AMD — high-throughput baseline (Threadripper / Epyc). What we use for our own bulk testing.
  • POWER9 (Talos II), roadmap. Raptor Computing Systems still ships these; they remain the best fully-open-firmware workstation platform. We plan to evaluate as a flagship sovereign-hosting platform once production load justifies it.

The point isn’t to ship five different binaries. The point is that the same open-source SoTranscribe pipeline runs on whatever hardware a user trusts. Sovereignty as choice, not as ideology.
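
One way to keep a single codebase across these platforms is to pick inference settings at startup from the detected architecture. The sketch below is a hypothetical mapping, not tested guidance: `device` and `compute_type` are real parameters of faster-whisper's `WhisperModel`, but the function name `pick_runtime`, the family table, and the choice of `float32` on RISC-V and POWER are assumptions. Whether CTranslate2 builds and runs on those two architectures is exactly what the compatibility testing has to establish.

```python
import platform

# Hypothetical mapping from platform.machine() values to architecture families.
FAMILIES = {
    "x86_64": "x86",
    "amd64": "x86",
    "arm64": "arm",      # Apple Silicon reports arm64 on macOS
    "aarch64": "arm",
    "riscv64": "riscv",
    "ppc64le": "power",  # POWER9 in little-endian mode
}


def pick_runtime(machine=None, has_cuda=False):
    """Return assumed WhisperModel settings for the host architecture."""
    machine = (machine or platform.machine()).lower()
    family = FAMILIES.get(machine, "unknown")
    if has_cuda:
        # GPU path: half precision is the usual choice on CUDA devices.
        return {"family": family, "device": "cuda", "compute_type": "float16"}
    # CPU path: int8 where it is commonly used, float32 as a conservative
    # fallback elsewhere (assumption, not a benchmarked recommendation).
    compute_type = "int8" if family in ("x86", "arm") else "float32"
    return {"family": family, "device": "cpu", "compute_type": compute_type}
```

The returned `device` and `compute_type` values could then be passed straight to `WhisperModel(...)`, leaving the rest of the pipeline unchanged.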

Transparent costs

Hosting isn’t free — here’s the math.

  • GPU compute — Whisper Large v3 wants RTX 3090-class memory (24 GB VRAM) or better. Colocation in Austria (Anexia, Klagenfurt & Wien) runs roughly €100–250/month per node, depending on utilization.
  • Storage — cheap per-TB, but accumulates. Audio is bulky.
  • Bandwidth — egress is the silent line item.

When the hosted tier launches, the actual numbers go in the public ledger via Open Collective Europe. Patronage funds a clearly-priced commons, not a black box.
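
To show how the node cost quoted above translates into a per-recording figure, here is a small worked calculation. Only the €100–250/month range comes from this page. The speed-up factor (audio hours transcribed per GPU hour), the utilization, and the 730-hour month are hypothetical inputs chosen for illustration. Storage and bandwidth are not included.

```python
def cost_per_audio_hour(node_eur_per_month, speedup, utilization, hours_per_month=730.0):
    """Compute-only cost in EUR attributable to one hour of transcribed audio.

    speedup      assumed audio hours processed per GPU hour (hypothetical)
    utilization  fraction of the month the GPU is actually busy, in (0, 1]
    """
    if speedup <= 0 or not 0 < utilization <= 1:
        raise ValueError("speedup must be > 0 and utilization in (0, 1]")
    audio_hours_per_month = hours_per_month * utilization * speedup
    return node_eur_per_month / audio_hours_per_month


# Upper end of the quoted range, with assumed speedup 10x and 50% utilization.
high = cost_per_audio_hour(250.0, speedup=10.0, utilization=0.5)
# Lower end of the quoted range under the same assumptions.
low = cost_per_audio_hour(100.0, speedup=10.0, utilization=0.5)
print(f"compute cost per audio hour: {low:.3f} to {high:.3f} EUR")
```

Under these assumed inputs the compute share comes out at a few cents per audio hour. The number is most sensitive to utilization, which is why an idle node is the expensive case.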

FAQ — advanced

Honest answers.

Why no Stripe?
CLOUD Act exposure. We use Mollie (NL) for cards and self-hosted Lago (FR) for invoicing. Your billing record never leaves EU jurisdiction.
What about my data?
Self-host: nothing leaves your machine. Hosted: audio sits on EU/AT-sovereign servers (Anexia, Klagenfurt & Wien), named providers, never touching US infrastructure. Crown-jewel data — meeting audio is exactly that — deserves the strongest available promise.
Can I leave?
Yes. The EvoBioSys License permits export and migration. The hosted tier exports your full transcripts + metadata in standard formats (Markdown + JSON). No lock-in moats. If you’re unhappy, the door opens outward.
Where is V1 actually hosted?
This page is on GitHub Pages today — we’re flagging that openly because it’s a US-jurisdiction host. The migration to a German Hetzner instance (and self-hosted GitLab for the repo) is V2. The marketing page is public anyway, but principle matters.
What’s the implementation stack?
Rust backend, React frontend for the SoTranscribe app itself. Whisper via faster-whisper (Python, CTranslate2). Diarization via pyannote.audio.
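
For readers who want to see the transcription call itself, here is a minimal sketch of faster-whisper with word timestamps. It is not the SoTranscribe source. The function names `transcribe_words` and `flatten_words` and the output dict layout are assumptions for this example, while `WhisperModel`, `transcribe(..., word_timestamps=True)`, and the `word.start` / `word.end` / `word.word` attributes are part of the faster-whisper API. The library is imported lazily, so the helper can be exercised without a GPU.

```python
def flatten_words(segments):
    """Turn faster-whisper segments into a flat list of word dicts."""
    words = []
    for segment in segments:
        # segment.words is None when word timestamps were not requested.
        for word in segment.words or []:
            words.append({"text": word.word.strip(), "start": word.start, "end": word.end})
    return words


def transcribe_words(audio_path, model_size="large-v3", device="cuda", compute_type="float16"):
    """Transcribe one file locally and return its words with timestamps."""
    # Lazy import: requires `pip install faster-whisper` and, for the
    # defaults used here, a CUDA-capable GPU.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # `segments` is a generator; the work happens as it is consumed.
    segments, _info = model.transcribe(audio_path, word_timestamps=True)
    return flatten_words(segments)
```

The flat word list is a convenient input for a later diarization merge, since each word carries its own start and end time.
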
What’s “idea2.life”?
A turquoise (Spiral Dynamics tier-2) incubator with a federated, sovereign-by-default worldview. SoTranscribe is its first baby.