Architecture

Two services. Stateless front, stateful back.

Marketing on Next.js 15 SSR, the engine on FastAPI, both on Cloud Run. Firestore for async jobs, Cloud SQL Postgres for the cache, Gemini for everything that thinks.

Topology

Marketing and the dashboard are split into two independently deployable Cloud Run services, both pinned to us-central1. DNS lives at Cloudflare in DNS-only mode (gray cloud); TLS terminates at ghs.googlehosted.com.

                ┌─────────────────────────────┐
                │  ipzilla.app                │  Next.js 15 SSR (web/)
                │  marketing + auth + plans   │  Cloud Run · region us-central1
                └──────────────┬──────────────┘
                               │
                               │  Firebase Auth (project ipzilla-76104)
                               │
                ┌──────────────▼──────────────┐
                │  app.ipzilla.app            │  FastAPI (app/)
                │  IP Radar dashboard + API   │  Cloud Run · region us-central1
                │                             │  /api/v1/ip-radar/*
                │                             │  /api/v1/patents/*
                └──────────────┬──────────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
  Gemini API            Firestore              Cloud SQL Postgres
  (Vertex AI)           research_jobs +         patent_claim_extractions
  scope, claim          credits ledger          fto_verdicts_cache
  reading, FTO,         (always-on)             (read-through cache)
  multi-agent

Request flow — happy path for /research/start

A research job is async by design. The router charges credits atomically, persists a Firestore document, and returns a job_id in under 200 ms. The streaming orchestrator runs in a background task; each subtask result lands in its own Firestore doc the moment it completes — so the dashboard can poll progress, and a Cloud Run instance swap mid-job loses zero work.

   user POST  ──►  patents.py router
                      │ Depends(require_credits("ip_radar_research"))  → 300 cr
                      ▼
                research_jobs.py  ─create_job(uid)→  Firestore doc
                      │
                      │ asyncio.create_task(_run_research_job(...))
                      ▼
                patent_research_agent.py   (the streaming orchestrator)
                      │
                      ├─► patent_query_planner.py   (Gemini → SearchPlan)
                      ├─► patent_service.py         (BigQuery → Google Patents)
                      ├─► patent_claim_fetcher.py   (USPTO PEDS / EPO OPS)
                      ├─► patent_claim_reader.py    (Gemini Pro per-patent)
                      ├─► patent_fto.py             (holistic verdict)
                      ├─► patent_fto_elementwise.py (per-element verdict)
                      ├─► patent_reranker.py        (LLM rerank top-K)
                      └─► patent_taxonomy.py        (entity recognition)
                      │
                      ▼ all writes go via:
                patent_cache.py  (Cloud SQL read-through cache)
                      │
                      ▼ final state:
                research_jobs Firestore doc
                      │
            poll: GET /research/{id}        ◄ progressive subtask map
            download: GET /export.zip        ◄ self-contained dashboard ZIP

Scale envelope

The FastAPI tier runs min-instances=2, max-instances=20, concurrency=80, timeout=600s. That handles ~1,600 concurrent inflight requests before autoscaling tops out — about an order of magnitude beyond what any current customer could schedule. Heavy traffic does not pin instance CPUs: the LLM round-trips dominate latency, so each instance spends most of its time in asyncio waits, not on compute.

The expensive surface — /research/start — is rate-limited to 2/hour per user via slowapi, on top of the credit ledger. Cheap GETs sit at 60/minute, Gemini-backed POSTs at 10/minute. The credit gate runs as a FastAPI dependency on every billable endpoint, with idempotency keys stored in Firestore so a retried request never double-charges.

Cost envelope

| Component | Quiet (10 jobs/day) | Active (500 jobs/day) | High (5K jobs/day) |
|---|---|---|---|
| Cloud Run (1 vCPU, 1 GiB) | ~$5/mo | ~$30/mo | ~$200/mo |
| Firestore reads + writes | <$1/mo | $5/mo | $40/mo |
| Gemini Flash + Pro calls | ~$2/mo | ~$80/mo | ~$700/mo |
| Cloud SQL Postgres (smallest) | $0 | ~$50/mo | ~$200/mo |
| **Total** | ~$10/mo | ~$170/mo | ~$1,200/mo |

Most of the marginal cost is Gemini calls per research job (roughly $0.10–$1.00 per job, depending on the patent count). Spend is monitored via Cloud Monitoring and capped by Cloud Billing budgets plus the max-instances ceiling.