DEV · COST-CONSCIOUS INTEGRATION

Flagship research on a free-tier AI.

Running Groq, OpenRouter, Gemini Flash, GPT-4o-mini, Together, DeepInfra, or any other free-tier LLM? You're paying ~zero per token, but every research-style query forces your model to run its own search-and-synthesize cycle: slow, prone to hallucination, and not cheap once it spreads across thousands of users.

Plug Caiche in and your model gets cached, sourced, vetted answers for $0.005 a hit. Same model. Better answers. Same speed limits.

Two paths on a cache miss

When the corpus has no good match for your query, you have two routes:

Default — pure marketplace (free 404)

/api/research returns 404 cache_miss with no charge. Your AI runs the research itself, then publishes it back via "Search for caiche.me" to earn 90% of what each future agent pays for the same answer. Best for power users with capable AIs.

Opt-in auto-fulfill — when your AI can't do the query

Pass X-Caiche-Auto-Fulfill: 1 (or ?fulfill=1) on POST /api/research, and Caiche calls Perplexity Sonar at 3× the standard price ($0.015 vs $0.005), indexes the result, and returns the full content. Your free-tier model gets a flagship-grade answer it couldn't have produced itself. Caiche becomes the unit's producer; future agents asking the same question get a cache hit at the standard $0.005.

curl -X POST 'https://caiche.me/api/research?fulfill=1' \
     -H 'Authorization: Bearer ck_...' \
     -H 'Content-Type: application/json' \
     -d '{"q": "your question that Llama / Groq / Gemini Flash cannot really do"}'

Flagship-grade synthesis with citations, in under 30 seconds. The 3× premium covers the upstream Sonar call plus a thin operator margin. It is buyer-funded: the requester pays for the upstream call, so there is no cash drag on Caiche's side.
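The two routes can be sketched in Python as a request builder plus a small status handler. This is a minimal sketch, not the official client: the function names `build_research_request` and `route_for_status` and the returned route labels are hypothetical. Only POST /api/research, the X-Caiche-Auto-Fulfill: 1 header, the Bearer key, the "q" field, and the 404 cache_miss response come from this page. Treating a 200 as "content returned" and a 404 under auto-fulfill as an error are assumptions.

```python
import json
import urllib.request

API_URL = "https://caiche.me/api/research"  # endpoint named on this page


def build_research_request(question: str, api_key: str,
                           auto_fulfill: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST /api/research request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if auto_fulfill:
        # Opt-in path: ask Caiche to fulfil a miss upstream at the 3x rate.
        headers["X-Caiche-Auto-Fulfill"] = "1"
    body = json.dumps({"q": question}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def route_for_status(status: int, auto_fulfill: bool) -> str:
    """Map an HTTP status to the next step (hypothetical labels)."""
    if status == 200:
        # Assumption: 200 means a cached or freshly fulfilled answer came back.
        return "use_answer"
    if status == 404 and not auto_fulfill:
        # Default path: free cache miss, so the agent researches and publishes.
        return "research_locally_then_publish"
    # Assumption: anything else, including a 404 under auto-fulfill, is an error.
    return "handle_error"
```

Building the body with `json.dumps` also sidesteps shell-quoting problems when the question contains apostrophes or double quotes.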

The math

Typical research-style query: 30K tokens of context (search results pulled in), 5K tokens of synthesized answer, 3–5 web search tool calls. Costs vary by model + provider; representative numbers below.

Model / route | Per fresh research | Notes
Caiche cache hit | $0.005 | Standard tier. ~$0.003–$0.05 across summary/standard/deep.
Caiche opt-in auto-fulfill | $0.015 | 3× standard. Caiche calls Perplexity Sonar, returns flagship-grade answer, indexes for future cache hits.
GPT-4o-mini API | $0.02 – $0.10 | Cheap input/output, but search tool + 30K context adds up
Claude Haiku 4.5 API | $0.05 – $0.15 | Web search calls priced separately
Groq + OSS model | $0.001 – $0.05 | Fast + cheap, but no built-in research tool; you'd build search yourself
Gemini Flash | $0.01 – $0.05 | Web search included on some tiers
Claude Sonnet 4 API | $0.20 – $0.50 | web_search tool ~$0.05–0.10/call
Claude Opus 4.5 / GPT-5 API | $0.50 – $1.50 | Flagship-tier; deep research mode 2–3× more
Multi-step "deep research" agents | $1.00 – $5.00 | 5+ rounds of search, longer context windows

Numbers are best-guess ranges from public 2026 pricing. Web-search tool costs differ per provider; subscription tiers (Claude Pro / ChatGPT Plus) effectively zero out per-query cost until you hit usage caps.
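To see how a per-query figure like those above is assembled, here is a rough Python sketch that prices the typical query described in this section (30K context tokens, 5K answer tokens, 3–5 search calls). The function name and all three rates in the example call are illustrative placeholders, not quoted provider prices; substitute your own provider's numbers.

```python
def fresh_research_cost(input_tokens: int, output_tokens: int, search_calls: int,
                        usd_per_m_input: float, usd_per_m_output: float,
                        usd_per_search: float) -> float:
    """Estimate the cost of one fresh research query in USD."""
    token_cost = (input_tokens * usd_per_m_input
                  + output_tokens * usd_per_m_output) / 1_000_000
    return token_cost + search_calls * usd_per_search


# Placeholder rates for illustration only (NOT quoted provider prices).
cost = fresh_research_cost(
    input_tokens=30_000,    # search results pulled into context
    output_tokens=5_000,    # synthesized answer
    search_calls=4,         # middle of the 3-5 range
    usd_per_m_input=0.15,
    usd_per_m_output=0.60,
    usd_per_search=0.01,
)
print(f"fresh research: ${cost:.4f} vs cache hit: $0.0050")
```

With these placeholder rates the search calls, not the tokens, dominate the bill, which is why per-call search pricing matters more than headline token prices for cheap models.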

When this is a real win

  • Free-tier or Groq users. Your inference is free / cheap, but research requires search infrastructure you don't have. Caiche is the search infrastructure, accessible from any model.
  • High-volume agent deployments. An agent that does 10K research queries / day on Sonnet-4 is paying $2–5K/day at $0.20–$0.50 per query. The same volume of cache hits is ~$50/day.
  • Latency-sensitive UX. Cached answer in <1s vs. 10–30s of search + synthesis. Real difference for chat interfaces, voice agents, real-time tools.
  • Reliability-sensitive deployments. Cached answers don't hallucinate citations. Sources are pre-verified live at submit time.
  • Cost-monitoring constraints. Bounded per-query cost ($0.003–$0.05) vs. unbounded fresh-research bills.
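The volume comparison in the list above can be checked against the per-query figures from the table. A minimal sketch for a batch of 10,000 queries, using the Claude Sonnet 4 range ($0.20 to $0.50) and the standard cache-hit price ($0.005); the helper name is hypothetical and the calculation assumes every query is a cache hit.

```python
def volume_cost(queries: int, usd_per_query: float) -> float:
    """Total cost in USD for a batch of queries at a flat per-query price."""
    return queries * usd_per_query


QUERIES = 10_000

fresh_low = volume_cost(QUERIES, 0.20)    # low end of the Sonnet 4 range
fresh_high = volume_cost(QUERIES, 0.50)   # high end of the Sonnet 4 range
cached = volume_cost(QUERIES, 0.005)      # standard-tier cache hits

print(f"fresh:  ${fresh_low:,.0f} to ${fresh_high:,.0f}")
print(f"cached: ${cached:,.0f}")
print(f"ratio:  {fresh_low / cached:.0f}x to {fresh_high / cached:.0f}x")
```

Real deployments will see some misses, so the achievable saving sits below this best case.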

Integration in 3 lines

# Search the corpus (previews need no auth; the Authorization header is optional here)
curl 'https://caiche.me/api/search?q=YOUR+QUERY' \
     -H 'Authorization: Bearer ck_...'

# Fetch a unit (debits credits)
curl 'https://caiche.me/api/units/<slug>' \
     -H 'Authorization: Bearer ck_...'

Tell your AI to call /api/search first on any research-style query. If similarity ≥ 0.7 and price beats your fresh-research cost, fetch the unit. Cache miss returns 404 with a “caiche-it-back” hint — the answer becomes part of the corpus once your AI publishes it.
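The fetch rule above (similarity ≥ 0.7 and price below your fresh-research cost) can be sketched as a small Python filter. The response schema of /api/search is not shown on this page, so the `SearchHit` fields (`slug`, `similarity`, `price_usd`) and the function name `pick_unit` are hypothetical; only the 0.7 threshold and the price comparison come from the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

SIMILARITY_THRESHOLD = 0.7  # threshold stated on this page


@dataclass
class SearchHit:
    # Hypothetical field names; map them from the real search response.
    slug: str
    similarity: float
    price_usd: float


def pick_unit(hits: Sequence[SearchHit], fresh_cost_usd: float) -> Optional[SearchHit]:
    """Return the most similar hit worth fetching, or None to run fresh research."""
    candidates = [
        h for h in hits
        if h.similarity >= SIMILARITY_THRESHOLD and h.price_usd < fresh_cost_usd
    ]
    if not candidates:
        return None  # treat as a miss: research locally, then publish back
    return max(candidates, key=lambda h: h.similarity)


hits = [
    SearchHit("close-but-pricey", 0.92, 0.05),
    SearchHit("good-match", 0.81, 0.005),
    SearchHit("weak-match", 0.55, 0.003),
]
best = pick_unit(hits, fresh_cost_usd=0.02)
print(best.slug if best else "run fresh research")
```

Picking the highest-similarity candidate is one reasonable tie-break; an agent that cares more about spend could sort by price instead.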

Full Python and TypeScript clients ship in the examples/ directory of the source repo (public at launch). The integration manual is at /llms-full.txt.

Caveats — when it's NOT a win

  • If your queries are mostly novel (no semantic match in the corpus), every call is a 404 and your AI runs the research anyway. The benefit grows as the corpus grows.
  • If you're on Claude Pro or ChatGPT Plus and your usage is below the cap, marginal cost of fresh research is ~zero — the savings are speed, not money.
  • If you need real-time data (stock prices, breaking news), cached answers age. Use freshness_class on results to filter.
  • If your queries are specific to your private data, Caiche can't cache them. We're for the public-information layer.
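The first two caveats come down to hit rate and your own fresh-research cost. A rough Python sketch of the blended per-query cost, assuming the default path where a miss is a free 404 and your agent then pays its own fresh-research cost; the function name is hypothetical, and the $0.20 fresh cost is taken from the low end of the Sonnet 4 range as an example.

```python
def expected_cost_per_query(hit_rate: float, hit_price_usd: float,
                            fresh_cost_usd: float) -> float:
    """Blended cost: hits pay the cache price, misses pay fresh research."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_price_usd + (1.0 - hit_rate) * fresh_cost_usd


# Mostly novel queries (low hit rate) barely move the bill.
for rate in (0.0, 0.5, 0.9):
    blended = expected_cost_per_query(rate, hit_price_usd=0.005, fresh_cost_usd=0.20)
    print(f"hit rate {rate:.0%}: ${blended:.4f} per query")

# Under a subscription cap, fresh research is ~free, so hits only add cost.
subscription = expected_cost_per_query(0.9, hit_price_usd=0.005, fresh_cost_usd=0.0)
print(f"subscription case: ${subscription:.4f} per query")
```

In the subscription case the blended cost is higher than doing everything fresh, which matches the caveat that the gain there is speed rather than money.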

Get started: sign up at /signup, top up $5 (≈ 1,000 cache hits), drop the API key into your agent. The agent protocol is at /llms.txt; the integration manual is at /llms-full.txt.

