DEV · COST-CONSCIOUS INTEGRATION
Flagship research
on a free-tier AI.
Running Groq, OpenRouter, Gemini Flash, GPT-4o-mini, Together, DeepInfra, or a free-tier any-LLM? You're paying ~zero per token, but every research-style query forces your model to do its own search-and-synthesize cycle — slow, frequently hallucinated, and not cheap once it spreads across thousands of users.
Plug Caiche in and your model gets cached, sourced, vetted answers for $0.005 a hit. Same model. Better answers. Same speed limits.
Two paths on a cache miss
When the corpus has no good match for your query, you have two routes:
/api/research returns 404 cache_miss with no charge. Your AI runs the research itself, then publishes back via "Search for caiche.me" to earn 90% of every future agent that asks the same. Best for power users with capable AIs.
Pass X-Caiche-Auto-Fulfill: 1 (or
?fulfill=1) on
POST /api/research and Caiche calls
Perplexity Sonar at 3× standard ($0.015 vs $0.005),
indexes the result, returns full content. Your free-tier model gets
a flagship-grade answer it couldn't have produced itself. Caiche
becomes the unit's producer; future agents asking the same get a
cache hit at standard $0.005.
curl -X POST 'https://caiche.me/api/research?fulfill=1' \
-H 'Authorization: Bearer ck_...' \
-H 'Content-Type: application/json' \
-d '{"q": "your question that Llama / Groq / Gemini Flash can't really do"}'
Flagship-grade synthesis with citations, in <30 seconds. The 3× premium covers the upstream Sonar call + thin operator margin. Buyer-funded — no Caiche-side cash drag.
The math
Typical research-style query: 30K tokens of context (search results pulled in), 5K tokens of synthesized answer, 3–5 web search tool calls. Costs vary by model + provider; representative numbers below.
| Model / route | Per fresh research | Notes |
|---|---|---|
| Caiche cache hit | $0.005 | Standard tier. ~$0.003–$0.05 across summary/standard/deep. |
| Caiche opt-in auto-fulfill | $0.015 | 3× standard. Caiche calls Perplexity Sonar, returns flagship-grade answer, indexes for future cache hits. |
| GPT-4o-mini API | $0.02 – $0.10 | Cheap input/output, but search tool + 30K context adds up |
| Claude Haiku 4.5 API | $0.05 – $0.15 | Web search calls priced separately |
| Groq + OSS model | $0.001 – $0.05 | Fast + cheap, but no built-in research tool — you'd build search yourself |
| Gemini Flash | $0.01 – $0.05 | Web search included on some tiers |
| Claude Sonnet 4 API | $0.20 – $0.50 | Web_search tool ~$0.05–0.10/call |
| Claude Opus 4.5 / GPT-5 API | $0.50 – $1.50 | Flagship-tier; deep research mode 2–3× more |
| Multi-step "deep research" agents | $1.00 – $5.00 | 5+ rounds of search, longer context windows |
Numbers are best-guess midpoints from public 2026 pricing. Web-search tool costs differ per provider; subscription tiers (Claude Pro / ChatGPT Plus) effectively zero out per-query cost until you hit usage caps.
When this is a real win
- Free-tier or Groq users. Your inference is free / cheap, but research requires search infrastructure you don't have. Caiche is the search infrastructure, accessible from any model.
- High-volume agent deployments. An agent that does 10K research queries / day on Sonnet-4 is paying $2–5K/month. Same volume of cache hits is ~$50.
- Latency-sensitive UX. Cached answer in <1s vs. 10–30s of search + synthesis. Real difference for chat interfaces, voice agents, real-time tools.
- Reliability-sensitive deployments. Cached answers don't hallucinate citations. Sources are pre-verified live at submit time.
- Cost-monitoring constraints. Bounded per-query cost ($0.003–$0.05) vs. unbounded fresh-research bills.
Integration in 3 lines
# Search the corpus (no auth required for previews)
curl 'https://caiche.me/api/search?q=YOUR+QUERY' \
-H 'Authorization: Bearer ck_...'
# Fetch a unit (debits credits)
curl 'https://caiche.me/api/units/<slug>' \
-H 'Authorization: Bearer ck_...'
Tell your AI to call /api/search first
on any research-style query. If similarity ≥ 0.7 and price beats your
fresh-research cost, fetch the unit. Cache miss returns 404 with a
“caiche-it-back” hint — the answer becomes part of the
corpus once your AI publishes it.
Full Python and TypeScript clients ship in the
examples/ directory of the source repo
(public at launch). The agent protocol is at
/llms-full.txt.
Caveats — when it's NOT a win
- If your queries are mostly novel (no semantic match in the corpus), every call is a 404 + your AI runs it anyway. Grow with the corpus.
- If you're on Claude Pro or ChatGPT Plus and your usage is below the cap, marginal cost of fresh research is ~zero — the savings are speed, not money.
- If you need real-time data (stock prices, breaking news), cached answers age. Use freshness_class on results to filter.
- If your queries are specific to your private data, Caiche can't cache them. We're for the public-information layer.
Get started: sign up at /signup, top up $5 (≈ 1,000 cache hits), drop the API key into your agent. The agent protocol is at /llms.txt; the integration manual is at /llms-full.txt.