API reference
The /api/v1/skim endpoint takes a URL and returns
an article in the format you ask for. Authenticate with a
personal access token you create on the
keys page.
Authentication
Every call to /api/v1/skim needs a key. Pass it
in any of three ways:
X-Api-Key: db_... (header)
Authorization: Bearer db_... (header)
?api_key=db_... (query string; handy for browser-address-bar GETs, but URLs land in server logs, so prefer headers for production)
Keys are SHA-256 hashed on the server, so only the 7-character
prefix is recoverable after creation. Revoke a key from the
admin UI at any time; the next call using it returns
401 Unauthorized.
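The three ways of passing a key can be sketched in Python with the standard library only. The helper names (auth_headers, with_query_key) and the placeholder key are illustrative and not part of the API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

API_KEY = "db_YOUR_KEY_HERE"  # placeholder, not a real key


def auth_headers(key: str, style: str = "bearer") -> dict[str, str]:
    """Return the single header that carries the key, in either header form."""
    if style == "bearer":
        return {"Authorization": f"Bearer {key}"}
    if style == "x-api-key":
        return {"X-Api-Key": key}
    raise ValueError(f"unknown style: {style!r}")


def with_query_key(url: str, key: str) -> str:
    """Append api_key=... to a URL, keeping any query params already present."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("api_key", key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The header forms keep the key out of URLs, which is why they are the safer default outside of quick manual tests.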
POST /api/v1/skim
Request body (JSON):
{
"url": "https://en.wikipedia.org/wiki/JavaScript",
"format": "json" | "md" | "markdown" | "html",
"scripting": true
}
curl (Markdown out):
curl -X POST https://broski.daisi.ai/api/v1/skim \
-H "Content-Type: application/json" \
-H "Authorization: Bearer db_YOUR_KEY_HERE" \
-d '{"url":"https://en.wikipedia.org/wiki/JavaScript","format":"md"}'
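The same POST can be sketched in Python with urllib from the standard library. The function names are illustrative, and decoding the response as UTF-8 is an assumption rather than something this reference states.

```python
import json
import urllib.request

SKIM_URL = "https://broski.daisi.ai/api/v1/skim"


def build_skim_post(url: str, key: str, fmt: str = "md",
                    scripting: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the documented JSON body."""
    body = json.dumps({"url": url, "format": fmt, "scripting": scripting})
    return urllib.request.Request(
        SKIM_URL,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {key}",
        },
    )


def skim(url: str, key: str, fmt: str = "md") -> str:
    """Send the request and return the body as text (UTF-8 assumed)."""
    request = build_skim_post(url, key, fmt)
    with urllib.request.urlopen(request, timeout=60) as resp:
        return resp.read().decode("utf-8")
```

Splitting the build step from the send step makes it possible to inspect or log the request without touching the network.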
GET /api/v1/skim
Same service, query-string arguments. Every field of the
POST body maps to a query param of the same name. Pass the
API key as api_key or via the usual headers.
GET https://broski.daisi.ai/api/v1/skim?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FJavaScript
&format=md&scripting=true&api_key=db_YOUR_KEY_HERE
curl:
curl "https://broski.daisi.ai/api/v1/skim?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FJavaScript&format=md" \
-H "Authorization: Bearer db_YOUR_KEY_HERE"
Response (200, text/markdown):
# JavaScript - Wikipedia
on _en.wikipedia.org_ • [source](https://en.wikipedia.org/wiki/JavaScript)
JavaScript (/ˈdʒɑːvəskrɪpt/), often abbreviated as JS, is a
programming language and core technology of the Web...
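Percent-encoding the target URL by hand is error-prone, so a short Python sketch may help. It assumes booleans are sent as lowercase true / false, as in the GET example above; skim_get_url is an illustrative name.

```python
from urllib.parse import urlencode

SKIM_URL = "https://broski.daisi.ai/api/v1/skim"


def skim_get_url(url: str, fmt: str = "md", scripting: bool = True) -> str:
    """Build the GET form of a skim call; the key is left to a header."""
    params = {
        "url": url,  # urlencode percent-encodes ':' and '/' for us
        "format": fmt,
        "scripting": "true" if scripting else "false",
    }
    return f"{SKIM_URL}?{urlencode(params)}"


print(skim_get_url("https://en.wikipedia.org/wiki/JavaScript"))
```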
Response format
Pick your output with the format field:
json (default): structured ArticleContent with title, byline, publishedAt, lang, siteName, description, heroImage, plainText, images[], links[], navLinks[].
md / markdown: CommonMark with inline links, tables, and hero-image reference.
html: reader-mode HTML, just the article subtree.
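A client usually wants a dict for json and a string for the other formats. The sketch below shows one way to branch on the requested format. It assumes the body is UTF-8 and that the json format returns a single JSON object with the field names listed above; the sample body in the usage lines is made up for illustration.

```python
import json
from typing import Union


def decode_skim_body(fmt: str, body: bytes) -> Union[dict, str]:
    """Turn raw response bytes into a dict (json) or text (md, markdown, html)."""
    text = body.decode("utf-8")  # assumption: responses are UTF-8
    if fmt == "json":
        return json.loads(text)
    if fmt in ("md", "markdown", "html"):
        return text
    raise ValueError(f"unsupported format: {fmt!r}")


# Illustrative body only, not captured from the service.
sample = b'{"title": "JavaScript", "plainText": "JavaScript is...", "links": []}'
article = decode_skim_body("json", sample)
print(article["title"])
```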
The scripting flag
When scripting is true (the default),
the engine executes the page's inline and external scripts before
extraction. This is required for SPAs (Next.js, Nuxt, Remix)
where the interesting content is client-rendered. Set it to
false for static-HTML pages: that is both faster and
more deterministic.
Error responses
400 Bad Request: Missing or malformed url. Body is { "error": "..." }.
401 Unauthorized: No credentials, or a revoked / expired key.
502 Bad Gateway: Upstream fetch failed (DNS, TLS, the site refused, or the content stream was malformed). Details in the application/problem+json body.
499 Client Closed Request: The HTTP client disconnected mid-skim; the skim was cancelled and nothing was billed.
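A client can turn these codes into a small decision table. In the sketch below the descriptions come from the list above, while the retry flags are an assumption of this example (retry only the upstream failure) and not documented behaviour.

```python
# status -> (meaning, worth retrying?); the retry column is this sketch's guess
DOCUMENTED_ERRORS = {
    400: ("missing or malformed url", False),
    401: ("no credentials, or a revoked / expired key", False),
    499: ("client disconnected mid-skim; skim cancelled", False),
    502: ("upstream fetch failed", True),
}


def classify(status: int) -> tuple[str, bool]:
    """Return (meaning, retry) for an HTTP status from /api/v1/skim."""
    if 200 <= status < 300:
        return ("ok", False)
    return DOCUMENTED_ERRORS.get(status, ("undocumented status", False))


for code in (200, 401, 502):
    print(code, classify(code))
```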
Gofer — multi-source search + crawl
Three endpoints under /api/v1/gofer wrap the
Gofer crawler and 14 built-in search providers (Wikipedia,
arXiv, GitHub, Hacker News, Stack Exchange, CrossRef,
OpenLibrary, Reddit, DuckDuckGo, Brave, Mojeek, GDELT,
Guardian, Bing News). Every endpoint accepts both
POST (JSON body) and GET (query
string), and every JSON field maps to a query param of the
same name. Array params on GET use repeated keys
(?seeds=a&seeds=b).
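The repeated-key convention matches what Python's urlencode produces with doseq=True. A minimal sketch follows. gofer_query is an illustrative name, and booleans are assumed to be sent as lowercase true / false. The reference does not say how nested objects such as headers map to a query string, so the sketch rejects them instead of guessing.

```python
from urllib.parse import urlencode


def gofer_query(params: dict) -> str:
    """Encode a flat JSON-style dict as a query string with repeated array keys."""
    flat = {}
    for name, value in params.items():
        if isinstance(value, dict):
            raise TypeError(f"{name}: nested objects have no documented GET form")
        if isinstance(value, bool):
            flat[name] = "true" if value else "false"
        else:
            flat[name] = value
    # doseq=True turns {"seeds": ["a", "b"]} into seeds=a&seeds=b
    return urlencode(flat, doseq=True)


print(gofer_query({"seeds": ["a", "b"], "maxDepth": 0, "followLinks": False}))
```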
POST /api/v1/gofer/search
Query → hit list (no crawl).
{
"query": "quantum error correction",
"sources": "Scholarly, Community", // SearchSource flags
"perProviderLimit": 10
}
GET equivalent:
GET /api/v1/gofer/search?query=quantum+error+correction
&sources=Scholarly,Community&perProviderLimit=10&api_key=db_…
POST /api/v1/gofer/crawl
Seed URLs → crawled articles (no search).
{
"seeds": ["https://en.wikipedia.org/wiki/JavaScript"],
"maxDepth": 0, // clamped to [0,5]
"maxPages": 50, // clamped to [1,500]
"degreeOfParallelism": 8, // clamped to [1,32]
"stayOnHost": false,
"followLinks": false,
"selectors": ["article.post-body"], // optional CSS honing
"headers": { "Accept-Language": "en-US" }
}
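Because out-of-range values are clamped and not rejected, a client may want to know the effective values before sending. The sketch below mirrors the documented ranges on the client side. It assumes that clamping pulls a value to the nearest bound; clamp_crawl_options is an illustrative name.

```python
# Documented ranges from the request body above: field -> (low, high)
CRAWL_LIMITS = {
    "maxDepth": (0, 5),
    "maxPages": (1, 500),
    "degreeOfParallelism": (1, 32),
}


def clamp_crawl_options(options: dict) -> dict:
    """Return a copy of options with the numeric fields pulled into range."""
    clamped = dict(options)
    for name, (low, high) in CRAWL_LIMITS.items():
        if name in clamped:
            clamped[name] = max(low, min(high, clamped[name]))
    return clamped


print(clamp_crawl_options({"maxDepth": 9, "maxPages": 0, "degreeOfParallelism": 8}))
```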
POST /api/v1/gofer/research
Full pipeline: search → dedup → crawl. The headline endpoint for LLM research: one call returns markdown plus search attribution for every page.
{
"query": "rust programming language",
"sources": "Scholarly, News",
"perProviderLimit": 10, // hits per provider
"maxCrawled": 20, // crawl the top N unique URLs
"degreeOfParallelism": 8,
"selectors": ["article"], // optional
"headers": { "Accept-Language": "en-US" }
}
Response: array of { search, crawl } pairs —
search is the originating SearchResult
(which provider found it + snippet + extras), crawl
is the GoferResult with title,
markdown, plainText, links,
and fetch timing.
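For LLM use, the pairs typically get flattened into one context string. The sketch below does that using only the crawl.title and crawl.markdown fields named above. It assumes the JSON keys are spelled exactly as in this section. The max_chars cutoff, the "(untitled)" fallback, and the sample data are choices of this example.

```python
def to_context(pairs: list[dict], max_chars: int = 2000) -> str:
    """Join crawled pages into one markdown string, truncating each page."""
    sections = []
    for pair in pairs:
        crawl = pair["crawl"]
        title = crawl.get("title") or "(untitled)"
        markdown = (crawl.get("markdown") or "")[:max_chars]
        sections.append(f"## {title}\n\n{markdown}")
    return "\n\n".join(sections)


# Illustrative data only, not a captured response.
sample = [
    {"search": {}, "crawl": {"title": "Rust", "markdown": "Rust is a language."}},
]
print(to_context(sample))
```

The search half of each pair is left untouched here; it can be attached per section once its field names are confirmed against a real response.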
Available sources
Individual: Wikipedia, Arxiv,
GitHub, HackerNews, StackExchange,
CrossRef, OpenLibrary, Reddit,
DuckDuckGo, Brave, Mojeek,
Gdelt, Guardian, BingNews.
Bundles: Scholarly (wiki + arxiv + crossref + oldb),
Community (gh + hn + se + reddit),
Web (ddg + brave + mojeek),
News (gdelt + guardian + bingnews),
All.
Mix with commas: Scholarly, News.
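A typo in a source name is easy to catch before the request goes out. The sketch below builds the comma-separated sources value from the names listed above. Treating names as case-sensitive is an assumption of this example, and sources_value is an illustrative name.

```python
INDIVIDUAL = {
    "Wikipedia", "Arxiv", "GitHub", "HackerNews", "StackExchange",
    "CrossRef", "OpenLibrary", "Reddit", "DuckDuckGo", "Brave",
    "Mojeek", "Gdelt", "Guardian", "BingNews",
}
BUNDLES = {"Scholarly", "Community", "Web", "News", "All"}
KNOWN = INDIVIDUAL | BUNDLES


def sources_value(*names: str) -> str:
    """Validate names against the documented list and join them with commas."""
    unknown = [name for name in names if name not in KNOWN]
    if unknown:
        raise ValueError(f"unknown sources: {unknown}")
    return ", ".join(names)


print(sources_value("Scholarly", "News"))
```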
GET /api/v1/health
Unauthenticated heartbeat. Returns 200 with
{ "status": "ok" }. Use for uptime probes.
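An uptime probe can be sketched as follows. The host is assumed to be the same one used in the skim examples, and the function names and timeout are illustrative.

```python
import json
import urllib.request

HEALTH_URL = "https://broski.daisi.ai/api/v1/health"  # host assumed from the skim examples


def is_healthy(status: int, body: bytes) -> bool:
    """True only for a 200 whose JSON body has status == "ok"."""
    if status != 200:
        return False
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):  # not JSON, or JSON that is not an object
        return False


def probe(url: str = HEALTH_URL, timeout: float = 5.0) -> bool:
    """Fetch the heartbeat; any network or HTTP error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.status, resp.read())
    except OSError:  # URLError and HTTPError are OSError subclasses
        return False
```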
Limits
Request body capped at 50 KiB; response streaming capped at
50 MiB per skim. PDFs that require features outside our
shipped scope (non-empty passwords, CID fonts without
/ToUnicode, image-only scans) return a short
article explaining the fallback — they don't 500.
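The request-body cap can be checked on the client before sending. The sketch below takes 50 KiB as 50 × 1024 bytes and measures the UTF-8 encoded JSON. How the server reacts to an oversized body is not stated in this reference, so the sketch simply refuses to build one.

```python
import json

MAX_BODY_BYTES = 50 * 1024  # 50 KiB


def encode_body(payload: dict) -> bytes:
    """Serialize a request body, refusing anything over the documented cap."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_BODY_BYTES:
        raise ValueError(f"body is {len(body)} bytes; cap is {MAX_BODY_BYTES}")
    return body


print(len(encode_body({"url": "https://en.wikipedia.org/wiki/JavaScript"})))
```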