API reference

The /api/v1/skim endpoint takes a URL and returns an article in the format you ask for. Authenticate with a personal access token you create on the keys page.

Authentication

Every call to /api/v1/skim needs a key. Pass it in any of three ways:

X-Api-Key: db_... (header)
Authorization: Bearer db_... (header)
?api_key=db_... (query string — handy for browser-address-bar GETs; keep in mind URLs land in server logs, so prefer headers for production)
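
As a rough illustration of the two header forms listed above, the helper below returns the header dict for either one. The function name and the scheme labels are hypothetical, chosen only for this sketch; the header names themselves come from the list above.

```python
def auth_headers(api_key: str, scheme: str = "bearer") -> dict[str, str]:
    """Return the single header that carries the key (hypothetical helper)."""
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-Api-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme!r}")


# Either form should authenticate the same call.
print(auth_headers("db_YOUR_KEY_HERE"))
print(auth_headers("db_YOUR_KEY_HERE", "x-api-key"))
```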

Keys are SHA-256 hashed on the server — only the 7-char prefix is recoverable after creation. Revoke a key from the admin UI at any time; the next call using it returns 401 Unauthorized.

POST /api/v1/skim

Request body (JSON):

{
  "url":       "https://en.wikipedia.org/wiki/JavaScript",
  "format":    "md",        // one of "json" (default), "md", "markdown", "html"
  "scripting": true         // optional; defaults to true
}

curl (Markdown out):

curl -X POST https://broski.daisi.ai/api/v1/skim \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer db_YOUR_KEY_HERE" \
  -d '{"url":"https://en.wikipedia.org/wiki/JavaScript","format":"md"}'
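
The same POST can be sketched in Python with only the standard library. The helper name build_skim_request is an assumption made for this example; the URL, headers, and body fields mirror the curl call above. The request is only constructed here, and the send step is left commented out so the sketch runs without network access.

```python
import json
import urllib.request

SKIM_URL = "https://broski.daisi.ai/api/v1/skim"


def build_skim_request(page_url, api_key, fmt="md", scripting=True):
    """Build (but do not send) a POST to the skim endpoint. Hypothetical helper."""
    body = json.dumps(
        {"url": page_url, "format": fmt, "scripting": scripting}
    ).encode("utf-8")
    return urllib.request.Request(
        SKIM_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_skim_request(
    "https://en.wikipedia.org/wiki/JavaScript", "db_YOUR_KEY_HERE"
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     markdown = resp.read().decode("utf-8")
```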

GET /api/v1/skim

Same service, query-string arguments. Every field of the POST body maps to a query param of the same name. Pass the API key as api_key or via the usual headers.

GET https://broski.daisi.ai/api/v1/skim?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FJavaScript
    &format=md&scripting=true&api_key=db_YOUR_KEY_HERE

curl:

curl "https://broski.daisi.ai/api/v1/skim?url=https%3A%2F%2Fen.wikipedia.org%2Fwiki%2FJavaScript&format=md" \
  -H "Authorization: Bearer db_YOUR_KEY_HERE"

Response (200, text/markdown):

# JavaScript - Wikipedia

on _en.wikipedia.org_ • [source](https://en.wikipedia.org/wiki/JavaScript)

JavaScript (/ˈdʒɑːvəskrɪpt/), often abbreviated as JS, is a
programming language and core technology of the Web...
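
The GET form shown earlier needs the target URL percent-encoded. A minimal sketch of assembling it with urllib.parse follows; the helper name is hypothetical, and the lowercase true/false spelling for scripting is inferred from the GET sample rather than stated as a rule.

```python
from urllib.parse import urlencode

SKIM_URL = "https://broski.daisi.ai/api/v1/skim"


def build_skim_get_url(page_url, fmt="md", scripting=None):
    """Return the GET URL for a skim; the key is expected in a header."""
    params = {"url": page_url, "format": fmt}
    if scripting is not None:
        # Lowercase booleans, matching the sample query string.
        params["scripting"] = "true" if scripting else "false"
    return f"{SKIM_URL}?{urlencode(params)}"


print(build_skim_get_url("https://en.wikipedia.org/wiki/JavaScript"))
```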

Response format

Pick your output with the format field:

json (default) — structured ArticleContent with title, byline, publishedAt, lang, siteName, description, heroImage, plainText, images[], links[], navLinks[].
md / markdown — CommonMark with inline links, tables, and hero-image reference.
html — reader-mode HTML, just the article subtree.
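
For the json format, a client might map the body onto a small record type. The sketch below uses the field names listed above; the Python types are guesses (the shapes of images, links, navLinks, and heroImage are not specified here), and unknown keys are dropped so the sketch tolerates fields it does not know about.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class ArticleContent:
    # Field names follow the list above; all types are assumptions.
    title: Optional[str] = None
    byline: Optional[str] = None
    publishedAt: Optional[str] = None
    lang: Optional[str] = None
    siteName: Optional[str] = None
    description: Optional[str] = None
    heroImage: Any = None
    plainText: Optional[str] = None
    images: list = field(default_factory=list)
    links: list = field(default_factory=list)
    navLinks: list = field(default_factory=list)

    @classmethod
    def from_json(cls, data: dict) -> "ArticleContent":
        """Build a record from a decoded JSON object, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in data.items() if k in known})


article = ArticleContent.from_json({"title": "JavaScript", "lang": "en"})
print(article.title, article.lang)
```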

The scripting flag

When scripting is true (the default), the engine executes the page's inline + external scripts before extraction. This is required for SPAs (Next.js, Nuxt, Remix) where the interesting content is client-rendered. Set it to false for static-HTML pages — it's both faster and more deterministic.
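
One possible client-side strategy, not something the API does for you, is to try the faster static path first and rerun with scripting only when the result looks empty. The skim callable, the 200-character threshold, and the function name below are all hypothetical; note that the fallback costs a second call.

```python
from typing import Callable


def skim_with_fallback(
    skim: Callable[[str, bool], str],
    page_url: str,
    min_chars: int = 200,
) -> tuple[str, bool]:
    """Return (text, scripting_used).

    `skim(url, scripting)` is any function that performs the request and
    returns the extracted text.
    """
    text = skim(page_url, False)
    if len(text.strip()) >= min_chars:
        return text, False
    # Too little content: likely client-rendered, so retry with scripts on.
    return skim(page_url, True), True


# Stand-in for a real client: a page with no static content.
def fake_skim(url: str, scripting: bool) -> str:
    return "x" * 500 if scripting else ""


text, used_scripting = skim_with_fallback(fake_skim, "https://example.com/app")
print(len(text), used_scripting)
```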

Error responses

400 Bad Request: Missing or malformed url. Body is { "error": "..." }.
401 Unauthorized: No credentials, or a revoked / expired key.
499 Client Closed Request: The HTTP client disconnected mid-skim; the skim was cancelled and nothing was billed.
502 Bad Gateway: Upstream fetch failed: DNS, TLS, the site refused, or the content stream was malformed. Details are in the application/problem+json body.
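
A client might branch on these statuses roughly as sketched below. The action labels are invented for this example, and treating 502 as worth retrying later is an assumption about transient upstream failures, not documented behavior.

```python
def classify_skim_status(status: int) -> str:
    """Map an HTTP status from the skim endpoint to a suggested client action."""
    if 200 <= status < 300:
        return "ok"
    if status == 400:
        return "fix_request"   # missing or malformed url
    if status == 401:
        return "check_key"     # no credentials, or revoked / expired key
    if status == 499:
        return "cancelled"     # the client disconnected mid-skim
    if status == 502:
        return "retry_later"   # upstream fetch failed (assumed transient)
    return "unexpected"


for code in (200, 400, 401, 499, 502):
    print(code, classify_skim_status(code))
```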

Gofer — multi-source search + crawl

Three endpoints under /api/v1/gofer wrap the Gofer crawler + 14 built-in search providers (Wikipedia, arXiv, GitHub, Hacker News, Stack Exchange, CrossRef, OpenLibrary, Reddit, DuckDuckGo, Brave, Mojeek, GDELT, Guardian, Bing News). Every endpoint accepts both POST (JSON body) and GET (query string) — every JSON field maps to a query param of the same name. Array params on GET use repeated keys (?seeds=a&seeds=b).
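
The repeated-key convention for arrays is what urllib.parse.urlencode produces with doseq=True. A minimal sketch follows, with a hypothetical helper name; lowercase booleans are an assumption based on the skim GET sample, and how a nested object such as headers maps to a query string is not covered here.

```python
from urllib.parse import urlencode


def gofer_query(params: dict) -> str:
    """Encode a flat JSON-style body as a query string with repeated keys."""
    flat = {}
    for key, value in params.items():
        if isinstance(value, bool):
            flat[key] = "true" if value else "false"
        else:
            flat[key] = value
    # doseq=True expands list values into key=a&key=b.
    return urlencode(flat, doseq=True)


print(gofer_query({"seeds": ["https://a.example", "https://b.example"], "maxDepth": 1}))
```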

POST /api/v1/gofer/search

Query → hit list (no crawl).

{
  "query":              "quantum error correction",
  "sources":            "Scholarly, Community",     // SearchSource flags
  "perProviderLimit":   10
}

GET equivalent:

GET /api/v1/gofer/search?query=quantum+error+correction
    &sources=Scholarly,Community&perProviderLimit=10&api_key=db_…

POST /api/v1/gofer/crawl

Seed URLs → crawled articles (no search).

{
  "seeds":              ["https://en.wikipedia.org/wiki/JavaScript"],
  "maxDepth":           0,                          // clamped to [0,5]
  "maxPages":           50,                         // clamped to [1,500]
  "degreeOfParallelism": 8,                         // clamped to [1,32]
  "stayOnHost":         false,
  "followLinks":        false,
  "selectors":          ["article.post-body"],      // optional CSS honing
  "headers":            { "Accept-Language": "en-US" }
}
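
The clamping noted in the comments above happens on the server; a client can mirror it to see the values that will actually apply. The sketch below assumes the documented ranges and uses a hypothetical helper name.

```python
# Ranges copied from the comments in the request body above.
CRAWL_LIMITS = {
    "maxDepth": (0, 5),
    "maxPages": (1, 500),
    "degreeOfParallelism": (1, 32),
}


def clamp_crawl_options(options: dict) -> dict:
    """Return a copy of `options` with numeric fields forced into range."""
    clamped = dict(options)
    for name, (low, high) in CRAWL_LIMITS.items():
        if name in clamped:
            clamped[name] = max(low, min(high, clamped[name]))
    return clamped


print(clamp_crawl_options({"maxDepth": 9, "maxPages": 0, "degreeOfParallelism": 8}))
```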

POST /api/v1/gofer/research

Full pipeline: search → dedup → crawl. This is the headline endpoint for LLM research: a single call returns markdown plus search attribution for every crawled page.

{
  "query":              "rust programming language",
  "sources":            "Scholarly, News",
  "perProviderLimit":   10,                         // hits per provider
  "maxCrawled":         20,                         // crawl the top N unique URLs
  "degreeOfParallelism": 8,
  "selectors":          ["article"],                // optional
  "headers":            { "Accept-Language": "en-US" }
}

Response: array of { search, crawl } pairs — search is the originating SearchResult (which provider found it + snippet + extras), crawl is the GoferResult with title, markdown, plainText, links, and fetch timing.
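
As a sketch of consuming that response for an LLM prompt, the function below joins the crawled markdown into one context string. It relies only on the crawl.title and crawl.markdown fields named above; the function name, the separator, and the choice to skip pages with empty markdown are choices made for this example.

```python
def research_to_context(pairs: list) -> str:
    """Concatenate crawled pages into a single markdown context block."""
    sections = []
    for pair in pairs:
        crawl = pair.get("crawl") or {}
        markdown = crawl.get("markdown")
        if not markdown:
            continue  # nothing was extracted for this page
        title = crawl.get("title") or "(untitled)"
        sections.append(f"## {title}\n\n{markdown.strip()}")
    return "\n\n---\n\n".join(sections)


sample = [
    {"search": {}, "crawl": {"title": "Rust", "markdown": "Rust is a language.\n"}},
    {"search": {}, "crawl": {"title": "Empty", "markdown": ""}},
]
print(research_to_context(sample))
```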

Available sources

Individual: Wikipedia, Arxiv, GitHub, HackerNews, StackExchange, CrossRef, OpenLibrary, Reddit, DuckDuckGo, Brave, Mojeek, Gdelt, Guardian, BingNews.

Bundles: Scholarly (Wikipedia + Arxiv + CrossRef + OpenLibrary), Community (GitHub + HackerNews + StackExchange + Reddit), Web (DuckDuckGo + Brave + Mojeek), News (Gdelt + Guardian + BingNews), All. Mix with commas: Scholarly, News.
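
Since sources is described as a set of SearchSource flags, the bundles can be modelled client-side as a flag enum. The sketch below is only a model: the bit positions are arbitrary placeholders, not the service's real values, and parse_sources is a hypothetical helper for the comma-separated form.

```python
from enum import Flag
from functools import reduce


class SearchSource(Flag):
    # Bit positions are placeholders; only the names come from the lists above.
    Wikipedia = 1 << 0
    Arxiv = 1 << 1
    GitHub = 1 << 2
    HackerNews = 1 << 3
    StackExchange = 1 << 4
    CrossRef = 1 << 5
    OpenLibrary = 1 << 6
    Reddit = 1 << 7
    DuckDuckGo = 1 << 8
    Brave = 1 << 9
    Mojeek = 1 << 10
    Gdelt = 1 << 11
    Guardian = 1 << 12
    BingNews = 1 << 13
    # Bundles are unions of the individual flags.
    Scholarly = Wikipedia | Arxiv | CrossRef | OpenLibrary
    Community = GitHub | HackerNews | StackExchange | Reddit
    Web = DuckDuckGo | Brave | Mojeek
    News = Gdelt | Guardian | BingNews
    All = Scholarly | Community | Web | News


def parse_sources(spec: str) -> SearchSource:
    """Parse a comma-separated list such as 'Scholarly, News'."""
    names = [part.strip() for part in spec.split(",") if part.strip()]
    return reduce(lambda acc, name: acc | SearchSource[name], names, SearchSource(0))


selected = parse_sources("Scholarly, News")
print(SearchSource.Arxiv in selected, SearchSource.GitHub in selected)
```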

GET /api/v1/health

Unauthenticated heartbeat. Returns 200 with { "status": "ok" }. Use for uptime probes.
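
An uptime probe can check both the status code and the body. The sketch below takes the already-fetched status and body so it runs without network access; the function name is hypothetical.

```python
import json


def is_healthy(status_code: int, body: str) -> bool:
    """True only for a 200 response whose JSON body has status == 'ok'."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


print(is_healthy(200, '{ "status": "ok" }'))
```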

Limits

Request bodies are capped at 50 KiB; streamed responses are capped at 50 MiB per skim. PDFs that require features outside our shipped scope (non-empty passwords, CID fonts without /ToUnicode, image-only scans) return a short article explaining the fallback instead of a 500 error.
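
A client can check the request-size cap before sending. The sketch below assumes the limit applies to the encoded JSON body and that 50 KiB means 50 * 1024 bytes; the helper name is hypothetical.

```python
import json

MAX_REQUEST_BYTES = 50 * 1024  # 50 KiB, per the limit above


def encode_body(payload: dict) -> bytes:
    """Serialize a request body, refusing payloads over the documented cap."""
    body = json.dumps(payload).encode("utf-8")
    if len(body) > MAX_REQUEST_BYTES:
        raise ValueError(
            f"request body is {len(body)} bytes; limit is {MAX_REQUEST_BYTES}"
        )
    return body


print(len(encode_body({"url": "https://example.com"})))
```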