Roguelite Labs Wiki

A working knowledge base for AI tools, models, and agentic workflows.

Companion to roguelitelabs.xyz — browse /stack and /log on the main site.

Models

Capability profiles, benchmarks, pricing, and honest assessments.

claude-fable-5 — new Claude 5 top-tier, highest-complexity reasoning and long-horizon analysis, replaces Opus at the ceiling
claude-sonnet-5 — current agentic default in Claude Code and API, replaces Sonnet 4.6
claude-opus-4-8 — 82.3% OSWorld-Verified, adaptive thinking, dynamic parallel subagents
claude-sonnet-4-6 — 79.6% SWE-bench, 72.7% OSWorld, superseded by Sonnet 5
claude-opus-4-6 — 128K output tokens, orchestrator role, legacy
gpt-5-5 — 82.7% Terminal-Bench 2.0, agentic-first design, dominant ecosystem
gpt-5-4 — three-tier system (instant/thinking/pro)
gemini-3-5-flash — 76.2% Terminal-Bench 2.1, 4× faster inference, $1.50/$9 pricing
gemini-2-5-pro — best long-context, leads coding benchmarks, free via AI Studio
gemma-4 — 31B Apache 2.0, 89.2% AIME 2026, #3 open model
deepseek-r1 — 671B MIT license, 97.3% MATH-500, best open math reasoning
grok-4 — best current-events accuracy, deep X data integration
kimi-k2 — 1T MoE from Moonshot AI, MIT license, leading open-weights Intelligence Index score
claude-3-5-sonnet — 49.0% SWE-bench Verified (Oct update), first computer use in public beta, $3/$15 pricing
gpt-4o — natively omnimodal (text/audio/image/video), ~320ms voice latency, 128K context, $2.50/$10 pricing
o1 — chain-of-thought reasoning model, 97% MATH-500, 77.3% GPQA Diamond, first model to exceed PhD-expert accuracy on that benchmark
llama-4 — Scout (17B/109B MoE, 10M context) and Maverick (17B/400B MoE), natively multimodal, April 2025
claude-haiku-4-5 — $1/$5 pricing, 73.3% SWE-bench, 2× faster than Sonnet 4.5, computer use + extended thinking
gemini-2-5-ultra — Gemini 2.5 Pro + Deep Think reasoning mode; Ultra = subscription tier, not a separate model
llama-3-1 — 8B/70B/405B open-weights, 128K context, commercial license, first open frontier-class model
claude-3-7-sonnet — first hybrid reasoning model, introduced extended thinking, superseded by 4.x
nvidia-nemotron-3-ultra — 550B MoE, 48 AA Intelligence Index, #1 US open-weights, 1M context, 300+ tok/s
microsoft-mai-thinking-1 — ~1T MoE, 97.0% AIME 2025, commercially licensed training, Maia 200 inference
minimax-m3 — MSA architecture, 1M context, 59.0% SWE-Bench Pro, 9×/15× prefill/decode speedup vs M2

Tools

Agents, editors, and runtimes that make up a working AI stack.

claude-code — agentic CLI, hook system, context compaction
codex-cli — OpenAI terminal agent, sandboxed execution, open-source
cursor — AI-native VS Code fork, $100M ARR, large codebase navigation
amp-code — Sourcegraph agent, think-out-loud mode, multi-model
ollama — local inference CLI, pipeline-friendly, OpenAI-compatible API
lm-studio — GUI local model runner, good Apple Silicon support
exe-dev — remote VMs for overnight Claude Code sessions
exo — distributed inference across multiple devices; TB5-chained Mac clusters for 400B+ models
perplexity-comet — AI-native browser, agent executes web tasks autonomously, first consumer computer-use product
windsurf — VS Code fork by Codeium, Cascade flow-based agent, acquired by Cognition July 2025
cline — open-source VS Code agent extension, model-agnostic, Plan and Act mode, no subscription
github-copilot — Microsoft/GitHub assistant, Workspace browser agent, Project Polaris model, 20M+ users

Concepts

The underlying ideas, protocols, and benchmarks worth understanding.

extended-thinking — chain-of-thought interleaved with tool use, first in Claude 3.7
reasoning-models — thinking tokens, o1/o3/R1, trade-offs vs fast models, benchmarks
tool-use — structured function calling, client vs server tools, agentic loops
prompt-caching — cache static prefixes, 90% cost reduction on cache reads
context-window — token limits, 1M context, lost-in-the-middle problem
computer-use — model sees screen, moves cursor, clicks; Sonnet 4.6 scores 72.7% on OSWorld
mcp — Model Context Protocol, standard for tool/context integration
vibe-coding — Karpathy Feb 2025, iterate on behavior not implementation
evals — SWE-bench, OSWorld, AIME, LiveCodeBench explained
rag — retrieval-augmented generation, grounding in external documents
agentic-workflows — multi-step autonomous agent loops, patterns and failure modes
distributed-inference — pipeline/tensor parallelism, bandwidth math, exo vs vLLM, Apple Silicon advantage
gpu-clouds — raw GPU rental (Lambda, CoreWeave, Vast), managed inference (Together, Groq, Fireworks), hyperscaler AI services
language-of-thought — Fodor's Mentalese hypothesis; systematicity, compositionality, and what reasoning requires
lot-llm-paradox — three positions on whether LLMs reason; the system-level synthesis
linguistic-relativity — Sapir-Whorf, Boroditsky, Pica on number words; language as cognitive infrastructure
future-time-reference — Chen (2013) on FTR grammar and savings behavior; the AI collaboration implication
extended-cognition — EC vs Extended Mind; why the distinction matters; literacy as coupling mechanism
distributed-cognition — Hutchins; cognitive processes as system-level properties; ship navigation and cockpit studies
tools-for-thought — Bush → Engelbart → Kay → Victor → Nielsen/Matuschak; augmentation vs automation
hci-ai — Norman's gulfs, Suchman's situated action, Endsley's situation awareness; HCI in the AI paradigm
assistive-technology — the AT evolution arc, curb cut effect, AAC; scaffold vs substitute
ephemeral-software — on-demand generated tools; specification as the new development bottleneck
personalized-systems — Licklider's symbiosis vision, PKM, adaptive cognitive scaffolding
universal-design-cognition — six derived principles for cognitive tool design; AI as cognitive policy instrument

Workflows

How the stack fits together in practice.

multi-agent-setup — orchestrator + subagents, parallel execution, cost patterns
overnight-runs — unattended sessions on exe.dev, hook notifications, task design

Music

Artists, sounds, and the aesthetic logic behind them.

don-toliver — psychedelic trap · dark luxury · autotune as instrument · Cactus Jack
julia-wolf — alt-pop · cinematic · emotionally vulnerable · feminine angst
dom-dolla — tech house · groove-first · club records · Melbourne

Blockchain Infrastructure

Ethereum scaling, rollup architecture, and the protocols beneath the L2 ecosystem.

op-stack — Optimism's modular L2 framework; powers Base, Zora, Unichain, and the Superchain
optimism — OP Mainnet, OP token, bicameral governance, RetroPGF
rollups — optimistic vs ZK, fraud proofs vs validity proofs, L2BEAT staging
ethereum-l2s — comparison of Arbitrum, Base, OP Mainnet, zkSync, StarkNet by TVL and approach
superchain — shared sequencer vision, cross-chain interop, Superchain fee flywheel
data-availability — where rollup data lives: calldata, EIP-4844 blobs, Celestia, EigenDA
eigenlayer — restaking, AVSs, $18B+ economic security sharing, verifiable cloud direction
base — Coinbase's L2, onchain economy thesis, 89% of Superchain revenue in 2025
stablecoins — USDC, USDT, DAI, algorithmic failures (Terra), yield-bearing (USDe, sDAI, USDY)
real-world-assets — tokenized treasuries, private credit, institutional DeFi; BlackRock BUIDL, Ondo, Centrifuge
dao-governance — token voting, delegation, timelocks, governance attacks, veToken model
mev — frontrunning, sandwich attacks, Flashbots, MEV-Boost, PBS; hundreds of millions of dollars extracted annually
flash-loans — atomic uncollateralized loans; arbitrage, collateral swaps, oracle manipulation attacks (Beanstalk $182M)
amm — constant product formula, Uniswap V3 concentrated liquidity, Curve StableSwap, impermanent loss, TWAP oracles
