Write-ahead log via NATS JetStream. Batched persistence to your database. About 7× lower p50 write latency than checkpointing directly to a database, and crash-recoverable.
Per-event write latency · p50
Measured locally against NATS + MySQL.
Reproduce with cargo run --bin benchmark.
The problem
A multi-step agent that crashes mid-run loses all progress and replays every LLM call from the beginning. Existing checkpointers write synchronously to a database — making every event as slow as a database round-trip.
Synchronous DB writes add ~1ms per event. Across hundreds of steps, this compounds.
Without a durable log, a process crash anywhere in the pipeline means restarting from zero.
At-least-once delivery without idempotency creates duplicate rows and corrupted state.
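The duplicate-row problem can be illustrated with a small, self-contained sketch. It assumes each event carries a stable key and that the table has a uniqueness constraint on that key, so a redelivered batch becomes a no-op. The table name, column names, and the `apply_events` helper are hypothetical and are not Skialith's actual schema. SQLite's `INSERT OR IGNORE` is used only so the example runs anywhere; on MySQL the analogous statements would be `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3


def apply_events(conn, events):
    """Insert events, skipping any (agent_id, event_key) that already exists."""
    with conn:  # commits on success, rolls back if the insert raises
        conn.executemany(
            "INSERT OR IGNORE INTO events (agent_id, event_key, payload) "
            "VALUES (?, ?, ?)",
            events,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " agent_id TEXT NOT NULL,"
    " event_key TEXT NOT NULL,"
    " payload TEXT NOT NULL,"
    " PRIMARY KEY (agent_id, event_key))"  # the key that makes writes idempotent
)

batch = [
    ("my-agent", "step-1", '{"kind": "thought"}'),
    ("my-agent", "step-2", '{"kind": "action"}'),
]

apply_events(conn, batch)
apply_events(conn, batch)  # simulated redelivery of the same batch

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)
```

Without the primary key, the second call would double the row count; with it, the redelivery leaves the table unchanged.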
How it works
Every save_event call is acknowledged by NATS JetStream before returning to your agent.
A background writer batches those events into your database — keeping the hot path fast and the data durable.
Your agent publishes an event. Skialith serialises it and sends it to NATS JetStream.
JetStream confirms the write in ~133us (p50). Your agent unblocks without waiting on the database.
A background task collects events and flushes in efficient batches with automatic retry.
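The three steps above can be sketched in plain `asyncio`. This is a simplified illustration under stated assumptions, not Skialith's implementation: an in-memory `asyncio.Queue` stands in for NATS JetStream (so nothing here is actually durable), and the `BatchWriter` class, its parameters, and the `flush` callback are all hypothetical names. It shows only the shape of the idea: the hot path returns as soon as the event is handed off, while a background task groups events by size or time and retries failed flushes.

```python
import asyncio


class BatchWriter:
    def __init__(self, flush, batch_size=100, max_wait=0.05, max_retries=3):
        self._queue = asyncio.Queue()   # stand-in for the JetStream stream
        self._flush = flush             # async callable taking a list of events
        self._batch_size = batch_size
        self._max_wait = max_wait       # seconds to wait for a batch to fill
        self._max_retries = max_retries

    async def save_event(self, key, payload):
        # Hot path: hand the event off and return. In the real system this
        # is where the JetStream PubAck would be awaited instead.
        await self._queue.put((key, payload))

    async def drain(self):
        # Wait until every queued event has been flushed.
        await self._queue.join()

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            batch = [await self._queue.get()]  # block until one event exists
            deadline = loop.time() + self._max_wait
            while len(batch) < self._batch_size:
                remaining = deadline - loop.time()
                if remaining <= 0:
                    break
                try:
                    batch.append(
                        await asyncio.wait_for(self._queue.get(), remaining)
                    )
                except asyncio.TimeoutError:
                    break
            await self._flush_with_retry(batch)
            for _ in batch:
                self._queue.task_done()

    async def _flush_with_retry(self, batch):
        for attempt in range(self._max_retries):
            try:
                await self._flush(batch)
                return
            except Exception:
                if attempt == self._max_retries - 1:
                    raise
                await asyncio.sleep(0.01 * 2 ** attempt)  # simple backoff


async def demo():
    flushed = []
    attempts = 0

    async def flaky_flush(batch):
        # Fails once to exercise the retry path, then succeeds.
        nonlocal attempts
        attempts += 1
        if attempts == 1:
            raise ConnectionError("simulated transient database error")
        flushed.append(list(batch))

    writer = BatchWriter(flaky_flush, batch_size=3)
    task = asyncio.create_task(writer.run())
    for i in range(5):
        await writer.save_event(f"step-{i}", {"kind": "thought"})
    await writer.drain()
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return flushed, attempts


flushed, attempts = asyncio.run(demo())
print([len(b) for b in flushed], attempts)
```

Because a retried flush may resend a batch that partially succeeded, a writer like this is only safe when the database write itself is idempotent.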
Agent
  |  save_event / checkpoint
  v
Skialith sidecar
  |-- NATS JetStream  <-- PubAck ~133us returned to caller
  |     |
  |     +-- Background batch writer
  |           +-- MySQL / TiDB  (async, retried, idempotent)
  |
  +-- trace_ingest consumer --> agent_traces table
Performance
Run the benchmarks yourself against a local NATS + MySQL stack.
| Scenario | p50 | p95 | p99 |
|---|---|---|---|
| save_event (NATS PubAck) | 133 us | 265 us | 386 us |
| Baseline MySQL INSERT | 986 us | 1.5 ms | 2.6 ms |
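As a rough worked example of what the p50 figures in the table imply, the snippet below computes the ratio between the two rows and the total time an agent spends blocked on writes over a run. The 500-step run length is a hypothetical number chosen for illustration, and multiplying a median by a step count is only a back-of-the-envelope estimate, not a measured result.

```python
# p50 latencies taken from the table above, in microseconds
SAVE_EVENT_P50_US = 133
MYSQL_INSERT_P50_US = 986

speedup = MYSQL_INSERT_P50_US / SAVE_EVENT_P50_US


def hot_path_ms(steps, per_event_us):
    """Approximate time spent blocked on writes, in milliseconds."""
    return steps * per_event_us / 1000


steps = 500  # hypothetical run length, one event per step
sync_ms = hot_path_ms(steps, MYSQL_INSERT_P50_US)
wal_ms = hot_path_ms(steps, SAVE_EVENT_P50_US)

print(f"speedup at p50: {speedup:.1f}x")
print(f"blocked on writes over {steps} steps: {sync_ms:.1f} ms vs {wal_ms:.1f} ms")
```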
cargo run --bin benchmark
Integrations
Agents are plain async functions. SDKs are thin HTTP clients — no Rust required.
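import asyncio

from skialith import SkialithAgent

async def main():
    messages = []  # placeholder: your agent's conversation history
    async with SkialithAgent(agent_id="my-agent") as agent:
        # Pick up from the last checkpoint after a crash or restart
        state = await agent.resume()
        await agent.checkpoint(
            step=state.step_index,
            data={"messages": messages}
        )
        await agent.save_event("step-1", {
            "kind": "thought", "text": "..."
        })

asyncio.run(main())

import { SkialithAgent } from "@skialith/agent-core";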
const agent = new SkialithAgent({ agentId: "my-agent" });
const state = await agent.resume();
await agent.checkpoint(state.stepIndex, { messages });
await agent.saveEvent("step-1", {
  kind: "thought", text: "..."
});

from skialith.langchain import SkialithCheckpointer
checkpointer = SkialithCheckpointer()
app = graph.compile(checkpointer=checkpointer)
# No other changes needed
result = await app.ainvoke(
    {"messages": [...]},
    config={"configurable": {"thread_id": "agent-1"}}
)
Design Partner Program
We are collaborating with a small number of teams running AI agents in production. If you are hitting the limits of existing checkpointing approaches and want to shape what we build next, we would like to hear from you.