| summary | read_when | title |
|---|---|---|
| Infer-first CLI for provider-backed model, image, audio, TTS, video, web, and embedding workflows | | Inference CLI |
openclaw infer is the canonical headless surface for provider-backed inference. It exposes capability families (model, image, audio, tts, video, web, embedding), not raw gateway RPC names or agent tool ids. openclaw capability ... is an alias for the same command tree.
Reasons to prefer it over a one-off provider wrapper:
- Reuses providers and models already configured in OpenClaw.
- Stable --json envelope for scripts and agent-driven automation (see JSON output).
- Runs the normal local path without the gateway for most subcommands.
- For end-to-end provider checks, it exercises the shipped CLI, config loading, default-agent resolution, bundled plugin activation, and the shared capability runtime before the provider request goes out.
Turn infer into a skill
Copy and paste this to an agent:
Read https://docs.openclaw.ai/cli/infer, then create a skill that routes my common workflows to `openclaw infer`.
Focus on model runs, image generation, video generation, audio transcription, TTS, web search, and embeddings.
A good infer-based skill maps common user intents to the right subcommand, includes a few canonical examples per workflow, prefers openclaw infer ... over lower-level alternatives, and does not re-document the entire infer surface in the skill body.
Command tree
openclaw infer
  list
  inspect
  model
    run
    list
    inspect
    providers
    auth login
    auth logout
    auth status
  image
    generate
    edit
    describe
    describe-many
    providers
  audio
    transcribe
    providers
  tts
    convert
    voices
    providers
    personas
    status
    enable
    disable
    set-provider
    set-persona
  video
    generate
    describe
    providers
  web
    search
    fetch
    providers
  embedding
    create
    providers
infer list / infer inspect --name <capability> show this tree as data (capability id, transports, description).
Common tasks
| Task | Command | Notes |
|---|---|---|
| Run a text/model prompt | openclaw infer model run --prompt "..." --json | Local by default |
| Run a model prompt on images | openclaw infer model run --prompt "Describe this" --file ./image.png --model provider/model | Repeat --file for multiple images |
| Generate an image | openclaw infer image generate --prompt "..." --json | Use image edit when starting from an existing file |
| Describe an image file or URL | openclaw infer image describe --file ./image.png --prompt "..." --json | --model must be an image-capable <provider/model> |
| Transcribe audio | openclaw infer audio transcribe --file ./memo.m4a --json | --model must be <provider/model> |
| Synthesize speech | openclaw infer tts convert --text "..." --output ./speech.mp3 --json | tts status only runs through the gateway |
| Generate a video | openclaw infer video generate --prompt "..." --json | Supports provider hints such as --resolution |
| Describe a video file | openclaw infer video describe --file ./clip.mp4 --json | --model must be <provider/model> |
| Search the web | openclaw infer web search --query "..." --json | |
| Fetch a web page | openclaw infer web fetch --url https://example.com --json | |
| Create embeddings | openclaw infer embedding create --text "..." --json | |
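When a script calls these commands, building the argument list in code avoids shell-quoting problems with prompts. The following is a minimal Python sketch and not part of OpenClaw: infer_argv is a hypothetical helper, it uses only flags that appear in the table above, and it does not validate anything against the real CLI.

```python
from typing import Sequence


def infer_argv(
    capability: str,
    action: str,
    *,
    files: Sequence[str] = (),
    json_output: bool = True,
    **flags: str,
) -> list[str]:
    """Build an argv list for `openclaw infer <capability> <action>`.

    Hypothetical helper for illustration only; flags are passed through
    without checking that the real CLI accepts them.
    """
    argv = ["openclaw", "infer", capability, action]
    for name, value in flags.items():
        # A keyword such as aspect_ratio="9:16" becomes `--aspect-ratio 9:16`.
        argv += [f"--{name.replace('_', '-')}", value]
    for path in files:
        # Repeat --file once per input, as the table notes describe.
        argv += ["--file", path]
    if json_output:
        argv.append("--json")
    return argv


argv = infer_argv(
    "model",
    "run",
    prompt="Describe this",
    model="provider/model",
    files=["./a.png", "./b.png"],
)
print(argv)
```

The list can be handed to subprocess.run(argv, capture_output=True, text=True); because no shell is involved, the prompt text needs no escaping.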
Behavior
- Use --json when the output feeds another command or script; text output otherwise.
- Use --provider or --model provider/model to pin a specific backend.
- Use model run --thinking <level> for a one-shot thinking/reasoning override: off, minimal, low, medium, high, adaptive, xhigh, or max.
- For image describe, audio transcribe, and video describe, --model must use the form <provider/model>.
- For image describe, --file accepts local paths and HTTP(S) URLs; remote URLs go through the normal media-fetch SSRF policy.
- Stateless execution commands (model run, image *, audio *, video *, web *, embedding *) default to local. Gateway-managed state commands (tts status) default to gateway.
- The local path never requires the gateway to be running.
- Local model run is a lean one-shot provider completion: it resolves the configured agent model and auth but does not start a chat-agent turn, load tools, or open bundled MCP servers.
- model run --file attaches image files (auto-detected MIME type) to the prompt; repeat --file for multiple images. Non-image files are rejected; use infer audio transcribe or infer video describe instead.
- model run --gateway exercises Gateway routing, saved auth, provider selection, and the embedded runtime, but stays a raw model probe: no prior session transcript, bootstrap/AGENTS context, tools, or bundled MCP servers.
- model run --gateway --model <provider/model> requires a trusted-operator gateway credential, because it asks the Gateway to run a one-off provider/model override.
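The <provider/model> requirement above can be checked before the CLI is invoked, so a bare model name fails fast in the calling script. This is a minimal Python sketch under one assumption: the provider is everything before the first slash. split_model_ref is a hypothetical helper, and the real CLI may apply stricter or different rules.

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split a `provider/model` reference into (provider, model).

    Assumption for illustration: the provider is the text before the
    first slash. Raises ValueError when either part is missing.
    """
    provider, sep, model = ref.partition("/")
    if not sep or not provider.strip() or not model.strip():
        raise ValueError(f"expected <provider/model>, got {ref!r}")
    return provider, model


print(split_model_ref("openai/whisper-1"))
```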
Model
Text inference and model/provider inspection.
openclaw infer model run --prompt "Reply with exactly: smoke-ok" --json
openclaw infer model run --prompt "Summarize this changelog entry" --model openai/gpt-5.4 --json
openclaw infer model run --prompt "Describe this image in one sentence" --file ./photo.jpg --model google/gemini-2.5-flash --json
openclaw infer model run --prompt "Use more reasoning here" --thinking high --json
openclaw infer model providers --json
openclaw infer model inspect --model gpt-5.5 --json
Use full <provider/model> refs with --local to smoke-test one provider without starting the Gateway or loading the agent tool surface:
openclaw infer model run --local --model anthropic/claude-sonnet-4-6 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model cerebras/zai-glm-4.7 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model google/gemini-2.5-flash --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model groq/llama-3.1-8b-instant --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model mistral/mistral-medium-3-5 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model mistral/mistral-small-latest --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model openai/gpt-5.5 --prompt "Reply with exactly: pong" --json
openclaw infer model run --local --model ollama/qwen2.5vl:7b --prompt "Describe this image." --file ./photo.jpg --json
Notes:
- Local model run is the narrowest CLI smoke for provider/model/auth health: for non-ChatGPT-Codex providers it sends only the supplied prompt.
- Local model run --model <provider/model> can resolve exact bundled static-catalog rows (the same rows openclaw models list --all shows) before that provider is written to config. Provider auth is still required; missing credentials fail as auth errors, not Unknown model.
- For Mistral Medium 3.5 reasoning probes, leave temperature unset/default. Mistral rejects reasoning_effort="high" with temperature: 0; use default temperature or a non-zero value such as 0.7.
- OpenAI ChatGPT/Codex OAuth (openai-chatgpt-responses API) local probes add a minimal system instruction so the transport can populate its required instructions field, with no full agent context, tools, memory, or session transcript.
- model run --file attaches image content directly to the single user message. Common formats (PNG, JPEG, WebP) work when MIME type is detected as image/*; unsupported or unrecognized files fail before the provider is called. Use infer image describe instead when you want OpenClaw's image-model routing and fallbacks rather than a direct multimodal-model probe.
  - The selected model must support image input; text-only models may reject the request at the provider layer.
- model run --prompt must contain non-whitespace text; empty prompts are rejected before any provider or Gateway call.
- Local model run exits non-zero when the provider returns no text output, so unreachable providers and empty completions do not look like successful probes.
- Use model run --gateway to test Gateway routing or agent-runtime setup while keeping the model input raw. Use openclaw agent or a chat surface for full agent context, tools, memory, and session transcript.
- --thinking adaptive maps to the completion-runtime level medium; --thinking max maps to max for OpenAI models that support the native max effort, otherwise xhigh.
- model auth login, model auth logout, and model auth status manage saved provider auth state.
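The --thinking mapping described in the notes above can be written out as a small function. This is an illustrative Python sketch, not OpenClaw's implementation: completion_level and the supports_native_max flag are hypothetical names, and passing the remaining levels through unchanged is an assumption the notes do not confirm.

```python
THINKING_LEVELS = (
    "off", "minimal", "low", "medium", "high", "adaptive", "xhigh", "max",
)


def completion_level(level: str, *, supports_native_max: bool = False) -> str:
    """Map a --thinking value to a completion-runtime level.

    supports_native_max stands in for "an OpenAI model that supports the
    native max effort"; how OpenClaw detects that is not shown here.
    """
    if level not in THINKING_LEVELS:
        raise ValueError(f"unknown thinking level: {level!r}")
    if level == "adaptive":
        return "medium"
    if level == "max":
        return "max" if supports_native_max else "xhigh"
    # Assumption: every other level is forwarded as-is.
    return level


print(completion_level("adaptive"))
print(completion_level("max"))
print(completion_level("max", supports_native_max=True))
```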
Image
Generation, edit, and description.
openclaw infer image generate --prompt "friendly lobster illustration" --json
openclaw infer image generate --prompt "cinematic product photo of headphones" --json
openclaw infer image generate --model openai/gpt-image-1.5 --output-format png --background transparent --prompt "simple red circle sticker on a transparent background" --json
openclaw infer image generate --model openai/gpt-image-2 --quality low --openai-moderation low --prompt "low-cost draft poster" --json
openclaw infer image generate --prompt "slow image backend" --timeout-ms 180000 --json
openclaw infer image edit --file ./logo.png --model openai/gpt-image-1.5 --output-format png --background transparent --prompt "keep the logo, remove the background" --json
openclaw infer image edit --file ./poster.png --prompt "make this a vertical story ad" --size 2160x3840 --aspect-ratio 9:16 --resolution 4K --json
openclaw infer image describe --file ./photo.jpg --json
openclaw infer image describe --file https://example.com/photo.png --json
openclaw infer image describe --file ./receipt.jpg --prompt "Extract the merchant, date, and total" --json
openclaw infer image describe-many --file ./before.png --file ./after.png --prompt "Compare the screenshots and list visible UI changes" --json
openclaw infer image describe --file ./ui-screenshot.png --model openai/gpt-5.4-mini --json
openclaw infer image describe --file ./photo.jpg --model ollama/qwen2.5vl:7b --prompt "Describe the image in one sentence" --timeout-ms 300000 --json
Notes:
- Use image edit when starting from existing input files; --size, --aspect-ratio, or --resolution add geometry hints on providers/models that support them.

- --output-format png --background transparent with --model openai/gpt-image-1.5 gives transparent-background OpenAI PNG output; --openai-background is an OpenAI-specific alias for the same hint. Providers that do not declare background support report it as an ignored override (see ignoredOverrides in the JSON envelope).

- --quality low|medium|high|auto works for providers that support image-quality hints, including OpenAI. OpenAI also accepts --openai-moderation low|auto.

- image providers --json lists which bundled image providers are discoverable, configured, and selected, and which generation/edit capabilities each exposes.

- image generate --model <provider/model> --json is the narrowest live smoke for image-generation changes:

      openclaw infer image providers --json
      openclaw infer image generate \
        --model google/gemini-3.1-flash-image-preview \
        --prompt "Minimal flat test image: one blue square on a white background, no text." \
        --output ./openclaw-infer-image-smoke.png \
        --json

  The response reports ok, provider, model, attempts, and written output paths. When --output is set, the final extension may follow the provider's returned MIME type.

- For image describe and image describe-many, use --prompt for a task-specific instruction (OCR, comparison, UI inspection, concise captioning).

- Use --timeout-ms for slow local vision models or cold Ollama starts.

- For image describe, an explicit --model (must be an image-capable <provider/model>) runs first; if that call fails, the configured agents.defaults.imageModel.fallbacks are tried next. Input-preparation errors (missing file, unsupported URL) fail before any fallback attempt, and the model must be image-capable in the model catalog or provider config.

- For local Ollama vision models, pull the model first and set OLLAMA_API_KEY to any placeholder value, for example ollama-local. See Ollama.
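The ordering described for image describe (prepare the input once, try the explicit model, then the configured fallbacks) can be sketched as follows. This Python example only illustrates that control flow under stated assumptions; it is not OpenClaw's code. describe_with_fallbacks, InputPreparationError, the prepare and call callables, and the shape of the attempts list are all hypothetical.

```python
from typing import Callable, Sequence


class InputPreparationError(Exception):
    """Input could not be prepared (for example a missing file)."""


def describe_with_fallbacks(
    prepare: Callable[[], bytes],
    call: Callable[[str, bytes], str],
    explicit_model: str,
    fallbacks: Sequence[str],
) -> tuple[str, list[dict]]:
    # Preparation runs once, before any model is tried, so an
    # InputPreparationError propagates without a fallback attempt.
    image = prepare()
    attempts: list[dict] = []
    for ref in [explicit_model, *fallbacks]:
        try:
            text = call(ref, image)
        except Exception as exc:
            # Provider-side failure: record it and try the next model.
            attempts.append({"model": ref, "error": str(exc)})
            continue
        attempts.append({"model": ref, "error": None})
        return text, attempts
    tried = [a["model"] for a in attempts]
    raise RuntimeError(f"all models failed: {tried}")


def fake_call(ref: str, image: bytes) -> str:
    # Stand-in provider: the first model fails, the second succeeds.
    if ref == "primary/vision-model":
        raise TimeoutError("cold start")
    return f"described by {ref}"


text, attempts = describe_with_fallbacks(
    lambda: b"png-bytes",
    fake_call,
    "primary/vision-model",
    ["backup/vision-model"],
)
print(text)
```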
Audio
File transcription (not realtime session management).
openclaw infer audio transcribe --file ./memo.m4a --json
openclaw infer audio transcribe --file ./team-sync.m4a --language en --prompt "Focus on names and action items" --json
openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json
--model must be <provider/model>.
TTS
Speech synthesis and TTS provider/persona state.
openclaw infer tts convert --text "hello from openclaw" --output ./hello.mp3 --json
openclaw infer tts convert --text "Your build is complete" --output ./build-complete.mp3 --json
openclaw infer tts providers --json
openclaw infer tts personas --json
openclaw infer tts status --json
Notes:
- tts status only supports --gateway (it reflects gateway-managed TTS state).
- Use tts providers, tts voices, tts personas, tts set-provider, and tts set-persona to inspect and configure TTS behavior.
Video
Generation and description.
openclaw infer video generate --prompt "cinematic sunset over the ocean" --json
openclaw infer video generate --prompt "slow drone shot over a forest lake" --resolution 768P --duration 6 --json
openclaw infer video describe --file ./clip.mp4 --json
openclaw infer video describe --file ./clip.mp4 --model openai/gpt-5.4-mini --json
Notes:
- video generate accepts --size, --aspect-ratio, --resolution, --duration, --audio, --watermark, and --timeout-ms, forwarded to the video-generation runtime.
- --model must be <provider/model> for video describe.
Web
Search and fetch.
openclaw infer web search --query "OpenClaw docs" --json
openclaw infer web search --query "OpenClaw infer web providers" --json
openclaw infer web fetch --url https://docs.openclaw.ai/cli/infer --json
openclaw infer web providers --json
web providers lists available, configured, and selected providers for search and fetch.
Embedding
Vector creation and embedding-provider inspection.
openclaw infer embedding create --text "friendly lobster" --json
openclaw infer embedding create --text "customer support ticket: delayed shipment" --model openai/text-embedding-3-large --json
openclaw infer embedding providers --json
JSON output
Infer commands normalize JSON output under a shared envelope:
{
  "ok": true,
  "capability": "image.generate",
  "transport": "local",
  "provider": "openai",
  "model": "gpt-image-2",
  "attempts": [],
  "outputs": []
}
Stable top-level fields:
- ok
- capability
- transport
- provider
- model
- attempts
- inputs (image attachments sent with the request, when applicable)
- outputs
- ignoredOverrides (hint keys a provider does not support, when applicable)
- error
For generated media commands, outputs contains files written by OpenClaw. Use the path, mimeType, size, and any media-specific dimensions in that array for automation instead of parsing human-readable stdout.
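A script that consumes the envelope might look like the following minimal Python sketch. written_paths is a hypothetical helper, and the sample envelope values (including the output entry) are made up for illustration. The sketch assumes only the top-level fields listed above and treats error and ignoredOverrides as opaque values, since their exact shapes are not described here.

```python
import json


def written_paths(stdout: str) -> list[str]:
    """Return the file paths listed in an infer JSON envelope.

    Raises RuntimeError when the envelope does not report ok: true.
    """
    envelope = json.loads(stdout)
    if not envelope.get("ok"):
        # The error value is passed through without interpreting it.
        raise RuntimeError(f"infer failed: {envelope.get('error')!r}")
    ignored = envelope.get("ignoredOverrides")
    if ignored:
        print(f"warning: provider ignored overrides: {ignored!r}")
    return [
        item["path"]
        for item in envelope.get("outputs", [])
        if "path" in item
    ]


# Illustrative envelope; the values are invented for this example.
sample = json.dumps({
    "ok": True,
    "capability": "image.generate",
    "transport": "local",
    "provider": "openai",
    "model": "gpt-image-2",
    "attempts": [],
    "outputs": [
        {"path": "./out.png", "mimeType": "image/png", "size": 1024},
    ],
})
print(written_paths(sample))
```

Reading paths from outputs rather than from human-readable stdout keeps the script working if the text output changes.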
Common pitfalls
# Bad
openclaw infer media image generate --prompt "friendly lobster"
# Good
openclaw infer image generate --prompt "friendly lobster"
# Bad
openclaw infer audio transcribe --file ./memo.m4a --model whisper-1 --json
# Good
openclaw infer audio transcribe --file ./memo.m4a --model openai/whisper-1 --json