mirror of https://github.com/microsoft/SkillOpt.git synced 2026-07-03 14:02:58 +08:00

kaikai-macbook 41012e2d5e Support Qwen chat as optimizer backend

2026-06-01 16:44:49 +08:00

Configuration Reference

Complete reference for all SkillOpt configuration parameters.

Model

Parameter	Type	Default	Description
`model.backend`	str	`azure_openai`	Backend: `azure_openai` / `openai_chat` / `claude_code_exec` / `qwen`
`model.optimizer`	str	`gpt-5.5`	Optimizer model (for reflection & slow update)
`model.target`	str	`gpt-5.5`	Target model (for rollout execution)
`model.reasoning_effort`	str	`medium`	Reasoning effort level
`model.optimizer_backend`	str	`openai_chat`	Optimizer backend: `openai_chat` / `claude_chat` / `qwen_chat` / `minimax_chat`
`model.target_backend`	str	`openai_chat`	Target backend: any of the chat backends above, or an execution harness
`model.qwen_chat_base_url`	str	`http://localhost:8000/v1`	Shared Qwen/vLLM OpenAI-compatible endpoint
`model.qwen_chat_enable_thinking`	bool	`false`	Enable Qwen thinking mode (shared by optimizer and target)
`model.optimizer_qwen_chat_base_url`	str	—	Optimizer-specific Qwen/vLLM endpoint; overrides shared `qwen_chat_base_url`
`model.target_qwen_chat_base_url`	str	—	Target-specific Qwen/vLLM endpoint; overrides shared `qwen_chat_base_url`
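
The override rule for the Qwen endpoints can be sketched as follows. This is a minimal illustration, not SkillOpt's actual loader: `resolve_qwen_base_url` is a hypothetical helper, and it assumes that an unset or empty role-specific key falls back to the shared key and then to the documented default.

```python
from typing import Mapping

# Default taken from the `model.qwen_chat_base_url` row above.
DEFAULT_QWEN_BASE_URL = "http://localhost:8000/v1"


def resolve_qwen_base_url(model_cfg: Mapping[str, str], role: str) -> str:
    """Return the Qwen/vLLM endpoint for `role` ("optimizer" or "target").

    Hypothetical helper: the role-specific key wins when it is set and
    non-empty, then the shared key, then the documented default.
    """
    if role not in ("optimizer", "target"):
        raise ValueError(f"unknown role: {role!r}")
    specific = model_cfg.get(f"{role}_qwen_chat_base_url")
    if specific:
        return specific
    return model_cfg.get("qwen_chat_base_url") or DEFAULT_QWEN_BASE_URL


# Example `model` section: only the optimizer has its own endpoint.
model_cfg = {
    "qwen_chat_base_url": "http://localhost:8000/v1",
    "optimizer_qwen_chat_base_url": "http://localhost:8001/v1",
}
print(resolve_qwen_base_url(model_cfg, "optimizer"))
print(resolve_qwen_base_url(model_cfg, "target"))
```

With this layout the optimizer and the target can be served by two separate vLLM instances while still sharing a single default endpoint.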

Training (`train`)

Parameter	Type	Default	DL Analogy	Description
`train.num_epochs`	int	4	Epochs	Number of training epochs
`train.batch_size`	int	40	Batch size	Tasks sampled per step
`train.accumulation`	int	1	Gradient accumulation	Accumulation rounds per step
`train.seed`	int	42	Random seed	Reproducibility seed

Gradient / Reflection (`gradient`)

Parameter	Type	Default	Description
`gradient.minibatch_size`	int	8	Reflection minibatch size
`gradient.merge_batch_size`	int	8	Patch merge batch size
`gradient.analyst_workers`	int	16	Parallel reflection workers
`gradient.max_analyst_rounds`	int	3	Max rounds of analyst reflection
`gradient.failure_only`	bool	`false`	Only reflect on failures

Optimizer (`optimizer`)

Parameter	Type	Default	DL Analogy	Description
`optimizer.learning_rate`	int	4	Learning rate	Max edit patches per step (edit budget)
`optimizer.min_learning_rate`	int	2	Min LR	Minimum edit budget for decaying schedulers
`optimizer.lr_scheduler`	str	`cosine`	LR schedule	`constant` / `linear` / `cosine` / `autonomous`
`optimizer.skill_update_mode`	str	`patch`	—	`patch` / `rewrite_from_suggestions` / `full_rewrite_minibatch`
`optimizer.use_slow_update`	bool	`true`	Momentum	Epoch-boundary longitudinal comparison & guidance
`optimizer.slow_update_samples`	int	20	—	Samples for slow update evaluation
`optimizer.use_meta_skill`	bool	`true`	Meta-learning	Cross-epoch optimizer-side strategy memory
`optimizer.longitudinal_pair_policy`	str	`mixed`	—	`mixed` / `changed` / `unchanged`
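
To make the learning-rate analogy concrete, the sketch below shows how an edit budget could decay from `optimizer.learning_rate` to `optimizer.min_learning_rate` over a run. The formulas are the standard linear and cosine decay schedules and are an assumption here; SkillOpt's own schedulers may round or step differently, and the `autonomous` mode is not modeled. `edit_budget` is a hypothetical name.

```python
import math


def edit_budget(step: int, total_steps: int, lr: int = 4, min_lr: int = 2,
                scheduler: str = "cosine") -> int:
    """Max edit patches allowed at 0-based `step` (illustrative formulas only)."""
    if total_steps <= 1 or scheduler == "constant":
        return lr
    # Fraction of the run completed, clamped to [0, 1].
    progress = min(max(step, 0), total_steps - 1) / (total_steps - 1)
    if scheduler == "linear":
        value = lr - (lr - min_lr) * progress
    elif scheduler == "cosine":
        value = min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))
    else:
        raise ValueError(f"scheduler not modeled in this sketch: {scheduler!r}")
    # Budgets are whole numbers of patches and never drop below min_lr.
    return max(min_lr, round(value))


# Budget per step for a 5-step run with the documented defaults.
print([edit_budget(s, 5) for s in range(5)])
```

Under these assumptions, early steps may apply up to 4 patches and late steps at most 2, so the skill changes more aggressively at the start of training.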

Evaluation (`evaluation`)

Parameter	Type	Default	Description
`evaluation.use_gate`	bool	`true`	Enable validation gating (accept/reject updates)
`evaluation.eval_test`	bool	`true`	Run test evaluation after training
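
Validation gating can be pictured as a simple accept/reject check. The sketch below is an assumption about the general idea, not SkillOpt's implementation: `gate_update` is a hypothetical function, and the strict-improvement criterion is chosen only for illustration.

```python
def gate_update(current_score: float, candidate_score: float,
                use_gate: bool = True) -> bool:
    """Return True if the candidate skill should replace the current one.

    Assumed rule: with gating enabled, accept only a strict improvement on
    the validation split; with gating disabled, accept every update.
    """
    if not use_gate:
        return True
    return candidate_score > current_score


# A candidate that scores lower on validation is rejected when gating is on.
print(gate_update(current_score=0.62, candidate_score=0.58))
print(gate_update(current_score=0.62, candidate_score=0.58, use_gate=False))
```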

Environment (`env`)

Parameter	Type	Default	Description
`env.name`	str	—	Benchmark name (e.g., `searchqa`, `docvqa`)
`env.data_path`	str	—	Path to dataset
`env.skill_init`	str	—	Path to initial seed skill (optional)
`env.split_mode`	str	`ratio`	`ratio` or `split_dir`
`env.split_ratio`	str	`2:1:7`	Train:val:test ratio
`env.exec_timeout`	int	120	Per-task timeout in seconds
`env.out_root`	str	—	Output directory
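
As an illustration of `env.split_ratio`, the sketch below turns a `train:val:test` string such as `2:1:7` into per-split item counts. Both function names are hypothetical, and the rounding policy (floor for train and val, remainder to test) is an assumption; the actual splitter may shuffle and round differently.

```python
from typing import Tuple


def parse_split_ratio(ratio: str) -> Tuple[int, int, int]:
    """Parse a "train:val:test" string such as "2:1:7" into integer weights."""
    parts = ratio.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected train:val:test, got {ratio!r}")
    train, val, test = (int(p) for p in parts)
    if min(train, val, test) < 0 or train + val + test == 0:
        raise ValueError(f"invalid split weights: {ratio!r}")
    return train, val, test


def split_counts(n_items: int, ratio: str = "2:1:7") -> Tuple[int, int, int]:
    """Number of items per split; any rounding remainder goes to test."""
    train_w, val_w, test_w = parse_split_ratio(ratio)
    total = train_w + val_w + test_w
    n_train = n_items * train_w // total
    n_val = n_items * val_w // total
    n_test = n_items - n_train - n_val
    return n_train, n_val, n_test


# 100 tasks with the default ratio.
print(split_counts(100))
```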

Credentials and Environment Variables

Variable	Description
`AZURE_OPENAI_ENDPOINT` / `model.azure_openai_endpoint`	Azure resource endpoint
`AZURE_OPENAI_API_KEY` / `model.azure_openai_api_key`	Azure API key
`OPENAI_API_KEY`	OpenAI API key (for `openai_chat` backend)
`ANTHROPIC_API_KEY`	Anthropic API key (for `claude_code_exec` backend)
`QWEN_CHAT_BASE_URL`	Shared local vLLM endpoint for `qwen_chat`
`QWEN_CHAT_MODEL`	Shared served model name for `qwen_chat`
`QWEN_CHAT_API_KEY`	Optional API key for the shared Qwen endpoint
`OPTIMIZER_QWEN_CHAT_BASE_URL`	Optimizer-specific local vLLM endpoint
`OPTIMIZER_QWEN_CHAT_MODEL`	Optimizer-specific served model name
`TARGET_QWEN_CHAT_BASE_URL`	Target-specific local vLLM endpoint
`TARGET_QWEN_CHAT_MODEL`	Target-specific served model name
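
The relationship between the shared and role-specific Qwen variables can be sketched as a lookup with fallback. This is a hedged illustration: `qwen_env_setting` is a hypothetical helper, and the fallback from `OPTIMIZER_*` / `TARGET_*` to the shared `QWEN_CHAT_*` variable is assumed by analogy with the config keys, not confirmed by the source.

```python
import os
from typing import Mapping, Optional

# Settings that have role-specific variants in the table above.
ROLE_SPECIFIC = {"BASE_URL", "MODEL"}


def qwen_env_setting(role: str, name: str,
                     env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Look up a Qwen setting for `role` ("optimizer" or "target").

    Assumed precedence: <ROLE>_QWEN_CHAT_<NAME> first (when documented),
    then the shared QWEN_CHAT_<NAME>. Returns None if neither is set.
    """
    env = os.environ if env is None else env
    if name in ROLE_SPECIFIC:
        value = env.get(f"{role.upper()}_QWEN_CHAT_{name}")
        if value:
            return value
    return env.get(f"QWEN_CHAT_{name}") or None


# Example environment: shared endpoint, optimizer-specific model name.
example_env = {
    "QWEN_CHAT_BASE_URL": "http://localhost:8000/v1",
    "QWEN_CHAT_MODEL": "shared-model",
    "OPTIMIZER_QWEN_CHAT_MODEL": "optimizer-model",
}
print(qwen_env_setting("optimizer", "MODEL", example_env))
print(qwen_env_setting("target", "MODEL", example_env))
```

The model names in `example_env` are placeholders; substitute whatever name the vLLM server actually serves.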