mirror of
https://github.com/microsoft/SkillOpt.git
synced 2026-07-03 22:24:36 +08:00
docs/reference/api.md previously documented a fictional EnvAdapter API
(execute / evaluate / build_prompt + DataItem / TaskResult) and a
BENCHMARK_REGISTRY that never existed in code. Anyone following the
documented contract would hit ImportError or TypeError on the first
instantiation.
Replace both pages with the real shape from skillopt/envs/base.py and
skillopt/datasets/base.py:
- EnvAdapter: build_train_env, build_eval_env, rollout, reflect,
get_task_types (the 5 actual abstract methods).
- Rollout dicts: id / hard / soft required; everything else preserved
into RolloutResult.extras.
- Reflect dicts: {patch, source_type} schema as consumed by
run_minibatch_reflect.
- BatchSpec: slotted-but-mutable dataclass matching the actual
definition (payload defaults to None, metadata to dict()).
- SplitDataLoader.load_split_items as the one mandatory loader method.
- Registry: _ENV_REGISTRY in scripts/train.py (lazy try/except
ImportError block), not a non-existent BENCHMARK_REGISTRY in
skillopt/envs/__init__.py.
- _base_: documented as a string path, since the current YAML loader
only accepts strings.
The new-benchmark.md guide now walks through a docfaithful worked
example with a real rollout helper (chat_target + scorer) instead of
hand-waving over the rollout step. Refs microsoft/SkillOpt#30.
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
196 lines
6.7 KiB
Markdown
196 lines
6.7 KiB
Markdown
# API Reference
|
|
|
|
This page documents the public Python API SkillOpt exposes for **extending the
|
|
framework** with new environments / benchmarks. For ready-made adapters,
|
|
browse [`skillopt/envs/`](https://github.com/microsoft/SkillOpt/tree/main/skillopt/envs).
|
|
|
|
> **Source of truth.** The classes below are real Python ABCs defined in
|
|
> `skillopt/envs/base.py`, `skillopt/datasets/base.py`, `skillopt/types.py`,
|
|
> and `skillopt/evaluation/gate.py`. If this page ever drifts, the code
|
|
> wins — please open an issue.
|
|
|
|
---
|
|
|
|
## Core Classes
|
|
|
|
### `EnvAdapter`
|
|
|
|
`skillopt/envs/base.py` — abstract adapter that connects the SkillOpt
|
|
trainer to an environment (benchmark, simulator, REST API, ...).
|
|
Subclasses **must** implement the five abstract methods below.
|
|
|
|
```python
|
|
from abc import ABC, abstractmethod
|
|
from skillopt.datasets.base import BaseDataLoader, BatchSpec
|
|
|
|
class EnvAdapter(ABC):
|
|
|
|
# ── Lifecycle hooks (have defaults; override only if needed) ────────
|
|
|
|
def setup(self, cfg: dict) -> None: ...
|
|
def get_dataloader(self) -> BaseDataLoader | None: ...
|
|
def requires_ray(self) -> bool: ... # default False
|
|
|
|
# ── Abstract methods (subclasses MUST implement) ────────────────────
|
|
|
|
@abstractmethod
|
|
def build_train_env(self, batch_size: int, seed: int, **kwargs):
|
|
"""Return an environment-manager object to be passed to rollout()."""
|
|
|
|
@abstractmethod
|
|
def build_eval_env(self, env_num: int, split: str, seed: int, **kwargs):
|
|
"""Like build_train_env() but for a fixed eval split."""
|
|
|
|
@abstractmethod
|
|
def rollout(self, env_manager, skill_content: str,
|
|
out_dir: str, **kwargs) -> list[dict]:
|
|
"""Run a batch of episodes with the current skill.
|
|
|
|
Each returned dict MUST contain:
|
|
- "id": str episode/task identifier
|
|
- "hard": int (0|1) pass/fail (may be float 0.0-1.0 if smoothed)
|
|
- "soft": float partial-credit score in [0.0, 1.0]
|
|
It MAY contain env-specific extra keys (parsed into RolloutResult.extras).
|
|
"""
|
|
|
|
@abstractmethod
|
|
def reflect(self, results: list[dict], skill_content: str,
|
|
out_dir: str, **kwargs) -> list[dict | None]:
|
|
"""Turn rollout results into a list of raw patch dicts.
|
|
|
|
Each dict (or None to drop the slot) MUST contain:
|
|
- "patch": {"edits": [...]} a Patch.to_dict() payload
|
|
- "source_type": "failure" | "success"
|
|
"""
|
|
|
|
@abstractmethod
|
|
def get_task_types(self) -> list[str]:
|
|
"""Distinct task-type strings used for stratified sampling."""
|
|
```
|
|
|
|
The trainer also calls a few default-implemented helpers on every adapter:
|
|
`build_reference_text`, `get_reference_metadata`, `attach_reference_context`,
|
|
`select_representative_items`, and `build_env_from_batch`. Read the docstrings
|
|
in `skillopt/envs/base.py` if you need to override any of these — most
|
|
benchmarks don't.
|
|
|
|
### `BaseDataLoader` / `SplitDataLoader`
|
|
|
|
`skillopt/datasets/base.py` — episode-planning loaders.
|
|
|
|
```python
|
|
class BaseDataLoader(ABC):
|
|
def setup(self, cfg: dict) -> None: ...
|
|
@abstractmethod
|
|
def build_train_batch(self, batch_size: int, seed: int, **kwargs) -> BatchSpec: ...
|
|
@abstractmethod
|
|
def build_eval_batch(self, env_num: int, split: str, seed: int, **kwargs) -> BatchSpec: ...
|
|
|
|
class SplitDataLoader(BaseDataLoader):
|
|
"""Concrete base for dataset-backed envs with on-disk train/val/test splits.
|
|
|
|
Subclasses only need to implement load_split_items() (and optionally
|
|
load_raw_items() if you also want ``split_mode='ratio'``).
|
|
"""
|
|
def load_split_items(self, split_path: str) -> list[dict]: ...
|
|
def load_raw_items(self, data_path: str) -> list[dict]: ... # optional
|
|
```
|
|
|
|
`SplitDataLoader` handles two layout modes:
|
|
|
|
| `split_mode` | What it expects |
|
|
|---|---|
|
|
| `"split_dir"` | A directory with `train/`, `val/`, `test/` subdirs already split. |
|
|
| `"ratio"` | A raw dataset path + `split_ratio: "2:1:7"` style string. |
|
|
|
|
In either case the items returned by `load_split_items()` are plain
|
|
`dict` objects with at minimum an `"id"` key.
|
|
|
|
### `BatchSpec`
|
|
|
|
`skillopt/datasets/base.py` — a slotted dataclass describing one batch
|
|
request the trainer hands to the adapter.
|
|
|
|
```python
|
|
@dataclass(slots=True)
|
|
class BatchSpec:
|
|
phase: str # "train" | "eval"
|
|
split: str # "train" | "val" | "test" | "valid_seen" | ...
|
|
seed: int
|
|
batch_size: int
|
|
payload: object | None = None # what the loader produced (e.g. list[dict])
|
|
metadata: dict = field(default_factory=dict)
|
|
```
|
|
|
|
### `Edit` / `Patch`
|
|
|
|
`skillopt/types.py` — the I/O types Reflect / Aggregate / Update produce
|
|
and consume.
|
|
|
|
```python
|
|
EditOp = Literal["append", "insert_after", "replace", "delete"]
|
|
|
|
@dataclass
|
|
class Edit:
|
|
op: EditOp
|
|
content: str = ""
|
|
target: str = ""
|
|
support_count: int | None = None
|
|
source_type: Literal["failure", "success"] | None = None
|
|
merge_level: int | None = None
|
|
update_origin: str = ""
|
|
update_target: str = ""
|
|
|
|
@dataclass
|
|
class Patch:
|
|
edits: list[Edit] = field(default_factory=list)
|
|
reasoning: str = ""
|
|
ranking_details: dict[str, Any] | None = None
|
|
```
|
|
|
|
Both types support `to_dict()` / `from_dict()` for serialization.
|
|
|
|
### `RolloutResult`
|
|
|
|
`skillopt/types.py` — the normalised rollout return type. The trainer
|
|
calls `RolloutResult.from_dict(...)` on each dict returned from
|
|
`EnvAdapter.rollout()`, so the only **hard** requirement on those dicts is
|
|
the three keys above (`id`, `hard`, `soft`). Extra fields are preserved
|
|
into `RolloutResult.extras`.
|
|
|
|
### `GateResult` / `GateAction`
|
|
|
|
`skillopt/evaluation/gate.py` — the validation-gate decision types
|
|
returned each epoch.
|
|
|
|
---
|
|
|
|
## Registering an environment
|
|
|
|
Environments are not registered via decorators or a `BENCHMARK_REGISTRY`
|
|
dict. The trainer keeps a lazy registry inside `scripts/train.py` —
|
|
`_ENV_REGISTRY` — populated by `_register_builtins()`. To add a new env
|
|
you append a `try / except ImportError` block there. See
|
|
[Add a New Benchmark](../guide/new-benchmark.md) for the full step-by-step.
|
|
|
|
---
|
|
|
|
## Backends (model layer)
|
|
|
|
The model layer lives under `skillopt.model.*`. Backends are selected
|
|
via `model.optimizer_backend` and `model.target_backend` in the config —
|
|
not via a base class subclass. Supported values (as of this writing):
|
|
|
|
| Backend | Optimizer? | Target? |
|
|
|---|---|---|
|
|
| `openai_chat` | ✓ | ✓ |
|
|
| `claude_chat` | ✓ | ✓ |
|
|
| `qwen_chat` | ✓ | ✓ |
|
|
| `minimax_chat` | ✓ | ✓ |
|
|
| `codex_exec` | — | ✓ |
|
|
| `claude_code_exec` | — | ✓ |
|
|
|
|
See `skillopt/model/backend_config.py` for the live whitelist and
|
|
[`docs/reference/config.md`](./config.md) for the per-backend
|
|
configuration keys.
|