
# API Reference

This page documents the public Python API that SkillOpt exposes for **extending
the framework** with new environments and benchmarks. For ready-made adapters,
browse [`skillopt/envs/`](https://github.com/microsoft/SkillOpt/tree/main/skillopt/envs).

> **Source of truth.** The classes below are real Python types (ABCs and
> dataclasses, among others) defined in `skillopt/envs/base.py`,
> `skillopt/datasets/base.py`, `skillopt/types.py`, and
> `skillopt/evaluation/gate.py`. If this page ever drifts from them, the code
> wins; please open an issue.

---

## Core Classes

### `EnvAdapter`

`skillopt/envs/base.py` — abstract adapter that connects the SkillOpt
trainer to an environment (benchmark, simulator, REST API, ...).
Subclasses **must** implement the five abstract methods below.

```python
from abc import ABC, abstractmethod
from skillopt.datasets.base import BaseDataLoader, BatchSpec

class EnvAdapter(ABC):

    # ── Lifecycle hooks (have defaults; override only if needed) ────────

    def setup(self, cfg: dict) -> None: ...
    def get_dataloader(self) -> BaseDataLoader | None: ...
    def requires_ray(self) -> bool: ...                 # default False

    # ── Abstract methods (subclasses MUST implement) ────────────────────

    @abstractmethod
    def build_train_env(self, batch_size: int, seed: int, **kwargs):
        """Return an environment-manager object to be passed to rollout()."""

    @abstractmethod
    def build_eval_env(self, env_num: int, split: str, seed: int, **kwargs):
        """Like build_train_env() but for a fixed eval split."""

    @abstractmethod
    def rollout(self, env_manager, skill_content: str,
                out_dir: str, **kwargs) -> list[dict]:
        """Run a batch of episodes with the current skill.

        Each returned dict MUST contain:
          - "id":   str        episode/task identifier
          - "hard": int (0|1)  pass/fail (may be float 0.0-1.0 if smoothed)
          - "soft": float      partial-credit score in [0.0, 1.0]
        It MAY contain env-specific extra keys (parsed into RolloutResult.extras).
        """

    @abstractmethod
    def reflect(self, results: list[dict], skill_content: str,
                out_dir: str, **kwargs) -> list[dict | None]:
        """Turn rollout results into a list of raw patch dicts.

        Each dict (or None to drop the slot) MUST contain:
          - "patch":       {"edits": [...]}     a Patch.to_dict() payload
          - "source_type": "failure" | "success"
        """

    @abstractmethod
    def get_task_types(self) -> list[str]:
        """Distinct task-type strings used for stratified sampling."""
```

The trainer also calls a few default-implemented helpers on every adapter:
`build_reference_text`, `get_reference_metadata`, `attach_reference_context`,
`select_representative_items`, and `build_env_from_batch`. Read the docstrings
in `skillopt/envs/base.py` if you need to override any of these — most
benchmarks don't.
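
To make the `rollout()` / `reflect()` contract concrete, here is a minimal sketch of a toy adapter. It is written as a plain class so that it runs without SkillOpt installed; a real adapter would subclass `EnvAdapter` from `skillopt/envs/base.py`. The class name `KeywordAdapter`, its task list, and its scoring rule are hypothetical, and the shape of each edit dict (`"op"`, `"content"`) assumes that `Edit.to_dict()` uses the dataclass field names.

```python
from __future__ import annotations


class KeywordAdapter:
    """Toy stand-in: a task passes if its keyword appears in the skill text."""

    # Hypothetical tasks; a real adapter would load these from a benchmark.
    TASKS = [
        {"id": "t1", "keyword": "retry", "type": "network"},
        {"id": "t2", "keyword": "cache", "type": "storage"},
    ]

    def get_task_types(self) -> list[str]:
        return sorted({task["type"] for task in self.TASKS})

    def build_train_env(self, batch_size: int, seed: int, **kwargs):
        # The "environment manager" here is just a slice of the task list.
        return self.TASKS[:batch_size]

    def build_eval_env(self, env_num: int, split: str, seed: int, **kwargs):
        return self.TASKS[:env_num]

    def rollout(self, env_manager, skill_content: str,
                out_dir: str, **kwargs) -> list[dict]:
        results = []
        for task in env_manager:
            passed = task["keyword"] in skill_content
            results.append({
                "id": task["id"],                # required
                "hard": int(passed),             # required, 0 or 1
                "soft": 1.0 if passed else 0.0,  # required, in [0.0, 1.0]
                "keyword": task["keyword"],      # env-specific extra key
            })
        return results

    def reflect(self, results: list[dict], skill_content: str,
                out_dir: str, **kwargs) -> list[dict | None]:
        patches: list[dict | None] = []
        for result in results:
            if result["hard"]:
                patches.append(None)  # nothing to fix, so drop this slot
                continue
            hint = f"Remember to mention {result['keyword']}."
            patches.append({
                "patch": {"edits": [{"op": "append", "content": hint}]},
                "source_type": "failure",
            })
        return patches
```

Calling `rollout()` and then `reflect()` on this class yields one result dict per task and one patch dict (or `None`) per result, which is the slot-for-slot shape the docstrings above describe.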

### `BaseDataLoader` / `SplitDataLoader`

`skillopt/datasets/base.py` — episode-planning loaders.

```python
from abc import ABC, abstractmethod

# BatchSpec is defined in the same module (skillopt/datasets/base.py).

class BaseDataLoader(ABC):
    def setup(self, cfg: dict) -> None: ...
    @abstractmethod
    def build_train_batch(self, batch_size: int, seed: int, **kwargs) -> BatchSpec: ...
    @abstractmethod
    def build_eval_batch(self, env_num: int, split: str, seed: int, **kwargs) -> BatchSpec: ...

class SplitDataLoader(BaseDataLoader):
    """Concrete base for dataset-backed envs with on-disk train/val/test splits.

    Subclasses only need to implement load_split_items() (and optionally
    load_raw_items() if you also want ``split_mode='ratio'``).
    """
    def load_split_items(self, split_path: str) -> list[dict]: ...
    def load_raw_items(self, data_path: str) -> list[dict]: ...   # optional
```

`SplitDataLoader` handles two layout modes:

| `split_mode` | What it expects |
|---|---|
| `"split_dir"` | A directory with `train/`, `val/`, `test/` subdirs already split. |
| `"ratio"` | A raw dataset path + `split_ratio: "2:1:7"` style string. |

In either case the items returned by `load_split_items()` are plain
`dict` objects with at minimum an `"id"` key.
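
As an illustration of what the `"ratio"` mode involves, the sketch below partitions a list of raw items using a `"2:1:7"` style string. It is not SkillOpt's actual implementation: the function name `split_by_ratio`, the `train:val:test` ordering of the ratio, the seeded shuffle, and the rule that rounding leftovers go to `test` are all assumptions made for the example.

```python
from __future__ import annotations

import random


def split_by_ratio(items: list[dict], split_ratio: str,
                   seed: int = 0) -> dict[str, list[dict]]:
    """Partition items into train/val/test from an "a:b:c" ratio string."""
    parts = [int(p) for p in split_ratio.split(":")]
    if len(parts) != 3 or min(parts) < 0 or sum(parts) == 0:
        raise ValueError(f"expected 'train:val:test', got {split_ratio!r}")

    # Every item must carry an "id" key, as the loader contract requires.
    missing = [i for i, item in enumerate(items) if "id" not in item]
    if missing:
        raise ValueError(f"items at positions {missing} have no 'id' key")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed

    total = sum(parts)
    n_train = len(shuffled) * parts[0] // total
    n_val = len(shuffled) * parts[1] // total
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # rounding leftovers land here
    }
```

With ten items and `"2:1:7"`, this produces splits of 2, 1, and 7 items, and the same seed always yields the same assignment.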

### `BatchSpec`

`skillopt/datasets/base.py` — a slotted dataclass describing one batch
request the trainer hands to the adapter.

```python
from dataclasses import dataclass, field

@dataclass(slots=True)         # slots=True requires Python 3.10+
class BatchSpec:
    phase: str                 # "train" | "eval"
    split: str                 # "train" | "val" | "test" | "valid_seen" | ...
    seed: int
    batch_size: int
    payload: object | None = None     # what the loader produced (e.g. list[dict])
    metadata: dict = field(default_factory=dict)
```

### `Edit` / `Patch`

`skillopt/types.py` — the I/O types Reflect / Aggregate / Update produce
and consume.

```python
from dataclasses import dataclass, field
from typing import Any, Literal

EditOp = Literal["append", "insert_after", "replace", "delete"]

@dataclass
class Edit:
    op: EditOp
    content: str = ""
    target: str = ""
    support_count: int | None = None
    source_type: Literal["failure", "success"] | None = None
    merge_level: int | None = None
    update_origin: str = ""
    update_target: str = ""

@dataclass
class Patch:
    edits: list[Edit] = field(default_factory=list)
    reasoning: str = ""
    ranking_details: dict[str, Any] | None = None
```

Both types support `to_dict()` / `from_dict()` for serialization.
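
For adapters that build patch dicts by hand inside `reflect()`, a small helper can catch malformed payloads early. The sketch below works on plain dicts and does not import SkillOpt; the helper name `make_reflect_payload` is hypothetical, and the assumption that a serialized edit uses the dataclass field names (`"op"`, `"content"`, `"target"`) as keys should be checked against `Edit.to_dict()`.

```python
from __future__ import annotations

# The four operations listed in the EditOp literal above.
ALLOWED_OPS = ("append", "insert_after", "replace", "delete")
ALLOWED_SOURCES = ("failure", "success")


def make_reflect_payload(edits: list[dict], source_type: str) -> dict:
    """Build one reflect() slot: {"patch": {"edits": [...]}, "source_type": ...}."""
    for index, edit in enumerate(edits):
        if edit.get("op") not in ALLOWED_OPS:
            raise ValueError(f"edit {index} has unsupported op {edit.get('op')!r}")
    if source_type not in ALLOWED_SOURCES:
        raise ValueError(f"unsupported source_type {source_type!r}")
    # Copy each edit so later mutation of the inputs does not leak into the payload.
    return {
        "patch": {"edits": [dict(edit) for edit in edits]},
        "source_type": source_type,
    }
```

A call such as `make_reflect_payload([{"op": "replace", "target": "old rule", "content": "new rule"}], "failure")` returns a dict with the two keys that the `reflect()` docstring requires.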

### `RolloutResult`

`skillopt/types.py` — the normalised rollout return type. The trainer
calls `RolloutResult.from_dict(...)` on each dict returned from
`EnvAdapter.rollout()`, so the only strict requirement on those dicts is
the three keys listed in the `rollout()` docstring (`id`, `hard`, `soft`).
Any extra fields are preserved in `RolloutResult.extras`.
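
The split between required keys and extras can be pictured with the following sketch. It is only an illustration of the contract and not the real `RolloutResult.from_dict()`: the function name `split_rollout_dict`, the string coercion of `id`, and the range check on `soft` are assumptions.

```python
from __future__ import annotations

REQUIRED_KEYS = ("id", "hard", "soft")


def split_rollout_dict(raw: dict) -> tuple[dict, dict]:
    """Separate the three required rollout keys from env-specific extras."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise ValueError(f"rollout dict is missing required keys: {missing}")

    soft = float(raw["soft"])
    if not 0.0 <= soft <= 1.0:
        raise ValueError(f"'soft' must be in [0.0, 1.0], got {soft}")

    core = {"id": str(raw["id"]), "hard": raw["hard"], "soft": soft}
    # Everything that is not a required key is carried along untouched.
    extras = {k: v for k, v in raw.items() if k not in REQUIRED_KEYS}
    return core, extras
```

Running a check like this in an adapter's unit tests is one way to confirm that every dict returned from `rollout()` carries the required keys before the trainer sees it.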

### `GateResult` / `GateAction`

`skillopt/evaluation/gate.py` — the validation-gate decision types
returned each epoch.

---

## Registering an environment

Environments are not registered via decorators or a `BENCHMARK_REGISTRY`
dict. Instead, the trainer keeps a lazy registry, `_ENV_REGISTRY`, inside
`scripts/train.py`, populated by `_register_builtins()`. To add a new env,
append a `try / except ImportError` block there. See
[Add a New Benchmark](../guide/new-benchmark.md) for the full step-by-step guide.

---

## Backends (model layer)

The model layer lives under `skillopt.model.*`. Backends are selected
via `model.optimizer_backend` and `model.target_backend` in the config,
not by subclassing a base class. Supported values (as of this writing):

| Backend | Optimizer? | Target? |
|---|---|---|
| `openai_chat` | ✓ | ✓ |
| `claude_chat` | ✓ | ✓ |
| `qwen_chat` | ✓ | ✓ |
| `minimax_chat` | ✓ | ✓ |
| `codex_exec` | — | ✓ |
| `claude_code_exec` | — | ✓ |

See `skillopt/model/backend_config.py` for the live whitelist and
[`docs/reference/config.md`](./config.md) for the per-backend
configuration keys.
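
The table above can be turned into a quick sanity check on a loaded config. The sketch below hard-codes the values from the table, so it will go stale if the whitelist in `skillopt/model/backend_config.py` changes; the function name `check_backends` and the nested-dict config shape are assumptions.

```python
from __future__ import annotations

# Values copied from the table above; the live whitelist is in the code.
OPTIMIZER_BACKENDS = {"openai_chat", "claude_chat", "qwen_chat", "minimax_chat"}
TARGET_BACKENDS = OPTIMIZER_BACKENDS | {"codex_exec", "claude_code_exec"}


def check_backends(cfg: dict) -> None:
    """Raise ValueError if the configured backends do not match the table."""
    model = cfg.get("model", {})
    optimizer = model.get("optimizer_backend")
    target = model.get("target_backend")
    if optimizer not in OPTIMIZER_BACKENDS:
        raise ValueError(f"unsupported optimizer backend: {optimizer!r}")
    if target not in TARGET_BACKENDS:
        raise ValueError(f"unsupported target backend: {target!r}")
```

For example, `openai_chat` as the optimizer with `codex_exec` as the target passes, while the reverse pairing is rejected because `codex_exec` is target-only.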