diff --git a/docs/guideline.html b/docs/guideline.html
index 1c0d1d3..4029e6d 100644
--- a/docs/guideline.html
+++ b/docs/guideline.html
@@ -244,18 +244,19 @@ Verify installation
-
3 Data Preparation
- Split directory format
- Item JSON schema
- Split modes
-
-
-
4 Quick Start
+
3 Quick Start
+ Your first demo
  Train a skill
  Evaluate a skill
  Output structure
  Auto-resume
+
+
4 Run on Your Own Data
+ Split directory format
+ Item JSON schema
+ Split modes
+
5 How It Works
The training loop
@@ -374,7 +375,7 @@ skillopt/ # the package
@@ -438,49 +439,44 @@ skillopt/ # the package
python -c "import skillopt; print('SkillOpt ready!')"
- -
-

3.1 Split Directory Format #

-

With env.split_mode: split_dir (the recommended, deterministic mode), SkillOpt reads a directory containing train/, val/, and test/ subfolders, each holding a JSON array of task items:

-
data/my_split/
- ├─ train/items.json   # used for rollout (the "train split")
- ├─ val/items.json     # selection split → validation gate (valid_seen)
- └─ test/items.json    # held-out final eval (valid_unseen)
-
Split naming -

Internally the splits are referred to as train, valid_seen (validation/selection), and valid_unseen (test). The --split flag of eval_only.py uses these names.

-
+ +
+

3.1 Your First Demo #

+

What ships in this repo: ready-to-use configs and pretrained skills (ckpt/) for six benchmarks, plus lightweight ID manifests under data/. The manifests list which examples each split uses but do not contain the example contents, so for most benchmarks you materialize the data once before training (see below).

+

Fastest out-of-the-box run: ALFWorld. Its bundled split (data/alfworld_path_split) is directly usable; you only need the ALFWorld game files:

+
pip install -e ".[alfworld]"
+alfworld-download
+export ALFWORLD_DATA=~/.cache/alfworld   # data root containing json_2.1.1
+
+python scripts/train.py \
+    --config configs/alfworld/default.yaml \
+    --split_dir data/alfworld_path_split \
+    --azure_openai_endpoint https://your-resource.openai.azure.com/ \
+    --optimizer_model gpt-5.5 \
+    --target_model gpt-5.5
+

Other benchmarks (e.g. SearchQA) require a one-time data materialization step: download the raw dataset from the source listed in data/README.md, match the manifest IDs to raw examples (the README documents the lookup key per benchmark), and write the resulting train/val/test item files into a split directory. Then run the commands in §3.2 with --split_dir pointing at it. The required item fields are documented in §4.2.
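The materialization step can be sketched roughly as follows. This is an illustration, not a script shipped with the repo: the function name `materialize_split`, the manifest shape (a dict mapping each split name to a list of IDs), and the default lookup key `"id"` are all assumptions. Check data/README.md for the real manifest format and per-benchmark lookup key.

```python
import json
from pathlib import Path


def materialize_split(manifest_ids, raw_examples, out_dir, key="id"):
    """Write <out_dir>/<split>/items.json for every split in the manifest.

    manifest_ids: dict of split name -> list of example IDs (assumed shape).
    raw_examples: iterable of dicts that each carry the lookup key.
    key:          lookup key joining manifest IDs to raw examples (assumed).
    """
    by_id = {example[key]: example for example in raw_examples}
    out_dir = Path(out_dir)
    for split, ids in manifest_ids.items():
        # Fail loudly rather than silently writing a smaller split.
        missing = [i for i in ids if i not in by_id]
        if missing:
            raise KeyError(
                f"{split}: {len(missing)} manifest IDs not found, e.g. {missing[0]!r}"
            )
        items = [by_id[i] for i in ids]  # keep the manifest's ordering
        split_path = out_dir / split
        split_path.mkdir(parents=True, exist_ok=True)
        (split_path / "items.json").write_text(
            json.dumps(items, indent=2), encoding="utf-8"
        )
    return out_dir
```

Calling it with split names `train`, `val`, and `test` produces the directory layout described in §4.1, which can then be passed to --split_dir.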

+

To sanity-check your setup without training, evaluate a packaged pretrained skill instead (§3.3 uses ckpt/searchqa/gpt5.5_skill.md), or launch the monitoring WebUI (§8.4).

-
-

3.2 Item JSON Schema #

-

Required fields depend on the benchmark; consult skillopt/envs/<benchmark>/dataloader.py for the exact contract. A SearchQA item, for example:

-
[
-  {
-    "id":       "unique_item_id",
-    "question": "Who wrote the novel ...",
-    "context":  "[DOC] relevant passage text ...",
-    "answers":  ["expected answer"]
-  }
-]
-
Datasets not included -

This repository ships no benchmark data. Prepare your own splits in the format above before training.

-
-
- -
-

3.3 Split Modes #

-
- - - - - -
env.split_mode    Behavior
split_dir         Use a pre-built directory with explicit train/val/test folders (set env.split_dir). Deterministic and reproducible.
ratio             Build a deterministic split on the fly from a single env.data_path, using split_seed (and a train:val:test ratio). Convenient for quick experiments.
-
- -
-

4.1 Train a Skill #

+

3.2 Train a Skill #

# Minimal SearchQA run
 python scripts/train.py \
     --config configs/searchqa/default.yaml \
@@ -504,7 +500,7 @@ skillopt/           # the package
     
-

4.2 Evaluate a Skill #

+

3.3 Evaluate a Skill #

Evaluate any skill document (a packaged reference skill, or a trained run's best_skill.md) without training:

# Evaluate the packaged GPT-5.5 SearchQA skill on the test split
 python scripts/eval_only.py \
@@ -525,7 +521,7 @@ skillopt/           # the package
     
-

4.3 Output Structure #

+

3.4 Output Structure #

outputs/<run_name>/
  ├─ config.json          # flattened runtime config
  ├─ history.json         # per-step training history
@@ -538,10 +534,58 @@ skillopt/           # the package
     
-

4.4 Auto-Resume #

+

3.5 Auto-Resume #

Each completed step persists its state to runtime_state.json and a steps/step_XXXX/ directory. Re-running the same command against the same out_root detects finished work and continues from the last completed step — including epoch-boundary slow-update and meta-skill stages.
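To see how far an interrupted run got before re-launching it, the step directories can be inspected by hand. The sketch below is not SkillOpt's resume logic (which also relies on runtime_state.json, whose contents are not documented here). It assumes that `steps/` sits directly under the run's output directory and that step folders are named `step_` followed by digits; `last_completed_step` is a hypothetical helper name.

```python
import re
from pathlib import Path

# Matches directory names such as step_0007 (naming assumed from the text above).
STEP_DIR = re.compile(r"step_(\d+)")


def last_completed_step(run_dir):
    """Return the highest step number found under <run_dir>/steps, or None."""
    steps_dir = Path(run_dir) / "steps"
    if not steps_dir.is_dir():
        return None
    numbers = []
    for entry in steps_dir.iterdir():
        match = STEP_DIR.fullmatch(entry.name)
        if entry.is_dir() and match:
            numbers.append(int(match.group(1)))
    return max(numbers, default=None)
```

A return value of `None` means no step directory exists yet, so the next run would start from the beginning.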

+ +
+

4.1 Split Directory Format #

+

Bringing your own dataset takes three steps: (1) create a split directory with train/, val/, and test/ item files in the format below; (2) make sure each item carries the fields the closest existing benchmark adapter expects (§4.2); (3) point --split_dir at it and train with that benchmark's config. If no existing adapter matches your task shape (different rollout or scoring logic), write a new benchmark adapter instead; see §7.2.

+ +

With env.split_mode: split_dir (the recommended, deterministic mode), SkillOpt reads a directory containing train/, val/, and test/ subfolders, each holding a JSON array of task items:

+
data/my_split/
+ ├─ train/items.json   # used for rollout (the "train split")
+ ├─ val/items.json     # selection split → validation gate (valid_seen)
+ └─ test/items.json    # held-out final eval (valid_unseen)
+
Split naming +

Internally the splits are referred to as train, valid_seen (validation/selection), and valid_unseen (test). The --split flag of eval_only.py uses these names.
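Before training, it can help to confirm that a split directory has the layout shown above. The following is a minimal sketch, not part of the SkillOpt package: `check_split_dir` and `SPLIT_NAMES` are hypothetical names, and the `items.json` filename is taken from the directory listing above.

```python
import json
from pathlib import Path

# Folder name on disk -> internal split name, per the "Split naming" note.
SPLIT_NAMES = {"train": "train", "val": "valid_seen", "test": "valid_unseen"}


def check_split_dir(split_dir):
    """Return {internal_split_name: item_count}; raise if the layout is off."""
    counts = {}
    for folder, internal in SPLIT_NAMES.items():
        path = Path(split_dir) / folder / "items.json"
        if not path.is_file():
            raise FileNotFoundError(f"missing {path}")
        items = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(items, list):
            raise ValueError(f"{path} must contain a JSON array of task items")
        counts[internal] = len(items)
    return counts
```

The returned keys use the internal names, which are the same names the --split flag of eval_only.py expects.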

+
+
+ +
+

4.2 Item JSON Schema #

+

Required fields depend on the benchmark; consult skillopt/envs/<benchmark>/dataloader.py for the exact contract. A SearchQA item, for example:

+
[
+  {
+    "id":       "unique_item_id",
+    "question": "Who wrote the novel ...",
+    "context":  "[DOC] relevant passage text ...",
+    "answers":  ["expected answer"]
+  }
+]
+
Datasets not included +

This repository ships no benchmark example contents, only the ID manifests under data/ (see §3.1). Prepare your own splits in the format above before training.
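A quick pre-flight check on SearchQA-style items might look like the sketch below. The field names come from the example item above; the expected types and the uniqueness check on `id` are assumptions inferred from that example, and `find_item_errors` is a hypothetical helper. The authoritative contract remains skillopt/envs/<benchmark>/dataloader.py.

```python
# Field -> expected Python type, inferred from the SearchQA example item (assumed).
REQUIRED_FIELDS = {"id": str, "question": str, "context": str, "answers": list}


def find_item_errors(items):
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    seen_ids = set()
    for index, item in enumerate(items):
        for field, expected in REQUIRED_FIELDS.items():
            if field not in item:
                errors.append(f"item {index}: missing '{field}'")
            elif not isinstance(item[field], expected):
                errors.append(f"item {index}: '{field}' should be {expected.__name__}")
        item_id = item.get("id")
        if isinstance(item_id, str):
            # Uniqueness is an assumption suggested by the "unique_item_id" placeholder.
            if item_id in seen_ids:
                errors.append(f"item {index}: duplicate id '{item_id}'")
            seen_ids.add(item_id)
    return errors
```

Running it over each of the three item files before training surfaces missing fields early, instead of partway through a rollout.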

+
+
+ +
+

4.3 Split Modes #

+
+ + + + + +
env.split_mode    Behavior
split_dir         Use a pre-built directory with explicit train/val/test folders (set env.split_dir). Deterministic and reproducible.
ratio             Build a deterministic split on the fly from a single env.data_path, using split_seed (and a train:val:test ratio). Convenient for quick experiments.
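For intuition, a seeded ratio split can be sketched as below. This is a generic illustration of the idea, not SkillOpt's implementation: the function name `ratio_split` and the 8:1:1 default ratio are assumptions, and only the default seed of 42 matches the documented split_seed default.

```python
import random


def ratio_split(items, ratio=(8, 1, 1), split_seed=42):
    """Shuffle with a fixed seed, then slice into train/val/test by ratio."""
    order = list(items)  # copy so the caller's list is left untouched
    random.Random(split_seed).shuffle(order)
    total = sum(ratio)
    n_train = len(order) * ratio[0] // total
    n_val = len(order) * ratio[1] // total
    return {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],  # remainder goes to test
    }
```

Because the shuffle uses its own seeded generator, the same input and seed give the same three splits on every call, which is the property the ratio mode depends on.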
+
+

5.1 The Training Loop #

@@ -749,7 +793,7 @@ skillopt/ # the package
 name        str  ""     Benchmark name (searchqa, docvqa, alfworld, …). Selects the env module.
 skill_init  str  ""     Path to a seed skill (empty = start from scratch).
-split_mode  str  ratio  ratio or split_dir (see §3.3).
+split_mode  str  ratio  ratio or split_dir (see §4.3).
 split_dir   str  ""     Pre-split directory (when split_mode = split_dir).
 data_path   str  ""     Single dataset path (when split_mode = ratio).
 split_seed  int  42     Seed for deterministic ratio splitting.