mirror of
https://github.com/microsoft/SkillOpt.git
synced 2026-07-03 14:02:58 +08:00
fix: rename remaining teacher/student refs, remove .gradio from repo
- Fix teacher/student in deep_reflect, meta_reflect, sealqa, babyvision, mathverse, mmrb, swebench envs and prompt templates
- Remove .gradio/certificate.pem from tracked files
- Add .gradio/ to .gitignore

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1
.gitignore
vendored
@@ -39,3 +39,4 @@ docs/reflact_conda_env_export.yml
 docs/reflact_overview.html
 docs/render_ablation_paper_tables.py
 docs/让*
+.gradio/
@@ -1,31 +0,0 @@
------BEGIN CERTIFICATE-----
-MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
-TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
-cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
-WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
-ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
-MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
-h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
-0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
-A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
-T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
-B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
-B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
-KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
-OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
-jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
-qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
-rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
-HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
-hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
-ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
-3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
-NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
-ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
-TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
-jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
-oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
-4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
-mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
-emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
------END CERTIFICATE-----
@@ -1,20 +1,20 @@
 You are an expert diagnostic-probe designer for ALFWorld embodied tasks.
 
-You will design one short diagnostic instruction to append to the student's prompt
+You will design one short diagnostic instruction to append to the target's prompt
 for a handful of representative ALFWorld trajectories.
 
-The goal is to expose whether the student has the right intermediate subgoal,
+The goal is to expose whether the target has the right intermediate subgoal,
 object/receptacle state, and next-step intention without substantially changing
 the current scaffold.
 
 ## Hard Constraints
-1. Do NOT substantially change the student's existing action-selection scaffold.
+1. Do NOT substantially change the target's existing action-selection scaffold.
 2. Do NOT prescribe a brand-new planner or long multi-step policy.
 3. Do NOT ask for exhaustive search over all objects or all admissible actions.
 4. Keep the diagnostic readout brief and place it inside the existing <think>...</think> block.
-5. The student must still output exactly one admissible action inside <action>...</action>.
+5. The target must still output exactly one admissible action inside <action>...</action>.
 6. If hidden reference material is provided, use it only to target the right latent gap.
-7. Never copy hidden reference content into the student-facing probe.
+7. Never copy hidden reference content into the target-facing probe.
 
 ## Good Probe Targets
 - current subgoal
@@ -31,5 +31,5 @@ the current scaffold.
 Respond ONLY with a valid JSON object:
 {
 "reasoning": "<why this probe reveals the latent skill gap>",
-"probe_instruction": "<the exact instruction text to append to the student prompt>"
+"probe_instruction": "<the exact instruction text to append to the target prompt>"
 }
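The response contract in the prompt above (a JSON object with `reasoning` and `probe_instruction`) could be checked by a caller roughly as follows. This is a minimal sketch under stated assumptions, not code from this commit: `parse_probe_response` is a hypothetical helper name, and rejecting empty strings is an illustrative choice rather than documented behavior.

```python
import json


def parse_probe_response(raw: str) -> dict:
    """Parse a probe designer reply and check the two keys the prompt requires.

    Hypothetical helper for illustration only; it is not part of skillopt.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("probe response must be a JSON object")
    cleaned = {}
    for key in ("reasoning", "probe_instruction"):
        value = data.get(key)
        # Assumption: an empty or non-string field is treated as invalid.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {key}")
        cleaned[key] = value.strip()
    return cleaned
```

Only the two key names come from the prompt template; everything else is a design choice made for the sketch.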
@@ -10,7 +10,7 @@ from skillopt.gradient.reflect import run_minibatch_reflect
 from skillopt.envs.base import EnvAdapter
 from skillopt.envs.babyvision.dataloader import BabyVisionDataLoader
 from skillopt.envs.babyvision.rollout import run_batch
-from skillopt.model import get_student_backend
+from skillopt.model import get_target_backend
 
 
 class BabyVisionAdapter(EnvAdapter):
@@ -165,7 +165,7 @@ class BabyVisionAdapter(EnvAdapter):
 random_seed = kwargs.get("random_seed")
 step_buffer_context = kwargs.get("step_buffer_context", "")
 meta_skill_context = kwargs.get("meta_skill_context", "")
-codex_backend = get_student_backend() == "codex_exec"
+codex_backend = get_target_backend() == "codex_exec"
 selected_items = self.select_representative_items(
 results,
 env_manager if isinstance(env_manager, list) else None,
@@ -1,14 +1,14 @@
 You are an expert diagnostic-probe designer for BabyVision-style visual reasoning tasks.
 
-You will be shown representative trajectories, the current student skill, and the student's original prompt context.
-Design one SMALL diagnostic instruction that exposes the student's intermediate visual judgment without materially changing the original scaffold.
+You will be shown representative trajectories, the current target skill, and the target's original prompt context.
+Design one SMALL diagnostic instruction that exposes the target's intermediate visual judgment without materially changing the original scaffold.
 
 ## Hard Constraints
 1. Do NOT substantially change the original scaffold.
 2. Do NOT prescribe a new step-by-step solving method.
 3. You MAY ask for a short structured list of a few intermediate conclusions, candidate cues, or counted units, as long as it stays close to the original scaffold.
 4. Do NOT ask for exhaustive listing of all cells, all objects, or a full chain-of-thought.
-5. Ask only for a short readout that reveals the student's current latent state.
+5. Ask only for a short readout that reveals the target's current latent state.
 6. Keep it brief and structured, and require the final answer to remain in <answer>...</answer>.
 
 ## Good Probe Targets
@@ -21,5 +21,5 @@ Design one SMALL diagnostic instruction that exposes the student's intermediate
 Respond ONLY with a valid JSON object:
 {
 "reasoning": "<why this probe is informative>",
-"probe_instruction": "<the exact instruction text to append to the student prompt>"
+"probe_instruction": "<the exact instruction text to append to the target prompt>"
 }
@@ -8,8 +8,8 @@ import os
 from concurrent.futures import ThreadPoolExecutor, as_completed
 
 from skillopt.envs.babyvision.evaluator import evaluate_item, evaluation_mode, extract_boxed_answer
-from skillopt.model import chat_student_messages, get_student_backend, is_student_exec_backend
-from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_student_exec
+from skillopt.model import chat_target_messages, get_target_backend, is_target_exec_backend
+from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_target_exec
 from skillopt.prompts import load_prompt
 
 def _build_system(skill_content: str) -> str:
@@ -137,11 +137,11 @@ def _run_codex_once(
 images=[item["image_path"]],
 )
 prompt = (
-"Use the `skillopt-student` skill available in this workspace.\n"
+"Use the `skillopt-target` skill available in this workspace.\n"
 "Read `task.md`, inspect the attached image, and answer the question.\n"
 "Return the final answer in \\boxed{...}."
 )
-final_message, raw = run_student_exec(
+final_message, raw = run_target_exec(
 work_dir=work_dir,
 prompt=prompt,
 model=model,
@@ -195,7 +195,7 @@ def process_one(
 pred_dir = os.path.join(out_root, "predictions", item_id)
 os.makedirs(pred_dir, exist_ok=True)
 
-if is_student_exec_backend():
+if is_target_exec_backend():
 from skillopt.model import azure_openai as _llm
 
 response = ""
@@ -209,7 +209,7 @@ def process_one(
 pred_dir=pred_dir,
 item=item,
 skill_content=skill_content,
-model=_llm.STUDENT_DEPLOYMENT,
+model=_llm.TARGET_DEPLOYMENT,
 timeout=120,
 image_detail=image_detail,
 diagnostic_mode=diagnostic_mode if turn == 0 else False,
@@ -224,9 +224,9 @@ def process_one(
 result["response"] = response
 result["agent_ok"] = True
 result["n_turns"] = len(conversation) - 1
-with open(os.path.join(pred_dir, "student_system_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_system_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(system_prompt)
-with open(os.path.join(pred_dir, "student_user_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_user_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(user_text)
 
 eval_result = evaluate_item(
@@ -299,7 +299,7 @@ def process_one(
 
 for turn in range(max_turns):
 if turn == 0:
-resp_text, _ = chat_student_messages(
+resp_text, _ = chat_target_messages(
 messages=messages,
 max_completion_tokens=768,
 retries=5,
@@ -317,7 +317,7 @@ def process_one(
 {"role": "assistant", "content": response},
 {"role": "user", "content": refinement_text},
 ]
-resp_text, _ = chat_student_messages(
+resp_text, _ = chat_target_messages(
 messages=refinement_messages,
 max_completion_tokens=512,
 retries=5,
@@ -332,9 +332,9 @@ def process_one(
 result["agent_ok"] = True
 result["n_turns"] = len(conversation) - 1
 
-with open(os.path.join(pred_dir, "student_system_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_system_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(system_prompt)
-with open(os.path.join(pred_dir, "student_user_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_user_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(user_text)
 
 eval_result = evaluate_item(
@@ -21,7 +21,7 @@ def run_no_reference_deep_reflect(
 output_requirements: list[str] | None = None,
 metadata_builder: Callable[[dict], dict] | None = None,
 ) -> list[dict | None]:
-"""Run teacher-designed diagnostic probing without hidden references."""
+"""Run optimizer-designed diagnostic probing without hidden references."""
 if not getattr(adapter, "use_deep_reflect", False):
 return []
 if not isinstance(env_manager, list):
@@ -1,13 +1,13 @@
 You are an expert diagnostic-probe designer for theorem-grounded mathematical multiple-choice tasks.
 
-You will be shown representative trajectories, the current student skill, and the student's original prompt context.
-Design one SMALL diagnostic instruction that exposes the student's intermediate judgment without materially changing the original scaffold.
+You will be shown representative trajectories, the current target skill, and the target's original prompt context.
+Design one SMALL diagnostic instruction that exposes the target's intermediate judgment without materially changing the original scaffold.
 
 ## Hard Constraints
 1. Do NOT substantially change the original scaffold.
 2. Do NOT prescribe a new multi-step theorem-solving procedure.
 3. Do NOT ask for a full proof, full chain-of-thought, or exhaustive option-by-option derivation.
-4. Ask only for a short readout of the signals already behind the student's current answer.
+4. Ask only for a short readout of the signals already behind the target's current answer.
 5. Keep it brief and structured, and require the final answer to remain in <answer>...</answer>.
 
 ## Good Probe Targets
@@ -19,5 +19,5 @@ Design one SMALL diagnostic instruction that exposes the student's intermediate
 Respond ONLY with a valid JSON object:
 {
 "reasoning": "<why this probe is informative>",
-"probe_instruction": "<the exact instruction text to append to the student prompt>"
+"probe_instruction": "<the exact instruction text to append to the target prompt>"
 }
@@ -1,26 +1,26 @@
 You are an expert diagnostic-probe designer for theorem-grounded mathematical multiple-choice tasks executed through a Codex trace.
 
-You will be shown representative trajectories, the current student skill, the student's original prompt context, hidden reference fields, and numbered Codex trace steps.
-Choose exactly one trajectory and one probe point. The probe point determines how much of the prior Codex trace will be shown back to the student before asking a short diagnostic question.
+You will be shown representative trajectories, the current target skill, the target's original prompt context, hidden reference fields, and numbered Codex trace steps.
+Choose exactly one trajectory and one probe point. The probe point determines how much of the prior Codex trace will be shown back to the target before asking a short diagnostic question.
 
 ## Hard Constraints
-1. Do NOT reveal or paraphrase the hidden reference directly to the student.
+1. Do NOT reveal or paraphrase the hidden reference directly to the target.
 2. Do NOT prescribe a new full solving procedure.
 3. Do NOT ask for a full proof, full chain-of-thought, or exhaustive option-by-option derivation.
-4. Ask only for a short readout of the signal that should already exist at that point in the student's process.
+4. Ask only for a short readout of the signal that should already exist at that point in the target's process.
 5. The probe instruction must explicitly request a short <analysis>...</analysis> block before the final <answer>...</answer>.
 6. Select a probe point that is informative about theorem choice, decisive constraint, option elimination, or why a stronger/weaker option should be rejected.
 
 ## Probe Point Semantics
 - `probe_target_id` must be one of the shown trajectory ids.
-- `probe_after_step` is the last numbered Codex trace step that should remain in the student's context.
-- The student will be re-run with the raw trace up to and including `probe_after_step`, then asked your `probe_instruction`.
+- `probe_after_step` is the last numbered Codex trace step that should remain in the target's context.
+- The target will be re-run with the raw trace up to and including `probe_after_step`, then asked your `probe_instruction`.
 - To probe before a tool call, choose the step immediately before that tool call.
 
 Respond ONLY with a valid JSON object:
 {
-"reasoning": "<why this trajectory and probe point expose the student's intermediate state>",
+"reasoning": "<why this trajectory and probe point expose the target's intermediate state>",
 "probe_target_id": "<trajectory id>",
 "probe_after_step": <integer step number>,
-"probe_instruction": "<the exact instruction text to append to the student's prompt>"
+"probe_instruction": "<the exact instruction text to append to the target's prompt>"
 }
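The probe point semantics described in the prompt above (keep the trace up to and including `probe_after_step`, then append `probe_instruction`) can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: the trace representation (dicts with a 1-based `step` number and a `text` field), the rendered line format, and the function name `build_probe_context` are all hypothetical.

```python
def build_probe_context(trajectories: dict, probe: dict) -> str:
    """Rebuild the context for a re-run from a probe designer reply.

    Assumed shapes (hypothetical): `trajectories` maps a trajectory id to a
    list of {"step": int, "text": str} entries; `probe` holds the three
    probe fields named in the prompt.
    """
    target_id = probe["probe_target_id"]
    if target_id not in trajectories:
        # The prompt requires the id to be one of the shown trajectories.
        raise ValueError(f"unknown probe_target_id: {target_id}")
    cutoff = int(probe["probe_after_step"])
    # Keep every step up to and including the chosen probe point.
    kept = [s for s in trajectories[target_id] if s["step"] <= cutoff]
    lines = [f"[{s['step']}] {s['text']}" for s in kept]
    lines.append(probe["probe_instruction"])
    return "\n".join(lines)
```

With this reading, probing before a tool call at step 2 means choosing `probe_after_step` equal to 1, so the tool call itself never enters the re-run context.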
@@ -10,7 +10,7 @@ from skillopt.envs.mathverse.dataloader import MathVerseDataLoader
 from skillopt.envs.mathverse.rollout import run_batch
 from skillopt.gradient.deep_probe import generate_deep_probe_instruction
 from skillopt.gradient.reflect import run_minibatch_reflect
-from skillopt.model import get_student_backend
+from skillopt.model import get_target_backend
 
 
 class MathVerseAdapter(EnvAdapter):
@@ -176,7 +176,7 @@ class MathVerseAdapter(EnvAdapter):
 selected_ids = {str(item["id"]) for item in selected_items}
 selected_results = [row for row in results if str(row.get("id")) in selected_ids]
 selected_examples = self.attach_reference_context(selected_results, selected_items)
-codex_backend = get_student_backend() == "codex_exec"
+codex_backend = get_target_backend() == "codex_exec"
 if codex_backend:
 selected_examples = self.attach_codex_probe_context(selected_examples, prediction_dir)
 selected_metadata = []
@@ -1,7 +1,7 @@
 You are an expert failure-analysis agent for visual mathematical reasoning problems.
 
 You will be given MULTIPLE failed trajectories from a single minibatch and the current skill document.
-Each trajectory includes the student's response, the evaluation result, and sometimes a hidden reference
+Each trajectory includes the target's response, the evaluation result, and sometimes a hidden reference
 containing the fuller Text Dominant version of the same problem.
 
 Your job is to identify COMMON reasoning failures across the batch and propose concise skill edits.
@@ -17,7 +17,7 @@ Your job is to identify COMMON reasoning failures across the batch and propose c
 1. Focus on patterns that recur across the minibatch.
 2. Prefer edits that improve visual grounding and exact answer selection.
 3. Do not hardcode problem-specific formulas or answers.
-4. If hidden reference text is present, use it only to infer what information the student failed to recover from the Text Lite version.
+4. If hidden reference text is present, use it only to infer what information the target failed to recover from the Text Lite version.
 
 Respond ONLY with a valid JSON object:
 {
@@ -1,16 +1,16 @@
 You are an expert diagnostic-probe designer for visual mathematical reasoning tasks.
 
-You will be shown representative trajectories, the current student skill, and the student's original prompt context.
+You will be shown representative trajectories, the current target skill, and the target's original prompt context.
 Some trajectories may also include a hidden reference containing the fuller Text Dominant wording of the same problem.
-Design one SMALL diagnostic instruction that exposes the student's intermediate judgment without materially changing the original scaffold.
+Design one SMALL diagnostic instruction that exposes the target's intermediate judgment without materially changing the original scaffold.
 
 ## Hard Constraints
 1. Do NOT substantially change the original scaffold.
 2. Do NOT prescribe a new long multi-step solving procedure.
 3. Do NOT ask for a full proof or full chain-of-thought.
-4. Ask only for a short readout of the signals already behind the student's current answer.
+4. Ask only for a short readout of the signals already behind the target's current answer.
 5. Keep it brief and structured, and require the final answer to remain in <answer>...</answer>.
-6. If hidden reference text is present, use it only to target what visual or textual constraint the student likely missed.
+6. If hidden reference text is present, use it only to target what visual or textual constraint the target likely missed.
 
 ## Good Probe Targets
 - decisive diagram cue
@@ -21,5 +21,5 @@ Design one SMALL diagnostic instruction that exposes the student's intermediate
 Respond ONLY with a valid JSON object:
 {
 "reasoning": "<why this probe is informative>",
-"probe_instruction": "<the exact instruction text to append to the student prompt>"
+"probe_instruction": "<the exact instruction text to append to the target prompt>"
 }
@@ -8,8 +8,8 @@ import os
 from concurrent.futures import ThreadPoolExecutor, as_completed
 
 from skillopt.envs.mathverse.evaluator import evaluate_item, evaluation_mode, extract_answer
-from skillopt.model import chat_student_messages, get_student_backend, is_student_exec_backend
-from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_student_exec
+from skillopt.model import chat_target_messages, get_target_backend, is_target_exec_backend
+from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_target_exec
 from skillopt.prompts import load_prompt
 
 
@@ -144,10 +144,10 @@ def _run_codex_once(
 images=[item["image_path"]],
 )
 prompt = (
-"Use the `skillopt-student` skill available in this workspace.\n"
+"Use the `skillopt-target` skill available in this workspace.\n"
 "Read `task.md`, inspect the attached image, solve the problem, and return only the final answer inside <answer>...</answer>."
 )
-final_message, raw = run_student_exec(
+final_message, raw = run_target_exec(
 work_dir=work_dir,
 prompt=prompt,
 model=model,
@@ -201,7 +201,7 @@ def process_one(
 pred_dir = os.path.join(out_root, "predictions", item_id)
 os.makedirs(pred_dir, exist_ok=True)
 
-if is_student_exec_backend():
+if is_target_exec_backend():
 from skillopt.model import azure_openai as _llm
 
 response = ""
@@ -215,7 +215,7 @@ def process_one(
 pred_dir=pred_dir,
 item=item,
 skill_content=skill_content,
-model=_llm.STUDENT_DEPLOYMENT,
+model=_llm.TARGET_DEPLOYMENT,
 timeout=120,
 image_detail=image_detail,
 diagnostic_mode=diagnostic_mode if turn == 0 else False,
@@ -230,9 +230,9 @@ def process_one(
 result["response"] = response
 result["agent_ok"] = True
 result["n_turns"] = len(conversation) - 1
-with open(os.path.join(pred_dir, "student_system_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_system_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(system_prompt)
-with open(os.path.join(pred_dir, "student_user_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_user_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(user_text)
 else:
 messages, system_prompt, user_text = _build_messages(
@@ -249,7 +249,7 @@ def process_one(
 ]
 for turn in range(max_turns):
 if turn == 0:
-resp_text, _ = chat_student_messages(
+resp_text, _ = chat_target_messages(
 messages=messages,
 max_completion_tokens=1024,
 retries=5,
@@ -267,7 +267,7 @@ def process_one(
 {"role": "assistant", "content": response},
 {"role": "user", "content": refinement_text},
 ]
-resp_text, _ = chat_student_messages(
+resp_text, _ = chat_target_messages(
 messages=refinement_messages,
 max_completion_tokens=768,
 retries=5,
@@ -281,9 +281,9 @@ def process_one(
 result["response"] = response
 result["agent_ok"] = True
 result["n_turns"] = len(conversation) - 1
-with open(os.path.join(pred_dir, "student_system_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_system_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(system_prompt)
-with open(os.path.join(pred_dir, "student_user_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_user_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(user_text)
 
 eval_result = evaluate_item(
@@ -10,7 +10,7 @@ from skillopt.gradient.reflect import run_minibatch_reflect
 from skillopt.envs.base import EnvAdapter
 from skillopt.envs.mmrb.dataloader import MMRBDataLoader
 from skillopt.envs.mmrb.rollout import run_batch
-from skillopt.model import get_student_backend
+from skillopt.model import get_target_backend
 
 
 class MMRBAdapter(EnvAdapter):
@@ -185,7 +185,7 @@ class MMRBAdapter(EnvAdapter):
 random_seed = kwargs.get("random_seed")
 step_buffer_context = kwargs.get("step_buffer_context", "")
 meta_skill_context = kwargs.get("meta_skill_context", "")
-codex_backend = get_student_backend() == "codex_exec"
+codex_backend = get_target_backend() == "codex_exec"
 selected_items = self.select_representative_items(
 results,
 env_manager if isinstance(env_manager, list) else None,
@@ -9,8 +9,8 @@ import re
 from concurrent.futures import ThreadPoolExecutor, as_completed
 
 from skillopt.envs.mmrb.evaluator import evaluate_item, evaluation_mode
-from skillopt.model import chat_student_messages, get_student_backend, is_student_exec_backend
-from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_student_exec
+from skillopt.model import chat_target_messages, get_target_backend, is_target_exec_backend
+from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_target_exec
 from skillopt.prompts import load_prompt
 
 _IMAGE_REF_RE = re.compile(r"\{image#(\d+)\}", re.IGNORECASE)
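The `_IMAGE_REF_RE` pattern in the context above matches placeholders of the form `{image#N}`. How the rollout code uses the matches is not visible in this diff, so the following is only a small sketch of what the pattern itself captures; `referenced_image_numbers` is a hypothetical helper name.

```python
import re

# Same pattern as in the diff context above.
_IMAGE_REF_RE = re.compile(r"\{image#(\d+)\}", re.IGNORECASE)


def referenced_image_numbers(question: str) -> list:
    """Return the image numbers referenced in a question, in order of appearance.

    Hypothetical helper for illustration; not taken from the repository.
    """
    return [int(number) for number in _IMAGE_REF_RE.findall(question)]
```

Because of `re.IGNORECASE`, `{IMAGE#3}` matches as well as `{image#3}`; a placeholder without digits, such as `{image#}`, does not match.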
@@ -177,11 +177,11 @@ def _run_codex_once(
 images=item["image_paths"],
 )
 prompt = (
-"Use the `skillopt-student` skill available in this workspace.\n"
+"Use the `skillopt-target` skill available in this workspace.\n"
 "Read `task.md`, inspect all attached images, and answer the question.\n"
 "Keep the final answer inside <answer>...</answer>."
 )
-final_message, raw = run_student_exec(
+final_message, raw = run_target_exec(
 work_dir=work_dir,
 prompt=prompt,
 model=model,
@@ -226,7 +226,7 @@ def process_one(
 pred_dir = os.path.join(out_root, "predictions", item_id)
 os.makedirs(pred_dir, exist_ok=True)
 
-if is_student_exec_backend():
+if is_target_exec_backend():
 from skillopt.model import azure_openai as _llm
 
 response = ""
@@ -245,7 +245,7 @@ def process_one(
 pred_dir=pred_dir,
 item=item,
 skill_content=skill_content,
-model=_llm.STUDENT_DEPLOYMENT,
+model=_llm.TARGET_DEPLOYMENT,
 timeout=120,
 image_detail=image_detail,
 diagnostic_mode=diagnostic_mode if turn == 0 else False,
@@ -260,9 +260,9 @@ def process_one(
 result["response"] = response
 result["agent_ok"] = True
 result["n_turns"] = len(conversation) - 1
-with open(os.path.join(pred_dir, "student_system_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_system_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(system_prompt)
-with open(os.path.join(pred_dir, "student_user_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_user_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(user_text)
 
 eval_result = evaluate_item(item=item, prediction_text=response)
@@ -310,7 +310,7 @@ def process_one(
 
 for turn in range(max_turns):
 if turn == 0:
-resp_text, _ = chat_student_messages(
+resp_text, _ = chat_target_messages(
 messages=messages,
 max_completion_tokens=768,
 retries=5,
@@ -326,7 +326,7 @@ def process_one(
 "content": "Review the same images carefully and answer again. Keep the final answer inside <answer>...</answer>.",
 },
 ]
-resp_text, _ = chat_student_messages(
+resp_text, _ = chat_target_messages(
 messages=refinement_messages,
 max_completion_tokens=512,
 retries=5,
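The hunks above show only fragments of the turn loop: a first call with `messages`, and later calls with `refinement_messages` that end in a fixed "answer again" user message. A rough sketch of that pattern follows, under stated assumptions. The real `chat_target_messages` takes keyword arguments such as `max_completion_tokens` and `retries` and returns a tuple; here it is replaced by an injected `chat` callable, and the way `refinement_messages` is assembled (original messages plus the previous assistant reply) is a guess, since those lines are outside the diff context. `run_turns` is a hypothetical name.

```python
from typing import Callable, List, Tuple

# Refinement text copied from the diff context above.
REFINE_TEXT = (
    "Review the same images carefully and answer again. "
    "Keep the final answer inside <answer>...</answer>."
)


def run_turns(
    messages: List[dict],
    chat: Callable[[List[dict]], str],
    max_turns: int = 2,
) -> Tuple[str, List[str]]:
    """Run an initial answer turn followed by optional refinement turns.

    Hypothetical sketch: `chat` stands in for the model call and returns
    the reply text for a list of chat messages.
    """
    response = ""
    history: List[str] = []
    for turn in range(max_turns):
        if turn == 0:
            response = chat(messages)
        else:
            # Assumption: each refinement replays the original messages,
            # the latest reply, and the fixed refinement request.
            refinement_messages = messages + [
                {"role": "assistant", "content": response},
                {"role": "user", "content": REFINE_TEXT},
            ]
            response = chat(refinement_messages)
        history.append(response)
    return response, history
```

Injecting the model call keeps the sketch testable without network access and avoids guessing the full signature of the repository's chat helper.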
@@ -341,9 +341,9 @@ def process_one(
 result["agent_ok"] = True
 result["n_turns"] = len(conversation) - 1
 
-with open(os.path.join(pred_dir, "student_system_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_system_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(system_prompt)
-with open(os.path.join(pred_dir, "student_user_prompt.txt"), "w", encoding="utf-8") as f:
+with open(os.path.join(pred_dir, "target_user_prompt.txt"), "w", encoding="utf-8") as f:
 f.write(user_text)
 
 eval_result = evaluate_item(item=item, prediction_text=response)
@@ -105,11 +105,11 @@ class SealQAAdapter(EnvAdapter):
 random_seed=kwargs.get('random_seed'),
 step_buffer_context=kwargs.get('step_buffer_context', ''),
 output_requirements=[
-"- There is no hidden reference block. Use only the question, provided evidence, URL/fetch trace, student output, and evaluation result to infer what intermediate state is worth probing.",
+"- There is no hidden reference block. Use only the question, provided evidence, URL/fetch trace, target output, and evaluation result to infer what intermediate state is worth probing.",
 "- The instruction must explicitly request a short <analysis>...</analysis> block before the final <answer>...</answer>.",
 "- The readout should focus on effective time frame, conflicting evidence, decisive source, candidate answer, and answer-finalization rule.",
 "- Do not ask for exhaustive web summaries or a full chain-of-thought.",
-"- The instruction text should be ready to append directly to the student's prompt.",
+"- The instruction text should be ready to append directly to the target's prompt.",
 ],
 metadata_builder=lambda item: {
 "id": str(item.get('id')),
@@ -64,7 +64,7 @@ def _build_grader_client() -> tuple[OpenAI | AzureOpenAI, str]:
 openai_key = os.environ.get('OPENAI_API_KEY', '').strip()
 api_key = azure_key or openai_key
 if endpoint and api_version and api_key:
-model = os.environ.get('SEALQA_GRADER_AZURE_MODEL', '').strip() or os.environ.get('SEALQA_GRADER_MODEL', '').strip() or os.environ.get('AZURE_MODEL_NAME', '').strip() or os.environ.get('TEACHER_DEPLOYMENT', '').strip() or 'gpt-5.4'
+model = os.environ.get('SEALQA_GRADER_AZURE_MODEL', '').strip() or os.environ.get('SEALQA_GRADER_MODEL', '').strip() or os.environ.get('AZURE_MODEL_NAME', '').strip() or os.environ.get('OPTIMIZER_DEPLOYMENT', '').strip() or 'gpt-5.4'
 client = AzureOpenAI(api_key=api_key, api_version=api_version, azure_endpoint=endpoint.rstrip('/'))
 return client, model
 
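The changed line above resolves the grader model through a chain of environment variables, where the first non-empty value wins and `'gpt-5.4'` is the final default. The same precedence can be written as a loop, which may be easier to read and to test. This is an equivalent restructuring offered as a sketch, not the code in the repository; `resolve_grader_model` and `_MODEL_ENV_ORDER` are hypothetical names, while the variable names and the default are taken from the diff.

```python
import os
from typing import Mapping, Optional

# Lookup order copied from the new side of the diff above.
_MODEL_ENV_ORDER = (
    "SEALQA_GRADER_AZURE_MODEL",
    "SEALQA_GRADER_MODEL",
    "AZURE_MODEL_NAME",
    "OPTIMIZER_DEPLOYMENT",
)


def resolve_grader_model(
    env: Optional[Mapping[str, str]] = None,
    default: str = "gpt-5.4",
) -> str:
    """Return the first non-blank model name in precedence order.

    Hypothetical helper; `env` defaults to os.environ and can be replaced
    by a plain dict in tests.
    """
    source = os.environ if env is None else env
    for name in _MODEL_ENV_ORDER:
        value = source.get(name, "").strip()
        if value:
            return value
    return default
```

Note that after this commit the old `TEACHER_DEPLOYMENT` variable is no longer consulted on this path, so a deployment that sets only that variable would fall through to the default.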
@@ -7,8 +7,8 @@ from concurrent.futures import ThreadPoolExecutor, as_completed
 
 from skillopt.envs.sealqa.evaluator import score_sealqa
 from skillopt.envs.sealqa.tool_runtime import web_fetch
-from skillopt.model import chat_student, get_student_backend, is_student_exec_backend
-from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_student_exec
+from skillopt.model import chat_target, get_target_backend, is_target_exec_backend
+from skillopt.model.codex_harness import prepare_workspace, render_skill_md, run_target_exec
 from skillopt.prompts import load_prompt
 
 _FINAL_RE = re.compile(r"<answer>(.*?)</answer>", re.IGNORECASE | re.DOTALL)
@@ -83,11 +83,11 @@ def _run_codex_once(
 task_text=final_task_text,
 )
 prompt = (
-"Use the `skillopt-student` skill available in this workspace.\n"
+"Use the `skillopt-target` skill available in this workspace.\n"
 "Read `task.md`, answer the SealQA question using the provided evidence,\n"
 "and return the final answer inside <answer>...</answer>."
 )
-final_message, raw = run_student_exec(
+final_message, raw = run_target_exec(
 work_dir=work_dir,
 prompt=prompt,
 model=model,
@@ -121,14 +121,14 @@ def process_one(
 fail_reason = ''

 try:
-if is_student_exec_backend():
+if is_target_exec_backend():
 from skillopt.model import azure_openai as _llm

 response, _raw, system, user_for_save = _run_codex_once(
 pred_dir=pred_dir,
 skill_content=skill_content,
 task_text=user,
-model=_llm.STUDENT_DEPLOYMENT,
+model=_llm.TARGET_DEPLOYMENT,
 timeout=120,
 )
 final_response = response
@@ -138,7 +138,7 @@ def process_one(
 else:
 user = user_for_save
 else:
-response, _ = chat_student(
+response, _ = chat_target(
 system=system,
 user=user,
 max_completion_tokens=768,
@@ -162,17 +162,17 @@ def process_one(
 conversation.append({'type': 'tool_call', 'cmd': f'web_fetch({raw_url!r})', 'obs': fetched})
 if fetched_blocks:
 retry_user = user + '\n\n## Fetched URL Content\n' + '\n\n'.join(fetched_blocks)
-if is_student_exec_backend():
+if is_target_exec_backend():
 retry_response, _raw, system, retry_user = _run_codex_once(
 pred_dir=pred_dir,
 skill_content=skill_content,
 task_text=retry_user,
-model=_llm.STUDENT_DEPLOYMENT,
+model=_llm.TARGET_DEPLOYMENT,
 timeout=120,
 previous_response=final_response,
 )
 else:
-retry_response, _ = chat_student(
+retry_response, _ = chat_target(
 system=system,
 user=retry_user,
 max_completion_tokens=768,
@@ -190,9 +190,9 @@ def process_one(
 except Exception as e: # noqa: BLE001
 fail_reason = f'error: {e}'

-with open(os.path.join(pred_dir, 'student_system_prompt.txt'), 'w', encoding='utf-8') as f:
+with open(os.path.join(pred_dir, 'target_system_prompt.txt'), 'w', encoding='utf-8') as f:
 f.write(system)
-with open(os.path.join(pred_dir, 'student_user_prompt.txt'), 'w', encoding='utf-8') as f:
+with open(os.path.join(pred_dir, 'target_user_prompt.txt'), 'w', encoding='utf-8') as f:
 f.write(user)
 with open(os.path.join(pred_dir, 'conversation.json'), 'w', encoding='utf-8') as f:
 json.dump(conversation, f, ensure_ascii=False, indent=2)
@@ -211,8 +211,8 @@ def process_one(
 'fail_reason': fail_reason or ('' if score >= 1.0 else f"predicted '{final_answer}' but expected '{item.get('ground_truth', '')}'"),
 'agent_ok': not fail_reason,
 'n_turns': len(conversation),
-'student_system_prompt': system,
-'student_user_prompt': user,
+'target_system_prompt': system,
+'target_user_prompt': user,
 }
 return result


@@ -1,8 +1,8 @@
 You are an expert diagnostic-probe designer for retrieval-style question answering tasks.

-You will be shown representative trajectories, the current student skill, the student's prompt context,
+You will be shown representative trajectories, the current target skill, the target's prompt context,
 and the evaluation result including the gold answer. There is NO hidden chain-of-thought reference.
-Design one SMALL diagnostic instruction that exposes the student's intermediate reading or evidence-selection state
+Design one SMALL diagnostic instruction that exposes the target's intermediate reading or evidence-selection state
 without materially changing the original scaffold.

 ## Hard Constraints
@@ -11,7 +11,7 @@ without materially changing the original scaffold.
 3. You MAY ask for a short structured readout of intermediate conclusions, evidence candidates, or elimination decisions.
 4. Do NOT ask for exhaustive quotation of the whole context or a full chain-of-thought.
 5. Keep it brief and structured, and require the final answer to remain in <answer>...</answer>.
-6. Use the gold answer only to target a useful probe; do not simply force the student to restate the gold answer.
+6. Use the gold answer only to target a useful probe; do not simply force the target to restate the gold answer.

 ## Good Probe Targets
 - the most likely supporting span or document cue
@@ -23,5 +23,5 @@ without materially changing the original scaffold.
 Respond ONLY with a valid JSON object:
 {
 "reasoning": "<why this probe is informative>",
-"probe_instruction": "<the exact instruction text to append to the student prompt>"
+"probe_instruction": "<the exact instruction text to append to the target prompt>"
 }

@@ -1,18 +1,18 @@
 You are an expert diagnostic-probe designer for spreadsheet manipulation tasks.

-You will design one short diagnostic instruction to append to the student's
+You will design one short diagnostic instruction to append to the target's
 existing SpreadsheetBench prompt for a handful of representative trajectories.

-The goal is to expose whether the student already knows the right task
+The goal is to expose whether the target already knows the right task
 decomposition, source range, target range, and transformation rule without
 substantially changing the current scaffold.

 ## Hard Constraints
-1. Do NOT substantially change the student's current scaffold.
+1. Do NOT substantially change the target's current scaffold.
 2. Do NOT prescribe a brand-new full algorithm.
 3. Do NOT ask for exhaustive cell-by-cell enumeration.
 4. Keep the diagnostic readout brief and structured.
-5. The student must still complete the original spreadsheet task.
+5. The target must still complete the original spreadsheet task.
 6. Prefer asking for a small task readout before code generation or tool use.
 7. Never ask for hidden reference content or golden values.

@@ -31,5 +31,5 @@ substantially changing the current scaffold.
 Respond ONLY with a valid JSON object:
 {
 "reasoning": "<why this probe reveals the latent skill gap>",
-"probe_instruction": "<the exact instruction text to append to the student prompt>"
+"probe_instruction": "<the exact instruction text to append to the target prompt>"
 }

@@ -31,7 +31,7 @@ class SWEBenchAdapter(EnvAdapter):
 step_limit: int = 50,
 cost_limit: float = 3.0,
 timeout_per_instance: int = 600,
-student_model: str = "",
+target_model: str = "",
 ) -> None:
 self.dataset_name = dataset_name
 self.hf_split = hf_split
@@ -44,7 +44,7 @@ class SWEBenchAdapter(EnvAdapter):
 self.step_limit = step_limit
 self.cost_limit = cost_limit
 self.timeout_per_instance = timeout_per_instance
-self.student_model = student_model
+self.target_model = target_model
 self.dataloader = SWEBenchDataLoader(
 split_dir=split_dir,
 data_path=data_path,
@@ -60,7 +60,7 @@ class SWEBenchAdapter(EnvAdapter):

 def setup(self, cfg: dict) -> None:
 super().setup(cfg)
-self.student_model = str(self.student_model or cfg.get("student_model") or "gpt-5.4").strip()
+self.target_model = str(self.target_model or cfg.get("target_model") or "gpt-5.4").strip()
 self.dataset_name = str(self.dataset_name or cfg.get("dataset_name") or "lite").strip()
 self.hf_split = str(self.hf_split or cfg.get("hf_split") or "test").strip()
 self.dataloader.setup(cfg)
@@ -85,7 +85,7 @@ class SWEBenchAdapter(EnvAdapter):
 items=items,
 out_root=out_dir,
 skill_content=skill_content,
-student_model=self.student_model,
+target_model=self.target_model,
 dataset_name=self.dataset_name,
 hf_split=self.hf_split,
 workers=self.workers,

@@ -36,8 +36,8 @@ def _setup_litellm_env() -> None:
 os.environ[key] = value


-def _normalize_student_model(student_model: str) -> str:
-model = str(student_model or "").strip()
+def _normalize_target_model(target_model: str) -> str:
+model = str(target_model or "").strip()
 if not model:
 return "azure/gpt-5.4"
 if "/" in model:
@@ -57,7 +57,7 @@ def _load_json(path: str) -> dict | list | None:
 def _build_agent_config(
 *,
 skill_content: str,
-student_model: str,
+target_model: str,
 step_limit: int,
 cost_limit: float,
 ) -> tuple[dict, str]:
@@ -88,7 +88,7 @@ def _build_agent_config(
 "cost_limit": float(cost_limit),
 },
 "model": {
-"model_name": _normalize_student_model(student_model),
+"model_name": _normalize_target_model(target_model),
 "cost_tracking": "ignore_errors",
 },
 }
@@ -120,7 +120,7 @@ def _run_rollout(
 items: list[dict],
 predictions_dir: str,
 skill_content: str,
-student_model: str,
+target_model: str,
 workers: int,
 step_limit: int,
 cost_limit: float,
@@ -136,7 +136,7 @@ def _run_rollout(
 _setup_litellm_env()
 config, system_prompt = _build_agent_config(
 skill_content=skill_content,
-student_model=student_model,
+target_model=target_model,
 step_limit=step_limit,
 cost_limit=cost_limit,
 )
@@ -190,9 +190,9 @@ def _run_rollout(
 ).strip()
 with open(task_dir / "conversation.json", "w", encoding="utf-8") as f:
 json.dump(messages, f, ensure_ascii=False, indent=2)
-with open(task_dir / "student_system_prompt.txt", "w", encoding="utf-8") as f:
+with open(task_dir / "target_system_prompt.txt", "w", encoding="utf-8") as f:
 f.write(system_prompt)
-with open(task_dir / "student_user_prompt.txt", "w", encoding="utf-8") as f:
+with open(task_dir / "target_user_prompt.txt", "w", encoding="utf-8") as f:
 f.write(user_prompt)

 results.append(
@@ -288,7 +288,7 @@ def run_batch(
 items: list[dict],
 out_root: str,
 skill_content: str,
-student_model: str,
+target_model: str,
 dataset_name: str,
 hf_split: str,
 workers: int,
@@ -314,7 +314,7 @@ def run_batch(
 items=items,
 predictions_dir=predictions_dir,
 skill_content=skill_content,
-student_model=student_model,
+target_model=target_model,
 workers=workers,
 step_limit=step_limit,
 cost_limit=cost_limit,

@@ -1,8 +1,8 @@
-"""Teacher-written diagnostic probe generation for deep reflection."""
+"""Optimizer-written diagnostic probe generation for deep reflection."""
 from __future__ import annotations

 from skillopt.gradient.reflect import fmt_minibatch_trajectories
-from skillopt.model import chat_teacher
+from skillopt.model import chat_optimizer
 from skillopt.optimizer.meta_skill import format_meta_skill_context
 from skillopt.prompts import load_prompt
 from skillopt.utils import extract_json
@@ -27,21 +27,21 @@ def generate_deep_probe_instruction(
 user = (
 f"## Current Skill\n{skill_content}\n\n"
 "## Probe Design Goal\n"
-"Design one short diagnostic instruction to append to the student prompt.\n"
-"The instruction should expose the student's current intermediate judgment\n"
+"Design one short diagnostic instruction to append to the target prompt.\n"
+"The instruction should expose the target's current intermediate judgment\n"
 "without materially changing the original scaffold.\n\n"
 )
 if step_buffer_context.strip():
 user += f"## Previous Steps in This Epoch\n{step_buffer_context}\n\n"
-teacher_ctx = format_meta_skill_context(meta_skill_context)
-if teacher_ctx:
-user += teacher_ctx + "\n\n"
+optimizer_ctx = format_meta_skill_context(meta_skill_context)
+if optimizer_ctx:
+user += optimizer_ctx + "\n\n"
 requirements = output_requirements or [
-"- Some trajectories may include a hidden Reference block. Use it to identify what intermediate conclusion matters, but do not reveal or paraphrase that reference directly to the student.",
+"- Some trajectories may include a hidden Reference block. Use it to identify what intermediate conclusion matters, but do not reveal or paraphrase that reference directly to the target.",
 "- The instruction must explicitly request a short <analysis>...</analysis> block before the final <answer>...</answer>.",
 "- Keep the readout concise and structured.",
 "- Do not ask for exhaustive listing, full derivation, or a new solving protocol.",
-"- The instruction text should be ready to append directly to the student's prompt.",
+"- The instruction text should be ready to append directly to the target's prompt.",
 ]
 user += (
 f"## Representative Trajectories ({len(items)} total)\n{trajectories_text}\n\n"
@@ -51,7 +51,7 @@ def generate_deep_probe_instruction(
 )

 try:
-response, _ = chat_teacher(
+response, _ = chat_optimizer(
 system=actual_system,
 user=user,
 max_completion_tokens=1024,

@@ -17,7 +17,7 @@ directions are effective, which are not). This is the "momentum buffer".
 Public API
 ----------
 - :func:`build_epoch_history` — format an epoch's step records for meta-reflect
-- :func:`run_meta_reflect` — one teacher call to produce high-level edits + meta_summary
+- :func:`run_meta_reflect` — one optimizer call to produce high-level edits + meta_summary
 """
 from __future__ import annotations

@@ -25,7 +25,7 @@ import json
 import os
 import traceback

-from skillopt.model import chat_teacher
+from skillopt.model import chat_optimizer
 from skillopt.optimizer.update_modes import (
 describe_item,
 get_payload_items,
@@ -46,7 +46,7 @@ def build_epoch_history(
 *,
 update_mode: str = "patch",
 ) -> str:
-"""Format an epoch's step records into text for the meta-reflect teacher.
+"""Format an epoch's step records into text for the meta-reflect optimizer.

 For each step, includes the exact edits applied (read from
 ``ranked_edits.json``) and the gate evaluation result.
@@ -129,7 +129,7 @@ def build_epoch_history(
 return "\n\n".join(parts)


-# ── Meta-reflect teacher call ────────────────────────────────────────────────
+# ── Meta-reflect optimizer call ────────────────────────────────────────────────


 def run_meta_reflect(
@@ -141,7 +141,7 @@ def run_meta_reflect(
 system_prompt: str | None = None,
 update_mode: str = "patch",
 ) -> dict | None:
-"""Run one meta-reflect teacher call for an epoch.
+"""Run one meta-reflect optimizer call for an epoch.

 Parameters
 ----------
@@ -179,7 +179,7 @@ def run_meta_reflect(
 )

 try:
-response, _ = chat_teacher(
+response, _ = chat_optimizer(
 system=actual_system,
 user=user,
 max_completion_tokens=4096,

@@ -1,20 +1,20 @@
 You are an expert diagnostic-probe designer for reflective skill learning.

-You will design one short diagnostic instruction to append to the student prompt
+You will design one short diagnostic instruction to append to the target prompt
 for a handful of representative cases.

-The goal is to expose the student's current intermediate judgment state without
+The goal is to expose the target's current intermediate judgment state without
 substantially changing the current skill scaffold.

 ## Hard Constraints
-1. Do NOT substantially change the student's existing scaffold.
+1. Do NOT substantially change the target's existing scaffold.
 2. Do NOT prescribe a new multi-step solving procedure.
 3. Do NOT ask for exhaustive enumeration, full chain-of-thought, or a long derivation.
-4. Ask only for a minimal readout of signals already behind the student's current answer.
+4. Ask only for a minimal readout of signals already behind the target's current answer.
 5. Keep the diagnostic block brief and structured.
 6. The final answer must still be produced in <answer>...</answer>.
 7. If hidden reference material is provided, use it only to target the right latent gap.
-8. Never copy hidden reference content into the student-facing probe.
+8. Never copy hidden reference content into the target-facing probe.

 ## Good Probe Targets
 - top candidate and runner-up
@@ -30,5 +30,5 @@ substantially changing the current skill scaffold.
 Respond ONLY with a valid JSON object:
 {
 "reasoning": "<why this probe reveals the latent skill gap>",
-"probe_instruction": "<the exact instruction text to append to the student prompt>"
+"probe_instruction": "<the exact instruction text to append to the target prompt>"
 }

@@ -1,22 +1,22 @@
-You are an expert diagnostic-probe designer for codex-executed student trajectories.
+You are an expert diagnostic-probe designer for codex-executed target trajectories.

-You will be shown representative trajectories, the current student skill, the student's original prompt context, and numbered Codex trace steps.
-Some trajectories may also include a hidden Reference block. Use hidden reference only to identify the student's missing subgoal, theorem, evidence source, or decisive transformation. Do not reveal or paraphrase that reference directly to the student.
+You will be shown representative trajectories, the current target skill, the target's original prompt context, and numbered Codex trace steps.
+Some trajectories may also include a hidden Reference block. Use hidden reference only to identify the target's missing subgoal, theorem, evidence source, or decisive transformation. Do not reveal or paraphrase that reference directly to the target.

-Choose exactly one trajectory and one probe point. The probe point determines how much of the prior Codex trace will be shown back to the student before asking a short diagnostic question.
+Choose exactly one trajectory and one probe point. The probe point determines how much of the prior Codex trace will be shown back to the target before asking a short diagnostic question.

 ## Hard Constraints
-1. Do NOT reveal or paraphrase hidden reference content to the student.
+1. Do NOT reveal or paraphrase hidden reference content to the target.
 2. Do NOT prescribe a new full solving procedure.
 3. Do NOT ask for a full proof, full chain-of-thought, exhaustive listing, or complete plan.
-4. Ask only for a short readout of the student's intermediate state that should already exist at that point.
+4. Ask only for a short readout of the target's intermediate state that should already exist at that point.
 5. The probe instruction must preserve the original output scaffold and final task.
-6. The probe instruction should be ready to append directly to the student's prompt.
+6. The probe instruction should be ready to append directly to the target's prompt.

 ## Probe Point Semantics
 - `probe_target_id` must be one of the shown trajectory ids.
-- `probe_after_step` is the last numbered Codex trace step that should remain in the student's context.
-- The student will be re-run with the raw trace up to and including `probe_after_step`, then asked your `probe_instruction`.
+- `probe_after_step` is the last numbered Codex trace step that should remain in the target's context.
+- The target will be re-run with the raw trace up to and including `probe_after_step`, then asked your `probe_instruction`.
 - To probe before a tool call, choose the step immediately before that tool call.

 ## Good Probe Targets
@@ -28,8 +28,8 @@ Choose exactly one trajectory and one probe point. The probe point determines ho

 Respond ONLY with a valid JSON object:
 {
-"reasoning": "<why this trajectory and probe point expose the student's intermediate state>",
+"reasoning": "<why this trajectory and probe point expose the target's intermediate state>",
 "probe_target_id": "<trajectory id>",
 "probe_after_step": <integer step number>,
-"probe_instruction": "<the exact instruction text to append to the student's prompt>"
+"probe_instruction": "<the exact instruction text to append to the target's prompt>"
 }

@@ -1,7 +1,7 @@
 You are a meta-analyst for an AI agent skill optimization system.

 You see the current skill and an epoch's step history. Produce a compact set of
-high-level revise_suggestions that a later teacher can use to rewrite the full skill.
+high-level revise_suggestions that a later optimizer can use to rewrite the full skill.

 Focus on:
 - merging redundant rules
@@ -20,7 +20,7 @@ Respond ONLY with a valid JSON object:
 "type": "add_rule|remove_rule|merge_rules|reorganize|compress|clarify",
 "title": "<short title>",
 "motivation": "<why this matters>",
-"instruction": "<what the rewriting teacher should change in the skill>",
+"instruction": "<what the rewriting optimizer should change in the skill>",
 "priority_hint": "high|medium|low"
 }
 ]