8 Commits

Author SHA1 Message Date
Lliar-liar
a9cad7a125 Use official arXiv logomark 2026-05-24 19:43:19 +00:00
Cuzyoung
ded8c27c90 restore: bring back project page HTML and assets
These were accidentally deleted in the cleanup commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:38:34 +00:00
Cuzyoung
f55a26414e cleanup: remove unused benchmarks, deep_probe, meta_reflect
Remove sealqa, babyvision, mathverse, mmrb, swebench envs and configs.
Remove deep_probe, deep_reflect, meta_reflect modules and prompts.
Remove download_babyvision script.
These are not part of the core released benchmarks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:36:48 +00:00
Lliar-liar
338a88d31c Add model logos to results table 2026-05-24 19:18:57 +00:00
Lliar-liar
ba0fa8c14b Render method comparison from raw data 2026-05-24 18:00:08 +00:00
Lliar-liar
9012a79827 Add main results method comparison chart 2026-05-24 17:55:22 +00:00
Lliar-liar
9a064f7c97 Use YouTube teaser video 2026-05-24 14:59:26 +00:00
Lliar-liar
5862bbdc97 Add SkillOpt project webpage 2026-05-24 14:16:34 +00:00