Yif Yang
26e5338def
Update citation from @misc to @article format
Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-26 02:54:46 +00:00
Yif Yang
4c1b74fce2
Update BibTeX entry in index.html
2026-05-25 14:30:01 +08:00
Huangzisu
2c7d9074fb
update webpage for arxiv link
2026-05-25 05:32:04 +00:00
Lliar-liar
5f4b228543
Soften average gain column styling
2026-05-24 19:45:10 +00:00
Lliar-liar
a9cad7a125
Use official arXiv logomark
2026-05-24 19:43:19 +00:00
Lliar-liar
5e968115f5
Align citation section with SkillLens
2026-05-24 19:39:16 +00:00
Cuzyoung
ded8c27c90
restore: bring back project page HTML and assets
These were accidentally deleted in the cleanup commit.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:38:34 +00:00
Cuzyoung
f55a26414e
cleanup: remove unused benchmarks, deep_probe, meta_reflect
Remove sealqa, babyvision, mathverse, mmrb, swebench envs and configs.
Remove deep_probe, deep_reflect, meta_reflect modules and prompts.
Remove download_babyvision script.
These are not part of the core released benchmarks.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-05-24 19:36:48 +00:00
Lliar-liar
2df2542aec
Stabilize skill evolution layout
2026-05-24 19:36:08 +00:00
Lliar-liar
faa4ec6199
Align header and scroll effects with SkillLens
2026-05-24 19:31:24 +00:00
Lliar-liar
338a88d31c
Add model logos to results table
2026-05-24 19:18:57 +00:00
Lliar-liar
6e165d5347
Add Microsoft favicon
2026-05-24 19:14:33 +00:00
Lliar-liar
dde7dc9dd8
Add SkillLens related project link
2026-05-24 19:12:27 +00:00
Lliar-liar
cd9a0a02b9
Restyle project page after SkillLens
2026-05-24 19:08:05 +00:00
Lliar-liar
607bf74a1b
Reorder hero evaluation stats
2026-05-24 18:52:05 +00:00
Lliar-liar
9605217e75
Use Microsoft logo in page header
2026-05-24 18:27:25 +00:00
Lliar-liar
c42d541828
Refine project links and citation section
2026-05-24 18:24:48 +00:00
Lliar-liar
2e05edc399
Add project links and citation section
2026-05-24 18:18:36 +00:00
Lliar-liar
6e7d5d0117
Clarify hero harness names
2026-05-24 18:15:35 +00:00
Lliar-liar
88a99048a4
Align method comparison chart with page theme
2026-05-24 18:05:23 +00:00
Lliar-liar
bf2106808e
Remove method comparison implementation caption
2026-05-24 18:03:21 +00:00
Lliar-liar
ba0fa8c14b
Render method comparison from raw data
2026-05-24 18:00:08 +00:00
Lliar-liar
9012a79827
Add main results method comparison chart
2026-05-24 17:55:22 +00:00
Lliar-liar
c64fbcd4f8
Shorten hero target model label
2026-05-24 17:51:11 +00:00
Lliar-liar
6e1027f01a
Add harness count to hero badge
2026-05-24 17:48:32 +00:00
Lliar-liar
cd56a5fe7d
Make hero results badge more prominent
2026-05-24 17:43:29 +00:00
Lliar-liar
bbb250cc63
Clarify hero setting wins
2026-05-24 17:36:45 +00:00
Lliar-liar
5c45add28b
Update hero metrics to video results framing
2026-05-24 17:30:13 +00:00
Lliar-liar
e1896c691c
Improve ablation table layout
2026-05-24 17:23:28 +00:00
Lliar-liar
ec0841cccf
Remove duplicate GPT-5.5 results table
2026-05-24 17:15:24 +00:00
Lliar-liar
4019f1cbe7
Align webpage model terminology
2026-05-24 17:12:43 +00:00
Lliar-liar
cad3ab2d19
Simplify main results webpage table
2026-05-24 17:10:35 +00:00
Lliar-liar
9a064f7c97
Use YouTube teaser video
2026-05-24 14:59:26 +00:00
Lliar-liar
74cbe704fc
Polish project webpage copy
2026-05-24 14:55:44 +00:00
Lliar-liar
5862bbdc97
Add SkillOpt project webpage
2026-05-24 14:16:34 +00:00