Summary
Skill-creator now helps authors test and improve their Agent Skills without writing code. It uses evals to check whether skills trigger correctly and perform well, making skills more reliable as models change and improve.
Highlights
Testing turns a skill that seems to work into one you know works.
Running evals sequentially is slow, and context accumulated in one run can bleed into the next. With multi-agent support, skill-creator now spins up independent agents that run evals in parallel, each in a clean context with its own token and timing metrics: faster results, no cross-contamination. We've also added comparator agents for A/B comparisons, pitting two skill versions against each other, or a skill against no skill. They judge outputs blind, without knowing which is which, so you can tell whether a change actually helped.
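The blind-judging idea above can be sketched in a few lines. This is a minimal illustration, not skill-creator's actual implementation: the function names and the toy judge are hypothetical. The key move is randomizing which output the judge sees first, then mapping its verdict back to the true labels, so the judge cannot systematically favor "the new version."

```python
import random

def blind_compare(output_a: str, output_b: str, judge, rng=random) -> str:
    """Show two outputs to a judge in random order, then map its pick
    back to the true labels. Returns "A", "B", or "tie".
    (Hypothetical sketch; `judge` stands in for a model-based grader.)"""
    swapped = rng.random() < 0.5
    first, second = (output_b, output_a) if swapped else (output_a, output_b)
    verdict = judge(first, second)  # judge returns "first", "second", or "tie"
    if verdict == "tie":
        return "tie"
    picked_first = verdict == "first"
    # Undo the shuffle to recover which real version won.
    if swapped:
        return "B" if picked_first else "A"
    return "A" if picked_first else "B"

# Toy judge that prefers the longer answer, standing in for a real grader.
def longer(x: str, y: str) -> str:
    if len(x) > len(y):
        return "first"
    if len(y) > len(x):
        return "second"
    return "tie"
```

Because the presentation order is randomized per comparison, any positional bias in the judge averages out across runs, and a consistent win for one version reflects the version itself rather than where it appeared.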