Summary

Skill-creator now helps authors test and improve their Agent Skills without coding. It uses evals to check if skills work well and trigger correctly. These updates make skills more reliable as models change and improve.

Highlights


Testing turns a skill that seems to work into one you know works.




Running evals sequentially can be slow, and accumulated context can bleed between test runs. With multi-agent support, skill-creator now spins up independent agents to run evals in parallel — each in a clean context with its own token and timing metrics. Faster results, no cross-contamination. We’ve also added comparator agents for A/B comparisons: two skill versions, or skill vs. no skill. They judge outputs without knowing which is which, so you can tell whether a change actually helped.
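The workflow described above — parallel runs in isolated contexts, then a blind A/B judgment — can be sketched roughly like this. This is a minimal illustration, not skill-creator's actual implementation: `run_eval` and `judge` are hypothetical stand-ins for invoking an agent in a fresh context and for the comparator agent's grading call.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def run_eval(skill_version, case):
    # Stand-in: a real version would launch an agent with a clean
    # context and collect its output plus token/timing metrics.
    return f"output of {skill_version} on {case}"

def judge(output_a, output_b):
    # Stand-in for the comparator agent: it sees only anonymized
    # outputs "A" and "B", never which skill version produced them.
    return "A" if len(output_a) >= len(output_b) else "B"

def run_parallel(skill_version, cases):
    # Each eval runs in its own worker, so no state or context
    # accumulates across test runs.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda c: run_eval(skill_version, c), cases))

def blind_compare(version_a, version_b, cases):
    wins = {version_a: 0, version_b: 0}
    for out_a, out_b in zip(run_parallel(version_a, cases),
                            run_parallel(version_b, cases)):
        # Shuffle which output is presented as "A" so the judge
        # cannot infer which version it came from.
        pair = [(version_a, out_a), (version_b, out_b)]
        random.shuffle(pair)
        verdict = judge(pair[0][1], pair[1][1])
        winner = pair[0][0] if verdict == "A" else pair[1][0]
        wins[winner] += 1
    return wins
```

The key design point mirrored here is the shuffle before judging: because the comparator never knows which label maps to which version, a win count reflects output quality rather than positional or naming bias.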
