agent-evaluation

★ 40K repo · AI Agents · Risk: N/A

How to Install

Claude Code:

git clone https://github.com/sickn33/antigravity-awesome-skills && mkdir -p ~/.claude/skills && cp antigravity-awesome-skills/skills/SKILL.md ~/.claude/skills/

Cursor:

Copy the contents of SKILL.md into your .cursorrules file.

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, in a field where even top agents achieve less than 50% on real-world benchmarks.

Testing
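
As a rough illustration of the "reliability metrics" mentioned in the description, the sketch below runs an agent several times per task and reports a per-task pass rate plus a stricter "passed every trial" score. This is not taken from the skill itself: the `Agent` callable type, `evaluate_reliability`, `all_trials_passed`, and the task/check format are hypothetical names chosen for the example.

```python
from typing import Callable, Dict

# Hypothetical agent interface (an assumption for this sketch):
# takes a task prompt and returns a text answer.
Agent = Callable[[str], str]


def evaluate_reliability(
    agent: Agent,
    tasks: Dict[str, Callable[[str], bool]],
    trials: int = 5,
) -> Dict[str, float]:
    """Run each task `trials` times and return the pass rate per task.

    `tasks` maps a prompt to a check function that returns True when
    the agent's output is acceptable.
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    pass_rates: Dict[str, float] = {}
    for prompt, check in tasks.items():
        passes = sum(1 for _ in range(trials) if check(agent(prompt)))
        pass_rates[prompt] = passes / trials
    return pass_rates


def all_trials_passed(pass_rates: Dict[str, float]) -> float:
    """Fraction of tasks solved on every trial (a strict consistency score)."""
    if not pass_rates:
        return 0.0
    solved_every_time = sum(1 for rate in pass_rates.values() if rate == 1.0)
    return solved_every_time / len(pass_rates)
```

Repeating trials matters because an agent that passes a task once may still fail it on the next run; comparing the average pass rate with the strict score shows how much of the measured capability is consistent.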

Details

Category	AI/ML → AI Agents
Source	https://github.com/sickn33/antigravity-awesome-skills
Stars	★ 40K
Risk Level	N/A

Related Skills

hosted-agents

Build background agents in sandboxed environments. Use for hosted coding agents, sandboxed VMs, Moda

ai-ml

lambda-lang

Native agent-to-agent language for compact multi-agent messaging. A shared tongue agents speak direc

ai-ml

llm-app-patterns

Production-ready patterns for building LLM applications, inspired by [Dify](https://github.com/langg

ai-ml

local-llm-expert

Master local LLM inference, model selection, VRAM optimization, and local deployment using Ollama, l

ai-ml