LLMs and evals
November 12, 2024
- https://eugeneyan.com/writing/llm-patterns/#how-to-apply-evals
- https://www.sh-reya.com/blog/ai-engineering-flywheel/
- https://huyenchip.com/2024/07/25/genai-platform.html
- https://eugeneyan.com/writing/evals/
- https://x.com/karpathy/status/1599852921541128194
Pipeline (including evals):
- https://jxnl.github.io/blog/writing/2024/02/28/levels-of-complexity-rag-applications/
Evals (overview):
- https://eugeneyan.com/writing/llm-patterns/#how-to-apply-evals
Evals (practical):
- https://docs.anthropic.com/en/docs/build-with-claude/develop-tests
- https://github.com/anthropics/anthropic-cookbook/blob/main/misc/building%5Fevals.ipynb
- https://cookbook.openai.com/examples/evaluation/getting_started_with_openai_evals
- https://github.com/run-llama/ai-engineer-workshop/blob/main/notebooks/02_evaluation.ipynb
- https://eugeneyan.com/writing/aligneval/
Misc:
- https://github.com/openai/evals/tree/main
- https://github.com/pltrdy/rouge