Optimizing LLM Apps at Scale
What to expect?
Join our webinar as we delve into two distinct areas of evaluation involving Large Language Models (LLMs): evaluating LLM-based systems and evaluating the LLMs themselves. The discussion is anchored by two topics: Wandbot, our in-house project, for LLM-based system evaluation, and our work on the French and German LLM leaderboards for LLM evaluation.
For LLM-based system evaluation, we will focus on Wandbot, detailing our extensive configuration search across different models, embedding techniques, and settings to pinpoint the optimal RAG configuration. This part will also highlight how tools like Pydantic and LiteLLM help in building production-ready systems, showcasing Wandbot's advances in performance and functionality.
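To give a flavor of the kind of configuration object such a search might sweep over, here is a minimal sketch that uses Pydantic to validate a hypothetical RAG configuration and LiteLLM to issue the model call. The field names, defaults, and model identifiers are illustrative assumptions, not Wandbot's actual schema.

```python
from pydantic import BaseModel, Field
import litellm


class RAGConfig(BaseModel):
    """Hypothetical RAG configuration; fields are illustrative, not Wandbot's schema."""
    llm_model: str = "gpt-3.5-turbo"              # any model name LiteLLM can route
    embedding_model: str = "text-embedding-3-small"
    chunk_size: int = Field(512, gt=0)            # tokens per document chunk
    top_k: int = Field(5, gt=0)                   # number of retrieved chunks to use
    temperature: float = Field(0.1, ge=0.0, le=2.0)


def answer(question: str, context_chunks: list[str], cfg: RAGConfig) -> str:
    """Assemble retrieved context into a prompt and call the configured model via LiteLLM."""
    context = "\n\n".join(context_chunks[: cfg.top_k])
    response = litellm.completion(
        model=cfg.llm_model,
        temperature=cfg.temperature,
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content


# A configuration sweep would construct many RAGConfig variants and score each one
# against an evaluation set, keeping the best-performing combination.
cfg = RAGConfig(llm_model="gpt-4", top_k=3)
```

Pydantic validates each candidate configuration up front, and LiteLLM's unified completion interface makes it straightforward to swap models between sweep runs without changing the calling code.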
In discussing LLM evaluation, we'll turn our attention to the French and German LLM leaderboards, built with the lm-evaluation-harness tool. This segment sheds light on how LLMs are evaluated in the context of training and fine-tuning, emphasizing the challenges and solutions involved in developing multilingual benchmarks.
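For readers unfamiliar with lm-evaluation-harness, the sketch below shows roughly how a run can be launched from Python via its `simple_evaluate` entry point. The checkpoint and task names are placeholders rather than the actual leaderboard tasks, and the exact API may differ between harness versions.

```python
# Minimal sketch of launching an evaluation with EleutherAI's lm-evaluation-harness.
# The checkpoint and task below are placeholders, not the French/German leaderboard
# tasks, and the API surface may vary between harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                          # Hugging Face backend
    model_args="pretrained=mistralai/Mistral-7B-v0.1",   # placeholder checkpoint
    tasks=["hellaswag"],                                 # placeholder task name
    num_fewshot=0,
    batch_size=8,
)

# Per-task metrics (accuracy, etc.) are reported under results["results"].
print(results["results"])
```

For a multilingual leaderboard, the same harness is pointed at translated or natively authored benchmark tasks, which is where much of the difficulty in building French and German evaluations lies.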
This webinar is designed for those interested in the full spectrum of LLM evaluation, from system enhancement to benchmarking across languages, and offers practical insights for improving LLM performance and utility.