How CPU time is spent inside llama.cpp + LLaMA 2 (using OpenResty XRay)











Video: http://youtube.com/watch?v=dXTCdUb5AGk

Try out OpenResty XRay for free: https://openresty.com/en/xray/

In this tutorial, you will get a step-by-step tour of how to use OpenResty XRay to analyze the llama.cpp application running LLaMA 2 models. We'll quickly pinpoint the most CPU-intensive C++ code paths in this application: the ones that consume the most CPU time and may limit llama.cpp's performance.

Text version of this tutorial: https://blog.openresty.com/en/llama-h...

OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities, and offers actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language, which targets various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the context. Its dynamic analysis is truly non-intrusive: it does not require installing any special modules or plugins in the target application, recompiling the target application, or even restarting the running process.

llama.cpp and LLaMA 2 are projects that make large language models (LLMs) more accessible and efficient for everyone. llama.cpp is a C/C++ implementation of inference for Meta’s LLaMA models. LLaMA 2 is a family of pretrained and fine-tuned generative text models from Meta; its chat variants are fine-tuned for dialogue, and its largest (70B) model uses grouped-query attention. However, running these models on the CPU consumes a lot of CPU resources.

Music: https://www.bensound.com

0:00 Problem: high CPU usage
1:04 Open the OpenResty XRay web console
1:24 Use the guided analysis feature of OpenResty XRay to spot the hottest C++ code paths
2:23 Analysis report
4:04 Use the vim editor
5:00 What is OpenResty XRay
