Local Retrieval Augmented Generation (RAG) from Scratch: Step-by-Step Tutorial
Video: http://youtube.com/watch?v=qN_2fnOPY-M
In this video we'll build a Retrieval Augmented Generation (RAG) pipeline that runs locally, from scratch.

There are frameworks for this, such as LangChain and LlamaIndex; however, building from scratch means you'll know all the parts of the puzzle.

Specifically, we'll build NutriChat, a RAG pipeline that lets someone ask questions of a 1,200-page nutrition textbook PDF (see the minimal code sketch at the end of this description for the overall flow).

Code on GitHub - https://github.com/mrdbourke/simple-l...
Whiteboard - https://whimsical.com/simple-local-ra...

Be sure to check out NVIDIA GTC, NVIDIA's GPU Technology Conference, running from March 18-21. It's free to attend virtually - that's what I'm doing.
Sign up to GTC24 here: https://nvda.ws/3GUZygQ

Other links:
Download Nutrify (take a photo of food and learn about it) - https://nutrify.app
Learn AI/ML (beginner-friendly course) - https://dbourke.link/ZTMMLcourse
Learn TensorFlow - https://dbourke.link/ZTMTFcourse
Learn PyTorch - https://dbourke.link/ZTMPyTorch
AI/ML courses/books I recommend - https://www.mrdbourke.com/ml-resources/
Read my novel Charlie Walks - https://www.charliewalks.com

Connect elsewhere:
Web - https://dbourke.link/web
Twitter - mrdbourke
Twitch - mrdbourke
ArXiv channel (past streams) - https://dbourke.link/archive-channel
Get email updates on my work - https://dbourke.link/newsletter

Timestamps:
0:00 - Intro/NVIDIA GTC
2:25 - Part 0: Resources and overview
8:33 - Part 1: What is RAG? Why RAG? Why locally?
12:26 - Why RAG?
19:31 - What can RAG be used for?
26:08 - Why run locally?
30:26 - Part 2: What we're going to build
40:40 - Original Retrieval Augmented Generation paper
46:04 - Part 3: Importing and processing a PDF document
48:29 - Code starts! Importing a PDF and making it readable
1:17:09 - Part 4: Preprocessing our text into chunks (text splitting)
1:28:27 - Chunking our sentences together
1:56:38 - Part 5: Embedding creation
1:58:15 - Incredible embeddings resource by Vicki Boykis
1:58:49 - Open-source embedding models
2:00:00 - Creating our embedding model
2:09:02 - Creating embeddings on CPU vs GPU
2:25:14 - Part 6: Retrieval (semantic search)
2:36:57 - Troubleshooting np.fromstring (live!)
2:41:41 - Creating a small semantic search pipeline
2:56:25 - Showcasing the speed of vector search on GPUs
3:20:02 - Part 7: Similarity measures between embedding vectors explained
3:35:58 - Part 8: Functionizing our semantic search pipeline (retrieval)
3:46:49 - Part 9: Getting a Large Language Model (LLM) to run locally
3:47:51 - Which LLM should I use?
4:03:18 - Loading an LLM locally
4:30:48 - Part 10: Generating text with our local LLM
4:49:09 - Part 11: Augmenting our prompt with context items
5:16:04 - Part 12: Functionizing our text generation step (putting it all together)
5:33:57 - Summary and potential extensions
5:36:00 - Leave a comment if there are any other videos you'd like to see!

#ai #rag #GTC24
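For a taste of the pipeline the video builds step by step, here's a minimal sketch of the embed -> retrieve -> augment flow using sentence-transformers. The model name, example chunks, and prompt template here are illustrative assumptions rather than the exact ones from the video, and the final generation step (passing the prompt to a local LLM) is left as a comment.

```python
# Minimal RAG sketch: embedding creation, retrieval, prompt augmentation.
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# 1. Embedding creation: in the video, the chunks come from the nutrition PDF;
#    these three sentences are stand-ins.
chunks = [
    "Macronutrients include carbohydrates, proteins and fats.",
    "Vitamin C is found in citrus fruits and leafy greens.",
    "Fibre aids digestion and is found in whole grains.",
]
embedding_model = SentenceTransformer("all-mpnet-base-v2")  # an open-source embedding model
chunk_embeddings = embedding_model.encode(chunks, convert_to_tensor=True)

# 2. Retrieval (semantic search): embed the query, score it against every chunk
#    with cosine similarity, and take the best-scoring chunk.
query = "What are macronutrients?"
query_embedding = embedding_model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_embedding, chunk_embeddings)[0]
best_chunk = chunks[scores.argmax().item()]

# 3. Augmentation: put the retrieved context into the prompt.
prompt = (f"Answer the question using only the context.\n"
          f"Context: {best_chunk}\nQuestion: {query}")
print(prompt)  # 4. Generation: pass this prompt to a locally running LLM.
```

Note on the similarity measure: for embeddings normalized to unit length, cosine similarity and dot product produce the same ranking, which is why both come up as similarity measures in Part 7.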