Nomic Atlas Webinar Series 10 Million Documents of Text
http://youtube.com/watch?v=puTrPUMjgdM
00:00 - Introduction and Agenda
1:00 - Atlas: Twitter Dataset
2:30 - Data Over Time in Atlas
6:20 - How to Get Started
7:30 - Obelics Multimodal Dataset
9:50 - Model's Loss on Training Data
11:50 - Content Moderation
13:20 - Webinar Demo Dataset
13:30 - Autogenerated Topics
15:40 - Python Client
20:00 - Data Diversity
24:00 - De-duplication Detection
26:30 - Embeddings Support
29:30 - Tagging
32:20 - TikTok Dataset
34:00 - Closing Webinar Summary
34:45 - Semantic Data
35:30 - Dataset Scale
38:00 - Future Webinars for Other Modalities

Summary

What to expect?

Working with unstructured text data, from sourcing it, to understanding and processing the dataset, to preparing it for generative AI training, has traditionally been costly and time-consuming. As more text data comes online and more models are developed, curating and understanding data becomes more important than ever. Until Nomic Atlas was released, the available tools were not suited to the needs of generative AI builders or the data market as a whole. Nomic Atlas is leading the charge on this front, creating a new category of ML-powered unstructured data curation.

Nomic Atlas is the only platform in the world that lets anyone work with 10M+ documents of text, images, videos, embeddings, and audio, revolutionizing how unstructured data is structured. Atlas enables users to quickly find what they are looking for in unstructured data with minimal effort. Other benefits include the ability to remove anomalies to build better-quality models faster, improve data labeling, and make collaboration with SMEs and non-experts easier.

In this webinar Nomic Atlas will discuss:
• The challenges of preparing text datasets for generative AI model training
• What Nomic Atlas is and why it is important for data scientists, ML engineers, and non-subject matter experts
• How automated structuring of unstructured text data at a massive scale unlocks value. We will highlight Nomic Atlas unstructured data capabilities such as real-time text search, topic modeling, dataset collaboration, API and Jupyter integration, and temporal and metadata filtering at the 10M text document scale.
• How, with Nomic Atlas, the entire company can become unstructured data pros, reaching insights and collaborating on unstructured text with blazing speed