Apache Spark RDD Basics: What Is an RDD? How to Create an RDD











Video: http://youtube.com/watch?v=NRo8TluH7KI

ATTENTION DATA SCIENCE ASPIRANTS:
• Click the link below to download a proven 90-day roadmap to becoming a data scientist: https://www.bigdataelearning.com/the-...
• Apache Spark courses: https://www.bigdataelearning.com/courses
• Official website: https://bigdataelearning.com

Learning Objectives :: In this module, you will learn what an RDD is and two ways to create one. The video also shows how to create an RDD through the Spark shell.

Topics ::
• Apache Spark RDD basics: what is an RDD?
• How to create an RDD: two ways to create an RDD

What is an RDD?
===============
An RDD (resilient distributed dataset) is Spark's core abstraction: an immutable, distributed collection of objects. Internally, Spark distributes the data in an RDD to different nodes across the cluster so that it can be processed in parallel.

RDD creation
============
There are two ways to create an RDD.

By loading an external dataset
------------------------------
For example, if there is a dataset books.txt and we need to create an RDD on it, we can pass the fully qualified dataset name, within double quotes, to the textFile method of the SparkContext object. We can then assign the result to an RDD called booksRDD.

By parallelizing a collection of objects
----------------------------------------
For example, we create a list containing two elements, 'red' and 'blue', and pass the list to the parallelize method of the SparkContext object. The result is assigned to an RDD called colorsRDD.
