PySpark: Python API for Spark

############################# Video Source: www.youtube.com/watch?v=xc7Lc8RA8wE

UC Berkeley AMPLab member Josh Rosen presents PySpark, the new Python API for Spark, available in release 0.7. This presentation was given at the Spark meetup at Conviva in San Mateo, CA, on February 21, 2013. Download: http://spark-project.org/downloads/

Summary:
• 00:33 What is Spark?
• 03:00 What is PySpark?
• 03:45 Example: word count
• 04:35 Demonstration of the interactive shell on AWS EC2
• 06:22 Tracking time elapsed: %time berkeley_pages.count()
• 06:37 Spark web interface
• 09:14 Distributing data: sc.parallelize
• 11:20 API documentation
• 11:27 Python doctest: creating tests from interactive samples
• 11:58 Example: kmeans.py, k-means clustering
• 12:39 Getting help: help(sc)
• 13:00 Example: wordcount.py
• 13:18 PySpark implementation details
• 14:15 PySpark is less than 2K lines, including comments
• 17:18 Pickled objects: RDD[Array[Byte]]
• 17:44 Batching pickle to reduce overhead
• 18:00 Consolidating operations into a single pass when possible
• 19:27 PySpark roadmap: adding sorting support, file formats such as CSV, PyPy JIT

#############################
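
The "Batching pickle to reduce overhead" item (17:44) can be illustrated with a small standalone sketch. This is not PySpark's actual code. It uses only Python's standard pickle module, and the names dump_batches, load_batches, and batch_size are hypothetical, chosen for illustration. The idea it assumes is that serializing many records in one pickle call amortizes the fixed per-call cost, so fewer and larger byte blobs are produced than with one pickle per record.

```python
import pickle
from itertools import islice


def dump_batches(items, batch_size=1024):
    """Yield one pickled bytes blob per group of up to batch_size items.

    Hypothetical helper: groups records so that pickle.dumps is called
    once per batch rather than once per record.
    """
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield pickle.dumps(batch, protocol=2)


def load_batches(blobs):
    """Unpickle each blob and yield the original records one at a time."""
    for blob in blobs:
        for item in pickle.loads(blob):
            yield item


# Ten (word, count) records split into batches of four: 4 + 4 + 2.
records = [("word%d" % i, i) for i in range(10)]
blobs = list(dump_batches(records, batch_size=4))
restored = list(load_batches(blobs))
```

Each element of blobs is a plain byte string, which loosely corresponds to the RDD[Array[Byte]] representation mentioned at 17:18. The consumer sees the same records in the same order, regardless of the batch size chosen.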
