PySpark in Action

ISBN: 9781617297205
Author: Jonathan Rioux
Publisher: Manning Publications
Publication date: 2020-10
Language: English
Binding: Paperback
Price: USD 49.99
Pages: 425

Python data analysis at scale

Description

PySpark in Action is a carefully engineered tutorial that helps you use PySpark to deliver your data-driven applications at any scale. This clear and hands-on guide shows you how to enlarge your processing capabilities across multiple machines with data from any source, ranging from Hadoop-based clusters to Excel worksheets. You’ll learn how to break down big analysis tasks into manageable chunks and how to choose and use the best PySpark data abstraction for your unique needs. By the time you’re done, you’ll be able to write and run incredibly fast PySpark programs that are scalable, efficient to operate, and easy to debug.
What's inside
Packaging your PySpark code
Managing your data as it scales across multiple machines
Rewriting pandas, R, and SAS jobs in PySpark
Troubleshooting common data pipeline problems
Creating reliable long-running jobs
