NeoDB - 黑色外星皮鸽 - 标签

Hadoop: The Definitive Guide 豆瓣

作者: Tom White O'Reilly Media 2015 - 4

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.
Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You’ll learn about recent changes to Hadoop, and explore new case studies on Hadoop’s role in healthcare systems and genomics data processing.
Learn fundamental components such as MapReduce, HDFS, and YARN
Explore MapReduce in depth, including steps for developing applications with it
Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN
Learn two data formats: Avro for data serialization and Parquet for nested data
Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer)
Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop
Learn the HBase distributed database and the ZooKeeper distributed configuration service

Hadoop Application Architectures 豆瓣

作者: Mark Grover / Ted Malaska … O'Reilly Media 2015 - 4

With Early Release ebooks, you get books in their earliest form — the author's raw and unedited content as he or she writes — so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters as they're written, and the final ebook bundle.
Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case.
To reinforce those lessons, the book’s second section provides detailed examples of architecture used in some of the most commonly found Hadoop applications. Whether you’re designing and implementing a new Hadoop application, or planning to integrate Hadoop into your existing data infrastructure, Hadoop Application Architectures will skillfully guide you through the process.
The Early Release edition begins with chapters that concentrate on design considerations for Data Modeling and Data Movement in Hadoop:
Explore whether your application should store data on Hadoop Distributed File System (HDFS) or HBase
Get best practices for designing an HDFS or HBase schema
Learn how to design schemas for SQL-on-Hadoop (e.g. Hive, Impala, HCatalog) tables

Learning Spark 豆瓣

作者: Holden Karau / Andy Konwinski … O'Reilly Media 2015 - 2

Designing Data-Intensive Applications 豆瓣 Goodreads

9.4 (22 个评分) 作者: Martin Kleppmann O'Reilly Media 2017 - 4

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
Make informed decisions by identifying the strengths and weaknesses of different tools
Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
Understand the distributed systems research upon which modern databases are built
Peek behind the scenes of major online services, and learn from their architectures

大数据

Hadoop: The Definitive Guide 豆瓣

Hadoop Application Architectures 豆瓣

Learning Spark 豆瓣

Designing Data-Intensive Applications 豆瓣 Goodreads