Distributed Systems
CockroachDB: The Definitive Guide Douban
Author: Jesse Seldess / Ben Darnell O'Reilly Media 2022 - 5
Get the lowdown on CockroachDB, the elastic SQL database built to handle the demands of today's data-driven world. With this practical guide, software developers, architects, and DevOps teams will discover the advantages of building on a distributed SQL database. You'll learn how to create applications that scale elastically and provide seamless delivery for end users while remaining exceptionally resilient and indestructible.
Written from scratch for the cloud and architected to scale elastically to handle the demands of cloud native and open source, CockroachDB makes it easier to build and scale modern applications. If you're familiar with distributed systems, you'll quickly discover the benefits of strong data correctness and consistency guarantees as well as optimizations for delivering ultralow latencies to globally distributed end users.
With this thorough guide, you'll learn how to:
Plan and build applications for distributed infrastructure, including data modeling and schema design
Migrate data into CockroachDB
Read and write data and run ACID transactions across distributed infrastructure
Optimize queries for performance across geographically distributed replicas
Plan a CockroachDB deployment for resiliency across single-region and multiregion clusters
Secure, monitor, and optimize your CockroachDB deployment
Designing Data-Intensive Applications Douban Goodreads
9.4 (22 ratings) Author: Martin Kleppmann O'Reilly Media 2017 - 4
Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including relational databases, NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords?
In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain the same. With this book, software engineers and architects will learn how to apply those ideas in practice, and how to make full use of data in modern applications.
Peer under the hood of the systems you already use, and learn how to use and operate them more effectively
Make informed decisions by identifying the strengths and weaknesses of different tools
Navigate the trade-offs around consistency, scalability, fault tolerance, and complexity
Understand the distributed systems research upon which modern databases are built
Peek behind the scenes of major online services, and learn from their architectures
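Flink内核原理与实现 (Flink Kernel Principles and Implementation) Douban
Author: 冯飞 / 崔鹏云 2020 - 9
Flink内核原理与实现 is published by China Machine Press and written by three big data experts: 冯飞, 崔鹏云, and 陈冠华. Taking a whole-system view, it covers the basics (an introduction to Flink, installation, getting started with stream processing development, and monitoring and operations) as well as in-depth topics: Flink's notions of time; how windows are implemented, with code walkthroughs; the principles, key design decisions, and code of the fault tolerance mechanism; the full path of a job from source code to execution; job scheduling strategies; resource management; the type and serialization system; memory management; the key design and code of data exchange; and the RPC communication framework.
The book suits practitioners in big data development and operations who are interested in real-time computing, and is also helpful to machine learning engineers.
Spark技术内幕 (Spark Internals) Douban
Author: 张安站 2015 - 9
Spark is a much-watched newcomer in the growing family of big data analytics solutions. It not only provides an effective framework for processing distributed datasets but also processes them efficiently. It supports real-time, stream, and batch processing in a unified all-in-one solution, which makes Spark highly competitive.
Working from the source code, the book analyzes the design philosophy and architecture of the Spark core and systematically explains the implementation of each core module, providing a theoretical basis for performance tuning, custom development, and operations. It closes with a hands-on project that walks through developing, deploying, and tuning Spark applications in production.
Hadoop技术内幕 (Hadoop Internals) Douban
Author: 董西成 2013 - 11
From an application perspective, the book systematically covers YARN's basic libraries and component usage, application design methods, the popular computing frameworks that run on YARN (MapReduce, Tez, Storm, Spark), and several YARN-like open source resource management systems (Corona and Mesos). From a source code perspective, it analyzes YARN's design philosophy and basic architecture, how each component is implemented, and the implementation details of the computing frameworks.
The book has 13 chapters in four parts. Part 1 (Chapters 1-2) explains how to obtain, read, and debug the Hadoop source code, and introduces YARN's design ideas, basic architecture, and workflow. Part 2 (Chapters 3-7) uses the source code to dissect the basic usage and implementation details of YARN's third-party open source libraries, low-level communication library, service library, and event library; explains YARN application design methods; and analyzes the implementation of components such as the ResourceManager, the resource schedulers, and the NodeManager. Part 3 (Chapters 8-10) covers the offline computing framework MapReduce, the DAG computing framework Tez, the real-time computing framework Storm, and the in-memory computing framework Spark in detail. Part 4 (Chapters 11-13) first examines Facebook Corona and Apache Mesos in depth, then looks ahead to YARN's future direction. The appendices collect useful material such as a YARN installation guide, YARN configuration parameters, and Hadoop shell commands.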
Microservices with Spring Boot and Spring Cloud, 2nd Edition Douban
Author: Magnus Larsson Packt Publishing 2021 - 7
A step-by-step guide to creating and deploying production-quality microservices-based applications
Key Features
Build cloud-native production-ready microservices with this comprehensively updated guide
Understand the challenges of building large-scale microservice architectures
Learn how to get the best out of Spring Cloud, Kubernetes, and Istio in combination
Book Description
With this book, you'll learn how to efficiently build and deploy microservices. This new edition has been updated for the most recent versions of Spring, Java, Kubernetes, and Istio, demonstrating faster and simpler handling of Spring Boot, local Kubernetes clusters, and Istio installation. The expanded scope includes native compilation of Spring-based microservices, support for Windows & Mac, and an introduction to Helm 3 for packaging and deployment. A revamped security chapter now follows the OAuth 2.1 specification and makes use of the newly launched Spring Authorization Server from the Spring team.
Starting with a set of simple cooperating microservices, you'll add persistence and resilience, make your microservices reactive, and document their APIs using Swagger/OpenAPI.
You’ll understand how fundamental design patterns are applied to add important functionality, such as service discovery with Netflix Eureka and edge servers with Spring Cloud Gateway. You’ll learn how to deploy your microservices using Kubernetes and adopt Istio. You'll explore centralized log management using the Elasticsearch, Fluentd, and Kibana (EFK) stack and monitor microservices using Prometheus and Grafana.
By the end of this book, you'll be confident in building microservices that are scalable and robust using Spring Boot and Spring Cloud.
What you will learn
Build reactive microservices using Spring Boot
Develop resilient and scalable microservices using Spring Cloud
Use OAuth 2.1/OIDC and Spring Security to protect public APIs
Implement Docker to bridge the gap between development, testing, and production
Deploy and manage microservices with Kubernetes
Apply Istio for improved security, observability, and traffic management
Write and run manual and automated microservice tests with JUnit, testcontainers, Gradle, and bash
Who This Book Is For
This book is intended for Java and Spring developers and architects who want to learn how to build microservice landscapes from the ground up and deploy them either on-premises or in the cloud, using Kubernetes as a container orchestrator and Istio as a service mesh.
No familiarity with microservices architecture is required to get started with this book.
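分布式系统与一致性 (Distributed Systems and Consistency) Douban
Author: 陈东明 2021 - 6
Consistency is a critically important topic in distributed systems. Distributed systems have many properties, such as availability and reliability, and these are all related to and affected by consistency to some degree. Anyone who wants to study and master distributed technology thoroughly cannot avoid consistency, which is also the hardest problem to solve. The book mainly covers the consistency-related implementation details of GFS, HDFS, BigTable, MongoDB, RabbitMQ, ZooKeeper, Spanner, and CockroachDB, along with the important distributed algorithms Paxos, Raft, and Zab. It also covers transactional consistency and isolation levels, sequential consistency, linearizability, and strong consistency, as well as trade-offs in architecture design.
As a book on distributed technology, it covers fairly advanced material; as a book on distributed consistency, it is still an introduction.
June 26, 2021 Read
Borrowed via WeChat Read. Skimmed the frameworks and protocols. Taking 6.824 would actually give a deeper understanding of all of this.
2021 IT Software Engineering Distributed Systems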
Distributed Machine Learning Patterns Douban Goodreads
Author: Yuan Tang Manning Publications 2022 - 3
Practical patterns for scaling machine learning from your laptop to a distributed cluster.
In Distributed Machine Learning Patterns you will learn how to:
Apply distributed systems patterns to build scalable and reliable machine learning projects
Construct machine learning pipelines with data ingestion, distributed training, model serving, and more
Automate machine learning tasks with Kubernetes, TensorFlow, Kubeflow, and Argo Workflows
Make trade-offs between different patterns and approaches
Manage and monitor machine learning workloads at scale
Distributed Machine Learning Patterns teaches you how to scale machine learning models from your laptop to large distributed clusters. In it, you’ll learn how to apply established distributed systems patterns to machine learning projects, and explore new ML-specific patterns as well. Firmly rooted in the real world, this book demonstrates how to apply patterns using examples based in TensorFlow, Kubernetes, Kubeflow, and Argo Workflows. Real-world scenarios, hands-on projects, and clear, practical DevOps techniques let you easily launch, manage, and monitor cloud-native distributed machine learning pipelines.
about the technology
Scaling up models from standalone devices to large distributed clusters is one of the biggest challenges faced by modern machine learning practitioners. Distributed machine learning systems allow developers to handle extremely large datasets across multiple clusters, take advantage of automation tools, and benefit from hardware accelerations. In this book, Kubeflow co-chair Yuan Tang shares patterns, techniques, and experience gained from years spent building and managing cutting-edge distributed machine learning infrastructure.
about the book
Distributed Machine Learning Patterns is filled with practical patterns for running machine learning systems on distributed Kubernetes clusters in the cloud. Each pattern is designed to help solve common challenges faced when building distributed machine learning systems, including supporting distributed model training, handling unexpected failures, and dynamic model serving traffic. Real-world scenarios provide clear examples of how to apply each pattern, alongside the potential trade-offs for each approach. Once you’ve mastered these cutting-edge techniques, you’ll put them all into practice and finish up by building a comprehensive distributed machine learning system.
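Spring Cloud Douban
Author: 杨开振 2020 - 5
Spring Cloud微服务和分布式系统实践 (Spring Cloud Microservices and Distributed Systems in Practice) starts from real enterprise needs and combines theory with practice to explain Spring Cloud microservices and distributed systems in depth. It covers both the commonly used Spring Cloud components and general distributed systems knowledge. On the Spring Cloud side, it covers service registration and discovery (Eureka), service invocation (Ribbon and OpenFeign), circuit breakers (Hystrix and Resilience4j), gateways (Zuul and Gateway), configuration (Config), distributed tracing (Sleuth), and microservice monitoring (Admin). On the distributed systems side, it covers distributed databases, distributed caching, sessions and permissions, and ID generation. The practical part uses Apache Thrift to show how remote procedure calls (RPC) are applied in distributed systems, analyzes some common techniques for handling high concurrency, and ends with a simple example of building a microservice system.
The book is for Java developers of all levels, from beginners to working engineers, who want to learn Spring Cloud microservices and distributed systems development. Architects will also find it useful.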
Design Patterns for Cloud Native Applications: Patterns in Practice Using APIs, Data, Events, and Streams Douban
O'Reilly Media Inc. 2021 - 6
With the immense cost savings and scalability the cloud provides, the rationale for building cloud native applications is no longer in question. The real issue is how. With this practical guide, developers will learn about the most commonly used design patterns for building cloud native applications using APIs, data, events, and streams in both greenfield and brownfield development.
You'll learn how to incrementally design, develop, and deploy large and effective cloud native applications that you can manage and maintain at scale with minimal cost, time, and effort. Authors Kasun Indrasiri and Sriskandarajah Suhothayan highlight use cases that effectively demonstrate the challenges you might encounter at each step.
Learn the fundamentals of cloud native applications
Explore key cloud native communication, connectivity, and composition patterns
Learn decentralized data management techniques
Use event-driven architecture to build distributed and scalable cloud native applications
Explore the most commonly used patterns for API management and consumption
Examine some of the tools and technologies you'll need for building cloud native systems
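HBase原理与实践 (HBase Principles and Practice) Douban
Author: 胡争 / 范欣欣 China Machine Press 2019 - 9
The book systematically introduces HBase's fundamentals and operating mechanisms, drawing on the authors' years of development experience and practical techniques. Topics include HBase's architecture and system characteristics; its basic data structures and algorithms, dependent services, and client; the core modules of the RegionServer; the read and write paths; how compaction works and strategies for using it; the implementation and application of load balancing; crash recovery; replication, backup, and restore; and operations, system tuning, and case studies. It closes with the core technology of HBase 2.x and some advanced topics such as secondary indexes, single-row transactions, cross-row transactions, and HBase development and testing.
May 20, 2021 Reading. Rated so highly. Starting with compaction. Diagrams are always welcome.
Databases Distributed Systems
Kubernetes源码剖析 (Kubernetes Source Code Analysis) Douban
Author: 郑东旭 2020 - 6
Kubernetes源码剖析 analyzes how Kubernetes' core features are implemented and helps readers understand its architecture and internals. Because the Kubernetes codebase is large and hard to follow, the book organizes the relevant topics to help readers learn quickly. It has eight chapters: Chapter 1 briefly introduces the core components of the Kubernetes architecture and the role each plays; Chapter 2 covers the source code behind the Kubernetes build process; Chapter 3 covers the core data structure definitions and the core functionality built around resources; Chapter 4 covers the implementation of the kubectl command-line tool; Chapter 5 covers the implementation of the client-go programmatic client; Chapter 6 covers the core implementation of Etcd storage; Chapter 7 covers the core implementation of kube-apiserver; and Chapter 8 covers the core implementation of kube-scheduler.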
Certified Kubernetes Application Developer (CKAD) Study Guide Douban
Author: Benjamin Muschko O’Reilly Media, Inc. 2021 - 10
Developers with the ability to operate, troubleshoot, and monitor applications in Kubernetes are in high demand today. To meet this need, the Cloud Native Computing Foundation created a certification exam to establish a developer’s credibility and value in the job market to work in a Kubernetes environment.
The Certified Kubernetes Application Developer (CKAD) exam is different from the typical multiple-choice format of other certifications. Instead, the CKAD is a performance-based exam that requires deep knowledge of the tasks under immense time pressure.
This study guide walks you through all the topics you need to fully prepare for the exam covering Kubernetes 1.18. Author Benjamin Muschko also shares his personal experience with preparing for all aspects of the exam.
Learn when and how to apply Kubernetes concepts to manage an application
Understand the objectives, abilities, and tips and tricks needed to pass the CKAD exam
Explore the ins and outs of the kubectl command-line tool
Demonstrate competency for performing the responsibilities of a Kubernetes application developer
Solve real-world Kubernetes problems in a hands-on command-line environment
Navigate and solve questions during the CKAD exam
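January 23, 2021 Reading
Love it. The tips & tricks right at the start show that this is a practical book, aimed squarely at passing on the first attempt.
2021 Distributed Systems Software Engineering
HBase不睡觉书 (The HBase Book That Won't Put You to Sleep) Douban
Author: 杨曦 2018 - 1
HBase is a highly reliable, high-performance, column-oriented, scalable distributed storage system under Apache. With HBase, a large-scale storage cluster can be built on inexpensive PC servers. HBase supports high-performance real-time reads and writes over data sets in the billions while guaranteeing atomic data access.
The book has 10 chapters and moves from basic to advanced, explaining HBase concepts, installation, configuration, and deployment so that readers first get an intuitive feel for HBase; it then covers advanced usage, monitoring, and performance tuning from an application perspective. It serves beginners as well as readers who want to study HBase in depth.
It is for readers who have never used HBase, or who know it and want to master it, and is suited to HBase application developers and system administrators.
深入分布式缓存:从原理到实践 (Distributed Caching in Depth: From Principles to Practice) Douban
Author: 于君泽 / 曹洪伟 2017
This is the first book in China to examine distributed caching from the perspective of large-scale internet systems, covering principles, frameworks, architecture, and case studies.
As capacity demands on internet systems surge, many seemingly simple storage scenarios face serious capacity and stability risks, most of which can be avoided through sensible use of caching. Readers will gain experience in handling these problems and a systematic understanding of distributed caching.
The book has three parts, moving from theory to implementation to practice.
Part 1 covers the background of distributed caching, surveying the two key terms "distributed" and "cache" as the foundation for later chapters;
Part 2 covers mainstream caches in industry, focusing on their principles and implementation, including six caches or cache-like systems: Ehcache, Memcached, Redis, tair, EVCache, and Aerospike;
Part 3 discusses caching in practice in internet systems, taking five typical application types (advertising, social, news, e-commerce, and marketing), analyzing the performance and stability problems they face, and showing how distributed caching solves them.
October 14, 2020 Reading
The case studies at the end are worth a look for interview prep: the schema for a timeline feed and how to use the cache.
Distributed Systems Software Engineering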
The Practitioner's Guide to Graph Data Douban
Author: Matthias Broecheler / Denise Gosnell O'Reilly Media, Inc. 2020 - 4
This book will enable you to apply graph thinking to solve complex problems. If you want to learn how to build architectures for extracting value for your domain’s complex problems, then this book is for you.
You’ll learn how to think about your data as a graph, and how to determine if graph technology is right for your application. The book describes techniques for scalable, real-time, and multimodel architectures that solve complex problems, and shows how companies are successfully applying graph thinking in distributed production environments.
Authors Denise Koessler Gosnell and Matthias Broecheler also introduce the Graph Schema Language, a set of terminology and visual illustrations to normalize how graph practitioners communicate conceptual graph models, graph schema, and graph database design.
Cloud Native Spring in Action Douban
Author: Thomas Vitale Manning Publications 2021 - 6
Cloud Native Spring in Action teaches you effective Spring and Kubernetes cloud development techniques that you can immediately apply to enterprise-grade applications. It takes you step by step from your first idea through to production, showing how cloud native development can add business value at every stage of the software development lifecycle. As you develop an online bookshop, you’ll learn how to build and test a cloud native app with Spring, containerize it with Docker, and deploy it to the public cloud with Kubernetes. Including coverage of security, continuous delivery, and configuration, this hands-on guide is the perfect primer for navigating the increasingly complex cloud landscape.
Data Algorithms with Spark Douban
Author: Mahmoud Parsian O'Reilly Media, Inc. 2021
Apache Spark’s speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples for this framework using PySpark.
In each chapter, author Mahmoud Parsian shows you how to solve a data problem with a set of Spark transformations and algorithms. You’ll learn how to tackle problems involving ETL, design patterns, machine learning algorithms, data partitioning, and genomics analysis. Each detailed recipe includes PySpark algorithms using the PySpark driver and shell script.
With this book, you will:
Learn how to select Spark transformations for optimized solutions
Explore powerful transformations and reductions including reduceByKey(), combineByKey(), and mapPartitions()
Understand data partitioning for optimized queries
Design machine learning algorithms including Naive Bayes, linear regression, and logistic regression
Build and apply a model using PySpark design patterns
Apply motif finding algorithms to graph data
Analyze graph data by using the GraphFrames API
Apply PySpark algorithms to clinical and genomics data (such as DNA-Seq)
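For readers unfamiliar with the transformations named in the list above, here is a rough plain-Python sketch of what reduceByKey(), combineByKey(), and mapPartitions() do. It is not taken from the book and does not use PySpark: the "RDD" is modeled as a list of partitions, and the helper names (`map_partitions`, `combine_by_key`, `reduce_by_key`) are hypothetical stand-ins that only mimic the semantics of the real RDD methods.

```python
# Hypothetical stand-in for an RDD: a list of partitions,
# each partition being a list of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 4), ("a", 3)],
    [("b", 6), ("a", 5)],
]

def map_partitions(parts, f):
    """Call f once per partition (on an iterator), in the spirit of mapPartitions()."""
    return [list(f(iter(p))) for p in parts]

def combine_by_key(parts, create_combiner, merge_value, merge_combiners):
    """Mimic combineByKey(): combine inside each partition, then merge across partitions."""
    per_partition = []
    for p in parts:
        acc = {}
        for k, v in p:
            # First value for a key creates a combiner; later values are merged into it.
            acc[k] = merge_value(acc[k], v) if k in acc else create_combiner(v)
        per_partition.append(acc)
    merged = {}
    for acc in per_partition:
        for k, c in acc.items():
            merged[k] = merge_combiners(merged[k], c) if k in merged else c
    return merged

def reduce_by_key(parts, func):
    """reduceByKey() as the special case where the combiner has the same type as the value."""
    return combine_by_key(parts, lambda v: v, func, func)

# Per-key sum.
sums = reduce_by_key(partitions, lambda x, y: x + y)

# Per-key mean, using a (sum, count) combiner.
sum_count = combine_by_key(
    partitions,
    lambda v: (v, 1),
    lambda c, v: (c[0] + v, c[1] + 1),
    lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]),
)
means = {k: s / n for k, (s, n) in sum_count.items()}

# Count the records in each partition with a single call per partition.
partition_sizes = map_partitions(partitions, lambda it: [sum(1 for _ in it)])
```

In PySpark itself the corresponding calls are `rdd.reduceByKey(func)`, `rdd.combineByKey(createCombiner, mergeValue, mergeCombiners)`, and `rdd.mapPartitions(f)`; the sketch only illustrates why a combiner with a different type from the values, such as the (sum, count) pair, calls for combineByKey() rather than reduceByKey().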
Mastering Kafka Streams and ksqlDB Douban
Author: Mitch Seymour O'Reilly Media, Inc. 2021 - 3
Working with unbounded and fast-moving data streams has historically been difficult. But with Kafka Streams and ksqlDB, building stream processing applications is easy and fun. This practical guide explores the world of real-time data systems through the lens of these popular technologies and explains important stream processing concepts against a backdrop of interesting business problems.
Mitch Seymour, senior data systems engineer at Mailchimp, introduces you to both Kafka Streams and ksqlDB so that you can choose the best tool for each unique stream processing project. Non-Java developers will find the ksqlDB path to be an especially gentle introduction to stream processing. In this book, you’ll learn:
Basic and advanced uses of Kafka Streams and ksqlDB
How to transform, enrich, and process event streams
How to build both stateless and stateful stream processing applications
The different notions of time and the role it plays in stream processing
How to build event-driven microservices on top of continuous event streams
Features, operational characteristics, deployment patterns, and configuration tips for both technologies