rexarski's 2025 backlog

Effective Pandas 2: Opinionated Patterns for Data Manipulation [Book] Goodreads

Independently published 2024 - 1

PySpark in Action [Book] Douban Google Books Goodreads

Author: Jonathan Rioux Manning Publications 2020 - 10

PySpark in Action is a carefully engineered tutorial that helps you use PySpark to deliver your data-driven applications at any scale. This clear and hands-on guide shows you how to enlarge your processing capabilities across multiple machines with data from any source, ranging from Hadoop-based clusters to Excel worksheets. You’ll learn how to break down big analysis tasks into manageable chunks and how to choose and use the best PySpark data abstraction for your unique needs. By the time you’re done, you’ll be able to write and run incredibly fast PySpark programs that are scalable, efficient to operate, and easy to debug.
What's inside
Packaging your PySpark code
Managing your data as it scales across multiple machines
Re-writing Pandas, R, and SAS jobs in PySpark
Troubleshooting common data pipeline problems
Creating reliable long-running jobs

LLM Engineer's Handbook: Master the art of engineering large language models from concept to production [Book] Goodreads

Authors: Paul Iusztin / Maxime Labonne … Packt Publishing 2024 - 10

Step into the world of LLMs with this practical guide that takes you from the fundamentals to deploying advanced applications using LLMOps best practices.

Key Features
Build and refine LLMs step by step, covering data preparation, RAG, and fine-tuning
Learn essential skills for deploying and monitoring LLMs, ensuring optimal performance in production
Utilize preference alignment, evaluation, and inference optimization to enhance performance and adaptability of your LLM applications

Book Description
Artificial intelligence has undergone rapid advancements, and Large Language Models (LLMs) are at the forefront of this revolution. This LLM book offers insights into designing, training, and deploying LLMs in real-world scenarios by leveraging MLOps best practices. The guide walks you through building an LLM-powered twin that’s cost-effective, scalable, and modular. It moves beyond isolated Jupyter notebooks, focusing on how to build production-grade end-to-end LLM systems.

Throughout this book, you will learn data engineering, supervised fine-tuning, and deployment. The hands-on approach to building the LLM Twin use case will help you implement MLOps components in your own projects. You will also explore cutting-edge advancements in the field, including inference optimization, preference alignment, and real-time data processing, making this a vital resource for those looking to apply LLMs in their projects.

By the end of this book, you will be proficient in deploying LLMs that solve practical problems while maintaining low-latency and high-availability inference capabilities. Whether you are new to artificial intelligence or an experienced practitioner, this book delivers guidance and practical techniques that will deepen your understanding of LLMs and sharpen your ability to implement them effectively.

What you will learn
Implement robust data pipelines and manage LLM training cycles
Create your own LLM and refine it with the help of hands-on examples
Get started with LLMOps by diving into core MLOps principles such as orchestrators and prompt monitoring
Perform supervised fine-tuning and LLM evaluation
Deploy end-to-end LLM solutions using AWS and other tools
Design scalable and modular LLM systems
Learn about RAG applications by building a feature and inference pipeline

Who this book is for
This book is for AI engineers, NLP professionals, and LLM engineers looking to deepen their understanding of LLMs. Basic knowledge of LLMs and the Gen AI landscape, Python and AWS is recommended. Whether you are new to AI or looking to enhance your skills, this book provides comprehensive guidance on implementing LLMs in real-world scenarios.

Table of Contents
Understanding the LLM Twin Concept and Architecture
Tooling and Installation
Data Engineering
RAG Feature Pipeline
Supervised Fine-Tuning
Fine-Tuning with Preference Alignment
Evaluating LLMs
Inference Optimization
RAG Inference Pipeline
Inference Pipeline Deployment
MLOps and LLMOps

Basketball Analytics: Objective and Efficient Strategies for Understanding How Teams Win [Book] Goodreads

Authors: Stephen M. Shea / Christopher E. Baker CreateSpace Independent Publishing Platform 2013 - 11

Basketball Analytics is a must-read for any sports analytics enthusiast or student of the game of basketball. Authors Stephen Shea, Ph.D. (Professor of Mathematics) and Christopher Baker (Software Engineer) utilize their unique skill-set to introduce original metrics for analyzing player performance, team style and team construction in the NBA. While demonstrating an awareness of the industry’s best ideas, the authors present original, objective and efficient strategies for understanding how teams win. New player performance statistics include Offensive Efficiency (OE), Efficient Offensive Production (EOP), Defensive Stops Gained (DSG), and Approximate Value (AV). OE reflects a player’s ability to make the most fundamental offensive decisions. EOP adjusts a player’s points and assists based on his efficiency. DSG gives a complete measure of a player’s defensive contributions, without relying on individual player statistics like blocks and steals. AV is a measure of total player performance that rivals any publicly available statistic. Basketball Analytics introduces groundbreaking metrics on player involvement in the offense. Point, Rebound and Assist Balance aggregate player usage in these critical statistics. New studies on the NBA show whether teams should strive for balance or unbalance. An NBA draft pick value study determines the average value of each pick and the likelihood of landing a star or role player with each draft position. The results of this study are used to discuss topics including the biggest draft blunders and steals, the draft success of each NBA team, and the quality of each draft class dating back to 1977. This valuable understanding of the NBA Draft creates a foundation for discussing various approaches to team development and construction. Additionally, the authors discuss redefining the positions on the court, unpredictability in the game, data visualization, and applications of spatial tracking technology. 
There are many intensely debated questions surrounding the NBA today. Who are the most valuable players, and how do they compare to past greats? Which players have the greatest impact on their team’s defense? Should Kobe Bryant be concerned with getting his teammates involved in the offense? How do offenses differ in the clutch, and which players thrive in these situations? How difficult is it for a team to rebuild through the draft? Basketball Analytics introduces new statistics and new concepts to explore these questions and more.

Designing Machine Learning Systems [Book] Douban

Author: Chip Huyen O’Reilly 2022 - 6

Machine learning systems are both complex and unique. They are complex because they consist of many different components and involve many different stakeholders. They are unique because they are data-dependent, and data varies wildly from one use case to the next.
This book takes a holistic approach to designing machine learning systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements. It considers each design decision — e.g. how to create training data, what features to include, how to deploy, what to monitor, how often to retrain your model — in the context of how it can help the system as a whole achieve its objectives. The iterative framework laid out in this book is illustrated using actual case studies and backed by ample references.
Examples of the scenarios that this book can help you tackle:
You have been given a business problem and a lot of raw data. You want to engineer this data and choose the right metrics to solve this problem.
Your initial models perform well in offline experiments and you want to deploy them.
You have little feedback on how your models are performing after your models are deployed, and you want to figure out a way to quickly detect, debug, and address any issue your models might run into in production.
The process of developing, evaluating, deploying, and updating models for your team has been mostly manual, slow, and error-prone. You want to automate and improve this process.
Each machine learning use case in your organization has been deployed using its own workflow, and you want to lay down the foundation (e.g. model store, feature store, monitoring tools) that can be shared and reused across use cases.
You're worried that there might be biases in your machine learning systems and you want to make your systems responsible!

Fluent Python, 2nd Edition [Book] Douban

Author: Luciano Ramalho O'Reilly Media, Inc. 2021 - 1

Python’s simplicity lets you become productive quickly, but often this means you aren’t using everything it has to offer. With the updated edition of this hands-on guide, you’ll learn how to write effective, modern Python 3 code by leveraging its best ideas.
Don’t waste time bending Python to fit patterns you learned in other languages. Discover and apply idiomatic Python 3 features beyond your past experience. Author Luciano Ramalho guides you through Python’s core language features and libraries and teaches you how to make your code shorter, faster, and more readable.
Featuring major updates throughout the book, Fluent Python, second edition, covers:
Special methods: The key to the consistent behavior of Python objects
Data structures: Sequences, dicts, sets, Unicode, and data classes
Functions as objects: First-class functions, related design patterns, and type hints in function declarations
Object-oriented idioms: Composition, inheritance, mixins, interfaces, operator overloading, static typing and protocols
Control flow: Context managers, generators, coroutines, async/await, and thread/process pools
Metaprogramming: Properties, attribute descriptors, class decorators, and new class metaprogramming hooks that are simpler than metaclasses

Build a Large Language Model (From Scratch) [Book] Google Books

Author: Sebastian Raschka Simon and Schuster 2024 - 10

Learn how to create, train, and tweak large language models (LLMs) by building one from the ground up!

In Build a Large Language Model (from Scratch) bestselling author Sebastian Raschka guides you step by step through creating your own LLM. Each stage is explained with clear text, diagrams, and examples. You’ll go from the initial design and creation, to pretraining on a general corpus, and on to fine-tuning for specific tasks.

Build a Large Language Model (from Scratch) teaches you how to:

• Plan and code all the parts of an LLM
• Prepare a dataset suitable for LLM training
• Fine-tune LLMs for text classification and with your own data
• Use human feedback to ensure your LLM follows instructions
• Load pretrained weights into an LLM

Build a Large Language Model (from Scratch) takes you inside the AI black box to tinker with the internal systems that power generative AI. As you work through each key stage of LLM creation, you’ll develop an in-depth understanding of how LLMs work, their limitations, and their customization methods. Your LLM can be developed on an ordinary laptop, and used as your own personal assistant.

About the technology

Physicist Richard P. Feynman reportedly said, “I don’t understand anything I can’t build.” Based on this same powerful principle, bestselling author Sebastian Raschka guides you step by step as you build a GPT-style LLM that you can run on your laptop. This is an engaging book that covers each stage of the process, from planning and coding to training and fine-tuning.

About the book

Build a Large Language Model (From Scratch) is a practical and eminently-satisfying hands-on journey into the foundations of generative AI. Without relying on any existing LLM libraries, you’ll code a base model, evolve it into a text classifier, and ultimately create a chatbot that can follow your conversational instructions. And you’ll really understand it because you built it yourself!

What's inside

• Plan and code an LLM comparable to GPT-2
• Load pretrained weights
• Construct a complete training pipeline
• Fine-tune your LLM for text classification
• Develop LLMs that follow human instructions

About the reader

Readers need intermediate Python skills and some knowledge of machine learning. The LLM you create will run on any modern laptop and can optionally utilize GPUs.

About the author

Sebastian Raschka is a Staff Research Engineer at Lightning AI, where he works on LLM research and develops open-source software.

The technical editor on this book was David Caswell.

Table of Contents

1 Understanding large language models
2 Working with text data
3 Coding attention mechanisms
4 Implementing a GPT model from scratch to generate text
5 Pretraining on unlabeled data
6 Fine-tuning for classification
7 Fine-tuning to follow instructions
A Introduction to PyTorch
B References and further reading
C Exercise solutions
D Adding bells and whistles to the training loop
E Parameter-efficient fine-tuning with LoRA

S. [Book] Douban Google Books

S. - Ship of Theseus

7.4 (41 ratings) Authors: [US] J.J. Abrams / [US] Doug Dorst Translator: 颜湘如 CITIC Press Group 2016 - 6 Other title: 忒修斯之船 (Ship of Theseus)

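A literary giant whose identity is shrouded in mystery
In a library she picks up a copy of Ship of Theseus, whose author, Straka, is a man of unknown identity. According to the translator, Caldeira, he vanished before finishing the book, his fate unknown, leaving the world an unsolved case. Someone has penciled annotations in the margins, chasing the truth about Straka, and she cannot resist picking up a pen to join the discussion.
Word puzzles, a real adventure
Inside the book, an amnesiac man is abducted onto a mysterious ship whose bizarre crew takes him on a voyage that has no destination yet repeatedly foreshadows his fate. Outside the book, every betrayal, struggle, and massacre that Straka wrote about comes to pass in the real world, and Caldeira's translator's notes, seemingly muddled and full of holes, each turn out to conceal a hidden message.
Danger lurks between the lines
The two trade annotations; the material piles up, and they grow ever closer to each other. Just when they think they are finally nearing the truth, they discover a third person's handwriting. The characters in the book, the writer's fate, and the lives of the two readers themselves have long since been drawn into the mystery together.
S. comprises the hardcover antique book Ship of Theseus and 23 inserts in assorted materials: first-hand documents the two left behind across time, and important clues for your own part in this adventure. It is an ultimate reading experience that goes beyond the limits of the printed book; you become a party to the mystery, joining the two in uncovering the literary world's most dangerous secret.
Book lovers, let yourselves fall...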

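零时霓虹：Liam Wong 都市夜景摄影集 (Midnight Neon: Liam Wong's Urban Nightscape Photography) [Book] Douban

TO:KY:OO

Author: (UK) Liam Wong Translator: 王怡人 Hunan Fine Arts Publishing House 2022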

To:ky:oo [Book] Douban

Author: Liam Wong Thames and Hudson Ltd 2019 - 10

Synopsis
Liam Wong's debut monograph, a cyberpunk-inspired exploration of nocturnal Tokyo.
'I want to take real moments and transform them into something surreal, to make the viewer question the reality depicted in each photograph. This body of work encompasses my three years as a photographer and ultimately the completion of my debut photo series.'
Liam Wong
A testament to the deep art of colour composition, this publication - art directed by Wong himself and produced to the highest printing standard - brings together a complete and refined body of images that are evocative, timeless and completely transporting. Rounding out the book's special treatment is the first publication use of the 45/90 font, designed by Henrik Kubel, of London-based A2-TYPE. The book also features a section that reveals the creative and technical process of Wong's method, from identifying the right scene to composition, from capturing the essence of a moment to enhancing colour values and deepening an image's impact - insights that are invaluable to admirers and photography enthusiasts alike.

Effective Polars: Optimized Data Manipulation for Polars 1.0 [Book] Goodreads

Authors: Matt Harrison / Anique Khawar … Independently published 2024 - 7

How To Make The Best Coffee At Home [Book] Goodreads

Mitchell Beazley 2022 - 10

World-leading coffee expert and best-selling author of The World Atlas of Coffee shows you how to make barista-level coffee at home. We all expect to be able to buy an excellent cup of coffee from the many brilliant coffee shops available. But what about the coffee we make at home? Shouldn't that be just as good? James Hoffmann is an entrepreneur and the international name in coffee, combining expert-level knowledge with a wonderful ability to communicate it. James runs Square Mile Coffee, as well as creating extremely informative, and popular, coffee and equipment reviews for his YouTube and Instagram channels. In his latest book he demonstrates everything you need to know to make consistently excellent coffee at home, including: what equipment is worth buying, and what isn't; how to grind coffee; the basics of brewing for all major equipment (cafetiere, aeropress, stovetop etc); understanding coffee drinks, from the cortado to latte and the perfect espresso.

Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter [Book] Goodreads

O'Reilly Media 2022 - 9

Finding great data analysts is difficult. Despite the explosive growth of data in industries ranging from manufacturing and retail to high technology, finance, and healthcare, learning and accessing data analysis tools has remained a challenge. This pragmatic guide will help train you in one of the most important tools in the field: Python. Filled with practical case studies, Python for Data Analysis demonstrates the nuts and bolts of manipulating, processing, cleaning, and crunching data with Python. It also serves as a modern introduction to scientific computing in Python for data-intensive applications. Learn about the growing field of data analysis from an expert in the community.
Learn everything you need to start doing real data analysis work with Python
Get the most complete instruction on the basics of the “modern scientific Python platform”
Learn from an insider who builds tools for the scientific stack
Get an excellent introduction for novices and a wealth of advanced methods for experienced analysts

Learning Spark [Book] Google Books

Authors: Jules S. Damji / Brooke Wenig … O'Reilly Media, Inc. 2020 - 07

Data is bigger, arrives faster, and comes in a variety of formats, and it all needs to be processed at scale for analytics or machine learning. But how can you process such varied workloads efficiently? Enter Apache Spark.
Updated to include Spark 3.0, this second edition shows data engineers and data scientists why structure and unification in Spark matters. Specifically, this book explains how to perform simple and complex data analytics and employ machine learning algorithms. Through step-by-step walk-throughs, code snippets, and notebooks, you’ll be able to:
Learn Python, SQL, Scala, or Java high-level Structured APIs
Understand Spark operations and SQL Engine
Inspect, tune, and debug Spark operations with Spark configurations and Spark UI
Connect to data sources: JSON, Parquet, CSV, Avro, ORC, Hive, S3, or Kafka
Perform analytics on batch and streaming data using Structured Streaming
Build reliable data pipelines with open source Delta Lake and Spark
Develop machine learning pipelines with MLlib and productionize models using MLflow

Trustworthy Online Controlled Experiments [Book] Douban

Authors: Ron Kohavi / Diane Tang … Cambridge University Press 2020 - 5

Getting numbers is easy; getting numbers you can trust is hard. This practical guide by experimentation leaders at Google, LinkedIn, and Microsoft will teach you how to accelerate innovation using trustworthy online controlled experiments, or A/B tests. Based on practical experiences at companies that each run more than 20,000 controlled experiments a year, the authors share examples, pitfalls, and advice for students and industry professionals getting started with experiments, plus deeper dives into advanced topics for practitioners who want to improve the way they make data-driven decisions. Learn how to:
• Use the scientific method to evaluate hypotheses using controlled experiments
• Define key metrics and ideally an Overall Evaluation Criterion
• Test for trustworthiness of the results and alert experimenters to violated assumptions
• Build a scalable platform that lowers the marginal cost of experiments close to zero
• Avoid pitfalls like carryover effects and Twyman's law
• Understand how statistical issues play out in practice

Feature Engineering and Selection [Book] Douban

Authors: Max Kuhn / Kjell Johnson Chapman and Hall/CRC 2019 - 8

The process of developing predictive models includes many stages. Most resources focus on the modeling algorithms but neglect other critical aspects of the modeling process. This book describes techniques for finding the best representations of predictors for modeling and for finding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques along with R programs for reproducing the results.

Forecasting [Book] Douban

Authors: Rob J Hyndman / George Athanasopoulos OTexts; 2 edition 2018 - 5

Forecasting is required in many situations. Deciding whether to build another power generation plant in the next five years requires forecasts of future demand. Scheduling staff in a call centre next week requires forecasts of call volumes. Stocking an inventory requires forecasts of stock requirements. Telecommunication routing requires traffic forecasts a few minutes ahead. Whatever the circumstances or time horizons involved, forecasting is an important aid in effective and efficient planning. This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly. Examples use R with many data sets taken from the authors' own consulting experience. In this second edition, all chapters have been updated to cover the latest research and forecasting methods. Three new chapters have been added on dynamic regression forecasting, hierarchical forecasting and practical forecasting issues. The latest version of the book is freely available online at http://OTexts.com/fpp2.

SprawlBall: A Visual Tour of the New Era of the NBA [Book] Douban

Author: Kirk Goldsberry Houghton Mifflin Harcourt 2019 - 4

What is SprawlBall?
The recent change in the NBA isn’t from large ball to smallball. It’s from large ball to sprawlball. Many of the most powerful areas on the court are now sprawling out from the interior, decorating the edges. And the men who thrive there are now the game’s most powerful players… Sprawlball explores the past, present, and future of a league in transition and attempts to capture the nature of this surreal time in the NBA.

Hoop Atlas [Book] Google Books

Author: Kirk Goldsberry HarperCollins 2024 - 05

"When discussing greatness, we have too often distilled it down to how many rings someone has won, neglecting to celebrate the incredible influence that players have had on the evolution of basketball. Kirk's book beautifully explains this other side of greatness: how one player's skill and style of play can change the game forever." - JJ Redick
The bestselling author of Sprawlball, Kirk Goldsberry returns with a visual feast of a book (equal parts Book of Basketball and Shea Serrano) that uses sharp writing, his signature graphics, and cutting-edge statistical analyses to unpack how a handful of NBA superstars (MJ to LeBron to Jokic) have reshaped pro basketball and charted the course to the future of the NBA.
Every few years a talent comes along that disrupts everything we think we know about how the NBA should work. Whether it’s scoring, playmaking, or shooting, these are players and tactics that fundamentally challenge how the game is played and what greatness looks like on a basketball court. For a period of time, these players each become an “Atlas” for the league, carrying the weight of the NBA on their shoulders, but also providing the roadmap that points the way to the future of the sport. In tandem, they map out the modern NBA’s creation.
Goldsberry returns with a highly visual, electrifying tour through the last three decades of NBA history, showing the “Atlas” players that have led us out of the brutishness of 90s hoops and into the wide open spaces of the most skilled era in NBA history. Charting the course from Jordan to Jokic, with plenty of stops along the way for Iverson, Kobe, Curry, and of course LeBron, Goldsberry, who was instrumental in helping spur the NBA’s statistical revolution, has designed a vibrant new way to compare and debate the contributions of the best NBA players of all-time.
Masterfully connecting NBA past and present through incisive writing and stunning visual statistical analyses, he shows how we’ve come to this unprecedented moment, a time when offensive efficiency and shooting percentages are higher than ever.
Using beautifully designed, four-color shot maps and illustrations, Goldsberry offers a graphic journey through the last thirty years of the NBA that covers up to the 2023 season and is as much fun to look at as it is to read. The end result offers stories and analyses of a select group of NBA superstars that open up lively debates, reveal just how singular their talents truly are, and characterize the dramatic 21st-century metamorphosis of the best basketball league in the world.

意外的旅程 (An Unexpected Journey) [Book] Douban

Author: 许知远 (Xu Zhiyuan) Yunnan People's Publishing House 2024 - 2

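Fifteen years of Xu Zhiyuan's wanderings, gathered for the first time into a "travel trilogy"
Exploring the world and crossing China, revisiting forgotten history through the unfamiliar and the chance encounter
In an age of weariness, follow an idler's observation, insight, and imagination
Volume 1: From Heihe to Tengchong
Volume 2: Kolkata, Cairo, and the Happiest Country
Volume 3: Malacca, Honolulu, and Nagai Kafu's Asakusa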
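[Editor's Recommendation]
🌟 Xu Zhiyuan's "travel trilogy" from fifteen years of wandering
This set collects the essays Xu Zhiyuan wrote over the past fifteen years while traveling around the world and across China.
Volumes 1 and 2 are newly revised editions of his popular earlier books 《祖国的陌生人》 (Strangers in the Motherland) and 《一个游荡者的世界》 (A Wanderer's World), specially accompanied by photographs taken by his travel companions. Volume 3 is travel writing published here for the first time: because of the global pandemic, he unexpectedly found himself repeatedly staying in and moving between Hawaii, Japan, and Malaysia. It was the journey set off by this "accident" that, almost prophetically, became the keynote of the fifteen years of wandering: the unfamiliar, the chance encounter, and the overturned are windows for understanding the world and history anew.
🌟 An idler revisits forgotten history through the unfamiliar and the chance encounter
After fifteen years on the road, Xu Zhiyuan still tries to cut open different cultural landscapes with a sharp, keen eye; he is still tirelessly curious about, observant of, and eager to understand the lives of strangers; and he has never lost his reflection on real history during his travels (nor his candor about the bewilderment that keeps surfacing). He has even begun to long for the "unexpected": perhaps after walking for so long he has finally discovered that this lasting unease, which once made him uncomfortable, is the source of all strength.
🌟 Specially included: rare scenes and conversations from Xu Zhiyuan's early visits with Jia Zhangke, Yu Hua, Chen Danqing, and others
Volume 1 specially includes rare on-the-scene writing and conversations from Xu Zhiyuan's early visits with director Jia Zhangke, photographer Liu Heung Shing, writer Yu Hua, and painter Chen Danqing.
Unlike the face-to-face interview transcripts of "十三邀" (Thirteen Invitations), these pieces not only let you hear the period flavor and clash of ideas from a dozen or so years ago (only a dozen years have passed, yet that era already feels unfamiliar); each is also filled with what the author felt in the moment, surrounded by the other person's movements as they walked, their expressions as they talked, even the brief silences. The finely grained inner dialogue, along with the author's constantly shifting perspective as the conversation unfolds, takes readers back to the scene and straight into the author's mind.
🌟 Compact bunko-format edition, cutting-edge design, with color endpapers signed by Xu Zhiyuan
In bunko (pocket paperback) size, the books are light to read and easy to carry, well suited to browsing while traveling or on the move.
In keeping with the theme of the "unexpected", the slipcase uses a simple typographic layout, rational and restrained, symbolizing the reflective, serious side of the author's travel essays. The inner covers are bold and edgy: the three titles overlap and coincide repeatedly and are rendered in a variety of Chinese character forms, much like the many destinations, countless accidents, and new forces that kept converging or colliding over fifteen years of wandering. Placed side by side, the three covers form a single complete image, just as these meaningful yet unrelated chance encounters together shaped the person doing the wandering.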
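[Synopsis]
This set collects the travel essays the author has written since 2010 while traveling in China and around the world. In the first part of Volume 1, the author crosses China along the Aihui-Tengchong Line, re-examining real slices of contemporary Chinese society and the complex faces of its history; in the second part, he visits cultural figures such as Jia Zhangke, Yu Hua, Liu Heung Shing, and Chen Danqing along the way, engaging in wide-ranging conversation and questioning in the style of "十三邀" (Thirteen Invitations). In Volume 2, the author's footsteps span different cultural regions including Bhutan, Eastern Europe, and India, aiming to present a more culturally diverse map of the world and so open up more possibilities for seeing the relationship between the self and the world. Volume 3 centers on the author's stays in and movements between Malaysia, Hawaii, and Japan in 2020; it not only sketches the different faces of each place in crisis and the real lives of ordinary people, but also discusses in depth how each place has responded to crises historically, uncovering the forgotten history behind them.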