NeoDB - 笑匠 - Tag

Data Science

Practical Statistics for Data Scientists (2/e) Douban

Authors: Peter Gedeck / Andrew Bruce …; O'Reilly Media, Inc., May 2020

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. The second edition of this practical guide—now including examples in Python as well as R—explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what’s important and what’s not.
Many data scientists use statistical methods but lack a deeper statistical perspective. If you’re familiar with the R or Python programming languages, and have had some exposure to statistics but want to learn more, this quick reference bridges the gap in an accessible, readable format.
With this updated edition, you’ll dive into:
Exploratory data analysis
Data and sampling distributions
Statistical experiments and significance testing
Regression and prediction
Classification
Statistical machine learning
Unsupervised learning

An Introduction to Statistical Learning Douban Goodreads

9.8 (12 ratings). Authors: Gareth James / Daniela Witten …; Springer, August 2013

An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.

Understanding Advanced Statistical Methods Douban

Authors: Peter Westfall / Kevin S. S. Henning; Chapman and Hall/CRC, May 2013

Providing a much-needed bridge between elementary statistics courses and advanced research methods courses, Understanding Advanced Statistical Methods helps students grasp the fundamental assumptions and machinery behind sophisticated statistical topics, such as logistic regression, maximum likelihood, bootstrapping, nonparametrics, and Bayesian methods. The book teaches students how to properly model, think critically, and design their own studies to avoid common errors. It leads them to think differently not only about math and statistics but also about general research and the scientific method. With a focus on statistical models as producers of data, the book enables students to more easily understand the machinery of advanced statistics. It also downplays the "population" interpretation of statistical models and presents Bayesian methods before frequentist ones. Requiring no prior calculus experience, the text employs a "just-in-time" approach that introduces mathematical topics, including calculus, where needed. Formulas throughout the text are used to explain why calculus and probability are essential in statistical modeling. The authors also intuitively explain the theory and logic behind real data analysis, incorporating a range of application examples from the social, economic, biological, medical, physical, and engineering sciences. Enabling your students to answer the why behind statistical methods, this text teaches them how to successfully draw conclusions when the premises are flawed. It empowers them to use advanced statistical methods with confidence and develop their own statistical recipes. Ancillary materials are available on the book's website.

Statistical Rethinking Douban

Author: Richard McElreath; Chapman and Hall/CRC, 2015

Statistical Rethinking: A Bayesian Course with Examples in R and Stan builds readers’ knowledge of and confidence in statistical modeling. Reflecting the need for even minor programming in today’s model-based statistics, the book pushes readers to perform step-by-step calculations that are usually automated. This unique computational approach ensures that readers understand enough of the details to make reasonable choices and interpretations in their own modeling work.
The text presents generalized linear multilevel models from a Bayesian perspective, relying on a simple logical interpretation of Bayesian probability and maximum entropy. It covers everything from the basics of regression to multilevel models. The author also discusses measurement error, missing data, and Gaussian process models for spatial and network autocorrelation.
By using complete R code examples throughout, this book provides a practical foundation for performing statistical inference. Designed for both PhD students and seasoned professionals in the natural and social sciences, it prepares them for more advanced or specialized statistical modeling.

Applied Predictive Modeling Douban Goodreads

Authors: Max Kuhn / Kjell Johnson; Springer, September 2013

This text is intended for a broad audience as both an introduction to predictive models and a guide to applying them. Non-mathematical readers will appreciate the intuitive explanations of the techniques, while an emphasis on problem-solving with real data across a wide variety of applications will aid practitioners who wish to extend their expertise. Readers should have knowledge of basic statistical ideas, such as correlation and linear regression analysis. While the text is biased against complex equations, a mathematical background is needed for advanced topics. Dr. Kuhn is a Director of Non-Clinical Statistics at Pfizer Global R&D in Groton, Connecticut. He has been applying predictive models in the pharmaceutical and diagnostic industries for over 15 years and is the author of a number of R packages. Dr. Johnson has more than a decade of statistical consulting and predictive modeling experience in pharmaceutical research and development. He is a co-founder of Arbor Analytics, a firm specializing in predictive modeling, and is a former Director of Statistics at Pfizer Global R&D. His scholarly work centers on the application and development of statistical methodology and learning algorithms.

Programming Skills for Data Science Douban

Authors: Michael Freeman / Joel Ross; Addison-Wesley, November 2018

The Elements of Statistical Learning Douban Goodreads

9.8 (9 ratings). Authors: Trevor Hastie / Robert Tibshirani …; Springer, October 2009

During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees, and boosting (the first comprehensive treatment of this topic in any book). This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.

R for Data Science Douban

9.8 (10 ratings). Authors: Hadley Wickham / Garrett Grolemund; O'Reilly Media, 2016

http://r4ds.had.co.nz/

笑匠

@QinX_4762@neodb.social

Better to press forward on a single thought than to halt on one.
- The Grandmaster (《一代宗师》)

------

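Even a frog at the bottom of a well has its own patch of sky, and a thirteen-year-old child can have a spiritual home too.
- Wang Xiaobo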

------

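Be humble, content with simplicity, and in no hurry to offer what you do not actually have. Hold fast to the simple, the most basic; if she later grows strong on her own, then I will portray her, as she is.
- Ma Yan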

------

A so-called bottomless abyss: going down into it, too, leads to a boundless road ahead.
- Mu Xin

------

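Joyful as a dream
Happiness, I refuse to only imagine it
On waking, my frightened heart hangs in my belly, searching for air
By each other's side
Life should never be forever this lost and wavering
It must be more than dreams, illusions, and dim moonlight
- 《生之响往》 ("Yearning for Life")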

------

Without the freedom to criticize, praise is meaningless.
Sans la liberté de blâmer, il n'est point d'éloge flatteur.
- Pierre-Augustin Caron de Beaumarchais