Data_Science
An Introduction to Statistical Learning 豆瓣 Goodreads
9.8 (12 个评分) 作者: Gareth James / Daniela Witten Springer 2013 - 8
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
Introduction to Applied Linear Algebra 豆瓣
作者: Stephen Boyd / Lieven Vandenberghe Cambridge University Press 2018 - 6
This book is meant to provide an introduction to vectors, matrices, and least squares methods, basic topics in applied linear algebra. Our goal is to give the beginning student, with little or no prior exposure to linear algebra, a good grounding in the basic ideas, as well as an appreciation for how they are used in many applications, including data fitting, machine learning and artificial intelligence, to-mography, navigation, image processing, finance, and automatic control systems.
The background required of the reader is familiarity with basic mathematical notation. We use calculus in just a few places, but it does not play a critical role and is not a strict prerequisite. Even though the book covers many topics that are traditionally taught as part of probability and statistics, such as fitting mathematical models to data, no knowledge of or background in probability and statistics is needed.
The book covers less mathematics than a typical text on applied linear algebra. We use only one theoretical concept from linear algebra, linear independence, and only one computational tool, the QR factorization; our approach to most applications relies on only one method, least squares (or some extension). In this sense we aim for intellectual economy: With just a few basic mathematical ideas, con-cepts, and methods, we cover many applications. The mathematics we do present, however, is complete, in that we carefully justify every mathematical statement. In contrast to most introductory linear algebra texts, however, we describe many applications, including some that are typically considered advanced topics, like document classification, control, state estimation, and portfolio optimization.
The book does not require any knowledge of computer programming, and can be used as a conventional textbook, by reading the chapters and working the exercises that do not involve numerical computation. This approach however misses out on one of the most compelling reasons to learn the material: You can use the ideas and methods described in this book to do practical things like build a prediction model from data, enhance images, or optimize an investment portfolio. The growing power of computers, together with the development of high level computer languages and packages that support vector and matrix computation, have made it easy to use the methods described in this book for real applications. For this reason we hope that every student of this book will complement their study with computer programming exercises and projects, including some that involve real data. This book includes some generic exercises that require computation; additional ones, and the associated data files and language-specific resources, are available online.
If you read the whole book, work some of the exercises, and carry out computer exercises to implement or use the ideas and methods, you will learn a lot. While there will still be much for you to learn, you will have seen many of the basic ideas behind modern data science and other application areas. We hope you will be empowered to use the methods for your own applications.
The book is divided into three parts. Part I introduces the reader to vectors, and various vector operations and functions like addition, inner product, distance, and angle. We also describe how vectors are used in applications to represent word counts in a document, time series, attributes of a patient, sales of a product, an audio track, an image, or a portfolio of investments. Part II does the same for matrices, culminating with matrix inverses and methods for solving linear equa-tions. Part III, on least squares, is the payoff, at least in terms of the applications. We show how the simple and natural idea of approximately solving a set of over-determined equations, and a few extensions of this basic idea, can be used to solve many practical problems.
The whole book can be covered in a 15 week (semester) course; a 10 week (quarter) course can cover most of the material, by skipping a few applications and perhaps the last two chapters on nonlinear least squares. The book can also be used for self-study, complemented with material available online. By design, the pace of the book accelerates a bit, with many details and simple examples in parts I and II, and more advanced examples and applications in part III. A course for students with little or no background in linear algebra can focus on parts I and II, and cover just a few of the more advanced applications in part III. A more advanced course on applied linear algebra can quickly cover parts I and II as review, and then focus on the applications in part III, as well as additional topics.
We are grateful to many of our colleagues, teaching assistants, and students for helpful suggestions and discussions during the development of this book and the associated courses. We especially thank our colleagues Trevor Hastie, Rob Tibshirani, and Sanjay Lall, as well as Nick Boyd, for discussions about data fitting and classification, and Jenny Hong, Ahmed Bou-Rabee, Keegan Go, David Zeng, and Jaehyun Park, Stanford undergraduates who helped create and teach the course EE103. We thank David Tse, Alex Lemon, Neal Parikh, and Julie Lancashire for carefully reading drafts of this book and making many good suggestions.
Data Science for Business 豆瓣
作者: Foster Provost / Tom Fawcett O'Reilly Media 2013 - 8
Review
"A must-read resource for anyone who is serious about embracing the opportunity of big data."
-- Craig Vaughan
Global Vice President at SAP
"This book goes beyond data analytics 101. It's the essential guide for those of us (all of us?) whose businesses are built on the ubiquity of data opportunities and the new mandate for data-driven decision-making."
--Tom Phillips
CEO of Media6Degrees and Former Head of Google Search and Analytics
"Data is the foundation of new waves of productivity growth, innovation, and richer customer insight. Only recently viewed broadly as a source of competitive advantage, dealing well with data is rapidly becoming table stakes to stay in the game. The authors' deep applied experience makes this a must read--a window into your competitor's strategy."
-- Alan Murray
Serial Entrepreneur; Partner at Coriolis Ventures
"This timely book says out loud what has finally become apparent: in the modern world, Data is Business, and you can no longer think business without thinking data. Read this book and you will understand the Science behind thinking data."
-- Ron Bekkerman
Chief Data Officer at Carmel Ventures
"A great book for business managers who lead or interact with data scientists, who wish to better understand the principles and algorithms available without the technical details of single-disciplinary books."
-- Ronny Kohavi
Partner Architect at Microsoft Online Services Division
About the Author
Foster Provost is Professor and NEC Faculty Fellow at the NYU Stern School of Business where he teaches in the MBA, Business Analytics, and Data Science programs. His award-winning research is read and cited broadly. Prof. Provost has co-founded several successful companies focusing on data science for marketing.
Tom Fawcett holds a Ph.D. in machine learning and has worked in industry R&D for more than two decades for companies such as GTE Laboratories, NYNEX/Verizon Labs, and HP Labs. His published work has become standard reading in data science.
数据之巅 豆瓣
7.2 (8 个评分) 作者: 涂子沛 中信出版社 2014 - 5
《数据之巅:大数据革命,历史、现实与未来》从美国建国之基讲起,通过阐述初数时代、内战时代、镀金时代、进步时代、抽样时代、大数据时代的特征,系统梳理了美国数据文化的形成,阐述了其数据治国之道,论述了中国数据文化的薄弱之处,展望了未来数据世界的远景。
“尊重事实,用数据说话”,“推崇知识和理性,用数据创新”,作者不仅意在传承黄仁宇“数目字”管理的薪火,还试图把数据这个科技符号在中国转变为文化符号,形成一种文化话语体系。大数据正在撬动中国的制度创新、科技创新。
Trust in Numbers 豆瓣
作者: Theodore M. Porter Princeton University Press 2000 - 8
This investigation of the overwhelming appeal of quantification in the modern world discusses the development of cultural meanings of objectivity over two centuries. How are we to account for the current prestige and power of quantitative methods? The usual answer is that quantification is seen as desirable in social and economic investigation as a result of its successes in the study of nature. Theodore Porter is not content with this. Why should the kind of success achieved in the study of stars, molecules, or cells be an attractive model for research on human societies? he asks. And, indeed, how should we understand the pervasiveness of quantification in the sciences of nature? In his view, we should look in the reverse direction: comprehending the attractions of quantification in business, government, and social research will teach us something new about its role in psychology, physics, and medicine.
Drawing on a wide range of examples from the laboratory and from the worlds of accounting, insurance, cost-benefit analysis, and civil engineering, Porter shows that it is "exactly wrong" to interpret the drive for quantitative rigor as inherent somehow in the activity of science except where political and social pressures force compromise. Instead, quantification grows from attempts to develop a strategy of impersonality in response to pressures from outside. Objectivity derives its impetus from cultural contexts, quantification becoming most important where elites are weak, where private negotiation is suspect, and where trust is in short supply.
Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques (Morgan Kaufmann Series in Data Management Systems) 豆瓣
作者: Ian H. Witten / Eibe Frank Morgan Kaufmann 2016
Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, evaluating results, to the algorithmic methods at the heart of successful data mining approaches.
Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research.
Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projectsPresents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methodsIncludes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks-in an easy-to-use interactive interfaceIncludes open-access online courses that introduce practical applications of the material in the book
Principles of Data Integration 豆瓣
作者: Doan, AnHai; Halevy, Alon; Ives, Zachary 2012 - 7
How do you approach answering queries when your data is stored in multiple databases that were designed independently by different people? This is first comprehensive book on data integration and is written by three of the most respected experts in the field. This book provides an extensive introduction to the theory and concepts underlying today's data integration techniques, with detailed, instruction for their application using concrete examples throughout to explain the concepts. Data integration is the problem of answering queries that span multiple data sources (e.g., databases, web pages). Data integration problems surface in multiple contexts, including enterprise information integration, query processing on the Web, coordination between government agencies and collaboration between scientists. In some cases, data integration is the key bottleneck to making progress in a field. The authors provide a working knowledge of data integration concepts and techniques, giving you the tools you need to develop a complete and concise package of algorithms and applications. It offers a range of data integration solutions enabling you to focus on what is most relevant to the problem at hand. It enables you to build your own algorithms and implement your own data integration applications. It provides a companion website with numerous project-based exercises and solutions and slides. Links to commercially available software allowing readers to build their own algorithms and implement their own data integration applications. Facebook page for reader input during and after publication.