數據挖掘
Data Mining 豆瓣
作者: Jiawei Han / Micheline Kamber Morgan Kaufmann 2011 - 7
The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, it's still always evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third of edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining stream, mining social networks, and mining spatial, multimedia and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects. * Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields. *Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data
An Introduction to Statistical Learning 豆瓣 Goodreads
9.8 (12 个评分) 作者: Gareth James / Daniela Witten Springer 2013 - 8
An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra.
The Elements of Statistical Learning 豆瓣 Goodreads
9.8 (9 个评分) 作者: Trevor Hastie / Robert Tibshirani Springer 2009 - 10
During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression & path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates.
集体智慧编程 豆瓣
Programming Collective Intelligence
8.3 (15 个评分) 作者: Toby Segaran 译者: 莫映 / 王开福 电子工业出版社 2009 - 1
本书以机器学习与计算统计为主题背景,专门讲述如何挖掘和分析Web上的数据和资源,如何分析用户体验、市场营销、个人品味等诸多信息,并得出有用的结论,通过复杂的算法来从Web网站获取、收集并分析用户的数据和反馈信息,以便创造新的用户价值和商业价值。全书内容翔实,包括协作过滤技术(实现关联产品推荐功能)、集群数据分析(在大规模数据集中发掘相似的数据子集)、搜索引擎核心技术(爬虫、索引、查询引擎、PageRank算法等)、搜索海量信息并进行分析统计得出结论的优化算法、贝叶斯过滤技术(垃圾邮件过滤、文本过滤)、用决策树技术实现预测和决策建模功能、社交网络的信息匹配技术、机器学习和人工智能应用等。
本书是Web开发者、架构师、应用工程师等的绝佳选择。
Mining of Massive Datasets 豆瓣
作者: Anand Rajaraman / Jeffrey David Ullman Cambridge University Press 2011
The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets. It begins with a discussion of the map-reduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing. The PageRank idea and related tricks for organizing the Web are covered next. Other chapters cover the problems of finding frequent itemsets and clustering. The final chapters cover two applications: recommendation systems and Web advertising, each vital in e-commerce. Written by two authorities in database and Web technologies, this book is essential reading for students and practitioners alike.
Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications 豆瓣
作者: Gary Miner / John Elder Academic Press 2012 - 1
Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis. Winner of a 2012 PROSE Award in Computing and Information Sciences from the Association of American Publishers, this book presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze results. In addition to providing an in-depth examination of core text mining and link detection tools, methods and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection using real world example tutorials in such varied fields as corporate, finance, business intelligence, genomics research, and counter terrorism activities. The world contains an unimaginably vast amount of digital information which is getting ever vaster ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, the textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text that it contains diminishes, the value of text mining for information retrieval and search will increase dramatically. Features: extensive case studies, most in a tutorial format, allow the reader to 'click through' the example using a software program, thus learning to conduct text mining analyses in the most rapid manner of learning possible; numerous examples, tutorials, power points and datasets available via companion website on Elsevierdirect.com; and glossary of text mining terms provided in the appendix -CD included.
Automated Data Collection with R 豆瓣
作者: Simon Munzert / Christian Rubba Wiley 2015 - 1
A hands on guide to web scraping and text mining for bothbeginners and experienced users of R
(1)Introduces fundamental concepts of the main architecture of theweb and databases and covers HTTP, HTML, XML, JSON, SQL.
(2)Provides basic techniques to query web documents and data sets(XPath and regular expressions).
(3)An extensive set of exercises are presented to guide thereader through each technique.
(4)Explores both supervised and unsupervised techniques as well asadvanced techniques such as data scraping and text management.
(5)Case studies are featured throughout along with examples foreach technique presented.
(6)R code and solutions to exercises featured in thebook are provided on a supporting website.
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy 豆瓣 Goodreads Sukkertoppen
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
8.3 (6 个评分) 作者: Cathy O'Neil Crown 2016 - 9 其它标题: Weapons of Math Destruction
We live in the age of the algorithm. Increasingly, the decisions that affect our lives--where we go to school, whether we can get a job or a loan, how much we pay for health insurance--are being made not by humans, but by machines. In theory, this should lead to greater fairness: Everyone is judged according to the same rules.
But as mathematician and data scientist Cathy O'Neil reveals, the mathematical models being used today are unregulated and uncontestable, even when they're wrong. Most troubling, they reinforce discrimination--propping up the lucky, punishing the downtrodden, and undermining our democracy in the process.
Learning From Data 豆瓣
10.0 (7 个评分) 作者: Yaser S. Abu-Mostafa / Malik Magdon-Ismail AMLBook 2012 - 3
Machine learning allows computational systems to adaptively improve their performance with experience accumulated from the observed data. Its techniques are widely applied in engineering, science, finance, and commerce. This book is designed for a short course on machine learning. It is a short course, not a hurried course. From over a decade of teaching this material, we have distilled what we believe to be the core topics that every student of the subject should know. We chose the title `learning from data' that faithfully describes what the subject is about, and made it a point to cover the topics in a story-like fashion. Our hope is that the reader can learn all the fundamentals of the subject by reading the book cover to cover. ---- Learning from data has distinct theoretical and practical tracks. In this book, we balance the theoretical and the practical, the mathematical and the heuristic. Our criterion for inclusion is relevance. Theory that establishes the conceptual framework for learning is included, and so are heuristics that impact the performance of real learning systems. ---- Learning from data is a very dynamic field. Some of the hot techniques and theories at times become just fads, and others gain traction and become part of the field. What we have emphasized in this book are the necessary fundamentals that give any student of learning from data a solid foundation, and enable him or her to venture out and explore further techniques and theories, or perhaps to contribute their own. ---- The authors are professors at California Institute of Technology (Caltech), Rensselaer Polytechnic Institute (RPI), and National Taiwan University (NTU), where this book is the main text for their popular courses on machine learning. The authors also consult extensively with financial and commercial companies on machine learning applications, and have led winning teams in machine learning competitions.
Big Data Baseball 豆瓣
作者: Travis Sawchik Flatiron Books 2015 - 5
After twenty consecutive losing seasons for the Pittsburgh Pirates, team morale was low, the club's payroll ranked near the bottom of the sport, game attendance was down, and the city was becoming increasingly disenchanted with its team. Pittsburghers joked their town was the city of champions…and the Pirates. Big Data Baseball is the story of how the 2013 Pirates, mired in the longest losing streak in North American pro sports history, adopted drastic big-data strategies to end the drought, make the playoffs, and turn around the franchise's fortunes.
Award-winning journalist Travis Sawchik takes you behind the scenes to expertly weave together the stories of the key figures who changed the way the small-market Pirates played the game. For manager Clint Hurdle and the front office staff to save their jobs, they could not rely on a free agent spending spree, instead they had to improve the sum of their parts and find hidden value. They had to change. From Hurdle shedding his old-school ways to work closely with Neal Huntington, the forward-thinking data-driven GM and his team of talented analysts; to pitchers like A. J. Burnett and Gerrit Cole changing what and where they threw; to Russell Martin, the undervalued catcher whose expert use of the nearly-invisible skill of pitch framing helped the team's pitchers turn more balls into strikes; to Clint Barmes, a solid shortstop and one of the early adopters of the unconventional on-field shift which forced the entire infield to realign into positions they never stood in before. Under Hurdle's leadership, a culture of collaboration and creativity flourished as he successfully blended whiz kid analysts with graybeard coaches―a kind of symbiotic teamwork which was unique to the sport.
Big Data Baseball is Moneyball on steroids. It is an entertaining and enlightening underdog story that uses the 2013 Pirates season as the perfect lens to examine the sport's burgeoning big-data movement. With the help of data-tracking systems like PitchF/X and TrackMan, the Pirates collected millions of data points on every pitch and ball in play to create a tome of color-coded reports that revealed groundbreaking insights for how to win more games without spending a dime. In the process, they discovered that most batters struggled to hit two-seam fastballs, that an aggressive defensive shift on the field could turn more batted balls into outs, and that a catcher's most valuable skill was hidden. All these data points which aren't immediately visible to players and spectators, are the bit of magic that led the Pirates to spin straw in to gold, finish the 2013 season in second place, end a twenty-year losing streak.
Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques (Morgan Kaufmann Series in Data Management Systems) 豆瓣
作者: Ian H. Witten / Eibe Frank Morgan Kaufmann 2016
Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, evaluating results, to the algorithmic methods at the heart of successful data mining approaches.
Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research.
Provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projectsPresents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methodsIncludes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks-in an easy-to-use interactive interfaceIncludes open-access online courses that introduce practical applications of the material in the book
Principles and Practice of Structural Equation Modeling, Second Edition (Methodology In The Social Sciences) 豆瓣
作者: Rex B. Kline The Guilford Press 2004 - 9
This popular text provides an accessible guide to the application, interpretation, and pitfalls of structural equation modeling (SEM). Reviewed are fundamental statistical concepts--such as correlation, regressions, data preparation and screening, path analysis, and confirmatory factor analysis--as well as more advanced methods, including the evaluation of nonlinear effects, measurement models and structural regression models, latent growth models, and multilevel SEM. Special features include a Web page offering data and program syntax files for many of the research examples, electronic overheads that can be downloaded and printed by instructors or students, and links to SEM-related resources.
Hiding from the Internet 豆瓣
作者: Michael Bazzell CreateSpace Independent Publishing Platform 2016 - 1
Take control of your privacy by removing your personal information from the internet with this second edition.
Author Michael Bazzell has been well known in government circles for his ability to locate personal information about anyone through the internet. In Hiding from the Internet: Eliminating Personal Online Information, he exposes the resources that broadcast your personal details to public view. He has researched each source and identified the best method to have your private details removed from the databases that store profiles on all of us.
This book will serve as a reference guide for anyone that values privacy. Each technique is explained in simple steps. It is written in a hands-on style that encourages the reader to execute the tutorials as they go. The author provides personal experiences from his journey to disappear from public view.
Much of the content of this book has never been discussed in any publication. Always thinking like a hacker, the author has identified new ways to force companies to remove you from their data collection systems. This book exposes loopholes that create unique opportunities for privacy seekers. Among other techniques, you will learn to:
Remove your personal information from dozens of public databases and people search websites
Create free anonymous mail addresses, email addresses, and telephone numbers
Control your privacy settings on social networks and remove sensitive data
Provide disinformation to conceal true private details
Force data brokers to stop sharing your information with both private and public organizations
Prevent marketing companies from monitoring your browsing, searching, and shopping habits
Remove your landline and cellular telephone numbers from online websites
Use a credit freeze to eliminate the worry of financial identity theft and fraud
Change your future habits to promote complete privacy and anonymity
Conduct a complete background check to verify proper information removal
The Model Thinker Goodreads 豆瓣
作者: Scott E. Page Basic Books 2019 - 3
From the stock market to genomics laboratories, census figures to marketing email blasts, we are awash with data. But as anyone who has ever opened up a spreadsheet packed with seemingly infinite lines of data knows, numbers aren’t enough: we need to know how to make those numbers talk. In The Model Thinker, social scientist Scott E. Page shows us the mathematical, statistical, and computational models–from linear regression to random walks and far beyond–that can turn anyone into a genius. At the core of the book is Page’s “many-model paradigm,” which shows the reader how to apply multiple models to organize the data, leading to wiser choices, more accurate predictions, and more robust designs. The Model Thinker provides a toolkit for business people, students, scientists, pollsters, and bloggers to make them better, clearer thinkers, able to leverage data and information to their advantage.