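Data Mining
Statistical Methods for Complex Data Douban
Author: Wu Xizhi (吴喜之) Publisher: China Renmin University Press 2012 - 10
Statistical Methods for Complex Data: Applications Based on R uses the free software R to analyze more than 30 real datasets that can be downloaded from websites abroad, including cross-sectional, longitudinal, and time-series data, and uses these datasets to introduce nearly all the classical methods as well as the latest machine learning methods.
Features of Statistical Methods for Complex Data: Applications Based on R: (1) it is data-oriented; (2) it introduces the latest methods (with a review of traditional methods); (3) it provides an introduction to R, the R code for every worked example, and the URLs of the data; (4) each chapter is self-contained.
The intended readers of Statistical Methods for Complex Data: Applications Based on R include undergraduate, master's, and doctoral students in statistics, applied statistics, economics, mathematics, applied mathematics, actuarial science, environmental science, econometrics, biomedicine, and related fields, as well as teachers and practitioners in all fields.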
Pattern Recognition and Machine Learning Douban Goodreads
Pattern Recognition and Machine Learning (Information Science and Statistics)
9.8 (19 ratings) Author: Christopher Bishop Publisher: Springer 2007 - 10
The dramatic growth in practical applications for machine learning over the last ten years has been accompanied by many important developments in the underlying algorithms and techniques. For example, Bayesian methods have grown from a specialist niche to become mainstream, while graphical models have emerged as a general framework for describing and applying probabilistic techniques. The practical applicability of Bayesian methods has been greatly enhanced by the development of a range of approximate inference algorithms such as variational Bayes and expectation propagation, while new models based on kernels have had a significant impact on both algorithms and applications.
This completely new textbook reflects these recent developments while providing a comprehensive introduction to the fields of pattern recognition and machine learning. It is aimed at advanced undergraduates or first-year PhD students, as well as researchers and practitioners. No previous knowledge of pattern recognition or machine learning concepts is assumed. Familiarity with multivariate calculus and basic linear algebra is required, and some experience in the use of probabilities would be helpful though not essential as the book includes a self-contained introduction to basic probability theory.
The book is suitable for courses on machine learning, statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. Extensive support is provided for course instructors, including more than 400 exercises, graded according to difficulty. Example solutions for a subset of the exercises are available from the book web site, while solutions for the remainder can be obtained by instructors from the publisher. The book is supported by a great deal of additional material, and the reader is encouraged to visit the book web site for the latest information.
Probabilistic Graphical Models Douban
Author: Daphne Koller / Nir Friedman Publisher: The MIT Press 2009 - 7
Most tasks require a person or an automated system to reason--to reach conclusions based on available information. The framework of probabilistic graphical models, presented in this book, provides a general approach for this task. The approach is model-based, allowing interpretable models to be constructed and then manipulated by reasoning algorithms. These models can also be learned automatically from data, allowing the approach to be used in cases where manually constructing a model is difficult or even impossible. Because uncertainty is an inescapable aspect of most real-world applications, the book focuses on probabilistic models, which make the uncertainty explicit and provide models that are more faithful to reality. Probabilistic Graphical Models discusses a variety of models, spanning Bayesian networks, undirected Markov networks, discrete and continuous models, and extensions to deal with dynamical systems and relational data. For each class of models, the text describes the three fundamental cornerstones: representation, inference, and learning, presenting both basic concepts and advanced techniques. Finally, the book considers the use of the proposed framework for causal reasoning and decision making under uncertainty. The main text in each chapter provides the detailed technical development of the key ideas. Most chapters also include boxes with additional material: skill boxes, which describe techniques; case study boxes, which discuss empirical cases related to the approach described in the text, including applications in computer vision, robotics, natural language understanding, and computational biology; and concept boxes, which present significant concepts drawn from the material in the chapter. Instructors (and readers) can group chapters in various combinations, from core topics to more technically advanced material, to suit their particular needs.
Introduction to Information Retrieval Douban
Author: Christopher D. Manning / Prabhakar Raghavan Publisher: Cambridge University Press 2008 - 7
Class-tested and coherent, this groundbreaking new textbook teaches classic web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also create a buzz for researchers and professionals alike.
Contents
1. Information retrieval using the Boolean model; 2. The dictionary and postings lists; 3. Tolerant retrieval; 4. Index construction; 5. Index compression; 6. Scoring and term weighting; 7. Vector space retrieval; 8. Evaluation in information retrieval; 9. Relevance feedback and query expansion; 10. XML retrieval; 11. Probabilistic information retrieval; 12. Language models for information retrieval; 13. Text classification and Naive Bayes; 14. Vector space classification; 15. Support vector machines and kernel functions; 16. Flat clustering; 17. Hierarchical clustering; 18. Dimensionality reduction and latent semantic indexing; 19. Web search basics; 20. Web crawling and indexes; 21. Link analysis.
Reviews
“This is the first book that gives you a complete picture of the complications that arise in building a modern web-scale search engine. You'll learn about ranking SVMs, XML, DNS, and LSI. You'll discover the seedy underworld of spam, cloaking, and doorway pages. You'll see how MapReduce and other approaches to parallelism allow us to go beyond megabytes and to efficiently manage petabytes.” -Peter Norvig, Director of Research, Google Inc.
"Introduction to Information Retrieval is a comprehensive, up-to-date, and well-written introduction to an increasingly important and rapidly growing area of computer science. Finally, there is a high-quality textbook for an area that was desperately in need of one." -Raymond J. Mooney, Professor of Computer Sciences, University of Texas at Austin
“Through compelling exposition and choice of topics, the authors vividly convey both the fundamental ideas and the rapidly expanding reach of information retrieval as a field.” -Jon Kleinberg, Professor of Computer Science, Cornell University
Foundations of Statistical Natural Language Processing Douban
Author: Christopher D. Manning / Hinrich Schütze Publisher: The MIT Press 1999 - 6
Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.
Introduction to Data Mining Douban
Author: Pang-Ning Tan / Michael Steinbach Publisher: Addison Wesley 2005 - 5
Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time. Each concept is explored thoroughly and supported with numerous examples. The text requires only a modest background in mathematics. Each major topic is organized into two chapters, beginning with basic concepts that provide necessary background for understanding each data mining technique, followed by more advanced concepts and algorithms.
Quotes
"This book provides a comprehensive coverage of important data mining techniques. Numerous examples are provided to lucidly illustrate the key concepts." -Sanjay Ranka, University of Florida
"In my opinion this is currently the best data mining text book on the market. I like the comprehensive coverage which spans all major data mining techniques including classification, clustering, and pattern mining (association rules)." -Mohammed Zaki, Rensselaer Polytechnic Institute
Data Mining Douban
Data Mining: Concepts and Techniques, Third Edition
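Author: Jiawei Han (USA) / Micheline Kamber (Canada) Publisher: China Machine Press 2012 - 3
The landmark classic of the data mining field
A complete and comprehensive account of the field's important knowledge and technical innovations
[Editor's Recommendation]
We are living in an age of data deluge. This book shows us the methods and techniques for finding useful knowledge in such massive amounts of data. The latest third edition significantly expands the core chapters on data preprocessing, frequent pattern mining, classification, and clustering; it also covers OLAP and outlier detection comprehensively and discusses mining networks, complex data types, and important application areas. This book will be an excellent textbook for courses on data analysis, data mining, and knowledge discovery.
-Gregory Piatetsky-Shapiro, President of KDnuggets
The textbook by Jiawei, Micheline, and Jian gives a panoramic treatment of all the relevant methods in data mining, from the classical topics of clustering and classification, to database methods (association rules, data cubes), to newer and more advanced topics (SVD/PCA, wavelets, support vector machines), and more. Overall, it is an excellent book that covers both classical data mining methods and a great many contemporary data mining techniques; it is an excellent textbook for teaching and learning alike and a highly valuable reference for professionals.
-From the foreword to this book by Professor Christos Faloutsos, Carnegie Mellon University
[About the Book]
The explosive growth of data in contemporary business and science demands more sophisticated and refined tools for data analysis, processing, and mining. Although the great strides made in data mining technology in recent years have made large-scale data collection ever easier, technical progress still struggles to keep pace with the explosive growth of data and the heavy processing demands that come with it. We therefore need new techniques and automated tools more urgently than ever to help us turn this data into useful information and knowledge.
The previous edition of this book was voted the most popular data mining book by KDnuggets readers and is a highly readable textbook. It introduces the concepts, methods, and techniques of data mining and the progress of research comprehensively and systematically from a database perspective, with a focus on important and recent topics in the field: data warehouse and data cube technology, stream data mining, social network mining, and mining of spatial, multimedia, and other complex data. Each chapter offers stand-alone guidance on its key topics, presents the best algorithms, and gives practice-tested rules of thumb for applying the techniques in real work. If you want to master and apply today's most powerful data mining techniques, this book is the valuable resource you need to read and study. It is a must-read for all teachers, researchers, developers, and users in the field of data mining and knowledge discovery.
Features of this book
Introduces many algorithms and implementation examples, all written in easy-to-understand pseudocode and suitable for real-world, large-scale data mining projects.
Discusses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in other domains.
Gives a comprehensive and practical presentation of the concepts and techniques for extracting as much information as possible from massive data.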
Statistical Methods for Categorical Data Analysis Douban
Statistical Methods for Categorical Data Analysis, 2nd Ed
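Author: Daniel A. Powers (USA) / Yu Xie (谢宇) Translator: Ren Qiangzheng (任强峥) / Wu Xiwei (巫锡炜) Publisher: Social Sciences Academic Press 2009 - 7
Statistical Methods for Categorical Data Analysis, co-authored by Daniel A. Powers and Professor Yu Xie, gives a comprehensive introduction to the methods and models of categorical data analysis and to their applications in social science research. An explicit aim of the book is to integrate the transformational approach and the latent variable approach, two distinct but complementary traditions in the analysis of categorical data. It is also the first time that models and methods for discrete dependent variables, cross-classifications, and longitudinal data have been presented rigorously in a single volume. No comparable work has appeared so far.
The second edition adds multilevel models for categorical data. Many chapters have been further revised and expanded with new applications and examples. Notable features of the second edition are a detailed discussion of classical Bayesian estimation techniques for hierarchical or multilevel models, expanded coverage of discrete-time survival models and the Cox regression model, and methods for assessing and accommodating departures from model assumptions. The companion website lists programs that reproduce every example in the book using various statistical software packages, and it has proved to be an important resource for teachers, students, and researchers.
The book introduces the basic methods and models that form the core of contemporary social statistics. The models it covers span an unusually wide range and are widely used in sociology, demography, psychometrics, econometrics, political science, biostatistics, and other fields. It is very useful both as a graduate textbook for advanced social statistics courses and as a reference for applied researchers.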
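Pattern Classification Douban
Author: Richard O. Duda / Peter E. Hart Translator: Li Hongdong (李宏东) Publisher: China Machine Press 2003 - 9
The first edition of Pattern Classification (2nd edition of the original), titled Pattern Classification and Scene Analysis, was published in 1973 and is a foundational classic in the fields of pattern recognition and scene analysis. Besides retaining the first edition's main content on statistical and structural pattern recognition, the second edition adds many new theories and methods from the past 25 years, including neural networks, machine learning, data mining, evolutionary computation, invariance theory, hidden Markov models, statistical learning theory, and support vector machines. The authors also point out directions for the development of pattern recognition over the next 25 years. The book contains many worked examples, comparisons of different methods, abundant figures and tables, and a large number of end-of-chapter exercises and computer exercises.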
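IBM SPSS Data Analysis and Mining: Selected Practical Case Studies Douban
Author: Zhang Wentong (张文彤) / Zhong Yunfei (钟云飞) Publisher: Tsinghua University Press 2013 - 2
IBM SPSS Data Analysis and Mining: Selected Practical Case Studies uses IBM SPSS Statistics 20.0 and IBM SPSS Modeler 14.1 as its tools and presents data analysis and data mining cases from many industries, including healthcare, finance, insurance, automotive, fast-moving consumer goods, market research, and the internet. Driven by practical needs, it explains the complete analysis process of each case in detail and weaves the introduction of models and software into the case discussions, so that readers can move beyond the limits of particular methods and tools and truly focus on grasping the essence of data analysis. The accompanying disc contains the case data and the analysis programs and stream files, so readers can fully reproduce all of the analyses.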
Statistical Inference Douban
Statistical Inference
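Author: George Casella (USA) / Roger L. Berger (USA) Publisher: China Machine Press 2012 - 1
Starting from the foundations of probability theory and drawing on a wide range of examples and exercises, this book introduces many new techniques of modern statistical practice, along with some distributions that are widely used but not found in comparable domestic textbooks. Its content covers introductory probability theory for engineering and the foundations of classical and modern statistics, and it adds a number of practical methods and ideas for data processing from modern statistics, for example bootstrap resampling, jackknife estimation, the EM algorithm, logistic regression, robust regression, Markov chains, and Monte Carlo methods. Compared with the textbooks popular in China, its statistical content goes deeper in theory, covers more models, draws on a wider range of cases, applies the theory more broadly, and presents statistical ideas and algorithms more concretely.
This book can serve as a textbook or reference for undergraduate and graduate students in engineering and management disciplines, and it is also suitable for self-study by teachers and engineering and technical staff.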
Introduction to Information Retrieval Douban
Introduction to Information Retrieval, 1E
Author: Christopher D. Manning / Hinrich Schütze Translator: Wang Bin (王斌) Publisher: Posts & Telecom Press 2010 - 8
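The cover image shows the Selfridges department store building in Birmingham, England, whose strongly linear outline is as graceful as flowing water. Its exterior is hung with 15,000 aluminum discs, creating a textured decorative effect with a distinctly modern feel, like water glinting under the night sky and shimmering in the moonlight, and taking the building's commercial atmosphere to its fullest expression. The British firm that designed it, Future Systems, enclosed the store's interior around a top-lit atrium fitted with crisscrossing escalators, giving the shopping environment a cohesive, centripetal pull and the display effect of a commercial advertisement. As an architectural landmark of Birmingham, Britain's second commercial city, the building has been called "the department store of the future." For the avant-garde nature of its design concept, it won several awards, including the 2004 "Architectural Design Award" of the Royal Institute of British Architects and the 2004 "Royal Fine Art Commission Award."
Written from a computer science perspective, this book introduces the fundamentals of information retrieval and reviews the current development of the field, with a focus on the core technologies of search engines, such as document classification and document clustering, as well as machine learning and numerical methods. All the important ideas in the book are explained with examples, making it vivid and engaging and achieving a fine combination of theory and practice.
The book's three authors are all top experts in information retrieval, two from academia and one from Silicon Valley industry, so the book both rests on a deep theoretical foundation and represents the state of the art. As soon as it was published, it was regarded as an authoritative work in the field and attracted wide attention. It has since been adopted as the textbook for information retrieval courses at many leading universities around the world.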
Information Theory, Inference, and Learning Algorithms Douban
Information Theory, Inference and Learning Algorithms
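Author: David J.C. MacKay (UK) Translator: Xiao Mingbo (肖明波) / Xi Bin (席斌) Publisher: Higher Education Press 2006 - 7
This book is a major work published in 2003 by Dr. David J.C. MacKay, a well-known scholar at the Cavendish Laboratory of the University of Cambridge, summarizing many years of teaching experience and research results. The author not only gives a thorough account of traditional information theory and the latest coding algorithms, but also, with a strong command of the disciplines involved, discusses within a single unified framework topics from machine learning and inference such as Bayesian data modeling, Monte Carlo methods, clustering algorithms, and neural networks, thereby bringing together the technical substance of many disciplines. The book emphasizes the combination of theory and practice, is organized rigorously, and reflects the inner connections and development trends of several disciplines. It also contains a wealth of worked examples and nearly 400 exercises (many with detailed solutions), making it convenient for teaching or self-study. It is suitable as a textbook for senior undergraduates and graduate students in information science and technology and related majors, and it is also a useful reference for technical professionals in related fields....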