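Natural Language Processing
The Oxford Handbook of Computational Linguistics (牛津计算语言学手册) Douban
Author: Mitkov (ed.) Publisher: Foreign Language Teaching and Research Press 1991
Synopsis: The Oxford Handbook of Computational Linguistics is a handbook-style monograph on computational linguistics. It collects survey articles by 50 scholars, including linguists, computer scientists, and language engineers, and gives a comprehensive picture of the latest results in the main areas of computational linguistics abroad, offering a window onto how the field is developing outside China. The chapters are consistent in style and well coordinated, so the book reads as a unified whole. It uses engaging examples to introduce difficult technical problems and avoids complicated mathematical formulas wherever possible, which makes it especially suitable for readers with a humanities background. For readers who are interested in computational linguistics or new to it, the handbook is also an essential reference.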
Introduction to Information Retrieval 豆瓣
Authors: Christopher D. Manning / Prabhakar Raghavan Publisher: Cambridge University Press 2008 - 7
Class-tested and coherent, this groundbreaking new textbook teaches classic web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. Written from a computer science perspective by three leading experts in the field, it gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Although originally designed as the primary text for a graduate or advanced undergraduate course in information retrieval, the book will also create a buzz for researchers and professionals alike.
Contents
1. Information retrieval using the Boolean model; 2. The dictionary and postings lists; 3. Tolerant retrieval; 4. Index construction; 5. Index compression; 6. Scoring and term weighting; 7. Vector space retrieval; 8. Evaluation in information retrieval; 9. Relevance feedback and query expansion; 10. XML retrieval; 11. Probabilistic information retrieval; 12. Language models for information retrieval; 13. Text classification and Naive Bayes; 14. Vector space classification; 15. Support vector machines and kernel functions; 16. Flat clustering; 17. Hierarchical clustering; 18. Dimensionality reduction and latent semantic indexing; 19. Web search basics; 20. Web crawling and indexes; 21. Link analysis.
Reviews
“This is the first book that gives you a complete picture of the complications that arise in building a modern web-scale search engine. You'll learn about ranking SVMs, XML, DNS, and LSI. You'll discover the seedy underworld of spam, cloaking, and doorway pages. You'll see how MapReduce and other approaches to parallelism allow us to go beyond megabytes and to efficiently manage petabytes.” -Peter Norvig, Director of Research, Google Inc.
"Introduction to Information Retrieval is a comprehensive, up-to-date, and well-written introduction to an increasingly important and rapidly growing area of computer science. Finally, there is a high-quality textbook for an area that was desperately in need of one." -Raymond J. Mooney, Professor of Computer Sciences, University of Texas at Austin
“Through compelling exposition and choice of topics, the authors vividly convey both the fundamental ideas and the rapidly expanding reach of information retrieval as a field.” -Jon Kleinberg, Professor of Computer Science, Cornell University
Foundations of Statistical Natural Language Processing 豆瓣
Authors: Christopher D. Manning / Hinrich Schütze Publisher: The MIT Press 1999 - 6
Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.
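Mathematical Methods in Linguistics (语言研究的数学方法) Douban
Authors: [US] Barbara Partee / [Netherlands] Alice ter Meulen Translator: Wu Daoping (吴道平) Publisher: The Commercial Press 2012 - 8
This book is a joint work by leading contemporary mathematical linguists from Europe and the United States and a classic textbook in linguistics departments at European and American universities. This edition is described as the most complete available in any language. It covers almost all of discrete mathematics, with particular attention to the parts most closely tied to the study of language. The book has five parts: set theory; logic and formal systems; algebra; English as a formal language; and languages, grammars, and automata. Each chapter ends with a large set of exercises and each part with review problems, and solutions to the exercises are included to help readers consolidate the material.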
Foundations of Statistical Natural Language Processing, Chinese edition (统计自然语言处理基础) Douban Goodreads
Foundations of Statistical Natural Language Processing
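Authors: Chris Manning / Hinrich Schütze Translators: Yuan Chunfa (苑春法) / Li Wei (李伟) Publisher: Publishing House of Electronics Industry 2005 - 1
Foundations of Statistical Natural Language Processing (Foreign Computer Science Textbook Series) is a comprehensive and systematic treatment of statistical natural language processing techniques, and many well-known universities in China and abroad have adopted it as a textbook for courses in computational linguistics. Its coverage is broad: four parts and 16 chapters that include nearly all the theory and algorithms needed to build natural language processing software tools. The exposition moves from the simple to the advanced, from mathematical foundations to precise theory and algorithms and from simple lexical analysis to complex syntactic analysis, so it suits readers at different levels. The book also ties theory closely to practice: after presenting the theory, it describes higher-level applications of natural language processing, such as information retrieval. The companion website offers many related resources and tools, so readers can work through the exercises in the book and improve through practice. In recent years, statistical methods have gradually become the mainstream approach in natural language processing.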