数据挖掘
The Signal and the Noise 豆瓣 Goodreads
6.8 (5 个评分) 作者: Nate Silver Penguin Press HC, The 2012 - 9
"Nate Silver's The Signal and the Noise is The Soul of a New Machine for the 21st century."
—Rachel Maddow, author of Drift
Nate Silver built an innovative system for predicting baseball performance, predicted the 2008 election within a hair’s breadth, and became a national sensation as a blogger—all by the time he was thirty. The New York Times now publishes FiveThirtyEight.com, where Silver is one of the nation’s most influential political forecasters.
Drawing on his own groundbreaking work, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. Most predictions fail, often at great cost to society, because most of us have a poor understanding of probability and uncertainty. Both experts and laypeople mistake more confident predictions for more accurate ones. But overconfidence is often the reason for failure. If our appreciation of uncertainty improves, our predictions can get better too. This is the “prediction paradox”: The more humility we have about our ability to make predictions, the more successful we can be in planning for the future.
In keeping with his own aim to seek truth from data, Silver visits the most successful forecasters in a range of areas, from hurricanes to baseball, from the poker table to the stock market, from Capitol Hill to the NBA. He explains and evaluates how these forecasters think and what bonds they share. What lies behind their success? Are they good—or just lucky? What patterns have they unraveled? And are their forecasts really right? He explores unanticipated commonalities and exposes unexpected juxtapositions. And sometimes, it is not so much how good a prediction is in an absolute sense that matters but how good it is relative to the competition. In other cases, prediction is still a very rudimentary—and dangerous—science.
Silver observes that the most accurate forecasters tend to have a superior command of probability, and they tend to be both humble and hardworking. They distinguish the predictable from the unpredictable, and they notice a thousand little details that lead them closer to the truth. Because of their appreciation of probability, they can distinguish the signal from the noise.
With everything from the health of the global economy to our ability to fight terrorism dependent on the quality of our predictions, Nate Silver’s insights are an essential read.
社交网站的数据挖掘与分析 豆瓣
Mining the Social Web : Analyzing Data from Facebook, Twitter, LinkedIn, and Other Social Media Sites
作者: Matthew A·Russell 译者: 师蓉 机械工业出版社 2012 - 2
Facebook、Twitter和LinkedIn产生了大量宝贵的社交数据,但是你怎样才能找出谁通过社交媒介正在进行联系?他们在讨论些什么?或者他们在哪儿?这本简洁而且具有可操作性的书将揭示如何回答这些问题甚至更多的问题。你将学到如何组合社交网络数据、分析技术,如何通过可视化帮助你找到你一直在社交世界中寻找的内容,以及你闻所未闻的有用信息。
每个独立的章节介绍了在社交网络的不同领域挖掘数据的技术,这些领域包括博客和电子邮件。你所需要具备的就是一定的编程经验和学习基本的Python工具的意愿。
•获得对社交网络世界的直观认识
•使用GitHub上灵活的脚本来获取从诸如Twitter、Facebook和LinkedIn之类的社交网络API中的数据
•学习如何应用便捷的Python工具来交叉分析你所收集的数据
•通过XHTML朋友圈探讨基于微格式的社交联系
•应用诸如TF-IDF、余弦相似性、搭配分析、文档摘要、派系检测之类的先进挖掘技术
•通过基于HTML5和JavaScript工具包的网络技术建立交互式可视化
智能Web算法 豆瓣
Algorithms of the Intelligent Web
作者: Haralambos Marmanis / Dmitry Babenko 译者: 阿稳 / 陈钢 电子工业出版社 2011 - 11
本书涵盖了五类重要的智能算法:搜索、推荐、聚类、分类和分类器组合,并结合具体的案例讨论了它们在Web应用中的角色及要注意的问题。除了第1章的概要性介绍以及第7章对所有技术的整合应用外,第2~6章以代码示例的形式分别对这五类算法进行了介绍。
本书面向的是广大普通读者,特别是对算法感兴趣的工程师与学生,所以对于读者的知识背景并没有过多的要求。本书中的例子和思想应用广泛,所以对于希望从业务角度更好地理解有关技术的技术经理、产品经理和管理层来说,本书也有一定的价值。