Large Sample Covariance Matrices and High-Dimensional Data Analysis


ISBN: 9781107065178
Authors: Jianfeng Yao / Shurong Zheng / Zhidong Bai
Publisher: Cambridge University Press
Publication date: 2015-4


Summary

Book description
High-dimensional data appear in many fields, and their analysis has become increasingly important in modern statistics. However, it has long been observed that several well-known methods in multivariate analysis become inefficient, or even misleading, when the data dimension p is larger than, say, several tens. A seminal example is the well-known inefficiency of Hotelling's T²-test in such cases. This example shows that classical large sample limits may no longer hold for high-dimensional data; statisticians must seek new limiting theorems in these instances. Thus, the theory of random matrices (RMT) serves as a much-needed and welcome alternative framework. Based on the authors' own research, this book provides a firsthand introduction to new high-dimensional statistical methods derived from RMT. The book begins with a detailed introduction to useful tools from RMT, and then presents a series of high-dimensional problems with solutions provided by RMT methods.
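As a rough illustration of the description's point that classical large-sample limits can fail when p is not small relative to the sample size n, the Python sketch below simulates standard normal data whose population covariance is the identity and compares the extreme eigenvalues of the sample covariance matrix with the edges (1 ± √y)² of the Marčenko-Pastur support, where y = p/n. The function names, the values of n and p, and the seed are illustrative assumptions and are not taken from the book.

```python
import numpy as np


def sample_cov_eigen_range(n, p, seed=0):
    """Smallest and largest eigenvalue of the sample covariance matrix
    of n i.i.d. N(0, I_p) observations (population mean taken as known, zero)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, p))      # n observations of dimension p
    s = x.T @ x / n                      # p x p sample covariance matrix
    eig = np.linalg.eigvalsh(s)          # eigenvalues in ascending order
    return eig[0], eig[-1]


def mp_support(y):
    """Edges of the Marcenko-Pastur support for ratio y = p/n in (0, 1]
    and unit population variance."""
    return (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2


n, p = 2000, 500                         # illustrative sizes, y = p/n = 0.25
lo, hi = sample_cov_eigen_range(n, p)
a, b = mp_support(p / n)
print(f"sample eigenvalue range: [{lo:.3f}, {hi:.3f}]")
print(f"Marcenko-Pastur edges:   [{a:.3f}, {b:.3f}]")
```

Every population eigenvalue here equals 1, yet the sample eigenvalues spread over an interval close to the Marčenko-Pastur support rather than concentrating near 1, which is the kind of departure from classical asymptotics the description refers to.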
Reviews
'This is the first book which treats systematic corrections to the classical multivariate statistical procedures so that the resultant procedures can be used for high-dimensional data. The corrections have been done by employing asymptotic tools based on the theory of random matrices.'
Yasunori Fujikoshi - Hiroshima University, Japan
'… this book is the first to cover these topics and can serve both as a good introduction to the topics as well as a comprehensive reference on the state of the art.'
Robert Stelzer Source: MathSciNet
'This book deals with the analysis of covariance matrices under two different assumptions: large-sample theory and high-dimensional-data theory. While the former approach is the classical framework to derive asymptotics, nevertheless the latter has received increasing attention due to its applications in the emerging field of big-data. Due to its novelty and its relevance in the current research, the authors focus mainly on the high-dimensional-data framework. … The theory and the applications are presented under both the large-sample theory and the high-dimensional-data theory, and thus the reader can easily appreciate the differences between the two approaches. The material is presented in a quite simple manner, and the reader only needs some pre-requisites in basic mathematical statistics, linear algebra, and theory of multivariate normal distributions. Some technical prerequisites are collected in two appendices. Therefore, the book can be used by graduate students and researchers in a wide range of disciplines, ranging from mathematics to applied sciences.'
Fabio Rapallo Source: Zentralblatt MATH

Contents

http://web.hku.hk/~jeffyao/docs/samplechaps-scv-Aug23.pdf
Notations
Preface
1 Introduction
1.1 Large dimensional data and new asymptotic statistics
1.2 Random matrix theory
1.3 Eigenvalue statistics of large sample covariance matrices
1.4 Organisation of the book
2 Limiting spectral distributions
2.1 Introduction
2.2 Fundamental tools
2.3 Marčenko-Pastur distributions
2.4 Generalised Marčenko-Pastur distributions
2.5 LSD for random Fisher matrices
3 CLT for linear spectral statistics
3.1 Introduction
3.2 CLT for linear spectral statistics of a sample covariance matrix
3.3 Bai and Silverstein’s CLT
3.4 CLT for linear spectral statistics of random Fisher matrices
3.5 The substitution principle
4 The generalised variance and multiple correlation coefficient
4.1 Introduction
4.2 The generalised variance
4.3 The multiple correlation coefficient
5 The T-statistic
5.1 Introduction
5.2 Dempster’s non-exact test
5.3 Bai-Saranadasa’s test
5.4 Improvements of the Bai-Saranadasa test
5.5 Monte-Carlo results
6 Classification of data
6.1 Introduction
6.2 Classification into one of two known multivariate normal populations
6.3 Classification into one of two multivariate normal populations with unknown parameters
6.4 Classification into one of several multivariate normal populations
6.5 Classification under large dimensions: the T-rule and the D-rule
6.6 Misclassification rate of the D-rule in case of two normal populations
6.7 Misclassification rate of the T-rule in case of two normal populations
6.8 Comparison between the T-rule and the D-rule
6.9 Misclassification rate of the T-rule in case of two general populations
6.10 Misclassification rate of the D-rule in case of two general populations
6.11 Simulation study
6.12 A real data analysis
7 Testing the general linear hypothesis
7.1 Introduction
7.2 Estimators of parameters in multivariate linear regression
7.3 Likelihood ratio criteria for testing linear hypotheses about regression coefficients
7.4 The distribution of the likelihood ratio criterion under the null
7.5 Testing equality of means of several normal distributions with common covariance matrix
7.6 Large regression analysis
7.7 A large-dimensional multiple sample significance test
8 Testing independence of sets of variates
8.1 Introduction
8.2 The likelihood ratio criterion
8.3 The distribution of the likelihood ratio criterion under the null hypothesis
8.4 The case of two sets of variates
8.5 Testing independence of two sets of many variates
8.6 Testing independence of more than two sets of many variates
9 Testing hypotheses of equality of covariance matrices
9.1 Introduction
9.2 Criteria for testing equality of several covariance matrices
9.3 Criteria for testing that several normal distributions are identical
9.4 The sphericity test
9.5 Testing the hypothesis that a covariance matrix is equal to a given matrix
9.6 Testing hypotheses of equality of large-dimensional covariance matrices
9.7 Large-dimensional sphericity test
10 Estimation of the population spectral distribution
10.1 Introduction
10.2 A method-of-moments estimator
10.3 An estimator using least sum of squares
10.4 A local moment estimator
10.5 A cross-validation method for selection of the order of a population spectral distribution
11 Large-dimensional spiked population models
11.1 Introduction
11.2 Limits of spiked sample eigenvalues
11.3 Limits of spiked sample eigenvectors
11.4 Central limit theorem for spiked sample eigenvalues
11.5 Estimation of the values of spike eigenvalues
11.6 Estimation of the number of spike eigenvalues
11.7 Estimation of the noise variance
12 Efficient optimisation of a large financial portfolio
12.1 Introduction
12.2 Mean-Variance Principle and the Markowitz’s enigma
12.3 The plug-in portfolio and over-prediction of return
12.4 Bootstrap enhancement to the plug-in portfolio
12.5 Spectrum-corrected estimators
Appendix A Curvilinear integrals
Appendix B Eigenvalue inequalities
Bibliography
Index
