SMS scnews item created by Munir Hiabu at Tue 13 Oct 2020 1017
Type: Seminar
Distribution: World
Expiry: 20 Oct 2020
Calendar1: 16 Oct 2020 1600-1700
CalLoc1: https://au.bbcollab.com/guest/fcf219c74ac743e89565a9e6e8d349a9
Auth: munir@119-18-1-53.771201.syd.nbn.aussiebb.net (mhia8050) in SMS-WASM

Statistics Across Campuses: Inge Koch -- Robust approaches to principal component analysis for high-dimensional

Robust approaches to principal component analysis for high-dimensional and directional
data 

Date: 16 October 2020, Friday 

Time: 4pm 

Speaker: Prof.  Inge Koch (The University of Western Australia) 

Abstract: 

Principal component analysis (PCA) is a widespread tool for selecting a smaller number
of dimensions and key features in multivariate and high-dimensional data.  More recently
a number of variants of PCA have been developed including sparse PCA for
high-dimensional data and robust PCA.  In this talk we focus on PCA developments for
multivariate and high-dimensional directional random vectors and data which have been
transformed to live on the surface of the d-dimensional sphere.  These random vectors
are also known as spatial signs.  For directional random vectors we review robust
covariance-related matrices, including the sign and rank covariance matrices, and we
present theoretical results for these and relate them to the canonical
population covariance matrix.  
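
The abstract gives no formulas, so the following is only a minimal sketch of one
common textbook construction: each centred observation is scaled to unit length (its
spatial sign), and the sign covariance matrix is the average outer product of the
signs.  The function names and the choice of the coordinate-wise median as centre are
assumptions made for illustration, not the speaker's method.

```python
import numpy as np

def spatial_signs(X, center=None):
    """Map each row x_i of X to (x_i - c) / ||x_i - c||, a point on the unit sphere."""
    X = np.asarray(X, dtype=float)
    if center is None:
        # Assumption: coordinate-wise median as a simple robust centre.
        center = np.median(X, axis=0)
    Z = X - center
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    keep = norms[:, 0] > 0  # drop observations that coincide with the centre
    return Z[keep] / norms[keep]

def sign_covariance(X, center=None):
    """Sample sign covariance matrix: the mean outer product of the spatial signs."""
    S = spatial_signs(X, center)
    return S.T @ S / S.shape[0]
```

Because every spatial sign has unit length, the resulting matrix has trace 1, so only
its eigenvectors and relative eigenvalues carry information.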

For random vectors and data from the elliptic distribution we point out relationships
between these robust population covariance matrices and their sample counterparts.  For
non-elliptic data, much less is known at the population level and the sample level about
the behaviour of these various covariance matrices.  We begin with sample versions of the
robust covariance matrices, and show the relationships between them and between sample
and corresponding population quantities.  

We complement these comparisons with calculations based on real data and simulated data
ranging from multivariate Gaussian and skew-normal to bimodal data and data with high
kurtosis and outliers.  For such data we study the behaviour of the first few
eigenvectors and calculate the closeness of eigenvectors arising from different robust
covariance matrices.  For simulated data we also calculate their closeness to the
eigenvector of the population covariance matrix for a range of dimensions as the sample
size increases.  Our findings show that kurtosis is a key feature which affects the
closeness of the sample eigenvectors to those of the population and we suggest criteria
based on the amount of kurtosis which may provide a guide to choosing the 'best'
sample covariance to use for particular datasets.  
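
The abstract does not say how closeness of eigenvectors is measured.  One plausible
choice, shown here purely as an assumption, is the angle between two leading
eigenvectors, computed so that it ignores the arbitrary sign of an eigenvector.  The
helper names are hypothetical.

```python
import numpy as np

def leading_eigenvector(C):
    """Eigenvector of a symmetric matrix C belonging to its largest eigenvalue."""
    _, vecs = np.linalg.eigh(C)  # eigh returns eigenvalues in ascending order
    return vecs[:, -1]

def eigvec_angle(u, v):
    """Angle in radians between the lines spanned by u and v (sign-invariant)."""
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, 0.0, 1.0)))
```

An angle of 0 means the two eigenvectors span the same direction, and an angle of
pi/2 means they are orthogonal.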

Link: https://au.bbcollab.com/guest/fcf219c74ac743e89565a9e6e8d349a9