This book was written by the late Professor Thomas D. Wickens, and it is by far the best book on understanding linear regression that I have read. As the preface notes, most books on multivariate statistics approach the subject from one of two angles: algebra or computation. The algebraic approach to multivariate statistics, and to multiple regression in particular, is self-contained. It is built on linear algebra and probability, and it uses the two to develop all the theorems and algorithms and to give rigorous proofs of the correctness and error bounds of the methods. The computational approach teaches students how to run the software and interpret the results, for example how to read the p-values of the regression coefficients. However, because it does not dive into the theory, the computational approach does not tell one how the F-test and the corresponding p-values change when the properties of the data change. It also leaves statements like “an insignificant p-value does not mean no correlation with the target” at a superficial level, without any deep understanding. The geometric approach taken by Prof. Wickens has exactly this aim: to understand the meaning behind the linear algebra equations and the output of the software!

Geometry is a perfect tool for understanding multivariate statistics because, first, it helps with understanding linear algebra (quite a few linear algebra books take a geometric approach) and, second, it helps with understanding probability distributions as well. For one-dimensional distributions, we think in terms of PDF and CDF curves. For multivariate distributions such as the multivariate normal, the contours of the PDF in the high-dimensional space are elliptical. This book combines these two geometric understandings in a unified framework and develops several key concepts of multivariate statistics in only 160 pages!

The main feature of this book is that every concept is developed geometrically, without using linear algebra or calculus at all. There are two spaces in which to view the data points: variable space (row space) and subject space (column space). When the number of variables exceeds three, it becomes hard to visualize the data points in the variable space, yet it is still conceptually very easy to see them in the subject space! Most of the concepts in this book are developed in the subject space; only occasionally does the author switch to the variable space to provide another view (e.g. in the PCA chapter). For example, take the linear regression equation:

\hat{y} = b_1 x_1 + b_2 x_2 + … + b_p x_p

(all vectors here are demeaned, so there is no offset parameter a)

Basically, this equation defines a linear space that is spanned by x_1 to x_p. The aim of linear regression is to make \hat{y} as close to the target y as possible. In the subject space:

y = \hat{y} + e

where e is the residual vector, which is perpendicular to the space defined by the xs when the distance |e| is minimized. By writing the perpendicularity constraints as equations (x_j’e = 0 for every j), we can derive the normal equations of linear regression, which in an algebraic approach are usually stated directly as X’X b = X’y rather than derived from geometric intuition.
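
The perpendicularity argument can be checked numerically. The following is a minimal sketch on synthetic data (the data, the coefficient values, and the variable names X, y, b and e are assumptions made for illustration, not taken from the book): it solves the normal equations, then confirms that the residual is perpendicular to every predictor and that y, \hat{y} and e form a right triangle.

```python
import numpy as np

# Synthetic data: 50 subjects, 3 predictors (values chosen only for illustration).
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Demean every vector so that no offset parameter is needed.
X = X - X.mean(axis=0)
y = y - y.mean()

# Normal equations X'X b = X'y, obtained from the constraints x_j'e = 0.
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b      # projection of y onto the space spanned by the xs
e = y - y_hat      # residual vector

# e is perpendicular to every predictor (up to floating-point error).
print("X'e =", X.T @ e)

# Right triangle in subject space: |y|^2 = |y_hat|^2 + |e|^2.
print("Pythagoras holds:", np.isclose(y @ y, y_hat @ y_hat + e @ e))
```

Each column of X is one vector in the 50-dimensional subject space, so the check X’e ≈ 0 is exactly the statement that e is perpendicular to the regression space.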

Other multivariate statistics books (e.g. Johnson & Wichern) do introduce geometric explanations in places. These bits are, however, embedded in a large set of linear algebra equations, which makes the geometric explanations in those books merely supportive of the algebra and therefore not self-contained! In this book, by contrast, everything has to be developed geometrically, so there is no corner where the author can hide a geometric concept that has not been developed before. For example, this book gives a very good explanation of the demean operation. In multiple regression, the demean operation introduces a space defined by the vector 1 = (1,1,…,1)’, and the original y and xs share their mean components in this same space. This space is perpendicular to the residual vector e, therefore e’1 = 0! (If e’1 != 0, then e would have a non-zero projection onto the 1-space, which could be absorbed into the constant offset in the regression equation, so |e| would not yet be minimal.)
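
The role of the 1 vector can also be seen in a few lines of code. This is a small sketch on synthetic data (the data and all variable names are assumptions for illustration): fitting the raw vectors with an explicit column of ones gives a residual with e’1 = 0, and the same slopes and the same residual as fitting the demeaned vectors without an offset.

```python
import numpy as np

# Synthetic raw (not demeaned) data with non-zero means.
rng = np.random.default_rng(1)
n = 40
X_raw = rng.normal(loc=5.0, size=(n, 2))
y_raw = 3.0 + X_raw @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)
ones = np.ones(n)  # the vector 1 = (1, 1, ..., 1)'

# Regression with an explicit offset: the 1 vector is one more spanning vector.
A = np.column_stack([ones, X_raw])
coef, *_ = np.linalg.lstsq(A, y_raw, rcond=None)
e_full = y_raw - A @ coef

# Regression on demeaned vectors, with no offset parameter.
Xc = X_raw - X_raw.mean(axis=0)
yc = y_raw - y_raw.mean()
slopes, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
e_centered = yc - Xc @ slopes

print("e'1 =", e_full @ ones)                            # zero up to rounding
print("same slopes:", np.allclose(coef[1:], slopes))
print("same residual:", np.allclose(e_full, e_centered))
```

Demeaning simply removes the component of every vector that lies along 1, which is why the offset disappears from the demeaned equation.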

For practitioners, Section 4.2 (goodness of fit) and Chapter 5 of this book are the most useful. The *coefficient of determination* is interpreted geometrically as the squared cosine of the angle θ between y and \hat{y}:

R^2 = |\hat{y}|^2 / |y|^2 = cos^2 θ
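
For demeaned vectors, the coefficient of determination equals the squared cosine of the angle between y and \hat{y}. This is a standard identity, used here as an assumption about the geometric picture the book draws. A minimal sketch on synthetic data (all names and values are illustrative) compares three ways of computing it:

```python
import numpy as np

# Synthetic demeaned data (illustrative values only).
rng = np.random.default_rng(2)
n = 60
X = rng.normal(size=(n, 2))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
X = X - X.mean(axis=0)
y = y - y.mean()

# Least-squares fit and residual.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
e = y - y_hat

# Angle between y and its projection y_hat in subject space.
cos_theta = (y @ y_hat) / (np.linalg.norm(y) * np.linalg.norm(y_hat))

r2_angle = cos_theta ** 2                # cos^2 of the angle
r2_ratio = (y_hat @ y_hat) / (y @ y)     # |y_hat|^2 / |y|^2
r2_classic = 1.0 - (e @ e) / (y @ y)     # 1 - SS_residual / SS_total

print(r2_angle, r2_ratio, r2_classic)
```

The three values agree because y, \hat{y} and e form a right triangle: a small angle means \hat{y} is nearly as long as y and the fit is good.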

Chapter 5’s title is Configurations of multiple regression vectors, and it explains how the regression behaves under different configurations of the xs. When the predictors (xs) are highly correlated, the regression becomes unstable (more analysis in Chapter 6). The book proposes several approaches for dealing with the multicollinearity problem. One of them is to minimize not |e| alone (as the position of the regression space defined by the collinear xs is not stable), but |e| plus some criterion that stabilizes the position of the regression space. This leads to ridge regression and other variants. Ridge regression is l2-regularized regression, and the regularization is said to “control model complexity”. In the situation of multicollinearity, we can get a much better understanding of what on earth model complexity is through geometry!
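
The instability under multicollinearity, and the stabilizing effect of a penalty, can be illustrated with a small simulation. This sketch assumes the standard ridge criterion |e|^2 + λ|b|^2, which is not necessarily the exact criterion used in the book; the helper name ridge, the value λ = 1 and the synthetic data are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# Two nearly collinear predictors: x2 is x1 plus a tiny perturbation.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
X = X - X.mean(axis=0)

def ridge(X, y, lam):
    """Minimize |y - Xb|^2 + lam * |b|^2; lam = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Refit on many noisy copies of y and watch how much the coefficients move.
ols_fits, ridge_fits = [], []
for _ in range(200):
    y = x1 + x2 + rng.normal(size=n)
    y = y - y.mean()
    ols_fits.append(ridge(X, y, 0.0))
    ridge_fits.append(ridge(X, y, 1.0))

ols_sd = np.std(ols_fits, axis=0)      # spread of each OLS coefficient
ridge_sd = np.std(ridge_fits, axis=0)  # spread of each ridge coefficient
print("OLS coefficient sd:  ", ols_sd)
print("ridge coefficient sd:", ridge_sd)
```

Because x1 and x2 point in almost the same direction, the plane they span is poorly determined and small changes in y swing the least-squares coefficients wildly; the penalty keeps the coefficient vector from running off along the poorly determined direction.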

Other points in this book: the geometry of linear algebra meets the geometry of probability in Chapter 6 (tests), and there are geometric interpretations of analysis of variance (Sec. 3.4 & Chapter 8), PCA (Chapter 9), and CCA (Chapter 10). Each chapter of this book refreshes some of my memory and adds something new, geometrically. Strongly recommended for anyone who works with regression problems or does data mining in general.

This last paragraph is not about this particular book, but about my motivation for reading statistics books recently. Many data miners understand regression as loss minimization, probably plus knowing that a small p-value is a good indication of a useful feature (which is in fact not accurate). I was such a data miner until half a year ago. But few of them know how the p-value in multiple regression (and in other models, e.g. logistic regression) is actually calculated. Software packages nowadays are very advanced and make models like linear regression/PCA/CCA seem as simple as one line of code in R/Matlab. But if we don’t know them by hand and by heart and understand the various properties of these simple models, how can we feel confident when applying them? On the other hand, as researchers in data mining, how can we confidently modify existing models to avoid their weaknesses in a particular application if we only have a shallow understanding of those weaknesses? Therefore, for example, we cannot stop our understanding of regularization at the single sentence “controlling model complexity.” We need to go deeper than that, to know what is behind that abstract term “model complexity”. This is why recently, while writing my PhD thesis, I am also learning and re-learning statistics and machine learning models. I should have done this, say, four years ago! But it is never too late to learn, and learning them after doing several successful data mining projects gives me a new perspective.
