My Book Review of The Elements of Statistical Learning - chap 2

My Paper Review: The Elements of Statistical Learning

YAN TING LIN

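Chapter 1 Introduction
• What does statistical learning do, and why should we learn it?
• Predicting whether a patient has a disease, forecasting stock prices, recognizing images, identifying the factors behind a disease …
• Chapter 1 presents four examples:
• Ex 1: Email Spam
• Ex 2: Prostate Cancer
• Ex 3: Handwritten Digit Recognition
• Ex 4: DNA Expression Microarrays
• Who is this book for, and how is it organized?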

Email Spam

Prostate Cancer

Handwritten Digit Recognition

DNA Expression Microarrays

Chapter 2
• 2.1 Introduction
• 2.2 Variable Types and Terminology
• The difference between quantitative and qualitative variables
• Explains the notation G, X, Y, input, output, ordered categorical, K-level qualitative variable, x_i, Ŷ
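In the book, a K-level qualitative variable is usually coded with dummy variables: a vector of K binary bits, exactly one of which is "on" for each observation. The sketch below shows that coding. The level names and the helper `dummy_encode` are hypothetical.

```python
def dummy_encode(values, levels):
    """Code a K-level qualitative variable as K binary dummy variables.

    Exactly one of the K bits is 1 for each observation.
    """
    index = {level: k for k, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * len(levels)
        row[index[v]] = 1  # turn on the bit for this observation's level
        rows.append(row)
    return rows


# Hypothetical 3-level qualitative variable (e.g. a group label G)
levels = ["red", "green", "blue"]
encoded = dummy_encode(["green", "red", "blue", "green"], levels)
print(encoded)  # [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]]
```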

2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
• Linear Models and Least Squares

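• The simplest model is a linear relationship: given a training set with known X and Y, least squares finds the β that minimizes the residual sum of squares, RSS(β) = Σ (y_i − x_iᵀβ)².
• The linear model is rigid. Its fit is stable (low variance), but it can be heavily biased when the true relationship is not linear.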
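The following is a minimal sketch of least squares with a single input. It assumes the model ŷ = b0 + b1·x, and the data values are made up. For one input, minimizing RSS has the closed form below. With several inputs, the book's general solution is β̂ = (XᵀX)⁻¹Xᵀy.

```python
def least_squares_fit(xs, ys):
    """Fit y ≈ b0 + b1 * x by minimizing RSS = sum (y_i - b0 - b1 * x_i)^2."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept (the "bias" term)
    return b0, b1


def rss(xs, ys, b0, b1):
    """Residual sum of squares of the line b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]  # made-up training data
b0, b1 = least_squares_fit(xs, ys)
print(round(b0, 2), round(b1, 2))  # 1.1 1.96
```

Any other line, for example one with a slightly shifted intercept or slope, has a larger RSS on this training set.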

• Nearest-Neighbor Methods

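• KNN (short for k-nearest neighbors) is a supervised learning method. The prediction at x is the average of the outputs y_i of the k training points closest to x.
• Its advantages are simplicity and flexibility. It makes few assumptions about the data, at the cost of high variance.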
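Below is a minimal KNN sketch. It assumes Euclidean distance and 0/1 class labels, as in the book's two-class example, where a point is classified as 1 when the neighborhood average exceeds 0.5. The training points and function names are made up for illustration.

```python
import math


def knn_predict(train_x, train_y, x, k):
    """k-nearest-neighbor average: mean of y over the k training points closest to x."""
    # Sort training indices by Euclidean distance to the query point x
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return sum(train_y[i] for i in order[:k]) / k


def knn_classify(train_x, train_y, x, k):
    """With 0/1 labels, predict class 1 when the neighborhood average exceeds 0.5."""
    return 1 if knn_predict(train_x, train_y, x, k) > 0.5 else 0


# Made-up 2-D training data: three points per class
train_x = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (1.0, 1.0), (0.9, 1.2), (1.1, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_classify(train_x, train_y, (0.15, 0.2), k=3))  # 0
print(knn_classify(train_x, train_y, (1.0, 0.9), k=3))   # 1
```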

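• Least squares produces a smooth decision boundary and suits data that follow a linear model; it has low variance and potentially high bias.
• k-nearest neighbors produces a more flexible decision boundary that adapts to the data; it has high variance and low bias.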

Two scenarios
• Scenario 1: The training data in each class were generated from bivariate Gaussian distributions with uncorrelated components and different means.
• Scenario 2: The training data in each class came from a mixture of 10 low-variance Gaussian distributions, with individual means themselves distributed as Gaussian.

• The two methods suit different scenarios: least squares is better suited to Scenario 1, and k-nearest neighbors to Scenario 2.
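Here is a sketch of how the two scenarios can be simulated. The Scenario 2 recipe follows the book's description. First, 10 means are drawn from N((1,0)ᵀ, I) for BLUE and 10 from N((0,1)ᵀ, I) for ORANGE. Then each of the 100 observations per class is drawn from N(m_k, I/5) around a randomly chosen m_k. The Scenario 1 class means and the identity covariance are assumptions, because those parameters are not given here.

```python
import random


def make_gaussian_class(mean, n_obs=100, rng=random):
    """Scenario 1 (assumed parameters): one bivariate Gaussian with
    uncorrelated, unit-variance components centered at `mean`."""
    return [(rng.gauss(mean[0], 1.0), rng.gauss(mean[1], 1.0)) for _ in range(n_obs)]


def make_mixture_class(center, n_obs=100, n_means=10, rng=random):
    """Scenario 2: draw n_means cluster means from N(center, I), then for each
    observation pick one mean at random and sample from N(m_k, I/5)."""
    means = [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n_means)]
    sd = (1 / 5) ** 0.5  # covariance I/5: each coordinate has standard deviation sqrt(1/5)
    points = []
    for _ in range(n_obs):
        mx, my = rng.choice(means)  # each mean is picked with probability 1/n_means
        points.append((rng.gauss(mx, sd), rng.gauss(my, sd)))
    return points


rng = random.Random(0)
blue = make_mixture_class((1.0, 0.0), rng=rng)    # BLUE class
orange = make_mixture_class((0.0, 1.0), rng=rng)  # ORANGE class
print(len(blue), len(orange))  # 100 100
```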

A simulation example from the book

Ways to improve these two methods, as mentioned by the authors
• I think this material belongs to much later chapters. Most of the methods commonly used today are variants of these two methods.
• Kernel methods use weights that decrease smoothly to zero with distance from the target point, rather than the effective 0/1 weights used by k-nearest neighbors.
• In high-dimensional spaces the distance kernels are modified to emphasize some variable more than others.
• Local regression fits linear models by locally weighted least squares, rather than fitting constants locally.
• Linear models fit to a basis expansion of the original inputs allow arbitrarily complex models.
• Projection pursuit and neural network models consist of sums of non-linearly transformed linear models.
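The first improvement listed above can be made concrete. This sketch contrasts the effective 0/1 weights of k-nearest neighbors with smoothly decaying Gaussian kernel weights, as used in a Nadaraya–Watson style weighted average. The Gaussian kernel, the bandwidth and the data are assumptions for illustration.

```python
import math


def knn_weights(xs, x0, k):
    """Effective 0/1 weights of k-NN: 1 for the k inputs closest to x0, 0 otherwise."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    chosen = set(order[:k])
    return [1.0 if i in chosen else 0.0 for i in range(len(xs))]


def gaussian_kernel_weights(xs, x0, bandwidth):
    """Weights that decrease smoothly toward zero as |x - x0| grows (Gaussian kernel assumed)."""
    return [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]


def weighted_average(ys, weights):
    """Locally weighted average of the outputs: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0.1, 0.4, 0.9, 1.6, 2.1, 2.4, 3.2]  # made-up data
x0 = 1.2
print(knn_weights(xs, x0, k=3))  # [0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
print(round(weighted_average(ys, knn_weights(xs, x0, k=3)), 3))  # 0.967
print(round(weighted_average(ys, gaussian_kernel_weights(xs, x0, 0.5)), 3))
```

With the kernel weights, every training point contributes and the estimate changes smoothly as x0 moves. The k-NN average jumps whenever the set of k neighbors changes.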

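2.4 Statistical Decision Theory
1. Take a real-world input X
2. and the output Y we want to predict.
3. Look for the function that best describes the relationship Y = f(X).
4. Because there is always noise, a perfect fit is rarely possible,
5. so we define a loss function L(Y, f(X)) to measure the error.
6. The most commonly used loss is squared error loss, L(Y, f(X)) = (Y − f(X))².
7. The simplest form is a linear relationship, Y = f(X) = βx + bias.
8. β can be obtained by setting the derivative of the loss to zero, i.e., by solving a system of linear equations.
9. Minimizing the expected prediction error, EPE(f) = E(Y − f(X))², gives the best model. Under squared error loss the solution is the conditional mean f(x) = E(Y | X = x).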
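Under squared error loss, the expected prediction error is minimized by the conditional mean E(Y | X = x). The empirical version can be checked numerically: for a set of outputs observed at one input value, the constant prediction that minimizes the average squared error is the sample mean. The following grid-search sketch uses made-up numbers.

```python
def mean_squared_error(ys, c):
    """Average squared error loss (y - c)^2 of a constant prediction c."""
    return sum((y - c) ** 2 for y in ys) / len(ys)


# Made-up outputs observed at (approximately) the same input x
ys = [2.0, 3.0, 3.5, 5.5]
grid = [i / 100 for i in range(701)]  # candidate constants 0.00 .. 7.00
best = min(grid, key=lambda c: mean_squared_error(ys, c))
print(best, sum(ys) / len(ys))  # 3.5 3.5
```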

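• Both k-nearest neighbors and least squares estimate the conditional expectation f(x) = E(Y | X = x) by averaging.
• The only difference is the assumption each makes about the data:
• least squares assumes f(x) is well approximated by a globally linear function (linear model);
• k-nearest neighbors assumes f(x) is well approximated by a locally constant function.
• The Bayes classifier uses the conditional distribution Pr(G | X) to assign each point to its most probable class. The error rate of the Bayes classifier is called the Bayes rate.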
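The Bayes classifier can be sketched for an assumed two-class problem where the class densities are fully known: class 0 ~ N(−1, 1), class 1 ~ N(1, 1), with equal priors. The sketch classifies each point to the class with the largest posterior. Its error rate, the Bayes rate, can be computed exactly with `statistics.NormalDist`. The distributions and priors are illustrative assumptions.

```python
from statistics import NormalDist

# Assumed two-class problem with known class-conditional densities and equal priors
PRIORS = {0: 0.5, 1: 0.5}
DENSITIES = {0: NormalDist(-1.0, 1.0), 1: NormalDist(1.0, 1.0)}


def bayes_classify(x):
    """Classify to the class with the largest posterior, proportional to prior * density."""
    return max(PRIORS, key=lambda g: PRIORS[g] * DENSITIES[g].pdf(x))


def bayes_rate():
    """Error rate of the Bayes classifier; for these symmetric classes the boundary is x = 0."""
    miss_0 = 1 - DENSITIES[0].cdf(0.0)  # class 0 points falling on the class-1 side
    miss_1 = DENSITIES[1].cdf(0.0)      # class 1 points falling on the class-0 side
    return PRIORS[0] * miss_0 + PRIORS[1] * miss_1


print(bayes_classify(-0.3), bayes_classify(0.7))  # 0 1
print(round(bayes_rate(), 4))  # 0.1587
```

No classifier can do better than this rate on the assumed distributions, which is why the Bayes rate serves as a lower bound when comparing methods.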

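2.5 Local Methods in High Dimensions
• the curse of dimensionality (Bellman, 1961)
• https://zh.wikipedia.org/wiki/维数灾难
• This section draws on results from other authors' papers, and the topic is an important one.
• For a quick overview, see the Wikipedia article. In short, when data have many variables (high dimensions), they become very hard to handle. For example, a neighborhood that captures a fixed fraction of the data is no longer local.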
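Two quantities from the book's discussion make the curse of dimensionality concrete. For data uniform in the p-dimensional unit hypercube, a sub-cube that captures a fraction r of the data needs edge length r^(1/p). For N points uniform in the p-dimensional unit ball, the median distance from the origin to the closest point is (1 − (1/2)^(1/N))^(1/p). This short sketch evaluates both.

```python
def edge_length(fraction, p):
    """Edge of a sub-cube capturing `fraction` of uniform data in the unit p-cube: r ** (1/p)."""
    return fraction ** (1 / p)


def median_nearest_distance(n, p):
    """Median distance from the origin to the closest of n points uniform in the unit p-ball."""
    return (1 - 0.5 ** (1 / n)) ** (1 / p)


for p in (1, 2, 10):
    # Edge length needed to capture 1% and 10% of the data
    print(p, round(edge_length(0.01, p), 2), round(edge_length(0.10, p), 2))
print(round(median_nearest_distance(500, 10), 2))  # 0.52
```

In 10 dimensions, capturing just 1% of the data already requires covering about 63% of the range of each input. With N = 500, the nearest point is typically more than halfway to the boundary.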


2.6 Statistical Models, Supervised Learning and Function Approximation
• Goal: find a function that relates inputs to outputs as well as possible.
• the function-fitting paradigm from a machine learning point of view
• In machine learning, we feed data into a learning algorithm, which learns from the data and outputs a function.
• Suppose for simplicity that the errors are additive and that the model Y = f(X) + ε is a reasonable assumption.

• The second half of 2.6 explains how to estimate the model's parameters by maximum likelihood estimation.
• https://en.wikipedia.org/wiki/Maximum_likelihood_estimation
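For the additive error model with Gaussian errors, maximum likelihood is equivalent to least squares, because the log-likelihood equals a constant minus RSS/(2σ²). The sketch below covers the simplest case, a constant mean. It assumes i.i.d. N(μ, σ²) data with made-up values. The closed-form MLE is the sample mean together with the 1/N variance, and moving either parameter away from it lowers the log-likelihood.

```python
import math


def gaussian_log_likelihood(ys, mu, sigma2):
    """Log-likelihood of i.i.d. observations under N(mu, sigma2)."""
    n = len(ys)
    rss = sum((y - mu) ** 2 for y in ys)  # the only term that depends on mu
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


def gaussian_mle(ys):
    """Closed-form MLE: sample mean and the (1/N) average squared deviation."""
    n = len(ys)
    mu_hat = sum(ys) / n
    sigma2_hat = sum((y - mu_hat) ** 2 for y in ys) / n
    return mu_hat, sigma2_hat


ys = [4.8, 5.1, 5.4, 4.9, 5.3]  # made-up observations
mu_hat, sigma2_hat = gaussian_mle(ys)
best = gaussian_log_likelihood(ys, mu_hat, sigma2_hat)
# Moving either parameter away from the MLE lowers the log-likelihood
for mu, s2 in [(mu_hat + 0.1, sigma2_hat), (mu_hat, sigma2_hat * 1.5)]:
    assert gaussian_log_likelihood(ys, mu, s2) < best
print(round(mu_hat, 3), round(sigma2_hat, 3))  # 5.1 0.052
```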


2.7 Structured Regression Models

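• In order to obtain useful results for finite N, we must restrict the eligible solutions to (2.37) to a smaller set of functions.
• I think the following sections are about how to choose a suitable solution among the many candidate methods with different parameters.
• Perhaps because this is only Chapter 2, many of the formulas in the later sections are not derived in detail and are covered quickly.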
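The restriction is needed because criterion (2.37) in the book is the residual sum of squares, RSS(f) = Σ (y_i − f(x_i))². Any function that passes through the training points makes it zero. This sketch uses hypothetical data and functions. The "memorizer" fits the training set perfectly but says nothing useful between the training inputs.

```python
def rss(f, xs, ys):
    """Residual sum of squares, RSS(f) = sum_i (y_i - f(x_i))^2."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.2, 0.9, 2.3, 2.8]  # made-up training data
table = dict(zip(xs, ys))


def memorizer(x):
    """Unrestricted 'solution': reproduces each training y exactly and returns an
    arbitrary 0.0 everywhere else (hypothetical)."""
    return table.get(x, 0.0)


def linear(x):
    """A member of a restricted class: a straight line (hypothetical coefficients)."""
    return 0.1 + 0.9 * x


print(rss(memorizer, xs, ys))         # 0.0
print(round(rss(linear, xs, ys), 2))  # 0.18
```

The memorizer wins on training RSS, yet its prediction at x = 1.5 is an arbitrary 0.0. Restricting the class of functions, here to straight lines, is what makes the fit generalize.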

2.8 Classes of Restricted Estimators
• Roughness Penalty and Bayesian Methods

• Kernel Methods and Local Regression
• https://en.wikipedia.org/wiki/Kernel_smoother
• Basis Functions and Dictionary Methods
• I do not fully understand this part; the authors say it will be explained in later chapters.
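Local regression can be sketched in a few lines. At a target point x0, fit a straight line by weighted least squares with kernel weights K(x0, x_i), then evaluate that line at x0. The Gaussian kernel, the bandwidth and the data are assumptions. For comparison, the sketch also computes a locally constant kernel average at the same point.

```python
import math


def gaussian_weights(xs, x0, bandwidth):
    """Kernel weights K(x0, x_i); a Gaussian kernel is assumed here."""
    return [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]


def local_linear(xs, ys, x0, bandwidth):
    """Local linear regression: weighted least squares of y on (1, x), evaluated at x0."""
    w = gaussian_weights(xs, x0, bandwidth)
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw  # weighted means
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept + slope * x0  # value of the local line at the target point


def kernel_average(xs, ys, x0, bandwidth):
    """Locally constant fit: the kernel-weighted average of y."""
    w = gaussian_weights(xs, x0, bandwidth)
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)


xs = [i / 10 for i in range(21)]  # 0.0 .. 2.0
ys = [2.0 * x + 1.0 for x in xs]  # noiseless straight line (made-up)
print(round(local_linear(xs, ys, x0=0.0, bandwidth=0.3), 6))  # 1.0
print(round(kernel_average(xs, ys, x0=0.0, bandwidth=0.3), 3))
```

On a straight line, the local linear fit recovers the true value exactly, even at the boundary x0 = 0. The locally constant average there is pulled upward, because every neighbor lies on one side of x0.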


2.9 Model Selection and the Bias–Variance Tradeoff
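For k-nearest-neighbor regression with fixed training inputs, the book decomposes the expected prediction error at x0 into three terms: σ² + [f(x0) − (1/k) Σ f(x_(l))]² + σ²/k. These are irreducible noise, squared bias, and variance. The sketch evaluates each term under a few assumptions: a hypothetical true function f(x) = x², an evenly spaced design on [−1, 1], and σ² = 0.25.

```python
def knn_epe_terms(f, xs, x0, k, sigma2):
    """Return (noise, bias^2, variance) of k-NN regression at x0 with fixed inputs xs."""
    # The k training inputs closest to x0 (ties broken by list order)
    nearest = sorted(xs, key=lambda x: abs(x - x0))[:k]
    bias = f(x0) - sum(f(x) for x in nearest) / k
    return sigma2, bias ** 2, sigma2 / k


def f(x):
    """Hypothetical true regression function."""
    return x ** 2


xs = [i / 20 for i in range(-20, 21)]  # fixed, evenly spaced design on [-1, 1]
for k in (1, 5, 20):
    noise, bias2, var = knn_epe_terms(f, xs, 0.0, k, sigma2=0.25)
    print(k, round(bias2, 5), round(var, 4), round(noise + bias2 + var, 4))
```

As k grows, the variance term σ²/k falls while the squared bias grows. Model selection has to balance these two terms.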

