Introduction to Python
Session 2: Beginning Numerical Python and Visualization

Jeremy Chen


National University of Singapore Business School

This Session’s Agenda

• NumPy Arrays
• Random Number Generation
• Numerical Linear Algebra
• Visualization
• An OLS Example


Preliminaries

• By now you should have a Python distribution…
  – Run IPython or Spyder
  – If not, start downloading one and work with IDLE


NumPy Arrays

• The ndarray is a typed N-dimensional array object:
>>> import numpy as np
>>> A = np.ndarray((2,3)); A  # Much the same as np.empty((2,3)): uninitialized data
array([[  8.48798326e-314,   2.12199579e-314,   0.00000000e+000],
       [  0.00000000e+000,   1.05290307e-253,   1.47310613e-319]])
>>> A[0,0] = 2; A[0,1] = 3; A[1,1] = 4; A[1,2] = 5; A
array([[ 2.,  3.,  0.],
       [ 0.,  4.,  5.]])
>>> 10 * A + 1  # Simple vector operations
array([[ 21.,  31.,   1.],
       [  1.,  41.,  51.]])
>>> np.exp(A)  # And various other vector operations
array([[   7.3890561 ,   20.08553692,    1.        ],
       [   1.        ,   54.59815003,  148.4131591 ]])
>>> [A.shape, A.ndim]  # And don't forget ndarrays have a "shape" and a number of dimensions
[(2, 3), 2]
>>> A.dtype  # ... and a data type.
dtype('float64')
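To make the "typed" part concrete, here is a minimal sketch (not from the slides) that uses np.zeros instead of uninitialized memory and shows what a fixed dtype implies. The name `counts` is purely illustrative.

```python
import numpy as np

# np.zeros returns initialised memory; np.ndarray and np.empty do not.
A = np.zeros((2, 3))
A[0, 0] = 2; A[0, 1] = 3; A[1, 1] = 4; A[1, 2] = 5

# Arithmetic with a scalar is applied element by element and returns a new array.
B = 10 * A + 1

# The dtype is fixed when the array is created, so a float assigned into
# an integer array is truncated rather than changing the array's type.
counts = np.zeros(3, dtype=np.int64)
counts[0] = 2.9

print(B)
print(A.dtype, counts.dtype, counts[0])
```

If the truncation is unwanted, creating the array with a float dtype in the first place avoids it.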


NumPy Arrays: Creating

• Common ways of creating an ndarray
>>> np.array(np.arange(5))  # arange is like range but returns an ndarray
array([0, 1, 2, 3, 4])
>>> np.array(np.arange(5)).dtype  # NumPy selects an appropriate data type (int64 on many platforms)
dtype('int32')
>>> np.reshape([3,2,1,3,1,2,6,7], (2,4))
array([[3, 2, 1, 3],
       [1, 2, 6, 7]])
>>> np.ones((2,3,4))
array([[[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]],

       [[ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.],
        [ 1.,  1.,  1.,  1.]]])
>>> np.zeros((5))
array([ 0.,  0.,  0.,  0.,  0.])
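A few more creation routines that tend to come up alongside the ones above; this is a small sketch, and the variable names are illustrative only.

```python
import numpy as np

# linspace: a fixed number of evenly spaced points, endpoints included.
grid = np.linspace(0.0, 1.0, 5)

# Request a dtype explicitly instead of relying on the platform default.
idx = np.arange(5, dtype=np.int64)

# Identity matrix, and a 3-D array built by reshaping a range.
I3 = np.eye(3)
cube = np.arange(24).reshape(2, 3, 4)

print(grid)
print(idx.dtype, I3.shape, cube.shape)
```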


NumPy Arrays: Creating

• An amusing way to create an ndarray
>>> M = np.mat('[1,2,3;4,5,6]'); M  # Like MATLAB (np.mat was removed in NumPy 2.0; use np.asmatrix there)
matrix([[1, 2, 3],
        [4, 5, 6]])
>>> A = np.array(M); A  # ... not really necessary
array([[1, 2, 3],
       [4, 5, 6]])
>>> # Here's the obligatory Matrix Multiplication digression
>>> M * M.T  # Matrix multiplication (M.T is the transpose of M)
matrix([[14, 32],
        [32, 77]])
>>> A * A.T  # Not matrix multiplication
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (2,3) (3,2)
>>> A.dot(A.T)  # Matrix multiplication for 2 dimensional ndarrays
array([[14, 32],
       [32, 77]])
>>> np.dot(A,A.T)  # ... same here. (Dot product for 1 dimensional ndarrays)
array([[14, 32],
       [32, 77]])
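On Python 3.5 and later the `@` operator is another way to write the matrix product for plain ndarrays. The slides do not use it; the sketch below only contrasts it with the elementwise `*`.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

# '@' is the matrix product; for 2-D arrays it matches np.dot(A, A.T).
G = A @ A.T

# '*' multiplies element by element, so both operands need compatible shapes.
H = A * A

# For 1-D arrays '@' gives the ordinary dot product (a scalar).
v = np.array([1, 2, 3])
d = v @ v

print(G)
print(H)
print(d)
```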


NumPy Arrays: Accessing

• Simple slicing creates views (not copies)
>>> A = np.reshape(np.arange(2*4)+1, (2,4)); A
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> B = A[:,1:3]; B
array([[2, 3],
       [6, 7]])
>>> B[0,0] = 100; B; A
array([[100,   3],
       [  6,   7]])
array([[  1, 100,   3,   4],
       [  5,   6,   7,   8]])

• "Fancy Indexing" creates copies
>>> C = A[:, [1,3]]; C  # Using lists/arrays for indexing
array([[100,   4],
       [  6,   8]])
>>> C[0,0] = 200; C; A
array([[200,   4],
       [  6,   8]])
array([[  1, 100,   3,   4],
       [  5,   6,   7,   8]])
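When it is unclear whether an expression produced a view or a copy, np.shares_memory can check directly. A small sketch with illustrative variable names:

```python
import numpy as np

A = np.arange(1, 9).reshape(2, 4)

view = A[:, 1:3]          # simple slice: a view onto A's data
fancy = A[:, [1, 3]]      # list index: a copy
safe = A[:, 1:3].copy()   # explicit copy of a slice

print(np.shares_memory(A, view))    # True
print(np.shares_memory(A, fancy))   # False
print(np.shares_memory(A, safe))    # False

# Writing through the view changes A; writing to the copy does not.
view[0, 0] = 100
safe[0, 0] = -1
print(A)
```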


NumPy Arrays: A Little More

• Some standard operations
>>> A = np.reshape(np.arange(2*4)+1, (2,4)); A
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
>>> A.sum()
36
>>> A.sum(axis=0)
array([ 6,  8, 10, 12])
>>> A.sum(axis=1)
array([10, 26])
>>> A.cumsum(axis=1)
array([[ 1,  3,  6, 10],
       [ 5, 11, 18, 26]])
>>> (A.mean(axis=1), A.std(axis=1), A.var(axis=1), A.min(axis=1), A.argmax(axis=1))
(array([ 2.5,  6.5]), array([ 1.11803399,  1.11803399]), array([ 1.25,  1.25]), array([1, 5]), array([3, 3]))

>>> arr = np.array([1,2,4,9,1,4]); arr.sort(); arr  # Sorting (in place)
array([1, 1, 2, 4, 4, 9])
>>> np.unique(arr)  # Keep only unique entries
array([1, 2, 4, 9])
>>> np.unique(np.array(['A', 'B', 'CC', 'c', 'A', 'CC']))  # Applies to strings too (dtype='<U2' under Python 3)
array(['A', 'B', 'CC', 'c'], dtype='|S2')
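Two details behind the operations above are easy to miss: std and var divide by N unless ddof is given, and argsort/unique can return more than the sorted values. A short sketch:

```python
import numpy as np

A = np.arange(1, 9).reshape(2, 4)

# Default: population standard deviation (divide by N).
row_std_population = A.std(axis=1)
# ddof=1: sample standard deviation (divide by N - 1).
row_std_sample = A.std(axis=1, ddof=1)

arr = np.array([1, 2, 4, 9, 1, 4])

# argsort returns the indices that would sort the array;
# kind="stable" keeps equal elements in their original order.
order = np.argsort(arr, kind="stable")

# unique can also report how often each value occurs.
values, counts = np.unique(arr, return_counts=True)

print(row_std_population, row_std_sample)
print(order, arr[order])
print(values, counts)
```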


Random Number Generation

• The Random Seed
>>> np.random.seed(10)

• Various Distributions
>>> np.random.SOME_DISTRIBUTION(params, size=(dimensions))
>>> # rand ~ Uniform; randn ~ Normal; exponential; beta;
>>> # gamma; binomial; randint(=>Discrete) ... even triangular
>>> # (rand and randn take the dimensions as positional arguments rather than size=)
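A minimal sketch of the calling pattern above, showing that re-seeding reproduces the same draws. The parameter values are arbitrary, and the last lines use the newer Generator interface (np.random.default_rng, available from NumPy 1.17), which the slides do not cover.

```python
import numpy as np

# Same seed, same stream of numbers.
np.random.seed(10)
first = np.random.randn(3)
np.random.seed(10)
second = np.random.randn(3)

# rand/randn take dimensions positionally; most others take size=.
uniform = np.random.rand(2, 3)
waits = np.random.exponential(scale=2.0, size=4)
heads = np.random.binomial(n=10, p=0.3, size=5)
dice = np.random.randint(1, 7, size=5)            # upper bound is exclusive
tri = np.random.triangular(0.0, 0.5, 1.0, size=3)

# Generator interface: each generator carries its own seed and state.
gen_a = np.random.default_rng(10).standard_normal(3)
gen_b = np.random.default_rng(10).standard_normal(3)

print(first, second)
print(uniform.shape, waits.shape, heads, dice, tri)
```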


Numerical Linear Algebra

• Matrix Decompositions
>>> A = np.ones((5,5)) + np.eye(5)
>>> L = np.linalg.cholesky(A)  # Cholesky decomposition
>>> np.allclose(np.dot(L,L.T), A)  # Check...
True
>>> Q, R = np.linalg.qr(A)  # QR decomposition
>>> np.allclose(np.dot(Q,R), A)
True
>>> # Eigenvalue decomposition (use eigh when symmetric/Hermitian)
>>> e, V = np.linalg.eig(A)
>>> np.allclose(np.dot(A,V), np.dot(V,np.diag(e)))  # V may not be unitary
True
>>> # Singular Value Decomposition (svd returns the transposed right factor, so A = U diag(s) V)
>>> U, s, V = np.linalg.svd(A, full_matrices = True)
>>> np.allclose(np.dot(np.dot(U, np.diag(s)),V), A)
True
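Since this A is symmetric, eigh is the natural choice hinted at in the comment above. A sketch of what it returns for the same matrix, together with a reduced SVD:

```python
import numpy as np

A = np.ones((5, 5)) + np.eye(5)

# eigh assumes a symmetric/Hermitian input and returns real eigenvalues
# in ascending order with orthonormal eigenvectors in the columns of V.
w, V = np.linalg.eigh(A)

# Reduced SVD; Vh is already the transposed right factor.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_from_svd = (U * s) @ Vh     # same as U @ np.diag(s) @ Vh

print(w)                                      # four eigenvalues near 1, one near 6
print(np.allclose(V.T @ V, np.eye(5)))        # orthonormal eigenvectors
print(np.allclose(V @ np.diag(w) @ V.T, A))   # A = V diag(w) V^T
print(np.allclose(A_from_svd, A))
```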


Numerical Linear Algebra

• Solving Linear Systems
>>> np.linalg.solve(A, np.ones((5,1))).T  # Solve Linear System
array([[ 0.16666667,  0.16666667,  0.16666667,  0.16666667,  0.16666667]])

>>> np.linalg.inv(A)  # Matrix Inverse
array([[ 0.83333333, -0.16666667, -0.16666667, -0.16666667, -0.16666667],
       [-0.16666667,  0.83333333, -0.16666667, -0.16666667, -0.16666667],
       [-0.16666667, -0.16666667,  0.83333333, -0.16666667, -0.16666667],
       [-0.16666667, -0.16666667, -0.16666667,  0.83333333, -0.16666667],
       [-0.16666667, -0.16666667, -0.16666667, -0.16666667,  0.83333333]])

• Other Stuff
>>> np.linalg.norm(A)  # Matrix norm (Frobenius by default)
6.324555320336759
>>> np.linalg.det(A)  # Determinant (should be 6...)
5.9999999999999982
>>> np.trace(A)  # Matrix trace
10.0
>>> np.linalg.matrix_rank(A)  # Matrix rank
5
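The numbers above can be checked by hand: A has diagonal entries 2 and off-diagonal entries 1, so the Frobenius norm is sqrt(5·4 + 20·1) = sqrt(40), and the determinant is the product of the eigenvalues 1, 1, 1, 1, 6. A small sketch that also compares solve with multiplying by the explicit inverse:

```python
import numpy as np

A = np.ones((5, 5)) + np.eye(5)
b = np.ones(5)

# Preferred: solve the system directly.
x_solve = np.linalg.solve(A, b)
# Same answer here, but forming the inverse is usually more work and less accurate.
x_inv = np.linalg.inv(A) @ b

fro = np.linalg.norm(A)      # Frobenius norm by default for a matrix
det = np.linalg.det(A)

print(x_solve)
print(np.allclose(x_solve, x_inv))
print(fro, np.sqrt(40.0))
print(det)
```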


Visualization (By Example)

• By now you should have a Python distribution.

• Run IPython or Spyder
In [1]: %pylab
Using matplotlib backend: module://IPython.kernel.zmq.pylab.backend_inline
Populating the interactive namespace from numpy and matplotlib
In [2]: plot( arange(20) )
Out[2]: [<matplotlib.lines.Line2D at 0x8918850>]


Visualization (By Example)

• A Standard Brownian Motion
In [3]: dx = 0.01  # Hit CTRL-Enter for multi-line input
   ...: walk = (np.random.randn(1000) * np.sqrt(dx)).cumsum()
   ...: plot(np.arange(0,1000)*dx, walk - walk[0])
Out[3]: [<matplotlib.lines.Line2D at 0xd31f770>]
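The construction above relies on independent normal increments with variance equal to the time step, so the path value at time T should have variance close to T. A plot-free sketch that checks this over many simulated paths; the path count and seed are arbitrary choices, not from the slides.

```python
import numpy as np

np.random.seed(0)

dt = 0.01
n_steps = 1000
n_paths = 2000

# Each row is one path: cumulative sum of N(0, dt) increments.
increments = np.random.randn(n_paths, n_steps) * np.sqrt(dt)
paths = increments.cumsum(axis=1)

T = n_steps * dt              # total time covered by each path
terminal = paths[:, -1]       # value of every path at time T

sample_mean = terminal.mean()
sample_var = terminal.var()

print("T =", T)
print("sample mean at T:", sample_mean)      # should be near 0
print("sample variance at T:", sample_var)   # should be near T
```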


Visualization (By Example)

• A 2D Intensity Plot
In [4]: xy_extent = 20
   ...: xy_range = np.arange(-xy_extent,xy_extent,0.1)
   ...: X_m, Y_m = np.meshgrid(xy_range, xy_range)
   ...: f = ((X_m-5) - 2 * (Y_m+5)) ** 2 - (1.5*(X_m+3) + 1 * (Y_m-5)) ** 2
   ...: imshow(f, cmap = matplotlib.cm.bone,
   ...:        extent=[-xy_extent, xy_extent, -xy_extent, xy_extent])
   ...: colorbar()
Out[4]: <matplotlib.colorbar.Colorbar instance at 0x0C3AFE90>


Visualization (By Example)

• A 2D Contour Plot
In [5]: xy_extent = 20
   ...: xy_range = np.arange(-xy_extent,xy_extent,0.1)
   ...: X_m, Y_m = np.meshgrid(xy_range, xy_range)
   ...: f = ((X_m-5) - 2 * (Y_m+5)) ** 2 - (1.5*(X_m+3) + 1 * (Y_m-5)) ** 2
   ...: contourf(X_m, Y_m, f, 20)
   ...: colorbar()
Out[5]: <matplotlib.colorbar.Colorbar instance at 0x10C12DA0>
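Both plots hinge on what np.meshgrid returns. A tiny sketch on a 3-by-2 grid makes the layout visible: rows follow y and columns follow x.

```python
import numpy as np

x = np.arange(3)        # 0, 1, 2
y = np.arange(2)        # 0, 1

# X repeats x along every row; Y repeats y down every column.
X, Y = np.meshgrid(x, y)

# Any elementwise expression in X and Y is then evaluated on the whole grid.
f = X + 10 * Y

print(X)
print(Y)
print(f)      # f[row, col] corresponds to the point (x[col], y[row])
```

One consequence worth checking on your own setup: imshow draws row 0 at the top unless origin='lower' is passed, whereas contourf places values by their Y coordinates, so the intensity plot and the contour plot of the same f can appear vertically flipped relative to each other.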


Visualization (By Example)

Our Objective…


Visualization (By Example)

• Some data for plotting
In [6]: XY = np.arange(1,1+100)/100.0
   ...: XZ = np.arange(1,1+1000)/1000.0
   ...: Y = np.random.randn(100)
   ...: Z = np.random.randn(1000)
   ...: Z.sort()

• Setting up the figure and subplots
In [7]: n_row = 2
   ...: n_col = 2
   ...: fig1 = plt.figure(figsize=(12,8))
   ...: axes = []  # Collect axis references in a list
   ...: for k in range(n_row * n_col):
   ...:     axes.append( fig1.add_subplot(n_row, n_col, k+1) )


Visualization (By Example)

• The simple line plot
In [8]: axes[0].plot( XZ - 0.5, np.cos(1.5 * (XZ - 0.5) * 2 * np.pi), color='k')
   ...: axes[0].set_xlim([-0.6, 0.6])
   ...: axes[0].set_title(r'$\cos\ 3\pi x$')  # 1.5 * 2 * pi * x = 3 * pi * x
   ...: axes[0].set_xlabel('x')
   ...: axes[0].set_ylabel('y')
   ...: fig1  # Display the updated figure


Visualization (By Example)

• The histogram of normal samples
In [9]: axes[1].hist( Z, bins=20, color='b')
   ...: axes[1].set_title('Some Histogram of 1000 samples from $N(0,1)$')
   ...: fig1  # Display the updated figure


Visualization (By Example)

• The scatter plots (with legend)
In [10]: axes[2].scatter( XY[10:60], Y[:50] + XY[10:60]*0.05, marker='o',
    ...:                  color=(0.5, 0.5, 0.5), label='Population 1')  # color= avoids RGB-vs-data ambiguity of c=
    ...: axes[2].scatter( XY[40:90], Y[50:] + XY[40:90]*0.05, marker='x',
    ...:                  color='#FF00A0', label='Population 2')
    ...: axes[2].legend(loc = 'best')
    ...: fig1  # Display the updated figure


Visualization (By Example)

• The big mess of normal CDFs
In [11]: axes[3].set_title(r'A Colourful Mess of $N(\mu_k, 1)$ CDFs')
    ...: col = ['r', 'g', 'b', 'c', 'm', 'y', 'k']  # 'w' (white) is left out
    ...: line = ['solid', 'dashed', 'dashdot', 'dotted']
    ...: linestyle = [(c,l) for c in col for l in line]
    ...: for k in range(40):
    ...:     ls_idx = k % len(linestyle)
    ...:     axes[3].plot( Z+(k-20)*0.5, XZ, color = linestyle[ls_idx][0],
    ...:                   linestyle=linestyle[ls_idx][1])
    ...: axes[3].text(-10, 0.45, 'Nothing to See Here.', fontsize=20)
    ...: fig1  # Display the updated figure
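Why plotting sorted samples Z against XZ = k/n draws a CDF: the k-th smallest of n samples sits at roughly the k/n quantile, which is the empirical CDF. A plot-free sketch that compares it with the exact normal CDF (written with math.erf to avoid extra dependencies); the seed is arbitrary.

```python
import math
import numpy as np

np.random.seed(1)
n = 1000

Z = np.sort(np.random.randn(n))        # sorted samples, as on the slide
levels = np.arange(1, n + 1) / n       # empirical CDF heights k/n

# Exact standard normal CDF evaluated at the sample points.
normal_cdf = np.array([0.5 * (1.0 + math.erf(z / math.sqrt(2.0))) for z in Z])

# Largest vertical gap between the empirical and the exact CDF.
max_gap = np.abs(levels - normal_cdf).max()
print("largest gap:", max_gap)         # small for n = 1000
```

Shifting Z by a constant, as the loop does with (k-20)*0.5, shifts the mean of the distribution whose CDF is drawn.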



Example: Ordinary Least Squares

• The standard linear model
  – $Y = X\beta + \varepsilon$ where $\varepsilon \sim N(0, \sigma^2 I)$
  – The Maximum Likelihood Estimator is the Least Squares Solution
In [12]: import math
    ...: import numpy as np
    ...: import matplotlib.pyplot as plt
    ...: import scipy.stats
    ...: # Load data (Choose between data sets with...
    ...: # 100, 200, 500, 1000, 5000, and 10000 samples)
    ...: data = np.load('OLS_data_500.npz')
    ...: X = data['X']
    ...: Y = data['Y']
    ...: true_beta = data['true_beta']  # Yeah... These are provided
    ...: true_error_std = data['true_error_std']
    ...: data.close()
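The .npz files are not reproduced here. For readers who want to follow along without them, the sketch below generates a stand-in data set with the same array names and shapes. Everything about it is an assumption: the helper `make_ols_data` is hypothetical, the predictors are drawn as independent standard normals, and the coefficient and noise values are simply chosen to resemble the printed results.

```python
import numpy as np

def make_ols_data(num_samples=500, seed=0):
    """Hypothetical stand-in for the course data files (not the original data)."""
    rng = np.random.RandomState(seed)
    # Assumed values, picked to resemble the output shown on the slides.
    true_beta = np.array([[1.0], [1.0], [1.0], [1.0], [1.0], [0.0]])
    true_error_std = 2.0
    # Assumption: independent standard normal predictors.
    X = rng.randn(num_samples, true_beta.shape[0])
    noise = true_error_std * rng.randn(num_samples, 1)
    Y = X.dot(true_beta) + noise
    return X, Y, true_beta, true_error_std

X, Y, true_beta, true_error_std = make_ols_data()
print(X.shape, Y.shape, true_beta.T, true_error_std)
```

The estimates obtained from this synthetic data will not match the numbers printed on the slides.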


Example: Ordinary Least Squares

In [13]: if X.shape[0] != Y.shape[0]:  # Always good to check for errors
    ...:     raise ValueError("Data dimension mismatch.")
    ...: num_pred = X.shape[1]
    ...: num_samples = X.shape[0]

In [14]: # Descriptive Statistics
    ...: plt_side = int(math.ceil(math.sqrt(num_pred + 1)))  # add_subplot needs integers
    ...: fig = plt.figure(figsize = (12,12))
    ...: for k in range(num_pred + 1):
    ...:     ax = fig.add_subplot(plt_side, plt_side, k+1)
    ...:     if k == 0:
    ...:         ax.hist(Y, bins=20)
    ...:         ax.set_title( "Dependent Variable ($\\mu={mean:.3}$, $\\sigma={std:.3}$)".format(
    ...:             mean = np.mean(Y), std = np.std(Y)) )
    ...:     else:
    ...:         ax.hist(X[:, k-1], bins=20)
    ...:         ax.set_title( "Predictor {pred} ($\\mu={mean:.3}$, $\\sigma={std:.3}$)".format(
    ...:             pred = k, mean = np.mean(X[:, k-1]), std = np.std(X[:, k-1])) )



Example: Ordinary Least Squares

In [15]: # OLS
    ...: XtX = np.dot(X.T, X)
    ...: XtY = np.dot(X.T, Y)
    ...: beta_est = np.linalg.solve(XtX, XtY)  # Solve the normal equations (X'X) beta = X'Y
    ...: s_sq = np.var(Y - np.dot(X, beta_est))
    ...:
    ...: print("True Betas", true_beta.T)
    ...: print("Estimated Betas", beta_est.T)
    ...: print()
    ...:
    ...: print("True Error Variance: ", true_error_std ** 2)
    ...: print("Estimated Error Variance: ", s_sq)
True Betas [[ 1.  1.  1.  1.  1.  0.]]
Estimated Betas [[ 1.2389584   0.98711812  0.90258976  0.86002269  0.99770273  0.01079312]]

True Error Variance:  4
Estimated Error Variance:  3.85523406528
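The normal-equations solve above can be cross-checked against np.linalg.lstsq, which works on X directly without forming X'X. This is an alternative route, not the method used on the slides, and the data below is a small synthetic example with made-up coefficients.

```python
import numpy as np

rng = np.random.RandomState(0)

# Small synthetic problem (coefficients are arbitrary, for illustration only).
beta = np.array([[1.0], [-2.0], [0.5]])
X = rng.randn(200, 3)
Y = X.dot(beta) + 0.1 * rng.randn(200, 1)

# Route 1: normal equations, as on the slide.
beta_normal = np.linalg.solve(X.T.dot(X), X.T.dot(Y))

# Route 2: least-squares solver applied to X itself.
beta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X, Y, rcond=None)

print(beta_normal.T)
print(beta_lstsq.T)
print("rank of X:", rank)
```

Working on X directly tends to behave better when the predictors are nearly collinear, because forming X'X squares the condition number.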


Example: Ordinary Least Squares

In [16]: # Hypothesis Testing
    ...: standard_errors = np.reshape(np.sqrt(np.diag(
    ...:     s_sq * np.linalg.solve(XtX, np.eye(num_pred)) )), (num_pred,1))
    ...: print("Standard Errors: ", standard_errors.T)
    ...: t_statistics = beta_est / standard_errors
    ...: print("t-Statistics: ", t_statistics.T)
    ...: df = num_samples - num_pred
    ...: p_values = 2 * scipy.stats.t.cdf(-abs(t_statistics), df).T
    ...: print("p-values (2-sided): ", p_values)
Standard Errors:  [[ 0.26819354  0.05023788  0.16891535  0.09578419  0.09744182  0.06468454]]
t-Statistics:  [[  4.61964296  19.6488801    5.34344442   8.97875377  10.23895845   0.16685784]]
p-values (2-sided):  [[  4.91038992e-06   6.11554987e-64   1.39498275e-07   5.75446104e-18   1.92610049e-22   8.67550186e-01]]

The last predictor is not significant (p-value of about 0.87), so let's drop it and do it again!
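A note on the variance estimate feeding these standard errors: np.var subtracts the residual mean and divides by n, while a common textbook estimator divides the residual sum of squares by n − p, which matches the degrees of freedom used for the t distribution. The sketch below compares the two on synthetic data (made-up coefficients); for large n relative to p the difference is small.

```python
import numpy as np

rng = np.random.RandomState(0)
n, p = 200, 3

# Synthetic data with arbitrary coefficients, for illustration only.
X = rng.randn(n, p)
Y = X.dot(np.ones((p, 1))) + 2.0 * rng.randn(n, 1)

XtX = X.T.dot(X)
beta_est = np.linalg.solve(XtX, X.T.dot(Y))
resid = Y - X.dot(beta_est)

s_sq_slides = np.var(resid)            # mean-centred, divides by n
rss = float((resid ** 2).sum())        # residual sum of squares
s_sq_unbiased = rss / (n - p)          # divides by n - p

# Standard errors and t-statistics using the n - p version.
cov_beta = s_sq_unbiased * np.linalg.inv(XtX)
standard_errors = np.sqrt(np.diag(cov_beta)).reshape(p, 1)
t_statistics = beta_est / standard_errors

print(s_sq_slides, s_sq_unbiased)
print(standard_errors.T)
print(t_statistics.T)
```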


Example: Ordinary Least Squares

In [17]: # OLS Again
    ...: X = X[:,:-1]  # Drop the last predictor
    ...: num_pred = X.shape[1]
    ...: num_samples = X.shape[0]
    ...: XtX = np.dot(X.T, X)
    ...: XtY = np.dot(X.T, Y)
    ...: beta_est = np.linalg.solve(XtX, XtY)
    ...: s_sq = np.var(Y - np.dot(X, beta_est))
    ...: print("Estimated Betas", beta_est.T)
    ...: standard_errors = np.reshape(np.sqrt(np.diag(
    ...:     s_sq * np.linalg.solve(XtX, np.eye(num_pred)) )), (num_pred,1))
    ...: print("Standard Errors: ", standard_errors.T)
    ...: t_statistics = beta_est / standard_errors
    ...: print("t-Statistics: ", t_statistics.T)
    ...: df = num_samples - num_pred
    ...: p_values = 2 * scipy.stats.t.cdf(-abs(t_statistics), df).T
    ...: print("p-values (2-sided): ", p_values)
Estimated Betas [[ 1.25027031  0.98744205  0.90124032  0.86084415  0.9975555 ]]
Standard Errors:  [[ 0.25949091  0.05020176  0.16872633  0.09566025  0.09744054]]
t-Statistics:  [[  4.81816615  19.6694721    5.34143267   8.99897405  10.23758223]]
p-values (2-sided):  [[  1.92950811e-06   4.54116320e-64   1.40853414e-07   4.88551777e-18   1.93166121e-22]]
