Page 1: © 2005 The MathWorks, Inc. Handling Large Data Sets Efficiently in MATLAB ® Stuart McGarrity The MathWorks, Inc. stuart.mcgarrity@mathworks.com


Handling Large Data Sets Efficiently in MATLAB®

Stuart McGarrity

The MathWorks, Inc.

stuart.mcgarrity@mathworks.com


Handling Large Data Sets is Like…


Agenda

Problems in handling large data sets
Strategies for handling large data sets
Maximizing available memory on your system
Minimizing required memory in MATLAB®


Problems in Handling Large Data Sets


What are Large Data Sets?

Data in MATLAB® represent physical quantities

Large data
– Lots of quantities
– Varying by time, space
– High resolution
– Example: 5-10 TB of data per flight test

Trends: devices, computers, RAM, hard drives


What are Large Data Handling Problems?

Running out of memory
– Large data sets need lots of memory to store and process
– Computers have finite memory
– Data set size > available memory: “Out of memory” errors

Slowness
– Large data sets need lots of operations to process and access
– Today's CPUs have limited speed
– Slowness due to page file use


Causes of Out of Memory Error

Required memory > available memory
– Memory constraints on (32-bit) computer system
– Memory usage characteristics of MATLAB

Lack of understanding of memory constraints or requirements
– “>> A=rand(6e3,6e3); B=svd(A);  Why out of memory?”
– “I have a 1 GB file but I have 3 GB of RAM. Why out of memory?”

Mistakes
– >> a=rand(10000); % needs 800 MB of storage
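
The 800 MB figure can be checked before allocating anything. The sketch below assumes only that a double occupies 8 bytes; the variable names are illustrative and not from the slides.

```matlab
% Estimate storage before allocating: elements x bytes per element.
n = 10000;                            % rand(n) would be an n-by-n double matrix
bytesPerDouble = 8;                   % a double occupies 8 bytes
requiredBytes = n^2 * bytesPerDouble; % total bytes for the matrix
requiredMB = requiredBytes / 2^20;    % binary megabytes (2^20 bytes each)
fprintf('rand(%d) needs %.0f bytes (about %.0f MB)\n', n, requiredBytes, requiredMB);
```

The result is 8e8 bytes, which is 800 million bytes or roughly 763 binary megabytes, so one such matrix already takes a large share of a 32-bit process.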


Example from CSSM: MATLAB Memory Limitation Problem!!!!!!!!!!!

"erican" <[email protected]> wrote in message news:<ijkq1gh4vqd8@legacy>...

It seems Matlab has a bad memory management. I got a new PC with 4G physical memory and hoped to free my worry about memory usage. However, I found that I can only use about 1G memory for several big matrices. If I continue to create small matrix, I can use about 1.5G memory. Although I tried to change the system performance so that the swap space was set to the maximum, it still doesn't work. I want to maximize the usage of my 4G memory, at least to 2G. Any suggestions?

Thanks!


MATLAB Users’ Data Set Sizes

25% of MATLAB users have data set sizes > 100 MB


MATLAB Users’ File Sizes

30% of MATLAB users access files > 100 MB


Strategies for Handling Large Data Sets


Strategies

Ensure available memory > required memory

Maximize available memory on your system
– System configuration

Minimize required memory in MATLAB
– During access, storage, processing, plotting


Two Tactics Not Focused on Today

Use 64-bit
– Removes one key memory constraint, allowing processes to address many terabytes of data

Distributed computing
– N machines ≈ N x memory
– Subset of all applications (data parallel)


Maximizing Available Memory on Your System


What’s the biggest array in MATLAB under Windows XP?

>> a=zeros(?,1);
>> whos

a) 600 MB
b) 1 GB
c) 1.5 GB


Memory Constraints Causing Out of Memory Errors

[Figure: memory layout diagram. Total system memory (RAM + page file) must hold all applications' memory requirement: other apps plus the MATLAB process virtual memory (limit OS dependent, e.g. 2 GB in Win2k). The MATLAB process holds the ML footprint and Win DLLs plus the MATLAB workspace. The workspace holds other ML variables, other fragments, and the contiguous free block needed for new data requirements such as zeros(1e9,1).]


Total System Memory

[Figure: memory layout diagram. Total system memory (RAM + page file) must hold all applications' memory requirement: other apps plus the MATLAB process virtual memory (limit OS dependent, e.g. 2 GB in Win2k). The MATLAB process holds the ML footprint and Win DLLs plus the MATLAB workspace. The workspace holds other ML variables, other fragments, and the contiguous free block needed for new data requirements such as rand(1e8,1).]


Total System Memory Available

Storage for all processes =
– Physical RAM (fast and expensive) +
– Page file on disk (cheap and slow)

Memory Management Guide: Tech note 1106

Amount of RAM affects performance; it is not a direct cause of “out of memory” errors


Viewing Total System Memory Available and Usage

Task Manager
– Ctrl-Alt-Del
– Right-click task bar
– Physical, commit charge

Process Explorer (Google “process explorer”)


Maximizing Total System Memory

Size
– Ensure non-zero or system-managed page file
– Max 4 GB

Performance
– Add RAM
– Max 4 GB


The MATLAB Process’s Virtual Memory

[Figure: memory layout diagram. Total system memory (RAM + page file) must hold all applications' memory requirement: other apps plus the MATLAB process virtual memory (limit OS dependent, e.g. 2 GB in Win2k). The MATLAB process holds the ML footprint and Win DLLs plus the MATLAB workspace. The workspace holds other ML variables, other fragments, and the contiguous free block needed for new data requirements such as rand(1e8,1).]


It is Limited and OS-Dependent

32-bit platforms
– Windows 2000 and XP (by default): 2 GB
– Linux/UNIX/Mac, system configurable: 3-4 GB
– Windows XP with /3gb boot.ini switch: 3 GB

64-bit platforms
– Linux: 8 TB (not all 64 bits used)


Maximizing The MATLAB Process’s Virtual Memory

Choose OS with largest process memory (in order):
– 64-bit Linux (future Win64)
– 32-bit UNIX/Linux/Mac
– Windows XP with /3gb
– Windows 2000, Windows XP (by default)


Checking The Virtual Memory Limit

>> system_dependent memstats   (UNDOCUMENTED)


Increasing Process Limit on XP to 3G: 3GB Switch

Right-click My Computer > Properties > Advanced > Startup and Recovery > Edit

Make copy of [Operating system line], change comment and add /3gb

Reboot, select new OS option, check memstats


The MATLAB Process’s Virtual Memory Limit with 3GB Switch

>> system_dependent memstats   (UNDOCUMENTED)


Workspace Size and Largest Block

[Figure: memory layout diagram. Total system memory (RAM + page file) must hold all applications' memory requirement: other apps plus the MATLAB process virtual memory (limit OS dependent, e.g. 2 GB in Win2k). The MATLAB process holds the ML footprint and Win DLLs plus the MATLAB workspace. The workspace holds other ML variables, other fragments, and the contiguous free block needed for new data requirements such as rand(1e8,1).]


Workspace Size and Largest Block

Workspace size = 2 GB or 3 GB minus:
– System DLLs
– Java
– MATLAB.exe and DLLs

Largest block (for numerical arrays), limited by:
– Fragmentation (mainly on Windows)
– Third-party DLLs (e.g., Google Desktop, Fineprint)
– Windows security updates


Finding Size of Workspace and Largest Block

>> system_dependent memstats   (UNDOCUMENTED)

Workspace size
– 1.7 or 2.7 GB

Largest block
– Goal 1.5 GB
– Diagnose if less


Diagnosing Memory (Workspace) Fragmentation

>> system_dependent dumpmem   (UNDOCUMENTED)

Shows MATLAB memory map and DLLs
Shows where the largest block starts
Reveals what is causing fragmentation


Example: Print Driver and New System DLLs

Third-party DLLs (e.g., Fineprint)
Windows security updates and service pack DLLs
– Use movedlls.exe
– www.mathworks.com/support (Solution Number: 1-1HE4G5)


Maximizing Largest Contiguous Block

Uninstall third-party tools if they are the issue
Use XP SP2 and the movedlls fix utility
Other techniques
– Pack (save and load): useful when using lots of variables after working for a while
– Restart MATLAB


Final Memory Available on XP

32-bit Windows XP default
– 1.7 GB total, 1.5 GB contiguous

32-bit Windows XP with /3gb switch
– 2.7 GB total, 1.5 GB contiguous

Rough guide to what is possible
– Can process arrays of 100s of MB with simple operations
– Can process arrays of 10s of MB with complex operations, many operations, or many lines of code


What’s the biggest real array in MATLAB under 64-bit Linux?

>> a=zeros(?,1);
>> whos

a) 2 GB
b) 16 GB
c) 8 TB


64-bit Linux MATLAB

Workspace size limit: TBs
Single-array size limit: about 2e9 elements (2^31-2), an mxArray limitation
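
The per-array element limit can be queried on the running installation with the second output of `computer`. This is a sketch only: the 2^31-2 figure above describes MATLAB as of 2005, and later releases may report a much larger value. The variable names are illustrative.

```matlab
% Query the platform string and the largest number of elements
% that this installation allows in a single array.
[platform, maxElements] = computer;
fprintf('Platform: %s\n', platform);
fprintf('Max elements per array: %.0f (about 2^%.0f)\n', maxElements, log2(maxElements));

% Upper bound on the size of one double array, in binary gigabytes.
maxDoubleGB = maxElements * 8 / 2^30;
fprintf('Largest addressable double array: about %.3g GB\n', maxDoubleGB);
```

The reported limit is an addressing bound, not a promise that such an array fits in available memory.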


Summary: Maximizing Available Memory

Maximize total system memory and performance
– Use a non-zero page file and have lots of RAM

Minimize the system memory requirements
– Close other applications

Maximize the MATLAB process’s virtual memory
– Use the best OS, or configure it

Maximize largest contiguous block
– Diagnose fragmentation


Minimizing Required Memory in MATLAB


Minimizing Memory Requirements

Data access
Data storage
Processing
Plotting


Application Problem Description

Semiconductor wafer thickness test data
– Six months of production, millions of wafers

Problem
– What percentage of the wafers manufactured last month meet thickness specifications?

Thickness data
– Large text file, waferdata.csv
– Nine positions plus other information
– Try import
– View contents of waferdata_start.csv


Data Access


Take Only What You Need

Take only what you need for the calculation
– Not usually a problem with databases
– Common problem with big flat files

Consider block processing
– Independent blocks
– State saved (e.g., filtering)

Tip: Clear variable first
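
Block processing with saved state can be sketched with `filter`, which accepts initial conditions and returns final conditions. The moving-average coefficients, the block length and the in-memory stand-in signal are assumptions made for illustration. In practice each block would be read from a file.

```matlab
% Filter a long signal in chunks, carrying the filter state
% from one block to the next.
x = rand(1e5, 1);            % stand-in for data too large to process at once
b = ones(1, 5) / 5;          % 5-point moving average (illustrative choice)
a = 1;
blockSize = 1e4;             % hypothetical block length
y = zeros(size(x));          % preallocate the output
state = [];                  % empty state means zero initial conditions
for first = 1:blockSize:numel(x)
    last = min(first + blockSize - 1, numel(x));
    [y(first:last), state] = filter(b, a, x(first:last), state);
end

% Reference result computed in one pass, for comparison only.
yFull = filter(b, a, x);
maxDiff = max(abs(y - yFull));
```

Because the final state of each block seeds the next one, the blockwise output matches the single-pass output to within rounding. The temporaries never exceed one block in size.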


For Text Files Use textscan

Take the rows and columns you need

For example: data = textscan(fid, format, N, 'delimiter', ',')
– Select columns with the format string
– Select rows with N

Returns one cell for each field kept by the format string. You need to convert to doubles.

Only read in nine columns and one month (1e6 rows)

Exercise 5
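
A minimal sketch of column and row selection with `textscan` follows. The file contents, the five-column layout and the variable names are made up for illustration. The real waferdata.csv has nine thickness columns plus other fields.

```matlab
% Write a small stand-in CSV: an ID, three thickness columns, a lot number.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
for k = 1:10
    fprintf(fid, '%d,%d,%d,%d,%d\n', k, 100 + k, 110 + k, 120 + k, 7000 + k);
end
fclose(fid);

% %*d skips a field and %f keeps it as double,
% so only the three middle columns are read.
fmt = '%*d %f %f %f %*d';
N = 4;                                   % read only the first four rows
fid = fopen(fname, 'r');
cols = textscan(fid, fmt, N, 'Delimiter', ',');
fclose(fid);
delete(fname);

data = [cols{:}];                        % concatenate the cells into a 4-by-3 matrix
```

Skipped fields are never stored, so memory use scales with the columns and rows actually kept rather than with the file size.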


Binary Files: Try memmapfile

Map a section of a binary file into memory

Benefits
– Faster than fread and fwrite
– Access files with MATLAB indexing operations

Other
– Random access to sections
– Can have multiple views
– Take from MATLAB memory

Example
– Simple homogeneous file
– Mixed data types and access as arrays


Memory Mapping Example

%% Default, map whole file as uint8
m=memmapfile('waferdata_uint8.bin')
m.Data(1:20);

%% Specify format and name
m=memmapfile('waferdata_uint8.bin','Format',{'uint8' [20 100] 'x'},'Repeat',20*1000)
A=m.Data;

%% Change format on the fly
m.Format={'uint8' [1 4] 'headerbits';...
          'uint8' [4 9] 'middle';...
          'uint8' [7 1] 'tail'};
A=m.Data;
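
The slide code relies on waferdata_uint8.bin, which is not included here. The self-contained sketch below writes its own small stand-in file. The 20-byte record layout and the field name `rec` are assumptions for illustration only.

```matlab
% Create a 200-byte stand-in binary file.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(1:200), 'uint8');
fclose(fid);

% View 1: the whole file as a flat uint8 vector, indexed like an array.
m = memmapfile(fname, 'Format', 'uint8');
first20 = m.Data(1:20);

% View 2: the same file as repeated 20-byte records with a named field.
m2 = memmapfile(fname, 'Format', {'uint8', [20 1], 'rec'});
third = m2.Data(3).rec;      % bytes 41 to 60 of the file

% Release both maps before deleting the file.
clear m m2
delete(fname);
```

Only the indexed portions are copied into the workspace, which is what makes mapping attractive for files larger than the largest free block.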


Data Storage


Use Smallest Data Type

Depends on intended actions

Complicated math (e.g., linear algebra)
– Doubles or singles, 8 or 4 bytes
– For example: a=single(7)

Simple arithmetic and original data is integers
– Integers, 1-4 bytes, for example: a=int8(7)
– Can be faster than doubles
– Try with waferdata (Exercise 6)

Sparse
– Just non-zero values and index stored
>> a = sparse(2e9,1,pi);

Exercise 6a cell execute
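
The storage cost of each type can be compared directly with `whos`. This is a sketch with illustrative variable names and a smaller sparse vector than the one on the slide.

```matlab
n = 1e6;
asDouble = zeros(n, 1);              % default class is double, 8 bytes per element
asSingle = zeros(n, 1, 'single');    % 4 bytes per element
asInt8   = zeros(n, 1, 'int8');      % 1 byte per element
asSparse = sparse(n, 1, pi);         % n-by-1 sparse with one non-zero at (n,1)

infoDouble = whos('asDouble');
infoSingle = whos('asSingle');
infoInt8   = whos('asInt8');
infoSparse = whos('asSparse');
fprintf('double: %d bytes\nsingle: %d bytes\nint8: %d bytes\nsparse: %d bytes\n', ...
    infoDouble.bytes, infoSingle.bytes, infoInt8.bytes, infoSparse.bytes);

% Integer types saturate at their limits instead of wrapping around.
saturated = int8(127) + 1;           % stays at 127
```

The saturation behaviour is worth checking against the data range before switching to a narrow integer type, since out-of-range results are clipped silently.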


Use Smallest Data Type (cont.)

Categories, dates
– Cell arrays of strings, 60-byte header for each element
– Use sparingly

Contiguousness
– Numeric arrays must be contiguous
– Cell arrays and structures need not be

Comparison with C
– For numerical processing, similar choice of data types


Load and Store as uint8

Read in uint8
– Note: when preallocating, you need to specify the data type

Exercise 6
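
One way to keep data as uint8 from file to workspace is the `'source=>destination'` precision form of `fread`, together with a typed call to `zeros`. The file below is a made-up stand-in, not the workshop data.

```matlab
% Write a small stand-in binary file of uint8 values.
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(0:99), 'uint8');
fclose(fid);

% 'uint8=>uint8' reads uint8 from the file and keeps uint8 in memory.
% Plain 'uint8' would convert the values to double (8x the storage).
fid = fopen(fname, 'r');
raw = fread(fid, [10 10], 'uint8=>uint8');
fclose(fid);
delete(fname);

% Preallocate with an explicit class; zeros(10, 10) alone would be double.
result = zeros(10, 10, 'uint8');
result(:) = raw(:) + 5;      % uint8 arithmetic, so the result stays uint8
```

If the output array were preallocated as double, every later assignment would convert to double and the saving from reading uint8 would be lost.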


Processing


Memory for Processing

Calculate only the results you need

MATLAB operators and functions need extra memory storage
– Passing data to functions by value and assignments
– Copy on write

Makes MATLAB safer, easier to debug
– More memory than in-place operations using pointers


Monitoring MATLAB Memory Usage

Process Explorer or Task Manager

MATLAB Monitoring Tool: www.MATLABcentral.com

MATLAB Central > File Exchange > Utilities > Development Environment > MATLAB Monitoring Tool

Example:
– x=rand(10e6,1);
– y=x; y(2)=1.5;

(UNDOCUMENTED)
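
The example can be run while watching the MATLAB process in Task Manager or Process Explorer. The comments describe the copy-on-write behaviour outlined on the earlier slide. The code itself only confirms the value semantics, since measuring the memory jump needs an external tool.

```matlab
x = rand(10e6, 1);           % about 80 MB of doubles
y = x;                       % assignment alone: y can share x's data, no copy yet
y(2) = 1.5;                  % first write to y: this is where the copy is made

% x is untouched by the write to y.
xUnchanged = (x(2) ~= 1.5);
```

Expect process memory to rise by roughly the size of `x` at the third line rather than at the second.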


Monitoring MATLAB Memory Usage (cont.)

Total memory usage: workspaces, graphics

>> system_dependent('CheckMallocMemoryUsage')
ans = 6820440

Shows results in bytes (not MBytes)
Starts with a few MB
To use it, set an environment variable
– MATLAB_MEM_MGR debug
– In DOS window: C:\ setx MATLAB_MEM_MGR debug

(UNDOCUMENTED)


How much temporary memory does data=data+1 require in an M-file, where data is a 1 MB array?

Run M-file containing:
data=zeros(1e6,1,'int8');
data=data+1;

a) 0 MB
b) 1 MB
c) 2 MB


JIT vs. Interpreter for Large Data Set Handling

Run loaddata

Offset in the thickness data needs to be corrected with data = data+5
– At command line
– In M-file

Run loaddata, then exercise 7a

(UNDOCUMENTED)


Minimizing Copies and Temporaries

Share data rather than pass to functions
– Nested functions
– Global

Reduce size of array to scalars or smaller blocks
– Memory copies are equal to size of array
– Process with for loops, de-vectorize

Can be slower, so must trade off speed for less memory


What is the fastest way to process MATLAB matrices with for loops?

a) Down the columns
b) Along the rows
c) Doesn't matter

Exercise 7

(UNDOCUMENTED)
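
The question can be tried directly. MATLAB stores matrices in column-major order, so the two loop orders below touch memory differently. The matrix size is arbitrary, and the timings depend on the machine and release, so none are quoted here.

```matlab
A = rand(1000, 1000);
[nRows, nCols] = size(A);

% Inner loop runs down a column: consecutive elements are adjacent in memory.
tic;
totalByColumn = 0;
for c = 1:nCols
    for r = 1:nRows
        totalByColumn = totalByColumn + A(r, c);
    end
end
timeByColumn = toc;

% Inner loop runs along a row: consecutive elements are nRows apart in memory.
tic;
totalByRow = 0;
for r = 1:nRows
    for c = 1:nCols
        totalByRow = totalByRow + A(r, c);
    end
end
timeByRow = toc;

fprintf('down columns: %.3f s, along rows: %.3f s\n', timeByColumn, timeByRow);
```

Both loops compute the same sum to within rounding. Only the access pattern differs, and the gap tends to widen once the matrix no longer fits in cache.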


Example: De-Vectorizing

1D 2D

Exercise 7


Find Percentage of Wafers Meeting Specification

What percentage of the wafers meet specification?
– Must be < maximum thickness of 200
– Must be > minimum thickness of 70

Exercise 8a, 8c
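
A blockwise version of this calculation might look like the sketch below. The synthetic data, the block length and the rule that all nine positions must be in range are assumptions. The workshop exercises may define a passing wafer differently.

```matlab
% Synthetic stand-in for the wafer data: one row per wafer, nine positions.
nWafers = 1e5;
thickness = uint8(randi([60 210], nWafers, 9));

minSpec = 70;                % thickness must be greater than this
maxSpec = 200;               % thickness must be less than this
blockSize = 1e4;             % hypothetical block length
nPass = 0;
for first = 1:blockSize:nWafers
    last = min(first + blockSize - 1, nWafers);
    block = thickness(first:last, :);
    % A wafer passes only if all nine positions are inside the limits.
    ok = all(block > minSpec & block < maxSpec, 2);
    nPass = nPass + sum(ok);
end
percentPass = 100 * nPass / nWafers;
```

The logical temporaries created by the comparisons are the size of one block rather than the whole data set, at the cost of a short loop.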


Plotting


How much extra memory do you need to plot a 10 MB double array?

>> x=rand(125e4,1);
>> plot(x);

a) 10 MB
b) 20 MB
c) 40 MB

Exercise 9a, Process Explorer on MATLAB task


Memory Copies When Plotting

Need extra memory to plot
– For example: >> plot(data(:,1));

plot(x,y)
– Copy of x, y to xdata and ydata properties of line

plot(y)
– Copy of y in ydata and indices in xdata, 1:size(y)

Results
– Memory requirements triple at least (more temporarily)
– Integers more (stored as doubles)


Minimize Copies: Plot Only What You Need

Limited resolution of screen/human eye
Sub-select, down-sample
– Plot every Nth element
– Plot min/max in each block
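
Both down-sampling options can be sketched in a few lines. The decimation factor `N` and the variable names are illustrative. The array is the same size as the 10 MB example used in the plotting question.

```matlab
x = rand(125e4, 1);          % 10 MB of doubles
N = 1000;                    % hypothetical decimation factor and block length

% Option 1: keep every Nth element.
idx = (1:N:numel(x))';
xEveryNth = x(idx);

% Option 2: keep the min and max of each block of N elements,
% which preserves spikes that plain decimation could miss.
nBlocks = floor(numel(x) / N);
blocks = reshape(x(1:nBlocks * N), N, nBlocks);   % one block per column
blockMin = min(blocks, [], 1)';
blockMax = max(blocks, [], 1)';
blockStart = (0:nBlocks - 1)' * N + 1;            % x position of each block
```

Either `plot(idx, xEveryNth)` or `plot(blockStart, [blockMin blockMax])` then hands the line object 1250 points instead of 1.25 million. For arrays near the memory limit, the `reshape` step could be replaced by a loop over blocks to avoid the extra temporary.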


Minimize Memory Requirements: Review

Accessing data
– Take only what you need
– Try processing in blocks

Data storage
– Store in smallest data type

Processing
– Reduce temporaries with loops, blocking, nested functions, globals

Plotting
– Plot what you need


Summary: Strategies For Handling Large Data Sets

Ensure available memory > required memory
64-bit and distributed computing
32-bit single CPU
– Maximize available memory
– Minimize required memory


Appendix


Appendix Contents

Resources
1 MB is Not Equal to Million Bytes
Setting the Paging File Size
File Size vs. Data Size
Minimizing Other Applications’ Memory


Resources

Technical note 1106
– www.mathworks.com/support, enter 1106

“Large Data Set Handling in MATLAB 7”, Digest article, November 2004: www.mathworks.com/company/newsletters/digest/nov04/newfeatures.html


1 MB is Not Equal to Million Bytes

1 KB = 2^10 bytes = 1024 bytes
– Not 1e3 bytes

1 MB = 2^20 bytes = 1048576 bytes
– Not 1e6 bytes

1 GB = 2^30 bytes = 1073741824 bytes
– Not 1e9 bytes


Setting the Paging File Size

My Computer > Properties > Advanced > Performance > Advanced > Virtual memory, Change then press “Set”

Set to System Managed Size or Custom

Slow down with page file, swapping


File Size vs. Data Size

Binary files: file size = data size
– Assuming you use the same data type as in the binary file

Text files: file size ~= data size, for example:
– Data element: double = 8 bytes
– Text file element, format 1: 3.4, 4 chars = 4 bytes
– Text file element, format 2: 3.47896e+001, 13 chars = 13 bytes
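
The difference can be seen by writing the same three doubles in binary and in two text formats. The values and format strings are made up for illustration, and the exact size of the exponent-notation file depends on how many exponent digits the platform prints.

```matlab
values = [3.4; 5.6; 7.8];    % three doubles, 24 bytes in memory

% Binary file: 8 bytes per double, same as in memory.
binName = [tempname '.bin'];
fid = fopen(binName, 'w'); fwrite(fid, values, 'double'); fclose(fid);

% Short text format: "3.4," is 4 characters per element.
shortName = [tempname '.csv'];
fid = fopen(shortName, 'w'); fprintf(fid, '%.1f,', values); fclose(fid);

% Long text format: exponent notation with five decimals.
longName = [tempname '.csv'];
fid = fopen(longName, 'w'); fprintf(fid, '%.5e,', values); fclose(fid);

binInfo = dir(binName); shortInfo = dir(shortName); longInfo = dir(longName);
fprintf('binary: %d bytes, short text: %d bytes, long text: %d bytes\n', ...
    binInfo.bytes, shortInfo.bytes, longInfo.bytes);
delete(binName); delete(shortName); delete(longName);
```

A text file can therefore be smaller or larger than the array it loads into, so file size alone is a poor predictor of required memory.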


Minimizing Other Applications’ Memory

Shut them down
Restart computer if you can’t get < 300 MB
Use Windows msconfig to avoid starting up applications
Not too important if you have a page file