Matlab and MySQL

Embed Size (px)

Citation preview

  • 8/11/2019 Matlab and MySQL

    1/17

    A Powerful Sidekick: Using MySQL for High-Volume Data Manipulation in Matlab by Dimitri Shvorob 1

    1. Introduction.

    A continuing poll on WRDS Forum asks visitors to identify the statistical software they most commonly use. SASand Matlab take top spots in the league table, but SASs edge is overwhelming: 73% vs. 9%. Broad selection ofready-to-use statistical routines, comprising SAS/STAT, surely plays a big part in explaining the softwares appeal.However, one thinks that for many researchers, choice of SAS stems from its superior facility with tasks ancillary toanalysis, namely data retrieval and manipulation (subsetting, sorting, reshaping, etc.). Accomplishing these tasks inMatlab is much less convenient.

    o Matlab cannot handle large datasets, routinely processed in SASo Matlab has no remote-access capabilities similar to those of SAS/CONNECTo Matlab has no adequate analog to SASs SQL procedure

    Unlike SAS, which leverages available memory resources with continual disk read/writes, Matlab relies on memoryexclusively and cannot create, load or save any volume of data exceeding its limits. Although this might not be a

    problem in most areas of Matlabs application, for WRDS users who routinely manage datasets with hundreds ofthousands, or millions, of records, out-of-memory errors are an all-too-familiar occurrence, and switching to SASthe all-too-natural recourse 2.

    Regarding the second claim, it suffices to point out that a SAS user can define a directory on WRDS server as aremote library 3, and access SAS datasets located in the directory in the same way as he/she would access a dataseton ones own PC. With Matlab, accessing a MAT file on a different computer is an arduous, if not impossible, task.

    Finally, WRDS users frequently perform two types of tasks: (1) match-merging records located in the same ordifferent datasets, and (2) computing summary statistics for groups of observations. Both tasks are easilyaccomplished in SAS using PROC SQL 4, but require non-trivial programming effort in Matlab, and often producelooped and relatively slow code.

    This brief report encourages Matlab users to explore a technique that goes a long way towards resolving the above-mentioned problems 5. At the core of the proposed approach is use of MySQL Server database management systemas (1) a high-capacity data repository accessible to Matlab, and SAS, and (2) a full-fledged SQL processor that can

    be controlled from Matlab. In essence, we recommend the following course of action when working with WRDSdata.

    o Retrieve data from WRDS using SASo Transfer data to MySQL, converting the SAS dataset into a MySQL tableo Manipulate data within MySQL, submitting SQL commands from Matlabo Retrieve selected data to Matlab workspaceo If needed, save data in Matlab workspace to a MySQL database

    While SAS continues to be needed, its role is limited to getting data from WRDS and passing them to MySQL - allin a single step - after which one can work solely with Matlab. Matlabs memory constraint is not eliminated, but

    1This is the first draft of this report, completed in August 2006, during an internship at WRDS. (The idea of usingMySQL in tandem with Matlab was suggested to the author by Michael Boldin). I would like to claim responsibilityfor any errors, and welcome your comments at [email protected] .2However, consult this insightful Mathworks presentation for ways to expand memory resources available to Matlab.3See section PC SAS/Connect - Remote Library Services of this guide for more details.4Summary statistics can also be calculated with PROC MEANS, of course.5The utility functions discussed in section 3 were designed and tested with MySQL 5.0, mym 1.0.8, and Matlab 7.Matlab 6 users may be able to access MySQL through the basic, limited interface of mym. m .

    1

  • 8/11/2019 Matlab and MySQL

    2/17

  • 8/11/2019 Matlab and MySQL

    3/17

    2.1. Installing MySQL.

    Download and run the installer module of MySQL Server 5.0 (Windows Essentials package), selecting TypicalInstall in Setup type screen, skipping sign-up in MySQL.com Sign Up screen,

    and marking checkbox Configure the MySQL Server now in Wizard completed screen. Accept default choices inConfiguration type and Windows options screens, and select a password, protecting

    access to MySQL databases, in Security options screen. (Write the password down, as it will be needed each timeyou access MySQL, whether from Matlab or SAS). Complete installation of MySQL by pressing Execute buttonin Execute configuration screen.

    3

  • 8/11/2019 Matlab and MySQL

    4/17

    2.2. Connecting MySQL and SAS.

    ODBC , or open database connectivity, is a Windows/Unix technology allowing data exchange between a widerange of data management systems, including MySQL and SAS. Since neither software package comes with ODBCcapability pre-set, one needs to install ODBC plug-ins for MySQL and SAS, and configure ODBC data sourcesassociated with each application, so that the two can be recognized and linked by Windows.

    Download and run the installer module of MySQL ODBC driver, selecting Typical Install in Setup Type screen.

    Download and run the installer module of SAS ODBC driver, accepting default settings, and mark Administer datasources now checkbox in Finish screen.

    Press Finish to have the installer open ODBC Data Source Administrator system window 7. Tab User DSN isactive, and displays registered ODBC data sources. To add a SAS data source, press Add, select SAS from the listof available data sources, and click Finish.

    Back in the window of SAS ODBC driver installer, with General tab active, switch to Servers tab, enter anarbitrary name in field Name of Server settings panel, and press Configure.

    7The window can be accessed through Windows Control panel, by navigating to Administrative tools section andclicking on Data Sources (ODBC) icon.

    4

  • 8/11/2019 Matlab and MySQL

    5/17

  • 8/11/2019 Matlab and MySQL

    6/17

    Repeating the initial steps of registering a SAS data source, press Add button in User DSN tab, select MySQLODBC 3.51 Driver from the list, and press Finish. When MySQL Connector/ODBC configuration screen appears,with Login tab active,

    o Enter root in field User, and the previously chosen password in field Passwordo Select test from the drop-down list in field Databaseo Assign an arbitrary name to the MySQL data source, by entering it in field Data Source Name

    (Since later you may want to set up and make accessible to SAS additional databases - consider having a databasewith Compustat data, another with CRSP data, etc. - it is expedient to include the name of the target database intothe data source name, to avoid confusion in the future. If you plan to access MySQL databases located on a differentcomputer, e.g. a department or university server, you might also want to distinguish them from those residing onyour personal computer, for instance by adding a local or remote keyword to a data source name).

    2.3. Connecting MySQL and Matlab.

    Matlab functions enabling read/write access to MySQL databases constitute the last, front-end component of the proposed scheme. These include mym. m , Yannick Marets extension of Robert Almgrens mysql . m , and a set ofutilities based on mym. m , written by the author.

    Download and run mym. m installer, renaming file mym. mexw32 to mym. dl l for a release of Matlab 7 older than 7.1.

    Download and open the archive containing mym. m utilities, listed in Table 1 .

    Add locations of downloaded m-files to Matlabs path, as shown below.

    6

  • 8/11/2019 Matlab and MySQL

    7/17

    Table 1. Accessing MySQL from Matlab: available functions.

    Function Purpose Example

    mycheck Check MySQL connection mycheck

    myopen Connect to MySQL myopen( l ocal host , root , appl e )myopen( wr ds. whart on. upenn. edu , j smi t h , pear )

    mycl ose Disconnect from MySQL mycl ose

    dbl i s t List available databases al l _ dbs = dbl i s t

    dbcurr Show current database cur r _db = dbcur r

    dbadd Create a database dbadd( crsp )dbadd( pr oj ect1 )

    dbopen Open a database dbopen( pr oj ect1 )

    dbdr op Delete a database 8 dbdr op( j unkdb )

    t bl i st List database tables t bl i s t ( pr oj ect 1 )

    t badd Create a table t badd( mytest , { name , dob , age }, { var char( 30) , dat e , doubl e })

    t bdr op Delete a table t bdrop( j unktb )

    t br ename Rename a table t br ename( mytest , t est )

    t b a t t r List column names and types [names , t ypes ] = tbat t r ( t e s t )names = t bat t r ( crsp. dsf )

    t bs i ze Show tables size [ r ows , c ol s ] = t bs i z e( t e st )col s = t bs i z e( t es t , 2)

    t bread Read from a tablegl obal name dob agevecs = { name , dob , age };col s = vecs;tb read( t e s t , vecs, col s )

    t bwr i t e Write to a tablegl obal name dob agename = { J ohn }; dob = { 1- J an- 2000 }; age = NaN;vecs = { name , dob , age };tbwr i t e ( t e s t , vecs)

    mym Submit an SQL command

    mym( cr eat e t abl e t est ( name varchar ( 30), dob dat e, agedoubl e) )mym( i nser t i nt o t es t val ues ( J ohn , 1- J an- 2000 ,NULL) )[ name, dob, age] = mym( sel ect * f r omt est )

    8 Never delete system databases mys ql and i nf or mat i on_sc hema , or any of their tables.

    7

  • 8/11/2019 Matlab and MySQL

    8/17

    3. Test drive.

    Having completed the steps above, you can test the Matlab/MySQL connection by opening Matlab and entering

    myopen( l ocal host , r oot , mypwd )

    with mypwd replaced by your MySQL password. mym. m will try to connect to MySQL, and display the followingmessage if it succeeds.

    mYmv1. 0. 8, Copyri ght ( C) 2006, Swi ss Federal I nst i t ut e of t echnol ogy, Lausanne, CHmYm comes wi t h ABSOLUTELY NO WARRANTY. Thi s i s f r ee sof t war e,and you are wel come to r edi st r i but e i t under cert ai n condi t i ons.For det ai l s read t he GPL l i cense i ncl uded wi t h t hi s di st r i but i on.

    To check that MySQL is accessible to SAS - recall that by setting up a single MySQL data source, namedmysql _t est , we have granted SAS access only to database t est - open SAS and submit

    l i bname dbtest ODBC dsn = mysql _t est user = r oot passwor d = mypwd;

    again replacing mypwd with the actual password. SAS will attempt an ODBC connection to t est , and report theoutcome in session log.

    NOTE: Li bref DBTEST was successf ul l y assi gned as f ol l ows:Engi ne: ODBCPhysi cal Name: mysql _t est

    At this point, you can switch to SAS Explorer window, and find dbt est in the list of the sessions libraries.

    The library is empty, as database test contains no tables. In the remainder of this section, we will fill the librarywith a dataset retrieved from WRDS and access it from Matlab, in the context of a simple exercise: counting howmany firms from each SIC industry are found in the Compustat Industrial Annual file in each of the most recent fiveyears 9.

    We establish a remote connection to WRDS server

    %l et r ol and = wr ds. whart on. upenn. edu 4016;opt i ons pagesi ze = max comami d = TCP r emot e = wr ds;si gnon username = _pr ompt _;l i bname comp r emot e / wr ds/ compust at / sasdat a ser ver = wr ds;

    and select relevant data directly into a table in t e s t .

    9Compustat cognoscenti will take issue with variable yeara , fiscal year, being confused with the calendar year. Weuse it as a shortcut.

    8

  • 8/11/2019 Matlab and MySQL

    9/17

    dat a mysql . exampl e;

    set comp. compann ( wher e = ( year a > 2000) ) ;i f dat a6 > 0; / * posi t i ve t ot al asset s requi red */keep yeara dnum gvkey;r un;

    As SAS log indicates, variable labels and formats - features that are specific to SAS - are lost in transition toMySQL 10.

    NOTE: SAS var i abl e l abel s, f ormats , and l engths ar e not wr i t t en t o DBMS t abl es.NOTE: Ther e wer e 43404 obser vat i ons r ead f r om t he dat a set COMP. COMPANN.NOTE: The data set DBTEST. EXAMPLE has 43404 obser vat i ons and 3 var i abl es.

    (Another way in which ODBC libraries differ from native SAS libraries is that datasets (i.e. tables) located in themhave overwrite protection, and need to be deleted before a datasets new version is created 11 . Third and mostimportant difference is in the speed with which SAS reads from, and especially writes to, MySQL tables. As TableA1 of Appendix I demonstrates, ODBC-channeled read/writes are significantly slower than SASs operations withnative datasets).

    Switching to Matlab, we open database t e s t

    dbopen( t est )

    and verify that table exampl e is visible to Matlab,

    t bl i st

    ans = exampl e

    and has the expected structure and size.

    [ names, t ypes] = t batt r ( exampl e )

    names = DNUM GVKEY year a

    t ypes = doubl e doubl e doubl e

    t bsi ze( exampl e )

    ans = 43404 3

    We load contents of exampl e into Matlab workspace, retaining variable name gvkey, but replacing dnum and yeara with the more intuitive i ndustr y and year .

    [ i ndust r y, gvkey, year ] = mym( sel ect * f r om exampl e )

    In this case, data fit into available memory, but had we worked with a (much) larger dataset and encountered an out-of-memory error, we might try to retrieve a subset of exampl e , with a statement like

    [ i ndust r y, year] = mym( sel ect i ndust r y, yeara f r omexampl e )or

    [ i ndust r y, gvkey, year] = mym( sel ect * f r omexampl e where yeara = 2005 )

    10See Appendix II, however. 11A database table accessible to SAS can be deleted with PROC DATASETS or via SAS Explorers graphicalinterface. In Matlab, the task can be accomplished with t bdr op .

    9

  • 8/11/2019 Matlab and MySQL

    10/17

  • 8/11/2019 Matlab and MySQL

    11/17

    Including variable gvkey in the working dataset, we had in mind the need to check for duplicate records, i.e. to makesure that one record corresponds to any given firm-year combination. The check can be coded like

    I = uni que( i ndus t r y ) ; ni = l engt h( I ) ; Y = uni que( year ) ; ny = l engt h( Y) ;

    f or i = 1: nif or j = 1: ny

    x = gvkey(i ndustr y == I ( i ) & year == Y( j ) ) ;y = uni que( x) ;i f l engt h( x) ~= l engt h( y)

    di sp( Dupl i cat e! )end

    endend

    but can be done in a simpler way with MySQL:

    x = mym( sel ect gvkey f r om exampl e gr oup by gvkey, yeara, dnumhavi ng count ( *) > 1 )

    x = Empt y mat r i x: 0- by- 1

    (Table exampl e is known not to contain any missing values of gvkey , yeara , or dnum , but if this were not the case -for example, if some values of yeara were missing - we would use i s not nul l condition in wher e or havi ng clause,

    x = mym( [ sel ect gvkey f r omexampl e where year a i s not nul l . . . gr oup by gvkey, year a, dnum havi ng count ( *) > 1 ) ] )

    to exclude unwanted cases).

    Likewise, to produce a table of firm counts, with element ( i , j ) giving the number of firms of industry i on recordin year j , we can use a pure Matlab approach

    N = zeros(n i , ny) ;f or i = 1: ni

    f or j = 1: ny

    N( i , j ) = sum( i ndus t r y == I ( i ) & year == Y( i ) ) ;endend

    or a mixed one:

    [ yr, i n, n] = mym( sel ect year a, dnum, count ( *) f r om exampl e gr oup by yeara, dnum ) ;N = zeros(n i , ny) ;f or i = 1: ni

    f or j = 1: nyx = n( i n == I ( i ) & yr == Y( j ) ) ;i f ~i sempt y(x)

    N( i , j ) = x;end

    endend

    On inspection, most of the code in the snippet above deals not with counting, but with reshaping the table of countsthat was produced by the query in the first line. This suggests yet another (unorthodox) approach: have the querysave its output to a table, and use SASs PROC TRANSPOSE to reshape it 14.

    Submitting the following line to Matlab,

    mym( cr eate t abl e t emp sel ect year a, dnum, count( *) f r om exampl e gr oup by year a, dnum )

    14N could be reshaped using l ong2wi de. m , available from Matlab File Exchange.

    11

  • 8/11/2019 Matlab and MySQL

    12/17

    we switch to SAS, verify that library dbt est now contains two datasets, exampl e and t emp - if Explorer windowfails to refresh the library view, click on a different library, then again on dbt est - and run

    proc t r ansposedat a = dbt est . t empout = dbt est. count s_sas ( dr op = dnum _name_ _l abel _) ;by dnum;i d year a;r un;

    then return to Matlab and retrieve contents of count s_s as , this time using function t br ead .

    gl obal NN = zeros(n i , ny) ;[ v ec s, col s ] = deal ( c el l ( 5, 1) ) ;vecs = st r c at ( N( : , , i nt 2st r ( ( 1: 5) ) , ) ) ;col s = cel l s t r ( s t r cat ( _ , i nt 2st r ( Y ) ) ) ;t bread( count s_sas , vecs , col s)

    Cell arrays vecs and col s contain the names of columns that are to be read from counts _sas - these are columns _2001 , .. , _2005, created by PROC TRANSPOSE - and of the Matlab arrays that are to store incoming values. To populate a five-column matrix, we fill vecs with values N( : , 1) , .. , N(: , 5) .

    After the last code fragment is executed in Matlab, we need to replace NaNs in the counts matrix N with zeros,

    N( i snan( N) ) = 0;

    which adds to the impression of the SAS-based approach as being more cumbersome than the rest. Our intention in presenting it was to demonstrate how data residing in a MySQL database can be nearly concurrently manipulatedwith Matlab and SAS, a capability that many WRDS users are likely to appreciate.

    It remains to show how to transfer variables in Matlab workspace to a MySQL database. We conclude this exercise by saving N, the matrix of firm counts, as table count s_ mat l ab of current database test . The operation requires twosteps: creating an empty table of specified structure 15 with function t badd

    [ c ol s , t ypes ] = deal ( c el l ( 5, 1) ) ;

    t ypes( : ) = { doubl e };col s = cel l st r ( st r cat ( N , i nt 2st r ( Y ) ) ) ; % Use col umn names N2001, N2002, et c.t badd( count s_mat l ab , col s, t ypes)

    and transferring Ns contents into counts _mat l ab with t bwr i t e .

    t bwr i t e( count s_mat l ab , vecs, col s)

    We shut down Matlabs connection to MySQL with

    mycl ose

    15MySQL data types are discussed in Chapter 11 of MySQL 5.0 Reference Manual. In practice, one can limitattention to types doubl e , date , and char( n) and var char( n) . (Argument n in the definition of character typeschar and varchar denotes the maximum allowed string length; to avoid having to resize a character-type tablecolumn with , choose a value known to be sufficiently large). Interested readers may wish to explore the possibilitiesoffered by MySQLs BLOB type (supported by mym. m) which allows saving a numeric array to a single cell of aMySQL table. (Consider saving data for various firms, or various sets of regression estimates, in distinct, indexedcells of a single MySQL table). We do not discuss BLOBs in this report, and refer to the helpful example in mym. mdocumentation.

    12

  • 8/11/2019 Matlab and MySQL

    13/17

    4. Summary.

    Matlabs inability to handle data volumes in excess of computers memory resources, or access data stored in SASssas7bdat file format, such as those available from WRDS server, has severely limited the softwares application byWRDS users, leading many to choose SAS as their primary programming tool. In this report, we suggest anapproach that exploits SASs edge at data retrieval, but breaks its hold on high-volume data manipulation.Operations that to this point could only be done in SAS are now possible, and can be executed with reasonableefficiency, in Matlab. At the same time, the proposed approach offers a previously unavailable robust high-capacityfacility for data transfer between Matlab and SAS, and thus serves a broader goal: enabling researchers familiar with

    both Matlab and SAS to use both packages in a single session, leveraging strengths of one with those of the other.

    13

  • 8/11/2019 Matlab and MySQL

    14/17

    Appendix I. Evaluating MySQLs performance.

    Advocating MySQL as a replacement for SAS, we have to disclose instances where its performance was found to bedisappointing. Table A1 reviews a sequence of timing tests in which a group of data-manipulation tasks - retrieval,subsetting, sorting, merging, etc. - was performed in both MySQL and SAS. The tests involved a one-million-rowsubset of the CRSP monthly stock file, and employed a PC with a 1.8 GHz Pentium M processor and 512 MB ofRAM, with a standard configuration 16 of MySQL 5.0, SAS 9.1 and Matlab 7, running under Windows XP.

    Table A1. Selected timing tests.

    Task Manipulating a SAS datasetwith SAS

    Manipulating a MySQL tablewith SAS

    Manipulating a MySQLtable with Matlab 17

    Retrieve datafromWRDS

    dat a sas. t est ;set crsp. msf( obs = 1000000);run;

    3:30

    data mysql . t est;set crs p. msf( obs = 1000000);run;

    20: 26

    ( Wi t h mysql . t est creat ed)gl obal cusi p per mno permco var s = { cusi p , permno , };tb read( t es t , var s )

    20: 26 + 7:55

    Describedata

    proc cont ent sdata = sas . t e s t ;run;

    0:00

    proc cont ent sdat a = mysql . t est ;run;

    1:57

    t b a t t r ( t e s t )t bs i z e( t es t )

    0: 02 + 1: 10

    Transfer data betweenMySQL and aSAS disklibrary

    ( SAS t o MySQL)pr oc copy

    i n = s asout = mysql ;sel ec t t e st ;run;

    12: 47

    ( MySQL t o SAS)pr oc copy

    i n = mysqlout = sas ;sel ec t t e st ;run;

    3:10

    ( Mat l ab wor kspace t o MySQL)gl obal cusi p per mno permco var s = { cusi p , permno , };t b wr i t e ( t e st , var s )

    15: 00

    Transfer datato SASWORKlibrary

    pr oc copyi n = s asout = work;sel ec t t e st ;run;

    1:59

    pr oc copyi n = mysqlout = work;sel ec t t e st ;run;

    3:14

    n/a

    Subset data:select rows

    dat a sas. sub1;set sas. t est

    (where = (vol = 0) ) ;run;

    1:03dat a sas. sub2;

    set sas. t est(where = (vol > 0) ) ;

    run;0:53

    dat a mysql . sub1;set mysql . t est

    (where = (vol = 0) ) ;run;

    3:10dat a mysql . sub2;

    set mysql . t est(where = (vol > 0) ) ;

    run;12: 03

    mym( cr eat e tabl e sub1sel ec t * f r omt es twher e vol = 0 )

    1:18mym( cr eat e tabl e sub2

    sel ec t * f r omt es twher e vol > 0 )

    5:28

    Concatenatesubsets of data

    dat a sas. t est ;set sas. sub1 sas. sub2;run;

    0:45

    data mysql . t est;set mysql . sub1 mysql . sub2;run;

    10: 29

    mym( i nser t i nto sub1 sel ect *f romsub2 )t brename( sub1 , t est )

    5:21

    Subset data:selectcolumns

    data sas. crop;set sas. t est( keep = per mno dat e) ;run;

    1:25

    dat a mysql . cr op;set mysql . t est( keep = per mno dat e) ;run;

    5:46

    mym( cr eat e t abl e cr opsel ect per mno, dat ef r o m t e s t )

    0:42

    16Wishing to boost MySQLs speed, we briefly experimented with environment variables key_buf f er _si ze andt abl e_cache , increasing their values from 8,388,608 to 64,000,000, and from 256 to 512, respectively,

    mym( set gl obal key_buf f er_ si ze = 64000000 )mym( set gl obal t abl e_cache = 512 )

    but saw no improvement in the speed of the test join. Consult Section 7.5.2 of MySQL 5.0 Reference Manual forinformation on MySQL server parameters.17We do not report results of exercises where SQL commands were submitted directly to MySQL, through MySQLCommand Client window, as these were essentially identical to those obtained with Matlab and mym. m .

    14

  • 8/11/2019 Matlab and MySQL

    15/17

    Sort data proc sortdat a = sas. t estout = sas . sor t ;by permno dat e;run;

    3:19

    proc sortdata = mysql . t estout = mysql . sort ;by per mno dat e;run;

    12: 51

    mym( cr eat e t abl e sortsel ec t * f r omt es torder by permno, dat e )

    1:50

    Computesummary

    statistics

    proc sql ;creat e tabl e sas. s tatas sel ect dat e, mean(r et)f r o m s as . t es tgroup by dat e;qui t ;

    1:19

    proc sql ;creat e tabl e mysql . st atas sel ect dat e, mean(r et)f rommysql . t estgroup by dat e;qui t ;

    3:24

    mym( cr eat e t abl e st atsel ect dat e, avg( ret)f r o m t e s tgr oup by permno, dat e )

    0:15

    Performa join

    proc sql ;creat e t abl e sas. j oi nas sel ect a. per mno, a. dat e,

    a. prc, b. prc as l prcf romsas . t e s t al ef t j oi n s as . t es t b

    on a. per mno = b. per mnoand a. dat e > b. dat eand a. dat e b. dat eand a. dat e 0 %t hen %do;

    %l et l i b = %subst r ( &data, 1, &p- 1) ;%l et dst = %subst r ( &data, &p+1, &n- &p+1) ; %end;

    pr oc sql ;creat e t abl e &i nf oas sel ect name, l abel , f ormat f r omdi ct i onary. col umnswhere l i b = upcase( "&l i b")

    and memname = upcase( " &dst " )and memt ype = " DATA" ;

    qui t ;%mend;

    / * Appl y var i abl e l abel s and f ormat s, saved by get Label sAndFormat s t odat aset I NFO, t o dat aset DATA */

    %macro set Label sAndFormat s( dat a, i nf o) ;pr oc sql nopr i nt ;

    sel ect count ( *) i nt o : n f r om &i nf o;sel ect name i nt o : name1 - %sysf unc( compr ess( : name&n. ) ) f r om &i nf o;sel ect l abel i nt o : l abel 1 - %sysfunc(compr ess( : l abel &n. ) ) f r om&i nf o;sel ect f ormat i nt o : f ormat1 - %sysf unc( compr ess(: f ormat&n. ) ) f r om&i nf o;qui t ;

    dat a &dat a;set &dat a;%do i = 1 %t o &n;

    l abel &&name&i = " &&l abel &i ";f or mat &&name&i &&f or mat &i ;

    %end;run;%mend;

    Macro get Label sAndFormat s extracts labels and formats from a SAS dataset. By directing the macros output to aMySQL table, one makes labels immediately accessible to Matlab. Labels and formats can be stored in MySQL, andre-applied if the data are taken back to SAS, with macro set Label sAndFor mat s . Consider the following example,where SAS dataset t e s t is moved to a MySQL database, its labels saved in table t est _col umns and fetched toMatlab, and then taken back, with labels and formats restored.

    pr oc copyi n = s as ;out = mysql ;sel ec t t e st ;r un;

    %get Label sAndFormats ( sas. t est , mysql . t est _col umns);

    [ name, l abel ] = mym( sel ect name, l abel f r omt est_col umns ) (in Matlab)

    pr oc copyi n = s as ;out = mysql ;sel ec t t e st ;r un;

    %set Label sAndFormats ( sas. t est , mysql . t est _col umns);

    16

  • 8/11/2019 Matlab and MySQL

    17/17

    Appendix III . Improving tbwrite speed.

    Function t bwr i t e invokes SQL command i nser t val ues to add data to a MySQL table, with buf f er rows of eachvecs vector passed to the database in a single call. Choice of buf f er (set to 1,000 by default) has a major impact onwrite speed, but the arguments optimal setting, which depends on the number and types of input vectors, is difficultto guess. In some cases, it may be worthwhile to try to identify the best choice of buf f er, by selecting a subset of

    data, e.g. 1/100th

    or 1/20th

    of rows of each vecs vector, and repeatedly writing it to MySQL, varying the value ofbuf f er . Recording the time of each run, one selects the best-performing buf f er value and uses it for the full write.Matlab code below illustrates the idea.

    gl obal xx = rand( 1, 1e6) ; % Or i gi nal dat ay = x( 1: 1e4) ; % Wr i t e- t es t dat aB = [ 10 100 1000 10000] ; % BUFFER val ues t o t r y

    T = nan*B; % St opwat ch t i mesf or i = 1: l engt h( B)

    t badd( wr i t e_ tes t , { y }, { doubl e }, r epl ace )t i ct bwr i t e ( wr i t e _t e st , { y }, {}, B( i ) )

    T( i ) = t oc;endbmax = B( T == mi n( T) ) ; % Best BUFFERt badd( wr i t e_ real , { x }, { doubl e }, r epl ace )t bwr i t e( wri t e_real , { x }, {}, bmax)

    17