Handling Pointers and Dynamic Memory
Laurent Hascoet, INRIA; Jean Utke, All State (previously Argonne); Sri Hari Krishna Narayanan, Argonne
Adjoint computation in source transformation AD
- Original computation:
  $$P:\ \left[\ldots,\ j:\ u = \phi(v_1,\ldots,v_k),\ \ldots\right],\quad j = 1,\ldots,p \tag{1}$$
- A source transformation tool produces:
  $$\bar{P}:\ \left[\ldots,\ j:\ \bar{v}_1 \mathrel{+}= \frac{\partial\phi}{\partial v_1}\,\bar{u},\ \ldots,\ \bar{v}_k \mathrel{+}= \frac{\partial\phi}{\partial v_k}\,\bar{u},\ \ldots\right],\quad j = p,\ldots,1 \tag{2}$$
- The AD tool must provide the variables required to compute $\partial\phi/\partial v_i$:
  – TAF: organizes repeated computations of slices of (1) before computing a partial in (2).
  – OpenAD: computes the partials in (1) and stores them for use in (2). Array indices and control are also stored.
  – Tapenade: stores control information and intermediate variables in (1).
How is control flow handled?
- By storing the loop iteration count and index

    void mini1(double *y, double *x) {
      int i;
      for (i = 0; i < 2; i=i+1) {
        y[i] = x[i] + sin(x[i]*x[i]);
      }
    }

    void ad_mini1(DERIV_TYPE *y,DERIV_TYPE *x) {
      //Removed plain code and declarations
      if (our_rev_mode.tape == 1) {              // Fwd Sweep
        for (ad_i=0; ad_i < 2; ad_i=ad_i+1) {
          //Removed computation
          push_i_s0(ad_i);
          ad_S_2 = ad_S_2 + 1;
        }
        push_i_s0(ad_S_2);
      } else if (our_rev_mode.adjoint == 1) {    // Rev Sweep
        pop_i_s0(ad_S_0);
        for (ad_i=1; ad_i<=ad_S_0; ad_i=ad_i+1) {
          pop_i_s0(ad_S_6);
          //Removed other computation for clarity
          Saxpy(ad_S_7,y[ad_S_6],ad_p_2);
        }
      }
    }
How is control flow handled?
- By storing branching information

    void mini1(double *y, double x) {
      if( x < 1.0)
        *y = x + sin(x * x);
    }

    void ad_mini1(DERIV_TYPE *y,DERIV_TYPE *x) {
      if (our_rev_mode.tape == 1) {
        if (DERIV_val( *x) < 1.00000) {
          //Removed computation
          push_i_s0(1);
        } else {
          push_i_s0(0);
        }
      } else if (our_rev_mode.adjoint == 1) {
        pop_i_s0(ad_Symbol_0);
        if (ad_Symbol_0) {
          //Removed computation
        } else {
        }
      }
    }
How is dynamic memory handled?
- What if the array was dynamically allocated?

    void mini1(double *y) {
      int i;
      double *x;
      x = malloc(2 * sizeof(double));
      for (i = 0; i < 2; i=i+1) {
        y[i] = x[i] + sin(x[i]*x[i]);
      }
      free(x);
    }

    void ad_mini1(DERIV_TYPE *y) {
      //Removed plain code and declarations
      if (our_rev_mode.tape == 1) {
        double *x;
        x = malloc(2 * sizeof(double));
        //Removed computation
        free(x);
      } else if (our_rev_mode.adjoint == 1) {
        double *x;
        for (ad_i=1; ad_i<=ad_S_0; ad_i=ad_i+1) {
          pop_i_s0(ad_S_6);
          pop_s0(ad_S_7);
          pop_s0(ad_S_8);
          Saxpy(ad_S_7,y[ad_S_6],ad_prp_2);
          Saxpy(ad_S_8,y[ad_S_6],ad_prp_1);
          IncDeriv(ad_prp_0,y[ad_S_6]);
          IncDeriv(x[ad_S_6],ad_prp_2);
          IncDeriv(x[ad_S_6],ad_prp_1);
          IncDeriv(x[ad_S_6],ad_prp_0);
        }
      }
    }

- Existing techniques work well as long as the memory is not dynamically allocated or deallocated inside P.
What kinds of pointer assignments are there?
- Taking the address of a variable (δ&):
    double *x, y;
    x = &y;
- Assignment of the result of pointer arithmetic (δa):
    double *x;
    x = x + 1;
- Assignment of the result of a malloc (δm):
    double *x;
    x = (double *) malloc(2 * sizeof(double));
    free(x);
- Question: Can the pointer assignments simply be inserted in the reversed sequence?
Problems with pointer assignments
- Case δ&. If simple reversal of operations is followed:
  – *x is used in the adjoint sweep before its address is assigned.
  – Note: ADIC does not ordinarily introduce x=&y in the adjoint sweep.

    double *x, y, z;
    x = &y;
    z = sin(*x * *x);

    //Removed plain code and declarations
    if (our_rev_mode.tape == 1) {
      DERIV_TYPE *x, y, z;
      x = &y;
      //Use of *x
      z = sin(*x * *x);
    } else if (our_rev_mode.adjoint == 1) {
      DERIV_TYPE *x, y, z;
      //Use of *x
      IncDeriv(*x, z);
      //Setting of pointer
      x = &y;
    }
Problems with pointers
- Case δa. If simple reversal of operations is followed:
  – p will be updated and used in the adjoint sweep before initialization.
  – Note: ADIC does not ordinarily introduce p=x in the adjoint sweep.

    double y, *p, x[2];
    p = x;
    for (i = 0; i < 2; i=i+1) {
      y = *p + sin(*p);
      p++;
    }

    if (our_rev_mode.tape == 1) {
      DERIV_TYPE y, *p, x[2];
      p = x;
      for (i = 0; i < 2; i=i+1) {
        y = *p + sin(*p);
        p++;
      }
    } else if (our_rev_mode.adjoint == 1) {
      DERIV_TYPE y, *p, x[2];
      for (ad_i=1; ad_i<=ad_S_0; ad_i=ad_i+1) {
        p++;
        //Use of p
        IncDeriv(*p, y);
      }
      //Setting of pointer
      p = x;
    }
Problems with pointers
- Case δm. In the adjoint sweep, x is freed before allocation/use.

    int i;
    double *x;
    x = malloc(2 * sizeof(double));
    for (i = 0; i < 2; i=i+1) {
      y[i] = x[i] + sin(x[i]*x[i]);
    }
    free(x);

    if (our_rev_mode.tape == 1) {
      DERIV_TYPE *x;
      x = malloc(2 * sizeof(double));
      //Removed computation
      free(x);
    } else if (our_rev_mode.adjoint == 1) {
      DERIV_TYPE *x;
      free(x);
      for (ad_i=1; ad_i<=ad_S_0; ad_i=ad_i+1) {
        pop_i_s0(ad_S_6);
        pop_s0(ad_S_7);
        pop_s0(ad_S_8);
        Saxpy(ad_S_7,y[ad_S_6],ad_prp_2);
        Saxpy(ad_S_8,y[ad_S_6],ad_prp_1);
        IncDeriv(ad_prp_0,y[ad_S_6]);
        IncDeriv(x[ad_S_6],ad_prp_2);
        IncDeriv(x[ad_S_6],ad_prp_1);
        IncDeriv(x[ad_S_6],ad_prp_0);
      }
      x = (double *) malloc(2 * sizeof(double));
    }
What is the solution? (1/3)
- Intuitively, every value that is needed in the reverse sweep is stored in the forward sweep and restored in the reverse sweep.
- So we can introduce pushpointer() and poppointer() calls.
- The pointer may not be correct. Why?

    double *x, y;
    x = &y;
    .. = *x;

    if (our_rev_mode.tape == 1) {
      DERIV_TYPE *x, y;
      x = &y;
      .. = *x;
      pushpointer(x);
    } else if (our_rev_mode.adjoint == 1) {
      DERIV_TYPE *x, y;
      poppointer(x);
      //Use of x
      IncDeriv(ad_prp_2, *x);
    }
What is the solution? (2/3)
- The pointer in the forward sweep is based on the memory allocated in the forward sweep. Therefore the pointer in the reverse sweep must be rebased to the memory allocated in the reverse sweep.

    double *x, y;
    x = &y;

    if (our_rev_mode.tape == 1) {
      DERIV_TYPE *x, y;
      x = &y;
      .. = *x;
      pushpointer(x);
    } else if (our_rev_mode.adjoint == 1) {
      DERIV_TYPE *x, y;
      poppointer(x);
      rebase(x);
      //Use of x
      IncDeriv(ad_prp_2, *x);
    }
What is the solution? (3/3)
- Intuitively, we must allocate data in the reverse sweep.
- So we can say that a deallocation in the forward sweep must have a corresponding allocation in the reverse sweep, and an allocation in the forward sweep must have a corresponding deallocation in the reverse sweep.

    void mini1(double *y) {
      int i;
      double *x;
      x = malloc(2 * sizeof(double));
      for (i = 0; i < 2; i=i+1) {
        y[i] = x[i] + sin(x[i]*x[i]);
      }
      free(x);
    }

    void ad_mini1(DERIV_TYPE *y) {
      if (our_rev_mode.tape == 1) {
        DERIV_TYPE *x;
        x = malloc(2 * sizeof(DERIV_TYPE ));
        //Removed computation
        free(x);
      } else if (our_rev_mode.adjoint == 1) {
        x = malloc(2 * sizeof(DERIV_TYPE));
        for (ad_i=1; ad_i<=ad_S_0; ad_i=ad_i+1) {
          //Removed computation
        }
        free(x);
      }
    }
How can we rebase a pointer? (1/2)
- At runtime, we can maintain a table of all memory chunks that are allocated in a chunks list.
- Both dynamically allocated and statically allocated chunks are maintained.
- We add calls to generated code that grow/shrink/lookup the list.
- How does this help?

    typedef struct {
      void* base ;     // Address allocated in forward sweep
      int size ;       // Size in bytes
      void* newbase ;  // Address allocated in reverse sweep
    } ADMM_ChunkInfo ;
How can we rebase a pointer? (2/2)
- We maintain a table of all memory chunks that are allocated in a chunks list.

    typedef struct {
      void* base ;     // Address allocated in forward sweep
      int size ;       // Size in bytes
      void* newbase ;  // Address allocated in reverse sweep
    } ADMM_ChunkInfo ;

- Assumption: pointer assignments stay within the range of a chunk:

    Fwd Sweep: base <= oldp <= base + size
    Rev Sweep: newbase <= newp <= newbase + size

- By linking base and newbase we can locate (rebase) p relative to newbase:

    newp = newbase + (oldp - base) ;
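The rebasing formula can be tried out with a small lookup over a chunk table. This is a sketch under stated assumptions: `ChunkInfo` is a simplified stand-in for ADMM_ChunkInfo, and `rebase_pointer` is a hypothetical helper, not part of the ADMM library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for ADMM_ChunkInfo; field names follow the slide. */
typedef struct {
    void  *base;     /* address allocated in the forward sweep */
    size_t size;     /* size in bytes */
    void  *newbase;  /* address allocated in the reverse sweep */
} ChunkInfo;

/* Return oldp relocated into the reverse-sweep chunk, or NULL when oldp
 * does not fall inside any chunk of the table. The inclusive upper bound
 * mirrors the slide's range (base <= oldp <= base + size). */
void *rebase_pointer(const ChunkInfo *chunks, size_t nchunks, void *oldp) {
    uintptr_t p = (uintptr_t)oldp;
    for (size_t i = 0; i < nchunks; i++) {
        uintptr_t base = (uintptr_t)chunks[i].base;
        if (p >= base && p - base <= chunks[i].size) {
            /* newp = newbase + (oldp - base) */
            return (char *)chunks[i].newbase + (p - base);
        }
    }
    return NULL;
}
```

Only the byte offset inside the chunk is carried over, so the relocated pointer addresses the same element as long as both chunks have the same layout.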
The functions to “allocate and deallocate” memory
    /** Forward sweep correspondent of standard malloc().
     * Keeps track of base and size of the allocated chunk.
     * Usage: T* x = (T*)malloc(n*sizeof(T)) ;
     * becomes/creates in the forward sweep:
     * T* x = (T*)FW_ADMM_Allocate(<if_dynamicMemory?0:x>, n*sizeof(T), <if_dynamicMemory?1:0>) ; */
    void* FW_ADMM_Allocate(void *base, int size, int isDynamic);

    e.g.:
    x = FW_ADMM_Allocate(x, n*sizeof(double), 1);
    FW_ADMM_Allocate(x, sizeof(double), 0);

    /** Forward sweep correspondent of standard free().
     * Pushes base and size on to the stack.
     * Usage: free(x) ;
     * becomes/creates in the forward sweep:
     * FW_ADMM_Deallocate((void*)x, <if_pointers?1:0>, <if_dynamicMemory?1:0>) ; */
    void FW_ADMM_Deallocate(void *base, int pointers, int isDynamic);

    e.g.:
    FW_ADMM_Deallocate(x,1);
    FW_ADMM_Deallocate(x,0);
The functions to “allocate and deallocate” memory
    /** Backward sweep correspondent of standard free().
     * Pops old base and size.
     * Re-allocates a chunk.
     * If pointers is true, calls for rebase of the pointer in the chunk.
     * Remembers correspondence from old base to new base.
     * Re-base waiting pointers.
     * Usage: free(x) ;
     * becomes/creates in the backward sweep:
     * x = BW_ADMM_Deallocate(<if_dynamicMemory?0:x>, (void**)&xb, <if_containsPointers?1:0>, <if_dynamicMemory?1:0>) ; */
    void* BW_ADMM_Deallocate(void *chunk, int pointers, int isDynamic);

    e.g.:
    x = BW_ADMM_Deallocate(x, 0, 1);
    BW_ADMM_Deallocate(x, 0, 0);

    /** Backward sweep correspondent of standard malloc().
     * Frees the chunk based at newbase, as well as its adjoint if present.
     * Usage: T* x = (T*)malloc(n*sizeof(T)) ;
     * becomes/creates in the backward sweep:
     * BW_ADMM_Allocate((void*)x, (void*)xb, <if_dynamicMemory?1:0>) ; */
    void BW_ADMM_Allocate(void *newbase, int isDynamic);

    e.g.:
    BW_ADMM_Allocate(x, 1);
    BW_ADMM_Allocate(x, 0);
Example for dynamic allocation
    void ad_mini1(DERIV_TYPE *y) {
      if (our_rev_mode.tape == 1) {
        DERIV_TYPE *x, y;
        x = FW_ADMM_Allocate(x, n*sizeof(double), 1);
        //Removed computation
        FW_ADMM_Deallocate(x,0,1);
      } else if (our_rev_mode.adjoint == 1) {
        DERIV_TYPE *x, y;
        // Reverse of free(): re-allocate the chunk
        x = BW_ADMM_Deallocate(x, 0, 1);
        for (ad_i=1; ad_i<=ad_S_0; ad_i=ad_i+1) {
          //Removed computation
        }
        // Reverse of malloc(): release the chunk
        BW_ADMM_Allocate(x, 1);
      }
    }

Original code:

    double *x, y;
    x = malloc(2 * sizeof(double));
    for (i = 0; i < 2; i=i+1) {
      y[i] = x[i] + sin(x[i]*x[i]);
    }
    free(x);
Function to rebase a pointer
- Searches the list for a chunk such that base <= *pp < (base+size)
- If newbase has already been allocated, then:

    *pp = newbase + (*pp - base) ;

- Otherwise, rebasing is performed after newbase is allocated

    /** Re-base waiting pointers *pp and *ppb
     * from their old base from the forward sweep
     * to their new base in the backward sweep.
     * When new base not available yet, schedules
     * this to be done when new base is allocated.
     * Usage: restoring a pointer pp (i.e. declared as T* pp ;)
     * becomes/creates in the forward sweep:
     * pushpointer8(pp) ;
     * and becomes/creates in the backward sweep:
     * poppointer8((void**)(&pp)) ;
     * ADMM_Rebase((void**)(&pp), (void**)(&ppb)) ; */
    void ADMM_Rebase(void **pp);
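The "rebase now, or wait until newbase exists" behaviour can be sketched with a small pending list. This illustrates the idea only and makes several assumptions: the `Chunk` type, the fixed-size `waiting` array, and the functions `rebase_or_wait` and `set_newbase` are hypothetical and do not reflect the actual ADMM_Rebase implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    void  *base;     /* address in the forward sweep */
    size_t size;     /* size in bytes */
    void  *newbase;  /* NULL until the reverse sweep re-allocates the chunk */
} Chunk;

#define MAX_WAITING 32
static void **waiting[MAX_WAITING];   /* pointers whose chunk has no newbase yet */
static size_t n_waiting = 0;

/* True when p lies in [base, base + size) of the forward-sweep chunk. */
static int inside(const Chunk *c, void *p) {
    uintptr_t a = (uintptr_t)p, b = (uintptr_t)c->base;
    return a >= b && a - b < c->size;
}

static void *relocate(const Chunk *c, void *p) {
    return (char *)c->newbase + ((uintptr_t)p - (uintptr_t)c->base);
}

/* Rebase *pp now if its chunk already has a new base; otherwise queue it. */
void rebase_or_wait(Chunk *chunks, size_t n, void **pp) {
    for (size_t i = 0; i < n; i++) {
        if (inside(&chunks[i], *pp) && chunks[i].newbase != NULL) {
            *pp = relocate(&chunks[i], *pp);
            return;
        }
    }
    assert(n_waiting < MAX_WAITING);
    waiting[n_waiting++] = pp;
}

/* Called when the reverse sweep re-allocates chunk c at newbase:
 * records the new base and rebases every pointer waiting on this chunk. */
void set_newbase(Chunk *c, void *newbase) {
    size_t kept = 0;
    c->newbase = newbase;
    for (size_t i = 0; i < n_waiting; i++) {
        void **pp = waiting[i];
        if (inside(c, *pp)) {
            *pp = relocate(c, *pp);
        } else {
            waiting[kept++] = pp;   /* still waiting on another chunk */
        }
    }
    n_waiting = kept;
}
```

Deferral is needed because the reverse sweep may pop a pointer before it reaches the reversed free() that re-creates the pointer's target chunk.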
Rules for ADIC (1/2):
- malloc() or stack variable declaration:
  – Fwd Sweep: FW_ADMM_Allocate()
  – Rev Sweep: BW_ADMM_Allocate()
- free() or stack variable declaration:
  – Fwd Sweep: FW_ADMM_Deallocate()
  – Rev Sweep: BW_ADMM_Deallocate()
- Pointer assignments:
  – Fwd Sweep: pushpointer() to precede the statement
  – Rev Sweep: poppointer() and rebase() inserted at the top of the reverse basic block (BB)
- Function body:
  – Pointers that fall out of scope must be detected through analysis.
    • Fwd Sweep: pushpointer() at the end of the procedure
    • Rev Sweep: poppointer() at the top of the procedure
Rules for ADIC (2/2):
- DefUse data flow analysis:
  – USE(s): list of variables in the RHS of every statement s.
  – MustKillList(s): any variable v that is used in a statement s that contains free().
  – MustDef(s): any variable v that is defined in a statement s, regardless of whether v is a pointer.
  – Result: list of variables in a procedure that are assigned a value but not freed. This can include non-pointers. The non-pointers are then removed.
A note on matrix allocation
- Code can allocate a block of memory and assign pointers into it.
  – Analysis must determine which variables hold pointers
  – FW_ADMM_Deallocate(): pushes the contents of the block
  – BW_ADMM_Deallocate(): rebases the contents of the block

    double ** createMatrix(int m, int n){
      int i;
      double **matrix, *mat;
      matrix = (double**)malloc(m * sizeof(double *));
      matrix[0] = mat = (double*)malloc(m*n * sizeof(double));
      for (i=1; i<m; i++){
        matrix[i] = mat+n*i;
      }
      return matrix;
    }

    void freeMatrix(double **A){
      free(A[0]);
      free(A);
      return;
    }
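For a matrix laid out this way, rebasing "the contents of the block" means relocating every row pointer stored in the pointer block. The sketch below shows that step in isolation; `make_rows` and `rebase_rows` are illustrative names and are not ADMM functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Build a table of m row pointers into a contiguous m*n data block. */
double **make_rows(double *mat, int m, int n) {
    double **rows = malloc((size_t)m * sizeof *rows);
    if (rows == NULL) return NULL;
    for (int i = 0; i < m; i++) {
        rows[i] = mat + (size_t)n * (size_t)i;
    }
    return rows;
}

/* Rebase every row pointer from the old data block to the new one,
 * keeping each row's offset inside the block. */
void rebase_rows(double **rows, int m, const double *oldbase, double *newbase) {
    for (int i = 0; i < m; i++) {
        rows[i] = newbase + (rows[i] - oldbase);
    }
}
```

This is why the analysis has to know that the block holds pointers: a chunk of plain doubles needs no such pass, while a chunk of pointers is stale until each entry has been rebased.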
Test case in progress
- Multibody dynamics:
  – Has been differentiated using ADIC and Tapenade in forward mode.
  – By ADOL-C in both modes.
Other approaches
- Postpone free:
  – Inhibit all free() calls in the forward sweep
  – Inhibit all malloc() calls in the reverse sweep
  – Drawback: memory growth
- Custom memory allocator:
  – Force chunks to match, guaranteeing that for each allocation of the forward sweep, the corresponding allocation of the backward sweep returns exactly the same chunk of memory
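The "postpone free" alternative can be sketched as follows. It assumes frees are reversed in strict last-in, first-out order and uses a fixed-size stack; `fw_postponed_free`, `bw_reuse_chunk`, and `bw_release` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEFERRED 64
static void *deferred[MAX_DEFERRED];
static int n_deferred = 0;

/* Forward sweep replacement for free(): keep the chunk alive and remember it. */
void fw_postponed_free(void *p) {
    assert(n_deferred < MAX_DEFERRED);
    deferred[n_deferred++] = p;
}

/* Reverse sweep counterpart of a forward free(): instead of calling malloc(),
 * hand back the most recently postponed chunk at its original address. */
void *bw_reuse_chunk(void) {
    assert(n_deferred > 0);
    return deferred[--n_deferred];
}

/* Reverse sweep counterpart of a forward malloc(): the chunk is finally released. */
void bw_release(void *p) {
    free(p);
}
```

Since every chunk keeps its forward-sweep address, saved pointers stay valid without rebasing, at the cost of holding all forward-sweep memory until the reverse sweep reaches it.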
Refinements to reduce memory management
- For pointers to statically allocated variables:
  – In the reverse sweep: insert the reference assignment correctly, e.g. p = &y
  – If there is only one assignment for this pointer, possibly recursively reassign other pointers, e.g. p1 = p. This resembles recomputation!
  – If the stack frame is invariant (without inlining etc.):
    • Just use the original pointer values.
Conclusion
- Designed a runtime library to handle all the cases involved in pointer usage and dynamic allocation of memory in source transformation adjoint computation.
- The runtime library can be shared by Tapenade and ADIC.
- We are testing the implementation on a multibody code.
- Refinements that reduce the need for runtime management are possible.