
Simulation of Fractional Brownian Motion with Conditionalized Random Midpoint Displacement

Ilkka Norros, VTT Information Technology, P.O. Box 1202, FIN-02044 VTT, Finland (ilkka.norros@vtt.fi)
Petteri Mannersalo, VTT Information Technology, P.O. Box 1202, FIN-02044 VTT, Finland (petteri.mannersalo@vtt.fi)
Jonathan L. Wang, Bellcore, 331 Newman Springs Road, Red Bank, NJ 07701, USA

Abstract

Recent measurement studies have shown that the burstiness of packet traffic is associated with long-range correlations that can be efficiently modeled in terms of fractal or self-similar processes, e.g., fractional Brownian motion (FBM). To gain a better understanding of queuing and network-related performance issues based on simulations, as well as to determine network element performance and capacity characteristics based on load testing, it is essential to be able to generate long FBM traces accurately and quickly. In this paper, we consider an approximate FBM generation method based on the concept of bisection and interpolation, which improves on a much used but inaccurate method known as the random midpoint displacement (RMD) algorithm. We further extend our new algorithm (referred to as RMD_{m,n}) to generate FBM traces without a priori knowledge of the length of the simulation (i.e., on-the-fly generation), instead of being a purely top-down generation (that is, the entire trace has to be generated first before it can be used) like the original RMD algorithm. We present the mathematical and numerical aspects of the RMD_{m,n} algorithm and compare it with two other widely favored FBM generation methods, i.e., the fast Fourier transform (FFT)

method and the method based on aggregating a large number of ON-OFF sources with infinite-variance sojourn times.

Keywords: fractional Brownian motion, self-similar traffic generation, random midpoint displacement.

1 Introduction

Recent measurement studies (see for example [7, 10, 18]) have shown that the burstiness of packet traffic is associated with long-range correlations that can be efficiently modeled in terms of fractal or self-similar processes, e.g., fractional Brownian motion (FBM). Fractional Brownian motions were included in the toolbox of statistical models by Mandelbrot and Van Ness [13], the first applications being in economics and hydrology. Later, these processes have been used in many other fields, including the generation of artificial landscapes [12] and, more recently, teletraffic modeling [15]. However, the generation (or simulation) of FBM has remained somewhat non-trivial, since it is practically impossible to build large FBM samples exactly using a sequence of independent standard Gaussian random variables as the basic random elements. This paper presents a conceptually simple FBM generation algorithm which can reach practically arbitrary accuracy without sacrificing too much computational efficiency.

A normalized FBM with self-similarity parameter $H \in (0,1)$ is a stochastic process $(Z_t)_{t \ge 0}$ characterized by the following properties:

(i) $Z_t$ has stationary increments;
(ii) $Z_0 = 0$, and $EZ_t = 0$ for all $t$;
(iii) $EZ_t^2 = t^{2H}$ for all $t$;
(iv) $Z_t$ is Gaussian;
(v) $Z_t$ has continuous sample paths.

Ordinary Brownian motion is obtained as the special case $H = 1/2$. The existence of such a process was established by Kolmogorov [8] in the Hilbert space framework. In [13], the process was defined more constructively as an integral with respect to ordinary Brownian motion:

$$Z_t - Z_s = c_H \Big\{ \int_s^t (t-u)^{H-\frac12}\, dW_u + \int_{-\infty}^s \big[ (t-u)^{H-\frac12} - (s-u)^{H-\frac12} \big]\, dW_u \Big\}. \qquad (1.1)$$
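Properties (i)-(iii) above already determine the second-order structure of FBM: $\mathrm{Cov}(Z_s, Z_t) = \tfrac12 (s^{2H} + t^{2H} - |t-s|^{2H})$. As a quick illustration (a sketch of ours, not part of the paper), this can be coded and checked against the Brownian special case:

```python
def fbm_cov(s: float, t: float, H: float) -> float:
    """Covariance of normalized FBM, implied by properties (i)-(iii):
    Cov(Z_s, Z_t) = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2."""
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))

# Sanity checks: E Z_t^2 = t^{2H}, and H = 1/2 gives ordinary Brownian
# motion with Cov(Z_s, Z_t) = min(s, t).
assert abs(fbm_cov(0.7, 0.7, 0.8) - 0.7 ** 1.6) < 1e-12
assert abs(fbm_cov(0.3, 0.8, 0.5) - 0.3) < 1e-12
```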

The normalization $EZ_1^2 = 1$ is achieved with

$$c_H = \sqrt{2H\,\Gamma(\tfrac32 - H) \big/ \big(\Gamma(H + \tfrac12)\,\Gamma(2 - 2H)\big)},$$

where $\Gamma$ is Euler's Gamma function. The single most important property of a FBM is its statistical self-similarity: the processes $Z_{\alpha t}$ and $\alpha^H Z_t$ have the same path space distribution for any $\alpha > 0$. In particular, the correlation structure is the same on all timescales. For $H \ne 1/2$, this implies that the correlations do not die out the way they do for most processes used in statistical modeling, for example finite-order autoregressive processes. Thus, it is in general not a good idea to generate a long sequence of successive FBM increments using either an exact (too heavy) or a truncated (eventually exponential decay of correlations) autoregressive scheme.

The exact generation of long FBM traces is infeasible in practice due to the amount of storage and CPU time required (consider, for example, the dimensionality of the covariance matrix of the sequence of FBM increments, and the difficulty of generating a sequence that exactly conforms to this correlation structure). For example, skillful use of matrix algebra allows the FBM simulation program of FRACLAB (a toolbox for signal processing with fractals developed in the FRACTALES team at INRIA Rocquencourt, France) to generate exact FBM sequences of length $10^4$, but the algorithm slows down dramatically if longer sequences are required.

Therefore, well-understood and efficient approximate algorithms become desirable, especially for generating long traces for the purposes of network performance testing, simulation and analysis.
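To make the infeasibility argument concrete, here is a minimal sketch of ours (the function name `exact_fbm` is made up) of exact generation via a Cholesky factorization of the increment covariance matrix; the O(N^2) memory and O(N^3) factorization cost are what rule this approach out for long traces:

```python
import numpy as np

def exact_fbm(N: int, H: float, rng=None):
    """Exact FBM sample at t = 1, ..., N via Cholesky factorization.
    Memory is O(N^2) and time O(N^3), hence infeasible for large N."""
    rng = np.random.default_rng() if rng is None else rng
    k = np.arange(N)
    # Autocovariance of unit-step FBM increments (fractional Gaussian noise):
    # gamma(k) = (|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}) / 2
    gamma = 0.5 * ((k + 1.0) ** (2 * H) - 2 * k ** (2.0 * H)
                   + np.abs(k - 1.0) ** (2 * H))
    cov = gamma[np.abs(k[:, None] - k[None, :])]  # Toeplitz covariance matrix
    L = np.linalg.cholesky(cov)                   # the O(N^3) step
    return np.cumsum(L @ rng.standard_normal(N))  # cumulate increments to FBM

path = exact_fbm(512, 0.8)
```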
Several approximation algorithms have been proposed, which include:

- a fast but ad hoc method suggested by Mandelbrot that is based on short-memory approximation [11];
- queuing-based methods, such as a method based on the M/G/1 queue length with Poisson arrivals and heavy-tailed service times [4];
- transform methods based on inversely transforming known FBM coefficients in the transformed domain; these include methods based on the fast Fourier transform [17] and the wavelet transform [1, 5, 21, 20];
- aggregation methods based on aggregating a large number of single-source models, such as models based on chaotic maps [19], ON-OFF models with infinite-variance sojourn times [22], and AR(1) processes with the AR(1) parameters chosen from a beta distribution on [0,1] with shape parameters p and q [6];

- bisection methods based on generating points of the process path "top-down" by properly interpolating from the existing points.

The last group includes, as the simplest case, random midpoint displacement (RMD), which has traditionally been identified as a method appropriate for fast but somewhat inaccurate FBM generation. It gives realizations with roughly the right self-similarity parameter, but the correlations behave in a non-stationary manner. Lau et al. found it, however, quite suitable for the generation of simulated data traffic [9], at least to the extent that the generated traces are qualitatively satisfactory. In [14], a version of RMD conditionalized one step further was proposed and found quite satisfactory for most needs.

The main idea behind the algorithm proposed in this paper is to apply the same principle of bisection, but to achieve arbitrary accuracy at minimal computation and storage costs. Moreover, the algorithm can be modified to produce long, accurate FBM traces on-the-fly instead of being a purely top-down algorithm, with moderate memory requirements and high computational efficiency. Here, the term on-the-fly means that traces are generated without a priori knowledge of the length of the simulation; in purely top-down algorithms, the entire trace has to be generated first before it can be used. This on-the-fly capability greatly improves the usability of the proposed generation method in simulation and load testing that require long traces, such as the ATM environment with its stringent QoS objectives and high capacities.

This paper is organized as follows. The algorithm and some mathematical aspects are studied in Section 2.
Numerical studies on the accuracy of the algorithm are presented in Section 3.1; an outline of our implementation and its computational complexity is given in Section 3.3; and a comparison with two other widely favored algorithms, i.e., the FFT and aggregation methods, in terms of accuracy and time complexity is presented in Section 3.2. Finally, the paper is summarized in Section 4.

2 The Conditionalized Random Midpoint Displacement Algorithm

In this section, we first describe an exact construction of a FBM. Then we introduce our RMD_{m,n} algorithm and describe an extension of the algorithm that enables generation of FBM traces on-the-fly. Finally, the covariances of approximate traces are studied.


Figure 2.1: Ordering of the intervals in the binary construction.

2.1 The Full Binary Construction

Let $I_1, I_2, I_3, \dots$ denote the dyadic subintervals of $(0,1]$, numbered as shown in Figure 2.1. For any $i$, denote by $X_i$ the increment $X_i = Z_{b_i} - Z_{a_i}$, where $I_i = (a_i, b_i]$. Further, let $e(i)$ and $v(i)$ be the $i$-vector and scalar, respectively, defined by

$$e(i)[X_i, X_{i-1}, \dots, X_1]^T = E[X_{i+1} \mid X_i, X_{i-1}, \dots, X_1],$$
$$v(i) = \mathrm{Var}[X_{i+1} \mid X_i, X_{i-1}, \dots, X_1].$$

Finally, denote $U_1 = X_1 = Z_1$ and

$$U_{i+1} = \big( X_{i+1} - e(i)[X_i, X_{i-1}, \dots, X_1]^T \big) \big/ \sqrt{v(i)}, \quad i = 1, 2, \dots$$

The above relations establish a one-to-one mapping between the sequence $X_1, X_2, \dots$ of FBM increments and the sequence $U_1, U_2, \dots$ of mutually independent standard Gaussian random variables. It can be considered one of the many interesting orthogonalizations of a FBM. (For a very different kind of orthogonalization, see [16].) By the continuity of $Z$, the mapping $(U_1, U_2, \dots) \mapsto (X_1, X_2, \dots)$ extends to an almost everywhere defined map $\Phi: \mathbb{R}^{\mathbb{N}} \to C[0,1]$ such that

$$\Phi(U_1, U_2, \dots) = Z.$$

We note the following monotonicity property:

Proposition 2.1 The mapping
$$(u_1, u_2) \mapsto \Phi(u_1, u_2, u_3, \dots)$$
is increasing on the set of sequences $(u_1, u_2, u_3, \dots)$ for which the right-hand side is defined.

Proof. It is enough to note that the coefficients of $U_1$ and $U_2$ are positive in the representation of any $Z_t$ in the orthogonal basis $\{U_1, U_2, \dots\}$. The coefficient of $U_1$ is simply $\mathrm{Cov}(Z_t, Z_1)$, which is positive for $t > 0$. Since $U_2 = (Z_{1/2} - \tfrac12 Z_1)/\sqrt{v(1)}$, the coefficient of $U_2$ is

$$\mathrm{Cov}(Z_t, U_2) = \frac{\mathrm{Cov}(Z_t, Z_{1/2}) - \tfrac12\, \mathrm{Cov}(Z_t, Z_1)}{\sqrt{v(1)}}.$$

It is easy to check that the numerator is positive for $t \in (0,1)$, with a maximum at $t = \tfrac12$. □

On the other hand, the whole mapping $\Phi$ is not monotone. For example, increasing $U_3$ while keeping the other $U_i$'s fixed decreases $Z_{3/4}$.

2.2 Basic Algorithm

By truncating the exact construction described in the previous section, we now construct an approximate FBM realization $(Z(t) : t \in [0,1])$ with parameter $H \in (0,1)$. Note that, using self-similarity, $Z(t)$ can be scaled onto an interval of any desired length. First, we set $Z(0) = 0$ and draw $Z(1)$ from the standard Gaussian distribution. The conditional distribution of $Z(\tfrac12)$ given $Z(0) = 0$ and $Z(1)$ is then $N(\tfrac12 Z(1),\; 2^{-2H} - \tfrac14)$. Thus, we can next draw $Z(\tfrac12)$ from this distribution.

In principle, this bisecting (or halving) process could be continued using exact conditional distributions. However, the dimension of the multivariate Gaussian distribution grows with each additional point, and it quickly becomes necessary to restrict the number of previously drawn points used in the conditioning. In the traditional RMD algorithm, conditioning is done only on the increment of the interval to be halved. In the version proposed in [14], we additionally conditioned on the increment of the left neighboring interval. This idea is extended further in the present version: we conditionalize on a fixed finite number of already generated left and right neighboring intervals.

To be more precise, let us fix two integers $m \ge 0$ and $n \ge 1$.
We shall conditionalize on (at most) $m$ neighboring increments to the left of the interval to be halved and on (at most) $n$ neighboring increments to the right of it, including the "mother" interval itself. Accordingly, we call our algorithm RMD_{m,n}, so that the conventional RMD is just RMD_{0,1} and the algorithm presented in [14] is RMD_{1,1}.

Let $\{U_{i,k} : i = 0, 1, \dots;\; k = 0, \dots, 2^{i-1} - 1\}$ be a set of independent standard Gaussian random variables. Denote $X_{i,j} = Z(j \cdot 2^{-i}) - Z((j-1) \cdot 2^{-i})$,

$i = 0, 1, 2, \dots$, $j = 1, \dots, 2^i$. (Note the different numbering of the intervals compared to the previous section!) We define the FBM generation algorithm recursively. Assume that we have obtained the values up to stage $i-1$ (resolution $2^{-i+1}$). We then draw $X_{i,1}, X_{i,2}, \dots$ as follows.

Since $X_{i,2j-1} + X_{i,2j} = X_{i-1,j}$, it is sufficient to generate $X_{i,j}$ for odd $j$. Let us proceed from left to right, and assume that $X_{i,1}, \dots, X_{i,2k}$ have already been generated ($k \in \{0, 1, \dots, 2^{i-1} - 1\}$). Now, we can choose

$$X_{i,2k+1} = e(i,k)\,[X_{i,(2k-m+1) \vee 1}, \dots, X_{i,2k},\; X_{i-1,k+1}, \dots, X_{i-1,(k+n) \wedge 2^{i-1}}]^T + \sqrt{v(i,k)}\; U_{i,k}, \qquad (2.1)$$

where $e(i,k)$ is a row vector such that

$$e(i,k)\,[X_{i,(2k-m+1) \vee 1}, \dots, X_{i,2k},\; X_{i-1,k+1}, \dots, X_{i-1,(k+n) \wedge 2^{i-1}}]^T = E\big[ X_{i,2k+1} \mid X_{i,(2k-m+1) \vee 1}, \dots, X_{i,2k},\; X_{i-1,k+1}, \dots, X_{i-1,(k+n) \wedge 2^{i-1}} \big] \qquad (2.2)$$

and $v(i,k)$ is the scalar defined by

$$v(i,k) = \mathrm{Var}\big[ X_{i,2k+1} \mid X_{i,(2k-m+1) \vee 1}, \dots, X_{i,2k},\; X_{i-1,k+1}, \dots, X_{i-1,(k+n) \wedge 2^{i-1}} \big].$$

The vector $e(i,k)$ and the number $v(i,k)$ are computed by the standard formulae for multivariate Gaussian distributions as follows. Denote by $\Gamma_{ik}$ the covariance matrix

$$\Gamma_{ik} = \mathrm{Cov}\big( [X_{i,2k+1},\; X_{i,(2k-m+1) \vee 1}, \dots, X_{i,2k},\; X_{i-1,k+1}, \dots, X_{i-1,(k+n) \wedge 2^{i-1}}] \big),$$

where

$$\mathrm{Cov}([x_1, \dots, x_n]) = \begin{pmatrix} \mathrm{Cov}(x_1, x_1) & \mathrm{Cov}(x_1, x_2) & \cdots & \mathrm{Cov}(x_1, x_n) \\ \mathrm{Cov}(x_2, x_1) & \mathrm{Cov}(x_2, x_2) & \cdots & \mathrm{Cov}(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(x_n, x_1) & \mathrm{Cov}(x_n, x_2) & \cdots & \mathrm{Cov}(x_n, x_n) \end{pmatrix},$$

and split it as

$$\Gamma_{ik} = \begin{bmatrix} \mathrm{Var}(X_{i,2k+1}) & \Gamma^{(1,2)}_{ik} \\ \Gamma^{(2,1)}_{ik} & \Gamma^{(2,2)}_{ik} \end{bmatrix}.$$

Then

$$e(i,k) = \Gamma^{(1,2)}_{ik} \big( \Gamma^{(2,2)}_{ik} \big)^{-1}, \qquad v(i,k) = \frac{\det \Gamma_{ik}}{\det \Gamma^{(2,2)}_{ik}}.$$
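These formulas translate directly into code. The sketch below is ours (the helper names `incr_cov` and `weights` are made up; the paper's actual prototype is in C): it computes $e(i,k)$ and $v(i,k)$ for an arbitrary conditioning set of increments from the FBM covariance.

```python
import numpy as np

def incr_cov(a1, b1, a2, b2, H):
    """Cov(Z(b1) - Z(a1), Z(b2) - Z(a2)) for normalized FBM."""
    r = lambda s, t: 0.5 * (s ** (2 * H) + t ** (2 * H) - abs(t - s) ** (2 * H))
    return r(b1, b2) - r(b1, a2) - r(a1, b2) + r(a1, a2)

def weights(target, conditioning, H):
    """Return (e, v): the row vector e(i,k) and conditional variance v(i,k)
    for the increment on `target` given the increments on `conditioning`.
    Each interval is an (a, b) pair."""
    ivs = [target] + list(conditioning)
    G = np.array([[incr_cov(*ivs[p], *ivs[q], H) for q in range(len(ivs))]
                  for p in range(len(ivs))])
    g12, G22 = G[0, 1:], G[1:, 1:]
    e = g12 @ np.linalg.inv(G22)   # e = Gamma^{(1,2)} (Gamma^{(2,2)})^{-1}
    v = G[0, 0] - e @ g12          # equals det(Gamma) / det(Gamma^{(2,2)})
    return e, v

# RMD_{1,1}-style split: condition X_{i,2k+1} on its left neighbor and mother.
e, v = weights((0.25, 0.375), [(0.125, 0.25), (0.25, 0.5)], H=0.8)
```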

By the stationarity of the increments of $Z$ and by self-similarity, $e(i,k)$ is independent of $i$ and $k$ when $2k \ge m$ and $k \le 2^{i-1} - n$. Moreover, it depends on $i$ only when $2^i < m + 2n$.

On the other hand, the number of true covariance matrices needed when running up to stage $i = \lceil \log_2(m+2n) \rceil$ is $1 + 2 + 2^2 + \dots + 2^{i-1} = 2^i - 1 < 2m + 4n$. Thus, fewer than $2m + 4n$ vectors $e(i,k)$ need to be computed. The same holds for the scalars $v(i,k)$, except that "independence with respect to $i$" is replaced by scaling with a constant factor $2^{-2Hi}$.

In the case $m = n = 1$, the only special case is $k = 0$, i.e., the first splitting at each stage $i$, which is then made according to the rule of the usual RMD. For all other $i$ and $k$, we need only one weight vector $e$ and one conditional variance, computed as

$$e = \Gamma^{(1,2)} \big( \Gamma^{(2,2)} \big)^{-1}, \qquad \mathrm{Var}[X_{i,2k+1} \mid X_{i,2k}, X_{i-1,k+1}] = \frac{\det \Gamma}{\det \Gamma^{(2,2)}} \cdot 2^{-2Hi},$$

where

$$\Gamma = \begin{bmatrix} 1 & \Gamma^{(1,2)} \\ \Gamma^{(2,1)} & \Gamma^{(2,2)} \end{bmatrix} = \begin{bmatrix} 1 & 2^{2H-1} - 1 & 2^{2H-1} \\ 2^{2H-1} - 1 & 1 & \tfrac12 (3^{2H} - 2^{2H} - 1) \\ 2^{2H-1} & \tfrac12 (3^{2H} - 2^{2H} - 1) & 2^{2H} \end{bmatrix}.$$

2.3 On-the-Fly RMD_{m,n} Generation

Without an on-the-fly generation capability, performance simulation of FBM queues with long synthetic input traces cannot be achieved. This is especially important in network environments with stringent Quality-of-Service (QoS) objectives, such as ATM.

At first sight, the RMD_{m,n} algorithm looks unsuitable for generating FBM on-the-fly, which may be a requirement for a synthetic traffic generator used in simulation and load testing, e.g., to determine the queuing behavior and performance of ATM switches. However, the basic algorithm described in the previous section can be extended to have such a capability because of the following observation: it is not necessary to generate the splits in exactly the same order as above.
That is, instead of completely generating each resolution before moving to the finer one, we can have several unfinished resolutions at the same time.

An on-the-fly FBM generation algorithm may be constructed so that after the interval $[0, 2^i\delta]$ has been generated with resolution $\delta$, the trace is "expanded" to the interval $[0, 2^{i+1}\delta]$. Naturally, the algorithm slows down after each expansion,

  • 59 3 6 2 7 4 8 1

    9

    10 11 12 13 14 15 16

    1 4 6 8 5 11 13 10 15 14 1612(n=1) 2

    2 4 3 7 6 8 5 11 10 151 1413 16122 4 3 7 6 8 5 11 10 151 14 16(n=2)

    (n>2)12 13

but if we generate the increments in an appropriate order, the speed of the slowdown is only logarithmic. The appropriate way is to choose the new increment to be as far left as possible, i.e., the interval under consideration must have $n-1$ (or all) nearest intervals on its right-hand side of the same size as the mother interval. In addition to these intervals, $m$ (or all) nearest intervals on the left whose length equals that of the new interval are taken into account in the conditioning. (The algorithm could be improved further by counting, whenever possible, $m$ neighboring intervals from the left: if there do not exist $m$ intervals of the same size as the new interval, take $m$ intervals as large as possible. For example, in the first step in Figure 2.3, $Z(16\delta) - Z(8\delta)$ would be drawn conditioned on $Z(8\delta) - Z(4\delta)$ and $Z(4\delta) - Z(0)$.) Denoting

$$Y_{i,j} = Z(j \cdot 2^i \delta) - Z((j-1) \cdot 2^i \delta), \quad i = 0, 1, \dots, \; j = 1, 2, \dots, \qquad (2.3)$$

where $\delta$ is the desired resolution, relations similar to equation (2.1) can be written. The enlargement is defined by

$$Y_{i,1} = E[Y_{i,1} \mid Y_{i-1,1}] + \sqrt{\mathrm{Var}[Y_{i,1} \mid Y_{i-1,1}]}\; U_{i,1} = e(i,1)\, Y_{i-1,1} + \sqrt{v(i,1)}\; U_{i,1}$$

and the increments on the inner intervals by

$$Y_{i,2k+1} = e(i,k)\,[Y_{i,(2k+1-m) \vee 1}, \dots, Y_{i,2k},\; Y_{i+1,k+1}, \dots, Y_{i+1,(k+n) \wedge N_{i+1}}]^T + \sqrt{v(i,k)}\; U_{i,k},$$

where $N_{i+1}$ is the index of the last generated increment at resolution $2^{i+1}\delta$. One should notice that in the on-the-fly case the order of the splits depends on the parameter $n$, while in the basic algorithm it is always the same (see Figure 2.2).

Figure 2.2: Running the RMD_{m,n} algorithm over an interval and the on-the-fly version. The order of the splits is shown under the axis.


Figure 2.3: Conditionalizing the on-the-fly generation algorithm with $m = n = 2$. The line segments above the axis correspond to the intervals (i.e., increments) to be counted in the conditionalizing.

The critical aspect of the on-the-fly generator, related to the speed of the slowdown, is memory usage. Assume that we have generated the interval $[0,T]$ with resolution $\delta$ and the trace is enlarged to $[0,2T]$. In order to generate $Z(T+\delta)$ we need either $m$ or all of the nearest values from each resolution on the left; i.e., all the values from the resolutions $T, T/2, T/4, \dots, \lceil T/m \rceil$ and the $m$ nearest values from the resolutions $\delta, 2\delta, 4\delta, \dots, \lfloor T/m \rfloor$ need to be stored. Summing up the number of values needed from the left, we get

$$\#\mathrm{left} = 1 + 1 + 2 + 4 + \dots + 2^{\lfloor \log_2 m \rfloor} + \big( \log_2(\tfrac{T}{\delta}) - \lceil \log_2 m \rceil \big)\, m \approx m \log_2(\tfrac{T}{\delta}).$$

From the right, we must keep all the generated values in memory. The rule for choosing the interval to be split guarantees that there can never exist more than $n+1$ intervals of size $2\delta$ and $n$ intervals of each coarser resolution (if $n$ is even, the worst possible case is $n$ intervals of size $2\delta$ and $n-1$ intervals of each coarser resolution); i.e.,

$$\#\mathrm{right} \le 1 + n \log_2\!\big(\tfrac{T}{2\delta}\big) \le n \log_2\!\big(\tfrac{T}{\delta}\big), \quad n \ge 1.$$

Moving towards the point $2T$ decreases $\#\mathrm{right}$ and increases $\#\mathrm{left}$; nevertheless, the latter is bounded by $m \log_2(2T/\delta)$. Thus, with clever memory management one can generate simulation traces on the interval $[0,2T]$ with fewer than $m \log_2(2T/\delta) + n \log_2(T/\delta)$ values of the trace stored at any time. For example, to generate a one-day simulated trace with a resolution of one millisecond and $m = n = 2$, we need only about one hundred memory locations.

2.4 Exact Covariances of RMD_{0,1} and RMD_{1,1}

For the usual RMD_{0,1}, the true covariances of the simulated process can easily be determined. Indeed, given the increments $X_i$ and $X_j$ on two disjoint intervals $I_i$ and $I_j$, the further evolution of the simulation inside these intervals proceeds independently. If $I_{i'}$ is either half of $I_i$ and $I_{j'}$ is either half of $I_j$, then we have

$$\mathrm{Cov}(X_{i'}, X_{j'}) = \mathrm{Cov}\big( \tfrac12 X_i + \sqrt{v(i'-1)}\, U_{i'},\;\; \tfrac12 X_j + \sqrt{v(j'-1)}\, U_{j'} \big) = \tfrac14\, \mathrm{Cov}(X_i, X_j).$$

Iterating this, we see that the correlation coefficient of the increments $X_i$, $X_j$ on two intervals of equal size is

$$\mathrm{Corr}(X_i, X_j) = 2^{(k(i,j)-1)(2H-2)}\, \mathrm{Corr}(Z_1, Z_2 - Z_1) = 2^{(k(i,j)-1)(2H-2)} \big( 2^{2H-1} - 1 \big),$$

where $k(i,j)$ is the number of generation steps from $I_i$ and $I_j$ up to their closest common ancestor. The covariance matrix

$$\Gamma^{(i)} = \mathrm{Cov}[X_{i,1}, X_{i,2}, \dots, X_{i,2^i}]$$

can be computed by a simple rule: start with $\Gamma^{(0)} = [1]$ and proceed from $\Gamma^{(i)}$ to $\Gamma^{(i+1)}$ by replacing each diagonal element $\gamma^{(i)}_{jj}$ of $\Gamma^{(i)}$ by the $2 \times 2$ matrix

$$\gamma^{(i)}_{jj} \begin{bmatrix} 2^{-2H} & \tfrac12 - 2^{-2H} \\ \tfrac12 - 2^{-2H} & 2^{-2H} \end{bmatrix}$$

and each non-diagonal element $\gamma^{(i)}_{jk}$ ($j \ne k$) by the $2 \times 2$ matrix

$$\gamma^{(i)}_{jk} \begin{bmatrix} \tfrac14 & \tfrac14 \\ \tfrac14 & \tfrac14 \end{bmatrix}.$$

The matrix $\Gamma^{(5)}$ is shown in Figure 2.4. The true covariance matrix, shown in the next figure, is thus approximated by a "terrace architecture".
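The replacement rule for $\Gamma^{(i)}$ translates directly into code. The sketch below is ours (not the paper's C implementation); it builds the RMD_{0,1} covariance matrix and checks two consequences: the total mass still equals $\mathrm{Var}\, Z(1) = 1$, and the variances of the dyadic increments come out exactly right.

```python
import numpy as np

def rmd01_cov(levels: int, H: float) -> np.ndarray:
    """Covariance matrix of the 2**levels increments produced by RMD_{0,1},
    built with the diagonal/off-diagonal replacement rule above."""
    d = np.array([[2 ** (-2 * H), 0.5 - 2 ** (-2 * H)],
                  [0.5 - 2 ** (-2 * H), 2 ** (-2 * H)]])  # diagonal blocks
    o = np.full((2, 2), 0.25)                             # off-diagonal blocks
    G = np.array([[1.0]])
    for _ in range(levels):
        n = G.shape[0]
        new = np.empty((2 * n, 2 * n))
        for j in range(n):
            for k in range(n):
                new[2*j:2*j+2, 2*k:2*k+2] = G[j, k] * (d if j == k else o)
        G = new
    return G

G = rmd01_cov(5, 0.9)                      # the matrix shown in Figure 2.4
assert abs(G.sum() - 1.0) < 1e-12          # Var Z(1) is preserved
assert abs(G[0, 0] - 2 ** (-2 * 0.9 * 5)) < 1e-15  # exact dyadic variances
```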

  • 10

    20

    30

    10

    20

    30

    0

    0.0005

    0.001

    0.0015

    10

    20

    30Figure 2.4: The 32� 32 covariance matrix of RMD0;1. H = 0:9.

Figure 2.5: The true 32 x 32 covariance matrix of FGN. H = 0.9.

When $m > 0$ and/or $n > 1$, computation of the exact covariance matrix of the RMD_{m,n} process is more complicated, since the dependence of the increments is not restricted to the common-ancestor type, and each split corresponds to a less trivial extension of the covariance matrix than in RMD_{0,1}. We have derived it in the next simplest case, RMD_{1,1}. The 32 x 32 matrix is shown in Figure 2.6. Note that the asymmetry of the algorithm gives the surface a strange, rugged look.

Figure 2.6: The 32 x 32 covariance matrix of RMD_{1,1}. H = 0.9.

Let us have a closer look at some correlations. First, note that RMD_{0,1} gets the variances of the dyadic increments $X_i$ exactly right, whereas RMD_{1,1} does not; see Figure 2.7. We see an interesting fractal shape. The behavior of these curves remains to be investigated further and is beyond the scope of this paper.

Figure 2.7: The diagonals of the 32 x 32 and 1024 x 1024 covariance matrices of RMD_{1,1}, each normed by the true FGN value. H = 0.75.

3 Evaluation of RMD_{m,n}

We have coded a prototype program (http://www.vtt.fi/tte/tte23/cost257/) for generating approximate FBM traces on a given interval with RMD_{m,n}. The program is written in C and, in addition to the standard C libraries, it uses the random number library Ranlib and the matrix algebra library Meschach, both available via Netlib (http://www.netlib.org/).

As a starting point, we show in Figure 3.1 synthetic traces generated by RMD_{1,2} with input H = 0.5, 0.6, 0.7, 0.8, 0.9, and 0.95. We can see that, as the H value increases, the traces indeed become more and more (long-term) correlated (visible as low-frequency fluctuation in the figure). In contrast, the trace for H = 0.5, corresponding to independent increments, does indeed resemble white noise. Thus, the RMD_{m,n} algorithm generates traces that qualitatively resemble FBM. We next provide a quantitative assessment of the RMD_{m,n} algorithm and compare it with other widely favored approaches, namely the method based on the fast Fourier transform (referred to as the FFT method) [17] and the method based on aggregating a large number of ON-OFF sources with infinite-variance sojourn times [22] (referred to as the aggregation method).

3.1 Accuracy Analyses

3.1.1 Variations in the Sample Path

As an internal check of the accuracy of the algorithm with different values of $m$ and $n$, we rely on the algorithm being very accurate when $m$ and $n$ are large, and choose the trace $\tilde{Z}(t)$ calculated with parameters H = 0.8, m = 50 and n = 25 as a reference trace. Differences between this and simulations with parameters

  • 0 5000 10000 15000 20000

    10

    20

    30

    40

    50

    H = 0.5

    0 5000 10000 15000 20000

    10

    20

    30

    40

    50

    H = 0.6

    0 5000 10000 15000 20000

    10

    20

    30

    40

    50

    H = 0.7

    0 5000 10000 15000 20000

    10

    20

    30

    40

    50

    H = 0.8

    0 5000 10000 15000 20000

    10

    20

    30

    40

    50

    H = 0.9

    0 5000 10000 15000 20000

    10

    20

    30

    40

    H = 0.95

    Figure 3.1: Synthetic tra�c traces generated by RMD1;2.15

$(m = 0, n = 1)$, $(m = 2, n = 2)$, and $(m = 6, n = 4)$ are shown in Figure 3.2. Note that all the simulations are performed with the same values of the random number generator. One way to measure the error is to consider the absolute values of the differences of the increments. An error function may be defined as

$$\mathrm{error} = \frac{\sum_{k=1}^N |\tilde{X}_k - X_k|}{\sum_{k=1}^N |\tilde{X}_k|},$$

i.e., the relative error in the $\ell^1$-norm, where $X_k = Z(t_k) - Z(t_{k-1})$ and $\tilde{X}_k = \tilde{Z}(t_k) - \tilde{Z}(t_{k-1})$. The observed values of the error function for different trace lengths and parameters $m$ and $n$ are shown in Table 3.1. It seems quite obvious that no more than a few (in this case four) nearest neighbors are needed to obtain a good simulation trace. The other remarkable observation is that the length of the trace does not play any significant role in the error. The reason is, of course, the self-similarity of fractional Brownian motion.

                    Length of trace
            2^12     2^14     2^16     2^18
m=0,  n=1   0.2138   0.2203   0.2256   0.2286
m=2,  n=2   0.0126   0.0128   0.0131   0.0132
m=4,  n=3   0.0055   0.0055   0.0056   0.0056
m=6,  n=4   0.0028   0.0028   0.0028   0.0028
m=8,  n=5   0.0020   0.0019   0.0019   0.0018
m=10, n=6   0.0014   0.0015   0.0014   0.0013

Table 3.1: Relative $\ell^1$-error as a function of trace length.

3.1.2 Hurst Parameter

We next assess the method by evaluating the difference between the target (input) and output Hurst parameters. The evaluation is performed by randomly generating 100 traces for each target Hurst parameter and estimating the output Hurst parameter with the wavelet estimation method [3, 2], which has been shown to be unbiased and robust (against deterministic trends, for example). These 100 estimates for each target Hurst parameter value are then averaged and plotted in Figure 3.3 against 16 choices of (m, n) pairs ranging from (0, 1) (ordinary RMD) to (20, 10). The estimation is set up so that the 95% confidence intervals are the same for all the cases we simulated.
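The relative $\ell^1$ error of Table 3.1 is simple to compute from two traces sampled at the same points; a sketch of ours:

```python
import numpy as np

def rel_l1_error(z_ref: np.ndarray, z: np.ndarray) -> float:
    """Relative l1 error between the increment sequences of a reference
    trace and an approximate trace sampled at the same time points."""
    x_ref, x = np.diff(z_ref), np.diff(z)
    return float(np.abs(x_ref - x).sum() / np.abs(x_ref).sum())

# Identical traces give zero error; a perturbed trace gives a positive one.
z = np.cumsum(np.random.default_rng(0).standard_normal(1000))
assert rel_l1_error(z, z) == 0.0
assert rel_l1_error(z, z + 0.01 * np.sin(np.arange(1000.0))) > 0.0
```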

  • 0.2 0.4 0.6 0.8 1

    -0.01

    0.01

    0.02

    0.03

    0.2 0.4 0.6 0.8 1

    -0.002

    -0.001

    0.001

    0.2 0.4 0.6 0.8 1

    -0.0006

    -0.0004

    -0.0002

    0.0002

    0.0004

    0.0006

    Figure 3.2: Di�erences between the reference trace (m = 50; n = 25) and traces(m = 0; n = 1), (m = 2; n = 2) and (m = 6; n = 4).17

Figure 3.3: Comparing the estimated Hurst parameters of the traces generated by the RMD_{m,n} algorithm with the target Hurst parameter values (one panel per target H = 0.5, 0.6, 0.7, 0.8, 0.9, 0.95; the horizontal axis shows the m:n pairs from 0:1 to 20:10).

We can see that in general the quality of the traces is quite good, i.e., the Hurst parameters of the simulated traces are close to the target values, and, as expected, the quality of the generated traces improves as m and/or n increase. Note that traditional RMD produces traces with too low a Hurst parameter, and the same holds, to a lesser extent, for RMD_{m,n} when m and n are very small, with one surprising exception: with m = 1 and n = 2, the Hurst parameters of the synthetic traces are extremely close to the target values.

3.2 Comparison with Other Methods

In this section, we compare the RMD_{m,n} algorithm with two other popular generation methods: the fast Fourier transform (FFT) method (note that a version of the FFT method was proposed in the ATM Forum as a way to generate synthetic self-similar traffic traces) and the method based on aggregating ON-OFF sources with heavy-tailed sojourn times. The comparison is based on time complexity and on the quality of the generated traces in terms of the match between the Hurst parameter values of the synthetic traces and their targets.

The left panel of Figure 3.4 demonstrates the quality of the FFT method by depicting the difference between the average Hurst parameter estimate of 100 generated traces and its target value, against the target Hurst parameter value. The right panel shows the same figure-of-merit for the RMD_{1,2} algorithm. We can see that, in terms of the Hurst parameter estimates of the generated traces, the two algorithms are quite comparable.

For the aggregation method, we found that the Hurst parameter value of the generated trace is close to its target value only asymptotically, i.e., when the time scale is large. However, the convergence to the asymptotic result (i.e., the target Hurst value) may be slow. This is demonstrated in Figure 3.5, which shows the variance-time plot [10] of a trace generated by the aggregation method with a target Hurst parameter value of 0.5.
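The variance-time statistic behind Figure 3.5 can be sketched as follows (our illustration, not the paper's code; for the increments of an H-self-similar process the points in the log-log plot fall on a line of slope roughly 2H - 2):

```python
import numpy as np

def variance_time(x: np.ndarray, scales) -> np.ndarray:
    """Sample variance of the block-averaged series at each aggregation
    level m; log10(variance) vs. log10(m) is the variance-time plot."""
    out = []
    for m in scales:
        nblocks = len(x) // m
        blocks = x[:nblocks * m].reshape(nblocks, m).mean(axis=1)
        out.append(blocks.var())
    return np.array(out)

# For white noise (H = 1/2) the variance decays like 1/m, i.e. slope -1.
x = np.random.default_rng(0).standard_normal(100_000)
v = variance_time(x, [1, 10, 100, 1000])
```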
We see that the Hurst parameter estimate is close to 0.5 (i.e., the curve is parallel to the dotted line) only when the time scale is sufficiently large. This issue could potentially hinder the use of the aggregation method in practice. (Note that the theory states that aggregation in time and space (i.e., in the number of sources) with proper normalization is required for the convergence to FBM to take place; thus the discussion here on the quality and complexity of the aggregation-based method needs to be taken with care. In any case, one needs to be aware of the trade-off between accuracy and time complexity caused by the need for large aggregations (both in space and time) when using the aggregation-based method.)

The time complexities of the three algorithms (for generating a synthetic trace with 2^17 points) are compared in Figure 3.6, with the

[Figure: two panels plotting the difference between the Hurst parameter estimate and the target Hurst parameter (vertical axis, -0.004 to 0.004) against the target Hurst parameter (horizontal axis, 0.5 to 0.9); left panel: estimates based on the FFT method; right panel: estimates based on RMD1,2.]

Figure 3.4: Comparing the quality of traces generated by the FFT method and the RMD1,2 algorithm.
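The FFT method evaluated above synthesizes fractional Gaussian noise (fGn) in the frequency domain. As a self-contained illustration of the idea, the following sketch uses the closely related circulant-embedding (Davies-Harte) construction rather than the approximate spectral synthesis compared in the paper; all names and parameters here are ours.

```python
import numpy as np

def fgn_circulant(n, H, seed=0):
    """Fractional Gaussian noise of length n via circulant embedding:
    embed the Toeplitz autocovariance matrix in a circulant one,
    diagonalize it with the FFT, and colour complex white noise with
    the square roots of the eigenvalues."""
    k = np.arange(n)
    # fGn autocovariance: gamma(k) = (|k+1|^2H - 2|k|^2H + |k-1|^2H)/2
    gamma = 0.5 * ((k + 1.0) ** (2 * H) - 2 * k ** (2.0 * H)
                   + np.abs(k - 1.0) ** (2 * H))
    row = np.concatenate([gamma, gamma[-2:0:-1]])  # circulant row, length 2n-2
    lam = np.fft.fft(row).real                     # eigenvalues of the circulant
    lam = np.maximum(lam, 0.0)                     # clip round-off negatives
    rng = np.random.default_rng(seed)
    M = len(row)
    w = rng.standard_normal(M) + 1j * rng.standard_normal(M)
    y = np.fft.fft(np.sqrt(lam) * w) / np.sqrt(M)
    return y.real[:n]                              # increments; cumsum gives FBM
```

Like the FFT method discussed in the text, this is a strictly top-down construction: the whole trace is produced at once, which is exactly the limitation that motivates the on-the-fly RMDmn algorithm.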

[Figure: variance-time plot, log10(variances) (vertical axis, -1.0 to 1.0) against log10(m) (horizontal axis, 0 to 4).]

Figure 3.5: Variance-time plot of a trace generated by the aggregation method with target Hurst parameter value of 0.5.

left panel showing the computation time in seconds against the length of the generated trace and the right panel showing the same time against the base-2 logarithm of the trace length. We see that the generation time of all three algorithms increases linearly with the length of the trace; RMD1,2 is slightly faster than the FFT method, and both are much faster than the aggregation method. (All three algorithms were coded in C and no effort was made to optimize the code; the differences in absolute computation time may therefore be debatable.) Note that the time complexity of the aggregation method depends on the number of sources and on the target Hurst parameter value; the curve shown here is for a target Hurst parameter value of 0.7 and 30 aggregated ON-OFF sources. Furthermore, the aggregation method was originally suggested in the context of a parallel computing architecture [22], where it can be implemented simply by multiplexing. For the FFT or RMDmn algorithms, it is not clear whether the benefit of parallelization is as large as for the aggregation-based method.
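The aggregation method referred to here multiplexes independent ON-OFF sources whose sojourn times are heavy-tailed. A minimal sketch, under our own assumptions (discrete time slots, Pareto sojourns, a random initial state; the paper's own experiments used a C implementation with 30 sources), follows; the limit theory [22] gives H = (3 - alpha)/2 for Pareto tail index 1 < alpha < 2.

```python
import numpy as np

def onoff_trace(n_steps, n_sources, alpha=1.4, seed=0):
    """Aggregate independent ON-OFF sources whose ON and OFF sojourn
    times are heavy-tailed (Pareto with 1 < alpha < 2, i.e., infinite
    variance).  In the limit of many sources and large time scales the
    aggregate approaches FBM traffic with H = (3 - alpha) / 2."""
    rng = np.random.default_rng(seed)
    trace = np.zeros(n_steps)
    for _ in range(n_sources):
        t = 0
        on = rng.random() < 0.5              # random initial state
        while t < n_steps:
            # Pareto sojourn of at least one slot (rng.pareto is Lomax)
            d = int(np.ceil(rng.pareto(alpha) + 1.0))
            if on:
                trace[t:t + d] += 1.0        # source emits one unit per slot
            t += d
            on = not on
    return trace
```

For example, `onoff_trace(2**17, 30, alpha=1.6)` targets H = 0.7 with 30 sources, matching the configuration used for the timing curve in Figure 3.6; the slow convergence discussed above means the Hurst parameter of such a trace is reliable only at large time scales.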

[Figure: two panels of time complexity in seconds (vertical axis, 0 to 400); left panel against trace length (0 to 10^6), right panel against log2 of the trace length (14 to 20); curves for the FFT method, RMD_mn, and the aggregation method.]

Figure 3.6: Comparing the time complexity of the FFT method, the RMDmn algorithm, and the aggregation method.

3.3 Implementation Issues and Computational Complexity

This section presents the main structure of our implementation and some statistics about its runtime.

Because the covariance matrices \Gamma_{ik} are needed at each step of the algorithm and, furthermore, they are independent of i when multiplied by the scaling term 2^{-2Hi}, it is natural to calculate them at the beginning. Equivalently to (2.2) we have

e(i,k) = \Gamma_{ik}^{(1,2)} \left( \Gamma_{ik}^{(2,2)} \right)^{-1}

and

v(i,k) = \frac{\det \Gamma_{ik}}{\det \Gamma_{ik}^{(2,2)}} = \mathrm{Var}(X_{i,2k+1}) - \Gamma_{ik}^{(1,2)} \left( \Gamma_{ik}^{(2,2)} \right)^{-1} \Gamma_{ik}^{(2,1)},

where the last equality follows from simple matrix manipulations. Because \Gamma_{ik}^{(2,2)} is a symmetric matrix, we can solve for e(i,k) and v(i,k) using the Cholesky method. (If m and n are small enough, e(i,k) and v(i,k) can be solved exactly, e.g., with Mathematica; thus, if one knows beforehand that neither large m nor large n is needed, it may be better to use the exact covariance matrices.) After the initialization of e(i,k) and v(i,k), carrying out the FBM simulation is only a matter of generating random numbers and performing simple matrix algebra.

Runtime as a function of trace length and of the conditionalizing parameters is shown in Table 3.2. The real runtime varied from one second to one minute, demonstrating that the algorithm is indeed fast. The difference in runtime between traces with different m and n is mainly due to the time spent in initialization: the bigger the covariance matrix, the slower the Cholesky method. Once e(i,k) and v(i,k) have been solved, the simulation runs at about the same speed for all m and n that are not too big, say, smaller than 30.

Conditionalizing        Length of trace
parameters          2^12   2^15   2^18   2^21
m = 0,  n = 1       0.02   0.04   0.26   3.39
m = 4,  n = 3       0.02   0.04   0.26   3.40
m = 10, n = 6       0.02   0.05   0.27   3.42
m = 20, n = 12      0.07   0.10   0.32   3.49
m = 30, n = 20      0.28   0.32   0.53   3.70

Table 3.2: CPU time in seconds elapsed when simulating FBM on an HP9000/J210XC workstation.

4 Conclusions

In this paper, we considered an approximate FBM generation method, RMDmn, based on the concept of bisection and interpolation, which is a generalization

of the ordinary random midpoint displacement (RMD) algorithm. We further modified RMDmn to generate FBM traces on-the-fly. We discussed both mathematical and numerical aspects of RMDmn and compared it with two other methods widely favoured for generating FBM traces for teletraffic modeling purposes: the fast Fourier transform (FFT) method and the method based on aggregating a large number of ON-OFF sources with infinite-variance sojourn times.

One should note that a fair comparison between different methods is a very difficult task, since all approximate generation methods have their pros and cons; by choosing which aspects to compare, almost any method can be oversold. Our comparison is strictly based on the criteria mentioned in this paper, and other criteria, such as the need to demultiplex the trace into individual traffic streams, may need to be considered when choosing a traffic generation method in practice.

We have made the following observations:

- the time complexity of all three algorithms grows linearly with the desired trace length;
- the aggregation method requires time scales a couple of orders of magnitude larger to converge to the asymptotic result, and is therefore much slower;
- the FFT method is comparable to the RMDmn algorithm in terms of quality and time complexity; however, the FFT method is strictly a top-down algorithm, that is, the whole trace has to be generated before it can be used in any application;
- the RMDmn algorithm gives quite satisfactory results both in terms of trace quality and computational efficiency, in addition to its ability to generate synthetic traces on-the-fly, and thus, in our opinion, is the best choice among the three for FBM traffic generation;
- it is possible to design algorithms based on other techniques (e.g., wavelets) that are superior to RMDmn in some respects, but matching the simplicity of the RMDmn algorithm may be difficult.

As a final remark, we note that it is straightforward to generalize RMDmn to produce good traces of any centered Gaussian process Y with stationary increments and a variance function v(t) = E Y_t^2. The only difference from our case is that, without self-similarity, one has to generate the needed multiplier vectors and constants for each resolution separately, which increases the memory requirements and the duration of the initialization phase. Once this has been done, the generation algorithm runs as fast as in our case.
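The generalization just described rests on computing the conditional mean weights and conditional variance from the covariance implied by the variance function, since stationary increments and Y(0) = 0 give Cov(Y_s, Y_t) = (v(s) + v(t) - v(|t - s|))/2. A minimal sketch with our own naming (the paper solves the analogous linear system with the Cholesky method; here a direct solve is used for brevity):

```python
import numpy as np

def conditional_law(v, known_t, t_new):
    """Conditional mean weights e and conditional variance of Y(t_new)
    given the values of Y at the points known_t, for a centered
    Gaussian process with stationary increments, Y(0) = 0 and
    variance function v(t) = E Y(t)^2."""
    s = np.asarray(known_t, dtype=float)

    def cov(a, b):
        # Covariance implied by stationary increments and Y(0) = 0.
        return 0.5 * (v(a) + v(b) - v(np.abs(a - b)))

    G = cov(s[:, None], s[None, :])   # covariances among the known points
    g = cov(t_new, s)                 # covariances of the new point vs known
    e = np.linalg.solve(G, g)         # weight vector (cf. e(i, k))
    var = cov(t_new, t_new) - g @ e   # conditional variance (cf. v(i, k))
    return e, var
```

For FBM one takes v(t) = |t|^(2H); with H = 1/2 (ordinary Brownian motion) and known points 0.25 and 1, the midpoint 0.5 gets the familiar linear-interpolation weights (2/3, 1/3) and bridge variance 1/6. The conditional sample is then mean + sqrt(var) times a standard Gaussian, exactly as in the RMDmn recursion.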

References

[1] P. Abry and D. Sellan. The wavelet-based synthesis for the fractional Brownian motion proposed by F. Sellan and Y. Meyer: Remarks and fast implementation. Appl. Comp. Harmonic Anal., 3:377-383, 1996.

[2] P. Abry and D. Veitch. Long-range dependence: Revisiting aggregation with wavelets. Journal of Time Series Analysis, 19(3):253-266, 1998.

[3] P. Abry and D. Veitch. Wavelet analysis of long-range dependent traffic. IEEE Trans. Info. Theory, 44(1):2-15, 1998.

[4] D.R. Cox. Long-range dependence: A review. In H.A. David and H.T. David, editors, Statistics: An Appraisal, pages 55-74. The Iowa State University Press, Ames, Iowa, 1984.

[5] P. Flandrin. Wavelet analysis and synthesis of fractional Brownian motion. IEEE Transactions on Information Theory, 38(2), 1992.

[6] C.W.J. Granger. Long memory relationships and the aggregation of dynamic models. J. Econometr., 14:227-238, 1980.

[7] J.L. Jerkins and J.L. Wang. A measurement analysis of ATM cell-level aggregate traffic. In Proc. IEEE Globecom, pages 1589-1595, Phoenix, AZ, USA, November 1997.

[8] A.N. Kolmogorov. Wienersche Spiralen und einige andere interessante Kurven im Hilbertschen Raum. C.R. (Doklady) Acad. Sci. USSR (N.S.), 26:115-118, 1940.

[9] W.-C. Lau, A. Erramilli, J.L. Wang, and W. Willinger. Self-similar traffic generation: The random midpoint displacement algorithm and its properties. In 1995 IEEE International Conference on Communications (ICC'95), pages 466-472, Seattle, USA, 1995.

[10] W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson. On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Transactions on Networking, 2(1):1-15, February 1994.

[11] B.B. Mandelbrot. A fast fractional Gaussian noise generator. Water Resources Research, 7:543-553, 1971.

[12] B.B. Mandelbrot. Fractals: Form, Chance, and Dimension. W.H. Freeman and Company, San Francisco, 1977.

[13] B.B. Mandelbrot and J.W. Van Ness. Fractional Brownian motions, fractional noises and applications. SIAM Review, 10:422-437, 1968.

[14] I. Norros. Studies on a model for connectionless traffic, based on fractional Brownian motion. Technical Report 242TD(92)041, COST, 1992. (Presented also in June 1993 at the Conference on Applied Probability in Engineering, Computer and Communication Sciences, Paris.)

[15] I. Norros. On the use of fractional Brownian motion in the theory of connectionless networks. IEEE Journal on Selected Areas in Communications, 13(6), August 1995.

[16] I. Norros, E. Valkeila, and J. Virtamo. An elementary approach to a Girsanov formula and other analytical results on fractional Brownian motions. To appear in Bernoulli in 1999.

[17] V. Paxson. Fast, approximate synthesis of fractional Gaussian noise for generating self-similar network traffic. Computer Communication Review, 27(5):5-18, 1997.

[18] V. Paxson and S. Floyd. Wide-area traffic: The failure of Poisson modeling. In Proc. ACM Sigcomm, pages 257-268, London, UK, 1994.

[19] P. Pruthi. An Application of Chaotic Maps to Packet Traffic Modeling. PhD thesis, Royal Institute of Technology, Dept of Teleinformatics, 1995. ISSN 1103-534X.

[20] F. Sellan. Synthèse de mouvements browniens fractionnaires à l'aide de la transformation par ondelettes. C.R. Acad. Sci. Paris, Série I, 321:351-358, 1995.

[21] M.A. Stoksik, R.G. Lane, and D.T. Nguyen. Accurate synthesis of fractional Brownian motion using wavelets. Electronics Letters, 30(5):383-384, 1994.

[22] W. Willinger, M.S. Taqqu, R. Sherman, and D.V. Wilson. Self-similarity through high-variability: Statistical analysis of Ethernet LAN traffic at the source level. In Proc. ACM Sigcomm'95, pages 100-113, Cambridge, MA, USA, 1995.