Persistence 1

Embed Size (px)

Citation preview

  • Computing Persistent Homology

    Afra Zomorodian and Gunnar Carlsson

    (Discrete and Computational Geometry)

    Abstract

    We show that the persistent homology of a filtered d-dimensional simplicial complex is simply the standard homol-ogy of a particular graded module over a polynomial ring. Ouranalysis establishes the existence of a simple description ofpersistent homology groups over arbitrary fields. It also en-ables us to derive a natural algorithm for computing persistenthomology of spaces in arbitrary dimension over any field. Thisresult generalizes and extends the previously known algorithmthat was restricted to subcomplexes of S3 and Z2 coefficients.Finally, our study implies the lack of a simple classificationover non-fields. Instead, we give an algorithm for computingindividual persistent homology groups over an arbitrary prin-cipal ideal domains in any dimension.

    1 IntroductionIn this paper, we study the homology of a filtered d-dimensional simplicial complex K, allowing an arbi-trary principal ideal domain D as the ground ring of co-efficients. A filtered complex is an increasing sequenceof simplicial complexes, as shown in Figure 1. It deter-mines an inductive system of homology groups, i.e., afamily of Abelian groups {Gi}i0 together with homo-morphisms Gi Gi+1. If the homology is computedwith field coefficients, we obtain an inductive system ofvector spaces over the field. Each vector space is deter-mined up to isomorphism by its dimension. In this paperwe obtain a simple classification of an inductive systemof vector spaces. Our classification is in terms of a set of

    Research by the first author is partially supported by NSF un-der grants CCR-00-86013 and ITR-0086013. Research by the sec-ond author is partially supported by NSF under grant DMS-0101364.Research by both authors is partially supported by NSF under grantDMS-0138456.

    Department of Computer Science, Stanford University, Stanford,California.

    Department of Mathematics, Stanford University, Stanford, Cali-fornia.

    d

    a b

    c

    2 cd, ad

    d

    a b

    cd

    a b

    cd

    a b

    c

    3 ac 4 abc 5 acd

    a b

    d

    a b

    c

    1 c, d,a, b0 ab, bc

    Figure 1. A filtered complex with newly added simplices high-lighted.

    intervals. We also derive a natural algorithm for comput-ing this family of intervals. Using this family, we mayidentify homological features that persist within the fil-tration, the persistent homology of the filtered complex.

    Furthermore, our interpretation makes it clear that ifthe ground ring is not a field, there exists no similarlysimple classification of persistent homology. Rather, thestructures are very complicated, and although we mayassign interesting invariants to them, no simple classifi-cation is, or is likely ever to be, available. In this casewe provide an algorithm for computing a single persis-tent group for the filtration.

    In the rest of this section we first motivate our studythrough three examples in which filtered complexesarise whose persistent homology is of interest. We thendiscuss prior work and its relationship to our work. Weconclude this section with an outline of the paper.

    1.1 Motivation

    We call a filtered simplicial complex, along with its as-sociated chain and boundary maps, a persistence com-plex. We formalize this concept in Section 3. Per-sistence complexes arise naturally whenever one is at-tempting to study topological invariants of a space com-putationally. Often, our knowledge of this space is lim-ited and imprecise. Consequently, we must utilize amultiscale approach to capture the connectivity of thespace, giving us a persistence complex.

  • Example 1.1 (point cloud data) Suppose we are givena finite set of pointsX from a subspaceX Rn. We callX point cloud data or PCD for short. It is reasonable tobelieve that if the sampling is dense enough, we shouldbe able to compute the topological invariants of X di-rectly from the PCD. To do so, we may either computethe Cech complex, or approximate it via a Rips com-plex [15]. The latter complex R(X) has X as its vertexset. We declare a set of vertices = {x0, x1, . . . , xk}to span a k-simplex of R(X) iff the vertices are pair-wise close, that is, d(xi, xj) for all pairs xi, xj .There is an obvious inclusion R(X) R(X) when-ever < . In other words, for any increasing sequenceof non-negative real numbers, we obtain a persistencecomplex.

    Example 1.2 (density) Often, our samples are not froma geometric object, but are heavily concentrated on it. Itis important, therefore, to compute a measure of den-sity of the data around each sample. For instance, wemay count the number of samples (x) contained in aball of size around each sample x. We may then de-fine Rr (X) R to be the Rips subcomplex on thevertices for which (x) r. Again, for any increasingsequence of non-negative real numbers r, we obtain apersistence complex. We must analyze this complex tocompute topological invariants attached to the geomet-ric object around which our data is concentrated.

    Example 1.3 (Morse functions) Given a manifold Mequipped with a Morse function f , we may filter Mvia the excursion sets Mr = {m M | f(m) r}.We again choose an increasing sequence of non-negativenumbers to get a persistence complex. If the Morsefunction is a height function attached to some embed-ding of M in Rn, persistent homology can now give in-formation about the shape of the submanifolds, as wellthe homological invariants of the total manifold.

    1.2 Prior WorkWe assume familiarity with basic group theory and referthe reader to Dummit and Foote [10] for an introduc-tion. We make extensive use of Munkres [16] in our de-scription of algebraic homology and recommend it as anaccessible resource to non-specialists. There is a largebody of work on the efficient computation of homologygroups and their ranks [1, 8, 9, 13].

    Persistent homology groups are initially defined in[11, 18]. The authors also provide an algorithm thatworked only for spaces that were subcomplexes of S3overZ2 coefficients. The algorithm generates a set of in-tervals for a filtered complex. Surprisingly, the authors

    show that these intervals allowed the correct computa-tion of the rank of persistent homology groups. In otherwords, the authors prove constructively that persistenthomology groups of subcomplexes of S3, if computedover Z2 coefficients, have a simple description in termsof a set of intervals. To build these intervals, the algo-rithm pairs positive cycle-creating simplices with neg-ative cycle-destroying simplices. During the computa-tion, the algorithm ignores negative simplices and al-ways looks for the youngest simplex. While the authorsprove the correctness of the results of the algorithm, theunderlying structure remains hidden.

    1.3 Our WorkWe are motivated primarily by the unexplained resultsof the previous work. We wish to answer the followingquestions:

    1. Why does a simple description exist for persistenthomology of subcomplexes of S3 over Z2?

    2. Does this description also exist over other ringsof coefficients and arbitrary-dimensional simplicialcomplexes?

    3. Why can we ignore negative simplices during com-putation?

    4. Why do we always look for the youngest simplex?

    5. What is the relationship between the persistence al-gorithm and the standard reduction scheme?

    In this paper we resolve all these questions by uncov-ering and elucidating the structure of persistent homol-ogy. Specifically, we show that the persistent homologyof a filtered d-dimensional simplicial complex is simplythe standard homology of a particular graded moduleover a polynomial ring. Our analysis places persistenthomology within the classical framework of algebraictopology. This placement allows us to utilize a standardstructure theorem to establish the existence of a simpledescription of persistent homology groups as a set ofintervals, answering the first question above. This de-scription exists over arbitrary fields, not just Z2 as in theprevious result, resolving the second question.

    Our analysis also enables us to derive a persistencealgorithm from the standard reduction scheme in alge-bra, resolving the next three questions using two mainlemmas. Our algorithm generalizes and extends the pre-viously known algorithm to complexes in arbitrary di-mensions over arbitrary fields of coefficients. We alsoshow that if we consider integer coefficients or coeffi-cients in some non-field R, there is no similar simple

    2

  • classification. This negative result suggests the possibil-ity of interesting yet incomplete invariants of inductivesystems. For now, we give an algorithm for classifying asingle homology group over an arbitrary principal idealdomain.

    1.4 Spectral SequencesAny filtered complex gives rise to a spectral sequence,so it is natural to wonder about the relationship be-tween this sequence and persistence. A full discussionon spectral sequences is outside the scope of this pa-per. However, we include a few remarks here for thereader who is familiar with the subject. We may easilyshow that the persistence intervals for a filtration corre-spond to nontrivial differentials in the spectral sequencethat arises from the filtration. Specifically, an interval oflength r corresponds to some differential dr+1. Giventhis correspondence, we realize that the method of spec-tral sequences computes persistence intervals in order oflength, finding all intervals of length r during the com-putation of the Er+1 term. In principle, we may use thismethod to compute the result of our algorithm. How-ever, the method does not provide an algorithm, but ascheme that must be tailored for each problem indepen-dently. The practitioner must decide on an appropriatebasis, find the zero terms in the sequence, and deducethe nature of the differentials. Our analysis of persis-tent homology, on the other hand, provides a complete,effective, and implementable algorithm for any filteredcomplex.

    1.5 OutlineWe begin by reviewing concepts from algebra and sim-plicial homology in Section 2. We also re-introducepersistent homology over integers and arbitrary dimen-sions. In Section 3 we define and study the persistencemodule, a structure that represents the homology of a fil-tered complex. In addition, we establish a relationshipbetween our results and prior work. Using our analy-sis, we derive an algorithm for computation over fieldsin Section 4. For non-fields, we describe an algorithmin Section 5 that computes individual persistent groups.Section 6 describes our implementation and some ex-periments. We conclude the paper in Section 7 with adiscussion of current and future work.

    2 BackgroundIn this section we review the mathematical and algo-rithmic background necessary for our work. We beginby reviewing the structure of finitely generated modules

    over principal ideal domains. We then discuss simpli-cial complexes and their associated chain complexes.Putting these concepts together, we define simplicial ho-mology and outline the standard algorithm for its com-putation. We conclude this section by describing persis-tent homology.

    2.1 AlgebraThroughout this paper we assume a ring R to be com-mutative with unity. A polynomial f(t) with coefficientsin R is a formal sum

    i=0 ait

    i, where ai R and t is

    the indeterminate. For example, 2t+ 3 and t7 5t2 areboth polynomials with integer coefficients. The set ofall polynomials f(t) over R forms a commutative ringR[t] with unity. If R has no divisors of zero, and all itsideals are principal, it is a principal ideal domain (PID).For our purposes, a PID is simply a ring in which wemay compute the greatest common divisor or gcd of apair of elements. This is the key operation needed bythe structure theorem that we discuss below. PIDs in-clude the familiar rings Z, Q, and R. Finite fields Zpfor p a prime, as well as F [t], polynomials with coeffi-cients from a field F , are also PIDs and have effectivealgorithms for computing the gcd [6].

    A graded ring is a ring R,+, equipped witha direct sum decomposition of Abelian groups R =

    iRi, i Z, so that multiplication is defined by bi-linear pairings Rn Rm Rn+m. Elements in asingle Ri are called homogeneous and have degree i,deg e = i for all e Ri. We may grade the polyno-mial ring R[t] non-negatively with the standard grading(tn) = tn R[t], n 0. In this grading, 2t6 and 7t3 areboth homogeneous of degree 6 and 3, respectively, buttheir sum 2t6 + 7t3 is not homogeneous. The productof the two terms, 14t9, has degree 9 as required by thedefinition. A graded module M over a graded ring Ris a module equipped with a direct sum decomposition,M = iMi, i Z, so that the action of R on M isdefined by bilinear pairings Rn Mm Mn+m. Themain structure in our paper is a graded module and weinclude concrete examples that clarify this concept lateron. A graded ring (module) is non-negatively graded ifRi = 0 (Mi = 0) for all i < 0.

    The standard structure theorem describes finitely gen-erated modules and graded modules over PIDs.

    Theorem 2.1 (structure) If D is a PID, then everyfinitely generated D-module is isomorphic to a directsum of cyclic D-modules. That is, it decomposesuniquely into the form

    D (

    mi=1

    D/diD

    ), (1)

    3

  • for di D, Z, such that di|di+1. Similarly, ev-ery graded module M over a graded PID D decomposesuniquely into the form

    (n

    i=1

    iD

    ) m

    j=1

    jD/djD

    , (2)where dj D are homogeneous elements so thatdj |dj+1, i, j Z, and denotes an -shift upwardin grading.

    In both cases, the theorem decomposes the structuresinto two parts. The free portion on the left includesgenerators that may generate an infinite number of el-ements. This portion is a vector space and should befamiliar to most readers. Decomposition (1) has a vec-tor space of dimension . The torsional portion on theright includes generators that may generate a finite num-ber of elements. For example, if PID D is Z in the theo-rem, Z/3Z = Z3 would represent a generator capable ofonly creating three elements. These torsional elementsare also homogeneous. Intuitively then, the theorem de-scribes finitely generated modules and graded modulesas structures that look like vector spaces but also havesome dimensions that are finite in size.

    2.2 Simplicial Complexes

    A simplicial complex is a set K, together with a col-lection S of subsets of K called simplices (singular sim-plex) such that for all v K, {v} S, and if S,then S. We call the sets {v} the vertices ofK. Whenit is clear from context what S is, we refer to set K asa complex. We say S is a k-simplex of dimensionk if || = k + 1. If , is a face of , and is a coface of . An orientation of a k-simplex , ={v0, . . . , vk}, is an equivalence class of orderings of thevertices of , where (v0, . . . , vk) (v(0), . . . , v(k))are equivalent if the sign of is 1. We denote an ori-ented simplex by []. A simplex may be realized geo-metrically as the convex hull of k + 1 affinely indepen-dent points in Rd, d k. A realization gives us thefamiliar low-dimensional k-simplices: vertices, edges,triangles, and tetrahedra, for 0 k 3, shown in Fig-ure 2. Within a realized complex, the simplices mustmeet along common faces. A subcomplex of K is a sub-set L K that is also a simplicial complex. A filtrationof a complex K is a nested subsequence of complexes = K0 K1 . . . Km = K. For generality,we let Ki = Km for all i m. We call K a filteredcomplex. We show a small filtered complex in Figure 1.

    tetrahedron

    a

    b

    c

    d

    [a, b, c, d]

    c

    a

    b

    triangle[a, b, c]

    edge

    a b

    [a, b]vertex

    a

    a

    Figure 2. Oriented k-simplices in R3, 0 k 3. The orien-tation on the tetrahedron is shown on its faces.

    2.3 Chain ComplexThe kth chain group Ck of K is the free Abelian groupon its set of oriented k-simplices, where [] = [ ] if = and and are differently oriented.. An elementc Ck is a k-chain, c =

    i ni[i], i K with coeffi-

    cients ni Z. The boundary operator k : Ck Ck1is a homomorphism defined linearly on a chain c by itsaction on any simplex = [v0, v1, . . . , vk] c,

    k =i

    (1)i[v0, v1, . . . , vi, . . . , vk],

    where vi indicates that vi is deleted from the sequence.The boundary operator connects the chain groups into achain complex C:

    Ck+1 k+1 Ck k Ck1 .We may also define subgroups of Ck using the boundaryoperator: the cycle group Zk = ker k and the bound-ary group Bk = im k+1. We show examples of cy-cles in Figure 3. An important property of the boundaryoperators is that the boundary of a boundary is alwaysempty, kk+1 = 0. This fact, along with the defi-nitions, implies that the defined subgroups are nested,Bk Zk Ck, as in Figure 4. For generality, we oftendefine null boundary operators in dimensions where Ckis empty.

    2.4 HomologyThe kth homology group is Hk = Zk/Bk. Its elementsare classes of homologous cycles. To describe its struc-ture, we view the Abelian groups we have defined so far

    Figure 3. The dashed 1-boundary rests on the surface of atorus. The two solid 1-cycles form a basis for the first homologyclass of the torus. These cycles are non-bounding: neither is aboundary of a piece of surface.

    4

  • Ck

    Bk1

    Zk1

    Ck1k+1 kCk+1

    0 00

    Z k

    kB

    Z k+1

    k+1B

    Figure 4. A chain complex with its internals: chain, cycle, andboundary groups, and their images under the boundary opera-tors.

    as modules over the integers. This view allows alter-nate ground rings of coefficients, including fields. If thering is a PID D, Hk is a D-module and Theorem (2.1)applies: , the rank of the free submodule, is the Bettinumber of the module, and di are its torsion coefficients.When the ground ring is Z, the theorem above describesthe structure of finitely generated Abelian groups. Overa field, such asR,Q, orZp for p a prime, the torsion sub-module disappears. The module is a vector space that isfully described by a single integer, its rank , which de-pends on the chosen field.

    2.5 ReductionThe standard method for computing homology is the re-duction algorithm. We describe this method for integercoefficients as it is the more familiar ring. The methodextends to modules over arbitrary PIDs, however.

    As Ck is free, the oriented k-simplices form the stan-dard basis for it. We represent the boundary operatork : Ck Ck1 relative to the standard bases of thechain groups as an integer matrix Mk with entries in{1, 0, 1}. The matrix Mk is called the standard ma-trix representation of k. It has mk columns and mk1rows (the number of k- and (k 1)-simplices, respec-tively). The null-space of Mk corresponds to Zk and itsrange-space to Bk1, as manifested in Figure 4. The re-duction algorithm derives alternate bases for the chaingroups, relative to which the matrix for k is diagonal.The algorithm utilizes the following elementary row op-erations on Mk:

    1. exchange row i and row j,2. multiply row i by 1,3. replace row i by (row i) + q(row j), where q is an

    integer and j 6= i.The algorithm also uses elementary column operationsthat are similarly defined. Each column (row) opera-tion corresponds to a change in the basis for Ck (Ck1).For example, if ei and ej are the ith and jth basis ele-ments for Ck, respectively, a column operation of type(3) amounts to replacing ei with ei+ qej . A similar row

    operation on basis elements ei and ej for Ck1, how-ever, replaces ej by ej qei. We shall make use of thisfact in Section 4. The algorithm systematically modifiesthe bases of Ck and Ck1 using elementary operationsto reduce Mk to its (Smith) normal form:

    Mk =

    b1 0.

    .

    . 00 blk

    0 0

    ,

    where lk = rankMk = rank Mk, bi 1, and bi|bi+1for all 1 i < lk. The algorithm can also computecorresponding bases {ej} and {ei} for Ck and Ck1,respectively, although this is unnecessary if a decompo-sition is all that is needed. Computing the normal formin all dimensions, we get a full characterization of Hk:

    (i) the torsion coefficients of Hk1 (di in (1)) are pre-cisely the diagonal entries bi greater than one.

    (ii) {ei | lk+1 i mk} is a basis for Zk. Therefore,rankZk = mk lk.

    (iii) {biei | 1 i lk} is a basis for Bk1. Equiva-lently, rankBk = rankMk+1 = lk+1.

    Combining (ii) and (iii), we have

    k = rankZk rankBk = mk lk lk+1. (3)

    Example 2.1 For the complex in Figure 1, the standardmatrix representation of 1 is

    M1 =

    ab bc cd ad ac

    a 1 0 0 1 1b 1 1 0 0 0c 0 1 1 0 1d 0 0 1 1 0

    ,where we show the bases within the matrix. Reducingthe matrix, we get the normal form

    M1 =

    cd bc ab z1 z2

    d c 1 0 0 0 0c b 0 1 0 0 0b a 0 0 1 0 0a 0 0 0 0 0

    ,where z1 = ad bc cd ab and z2 = ac bc abform a basis for Z1 and {d c, c b, b a} is a basisfor B0.

    5

  • We may use a similar procedure to compute homol-ogy over graded PIDs. A homogeneous basis is a basisof homogeneous elements. We begin by representing krelative to the standard basis of Ck (which is homoge-neous) and a homogeneous basis for Zk1. Reducingto normal form, we read off the description provided bydirect sum (2) using the new basis {ej} for Zk1:

    (i) zero row i contributes a free term with shift i =deg ei,

    (ii) row with diagonal term bi contributes a torsionalterm with homogeneous dj = bj and shift j =deg ej .

    The reduction algorithm requires O(m3) elementaryoperations, where m is the number of simplices in K.The operations, however, must be performed in exactinteger arithmetic. This is problematic in practice, asthe entries of the intermediate matrices may become ex-tremely large.

    2.6 PersistenceWe end this section with by re-introducing persistence.Given a filtered complex, the ith complex Ki has asso-ciated boundary operators ik, matrices M ik, and groupsCik,Zik,Bik and Hik for all i, k 0. Note that super-scripts indicate the filtration index and are not related tocohomology. The p-persistent kth homology group ofKi is

    Hi,pk = Zik / (B

    i+pk Zik). (4)

    The definition is well-defined: both groups in the de-nominator are subgroups of Cl+pk , so their intersectionis also a group, a subgroup of the numerator. The p-persistent kth Betti number of Ki is i,pk , the rank ofthe free subgroup of Hi,pk . We may also define persistenthomology groups using the injection i,pk : Hik Hi+pk ,that maps a homology class into the one that containsit. Then, im i,pk ' Hi,pk [11, 18]. We extend thisdefinition over arbitrary PIDs, as before. Persistent ho-mology groups are modules and Theorem 2.1 describestheir structure.

    3 The Persistence ModuleIn this section we take a different view of persistenthomology in order to understand its structure. Intu-itively, the computation of persistence requires compat-ible bases for Hik and H

    i+pk . It is not clear when a suc-

    cinct description is available for the compatible bases.We begin this section by combining the homology of

    all the complexes in the filtration into a single alge-braic structure. We then establish a correspondence thatreveals a simple description over fields. Most signifi-cantly, we illustrate that the persistent homology of afiltered complex is simply the standard homology of aparticular graded module over a polynomial ring. A sim-ple application of the structure theorem (Theorem 2.1)gives us the needed description. We end this section byillustrating the relationship of our structures to the per-sistence equation (Equation (4).)

    Definition 3.1 (persistence complex) A persistencecomplex C is a family of chain complexes {C i}i0 overR, together with chain maps f i : C i C i+1 , so thatwe have the following diagram:

    C 0f0 C 1 f

    1

    C 2 f2

    .Our filtered complexK with inclusion maps for the sim-plices becomes a persistence complex. Below, we showa portion of a persistence complex, with the chain com-plexes expanded. The filtration index increases hori-zontally to the right under the chain maps f i, and thedimension decreases vertically to the bottom under theboundary operators k.

    3

    y 3y 3yC02

    f0 C12 f1

    C22 f2

    2

    y 2y 2yC01

    f0 C11 f1

    C21 f2

    1

    y 1y 1yC00

    f0 C10 f1

    C20 f2

    Definition 3.2 (persistence module) A persistencemodule M is a family of R-modules M i, together withhomomorphisms i : M i M i+1.For example, the homology of a persistence complex is apersistence module, where i simply maps a homologyclass to the one that contains it.

    Definition 3.3 (finite type) A persistence complex{Ci, f i} (persistence module {M i, i}) is of finite typeif each component complex (module) is a finitely gen-erated R-module, and if the maps f i (i, respectively)are isomorphisms for i m for some integer m.As our complex K is finite, it generates a persistencecomplex C of finite type, whose homology is a persis-tence module M of finite type. We showed in the Intro-duction how such complexes arise in practice.

    6

  • 3.1 CorrespondenceSuppose we have a persistence module M ={M i, i}i0 over ring R. We now equip R[t] with thestandard grading and define a graded module over R[t]by

    (M) =i=0

    M i,

    where the R-module structure is simply the sum of thestructures on the individual components, and where theaction of t is given by

    t(m0,m1,m2, . . .) = (0, 0(m0), 1(m1), 2(m2), . . .).That is, t simply shifts elements of the module up in thegradation.

    Theorem 3.1 (correspondence) The correspondence defines an equivalence of categories between thecategory of persistence modules of finite type over Rand the category of finitely generated non-negativelygraded modules over R[t].

    The proof is the Artin-Rees theory in commutative alge-bra [12].

    Intuitively, we are building a single structure that con-tains all the complexes in the filtration. We begin bycomputing a direct sum of the complexes, arriving at amuch larger space that is graded according to the filtra-tion ordering. We then remember the time each simplexenters using a polynomial coefficient. For instance, sim-plex a enters the filtration in Figure 1 at time 0. To shiftthis simplex along the grading, we must multiply thesimplex using t. Therefore, while a exists at time 0, t aexists at time 1, t2 a at time 2, and so on. The keyidea is that the filtration ordering is encoded in the co-efficient polynomial ring. We utilize these coefficientsin Section 4 to derive the persistence algorithm from thereduction scheme in Section 2.5.

    3.2 DecompositionThe correspondence established by Theorem 3.1 sug-gests the non-existence of simple classifications of per-sistence modules over a ground ring that is not a field,such as Z. It is well known in commutative algebra thatthe classification of modules overZ[t] is extremely com-plicated. While it is possible to assign interesting in-variants to Z[t]-modules, a simple classification is notavailable, nor is it ever likely to be available.

    On the other hand, the correspondence gives us a sim-ple decomposition when the ground ring is a field F .Here, the graded ring F [t] is a PID and its only gradedideals are homogeneous of form (tn), so the structure of

    the F [t]-module is described by sum (2) in Theorem 2.1:(n

    i=1

    iF [t]

    ) m

    j=1

    jF [t]/(tnj )

    . (5)We wish to parametrize the isomorphism classes ofF [t]-modules by suitable objects.

    Definition 3.4 (P-interval) A P-interval is an orderedpair (i, j) with 0 i < j Z = Z {+}.

    We associate a graded F [t]-module to a set S of P-intervals via a bijection Q. We define Q(i, j) =iF [t]/(tji) for P-interval (i, j). Of course,Q(i,+) = iF [t]. For a set of P-intervals S ={(i1, j1), (i2, j2) . . . , (in, jn)}, we define

    Q(S) =nl=1

    Q(il, jl).

    Our correspondence may now be restated as follows.

    Corollary 3.1 The correspondence S Q(S) definesa bijection between the finite sets of P-intervals and thefinitely generated graded modules over the graded ringF [t]. Consequently, the isomorphism classes of persis-tence modules of finite type over F are in bijective cor-respondence with the finite sets of P-intervals.

    3.3 InterpretationBefore proceeding any further, we recap our work so farand relate it to prior results. Recall that our input is afiltered complex K and we are interested in its kth ho-mology. In each dimension the homology of complexKi becomes a vector space over a field, described fullyby its rank ik. We need to choose compatible basesacross the filtration in order to compute persistent ho-mology for the entire filtration. So, we form the persis-tence module corresponding to K, a direct sum of thesevector spaces. The structure theorem states that a ba-sis exists for this module that provides compatible basesfor all the vector spaces. In particular, each P-interval(i, j) describes a basis element for the homology vec-tor spaces starting at time i until time j 1. This ele-ment is a k-cycle e that is completed at time i, forming anew homology class. It also remains non-bounding un-til time j, at which time it joins the boundary group Bjk.Therefore, the P-intervals discussed here are preciselythe so-called k-intervals utilized in [11] to describe per-sistent Z2-homology. That is, while component homol-ogy groups are torsionless, persistence appears as tor-sional and free elements of the persistence module.

    7

  • Our interpretation also allows us to ask when e+ Blkis a basis element for the persistent groups Hl,pk . RecallEquation (4). As e 6 Blk for all l < j, we know thate 6 Bl+pk for l + p < j. Along with l i and p 0,the three inequalities define a triangular region in theindex-persistence plane, as drawn in Figure 5. The re-gion gives us the values for which the k-cycle e is a basiselement for Hl,pk . In other words, we have just shown adirect proof of the k-triangle Lemma in [11], which werestate here in a different form.

    Lemma 3.1 Let T be the set of triangles defined by P-intervals for the k-dimensional persistence module. Therank l,pk of H

    l,pk is the number of triangles in T contain-

    ing the point (l, p).

    Consequently, computing persistent homology over afield is equivalent to finding the corresponding set of P-intervals.

    p 0>

    >l i

    persistence (p)

    index (l)

    (i, 0)

    (i, j i)

    (j, 0)(i, j i)

    (j, 0)(i, 0)

    l+p <

    j

    Figure 5. The inequalities p 0, l i, and l+p < j define atriangular region in the index-persistence plane. This region de-fines when the cycle is a basis element for the homology vectorspace.

    4 Algorithm for FieldsIn this section we devise an algorithm for computingpersistent homology over a field. Given the theoret-ical development of the last section, our approach israther simple: we simplify the standard reduction al-gorithm using the properties of the persistence module.Our arguments give an algorithm for computing the P-intervals for a filtered complex directly over the field F ,without the need for constructing the persistence mod-ule. This algorithm is a generalized version of the pair-ing algorithm shown in [11].

    4.1 DerivationWe use the small filtration in Figure 1 as a running ex-ample and compute over Z2, although any field will do.The persistence module corresponds to a Z2[t]-moduleby the correspondence established in Theorem 2.1. Ta-ble 1 reviews the degrees of the simplices of our filtra-tion as homogeneous elements of this module.

    a b c d ab bc cd ad ac abc acd0 0 1 1 1 1 2 2 3 4 5

    Table 1. Degree of simplices of filtration in Figure 1

    Throughout this section we use {ej} and {ei} to rep-resent homogeneous bases for Ck and Ck1, respec-tively. Relative to homogeneous bases, any represen-tation Mk of k has the following basic property:

    deg ei + degMk(i, j) = deg ej , (6)where Mk(i, j) denotes the element at location (i, j).We get

    M1 =

    ab bc cd ad ac

    d 0 0 t t 0c 0 1 t 0 t2

    b t t 0 0 0a t 0 0 t2 t3

    , (7)for 1 in our example. The reader may verify Equa-tion (6) using this example for intuition, e.g. M1(4, 4) =t2 as deg ad deg a = 2 0 = 2, according to Table 1.

    Clearly, the standard bases for chain groups are ho-mogeneous. We need to represent k : Ck Ck1 rel-ative to the standard basis for Ck and a homogeneousbasis for Zk1. We then reduce the matrix and readoff the description of Hk according to our discussionin Section 2.5. We compute these representations in-ductively in dimension. The base case is trivial. As0 0, Z0 = C0 and the standard basis may be usedfor representing 1. Now, assume we have a matrixrepresentation Mk of k relative to the standard basis{ej} for Ck and a homogeneous basis {ei} for Zk1.For induction, we need to compute a homogeneous ba-sis for Zk and represent k+1 relative to Ck+1 and thecomputed basis. We begin by sorting basis ei in re-verse degree order, as already done in the matrix inEquation (7). We next transform Mk into the column-echelon form Mk, a lower staircase form shown in Fig-ure 6 [17]. The steps have variable height, all landingshave width equal to one, and non-zero elements mayonly occur beneath the staircase. A boxed value in thefigure is a pivot and a row (column) with a pivot is called

    8

  • 0 0

    0 0 ...

    0 0 0

    Figure 6. The column-echelon form. An indicates a non-zerovalue and pivots are boxed.

    a pivot row (column). From linear algebra, we knowthat rankMk = rankBk1 is the number of pivots inan echelon form. The basis elements corresponding tonon-pivot columns form the desired basis for Zk. In ourexample, we have

    M1 =

    cd bc ab z1 z2

    d t 0 0 0 0c t 1 0 0 0b 0 t t 0 0a 0 0 t 0 0

    , (8)

    where z1 = ad cd t bc t ab, and z2 = ac t2 bc t2 ab form a homogeneous basis for Z1.

    The procedure that arrives at the echelon form isGaussian elimination on the columns, utilizing elemen-tary column operations of types (1, 3) only. Startingwith the left-most column, we eliminate non-zero en-tries occurring in pivot rows in order of increasing row.To eliminate an entry, we use an elementary column op-eration of type (3) that maintains the homogeneity ofthe basis and matrix elements. We continue until we ei-ther arrive at a zero column, or we find a new pivot. Ifneeded, we then perform a column exchange (type (1))to reorder the columns appropriately.

    Lemma 4.1 (Echelon Form) The pivots in column-echelon form are the same as the diagonal elements innormal form. Moreover, the degree of the basis elementson pivot rows is the same in both forms.

    Proof: Because of our sort, the degree of row basis el-ements ei is monotonically decreasing from the top rowdown. Within each fixed column j, deg ej is a constantc. By Equation (6), degMk(i, j) = c deg ei. There-fore, the degree of the elements in each column is mono-tonically increasing with row. We may eliminate non-zero elements below pivots using row operations that donot change the pivot elements or the degrees of the rowbasis elements. We then place the matrix in diagonalform with row and column swaps.

    The lemma states that if we are only interested in the de-gree of the basis elements, we may read them off from

    the echelon form directly. That is, we may use the fol-lowing corollary of the standard structure theorem to ob-tain the description.

    Corollary 4.1 Let Mk be the column-echelon form fork relative to bases {ej} and {ei} for Ck and Zk1, re-spectively. If row i has pivot Mk(i, j) = tn, it con-tributes deg eiF [t]/tn to the description of Hk1. Oth-erwise, it contributes deg eiF [t]. Equivalently, we get(deg ei, deg ei + n) and (deg ei,), respectively, as P-intervals for Hk1.

    In our example, M1(1, 1) = t in Equation (8). Asdeg d = 1, the element contributes 1Z2[t]/(t) or P-interval (1,2) to the description of H0.

    Mk+1

    = 0

    ij

    kM

    xm mk1 k m mxk k+1

    j

    i

    Figure 7. As kk+1 = 0, MkMk+1 = 0 and this isunchanged by elementary operations. When Mk is reducedto echelon form Mk by column operations, the correspondingrow operations zero out rows in Mk+1 that correspond to pivotcolumns in Mk.

    We now wish to represent k+1 in terms of the basiswe computed for Zk. We begin with the standard ma-trix representation Mk+1 of k+1. As kk+1 = 0,MkMk+1 = 0, as shown in Figure 7. Furthermore,this relationship is unchanged by elementary operations.Since the domain of k is the codomain of k+1, the el-ementary column operations we used to transform Mkinto echelon form Mk give corresponding row opera-tions on Mk+1. These row operations zero out rows inMk+1 that correspond to non-zero pivot columns in Mk,and give a representation of k+1 relative to the basis wejust computed for Zk. This is precisely what we are af-ter. We can get it, however, with hardly any work.

    Lemma 4.2 (Basis Change) To represent k+1 relativeto the standard basis for Ck+1 and the basis computedfor Zk, simply delete rows in Mk+1 that correspond topivot columns in Mk.

    Proof: We only used elementary column operations oftypes (1,3) in our variant of Gaussian elimination. Onlythe latter changes values in the matrix. Suppose we re-place column i by (column i) + q(column j) in order to

    9

  • eliminate an element in a pivot row j, as shown in Fig-ure 7. This operation amounts to replacing column basiselement ei by ei+qej inMk. To effect the same replace-ment in the row basis for k+1, we need to replace rowj with (row j) q(row i). However, row j is eventu-ally zeroed-out, as shown in Figure 7, and row i is neverchanged by any such operation.

    Therefore, we have no need for row operations. Wesimply eliminate rows corresponding to pivot columnsone dimension lower to get the desired representationfor k+1 in terms of the basis for Zk. This completesthe induction. In our example, the standard matrix rep-resentation for 2 is

    M2 =

    abc acd

    ac t t2

    ad 0 t3

    cd 0 t3

    bc t3 0ab t3 0

    .

    To get a representation in terms of C2 and the basis(z1, z2) for Z1 we computed earlier, we simply elimi-nate the bottom three rows. These rows are associatedwith pivots in M1, according to Equation (8). We get

    M2 =

    abc acdz2 t t2z1 0 t3

    ,where we have also replaced ad and ac with the corre-sponding basis elements z1 = ad bc cd ab andz2 = ac bc ab.

    4.2 AlgorithmOur discussion gives us an algorithm for computing P-intervals of an F [t]-module over field F . It turns out,however, that we can simulate the algorithm over thefield itself, without the need for computing the F [t]-module. Rather, we use two significant observationsfrom the derivation of the algorithm. First, Lemma 4.1guarantees that if we eliminate pivots in the order of de-creasing degree, we may read off the entire descriptionfrom the echelon form and do not need to reduce to nor-mal form. Second, Lemma 4.2 tells us that by simplynoting the pivot columns in each dimension and elimi-nating the corresponding rows in the next dimension, weget the required basis change.

    Therefore, we only need column operations through-out our procedure and there is no need for a matrix rep-resentation. We represent the boundary operators as aset of boundary chains corresponding to the columns

    2 3 4 5 6 7 1098abc acdadcdbcab

    0 14 5 6 910

    ad acba

    c

    bd

    c

    a b c d ac

    Figure 8. Data structure after running the algorithm on the fil-tration in Figure 1. Marked simplices are in bold italic.

    of the matrix. Within this representation, column ex-changes (type (1)) have no meaning, and the only opera-tion we need is of type (3). Our data structure is an arrayT with a slot for each simplex in the filtration, as shownin Figure 8 for our example. Each simplex gets a slot inthe table. For indexing, we need a full ordering of thesimplices, so we complete the partial order defined bythe degree of a simplex by sorting simplices accordingto dimension, breaking all remaining ties arbitrarily (wedid this implicitly in the matrix representation.) We alsoneed the ability to mark simplices to indicate non-pivotcolumns.

    Rather than computing homology in each dimensionindependently, we compute homology in all dimen-sions incrementally and concurrently. The algorithm, asshown in Figure 9, stores the list of P-intervals for Hkin Lk.

    COMPUTEINTERVALS (K) {for k = 0 to dim(K) Lk = ;for j = 0 to m 1 {

    d = REMOVEPIVOTROWS (j);if (d = ) Mark j ;else {i = maxindex d; k = dimi;Store j and d in T [i];Lk = Lk {(deg i,deg j)}

    }}for j = 0 to m 1 {

    if j is marked and T [j] is empty {k = dimj ; Lk = Lk {(deg j ,)}

    }}

    }

    Figure 9. Algorithm COMPUTEINTERVALS processes a complexof m simplices. It stores the sets of P-intervals in dimension kin Lk.

    When simplex j is added, we check via proce-dure REMOVEPIVOTROWS to see whether its bound-ary chain d corresponds to a zero or pivot column. Ifthe chain is empty, it corresponds to a zero column and

    10

  • chain REMOVEPIVOTROWS () {k = dim; d = k;Remove unmarked terms in d;while (d 6= ) {i = maxindex d;if T [i] is empty, break;Let q be the coefficient of i in T [i];d = d q1T [i];

    }return d;

    }

    Figure 10. Algorithm REMOVEPIVOTROWS first eliminates rowsnot marked (not corresponding to the basis for Zk1), and theneliminates terms in pivot rows.

    we mark j : its column is a basis element for Zk, andthe corresponding row should not be eliminated in thenext dimension. Otherwise, the chain corresponds toa pivot column and the term with the maximum indexi = maxindex d is the pivot, according the procedure de-scribed for the F [t]-module. We store index j and chaind representing the column in T [i]. Applying Corol-lary 4.1, we get P-interval (deg i,deg j). We con-tinue until we exhaust the filtration. We then performanother pass through the filtration in search of infiniteP-intervals: marked simplices whose slot is empty.

    We give the function REMOVEPIVOTROWS in Fig-ure 10. Initially, the function computes the boundarychain d for the simplex. It then applies Lemma 4.2,eliminating all terms involving unmarked simplices toget a representation in terms of the basis for Zk1. Therest of the procedure is Gaussian elimination in the orderof decreasing degree, as dictated by our discussion forthe F [t]-module. The term with the maximum index i =max d is a potential pivot. If T [i] is non-empty, a pivotalready exists in that row, and we use the inverse of itscoefficient to eliminate the row from our chain. Other-wise, we have found a pivot and our chain is a pivot col-umn. For our example filtration in Figure 8, the marked0-simplices {a, b, c, d} and 1-simplices {ad, ac} gener-ate P-intervals L0 = {(0,), (0, 1), (1, 1), (1, 2)} andL1 = {(2, 5), (3, 4)}, respectively.

    4.3 DiscussionFrom our derivation, it is clear that the algorithm has thesame running time as Gaussian elimination over fields.That is, it takes O(m3) in the worst case, where m is thenumber of simplices in the filtration. The algorithm isvery simple, however, and represents the matrices effi-ciently. In our preliminary experiments, we have seen alinear time behavior for the algorithm.

    5 Algorithm for PIDsThe correspondence we established in Section 3 elimi-nates any hope for a simple classification of persistentgroups over rings that are not fields. Nevertheless, wemay still be interested in their computation. In this sec-tion, we give an algorithm to compute the persistent ho-mology groups Hi,pk of a filtered complex K for a fixedi and p. The algorithm we provide computes persistenthomology over any PID D of coefficients by utilizing areduction algorithm over that ring.

    To compute the persistent group, we need to obtaina description of the numerator and denominator of thequotient group in Equation (4). We already know how tocharacterize the numerator. We simply reduce the stan-dard matrix representation M ik of ik using the reduc-tion algorithm. The denominator, Bi,pk = B

    i+pk Zik,

    plays the role of the boundary group in Equation (4).Therefore, instead of reducing matrix M ik+1, we needto reduce an alternate matrix M i,pk+1 that describes thisboundary group. We obtain this matrix as follows:

    (1) We reduce matrixM ik to its normal form and obtaina basis {zj} for Zik, using fact (ii) in Section 2.5.We may merge this computation with that of thenumerator.

    (2) We reduce matrix M i+pk+1 to its normal form and ob-tain a basis {bl} for Bi+pk using fact (iii) in Sec-tion 2.5.

    (3) Let N = [{bl} {zj}] = [B Z], that is, the columnsof matrix N consist of the basis elements from thebases we just computed, and B and Z are the re-spective submatrices defined by the bases. We nextreduce N to normal form to find a basis {uq} forits null-space. As before, we obtain this basis us-ing fact (ii). Each uq = [q q], where q, q arevectors of coefficients of {bl}, {zj}, respectively.Note that Nuq = Bq+Zq = 0 by definition. Inother words, element Bq = Zq is belongs tothe span of both bases. Therefore, both {Bq} and{Zq} are bases for Bi,pk = Bi+pk Zik. We form amatrix M i,pk+1 from either.

    We now reduce M i,pk+1 to normal form and read off thetorsion coefficients and the rank of Bi,pk . It is clearfrom the procedure that we are computing the persistentgroups correctly, giving us the following.

    Theorem 5.1 For coefficients in any PID, persistent ho-mology groups are computable in the order of time andspace of computing homology groups.

    11

  • 6 ExperimentsIn this section, we discuss experiments using an im-plementation of the persistence algorithm for arbitraryfields. Our aim is to further elucidate the contributionsof this paper. We look at two scenarios where the previ-ous algorithm would not be applicable, but where our al-gorithm succeeds in providing information about a topo-logical space.

    6.1 ImplementationWe have implemented our field algorithm for Zp for p aprime, and Q coefficients. Our implementation is in Cand utilizes GNU MP, a multiprecision library, for ex-act computation [14]. We have a separate implementa-tion for coefficients in Z2 as the computation is greatlysimplified in this field. The coefficients are either 0, or1, so there is no need for orienting simplices or main-taining coefficients. A k-chain is simply a list of sim-plices, those with coefficient 1. Each simplex is itsown inverse, reducing the group operation to the sym-metric difference, where the sum of two k-chains c, d isc+ d = (c d) (c d). We use a 2.2 GHz Pentium4 Dell PC with 1 GB RAM running Red Hat Linux 7.3for computing the timings.

    6.2 DataOur algorithm requires a persistence complex as input.In the introduction, we discussed how persistence com-plexes arise naturally in practice. In Example 1.3, wediscussed generating persistence complexes using ex-cursion sets of Morse functions over manifolds. Wehave implemented a general framework for computingcomplexes of this type. We must emphasize, however,that our persistence software processes persistence com-plexes of any origin.

    Our framework takes a tuple (K, f) as input and pro-duces a persistence complex C(K, f) as output. K isa d-dimensional simplicial complex that triangulates anunderlying manifold, and f : vertK R is a discretefunction over the vertices of K that we extend linearlyover the remaining simplices of K. The function f actsas the Morse function over the manifold, but need notbe Morse for our purposes. Frequently, our complex isaugmented with a map : K Rd that immersesor embeds the manifold in Euclidean space. Our algo-rithm does not require for computation, but is of-ten provided as a discrete map over the vertices of Kand is extended linearly as before. For example, Fig-ure 11 displays a triangulated Klein bottle, immersedin R3. For each dataset, Table 2 gives the numbersk of k-simplices, as well as the Euler characteristic

    =

    k(1)ksk. We use the Morse function to com-pute the excursion set filtration for each dataset. Table 3gives information on the resulting filtrations.

    Figure 11. A wire-frame visualization of dataset K, an immersedtriangulated Klein bottle with 4000 triangles.

    |K| len filt (s) pers (s)K 12,000 1,020 0.03 < 0.01E 529,225 3,013 3.17 5.00J 3,029,383 256 24.13 50.23

    Table 3. Filtrations. The number of simplices in the filtration|K| = i si, the length of the filtration (number of distinctvalues of function f ), time to compute the filtration, and time tocompute persistence over Z2 coefficients.

    6.3 Field Coefficients

    A contribution of this paper is the generalization of thepersistence algorithm to arbitrary fields. This contribu-tion is important when the manifold under study con-tains torsion. To make this clear, we compute the ho-mology of the Klein bottle using the persistence algo-rithm. Here, we are interested only in the Betti numbersof the final complex in the filtration for illustrative pur-poses. The non-orientability of the Klein bottle is visiblein Figure 11. The change in triangle orientation at theparametrization boundary leads to a rendering artifactwhere two sets of triangles are front-facing. In homol-ogy, the non-orientability of the Klein bottle manifestsitself as a torsional 1-cycle cwhere 2c is a boundary (in-deed, it bounds the surface itself.) The homology groupsover Z are:

    H0(K) = Z,H1(K) = Z Z2,H2(K) = {0}.

    12

  • number sk of k-simplices0 1 2 3 4

    K 2,000 6,000 4,000 0 0 0E 3,095 52,285 177,067 212,327 84,451 1J 17,862 297,372 1,010,203 1,217,319 486,627 1

    Table 2. Datasets. K is the Klein bottle, shown in Figure 11. E is potential around electrostatic charges. J is supersonic jet flow.

    F 0 1 2 time (s)Z2 1 2 1 0.01Z3 1 1 0 0.23Z5 1 1 0 0.23Z3203 1 1 0 0.23Q 1 1 0 0.50

    Table 4. Field coefficients. The Betti numbers of K computedover field F and time for the persistence algorithm. We use aseparate implementation for Z2 coefficients.

    Note that 1 = rankH1 = 1. We now use the heightfunction as our Morse function, f = z, to generate thefiltration in Table 3. We then compute the homology ofdataset K with field coefficients using our algorithm, asshown in Table 4.

    Over Z2, we get 1 = 2 as homology is unable to rec-ognize the torsional boundary 2c with coefficients 0 and1. Instead, it observes an additional class of homology1-cycles. By the Euler-Poincare relation, =

    i i, so

    we also get a class of 2-cycles to compensate for the in-crease in 1 [16]. Therefore, Z2-homology misidentifiesthe Klein bottle as the torus. Over any other field, how-ever, homology turns the torsional cycle into a boundary,as the inverse of 2 exists. In other words, while we can-not observe torsion in computing homology over fields,we can deduce its existence by comparing our resultsover different coefficient sets. Similarly, we can com-pare sets of P-intervals from different computations todiscover torsion in a persistence complex.

    Note that our algorithms performance for this datasetis about the same over arbitrary finite fields, as the coef-ficients do not get large. The computation over Q takesabout twice as much time and space, since each rationalis represented as two integers in GNU MP.

    6.4 Higher DimensionsA second contribution of this paper is the extension ofthe persistence algorithm from subcomplexes of S3 tocomplexes in arbitrary dimensions. We have already uti-lized this capability in computing the homology of theKlein bottle. We now examine the performance of thisalgorithm in higher dimensions. For practical motiva-tion, we use large-scale time-varying volume data as in-

    put. Advances in data acquisition systems and comput-ing technologies have resulted in the generation of mas-sive sets of measured or simulated data. The datasetsusually contain the time evolution of physical variables,such as temperature, pressure, or flow velocity at samplepoints in space. The goal is to identify and localize sig-nificant phenomena within the data. We propose usingpersistence as the significance measure.

    The underlying space for our datasets is the four-dimensional space-time manifold. For each dataset, wetriangulate the convex hull of the samples to get a trian-gulation. Each complex listed in Table 2 is homeomor-phic to a four-dimensional ball and has = 1. DatasetE contains the potential around electrostatic charges ateach vertex. Dataset J records the supersonic flow ve-locity of a jet engine. We use these values as Morsefunctions to generate the filtrations. We then computepersistence overZ2 coefficients to get the Betti numbers.We give filtration sizes and timings in Table 3. Figure 12displays 2 for dataset J. We observe a large number oftwo-dimensional cycles (voids), as the co-dimension is2. Persistence allows us to decompose this graph intothe set of P-intervals. Although there are 730,692 P-intervals in dimension 2, most are empty as the topolog-ical attribute is created and destroyed at the same func-tion level. We draw the 502 non-empty P-intervals inFigure 13. We note that the P-intervals represent a com-pact and general shape descriptor for arbitrary spaces.

    0

    10

    20

    30

    40

    50

    60

    70

    80

    90

    0 50 100 150 200 250

    2f

    f

    Figure 12. Graph of f2 for dataset J, where f is the flow ve-locity.

    13

  • Figure 13. The 502 non-empty P-intervals for dataset J indimension 2. The amalgamation of these intervals gives thegraph in Figure 12.

    For the large data sets, we do not compute persistenceover alternate fields as the computation requires in ex-cess of 2 gigabytes of memory. In the case of finite fieldsZp, we may restrict the prime p so that the computationfits within an integer. This is a reasonable restriction,as on most modern machines with 32-bit integers, it im-plies p < 216 1. Given this restriction, any coefficientwill be less than p and representable as a 4-byte integer.The GNU MP exact integer format, on the other hand,requires at least 16 bytes for representing any integer.

    7 ConclusionWe believe the most important contribution of this pa-per is a reinterpretation of persistent homology withinthe classical framework of algebraic topology. Our in-terpretation allows us to:

    1. establish a correspondence that fully describes thestructure of persistent homology over any field, notonly over Z2, as in the previous result,

    2. and relate the previous algorithm to the classicreduction algorithm, thereby extending it to arbi-trary fields and arbitrary dimensional complexes,not just subcomplexes of S3 as in the previous re-sult.

    We provide implementations of our algorithm for fields,and show that they perform quite well for large datasets.Finally, we give an algorithm for computing a persis-tent homology group with fixed parameters over arbi-trary PIDs.

    Our software for n-dimensional complexes enables usto analyze arbitrary-dimensional point cloud data andtheir derived spaces. One current project uses this im-plementation for feature recognition using a novel alge-braic method [2]. Another project analyzes the topo-logical structures in a high-dimensional data set derivedfrom natural images [7]. Yet another applies persistenceto derived spaces to arrive at compact shape descriptorsfor geometric objects [3, 5]. Future theoretical work in-clude examining invariants for persistent homology overnon-fields and defining multivariate persistence, wherethere is more than one persistence dimension. An ex-ample would be tracking a Morse function as well asdensity of sampling on a manifold. Finally, we haverecently reimplemented the algorithm using the genericparadigm. This implementation will soon be a part ofthe CGAL library [4].

    AcknowledgmentsThe first author thanks Herbert Edelsbrunner and JohnHarer for discussions on Z-homology, and Leo Guibasfor providing support and encouragement. Both au-thors thank Anne Collins for her thorough review of themanuscript and Ajith Mascarenhas for providing sim-plicial complexes for datasets E and J. The originaldata are part of the Advanced Visualization TechnologyCenters data repository and appears courtesy of AndreaMalagoli and Milena Micono of the Laboratory for As-trophysics and Space Research (LASR) at the Universityof Chicago. Figure 11 was rendered in Stanford Graph-ics Labs Scanalyze.

    References[1] BASU, S. On bounding the Betti numbers and computing

    the Euler characteristic of semi-algebraic sets. DiscreteComput. Geom. 22 (1999), 118.

    [2] CARLSSON, E., CARLSSON, G., AND DE SILVA, V. Analgebraic topological method for feature identification,2003. Manuscript.

    [3] CARLSSON, G., ZOMORODIAN, A., COLLINS, A.,AND GUIBAS, L. Persistence barcodes for shapes. InProc. Symp. Geom. Process. (2004), pp. 127138.

    [4] CGAL. Computational geometry algorithms library.http://www.cgal.org.

    [5] COLLINS, A., ZOMORODIAN, A., CARLSSON, G.,AND GUIBAS, L. A barcode shape descriptor for curve

    14

  • point cloud data. In Proc. Symp. Point-Based Graph.(2004), pp. 181191.

    [6] COX, D., LITTLE, J., AND OSHEA, D. Ideals, Vari-eties, and Algorithms, second ed. Springer-Verlag, NewYork, NY, 1991.

    [7] DE SILVA, V., AND CARLSSON, G. Topological esti-mation using witness complexes. In Proc. Symp. Point-Based Graph. (2004), pp. 157166.

    [8] DONALD, B. R., AND CHANG, D. R. On the complex-ity of computing the homology type of a triangulation.In Proc. 32nd Ann. IEEE Sympos. Found Comput. Sci.(1991), pp. 650661.

    [9] DUMAS, J.-G., HECKENBACH, F., SAUNDERS, B. D.,AND WELKER, V. Computing simplicial homologybased on efficient Smith normal form algorithms. In Al-gebra, Geometry, and Software Systems (2003), pp. 177207.

    [10] DUMMIT, D., AND FOOTE, R. Abstract Algebra. JohnWiley & Sons, Inc., New York, NY, 1999.

    [11] EDELSBRUNNER, H., LETSCHER, D., AND ZOMORO-DIAN, A. Topological persistence and simplification.Discrete Comput. Geom. 28 (2002), 511533.

    [12] EISENBUD, D. Commutative Algebra with a View To-ward Algebraic Theory. Springer, New York, NY, 1995.

    [13] FRIEDMAN, J. Computing Betti numbers via combina-torial Laplacians. In Proc. 28th Ann. ACM Sympos. The-ory Comput. (1996), pp. 386391.

    [14] GRANLUND, T. The GNU multiple precision arithmeticlibrary. http://www.swox.com/gmp.

    [15] GROMOV, M. Hyperbolic groups. In Essays in GroupTheory, S. Gersten, Ed. Springer Verlag, New York, NY,1987, pp. 75263.

    [16] MUNKRES, J. R. Elements of Algebraic Topology.Addison-Wesley, Reading, MA, 1984.

    [17] UHLIG, F. Transform Linear Algebra. Prentice Hall,Upper Saddle River, NJ, 2002.

    [18] ZOMORODIAN, A. Computing and ComprehendingTopology: Persistence and Hierarchical Morse Com-plexes. PhD thesis, University of Illinois at Urbana-Champaign, 2001.

    15

    IntroductionMotivationPrior WorkOur WorkSpectral SequencesOutline

    BackgroundAlgebraSimplicial ComplexesChain ComplexHomologyReductionPersistence

    The Persistence ModuleCorrespondenceDecompositionInterpretation

    Algorithm for FieldsDerivationAlgorithmDiscussion

    Algorithm for PIDsExperimentsImplementationDataField CoefficientsHigher Dimensions

    Conclusion