13
Walking with Euler through Ostpreußen and RNA Mark Muldoon February 4, 2010

Walking with Euler through Ostpreußen and RNA with Euler through Ostpreußen and RNA Mark Muldoon February 4, 2010 K¨onigsberg (1652) ≈ Kaliningrad (2007)? The K¨onigsberg Bridge

  • Upload
    hakhanh

  • View
    217

  • Download
    2

Embed Size (px)

Citation preview

Walking with Euler through Ostpreußen and RNA

Mark Muldoon

February 4, 2010

Konigsberg (1652) ≈ Kaliningrad (2007)?

The Konigsberg Bridge problem asks whether it is possible to walk around theold city in such a way as to:

• start and finish in the same place and

• cross each of the seven bridges exactly once.

And you have to cross the bridges from one end to the other: there’s no walkinghalfway out, turing around, and then doing the other half from the other side.

Euler’s abstraction

Leonhard Euler solved it in a paper presented to the Academy of Science in St.Petersburg in August 1735: the drawing at right comes from his paper.

Euler’s abstraction

Leonhard Euler solved it in a paper presented to the Academy of Science in St.Petersburg in August 1735: the drawing at right comes from his paper.

His key insight is that although walking along the banks or around the islandsmay get us from one bridge to another, the details of those parts of the walkare unimportant: it’s the way that the bridges connect the land masses thatmatters.

Notions and Notation

• The drawing at right is an example of a graph.

• The dots are called vertices or nodes.

• The curves connecting them are edges or arcs.

• The number of edges attached to a vertex isthe vertex’s degree.

• A walk through the graph that doesn’t re-useany edges is a trail.

• A trail that uses each edge exactly once is anEulerian trail.

• An Eulerian trail that starts an ends at thesame place is an Eulerian tour

Armed with all this new vocabulary, we can translate (and then solve) theKonigsberg Bridge Problem. It becomes: Is there an Eulerian tour in this

graph?

No Eulerian Tours in Konigsberg

It’s not hard to see that such a tour is impossible:

• Imagine that we start at the rightmost vertex:we must leave by taking one of its three edges,which we then can’t use again.

• Later in the walk, we’ll need to return, usingup another edge.

• Then we’ll have to leave a second time(otherwise the third edge will go unused).

• But then we’ll have used all three edges andthere will be no way to get back: our walkcan’t finish where it started ⇒ No Eulerian

tour.

Similar reasoning shows that a graph can’t have an Eulerian tour if any of itsvertices has odd degree. It could, however, have an Eulerian trail that startedand finished on different vertices: the first and last vertex could have odddegree, but all others would still have to be even.

Another problem, not obviously involving graphs . . .

Ribonucleic acid (RNA) is a family of polymeric molecules whose members playmany roles1 in the chemistry of living cells. RNA is similar to its more famoussister molecule, DNA (deoxyribonucleic acid), in that it consists of long chainscomposed of 4 subunits called bases. For RNA the bases are U, C, G, and A

and they can be combined in any order, so an RNA molecule can berepresented by a string such as

AGUCAGUGAGCA

Sequencing RNA

In the early days of RNA research (1960’s) biochemists could sequenceaccurately only rather short pieces of RNA, but were interested in much longerchains. They addressed the problem by chopping up the longer chains intosmaller fragments that they could sequence directly. These assemblies ofsmaller fragments are called digests of the original molecule and one way tosequence a long chain is to compare and combine the results of two differentsorts of digest.

1If you want to learn more about RNA and its amazingly rich biochemistry, I recommend abook by (Manchester’s own) Terry Brown, Genomes 3, Garland Scientific (2006). The previousedition, Genomes 2, is freely available online at http://bit.ly/Genomes2.

Two different sorts of digest

Our first enzyme cuts the RNA sequence after every G, while the second cutsafter both U and C:

AGUCAGUGAGCAG-digest−−−−−→ AG UCAG UG AG CA

AGUCAGUGAGCAUC-digest−−−−−→ AGU C AGU GAGC A

Of course, the digests don’t come so neatly organized: in a real scientificproblem we’d get two jumbles of short fragments like

G-digest: AG AG CA UG UCAG

UC-digest: A C AGU AGU GAGC

and have to work out how to rearrange them in a sensible order.

A thought experiment: digesting the digests

Imagine attacking each fragment from the G-digest with the UC-enzyme.They’d break down as follows:

AGUC-digest−−−−−→ AG UG

UC-digest−−−−−→ U G

UCAGUC-digest−−−−−→ U C AG AG

UC-digest−−−−−→ AG

CAUC-digest−−−−−→ C A

The small pieces produced in this way are called extended bases. Note thatwe’d get the same set of extended bases if we’d started with the fragments ofthe UC-digest and attacked then with the G-enzyme.

We’ll mainly be interested in those cases, such as UCAGUC-digest−−−−−→ U C AG,

where a second digest would produce two or more extended bases: we’ll usethem to draw a very helpful graph.

A new kind of graph

The double-digests that would yield two or more pieces are

CAUC-digest−−−−−→ C A UCAG

UC-digest−−−−−→ U C AG UG

UC-digest−−−−−→ U G

We’ll use them to draw a graph like,

C

AG GU C A

which has:

• a vertex for each extended base that appears on the right-hand side of oneof the digests at the top of the slide;

• a directed edge (a bit like a bridge with a one-way street on it) connectingthe first and last extended base in each of the digests-of-digests and

• for those cases where we got three or more extended bases, an edge label.

Eulerian trails to the rescue!

If we do the same thing with the other set of digests—attacking the originalUC-fragments with the G enzyme—and then add the corresponding edges toour graph we end up with:

C

AG

AG GU C A

There’s exactly one (directed) Eulerian trail through this graph and, if we writedown the labels of its vertices and edges in the order we encounter them we get

AG · U · C · AG · U · G · AG · C · A = AGUCAGUGAGCA

which is the original sequence!

A glimpse of why this works

AG · U · C · AG · U · G · AG · C · A

AG · U · C · AG · U · G · AG · C · A

G-digest

UC-digest

The diagram above shows two views of our example sequence, one withG-fragments shown in red rounded boxes and another with the UC-fragments inblue boxes: in both cases, the extended fragments are shown separated by dots.

• It’s only the larger fragments—those containing two or more extendedbases—that contribute to the graph.

• These large fragments overlap in a very particular way: the extended basethat marks the end of a large blue fragment also marks the beginning ofthe next large red one.

• The endpoints of large fragments are the vertices of our graph, while thegraph’s directed edges, along with their labels, come from to the bigfragments themselves.

Afterword: problems to think about on the way home

1 Back to Konigsberg: I argued that there can’t be an Eulerian tour (in anundirected graph) if there’s a vertex with odd degree, but what if allvertices have even degree . . . is there always a tour then?

2 What’s the corresponding result for directed graphs? Hint: it’s helpful to

define separate in- and out-degrees. The former counts the directed edges

pointing into a vertex, while the latter counts the edges pointing out.

3 Does the RNA-sequencing approach described here always yield a uniqueanswer? Hint: draw graphs and find sequences consistent with the

examples below.

G-digest UC-digest

G G U UAAAG U AAAGGGU

G CG UG ACC C C GU GGAC

G AG UG UCG C GG GU AGU

And if you’re super-keen, I first learned about this problem from a textbook

Edgar G. Goodaire and Michael M. Paramenter (2006). Discrete

Mathematics with Graph Theory, 3rd Edition, ISBN 0-13-167995-3.

whose authors give lots of related exercises.