4 String MatchingTwo string algorithms we study in this section have many applications in diverse fields. We adopt the inspiration from computational biology where they apply to DNA sequences. But applications in other areas abound, especially when we consider them as "sequence" algorithms which could apply to sequences of objects of any type with the appropriate comparison operators defined, held in any sequential data structure (array, deque, list, vector). The first of these is the Longest Common Subsequence algorithm which, given a pair of strings (sequences) seeks a maximum length sequence that is a subsequence of each. The second is the Edit Distance algorithm which, given two sequences, asks for the minimum cost set of edit operations that transform one into the other. These two algorithms are related by both outcome and solution technique. The Longest Common Subsequence problem is equivalent to the Edit Distance problem with two operations insert and delete each of cost 1. And both problems yield to a dynamic programming solution.
4.1 Longest Common SubsequenceGiven two sequences s of length m and t of length n a common subsequence is a sequence that appears in both s and t. (Note that a subsequence need not be contiguous.) For example, the two strings s = "BDACADBC" and t = "ADBCABAD" have "DCB" in common: s = BDACADBC t = ADBCABAD The longest common subsequence (LCS) problem is: given two sequences s and t, calculate their longest common subsequence: produce a longest sequence of items that appear left-to-right (but not necessarily in a contiguous block) in both sequences. Note that we say "a" not "the" because there may be more than one LCS in a pair of sequences. In the example above, DCAB is a solution to LCS(s,t): s = BDACADBC t = ADBCABAD A legitimate interpretation of the LCS is to provide an optimal way to "align" the two sequences: s = BDACADBC- t = ADBCA-BAD where we use '-' as a place-holder for a "missing" item. Note by the way that there is another LCS for these two sequences: s = BDACADBC = --BDACADBC t = ADBCABAD = ADBCABAD-- We will use strings using the alphabet {A,B,C,D,E,...,Z} (upper case letters) in illustrations. The most well-known area of application is DNA sequences, which are modelled with the 4-character alphabet {A,C,T,G}. Here is one more example: s = ABADEFGAXBCDYBCDZAABD = ABADEFGAXBCDYBCD-ZAABD t = HJXKLYKKKKZJJJ = HJ------XKL-YKKKKZJJJ- We will use Dynamic Programming to solve this problem. As usual, we first derive a solution for finding the length L of the longest common subsequence. Then finding the actual subsequence can be calculated using backtracing in the sub-problem space. Let
Note that L = L(m,n) is the solution. Consider the situation where we want to calculate L(i,j) from solutions on shorter initial segments.
This analysis leads to the following bottom-up calculation: void LCS (s,t) { fsu::Matrix<size_t> L (m+1,n+1,0); for (size_t i = 1; i <= m; ++i) { for (size_t j = 1; j <= n; ++j) { if (s[i-1] == t[j-1]) L[i,j] = 1 + L[i-1,j-1]; else L[i,j] = Max ( L[i-1,j] , L[i,j-1] ); } } // L[m,n] is the length of a LCS } Because indexing of sequences in practice is [0,n) = [0,n-1], and also because we need the 0-index row and column of L for initialization, the matrix indices are [0,m]×[0,n] and the sequence indices are offset by 1. The matrix indices track one higher than the string indices because it is convenient to initialize the process with M[i,0] = M[0,j] = 0. Constructing the actual LCS (in both s and t) can be done with a backtracing recurrence subroutine of the executor, operating on the matrix L after LCS has been executed to populate L with subproblem solutions, using BitVectors bvs and bvt: // backtracing subroutine { bvs.Unset(); bvt.Unset(); for (size_t i = m, j = n; i > 0, j > 0; ) { if (s[i-1] == t[j-1]) { bvs.Set(i-1); bvt.Set(j-1); // indicates s[i-1] == t[j-1] is in the LCS --i; --j; } else if (L[i,j] == L[i-1,j]) { --i; } else // L[i,j] == L[i,j-1] { --j; } } } The overall code organization is similar to the bottom-up code for the Knapsack problem (illustrated for C-strings s,t): size_t LCS(const char* s, size_t m, const char* t, size_t n, fsu::BitVector& bvs, fsu::BitVector& bvt); // executor, called by client void LCS(const char* s, size_t m, const char* t, size_t n, fsu::Matrix<size_t>& L); // subsidiary called by executor; sets values in L size_t LCS(const char* s, size_t m, const char* t, size_t n, fsu::BitVector& bvs, fsu::BitVector& bvt) { Matrix <size_t> L(1+m,1+n,0); LCS(s, m, t, n, L); // call to matrix calculator sets values of L { // subroutine setting bvs and bvt (as above) } return L[m][n]; } Proposition 1. The Longest Common Subsequence calculator LCS(m,n) has runtime Θ(m×n) and runspace +Θ(m×n). The backtracing subroutine has runtime Θ(m+n) and operates in existing space. To make optimal practical use of LCS it is best practice to display the two sequences aligned with vertical bars indicating matches (rather than color), as in this sample output from LCS_i.x: >LCS.x CCGATGATCATTGCCAGTCCACCGATTGTGAGAACGACAGCGACTCCAGC CCGATGACTTTTGCAGCTCCACCGATTTTGGTCCAGC Length of LCS: 33 ... optimal alignment: s = CCGATGA--TCATTGCCAG-TCCACCGA-TTGTGAGAACGACAGCGACTCCAGC ||||||| | ||| ||| |||||||| || | | | |||||| t = CCGATGACTT--TTG-CAGCTCCACCGATTT-T----------G-G--TCCAGC (Other information, of interest to the algorithm developer but not necessarily to the user, is also displayed by LCS_i.x.) Exercise 1. Locate the directory LIB/notes_support in your course library and copy/execute the demonstration program: LCS_i.x # Longest Common Substring (As usual, the "_i" indicates Linux/Intel executable. Entering the command without arguments prints some documentation to screen.) Use the example code above for the LCS calculator, devise and implement a display process that uses the results of a call to the LCS algorithm (that is, the return value and the bitvectors) and produces the enhanced output as in the demo program. This will require an alignment process that displays the aligned strings (as illusrated above). After development, your program should behave identically to LCS_i.x. As always, type the code using the eyes->brain->fingers->keyboard->screen->eyes loop (or equivalent). Do not copy/paste. ∎ 4.2 Edit DistanceThe edit distance between two strings is important in widely diverse applications ranging from computational biology (for example, finding the distance between two DNA sequences), information extraction, speech recognition, and other areas where comparison of two data sequences is important. Here is an excellent introduction with a definition of edit distance (and Levenshtein distance in particular) and an overview of the calculation of edit distance using dynamic programming: Minimum Edit Distance - Explained! by Dan Jurafsky, Stanford University. We begin with the same input requirements as LCS, but with modified results: return the edit distance between two strings s along with an "edit transcript" that can be used to transform s into t. Define 3 edit operations and their cost as follows: To understanding edit distance, consider the first element of s and compare with the first element of t. We have four choices:
The process continues until elements of the two sequences are exhausted. Total cost of the transformation is obtained by adding up the costs of the edits at each step, using this cost table:
The edit distance is the least cost sequence of edit actions transforming s to t. The problem of finding the edit distance and an optimal edit transcript (ED) is very similar to the longest common subsequence problem (LCS). ED has some properties that make it more useful however. The first is that edit distance satisfies the mathematical properties required for a metric: Definition. Given a space X, a mapping d : X×X→[0,∞) is a metric iff the following three conditions hold:
Proposition 2. Edit distance is a metric on the space of sequences. Proof. Suppose that x, y, and z are sequences and denote by ed the edit distance mapping. Since no edits are required to transform x to itself, ed(x,x) = 0 (axiom 1). If an optimal sequence of edits transforms x to y, then by replacing I with D and D with I (noting that I and D have equal cost) we obtain an equal-cost sequence of edits transforming y back to x. Moreover, this latter edit transcript must be optimal, otherwise we could "flip" a less costly sequence of edits into a less costly version of the original, which is optimal. Therefore ed(x,y) = ed(y,x) (axiom 2). Suppose that Txy, Tyz, Txz are optimal edit transcripts taking x to y, y to z, and x to z, respectively. If Txy+Tyz provides a less costly way to edit x to z then Txz must have been non-optimal because we can replace it with Txy+Tyz. Therefore
which verifies axiom 3. ∎ Corollary. Edit distance satisfies the Bellman Optimal Substructure Property. With all that background, let's move toward develop a dynamic programming algorithm. Begin as in the LCS problem with two sequences
We want to calculate the edit distance ed(s,t) using a bottom-up Dynamic Programming approach. Let
We can initialize the calculation by observing that:
and then proceed to populate the entire matrix using a recurrence:
where
In words, D(i,j) is the minimum of three previously calculated values, representing three kinds of edits:
Example 1. ED( ABCDE , ABDDD ) action cost s: ABCDE sub at s[2] 2 ABDDE sub at s[3] 2 t: ABDDD optimal alignment: s = ABCDE || | t = ABDDD ed: 2 + 2 = 4
Example 2. ED( ABCDE , ABDDDE ) action cost s: ABCDE ins at s[2] 1 AB-CDE sub at s[3] 2 t: ABDDDE optimal alignment: s = AB-CDE || || t = ABDDDE ed: 1 + 2 = 3 In addition to calculating the edit distance, we want detailed information comparing the two sequences: an edit transcript that provides a sequence of edits transforming s to t and an optimal alignment that shows the two strings (with gaps at insert locations) aligned according to the returned optimal edit transcript. The optimal alignment can be obtained by post-processing using the edit transcript. The optimal edit transcript is obtained by backtrace of the algorithm itself. One way to accomplish such a backtrace is to keep an ancillary matrix of "parent" directions for each value of D(i,j). The "parent" direction is one of three possibilities, depending on which of the three values is used in the Min operation:
Using D,U,L for "Diagonal", "Up", and "Left" the auxilliary parent matrix P provides signposts to construct an optimal path from D(m,n) to D(0,0). The edit process is done from left to right in s and t and corresponds to a path from D(0,0) to D(m,n) in the distance matrix. An optimal edit is obtained by reversing the optimal parent path. The following table summarizes the processes:
Example 3. ED( ABCDE , ABDDDDDDE ). The matrices D and P calculate to: --------------------- --------------------- | 0 1 2 3 4 5 6 7 8 9 | | - L L L L L L L L L | | 1 0 1 2 3 4 5 6 7 8 | | U D L L L L L L L L | D = | 2 1 0 1 2 3 4 5 6 7 | P = | U U D L L L L L L L | | 3 2 1 2 3 4 5 6 7 8 | | U U U D D D D D D D | | 4 3 2 1 2 3 4 5 6 7 | | U U U D D D D D D L | | 5 4 3 2 3 4 5 6 7 6 | | U U U U D D D D D D | --------------------- --------------------- The backtrace used to construct an optimal s→t edit transcript is highlighted. The results from the algorithm are: Edit Distance: 6 // Levenshtein - substitution cost = 2 s: ABCDE s > t transcript: MMIIIISMM t > s transcript: MMDDDDSMM t: ABDDDDDDE optimal alignment: s = AB----CDE || || t = ABDDDDDDE
Example 4. ED( CCGATGATCATTGCCAGTCCACTTGTGAGAACGACAGCGACTCCAGC , CCGATGACTTTTGCAGCTCCACTTTTGGTCCAGC ). The matrices D and P calculate to: ----------------------------------------------------------------------------------------------------------- | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | | 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | | 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 | | 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | | 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | | 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | | 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | | 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 | | 8 7 6 5 4 3 2 1 1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | | 9 8 7 6 5 4 3 2 1 2 2 3 4 5 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | | 10 9 8 7 6 5 4 3 2 2 3 3 4 5 6 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | | 11 10 9 8 7 6 5 4 3 2 2 3 3 4 5 6 6 7 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | | 12 11 10 9 8 7 6 5 4 3 2 2 3 4 5 6 7 7 7 8 9 10 11 11 12 13 14 15 16 17 18 19 20 21 22 | | 13 12 11 10 9 8 7 6 5 4 3 3 3 3 4 5 6 7 8 8 9 10 11 12 12 13 14 14 15 16 17 18 19 20 21 | | 14 13 12 11 10 9 8 7 6 5 4 4 4 4 3 4 5 6 7 8 8 9 10 11 12 13 14 15 15 16 16 17 18 19 20 | | 15 14 13 12 11 10 9 8 7 6 5 5 5 5 4 4 5 5 6 7 8 9 9 10 11 12 13 14 15 16 16 16 17 18 19 | | 16 15 14 13 12 11 10 9 8 7 6 6 6 6 5 4 5 6 6 7 8 8 9 10 11 12 13 14 15 16 17 17 16 17 18 | | 17 16 15 14 13 12 11 10 9 8 7 7 7 6 6 5 4 5 6 7 8 9 9 10 11 12 13 13 14 15 16 17 17 16 17 | | 18 17 16 15 14 13 12 11 10 9 8 7 7 7 7 6 5 5 5 6 7 8 9 9 10 11 12 13 14 14 15 16 17 17 17 | | 19 18 17 16 15 14 13 12 11 10 9 8 8 8 7 7 6 5 6 5 6 7 8 9 10 11 12 13 14 15 14 15 16 17 17 | | 20 19 18 17 16 15 14 13 12 11 10 9 9 9 8 8 7 6 6 6 5 6 7 8 9 10 11 12 13 14 15 14 15 16 17 | | 21 20 19 18 17 16 15 14 13 12 11 10 10 10 9 8 8 7 7 7 6 5 6 7 8 9 10 11 12 13 14 15 14 15 16 | | 22 21 20 19 18 17 16 15 14 13 12 11 11 11 10 9 9 8 8 7 7 6 5 6 7 8 9 10 11 12 13 14 15 15 15 | | 23 22 21 20 19 18 17 16 15 14 13 12 11 12 11 10 10 9 8 8 8 7 6 5 6 7 8 9 10 11 12 13 14 15 16 | | 24 23 22 21 20 19 18 17 16 15 14 13 12 12 12 11 11 10 9 9 9 8 7 6 5 6 7 8 9 10 11 12 13 14 15 | | 25 24 23 22 21 20 19 18 17 16 15 14 13 12 13 12 11 11 10 10 10 9 8 7 6 6 7 7 8 9 10 11 12 13 14 | | 26 25 24 23 22 21 20 19 18 17 16 15 14 13 13 13 12 12 11 11 11 10 9 8 7 6 6 7 8 8 9 10 11 12 13 | | 27 26 25 24 23 22 21 20 19 18 17 16 15 14 14 14 13 13 12 12 12 11 10 9 8 7 7 6 7 8 9 10 11 11 12 | | 28 27 26 25 24 23 22 21 20 19 18 17 16 15 15 14 14 14 13 13 13 12 11 10 9 8 8 7 7 8 9 10 10 11 12 | | 29 28 27 26 25 24 23 22 21 20 19 18 17 16 16 15 14 15 14 14 14 13 12 11 10 9 9 8 7 8 9 10 11 10 11 | | 30 29 28 27 26 25 24 23 22 21 20 19 18 17 17 16 15 15 15 15 15 14 13 12 11 10 10 9 8 8 9 10 10 11 11 | | 31 30 29 28 27 26 25 24 23 22 21 20 19 18 18 17 16 16 16 16 16 15 14 13 12 11 11 10 9 9 9 10 10 11 12 | | 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 18 17 16 17 16 16 16 15 14 13 12 12 11 10 10 9 9 10 11 11 | | 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 19 18 17 17 17 17 17 16 15 14 13 13 12 11 11 10 10 10 10 11 | | 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 19 18 18 18 18 17 17 16 15 14 14 13 12 12 11 11 10 11 11 | | 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 20 19 19 18 18 18 17 17 16 15 15 14 13 13 12 11 11 11 11 | | 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 21 20 20 19 19 18 18 18 17 16 16 15 14 14 13 12 11 12 12 | | 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 21 21 20 20 19 19 19 18 17 17 16 15 15 14 13 12 11 12 | | 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 22 21 20 20 19 20 19 18 18 17 16 16 15 14 13 12 11 | | 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 22 22 21 21 20 20 20 19 19 18 17 17 16 15 14 13 12 | | 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 23 23 22 21 21 21 21 20 20 19 18 18 17 16 15 14 13 | | 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 24 23 23 22 21 22 22 21 21 20 19 19 18 17 16 15 14 | | 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 24 24 23 22 21 22 22 21 21 20 19 19 18 17 16 15 | | 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 24 24 23 22 22 23 22 22 21 20 19 19 18 17 16 | | 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 25 24 23 23 23 23 23 22 21 20 19 19 18 17 | | 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 25 24 24 24 24 24 23 22 21 20 19 19 18 | | 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 25 25 25 25 25 24 24 23 22 21 20 19 19 | | 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 26 26 26 26 25 25 24 23 22 21 20 19 | ----------------------------------------------------------------------------------------------------------- and ----------------------------------------------------------------------- | - L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L L | | U D D L L L L L D L L L L L D L L D L D D L D L L L L L L L D D L L D | | U D D L L L L L D L L L L L D L L D L D D L D L L L L L L L D D L L D | | U U U D L L D L L L L L L D L L D L L L L L L L L L L D D L L L L D L | | U U U U D L L D L L L L L L L D L L L L L D L L L L L L L L L L D L L | | U U U U U D L L L D D D D L L L L L D L L L L D D D D L L D L L L L L | | U U U D U U D L L L L L L D L L D L L L L L L L L L L D D L L L L D L | | U U U U D U U D L L L L L L L D L L L L L D L L L L L L L L L L D L L | | U U U U U D U U D D D D D L L L L L D L L L L D D D D L L D L L L L L | | U D D U U U U U D D D D D D D L L D L D D L D L L L L L L L D D L L D | | U U U U D U U D U D D D D D D D L L L L L D L L L L L L L L L L D L L | | U U U U U D U U U D D D D L L U D D D L L L L D D D D L L D L L L L L | | U U U U U D U U U D D D D D D D D D D D D D D D D D D L L D L L L L L | | U U U D U U D U U U U D D D L L D L D D D D D D D D D D D L L L L D L | | U D D U U U U U D U U D D D D L L D L D D L D L L D D D D D D D L L D | | U D D U U U U U D U U D D D D D D D L D D D D L L L L L L D D D L L D | | U U U U D U U D U U U D D D U D D D D D D D L D D D D D D D D D D L L | | U U U D U U D U U U U D D D U U D L L D D D D D D D D D D L L L U D L | | U U U U U D U U U D D D D U D U U D D L L L L D D D D L D D L L L U D | | U D D U U U U U D U U U D D D U U D D D D L D L D D D D D D D D L L D | | U D D U U U U U D U U U D D D D U D D D D L D L L L L L L L D D L L D | | U U U U D U U D U U U U D D U D U U D D U D L L L L L L L L L U D L L | | U D D U U U U U D U U U D D D U D D D D D U D L L L L L L L D D U D D | | U U U U U D U U U D D D D D U U D U D U D U U D D D D L L D L L L L D | | U U U U U D U U U D D D D D U U D U D D D U U D D D D L L D L L L L L | | U U U D U U D U U U U U U D D U D U U D D U U U U D D D D L L L L D L | | U U U U U D U U U D D D D U D U U D D D D U U D D D D L D D L L L L L | | U U U D U U D U U U U U U D D D D D U D D U U U U U D D D L D D D D L | | U U U U D U U D U U U U U U D D U D U D D D U U U U D U D D D D D L D | | U U U D U U D U U U U U U D D U D D U D D U U U U U D D D D D D D D L | | U U U U D U U D U U U U U U D D U D U D D D U U U U D U U D D D D U D | | U U U U D U U D U U U U U U D D U D D D D D U U U U D U U D D D D D D | | U D D U U U U U D U U U U U D U U D D D D U D U U U D U U D D D L D D | | U U U D U U D U U U U U U D U D D U D U D D U U U U D D D D U D D D L | | U U U U D U U D U U U U U U U D U U D D D D U U U U D U U D U D D D D | | U D D U U U U U D U U U U U D U D D D D D U D U U U D U U D D D U D D | | U U U U D U U D U U U U U U U D D U D U D D U D U U D U U D U U D D D | | U U U D U U D U U U U U U D U U D U D U D U D D U U D D D D U U U D L | | U D D U U U U U D U U U U U D U U D D D D U D D U U D U U D D D U U D | | U U U D U U D U U U U U U D U U D U D U U D U D U U D D D D U U U D U | | U U U U D U U D U U U U U U U D U U D D U D U D D U D U U D U U D U U | | U D D U U U U U D U U U U U D U U D D D D U D D D U D U U D D D U U D | | U U U U U D U U U D D D D U U U U U D U D U U D D D D U U D U U U U U | | U D D U U U U U D U U U U U D U U D U D D U D U D D U D U U D D U U D | | U D D U U U U U D U U U U U D U U D U D D D D U D D U D U U D D U U D | | U U U U D U U D U U U U U U U D U U U U U D U U D D D D U U U U D U U | | U U U D U U D U U U U U U D U U D U U U U U D U D D D D D U U U U D U | | U D D U U U U U D U U U U U D U U D U D D U D D D D D U D U D D U U D | ----------------------------------------------------------------------- respectively. Again the backtrace used to construct an optimal s→t edit transcript is highlighted. The results from the algorithm are: Edit Distance: 19 // substitution cost = 1 s: CCGATGATCATTGCCAGTCCACTTGTGAGAACGACAGCGACTCCAGC s > t transcript: MMMMMMMDMSMMSSMMMIMMMMMMMDMDDDDDDDDDSMDMDDMMMMMM t > s transcript: MMMMMMMIMSMMSSMMMDMMMMMMMIMIIIIIIIIISMIMIIMMMMMM t: CCGATGACTTTTGCAGCTCCACTTTTGGTCCAGC optimal alignment: s = CCGATGATCATTGCCAG-TCCACTTGTGAGAACGACAGCGACTCCAGC ||||||| | || ||| ||||||| | | | |||||| t = CCGATGA-CTTTTGCAGCTCCACTT-T---------TG-G--TCCAGC Proposition 3. The Edit Distance calculator ED(m,n) has runtime Θ(m×n) and runspace +Θ(m×n). The backtrace subroutine has runtime Θ(m+n) and operates in +Θ(m×n) space. Exercise 2. Show that:
Exercise 3. Locate the directory LIB/notes_support in your course library and copy/execute the demonstration program: ED_i.x # Edit Distance (As usual, the "_i" indicates Linux/Intel executable. Entering the command without arguments prints some documentation to screen.) Using the algorithms developed above, devise and implement an ED calculator. Behavior should be indentical to that of ED_i.x. ∎ |