Longest Common Subsequence
X = <x1, x2, x3, ...,xm>
is a sequence
Z = <z1, z2, z3, ...,zk>
is a subsequence of X if z1 = xi1, z2=
xi2, ...zk = xik
0 < i1 < i2... < ik
£ m
Given sequences X and Y, Z is a common subsequence if Z is a
subsequence to both X and Y. If Z is the longest possible subsequence of
both X and Y then it is the longest common subsequence.
X = < A, G, C, G, T, A, G >
Y = < G, T, C, A, G, A >
a common subsequence is < G, C, A > also < G, C,
G, A > and < G, T, A, A > since no common subsequence of length 5
exists there are 2 LCG's: < G, C, G, A > and < G, T, A, G >.
Finding the LCS:
Could enumerate all subsequences of X (length m) ® 2mand
Y (length n) ® 2n must search for
hits and sort by length. This is an exponential algorithm. LCS has an
optimal substructure property based on prefixes. If X = <x1,
...,xm> then Xi = <x1, ...,xi>
is the ith prefix of X and X0 is empty.
Theorem 16.1 ( Optimal substructure for LCS)
If X = <x1, ...,xm> and if Y =
<y1, ...,yn> are sequences, let Z = <z1,
...,zk> be some LCS of x and y.
- If xm = yn then zk =
xmand Zk-1 is an LCS of Xm-1 and
Yn-1
- If xm ¹ yn
then zk ¹xm
Þ Z is an LCS of Xm-1
and Y
- If xm ¹ yn
then zk ¹ym
Þ Z is an LCS of X and Yn-1
Proof:
- If zk ¹xm then we
could add xm = yn to Z to get an LCS of
length k + 1. By contradiction it must be that zk = xm
= yn .
|zk-1 | = k - 1 and is an LCS
of Xm-1 and Yn-1 . It is an LCS, if not
then $ W CS of Xm-1 and Yn-1
with | W | > k - 1 and so by appending xm =
yn we get a CS of X and Y of length greater than k, a
contradiction.
- If zk ¹xm then z
is a cs of Xm-1 and Y. If $ W
a CS with | W | > k, then W would be a CS of X and Y, a
contradiction
- Proof by reversing x and y
This means that to find the LCS of X and Y:
if xm = yn find LCS of Xm-1 and
Ym-1
If xm ¹ y
- find LCS of Xm-1and Y
- find LCS of X and Yn-1
and take the larger of 'a.' or 'b.'. Thus we start with small problem,
find LCS and grow our solution:
Let c[i,j] = | W |, W is LCS of Xi and Yj
c[i,j] = 0 if i*j = 0
= c[i-1,j-1] + 1, if i*j
> 0 and xi = yj
= max (c[i, j-1].
c[i-1,j]) if i,j > 0 and xi ¹
yj
Could use this to write an exponential algorithm via recursion.
However, there are only Q (m n)
sub-problems, so we use DP. Store c[i,j] and b[i,j] which points to the
optimal sub-problem chosen. c[m,n] contains the length of the LCS and
b[m,n] directions used to build it.
LCS-Length ( X, Y )
- m ¬ length[X]
- n ¬ length[Y]
- for i ¬ 1 to m
- do c[i,0] ¬ 0
- for j ¬ 0 to n
- do c[0,j] ¬ 0
- for i ¬ 1 to m
- do for j ¬ 1 to n
- do if xi = yj
-
then c[i,j] ¬ c[i-1, j-1] + 1
-
b[i,j] ¬ "\"
-
else if c[i - 1, j] ³ c[i, j-1]
-
then c[i,j] ¬ c[i-1, j]
-
b[i,j] ¬ "Ý"
-
else c[i,j] ¬ c[i, j-1]
-
b[i,j] ¬ "¬"
- return c and b
Running Time = O (m n) since each table entry takes O (1)
time.
Note:
"\" means both the same
"Ý" means c[i - 1, j] ³
c[i, j-1]
"¬" means c[i - 1, j] <
c[i, j-1]
The "\" diagonal arrows lengthen the LCS
Print-LCS (b, X, i, j)
- If i = 0 or j = 0
- then return
- if b[i,j] = "\"
- then Print-LCS (b, X, i-1, j-1)
- print
- else if b[i,j] = "Ý"
- then Print-LCS ( b, X, i-1, j)
- else Print-LCS (b, X, i, j-1)
This takes O ( m + n ) since either i or j is decremented in the
recursion.
|