**PARALLEL PREFIX**

It is abundantly clear, that there exist dependencies between the calculations of cells. At a time, only one cell can be calculated. This poses severe implications on the time and space constraints when we try to speed up the process. Parallel prefix is a remedy to this problem.

Using parallel prefix we reduce the dependencies between the cells and calculate the scores parallel. In fact, we are able to calculate the cells of an entire row in parallel. This allows us to optimize the time as well as space.

The algorithm is optimized in time and is implemented to compare two 8 sequences, each 8 characters long. The time taken for prefix computation is log2(8) = 3 clock cycles. It is a form of data parallel programming

**PARALLEL PREFIX FORMULAE**

Given n inputs (a0,a1,....an) and a binary operator (ex maximum of 2 inputs). applying the binary operator in parallel prefix operation to the n inputs gives n outputs which are (a0, max(a0,a1), max(a0,a1,a2), ...max(a0,a1,a2,...an)). This operation is done in log2(n) steps.

You can find details on the method here - http://en.wikipedia.org/wiki/Prefix_sum

**BASIC ALGORITHM**

Let A = a1, a2, a3,.... am and B = b1, b2, b3,...... bn be the two sequences that are to be compared. The aim of this algorithm is to generate a table T of size(m+1)x(n+1) such that T[i,j] contains the optimal scoring for alignment a1,a2,....,ai and b1,b2,...bj.

Consider a simple scoring function

f (a,b) = 1 if a=b,

= 0 if a≠b

= -1 if a='-' or c='-'

Using this the scoring matrix for a gap alignment, match , mismatch can be calculated

T[0,0] is aligning two empty strings.

Hence T[0,0]=0.

T[0,j] = Σ f('-',bk) for k=1 to j

T[i,0] = Σ f(ak,'-') for k=1 to I

= T[i-1,0] + f(ai,'-')

When we align the non empty strings a1, a2,... ai and b1, b2,..... bi three possibilities arise

1. ai aligned with a gap

2. bj aligned with a gap

3. ai aligned with bj

Thus the table entry is given by

T[i,j] = max ((T[i-1,j] + f(ai , '-')), (T[i,j-1] + f('-',bj)) , (T[i-1,j-1] + f(ai , bj)))

The table matrix can be filled row by row or column by column or diagonal by diagonal. If we fill the matrix row by row using dynamic programming the time taken is of order O(mn). T[m,n] gives the optimal score once the table gets completed. Then traceback procedure is followed to get the optimal alignment of the two sequences.

Parallel prefix algorithms are used to compute the table values one row or column at a time. The distribution of work is uniform to all the processors at all times. There is also a reduction in number of table entries communicated per iteration.

Suppose the ith row is being computed using the values from the (i-1)th row which is previously calculated.

( T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) are pre-computed using the values in the previous row.

After separating the pre-computed terms,

w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))

These values are calculated once the previous row table values are known.

Thus

T[i,j] = max ( w[j] , (T[i,j-1] + f('-',bj)))

Assume

x[j] = T[i,j] - Σ f('-',bk) for k=1 to j

= max( (w[j] - Σ f('-',bk) for k=1 to j), (T[i,j-1] - Σ f('-',bk) for k=1 to j))

= max( (w[j] - Σ f('-',bk) for k=1 to j) , x[j-1])

y[j] = Σ f('-',bk) for k=1 to j (already known – these are the gap penalties (row 0 of the table))

= gj if the gap penalties are considered to be a constant (-1)

z[j] = w[j] – y[j] (already known w[j] is precomputed and y[j] is known)

Therefore

x[j] = max((w[j]+gj), (x[j-1]) and

T[i,j] = x[j] – g[j]

Consider x[j]

x[j] = max((w[j]+gj), (x[j-1])

= max((w[j]+gj), max((w[j-1]+g[j-1]), x[j-2])

Max is an associative operator

max(a,max(b,c)) = max(a,b,c) hence

x[j] = max((w[j]+gj), (w[j-1]+g[j-1]), x[j-2])

= max((w[j]+gj), (w[j-1]+g[j-1]), (w[j-2]+g[j-2]),.....(w[0]+g[0]))

= max(z[j] ,z[j-1], z[j-2],.....,z[0])

The w[k] and g[k] for k=j to 0 are pre-calculated with the values in the previous row of the table. These values are already known. Putting these values into the parallel prefix algorithm will give the necessary values of x[k] for k = j to 0

Once x[k] for k = j to 0 is known, the table values are obtained by

T[i,j] = x[j] – g[j] for a row I.

In the next post I describe the parallel prefix methodology

## No comments:

## Post a Comment