This post is in continuation with the previous post which describes parallel prefix algorithm in detail. Have a look at that post before reading through here. This is a sample implementation example of the parallel prefix algorithm

Consider two sequences each 8 words long. (PELICANS, COELACAN). The table to be constructed is of size [9x9].

Row 0 is assumed to have the row gap penalties and column 0 is assumed to have column gap penalties. To calculate the table entries for row 1, t[1,j], j varies from 0 to 8.

1. w[j] is defined as

w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))

Thus for row 1

w[0] = max((T[0,0] – 1), (T(0,-1) + f(a1 , b0)) = T[0,0]-1

w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1))

w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2))

w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3))

w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4))

w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5))

w[6] = max((T[0,6] – 1), (T(0,5) + f(a1, b6))

w[7] = max((T[0,7] – 1), (T(0,6) + f(a1 , b7))

w[8] = max((T[0,8] – 1), (T(0,7) + f(a1, b8))

It is to be noted that if f(ai , bj) is obtained from the scoring matrix and previous row values are available the w values for the next row can be calculated easily.

2. Calculate Z values for row 1

z[j] = w[j] – y[j] (j varies from 0 to 8)

Assuming constant gap penalties of -1

z[0] = w[0] -y[0] = w[0] +0

z[1] = w[1] -y[1] = w[1] +1

z[2] = w[2] -y[2] = w[2] +2

z[3] = w[3] -y[3] = w[3] +3

z[4] = w[4] -y[4] = w[4] +4

z[5] = w[5] -y[5] = w[5] +5

z[6] = w[6] -y[6] = w[6] +6

z[7] = w[7] -y[7] = w[7] +7

z[8] = w[8] -y[8] = w[8] +8

Z[j], j varies from 0 to 8 are known.

3. Calculating x[j] j varies from 0 to 8

x[j] = max( z[0] , z[1], ...z[j])

x[0] = z[0] (no need for prefix calculation, column 0 values)

x[1] = max(z[0], z[1])

x[2] = max(z[0], z[1], z[2])

x[3] = max(z[0], z[1], z[2], z[3])

x[4] = max(z[0], z[1], z[2], z[3], z[4])

x[5] = max(z[0], z[1], z[2], z[3], z[4], z[5])

x[6] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6])

x[7] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6], z[7])

x[8] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8])

Giving the values of Z into a 8 byte parallel prefix module with max as the binary operator, we get x[1] to x[8] values in log2(8) = 3 stages. x[0] does not require a prefix calculation to be performed.

Nodes are used to perform the maximum operation. Each node has two inputs and an output. They operate in two modes. Operation mode in which the output is the maximum of two inputs and the duplication mode in which the output is duplicate of one of the inputs. 8 nodes are used in this case and they operate in parallel. The system traverses through each of its stages in one clock pulse and hence the values of x are available in 3 clock cycles.

4. Calculating the table entries T[1,j] for row 1

Once the values of x[j] j varies from 0 to 8 are available, we can calculate the the table entries using the formulae

T[i,j] = x[j] – g[j]

T[1,0] = x[0] + y[0] = x[0] - 0

T[1,1] = x[1] + g[1] = x[1] - 1

T[1,2] = x[2] + g[2] = x[2] - 2

T[1,3] = x[3] + g[3] = x[3] - 3

T[1,4] = x[4] + g[4] = x[4] - 4

T[1,5] = x[5] + g[5] = x[5] - 5

T[1,6] = x[6] + g[6] = x[6] - 6

T[1,7] = x[7] + g[7] = x[7] - 7

T[1,8] = x[8] + g[8] = x[8] - 8

These values of T[i,j] can be used to find the w[j] values that are required for calculating T[2,j]. Thus we observe how the table values are calculated in parallel for each row.

Next post describes the VHDL Implementation of this algorithm.

(parallel prefix- click to enlarge)

**PARALLEL PREFIX METHODOLOGY**Consider two sequences each 8 words long. (PELICANS, COELACAN). The table to be constructed is of size [9x9].

Row 0 is assumed to have the row gap penalties and column 0 is assumed to have column gap penalties. To calculate the table entries for row 1, t[1,j], j varies from 0 to 8.

1. w[j] is defined as

w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))

Thus for row 1

w[0] = max((T[0,0] – 1), (T(0,-1) + f(a1 , b0)) = T[0,0]-1

w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1))

w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2))

w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3))

w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4))

w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5))

w[6] = max((T[0,6] – 1), (T(0,5) + f(a1, b6))

w[7] = max((T[0,7] – 1), (T(0,6) + f(a1 , b7))

w[8] = max((T[0,8] – 1), (T(0,7) + f(a1, b8))

It is to be noted that if f(ai , bj) is obtained from the scoring matrix and previous row values are available the w values for the next row can be calculated easily.

2. Calculate Z values for row 1

z[j] = w[j] – y[j] (j varies from 0 to 8)

Assuming constant gap penalties of -1

z[0] = w[0] -y[0] = w[0] +0

z[1] = w[1] -y[1] = w[1] +1

z[2] = w[2] -y[2] = w[2] +2

z[3] = w[3] -y[3] = w[3] +3

z[4] = w[4] -y[4] = w[4] +4

z[5] = w[5] -y[5] = w[5] +5

z[6] = w[6] -y[6] = w[6] +6

z[7] = w[7] -y[7] = w[7] +7

z[8] = w[8] -y[8] = w[8] +8

Z[j], j varies from 0 to 8 are known.

3. Calculating x[j] j varies from 0 to 8

x[j] = max( z[0] , z[1], ...z[j])

x[0] = z[0] (no need for prefix calculation, column 0 values)

x[1] = max(z[0], z[1])

x[2] = max(z[0], z[1], z[2])

x[3] = max(z[0], z[1], z[2], z[3])

x[4] = max(z[0], z[1], z[2], z[3], z[4])

x[5] = max(z[0], z[1], z[2], z[3], z[4], z[5])

x[6] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6])

x[7] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6], z[7])

x[8] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8])

Giving the values of Z into a 8 byte parallel prefix module with max as the binary operator, we get x[1] to x[8] values in log2(8) = 3 stages. x[0] does not require a prefix calculation to be performed.

Nodes are used to perform the maximum operation. Each node has two inputs and an output. They operate in two modes. Operation mode in which the output is the maximum of two inputs and the duplication mode in which the output is duplicate of one of the inputs. 8 nodes are used in this case and they operate in parallel. The system traverses through each of its stages in one clock pulse and hence the values of x are available in 3 clock cycles.

4. Calculating the table entries T[1,j] for row 1

Once the values of x[j] j varies from 0 to 8 are available, we can calculate the the table entries using the formulae

T[i,j] = x[j] – g[j]

T[1,0] = x[0] + y[0] = x[0] - 0

T[1,1] = x[1] + g[1] = x[1] - 1

T[1,2] = x[2] + g[2] = x[2] - 2

T[1,3] = x[3] + g[3] = x[3] - 3

T[1,4] = x[4] + g[4] = x[4] - 4

T[1,5] = x[5] + g[5] = x[5] - 5

T[1,6] = x[6] + g[6] = x[6] - 6

T[1,7] = x[7] + g[7] = x[7] - 7

T[1,8] = x[8] + g[8] = x[8] - 8

These values of T[i,j] can be used to find the w[j] values that are required for calculating T[2,j]. Thus we observe how the table values are calculated in parallel for each row.

Next post describes the VHDL Implementation of this algorithm.

## No comments:

## Post a Comment