Poly SPARC

Posts

Showing posts from February, 2012

Biological Sequence Alignment using Parallel Prefix Algorithm implemented using FPGA

February 24, 2012

Project Documentation Introduction Important Terms Needleman-Wunsch Algorithm Parallel Prefix Algorithm Parallel Prefix Methodology VHDL Implementation Data Path Salient Features Model Calculations Download source code Hope this helps you as much as you have helped me. Abishek

Model Calculations and Proof of Concept Simulations - Biological Sequence Alignment

February 24, 2012

MODEL CALCULATIONS The example under consideration is The Row 0 and Column 0 are assumed to be known. Calculating Row 1 w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1)) = 0 w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2)) = -1 w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3)) = -2 w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4)) = -3 w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5)) = -6 w[6] = max((T[0,6] – 1), (T(0,5) + f(a1, b6)) = -5 w[7] = max((T[0,7] – 1), (T(0,6) + f(a1 , b7)) = -6 w[8] = max((T[0,8] – 1), (T(0,7) + f(a1, b8)) = -7 z[j] = w[j] + y[j] z[1] = w[1] -y[1] = w[1] +1 = 0 z[2] = w[2] -y[2] = w[2] +2 = 0 z[3] = w[3] -y[3] = w[3] +3 = 0 z[4] = w[4] -y[4] = w[4] +4 = 0 z[5] = w[5] -y[5] = w[5] +5 = 0 z[6] = w[6] -y[6] = w[6] +6 = 0 z[7] = w[7] -y[7] = w[7] +7 = 0 z[8] = w[8] -y[8] = w[8] +8 = 0 (Compare with x_reg in the output) T[1,1] = x[1] + g[1] = x[1] - 1 = -1 T[1,2] = x[2] + g[2] = x[2]...

Salient Features - Biological Sequence alignment

February 24, 2012

SALIENT FEATURES OF THE PROGRAM Finite state machine cycles through 4 states s0, s1, s2, s3. The states are indicated by a 2 bit counter (counter_2bit) which increments every clock pulse. s0 = 00 s1 = 01 s2 = 10 s3 = 11 One cycle is one pass through the four states of the finite state machine. For a particular row i, consider the scoring matrix RAM (w_ram). Assume the scores for the particular row are already available. Address to the RAM is generated by a 3 bit counter. Every s0, the address to the RAM is incremented. At s1 the read enable of the ram is made high. At s2, the scores for the next row are available in the outputs of the ram. Thus during the calculation of table entries for a particular row, the scores required for the next row are fetched from the memory and are available. While calculating the table entries of a particular row, the table entries of the previous row are already available in the s3 stage of the previous cycle. The parallel prefix module gets the z inp...

Data Path - Biological Sequence Alignment

February 24, 2012

The control logic implementing the sequence alignment algorithm is described here. DATAPATH It is assumed that the scores of alignment are already present in a scoring matrix. The scoring matrix is implemented using a RAM memory. The logic for calculating scores is not implemented. It can be interfaced with the memory so as to write into it. It is designed to give the values of f(ai , bj) for a particular i (row) and all j (columns). In a single read operation it ouputs 8 bytes of data. The scores along with the previous table entries are used to calculate all the w[j] values in parallel. This is done by the w_calculation module. This module receives 8 bytes from the scoring matrices corresponding to f(ai , bj) for the respective row and 8 bytes from the previously calculated row from the table namely T(i-1,j) for all values of j. It calculates the following. w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) z[j] = w[j] – y[j] for all 8 values of j. ...

VHDL Implementation - Biological Sequence Alignment

February 24, 2012

This post describes the VHDL implementation details and abstractions. Kindly look at the post on parallel prefix methodology to understand the context under which these VHDL modules work NODE The basic functional unit of the parallel prefix method is a node. It has two modes of operation namely operation and duplication. The node performs the binary operation on the two inputs given to it when it operates in the operation mode. When in the duplication mode, the output is the first input to the node. It is a combinational circuit. Its VHDL representation is as follows. ------------------------------------------------------------------------------- -- Parallel Prefix : Node .vhd -- Abishek Ramdas -- NYU Poly -- date - 11/26/10 ------------------------------------------------------------------------------- library ieee; use ieee.numeric_std.all; use ieee.std_logic_1164.all; use ieee.std_logic_signed.all; entity node is port ( x_input: in std_lo...

Parallel Prefix Methodology - Biological Sequence Alignment

February 24, 2012

This post is in continuation with the previous post which describes parallel prefix algorithm in detail. Have a look at that post before reading through here. This is a sample implementation example of the parallel prefix algorithm (parallel prefix- click to enlarge) PARALLEL PREFIX METHODOLOGY Consider two sequences each 8 words long. (PELICANS, COELACAN). The table to be constructed is of size [9x9]. Row 0 is assumed to have the row gap penalties and column 0 is assumed to have column gap penalties. To calculate the table entries for row 1, t[1,j], j varies from 0 to 8. 1. w[j] is defined as w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) Thus for row 1 w[0] = max((T[0,0] – 1), (T(0,-1) + f(a1 , b0)) = T[0,0]-1 w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1)) w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2)) w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3)) w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4)) w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5)) w...

Parallel Prefix Algorithm - Biological Sequence Alignment

February 24, 2012

PARALLEL PREFIX It is abundantly clear, that there exist dependencies between the calculations of cells. At a time, only one cell can be calculated. This poses severe implications on the time and space constraints when we try to speed up the process. Parallel prefix is a remedy to this problem. Using parallel prefix we reduce the dependencies between the cells and calculate the scores parallel. In fact, we are able to calculate the cells of an entire row in parallel. This allows us to optimize the time as well as space. The algorithm is optimized in time and is implemented to compare two 8 sequences, each 8 characters long. The time taken for prefix computation is log2(8) = 3 clock cycles. It is a form of data parallel programming PARALLEL PREFIX FORMULAE Given n inputs (a0,a1,....an) and a binary operator (ex maximum of 2 inputs). applying the binary operator in parallel prefix operation to the n inputs gives n outputs which are (a0, max(a0,a1), max(a0,a1,a2), ...max(a0,a1,a2,...

Biological Sequence Alignment - Needleman-Wunsch Algorithm

February 24, 2012

The Needleman-Wunsch algorithm is discussed here before going into the details of implementation. NEEDLEMAN-WUNSCH ALGORITHM This Algorithm is a Dynamic Programming based algorithm. It finds the best, overall character to character match score between the two sequences. For this, an Alignment Matrix is used to calculate the best overall, character to character match score between the 2 sequences. The result of the algorithm is a singular score which reflects the degree of goodness of the match. A trace-back process is then used to retrieve the actual alignment. PROPERTIES OF ALIGNMENT MATRIX 1. Alignment takes place in a two-dimensional matrix. 2. Gap characters can never be paired to each other. The matrix is an (m+1) x (n+1) matrix, where ‘m’ and ‘n’ are the lengths of the two sequences. 3. Each cell corresponds to a pairing of one letter from each sequence. 4. Alignment starts at the upper left and follows a mostly diagonal path down and to the right. 5. Each cell of t...

Biological Sequence Alignment - Important Terms

February 24, 2012

I am introducing a few terms that are important in terms of understanding biological sequence alignment. I do not have a background in biology and understanding these terms are important when implementing the Needle-Wunsch algorithm. WHAT IS BLAST? BLAST is an acronym for Basic Local Alignment Search Tool. Despite the adjective "Basic" in its name, BLAST is a sophisticated software package that has become the single most important piece of software in the field of bioinformatics. There are several reasons for this. First, sequence similarity is a powerful tool for identifying the unknowns in the sequence world. Second, BLAST is fast. The sequence world is big and growing rapidly, so speed is important. Third, BLAST is reliable, from both a rigorous statistical standpoint and a software development point of view. Fourth, BLAST is flexible and can be adapted to many sequence analysis scenarios. Finally, BLAST is entrenched in the bioinformatics culture to the extent that...

FPGA Implementation of Biological Sequence Alignment Algorithm - Introduction

February 24, 2012

Biological Sequence Alignment Implemented using FPGA - Needleman_Wunsch Algorithm Biological sequences like DNA and proteins show complex patterns of similarity to one another. In this regard, they mirror the external morphologies of the organisms in which they reside. You'll notice that birds, for example, show natural groupings. You don't have to be a biologist to see that ducks, geese, and swans comprise a reasonably natural group called the waterfowl, and that the similarities between ducks and geese seem too great to explain by mere coincidence. Biological sequences are no different. After all, the reason why ducks look like ducks and geese look like geese is because of their genes. Many molecular biologists are convinced that understanding sequence evolution is tantamount to understanding evolution itself. Genetic sequences change over time due to three forces: mutation, natural selection, and genetic drift. By studying the similarity or dissimilarity between the sequ...