Posts

Showing posts from February, 2012

Biological Sequence Alignment using Parallel Prefix Algorithm implemented using FPGA

Project Documentation Introduction Important Terms Needleman-Wunsch Algorithm Parallel Prefix Algorithm Parallel Prefix Methodology VHDL Implementation Data Path Salient Features Model Calculations Download source code Hope this helps you as much as you have helped me. Abishek

Model Calculations and Proof of Concept Simulations - Biological Sequence Alignment

Image
MODEL CALCULATIONS The example under consideration is   The Row 0 and Column 0 are assumed to be known. Calculating Row 1 w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1)) = 0 w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2)) = -1 w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3)) = -2 w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4)) = -3 w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5)) = -6 w[6] = max((T[0,6] – 1), (T(0,5) + f(a1, b6)) = -5 w[7] = max((T[0,7] – 1), (T(0,6) + f(a1 , b7)) = -6 w[8] = max((T[0,8] – 1), (T(0,7) + f(a1, b8)) = -7 z[j] = w[j] + y[j] z[1] = w[1] -y[1] = w[1] +1 = 0 z[2] = w[2] -y[2] = w[2] +2 = 0 z[3] = w[3] -y[3] = w[3] +3 = 0 z[4] = w[4] -y[4] = w[4] +4 = 0 z[5] = w[5] -y[5] = w[5] +5 = 0 z[6] = w[6] -y[6] = w[6] +6 = 0 z[7] = w[7] -y[7] = w[7] +7 = 0 z[8] = w[8] -y[8] = w[8] +8 = 0 (Compare with x_reg in the output) T[1,1] = x[1] + g[1] = x[1] - 1 = -1 T[1,2] = x[2] + g[2] = x[2]

Salient Features - Biological Sequence alignment

SALIENT FEATURES OF THE PROGRAM Finite state machine cycles through 4 states s0, s1, s2, s3. The states are indicated by a 2 bit counter (counter_2bit) which increments every clock pulse. s0 = 00 s1 = 01 s2 = 10 s3 = 11 One cycle is one pass through the four states of the finite state machine. For a particular row i, consider the scoring matrix RAM (w_ram). Assume the scores for the particular row are already available. Address to the RAM is generated by a 3 bit counter. Every s0, the address to the RAM is incremented. At s1 the read enable of the ram is made high. At s2, the scores for the next row are available in the outputs of the ram. Thus during the calculation of table entries for a particular row, the scores required for the next row are fetched from the memory and are available. While calculating the table entries of a particular row, the table entries of the previous row are already available in the s3 stage of the previous cycle. The parallel prefix module gets the z inp

Data Path - Biological Sequence Alignment

Image
The control logic implementing the sequence alignment algorithm is described here.  DATAPATH It is assumed that the scores of alignment are already present in a scoring matrix. The scoring matrix is implemented using a RAM memory. The logic for calculating scores is not implemented. It can be interfaced with the memory so as to write into it. It is designed to give the values of f(ai , bj) for a particular i (row) and all j (columns). In a single read operation it ouputs 8 bytes of data. The scores along with the previous table entries are used to calculate all the w[j] values in parallel. This is done by the w_calculation module. This module receives 8 bytes from the scoring matrices corresponding to f(ai , bj) for the respective row and 8 bytes from the previously calculated row from the table namely T(i-1,j) for all values of j. It calculates the following. w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) z[j] = w[j] – y[j] for all 8 values of j.

VHDL Implementation - Biological Sequence Alignment

Image
This post describes the VHDL implementation details and abstractions. Kindly look at the post on parallel prefix methodology to understand the context under which these VHDL modules work NODE The basic functional unit of the parallel prefix method is a node. It has two modes of operation namely operation and duplication. The node performs the binary operation on the two inputs given to it when it operates in the operation mode. When in the duplication mode, the output is the first input to the node. It is a combinational circuit. Its VHDL representation is as follows. ------------------------------------------------------------------------------- -- Parallel Prefix : Node .vhd -- Abishek Ramdas -- NYU Poly -- date - 11/26/10 ------------------------------------------------------------------------------- library ieee; use ieee.numeric_std.all; use ieee.std_logic_1164.all; use ieee.std_logic_signed.all; entity node is port ( x_input: in std_lo

Parallel Prefix Methodology - Biological Sequence Alignment

Image
This post is in continuation with the previous post which describes parallel prefix algorithm in detail. Have a look at that post before reading through here. This is a sample implementation example of the parallel prefix algorithm (parallel prefix- click to enlarge) PARALLEL PREFIX METHODOLOGY Consider two sequences each 8 words long. (PELICANS, COELACAN). The table to be constructed is of size [9x9]. Row 0 is assumed to have the row gap penalties and column 0 is assumed to have column gap penalties. To calculate the table entries for row 1, t[1,j], j varies from 0 to 8. 1. w[j] is defined as w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) Thus for row 1 w[0] = max((T[0,0] – 1), (T(0,-1) + f(a1 , b0)) = T[0,0]-1 w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1)) w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2)) w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3)) w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4)) w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5)) w

Parallel Prefix Algorithm - Biological Sequence Alignment

PARALLEL PREFIX It is abundantly clear, that there exist dependencies between the calculations of cells. At a time, only one cell can be calculated. This poses severe implications on the time and space constraints when we try to speed up the process. Parallel prefix is a remedy to this problem. Using parallel prefix we reduce the dependencies between the cells and calculate the scores parallel. In fact, we are able to calculate the cells of an entire row in parallel. This allows us to optimize the time as well as space. The algorithm is optimized in time and is implemented to compare two 8 sequences, each 8 characters long. The time taken for prefix computation is log2(8) = 3 clock cycles. It is a form of data parallel programming PARALLEL PREFIX FORMULAE Given n inputs (a0,a1,....an) and a binary operator (ex maximum of 2 inputs). applying the binary operator in parallel prefix operation to the n inputs gives n outputs which are (a0, max(a0,a1), max(a0,a1,a2), ...max(a0,a1,a2,

Biological Sequence Alignment - Needleman-Wunsch Algorithm

The Needleman-Wunsch algorithm is discussed here before going into the details of implementation. NEEDLEMAN-WUNSCH ALGORITHM This Algorithm is a Dynamic Programming based algorithm. It finds the best, overall character to character match score between the two sequences. For this, an Alignment Matrix is used to calculate the best overall, character to character match score between the 2 sequences. The result of the algorithm is a singular score which reflects the degree of goodness of the match. A trace-back process is then used to retrieve the actual alignment. PROPERTIES OF ALIGNMENT MATRIX 1. Alignment takes place in a two-dimensional matrix. 2. Gap characters can never be paired to each other. The matrix is an (m+1) x (n+1) matrix, where ‘m’ and ‘n’ are the lengths of the two sequences. 3. Each cell corresponds to a pairing of one letter from each sequence. 4. Alignment starts at the upper left and follows a mostly diagonal path down and to the right. 5. Each cell of t

Biological Sequence Alignment - Important Terms

I am introducing a few terms that are important in terms of understanding biological sequence alignment. I do not have a background in biology and understanding these terms are important when implementing the Needle-Wunsch algorithm. WHAT IS BLAST? BLAST is an acronym for Basic Local Alignment Search Tool. Despite the adjective "Basic" in its name, BLAST is a sophisticated software package that has become the single most important piece of software in the field of bioinformatics. There are several reasons for this. First, sequence similarity is a powerful tool for identifying the unknowns in the sequence world. Second, BLAST is fast. The sequence world is big and growing rapidly, so speed is important. Third, BLAST is reliable, from both a rigorous statistical standpoint and a software development point of view. Fourth, BLAST is flexible and can be adapted to many sequence analysis scenarios. Finally, BLAST is entrenched in the bioinformatics culture to the extent that

FPGA Implementation of Biological Sequence Alignment Algorithm - Introduction

Biological Sequence Alignment Implemented using FPGA - Needleman_Wunsch Algorithm Biological sequences like DNA and proteins show complex patterns of similarity to one another. In this regard, they mirror the external morphologies of the organisms in which they reside. You'll notice that birds, for example, show natural groupings. You don't have to be a biologist to see that ducks, geese, and swans comprise a reasonably natural group called the waterfowl, and that the similarities between ducks and geese seem too great to explain by mere coincidence. Biological sequences are no different. After all, the reason why ducks look like ducks and geese look like geese is because of their genes. Many molecular biologists are convinced that understanding sequence evolution is tantamount to understanding evolution itself. Genetic sequences change over time due to three forces: mutation, natural selection, and genetic drift. By studying the similarity or dissimilarity between the sequ

How to post source code on blogger

http://codeformatter.blogspot.com/2009/06/about-code-formatter.html Paste your code in the box provided and generate the HTML. Copy the HTML code generated and paste it in your post. It worked :) Abishek

Sparse matrix - 3 tuple representation

//program to 3-tuple representation #include<stdio.h> #include<unistd.h> #define SIZE 3000; void main() { int a[5][5],row,columns,i,j; printf("Enter the order of the matrix(5*5)"); scanf("%d %d",&row,&columns); printf("Enter the element of the matrix\n"); for(i=0;i<row;i++) for(j=0;j<columns;j++) { scanf("%d", &a[i][j]); } printf("3-tuple representation"); for(i=0;i<row;i++) for(j=0;j<columns;j++) { if(a[i][j]!=0) { printf("%d %d %d", (i+1),(j+1),a[i][j]); } } getchar(); }

Cache Blocking in C

/*:) cache_blk_mat_mult :) have fun */ // header files #include<unistd.h> #include<stdio.h> int min(int, int); //main func int main(int argc, char *argv[]) { int jj,kk,j,k,i; int B = 5, N = 10,r; int x[N][N],y[N][N],z[N][N]; for(i=0;i<N;i++) for(j=0;j<N;j++) { printf("%d%d ",i,j); x[i][j]=5; y[i][j]=6; z[i][j]=7; } for(jj=0;jj<N;jj=jj+B) { printf("jj = %d\n",jj); for(kk = 0;kk<N;kk=kk+B) { printf("kk = %d\n",kk); for(i=0;i<N;i++) { printf("i = %d\n",i); for (j = jj; j < min(jj+B-1,N); j++) { r = 0; for (k = kk; k < min(kk+B-1,N); k++) { printf("r = %d + y[%d][%d]*z[%d][%d]\n",r,i,k,k,j); r = r + y[i][k]*z[k][j]; } x[i][j] = x[

P Threads in C

#include<pthread.h> #include<stdio.h> #include<stdlib.h> #define NUM_THREADS 5 int abi = 100; void *PrintHello(void *threadid) { long tid; long tid2; tid = (long)threadid; tid2 = tid; printf("Hello World! its me thread # %ld\n",tid); printf("incrementing tid2 in thread # %ld\n%ld\n",tid,tid2*2); pthread_exit(NULL); } int main(int argc, char *argv[]) { pthread_t threads[NUM_THREADS];//array of pthreads int rc; long t; abi++; for(t=0;t<NUM_THREADS;t++) { printf("In main: creating thread %ld\n",t); rc = pthread_create(&threads[t],NULL,PrintHello, (void *)t); if(rc) { printf("Error code returned from code pthread_create() is %d\n",rc); exit(-1); } } pthread_exit(NULL); }

System function in C

System function in C – Used to run shell commands inside a C program. /*:) system :) have fun */ // header files #include<unistd.h> #include<stdio.h> #include<stdlib.h> //main func int main(int argc ,char *argv[]) { printf("running ps with system() function\n"); system("ps -ax &"); printf("done"); return 0; }

Reading a file using fread

/*:) fread_fileV1 :) have fun */ // header files #include<unistd.h> #include<stdio.h> //main func int main(int argc, char *argv[]) { FILE *fp; char buff[11]; int pid; fp=fopen("baby","r"); pid=fork(); if(pid==0) { printf("Initial child pointer is %L\n",ftell(fp)); fread(buff,sizeof(buff),1,fp); buff[10]='\0'; printf("child read is %s\n",buff); printf("the child pointer after reading is %l\n",ftell(fp)); printf("the child is exiting \n"); } else { wait(0); printf("initial parent pointer is %l\n",ftell(fp)); fread(buff,sizeof(buff),1,fp); buff[10]='\0'; printf("parent read is %s\n",buff); printf("the parent poitner after read is %l\n",ftell(fp)); printf("parent exiting"); } return 0; }

File handler

files are the means to communicate between processes files are shared between them. when a read is made in one of the processes it moves the file posiion handler.here the read in the child process moves the file position pointer ten positions. when the read is made from the parent process the file is read from the eleventh position hence different things are displayed in the screen. // header files #include<unistd.h> #include<stdio.h> #include<fcntl.h> //main func int main(int argc, char *argv[]) { int fp,pid; char buff[11]; fp=open("baby",O_RDONLY); pid = fork(); if(pid ==0) { printf("child begins\n"); read(fp,buff,10); buff[10]='\0'; printf("child read\n"); puts(buff); printf("the file handle is :%d\n",lseek(fp,0,1)); printf("child ends\n");

Fork - Program to fork a child process 2

/*:) fork_test :) have fun */ // header files #include<unistd.h> #include<stdio.h> #include<sys/types.h> //main func int main(int argc, char *argv[]) { pid_t new_pid; char *message; int n; printf("fork program starting"); new_pid = fork(); switch(new_pid) { case -1: printf("fork failed"); exit(1); case 0: message = "this is child"; n=5; break; default: message = "this is parent"; n=3; break; } for(;n>0;n--) { puts(message); sleep(1); } exit(0); }

Fork - Program to fork a child process

/*:) forktestV1 :) have fun 1. the variable i is declared common to both the parent as well as the child process. but parent and child each get their own version of the variable with the samme name.incrementing the variable in the child does not increment the variabble in the parent. 2. also if a pointer is used the value is not incremented. parent and child both gets a pointer to a variable. 3. when declared globally also does not increment the variable. each process gets a global variable. 4. printing the addres in child and parent gives the same value for the address 5. how it works is the child process executes and after its tiem slice it is moved to the swap space. the parent then executes and after its time slice it is moved to the swap space. the address of the variable remains the same. */ // header files #include<unistd.h> #include<stdio.h> //main func int i=10; int main(int argc, char *argv[]) { int pid; pi

Program to automate pre program writing process

When writing a program, usually we need to create a folder and create a .C file inside the folder. In the .C file, header files need to be added. This program automates this and opens the .C file in Emacs for writing the program. /*usage: ./prog <directory name> <file name w/o extension.c> location: mybin need: to create a directory and program file to write programs. it creates the directory in the default progs folder at a single stroke. this program also writes a default code within the file creats a dir <directory name> and inside it creates a file <filename> no need to add extension .c as it automatically appends it */ #include<unistd.h> #include<sys/types.h> #include<sys/stat.h> #include<error.h> #include<errno.h> #include<dirent.h> #include<stdio.h> #include<stdlib.h> #include<wait.h> int main(int argc, char *argv[]) { int flag; DIR *dp; FILE

Core Dump - Check for core dump

/* core_dumpV1 */ // header files #include<unistd.h> #include<stdio.h> //main func void main(int argc,char *argv[]) { int i,pid,exitstat,status,j=0; pid=fork(); if(pid==0) { i=10/j; } else { wait(&status); if(status&80) printf("our core dumped\n"); else printf("core dumped my foot\n"); } }

Syncing Processes 2

Process Synchronization method 2 /* proc_syncV2 */ // header files #include<unistd.h> #include<stdio.h> int main() { int i,pid,exitstat,status; pid=fork(); if(pid==0) { sleep(10); exit(3); } else { wait(&status); printf("status returned by the process is %d\n",status); printf("status & 0xff = %d\n",status&0xff); if(status&0xff!=0) { printf("signal is interrupted\n"); } else { exitstat=(int)status/256; printf("Exit status from %d was %d\n",pid,exitstat); } } return 0; }

Syncing processes

Syncing Processes – When running two or more processes, synchronization is important. Two versions of synchronization is available here. /* proc_syncV1 */ // header files #include<unistd.h> #include<stdio.h> //main func int main(int argc, char *argv[]) { int i=0, pid,cpid,dip; printf("ready to fork\n"); pid=fork() if(pid==0) { printf("1st child process id is %d\n",getpid()); printf("first child terminating from memory\n"); } else { dip=fork(); if(dip==0) { printf("2nd child process id is %d\n",getpid()); printf("seoncd child is terminating from memory\n"); } else { cpid=wait(0); printf("child with pid %d died\n",cpid); cpid=wait(0);

Playing a playlist of MP3 files using the mpg player in Linux

/* mpg3V1 */ // header files #include<unistd.h> #include<stdio.h> #include<error.h> #include<errno.h> //main func int main(int argc, char *argv[]) { File *playlist;//file pointer for playlist char filename[256];//getting filename from the playlist pid_t pid_mpg;//the pid of child process for launching mpg player playlist=fopen("/home/abishek/Music/playlist.txt","r");//reads the playlist intp fp "playlist" if(playlist==NULL) { error(1,errno,"cannot open the playlist"); } while(filename!=NULL) { filename=fgets(filename,sizeof(filename),playlist);//gets first filename from playlist if(filename==NULL)//checking end of file { printf("end of file reached"); return 0; } pid_mpg=fork();//branching child process cp->mpg123 pp->wait if(pid_mpg==-1) { error(1,errno,"cant

C programs - playing MP3 files using C

/* snd_testV1 */ // header files #include<unistd.h> #include<stdio.h> #include<error.h> #include<errno.h> //main func int main(int argc, char *argv[]) { FILE *song; FILE *snd_op; char song_data[256]; song=fopen("/home/abishek/Music/PPRY.mp3","r"); snd_op=fopen("/dev/dsp","w"); if(song==NULL) { error(1,errno,"cannot open file"); } if(snd_op==NULL) { error(1,errno,"cannot open player"); } while(song_data!=NULL) { fgets(song_data,sizeof(song_data),song); if(song_data==NULL) { //eof reached return return 0; } fprintf(snd_op,"%s",song_data); } fclose(snd_op); fclose(song); }

C Program - Copy data from a file to another

copy.c SYNTAX: copy file1 file2 file1 - file to be copied file2 - file on to which file1 is to be copied copies file1 to file2 after creating file2. if already existing it rewrites the oldfile with the new file remarks 1. error checking is done 2. if file1 is not present it shows error 3. working for files in the same directory as well as files in other directories 4. copies in blocks of 1024 bytes buffer size is 1024 5. if file2 is not present it creates a file if already present it over writes #include<unistd.h> #include<fcntl.h> #include<stdlib.h> #include<sys/stat.h> #include<stdio.h> #include<error.h> #include<errno.h> int main(int argc, char *argv[]) { char block[1024]; int in, out; size_t rlen,wlen; char *filein; char *fileout; filein=argv[1]; fileout=argv[2]; // printf("%s\n",filein); // printf("%s\n",fileou

C Programs - Hello World

I am uploading source code of few commonly used functions. The programs are written in C and I used gcc to compile and test these programs. I hope you find them useful. As it is always the case, I am starting with hello world #include<stdlib.h> #include<stdio.h> void main() { printf("HEllo World"); }