Friday, February 24, 2012

Biological Sequence Alignment using Parallel Prefix Algorithm implemented using FPGA

Uploading files in blogger

It is not possible to upload files in blogger but there are free hosting sites available to let you upload your files and share them easily.

Few of them are mentioned here

This file sharing site is interesting because it offers 5GB upload limit.
http://www.filedropper.com/

Have fun :)
Abishek

VHDL Files download here - Biological Sequence Alignment


DOWNLOAD - VHDL files, test .do files for testing using ModelSIM. Also find the Xilinx Project files (including the bit file) for implmentation in Nexys 2 SPARTAN 3E FPGA board. 



VHDL IMPLEMNTATION - description of the files

The different modules that are used to implement parallel prefix based sequence matching
1. node.vhd
2. RAM_NC.vhd
3. w_calculation.vhd
4. adder_signed_8bit.vhd
5. parallel_prefix.vhd

1. NODE.vhd
This module has already been described earlier. It is the basic block of parallel prefix method. The module is tested with the test_node.do file. The output of this simulation is shown below
Node Test
Note the maximum of x_input and y_input is found and given to x_output when dup_en=0 and x_input is given to x_output when dup_en = 1, signed comparison is done between the two inputs.

2. RAM_NC.vhd
  • Random access memory is used to store scoring matrix
  • table entries
This RAM is defined in RAM_NC.vhd
A RAM of 8x64 bits is defined in the RAM. The inputs are clk, we, en, addr, di and output is do. A read is done when en =1 and an address is given. Read is done in the rising edge of clock. A write can be done by loading a value in the di input of the module and a address in addr input and making high both en and we inputs. Writing into the address is also done in the rising edge of the clock.

3.Adder_signed_8bit.vhd
Inputs are inA, inB, add_en. Outputs are sum, overflow.Signed addition of inputs inA, inB. The inputs are added only when add_en=1 . The sum is given in sum output and an overflow output is also present. The overflow has no use in this module but might be useful later. The test file is test_adder.do

4. w_calculation
This module is used to obtain the values of z[j]s using scoring matrix and the previous table entries. The test file for this module is test_w_calc.do. The simulated results are as shown

Table_entry_prev is used to store the previous table entry. Scoring matrix value is present in the variable scoring_matrix_value.

T[i-1,j] + f(ai , '-') is represented by table_ent_minus1
T[i-1,j-1] + f(ai , bj) is represented by table_entry_plus_score

In this module, maximum of table_entry_minus1 and table_entry_plus_score is found for 8 values. The counter_3bit is used to keep track of the column gap penalty (column 0 of the table). The maximum values are found (w[j] values) and added with y[j] values (row 0 of the table) to get z[j]. The z[j] values are loaded into the output w_next_row when ready signal is made high.

5. Parallel Prefix
Counter_2bit is used as a reference for pipelining. At 00 falling edge, values are loaded into the x_ram for parallel prefix operation. At 00 note the values loaded into the x_ram. As parallel prefix proceeds, the maximum values for each stage is shown by x_ram values for counter_2bit values of 01, 10. At counter_2bit = 11 we note the values in x_ram are arranged as required. The maximum value among the 8 inputs is present in x_ram(7).

(click to enlarge)

Thus concludes the parallel prefix implementation of Biological Sequence Alignment using FPGA. Thank you,

Abishek

Model Calculations and Proof of Concept Simulations - Biological Sequence Alignment

MODEL CALCULATIONS

The example under consideration is


The Row 0 and Column 0 are assumed to be known.

Calculating Row 1
w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))
w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1)) = 0
w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2)) = -1
w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3)) = -2
w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4)) = -3
w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5)) = -6
w[6] = max((T[0,6] – 1), (T(0,5) + f(a1, b6)) = -5
w[7] = max((T[0,7] – 1), (T(0,6) + f(a1 , b7)) = -6
w[8] = max((T[0,8] – 1), (T(0,7) + f(a1, b8)) = -7

z[j] = w[j] + y[j]
z[1] = w[1] -y[1] = w[1] +1 = 0
z[2] = w[2] -y[2] = w[2] +2 = 0
z[3] = w[3] -y[3] = w[3] +3 = 0
z[4] = w[4] -y[4] = w[4] +4 = 0
z[5] = w[5] -y[5] = w[5] +5 = 0
z[6] = w[6] -y[6] = w[6] +6 = 0
z[7] = w[7] -y[7] = w[7] +7 = 0
z[8] = w[8] -y[8] = w[8] +8 = 0

(Compare with x_reg in the output)

T[1,1] = x[1] + g[1] = x[1] - 1 = -1
T[1,2] = x[2] + g[2] = x[2] - 2 = -2
T[1,3] = x[3] + g[3] = x[3] - 3 = -3
T[1,4] = x[4] + g[4] = x[4] - 4 = -4
T[1,5] = x[5] + g[5] = x[5] - 5 = -5
T[1,6] = x[6] + g[6] = x[6] - 6 = -6
T[1,7] = x[7] + g[7] = x[7] - 7 = -7
T[1,8] = x[8] + g[8] = x[8] - 8 = -8

Compare with outputs y0 to y7 in the simulated output. Row 1 values are calculated.

Simulation of Row 1
 Row 1 values can be checked to be correct.(click to enlarge)

Calculation of Row 2

Row 2
w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))

w[1] = max((T[1,1] – 1), (T(1,0) + f(a2, b1)) = max(-2,-2) = -2
w[2] = max((T[1,2] – 1), (T(1,1) + f(a2, b2)) = -2
w[3] = max((T[1,3] – 1), (T(1,2) + f(a2, b3)) = -1 (note: f(a2, b3) = 1, match)
w[4] = max((T[1,4] – 1), (T(1,3) + f(a2, b4)) = -4
w[5] = max((T[1,5] – 1), (T(1,4) + f(a2, b5)) = -5
w[6] = max((T[1,6] – 1), (T(1,5) + f(a2, b6)) = -6
w[7] = max((T[1,7] – 1), (T(1,6) + f(a2, b7)) = -7
w[8] = max((T[1,8] – 1), (T(1,7) + f(a2, b8)) = -8

z[j] = w[j] - y[j]
z[1] = w[1] -y[1] = w[1] + 1 = -1
z[2] = w[2] -y[2] = w[2] + 2 = 0
z[3] = w[3] -y[3] = w[3] + 3 = 2
z[4] = w[4] -y[4] = w[4] + 4 = -4
z[5] = w[5] -y[5] = w[5] + 5 = -5
z[6] = w[6] -y[6] = w[6] + 6 = -6
z[7] = w[7] -y[7] = w[7] + 7 = -7
z[8] = w[8] -y[8] = w[8] + 8 = -8
(Compare with w_x_reg in simulated output)

After parallel prefix

x[1] = max( z[1]) = -1
x[2] = max( z[1], z[2]) = 0
x[3] = max( z[1], z[2], z[3]) = 2
x[4] = max( z[1], z[2], z[3], z[4]) = 2
x[5] = max( z[1], z[2], z[3], z[4], z[5]) = 2
x[6] = max( z[1], z[2], z[3], z[4], z[5], z[6]) = 2
x[7] = max( z[1], z[2], z[3], z[4], z[5], z[6], z[7]) = 2
x[8] = max( z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8]) = 2
(Compare with x_reg in the output)

T[1,1] = x[1] – g[1] = x[1] - 1 = -2
T[1,2] = x[2] – g[2] = x[2] - 2 = -2
T[1,3] = x[3] – g[3] = x[3] - 3 = -1
T[1,4] = x[4] – g[4] = x[4] - 4 = -2
T[1,5] = x[5] – g[5] = x[5] - 5 = -3
T[1,6] = x[6] – g[6] = x[6] - 6 = -4
T[1,7] = x[7] – g[7] = x[7] - 7 = -5
T[1,8] = x[8] – g[8] = x[8] - 8 = -6
Compare with outputs y0 to y7 in the simulated output. Row 2 values are calculated.




































The other values are calculated and documented in this table.










They are in conformance with the values obtained in simulations.


Salient Features - Biological Sequence alignment

SALIENT FEATURES OF THE PROGRAM
Finite state machine cycles through 4 states s0, s1, s2, s3. The states are indicated by a 2 bit counter (counter_2bit) which increments every clock pulse.
s0 = 00
s1 = 01
s2 = 10
s3 = 11
One cycle is one pass through the four states of the finite state machine.
For a particular row i, consider the scoring matrix RAM (w_ram). Assume the scores for the particular row are already available. Address to the RAM is generated by a 3 bit counter. Every s0, the address to the RAM is incremented. At s1 the read enable of the ram is made high. At s2, the scores for the next row are available in the outputs of the ram. Thus during the calculation of table entries for a particular row, the scores required for the next row are fetched from the memory and are available.

While calculating the table entries of a particular row, the table entries of the previous row are already available in the s3 stage of the previous cycle. The parallel prefix module gets the z inputs during the falling edge of the s0 state. This is done so as to provide enough time for the combinational calculation of z[j]s from previous row table entries and values from scoring matrices. The z[j]s is available in w_x_reg at the rising edge of state 0. This value is latched on to x_ram during the falling edge of the same state. This is done so that the inputs to the parallel prefix module are available when it starts running at state s1.

The parallel prefix algorithm then runs on the z[j] values stored in x_ram. The initial z[j] values are
available in its inputs during state s0. The parallel prefix algorithm starts executing at s1, goes to the second stage at state s2 and completes the operation in state s3. The appropriate control signals for the nodes are given based on the table described previously. 8 nodes are instantiated and their mode of operation is controlled by 8 bit control register dup_en. Each bit of this register control the appropriate node. At s1, node 0 is alone in the duplication mode and rest is in operation mode. In s2, node0, node1 are in duplication mode and rest in operation mode. In s3, node0, node1, node2, node3 are in duplication mode and rest is in operation mode. The appropriate dup_en signals are generated using shift_control logic. Thus at s3 the x[j] values are available.

These x[j]s are connected to 8 signed 8 bit adders and added with 8 bytes of y[j] during the same state s1. This is done with minimum delay since it is a combinational circuit. The add_en signal is generated every s3 and it adds the 8 bytes of x[j]s to 8 bytes of y[j]s to get 8 bytes of table entries. These table entries are stored in table_ram as well as fed back to the w_calculation module to calculate w values for the next row.

At state 1, table_ram is write-enabled and the 8 bytes of table entries are stored in the memory. By the end of state s3, previous row table entries and scoring matrix values for the next row are available at the inputs of w_calculation module. w_calculation module is enabled in state s0 of the next cycle. The combinational logic calculates values of z and they are latched on the inputs of parallel prefix module by the falling edge of the s0 state of the next cycle.

This process is repeated 8 cycles to fill all entries of the table. the table generated at the end is optimized only to contain the calculated table values. Row0 and column0 values are not present in the table as they can be easily calculated.

TIME EFFICIENCY
Dynamic programming algorithms require O(mn) time to complete sequence matching. In this case matching 8 word query with a 8 word database, time taken by dynamic programming techniques would have been 64 clock pulses. But parallel prefix algorithm is much faster. Each cycle of the parallel prefix requires 4 clock pulses. Number of cycles in this particular case is 8. so the complete operation of sequence matching is done in a span of 32 clock pulses which is almost half the time required by dynamic programming algorithms. This is achieved by proper pipelining of the different stages. Pipelining is done so as to prefetch the values from memory, calculations are done in parallel for 8 bytes and writng in to the table is pipelined with calculation of w values for the next row. The control signals for both the RAMs, w_calculation module, the 8 nodes and the 8 signed adders are appropriately generated using the 2 bit counter as the reference.

Data Path - Biological Sequence Alignment

The control logic implementing the sequence alignment algorithm is described here. 

DATAPATH

It is assumed that the scores of alignment are already present in a scoring matrix. The scoring matrix is implemented using a RAM memory. The logic for calculating scores is not implemented. It can be interfaced with the memory so as to write into it. It is designed to give the values of f(ai , bj) for a particular i (row) and all j (columns). In a single read operation it ouputs 8 bytes of data.

The scores along with the previous table entries are used to calculate all the w[j] values in parallel. This is done by the w_calculation module. This module receives 8 bytes from the scoring matrices corresponding to f(ai , bj) for the respective row and 8 bytes from the previously calculated row from the table namely T(i-1,j) for all values of j. It calculates the following.

w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))
z[j] = w[j] – y[j] for all 8 values of j. f(ai , '-') is assumed to be constant '-1' and y[j] represents the zeroth row of the table and is assumed to be constant (-1 to -8). The output of the w_calculation module is 8 bytes of z which is used to calculate the table entries for the current row.


(aargh clarity vs boundaries - clarity wins)

The 8 bytes of z[j] are fed into 8 nodes of parallel prefix module. In 3 clock cycles the 8 bytes of x[j] are calculated in parallel. The operation performed is
x[j] = max((w[j]+gj), (x[j-1])




Once Z[j] values are known, the values of z[j] are added with y[j] which is stored in a separate RAM. 8 signed adders are used to perform the operation T[i,j] = x[j] – g[j] in parallel.

The result is stored into a Table Matrix which contains the optimized scores of alignment excluding the zeroth row and column. The table matrix is implemented as a RAM. It stores 8 rows each consisting of 8 bytes each corresponding to the optimal alignment. Trace back can be done by reading through the table. Trace back has not been implemented.


The final block diagram of the digital circuit is given here. 


























































VHDL Implementation - Biological Sequence Alignment

This post describes the VHDL implementation details and abstractions. Kindly look at the post on parallel prefix methodology to understand the context under which these VHDL modules work

NODE
The basic functional unit of the parallel prefix method is a node. It has two modes of operation namely operation and duplication. The node performs the binary operation on the two inputs given to it when it operates in the operation mode. When in the duplication mode, the output is the first input to the node. It is a combinational circuit. Its VHDL representation is as follows.
 -------------------------------------------------------------------------------  
 -- Parallel Prefix : Node .vhd  
 -- Abishek Ramdas  
 -- NYU Poly  
 -- date - 11/26/10  
 -------------------------------------------------------------------------------  
 library ieee;  
 use ieee.numeric_std.all;  
 use ieee.std_logic_1164.all;  
 use ieee.std_logic_signed.all;  
 entity node is  
 port (  
 x_input: in std_logic_vector(7 downto 0); -- input to the node from the same line  
 y_input: in std_logic_vector(7 downto 0); -- input to the nodefrom the previous level  
 dup_en: in std_logic; -- dup_en=1 node in duplicate mode, dup_en=0 node in operation mode  
 x_output: out std_logic_vector(7 downto 0)); -- stores result of operation into x array in next clock cycle  
 end node;  
 architecture behav of node is  
 signal x_2s_comp : std_logic_vector(7 downto 0); -- 2's complement of x_input incase it is negative  
 signal y_2s_comp : std_logic_vector(7 downto 0); -- 2's complement of y_input incase it is negative  
 begin -- behav  
 calc_proc : process(y_input, x_input, dup_en)  
 begin  
 if(dup_en='0')then --operation mode  
 if(x_input(7)='0' and y_input(7)='0')then --both the inputs are positive  
 if(x_input<y_input)then  
 x_output<=y_input;  
 else  
 x_output<=x_input;  
 end if;  
 elsif(x_input(7)='0' and y_input(7)='1')then --x input +, y input -  
 x_output <= x_input;  
 elsif(x_input(7)='1' and y_input(7)='0')then -- x input -, y input +  
 x_output <= y_input;  
 elsif(x_input(7)='1' and y_input(7)='1')then -- x input -, y input -  
 11  
 x_2s_comp <= not(x_input)+"00000001"; -- take twos complimnet of both the numbers to find bigger number  
 y_2s_comp <= not(y_input)+"00000001";  
 if(x_2s_comp<y_2s_comp)then  
 x_output<=x_input;  
 else  
 x_output<=y_input;  
 end if;  
 else  
 if(x_input<y_input)then  
 x_output<=y_input;  
 else  
 x_output<=x_input;  
 end if;  
 end if;  
 else --dupicate mode  
 x_output<=x_input;  
 end if;  
 end process;  
 end behav;  

As mentioned earlier, to calculate the values of x[j] from z[j] (j=1 to 8), 8 nodes are used so as to optimize time. Each node gets a z input and input from its previous stage. The modes of the different nodes follow the pattern shown in the table.

(couldnt make it smaller and still legible :))

A control logic is implemented which generates the signal corresponding to the mode of operation. This is described in the next post

Parallel Prefix Methodology - Biological Sequence Alignment

This post is in continuation with the previous post which describes parallel prefix algorithm in detail. Have a look at that post before reading through here. This is a sample implementation example of the parallel prefix algorithm

(parallel prefix- click to enlarge)

PARALLEL PREFIX METHODOLOGY
Consider two sequences each 8 words long. (PELICANS, COELACAN). The table to be constructed is of size [9x9].

Row 0 is assumed to have the row gap penalties and column 0 is assumed to have column gap penalties. To calculate the table entries for row 1, t[1,j], j varies from 0 to 8.

1. w[j] is defined as
w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))

Thus for row 1
w[0] = max((T[0,0] – 1), (T(0,-1) + f(a1 , b0)) = T[0,0]-1
w[1] = max((T[0,1] – 1), (T(0,0) + f(a1, b1))
w[2] = max((T[0,2] – 1), (T(0,1) + f(a1, b2))
w[3] = max((T[0,3] – 1), (T(0,2) + f(a1, b3))
w[4] = max((T[0,4] – 1), (T(0,3) + f(a1, b4))
w[5] = max((T[0,5] – 1), (T(0,4) + f(a1, b5))
w[6] = max((T[0,6] – 1), (T(0,5) + f(a1, b6))
w[7] = max((T[0,7] – 1), (T(0,6) + f(a1 , b7))
w[8] = max((T[0,8] – 1), (T(0,7) + f(a1, b8))

It is to be noted that if f(ai , bj) is obtained from the scoring matrix and previous row values are available the w values for the next row can be calculated easily.


2. Calculate Z values for row 1
z[j] = w[j] – y[j] (j varies from 0 to 8)
Assuming constant gap penalties of -1
z[0] = w[0] -y[0] = w[0] +0
z[1] = w[1] -y[1] = w[1] +1
z[2] = w[2] -y[2] = w[2] +2
z[3] = w[3] -y[3] = w[3] +3
z[4] = w[4] -y[4] = w[4] +4
z[5] = w[5] -y[5] = w[5] +5
z[6] = w[6] -y[6] = w[6] +6
z[7] = w[7] -y[7] = w[7] +7
z[8] = w[8] -y[8] = w[8] +8
Z[j], j varies from 0 to 8 are known.


3. Calculating x[j] j varies from 0 to 8
x[j] = max( z[0] , z[1], ...z[j])
x[0] = z[0] (no need for prefix calculation, column 0 values)
x[1] = max(z[0], z[1])
x[2] = max(z[0], z[1], z[2])
x[3] = max(z[0], z[1], z[2], z[3])
x[4] = max(z[0], z[1], z[2], z[3], z[4])
x[5] = max(z[0], z[1], z[2], z[3], z[4], z[5])
x[6] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6])
x[7] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6], z[7])
x[8] = max(z[0], z[1], z[2], z[3], z[4], z[5], z[6], z[7], z[8])

Giving the values of Z into a 8 byte parallel prefix module with max as the binary operator, we get x[1] to x[8] values in log2(8) = 3 stages. x[0] does not require a prefix calculation to be performed.

 
 Nodes are used to perform the maximum operation. Each node has two inputs and an output. They operate in two modes. Operation mode in which the output is the maximum of two inputs and the duplication mode in which the output is duplicate of one of the inputs. 8 nodes are used in this case and they operate in parallel. The system traverses through each of its stages in one clock pulse and hence the values of x are available in 3 clock cycles.

4. Calculating the table entries T[1,j] for row 1
Once the values of x[j] j varies from 0 to 8 are available, we can calculate the the table entries using the formulae
T[i,j] = x[j] – g[j]
T[1,0] = x[0] + y[0] = x[0] - 0
T[1,1] = x[1] + g[1] = x[1] - 1
T[1,2] = x[2] + g[2] = x[2] - 2
T[1,3] = x[3] + g[3] = x[3] - 3
T[1,4] = x[4] + g[4] = x[4] - 4
T[1,5] = x[5] + g[5] = x[5] - 5
T[1,6] = x[6] + g[6] = x[6] - 6
T[1,7] = x[7] + g[7] = x[7] - 7
T[1,8] = x[8] + g[8] = x[8] - 8
These values of T[i,j] can be used to find the w[j] values that are required for calculating T[2,j]. Thus we observe how the table values are calculated in parallel for each row.

Next post describes the VHDL Implementation of this algorithm.


Parallel Prefix Algorithm - Biological Sequence Alignment

PARALLEL PREFIX

It is abundantly clear, that there exist dependencies between the calculations of cells. At a time, only one cell can be calculated. This poses severe implications on the time and space constraints when we try to speed up the process. Parallel prefix is a remedy to this problem.

Using parallel prefix we reduce the dependencies between the cells and calculate the scores parallel. In fact, we are able to calculate the cells of an entire row in parallel. This allows us to optimize the time as well as space.

The algorithm is optimized in time and is implemented to compare two 8 sequences, each 8 characters long. The time taken for prefix computation is log2(8) = 3 clock cycles. It is a form of data parallel programming


PARALLEL PREFIX FORMULAE

Given n inputs (a0,a1,....an) and a binary operator (ex maximum of 2 inputs). applying the binary operator in parallel prefix operation to the n inputs gives n outputs which are (a0, max(a0,a1), max(a0,a1,a2), ...max(a0,a1,a2,...an)). This operation is done in log2(n) steps.

You can find details on the method here - http://en.wikipedia.org/wiki/Prefix_sum


BASIC ALGORITHM

Let A = a1, a2, a3,.... am and B = b1, b2, b3,...... bn be the two sequences that are to be compared. The aim of this algorithm is to generate a table T of size(m+1)x(n+1) such that T[i,j] contains the optimal scoring for alignment a1,a2,....,ai and b1,b2,...bj.

Consider a simple scoring function
f (a,b) = 1 if a=b,
= 0 if a≠b
= -1 if a='-' or c='-'
Using this the scoring matrix for a gap alignment, match , mismatch can be calculated

T[0,0] is aligning two empty strings.
Hence T[0,0]=0.
T[0,j] = Σ f('-',bk) for k=1 to j
T[i,0] = Σ f(ak,'-') for k=1 to I
= T[i-1,0] + f(ai,'-')

When we align the non empty strings a1, a2,... ai and b1, b2,..... bi three possibilities arise
1. ai aligned with a gap
2. bj aligned with a gap
3. ai aligned with bj
Thus the table entry is given by
T[i,j] = max ((T[i-1,j] + f(ai , '-')), (T[i,j-1] + f('-',bj)) , (T[i-1,j-1] + f(ai , bj)))

The table matrix can be filled row by row or column by column or diagonal by diagonal. If we fill the matrix row by row using dynamic programming the time taken is of order O(mn). T[m,n] gives the optimal score once the table gets completed. Then traceback procedure is followed to get the optimal alignment of the two sequences.

Parallel prefix algorithms are used to compute the table values one row or column at a time. The distribution of work is uniform to all the processors at all times. There is also a reduction in number of table entries communicated per iteration.

Suppose the ith row is being computed using the values from the (i-1)th row which is previously calculated.
( T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj)) are pre-computed using the values in the previous row.

After separating the pre-computed terms,
w[j] = max ((T[i-1,j] + f(ai , '-'), (T[i-1,j-1] + f(ai , bj))
These values are calculated once the previous row table values are known.

Thus
T[i,j] = max ( w[j] , (T[i,j-1] + f('-',bj)))

Assume
x[j] = T[i,j] - Σ f('-',bk) for k=1 to j
= max( (w[j] - Σ f('-',bk) for k=1 to j), (T[i,j-1] - Σ f('-',bk) for k=1 to j))
= max( (w[j] - Σ f('-',bk) for k=1 to j) , x[j-1])

y[j] = Σ f('-',bk) for k=1 to j (already known – these are the gap penalties (row 0 of the table))
= gj if the gap penalties are considered to be a constant (-1)

z[j] = w[j] – y[j] (already known w[j] is precomputed and y[j] is known)

Therefore
x[j] = max((w[j]+gj), (x[j-1]) and
T[i,j] = x[j] – g[j]

Consider x[j]
x[j] = max((w[j]+gj), (x[j-1])
= max((w[j]+gj), max((w[j-1]+g[j-1]), x[j-2])

Max is an associative operator
max(a,max(b,c)) = max(a,b,c) hence

x[j] = max((w[j]+gj), (w[j-1]+g[j-1]), x[j-2])
= max((w[j]+gj), (w[j-1]+g[j-1]), (w[j-2]+g[j-2]),.....(w[0]+g[0]))
= max(z[j] ,z[j-1], z[j-2],.....,z[0])

The w[k] and g[k] for k=j to 0 are pre-calculated with the values in the previous row of the table. These values are already known. Putting these values into the parallel prefix algorithm will give the necessary values of x[k] for k = j to 0

Once x[k] for k = j to 0 is known, the table values are obtained by
T[i,j] = x[j] – g[j] for a row I.

In the next post I describe the parallel prefix methodology

Biological Sequence Alignment - Needleman-Wunsch Algorithm

The Needleman-Wunsch algorithm is discussed here before going into the details of implementation.

NEEDLEMAN-WUNSCH ALGORITHM

This Algorithm is a Dynamic Programming based algorithm. It finds the best, overall character to character match score between the two sequences. For this, an Alignment Matrix is used to calculate the best overall, character to character match score between the 2 sequences. The result of the algorithm is a singular score which reflects the degree of goodness of the match. A trace-back process is then used to retrieve the actual alignment.

PROPERTIES OF ALIGNMENT MATRIX

1. Alignment takes place in a two-dimensional matrix.
2. Gap characters can never be paired to each other. The matrix is an (m+1) x (n+1) matrix, where ‘m’ and ‘n’ are the lengths of the two sequences.
3. Each cell corresponds to a pairing of one letter from each sequence.
4. Alignment starts at the upper left and follows a mostly diagonal path down and to the right.
5. Each cell of the matrix actually contains two values, a ‘score’ and a ‘pointer.’
6. The score is derived from the scoring scheme.
7. The pointer is a directional indicator (an arrow) that points up, left, or diagonally up and left and navigates the matrix.
8. When two letters are aligned (match/mismatch), the path follows a diagonal trajectory.
9. The path moves horizontally when COELANCANTH is paired to gap characters.
10. The graph moves vertically when the letters from PELICAN are paired with gap characters.
11. Assign values for the first row and column of the matrix.
12. The score of each cell is set to the gap score multiplied by the distance from the origin.
13. Gaps may be present at the beginning of either sequence, and their cost is the same as anywhere else.
14. The arrows all point back to the origin, which ensures that alignments go all the way back to the origin (a requirement for global alignment).
15. We wish to determine the maximum scoring alignment
After computing these scores, assign the maximum value to the cell and point the arrow in the direction of the maximum score. Continue this operation until the entire matrix is filled, and each cell contains the score and pointer to the best possible alignment at that point.

PARALLEL PREFIX

It is abundantly clear, that there exist dependencies between the calculations of cells. At a time, only one cell can be calculated. This poses severe implications on the time and space constraints when we try to speed up the process. Parallel prefix is a remedy to this problem.

Using parallel prefix we reduce the dependencies between the cells and calculate the scores parallel. In fact, we are able to calculate the cells of an entire row in parallel. This allows us to optimize the time as well as space.

The algorithm is optimized in time and is implemented to compare two 8 sequences, each 8 characters long. The time taken for prefix computation is log2(8) = 3 clock cycles. It is a form of data parallel programming

Biological Sequence Alignment - Important Terms

I am introducing a few terms that are important in terms of understanding biological sequence alignment. I do not have a background in biology and understanding these terms are important when implementing the Needle-Wunsch algorithm.

WHAT IS BLAST?
BLAST is an acronym for Basic Local Alignment Search Tool. Despite the adjective "Basic" in its name, BLAST is a sophisticated software package that has become the single most important piece of software in the field of bioinformatics. There are several reasons for this. First, sequence similarity is a powerful tool for identifying the unknowns in the sequence world. Second, BLAST is fast. The sequence world is big and growing rapidly, so speed is important. Third, BLAST is reliable, from both a rigorous statistical standpoint and a software development point of view. Fourth, BLAST is flexible and can be adapted to many sequence analysis scenarios. Finally, BLAST is entrenched in the bioinformatics culture to the extent that the word "blast" is often used as a verb. There are other BLAST-like algorithms with some useful features, but the historical momentum of BLAST maintains its popularity above all others.

WHAT IS ALIGNMENT?
An alignment of two sequences is a one-to-one correspondence between their characters, without reordering, but with the possibility of some number of insertions or deletions (i.e., gaps or indels).

TYPES OF ALIGNMENT
Global Alignment: Both sequences are aligned along their entire lengths and the best alignment is found. Needleman-Wunsch is an algorithm for global alignment.

Local Alignment: The best sub-sequence alignment is found. TreeBLAST and Smith-Waterman algorithms implement Local Alignment. The significant difference is that although Smith-Waterman will report the best sub-sequence alignment between the two sequences, Tree BLAST can be used to report all the statistically significant scores.

ALIGNMENT SCORING THEORY

Sequences, or (more commonly) parts of sequences, are considered to have a possible biological relationship if the scoring procedure outlined here yields a score having statistical significance. Typically, one of the sequences has unknown function (e.g. a hypothesized gene) while the other is the database being searched for matches. We refer to the former as the query sequence of length m and the latter as the database of length n. Since the query is matched with only part of the database at a time, it is convenient to talk about scoring a possible alignment of the two sequences. Frequently, we are interested in the best possible matches of any subsequence of the query with the database, a process called local alignment. More precisely, an alignment of two sequences is a one-one correspondence between their characters, without reordering, but with the possibility of a number of insertions or deletions (i.e., gaps or indels).

The basis of alignment scoring is that character matches can be scored independently, and then combined into an alignment score. Each possible match has an independently generated score, with positive scores for exact or close matches and negative scores for mismatches. These scores are available a priori. We refer to the sequence of initial character- character scores as the Score-Sequence for the alignment. If no indels are considered, then the alignment is said to be un-gapped, and the alignment score is generated by summing the score sequence. Gaps are handled by adding a penalty per gap based on the length of the gap. Usually the first indel in a gap is assigned a larger penalty than its successors; various, generally simple, functions are used to generate gap penalties.
The methodology of scoring the alignments in both the algorithms is significantly different. However, the scoring scheme used is the same.

COMPUTATION OF SCORES

Match Score: Sum of the diagonal cell score and the score for a match (+1) or mismatch (-1).
Vertical Gap Score: Sum of the cell to above and the gap score (-1).
Horizontal Gap Score: Sum of the cell to the left and the gap score (-1).

FPGA Implementation of Biological Sequence Alignment Algorithm - Introduction


Biological Sequence Alignment Implemented using FPGA - Needleman_Wunsch Algorithm

Biological sequences like DNA and proteins show complex patterns of similarity to one another. In this regard, they mirror the external morphologies of the organisms in which they reside. You'll notice that birds, for example, show natural groupings. You don't have to be a biologist to see that ducks, geese, and swans comprise a reasonably natural group called the waterfowl, and that the similarities between ducks and geese seem too great to explain by mere coincidence. Biological sequences are no different. After all, the reason why ducks look like ducks and geese look like geese is because of their genes. Many molecular biologists are convinced that understanding sequence evolution is tantamount to understanding evolution itself.

Genetic sequences change over time due to three forces: mutation, natural selection, and genetic drift. By studying the similarity or dissimilarity between the sequences of various organisms, we can gain a deeper insight in to the evolution process. Numerous algorithms have been proposed in this regard, the best known being Needleman-Wunsch, Smith-Waterman and BLAST.

The length of typical Biological sequences is billions of characters. Naturally, the sequence alignment operation between two such sequences will be extremely, time and space consuming. This requirement has spawned several fast heuristic algorithms as well as hardware implementations.

The aim of this project is to implement the Needleman-Wunsch algorithm on FPGA. For Needleman-Wunsch algorithm, we use the Parallel Prefix method to reduce the dependencies between the calculations of adjacent cells of the matrix.

Thursday, February 2, 2012

How to post source code on blogger

http://codeformatter.blogspot.com/2009/06/about-code-formatter.html Paste your code in the box provided and generate the HTML. Copy the HTML code generated and paste it in your post. It worked :) Abishek

Sparse matrix - 3 tuple representation

 //program to 3-tuple representation  
 #include<stdio.h>  
 #include<unistd.h>  
 #define SIZE 3000;  
 void main()  
 {  
  int a[5][5],row,columns,i,j;  
  printf("Enter the order of the matrix(5*5)");  
  scanf("%d %d",&row,&columns);  
  printf("Enter the element of the matrix\n");  
  for(i=0;i<row;i++)  
   for(j=0;j<columns;j++)  
    {  
      scanf("%d", &a[i][j]);  
    }  
  printf("3-tuple representation");  
  for(i=0;i<row;i++)  
   for(j=0;j<columns;j++)  
    {  
      if(a[i][j]!=0)  
       {  
        printf("%d %d %d", (i+1),(j+1),a[i][j]);  
       }  
    }  
  getchar();  
 }  

Cache Blocking in C

 /*:) cache_blk_mat_mult :) have fun */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 int min(int, int);  
 //main func  
 int main(int argc, char *argv[])  
 {  
  int jj,kk,j,k,i;  
  int B = 5, N = 10,r;  
  int x[N][N],y[N][N],z[N][N];  
  for(i=0;i<N;i++)  
   for(j=0;j<N;j++)  
    {  
      printf("%d%d ",i,j);  
      x[i][j]=5;  
      y[i][j]=6;  
      z[i][j]=7;  
    }  
  for(jj=0;jj<N;jj=jj+B)  
   {  
    printf("jj = %d\n",jj);  
    for(kk = 0;kk<N;kk=kk+B)  
      {  
       printf("kk = %d\n",kk);  
       for(i=0;i<N;i++)  
        {  
         printf("i = %d\n",i);  
         for (j = jj; j < min(jj+B-1,N); j++)  
           {  
            r = 0;  
            for (k = kk; k < min(kk+B-1,N); k++)  
             {  
              printf("r = %d + y[%d][%d]*z[%d][%d]\n",r,i,k,k,j);  
              r = r + y[i][k]*z[k][j];  
             }  
            x[i][j] = x[i][j] + r;  
            printf("x[%d][%d]\n",i,j);  
            getchar();  
           }  
        }  
      }  
   }  
  return 0;   
 }  
 int min(int a, int b)  
 {  
  if(a<b)  
   return a;  
  else  
   return b;  
 }  

P Threads in C

 #include<pthread.h>  
 #include<stdio.h>  
 #include<stdlib.h>  
 #define NUM_THREADS 5  
 int abi = 100;  
 void *PrintHello(void *threadid)  
 {  
  long tid;  
  long tid2;  
  tid = (long)threadid;  
  tid2 = tid;  
  printf("Hello World! its me thread # %ld\n",tid);  
  printf("incrementing tid2 in thread # %ld\n%ld\n",tid,tid2*2);  
  pthread_exit(NULL);  
 }  
 int main(int argc, char *argv[])  
 {  
  pthread_t threads[NUM_THREADS];//array of pthreads  
  int rc;  
  long t;  
  abi++;  
  for(t=0;t<NUM_THREADS;t++)  
   {  
    printf("In main: creating thread %ld\n",t);  
    rc = pthread_create(&threads[t],NULL,PrintHello, (void *)t);  
    if(rc)  
      {  
       printf("Error code returned from code pthread_create() is %d\n",rc);  
       exit(-1);  
      }  
   }  
  pthread_exit(NULL);  
 }  

System function in C

System function in C – Used to run shell commands inside a C program.
 /*:) system :) have fun */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 #include<stdlib.h>  
 //main func  
 int main(int argc ,char *argv[])  
 {  
  printf("running ps with system() function\n");  
  system("ps -ax &");  
  printf("done");  
  return 0;   
 }  

Reading a file using fread

 /*:) fread_fileV1 :) have fun */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 //main func  
 int main(int argc, char *argv[])  
 {  
  FILE *fp;  
  char buff[11];  
  int pid;  
  fp=fopen("baby","r");  
  pid=fork();  
  if(pid==0)  
  {  
  printf("Initial child pointer is %L\n",ftell(fp));  
  fread(buff,sizeof(buff),1,fp);  
  buff[10]='\0';  
  printf("child read is %s\n",buff);  
  printf("the child pointer after reading is %l\n",ftell(fp));  
  printf("the child is exiting \n");  
  }  
  else  
  {  
  wait(0);  
  printf("initial parent pointer is %l\n",ftell(fp));  
  fread(buff,sizeof(buff),1,fp);  
  buff[10]='\0';  
  printf("parent read is %s\n",buff);  
  printf("the parent poitner after read is %l\n",ftell(fp));  
  printf("parent exiting");  
  }  
  return 0;   
 }  

File handler

files are the means to communicate between processes files are shared between them. when a read is made in one of the processes it moves the file posiion handler.here the read in the child process moves the file position pointer ten positions. when the read is made from the parent process the file is read from the eleventh position hence different things are displayed in the screen.
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 #include<fcntl.h>  
 //main func  
 int main(int argc, char *argv[])  
 {  
      int fp,pid;  
      char buff[11];  
      fp=open("baby",O_RDONLY);  
      pid = fork();  
      if(pid ==0)  
      {  
           printf("child begins\n");  
           read(fp,buff,10);  
           buff[10]='\0';  
           printf("child read\n");  
           puts(buff);  
           printf("the file handle is :%d\n",lseek(fp,0,1));  
           printf("child ends\n");  
      }  
      else  
      {  
           wait(0);  
           read(fp,buff,10);  
           buff[10]='\0';  
           printf("parent read\n");  
           puts(buff);  
           printf("the file handle is :%d\n",lseek(fp,0,1));  
           printf("parent exiting\n");  
      }  
      return 0;   
 }  

Fork - Program to fork a child process 2

 /*:) fork_test :) have fun */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 #include<sys/types.h>  
 //main func  
 int main(int argc, char *argv[])  
 {  
  pid_t new_pid;  
  char *message;  
  int n;  
  printf("fork program starting");  
  new_pid = fork();  
  switch(new_pid)  
   {  
   case -1:  
    printf("fork failed");  
    exit(1);  
   case 0:  
    message = "this is child";  
    n=5;  
    break;  
   default:  
    message = "this is parent";  
    n=3;  
    break;  
   }  
  for(;n>0;n--)  
   {  
    puts(message);  
    sleep(1);  
   }  
  exit(0);  
 }  

Fork - Program to fork a child process

 /*:) forktestV1 :) have fun   
 1. the variable i is declared common to both the parent as well as the child process. but parent and child each get their own version of the variable with the samme name.incrementing the variable in the child does not increment the variabble in the parent.   
 2. also if a pointer is used the value is not incremented. parent and child both gets a pointer to a variable.  
 3. when declared globally also does not increment the variable. each process gets a global variable.  
 4. printing the addres in child and parent gives the same value for the address  
 5. how it works is the child process executes and after its tiem slice it is moved to the swap space. the parent then executes and after its time slice it is moved to the swap space. the address of the variable remains the same.  
 */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 //main func  
 int i=10;  
 int main(int argc, char *argv[])  
 {  
      int pid;  
      pid=fork();  
      if(pid==0)  
      {  
           printf("the value of i in the child process is %d\n",i);  
           i=i+10;  
           printf("the value of i inside child after incrementing is %d\n",i);  
           printf("the address of i in child is %u\n ",&i);  
           printf("child terminated\n");  
      }  
     else  
      {  
           wait(0);  
           printf("the address of i in parent is %u\n ",&i);  
           printf("the value of i in the parent process is %d\n",i);  
      }  
      return 0;  
 }  

Program to automate pre program writing process

When writing a program, usually we need to create a folder and create a .C file inside the folder. In the .C file, header files need to be added. This program automates this and opens the .C file in Emacs for writing the program.
 /*usage: ./prog <directory name> <file name w/o extension.c>  
 location: mybin  
 need: to create a directory and program file to write programs. it creates the directory in the default progs folder at a single stroke. this program also writes a default code within the file  
 creats a dir <directory name> and inside it creates a file <filename> no need to add extension .c as it automatically appends it  
  */  
 #include<unistd.h>  
 #include<sys/types.h>  
 #include<sys/stat.h>  
 #include<error.h>  
 #include<errno.h>  
 #include<dirent.h>  
 #include<stdio.h>  
 #include<stdlib.h>  
 #include<wait.h>  
 int main(int argc, char *argv[])  
 {  
  int flag;  
  DIR *dp;  
  FILE *fp;  
  char s[256];  
  pid_t pid;  
  int ret;  
  if((flag=chdir("/home/abishek/progs"))==-1)//default dir for progs in system  
   {  
    error(1,errno,"cannot open the progs folder bub");  
   }  
  if((flag=mkdir(argv[1],S_IRUSR|S_IWUSR|S_IXUSR))==-1)//creates dir with gn name  
   {  
    error(1,errno,"Directory not created bub");  
   }  
  dp=opendir(argv[1]);  
  if((flag=chdir(argv[1]))==-1)  
   {  
    error(1,errno,"Error in changing Dir bub");  
   }  
  sprintf(s,"%s.c",argv[2]);//appends.c to the file name given  
  char *arg[]={"emacs","-nw",s,NULL};  
  fp=fopen(s,"w");  
  fprintf(fp,"/*:) %s :) have fun */\n",argv[2]);//default code written into the program  
  fprintf(fp,"// header files\n");  
  fprintf(fp,"#include<unistd.h>\n#include<stdio.h>\n");  
  fprintf(fp,"//main func\nint main(int argc char, *argv[])\n{\nreturn 0; \n}");  
  if(fclose(fp)==-1)  
   {  
    error(1,errno,"cannot close file bub");  
   }  
  ret=execv("/usr/bin/emacs",arg);  
  if(ret==-1)  
   {  
    error(1,errno,"error opening emacs");  
   }  
 }  

Core Dump - Check for core dump

 /* core_dumpV1 */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 //main func  
 void main(int argc,char *argv[])  
 {  
      int i,pid,exitstat,status,j=0;  
      pid=fork();  
      if(pid==0)  
      {  
           i=10/j;  
      }  
      else  
      {  
           wait(&status);  
           if(status&80)  
                printf("our core dumped\n");  
           else  
            printf("core dumped my foot\n");  
      }  
 }  

Syncing Processes 2

Process Synchronization method 2
 /* proc_syncV2 */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 int main()  
 {  
      int i,pid,exitstat,status;  
      pid=fork();  
      if(pid==0)  
      {  
       sleep(10);  
       exit(3);  
      }  
      else  
      {  
           wait(&status);  
           printf("status returned by the process is %d\n",status);  
           printf("status & 0xff = %d\n",status&0xff);  
           if(status&0xff!=0)  
           {  
                printf("signal is interrupted\n");  
           }  
           else  
           {  
                exitstat=(int)status/256;  
                printf("Exit status from %d was %d\n",pid,exitstat);  
           }  
      }  
      return 0;  
 }  

Syncing processes

Syncing Processes – When running two or more processes, synchronization is important. Two versions of synchronization is available here.
 /* proc_syncV1 */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 //main func  
 int main(int argc, char *argv[])  
 {  
      int i=0, pid,cpid,dip;  
      printf("ready to fork\n");  
      pid=fork()  
      if(pid==0)  
      {  
           printf("1st child process id is %d\n",getpid());  
               printf("first child terminating from memory\n");  
      }  
      else  
      {  
           dip=fork();  
           if(dip==0)  
           {  
                printf("2nd child process id is %d\n",getpid());  
                printf("seoncd child is terminating from memory\n");  
           }  
           else  
           {  
                cpid=wait(0);  
                printf("child with pid %d died\n",cpid);  
                cpid=wait(0);  
                printf("child wiht pid %d fied\n",cpid);  
                printf("i am parent %d \n",getpid());  
           }  
      }  
 }  

Playing a playlist of MP3 files using the mpg player in Linux

 /* mpg3V1 */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 #include<error.h>  
 #include<errno.h>  
 //main func  
 int main(int argc, char *argv[])  
 {  
  File *playlist;//file pointer for playlist  
  char filename[256];//getting filename from the playlist  
  pid_t pid_mpg;//the pid of child process for launching mpg player  
  playlist=fopen("/home/abishek/Music/playlist.txt","r");//reads the playlist intp fp "playlist"  
  if(playlist==NULL)  
   {  
    error(1,errno,"cannot open the playlist");  
   }  
  while(filename!=NULL)  
   {  
    filename=fgets(filename,sizeof(filename),playlist);//gets first filename from playlist  
    if(filename==NULL)//checking end of file  
      {  
       printf("end of file reached");  
       return 0;  
      }  
    pid_mpg=fork();//branching child process cp->mpg123 pp->wait  
    if(pid_mpg==-1)  
      {  
       error(1,errno,"cant fork child process");  
      }  
    else if(pid==0)//child process  
      {  
       char *arg[]={"mpg123",filename,NULL};  
       //launch mpg player here  
      }  
    else  
      {  
       //parent process. waits for child to complete playing  
       int stat,ret;  
       ret=wait(&stat);  
       //insert code here corresponding to child process  
      }}  

C programs - playing MP3 files using C

 /* snd_testV1 */  
 // header files  
 #include<unistd.h>  
 #include<stdio.h>  
 #include<error.h>  
 #include<errno.h>  
 //main func  
 int main(int argc, char *argv[])  
 {  
      FILE *song;  
      FILE *snd_op;  
      char song_data[256];  
      song=fopen("/home/abishek/Music/PPRY.mp3","r");  
      snd_op=fopen("/dev/dsp","w");  
      if(song==NULL)  
       {  
        error(1,errno,"cannot open file");  
       }  
      if(snd_op==NULL)  
       {  
        error(1,errno,"cannot open player");  
       }  
      while(song_data!=NULL)  
       {  
        fgets(song_data,sizeof(song_data),song);  
        if(song_data==NULL)  
         {  
           //eof reached return  
           return 0;  
         }  
        fprintf(snd_op,"%s",song_data);  
       }  
      fclose(snd_op);  
      fclose(song);  
 }  

C Program - Copy data from a file to another

copy.c SYNTAX: copy file1 file2 file1 - file to be copied file2 - file on to which file1 is to be copied copies file1 to file2 after creating file2. if already existing it rewrites the oldfile with the new file remarks 1. error checking is done 2. if file1 is not present it shows error 3. working for files in the same directory as well as files in other directories 4. copies in blocks of 1024 bytes buffer size is 1024 5. if file2 is not present it creates a file if already present it over writes
 #include<unistd.h>  
 #include<fcntl.h>  
 #include<stdlib.h>  
 #include<sys/stat.h>  
 #include<stdio.h>  
 #include<error.h>  
 #include<errno.h>  
 int main(int argc, char *argv[])  
 {  
      char block[1024];  
      int in, out;  
      size_t rlen,wlen;  
      char *filein;  
      char *fileout;  
      filein=argv[1];  
      fileout=argv[2];  
      //     printf("%s\n",filein);  
      //     printf("%s\n",fileout); //prints the file names of file 1,2  
      in=open(filein,O_RDONLY);  
      out=open(fileout,O_WRONLY|O_CREAT,S_IRUSR|S_IWUSR);//creates o/p file if not present  
      //     printf("File descriptor\n1.IN file:\t%d\n2.OUT file:\t%d\n",in,out); //prints the file descriptors   
      if(in==-1|out==-1)  
       {  
        if(in==-1)  
         error(1,0,"input file not present");// if ip file not present  
        if(out==-1)  
         error(1,0,"output file cannot be opened");//op file problem  
        exit(1);  
       }  
      while((rlen=read(in,block,sizeof(block)))>0)  
      {  
       //printf("read length:\t%d\n",rlen);  
           write(out,block,rlen);  
      }  
      printf("Copy from %s to %s Successful\n",argv[1],argv[2]);  
      return 0;  
 }  

C Programs - Hello World

I am uploading source code of few commonly used functions. The programs are written in C and I used gcc to compile and test these programs. I hope you find them useful. As it is always the case, I am starting with hello world
 #include<stdlib.h>  
 #include<stdio.h>  
 void main()  
 {  
  printf("HEllo World");  
 }