H5N1 Open Science Initiative

This initiative was started by Guenter and Rudi, but we now need more help from the scientific community. This is where we combine our notes and results.

The genome of each H5N1 virus consists of 8 segments: HA, M, NA, NP, NS, PA, PB1, PB2. Segments M and NS each carry 2 genes and the others 1, giving 10 genes in total. Each segment consists of 800–2500 nucleotides, written as letters from {a, c, g, t}. These roughly 13,000 letters completely determine the virus.
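To make this structure concrete, here is a minimal sketch in Python (the page has no code of its own, so the language is our choice). The names GENES_PER_SEGMENT and genome_length are hypothetical; only the segment names, the gene counts and the {a, c, g, t} alphabet come from the text, and each segment is assumed to be a plain string.

```python
# Genome model from the text: 8 segments, each a string over {a, c, g, t}.
# Helper names are hypothetical; segment names and gene counts are from the page.
GENES_PER_SEGMENT = {
    "HA": 1, "M": 2, "NA": 1, "NP": 1,
    "NS": 2, "PA": 1, "PB1": 1, "PB2": 1,
}
ALPHABET = set("acgt")


def genome_length(segments: dict[str, str]) -> int:
    """Check that all 8 segments are present and contain only a, c, g, t;
    return the total number of letters across the 8 segments."""
    missing = set(GENES_PER_SEGMENT) - set(segments)
    if missing:
        raise ValueError(f"missing segments: {sorted(missing)}")
    total = 0
    for name in GENES_PER_SEGMENT:
        seq = segments[name].lower()
        bad = set(seq) - ALPHABET
        if bad:
            raise ValueError(f"{name}: unexpected letters {sorted(bad)}")
        total += len(seq)
    return total


# Sanity checks taken from the text: 8 segments, 10 genes in total.
assert len(GENES_PER_SEGMENT) == 8
assert sum(GENES_PER_SEGMENT.values()) == 10
```

For a real virus, genome_length would return a value near 13,000; a toy genome of eight "acgt" strings returns 32.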

For a short explanation, see CIDRAP’s Pandemic Flu Overview.

As of 20 January 2006, the database at the National Center for Biotechnology Information (NCBI) holds 1072 complete segments from 350 different H5N1 viruses, as well as many partial segments whose full nucleotide sequences have not yet been published.

The segment lengths and the number of complete segments of each type are:

Segment  Length (nucleotides)  Complete segments
HA       1678–1780             111
M        953–1043              168
NA       1351–1459             156
NP       1498–1570             104
NS       810–903               196
PA       2152–2251             97
PB1      2233–2373             80
PB2      2281–2342             94
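Adding up the per-segment ranges from the table above brackets the total genome length, which is where the figure of about 13,000 letters comes from. A small Python sketch of that arithmetic (the variable names are ours):

```python
# Length ranges (nucleotides) per segment, copied from the table above.
LENGTH_RANGES = {
    "HA": (1678, 1780),
    "M": (953, 1043),
    "NA": (1351, 1459),
    "NP": (1498, 1570),
    "NS": (810, 903),
    "PA": (2152, 2251),
    "PB1": (2233, 2373),
    "PB2": (2281, 2342),
}

# Sum of the shortest and of the longest observed length of each segment.
shortest = sum(lo for lo, _ in LENGTH_RANGES.values())
longest = sum(hi for _, hi in LENGTH_RANGES.values())
print(shortest, longest)  # 12956 13721
```

So every complete genome in this data set lies between 12,956 and 13,721 nucleotides.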

The names of the viruses and their nucleotide sequences can be downloaded here (zip file).

Sequence data for all published influenza viruses can be found on NCBI’s FTP server (FASTA .fna format, size: 31 MB).
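The .fna file is FASTA text: each record is a header line starting with ">" followed by one or more sequence lines. A minimal Python reader might look like the sketch below. The function names are hypothetical, and picking out H5N1 records by looking for the substring "H5N1" in the header is an assumption about how the headers are written, not something this page specifies.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    A header line starts with '>'; the sequence lines that follow are
    concatenated and lower-cased. Lines before the first header are ignored."""
    header = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks).lower()
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks).lower()


def h5n1_records(lines: Iterable[str]) -> list[tuple[str, str]]:
    # Assumption: H5N1 entries can be recognised by "H5N1" in the header text.
    return [(h, s) for h, s in read_fasta(lines) if "H5N1" in h]
```

Because a file object iterates over its lines, usage would be along the lines of `with open("influenza.fna") as f: records = h5n1_records(f)`, where the file name is only a placeholder for the downloaded file.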

Three nucleotides (a codon) encode one of the 20 amino acids or a stop signal, and the properties of the virus depend essentially on the amino-acid sequence. A point mutation, the replacement of a single nucleotide, may leave the amino acid unchanged and still reveal something about the evolution of the virus, so we keep the nucleotide sequences here rather than the amino-acid sequences.
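The sketch below translates codons with the standard genetic code (NCBI translation table 1) and tests whether a single-nucleotide change is synonymous, meaning the amino acid stays the same. It is an illustration under assumptions: the input must be an in-frame coding region starting at its first codon, not a whole segment (segments also contain untranslated ends, and some M and NS gene products come from spliced transcripts). The function names are hypothetical.

```python
BASES = "tcag"
# Standard genetic code (NCBI translation table 1). Codons are ordered with
# t, c, a, g cycling at each position; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence read from position 0, codon by codon;
    trailing bases that do not form a full codon are ignored."""
    cds = cds.lower()
    return "".join(CODON_TABLE[cds[p:p + 3]] for p in range(0, len(cds) - 2, 3))


def is_synonymous(cds: str, pos: int, new_base: str) -> bool:
    """True if replacing the nucleotide at `pos` keeps the encoded amino acid.
    Assumes `pos` lies inside a complete codon of the in-frame sequence."""
    start = pos - pos % 3
    old_codon = cds[start:start + 3].lower()
    new_codon = (cds[:pos] + new_base + cds[pos + 1:])[start:start + 3].lower()
    return CODON_TABLE[old_codon] == CODON_TABLE[new_codon]


print(translate("atgctttaa"))  # ML*
```

For example, changing "ctt" to "ctc" keeps leucine (synonymous), while "ctt" to "cct" turns it into proline. Both would be invisible or visible in an amino-acid sequence, respectively, but both show up in the nucleotide sequence.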

See also the report on the Global Voices blog about the evolutionary tree computed from 30 selected sequences.
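This page does not say which method produced that tree or Rudi’s plots. Purely as an illustration of how a tree can be computed from sequences, here is UPGMA on p-distances (the fraction of differing sites), one of the simplest distance-based methods, swapped in for whatever was actually used. It assumes the sequences have already been aligned to equal length, since the segments differ in length; the alignment step is not shown, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs: dict[str, str]):
    """Build a rooted tree by UPGMA and return it as nested tuples of names."""
    # cluster id -> (subtree, number of leaves). Internal nodes get integer
    # ids, so they never clash with the string names of the sequences.
    clusters = {name: (name, 1) for name in seqs}
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    next_id = 0
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        x, y = tuple(pair)
        (tx, nx), (ty, ny) = clusters.pop(x), clusters.pop(y)
        new = next_id
        next_id += 1
        # Distance to the merged cluster: average weighted by cluster sizes.
        for z in clusters:
            dxz = dist.pop(frozenset((x, z)))
            dyz = dist.pop(frozenset((y, z)))
            dist[frozenset((new, z))] = (nx * dxz + ny * dyz) / (nx + ny)
        del dist[pair]
        clusters[new] = ((tx, ty), nx + ny)
    return next(iter(clusters.values()))[0]
```

With four toy sequences {"A": "aaaa", "B": "aaat", "C": "ttta", "D": "tttt"}, the result groups A with B and C with D; the order of the two children within a pair is not fixed. UPGMA assumes a constant rate of change across lineages, which is one reason dedicated phylogenetics programs usually prefer other methods for real data.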

Rudi has run the analysis on the new h5n1f.zip data and obtained the following result: png | pdf | postscript

