A comparison of H5N1(A) hemagglutinin sequences showing the sialic acid receptor region, and the two single amino acid changes responsible for transmissibility between humans.
In a single short region of the hemagglutinin (HA) gene of the H5N1(A) virus (the “H” in H5N1) there are two changes that are most likely responsible for transmissibility between humans, as opposed to transmissibility between birds with only the occasional, rare infection of a human. Hemagglutinin is one of the 10 proteins produced by the Influenzavirus A genome, and the principal protein responsible for the ability of the virus to enter host cells.
Very few of the amino acids coded for by the hemagglutinin gene are constant between Influenzavirus A strains. There is one region, however, that has certain constant features: a few amino acids that do not mutate between strains. This region seems to be the sialic acid receptor region. Sialic acid protrudes from the outer membranes of animal cells and provides the binding site for the hemagglutinin molecule, which allows the virus to be engulfed by target cells, and also allows the virus to release its genetic material once it has been engulfed. 1 2 Sialic acids are found on many cells and substances in the body, but which sialic acid is involved and how it is attached to other molecules is an important part of whether a flu virus will be able to infect the cell. One protein, with a sialic acid linked to galactose by an alpha-2,3 linkage (SAalpha2,3Gal), is expressed primarily on the surfaces of cells in the intestines of birds and, it is inferred, on human cells deep in the alveoli of the lungs. Another related protein, with a sialic acid linked to galactose by an alpha-2,6 linkage (SAalpha2,6Gal), is expressed on the cells of the mucous membranes of the upper respiratory tract in humans and other mammals. 3 The distribution of these SAalpha linkages in the body is not completely known, however.
Comparisons of this region among various Influenzavirus A strains have found that the key difference between strains that preferentially infect humans and those that preferentially infect birds lies in two amino acids. Each is found next to an invariant glutamine: when the virus preferentially infects birds, one is a glutamate and the other is a glycine. If one or the other changes to an aspartate, the virus preferentially infects swine; it can still infect birds and humans, but is only weakly transmissible among them. If both locations mutate to code for aspartate, the virus becomes much more transmissible among humans. This change appears to have no direct relationship to pathogenicity in the host, which is determined by other proteins, among them the three polymerase chains. 4
The following are NCBI protein-protein Blast queries of some of the published Influenzavirus A hemagglutinin sequences. These queries search the amino acids of the relevant segment of the encoded hemagglutinin protein for the constant amino acids along with the two aspartates, with the variable sites “masked”, that is, made irrelevant for the search. The searched-for amino acids are in uppercase; the masked amino acids are in lowercase. The sequence used is the consensus sequence for the (A/Bar-headed Goose/Qinghai/75/05(H5N1)) type variant of avian H5N1, which is generally identical to that currently circulating in the Near East, Africa, and Europe, and not very different from the strains that have circulated in Southeast Asia.
One Blast query searches the H5N1 hemagglutinin sequences, including those from viruses that have infected humans and other mammals, and another searches H1N1 hemagglutinin sequences, including the (A/Brevig Mission/1/1918(H1N1)) virus, which was recovered from the body of an Alaskan Native woman who died in the 1918 H1N1 pandemic and was buried in the permafrost. As can be seen from the query of the H1N1 strains, those with aspartates in both locations almost exclusively infected humans, those with just one aspartate infected swine, and those with a glutamate and a glycine are contagious among birds.
First, an explanation of these NCBI Blast queries:
For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:
- A: alanine
- B: aspartate or asparagine
- C: cysteine
- D: aspartate
- E: glutamate
- F: phenylalanine
- G: glycine
- H: histidine
- I: isoleucine
- K: lysine
- L: leucine
- M: methionine
- N: asparagine
- P: proline
- Q: glutamine
- R: arginine
- S: serine
- T: threonine
- U: selenocysteine
- V: valine
- W: tryptophan
- Y: tyrosine
- Z: glutamate or glutamine
- X: any
- *: translation stop
- -: gap of indeterminate length
The masked query sequence being searched for is as follows:
HHPndaaDQtrlYqnpttyisvgtstlnqrlvPkIAtrskvnDQsG
The alanine (“A”) in the amino acid sequence “PkIA” is not actually invariant among all human Influenzavirus A strains, but it must be searched for in order to find “hits” (matches) in H5N1 sequences, regardless of how closely the full amino acid pattern is matched. If it is changed to a lowercase “a”, then only sequences that actually have aspartates (“D”) in the “DQ” locations will be found.
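The uppercase/lowercase convention of the masked query can be made concrete with a short sketch. It is only an illustration: it treats every lowercase (masked) position as a wildcard in a regular expression, which is an assumption made for clarity and not how Blast itself scores alignments. The helper names `mask_to_regex` and `searched_positions` are hypothetical.

```python
import re

# Masked query copied from the text: uppercase = searched for, lowercase = masked.
MASKED_QUERY = "HHPndaaDQtrlYqnpttyisvgtstlnqrlvPkIAtrskvnDQsG"


def mask_to_regex(masked):
    """Keep uppercase residues as literals; turn each masked residue into '.'."""
    return re.compile("".join(c if c.isupper() else "." for c in masked))


def searched_positions(masked):
    """Map 1-based position -> residue for the positions actually searched for."""
    return {i: c for i, c in enumerate(masked, start=1) if c.isupper()}


pattern = mask_to_regex(MASKED_QUERY)
positions = searched_positions(MASKED_QUERY)

# The two aspartates discussed in the text sit at positions 8 and 43.
print(positions[8], positions[43])  # D D
```

Changing the “A” of “PkIA” to lowercase in `MASKED_QUERY` would simply remove position 36 from `positions`, which mirrors the loosening and tightening of the search described above.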
To run these Blast queries, click on the link, and in the resulting NCBI Blast page, click on the button at the very bottom of the page (not the first button after “Now:”, but the one below the “Options” and “Format” sections).
This will bring up a results page showing the Result ID and the estimated time to completion of the query; the page will refresh automatically until the query is done. Please be patient - when the NCBI Blast servers are very busy this may take some time to complete.
NCBI Blast query of 250 H5N1(A) hemagglutinin sequences to search for human transmissibility
The result page will show the following:
- A graphical color-coded table showing the lengths of the matching regions of NCBI accessions that were found - the Distribution of the 250 Hits on the Query Sequence,
- A list of sequences producing significant alignments, with links to the sequences themselves in the left-hand column,
- And below this, a table of alignments.
This alignment table displays the query in the first row, with the masked (non-significant) amino acids displayed in lowercase and in red, and the actual amino acids being searched for in uppercase and in black.
Each amino acid in a hit that matches the query is shown as a “.”, and only those that are different are displayed. These alignments are flanked by the numbers of the starting and ending position of the string of amino acids in that sequence.
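The dot notation of the alignment table can be reproduced in a few lines. This is a minimal sketch, assuming the query and the hit are already aligned to the same length; the function name `dot_alignment` is hypothetical, and the hit used here is constructed by hand for illustration, not taken from any NCBI accession.

```python
QUERY = "HHPndaaDQtrlYqnpttyisvgtstlnqrlvPkIAtrskvnDQsG"


def dot_alignment(query, hit):
    """Return '.' where the hit matches the query, else the hit's own residue."""
    if len(query) != len(hit):
        raise ValueError("query and hit must be aligned to the same length")
    return "".join(
        "." if q.upper() == h.upper() else h.upper()
        for q, h in zip(query, hit)
    )


# Hand-made example hit: identical to the query except for E at position 8
# and G at position 43 (the avian pattern described in the text).
hit = QUERY[:7] + "E" + QUERY[8:42] + "G" + QUERY[43:]
print(dot_alignment(QUERY, hit))
```

The printed row is all dots except for an E in the eighth column and a G in the forty-third, which is the shape to look for in the real alignment table.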
Notice that these numbers vary between sequences: the sialic acid binding site does not sit at the same numbered position in every hemagglutinin sequence, so it appears to “move around” within the protein chain.
We also see here the various mutations that the active site has undergone between different virus samples. These are more or less random, but show that the virus has the ability to mutate rather rapidly. This is because Influenzaviruses are RNA viruses: their genome is single-stranded RNA, a single strand of nucleotides that codes for amino acids, rather than the two-stranded double helix of DNA, and the polymerase that copies it has no “error-correction” (proofreading) mechanism. Copying errors made inside the host cells therefore go uncorrected, and the virus is free to mutate rather rapidly.
Two differences in all H5N1(A) samples (hopefully!) show up:
An E, a glutamate, at position 8, and a G, a glycine, at position 43. It is precisely at these positions that, if either becomes a D, an aspartate, the virus gains the ability to be transmissible between swine, and if both mutate to Ds, aspartates, the virus gains the ability to be transmissible between humans.
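The two-position rule stated here can be written down as a small function. This is a sketch of the rule exactly as the text gives it, not a validated predictor; the function name `predicted_host` and the label strings are hypothetical, and the input is assumed to be a 46-residue segment already aligned to the query, so that positions 8 and 43 line up.

```python
def predicted_host(segment):
    """Apply the article's rule to residues 8 and 43 of an aligned segment."""
    if len(segment) != 46:
        raise ValueError("expected a 46-residue segment aligned to the query")
    p8, p43 = segment[7].upper(), segment[42].upper()
    aspartates = (p8 == "D") + (p43 == "D")
    if aspartates == 2:
        return "human"   # both positions are aspartate
    if aspartates == 1:
        return "swine"   # exactly one position is aspartate
    if (p8, p43) == ("E", "G"):
        return "avian"   # glutamate at 8, glycine at 43
    return "unclassified"  # any combination the rule does not cover


query = "HHPndaaDQtrlYqnpttyisvgtstlnqrlvPkIAtrskvnDQsG"
print(predicted_host(query))  # human
```

Returning "unclassified" for every other combination keeps the sketch from claiming more than the rule in the text actually says.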
This clearly illustrates the ease with which the virus mutates, and the relative simplicity of the mutation required to turn avian H5N1(A) into a human pandemic.
The query will also show which strains of the virus, if any, have begun to mutate in this direction, in one or the other location.
Now, to see how this happened during the 1918 pandemic, which also originated with an avian Influenzavirus, we can run the same query against the H1N1(A) sequences in the database:
Those sequences that are labelled with a geographic location, and not with a species, are human flu viruses.
- (A/South Carolina/1/18 (H1N1)) is a sequence recovered from preserved lung tissue of a soldier who died in the 1918 pandemic.
One can see that those sequences that have Ds, aspartates, in both locations are human sequences, as above. Viruses that do not have Ds, aspartates, in both locations can of course infect humans occasionally, as can be seen here, but these are not the H1N1(A) sequences that were sampled from the 1918 pandemic and the yearly flu seasons thereafter. Those that have other amino acids in both locations, especially a G, glycine, at position 43, are generally samples from swine. Those that have both an E, glutamate, at position 8, and a G, glycine, at position 43, are almost always samples from birds.
One can practically predict, just by looking at the sequence, which samples are from humans, swine, and birds. This holds true for all other Influenzavirus strains - the same query can be run against these by changing the “Limit by entrez query” field in the “Options for advanced blasting” section of the query form to H3N2, or any other Influenzavirus strain designation.
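For readers who prefer to script the subtype change, the sketch below builds a submission URL for the NCBI Blast URL interface with a different Entrez limit. It only constructs the URL and sends nothing. The choice of the `nr` database and the function name `build_blast_submission` are assumptions made for illustration, the hit limit of 250 mirrors the query described above, and the lowercase-masking option of the web form is not reproduced, so this is not the same query as the links in this article.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(query, subtype, hitlist_size=250):
    """Build (but do not send) a blastp submission limited by an Entrez query."""
    params = {
        "CMD": "Put",             # submit a new search
        "PROGRAM": "blastp",      # protein-protein Blast
        "DATABASE": "nr",         # assumption: the article does not name the database
        "QUERY": query,
        "ENTREZ_QUERY": subtype,  # counterpart of the "Limit by entrez query" field
        "HITLIST_SIZE": hitlist_size,
    }
    return BLAST_URL + "?" + urlencode(params)


url = build_blast_submission(
    "HHPndaaDQtrlYqnpttyisvgtstlnqrlvPkIAtrskvnDQsG", "H3N2"
)
print(url)
```

Swapping "H3N2" for another strain designation changes only the `ENTREZ_QUERY` parameter, which is the scripted equivalent of editing that one field in the form.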
There has been speculation that other changes are required to make an avian flu virus transmissible between humans, but no evidence has been found of this in the hundreds of viruses sequenced so far.
References
1. de Lima et al. (1995) Target cell membrane sialic acid modulates both binding and fusion activity of influenza virus. Biochim Biophys Acta. 1995 Jun 14;1236(2):323–30.
2. Chu VC and Whittaker GR (2004) Influenza virus entry and infection require host cell N-linked glycoprotein. Proc Natl Acad Sci U S A. 2004 Dec 28;101(52):18153–18158.
3. Shinya et al. (2006) Avian flu: influenza virus receptors in the human airway. Nature. 2006 Mar 23;440(7083):435–6.
4. Salomon et al. (2006) The polymerase complex genes contribute to the high virulence of the human H5N1 influenza virus isolate A/Vietnam/1203/04. J Exp Med. 2006 Mar 20;203(3):689–97.
See also H5N1 Viral Sequences

