Intergenic subset organization within a set of geographically-defined viral sequences from the 2009 H1N1 influenza A pandemic
Thompson, William A.
Weltman, Joel K.
MetadataShow full item record
We report a bioinformatic analysis of the datasets of sequences of all ten genes from the 2009 H1N1 influenza A pandemic in the state of Wisconsin. The gene with the greatest summed information entropy was found to be the hemagglutinin (HA) gene. Based upon the viral ID identifier of the HA gene sequence, the sequences of all of the genes were sorted into two subsets, depending upon whether the nucleotide occupying the position of maximum entropy, position 658 of the HA sequence, was either A or U. It was found that the information entropy (H) distributions of subsets differed significantly from each other, from H distributions of randomly generated subsets and from the H distributions of the complete datasets of each gene. Mutual information (MI) values facilitated identification of nine nucleotide positions, distributed over seven of the influenza genes, at which the nucleotide subsets were disjoint, or almost disjoint. Nucleotide frequencies at these nine positions were used to compute mutual information values that subsequently served as weighting factors for edges in a graph net-work. Seven of the nucleotide positions in the graph network are sites of synonymous mutations. Three of these sites of synonymous mutation are within a single gene, the M1 gene, which occupied the position of greatest graph centrality. It is proposed that these bioinformatic and network graph results may reflect alterations in M1-mediated viral packaging and exteriorization, known to be susceptible to synonymous mutations.