The progression to active tuberculosis (TB) is thought to depend on environmental, bacterial and host factors. We are using a genomics approach to characterize the independent roles of host and pathogen genome variation, as well as their interaction, in active TB.
We enrolled N=2091 sputum smear-positive pulmonary TB patients in Vietnam (2009-2011), yielding N=1760 paired host and pathogen DNA samples. First, we conducted a human genome-wide association study (GWAS) of TB cases and controls, testing 582,559 SNPs for association with TB. No SNP reached genome-wide significance (P < 5 × 10⁻⁸), although there was suggestive evidence of association on chromosome 6. Second, we performed whole-genome sequencing of N=1635 Mycobacterium tuberculosis (Mtb) isolates. We used 73,718 high-quality SNPs to reconstruct a maximum likelihood phylogeny, which showed lineage 2 as dominant (N=1055, 64.5%) and Beijing subgroup lineage 2.2.1 as the most prevalent clone (N=957, 59%). Our data suggest that Beijing is common among younger people, is displacing lineage 1, and is dominant due to frequent transfer into Vietnam and enhanced transmissibility. By investigating homoplasic sites, we identified a mutation in EsxW that could potentially contribute to the enhanced transmission of Beijing. We also performed a bacterial GWAS to identify known and novel drug resistance mutations in our Mtb dataset.
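For readers less familiar with the genome-wide threshold quoted above, the following minimal Python sketch shows how it relates to a Bonferroni correction. It assumes the conventional reading of the threshold as a family-wise error rate of 0.05 spread over roughly one million independent common variants; the function names and the 0.05 family-wise rate are illustrative assumptions and are not taken from the study's analysis pipeline.

```python
import math

# Values taken from the abstract.
N_SNPS_TESTED = 582_559
GENOME_WIDE_THRESHOLD = 5e-8

def bonferroni_threshold(n_tests, family_alpha=0.05):
    """Per-test P-value threshold that keeps the family-wise error rate
    at family_alpha across n_tests tests (assumed alpha, not from the study)."""
    return family_alpha / n_tests

def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True if a SNP's P-value falls below the chosen threshold."""
    return p_value < threshold

# Bonferroni threshold for the number of SNPs actually tested.
study_specific = bonferroni_threshold(N_SNPS_TESTED)

# The conventional threshold corresponds to about one million tests.
conventional = bonferroni_threshold(1_000_000)
```

Under these assumptions the conventional threshold is the stricter of the two, because it corrects for more tests than the number of SNPs genotyped here.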
Finally, combining host and pathogen genetic data, we tested human SNPs for association with TB, stratified by infecting pathogen lineage. The potential association on chromosome 6 was confined to patients infected with lineage 1. Replication of this potential MHC class II protective association is ongoing in an independent TB cohort (N=1000), and we are expanding our unique host and pathogen datasets to identify novel aspects of host and pathogen biology associated with TB.
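As an illustration of what a lineage-stratified analysis can look like, the sketch below runs a simple allelic chi-square test (1 degree of freedom) separately for cases grouped by infecting lineage, comparing each group against a shared control set. This is an assumed, simplified design and not necessarily the test used in the study; the function names are hypothetical and the allele counts are invented toy numbers, not study data.

```python
import math

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test on a 2x2 allele-count table.
    Returns (chi2, p_value, odds_ratio) for the alternate allele."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio

def stratified_association(cases_by_lineage, controls):
    """Test each lineage-defined case group against the same controls.
    Controls are uninfected, so they carry no lineage label."""
    ctrl_alt, ctrl_ref = controls
    return {
        lineage: allelic_chi2(alt, ref, ctrl_alt, ctrl_ref)
        for lineage, (alt, ref) in cases_by_lineage.items()
    }

# Toy (alt, ref) allele counts, invented purely for illustration.
toy_controls = (300, 700)
toy_cases = {"lineage_1": (60, 240), "lineage_2": (300, 700)}
results = stratified_association(toy_cases, toy_controls)
```

In this toy example the alternate allele is depleted only among cases infected with lineage 1 (odds ratio below 1), which is the kind of lineage-confined protective signal the stratified analysis is designed to detect.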
Funded by NHMRC (Australia), A*STAR (Singapore) and the Wellcome Trust (UK). We thank the heads and staff of the district TB units in HCMC who recruited patients.