Comparative genomic analysis of ten clinical Streptococcus pneumoniae collected from a Malaysian hospital reveal 31 new unique drug-resistant SNPs using whole genome sequencing

Background Streptococcus pneumoniae or pneumococcus is a leading cause of morbidity and mortality worldwide, specifically in relation to community-acquired pneumonia. Due to the overuse of antibiotics, S. pneumoniae has developed a high degree of resistance to a wide range of antibacterial drugs. Methods In this study, whole genome sequencing (WGS) was performed for 10 clinical strains of S. pneumoniae with different levels of sensitivity to standard antibiotics. The main objective was to investigate genetic changes associated with antibiotic resistance in S. pneumoniae. Results Our results showed that resistant isolates contain a higher number of non-synonymous single nucleotide polymorphisms (SNPs) as compared to susceptible isolates. We were able to identify SNPs that alter a single amino acid in many genes involved in virulence and capsular polysaccharide synthesis. In addition, 90 SNPs were only presented in the resistant isolates, and 31 SNPs were unique and had not been previously reported, suggesting that these unique SNPs could play a key role in altering the level of resistance to different antibiotics. Conclusion Whole genome sequencing is a powerful tool for comparing the full genome of multiple isolates, especially those closely related, and for analysing the variations found within antibiotic resistance genes that lead to differences in antibiotic sensitivity. We were able to identify specific mutations within virulence genes related to resistant isolates. These findings could provide insights into understanding the role of single nucleotide mutants in conferring drug resistance. Electronic supplementary material The online version of this article (10.1186/s12929-018-0414-8) contains supplementary material, which is available to authorized users.


Background
Streptococcus pneumoniae is a Gram-positive human pathogen naturally inhabiting the human nasopharynx (which considered to be the reservoir as this pathogen has no animal or insect vectors) and is responsible for invasive and noninvasive diseases including meningitis, pneumonia, bacteremia, otitis media, and sinusitis [1][2][3].
According to the World Health Organization (WHO), this bacterium is responsible for 1.6 million deaths annually, including 0.7-1 million in children less than 5 years old and mostly in developing countries [4,5]. In the United States, the annual number of deaths caused by pneumococcal pneumonia or meningitis is 40,000 [6,7]. In Asia, S. pneumoniae is the major cause of acute respiratory infections (ARIs) in children under 5 years old [8]. Five of the 10 countries with the largest number of deaths caused by pneumococcal infections in children below 5 years old are located in Asia, including India, China, Bangladesh, Pakistan, and Afghanistan [9]. Currently, more than 93 different S. pneumoniae serotypes have been identified based on the immunochemical differences in the capsular polysaccharides [9][10][11].
The mechanism used by S. pneumoniae to become pathogenic is still poorly understood, and most likely it depends on the interaction between pneumococcal virulence factors and the host's immunological response [12,13]. For decades, penicillin has been the principle option for the treatment of infections associated with S. pneumoniae [14]. The main targets for penicillin and other β-lactams antibiotics are the penicillin-binding proteins (PBPs). These enzymes are essential for the synthesis of the bacterial cell wall. β-lactams act by binding to these enzymes and reducing peptidoglycan synthesis and remodeling. Subsequently, this leads to disruption of cell wall integrity and cell lysis [15,16].
Like other Gram-positive bacteria, S. pneumoniae has developed significant resistance over the last few decades against a wide range of antibiotics due to extensive over-use. Mutating the target proteins such as PBPs to reduce their affinity to β-lactam antibiotics is the main resistance mechanism developed by pneumococci to resist β-lactams [17]. Moreover, S. pneumoniae has developed powerful resistance tools against erythromycin and other macrolides by modifying the target site (23S ribosomal RNA) using the erm gene or by efflux of the antibiotic from the bacterial cell through acquisition of the mef gene [18]. Tetracyclines are bacteriostatic agents that stop bacteria from reproducing by binding to the 30S subunit of the bacterial ribosome. Pneumococcal resistance to tetracycline occurs through ribosomal protection mediated by tet(O) and tet(M) genes [19]. There has been a huge increase in the number of penicillin-resistant pneumococcal isolates over the past three decades, and many strains are now resistant against common antibacterial drugs such as βlactams, macrolides, and fluoroquinolones [20,21]. A study conducted by Hackel and his co-workers on 2173 worldwide pneumococcal isolates, showed that 33.3% of the isolates were resistant to penicillin, 22.9% to erythromycin, and 16.2% to both erythromycin and penicillin [22].
Whole genome sequencing (WGS) has become a powerful tool for drug development by allowing researchers to investigate the mode of action of antibiotics and the mechanisms involved in bacterial resistance [23,24]. Additionally, WGS can be utilised to investigate the evolution of resistance in real-time under a range of conditions [25]. In this study, we report the whole genome sequencing for 10 pneumococcal isolates with a range of susceptibility and resistance to different antimicrobial drugs to elucidate the association between antibiotic resistance and the underlying genetic changes.

Bacteria and MIC determinations
Ten pneumococcal clinical strains were collected from the microbiological specimens of patients cared for at University of Malaya Medical Centre (UMMC) over a three-year period from September 2010 to May 2012 (Table 1). All the isolated strains were stored in brain heart infusion (BHI) broth at − 80°C. Pneumococcal isolates were grown in blood agar containing 5% defibrinated sheep blood as previously described [26]. Cultures were incubated for 16-24 h at 37°C under 5% CO 2 . Multiplex PCR was performed to identify each strain serotype as previously described [27]. Minimal inhibitory concentration (MIC) was determined following the broth microdilution assay as described by the Clinical and Laboratory Standards Institute (CLSI) guidelines. Cation-adjusted Mueller-Hinton broth with lysed horse blood was inoculated with a 5 × 10 5 cfu/mL bacterial suspension. The MIC was recorded as the lowest dilution showing no visible growth. All of the results were obtained from three independent trials.

Library preparation and whole genome sequencing
A DNeasy Blood & Tissue Kit (Qiagen) was used to extract genomic DNA from pneumococcal cells cultured overnight following the manufacturer's guidelines. Whole genome sequencing was performed using the Illumina HiSeq 2000 platform consisting of 1 lane 100 bp paired-end reads. Briefly, Covaris S2 was used to fragment all genomic DNA at the temperature of 5.5 to 6°C for 40 s. The fragmented DNAs were ends repaired, added with dA base and ligated with Illumina indexed adapters. Invitrogen 2% agarose E-gel was used for size selections of the samples. The selected DNA fragments with adapter molecules on both ends underwent ten cycles of PCR for amplification of prepared material. The Low quality bases were removed with a quality score filter of ≥ 30 using PRINSEQ version 0.20.3 [28].

Assembly
Assembly was performed utilising SPAdes assembler version 3.8.1 [29] by using metaSPAdes option specific for metagenome assemblies. Assembler was run using iterative kmer lengths ranging from 27 to 77. The 10 assembled genomes were compared to the reference genome S. pneumoniae TIGR4 (NC_003028.3) using MetaQuast [30].

Gene prediction and clustering
Genes prediction on the draft assemblies was performed by using the Prokka (Prokaryotic annotation) tool. Functionally, Prokka predicts genes based on available annotation information such as CDS and proteins. It builds HMM databases which are searched by using HMMER3. Prokka was run using customised parameters that were set to annotate against reference genome S. pneumoniae TIGR4 with an evalue of 1e-10. To create gene clusters among 10 isolates and the reference genome all the amino acid sequences in fasta format were retrieved. All proteins were subjected for BLASTp (e.value <1e-5) against the same set of sequences in order to perform all versus all blast. A connection (edge) between two genes was assigned if more than one third of the region aligned to both genes. An h-score (0 to 100) was used to weight the similarity (edge). For two genes G1 and G2, the Hscore was defined as score (G1G2) / max (score (G1G1), score (G2G2); the score used here was the BLAST raw score. Gene families were identified by using clustering by Hcluster_sg [31]. We used the average distance for the hierarchical clustering algorithm, with the parameters of minimum edge weight set to 5 and the minimum edge density (total number of edges / theoretical number of edges) set to 0.35.

Variant calling and phylogeny
Single nucleotide polymorphisms (SNPs) were identified using kSNP3 program (version 3.021) which identifies pan-genome SNPs in a set of genome sequences and builds a phylogenetic tree based on the SNPs [32]. kSNP3 was run using the standard mode of SNP detection and annotation using S. pneumoniae TIGR4 as reference with Kmer size of 11. Kmer was calculated by Kchooser program which accurately defines a kmer size based on the draft genome assemblies. Phylogeny trees are parsimony trees based on consensus trees from different samples. Although parsimony trees do not define evolution lineage, they do help to define the nearer samples based on changes per number of SNPs. A complex heat map package from Bioconductor was used to generate heat maps in R. Clusters are predicted using Euclidean distance method.

Statistical analysis
Statistical analysis testing the difference in SNP number between the antibiotic resistant and susceptible isolates was performed using two-sample Student's t-test with a significant level at p < 0.05.

Results
Selection and whole-genome sequencing of S. pneumoniae Ten isolates were selected from a larger collection of pneumococcal isolates according to their susceptibility to four different antibiotics: penicillin, cefotaxime, erythromycin, and tetracycline. Table 2 summarises the MICs for all 10 isolates. Isolates SPS1, SPS2, and SPS3 were non-susceptible to all antibiotics; isolate SPS4 was susceptible to penicillin, cefotaxime, and erythromycin; SPS5 exhibited susceptibility to cefotaxime and erythromycin, but it showed resistance to penicillin and tetracycline. Isolate SPS6 was susceptible to all four antibiotics; conversely, isolates SPS7 and SPS10 were resistant to all four antibiotics. Isolates SPS8 and SPS9 were resistant to all antibiotics but they showed susceptibility to penicillin. The WGSs of all 10 isolates were conducted to investigate the association between antibiotic-resistance and More than 80% of the reads were above a Phred quality score of 30 indicating the high-quality of the sequencing data. The overall GC% content for all 10 isolates ranged from 39.12 to 39.72% and was similar to that of the TIGR4 reference genome (39.7%) [33]. The number of genes was 2352 and 2159 for SPS1 and SPS2, respectively. SPS3, SPS4, SPS5, and SPS6 had gene contents of 1983, 1983, 2020, and 1984, respectively. SPS7, SPS8, SPS9, and SPS10 showed gene contents of 2064, 1980, 2035, and 1924, respectively (Additional file 1). The number of tRNAs was also similar for all the 10 isolates in the range of 41-46 (Additional file 1). Figure 1 represents a circular map of the ten pneumococcal clinical isolates compared to the reference genome of isolate TIGR4. All 10 isolates showed > 90% identity with the reference genome. Different colours represent the BLASTn matches between 70% to 100% nucleotide identities. The full assembly and gene content for each pneumococcal isolate can be found in Additional file 1.

Core genome polymorphism
To identify sequence variations, WGS reads from each strain were mapped to the TIGR4 reference genome of S. pneumoniae using the Bowtie2 software [34]. The genome sequences of all the clinical isolates revealed a high level of similarity and the virulence genes that are known to be involved in drug-resistance are well conserved among all the 10 isolates (Table 3). In order to identify differences that alter the level of resistance of these clinical isolates we focused on identifying SNPs in genes engaged with antibiotics pathways. Table 4 represents the total number of SNPs identified for each isolate, which ranged from 3600 to 6548 SNPs. SNPs that cause a change in amino acids, start codons, and stop codons were classified as "non-synonymous SNPs". Figure 2 illustrates the distribution of all SNPs in both antibiotic resistant and susceptible isolates. The majority of these non-synonymous SNPs associated with pneumococcal essential genes were present in antibiotic resistant strains (p = 0.016). Penicillin-resistant isolates showed 3301 SNPs, while susceptible isolates had only 281 SNPs. 6343 SNPs were associated with tetracyclineresistant isolates compared to only 21 SNPs associated with isolate SPS6 (Fig. 2). Similarly, ceftriaxone and erythromycin resistant isolates showed greater number of SNPs (5234) compared to 111 SNPs in susceptible isolates (Fig. 2). The complete list of SNPs in the 10 . The blank spaces in the rings represent matches with less than 70% or no BLAST matches to the reference genome. The image was prepared using Blast Ring Image Generator [51]   isolates sequenced and the TIGR4 reference genome can be found in Additional file 2. To explore the potential link between sequence variants with virulence characteristics, non-synonymous polymorphisms were extracted from genes annotated as virulence factors or involved in bacterial resistance [35,36] (Table 4). The conserved nonsynonymous polymorphisms in all resistant pneumococcal isolates were identified. These SNPs could possibly play an important biological role as they result in stop codons or frame shifts in protein sequence (Table 5). A total of 16 genes with 90 non-synonymous SNPs found only in the resistant isolates were identified. The presence of some of these SNPs in more than one resistant isolate suggests that these SNPs or a subset of them might have potential roles in antibiotic resistance. For example, the same mutation (G597E) associated with penicillin binding protein PBP2b was found in four different resistant isolates SPS7, SPS8, SPS9, and SPS10. Similarly, mutation (T23A) associated with virulent gene pneumococcal surface protein A (PspA) was identified in four resistant isolates SPS1, SPS7, SPS8, and SPS9 (Additional file 3). By blasting our sequences, we were able to identify 31 unique nonsynonymous SNPs associated with penicillin binding proteins (PBPs) and other virulent genes that were not previously published ( Table 6). The numbers of non-synonymous SNPs corresponding to the selected genes essential for bacterial survival and virulence based on previous literatures [37][38][39] in each isolate are presented in Table 5. Figure 3 represents the number of non-synonymous SNPs that are in genes associated with virulence and antibiotics resistance. Resistant isolates to all four antibiotics have higher numbers of SNPs associated with virulent genes than susceptible isolates (SPS4 and SPS6), suggesting that the presence of certain SNPs could be more related to drugresistance (Fig. 4). Genes encoding capsular polysaccharide (CPS) biosynthesis proteins Cps4E, Cps4F, Cps4G, and Cps4H did not possess any mutations amongst all the pneumococcal isolates (Table 5).

Phylogenetic analysis of S. pneumoniae isolates
A parsimony tree with respect to reference and 10 isolates was generated from the kSNP3 pipeline, parsimony tree is consensus tree based on all of the SNPs identified between the reference genome TIGR4 and the 10 isolates. Branch lengths are expressed in terms of changes per number of SNPS. Our result showed that isolates SPS3, SPS4, SPS5, and SPS6 were closest to each other, while isolates SPS8, SPS9, and SPS10 were closely related to each other. All these eight isolates formed one clade. On the other hand isolates SPS1, SPS2, and SPS7 were closely related to each other (Fig. 5).

Discussion
WGS has become an essential tool to elucidate the mechanisms used by bacteria to resist various antibiotics. In the present study, we have investigated the genomic variations and mutations among genes associated with virulence and antibiotic resistance in 10 clinical isolates of S. pneumoniae selected based on their susceptibility profiles against four antibiotics. Using WGS technique, the full genomic sequences of the 10 isolates were compared to that of the S. pneumoniae reference genome TIGR4. Whole genome sequencing of the 10 isolates has revealed a high degree of sequence conservation between the pneumococcal isolates regardless of their susceptibility to antibiotics. This high sequence similarity of the isolates could possibly be explained by the low number of isolates and also by the fact that all the isolates had been collected from the same hospital. Nevertheless, cell wall surface anchor protein -- All S. pneumoniae loci number (sp_#) are according to the annotation of reference strain TIGR4 these results are in agreement with previous studies showing that closely related isolates may possess different levels of resistance to antibiotics [40]. The genes known to be involved in antibiotic resistance are well conserved among all the 10 isolates; however, we were able to identify many mutations that differentiate resistant form susceptible isolates. Our results showed that the majority of the SNPs occur in the resistant strains rather than the susceptible strains (Fig. 3). For instance, penicillin-resistant isolates showed a greater number of SNPs (3301) compared to susceptible isolates (281 SNPs). Figure 3 reveals that the highest numbers of SNPs among all the isolates are present in prtA, CpsD, CbpE, CbpA, CbpD, and CpsA. These genes could play a role in pneumococcal resistance to antibiotics. We were able to identify 90 non-synonymous SNPs associated with the essential genes of the resistant isolates, and some of them have reappeared in more than one resistant isolate, while none of them have occurred in susceptible strains (Additional file 3). Figure 4 shows that resistant Table 6 Unique non-synonymous single nucleotide polymorphisms (SNPs) associated with penicillin binding proteins (PBPs) and other virulent genes found in all ten pneumococcal isolates isolates isolates possess a higher number of SNPs associated with virulent genes than the susceptible isolates (SPS4 and SPS6). These results suggest that the presence of particular SNPs could play a role in conferring resistance to antibiotics. Out of these 90 SNPs, 31 were unique and found in penicillin binding proteins (PBP1A, PBP1B, PBP2A, PBP2B, and PBP2X), virulent genes (pneumolysin and PspA), sensor histidine kinase (ciaH), and CpsA; capsular polysaccharide biosynthesis protein (Table 6). However, the role of these SNPs in antibiotic resistance need to be investigated as some of them especially those related to penicillin binding proteins were also present in isolates SPS8 and SPS9 that are susceptible to penicillin.
In our study, we observed that genes encoding capsular polysaccharide biosynthesis proteins CpsE, CpsF, CpsG, and CpsH did not contribute to antibiotic resistance in all the resistant-types (Table 5). On the other hand, our results revealed that several novel mutations are present within capsular biosynthesis genes CpsA, CpsB, CpsC, and CpsD associated with resistant isolates ( Table 3). The synthesis of capsular polysaccharides is regulated by a set of genes located at the same locus (cps) between dexB and aliA. Except for serotypes 3 and 37, the first four genes of cps locus (CpsA-D) are common in all pneumococcal serotypes. These four genes encode proteins that affect the level of CPS expression Fig. 3 Heatmap represents the number of Non-synonymous SNPs from S. pneumoniae isolates from antibiotic-resistance genes [41]. Although CpsA has no impact on the transcription of CPS in S. pneumoniae, a mutant of pneumococcus lacking CpsA has been shown to produce a reduced level of CPS [42]. CpsB is a manganese-dependent phosphotyrosineprotein phosphatase; it has been shown that CpsB is necessary for the dephosphorylation of CpsD. Mutants with CpsB deletions tend to have an increased level of phosphorylated CpsD, which leads to a significant decrease in production of CPS [43]. CpsC is a membrane protein required for CpsD tyrosine autophosphorylation. A novel role for CpsC in the attachment of CPS to the pneumococcal cell wall has been identified recently [44]. CpsD is an auto-phosphorylating tyrosine kinase. Mutations in CpsD affecting the ATPbinding domain eliminate CPS production in S. pneumococcus. Therefore, the capsular genes CpsB, CPsC and CpsD work together to regulate CPS biosynthesis [43,44].
S. pneumoniae resists penicillin and other β-lactams by altering PBPs, the main enzymes involved in the final stage of cell wall synthesis. Six PBPs have been identified in pneumococcus PBP1a, 1b, 2×, 2a, 2b, and 3 [15]. Mutations in three of the PBPs (PBP2b, PBP2x, and PBP1a) have the most significant effect on β-lactams resistance. Several groups have reported mutations in genes encoding PBPs [45][46][47]. The MIC levels for isolates SPS1 and SPS2 were 2 μg/ml for penicillin and 1 μg/ml for cefotaxime. On the other hand, SPS7 and SPS10 showed the same level of resistance toward penicillin but an increased resistance for cefotaxime (2 μg/ml). Through our analysis, we were able to identify a non-synonymous SNP (G597E) in both SPS7 and SPS10 associated with penicillin binding protein PBP2B. The same SNP was found in isolates SPS8 and SPS9, and both of these two isolates showed high MICs toward cefotaxime (> 8 μg/ml and 8 μg/ml, respectively) (Additional file 3). However, our results showed that all the pneumococcal isolates regardless of their sensitivity to penicillin possess same mutations in the genes encoding penicillin-binding proteins, confirming previous reports that showed that these proteins are not the only determinants of penicillin resistance [48]. Furthermore, we were able to identify a unique mutation (H180D) in the sensor histidine kinase gene (ciaH) in resistant isolates only (Additional file 3). Mutations in ciaH increase resistance to β-lactams, as this gene is involved in the biosynthesis of cell wall components [49].
The phylogenetic relationships among different clinical isolates of S. pneumonaie were examined using the parsimony tree based on SNPs from whole genome sequencing. From the results, we observed that isolates SPS1 and SPS2 were clustered in one clade; isolates SPS8, SPS9, and SPS10 grouped in one clade; and isolates SPS4, SPS5 and SPS6 clustered in different clade (Fig. 5). These results are consistent with the MIC profile of the 10 isolates ( Table  2). The observations that pneumococcal isolates with similar MIC profile were grouped together in a phylogenetic Fig. 4 Heatmap represents the presence and absence of SNPs (numbers on the right side) in some of virulent genes in all ten isolates. Annotation on the right side of the heat map is the gene names. (Blue color represents absence and pink color represents presence) tree suggest that they possess common mutations and were probably originated from the common clone. It is possible that these strains could have evolved and acquired mutations in a similar manner due to selection pressures. Surprisingly, our results revealed that resistant isolate SPS3 was closely related to the susceptible isolates SPS4, SPS5, and SPS6. Among all the resistant isolates, SPS10 showed the highest number of SNPs (34 SNPs) ( Table 3). On the other hand, SPS3 was the least resistant isolate having nonsynonymous SNPs compared to other resistant isolates. Moreover, three non-synonymous SNPs (I178T, V22A, and T19A) were common among all the resistant isolates except isolate SPS3 (Additional file 3). This finding suggests that SPS3 could resist antibiotics using a unique mechanism as compared to other resistant isolates. The high phylogenetic relatedness among the clinical pneumococcal isolates with similar MIC profile is related to the specific SNPs in the mutated genes. The presence of identical uncommon mutations, as well as certain genes in the grouped isolates in the phylogenetic tree, is indicative of a single cluster of strains circulating in the population. For instance, the mutations S29A in cpsB, H197L in cpsC, M79I in cpsD, and Q136K in Ply were all found in isolates SPS1, SPS7, and SPS8 (Table 4). All three isolates are closely related to each other in the phylogenetic tree (Fig. 5). Similarly, certain genes such as SP_0461, SP_0463, SP_0357, and SP_1765 were only found in pneumococcal isolates SPS3, SPS4, SPS5, and SPS6 (Additional file 1).

Conclusion
In conclusion, this study compared the genomic sequences of 10 pneumococcal isolates with different susceptibility to multiple antibiotics. The high degree of sequence conservation and the presence of the same SNPs especially those related to genes involved in β-lactam resistance in both sensitive and resistant isolates, makes it a difficult task to identify distinct mechanisms of resistance that differentiate strains with different drug-sensitivities, and that antibiotic resistance cannot be only linked to the presence of certain genes. These results are in agreement with previous assumptions that bacterial virulence is the result of a gathering of pathogenicity-related genes that interact in various combinations [50] and that multi-drug resistance could be a result of combinations of mutations that lead to overexpression of several multi-drug efflux pumps; outer membrane porins, β-lactam acylases and enzymes and structural components involved in peptidoglycan stability (targets of βlactams); gyrase mutations; and aminoglycoside phosphotransferases and -acetylases [40]. We were able to identify unique SNPs associated with virulent genes that could have a possible role in resistance to various antibiotics. To confirm these results future studies on virulence gene knockouts are needed to link the role of antibiotic resistance with these genes. This study was also limited by the relatively small number of isolates included in the analysis. Moreover, all resistant genes have yet to be subjected to individual mutational analysis. This can be achieved by introducing the SNPs on the resistant genes by site-directed mutagenesis and further expression analysis. The development of bacterial resistance towards antibiotics is a complex mechanism and multiple genetic alterations such as addition/deletion of specific genes, mutations, or a combination of both could be involved in the process. Whole genome sequencing can be utilised in conjunction with current epidemiological studies, diagnostic assays, and antimicrobial susceptibility tests to understand the genetic variation and pathogen biology of "high-risk" bacteria. It is also important to note that t. Parsimony tree with respect to reference and ten isolates was generated from kSNP3 pipeline. Parsimony tree is consensus tree based all of the SNPs identified between the reference and isolates. Branch lengths are expressed in terms of changes per number of SNP