In silico design of a Zika virus non-structural protein 5 aiming vaccine protection against zika and dengue in different human populations

The arboviruses Zika virus (ZIKV) and Dengue virus (DENV) have an important epidemiological impact in Brazil and other tropical regions of the world. Recently, it was shown that previous humoral immunity to DENV enhances ZIKV replication in vitro, which may lead to more severe forms of the disease. Thus, traditional approaches to vaccine development that aim to control viral infection through neutralizing antibodies may induce cross-reactive enhancing antibodies. In contrast, the cellular immune response was shown to be capable of controlling DENV infection independently of antibodies. The aim of the present study was to design a flavivirus NS5 protein capable of inducing a cellular immune response against DENV and ZIKV. A consensus sequence of the ZIKV NS5 protein was designed from isolates from various continents. Epitopes were predicted for the most prevalent class I and II HLA alleles in the Brazilian population. These epitopes were then analyzed with regard to their conservation, population coverage and distribution along the whole antigen. Nineteen epitopes predicted to be more reactive (percentile rank ≤1) and 100% conserved among ZIKV and DENV serotypes were selected. The distribution of these epitopes along the protein was shown on a three-dimensional model, and population coverage was calculated for different regions of the world. The designed protein was predicted to be stable, and the selected epitopes were shown to be homogeneously distributed along its domains. The population coverage of the selected epitopes was higher than 50% for most tropical areas of the world. These results indicate that the proposed antigen has the potential to induce a protective cellular immune response to ZIKV and DENV in different human populations of the world.


Background
The Flavivirus genus of the Flaviviridae family consists of more than 70 viruses, many of which are arthropod-borne viruses, i.e. arboviruses [1]. It is estimated that more than 390 million people are infected annually by one of the four serotypes of Dengue virus (DENV1-4) [2], the most prevalent of the emerging arboviruses. Recently, Zika virus (ZIKV), an emergent pathogen previously associated with mild infections, started to be associated with microcephaly in babies and Guillain-Barré syndrome in the Americas [3][4][5]. Together, these two flaviviruses constitute a worldwide public health concern for which there is no specific treatment. For now, controlling arthropod vectors is the most effective available method to reduce the DENV and ZIKV burden. However, effective vaccines are needed to complement mosquito control programs, since eradication of vectors is challenging and time-consuming.
The only anti-DENV vaccine candidate approved for use in humans is based on a chimera between DENV and Yellow fever virus (YFV) [6][7][8][9][10]. It consists of chimeric viruses for each of the DENV serotypes, in which genomic sequences coding for YFV envelope proteins were replaced by those of DENV. Thus, specific immunity against DENV is concentrated on antigens that mainly induce the generation of antibodies. Unfortunately, especially for DENV2, the vaccine formulation based on those chimeric viruses did not achieve the expected protective efficacy in phase III clinical trials in different regions of the world. In addition, a higher incidence of hospitalization for dengue was reported in year 3 after vaccination among children younger than 9 years of age [10]. Some severe forms of dengue are mediated by a phenomenon called antibody-dependent enhancement, in which immunoglobulins produced in response to a previous DENV serotype cross-react with viral particles of a second serotype and mediate enhanced infection of Fc-γ receptor-bearing cells. Such enhanced infection leads to higher viral loads and severe disease. Notably, this phenomenon was also reported between DENV and ZIKV [11][12][13][14]. Thus, the risk of inducing antibody-mediated enhancement of infection between these two major arboviruses will depend on the profile of the immune response induced.
In contrast to envelope proteins, which are structural proteins, non-structural proteins are not present in viral particles and cannot contribute directly to antibody-dependent enhancement. In addition, they were shown to be major targets for CD4+ and CD8+ T lymphocytes involved in the control of DENV spread and intracellular replication [15][16][17]. Also, CD8+ T lymphocytes with multifunctional cytokine secretion patterns were found in volunteers immunized with a live attenuated tetravalent dengue vaccine [18]. Such lymphocytes target highly conserved epitopes located on non-structural proteins, and this immunological pattern matches those found after natural infections in which control of disease was observed [18]. Moreover, our previous reports showed that protection against DENV is achieved independently of humoral immunity [19]. CD4+ and CD8+ T lymphocytes targeting non-structural proteins, mainly NS5, were shown to be essential for protection. When a recombinant purified form of the NS5 protein was used as a vaccine antigen, 70% protection was achieved in a mouse model [20].
The NS5 protein is a well-conserved antigen between DENV and ZIKV. It is the major DENV target for the cellular immune response. In this study we aimed to design a vaccine antigen capable of inducing cross-protective cellular immunity against DENV and ZIKV. Here, we predicted NS5 epitopes common to DENV and ZIKV that can be presented by HLA (human leucocyte antigen) molecules encoded by alleles of different populations of the world. The designed antigen was shown to present a homogeneous distribution of epitopes, and its predicted 3D model was shown to fit the structure of a native ZIKV NS5. The population coverage of the antigen was higher than 50% for most tropical regions of the world. Thus, it is possible to predict vaccine efficacy by region. In summary, we present a designed antigen which may be a valuable alternative for controlling the burden of DENV and ZIKV.

NS5 sequences database building
One database was built with NS5 protein amino acid sequences from ZIKV and DENV isolated in different continents of the world. Sequences in FASTA format were retrieved from the National Center for Biotechnology Information (NCBI) protein database (http://www.ncbi.nlm.nih.gov/protein/). Criteria for selecting sequences were: i) complete annotation of the NS5 protein and ii) absence of undefined amino acids in the sequence. The database consisted of 153 NS5 protein amino acid sequences from the four serotypes of DENV and 41 NS5 protein amino acid sequences from ZIKV isolates. A sequence of NS5 protein from Spondweni virus (accession number: ABI54480.1) was included to serve as a control. Accession numbers of DENV and ZIKV NS5 sequences are shown in Additional file 1: Tables S1 to S5.
Multiple alignment, consensus sequence design and phylogeny of the ZIKV sequences
Evolutionary analyses were conducted in MEGA7. Multiple alignments of DENV and ZIKV sequences were carried out using the ClustalW method [21]. An NS5 consensus sequence among ZIKV isolates was designed based on the multiple alignment of ZIKV NS5 proteins, using the MegAlign program from Lasergene. The evolutionary history of DENV and ZIKV NS5 proteins was inferred using the Neighbor-Joining method. Spondweni virus was used as a control. A bootstrap test of 1000 replicates was applied. The evolutionary distances were computed using the Poisson correction method. The analysis involved 195 amino acid sequences.
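As an illustration of the two computations named above, the following minimal Python sketch derives a majority-rule consensus from an alignment and a Poisson-corrected distance, d = -ln(1 - p), where p is the proportion of differing sites between two aligned sequences. It is not the MegAlign or MEGA7 implementation: the majority rule, the handling of gaps and ties, and the toy sequences are assumptions made only for this example.

```python
import math
from collections import Counter


def consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Assumption: the most frequent residue in each column is kept; columns
    made only of gaps are dropped. The rule used by MegAlign may differ.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned):
        # Count residues in this column, ignoring gap characters.
        counts = Counter(res for res in column if res != gap)
        if counts:
            # Ties are resolved by first occurrence in the column.
            out.append(counts.most_common(1)[0][0])
    return "".join(out)


def poisson_distance(a, b, gap="-"):
    """Poisson-corrected distance d = -ln(1 - p) between two aligned sequences."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no comparable (ungapped) positions")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


# Hypothetical toy alignment, not taken from the NS5 database.
toy = ["MKLVA", "MKLIA", "MRLVA"]
print(consensus(toy))
print(round(poisson_distance(toy[0], toy[1]), 4))
```

In this toy case the two compared sequences differ at one of five sites, so p = 0.2 and the corrected distance is slightly larger than p, which is the intended effect of the correction for multiple substitutions at the same site.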

Survey of the most frequent HLA alleles in the Brazilian population
The allelic frequencies of both class I and class II HLA in the Brazilian population were retrieved from the NCBI database (https://www.ncbi.nlm.nih.gov/projects/gv/mhc/ihwg.cgi). Alleles present in at least 3% of the Brazilian population were selected.

Epitope prediction analysis
Epitopes within the NS5 protein consensus were predicted using the IEDB (Immune Epitope Database) analysis resource (http://tools.immuneepitope.org/mhci/) with the IEDB recommended method. The NetMHCpan method was used when prediction was not possible with the IEDB recommended method. The epitopes from the NS5 consensus were ranked by their percentile rank value, and those with a percentile rank ≤1 were selected. Epitopes 8, 9, 10 and 11 amino acids in length were considered for class I HLA, and epitopes 15 amino acids in length for class II HLA. A file containing the best-ranked epitopes was created.
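The selection step described above can be sketched in Python as follows. The sketch only enumerates candidate peptides of the stated lengths and filters a table of predictions by percentile rank; it does not call the IEDB tools, and the toy sequence, allele names and rank values are hypothetical placeholders rather than data from this study.

```python
def peptides(sequence, lengths):
    """Yield (start, peptide) for every window of the requested lengths.

    Start positions are 1-based.
    """
    for k in lengths:
        for i in range(len(sequence) - k + 1):
            yield i + 1, sequence[i:i + k]


def select_binders(predictions, max_rank=1.0):
    """Keep predictions with percentile rank <= max_rank, best rank first."""
    kept = [p for p in predictions if p["percentile_rank"] <= max_rank]
    return sorted(kept, key=lambda p: p["percentile_rank"])


CLASS_I_LENGTHS = (8, 9, 10, 11)   # lengths considered for class I HLA
CLASS_II_LENGTHS = (15,)           # length considered for class II HLA

# Hypothetical 20-residue sequence, used only to show the windowing.
toy_sequence = "MKNPKKKSGGFRIVNMLKRG"
class_i_candidates = list(peptides(toy_sequence, CLASS_I_LENGTHS))
print(len(class_i_candidates))

# Hypothetical prediction rows; in practice these would come from the
# output of the prediction tool.
rows = [
    {"allele": "HLA-A*02:01", "peptide": "KNPKKKSGG", "percentile_rank": 2.5},
    {"allele": "HLA-A*02:01", "peptide": "FRIVNMLKR", "percentile_rank": 0.4},
    {"allele": "HLA-B*07:02", "peptide": "NPKKKSGGF", "percentile_rank": 1.0},
]
for row in select_binders(rows):
    print(row["allele"], row["peptide"], row["percentile_rank"])
```

A lower percentile rank indicates a stronger predicted binder relative to a background set of peptides, which is why the filter keeps values at or below the threshold.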

Analysis of conservancy of predicted epitopes
The IEDB conservancy analysis tool (http://tools.iedb.org/conservancy) was used to determine the conservancy of the predicted epitopes among ZIKV lineages and DENV serotypes. Only predicted epitopes with a percentile rank ≤1 were used in this analysis. Only epitopes 100% conserved among all sequences of DENV and ZIKV were selected.
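The 100% conservation criterion can be illustrated with a minimal Python sketch that treats an epitope as conserved in a sequence when it occurs there as an exact substring. This is an assumption about how the criterion can be reproduced, not the code of the IEDB tool. The two epitopes are taken from the results of this study, but the flanking residues in the toy sequences are hypothetical.

```python
def conservancy(epitope, sequences):
    """Fraction of sequences that contain the epitope as an exact substring."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(epitope in seq for seq in sequences)
    return hits / len(sequences)


def fully_conserved(epitopes, sequences):
    """Return the epitopes found with 100% identity in every sequence."""
    return [e for e in epitopes if conservancy(e, sequences) == 1.0]


# Toy sequences with hypothetical flanking residues (not real NS5 sequences).
toy_sequences = [
    "AAALEFEALGFAAA",
    "GGLEFEALGFGGWYMWLGAR",
    "LEFEALGFKK",
]
candidates = ["LEFEALGF", "WYMWLGAR"]

for epitope in candidates:
    print(epitope, round(conservancy(epitope, toy_sequences), 2))
print(fully_conserved(candidates, toy_sequences))
```

Applying the same function to subsets of the database (for example, ZIKV sequences plus those of a single DENV serotype) would give per-serotype counts of conserved epitopes.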

Population coverage analyses
The previously selected epitopes were used to determine population coverage with the IEDB population coverage calculation tool (http://tools.immuneepitope.org/tools/population/iedb_input). All of the selected epitopes were compared with the human proteome using the BLAST program (http://www.ncbi.nlm.nih.gov/BLAST/) to check that they are unlikely to trigger autoimmunity.
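To make the notion of population coverage concrete, the sketch below computes the fraction of individuals expected to carry at least one HLA allele that presents a selected epitope. It is a simplified model and not the IEDB implementation: it assumes Hardy-Weinberg proportions at each locus and independence between loci, and the allele labels and frequencies are hypothetical.

```python
def population_coverage(covered_allele_freqs):
    """Estimate coverage from allele frequencies of epitope-presenting alleles.

    covered_allele_freqs maps each locus to {allele: frequency} for the
    alleles predicted to present at least one selected epitope.
    Assumptions: diploid individuals, Hardy-Weinberg proportions within a
    locus, and independence between loci.
    """
    uncovered = 1.0
    for locus, freqs in covered_allele_freqs.items():
        total = sum(freqs.values())
        if not 0.0 <= total <= 1.0:
            raise ValueError(f"allele frequencies at locus {locus} must sum to [0, 1]")
        # Probability that both chromosomes carry a non-presenting allele.
        uncovered *= (1.0 - total) ** 2
    return 1.0 - uncovered


# Hypothetical allele labels and frequencies for one population.
toy_population = {
    "HLA-A": {"A*hypothetical-1": 0.20, "A*hypothetical-2": 0.10},
    "HLA-B": {"B*hypothetical-1": 0.10},
}
print(round(population_coverage(toy_population), 4))
```

Under these assumptions, coverage increases quickly when presenting alleles are spread over several loci, which is one reason why epitopes restricted by HLA-A, HLA-B and HLA-C alleles together can reach a higher accumulated coverage than any single epitope.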

Structural biology analysis
The ZIKV NS5 consensus amino acid sequence was used to compute a 3D model using the I-TASSER modeling method. The best computed model was submitted to the TM-align structural alignment program to match the first I-TASSER model against all structures in the PDB library (Protein Data Bank, http://www.rcsb.org). The top-ranked model was also used to determine epitope distribution in protein domains. Epitopes were marked in the NS5 protein 3D model using the PyMOL program in order to assess their distribution.
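For readers unfamiliar with the TM-score reported by TM-align, the following Python sketch evaluates the standard TM-score expression for one fixed superposition: the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8. TM-align additionally searches for the alignment and superposition that maximize this value, which the sketch does not do. The distances and the length used in the example are hypothetical.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        raise ValueError("this sketch requires a target length above 15 residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0.0:
        # Very short chains need special handling that is not covered here.
        raise ValueError("target length too short for this sketch")
    return d0


def tm_score(distances, target_length):
    """TM-score of one superposition, given distances (in angstroms) between
    aligned residue pairs and the length of the target protein."""
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical example: 900-residue target, 850 aligned pairs at 1.5 angstroms.
print(round(tm_score([1.5] * 850, 900), 3))
```

Because the score is normalized by the target length and d0 scales with that length, values are comparable between proteins of different sizes; a score of 1 corresponds to a perfect match over the whole target.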

ZIKV NS5 phylogeny
In order to verify the origin of ZIKV isolates from Brazil and the Americas we carried out a phylogenetic analysis with NS5 amino acid sequences from ZIKV from throughout the world. We also included NS5 sequences from DENV serotypes from different genotypes isolated in different continents in order to verify similarities among ZIKV and DENV proteins. A comprehensive phylogenetic analysis of NS5 amino acid sequences based on world representative strains of ZIKV and DENV is shown in Fig. 1. NS5 sequences from Brazilian and American isolates grouped with those from the Asian lineage of ZIKV. None of the NS5 sequences from American isolates grouped with those from the African lineage of ZIKV. This result indicates that Brazilian and American ZIKV isolates analyzed in this study belong to the Asian lineage.
In addition, the phylogeny indicates that the DENV and ZIKV NS5 proteins diverged before their separation into two different viral species. In other words, the proteins have diverged since the root of the phylogenetic tree.

Consensus sequence design
Although Brazilian and American ZIKV isolates were shown to belong to the Asian lineage, all the sequences contained in the ZIKV database were used in the design of the consensus. This means that African sequences were also used. They were included with the aim of also preventing disease caused by imported viruses of the African lineage. Attempts to design a consensus NS5 protein between ZIKV and DENV resulted in a non-stable 3D model (data not shown). The ZIKV NS5 consensus is shown in Fig. 2.

The most frequent alleles in Brazilian population were selected
The survey of HLA frequencies in the Brazilian population showed that the most frequent alleles are those listed in Table 1.

Epitope prediction and conservation analyses
Epitopes with higher binding affinity to the previously selected HLA molecules were identified. Then, they were analyzed with regard to their conservation in all the ZIKV NS5 sequences contained in the previously prepared database. Epitopes with percentile rank ≤1 (high binding affinity) and 100% conserved among ZIKV and DENV NS5 sequences were selected. We selected 19 epitopes (Table 2) that are 100% conserved among all DENV and ZIKV NS5 amino acid sequences analyzed in this study. This result indicates that the NS5 protein concentrates a relevant number of epitopes highly conserved among ZIKV lineages and DENV serotypes. In addition, conservation analyses showed that 624 epitopes are 100% conserved among ZIKV lineages, 85 epitopes are 100% conserved among ZIKV and DENV1, 102 among ZIKV and DENV2, 61 among ZIKV and DENV3, and 126 among ZIKV and DENV4. This result indicates that the consensus ZIKV NS5 protein shares a different number of epitopes with each of its homologous proteins.
Population coverage of epitopes predicted in ZIKV NS5 consensus varies in different human populations of the planet
As shown in the conservancy analyses, we selected nineteen epitopes with 100% identity among all the ZIKV and DENV NS5 sequences from the databases. These epitopes were shown to present a relevant population coverage with regard to the most prevalent HLA alleles in the Brazilian population, ranging from 3.9% to 33.73% (Table 2). Together, they presented accumulated population coverages of 74.07% and 68.70% for the Brazilian and United States populations, respectively (Table 3). These results indicate that the consensus NS5 protein proposed in this study is a promising antigen for inducing a cellular immune response against ZIKV and DENV in different human populations of the world.
Fig. 1 The tree shows that NS5 sequences from Brazilian ZIKV isolates grouped with those from the Asian ZIKV lineage. The evolutionary history was inferred by using the Maximum Likelihood method based on the JTT matrix-based model [1]. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model, and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 195 amino acid sequences. All positions containing gaps and missing data were eliminated. There were a total of 896 positions in the final dataset. Evolutionary analyses were conducted in MEGA7.

3D NS5 model and epitope distribution on the protein
The structure computed for the ZIKV NS5 consensus was shown to be similar to a previously reported ZIKV NS5 structure deposited in the PDB library [22]. As shown in Table 4, a high level of identity and a high TM-score were found between the consensus protein and the top-ranked similar structure, a previously reported ZIKV NS5 protein.
In addition, the 19 epitopes found to be 100% conserved among ZIKV lineages and DENV serotypes were shown to be homogeneously distributed along the protein structure (Fig. 3). Some overlapping epitopes were found in the protein and thus their distribution is shown in regions. Three regions containing epitopes specific for HLA-A presentation were found (Fig. 3a). One of them is located at the MTase domain and contains only one epitope (Region 1, epitope LSRNSTHEMY) (Fig. 3b). Two other regions (2 and 3), which together contain six epitopes, are located at the RdRp domain (Fig. 3c). Two epitopes are located in region 2 (Region 2, epitopes CVYNMMGKR and CVYNMMGKREK) and four in region 3 (Region 3, epitopes AIWYMWLGAR, WYMWLGAR, RAIWYMWLGAR and IWYMWLGAR). One region containing only one epitope specific for HLA-B presentation is located at the RdRp domain (Region 1, epitope LEFEALGF) (Fig. 3d). Five regions containing epitopes specific for HLA-C presentation were found (Fig. 3e). Two of them are located at the MTase domain. Region 1 contains three epitopes (Region 1, epitopes SRNSTHEM, SRNSTHEMY and SRNSTHEMYW) and region 2 contains one epitope (Region 2, epitope GRGGWSYY) (Fig. 3f). Three other regions are located at the RdRp domain: region 3 contains three epitopes (Region 3, epitopes RAIWYMWL, WYMWLGAR and SRAIWYMWL), region 4 contains one epitope (Region 4, epitope LEFEALGF) and region 5 contains four epitopes (Region 5, epitopes YADDTAGWDT, YADDTAGWDTR, YADDTAGWD and ADDTAGWD) (Fig. 3g).
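The grouping of overlapping epitopes into regions can be sketched as an interval-merging step: each epitope is located in the sequence and epitopes whose positions overlap are merged into one region. The Python sketch below uses the six HLA-A epitopes of regions 2 and 3 listed above; the surrounding sequence is a hypothetical construct built only for this example, so the reported positions do not correspond to positions in the NS5 consensus.

```python
def locate(epitope, sequence):
    """Return the 1-based (start, end) of the first exact match, or None."""
    i = sequence.find(epitope)
    return None if i < 0 else (i + 1, i + len(epitope))


def merge_regions(epitopes, sequence):
    """Group epitopes whose positions overlap into contiguous regions."""
    spans = sorted(
        (locate(e, sequence), e) for e in epitopes if locate(e, sequence)
    )
    regions = []
    for (start, end), epitope in spans:
        if regions and start <= regions[-1]["end"]:
            # Overlaps the current region: extend it and record the epitope.
            regions[-1]["end"] = max(regions[-1]["end"], end)
            regions[-1]["epitopes"].append(epitope)
        else:
            regions.append({"start": start, "end": end, "epitopes": [epitope]})
    return regions


# Hypothetical sequence: real epitopes joined by made-up flanking residues.
toy_sequence = "GGGG" + "CVYNMMGKREK" + "TTTTT" + "RAIWYMWLGAR" + "GG"
hla_a_epitopes = [
    "CVYNMMGKR", "CVYNMMGKREK",
    "AIWYMWLGAR", "WYMWLGAR", "RAIWYMWLGAR", "IWYMWLGAR",
]

for region in merge_regions(hla_a_epitopes, toy_sequence):
    print(region["start"], region["end"], region["epitopes"])
```

With this toy input the six epitopes collapse into two regions containing two and four epitopes, mirroring the grouping reported for regions 2 and 3 of the HLA-A epitopes.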

Discussion
Zika virus (ZIKV) is an emergent pathogen previously associated with mild infections which came to be associated with microcephaly in babies and Guillain-Barré syndrome in the Americas [3][4][5]. Dengue virus (DENV1-4) [2] infects around 390 million people per year. These two flaviviruses constitute a worldwide public health concern for which there is no specific treatment. There is no vaccine approved for controlling ZIKV in humans. The only anti-DENV vaccine candidate approved for use in humans is based on a chimera between DENV and Yellow fever virus (YFV), in which envelope proteins are the only DENV-specific antigens [6][7][8][9][10]. Such proteins are the main targets of the humoral immune response. However, especially for DENV2, the vaccine formulation based on those chimeric viruses did not achieve the expected protective efficacy in phase III clinical trials. In addition, a higher incidence of hospitalization among children was reported [10]. Such increased hospitalization may be related to antibody-dependent enhancement. Importantly, enhancement of ZIKV infection mediated by DENV-specific antibodies was observed in vitro [11][12][13][14]. Thus, the risk of inducing antibody-mediated enhancement of infection between these two major arboviruses will depend on the profile of the immune response induced. It was recently shown that protection against DENV is achieved independently of humoral immunity [19]. CD4+ and CD8+ T lymphocytes targeting non-structural protein 5 (NS5) were shown to be essential for protection. When a recombinant purified form of the NS5 protein was used as a vaccine antigen, 70% protection was achieved in a mouse model [20]. In this study, we aimed to design a vaccine antigen capable of inducing cross-protective cellular immunity against DENV and ZIKV.
As the NS5 protein is the most conserved antigen between DENV and ZIKV, a ZIKV consensus amino acid sequence was shown to present a relevant number of epitopes 100% conserved among DENV serotypes and ZIKV lineages to be presented by HLA (human leucocyte antigen) alleles of different populations of the world. The designed antigen was shown to present a homogeneous distribution of epitopes and its predicted 3D model was shown to fit with the structure of a ZIKV native NS5.
A comprehensive phylogenetic analysis based on NS5 protein amino acid sequences from ZIKV and DENV isolates from different continents was carried out. It was shown that sequences from American ZIKV isolates grouped with those from isolates of the ZIKV Asian lineage. This corroborates the recent literature [23]. In addition, DENV and ZIKV NS5 proteins were shown to have diverged before their separation into different viral species, and evolutionary distances could not be inferred based on the phylogenetic tree. However, the consensus ZIKV NS5 protein was shown to share a different number of epitopes with each of its homologous proteins. This result indicates that the proposed antigen could induce cross-reactive immune responses with different levels of intensity to ZIKV, DENV1, DENV2, DENV3 and DENV4. The consensus NS5 protein was designed based on ZIKV isolates from throughout the world. Attempts to produce a ZIKV/DENV consensus resulted in a non-stable protein model (data not shown). Nevertheless, nineteen epitopes specific for presentation by class I HLA were found to be 100% conserved among DENV serotypes and ZIKV lineages. Population coverages of these epitopes with regard to different human populations in the world were shown to be relevant. Predicted vaccine coverage for populations from several tropical areas was higher than 50%. This is an important characteristic of the designed antigen considering that HLA restriction is a major challenge in vaccine development, which may significantly affect vaccine efficacy and effectiveness among human populations of different regions.
Table 4 footnotes (fragment): Percentage of sequence identity in the structurally aligned region; (f) Cov represents the coverage of the alignment by TM-align and is equal to the number of structurally aligned residues divided by the length of the query protein.
Fig. 3 Distribution of class I HLA predicted epitopes along the ZIKV NS5 consensus 3D model. Epitopes found to be 100% conserved among ZIKV lineages and DENV serotypes were shown to be homogeneously distributed along the protein structure. Three regions containing epitopes specific for HLA-A presentation were found (a). One of them is located at the MTase domain and contains only one epitope (Region 1, epitope LSRNSTHEMY) (b). Two other regions (2 and 3), which together contain six epitopes, are located at the RdRp domain (c). Two epitopes are located in region 2 (Region 2, epitopes CVYNMMGKR and CVYNMMGKREK) and four in region 3 (Region 3, epitopes AIWYMWLGAR, WYMWLGAR, RAIWYMWLGAR and IWYMWLGAR). One region containing only one epitope specific for HLA-B presentation is located at the RdRp domain (Region 1, epitope LEFEALGF) (d). Five regions containing epitopes specific for HLA-C presentation were found (e). Two of them are located at the MTase domain. Region 1 contains three epitopes (Region 1, epitopes SRNSTHEM, SRNSTHEMY and SRNSTHEMYW) and region 2 contains one epitope (Region 2, epitope GRGGWSYY) (f). Three other regions are located at the RdRp domain: region 3 contains three epitopes (Region 3, epitopes RAIWYMWL, WYMWLGAR and SRAIWYMWL), region 4 contains one epitope (Region 4, epitope LEFEALGF) and region 5 contains four epitopes (Region 5, epitopes YADDTAGWDT, YADDTAGWDTR, YADDTAGWD and ADDTAGWD) (g).
The 100% conserved epitopes were shown to be homogeneously distributed along protein domains. Although these amino acid sequences may be used in a polyepitope development project, their homogeneous distribution along the protein structure is important for using the whole antigen. As the NS5 protein is a major target for CD4+ and CD8+ T lymphocytes, using the whole antigen, as we previously reported [19], would be a promising strategy to induce a significant immune response against the conserved epitopes. Thus, we conclude that the proposed antigen has the potential to induce a protective cellular immune response to ZIKV and DENV in different human populations of the world.

Conclusions
At the end of the analysis, we conclude that the NS5 consensus protein has the potential to serve as a vaccine antigen that induces cross-protection against ZIKV and DENV in different human populations of the world. The next step is to analyze the actual immune response to this antigen in a mouse model and to characterize how that response develops.

Additional file
Additional file 1: Table S1. ZIKV sequences. The sequences are named with virus species, accession number and country of isolation. Table S2. DENV1 sequences. The sequences are named with virus species and serotype, accession number and country of isolation. Table S3. DENV2 sequences. The sequences are named with virus species and serotype, accession number and country of isolation. Table S4. DENV3 sequences. The sequences are named with virus species and serotype, accession number and country of isolation. Table S5. DENV4 sequences. The sequences are named with virus species and serotype, accession number and country of isolation. (DOCX 18 kb)