A novel 9-bp insertion detected in steroid 21-hydroxylase gene (CYP21A2): prediction of its structural and functional implications by computational methods

Background Steroid 21-hydroxylase deficiency is the most common cause of congenital adrenal hyperplasia (CAH). Detection of underlying mutations in CYP21A2 gene encoding steroid 21-hydroxylase enzyme is helpful both for confirmation of diagnosis and management of CAH patients. Here we report a novel 9-bp insertion in CYP21A2 gene and its structural and functional consequences on P450c21 protein by molecular modeling and molecular dynamics simulations methods. Methods A 30-day-old child was referred to our laboratory for molecular diagnosis of CAH. Sequencing of the entire CYP21A2 gene revealed a novel insertion (duplication) of 9-bp in exon 2 of one allele and a well-known mutation I172N in exon 4 of other allele. Molecular modeling and simulation studies were carried out to understand the plausible structural and functional implications caused by the novel mutation. Results Insertion of the nine bases in exon 2 resulted in addition of three valine residues at codon 71 of the P450c21 protein. Molecular dynamics simulations revealed that the mutant exhibits a faster unfolding kinetics and an overall destabilization of the structure due to the triple valine insertion was also observed. Conclusion The novel 9-bp insertion in exon 2 of CYP21A2 genesignificantly lowers the structural stability of P450c21 thereby leading to the probable loss of its function.


Background
Congenital adrenal hyperplasia (CAH; OMIM# 201910) is an autosomal recessive disorder caused by deficiency of one of the five steroidogenic enzymes involved in cortisol biosynthesis. Steroid 21-hydroxylase deficiency accounts for about 90-95% of all CAH cases [1]. Deficiency of cor-tisol results in excessive production of androgens leading to prenatal virilization in females and rapid somatic growth in both sexes [2]. CAH has been traditionally divided into three forms; severe salt wasting (SW), less severe simple virilizing (SV) and asymptomatic non-classic (NC) form. The SW form is common and found in 75% of all CAH patients. In addition to decreased cortisol, aldosterone biosynthesis is also impaired in these patients resulting in severe renal salt loss and hypotonic shock, unless treated during the neonatal period [2]. Patients with SV form do not have aldosterone deficiency and thus salt loss is not present. Symptoms like variable degree of ambiguous genitalia in females, growth acceleration and pseudoprecocious puberty are seen in SV forms. The milder non-classical form is asymptomatic at birth and presents with various degrees of late onset features of hyperandrogenism.
The wide spectrum of clinical manifestations seen in this disorder is due to varied degrees of enzyme activity caused by different mutations in the CYP21A2 gene encoding the steroid 21-hydroxylase enzyme. Although, both large and point mutations have been seen in different populations [3][4][5][6][7], point mutations constitute a larger proportion. The CYP21A2 gene is part of a complicated structure, referred to as the RCCX-module located in human leukocyte antigen class III locus on chromosome 6p21.3 [8]. Approximately 30 kb upstream of CYP21A2, a pseudogene (CYP21A1P) is present; which is 98% identical to CYP21A2. CYP21A1P cannot synthesize the functional protein due to presence of numerous deleterious mutations.
About 95% of the mutated alleles in patients with steroid 21-hydroxylase deficiency are generated by transfer of DNA sequences from CYP21A1P to CYP21A2 by gene conversion events [9]. The remaining 5% alleles have new/ rare mutations due to random events [2]. The number of rare mutations identified has increased dramatically in the last few years [10]. Most of these mutations are unique to individual families but some are population specific [11][12][13]. Functional characterization of most of the mutants using site directed mutagenesis followed by in vitro expression analyses have been helpful in correlating clinical severity to the degree of loss of enzyme function caused by the mutations [10]. However such studies are difficult to perform in routine laboratory set up with limited resources. Alternatively, these experimental assays could be complemented with computational studies; wherein the structural and functional perturbations of a protein by virtue of mutation/s could be predicted. Consequence of single mutation (W62G) on structural stability of lysozyme [14] and structural perturbations caused by three different activating mutations of the hLHR gene found in four unrelated Brazilian boys with male-limited preco-cious puberty have been studied by MD simulations [15]. Computational scientists have also used MD simulations to independently explore the role of mutations on protein stability and activity [16][17][18][19][20][21][22]. In the present study, molecular modeling and MD simulations were carried out to understand the structural and probable functional implications of the triple valine insertion of human P450c21.

Clinical history of proband
The proband presented on day 30 with ambiguous genitalia and failure to thrive. She was the first child born of non-consanguineous marriage with an uneventful antenatal history. She was born at full term by normal delivery. Her clinical examination showed prominent phallus with clitoral hypertrophy and partial fusion of labia (Prader stage IV). Gonads were non-palpable. Genitogram and panendoscopy revealed normal bladder, common urogenital sinus, vagina, and uterine cervix. Her weight was initially 2.9 kg (well below the 0.4 th centile). Later at 12 weeks her length was 54.7 cm (2 nd centile) and weight 4.3 kg (2 nd centile). Hormonal examination revealed elevated 17-hydroxyprogesterone (17-OHP) (8800 ng/dl; normal value < 630 ng/dl), testosterone (1.5 ng/ml; normal range 0-0.28 ng/ml) and low cortisol (5.6 ug/dl; normal range 11-18 ug/dl). Sodium and potassium levels were 138 mmol/l and 6.1 mmol/l respectively. Her chromosomal analysis revealed 46, XX normal karyotype. She was diagnosed as simple virilising CAH and put on a relatively low dose of 5 mg/m 2 of hydrocortisone and 50 μg of fludrocortisone daily. In second year, she showed normal development with 17-OHP level of 560 ng/dl. Her weight was 8 kg, height -74 cm and head circumference -45.3 cm (all below 0.4 th centile). Her bone age was normal. She was continued with the same dose till her third year. Thereafter her dose had to be increased to 9 mg of hydrocortisone and 100 μg of fludrocortisone daily due to increased level of 17-OHP (7200 ng/dl). She is now 5 yrs old and developing normally.

Molecular analysis
Informed consent was obtained from both parents and the study was approved by the Ethical Committee of our institution. About 2 ml of whole blood was collected in EDTA vacutainers (Becton-Dickinson, USA) for CYP21A2 gene analysis and DNA was extracted using Qiagen kit (QIAmp DNA Blood Kit, QIAGEN GmbH, Hilden). 200 ng of genomic DNA was subjected to selective amplification of CYP21A2 gene in two different fragments of 1.2 and 2.3 kb respectively using previously described primers [23]. PCR was carried out in 2720 thermocycler (Applied Biosystem) in a reaction volume of 50 μl containing 1.5 mM MgCl 2 , 0.2 mM dNTPs, 200 mM (NH4) 2 SO 4 , 750 mM Tris-HCl (pH 8.8), 0.1% Tween 20, 2 units of Taq polymerase (Fermentas, Life Sciences) and 0.5 μM of primers. PCR was performed at 96°C for 3 min for initial denaturation followed by 30 cycles of 95°C for 1 min, 55.5°C for 25 sec, and 72°C for 3 min and final extension at 72°C for 6 min. PCR products were resolved on 1% agarose gel stained with ethidium bromide, visualized under UV transilluminator.
The PCR products were purified using QIAGEN kit (QIAmp Gel Purification Kit, QIAGEN GmbH, Hilden). Purified products were subjected to direct sequencing with ten different primers (Table 1) based on dideoxynucleotide terminator methodology using the BigDye Terminator Cycle Sequencing Ready Reaction kit (PE, Applied Biosystems, Perkin Elmer Corp., Foster City, CA). The analysis was carried out in the ABI PRISM 3130 Genetic Analyser (PE, Applied Biosystems). The numbering of the nucleotides was according to Higashi et al [24].

Web-based tools
The web-based tools were used for the purpose of secondary structure prediction, template recognition, model evaluation and sequence analysis ( Table 2).

Secondary structure prediction
GeneSilico metaserver, JPRED and SYMPRED servers [25][26][27] were used to predict the secondary structures of human P450c21 ( Table 2). The output of each of these secondary structure prediction servers is the consensus secondary structure predicted for the protein based on a collection of independent algorithms. The SYMPRED result is based on the consensus formed by the predictions of HD, PROF, SSPRO, YASPIN, JNET and PSIPRED programs while GeneSilico result is based on PSIPRED and JNET.
Identification of suitable template FUGUE (v2.s.07; Table 2) was used for identification of template for modeling the structure of human P450c21 [28]. The best template is identified based on their Z-scores. The templates which have a Z-score of 6 or more are labeled as "certain" with 99% confidence.

Assessment of the theoretical structure available for human P450c21
The theoretical structure of human P450c21 (PDB ID: 2GEG) has been elucidated using rabbit P4502c5 (PDB ID: 1N6B) as template [29]. The quality of this theoretical structure was assessed in two ways. Firstly, using the alignment of the target-template provided by FUGUE, the agreement between the predicted consensus secondary structure of human P450c21 with that of rabbit P4502c5 secondary structure was compared. Secondly, the quality of the theoretical model was evaluated using Verify3D [30] and Colorado3D server [31] ( Table 2).

Generation of structure for the mutant
The structure of mutant (MT) was modeled using the theoretical structure of wild-type (WT; 2GEG) as the template. Discovery Studio v 1.7 (Accelrys) was used for building the homology-based model.

MD simulations
MD simulations were performed using GROMACS 3.3.1 [32]. United atom representation was used except for polar and aromatic ring hydrogen atoms. GROMOS96 forcefield was used for energy calculations. Van der Waals interactions were calculated with a distance cut-off of 0.9 nm. Electrostatic interactions were treated using the cutoff method [33]. Neutralizing counter ions were added when charged residues were present. The models were solvated with SPC water molecules [34] and simulated in a triclinic box [35,36] with periodic boundary conditions. The simulations were performed in the canonical NVT ensemble. The models were first energy minimized using steepest descent algorithm with a tolerance of 1 J mol -1 nm -1 ; this was followed by position restrained MD simulations for 10 ps. Initial velocities were generated conforming to Maxwell velocity distribution. A time step of integration of 2 fs was used. LINCS algorithm was used to constrain the bonds [37]. The productive run was initiated for 10 ns; this duration was sufficient to compare the stabilities of the protein structures. The simulations were performed at 300K.

Analysis of MD trajectories
With an objective of understanding the structural and functional implications of the triple valine insertion, trajectories of WT and MT were analyzed for the following structural properties as a function of time: a) Potential energy, b) The root mean square deviation (RMSD) of the Cα atoms with respect to the starting conformation, c) RMSD of the hydrophobic residues postulated to be involved in membrane binding [29] with respect to the starting conformation d) Distance between the Cα atoms of residues S108 and D287; both of which are suggested to be involved in heme and substrate-binding [29], e) Structural perturbations near the insertion site. The trajectories of the simulations were plotted using XMGRACE [38]. DSSP [39] program was used for secondary structure assignment and MolScript [40] was used to create and render the molecular images.

Sequence logo
For understanding the conservation of residues in the immediate vicinity of the insertion, a sequence logo was constructed using the program WebLogo [41]. Proteins that share more than 30% similarity with human P450c21 and are deposited in the SwissProt database were identified using the PSI-BLAST algorithm [42]. The 50 identified homologs were all cytochromes and shared similarity throughout the length of the protein. These proteins were then subjected to multiple sequence alignment using CLUSTALW [43] and a sequence logo was constructed.

Mutational analysis
Clinical and biochemical findings in our patient were consistent with her simple virilizing phenotype. Entire sequencing of CYP21A2 gene (including 10 exons, 9 introns and < 400 base pairs upstream to the transcription initiation codon) was performed in order to detect known and unknown mutations. Electropherogram showed a known I172N mutation in exon 4 of one allele and an inframe insertion of 9-bp TGTGGTGGT at nucleotide 306 in exon 2 of other allele (Fig. 1A). This 9-bp insertion resulted in triple valine insertion between V70 and L71 of the wild type P450c21 (Fig. 1B). Re-sequencing of Exon 2 using reverse primer 2R (Table 1) showed a frameshift 9bp away from the frameshift seen in the forward strand. Using SeqScape v2.1.1, forward and reverse sequences were aligned against the reference sequence (see Additional file 1 part A) that confirmed the insertion as highlighted sequences (see Additional file 1 part B). The shift in the insertion site was clearly observed in the forward (see Additional file 1 part C) and reverse (see Additional file 1 part D) strand that further confirmed this insertion as duplication of TGTGGTGGT base pairs in exon 2, at nucleotide position 306 of CYP21A2 gene (see Additional file 1 part E). This novel insertion mutation was inherited from the father, while the I172N was inherited from the mother, but both parents were heterozygous for their respective CYP21A2 mutation and, therefore, clinically healthy. This insertion/duplication in exon 2 is novel (GenBank accession no. EF661662) and has not been reported previously in patients with 21-hydroxylase deficiency. This mutation was not found in 100 control subjects.
In silico studies were performed to understand the effect of the insertion of triple valine on the structure and function of P450c21.

Effect on sequence conservation
The first step towards elucidating the effect of the insertion on the structure-function characteristics of the protein was to investigate the occurrence of conserved residues in the vicinity of the mutation. The sequence logo revealed the presence of two highly conserved, charged residues viz. E79 and R91 that occur downstream of the insertion site. In addition to these, hydrophobic residues occurring at 48, 51, 58, 61 63, 80, 81 and 82 positions and glycine residues occurring just before the insertion site at 56 and 64 th position are also conserved (Fig. 2). Hence, the mutation could cause a drastic change in the position of these conserved residues and thus lead to loss of structure and function of the protein.

Secondary structure prediction
The secondary structures for human P450c21 were predicted using various online softwares (Table 2). Except for a few regions, there was good agreement in the predictions from the various servers (see Additional file 2). There does not seem to be any discordance with the secondary structures predicted at the site of mutation (see Additional file 2).

Identification of suitable template
Cytochrome P4502c5 of rabbit (PDB ID: 1DT6) was identified as the best template, for modeling the structure of human P450c21, by FUGUE (Z-score of 33.12; "certain" with 99% confidence). As compared to the other templates, it also showed full-length sequence alignment with least number of gaps. The consensus predicted secondary structural elements of human P450c21 also aligned fairly well (see Additional file 3). There exists another experimentally elucidated structure for rabbit P4502c5 [1N6B; [44]] with a better resolution as compared to 1DT6. The secondary structure compositions of both these PDB entries are identical and the RMSD between them is 0.8 Å.
Hence, 1N6B was considered as the ideal template for modeling of human P450c21. A theoretical structure of human P450c21 (PDB ID: 2GEG) has been elucidated using rabbit P4502c5 (PDB ID: 1N6B) as template [29]. This structure was thus used to represent the WT and also serve as the template for modeling the MT.
Direct DNA sequencing of exon 2 of CYP21A2 gene

Validation of the theoretical structures
The Verify3D analyses of theoretical models of both WT and MT revealed that most of the residues had a positive score. For low-scoring regions of the model, it was observed that the corresponding regions of the template (1N6B) too shared a low score. The structural anomalies could thus have been passed down from the template dur-ing modeling (see Additional file 4). The structures were also colour-rendered based on the Verify3D scores generated by the Colorado3D server. The major part of the structures, in all the cases, had a good score (see Additional file 5).
Sequence logo depicting the conservation of residues near the insertion site CαRMSD plot of P450c21 structure with respect to the starting conformation WT and MT plots are rendered in black and red respectively. The figure was prepared using XMGRACE (38).

Structural implications of the insertion Loss of structural stability
The WT and MT structures were subjected to MD simulations. The RMSD analysis of the trajectory revealed that the WT dynamics attained equilibration at around 3 ns with an RMSD of around 0.5 nm, while the MT attained equilibrium at around 6 ns with a higher RMSD (0.75 nm; Fig. 3A). In concurrence to the above observation, the potential energy of the MT remains higher through out the dynamics as compared to the WT (Fig. 3B). The loss of the secondary structures at the region near the insertion site during the course of the simulation clearly reveals a faster unfolding of the MT as compared to the WT (Fig. 4). All these observations suggest that the triple valine insertion causes an overall destabilization of the protein.

Loss of H-bond interactions at the vicinity of insertion
Further examination of the atomic-level interactions at the vicinity of the insertion present in human P450c21 (WT) revealed that residues N72 and E79 participate in hydrogen bonding with distantly placed residues viz. T52, N387 and S374 (Fig. 5). Trajectory analysis revealed that the hydrogen bond interactions between E79 and S374 seem to be destroyed in the early stages of simulation for MT whereas these were retained in WT (see Additional file 6 part A). The hydrogen bonds between N72-N387 and N72-T52 were also lost rapidly for the MT (see Additional file 6 parts B and C).

Functional implications of the insertion Heme and substrate-binding
Docking and homology studies of cytochrome P450s have helped in identification of putative heme and substrate-binding residues. The residues S108 and D287 have thus been implicated in heme and substrate-binding [29]. These residues were seen to be interacting with each other in P450c21. The change in distance between the Cα atoms of these residues was monitored for the trajectories as a function of time for both the WT and MT structures (Fig.  6A). The distance between these two residues in the native WT and MT structures is 0.6 nm. It has to be noted here that the starting structures for simulations of WT and MT were identical except for the triple valine insertion and the residues S108 and D287 were distantly placed from the insertion site. In WT dynamics the distance between these two residues was maintained at around 0.6 nm, while for the MT the distance initially increased to around 2.25 nm and later stabilizes at around 1.25 nm (Fig. 6A).

Hydrophobic interactions
Based on Optimal Docking Area method, three segments corresponding to residues 30-42, 63-66 and 211-219 of human P450c21 were found to form a surface exposed Snapshots of the conformation of human P450c21, near the insertion site (residues 57-82), during the course of simulation Figure 4 Snapshots of the conformation of human P450c21, near the insertion site (residues 57-82), during the course of simulation. The top and bottom panels represent the WT and MT structures respectively as a function of time. The structures have been rendered using MolScript (40) and DSSP program (39) was used for acquiring the secondary structure information.
hydrophobic patch [29]. It was hypothesized that these residues would have to participate in hydrophobic interactions to minimize the destabilizing effect caused by the surface exposed hydrophobic region. Additionally, they are positioned adjacent to the N-terminal transmembrane region of the protein. Hence these hydrophobic residues were suggested to be probably involved in binding with the ER membrane [29]. These residues were considered as a cluster and the RMS deviation of the Cα atoms of this cluster with respect to the starting conformation was compared for the WT and MT structures during the course of simulation. It was observed that the RMS deviation increased as a function of time for the MT as compared to the WT (Fig. 6B).

Discussion
CAH is found in wide range of clinical severity ranging from subtle hormone imbalance in adults to severe life threatening salt wastage in newborns. Detection of underlying mutations in CYP21A2 gene encoding steroid 21hydroxylase enzyme is helpful both for confirmation of diagnosis and management of CAH patients. Different kinds of mutations result in different degrees of enzymatic impairment of P450c21, which result in varied pheno-types of CAH patients. Although, a large number of novel mutations have been reported in CYP21A2 gene over the past few years and these mutations have continued expanding worldwide, large insertion/duplication mutations are uncommon. A duplication of 16 bases (CCT-GGATGACACGGTC) at codons 393-397 of exon 9 [45] and a large duplication of 111 bases from codons 21 to 57 inserted at codon 58 in exon 1 of the CYP21A2 gene have been reported [46]. Once a novel mutation is encountered, it is necessary to carry out the functional studies to establish genotype -phenotype correlation so that the prognostic evaluation can be made for the proper management of the patient.
A large number of mutations detected in CYP21A2 gene have been characterized to prove their clinical relevance and impact on P450c21 protein. For characterization of mutations, functional studies have been carried out by site directed mutagenesis followed by in vitro expression of the mutant protein in transiently transfected mammalian cells. The residual enzyme activity is then measured towards both natural substrates (17-OHP and progesterone) and compared with the WT protein. Percentage of enzyme activity is correlated with the clinical phenotype and accordingly the mutation is classified as SV, SW or NC type [10,[47][48][49][50]. However such studies are laborious, time consuming and in some mutant proteins, the biochemical and biophysical evaluation is not possible by in vitro studies. In such cases, in silico studies can provide additional clinically useful information that could not be possible by examining the patients [29]. Molecular modeling has been used to study the putative effects of steroid 21hydroxylase gene mutations. Structural features deduced from the models were in good correlation with clinical severity of P450c21 mutants, which shows the applicability of a modeling approach in assessment of new P450c21 mutations [29,[51][52][53][54].
In this particular case, the 9-bp insertion/duplication did not result in frameshift as expected from insertions/duplications in general; instead it led to insertion of three valine residues between V70 and L71. Since this insertion mutation does not exist in pseudogene, gene conversion may not be the cause of this mutation. This duplication might have been generated by intergenic recombination during meiosis. Theoretically, this insertion could lead to absence of residual 21-hydroxylase activity resulting in the severe SW phenotype. Given the fact that CAH is an autosomal recessive disorder, the clinical severity reflects the milder mutation present in the patient. Hence SV phenotype of our patient that reflected the mild mutation I172N present on one allele, could not give additional information about the possible structural perturbation caused by the novel mutation present on the other allele. In such compound heterozygous cases with a mild known muta-Partial structure of human P450c21 depicting the residues whose H-bond (dotted line) interactions get affected due to mutation Figure 5 Partial structure of human P450c21 depicting the residues whose H-bond (dotted line) interactions get affected due to mutation. The structures in green and blue represent regions from 46-94 residues and 365-390 residues respectively. The figure has been rendered using MolScript (40).
tion on one allele and a novel mutation on other allele, it is difficult to classify the mutation on the basis of clinical phenotype.
In the present study, molecular modeling and MD simulations were carried out to analyze the structural consequences of this insertion in P450c21 and to better understand the molecular pathology of CAH in the proband.
The structure of human P450c21 has not been experimentally elucidated. However, a theoretical model for it is available in the Protein Data Bank (PDB ID: 2GEG; [29]). The accuracy of this theoretical model was ascertained using structure prediction and evaluation tools (see Additional files 2, 3, 4, 5). Subsequently, using the WT as the template, the structure of the MT was modeled.
Both the WT and MT structures were subjected to MD simulations for comparison of their structural stabilities. The trajectory analyses were carried out to understand i) the effect on the structural stability and ii) plausible functional implications of the insertion.
Studies on the unfolding dynamics of the WT and MT revealed that, although the two shared an identical start-ing structure -except for the site of insertion, the MT exhibits faster unfolding and is less stable (Figs. 3A-B). It was also observed that the secondary structures are gradually lost at the site of insertion during the course of simulation in the MT (Fig. 4). The observations indicate that the insertion seems to have a profound effect on reducing the intrinsic stability of human P450c21.
In the absence of experimental data, the interactions and stability of the regions of the protein predicted to be harboring functional importance were selected for trajectory analyses. Based on docking and homology studies of cytochrome P450s, residues S108 and D287 have been implicated in both heme and substrate-binding [29]. Docking and structural studies have also helped in identification of hydrophobic patches in human P450c21, which are suggested to be involved in ER membrane binding [29]. The results of the present simulation studies revealed that the Cα distance between the residues S108 and D287 increase drastically in the MT while it remains the same in the WT (Fig. 3C). These residues are placed far away from the site of insertion and yet their interactions seem to be affected in the mutant. This indicates that the insertion of triple valine seems to have caused not only a local structural perturbation but also a drastic disturbance in the structural stability of protein. Similarly, in the case of the hydropho- The change in Cα distances between the residues S108 and D287, both of which are postulated to be involved in heme and substrate binding, as a function of time Figure 6 The change in Cα distances between the residues S108 and D287, both of which are postulated to be involved in heme and substrate binding, as a function of time. (A). Change in the CαRMSD plot of the hydrophobic patches with respect to the starting structure for WT and MT during the course of the simulation (B). WT and MT plots are rendered in black and red respectively. The figure was prepared using XMGRACE (38).
bic regions implicated in ER membrane binding, the RMSD observed during the course of simulation suggested that this region is highly unstable in the MT as compared to the WT (Fig. 3D).
The sequence logo reveals that the insertion occurs in a region flanked by the presence of many conserved residues (Fig. 2). This observation, along with the results of the trajectory analyses, strongly suggests that the mutation could lead to a loss of the structure and function of human P450c21.

Conclusion
Analysis of CYP21A2 gene in an Indian child with classical CAH, revealed a novel 9 base pair TGTGGTGGT insertion at nucleotide position 306 in exon 2. This insertion resulted in a triple valine insertion between V70 and L71 of P450c21. MD simulations revealed that this insertion seems to cause an overall destabilization of the structure. Although the insertion does not occur in the immediate vicinity of the postulated heme and substrate binding residues; trajectory analyses reveal that their interactions seem to get disrupted in the MT. These observations indicate that the insertion could result in SW phenotype had it been present in homozygous state. We emphasise this mutation should be added to the panel of mutations to be screened in Indian population.