Identification of group specific motifs in Beta-lactamase family of proteins
Journal of Biomedical Science volume 16, Article number: 109 (2009)
Beta-lactamases are one of the most serious threats to public health. To combat this threat, we need to study the molecular and functional diversity of these enzymes and identify signatures specific to them. These signatures will enable us to develop inhibitors and diagnostic probes specific to lactamases. The existing classification of beta-lactamases was developed nearly 30 years ago, when few lactamases were available. The DLact database contains more than 2000 beta-lactamases, which can be used to study the molecular diversity of this family and to identify signatures specific to it.
A set of 2020 beta-lactamase proteins available in the DLact database http://22.214.171.124/DLact was classified using graph-based clustering of Best Bi-Directional Hits. After redundant sequences (> 90 percent identical) were removed, the protein sequences from each group were aligned using T-Coffee and annotated using information available in the literature. Motifs specific to each group were predicted using the PRATT program.
The graph-based classification of beta-lactamase proteins resulted in six groups: four major groups containing 191, 726, 774 and 73 proteins, and two minor groups containing 50 and 8 proteins. Based on the information available in the literature, we found that each of the four major groups corresponds to one of the four classes proposed by Ambler. The two minor groups are novel and do not contain the molecular signatures of beta-lactamase proteins reported in the literature. The group-specific motifs showed high sensitivity (> 70%) and very high specificity (> 90%). The motifs from three groups (corresponding to classes A, C and D) showed a high level of conservation at both the DNA and protein levels, whereas the motifs from the fourth group (corresponding to class B) were conserved only at the protein level.
The graph-based classification of beta-lactamase proteins corresponds with the classification proposed by Ambler; thus there is no need to formulate a new classification. However, further characterization of the two small groups may require updating the existing classification scheme. The better sensitivity and specificity of the group-specific motifs identified in this study, compared with PROSITE motifs, and their proximity to the active site indicate that these motifs represent group-specific signatures of beta-lactamases and can be further developed into diagnostics and therapeutics.
Beta-lactamases are enzymes responsible for resistance to penicillins, cephalosporins and related beta-lactam compounds. They hydrolyze the beta-lactam ring of these antibiotics and thus inactivate the drugs. Almost as soon as a new beta-lactam antibiotic is introduced into clinical use, a previously unrecognized beta-lactamase capable of destroying it is identified, which makes beta-lactamases a serious threat to public health. To combat this threat, we need to study the molecular and functional diversity of these enzymes and identify signatures specific to them. These signatures will enable us to develop inhibitors and diagnostic probes for beta-lactamase enzymes.
Beta-lactamases show extensive molecular and functional diversity. Based on the characteristics of the enzymes and their substrate profiles, a number of classification schemes have been proposed [3, 4]. Among these, the molecular classification scheme proposed by Ambler is the most widely accepted and used. In this scheme, beta-lactamases are divided into four classes, A, B, C and D, based on their amino acid sequences. Ambler originally specified two classes: class A, the active-site serine beta-lactamases, and class B, the metallo-beta-lactamases, which require a bivalent metal ion, usually Zn2+, for their activity. Classes C and D were added later. Enzymes from classes A, C and D have a serine-based active site and show sufficient structural similarity to suggest that they may have descended from a common ancestor. Class B consists of metallo-beta-lactamases and is perhaps the most heterogeneous of all the beta-lactamase classes. In recent years, many new class B lactamases have been identified and sequenced. Their clinical importance is highlighted by the fact that they can hydrolyze carbapenems, which usually escape hydrolysis by serine beta-lactamases. Class B has been further divided into three subclasses: B1, B2 and B3.
Each class contains specific signatures or motifs. For example, sequences belonging to class A contain three conserved elements, S-X-X-K, S-D-N and K-T-G, at positions 70, 130 and 234 respectively. Sequences belonging to class C contain S-X-S-K, Y-S-N and K-T-G at positions 64, 150 and 314 respectively. Class D lactamases contain S-X-X-K, Y-G-N and K-T-G at positions 70, 144 and 214 respectively. Sequences belonging to class B contain H-90, D-92, L-117, H-168, G-204 and H-236 as conserved residues located at the bottom of the active site. Residues H-80, H-90 and H-168 accommodate the Zn2+ ion required for the activity of class B beta-lactamases.
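Conserved elements such as those above can be located in a sequence with simple pattern matching. The sketch below is illustrative only and assumes that each element is written as dash-separated residues with "X" meaning any residue. It reports raw 1-based positions in the query sequence, which are not the standardized (Ambler) positions quoted above. The helper names and the toy sequence are hypothetical.

```python
import re

# Class A conserved elements as quoted in the text; "X" stands for any residue.
CLASS_A_ELEMENTS = ["S-X-X-K", "S-D-N", "K-T-G"]


def element_to_regex(element: str) -> str:
    """Turn a dash-separated element such as 'S-X-X-K' into a regex such as 'S..K'."""
    return "".join("." if residue == "X" else residue for residue in element.split("-"))


def find_elements(sequence: str, elements: list[str]) -> dict[str, list[int]]:
    """Return the 1-based start positions of every occurrence of each element."""
    hits = {}
    for element in elements:
        # A zero-width lookahead lets overlapping occurrences be reported too.
        pattern = re.compile(f"(?={element_to_regex(element)})")
        hits[element] = [m.start() + 1 for m in pattern.finditer(sequence)]
    return hits


# Hypothetical toy sequence, not a real beta-lactamase.
toy = "MASTFKAGSDNLLKTGW"
print(find_elements(toy, CLASS_A_ELEMENTS))
# → {'S-X-X-K': [3], 'S-D-N': [9], 'K-T-G': [14]}
```

In practice the raw positions would have to be mapped onto the standard numbering through an alignment with a reference enzyme before comparing them with the positions in the text.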
Although the above-mentioned classification and class-specific motifs are useful, they were identified using a limited set of sequences. We recently developed a database of beta-lactamase genes identified from sequenced bacterial genomes and plasmids. This database contains 2020 beta-lactamase genes from 457 bacterial strains and offered us an opportunity to study the diversity of lactamase genes and to identify molecular signatures of the lactamase family. The classification approach used in this study is based on evolutionary relationships between beta-lactamase proteins and is hence closer to a natural classification. Group-specific signatures were also identified from the sequences in each group.
Protein and DNA sequences of the 2020 lactamase genes available in the DLact database were used in this study. The lactamase sequences in DLact were identified by homology to experimentally identified lactamase proteins: the chromosomal and plasmid DNA sequences of 457 sequenced bacterial strains were scanned with BLASTX for genes homologous to these proteins. Each putative lactamase gene was annotated using InterProScan, RPS-BLAST, Pfam and SMART. The database is available at http://126.96.36.199/DLact
Classification of genes
The protein sequences of lactamase genes were separated according to their source genome. Each lactamase protein from one genome was compared with all proteins from another genome using BLASTP. A pair of sequences in which each sequence was the best hit of the other, whichever was used as the query, was identified as a BeTs pair (bidirectional best hits), and the two sequences were considered functionally related. The BeTs relationship is one of the simplest operational definitions of orthology. If the best hit was found in only one direction, no relationship was assumed between the two sequences. BeTs pairs were clustered using a procedure adapted from the one used to build the COG (Clusters of Orthologous Groups) database. In addition to the above criterion, we imposed an e-value cut-off: a sequence pair was considered to be related by BeTs only if the e-value of the BLAST comparison in each direction was less than 0.001. Since only comparisons between proteins from two separate genomes were made, obvious paralogs were excluded, and the e-value cut-off eliminated spurious matches. The output of this step was a graph with the lactamase genes as nodes and BeTs relationships between pairs of genes as edges. This graph was used as input to Cytoscape, which performed graph-based clustering of the lactamase genes (Figure 1).
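The BeTs criterion and the graph step can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes BLASTP results have already been parsed into (query, subject, e-value, bit score) tuples, and it takes the best hit in each target genome to be the one with the highest bit score. The grouping step uses plain connected components as a simple stand-in for the COG-style procedure and the Cytoscape clustering described above. All gene and genome names are hypothetical.

```python
from collections import defaultdict


def best_hits(hits, genome_of):
    """For each (query, target genome), keep the subject with the highest bit score.

    Same-genome hits are skipped so that only between-genome comparisons count.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if genome_of[query] == genome_of[subject]:
            continue
        key = (query, genome_of[subject])
        if key not in best or bitscore > best[key][2]:
            best[key] = (subject, evalue, bitscore)
    return best


def bidirectional_best_hits(hits, genome_of, evalue_cutoff=1e-3):
    """Return unordered BeTs pairs whose e-values pass the cut-off in both directions."""
    best = best_hits(hits, genome_of)
    pairs = set()
    for (query, _), (subject, evalue, _) in best.items():
        reverse = best.get((subject, genome_of[query]))
        if reverse is None or reverse[0] != query:
            continue  # best hit in one direction only: no relationship assumed
        if evalue < evalue_cutoff and reverse[1] < evalue_cutoff:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical genes from three genomes.
genome_of = {"a1": "g1", "b1": "g1", "a2": "g2", "b2": "g2", "a3": "g3"}
hits = [
    ("a1", "a2", 1e-50, 200), ("a2", "a1", 1e-50, 200),
    ("a1", "b2", 1e-10, 80),                               # weaker hit, not the best
    ("b1", "b2", 1e-20, 100), ("b2", "b1", 1e-20, 100),
    ("a1", "a3", 1e-30, 150), ("a3", "a1", 1e-30, 150),
    ("b1", "a3", 0.01, 30), ("a3", "b1", 0.01, 30),        # best in one direction only
    ("b2", "a3", 0.5, 20), ("a3", "b2", 0.5, 20),          # reciprocal but fails the cut-off
    ("a1", "b1", 1e-5, 60),                                # same genome, ignored
]
pairs = bidirectional_best_hits(hits, genome_of)
print(sorted(pairs))  # → [('a1', 'a2'), ('a1', 'a3'), ('b1', 'b2')]
print([sorted(c) for c in connected_components(list(genome_of), pairs)])
# → [['a1', 'a2', 'a3'], ['b1', 'b2']]
```

Connected components merge any two clusters joined by a single edge, so they are coarser than the clustering used in the study; they are shown here only because they make the graph construction easy to follow.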
Identification of group-specific motifs
Redundant sequences in each group were removed using CD-HIT with a sequence-identity cut-off of 0.9. Motifs in each non-redundant set were searched using PRATT at various 'C%' values. The parameter 'C%' specifies the minimum percentage of sequences that must match a motif. Increasing 'C%' increases the specificity of the identified patterns. Group-specific sensitivity and specificity were calculated for each pattern as:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TP}{TP + FP}$$
Where TP (true positives) is the number of sequences from the selected group that contain the given motif, FP (false positives) is the number of sequences from other groups that contain the motif, and FN (false negatives) is the number of sequences from the selected group that do not contain the motif.
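Given these counts, a motif can be scored against grouped sequences with a few lines of code. This sketch assumes the motif has already been expressed as a Python regular expression and computes sensitivity as TP/(TP + FN) and specificity as TP/(TP + FP), since no true-negative count is defined in the text. The function name and the toy sequences are hypothetical, not DLact data.

```python
import re


def motif_performance(pattern, groups, target):
    """Group-specific sensitivity and specificity of a regex motif.

    groups maps a group label to a list of protein sequences; target is the
    label of the group the motif was derived from.
    """
    motif = re.compile(pattern)
    tp = sum(1 for seq in groups[target] if motif.search(seq))
    fn = len(groups[target]) - tp
    fp = sum(
        1
        for label, seqs in groups.items() if label != target
        for seq in seqs if motif.search(seq)
    )
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tp / (tp + fp) if tp + fp else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "sensitivity": sensitivity, "specificity": specificity}


# Toy example: an S-x(2)-K motif scored for group "A".
groups = {"A": ["MSTFKV", "ASDNLL"], "B": ["HFHAD"]}
print(motif_performance(r"S..K", groups, "A"))
# → {'TP': 1, 'FP': 0, 'FN': 1, 'sensitivity': 0.5, 'specificity': 1.0}
```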
The graph-based classification of beta-lactamase proteins resulted in six highly interconnected groups (Figure 1). The protein sequences within each group showed dense BeTs relationships, while a few relationships between sequences in different groups were also observed. We arbitrarily labelled these groups A to F. The four major groups (Groups A-D) contain 191, 726, 774 and 73 proteins respectively, while the two minor groups (Groups E and F) contain 50 and 8 proteins respectively.
Based on the information available in the literature about the molecular characteristics of beta-lactamase proteins, we found that each of the four major groups corresponds to one of the four classes proposed by Ambler. Sequences from each of the four major groups contain at least one conserved element characteristic of beta-lactamases. Interestingly, proteins from groups E and F do not contain a beta-lactamase domain or the signatures specific to beta-lactamases reported in the literature. Proteins from both these groups showed BeTs relationships with proteins from Group B, and the density of these relationships indicates that they are potentially novel types of lactamases or highly divergent relatives of Group B lactamases. These proteins are listed as uncharacterized in NCBI, and further analysis is needed to characterize them.
Redundant sequences were removed from each group using CD-HIT, and the non-redundant sequences from each group were aligned using T-Coffee. The group-specific motifs identified by PRATT in each group had high sensitivity (> 70%) and very high specificity (> 90%). The motifs from groups A, C and D (corresponding to classes A, C and D of the Ambler classification) showed a high level of conservation at both the DNA and protein levels (Figure 2). Sequences belonging to group E do not show any sequence conservation, whereas sequences belonging to group F contain conserved G, P, N, F, D and E towards the N-terminus (Figure 2). The motifs identified in the current study were compared with the group-specific motifs in the PROSITE database. PROSITE motifs were markedly less sensitive than the motifs identified in the present study. It is likely that the large number of sequences used to identify the motifs here resulted in the higher sensitivity.
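Comparing motifs with PROSITE entries requires both to be in a form that can be searched against sequences. The sketch below converts PROSITE pattern syntax into a Python regular expression so that a pattern can then be scored with the sensitivity and specificity measures defined in the Methods. It is a hypothetical helper that covers only the common syntax elements (x, [..], {..}, (n) and (n,m) repeats, and the < and > terminal anchors); it does not handle '>' inside brackets, and the example patterns are made up rather than taken from PROSITE.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as 'S-x(2)-K' into a Python regex."""
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]  # anchored at the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored at the C-terminus
        count = ""
        repeat = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if repeat:
            element, count = repeat.group(1), "{" + repeat.group(2) + "}"
        if element.lower() == "x":
            core = "."
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # excluded residues
        else:
            core = element  # single residue or [..] class is already valid regex
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


print(prosite_to_regex("S-x(2)-K"))              # → S.{2}K
print(prosite_to_regex("[LIVM]-x(2,4)-{P}-G>."))  # → [LIVM].{2,4}[^P]G$
```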
Mapping the group-specific motifs onto the crystal structures showed that the motifs identified in this study lie in close proximity to the active site (Figure 3).
The Group A-specific motif identified in this study was located on the S-D-N functional element, which is responsible for the catalytic activity of class A lactamases, whereas the PROSITE motif was located on the S-X-X-K element. The Group A-specific PROSITE motif and the motif identified in this study lie on opposite boundaries of the active site but are adjacent to each other in three-dimensional space.
Motifs specific to groups C and D identified in this study overlapped with the PROSITE motifs specific to the corresponding classes (Table 1). Motifs specific to groups C and D were found on the S-X-S-K and S-T-F-K regions respectively (Figure 3). In class C, an electrostatic interaction between Lys (K) and Ser (S) appears to play a role in orienting the active site. This interaction creates a net positive potential in the catalytic site, where the carboxylic group of the beta-lactam is expected to bind. In class D, the active-site residue Ser (S) lies within the specific motif (S-T-F-K) identified in this study.
Motifs specific to group B (corresponding to class B in the Ambler classification) showed conservation only at the protein level (Figure 2). Proteins belonging to group B are highly divergent and heterogeneous, and class B has been further divided into subclasses. Interestingly, both novel groups identified in this study (Groups E and F) have BeTs links to group B. The lack of conservation at the DNA level indicates that proteins from this group experience diverse evolutionary forces and may represent a case of convergent evolution. The motifs specific to group B overlapped with the PROSITE motifs specific to the corresponding class (Table 1). They were found on H-X-H-X-D, which is involved in specific interactions with the zinc ion (Zn2+) required for the activity of this class of beta-lactamases.
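Conservation at the protein level without conservation at the DNA level is what synonymous codon changes produce. The sketch below illustrates this with two made-up coding sequences that both encode an H-F-H-A-D stretch (an instance of H-X-H-X-D) but use different codons. It assumes the standard genetic code and two sequences already aligned codon by codon without gaps; the helper names are hypothetical, and ambiguous bases such as N are not handled.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third codon position in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons of an unambiguous DNA sequence (stop codons become '*')."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two already aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical coding sequences differing only at third codon positions.
dna1 = "CATTTTCACGCTGAT"
dna2 = "CACTTCCATGCAGAC"
print(translate(dna1), translate(dna2))  # → HFHAD HFHAD
print(round(percent_identity(dna1, dna2), 1))                        # → 66.7
print(percent_identity(translate(dna1), translate(dna2)))            # → 100.0
```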
Proteins from group E lack a lactamase domain. Motifs specific to group E showed conservation at the protein level. The lack of conservation at the DNA level, together with BeTs links to group B, indicates that proteins from this group are likely to be highly divergent members of group B. The lengths of group E proteins are similar to those of group B proteins, and both groups show conserved histidine (H) and aspartic acid (D) residues at the sequence level.
Motifs from group F showed higher conservation at both the protein and DNA levels (Figure 2). The proteins from this group also lack a lactamase domain. Motifs specific to group F contain conserved G, F, D and E (Table 2).
The motifs found in groups E and F were highly specific to beta-lactamases, but we were unable to find any lactamase domain because these groups are highly divergent. We nevertheless suspect that proteins belonging to these groups are beta-lactamases, because they showed similarity to the experimentally identified lactamase proteins used in the initial screening. Proteins from both groups also showed Bi-Directional Best Hit (BeTs) relationships with beta-lactamases from characterized classes; for example, both groups E and F have BeTs links with group B. BeTs are generally considered a criterion for defining orthology (functional equivalence).
The present work was carried out with two objectives: (i) to evaluate the classification of lactamase genes on a large dataset and (ii) to identify specific motifs/regions that can be further developed into diagnostic primers and probes. The results show that the graph-based classification of beta-lactamase proteins from the DLact database corresponds with the classification proposed by Ambler; thus there is no need to formulate a new classification. However, the identification of two small groups indicates that the existing classification scheme may need to be updated. Proteins from groups E and F were identified as lactamases on the basis of their initial homology with experimentally characterized lactamases and, later, their BeTs relationships with group B lactamases. However, proteins from both these groups do not contain a lactamase domain and are labelled as uncharacterized in various genome databases and annotation sites. It is therefore critical that these groups be characterized experimentally.
The group-specific motifs identified in this study showed better sensitivity and specificity than the motifs available in PROSITE. It is likely that motifs derived from a larger dataset capture stronger consensus regions and suppress weakly conserved ones. The proximity of these motifs to the class-specific active sites indicates that these regions are either structurally or functionally important for lactamase activity. The sequence logos of regions containing class-specific motifs show that these regions have lower substitution rates than other regions of the proteins. Thus, these regions could be used to develop diagnostic probes. We are studying the co-occurrence profiles of motifs in order to identify non-overlapping regions that can be developed into sensitive class-specific primers.
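The height of a column in a sequence logo reflects its information content: the maximum entropy of the alphabet minus the Shannon entropy of the observed residue frequencies, so columns with few substitutions stand tallest. The sketch below computes this per column for an alignment given as equal-length strings. It is a simplified illustration under stated assumptions: gaps ('-') are ignored, and no small-sample correction is applied, although logo tools often apply one. The function name and the toy alignment are hypothetical; for DNA, pass alphabet_size=4.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Information content (bits) of each alignment column, as drawn in a sequence logo."""
    max_bits = math.log2(alphabet_size)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # an all-gap column carries no information
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        scores.append(max_bits - entropy)
    return scores


# Toy protein alignment: columns 1 and 3 are invariant, column 4 is the most variable.
toy = ["SDNK", "SDNR", "SENK", "SDNH"]
print([round(s, 2) for s in column_information(toy)])  # → [4.32, 3.51, 4.32, 2.82]
```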
The study has achieved its objectives of evaluating the classification of beta-lactamases proposed by Ambler on a large dataset and identifying group-specific motifs.
The graph-based classification of beta-lactamase proteins from the DLact database corresponds with the four-class classification proposed by Ambler; thus there is no need to formulate a new classification. However, further characterization of the two small groups may require updating the existing classification scheme. Group-specific motifs identified from the six groups in this study had high sensitivity (> 70%) and very high specificity (> 90%), outperforming PROSITE motifs, and their proximity to the active site indicates that these motifs represent characteristic group-specific signatures of beta-lactamases and can be further developed into better diagnostics and therapeutics.
Matagne A, Lamotte-Brasseur J, Frere JM: Catalytic properties of class A beta-lactamases: efficiency and diversity. Biochem J. 1998, 330 (Pt 2): 581-598.
Bush K: Classification of beta-lactamases: groups 1, 2a, 2b, and 2b'. Antimicrob Agents Chemother. 1989, 33: 264-270.
Bush K, Jacoby GA, Medeiros AA: A functional classification scheme for beta-lactamases and its correlation with molecular structure. Antimicrob Agents Chemother. 1995, 39: 1211-1233.
Ambler RP, Coulson AF, Frere JM, Ghuysen JM, Joris B, Forsman M, Levesque RC, Tiraby G, Waley SG: A standard numbering scheme for the class A beta-lactamases. Biochem J. 1991, 276 (Pt 1): 269-270.
Ambler RP: The structure of beta-lactamases. Philos Trans R Soc Lond B Biol Sci. 1980, 289: 321-331. 10.1098/rstb.1980.0049.
Garau G, Di Guilmi AM, Hall BG: Structure-Based Phylogeny of the Metallo-beta-Lactamases. Antimicrob Agents Chemother. 2005, 49: 2778-2784. 10.1128/AAC.49.7.2778-2784.2005.
Hall BG, Salipante SJ, Barlow M: The metallo-beta-lactamases fall into two distinct phylogenetic groups. J Mol Evol. 2003, 57: 249-254. 10.1007/s00239-003-2471-0.
Galleni M, Lamotte-Brasseur J, Rossolini GM, Spencer J, Dideberg O, Frere JM: Standard numbering scheme for class B beta-lactamases. Antimicrob Agents Chemother. 2001, 45: 660-663. 10.1128/AAC.45.3.660-663.2001.
Singh R, Suchir A, Singh H: DLact: An antimicrobial resistance gene database. Journal of Computational Intelligence in Bioinformatics. 2008, 1: 93-108.
Gish W, States DJ: Identification of protein coding regions by database similarity search. Nat Genet. 1993, 3: 266-272. 10.1038/ng0393-266.
Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41. 10.1186/1471-2105-4-41.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Research. 2003, 13: 2498-2504. 10.1101/gr.1239303.
Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.
Jonassen I, Collins JF, Higgins DG: Finding flexible patterns in unaligned protein sequences. Protein Sci. 1995, 4: 1587-1595. 10.1002/pro.5560040817.
Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.
Sigrist CJ, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P: PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 2002, 3: 265-274. 10.1093/bib/3.3.265.
Hall BG, Barlow M: Revised Ambler classification of beta-lactamases. J Antimicrob Chemother. 2005, 55 (6): 1050-1051. 10.1093/jac/dki130.
Carfi A, Pares S, Duee E, Galleni M, Duez C, Frere JM, Dideberg O: The 3-D structure of a zinc metallo-beta-lactamase from Bacillus cereus reveals a new type of protein fold. EMBO J. 1995, 14 (20): 4914-4921.
We would like to thank Indian Council of Medical Research for providing financial support for conducting this research. The work was carried out under the project "Biomedical Informatics Centres of ICMR".
The authors declare that they have no competing interests.
RS made substantial contributions to the analysis and interpretation of data and drafted the manuscript. AS participated in mapping the patterns onto the structures. HS conceived the idea and provided direction for the work.