Modeling: Phylogenetic Trees
Phylogenetic analysis requires little pre-existing literature in order to construct a predictive evolutionary model, because a phylogeny’s strength of prediction is based on analysis of biological sequences, data that is now readily available. This method of modeling is favorable for novel bacteriocin research because of the limited literature available for these bacteriocins.
The hybrid bacteriocin lacticin Q-lacticin Z was successfully created by University of Southern Denmark’s 2016 iGEM team [1]. Because these class II lacticin bacteriocins have been hybridized before, we hypothesize that lacticin Z will have a greater affinity for hybridization with another bacteriocin that is evolutionarily similar to itself. Through the reconstruction of a phylogenetic tree and through review of past literature, we identified two bacteriocins that are suitable for hybridization: aureocin A53 and epidermicin NI01.
Summary of Results
We created a phylogenetic tree with relative divergence times using 93 chosen bacteriocin protein sequences, with the majority of the sequences belonging to class II bacteriocins, the same class as lacticin Z. The bacteriocins were then grouped into clades on the basis of similar divergence times (≤ 0.17) relative to the outgroup, a class of bacteriocins known as colicins. For each of the 17 clades produced, we then reviewed past literature to see whether there was a correlation between divergence time and conservation of bacteriocin function, especially for clades close to lacticin Z; such a correlation would support our hypothesis of hybridization.
We found that 8 out of the 17 bacteriocin clades (47%) had some form of conserved function and mode of action, including the clade that contained lacticin Z. These eight clades were groups from either class I, class IIa, or class IId. All but one of these eight clades had a divergence time less than or equal to 0.04, which supports a correlation between bacteriocin function and sequence homology, especially in the classes mentioned. Furthermore, nearly all class I bacteriocins analyzed belonged to one clade (dark-blue region), with a notably small divergence time to each other (0.01). The largest divergence time relative to the outgroup (9.79) belonged to the enterocin group (yellow region).
Using this method of clustering-and-research, lacticin Z was determined to be most suitable for hybridization with three other bacteriocins: lacticin Q, epidermicin NI01, and aureocin from the TE8 strain of Staphylococcus. These results provide an evolutionary basis for SDU Denmark’s 2016 iGEM team’s successful hybridization of lacticin Z with lacticin Q, which was the inspiration for our project. Of the other two bacteriocins, both of which had divergence times of 0.04 to lacticin Z, epidermicin NI01 was chosen over aureocin TE8 because of the difficulty of obtaining the TE8 strain.
However, because lacticin Z was also evolutionarily close to a surprisingly large number of other aureocins from different bacterial taxa, and because of aureocin A53’s highly conserved functionality between species, we then decided to choose a second bacteriocin and make a second hybrid. Aureocin A53 from Staphylococcus aureus, which had a relative divergence time of 1.25 to lacticin Z, was chosen as the second bacteriocin.
Summary of Methods
The construction of a phylogenetic tree can be divided into three steps: sequence selection, multiple sequence alignment, and phylogenetic reconstruction.
Sequence selection involves picking the appropriate bacteriocin sequences to analyze. All 93 bacteriocin sequences were selected on the basis of the availability of literature, how well they represent a clade, and evolutionary similarity to lacticin Z.
Multiple sequence alignment describes the process of aligning the pool of sequences to identify regions of conserved function and structure. Phylogenetic reconstruction is the process of creating the phylogenetic tree using the model of best fit. Throughout the process, through research and trial-and-error, we chose the algorithm and method (e.g. PAM matrix, maximum likelihood method, JTT model) that best represents our knowledge of bacteriocin hybridization.
Results: Class Composition
Bacteriocins can be divided into two groups: those produced by gram-negative bacteria (colicins and microcins), and those produced by gram-positive bacteria (divided into five classes). Our phylogenetic tree of 93 bacteriocin sequences was composed of 14 class I bacteriocins, 64 class II bacteriocins, 4 class III bacteriocins, 6 class V bacteriocins, and 5 colicin bacteriocins that served as an outgroup. Each color on the tree below represents a clade grouped on the basis of relative divergence time (≤ 0.17).
Class II bacteriocins, which are small, unmodified membrane-active peptides, can be further divided into four subclasses. There was a correlation between subclass and homology for class IIa and class IId bacteriocins, but not for class IIb and class IIc. For example, all class IIa bacteriocins related to the pediocin family, such as enterocins, had very close divergence times (yellow to orange region). Protein structural analysis supports these results, as all these bacteriocins contain the conserved N-terminal sequence KYYGNGVXCXXXXCXV(D/N)WGXA, with the sequence between the two cysteines consisting of one or two charged residues, and a serine or threonine residue [2]. Class IId bacteriocins, which are leaderless peptides that are synthesized and secreted without further processing [3], were especially important because this subclass includes our starting bacteriocin, lacticin Z. Fortunately, like class IIa, most of these bacteriocins shared homology (cyan to light-green region).
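The conserved class IIa motif quoted above can be checked programmatically. The following is a minimal sketch, assuming that X stands for any single residue and that (D/N) means either aspartate or asparagine; the example sequences and the function name are hypothetical and are not taken from our dataset. It does not enforce the additional constraint on charged and serine/threonine residues between the two cysteines.

```python
import re

# Motif from the text: KYYGNGVXCXXXXCXV(D/N)WGXA
# "." stands for X (any residue); "[DN]" stands for (D/N).
PEDIOCIN_MOTIF = re.compile(r"KYYGNGV.C.{4}C.V[DN]WG.A")

def has_pediocin_motif(protein_sequence: str) -> bool:
    """Return True if the conserved motif occurs anywhere in the sequence."""
    return PEDIOCIN_MOTIF.search(protein_sequence.upper()) is not None

# Hypothetical sequences, for illustration only.
examples = {
    "motif_with_D": "KYYGNGVTCGKHSCSVDWGKATTCI",
    "motif_with_N": "KYYGNGVHCTKSGCSVNWGEAFSAG",
    "no_motif": "MKTLLAGIWKAVSGAAKHL",
}

for name, sequence in examples.items():
    print(name, has_pediocin_motif(sequence))
```

A scan like this only flags candidate class IIa members; it is not a substitute for the alignment-based analysis described in this page.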
There was also little correlation between evolutionary history and bacteriocin class for classes III and V (no common colored region). However, because bacteriocins are grouped into classes based on their internal cross linking and primary structures [4], this was likely due to the small number of bacteriocin sequences we analyzed, all of which happened to be diverse by chance. In contrast, most class I bacteriocins, which are small post-translationally modified peptides, showed a close evolutionary history (dark-blue region).
Results: Divergence and Clustering
The figure below represents the same tree, but with each color/clade grouped under one arbitrary name. The numbers in parentheses denote the number of sequences in each group. Relative divergence times are shown on the branches, with the largest divergence time being from the colicin outgroup to the enterocin group (9.79).
Every group of bacteriocins that had a divergence time of 0.17 or less between one another was “clustered” into a clade/group. There were 17 total clades, with four clades between 0.08 and 0.17 divergence times, and the other thirteen having values of 0.03 or less. By reviewing past literature for bacteriocins within each clade, each clade was then determined to be either functionally similar (green), not functionally similar (red), or undetermined due to knowledge or time constraints (grey). The circles on the right-hand side of the figure above correspond to this result, as does the pie chart in the next section. The blue circles are the clades that make up the very last figure at the bottom of this page.
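The clustering rule can be sketched in code. This is one plausible reading of the rule, not the procedure we ran in MEGA: it assumes a rooted tree in which every internal node carries a relative divergence time, and it cuts the tree so that each maximal subtree whose root has a time of 0.17 or less becomes one clade. The `Node` class, the function names, and the toy tree with its times are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Node:
    age: float                                   # relative divergence time of this split
    children: List["Tree"] = field(default_factory=list)

Tree = Union[Node, str]                          # a leaf is just a sequence name

def leaves(tree: Tree) -> List[str]:
    """Collect all sequence names below a node."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree.children for name in leaves(child)]

def cut_into_clades(tree: Tree, threshold: float = 0.17) -> List[List[str]]:
    """Split the tree into clades whose internal divergence time is <= threshold."""
    if isinstance(tree, str):
        return [[tree]]                          # a lone sequence forms its own clade
    if tree.age <= threshold:
        return [leaves(tree)]                    # whole subtree is one clade
    clades: List[List[str]] = []
    for child in tree.children:
        clades.extend(cut_into_clades(child, threshold))
    return clades

# Toy tree with made-up divergence times.
toy_tree = Node(5.0, [
    "outgroup_1",
    Node(1.0, [
        Node(0.04, ["seq_A", "seq_B"]),
        Node(0.17, ["seq_C", Node(0.01, ["seq_D", "seq_E"])]),
    ]),
])

print(cut_into_clades(toy_tree))
# [['outgroup_1'], ['seq_A', 'seq_B'], ['seq_C', 'seq_D', 'seq_E']]
```

Raising or lowering `threshold` changes the number of clades, which is why the clades are described here as arbitrarily defined.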
Results: Analysis of Clades
Nearly half of all 17 clades show some similarity in function (8 clades, 47%), and seven out of eight of these clades had a divergence time less than or equal to 0.04. Although these clades are arbitrarily defined, this suggests a correlation between sequence homology and protein function for some bacteriocins, and especially those within class I, class IIa, and class IId.
In order to find the bacteriocin that best suits lacticin Z for hybridization, we used a process of elimination. Clades determined to have no similarities in function (6 clades, 35%) and clades without enough information (3 clades, 18%) were ruled out. Out of the remaining 8 “green” clades that are similar in function, clade 1 and clade 4 consist mostly of enterocins and pediocins from class IIa (note: the numbers refer to the numbers inside the green circles in the above tree). These bacteriocins were ruled out because of their large divergence time and difference in protein sequence from lacticin Z, which is in clade 15.
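The percentages quoted for the three clade categories follow directly from the clade counts. A short check, using only the counts reported in the text (the dictionary keys are our own labels):

```python
# Clade counts by outcome of the literature review, as reported in the text.
clade_counts = {"similar": 8, "not_similar": 6, "undetermined": 3}

total_clades = sum(clade_counts.values())
percentages = {
    status: round(100 * count / total_clades)
    for status, count in clade_counts.items()
}

print(total_clades)   # 17
print(percentages)    # {'similar': 47, 'not_similar': 35, 'undetermined': 18}
```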
Clade 10 consists of class I bacteriocins mainly from the nisin, nso, and nsu families. This clade was ruled out because, as mentioned previously, class I bacteriocins undergo post-translational modification, which would complicate hybridization due to the cleavage of peptide bonds.
The remaining clades (12, 13, 14, and 15) all consist of class IId bacteriocins and, through review of literature, were determined to share a similar mode of action: they lyse target cells by forming pores in the membrane [5][6]. This is possible because these bacteriocins are rich in amphiphilic amino acids, which allow the peptides to dock on the membrane; this structural conservation also explains their close evolutionary relationship. In the tree below, the four clades have been expanded back to individual bacteriocins; the last clade, clade 17, was the colicin outgroup and was not considered.
In the above figure, the names on the right are the arbitrary names for the four clades, and relative divergence times are shown on the branches. Lacticin Z, our starting bacteriocin, is highlighted in green. Notably, the majority of these bacteriocins are derivatives of aureocin from different genera and species. The bacteriocins highlighted in grey are aureocins within the genus Bacillus. Since other aureocins were more closely related to lacticin Z, the Bacillus genus was ruled out. As mentioned earlier, because lacticin Q has already been hybridized with lacticin Z, it was also ruled out. Epidermicin NI01 (highlighted yellow) is one of the bacteriocins most evolutionarily similar to lacticin Z (divergence time 0.04), and as a result was chosen for hybridization. Of the remaining three aureocin A53 bacteriocins, aureocin A53 from Staphylococcus aureus had a higher divergence time at 1.25, but the other two strains of aureocin were difficult to obtain, and so the aureocin A53 from Staphylococcus aureus (highlighted yellow) was chosen as the second bacteriocin for hybridization with lacticin Z.
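The final selection step can be summarized as a filter followed by a sort. The sketch below is only an illustration of that reasoning: the `Candidate` fields and the way the exclusion reasons are encoded as flags are our own assumptions, only the divergence times stated on this page are used, and candidates whose divergence time is not stated here are given `None`.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    divergence: Optional[float]   # relative divergence time to lacticin Z; None if not stated
    already_hybridized: bool      # previously hybridized with lacticin Z
    obtainable: bool              # producing strain could be obtained (assumed flag)

candidates = [
    Candidate("lacticin Q", None, True, True),
    Candidate("epidermicin NI01", 0.04, False, True),
    Candidate("aureocin (TE8 strain)", 0.04, False, False),
    Candidate("aureocin A53 (S. aureus)", 1.25, False, True),
]

def shortlist(pool: List[Candidate]) -> List[Candidate]:
    """Drop excluded candidates, then order the rest by closeness to lacticin Z."""
    usable = [
        c for c in pool
        if not c.already_hybridized and c.obtainable and c.divergence is not None
    ]
    return sorted(usable, key=lambda c: c.divergence)

for candidate in shortlist(candidates):
    print(candidate.name, candidate.divergence)
```

Under these assumptions the shortlist contains epidermicin NI01 followed by aureocin A53 from Staphylococcus aureus, matching the two bacteriocins chosen.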
Methods: Sequence Selection
We chose to analyze mainly class II bacteriocins as potential hybrids, because all bacteriocins in this class are largely unmodified [7]. Because these bacteriocins do not undergo significant post-translational modification, we hypothesize that hybridization will be easier to achieve with two class II bacteriocins, rather than the hybridization of two different classes. Furthermore, class II bacteriocins are all small, membrane-active peptides, and are grouped into subclasses based on common structural features and shared homologies [8]—a feature that makes phylogenetic analysis useful. Shorter amino acid sequences also mean that similarities in structure between these bacteriocins can be more easily identified.
Bacteriocin sequences were chosen using BLAST, a common bioinformatics sequence similarity search algorithm used to identify sequences that are similar to one or more input sequences (in this case, the protein sequence of lacticin Z), with NCBI’s protein-protein BLAST (BLASTP) algorithm. The substitution matrix PAM was chosen over BLOSUM, since low-numbered PAM matrices are designed to score alignments between closely related protein sequences. The lowest PAM matrix number available, PAM 30, was used for the same reason.
For a more complete phylogenetic tree, other class II bacteriocins that are representative of the class and/or well-studied bacteriocins were included, such as circularin, enterocin, and leucocin, as well as some bacteriocins from class I, III, and V [9]. Colicins, a class of large bacteriocins produced by gram-negative bacteria [8], were chosen as the outgroup in order to root the tree.
Methods: Multiple Sequence Alignment
These chosen sequences were aligned with ClustalW’s multiple sequence alignment algorithm using the PAM substitution matrix. We chose to analyze protein sequences over nucleotide sequences because protein sequence alignments often predict homologous relationships more accurately [10]. It is easier to achieve statistically significant alignments with protein sequences because a larger alphabet of characters (20 amino acids compared to 4 nucleotides) means that a match is less likely to occur by chance. Furthermore, because of the redundancy of the genetic code, synonymous codon mutations leave the protein sequence unchanged and so do not affect a protein alignment. In nucleotide alignments, synonymous mutations can score the same as non-synonymous mutations, even though synonymous mutations are largely not under selective pressure.
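Both arguments can be illustrated with a small calculation. The sketch below assumes, unrealistically, that all characters are equally frequent and independent, and it uses a partial codon table from the standard genetic code; the short DNA strings and function names are hypothetical examples.

```python
def chance_match_probability(alphabet_size: int, length: int) -> float:
    """Probability that `length` aligned positions all match by chance,
    assuming uniform, independent character frequencies."""
    return (1 / alphabet_size) ** length

print(chance_match_probability(4, 10))    # nucleotides: about 9.5e-07
print(chance_match_probability(20, 10))   # amino acids: about 9.8e-14

# Partial codon table from the standard genetic code (DNA alphabet).
CODON_TABLE = {
    "AAA": "K", "AAG": "K",                            # lysine
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",    # alanine
    "GAT": "D", "GAC": "D",                            # aspartate
}

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon using the partial table."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

reference = "AAAGCTGAT"       # K A D
synonymous = "AAGGCCGAC"      # K A D, three silent changes
nonsynonymous = "AAAGATGAT"   # K D D, one amino acid change

for label, variant in [("synonymous", synonymous), ("nonsynonymous", nonsynonymous)]:
    print(
        label,
        mismatches(reference, variant),                        # nucleotide level
        mismatches(translate(reference), translate(variant)),  # protein level
    )
```

In this toy case the synonymous variant shows three nucleotide mismatches but no protein mismatch, while the non-synonymous variant shows only one nucleotide mismatch yet changes the protein.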
Methods: Phylogenetic Reconstruction
After aligning sequences, we chose to reconstruct a phylogenetic tree using the maximum likelihood method. Phylogenetic tree construction methods can be categorized into two main groups: distance based and character based [11]. We did not use a distance based approach, such as UPGMA or neighbor joining, because these algorithms cluster sequences solely on the basis of pairwise genetic distance, and not on divergence times.
Out of the character based approaches, we chose the maximum likelihood method because it is one of the most robust methods, and is favorable for phylogeny reconstruction of sequences where we have limited prior knowledge of their evolutionary relationships, such as novel bacteriocins [12]. Because our bacteriocins’ divergence times were relatively unknown to us, we also ruled out maximum parsimony and minimum evolution methods, because these methods tend to be unreliable given deeper divergence times.
Phylogenetic reconstruction using the maximum likelihood method was carried out using MEGA. The Jones-Taylor-Thornton (JTT) substitution model was used because MEGA’s jModelTest determined this model to be the best fit for our pool of sequences. The divergence times for each taxon, relative to the outgroup, were then computed using the RelTime method in MEGA [13].
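To make concrete what clustering solely on pairwise distance means, here is a minimal UPGMA sketch. It is included only to illustrate the distance based approach that we did not use; the function name, the input format, and the toy distances are hypothetical.

```python
from itertools import combinations

def upgma(names, distances):
    """Build a nested-tuple tree by repeatedly merging the two closest clusters.

    `distances` maps frozenset({a, b}) to the pairwise distance between a and b.
    """
    cluster_sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(distances)
    while len(cluster_sizes) > 1:
        # Pick the pair of clusters with the smallest distance.
        a, b = min(combinations(cluster_sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = cluster_sizes.pop(a), cluster_sizes.pop(b)
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for other in cluster_sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        cluster_sizes[merged] = size_a + size_b
    return next(iter(cluster_sizes))

toy_distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
}
print(upgma(["A", "B", "C"], toy_distances))
# ('C', ('A', 'B'))
```

The only input is the table of pairwise distances; no substitution model or per-site information enters the procedure, in contrast to the character based methods discussed next.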
References
[1] University of Southern Denmark iGEM 2016 Team. Verification of bacteriocins as antimicrobials. 2016.igem.org/Team:SDU-Denmark
[2] Fimland G., Johnsen L., Axelsson L., Brurberg M. B., Nes I. F., Eijsink V. G., and Nissen-Meyer J. (2000). A C-terminal disulfide bridge in pediocin-like bacteriocins renders bacteriocin activity less temperature dependent and is a major determinant of the antimicrobial spectrum. Journal of Bacteriology, 182, 2643-2648.
[3] Cintas L. M., Casaus P., Herranz C., Håvarstein L. S., Holo H., Hernandez P. E., and Nes I. F. (2000). Biochemical and genetic evidence that Enterococcus faecium L50 produces enterocins L50A and L50B, the sec-dependent enterocin P, and a novel bacteriocin secreted without an N-terminal extension termed enterocin Q. Journal of Bacteriology, 182, 6806-6814.
[4] Kemperman R., Kuipers A., Karsens H., Nauta A., Kuipers O., and Kok J. (2003). Identification and characterization of two novel clostridial bacteriocins, circularin A and closticin 574. Applied and Environmental Microbiology, 69, 1589-1597.
[5] Upton M., Sandiford S. Identification, Characterization, and Recombinant Expression of Epidermicin NI01, a Novel Unmodified Bacteriocin Produced by Staphylococcus epidermidis That Displays Potent Activity against Staphylococci. American Society for Microbiology.
[6] Netz D. J. A., Bastos M. C. F., and Sahl H. G. (2002). Mode of Action of the Antimicrobial Peptide Aureocin A53 from Staphylococcus aureus. Applied and Environmental Microbiology, 68, 5274-5280. doi:10.1128/AEM.68.11.5274-5280.2002
[7] Cotter P. D., Ross R. P., & Hill C. (2013). Bacteriocins — a viable alternative to antibiotics? Nature Reviews Microbiology, 11, 95-105. doi:10.1038/nrmicro2937
[8] Kemperman R. A. (2005). Functional analysis of circular and linear bacteriocins of Gram-positive bacteria. University of Groningen.
[9] Kjos M., et al. (2011). Target recognition, resistance, immunity and genome mining of class II bacteriocins from Gram-positive bacteria. Microbiology, 157, 3256-3267. doi:10.1099/mic.0.052571-0
[10] Koonin E. V. & Galperin M. Y. (2003). Principles and Methods of Sequence Analysis. Chapter 4. Boston, MA: Kluwer Academic.
[11] Lio P., Goldman N. (1998). Models of Molecular Evolution and Phylogeny. Genome Research, 8, 1223-1244.
[12] Rizzo J., Rouchka, E. C., (2007). Review of Phylogenetic Tree Construction. University of Louisville.
[13] Tamura K., Battistuzzi F. U., Billing-Ross P., Murillo O., Filipski A., and Kumar S. (2012). Estimating Divergence Times in Large Molecular Phylogenies. Proceedings of the National Academy of Sciences, 109, 19333-19338.
- © 2017 Stony Brook iGEM