Theory

Directed Evolution – a (very) short overview

After decades of continuous methodological advancement and a growing understanding of biomolecular structure and function, directed evolution is still the most powerful method in protein and aptamer engineering. Classic in vitro strategies, however, require substantial lab work and time to perform several consecutive rounds of evolution¹,². To automate this laborious process, scientists have tried to devise systems that run through the four steps of Darwinian evolution (mutation, expression, selection, replication) in a continuous cycle in vivo (reviewed in ³). The earliest approaches achieved this simply by cloning the gene of interest into mutator E. coli strains⁴ with reduced DNA replication fidelity or into E. coli strains carrying inducible mutator plasmids⁵. While these settings proved useful for generating complex multifactorial phenotypes like organic solvent tolerance⁵, they cannot provide the regional selectivity desired in single-gene protein evolution: globally enhanced mutagenesis leads to slow growth and reduced transformation efficiency⁶, and unwanted off-target mutations obscure the phenotype⁷,⁸.

Therefore, the ideal system for in vivo directed evolution would avoid these side effects by confining hypermutation to a defined locus in the host organism. Such an arrangement would allow the researcher to rapidly mutate and evolve a defined sequence or gene while leaving the rest of the genome unchanged. Several strategies to locally constrain enhanced mutation rates have been devised so far, including plasmids harboring regions with low replication fidelity⁹, elaborate phage-assisted systems that confine the accumulation of mutations to the phage genome while keeping the overall mutational load in the cell population at a steady state¹⁰, and retrotransposon-based methods¹¹. Although these designs definitely represent a large step in the right direction, none of them allows the continuous mutation of a single copy of a single gene in vivo. With D.I.V.E.R.T. we want to build a system that does.

The D.I.V.E.R.T. concept

In their work published in 2016, Crook et al. probably generated the very first retrotransposon-based system for in vivo directed evolution by inserting a gene of interest (GOI) into a truncated version of the native Ty1 retrotransposon in S. cerevisiae (Figure 1)¹¹. The GOI is thereby subjected to the retrotransposon life cycle and continuously mutated by the error-prone Ty1 reverse transcriptase (low fidelity is inherent to most reverse transcriptases¹²).

Figure 1: Scheme of the design used by Crook et al. in ¹¹.

Although it delivered impressive results, such as a mutation rate of 0.15 kb⁻¹, the concept still leaves room for improvement, as argued by Zheng et al. in their review of targeted mutagenesis¹³. For example, Ty1, being a mobile genetic element, can reintegrate anywhere in the genome, preferentially upstream of genes transcribed by RNA polymerase III¹⁴. The resulting presence of multiple copies of the heterologous gene in a single cell may reduce selection efficiency. Additionally, in this setting the reverse transcriptase is itself continuously mutated, increasing the risk of its inactivation over time.
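To put the reported rate into perspective, the short calculation below estimates how often a gene of interest picks up at least one mutation per cycle. It is a sketch under two assumptions that are ours, not Crook et al.'s: that the rate means 0.15 mutations per kb per completed retrotransposition round, and that mutations occur independently (Poisson model). The 1 kb gene length is hypothetical.

```python
import math

def expected_mutations(rate_per_kb: float, gene_length_bp: int) -> float:
    """Expected number of mutations per gene per cycle (the Poisson mean, lambda)."""
    return rate_per_kb * gene_length_bp / 1000.0

def prob_at_least_one(rate_per_kb: float, gene_length_bp: int) -> float:
    """Probability of >= 1 mutation under a Poisson model: 1 - exp(-lambda)."""
    lam = expected_mutations(rate_per_kb, gene_length_bp)
    return 1.0 - math.exp(-lam)

# Hypothetical 1 kb gene of interest at 0.15 mutations per kb per cycle (assumed meaning)
lam = expected_mutations(0.15, 1000)
p = prob_at_least_one(0.15, 1000)
print(f"lambda = {lam:.2f}, P(>=1 mutation) = {p:.3f}")
```

Under these assumptions, a 1 kb gene would carry at least one new mutation after roughly 14 % of completed cycles, so most completed cycles would return an unchanged copy.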

Drawing inspiration from this retroelement-based approach and keeping its potential weaknesses in mind, we started thinking about a more universal hypermutation strategy that relies on reverse transcription of the gene of interest and reintegration of the generated cDNA, with benefits such as:

Site-specific reintegration would ensure that the original gene variant is replaced, maintaining single-copy status for optimal selection.
Expression of the required reverse transcriptase in trans rather than within the synthetic retroelement eliminates the risk of inactivating the enzyme by mutation.

In such a system only the GOI would be mutated, and it would remain present in just a single copy. The overall scheme of our D.I.V.E.R.T. (directed in vivo evolution via reverse transcription) concept is depicted in Figure 2.

Figure 2: General scheme of the D.I.V.E.R.T. cycle
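The difference between site-specific replacement and Ty1-like random reintegration can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions: a cycle completes with a hypothetical probability `p_complete`, the reverse transcriptase (expressed in trans and therefore not mutated itself) introduces only base substitutions at a hypothetical per-base `error_rate`, and all function names are illustrative.

```python
import random

BASES = "ACGT"

def error_prone_copy(seq: str, error_rate: float, rng: random.Random) -> str:
    """Copy seq, replacing each base by a different base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def run_cycles(gene: str, cycles: int, p_complete: float, error_rate: float,
               site_specific: bool, seed: int = 0) -> list[str]:
    """Return the GOI copies present in one cell after a number of cycles.

    site_specific=True:  the cDNA replaces the original copy (D.I.V.E.R.T.-style).
    site_specific=False: the cDNA integrates elsewhere and adds a copy (Ty1-like).
    """
    rng = random.Random(seed)
    copies = [gene]
    for _ in range(cycles):
        if rng.random() >= p_complete:
            continue  # this cycle was not completed
        template = rng.choice(copies)  # transcript of one existing copy
        cdna = error_prone_copy(template, error_rate, rng)
        if site_specific:
            copies = [cdna]  # replacement keeps a single copy
        else:
            copies.append(cdna)  # random reintegration adds a copy
    return copies

gene = "ATG" + "GCT" * 100 + "TAA"  # hypothetical 306 bp GOI
single = run_cycles(gene, cycles=50, p_complete=0.3, error_rate=0.001, site_specific=True)
multi = run_cycles(gene, cycles=50, p_complete=0.3, error_rate=0.001, site_specific=False)
print(len(single), len(multi))
```

With `site_specific=True` the cell always carries exactly one GOI copy, whereas with `site_specific=False` every completed cycle adds another copy, which is the multi-copy situation that may blur selection.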

Theoretical considerations

To sum up: we wanted to build a fully synthetic retrotransposon-like genetic element that allows our gene of interest to continuously undergo the retrotransposon life cycle, accumulating mutations over time. For such a system to work, two main events need to be functionally implemented: in vivo reverse transcription by a heterologous reverse transcriptase, and site-specific recombination. For both processes several options are available. The reasoning behind our choices, along with some other thoughts, is briefly explained below.

Host range

Because D.I.V.E.R.T. does not rely on host-specific factors (like the Ty1 retrotransposon in yeast) but only on heterologous components (RT and recombinase), it should, at least in principle, be applicable in a broad range of organisms, which we wanted to demonstrate by performing our proof-of-concept experiment in yeast as well as in E. coli. Some minor host-specific adjustments, however, need to be made. For example, the relevant proteins have to be tagged with a nuclear localization signal (NLS) in eukaryotic systems, and interactions between ribosomes and the reverse transcriptase might play a role in prokaryotes. More details can be found in the D.I.V.E.R.T. experiment section. (LINK)

RT choice and priming conditions

In the early phase of project planning we spent quite some time gathering information about a variety of reverse transcriptases, including processivity, mutation rate, mutational spectrum and RNase H activity. Finally, we chose Moloney murine leukemia virus (MMLV) RT, mainly because it is the best-characterized monomeric¹⁵ reverse transcriptase, whereas the active forms of many other RTs (e.g. HIV¹⁶ and ASLV¹⁷ RTs) are heterodimers. Picking a monomeric enzyme allowed us to save some IDT DNA synthesis credits; a wise decision, since we managed to use up all our credits for gBlocks, primers and DNA oligos during lab work in the summer months. MMLV RT also met two additional requirements: it shows RNase H activity, which might help to synthesize double-stranded cDNA by degrading the RNA template once the first cDNA strand has been generated, and active enzyme was known to be expressible in E. coli¹⁸.

In its natural context MMLV RT uses a mouse tRNA^Pro for priming¹⁹,²⁰. It has been shown, though, that MMLV RT does not have very stringent primer preferences and that MMLV can replicate using different tRNAs as long as the primer binding site (PBS) is complementary to the 3’ end of the tRNA²¹. From in vitro studies (and cDNA synthesis kits for RT-PCR) we furthermore know that MMLV RT can also initiate reverse transcription from DNA or RNA oligonucleotides, with priming efficiency generally decreasing in the order DNA oligo > tRNA > RNA oligo²². Unfortunately, data on in vivo priming conditions for any heterologous reverse transcriptase in E. coli are scarce, as apparently only one study has performed reverse transcription in E. coli so far²³. In this work, which focused on generating ssDNA to build DNA nanostructures in vivo, Elbaz et al. designed the 3’ end of their mRNA to form a distinct structure that both acted as a transcription terminator and promoted efficient priming of reverse transcription by the dimeric HIV RT.
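The PBS rule above can be made concrete: because the two strands pair antiparallel, a PBS written 5’→3’ on the template RNA has to equal the reverse complement of the tRNA's 3’-terminal nucleotides. The sketch below assumes an 18 nt PBS (the length usually cited for retroviral PBSs), uses a made-up toy tRNA sequence, and its helper names are hypothetical.

```python
# RNA complement table (A<->U, G<->C)
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (5'->3' in, 5'->3' out)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def design_pbs(trna: str, n: int = 18) -> str:
    """Hypothetical helper: PBS (5'->3') that pairs with the tRNA's 3'-terminal n nt."""
    return reverse_complement_rna(trna[-n:])

def pbs_matches_trna(pbs: str, trna: str, n: int = 18) -> bool:
    """True if pbs is fully complementary (antiparallel) to the tRNA's 3' end."""
    return len(pbs) == n and len(trna) >= n and pbs.upper() == design_pbs(trna, n)

# Made-up toy tRNA ending in the universal 3' CCA (not a real tRNA sequence)
toy_trna = "GGCUACGUAGCUCAGUUGGUAGAGCACCUGGUUUCCAAAGGCACCA"
pbs = design_pbs(toy_trna)
print(pbs, pbs_matches_trna(pbs, toy_trna))
```

Since every mature tRNA ends in CCA, any matching PBS starts with UGG at its 5’ end; the remaining positions decide which tRNA can serve as primer.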

So, after extensive literature research, these were the priming conditions we considered worthwhile:

RNA oligos transcribed in trans featuring a defined 3’ end
DNA oligos generated by a native retroelement (e.g. retrons in E. coli; used in a similar fashion in ²⁴)
The tRNA corresponding to the heterologous RT (for MMLV, mouse tRNA^Pro; transcribed in trans with a defined 3’ end)
A tRNA of the host organism (already present in the cell, so there is no need to worry about primer production)
Self-priming using a simple hairpin or a more sophisticated structure at the mRNA 3’ end, as in ²³.

For ease of implementation and broad host range we opted for RNA oligos in our “lucky shot” D.I.V.E.R.T. experiment (LINK). Nevertheless, being able to prime with a native tRNA would be most convenient. Hence, in our priming condition assay (LINK) we tested five of the most abundant tRNAs in E. coli for their ability to initiate reverse transcription with MMLV RT.

Reintegration of the generated cDNA

Once the mRNA has been reverse transcribed (and potentially mutated), the resulting cDNA has to replace the original copy of the gene of interest. Again, different methods are available:

Site-specific recombinase-mediated recombination (analogous to recombinase-mediated cassette exchange)^25,26
Homologous recombination
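Site-specific cassette exchange can be pictured as a string operation: everything between two recombination sites in the genome is swapped for the corresponding segment of the incoming cDNA. This is a toy model only; the site tokens are placeholders rather than real FRT sequences, and the recombination chemistry is not simulated. In practice, RMCE usually relies on two different, non-cross-reacting recombination sites flanking the cassette so that the recombinase does not simply excise it.

```python
def cassette_exchange(genome: str, donor: str, site_a: str, site_b: str) -> str:
    """Toy RMCE: swap the genome segment between site_a and site_b for the donor's.

    Raises ValueError (from str.index) if a site is missing from either molecule.
    """
    g_start = genome.index(site_a) + len(site_a)
    g_end = genome.index(site_b, g_start)
    d_start = donor.index(site_a) + len(site_a)
    d_end = donor.index(site_b, d_start)
    return genome[:g_start] + donor[d_start:d_end] + genome[g_end:]

# Placeholder site tokens and sequences (hypothetical, not real FRT variants)
SITE_A, SITE_B = "[siteA]", "[siteB]"
genome = "chrL" + SITE_A + "GOI_v1" + SITE_B + "chrR"
cdna = SITE_A + "GOI_v2_mut" + SITE_B

new_genome = cassette_exchange(genome, cdna, SITE_A, SITE_B)
print(new_genome)  # chrL[siteA]GOI_v2_mut[siteB]chrR
```

The mutated variant takes the place of the original one, so the genome still carries a single GOI copy; homologous recombination would reach the same outcome through flanking homology arms instead of recombination sites.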

For our D.I.V.E.R.T. experiment (LINK) we decided to employ the Flp/FRT site-specific recombination system as described in ²⁵. However, the extended FRT sites needed for efficient recombination are palindromic and form stable hairpins that might act as transcription (or reverse transcription) terminators. To evaluate this potential problem, we determined the bidirectional termination efficiency of extended FRT sites using the classic terminator strength assay. (LINK)
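Termination efficiency is commonly calculated from a two-reporter construct in which the element under test sits between an upstream and a downstream reporter gene. The sketch below assumes that layout (our assumption; the actual construct is described in the linked assay) and uses a common definition, TE = 1 − R_test / R_ctrl, where R is the downstream/upstream reporter ratio and the control lacks the insert. Measuring the FRT site cloned in both orientations yields the bidirectional values. All readings are made-up example numbers.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    """Reporter signals of one construct, e.g. background-corrected fluorescence."""
    upstream: float    # reporter transcribed before the tested element
    downstream: float  # reporter transcribed after the tested element

def termination_efficiency(test: Reading, control: Reading) -> float:
    """TE = 1 - (downstream/upstream with insert) / (downstream/upstream without insert)."""
    ratio_test = test.downstream / test.upstream
    ratio_ctrl = control.downstream / control.upstream
    return 1.0 - ratio_test / ratio_ctrl

# Made-up example readings (not measured data)
control = Reading(upstream=1000.0, downstream=800.0)   # construct without insert
frt_forward = Reading(upstream=1000.0, downstream=200.0)
frt_reverse = Reading(upstream=950.0, downstream=380.0)

for name, reading in [("forward", frt_forward), ("reverse", frt_reverse)]:
    print(f"{name}: TE = {termination_efficiency(reading, control):.2f}")
```

Normalizing to the upstream reporter corrects for differences in promoter activity or plasmid copy number between constructs, so only read-through past the insert is compared.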

Does it work? Find a rigorous proof

In the early phase of development the achieved mutation rates might be very low, as the full D.I.V.E.R.T. cycle will rarely be completed. Hence, mutation frequency might be a problematic readout of whether the system works at all. Crook et al.¹¹ solved this problem with a very elegant and rigorous proof-of-concept assay (Figure 1b) that was inspired by a test of the recombination frequency of the Ty1 element²⁷. Basically, they chose a selectable marker as cargo (i.e. GOI) and inserted it into the retrotransposon in reverse orientation. The GOI was disrupted by an intron which, in turn, was reversely oriented with respect to the cargo (i.e. oriented in the retrotransposon’s sense direction). Hence, the mRNA transcribed from the cargo would contain the reverse complement of the intron, which would not be spliced, leading to a nonfunctional protein. In the transcript of the retroelement, however, the intron would be in the correct orientation and would be spliced out. After reverse transcription and reintegration, the new copy of the retroelement would therefore lack the intron, leading to a functional protein and selectable cells.
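The logic of this intron-based reporter can be traced with a few string operations. This toy model is a sketch, not Crook et al.'s actual construct: splicing simply removes the intron when it is present in sense orientation, reverse transcription and reintegration return the same sequence, and all sequences and helper names are made up.

```python
def rc(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def splice(transcript: str, intron: str) -> str:
    """Toy splicing: excise the intron only if it appears in sense orientation."""
    return transcript.replace(intron, "", 1)

def is_functional(marker: str) -> bool:
    """The marker works only if its coding sequence is uninterrupted."""
    return marker == EXON1 + EXON2

# Hypothetical toy sequences (not real genes or introns)
EXON1, EXON2 = "ATGGCTAAAGGT", "GCTGAATAA"
INTRON = "GTATGTTCTAACAG"          # GT...AG in the retroelement's sense orientation
LTR5, LTR3 = "TGTTGG", "CCAACA"

# Marker (cargo) sense strand: the intron is present as its reverse complement
marker = EXON1 + rc(INTRON) + EXON2
# Retroelement sense strand: the marker is inserted in reverse orientation
retroelement = LTR5 + rc(marker) + LTR3

# 1) Transcription of the cargo: the intron is antisense and is not spliced
cargo_mrna = splice(marker, INTRON)
# 2) Transcription of the retroelement: the intron is sense and is spliced out
retro_rna = splice(retroelement, INTRON)
# 3) Reverse transcription and reintegration (toy: the cDNA has the same sequence)
cdna = retro_rna
new_marker = rc(cdna[len(LTR5):-len(LTR3)])

print(is_functional(cargo_mrna), is_functional(new_marker))
```

The script prints `False True`: the cargo transcript keeps the antisense intron, while the marker regenerated through the retroelement transcript is intact, so only cells that completed the full cycle become selectable.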

For our D.I.V.E.R.T. experiments we used a similar concept; details can be found in the experiments section. (LINK)

Team:BOKU-Vienna/Description