Number of mutations and mutated sequences

Directed evolution experiments are essentially a search for a set of mutations. Consequently, a few plaques from a PACE or PREDCEL experiment are regularly sequenced to monitor the current state of the experiment and to make sure the mutagenesis plasmids work. To minimize the required time, materials and costs, it is helpful to estimate how many clones have to be sequenced so that, with a given probability, at least one clone contains a mutation. To calculate this probability, or the number of clones needed to reach it, we used a basic mutation model. The mutation rate is assumed to be the same for each sequence and each position of a sequence. Mutations that revert previous mutations are ignored, since we usually expect only a few mutations in hundreds of basepairs.

Expected number of mutations per basepair of a single sequence, $p_{m}$: $$p_{m} = \frac{N_{M} }{L_{S} } = N_{g} \cdot r_{M} = \frac{t \cdot \Phi_{L} }{2} \cdot r_{M}$$ $N_{M}$ is the number of mutations, while $L_{S}$ is the length of the sequence that can mutate free of selection and is sequenced. $N_{g}$ is the number of generations that happened before the sequencing, $t$ is the total time in the lagoon and $r_{M}$ is the mutation rate per basepair and generation. According to Esvelt et al. (RN66), the number of generations per hour corresponds to half the flow rate $\Phi_{L}$ in lagoon volumes per hour. The underlying assumption is a steady state of phage replication and dilution, which is reasonable for PACE. For PREDCEL experiments, an arbitrary number of generations can be set.

The expected share of sequences that show at least one mutation in $L_{S}$ basepairs is the complement of the probability that all $L_{S}$ basepairs stay unchanged when $p_{m}$ mutations per basepair are expected: $$p_{M} = \frac{N_{M} }{N_{S} } = 1 - p_{(N_{M}=0)} = 1 - (1-p_{m})^{L_{S} } $$ With this equation it is possible to calculate the number of sequences $N_{S}$ that have to be sequenced in order to find a mutated one with a probability of $p_{(N_{M} > 0)}$.
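The two equations above can be sketched in a few lines of Python. This is only an illustration of the model as described, not the team's own implementation: the function names and all input values (24 h in the lagoon, a flow rate of 1 volume per hour, a mutation rate of $10^{-6}$ per basepair and generation, 1000 basepairs) are hypothetical and chosen purely for demonstration.

```python
def generations_from_flow(t_hours, flow_rate):
    """N_g = t * Phi_L / 2 (steady-state assumption from the text)."""
    return t_hours * flow_rate / 2.0


def mutations_per_basepair(n_generations, mutation_rate):
    """p_m = N_g * r_M, the expected number of mutations per basepair."""
    return n_generations * mutation_rate


def share_of_mutated_sequences(p_m, seq_length):
    """p_M = 1 - (1 - p_m)^L_S, the share of sequences with >= 1 mutation."""
    if not 0.0 <= p_m <= 1.0:
        raise ValueError("p_m must lie between 0 and 1")
    return 1.0 - (1.0 - p_m) ** seq_length


# Hypothetical inputs, chosen only for illustration.
t_hours = 24.0        # t, total time in the lagoon [h]
flow_rate = 1.0       # Phi_L [Vol/h]
mutation_rate = 1e-6  # r_M [1 / (bp * generation)]
seq_length = 1000     # L_S [bp]

n_g = generations_from_flow(t_hours, flow_rate)
p_m = mutations_per_basepair(n_g, mutation_rate)
p_M = share_of_mutated_sequences(p_m, seq_length)
print(f"N_g = {n_g:.1f}, p_m = {p_m:.2e}, p_M = {p_M:.4f}")
```

With these assumed inputs, about 1.2 % of the sequences would be expected to carry at least one mutation.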
The number of sequences $N_{S}$ that need to be sequenced follows from the relation between the probability to find at least one mutated sequence, $p_{(N_{M}>0)}$, and the probability of a single sequence to be mutated, $p_{M}$. The probability to find at least one mutated sequence among $N_{S}$ sequences can be calculated using the complementary probability, which is the probability to find exactly zero mutated sequences: $$p_{(N_{M}>0)} = 1 - (1-p_{M})^{N_{S} }$$ Solving this equation for $N_{S}$ leads to $$(1-p_{M})^{N_{S} } = 1 - p_{(N_{M}>0)}$$ $$ \Leftrightarrow N_{S} = \frac{\ln(1-p_{(N_{M}>0)})}{\ln(1-p_{M})}$$ $$ \Leftrightarrow N_{S} = \frac{\ln(1-p_{(N_{M}>0)})}{\ln\Big((1 - p_{m})^{L_{S} } \Big)}$$ In practice, $N_{S}$ is rounded up to the next whole number of sequences.

Set $\Phi_{L}$ to zero to use the number of generations for the calculation. If both $\Phi_{L}$ and the number of generations are given, $\Phi_{L}$ is used. Consider $L_{S}$ as the number of basepairs that are free to mutate. If half of the sequence you are interested in is highly conserved, choose a lower $L_{S}$.
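As a minimal sketch of the calculation described above, the following Python snippet evaluates the closed-form expression for $N_{S}$ and applies the rule that a non-zero $\Phi_{L}$ takes precedence over a given number of generations. The function names and the example numbers are assumptions made for illustration, and rounding up to a whole number of clones is a practical choice rather than part of the derivation.

```python
import math


def effective_generations(t_hours, flow_rate, n_generations):
    """Use N_g = t * Phi_L / 2 if a flow rate is given, else the set N_g."""
    if flow_rate > 0:
        return t_hours * flow_rate / 2.0
    return n_generations


def sequences_needed(p_target, p_m, seq_length):
    """N_S = ln(1 - p_target) / ln((1 - p_m)^L_S), rounded up.

    p_target   : desired probability of finding >= 1 mutated sequence
    p_m        : expected number of mutations per basepair
    seq_length : L_S, number of basepairs free to mutate
    """
    if not 0.0 < p_target < 1.0:
        raise ValueError("p_target must lie strictly between 0 and 1")
    if not 0.0 < p_m < 1.0:
        raise ValueError("p_m must lie strictly between 0 and 1")
    # ln((1 - p_m)^L_S) = L_S * ln(1 - p_m); log1p stays accurate for tiny p_m.
    log_unmutated = seq_length * math.log1p(-p_m)
    return math.ceil(math.log1p(-p_target) / log_unmutated)


# Hypothetical example: 24 h at 1 Vol/h, r_M = 1e-6, L_S = 1000 bp.
n_g = effective_generations(t_hours=24.0, flow_rate=1.0, n_generations=5)
p_m = n_g * 1e-6
print(sequences_needed(p_target=0.95, p_m=p_m, seq_length=1000))
```

Under these assumed values, roughly 250 clones would have to be sequenced to find at least one mutated sequence with 95 % probability, which shows how strongly the required number depends on the mutation rate and the number of generations.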

Table 1: Additional variables and parameters used for the calculation of the number of mutated sequences. List of all additional parameters and variables used in the numeric solution of this model. Values are given where possible.

Symbol	Value and Unit	Explanation
$t $	[h]	Total time in lagoon
$p_{m} $	[bp/bp]	Expected number of mutations per basepair
$p_{M} $	[sequences/sequences]	Expected share of sequences with at least one mutation
$N_{M} $	[bp]	Number of mutated basepairs
$L_{S} $	[bp]	Length of sequence that is considered
$N_{g} $	[generations]	Number of generations
$r_{M} $	$[\frac{1}{bp \cdot generation}]$	Mutation rate per basepair and generation
$\Phi_{L} $	[Vol/h]	Flow rate through the lagoon in lagoon volumes per hour
$N_{S} $	[sequences]	Number of sequences
$p_{(N_{M} > 0)} $		Probability to find at least one mutated sequence in a pool of sequences
$p_{(N_{M} = 0)} $		Probability to find no mutated sequences in a pool of sequences
