Case13a

# Model - Lattice Model

One of the goals of our project was providing our detection tool with a long shelf-life under simple storage conditions. For this reason, we investigated the use of Tardigrade-specific intrinsically Disordered Proteins (TDPs) to preserve Cas13a activity upon desiccation and storage at room temperature. These TDPs originate from the tardigrade, an extremely resilient micro-animal (Sloan et al. 2017), and mediate the tardigrade’s desiccation tolerance. The mechanism behind this has been hypothesized to be a vitrification process (Boothby et al. 2017). By forming a glass-like matrix upon desiccation, TDPs protect cellular structures and proteins within the cells from degradation. As hydrophilic, disordered proteins, they can form dynamic scaffolds upon desiccation, while they have mutual repulsive interactions.

This matrix formation implies that during dehydration, the TDPs rearrange themselves around, in our case, Cas13a, in a process that could also be compared to phase separation. To simulate the dynamics of this phase separation, we decided to utilize a lattice model, which is well established for modelling these kinds of problems. With our model, we were able to determine which of the four TDPs CAHS 94205, CAHS 106094, SAHS 33020 and SAHS 68234 was most suitable for preserving Cas13a activity after drying and storage at room temperature. Additionally, we studied the TDP behaviour at different concentrations and estimated in what concentrations TDPs have to be used for drying.

### Proteins in the lattice model

To keep our model simple, we wanted to take only next neighbour interactions on the lattice into account. Therefore, to be sure that no long-range interactions were neglected in the model, we calculated the Debye length, the length over which electrostatic interactions decline exponentially, with equation (debyelength)

$$\lambda_D = \sqrt{\frac{\epsilon_0 \cdot\epsilon_r \cdot k_B\cdot T}{2\cdot N_A\cdot e^2 \cdot I}}\tag{1}$$

where $\epsilon_0$ is the vacuum permittivity, $\epsilon_r$ is the relative permittivity, $k_B$ the Boltzmann constant, $N_A$ the avogadro constant and I the ionic strength of the solution (Russel and Saville 1989). With an electrolyte containing Tris-HCl (c = 27 mM) and NaCl (c = 24 mM), we found that the Debye length of the solution we were drying is 0.0024 nm. Since this length is smaller than the average size of an amino acid, we were free in selecting the grid size of the lattice model.

To effectively model protein-protein interactions, rather than interactions between individual amino acids, we decided to discretize our proteins into blocks of five amino acids per unit.

As it is not straightforward to manually derive a protein model using protein structure files, we wrote a program that converts the commonly used "protein database" file extension (.pdb) for Cas13a into a 'LEGO-block' model (in the form of a matrix) (Figure 1). The hydrophobicity and charge (isoelectric point) scales applied (Biro, 2006) can be easily adapted.

Figure 1: LEGO-block model of Cas13a to input into the lattice model. A) Model of Cas13a hydrophobicity distribution. For visualization purposes, a colour threshold was applied: red areas are hydrophobic while grey areas are not. B) Model for Cas13a charge (isoelectric point) distribution. For visualization purposes, a colour threshold was applied: blue patches are positively charged while gray patches are not charged or negatively charged.

We provide the Matlab script, enabling anyone with basic Matlab knowledge to import a structure, which can be converted to a coarse-grained 'LEGO-block' model with a resolution that can be set by the user. In addition, the code intuitively includes the desired hydrophobicity and charge scales to consider, which can be adapted as desired.

However, since TDPs are intrinsically disordered proteins, this method could not be applied to them. Thus, we simply settled on averaging the properties of five neighbouring amino acids in the sequence, instead of in space. Similar to protein folding, there are two main driving forces behind protein-protein interactions which are electrostatic and hydrophobic interactions (Vagedes et al. 2002). To be able to calculate the interaction energies between different protein units, the hydrophobic moment and isoelectric point (Biro 2006) of each amino acid were averaged for each unit during the discretization.

### Generating the simulation environment

Besides the discretizations of the proteins used in the lattice model, several other aspects were simplified. In the model, we wanted to observe the interaction of a single Cas13a protein with TDPs at a certain concentration. To realize this, we fixed Cas13a at a specific point in the simulation environment and employed periodic boundary conditions, emulating a larger volume occupied by Cas13a and TDPs.

To generate the simulation environment, the concentration of Cas13a was set to 0.5 g/L, the concentration of Cas13a isolated with our protein purification protocol. For the planned Cas13a drying assays, 4 µL Cas13a would be dried in a final volume of 100 µL with a given concentration of TDP. From a given TDP concentration and its molar mass, we could then calculate the number of TDPs that needed to be generated to obtain the right ratio of TDPs to Cas13a.

Subsequently, we calculated an average volume occupied by one amino acid of 128,50 Å^3 from the volumes of individual amino acid residues (Counterman and Clemmer 1999) and multiplied this by the number of amino acids in one unit cell to determine its volume. This volume was then used to calculate the lattice dimensions for the model. Since we were investigating a drying process, we opted to study phase separation behaviour at volume fractions of the initial reaction volume. With preliminary runs we determined that a volume fraction of 0.1 was still large enough to reasonably assume that no protein degradation had occurred, but also small enough to allow for efficient simulation.

### Interaction Energies

Finally, to be able to model the protein-protein interactions, we also needed to assign energies to different protein conformations. As we were studying both electrostatic and hydrophobic interactions, we wanted to calculate energy values based on both types. While a variety of hydrophobicity scales exist, they usually only describe relative hydrophobicity between amino acids (Wimley and White 1996, Wolfenden et al. 1981, Rose et al. 1985). While actual free energy values were estimated for hydrophobic interactions between amino acid residues (Miyazawa and Jernigan 1985), they are based on the distance between amino acid residues observed in protein x-ray structures. This means that they are not applicable for the units in our models, which contain five amino acid residues each.

For this reason we decided to use the hydrophobic compatibility index (HCI) established by (Biro 2006), along with the charge compatibility index (CCI) described in the same paper for consistency. Both indices give a maximal index of 20 for favourable interactions and a minimal index of 1 for unfavourable interactions. To convert these indices into energies, we subtracted 10 from the indices and multiplied by -1 to assign negative energies (-10 - 0) to favourable and positive energies (0 - 10) to unfavourable interactions. Furthermore, electrostatic interactions have been found to have a slightly higher free energy contributions than hydrophobic interactions, 1.4 kcal/mol (Berg 2002) versus 0.6 kcal/mol for small (36 residues) and 1.6 kcal/mol for large (341 residues) proteins (Pace et al. 2011). Since TDPs have ~210-230 residues, we estimated a hydrophobic interaction energy to be 0.9 kcal/mol. Thus, we came up with the following equations for hydrophobic and electrostatic interaction energies scaled to their free energy contributions.

$$E_{HCl} = -\left( 10 - \left| [HM(A) - HM(B)]\frac{19}{10.6}\right| \right)\frac{0.9}{1.4} \tag{2}$$ $$E_{CCL} = - \left(1- [PI(A) - 7][PI(B)-7]\frac{19}{33.8} \right) \tag{3}$$

### Model Conditions

In the simulation, Cas13a stays fixed at the center. However, a range of movements is possible for the TDPs surrounding it as illustrated in the figures below . Both movements of single units, at the end or the middle of a chain, and movements of multiple units including, snake movements, rotations of chain ends, and translations are possible.

Figure 2: 2D illustration of the process behind the movement of a single unit at the end of a chain. Possible new positions of the unit are determined and one is selected at random.

Figure 3: 2D illustration of the process behind the movement of a single unit in the middle of a chain. Possible new positions of the unit are determined by overlaying the possible open sites of its neighbours and one is selected at random.

Figure 4: 2D illustration of the process behind a snake movement. A possible new position of one end of the chain is determined and the following units moved along the chain.

Figure 5: 2D illustration of a rotational movement. A unit in the chain and on end of the chain is picked and the angle and axis (in 3D) by which the chosen part moves is generated randomly.

Figure 6: 2D illustration of a translation move with a shift of (-1, -2). A chain is picked and moved by a randomly generated shift.

The simulation generates random moves and calculates the energy difference between the new and old chain conformation. As stated earlier, only nearest neighbour interactions are taken into account for this.

We employ the metropolis monte carlo algorithm in the simulation, accepting and rejecting moves with a probability of:

This prevents the system from falling into a local energy minimum, because it allows moves with unfavourable interactions. The probability of accepting unfavourable interactions decreases exponentially for interaction energies higher than the system’s energy. The value of

represents the energy present in the system and was chosen to be 1/20 for our simulation, after several trial runs.

### Results

With our lattice model, we wanted to study the interactions between TDPs and Cas13a upon drying. For this purpose, we first had to assess if the model we were using was suitable for modelling protein interactions. Afterwards, we investigated which TDP would confer desiccation tolerance to Cas13a the best, and at which concentration we should use our TDPs.

### Modelling General Protein Interactions

In our lattice model, we convert the hydrophobicity and charge compatibility indices established by Biro 2006. We tested if the generated energy values had the expected impact on the protein behaviour, by running the model without any favourable interactions, by generating only positive energy values (0-20), and with exclusively favourable interactions, by generating only negative energy values (-20-0) between the chains. By observing the behaviour of the chains after a given number of iterations, we confirmed that the model can provide information on the behaviour of polymers in solution.

In Figure 7 below, the polymer arrangement does not change significantly over 50000 iterations. While the chains have changed positions and it is clear that movements were generated, the conformations and the localization over the environment are similar. On the other hand, Figure 8 clearly shows a clustering of chains after the same amount of iterations, which would be expected if all chain interactions are favourable. As the energy input in both cases yields the expected results, we can conclude that the lattice model can be used to simulate the phase separation of polymers in solution and that it can be used to model the interactions between Cas13a and TDPs.

Figure 7: Two states of a simulation without favourable protein interactions. To the left, the initial state of the simulation is shown immediately after it is generated, to the right, the environment is shown after 50 000 iterations.

Figure 8: Two states of a simulation with exclusively favourable protein interactions. To the left, the initial state of the simulation is shown immediately after it is generated, to the right, the environment is shown after 50 000 iterations. Clusters of TDPs can be recognized, confirming the hypothesized protein-protein interactions.

### Preservation of Cas13a activity by different types of TDPs

To determine which TDP preserves Cas13a activity upon drying best, we investigated the behaviour of CAHS 94205, CAHS 106094, SAHS 330 20 and SAHS 68234 with Cas13a at a concentration of 1 g/L and a volume fraction of 0.1, since Boothby et al. 2017 showed that LDH activity was already well preserved at a TDP concentration of ~ 1 g/L. The simulations were run over 300000 iterations. One of these simulations (SAHS 33020, 1 g/L, volume fraction 0.1) is shown in Figure 9.

Figure 9: Simulation of Cas13a with SAHS 33020 at a concentration of 1 g/L and in a volume fraction of 0.1.

Both the migration of proteins towards Cas13a and formation of protein clusters is visible. However, protein clusters that form further away from Cas13a tend to remain in place and attract other chains. This is also visible in Figure 10, which plots the number of SAHS 33020 proteins at different distances from Cas13a over the number of iterations. During the simulation the area close to Cas13a clearly attracts other proteins as visible in the lower bottom part of the figure. However, while there is a general movement towards Cas13a, tend to stay in clusters for an extended amount of time, as is for example visible at around 150000 iterations. To observe migration of more TDPs towards Cas13a, the simulation would have to be run for a longer amount of time.

Figure 10: Number of SAHS 33020 proteins at different distances from Cas13a over 300000 iterations. Data was extracted from a Simulation of Cas13a with SAHS 33020 at a concentration of 1 g/L and in a volume fraction of 0.1.

To determine which TDP would preserve Cas13a activity best, we analyzed two parameters: the coverage of Cas13a during the lattice model simulation (see Figure 11) and the development of the energy over time (see Figure 12).

The coverage of Cas13a is measured by counting the number of immediate neighbours to all units in the Cas13a protein. If the protein is covered with TDPs more quickly, they can form the glass-like matrix supporting its structure earlier, preventing degradation of the protein. In Figure 11, the number of immediate neighbours of Cas13a is plotted against the number of iterations per chain. All TDPs display similar development of coverage, but SAHS 33020 and CAHS 94205 display a steeper slope than the other proteins in their respective families and a higher coverage at 800 iterations per chain. This suggests that they are better suited for drying Cas13a.

Figure 11: Number of immediate neighbours to Cas13a against the number of iterations per chain for TDPs CAHS 94205, CAHS 106094, SAHS 33020 and SAHS 68234 at a concentration of 1 g/L and in a volume fraction of 0.1.

This result is partly reflected in the interaction energies calculated in the different TDP simulations. A steeper slope of the energy graph indicates that the phase separation occurs faster and that the steady state is reached at an earlier stage. Thus, Figure 12 implies that the two SAHS proteins and out of these SAHS 33020 would confer the highest desiccation tolerance to Cas13a. However, when comparing within the protein families, CAHS  904205 and SAHS 33020 show the best results again.

Figure 12: Normalised energy of the system against the number of iterations per chain for TDPs CAHS 94205, CAHS 106094, SAHS 33020 and SAHS 68234 at a concentration of 1 g/L and in a volume fraction of 0.1.

While the energy and Cas13a coverage development showed slightly different results, we decided to use CAHS 94205 and SAHS 33020 to dry Cas13a in further experiments, to have TDPs of both families to test in the lab. Additionally, we experimentally determined that these two proteins also had the largest effect on desiccation tolerance of LDH, which further supported our choice. All in all however, we expect that SAHS 33020 performs best in Cas13a drying assays.

### Preservation of Cas13a activity at different concentrations

As CAHS 94205 and SAHS 33020 were chosen to dry Cas13a, we simulated the behaviour of the two proteins at concentrations ranging from 0.1 to 1 g/L. An example on how the difference in concentration affects the number of chains is shown in Figure 13 for SAHS 33020. We again chose a volume fraction of 0.1 and analyzed the development of Cas13a coverage (see Figure 14) as well as the energy over time (see Figure 15).

Figure 13: Initial state of lattice model simulation for SAHS 33020 concentrations ranging from 0.1 to 1 g/L at a volume fraction of 0.1.

In Figure 14 the number of direct neighbours to Cas13a is plotted against the TDP concentration at the same time in each simulation. Boothby et al. 2017 had previously observed a logarithmic behaviour of LDH activity towards TDP concentrations as high as 10 g/L. For this reason, we fitted a logarithmic curve over all points with function (4).

$$y = m\cdot \ln(x) +b\tag{4}$$

The coefficient of determination of the resulting curve is between 0.7 and 0.8 for both fits, indicating that it complies with the data reasonably well. The system might not have reached its saturation point, but the curve evens out towards higher TDP concentrations as would be expected. This means that, up to a concentration of 1 g/L at least, a higher Cas13a coverage can be achieved by increasing the TDP concentration. Better preservation of Cas13a activity may be possible to obtain with higher TDP concentrations, but our results indicate that the difference in the resulting desiccation tolerance will become less pronounced at higher concentrations. This complies with observations by Boothby et al. 2017, who observed that the curve was flattening out towards the saturation point after around 1 g/L.

Figure 14: a) Number of direct neighbours to Cas13a over a) CAHS 94205 and b) SAHS 33020 concentrations ranging from 0.1 to 1 g/L at 800 moves per chain at a volume fraction of 0.1. A logarithmic function was fitted to the data. The resulting R^2 of the fits is between 0.7 and 0.8 for both.

The normalised energy of CAHS 94205 and SAHS 33020 over concentration shown in Figure 15, shows a similar trend of a steeper slope and therefore better protection of Cas13a for higher TDP concentrations. Furthermore, we could observe in the development of the system’s energy over time, that the energy of SAHS 33020 simulations generally displayed a steeper slope than that of CAHS 94205 simulations for all concentrations except 0.1 g/L. This implies again, that of all TDPs SAHS 33020 is best suited to preserve Cas13a activity upon drying. It also means that SAHS 33020 will likely show better activity preservation at lower concentrations than CAHS 94205.

Figure 15: a) Normalised energy of the system against the number of iterations per chain for simulations with concentrations ranging from 0.1 to 1 g/L for a) CAHS 94205 and b) SAHS 33020 at 800 moves per chain and in a volume fraction of 0.1.

While we cannot give recommendations for higher concentrations, it appears that a concentration of at least 1 g/L of either CAHS 94205 or SAHS 33020 is advisable for drying Cas13a. However, SAHS 33020 seems overall best suited for drying Cas13a and may even confer enough desiccation tolerance to Cas13a at lower concentrations. This is important to keep undesired side effects, such as the activation of Cas13a through TDPs as observed in the experiments, at a minimum.

#References: