Team:CGU Taiwan/Model

iGEM CGU_Taiwan 2017 - Model


Introduction

Deforestation is depleting forest resources on a massive scale. The earth is running short of natural resources, and animals are losing their habitats. To address this problem, people recycle paper to reduce both the felling of trees and the amount of waste. Paper recycling contributes significantly to waste reduction and to the efficient use of natural resources. However, the paper recycling process has a problem of its own.

The manufacturing process of recycled paper can be roughly divided into pulping, cleaning, deinking, fiberizing, suspending, pressing, and drying. Among these steps, deinking is one of the most important: it removes the substances adhered to the paper surface, such as the pigment particles of printing ink and the adhesive agents. However, existing deinking processes are mainly chemical and rely on agents such as sodium hydroxide (NaOH), sodium silicate (Na2SiO3), sodium phosphate (Na3PO4), and potassium phosphate (KH2PO4), which are harmful to the environment.

To improve on the current deinking method, the CGU Taiwan team is developing genetically engineered enzymes, so that enzymatic deinking can replace chemical deinking and reduce environmental contamination. Based on the earlier research on bio-enzyme deinking by Prof. Su, Yu-Chang (2002), the project focuses on three deinking enzymes: glucanase, xylanase, and lipase. Glucanase is a cellulase and xylanase is a hemicellulase. In the deinking process, glucanase and xylanase bind to cellulose, the main component of paper fiber, while lipase binds to ink, so that the ink can be removed from the paper fiber.

However, different organisms produce different versions of these enzymes. To complete this project, we have to select the enzymes that best fit our needs. Computer simulation is an efficient way to do so, because it lets us estimate the binding affinity between each enzyme and its ligand. In addition, we hypothesize that proteins produced by eukaryotic cells will perform better than proteins produced by prokaryotic cells, because eukaryotic cells carry out post-translational modification of their proteins. We therefore expect eukaryotic proteins to have a higher binding affinity for their ligands, and we want to test this hypothesis.

In brief, we first use the amino acid sequence of an original enzyme as a template to find other proteins with highly similar amino acid sequences in the Protein Data Bank, and then simulate the docking between each protein and its ligand with protein docking software. Finally, the docking software calculates the binding affinity of each protein, so that we can select the enzymes most likely to be effective at deinking and increase the project's chance of success.


Materials and methods

In the modeling process, xylanase, glucanase, and lipase have to be simulated individually, because each simulation handles only one enzyme. The whole modeling process can be divided into five parts: searching for similar proteins, fixing the proteins, energy minimization of the ligands, protein-ligand docking, and organizing the data.


Template search of deinking enzyme-relating proteins in SWISS-MODEL

First, we use the original amino acid sequence of each enzyme to find similar proteins with SWISS-MODEL, a structural bioinformatics tool that performs homology modeling of protein 3D structures. The original amino acid sequences are taken from the research on bio-enzyme deinking by Prof. Su, Yu-Chang (Su, Yu-Chang, 2002), so we used the beta-glucanase and xylanase produced by the prokaryote Bacillus subtilis as the original templates. For lipase, we used a eukaryotic lipase produced by Rhizopus niveus as the original template. The template search on SWISS-MODEL gives the Protein Data Bank (PDB) ID, the oligo-state, and the sequence similarity of each hit. Next, we download the PDB files of all these proteins from the Protein Data Bank. For lipase, the search returns too many similar proteins, so we keep only the proteins that are actually lipases and screen out others such as other hydrolases, depolymerases, and cutinases.
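The lipase screening and download step described above could be sketched as follows. This is a minimal illustration, not the procedure the team actually ran: the hit list, the description strings, and the exclusion terms are placeholders, and the download address assumes the public RCSB file service URL pattern. Only 1LGY and 4K6K are PDB IDs taken from this page.

```python
# Hypothetical hit list: 1LGY and 4K6K are lipases named on this page;
# the XXX* IDs and all description strings are placeholders, not real search output.
HITS = [
    ("1LGY", "Lipase"),
    ("4K6K", "Lipase"),
    ("XXX1", "Cutinase"),
    ("XXX2", "Depolymerase"),
    ("XXX3", "Hydrolase"),
]

# Assumed exclusion terms, based on the non-lipase classes mentioned in the text.
EXCLUDED_TERMS = ("cutinase", "depolymerase")


def is_lipase(description):
    """Keep a hit only if its description names a lipase and no excluded class."""
    text = description.lower()
    return "lipase" in text and not any(term in text for term in EXCLUDED_TERMS)


def pdb_download_url(pdb_id):
    """Build the download address of a PDB file (assumed RCSB URL pattern)."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


lipase_ids = [pdb_id for pdb_id, description in HITS if is_lipase(description)]
download_urls = [pdb_download_url(pdb_id) for pdb_id in lipase_ids]

for url in download_urls:
    print(url)
```

A plain keyword filter like this still needs a manual check, since a description can mention a lipase without the protein being one.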


Energy minimization of ligand in Discovery Studio

After preparing the protein files, we also need the PDB files of the ligands that will bind to the proteins. In this project, we use cellulose as the ligand of xylanase and glucanase, and triacylglycerol as the ligand of lipase. We perform energy minimization on these ligands in another software package, Discovery Studio, because we want all docking runs to start from the same energy level, so that the docking process is not affected by the free electrons of the ligand.


Fix protein conformation problem

Before docking, we load the proteins into the protein docking software AutoDock to test which of them have conformation problems, and then use Discovery Studio to fix the alternate conformation problems in order to avoid errors during docking.
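To illustrate what an alternate conformation problem looks like in a PDB file, the sketch below keeps a single conformer per atom. It is not the Discovery Studio procedure used in this project; it is a simple stand-in that relies only on the fixed-column PDB format, in which column 17 of an ATOM or HETATM record holds the alternate location indicator. The function name and the sample atoms are hypothetical.

```python
def strip_alternate_conformations(pdb_text, keep="A"):
    """Keep atoms with no alternate location, or with the chosen one, and blank the flag."""
    kept = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) > 16:
            alt_loc = line[16]  # column 17 in the 1-based PDB column numbering
            if alt_loc not in (" ", keep):
                continue  # drop every other conformer
            line = line[:16] + " " + line[17:]
        kept.append(line)
    return "\n".join(kept)


def atom_line(serial, name, alt_loc, residue):
    """Build a shortened, made-up ATOM record with the altLoc flag in column 17."""
    return f"ATOM  {serial:>5} {name:<4}{alt_loc}{residue} A   1"


sample = "\n".join([
    atom_line(1, " CA ", "A", "ALA"),
    atom_line(2, " CA ", "B", "ALA"),  # second conformer of the same atom
    atom_line(3, " N  ", " ", "ALA"),
])

cleaned = strip_alternate_conformations(sample)
print(cleaned)
```

Keeping conformer A is an arbitrary choice here; another reasonable rule is to keep the conformer with the highest occupancy.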


Protein-ligand docking simulation in AutoDock

With all inputs prepared, we perform protein-ligand docking with AutoDock, an automated docking program that calculates the binding affinity between a protein and a ligand. AutoDock is operated through a virtual screening tool called PyRx, which quickly screens out the best structures for each protein from the many candidate structures generated during docking. Since the software can perform only one docking run at a time, we have to run each protein manually, one by one.


Data organization and protein visualization

After all docking runs are finished, the results are organized in Microsoft Excel and ranked by binding affinity. We also look up the source organism of each protein, so that we can compare the binding affinity of proteins produced by prokaryotes with that of proteins produced by eukaryotes. In addition, we use the protein visualization software PyMOL to visualize the docked protein-ligand complexes and their 3D structures.
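The ranking step was done in Microsoft Excel; the same logic can be sketched in a few lines of Python. The sketch sorts results so that the most negative binding affinity comes first and sets aside runs that failed with a docking error. The function name and the `failed_run` entry are hypothetical, the two scores are the xylanase values reported on this page, and the rank numbering (starting at 1) is an assumption that does not reproduce the rank 0 used for the original enzyme in Table 1.

```python
def rank_by_affinity(results):
    """Rank (pdb_id, affinity) pairs, most negative affinity first.

    Runs without a score (None) are returned separately as failed runs.
    """
    scored = [(pdb_id, score) for pdb_id, score in results if score is not None]
    failed = [pdb_id for pdb_id, score in results if score is None]
    scored.sort(key=lambda item: item[1])  # a more negative score means stronger binding
    ranked = [
        (rank, pdb_id, score)
        for rank, (pdb_id, score) in enumerate(scored, start=1)
    ]
    return ranked, failed


results = [
    ("5k9y", -5.8),        # original xylanase, value reported on this page
    ("failed_run", None),  # placeholder for a run that ended in a docking error
    ("5hxv", -10.6),       # best xylanase hit, value reported on this page
]

ranked, failed = rank_by_affinity(results)
for rank, pdb_id, score in ranked:
    print(rank, pdb_id, score)
print("no score:", failed)
```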


Results

First, the proteins related to each enzyme were found with SWISS-MODEL, and the source organism of each protein was looked up. Protein-ligand docking was then performed to calculate the binding affinity of each protein. For the binding affinity, a more negative number indicates stronger binding.

Table 1 shows the result of xylanase-cellulose docking. The entry ranked 0 is the docking result between the original xylanase and cellulose, with a binding affinity of -5.8. The protein with PDB ID 5hxv ranks best in this simulation, with a binding affinity of -10.6. Protein 5hxv is a eukaryotic protein classified as a hydrolase, and its sequence identity to the original protein sequence is 52.247%. From rank 79 to 98, no binding affinity is shown because of docking errors during the simulation in AutoDock. Furthermore, according to Table 1, the top 3 proteins are all produced by eukaryotes, and 6 of the top 10 proteins are eukaryotic.

Table 2 shows the result of glucanase-cellulose docking. The 3O5S protein, ranked 78, is the docking result between the original glucanase and cellulose, with a binding affinity of -5.7. The proteins with PDB IDs 2vy0 and 3wdy share the best rank in this simulation, each with a binding affinity of -9.7. Protein 2vy0 is a prokaryotic protein with a sequence identity of 26.316% to the original protein sequence, while protein 3wdy is a eukaryotic protein with a sequence identity of 18.023%; both are classified as hydrolases. For ranks 80 and 81, no binding affinity is shown because of docking errors during the simulation in AutoDock. Furthermore, according to Table 2, 4 of the top 10 proteins are eukaryotic.

Table 3 shows the result of lipase-triacylglycerol docking. The 1LGY lipase, ranked 109, is the docking result between the original lipase and triacylglycerol, with a binding affinity of -4.4. The lipases with PDB IDs 1AQL, 2YIJ, and 4K6K rank best in the lipase-triacylglycerol docking, each with a score of -6.5, and their sequence identities to the original lipase amino acid sequence are about 14-24%. Among these top 3, lipase 4K6K is produced by the eukaryote Moesziomyces antarcticus, which is classified as a yeast. Since our project uses yeast as the enzyme-secreting organism, 4K6K might perform better in our deinking system. Furthermore, according to Table 3, 8 of the top 10 proteins are eukaryotic.

To check the effectiveness of a top-ranked protein, we visualized the protein-ligand docking using protein 2vg9 as the test object. Figure 1 shows that the cellulose molecule is completely surrounded by the xylanase 2vg9 molecule, and the cellulose appears to fit the shape of the xylanase. This suggests that the binding between this enzyme and its ligand is close to ideal. In Figure 2, the binding of a low-ranked protein is noticeably worse than that of the high-ranked protein: the cellulose binds only on one side of the protein, so the reaction between this protein and its ligand may be inefficient.

Therefore, according to the results, we conclude that the high-ranked proteins are more suitable for the enzymatic deinking process, and that using them instead of the original proteins produced by Bacillus subtilis should increase the prospect of success in the experiment. In the future, these simulation results will be used as a reference for sequence design in the wet lab.


Discussion

By analyzing the protein-ligand docking results, we observed that high-ranked proteins can have a high binding affinity even when their sequence identity is lower than that of other proteins. Likewise, a higher sequence identity does not always mean a higher binding affinity. For instance, the xylanase-related protein 2vg9 (rank 2) has a sequence identity of only 34.659% to the amino acid sequence of the original xylanase 5k9y, but it still has a high binding affinity. This suggests that the 34.659% of shared sequence in 2vg9 may contain the conserved sequence of xylanase that is used for binding cellulose.
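For readers unfamiliar with the measure, sequence identity can be sketched as the percentage of aligned positions that carry the same residue. The function below is a minimal illustration only: how SWISS-MODEL computes its reported value is not stated on this page, and ignoring gap columns in the denominator is an assumption. The short sequences are made up.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the aligned columns that contain no gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # assumption: gap columns are not counted
        compared += 1
        if residue_a == residue_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Made-up aligned fragments, three identical positions out of four.
print(percent_identity("ACDE", "ACDF"))
```

A low overall identity can still hide a short, fully conserved stretch, which is the situation proposed for 2vg9 in the paragraph above.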

In addition, as mentioned in the introduction, we hypothesized that eukaryotic proteins would rank higher in binding affinity than prokaryotic proteins because of post-translational modification. We think that the amino acid sequences of eukaryotic proteins have been modified in the course of evolution, so that their protein-ligand binding has become stronger. In the xylanase-cellulose docking result, 60% of the top 10 proteins are eukaryotic; in the lipase-triacylglycerol docking result, the percentage is 80%. These results support our hypothesis. However, in the glucanase-cellulose docking result only 40% of the top 10 proteins are eukaryotic, which does not match our hypothesis. A possible reason is that glucanase is a common enzyme that many organisms produce to digest polysaccharides, whereas xylanase is more specialized: only the microorganisms that degrade plants produce xylanase, because they need to obtain particular nutrients from plants to grow. Since the organisms that produce glucanase are diverse and not limited to the microorganisms that degrade the cellulose in plant cell walls, the result of glucanase-cellulose docking may be less accurate than the result of xylanase-cellulose docking.
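The top-10 percentages quoted above come from a simple count, sketched below. The origin lists are placeholders that reproduce only the counts reported on this page (6, 4, and 8 eukaryotic proteins in the top 10); the order of entries within each list is not taken from the tables, and the function name is hypothetical.

```python
def eukaryote_share(ranked_origins, top_n=10):
    """Percentage of eukaryotic proteins among the first top_n ranked entries."""
    top = ranked_origins[:top_n]
    if not top:
        return 0.0
    eukaryotic = sum(1 for origin in top if origin == "eukaryote")
    return 100.0 * eukaryotic / len(top)


# Placeholder lists: only the counts match the reported results, not the order.
xylanase_top10 = ["eukaryote"] * 6 + ["prokaryote"] * 4
glucanase_top10 = ["eukaryote"] * 4 + ["prokaryote"] * 6
lipase_top10 = ["eukaryote"] * 8 + ["prokaryote"] * 2

for name, origins in [
    ("xylanase", xylanase_top10),
    ("glucanase", glucanase_top10),
    ("lipase", lipase_top10),
]:
    print(name, eukaryote_share(origins))
```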

During this simulation, we also faced a problem with file formats. At the beginning, we used the Basic Local Alignment Search Tool (BLAST) from the National Center for Biotechnology Information (NCBI) to find similar proteins and then downloaded the file of each protein. However, we realized that the FASTA file format is incompatible with AutoDock, which accepts only the PDB file format. We therefore switched to SWISS-MODEL for the similarity search, because SWISS-MODEL provides the PDB ID of each protein, so we could download the PDB files from the Protein Data Bank one by one. SWISS-MODEL has a disadvantage, however: NCBI protein BLAST can restrict the search by conditions such as database and organism, whereas SWISS-MODEL cannot. We nevertheless chose SWISS-MODEL because of the PDB file format problem.

Moreover, the xylanase-cellulose docking result gave us some useful information. Each of the top 3 proteins has a specific characteristic: protein 5hxv is thermophilic, protein 2vg9 is alkalophilic, and protein 1yna is thermostable at 50-80 °C and pH 3-12. These proteins have commercial reference value, and their characteristics make experimental operation more convenient.

In summary, by performing computer simulation and calculating the binding affinity of protein-ligand docking in the dry lab, we are able to analyze and screen out the most effective proteins from a vast number of related proteins, and to visualize the 3D structure of the docked protein-ligand complex, which cannot be observed in wet lab experiments. This modeling therefore provides reliable data and increases the chance of success in the wet lab experiments.


References

1. SWISS-MODEL
https://swissmodel.expasy.org/
2. Protein Data Bank
http://www.rcsb.org/pdb/home/home.do
3. BIOVIA Discovery Studio
http://accelrys.com/products/collaborative-science/biovia-discovery-studio/
4. AutoDock
http://autodock.scripps.edu/
5. PyMOL
https://www.pymol.org/
6. Studies on the Enzymatic Deinking of Wastepaper (Yu-Chang Su, Shan-Yuan Chiou, Shu-Ching Lee, Ruoh-Yun Yeh, 2002)
7. Production of Cellobiose by Enzymatic Hydrolysis: Removal of β-Glucosidase from Cellulase by Affinity Precipitation Using Chitosan (Taira Homma, Michihiro Fujii, Jun-ichi Mori, Tohru Kawakami, Kenji Kuroda, and Masayuki Taniguchi, 1992)
8. Identification and characterization of core cellulolytic enzymes from Talaromyces cellulolyticus (formerly Acremonium cellulolyticus) critical for hydrolysis of lignocellulosic biomass (Hiroyuki Inoue, Stephen R Decker, Larry E Taylor, Shinichi Yano and Shigeki Sawayama, 2014)
9. Directed evolution to produce an alkalophilic variant from a Neocallimastix patriciarum xylanase (Chen YL, Tang TY, Cheng KJ, 2001)
10. Thermomyces lanuginosus: properties of strains and their hemicellulases (Singh S, Madlala AM, Prior BA, 2003)

Figure 1. The 3D structure of xylanase (PDB ID: 2vg9) bound to cellulose.
This figure shows the 3D structure of the xylanase-cellulose docking as visualized in PyMOL. The polysaccharide shown in the middle is cellulose, and the macromolecule surrounding it is xylanase 2vg9, which ranked number 2 in the xylanase-cellulose docking.

Figure 2. The 3D structure of protein (PDB ID: 3b5l) bound to cellulose.
This figure shows the 3D structure of the xylanase-cellulose docking as visualized in PyMOL. The polysaccharide shown on the left side is cellulose, and the macromolecule to its right is protein 3b5l, which ranked number 77 in the xylanase-cellulose docking.