Difference between revisions of "Team:Bordeaux/Model"

 
 
<html>
 
<html>
  
<link href="https://fonts.googleapis.com/css?family=Lato" rel="stylesheet">
 
<style>
 
.ourWorks * {width : 80%; margin: 2% auto; color: #E0E0E0}
 
.ourWorks p {padding-left: 10px; text-align: justify; font-family: 'Lato'; font-size: 16px;}
 
.ourWorks h1 {padding-left: 20px; font-size: 20px}
 
.ourWorks h2 {padding-left: 25px; font-size: 18px;}
 
.ourWorks h3 {padding-left: 30px; font-size: 16px}
 
.ourWorks ul li{padding-left: 10px; margin: 30px; font-size: 16px; font-family:'Lato'; list-style:square; line-height: 150%}
 
</style>
 
  
<div class="ourWorks">
<div class="clear"></div>
  
<h1>RNA-Seq Analysis Pipeline For Splicing Studies</h1>
+
<div class="column full_size">
<h2>1. What is RNA-Seq?</h2>
+
<h1> Modeling</h1>
  
<p>
+
<p>Mathematical models and computer simulations provide a great way to describe the function and operation of BioBrick Parts and Devices. Synthetic Biology is an engineering discipline, and part of engineering is simulation and modeling to determine the behavior of your design before you build it. Designing and simulating can be iterated many times in a computer before moving to the lab. This award is for teams who build a model of their system and use it to inform system design or simulate expected behavior in conjunction with experiments in the wetlab.</p>
  RNA-Seq is a molecular biology method used to quantify the RNA in a given sample. It is based on Next-Generation Sequencing (NGS) methods and makes it possible to study gene expression.
+
</p>
+
  
<p>
+
</div>
  When DNA sequencing began in the 1970s, two main methods were developed: one by Walter Gilbert (USA) and the other by Frederick Sanger (UK). Both received the Nobel Prize in Chemistry in 1980. The two approaches are very different, and we summarise them briefly here.</p>
+
<div class="clear"></div>
 
+
<ul class="ourWorks">
+
  <li>
+
    <b>Maxam and Gilbert method:</b> This method works in several steps. First, both ends of a double-stranded DNA (dsDNA) molecule are radioactively labeled. The target DNA sequence is then selected by polyacrylamide gel electrophoresis (PAGE). The two strands are separated by heat denaturation and purified by PAGE. Chemical modifications are then applied to these strands in such a way that each DNA molecule carries zero or one modification. The DNA is then cleaved with piperidine. Finally, electrophoresis makes it possible to reconstruct the initial sequence. The main drawbacks of this method are its use of radioactivity and of toxic chemicals.
+
  </li>
+
  <li>
+
    <b>Sanger method:</b> In this approach, the target DNA is incubated with a polymerase, which needs nucleotides to carry out polymerisation. Sanger had the idea of using deoxyribonucleotides (dNTPs) together with a small quantity of dideoxyribonucleotides (ddNTPs). When the polymerase incorporates a ddNTP, polymerisation stops. Separating the resulting fragments by electrophoresis then makes it possible to determine the initial DNA sequence.
+
  </li>
+
</ul>
+
  
 +
<div class="column half_size">
 +
<h3> Gold Medal Criterion #3</h3>
 
<p>
 
<p>
  Since the 1990s, several new methods have been developed to improve the performance and lower the cost of sequencing. These so-called NGS methods made sequencing cheaper and faster, and thus opened new perspectives for biologists. This year, for the iGEM competition, we focused on one NGS method: RNA-Seq (Figure 1).
+
To compete for the gold medal criterion #3, please describe your work on this page and fill out the description on your <a href="https://2017.igem.org/Judging/Judging_Form">judging form</a>. To achieve this medal criterion, you must convince the judges that your team has gained insight into your project from modeling. You may not convince the judges if your model does not have an effect on your project design or implementation.
</p>
+
 
+
<img src="https://img15.hostingpics.net/pics/688663RNASeq.png" alt="RNA-Seq workflow: mRNA extraction, reverse transcription into cDNA, sequencing" style="width:200px">
+
 
+
<p style="font-size:12">
+
  <b>Figure 1:</b> RNA-Seq analysis. The first step is to extract mRNA from a sample. An enzyme called reverse transcriptase then reverse-transcribes it into cDNA. Finally, the cDNA is sequenced using NGS methods.
+
 
</p>
 
</p>
  
 
<p>
 
<p>
  As mentioned above, RNA-Seq quantifies the RNA present in a cell at a particular time. With the development of NGS, a huge amount of data became available to scientists, who needed people able to process it; this is where bioinformaticians came in. Computers process large amounts of data far faster than humans, so many tools were developed to handle NGS output. For the competition, we used some of these tools to study splicing in C. elegans. Let's see how we proceeded!
+
Please see the <a href="https://2017.igem.org/Judging/Medals"> 2017 Medals Page</a> for more information.  
 
</p>
 
</p>
 +
</div>
  
<h2>2. How to study splicing with RNA-Seq?</h2>
+
<div class="column half_size">
 
+
<h3>Best Model Special Prize</h3>
<h3>2.1. Aligning &amp; Mapping</h3>
+
  
 
<p>
 
<p>
  In bioinformatics, sequence alignment is a way of arranging sequences (here, RNA-Seq reads) relative to each other to identify similarities in structure or function. The sequences are arranged as the rows of a matrix, and gaps can be inserted so that identical or similar characters line up in successive columns. The organism studied here is C. elegans, and our goal was to align the RNA-Seq reads to its reference genome using the HISAT algorithm.
+
To compete for the <a href="https://2017.igem.org/Judging/Awards">Best Model prize</a>, please describe your work on this page  and also fill out the description on the <a href="https://2017.igem.org/Judging/Judging_Form">judging form</a>. Please note you can compete for both the gold medal criterion #3 and the best model prize with this page.  
  RNA is transcribed from DNA sequences composed of alternating coding exons and non-coding introns. Transcription first produces a pre-mRNA that contains both the exons and the introns.
+
<br><br>
 +
You must also delete the message box on the top of this page to be eligible for the Best Model Prize.
 
</p>
 
</p>
 
<p>
 
  From this pre-mRNA, only the coding exons are kept and the introns are removed; this removal process is called splicing. Different combinations of exons can be joined to produce different variants of the resulting protein, a process called alternative splicing.

  It is these spliced RNA sequences that are sequenced. To do so, they are reverse-transcribed into their complementary DNA (cDNA), which is then sequenced using NGS.
 
</p>
 
 
<p>
 
  Current sequencing technologies split the large DNA molecules to be sequenced into small chunks, and the sequences obtained from these chunks are called reads. The reads are mapped to the reference genome using algorithms such as Bowtie. Because reads are short, a given sequence may occur at several locations in the genome, which makes it hard to map. To get around this, a technique called paired-end sequencing is used: a cDNA fragment is sequenced from both of its ends, once on the forward strand and once on the reverse strand. Because the two reads come from the same fragment, the distance between them is known, which makes them easier to map. If a read can map to two locations, only one of them will have its mate mapping at the correct distance.
 
</p>
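<p>
  The paired-end idea can be illustrated with a minimal Python sketch. It is not how Bowtie or HISAT work internally; the function name <code>resolve_pair</code>, the positions and the tolerance are hypothetical and only show how a known fragment size disambiguates a read that matches several locations.
</p>

```python
from typing import List, Optional, Tuple


def resolve_pair(read_hits: List[int], mate_hits: List[int],
                 expected_distance: int, tolerance: int) -> Optional[Tuple[int, int]]:
    """Return the (read, mate) positions whose spacing matches the fragment size.

    Hypothetical helper: positions are plain genome coordinates on one chromosome.
    """
    consistent = [
        (r, m)
        for r in read_hits
        for m in mate_hits
        if abs(abs(m - r) - expected_distance) <= tolerance
    ]
    # Only an unambiguous answer is useful; otherwise the pair stays unresolved.
    return consistent[0] if len(consistent) == 1 else None


# The read matches two loci, but only one has its mate at the right distance.
print(resolve_pair([1_000, 50_000], [1_300], expected_distance=300, tolerance=50))
```

<p>
  When two candidate placements are both consistent with the expected distance, the sketch returns <code>None</code> rather than guessing.
</p>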
 
 
<p>
 
  When many reads cover the same region of the genome, that region is highly expressed: many RNA molecules were transcribed from it, so many reads were sequenced from them.
 
</p>
 
 
<p>
 
  Reads come as FASTQ files, which are formatted text files stored in the Sequence Read Archive (SRA) at the NCBI. FASTQ files are produced by the sequencing platforms and combine FASTA information (the raw nucleotide sequence) with a quality score for each sequenced base. They can be downloaded programmatically using the fastq-dump program from the SRA Toolkit. The SRA accession is specified along with the sequencing layout (single- or paired-end), which yields one or two FASTQ files.
 
</p>
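<p>
  To make the FASTQ layout concrete, here is a small Python sketch that reads four-line records (header, sequence, separator, quality string). It assumes the common Phred+33 quality encoding and unwrapped records; the function name <code>parse_fastq</code> and the example read are ours, not part of the pipeline.
</p>

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, sequence, _plus, quality = (
            line.rstrip("\n") for line in lines[i:i + 4]
        )
        if not header.startswith("@"):
            raise ValueError(f"record at line {i + 1} does not start with '@'")
        # Assumption: Phred+33 encoding, i.e. score = ASCII code - 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


example = ["@read1", "GATTACA", "+", "IIIIII#"]
for read_id, seq, scores in parse_fastq(example):
    print(read_id, seq, scores)
```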
 
 
<p>
 
  These FASTQ files are the input for HISAT, a program based on Bowtie that maps the reads to the genome. HISAT was run with the parameters previously described in the work of Denis Dupuy that produced the reference junctions file (ref). The resulting alignments are stored as BAM files, the binary version of SAM files, which contain the mapping information such as the location of each read on the genome.
 
</p>
 
 
<h3>2.2. Extracting junctions</h3>
 
 
<p>
 
  While some sequenced reads fall entirely within an exon, others correspond to a sequence that spans an exon-exon boundary. When mapped to the genome, such a read has the left part of its sequence at the end of the first exon and the right part further along, after the intron, at the beginning of the next exon. This is how a junction can be detected. In practice it is not quite that simple, but splice-aware aligners such as HISAT can infer whether a junction is genuine. Alternative splicing produces different exon combinations through different junctions, so a junction represents a specific spliced form of the RNA.
 
</p>
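<p>
  In SAM/BAM files, a read that spans a junction is recorded with an <code>N</code> operation (skipped region) in its CIGAR string. The following sketch shows how intron coordinates could be derived from a read's mapping position and CIGAR string. It is an illustration only, not the algorithm used by HISAT or regtools, and the function name is hypothetical.
</p>

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance the position on the reference genome.
CONSUMES_REFERENCE = set("MDN=X")


def junctions_from_alignment(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Return (intron_start, intron_end), 1-based inclusive, for each N gap.

    `pos` is the 1-based leftmost mapping position, as in the SAM POS field.
    """
    junctions = []
    ref = pos  # reference coordinate of the next aligned base
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            junctions.append((ref, ref + n - 1))
        if op in CONSUMES_REFERENCE:
            ref += n
    return junctions


# 30 bases on the first exon, a 500-base intron, 20 bases on the next exon.
print(junctions_from_alignment(100, "30M500N20M"))
```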
 
 
<p>
 
  After alignment and mapping, we extracted the junctions using two well-known tools, samtools and regtools. Samtools was used to index the BAM file produced by the alignment and mapping step. This is necessary because regtools, which extracts the junctions and writes them to a BED file, requires an indexed BAM file as input. After this step, we had a BED file containing all the junctions (spliced forms) in the sample.
 
</p>
 
 
<p>
 
  The more reads map to a junction, the more often the two corresponding exons are joined. We therefore score a junction by the number of reads that map to it. However, this score alone does not reflect the relative expression of the variants: it depends on the overall expression level, so it cannot be used directly as a comparison variable. This is why we implemented a usage ratio calculation.
 
</p>
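<p>
  As an illustration of the scoring step, the sketch below reads junction scores from BED lines. It assumes tab-separated lines whose first six columns are chromosome, start, end, name, score and strand, with the read count in the score column; the actual layout of the BED file depends on the extraction tool and should be checked. The function name and example lines are hypothetical.
</p>

```python
from typing import Dict, List, Tuple

Junction = Tuple[str, int, int, str]  # chromosome, start, end, strand


def read_junction_scores(bed_lines: List[str]) -> Dict[Junction, int]:
    """Map each junction to the number of reads supporting it."""
    scores: Dict[Junction, int] = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        fields = line.rstrip("\n").split("\t")
        chrom, start, end, _name, score, strand = fields[:6]
        key = (chrom, int(start), int(end), strand)
        # Assumption: the score column holds the supporting read count.
        scores[key] = scores.get(key, 0) + int(score)
    return scores


bed = [
    "I\t100\t200\tJUNC1\t90\t+",
    "I\t100\t300\tJUNC2\t10\t+",
]
print(read_junction_scores(bed))
```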
 
 
<h3>2.3. Calculating ratios</h3>
 
 
<p>
 
  This step is the core of our pipeline. The calculation method was developed by Denis Dupuy (IECB, Pessac). It relies on the START and STOP positions of each junction, which we call the acceptor (for START) and the donor (for STOP).

  First, we grouped all the junctions that share an acceptor or a donor. We then computed each ratio by dividing the score of a junction by the sum of the scores of all junctions in its group. Each junction therefore has one acceptor ratio and one donor ratio. Finally, we kept the smaller of the two because it is more robust: the lower the ratio, the larger the denominator, and therefore the more reads support the group of junctions.
 
</p>
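<p>
  The ratio calculation described above can be sketched as follows. This is our reading of the description, not the original implementation: junctions are grouped by shared START position (called acceptor in the text) and by shared STOP position (donor), each ratio is the junction's score divided by the total score of its group, and the smaller of the two is kept. Names and example values are hypothetical.
</p>

```python
from collections import defaultdict
from typing import Dict, Tuple

Junction = Tuple[str, int, int, str]  # chromosome, start, end, strand


def usage_ratios(scores: Dict[Junction, int]) -> Dict[Junction, float]:
    """Return min(start-group ratio, end-group ratio) for each junction."""
    # Junctions without supporting reads carry no information; drop them
    # so that no group total can be zero.
    scores = {j: s for j, s in scores.items() if s > 0}

    start_total: Dict[Tuple[str, str, int], int] = defaultdict(int)
    end_total: Dict[Tuple[str, str, int], int] = defaultdict(int)
    for (chrom, start, end, strand), score in scores.items():
        start_total[(chrom, strand, start)] += score
        end_total[(chrom, strand, end)] += score

    ratios = {}
    for (chrom, start, end, strand), score in scores.items():
        start_ratio = score / start_total[(chrom, strand, start)]
        end_ratio = score / end_total[(chrom, strand, end)]
        ratios[(chrom, start, end, strand)] = min(start_ratio, end_ratio)
    return ratios


example = {
    ("I", 100, 200, "+"): 90,  # shares its start with the next junction
    ("I", 100, 300, "+"): 10,  # shares its end with the last junction
    ("I", 250, 300, "+"): 30,
}
for junction, ratio in usage_ratios(example).items():
    print(junction, round(ratio, 3))
```

<p>
  In this example the three junctions get ratios of 0.9, 0.1 and 0.75: the second junction is limited by the group sharing its start (10 of 100 reads), the third by the group sharing its end (30 of 40 reads).
</p>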
 
 
<p>
 
  The final output of this step is a CSV file containing the usage ratio of each junction. We then cleaned the data to keep only, for each gene, the junctions with a common acceptor/donor and a ratio equal to one. This was an important step because the CSV file contained all the junctions, including very rare ones that cannot be distinguished from background noise caused by the RNA-Seq method or by errors of the splicing machinery. These correspond to the rare junctions identified in Denis Dupuy’s work, which have a ratio below 1%.
 
</p>
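<p>
  As a rough illustration of the noise filter, the sketch below drops junctions whose usage ratio is below 1% from a CSV table. The column names <code>junction</code> and <code>ratio</code> are hypothetical, since the layout of our CSV file is not described here, and treating exactly 1% as kept is an assumption.
</p>

```python
import csv
import io
from typing import Dict, List


def drop_rare_junctions(csv_text: str, threshold: float = 0.01) -> List[Dict[str, str]]:
    """Keep the rows whose 'ratio' column is at least `threshold`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: junctions below the threshold are treated as background noise.
    return [row for row in reader if float(row["ratio"]) >= threshold]


table = "junction,ratio\nI:100-200,0.9\nI:100-300,0.005\n"
for row in drop_rare_junctions(table):
    print(row["junction"], row["ratio"])
```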
 
 
<h3>2.4. Plotting the results</h3>
 
 
<p>
 
  The next step of the workflow was data visualisation. There are several ways to visualise data and make comparisons: we could either compare the different conditions present in the dataset with each other, or compare them with a reference file for C. elegans. Is there a reference file for C. elegans splicing? There is: using the method described above, Denis Dupuy generated a reference file of C. elegans splicing events, and we compared our data with it. Because this reference gives the “normal” usage ratio of each spliced form, it lets us see how splicing changes in our data. The reference file is essential: with it, we can tell whether a spliced form is over- or under-represented under specific conditions, which is what makes the method so powerful.
 
</p>
 
 
<p>
 
  The results of the ratio calculation were then processed to extract additional information. At first, we simply plotted sample_ratio as a function of reference_ratio. Denis had the idea of calculating the distance and slope between related points in order to generate a new type of plot. We were not satisfied with our first plots: there were many points, and no way to focus on a specific gene without going into the code to make the selection manually. That was not an option, so we asked ourselves how to present our data in a way that is easy to use. As is often the case in computing, other people had already thought about this and developed a very nice library called <code>plotly</code>. With it, we were able to generate attractive and, above all, interactive plots. Users can now easily navigate the data and visualise what they want. One step remains: the interpretation of our data.
 
</p>
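<p>
  The comparison with the reference can be sketched in plain Python. The exact definition of the distance and slope used in our plots is not given above, so this example makes an explicit assumption: each junction is a point (reference_ratio, sample_ratio), the distance is the signed distance of that point to the diagonal where both ratios are equal, and the slope is that of the line from the origin to the point. All names are hypothetical.
</p>

```python
import math
from typing import Dict, List


def compare_to_reference(sample: Dict[str, float],
                         reference: Dict[str, float]) -> List[dict]:
    """Build one row per junction present in both the sample and the reference."""
    rows = []
    for junction in sorted(sample.keys() & reference.keys()):
        ref, obs = reference[junction], sample[junction]
        rows.append({
            "junction": junction,
            "reference_ratio": ref,
            "sample_ratio": obs,
            # Assumption: signed distance to the line sample_ratio = reference_ratio;
            # positive means over-represented in the sample.
            "distance": (obs - ref) / math.sqrt(2),
            # Assumption: slope of the line from the origin to the point.
            "slope": obs / ref if ref > 0 else math.inf,
        })
    return rows


sample = {"j1": 0.8, "j2": 0.5}
reference = {"j1": 0.4, "j2": 0.5, "j3": 0.9}
for row in compare_to_reference(sample, reference):
    print(row)
```

<p>
  Rows of this kind can then be passed to a plotting library, for example as the x and y values of a scatter plot with the junction name as hover text.
</p>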
 
 
<h3>2.5. Analysing the results</h3>
 
  
 
</div>
 
</div>
 +
<div class="clear"></div>
  
 
<div class="column full_size">
 
<div class="column full_size">
 
+
<h5> Inspiration </h5>
<h1>Parts</h1>
+
 
+
<p>Each team will make new parts during iGEM and will submit them to the Registry of Standard Biological Parts. The iGEM software provides an easy way to present the parts your team has created. The <code>&lt;groupparts&gt;</code> tag (see below) will generate a table with all of the parts that your team adds to your team sandbox.</p>
+
<p>Remember that the goal of proper part documentation is to describe and define a part, so that it can be used without needing to refer to the primary literature. Registry users in future years should be able to read your documentation and be able to use the part successfully. Also, you should provide proper references to acknowledge previous authors and to provide for users who wish to know more.</p>
+
</div>
+
 
+
<div class="column half_size">
+
<div class="highlight">
+
<h5>Note</h5>
+
<p>Note that parts must be documented on the <a href="http://parts.igem.org/Main_Page"> Registry</a>. This page serves to <i>showcase</i> the parts you have made. Future teams and other users are much more likely to find parts by looking in the Registry than by looking at your team wiki.</p>
+
</div>
+
</div>
+
 
+
 
+
 
+
 
+
<div class="column half_size">
+
 
+
<h5>Adding parts to the registry</h5>
+
<p>You can add parts to the Registry at our <a href="http://parts.igem.org/Add_a_Part_to_the_Registry">Add a Part to the Registry</a> link.</p>
+
<p>We encourage teams to start completing documentation for their parts on the Registry as soon as you have it available. The sooner you put up your parts, the better you will remember all the details about your parts. Remember, you don't need to send us the DNA sample before you create an entry for a part on the Registry. (However, you <b>do</b> need to send us the DNA sample before the Jamboree. If you don't send us a DNA sample of a part, that part will not be eligible for awards and medal criteria.)</p>
+
</div>
+
 
+
 
+
 
+
 
+
 
+
<div class="column half_size">
+
 
+
<h5>What information do I need to start putting my parts on the Registry?</h5>
+
<p>The information needed to initially create a part on the Registry is:</p>
+
<ul>
+
<li>Part Name</li>
+
<li>Part type</li>
+
<li>Creator</li>
+
<li>Sequence</li>
+
<li>Short Description (60 characters on what the DNA does)</li>
+
<li>Long Description (Longer description of what the DNA does)</li>
+
<li>Design considerations</li>
+
</ul>
+
 
+
 
<p>
 
<p>
We encourage you to put up <em>much more</em> information as you gather it over the summer. If you have images, plots, characterization data and other information, please also put it up on the part page. </p>
+
Here are a few examples from previous teams:
 
+
</p>
</div>
+
 
+
 
+
<div class="column half_size">
+
 
+
<h5>Inspiration</h5>
+
<p>We have created a <a href="http://parts.igem.org/Well_Documented_Parts">collection of well documented parts</a> that can help you get started.</p>
+
 
+
<p> You can also take a look at how other teams have documented their parts in their wiki:</p>
+
 
<ul>
 
<ul>
<li><a href="https://2014.igem.org/Team:MIT/Parts"> 2014 MIT </a></li>
+
<li><a href="https://2016.igem.org/Team:Manchester/Model">Manchester 2016</a></li>
<li><a href="https://2014.igem.org/Team:Heidelberg/Parts"> 2014 Heidelberg</a></li>
+
<li><a href="https://2016.igem.org/Team:TU_Delft/Model">TU Delft 2016</a></li>
<li><a href="https://2014.igem.org/Team:Tokyo_Tech/Parts">2014 Tokyo Tech</a></li>
+
<li><a href="https://2014.igem.org/Team:ETH_Zurich/modeling/overview">ETH Zurich 2014</a></li>
 +
<li><a href="https://2014.igem.org/Team:Waterloo/Math_Book">Waterloo 2014</a></li>
 
</ul>
 
</ul>
</div>
 
  
<div class="column full_size">
 
<h5>Part Table </h5>
 
  
<p>Please include a table of all the parts your team has made during your project on this page. Remember part characterization and measurement data must go on your team part pages on the Registry. </p>
 
 
<div class="highlight">
 
 
 
</html>
 
<groupparts>iGEM17 Bordeaux</groupparts>
 
 
<html>
 
 
</div>
 
</div>
</div>
 
 
 
 
  
 
</html>
 
</html>

Latest revision as of 22:44, 26 October 2017
