... | ... | @@ -44,6 +44,35 @@ Reads are removed if one of the following cases occur: |

- if aligned at too many positions. This should not happen, given the arguments passed to bwa.

Running this script creates multiple files:

<table>
<tr>
<td>
<code>
TestProject.all.log.dat
TestProject.all.refseq.total.dat
TestProject.all.refseq.umi.dat
TestProject.all.spike.total.dat
TestProject.all.spike.umi.dat
TestProject.all.unknown_list
TestProject.all.well_summary.dat
TestProject.all.well_summary.pdf
</code>
</td><td>
<code>
TestProject.unq.log.dat
TestProject.unq.refseq.total.dat
TestProject.unq.refseq.umi.dat
TestProject.unq.spike.total.dat
TestProject.unq.spike.umi.dat
TestProject.unq.unknown_list
TestProject.unq.well_summary.dat
TestProject.unq.well_summary.pdf
</code>
</td>
</tr>
</table>

File names start with the name of the project.
The second part is either "all" or "unq":
- "all" includes reads that aligned on multiple genes. Each such read is counted for the gene reported as the primary alignment in the BAM files. The choice of primary alignment is arbitrary: nothing but chance favors that transcript, and the secondary alignments are all viable solutions.
... | ... | @@ -60,11 +89,64 @@ The third part of the name defines the informations to be found: |
- "total" is the total number of reads, without taking the UMIs into account. These counts can be artificially inflated by the PCR steps of the technique.
- "umi" is the number of unique RNA molecules, taking the UMI of each read into account. If two reads from the same sample map to the same gene and have the same UMI, they are counted only once.

**Thus, the expression matrix usually used for secondary analysis is "XXX.unq.refseq.umi.dat"**

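
The difference between the "total" and "umi" counts can be illustrated with a minimal Python sketch. The `(sample, gene, umi)` tuples and the `count_reads` helper are hypothetical and only illustrate the counting rule described above; this is not the pipeline's actual code.

```python
from collections import defaultdict


def count_reads(reads):
    """Return (total, umi) counts keyed by (sample, gene).

    `reads` is an iterable of (sample, gene, umi) tuples.
    This input format is an assumption made for the example.
    """
    total = defaultdict(int)
    seen_umis = defaultdict(set)
    for sample, gene, umi in reads:
        total[(sample, gene)] += 1          # every read counts
        seen_umis[(sample, gene)].add(umi)  # identical UMIs collapse
    umi_counts = {key: len(umis) for key, umis in seen_umis.items()}
    return dict(total), umi_counts


reads = [
    ("sample1", "ANXA1", "AAGTC"),
    ("sample1", "ANXA1", "AAGTC"),  # same sample, gene and UMI: PCR duplicate
    ("sample1", "ANXA1", "CCTGA"),
    ("sample2", "ANXA1", "AAGTC"),  # same UMI but another sample: kept
]
total, umi = count_reads(reads)
print(total[("sample1", "ANXA1")], umi[("sample1", "ANXA1")])
```

For `sample1`, the three reads on ANXA1 give a "total" of 3 but a "umi" count of 2, because two of them share the same UMI.
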
### Explanation of the well_summary file

Example of **unq.well_summary.dat**:
```
                sample1  sample2  sample3  sample4  sample5  sample6
Assigned        189919   216155   200135   91456    223437   132242
Aligned         189716   215980   199942   91374    223260   132137
Refseq_Total    188798   215611   198770   91167    222835   131923
Refseq_UMI      95060    102253   92633    47353    105240   72743
Mito_Total      0        0        0        0        0        0
Mito_UMI        0        0        0        0        0        0
Genes_Detected  383      371      378      342      366      350
Spike_Total     0        0        0        0        0        0
Spike_UMI       0        0        0        0        0        0
```

| Row | Description |
|:---|:---|
| Assigned | number of reads with the sample barcode |
| Aligned | number of reads aligned on a sequence of the reference used. The reference contains the sequences of the refseq transcripts, the genomic sequence of the mitochondrial chromosome and the spike-in sequences |
| Refseq_Total | number of reads aligned on refseq transcripts only. Reads aligned on the mitochondrial chromosome and reads aligned on refseq sequences that have no annotation (i.e. not attributed to a gene) are not counted |
| Refseq_UMI | number of unique mRNA molecules, counted on the same alignments as Refseq_Total |
| Mito_Total | number of reads aligned on chromosome M |
| Mito_UMI | number of unique mRNA molecules aligned on chromosome M |
| Genes_Detected | number of genes that have at least 1 count |
| Spike_Total | number of reads aligned on spike-in sequences |
| Spike_UMI | number of unique mRNA molecules aligned on spike-in sequences |

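
As a rough illustration of how these rows relate to each other, the sketch below parses a well_summary table and derives two ratios per sample. It assumes the file is whitespace-delimited with a header row that lists only the sample names; `parse_well_summary` and `ratio` are hypothetical helpers, and the two ratios are illustrative quality indicators, not outputs of the pipeline.

```python
def parse_well_summary(text):
    """Parse a well_summary table into {row_name: {sample: value}}.

    Assumes whitespace-delimited columns and a header row holding
    only the sample names (no label above the row-name column).
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    samples = rows[0]
    table = {}
    for name, *values in rows[1:]:
        table[name] = dict(zip(samples, map(int, values)))
    return samples, table


def ratio(numerator, denominator):
    """Divide, returning NaN instead of failing on an empty sample."""
    return numerator / denominator if denominator else float("nan")


# Two samples and four rows copied from the example above.
EXAMPLE = """
             sample1 sample2
Assigned     189919  216155
Aligned      189716  215980
Refseq_Total 188798  215611
Refseq_UMI   95060   102253
"""

samples, summary = parse_well_summary(EXAMPLE)
for sample in samples:
    aligned_fraction = ratio(summary["Aligned"][sample], summary["Assigned"][sample])
    reads_per_umi = ratio(summary["Refseq_Total"][sample], summary["Refseq_UMI"][sample])
    print(f"{sample}: aligned={aligned_fraction:.4f} reads_per_umi={reads_per_umi:.2f}")
```

A reads-per-UMI ratio well above 1 suggests that many reads are PCR copies of the same molecule, which is why the "umi" counts are preferred over the "total" counts.
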
Example of **unq.refseq.umi.dat**:
```
           sample1  sample2  sample3  sample4  sample5  sample6
ABCB1      26       1        27       6        2        0
ABCB10     2        5        4        2        7        4
ABCD2      4        7        6        7        11       8
ACTN1      3        64       1        42       77       36
ACTN1-AS1  0        0        0        0        0        0
ADD3       101      162      104      92       263      165
ADD3-AS1   0        0        0        0        0        0
ADGRG1     17       0        28       0        2        0
ADPRM      10       52       4        37       57       67
ADTRP      4        5        0        15       92       67
AHNAK      168      21       203      16       40       29
AIF1       12       301      5        107      134      108
ALOX5AP    257      76       349      81       113      67
ANKRD55    1        1        0        0        36       12
ANXA1      878      150      1076     109      308      313
ANXA11     16       28       20       21       33       10
ANXA2      199      12       225      2        16       4
ANXA2R     100      153      81       91       179      148
AP3M1      0        0        0        0        0        0
```

![expression matrix](../images/refseqUmi.png "Example of the expression matrix unq.refseq.umi.dat")

Note that this matrix is not normalized.

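
Because the matrix holds raw counts, a downstream step usually has to scale it. The sketch below is one possible way to load such a matrix, recount the detected genes (at least 1 count) and compute counts per million (CPM). CPM is only a common choice assumed here for illustration, not a method prescribed by this pipeline, and `parse_matrix`, `genes_detected` and `counts_per_million` are hypothetical helpers.

```python
def parse_matrix(text):
    """Parse a whitespace-delimited count matrix into (samples, {gene: [counts]})."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    samples = rows[0]
    counts = {gene: [int(v) for v in values] for gene, *values in rows[1:]}
    return samples, counts


def genes_detected(samples, counts):
    """Number of genes with at least 1 count, per sample."""
    return {
        sample: sum(1 for values in counts.values() if values[i] >= 1)
        for i, sample in enumerate(samples)
    }


def counts_per_million(samples, counts):
    """Scale each column so that it sums to one million (0.0 for empty columns)."""
    totals = [sum(values[i] for values in counts.values()) for i in range(len(samples))]
    return {
        gene: [v * 1e6 / t if t else 0.0 for v, t in zip(values, totals)]
        for gene, values in counts.items()
    }


# Four genes and three samples taken from the example matrix above.
MATRIX = """
          sample1 sample2 sample3
ABCB1     26      1       27
ACTN1-AS1 0       0       0
ADGRG1    17      0       28
ANXA1     878     150     1076
"""

samples, counts = parse_matrix(MATRIX)
detected = genes_detected(samples, counts)
cpm = counts_per_million(samples, counts)
print(detected)
```

With only four genes the CPM values themselves are not meaningful; on a real matrix the column totals would be computed over all genes.
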
All files produced by this script (expression matrices, summaries, etc.) are stored in the **EXPRESSION** folder.

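
The naming scheme makes it straightforward to locate a given output. The helper below is a hypothetical sketch that builds the path of an expression matrix inside the **EXPRESSION** folder. The parameter names (`mapping`, `reference`, `count`) are assumptions introduced for the example; the allowed values are those visible in the file list above.

```python
from pathlib import Path


def expression_file(project, mapping="unq", reference="refseq", count="umi",
                    folder="EXPRESSION"):
    """Build the path of an expression matrix from the parts of its name.

    Defaults point to the matrix usually used for secondary analysis.
    """
    allowed = {
        "mapping": (mapping, {"all", "unq"}),
        "reference": (reference, {"refseq", "spike"}),
        "count": (count, {"total", "umi"}),
    }
    for name, (value, choices) in allowed.items():
        if value not in choices:
            raise ValueError(f"{name} must be one of {sorted(choices)}, got {value!r}")
    return Path(folder) / f"{project}.{mapping}.{reference}.{count}.dat"


print(expression_file("TestProject"))
print(expression_file("TestProject", mapping="all", reference="spike", count="total"))
```

Called with the project name alone, the helper points to `TestProject.unq.refseq.umi.dat`, the matrix recommended above for secondary analysis.
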
<div style="text-align: right">
<i>Next: <a href="analysis/secondary_analysis">Secondary analysis</a></i>
... | ... | |