Eric CHARPENTIER · 7df1402e
--- a/analysis/primary_analysis.md
+++ b/analysis/primary_analysis.md
@@ -44,6 +44,35 @@ Reads are removed if one of the following cases occur:
 - if aligned at too many position. This shouldn't happen since the use of the arguments in bwa.

 Multiple files are created from the execution of this script.  
+
+<table>
+<tr>
+<td>
+<code>
+TestProject.all.log.dat
+TestProject.all.refseq.total.dat
+TestProject.all.refseq.umi.dat
+TestProject.all.spike.total.dat
+TestProject.all.spike.umi.dat
+TestProject.all.unknown_list
+TestProject.all.well_summary.dat
+TestProject.all.well_summary.pdf
+</code>
+</td><td>
+<code>
+TestProject.unq.log.dat
+TestProject.unq.refseq.total.dat
+TestProject.unq.refseq.umi.dat
+TestProject.unq.spike.total.dat
+TestProject.unq.spike.umi.dat
+TestProject.unq.unknown_list
+TestProject.unq.well_summary.dat
+TestProject.unq.well_summary.pdf
+</code>
+</td>
+</tr>
+</table>
+
 The name of the files start with the name of the project.  
 The second part is either "all" or "unq":
 - "all" counts reads that have aligned on multiple genes. The read count is assigned to the gene defined as the primary alignement in the bam files. There is no other reason than random to assign the read to a particular transcript as primary alignment even though the secondary alignments are all viable solutions.
@@ -60,11 +89,64 @@ The third part of the name defines the informations to be found:
 - "total" is the number of total reads of the counts without taking into account the UMIs. Counts can be artificially increased by the PCR steps of the technique.
 - "umi" is the number of unique molecules of RNA, this taking into account the UMIs for each read. If two reads of the same sample map to the same gene and have the same UMI, it will only be counted for 1.

- **Thus, the expression matrix usually used for secondary analysis is "XXX.unq.refseq.umi.dat"**
+### Explanation of the well_summary file
+
+Example of **unq.well_summary.dat**
+```
+                sample1  sample2  sample3  sample4  sample5  sample6
+Assigned        189919   216155   200135   91456    223437   132242
+Aligned         189716   215980   199942   91374    223260   132137
+Refseq_Total    188798   215611   198770   91167    222835   131923
+Refseq_UMI      95060    102253   92633    47353    105240   72743
+Mito_Total      0        0        0        0        0        0
+Mito_UMI        0        0        0        0        0        0
+Genes_Detected  383      371      378      342      366      350
+Spike_Total     0        0        0        0        0        0
+Spike_UMI       0        0        0        0        0        0
+```
+
+|||
+|:---|:---|
+| Assigned | number of reads carrying the sample barcode |
+| Aligned | number of reads aligned to a sequence of the reference used. The reference contains the sequences of the refseq transcripts, the genomic sequence of the mitochondrial chromosome and the spike-in sequences |
+| Refseq_Total | number of reads aligned to refseq transcripts only. Reads aligned to the mitochondrial chromosome and reads aligned to refseq sequences that have no annotation (i.e. not attributed to a gene) are not counted |
+| Refseq_UMI | number of unique mRNA molecules among the reads counted in Refseq_Total |
+| Mito_Total | number of reads aligned to chromosome M |
+| Mito_UMI | number of unique mRNA molecules aligned to chromosome M |
+| Genes_Detected | number of genes with at least 1 count |
+| Spike_Total | number of reads aligned to spike-in sequences |
+| Spike_UMI | number of unique mRNA molecules aligned to spike-in sequences |
+
+
+ **The expression matrix usually used for secondary analysis is "XXX.unq.refseq.umi.dat"**
+
+Example of **unq.refseq.umi.dat**
+```
+           sample1  sample2  sample3  sample4  sample5  sample6
+ABCB1      26       1        27       6        2        0
+ABCB10     2        5        4        2        7        4
+ABCD2      4        7        6        7        11       8
+ACTN1      3        64       1        42       77       36
+ACTN1-AS1  0        0        0        0        0        0
+ADD3       101      162      104      92       263      165
+ADD3-AS1   0        0        0        0        0        0
+ADGRG1     17       0        28       0        2        0
+ADPRM      10       52       4        37       57       67
+ADTRP      4        5        0        15       92       67
+AHNAK      168      21       203      16       40       29
+AIF1       12       301      5        107      134      108
+ALOX5AP    257      76       349      81       113      67
+ANKRD55    1        1        0        0        36       12
+ANXA1      878      150      1076     109      308      313
+ANXA11     16       28       20       21       33       10
+ANXA2      199      12       225      2        16       4
+ANXA2R     100      153      81       91       179      148
+AP3M1      0        0        0        0        0        0
+```

- ![expression matrix](../images/refseqUmi.png "Example of the expression matrix unq.refseq.umi.dat")
+Note that this matrix is not normalized.  

-All files resulting from this script, expression matrix, summaries etc... are stored in the **EXPRESSION** folder.
+All files produced by this script (expression matrices, summaries, etc.) are stored in the **EXPRESSION** folder.

 <div style="text-align: right">
 <i>Next: <a href="analysis/secondary_analysis">Secondary analysis</a></i>