Eric CHARPENTIER · 82c2ffee
--- a/usage/run_pipeline.md
+++ b/usage/run_pipeline.md
 [*Back to home*](Home)  
-*Previous: [The inputs](usage/inputs)*
+*Previous: [The input files](usage/inputs)*

 # Running the pipeline

-## The configuration file
+## Creating the configuration file

-The first task before running the pipeline is to create a **configuration file** in the json format. This task can be performed with the help of the python script in "SCRIPTS/make_srp_config.py"
+The first task before running the pipeline is to create a **configuration file** in JSON format. The Python script `SCRIPTS/make_srp_config.py` generates it, assuming you have activated your conda virtual environment with either `conda activate srp` or `source activate srp`.

+You can display the script's help with:
+
+```
+$ python SCRIPTS/make_srp_config.py -h
+
+usage: make_srp_config.py [-h] -s FILE [-w DIR] [-i DIR] [-f FILE] [-r DIR]
+                          [-c FILE] [--minGenes N] [--minReads N]
+                          [--minLogFC N]
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -s FILE, --samplesheet FILE
+                        Tab delimited file with no header describing samples.
+                        Columns must be: "well index name project condition
+                        species". Only characters "A-Z","0-9","-" and "_"
+                        allowed. All columns are mandatory. (REQUIRED)
+  -w DIR, --workdir DIR
+                        Analysis working directory. Default: current directory
+  -i DIR, --illumina-dir DIR
+                        Directory containing the fastq input files generated
+                        by Illumina. The fastq files should be in paired-end
+                        mode. If your fastq files are not coming from an
+                        Illumina sequencer, please use option -f to specify a
+                        file listing the fastq input files. (REQUIRED if no
+                        "-f")
+  -f FILE, --fastq-file FILE
+                        File describing the fastq input file. This file should
+                        be tab delimited. First column: full path of Forward
+                        file; second column: full path of Reverse file. The
+                        fastq files should be in paired-end mode. If your
+                        fastq files were generated by an Illumina sequencer,
+                        you can use option "-i" to specify the directory
+                        containing the fastq input files. (REQUIRED if no
+                        "-i")
+  -r DIR, --reference-dir DIR
+                        Directory containing the reference files. It is
+                        recommended that you use this option if you have
+                        already used this pipeline and downloaded genome
+                        files.
+  -c FILE, --conditions FILE
+                        Tab delimited file with no headers indicating which
+                        conditions to compare during differential expression
+                        analysis. Columns must be "project condition1
+                        condition2". If not specified, only primary analysis
+                        will be performed
+  --minGenes N          Minimum genes detected necessary for a sample to pass
+                        the filtering step in secondary analysis. (Default
+                        5000)
+  --minReads N          Minimum reads assigned necessary for a sample to pass
+                        the filtering step in secondary analysis. (Default
+                        200000)
+  --minLogFC N          Minimum log Fold-Change threshold for differentially
+                        expessed gene. (Default 0.58 (1.5 FC))
+```
+
+> **Note:**
+
+> - The `-i` argument is kept for legacy reasons. If you have split your raw fastq files as described in *[the input files](usage/inputs#fastqFile)* page, use the `-f` argument to specify the path to the manifest listing the split fastq pairs.
+> - The `-c` argument is optional. It triggers the secondary analysis steps. See *[the input files](usage/inputs#compFile)* for more explanations on the comparisons file.
+> - You can specify the path of an already existing reference folder with the `-r` argument. If you do so, the already downloaded reference transcriptome will be used. If the assemblies specified in the samplesheet do not exist under this folder, they will be downloaded.
+
+The program writes the config file to stdout. You can first run the command on its own to check that everything is all right, then run it again with the output redirected to a file.
+
+```
+python SCRIPTS/make_srp_config.py -s <my_samplesheet> -r <path_to_reference_folder> -w <path_to_workdir> -f <path_to_manifest> > config.json
+```
+
+If you want secondary analysis to be performed, use option `-c` to specify the comparisons.
+
+```
+python SCRIPTS/make_srp_config.py -s <my_samplesheet> -r <path_to_reference_folder> -w <path_to_workdir> -f <path_to_manifest> -c <comparisons_file> > config.json
+```
+
+In either case, check the generated configuration file to make sure everything looks correct.
+```
+more config.json
+```
+
+## Launch the snakemake pipeline
+
+Test the launch with a dry run:
+```
+snakemake -nrp --config conf="config.json"
+```
+If you see the rules and commands that will be run, everything's fine.
+
+Launch the run on a personal computer:
+```
+snakemake -rp --config conf="config.json" -j 2
+```
+
+> **Note:**
+
+> - You can specify the number of jobs with `-j <N>`.
+> - :warning: Even if you don't specify multiple jobs, two scripts in the pipeline are still parallelized, which means they can crash a personal computer. The pipeline was built to run on an HPC cluster.
+
+### Running on a cluster 
+
+To launch the pipeline on a cluster, you have to specify a jobscript that wraps the jobs for snakemake.
+Example for SGE:
+```
+snakemake -rp --config conf="config.json" --cluster "qsub -e ./logs/ -o ./logs/" -j 33 --jobscript SCRIPTS/sge.sh --latency-wait 100
+```
+
+> **Note:**
+
+> - The path to the log output files must **exist** (`$ mkdir ./logs`).
+
+In the command above, `SCRIPTS/sge.sh` is a wrapper for the SGE jobs:
+
+```bash
+$ cat SCRIPTS/sge.sh 
+#$ -cwd
+#$ -V
+
+{exec_job}
+```
+
+If your cluster uses Slurm (or another scheduler), please refer to the snakemake documentation on [executing snakemake](https://snakemake.readthedocs.io/en/v5.1.4/executable.html).

 <div style="text-align: right">
 <i>Next: <a href="analysis/primary_analysis">The primary analysis</a></i>