# HiveAlign

This tool visualizes a graph alignment as a [hive plot](http://www.hiveplot.com/) and lets you explore it interactively, thanks to the [d3js hiveplot plugin](https://bost.ocks.org/mike/hive/).

If you don't already have an alignment file, the package ships with the [L-GRAAL aligner](http://www0.cs.ucl.ac.uk/staff/natasa/L-GRAAL/index.html) and two scripts, one to prepare the alignment and one to run it.

## Getting started

These instructions will get you a copy of the project up and running on your machine.

### Prerequisites

- [Git](https://git-scm.com/downloads)
- [Miniconda3](https://conda.io/miniconda.html)

### Installing

All you need to do is clone the repository and create the conda environment.

~~~
git clone https://gitlab.univ-nantes.fr/erwan.delage/HiveAlign.git
cd HiveAlign
conda env create -f conda/hivealign.yml
~~~

## Running the tests

The workflow comes with a toy dataset for testing purposes. The data were taken from the [HMP](https://hmpdacc.org/hmp/HMQCP/#protocols) and processed to produce a small, understandable example comparing microbiomes from two body sites.

To try the tool, run the following two commands.

~~~
source activate HiveAlign
./test/HMP/run_HMP_test.sh
~~~

The results are written to the **out** directory.

## Running for real

### Optional step

*Only if you don't already have an alignment*

**NB: L-GRAAL comes as a binary; as far as we know, it only works on Linux systems.**

##### prepareLgraal.py

To produce an alignment with [L-GRAAL](http://www0.cs.ucl.ac.uk/staff/natasa/L-GRAAL/index.html), the graphs first need to be converted to the [LEDA](http://www.algorithmic-solutions.info/leda_guide/graphs/leda_native_graph_fileformat.html) format.

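
To give an idea of what the conversion involves, here is a minimal sketch of a LEDA serializer. It is not the code of `prepareLgraal.py`: the function name `to_leda` is hypothetical, and the header values (`string` node labels, `void` edge labels, `-2` for an undirected graph) are assumptions that should be checked against what L-GRAAL actually expects.

```python
def to_leda(nodes, edges):
    """Serialize an undirected graph as LEDA.GRAPH text (illustrative sketch).

    nodes: iterable of node ids (strings)
    edges: iterable of (source_id, target_id) pairs
    """
    nodes = list(nodes)
    edges = list(edges)
    # LEDA refers to nodes by their 1-based position in the node section.
    index = {node: i for i, node in enumerate(nodes, start=1)}

    # Assumed header: node label type, edge label type, -2 = undirected.
    lines = ["LEDA.GRAPH", "string", "void", "-2"]

    # Node section: node count, then one labelled line per node.
    lines.append(str(len(nodes)))
    lines.extend("|{%s}|" % node for node in nodes)

    # Edge section: edge count, then "source target reversal |{label}|".
    lines.append(str(len(edges)))
    for source, target in edges:
        lines.append("%d %d 0 |{}|" % (index[source], index[target]))

    return "\n".join(lines) + "\n"
```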
L-GRAAL is designed to find the best alignment according to two criteria:

- topology, by comparing the graphlet signatures of the nodes
- sequences, by comparing the DNA sequences of the nodes (typically by evaluating [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) bitscores)

In theory, L-GRAAL could be given scores other than sequence comparisons (such as phylogenetic distances for OTUs), but our script is not designed for those scenarios.

~~~
python prepareLgraal.py first_graph second_graph first_graph_sequences.fasta second_graph_sequences.fasta --out L-GRAAL-input
~~~

Graphs must be in either GRAPHML (highly recommended) or semicolon-separated CSV format.

**FASTA sequence identifiers must exactly match the node ids.**

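
Because mismatched identifiers are easy to miss, a quick check before running the script can help. The sketch below is not part of the package. It assumes GraphML input in the standard GraphML namespace, and it assumes that the FASTA identifier is the header text up to the first whitespace; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard GraphML namespace (assumed to be the one used by the input files).
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"


def graphml_node_ids(graphml_text):
    """Return the set of node ids declared in a GraphML document."""
    root = ET.fromstring(graphml_text)
    return {node.get("id") for node in root.iter(GRAPHML_NS + "node")}


def fasta_ids(fasta_text):
    """Return the set of FASTA identifiers (header up to the first whitespace)."""
    ids = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def id_mismatches(graphml_text, fasta_text):
    """Return (nodes without a sequence, sequences without a node), both sorted."""
    nodes = graphml_node_ids(graphml_text)
    sequences = fasta_ids(fasta_text)
    return sorted(nodes - sequences), sorted(sequences - nodes)
```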
This script produces two LEDA files, one per graph, and a *.eval* file listing the best BLAST bitscores in a tab-separated format.

##### execute_l-graal.py

This script is just a Python wrapper for running L-GRAAL.

~~~
python execute_l-graal.py L-GRAAL-input/first_graph.leda L-GRAAL-input/second_graph.leda L-GRAAL-input/blast.eval --alpha 1.1 --out L-GRAAL-output
~~~

The first three parameters are the files produced by the previous step.

The alpha parameter controls the balance between the topological and sequence scores. If alpha is set to 0, the alignment is built from topological information only; if it is set to 1, from sequence information only. Any value between 0 and 1 can be given, reflecting the tradeoff between the two criteria. See the [L-GRAAL paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4481854/) for more information.

> **Note:**
>
> - If 1.1 is given as the alpha value, a series of alignments is computed, with alpha going from 0 to 1 in steps of 0.1.

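
To make the role of alpha concrete, the sketch below blends two scores with a simple convex combination that follows the convention described above (0 is topology only, 1 is sequence only). It is an illustration only: `combined_score` is a hypothetical helper, and the objective that L-GRAAL actually optimizes is the one defined in the paper, not this formula.

```python
def combined_score(topology_score, sequence_score, alpha):
    """Blend two similarity scores; alpha=0 keeps topology, alpha=1 keeps sequence.

    Illustrative convex combination, not L-GRAAL's actual objective.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    return (1.0 - alpha) * topology_score + alpha * sequence_score
```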
The output is either a single alignment file (tab-separated) or, if alpha equals 1.1, one alignment file per alpha value.

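
If you want to inspect an alignment yourself, a tab-separated file can be loaded with the standard `csv` module. The sketch below assumes two columns per line, a node of the first graph followed by its matched node in the second graph. That column layout is an assumption made for illustration, and `read_alignment` is a hypothetical helper.

```python
import csv


def read_alignment(handle):
    """Read a tab-separated alignment into a dict {first_graph_node: second_graph_node}.

    Assumes two columns per line; lines with fewer fields are skipped.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        mapping[row[0]] = row[1]
    return mapping
```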
### HiveAlign

~~~
python HiveAlign.py first_graph second_graph L-GRAAL-output --out hiveplots
~~~

Graphs must be in GRAPHML (highly recommended), GEXF, or semicolon-separated CSV format.

The third parameter is either an alignment file or, if L-GRAAL was run with alpha=1.1, a directory containing several alignment files.

The output is the hive plot as an HTML file.

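
The "file or directory" behaviour of the third parameter can be pictured with the short sketch below. It is not taken from `HiveAlign.py`: `alignment_files` is a hypothetical helper, and treating every regular file in the directory as an alignment is an assumption.

```python
from pathlib import Path


def alignment_files(path):
    """Return the alignment files designated by `path`.

    A single file yields a one-element list; a directory yields all the
    regular files it contains, sorted by name.
    """
    path = Path(path)
    if path.is_dir():
        return sorted(p for p in path.iterdir() if p.is_file())
    if path.is_file():
        return [path]
    raise FileNotFoundError(path)
```

With the command shown above, `alignment_files("L-GRAAL-output")` would list every alignment produced by an alpha=1.1 run, while passing a single file returns just that file.
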
## Interpreting the hiveplot

See the [interpretation guide](Interpretation).

## References

This README file was created according to the [following template](https://gist.github.com/PurpleBooth/109311bb0361f32d87a2).