Catalog of tools installed on the Migale platform
The list of R packages installed on the Migale platform is available here.
jellyfish (version 1.1.3 - 2011-12-21)
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism. JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation below for more details.
Documentation : http://genome.jouy.inra.fr/doc/genome/NGS/jellyfish-1.1.3/
Note : If you use JELLYFISH in your research, please cite: Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 (first published online January 7, 2011) doi:10.1093/bioinformatics/btr011
Usage : #jellyfish
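As a rough illustration of what k-mer counting means, the minimal Python sketch below counts every substring of length k in a single sequence. It is not how JELLYFISH works internally (JELLYFISH relies on a compactly encoded hash table updated in parallel); the function name count_kmers and the example sequence are hypothetical and chosen only for illustration.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every substring of length k in a DNA sequence."""
    seq = sequence.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip windows that contain ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


# Hypothetical example sequence, for illustration only.
counts = count_kmers("ACGTACGTAC", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
# prints:
# ACG 2
# CGT 2
# GTA 2
# TAC 2
```

A plain dictionary like this becomes memory-hungry on real sequencing data, which is the problem a dedicated counter such as JELLYFISH addresses.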
Julia (version 0.5.0 - 2017-02-08)
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It is somewhat similar to R, Matlab or Python, but with performance approaching that of C/Fortran.
Download : http://julialang.org/
Documentation : http://docs.julialang.org/en/stable/#manual
Usage : #julia
kaiju (version 1.6.0 - 2018-01-16)
Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.
Note : Citation: Menzel P., Ng K.L., Krogh A. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257
Usage : #kaiju -t nodes.dmp -f kaiju_db.fmi -i reads.fastq [-j reads2.fastq]
kraken (version 0.10.5 - 2015-11-25)
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
Documentation : http://ccb.jhu.edu/software/kraken/MANUAL.html
Note : If you use Kraken in your research, please cite our paper; the citation is available on the Kraken website.
Usage : #kraken [options]
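To give an idea of what classifying reads by exact k-mer matches can look like, here is a deliberately simplified Python sketch. It is not Kraken's algorithm: Kraken works on a taxonomy tree, whereas this toy version only takes a majority vote over k-mers that are unique to one label. The names build_index and classify, the labels taxon_A and taxon_B, and all sequences are hypothetical.

```python
from collections import Counter


def kmers(sequence: str, k: int) -> list:
    """Return all substrings of length k, in order of appearance."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references: dict, k: int) -> dict:
    """Map each k-mer to the set of labels whose reference contains it."""
    index = {}
    for label, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(label)
    return index


def classify(read: str, index: dict, k: int):
    """Return the label with the most exact k-mer hits, or None if no hit."""
    votes = Counter()
    for kmer in kmers(read, k):
        labels = index.get(kmer, ())
        # Simplification: only k-mers unique to one label count as a vote.
        if len(labels) == 1:
            votes[next(iter(labels))] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy references, for illustration only.
references = {"taxon_A": "ACGTACGTTTGACC", "taxon_B": "GGGCCCATATATGC"}
index = build_index(references, k=5)
print(classify("CGTTTGAC", index, k=5))
print(classify("TTTTTTTT", index, k=5))
```

A read with no matching k-mer is left unclassified (None) rather than forced onto a label.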
kraken2 (version 2.0 - 2018-09-06)
The second version of the Kraken taxonomic sequence classification system
Documentation : http://ccb.jhu.edu/software/kraken/
Usage : #kraken2 [options]
kSNP (version 2.1.2 - 2014-04-24)
Identify SNPs in a set of genome sequences without the requirement of a reference sequence or a multiple sequence alignment. Reconstruct SNP-based phylogenies by maximum likelihood.
Download : http://sourceforge.net/projects/ksnp/
Usage : #kSNP -k kmer_length -f fasta -d output_directory [-p genomes4positions_list] [-u unassembled_genomes_list] [-m minimum_fraction_genomes_with_locus] [-G genbank.gbk] [-n num_CPU] [-j ] [-v ] [-c min_kmer_coverage]
LoRDEC (version 0.7 - 2017-11-03)
LoRDEC: a hybrid error correction program for long PacBio reads. LoRDEC corrects sequencing errors in long reads from third-generation sequencing with a high error rate, and is especially intended for PacBio reads. It uses a hybrid strategy, meaning that it uses two sets of reads: the reference read set, whose error rate is assumed to be small, and the PacBio read set, which is then corrected using the reference set. Typically, the reference set contains Illumina reads.
Note : L. Salmela and E. Rivals. LoRDEC: accurate and efficient long read error correction. Bioinformatics 30(24):3506-3514, 2014. doi: 10.1093/bioinformatics/btu538.
Usage : #lordec-correct [-t ] [-b ] [-e ] [-T ] [-S ] [-c] -i -2 -k -o
macsyfinder (version 1.5.0 - 2018-03-23)
Detection of macromolecular systems in protein datasets using systems modelling and similarity search.
Download : https://github.com/gem-pasteur/macsyfinder
Usage : #macsyfinder_env
mafft (version 7.310 - 2017-08-28)
MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods.
Usage : #mafft [options] input > output
Mash (version 2.0 - 2018-01-29)
Fast genome and metagenome distance estimation using MinHash
Download : https://github.com/marbl/Mash
Documentation : https://github.com/marbl/Mash
Note : Mash: fast genome and metagenome distance estimation using MinHash. Ondov BD, Treangen TJ, Melsted P, Mallonee AB, Bergman NH, Koren S, Phillippy AM. Genome Biol. 2016 Jun 20;17(1):132. doi: 10.1186/s13059-016-0997-x.
Usage : #mash [options] [arguments ...]
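The idea behind MinHash-based distance estimation can be sketched in a few lines of Python. This is an illustrative toy, not Mash's implementation: the hash function (BLAKE2b truncated to 64 bits) is an arbitrary choice for this sketch, reverse-complement k-mers are ignored, and the names sketch, jaccard_estimate and mash_distance are hypothetical. The conversion from Jaccard estimate j to distance uses the formula given in the Mash paper cited above, D = -(1/k) ln(2j / (1 + j)).

```python
import hashlib
import math


def hash_kmer(kmer: str) -> int:
    """Stable 64-bit hash of a k-mer (hash choice is an assumption)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(sequence: str, k: int, size: int) -> list:
    """Bottom sketch: the `size` smallest distinct k-mer hash values."""
    seq = sequence.upper()
    hashes = {hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return sorted(hashes)[:size]


def jaccard_estimate(a: list, b: list, size: int) -> float:
    """Estimate the Jaccard index from two bottom sketches."""
    merged = sorted(set(a) | set(b))[:size]
    if not merged:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in merged if h in shared) / len(merged)


def mash_distance(j: float, k: int) -> float:
    """Convert a Jaccard estimate into a distance; 1.0 when nothing is shared."""
    if j <= 0.0:
        return 1.0
    return -math.log(2.0 * j / (1.0 + j)) / k


# Hypothetical toy sequences, for illustration only.
k, size = 4, 16
s1 = sketch("ACGTACGTTTGACCAGTTAGC", k, size)
s2 = sketch("ACGTACGTTTGACCAGTTCGC", k, size)
j = jaccard_estimate(s1, s2, size)
print(j, mash_distance(j, k))
```

Because only the smallest hash values are kept, two genomes can be compared from their small sketches without revisiting the full sequences.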