Next Generation Sequencing

Version	MAJ	abyss
1.5.2	2014-11-18	abyss	Download	Doc

ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

Remarque

Run Unix # Usage: ABYSS [OPTION]... FILE...

Run Web #

Version	MAJ	amos
3.1.0	2013-08-12	amos	Download	Doc

AMOS: A Modular Open-Source Assembler

Remarque

Run Unix #

Run Web #

Version	MAJ	ART
ChocolateCherryCake	2015-04-30	ART	Download	Doc

ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using a user's own read error model or quality profiles. ART supports simulation of single-end and paired-end/mate-pair reads for three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can be used to test or benchmark a variety of methods or tools for next-generation sequencing data analysis, including read alignment, de novo assembly, and SNP and structural variation discovery. ART was used as a primary tool for the simulation study of the 1000 Genomes Project. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN format. ART can also generate alignments in the SAM alignment or UCSC BED file format.

Remarque Citation: Weichun Huang, Leping Li, Jason R Myers, and Gabor T Marth. ART: a next-generation sequencing read simulator, Bioinformatics (2012) 28 (4): 593-594

Run Unix # README FILES in http://genome.jouy.inra.fr/doc/genome/NGS/ART

Run Web #

Version	MAJ	bcftools
1.2	2015-04-15	bcftools	Download	Doc

bcftools: tools for variant calling and manipulating VCFs and BCFs.

Remarque

Run Unix # bcftools

Run Web #

Version	MAJ	bedtools
2.16.2	2012-10-09	bedtools	Download	Doc

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together. The following are examples of common questions that one can address with BEDTools.

Remarque Please cite the following article if you use BEDTools in your research: Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.

Run Unix #

Run Web #

Version	MAJ	bfast
0.7.0	2013-08-12	bfast	Download	Doc

BFAST: Blat-like Fast Accurate Search Tool. BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:
* Speed: enables billions of short reads to be mapped quickly.
* Accuracy: a priori probabilities for mapping reads with a defined set of variants.
* An easy way to measurably tune accuracy at the expense of speed.

Remarque

Run Unix # bfast [options]

Run Web #

Version	MAJ	bismark
0.14.3	2015-06-05	bismark	Download	Doc

Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.

Remarque

Run Unix # bismark [options] {-1 -2 | }

Run Web #

Version	MAJ	bowtie
1.1.2	2016-07-24	bowtie	Download	Doc

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).

Remarque

Run Unix # bowtie [options]* {-1 -2 | --12 | } []

Run Web #

Version	MAJ	bowtie2
2.2.5	2015-04-07	bowtie2	Download	Doc

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Remarque

Run Unix # bowtie2 [options]* -x {-1 -2 | -U } [-S ]

Run Web #

Version	MAJ	breakdancer
1.4.5	2015-03-06	breakdancer	Download	Doc

BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.

Remarque

Run Unix # Usage: breakdancer-max

Run Web #

Version	MAJ	bsmap
2.90	2015-03-17	bsmap	Download	Doc

BSMAP is short-read mapping software for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, and hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.

Remarque Citation: Xi Y, Li W: BSMAP: whole genome Bisulfite Sequence MAPping program. BMC Bioinformatics (2009) 10:232.

Run Unix # bsmap

Run Web #
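
The asymmetric matching rule described for BSMAP (a T in the read may align to either C or T in the reference) can be illustrated with a toy sketch. This is not BSMAP's implementation: the function names `bisulfite_convert` and `bs_match` are hypothetical, only the forward strand is modelled, and no mismatches or gaps are allowed.

```python
def bisulfite_convert(seq, methylated=frozenset()):
    """Simulate bisulfite treatment of a forward-strand sequence.

    Unmethylated C is read as T; 0-based positions listed in
    `methylated` keep their C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def bs_match(read, ref_window):
    """Return True if `read` matches `ref_window` under bisulfite rules.

    A T in the read may pair with C or T in the reference; every other
    base must match exactly (a C in the read still requires a C).
    """
    if len(read) != len(ref_window):
        return False
    return all(
        r == g or (r == "T" and g == "C")
        for r, g in zip(read, ref_window)
    )


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated={1})  # C at index 1 is methylated
print(read)                             # ACGTTTGA
print(bs_match(read, reference))        # True
print(bs_match("ACGCTTGA", reference))  # False: read C over reference T
```

The asymmetry matters: a C in the read is evidence of methylation and must sit over a reference C, while a T is ambiguous.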

Version	MAJ	bwa
0.7.12	2015-04-07	bwa	Download	Doc

BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence, except for disallowing gaps close to the end of the query. It can also be tuned to find a fraction of longer gaps at the cost of speed and of more false alignments.

Remarque

Run Unix # bwa [options]

Run Web #

Version	MAJ	canu
1.3	2016-10-18	canu	Download	Doc

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION).

Remarque Citation: Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv. (2016).

Run Unix # canu

Run Web #

Version	MAJ	CATCh
v1	2015-03-11	CATCh	Download	Doc

CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies

Remarque If you are going to use CATCh, please cite it with the included software (Mothur, WEKA, RDP MultiClassifier 1.1 and DECIPHER):
* Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P. 2014. CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Under review.
* Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537–41.
* Hall M, National H, Frank E, Holmes G, Pfahringer B, Reutemann P, et al. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations 11:10–18.
* Wang Q, Garrity GM, Tiedje JM, Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 09/2007; 73(16):5261-7.
* Wright ES et al. (2012). DECIPHER, A Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Applied and Environmental Microbiology, doi:10.1128/AEM.06516-11.

Run Unix # CATCh.run

Run Web #

Version	MAJ	cd-hit-454
-	2013-08-05	cd-hit-454	Download	Doc

454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cdhit-454 is a fast program to identify exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cdhit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.

Remarque

Run Unix # cd-hit-454

Run Web # 4.6.1
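
The notion of duplicates that "begin at the same position but may vary in length", as described for cd-hit-454, can be sketched for the exact-match case. This is only an illustration, not the cd-hit-454 algorithm: the function name `cluster_prefix_duplicates` is hypothetical, mismatches are not tolerated, no consensus is built, and the quadratic scan would not scale to a real dataset.

```python
def cluster_prefix_duplicates(reads):
    """Group reads that are exact prefixes of a longer read.

    Reads are visited from longest to shortest. A read joins the first
    cluster whose representative starts with it; otherwise it becomes
    the representative of a new cluster.
    """
    clusters = {}  # representative read -> list of member reads
    for read in sorted(reads, key=len, reverse=True):
        for representative in clusters:
            if representative.startswith(read):
                clusters[representative].append(read)
                break
        else:
            clusters[read] = [read]
    return clusters


reads = ["ACGTACGTAA", "ACGTACG", "ACGTACGTAA", "TTGCA", "TTGCATG"]
for representative, members in cluster_prefix_duplicates(reads).items():
    print(representative, len(members))
# ACGTACGTAA 3
# TTGCATG 2
```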

Version	MAJ	Celera Assembler (wgs)
5.4	2009-10-29	Celera Assembler (wgs)	Download	Doc

Celera Assembler is scientific software for DNA research. It can reconstruct long sequences of genomic DNA from the fragmentary data produced by whole-genome shotgun sequencing. The Celera Assembler is mature, efficient, open-source software written mostly in C for unix operating systems.

Remarque This whole-genome shotgun (WGS) assembler software suite, also known as Celera Assembler, implements sophisticated algorithms for the reconstruction of genomic DNA sequence from data produced by a WGS sequencing experiment.

Run Unix #

Run Web #

Version	MAJ	CNVnator
0.3	2015-02-13	CNVnator	Download	Doc

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

Remarque

Run Unix # cnvnator

Run Web #

Version	MAJ	corona
4.2.2	2009-09-10	corona	Download	Doc

The SOLiD System Analysis Pipeline Tool (Corona Lite) is an off-instrument SOLiD data analysis software package. It supports functionality for mapping color space reads to large or small genomes, pairing for mate-pair runs, SNP calling and generating consensus sequences.

Remarque

Run Unix #

Run Web #

Version	MAJ	cufflinks
2.2.0	2014-05-06	cufflinks	Download	Doc

Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Remarque

Run Unix # cufflinks [options]*

Run Web #

Version	MAJ	cutadapt
1.7.1	2015-03-11	cutadapt	Download	Doc

cutadapt is used to remove adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.

Remarque

Run Unix # cutadapt [options] []

Run Web #
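
To make the adapter problem described for cutadapt concrete, here is a minimal sketch of 3' adapter removal. It uses exact string matching only, whereas cutadapt itself tolerates errors; the function name `trim_3prime_adapter`, the `min_overlap` parameter and the example sequences are assumptions made for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter from `read` using exact matching only.

    The full adapter is searched anywhere in the read first. If it is
    absent, the longest adapter prefix (at least `min_overlap` bases)
    that ends the read is removed instead.
    """
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read


insert = "TGAGGTAGTAGGTTGTATAGTT"    # a 22-nt small RNA used as an example
adapter = "TGGAATTCTCGGGTGCCAAGG"    # example adapter sequence
read = insert + adapter[:12]         # the read runs into the adapter
print(trim_3prime_adapter(read, adapter) == insert)  # True
```

When the sequenced molecule is shorter than the read length, the read continues into the adapter, which is why the partial-prefix case at the read end is needed in addition to the full-adapter search.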

Version	MAJ	delly
0.6.3	2015-02-25	delly	Download	Doc

DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.

Remarque Citation: Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel. Delly: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012 28: i333-i339.

Run Unix # Usage: delly [OPTIONS] ...

Run Web #

Version	MAJ	dwgsim
0.1.10	2013-08-02	dwgsim	Download	Doc

Whole genome simulation can be performed with dwgsim. dwgsim is based off of wgsim found in SAMtools written by Heng Li. It was modified to handle ABI SOLiD data, as well as various assumptions about aligners and positions of indels. The documentation below is for the latest dwgsim (not DNAA) release.

Remarque

Run Unix # dwgsim [options]

Run Web #

Version	MAJ	EDGE-pro
1.3.1	2013-07-02	EDGE-pro	Download	Doc

EDGE-pro, Estimated Degree of Gene Expression in PROkaryots is an efficient software system to estimate gene expression levels in prokaryotic genomes from RNA-seq data. EDGE-pro uses Bowtie2 for alignment and then estimates expression directly from the alignment results. EDGE-pro includes routines to assign reads aligning to overlapping gene regions accurately. 15% or more of bacterial genes overlap other genes, making this a significant problem for bacterial RNA-seq, one that is generally ignored by programs designed for eukaryotic RNA-seq experiments.

Remarque Please reference our paper: T. Magoc, D. Wood, and S.L. Salzberg. EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evolutionary Bioinformatics vol.9, pp.127-136, 2013.

Run Unix # edge.pl <-g genome> <-p ptt> <-r rnt> <-u reads>

Run Web #

Version	MAJ	FastQC
0.10.0	2012-03-05	FastQC	Download	Doc

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

Remarque

Run Unix # fastqc or fastqc seqfile1 seqfile2 .. seqfileN

Run Web #

Version	MAJ	fastqp
0.1.9.1	2017-02-27	fastqp	Download	Doc

Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.

Remarque

Run Unix # fastqp [-h]

Run Web #

Version	MAJ	Fastq_Screen
0.4.4	2014-07-09	Fastq_Screen	Download	Doc

Fastq screen is a simple application which allows you to search a large sequence dataset against a panel of different databases to build up a picture of where the sequences in your data originate. It was built as a QC check for sequencing pipelines but may also have uses in metagenomics studies where mixed samples are expected. Although the program wasn't built with any particular technology in mind it is probably only really suitable for processing short reads due to the use of bowtie/bowtie2 as the searching application. The program generates both text and graphical output to tell you what proportion of your library was able to map, either uniquely or in more than one location, against each of the databases in your search set.

Remarque

Run Unix # fastq_screen [OPTION]... [FastQ FILE]...

Run Web #

Version	MAJ	FLASH
1.2.11	2014-11-13	FLASH	Download	Doc

FLASH, Fast Length Adjustment of SHort reads, is a very accurate fast tool to merge paired-end reads from fragments that are shorter than twice the length of reads. The extended length of reads has a significant positive impact on improvement of genome assemblies.

Remarque

Run Unix # flash [OPTIONS] MATES_1.FASTQ MATES_2.FASTQ Run `flash --help | less' for more information.

Run Web #
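
The idea behind FLASH, merging two reads from a fragment shorter than twice the read length, can be sketched as follows. This is a simplified illustration and not FLASH's scoring scheme: it accepts the longest exact overlap, and the names `reverse_complement`, `merge_pair` and `min_overlap` are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair into one fragment, or return None.

    `read2` is given as sequenced (opposite strand), so it is
    reverse-complemented first. The longest exact overlap between the
    end of `read1` and the start of the converted mate is used.
    """
    mate = reverse_complement(read2)
    for k in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-k:] == mate[:k]:
            return read1 + mate[k:]
    return None


fragment = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"   # 30 bp, shorter than 2 x 20
read1 = fragment[:20]
read2 = reverse_complement(fragment[-20:])
print(merge_pair(read1, read2) == fragment)   # True
```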

Version	MAJ	flux-simulator
1.2.1	2013-07-15	flux-simulator	Download	Doc

The Flux Simulator aims at modeling RNA-Seq experiments in silico: sequencing reads are produced from a reference genome according to annotated transcripts. The simulation pipeline models different steps as modules, each with a minimal set of parameters that can be estimated from experimental parameters. The first step is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of produced reads are simulated by in silico library preparation and sequencing.

Remarque

Run Unix # flux-simulator --help

Run Web #

Version	MAJ	GASSST
1.28	2013-08-25	GASSST	Download	Doc

GASSST: Global Alignment Short Sequence Search Tool
* GASSST finds global alignments of short DNA sequences against large DNA banks.
* GASSST's strong point is its ability to perform fast gapped alignments.
* It works well for both short and longer reads. It has currently been tested for reads up to 500 bp.
* The software is freely available for download under the CECILL version 2 License.

Remarque http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract?keytype=ref&ijkey=f5zH80QsuCqixRH

Run Unix # Gassst -d -i -o -p

Run Web #

Version	MAJ	GEM
20121106-022124	2013-07-25	GEM	Download	Doc

The GEM library (Also home to: The GEM mapper, The GEM RNA mapper, The GEM mappability, and others). Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful and very optimized tools to index/analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented.

Remarque

Run Unix #

Run Web #

Version	MAJ	GMAP/GSNAP
2013-10-25	2013-10-28	GMAP/GSNAP	Download	Doc

GMAP (genomic mapping and alignment program for mRNA and EST sequences): gmap, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. GSNAP (Genomic Short-read Nucleotide Alignment Program): GSNAP implements computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. It can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites.

Remarque

Run Unix # gmap [OPTIONS...]

Run Web #

Version	MAJ	goby
1.4.1	2010-04-15	goby	Download	Doc

Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. Goby provides compressed file formats that are time and space efficient. It also provides a few utilities that support the most common secondary data analyses

Remarque

Run Unix # goby

Run Web #

Version	MAJ	HISAT2
2.0.4	2016-09-07	HISAT2	Download	Doc

HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). We have developed HISAT based on the Bowtie2 implementation to handle most of the operations on the FM index.

Remarque

Run Unix # hisat2 [options]* -x {-1 -2 | -U | --sra-acc } [-S ]

Run Web #

Version	MAJ	ICORN
0.97	2010-11-03	ICORN	Download	Doc

iCORN (iterative correction of reference nucleotides) can correct genome sequences with short reads. Reads are mapped iteratively against the genome sequences, so far using SSAHA. Discrepancies between the reference and the multiple alignment of the mapped reads are corrected, provided the correction does not decrease the number of perfectly mapping reads.

Remarque

Run Unix # cf. http://icorn.sourceforge.net/example.html

Run Web #

Version	MAJ	Illumina CASAVA-1.8 FASTQ Filter
0.1	2014-04-30	Illumina CASAVA-1.8 FASTQ Filter	Download	Doc

The recent version of Illumina's CASAVA pipeline (Version 1.8) produces FASTQ files with both reads that pass filtering and reads that don't. The new READ-ID (the @ line) contains many new fields, one of them indicates whether the read is filtered or not. This program can filter FASTQ files produced by CASAVA 1.8, and keep/discard reads based on this filter flag.

Remarque

Run Unix # fastq_illumina_filter -h

Run Web #
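
The filtering step described for the CASAVA-1.8 FASTQ filter can be sketched in a few lines. This is not the `fastq_illumina_filter` source code. It assumes the CASAVA 1.8 read ID layout `@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<is_filtered>:<control>:<index>`, where `is_filtered` is `Y` for a read that failed filtering and `N` otherwise; the function names and the example records are hypothetical.

```python
import io


def passes_filter(header):
    """Return True when a CASAVA 1.8 read ID marks the read as not filtered."""
    # Assumed layout: the part after the first space is
    # "<read>:<is_filtered>:<control>:<index>".
    comment = header.rstrip("\n").split(" ", 1)[1]
    return comment.split(":")[1] == "N"


def filter_fastq(handle, keep_passing=True):
    """Yield 4-line FASTQ records whose filter status matches `keep_passing`."""
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            break
        if passes_filter(record[0]) == keep_passing:
            yield "".join(record)


example = io.StringIO(
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG\n"
    "ACGT\n+\nIIII\n"
    "@EAS139:136:FC706VJ:2:2104:15344:197394 1:Y:18:ATCACG\n"
    "TTTT\n+\n####\n"
)
for record in filter_fastq(example):
    print(record, end="")   # prints only the first record (flag N)
```

Passing `keep_passing=False` yields the discarded reads instead, which mirrors the keep/discard choice mentioned in the description.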

Version	MAJ	IM-TORNADO
2.0.3.3	2016-02-22	IM-TORNADO	Download	Doc

Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for 16S rDNA hypervariable tag sequencing. However, when the two reads do not overlap, existing computational pipelines analyze data from each read separately and underutilize the information contained in the paired-end reads. IM-TORNADO is a tool for processing non-overlapping reads while retaining maximal information content.

Remarque If you use IM-TORNADO for your project, please cite the following manuscript: Jeraldo P, Kalari K, Chen X, Bhavsar J, Mangalam A, White B, et al. IM-TORNADO: A Tool for Comparison of 16S Reads from Paired-End Libraries. PLOS ONE 9 (12):e114804. Available from: http://dx.plos.org/10.1371/journal.pone.0114804

Run Unix #

Run Web #

Version	MAJ	inGAP
2.7.8	2011-11-02	inGAP	Download	Doc

This is a novel mining pipeline (2009), Integrative Next-generation Genome Analysis Pipeline (inGAP), guided by a Bayesian principle to detect single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) by comparing high-throughput pyrosequencing reads with a reference genome of related organisms. inGAP can be applied to the mapping of both Roche/454 and Illumina reads with no restriction on read length.

Remarque

Run Unix # inGAP

Run Web #

Version	MAJ	jellyfish
1.1.3	2011-12-21	jellyfish	Download	Doc

JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism. JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation below for more details.

Remarque If you use JELLYFISH in your research, please cite: Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 (first published online January 7, 2011) doi:10.1093/bioinformatics/btr011

Run Unix # jellyfish

Run Web #
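
The k-mer definition given for JELLYFISH (every substring of length k, with its number of occurrences) can be sketched with a plain dictionary-based counter. This only illustrates what is being counted; it does not reproduce JELLYFISH's compact lock-free hash table, and the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every substring of length k in `sequence` (single strand)."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGTAC", 4)
for kmer, count in sorted(counts.items()):
    print(kmer, count)
# ACGT 2
# CGTA 2
# GTAC 2
# TACG 1
```

A sequence of length n contains n - k + 1 k-mers, so the ten-base example yields seven 4-mers in total.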

Version	MAJ	kraken
0.10.5	2015-11-25	kraken	Download	Doc

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.

Remarque If you use Kraken in your research, please cite our paper; the citation is available on the Kraken website.

Run Unix # kraken [options]

Run Web #

Version	MAJ	LAST
861	2017-06-02	LAST	Download	Doc

LAST: Genome-Scale Sequence Comparison LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads). It can:

Remarque

Run Unix #

Run Web #

Version	MAJ	macs
1.4.2	2013-05-16	macs	Download	Doc

Next generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.

Remarque

Run Unix # macs14 <-t tfile> [-n name] [-g genomesize] [options]

Run Web #

Version	MAJ	mapsembler
1.3.21	2012-05-31	mapsembler	Download	Doc

Mapsembler is targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines if each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then, for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.

Remarque Citation: Peterlongo, P., & Chikhi, R. (2012). Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics, 13(1), 48. doi:10.1186/1471-2105-13-48.

Run Unix # mapsembler [-m value] [-o output] [-k value] [-i value] [-e value] [-d value] [-t value] [-E value] [-Clrgfcvsh]

Run Web #

Version	MAJ	MapSplice
1.15.2	2012-01-26	MapSplice	Download	Doc

MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. MapSplice is a second-generation algorithm for detecting alternative splice sites. Its goal is to detect splice sites with high sensitivity and specificity while remaining efficient in CPU and memory usage. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. It depends neither on splice site features nor on intron length, so it can detect novel canonical and non-canonical splice sites. MapSplice relies on the quality and diversity of read alignments to increase the accuracy of splice site detection.

Remarque Publication MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery Kai Wang; Darshan Singh; Zheng Zeng; Stephen J. Coleman; Yan Huang; Gleb L. Savich; Xiaping He; Piotr Mieczkowski; Sara A. Grimm; Charles M. Perou; James N. MacLeod; Derek Y. Chiang; Jan F. Prins; Jinze Liu Nucleic Acids Research 2010; doi: 10.1093/nar/gkq622

Run Unix # python /usr/local/genome/MapSplice_1.15.2/bin/mapsplice_segments.py MapSplice.cfg

Run Web #

Version	MAJ	maq
0.7.1	2014-10-02	maq	Download	Doc

Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines. It is particularly designed for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functionality to handle AB SOLiD data.

Remarque

Run Unix # maq

Run Web #

Version	MAJ	metagene
	2007-05-04	metagene	Download	Doc

Gene Finding Program for Metagenomics MetaGene predicts prokaryotic genes on anonymous genomic sequences. Fragmented sequences (longer than 100 bp) can be accepted.

Remarque

Run Unix # metagene [multi-fasta]

Run Web #

Version	MAJ	MetaGeneAnnotator
-	2009-01-26	MetaGeneAnnotator	Download	Doc

Improved version of MetaGene, the program for annotating metagenomic data. It predicts prokaryotic genes from an anonymous genome or a set of anonymous genomes, and is particularly suited to metagenomic analyses.

Remarque

Run Unix # metageneannotator

Run Web #

Version	MAJ	minia
1.4683	2013-02-21	minia	Download	Doc

Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).

Remarque PDF and Citation R. Chikhi, G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI 2012

Run Unix # minia fasta_file kmer_size min_abundance estimated_genome_size prefix

Run Web #

Version	MAJ	mira
4.0	2014-11-18	mira	Download	Doc

MIRA is a Whole Genome Shotgun and EST Sequence Assembler for Sanger, 454 and Solexa / Illumina. It can perform Hybrid de-novo assemblies as well as SNP and mutations discovery for mapping assemblies.

Remarque

Run Unix # mira

Run Web #

Version	MAJ	MIReNA
2.0	2012-09-05	MIReNA	Download	Doc

Remarque

Run Unix #

Run Web #

Version	MAJ	mmseq
0.11.2	2012-11-20	mmseq	Download	Doc

MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. The MMSEQ pipeline obtains expression estimates from RNA-seq data. There are two routes, with starting points labelled A and B. Route A is quite fast and straightforward to run and uses pre-existing transcript sequences for alignment. Route B requires more time, as it involves the creation of custom transcript sequences based on the data.

Remarque Please cite Turro et al. 2011 (Genome Biology) if you use MMSEQ in your work. http://dx.doi.org/10.1186/gb-2011-12-2-r13

Run Unix # mmseq [OPTIONS...] hits_file output_base

Run Web #

Version	MAJ	MMSEQ
1.0.2	2013-09-02	MMSEQ	Download	Doc

MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads

Remarque Please cite Turro et al. 2011 (http://dx.doi.org/10.1186/gb-2011-12-2-r13)

Run Unix # mmseq / bam2hits

Run Web #

Version	MAJ	MOCAT
1.1	2012-08-01	MOCAT	Download	Doc

MOCAT is a package for analyzing metagenomics datasets. Currently MOCAT supports Illumina single- and paired-end reads in raw FastQ format.

Remarque Jens Roat Kultima & Shinichi Sunagawa (Bork Group, EMBL)

Run Unix # MOCAT.pl -sf|sample_file 'FILE' [Pipeline, Statistics, & Additional Options]

Run Web #

Version	MAJ	MOSAIK assembler
1.1.0021	2011-06-06	MOSAIK assembler	Download	Doc

MOSAIK is a reference-guided assembler comprising four main modular programs: MosaikBuild, MosaikAligner, MosaikSort and MosaikAssembler.
* MosaikBuild converts various sequence formats into Mosaik’s native read format.
* MosaikAligner pairwise aligns each read to a specified series of reference sequences.
* MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates.
* MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment, which is then saved in an assembly file format.

Remarque

Run Unix # MosaikAligner MosaikAssembler MosaikBuild MosaikCoverage MosaikDupSnoop MosaikJump MosaikMerge MosaikSort MosaikText

Run Web #

Version	MAJ	MPscan
-	2013-08-26	MPscan	Download	Doc

MPscan: fast localisation of multiple reads in genomes

Remarque Please cite this paper if you use MPscan: Rivals E., Salmela L., Kiiskinen P., Kalsi P., Tarhio J. Lecture Notes in Bioinformatics (LNBI), Springer-Verlag, Vol. 5724, p. 246-260, 2009.

Run Unix # mpscan -h

Run Web #

Version	MAJ	mrFAST
2.6.0.0	2012-02-01	mrFAST	Download	Doc

mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to a user-defined error threshold, including indels up to 4+4 bp. This manual describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find 'all' mappings for a given set of reads; however, it can return one "best" map location if the relevant parameter is invoked. NOTE: mrFAST is developed for Illumina, and thus requires all reads to be of the same length. For paired-end reads, the lengths of mates may differ from each other, but each "side" should have a uniform length.

Remarque Personalized copy number and segmental duplication maps using next-generation sequencing. Can Alkan, Jeffrey M. Kidd, Tomas Marques-Bonet, Gozde Aksay, Francesca Antonacci, Fereydoun Hormozdiari, Jacob O. Kitzman, Carl Baker, Maika Malig, Onur Mutlu, S. Cenk Sahinalp, Richard A. Gibbs, Evan E. Eichler. Nature Genetics, Oct, 41(10):1061-1067, 2009.

Table of Contents: Sample Set; General; Indexing; Single Genome Mode; Batch Mode; Mapping; Single-end Reads - Single Mode; Single-end Reads - Batch Mode; Paired-end Reads; Discordant Paired-end Reads; Output Format

Sample Set: A sample genome FASTA file, with simulated reads and a command line to map in paired-end mode, is supplied. Please download the sample set.

General: Please download the latest version from our download page and then unzip the downloaded file. Run 'make' to build mrFAST. mrFAST generates an index of the reference genome(s) and maps the reads to the reference genome.

Requirements:
* zlib, for the ability to read compressed FASTQ and write compressed SAM files.
* A C compiler (mrFAST is developed with gcc versions > 4.1.2).

Building: On Unix/Linux systems, we recommend using GNU gcc version > 4.1.2 as your compiler and typing 'make' to build. Example:

linux> make
gcc -c -O3 baseFAST.c -o baseFAST.o
gcc -c -O3 CommandLineParser.c -o CommandLineParser.o
gcc -c -O3 Common.c -o Common.o
gcc -c -O3 HashTable.c -o HashTable.o
gcc -c -O3 MrFAST.c -o MrFAST.o
gcc -c -O3 Output.c -o Output.o
gcc -c -O3 Reads.c -o Reads.o
gcc -c -O3 RefGenome.c -o RefGenome.o
gcc baseFAST.o CommandLineParser.o Common.o HashTable.o MrFAST.o Output.o Reads.o RefGenome.o -o mrFAST -lz -lm
rm -rf *.o

Parallelization: The best way to optimize mrFAST is to split the reads into chunks that fit into the memory of the cluster nodes, and implement an MPI wrapper in an embarrassingly parallel fashion.
We recommend the following criteria to split the reads:

Single End Mode: The number of reads should be approximately ((M-600)/(4*L)) million, where M is the memory size of the cluster node (in megabytes) and L is the read length. If you have more nodes, you can make the chunks smaller to use the nodes efficiently. For example, if the read length is 50 bp and each node has 2 GB of memory, each chunk should contain (2000-600)/(4*50) = 7 million reads.

Paired End Mode: The number of reads in each file should not exceed 1 million (500,000 pairs); however, a chunk size of 500,000 reads (250,000 pairs) is recommended.

To see the list of options, use "-h" or "--help". To see the version of mrFAST, use "-v" or "--version".

Indexing
mrFAST's indices can be generated in two modes (single, batch). In single mode, mrFAST indexes a fasta file (which may contain one or more reference genomes), while in batch mode it indexes a set of fasta files. By default mrFAST uses a window size of 12 characters to generate its index. Please be advised that if you do not choose the window size carefully, you will lose sensitivity.

How to choose the right window size: For a given read length (l) and error threshold (e), the window size is floor(l/(e+1)). For example, if the read length is 36 and the maximum number of mismatches allowed is 2, the window size is 12. If your calculated window size is greater than the default, you can use the default window size without losing sensitivity. For example, for a read length of 64 and an error threshold of 2, the calculated window size is 21, so you can use the default window size of 12. However, you cannot use 12 as the window size for a read length of 30 and an error threshold of 2, because the calculated window size is only 10.

Single Genome Mode: To index a reference genome like "refgen.fasta", run the following command:

$>./mrfast --index refgen.fasta

Upon completion of the indexing phase, you can find "refgen.fasta.index" in the same directory as "refgen.fasta".
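The two sizing rules above (window size and single-end chunk size) are simple enough to check with a few lines of Python. This is only a sketch of the arithmetic quoted in the manual; the function names are hypothetical and are not part of mrFAST.

```python
DEFAULT_WINDOW_SIZE = 12  # default window size stated in the manual


def window_size(read_length: int, max_errors: int) -> int:
    """Window size rule from the manual: floor(l / (e + 1))."""
    if read_length <= 0 or max_errors < 0:
        raise ValueError("read_length must be > 0 and max_errors >= 0")
    return read_length // (max_errors + 1)


def default_window_is_safe(read_length: int, max_errors: int,
                           default: int = DEFAULT_WINDOW_SIZE) -> bool:
    """True if the default window does not exceed the calculated window size."""
    return window_size(read_length, max_errors) >= default


def single_end_chunk_millions(memory_mb: float, read_length: int) -> float:
    """Chunk size rule from the manual: (M - 600) / (4 * L) million reads."""
    if memory_mb <= 600 or read_length <= 0:
        raise ValueError("need memory_mb > 600 and read_length > 0")
    return (memory_mb - 600) / (4 * read_length)


if __name__ == "__main__":
    print(window_size(36, 2))                   # 12
    print(window_size(64, 2))                   # 21
    print(default_window_is_safe(30, 2))        # False
    print(single_end_chunk_millions(2000, 50))  # 7.0
```

The printed values reproduce the worked examples in the text: 36 bp reads with 2 errors give a window of 12, 64 bp reads give 21, 30 bp reads cannot use the default of 12, and a 2 GB node with 50 bp reads gives chunks of about 7 million reads.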
mrFAST uses a window size of 12 (default) to build the index of the genome; this window size can be modified with "--ws". There is a restriction on the maximum window size, as the window size directly affects memory usage.

$>./mrfast --index refgen.fasta --ws 13

Batch Mode
In batch mode, mrFAST gets a list of reference files and generates the index for each one of them. As in single mode, you can specify a different window size for indexing.

$>./mrfast -b --index fasts.list --ws 13

Mapping
mrFAST can map single-end reads and paired-end reads to a reference genome. mrFAST can map in either single or batch mode. In single mode, it only maps to one index. In batch mode, it maps to a list of indices. mrFAST supports both fasta and fastq formats.

Single-end Reads - Single Mode
To map single reads to a reference genome in single mode, run the following command. Use "--seq" to specify the input file. refgen.fa and refgen.fa.index should be in the same folder. You can load a multi-sequence FASTA file as the reference genome.

$>./mrfast --search refgen.fa --seq reads.fastq

The reported locations will be saved into "output" by default. If you want to save them somewhere else, use "-o" to specify another file. mrFAST can report the unmapped reads in fasta/fastq format.

$>./mrfast --search refgen.fasta --seq reads.fastq -o my.map

By default, mrFAST reports all the locations per read. If you need one "best" mapping, add the "--best" parameter to the command line:

$>./mrfast --search refgen.fasta --seq reads.fastq -e 3 --best

Single-end Reads - Batch Mode (Note: deprecated after version 2.1.0.6)
In batch mode, mrFAST uses a list of indices to find the mappings of the reads. "index.list" should contain the list of fasta files.

$>./mrfast -b --search index.list --seq reads.fastq

Paired-end Reads
To map paired-end reads, use the "--pe" option. The mapping can be done in single/batch mode.
If the reads are in two different files, you have to use "--seq1/--seq2" to indicate the files. If the reads are interleaved, use "--seq" to indicate the file. The distance allowed between the paired-end reads should be specified with "--min" and "--max", which give the minimum and maximum of the inferred size (the distance between the outer edges of the mapped mates).

$>./mrfast --search refgen.fasta --pe --seq reads.fastq --min 150 --max 250

Discordant Mapping
mrFAST can report discordant mappings for use with VariationHunter. The --min and --max options define the minimum and maximum inferred size for concordant mapping. This is enabled by default since version 2.1.0.6.

$>./mrfast --search refgen.fasta --pe --discordant-vh --seq reads.fastq --min 50 --max 75

Parameters

General Options:
-v|--version  Shows the current version.
-h  Shows the help screen.

Indexing Options:
--index [file]  Generate an index from the specified fasta file.
-b  Indicates the indexing will be done in batch mode. The file specified in --index should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
--ws [int]  Set window size for indexing (default:12 max:14).

Searching Options:
--search [file]  Search the specified genome. The index file should be in the same directory as the fasta file.
-b  Indicates the mapping will be done in batch mode. The file specified in --search should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
--pe  Search will be done in paired-end mode.
--mp  Search will be done in matepair mode.
--seq [file]  Input sequences in fasta/fastq format [file]. If paired-end reads are interleaved, use this option.
--seq1 [file]  Input sequences in fasta/fastq format [file] (First file). Use this option to indicate the first file of paired-end reads.
--seq2 [file]  Input sequences in fasta/fastq format [file] (Second file). Use this option to indicate the second file of paired-end reads.
-o [file]  Output of the mapped sequences (SAM format). The default is "output".
-u [file]  FASTA/FASTQ file for the unmapped sequences. The default is "unmapped".
-e [int]  Maximum allowed edit distance (default 4% of the read length). Note that although the current version is limited to indels of up to 4+4 bp, it supports any number of substitution errors.
--min [int]  Min inferred distance allowed between two paired-end sequences.
--max [int]  Max inferred distance allowed between two paired-end sequences.
--discordant-vh  Return all discordant map locations ready for the VariationHunter program, and OEA map locations ready for NovelSeq.
--best  Return "best" location only (single-end mode).
--seqcomp  Indicates that the input sequences are compressed (gz).
--outcomp  Indicates that the output file should be compressed (gz).
--maxoea [int]  Max number of One End Anchored (OEA) locations returned for each read pair. A minimum of 100 is recommended for NovelSeq use.
--maxdis [int]  Max number of discordant map locations returned for each read pair.
--crop [int]  Crop the input reads at position [int].
--sample [string]  Sample name to be added to the SAM header (optional).
--rg [string]  Read group ID to be added to the SAM header (optional).
--lib [string]  Library name to be added to the SAM header (optional).

Output Files

Single-End Mode: In single-end mode, mrFAST will generate two files as specified by the "-o" and "-u" parameters. The default filename if the "-o" parameter is not specified is "output", and the default filename for the "-u" parameter is "unmapped".

output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. mrFAST returns all possible map locations within the given edit distance ("-e") by default. If the "--best" parameter is invoked, it will select one "best" location that has the minimum edit distance to the genome.
unmapped file ("-u"): Contains the unmapped reads in FASTQ or FASTA format, depending on the format of the input sequences.

Paired-End and Matepair Modes: In paired-end and matepair modes, mrFAST will generate a SAM file that stores the best mapping locations while utilizing the paired-end span information. In addition, it will generate a DIVET file and an OEA file (SAM format). See below:

output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. This file will include:
If a read pair can be mapped concordantly, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.
If the read pair cannot be mapped concordantly, again, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.

unmapped file ("-u"): Contains the orphan (both ends unmapped) reads in FASTQ or FASTA format, depending on the format of the input sequences.

output.DIVET.vh file (the "-o" option changes the prefix "output"): This file includes all possible map locations for the read pairs that cannot be concordantly mapped. This file can be loaded by the VariationHunter tool for structural variation discovery.

output_OEA file: Contains the OEA (One-End-Anchored) reads (paired-end reads where only one read can be mapped to the genome). The output is in SAM format and contains the map location of the read that can be mapped to the genome. The unmapped reads of an OEA read pair are not reported on separate lines; instead, their sequence and quality information is given on the line that specifies the map location of the mapped read. We use the optional fields NS and NQ to specify the unmapped sequence and unmapped quality information. This file can be loaded by the NovelSeq tool for novel sequence discovery; however, format conversion might be required; please see the NovelSeq documentation.
NOTE: mrFAST will report many (up to 100 by default) possible map locations for the "mapped" read of OEA matepairs. This will generate a large file due to repeats and duplications. The size of this file can be limited through the --maxoea parameter (version 2.1.0.0 and above).

Output Format
mrFAST mapping output is in SAM format. For details about the definition of the fields, please refer to the SAM Manual. We have not implemented the "MQUAL" field yet. All locations of discordant paired-end reads will be reported in DIVET format as required by the VariationHunter package. Unmapped reads (or "orphan" read pairs in PE mode) will be output in FASTQ or FASTA format, depending on the input sequence file format.
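As an illustration of the output_OEA layout described above, the following Python sketch pulls the unmapped mate's sequence and quality out of the NS and NQ optional fields of one SAM line. It assumes the standard SAM layout (11 mandatory tab-separated columns followed by TAG:TYPE:VALUE optional fields); the function name and the example record are hypothetical, and real mrFAST output should be checked before relying on it.

```python
def parse_oea_line(line: str) -> dict:
    """Extract the anchored read's location and the unmapped mate (NS/NQ tags)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")

    # Optional fields have the form TAG:TYPE:VALUE; the value itself may
    # contain ':' (quality strings often do), so split at most twice.
    tags = {}
    for optional in fields[11:]:
        tag, _type, value = optional.split(":", 2)
        tags[tag] = value

    return {
        "read_name": fields[0],
        "reference": fields[2],
        "position": int(fields[3]),
        "unmapped_seq": tags.get("NS"),   # sequence of the unmapped mate
        "unmapped_qual": tags.get("NQ"),  # qualities of the unmapped mate
    }


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = "\t".join([
        "pair1", "0", "chr1", "100", "255", "4M", "*", "0", "0",
        "ACGT", "IIII", "NS:Z:TTGA", "NQ:Z:II:I",
    ])
    print(parse_oea_line(example))
```

Records without NS or NQ simply yield None for the corresponding entries, which makes it easy to spot lines that do not follow the expected layout.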

Run Unix # mrfast [options]

Run Web #

Version	MAJ	mrsFAST
2.5.0.4	2012-02-01	mrsFAST	Download	Doc

mrsFAST is a cache-oblivious mapper designed to map short reads to a reference genome. mrsFAST maps short reads with respect to a user-defined error threshold. In this manual, we will show how to choose the parameters and tune mrsFAST with respect to the library settings. mrsFAST is designed to find 'all' the mappings for a given set of reads.

Remarque

Run Unix # mrsFAST -h

Run Web #

Version	MAJ	multiqc
0.8	2016-11-09	multiqc	Download	Doc

Summarizes analysis results for multiple tools and samples in a single report.

Remarque

Run Unix # multiqc_env

Run Web #

Version	MAJ	nesoni
0.40	2011-01-31	nesoni	Download	Doc

Nesoni focusses on analysing the alignment of reads to a reference genome. We use the SHRiMP read aligner, as it is able to detect small insertions and deletions in addition to SNPs. Nesoni can call a consensus of read alignments, taking care to indicate ambiguity. This can then be used in various ways: to determine the protein-level changes resulting from SNPs and indels, to find differences between multiple strains, or to produce n-way comparison data suitable for phylogenetic analysis in SplitsTree4. Alternatively, the raw counts of bases at each position in the reference seen in two different sequenced strains can be compared using Fisher's Exact Test.

Remarque

Run Unix # nesoni

Run Web #

Version	MAJ	newbler
2.6	2011-07-06	newbler	Download	Doc

Newbler is a package of three data analysis applications made by Roche 454: the GS De Novo Assembler (with or without contig scaffolding using Paired End reads), the GS Reference Mapper, and the GS Amplicon Variant Analyzer (AVA). An additional application, the GS Run Browser, is an interactive Run browser/troubleshooting tool that graphically displays the images, some intermediate data, and various output metrics from a sequencing Run. The software package also includes the SFF Tools commands for handling and using the data files (called Standard Flowgram Format or SFF files) that hold the sequencing trace data.

Remarque

Run Unix # newbler

Run Web #

Version	MAJ	NGSToolsMIG
1.0	2011-02-04	NGSToolsMIG	Download	Doc

Tools developed in the MIG laboratory to help in the process of Next Generation Sequencing data analysis: quality control, mapping, assembly, global statistics, etc. (Cf. Doc)
adaptiveTrim.pl
alignmentStatistics.pl
contigsExtractionOnLength.pl
fastqQualityConverter.pl
gbk2Fasta.pl
globalTrim.pl
multiFasta2Fasta.pl
show2Fasta.pl
unmappedReadsExtraction.pl

Remarque

Run Unix # e.g.: contigsExtractionOnLength.pl -i fichier.fasta -do /Dir1/Dir11/Dir111/ -po fichierFiltre -l 1500 -r

Run Web #

Version	MAJ	novoalign
2.08.01	2013-08-20	novoalign	Download	Doc

Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds globally optimal alignments using the full Needleman-Wunsch algorithm with affine gap penalties.

Remarque

Run Unix # novoalign [options]

Run Web #

Version	MAJ	pindel
0.2.5a8	2015-02-11	pindel	Download	Doc

Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

Remarque Cite Pindel: Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009 Nov 1;25(21):2865-71. Epub 2009 Jun 26.

Run Unix # Usage: pindel -f -p [and/or -i bam_configuration_file] -c -o

Run Web #

Version	MAJ	poretools
0.6.0	2016-11-30	poretools	Download	Doc

poretools: a toolkit for working with nanopore sequencing data from Oxford Nanopore. The MinION (TM) from Oxford Nanopore Technologies (ONT) is the first nanopore sequencer to be commercialised and is now available to early-access users. The MinION (TM) is a USB-connected, portable nanopore sequencer which permits real-time analysis of streaming event data. Currently, the research community lacks a standardized toolkit for the analysis of nanopore datasets.

Remarque

Run Unix # poretools_env

Run Web #

Version	MAJ	prinseq
0.17.1	2013-08-20	prinseq	Download	Doc

PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format.

Remarque

Run Unix # prinseq-lite.pl -h

Run Web #

Version	MAJ	ProbeMatch
-	2010-05-11	ProbeMatch	Download	Doc

ProbeMatch is a sequence alignment program that finds sequence alignments for short DNA sequences (36-50 bp). Unlike other programs such as eland and soap, which perform ungapped alignment allowing up to 2 substitutions, ProbeMatch performs *gapped* alignment, allowing up to 3 errors including substitutions, insertions, and deletions.

Remarque

Run Unix # probematch [options] or # probematch --help

Run Web #

Version	MAJ	prokka
1.10	2014-11-02	prokka	Download	Doc

Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.

Remarque

Run Unix # prokka [options]

Run Web #

Version	MAJ	Quake
0.3.5	2014-10-02	Quake	Download	Doc

Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. Quake adopts the k-mer error correction framework, first introduced by the EULER genome assembly package. Unlike EULER and similar programs, Quake utilizes a robust mixture model of erroneous and genuine k-mer distributions to determine where errors are located. Quake then uses read quality values and learns the nucleotide-to-nucleotide error rates to determine what types of errors are most likely. This leads to more corrections and greater accuracy, especially with respect to avoiding mis-corrections, which create false sequence dissimilar to anything in the original genome sequence from which the read was taken.

Remarque Kelley DR, Schatz MC, Salzberg SL. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116 2010. (http://genomebiology.com/2010/11/11/R116/abstract)

Run Unix # quake.py --help

Run Web # 0.3.5

Version	MAJ	quip
1.1.4	2013-02-21	quip	Download	Doc

Quip compresses next-generation sequencing data with extreme prejudice. It supports input and output in the FASTQ and SAM/BAM formats, compressing large datasets to as little as 15% of their original size.

Remarque Compression of next-generation sequencing reads aided by highly efficient de novo assembly Daniel C. Jones; Walter L. Ruzzo; Xinxia Peng; Michael G. Katze — Nucleic Acids Research 2012; doi: 10.1093/nar/gks754

Run Unix # quip

Run Web #

Version	MAJ	ray
2.3.1	2014-06-19	ray	Download	Doc

Ray is a parallel de novo genome assembler that utilises the message-passing interface everywhere and is implemented using peer-to-peer communication.

Remarque Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Sébastien Boisvert, François Laviolette, and Jacques Corbeil. Journal of Computational Biology (Mary Ann Liebert, Inc. publishers). November 2010, 17(11): 1519-1533. doi:10.1089/cmb.2009.0238

Run Unix # Ray -help

Run Web #

Version	MAJ	ReAS
2.02	2011-06-30	ReAS	Download	Doc

ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun

Remarque http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.0010043

Run Unix #

Run Web #

Version	MAJ	reptile
2.0	2012-05-03	reptile	Download	Doc

Reptile is software developed in C++ for correcting sequencing errors in short reads from next-gen sequencing platforms.

Remarque

Run Unix # reptile-omp

Run Web #

Version	MAJ	rna2map
0.5.0	2009-09-10	rna2map	Download	Doc

The SOLiD System Small RNA Analysis Pipeline Tool (RNA2MAP) can be used to perform whole genome analysis of color space RNA library reads. It consists of three major procedures: filtering, matching against miRBase sequences (Sanger), and matching against a reference genome.

Remarque

Run Unix #

Run Web #

Version	MAJ	RUM
1.12.01	2012-05-03	RUM	Download	Doc

RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data.

Remarque Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Mapper (RUM) Gregory R. Grant, Michael H. Farkas, Angel Pizarro, Nicholas Lahens, Jonathan Schug, Brian Brunk, Christian J. Stoeckert Jr, John B. Hogenesch and Eric A. Pierce.

Run Unix # RUM_runner.pl [options]

Run Web #

Version	MAJ	samToFastq
1.62(1113)	2012-02-17	samToFastq	Download	Doc

Extracts read sequences and qualities from the input SAM/BAM file and writes them into the output file in Sanger fastq format. In the RC mode (default is True), if the read is aligned and the alignment is to the reverse strand of the genome, the read's sequence from the input SAM file will be reverse-complemented prior to writing it to fastq, in order to correctly restore the original read sequence as it was generated by the sequencer.
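The RC behaviour described above can be sketched in a few lines of Python. This is not the tool's code: the function name is hypothetical, and reversing the quality string alongside the sequence is an assumption (it is the usual convention, but the description above only mentions the sequence). The FLAG bits 0x4 (unmapped) and 0x10 (reverse strand) come from the SAM format.

```python
UNMAPPED_FLAG = 0x4         # SAM FLAG bit: read is unmapped
REVERSE_STRAND_FLAG = 0x10  # SAM FLAG bit: read aligned to the reverse strand

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(line: str, rc: bool = True) -> str:
    """Convert one SAM alignment line into a four-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 columns")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]

    aligned = not (flag & UNMAPPED_FLAG)
    if rc and aligned and (flag & REVERSE_STRAND_FLAG):
        # Restore the read as the sequencer produced it.
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]  # assumption: qualities follow the bases

    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Made-up record aligned to the reverse strand (FLAG 16).
    record = "r1\t16\tchr1\t5\t60\t4M\t*\t0\t0\tAACG\tABCD"
    print(sam_record_to_fastq(record), end="")
```

With rc=False the sequence and qualities are written exactly as stored in the SAM file, which corresponds to switching the RC mode off.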

Remarque

Run Unix #

Run Web #

Version	MAJ	SAMtools
1.2	2015-04-15	SAMtools	Download	Doc

SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Remarque

Run Unix # samtools [options]

Run Web #

Version	MAJ	seqmap
1.0.12	2009-02-19	seqmap	Download	Doc

SeqMap is a tool for mapping large numbers of oligonucleotides to a genome. It is designed for finding all the places in a genome where an oligonucleotide could potentially come from. SeqMap can efficiently map tens of millions of short sequences to a genome of several billion nucleotides. During mapping, several mutations as well as insertions/deletions of nucleotide bases in the sequences can be tolerated and, furthermore, detected. Various input and output formats are supported, as well as many command line options for tuning almost every step in the mapping process.

Remarque Publication: http://dx.doi.org/10.1093/bioinformatics/btn429

Run Unix # seqmap

Run Web #

Version	MAJ	seqtk
--	2013-02-26	seqtk	Download	Doc

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

Remarque

Run Unix # seqtk

Run Web #

Version	MAJ	sickle
1.200	2013-02-26	sickle	Download	Doc

sickle - A windowed adaptive trimming tool for FASTQ files using quality

Remarque

Run Unix # sickle [options]

Run Web #

Version	MAJ	snap
0.13	2012-07-16	snap	Download	Doc

SNAP is a new sequence aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives SNAP up to 2x lower error rates than existing tools and lets it match larger mutations that they may miss.

Remarque Faster and More Accurate Sequence Alignment with SNAP. Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David Patterson, Scott Shenker, Ion Stoica, Richard M. Karp, and Taylor Sittler. arXiv:1111.5572v1, November 2011.

Run Unix # snap

Run Web #

Version	MAJ	soap
2.20	2014-08-23	soap	Download	Doc

SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program features very fast and accurate alignment for huge amounts of short reads generated by the Illumina/Solexa Genome Analyzer. Compared to soap v1, it is one order of magnitude faster: it requires only 2 minutes to align one million single-end reads onto the human reference genome. Another notable improvement is that SOAPaligner now supports a wide range of read lengths.

Remarque To run SOAPaligner, first build index files for the reference genome (2bwt-builder), then search the reads against the formatted index files (soap).

Run Unix # soap

Run Web #

Version	MAJ	soap.coverage
2.7.7	2011-12-14	soap.coverage	Download	Doc

Utility for SOAP: soap.coverage can calculate sequencing coverage or physical coverage, as well as duplication rate and details of specific blocks, for each segment and for the whole genome, using SOAP, BLAT, BLAST, BlastZ, MUMmer and MAQ alignment results, with multi-threading.

Remarque

Run Unix # soap.coverage

Run Web #

Version	MAJ	SOAPdenovo
1.04	2010-08-23	SOAPdenovo	Download	Doc

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

Remarque

Run Unix # soapdenovo [option]

Run Web #

Version	MAJ	SolexaQA
1.7	2011-05-30	SolexaQA	Download	Doc

SolexaQA is a Perl-based software package that calculates quality statistics and creates visual representations of data quality from FASTQ files generated by Illumina second-generation sequencing technology (“Solexa”).

Remarque

Run Unix # SolexaQA.pl

Run Web #

Version	MAJ	SPAdes
3.9.0	2016-08-16	SPAdes	Download	Doc

SPAdes is a de Bruijn graph based assembler. It integrates a read error corrector, a multiple k-mer de Bruijn graph assembler, an assembly merger, a scaffolder and a repeat resolver.

Remarque

Run Unix # spades

Run Web #

Version	MAJ	ssaha2
2.5.2	2014-08-20	ssaha2	Download	Doc

SSAHA (Sequence Search and Alignment by Hashing Algorithm) is an algorithm for very fast matching and alignment of DNA sequences. It achieves its fast search speed by encoding sequence information in a perfect hash function.
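To make the idea of hashing sequence words concrete, here is a small Python sketch of one common scheme: each base is encoded in 2 bits, so every k-mer maps to a unique integer in the range 0 to 4^k - 1, which is a collision-free (perfect) hash. Whether ssaha2 uses exactly this encoding or indexes every position is not stated here; the names and the choice to index all overlapping k-mers are assumptions made for illustration.

```python
from collections import defaultdict

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base


def encode_kmer(kmer: str) -> int:
    """Map a k-mer over A/C/G/T to a unique integer in [0, 4**k)."""
    value = 0
    for base in kmer:
        value = value * 4 + BASE_CODE[base]  # KeyError on non-ACGT bases
    return value


def build_index(sequence: str, k: int) -> dict:
    """Record every start position of every k-mer in the sequence."""
    index = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        try:
            index[encode_kmer(sequence[pos:pos + k])].append(pos)
        except KeyError:
            continue  # skip words containing ambiguous bases such as N
    return index


def lookup(index: dict, kmer: str) -> list:
    """Return the start positions of a k-mer, or an empty list if absent."""
    return index.get(encode_kmer(kmer), [])


if __name__ == "__main__":
    idx = build_index("ACGTACGT", 4)
    print(encode_kmer("ACGT"))  # 27
    print(lookup(idx, "ACGT"))  # [0, 4]
```

Because the encoding is a bijection between k-mers and integers, looking up a query word costs one dictionary access, which is where the speed of hash-based matching comes from.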

Remarque

Run Unix # ssaha2

Run Web #

Version	MAJ	ssake
3.2	2008-07-30	ssake	Download	Doc

SSAKE is a genomics application for assembling millions of very short DNA sequences. It is an easy-to-use, robust, reliable and tractable clustering algorithm for very short sequence reads, such as those generated by Illumina Ltd.

Remarque

Run Unix # ssake.pl

Run Web #

Version	MAJ	sspace
2.0	2013-02-21	sspace	Download	Doc

SSPACE is not a de novo assembler; it is used after a preassembled run. SSPACE is a script to extend and scaffold preassembled contigs using a number of mate-pair or paired-end libraries. It uses Bowtie to map all the reads to the pre-assembled contigs. Unmapped reads are used, if desired, for extending the pre-assembled contigs with the SSAKE assembler. Bowtie is then used again to map the reads to the extended contigs. Positions and orientations of the reads are stored and used for scaffolding. If both reads of a pair are found within the allowed distance, they are used for scaffolding to determine the orientation, pairing and ordering of the contigs.

Remarque

Run Unix # /usr/local/genome/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl

Run Web #

Version	MAJ	stacks
0.9995	2012-08-21	stacks	Download	Doc

Stacks is a software pipeline for building loci out of a set of short-read sequenced samples. Stacks was developed for the purpose of building genetic maps from RAD-Tag Illumina sequence data, but can also be readily applied to population studies, and phylogeography.

Remarque Please cite this paper: J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011. [reprint]

Run Unix #

Run Web #

Version	MAJ	tablet
1.14.10.20	2015-04-07	tablet	Download	Doc

Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.

Remarque

Run Unix # tablet

Run Web #

Version	MAJ	tagdust
1.13	2013-09-13	tagdust	Download	Doc

TagDust is a program to eliminate artifactual reads from next-generation sequencing data sets.

Remarque Lassmann T., et al. (2009) TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics.

Run Unix # tagdust [options] lib.fa read1.fa read2.fa ...

Run Web #

Version	MAJ	TMAP
3.4.1	2013-10-25	TMAP	Download	Doc

TMAP / Torrent Mapping Alignment Program - Alignment software for short and long nucleotide sequences produced by next-generation sequencing technologies.

Remarque

Run Unix #

Run Web #

Version	MAJ	tophat
2.0.9	2013-07-10	tophat	Download	Doc

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Remarque

Run Unix # tophat -h

Run Web #

Version	MAJ	Trimmomatic
0.32	2014-01-06	Trimmomatic	Download	Doc

Trimmomatic: A flexible read trimming tool for Illumina NGS data. Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters is supplied on the command line.

Remarque

Run Unix # trimmomatic

Run Web #

Version	MAJ	Trinity
2.2.0	2016-07-01	Trinity	Download	Doc

RNA-Seq De novo Assembly Using Trinity. Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.

Remarque

Run Unix # Trinity

Run Web #

Version	MAJ	vcake
1.0	2008-07-30	vcake	Download	Doc

VCAKE is a genetic sequence assembler capable of assembling millions of small nucleotide reads even in the presence of sequencing error. This software is currently geared towards de novo assembly of Illumina's Solexa Sequencing data.

Remarque

Run Unix # perl -S vcake.pl

Run Web #

Version	MAJ	velvet
1.2.07	2013-08-07	velvet	Download	Doc

Sequence assembler for very short reads. Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Remarque

Run Unix # velveth # velvetg

Run Web #
