Next Generation Sequencing


abyss

Version 1.5.2, updated 2014-11-18
ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

Run Unix: ABYSS [OPTION]... FILE...

amos

Version 3.1.0, updated 2013-08-12
AMOS: A Modular Open-Source Assembler


ART

Version ChocolateCherryCake, updated 2015-04-30
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles. ART supports simulation of single-end and paired-end/mate-pair reads of three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can be used to test or benchmark a variety of methods or tools for next-generation sequencing data analysis, including read alignment, de novo assembly, and SNP and structural variation discovery. ART was used as a primary tool for the simulation study of the 1000 Genomes Project. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN format. ART can also generate alignments in the SAM alignment or UCSC BED file format.

Citation: Weichun Huang, Leping Li, Jason R Myers, and Gabor T Marth. ART: a next-generation sequencing read simulator. Bioinformatics (2012) 28 (4): 593-594.
Run Unix: see the README files in http://genome.jouy.inra.fr/doc/genome/NGS/ART

bcftools

Version 1.2, updated 2015-04-15
bcftools: tools for variant calling and manipulating VCFs and BCFs.

Run Unix: bcftools

bedtools

Version 2.16.2, updated 2012-10-09
The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together. The following are examples of common questions that one can address with BEDTools.

Note: Please cite the following article if you use BEDTools in your research: Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.

bfast

Version 0.7.0, updated 2013-08-12
BFAST: Blat-like Fast Accurate Search Tool. BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:
* Speed: enables billions of short reads to be mapped quickly.
* Accuracy: a priori probabilities for mapping reads with a defined set of variants.
* An easy way to measurably tune accuracy at the expense of speed.

Run Unix: bfast [options]

bismark

Version 0.14.3, updated 2015-06-05
Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.

Run Unix: bismark [options] {-1 -2 | }

bowtie

Version 1.1.2, updated 2016-07-24
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).

Run Unix: bowtie [options]* {-1 -2 | --12 | } []

bowtie2

Version 2.2.5, updated 2015-04-07
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Run Unix: bowtie2 [options]* -x {-1 -2 | -U } [-S ]

breakdancer

Version 1.4.5, updated 2015-03-06
BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.

Run Unix: breakdancer-max

bsmap

Version 2.90, updated 2015-03-17
BSMAP is a short-read mapping program for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, and hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.

Citation: Xi Y, Li W: BSMAP: whole genome Bisulfite Sequence MAPping program. BMC Bioinformatics (2009) 10:232.
Run Unix: bsmap
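
The asymmetric matching rule in the BSMAP description (a T in the read may align to either a C or a T in the reference) can be illustrated with a small Python sketch. This is a naive forward-strand scan written for illustration only; the function names are hypothetical and it is not BSMAP's actual indexing or alignment algorithm.

```python
def bisulfite_match(read: str, ref: str) -> bool:
    """Return True if `read` matches `ref` under the bisulfite rule.

    A read base matches when it equals the reference base, or when the
    read has T where the reference has C (an unmethylated C that was
    converted). A read C against a reference T is a mismatch.
    """
    if len(read) != len(ref):
        return False
    for read_base, ref_base in zip(read.upper(), ref.upper()):
        if read_base == ref_base:
            continue
        if read_base == "T" and ref_base == "C":
            continue
        return False
    return True


def find_bisulfite_hits(read: str, reference: str) -> list[int]:
    """Return every 0-based reference offset where the read matches."""
    n = len(read)
    return [
        i
        for i in range(len(reference) - n + 1)
        if bisulfite_match(read, reference[i:i + n])
    ]


if __name__ == "__main__":
    # "TG" matches both "CG" (offset 1) and "TG" (offset 4).
    print(find_bisulfite_hits("TG", "ACGTTG"))
```

A real mapper would also handle the reverse strand, mismatches and gaps, which this sketch leaves out.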

bwa

Version 0.7.12, updated 2015-04-07
BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence, except for disallowing gaps close to the end of the query. It can also be tuned to find a fraction of longer gaps at the cost of speed and of more false alignments.

Run Unix: bwa [options]

canu

Version 1.3, updated 2016-10-18
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION).

Citation: Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv. (2016).
Run Unix: canu

CATCh

Version v1, updated 2015-03-11
CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies

Note: If you are going to use CATCh, please cite it together with the included software (Mothur, WEKA, RDP MultiClassifier 1.1 and DECIPHER):
* Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P. 2014. CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Under review.
* Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537-41.
* Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, et al. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations 11:10-18.
* Wang Q, Garrity GM, Tiedje JM, Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 09/2007; 73(16):5261-7.
* Wright ES et al. (2012). DECIPHER, A Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Applied and Environmental Microbiology, doi:10.1128/AEM.06516-11.
Run Unix: CATCh.run

cd-hit-454

Version not given, updated 2013-08-05
454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cdhit-454 is a fast program to identify exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cdhit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.

Run Unix: cd-hit-454
Run Web: 4.6.1

Celera Assembler (wgs)

Version 5.4, updated 2009-10-29
Celera Assembler is scientific software for DNA research. It can reconstruct long sequences of genomic DNA from the fragmentary data produced by whole-genome shotgun sequencing. The Celera Assembler is mature, efficient, open-source software written mostly in C for unix operating systems.

Note: This whole-genome shotgun (WGS) assembler software suite, also known as Celera Assembler, implements sophisticated algorithms for the reconstruction of genomic DNA sequence from data produced by a WGS sequencing experiment.

CNVnator

Version 0.3, updated 2015-02-13
CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

Run Unix: cnvnator

corona

Version 4.2.2, updated 2009-09-10
The SOLiD System Analysis Pipeline Tool (Corona Lite) is an off-instrument SOLiD data analysis software package. It supports functionality for mapping color space reads to large or small genomes, pairing for mate-pair runs, SNP calling and generating consensus sequences.


cufflinks

Version 2.2.0, updated 2014-05-06
Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Run Unix: cufflinks [options]*

cutadapt

Version 1.7.1, updated 2015-03-11
cutadapt is used to remove adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.

Run Unix: cutadapt [options] []

delly

Version 0.6.3, updated 2015-02-25
DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.

Citation: Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel. Delly: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012 28: i333-i339.
Run Unix: delly [OPTIONS] ...

dwgsim

Version 0.1.10, updated 2013-08-02
Whole genome simulation can be performed with dwgsim. dwgsim is based off of wgsim found in SAMtools written by Heng Li. It was modified to handle ABI SOLiD data, as well as various assumptions about aligners and positions of indels. The documentation below is for the latest dwgsim (not DNAA) release.

Run Unix: dwgsim [options]

EDGE-pro

Version 1.3.1, updated 2013-07-02
EDGE-pro (Estimated Degree of Gene Expression in PROkaryotes) is an efficient software system to estimate gene expression levels in prokaryotic genomes from RNA-seq data. EDGE-pro uses Bowtie2 for alignment and then estimates expression directly from the alignment results. EDGE-pro includes routines to assign reads aligning to overlapping gene regions accurately. 15% or more of bacterial genes overlap other genes, making this a significant problem for bacterial RNA-seq, one that is generally ignored by programs designed for eukaryotic RNA-seq experiments.

Note: Please reference our paper: T. Magoc, D. Wood, and S.L. Salzberg. EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evolutionary Bioinformatics vol.9, pp.127-136, 2013.
Run Unix: edge.pl <-g genome> <-p ptt> <-r rnt> <-u reads>

FastQC

Version 0.10.0, updated 2012-03-05
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

Run Unix: fastqc, or fastqc seqfile1 seqfile2 .. seqfileN

fastqp

Version 0.1.9.1, updated 2017-02-27
Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.

Run Unix: fastqp [-h]

Fastq_Screen

Version 0.4.4, updated 2014-07-09
Fastq screen is a simple application which allows you to search a large sequence dataset against a panel of different databases to build up a picture of where the sequences in your data originate. It was built as a QC check for sequencing pipelines but may also have uses in metagenomics studies where mixed samples are expected. Although the program wasn't built with any particular technology in mind it is probably only really suitable for processing short reads due to the use of bowtie/bowtie2 as the searching application. The program generates both text and graphical output to tell you what proportion of your library was able to map, either uniquely or in more than one location, against each of the databases in your search set.

Run Unix: fastq_screen [OPTION]... [FastQ FILE]...

FLASH

Version 1.2.11, updated 2014-11-13
FLASH, Fast Length Adjustment of SHort reads, is a very accurate fast tool to merge paired-end reads from fragments that are shorter than twice the length of reads. The extended length of reads has a significant positive impact on improvement of genome assemblies.

Run Unix: flash [OPTIONS] MATES_1.FASTQ MATES_2.FASTQ (run `flash --help | less` for more information)

flux-simulator

Version 1.2.1, updated 2013-07-15
The Flux Simulator aims at modeling RNA-Seq experiments in silico: sequencing reads are produced from a reference genome according to annotated transcripts. The simulation pipeline models different steps as modules, each with a minimal set of parameters that can be estimated from experimental parameters. The first step is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of produced reads are simulated by in silico library preparation and sequencing.

Run Unix: flux-simulator --help

GASSST

Version 1.28, updated 2013-08-25
GASSST: Global Alignment Short Sequence Search Tool
* GASSST finds global alignments of short DNA sequences against large DNA banks.
* GASSST's strong point is its ability to perform fast gapped alignments.
* It works well for both short and longer reads. It has currently been tested for reads up to 500 bp.
* The software is freely available for download under the CECILL version 2 License.

Note: http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract?keytype=ref&ijkey=f5zH80QsuCqixRH
Run Unix: Gassst -d -i -o -p

GEM

Version 20121106-022124, updated 2013-07-25
The GEM library (Also home to: The GEM mapper, The GEM RNA mapper, The GEM mappability, and others). Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful and very optimized tools to index/analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented.


GMAP/GSNAP

Version 2013-10-25, updated 2013-10-28
GMAP (genomic mapping and alignment program for mRNA and EST sequences): gmap, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. GSNAP (Genomic Short-read Nucleotide Alignment Program): GSNAP implements computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. It can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites.

Run Unix: gmap [OPTIONS...]

goby

Version 1.4.1, updated 2010-04-15
Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. Goby provides compressed file formats that are time and space efficient. It also provides a few utilities that support the most common secondary data analyses.

Run Unix: goby

HISAT2

Version 2.0.4, updated 2016-09-07
HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). We have developed HISAT based on the Bowtie2 implementation to handle most of the operations on the FM index.

Run Unix: hisat2 [options]* -x {-1 -2 | -U | --sra-acc } [-S ]

ICORN

Version 0.97, updated 2010-11-03
iCORN (iterative correction of reference nucleotides) can correct genome sequences with short reads. Reads are mapped iteratively against the genome sequences, so far using SSAHA. Discrepancies between the multiple alignments of the mapped reads and the reference are corrected, provided the correction does not decrease the number of perfectly mapping reads.

Run Unix: see http://icorn.sourceforge.net/example.html

Illumina CASAVA-1.8 FASTQ Filter

Version 0.1, updated 2014-04-30
The recent version of Illumina's CASAVA pipeline (Version 1.8) produces FASTQ files with both reads that pass filtering and reads that don't. The new READ-ID (the @ line) contains many new fields, one of them indicates whether the read is filtered or not. This program can filter FASTQ files produced by CASAVA 1.8, and keep/discard reads based on this filter flag.

Run Unix: fastq_illumina_filter -h
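
The filtering idea described for CASAVA 1.8 FASTQ files can be sketched in a few lines of Python. The sketch assumes the CASAVA 1.8 read-ID layout, in which the part after the space reads `<read>:<is filtered>:<control number>:<index>` and the filter field is `Y` for a read that failed filtering and `N` for one that passed. It also assumes plain four-line FASTQ records. The function names are illustrative and this is not the implementation of `fastq_illumina_filter`.

```python
from typing import Iterable, Iterator


def passes_filter(header: str) -> bool:
    """Return True if a CASAVA 1.8 read-ID line marks the read as kept.

    Example header: "@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG"
    The second field after the space is "Y" (filtered out) or "N" (kept).
    """
    parts = header.rstrip("\r\n").split(" ", 1)
    if len(parts) != 2:
        raise ValueError(f"not a CASAVA 1.8 header: {header!r}")
    fields = parts[1].split(":")
    if len(fields) < 2 or fields[1] not in ("Y", "N"):
        raise ValueError(f"missing filter flag in header: {header!r}")
    return fields[1] == "N"


def filter_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines of every four-line record whose read passed filtering."""
    it = iter(lines)
    # zip over the same iterator groups lines four at a time; an
    # incomplete trailing record is silently dropped.
    for record in zip(it, it, it, it):
        if passes_filter(record[0]):
            yield from record


if __name__ == "__main__":
    sample = [
        "@r1 1:N:0:ACGT\n", "ACGT\n", "+\n", "IIII\n",
        "@r2 1:Y:0:ACGT\n", "TTTT\n", "+\n", "IIII\n",
    ]
    print("".join(filter_fastq(sample)), end="")
```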

IM-TORNADO

Version 2.0.3.3, updated 2016-02-22
Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for 16S rDNA hypervariable tag sequencing. However, when the two reads do not overlap, existing computational pipelines analyze data from each read separately and underutilize the information contained in the paired-end reads. IM-TORNADO is a tool for processing non-overlapping reads while retaining maximal information content.

Note: If you use IM-TORNADO for your project, please cite the following manuscript: Jeraldo P, Kalari K, Chen X, Bhavsar J, Mangalam A, White B, et al. IM-TORNADO: A Tool for Comparison of 16S Reads from Paired-End Libraries. PLOS ONE 9 (12):e114804. Available from: http://dx.plos.org/10.1371/journal.pone.0114804

inGAP

Version 2.7.8, updated 2011-11-02
This is a novel mining pipeline (2009), Integrative Next-generation Genome Analysis Pipeline (inGAP), guided by a Bayesian principle to detect single nucleotide polymorphisms (SNPs), insertion/deletions (indels) by comparing high-throughput pyrosequencing reads with a reference genome of related organisms. inGAP can be applied to the mapping of both Roche/454 and Illumina reads with no restriction of read length.

Run Unix: inGAP

jellyfish

Version 1.1.3, updated 2011-12-21
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism. JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation below for more details.

Note: If you use JELLYFISH in your research, please cite: Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 (first published online January 7, 2011) doi:10.1093/bioinformatics/btr011
Run Unix: jellyfish
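
To make the definition of k-mer counting above concrete, here is a minimal Python sketch that counts every substring of length k in a single sequence. It uses a plain in-memory `Counter` and only illustrates the task. It does not reproduce JELLYFISH's compact hash-table encoding or its lock-free parallelism, and the function name is hypothetical.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count all overlapping substrings of length k in `sequence`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers
    # (none at all when the sequence is shorter than k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


if __name__ == "__main__":
    for kmer, count in sorted(count_kmers("ACGTACG", 3).items()):
        print(kmer, count)
```

For the 7-base example, the five overlapping 3-mers are ACG, CGT, GTA, TAC and ACG, so ACG is counted twice.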

kraken

Version 0.10.5, updated 2015-11-25
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.

Note: If you use Kraken in your research, please cite our paper; the citation is available on the Kraken website.
Run Unix: kraken [options]

LAST

Version 861, updated 2017-06-02
LAST: Genome-Scale Sequence Comparison. LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads).


macs

Version 1.4.2, updated 2013-05-16
Next generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.

Run Unix: macs14 <-t tfile> [-n name] [-g genomesize] [options]

mapsembler

Version 1.3.21, updated 2012-05-31
Mapsembler is targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines whether each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then, for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.

Citation: Peterlongo, P., & Chikhi, R. (2012). Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics, 13(1), 48. doi:10.1186/1471-2105-13-48.
Run Unix: mapsembler [-m value] [-o output] [-k value] [-i value] [-e value] [-d value] [-t value] [-E value] [-Clrgfcvsh]

MapSplice

Version 1.15.2, updated 2012-01-26
MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. MapSplice is a second-generation algorithm for detecting alternative splice sites. Its goal is to detect splice sites with high sensitivity and specificity while remaining efficient in CPU and memory usage. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. It depends neither on splice-site features nor on intron length, so it can detect novel canonical and non-canonical splice sites. MapSplice relies on the quality and diversity of read alignments to increase the precision of splice-site detection.

Publication: MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. Kai Wang; Darshan Singh; Zheng Zeng; Stephen J. Coleman; Yan Huang; Gleb L. Savich; Xiaping He; Piotr Mieczkowski; Sara A. Grimm; Charles M. Perou; James N. MacLeod; Derek Y. Chiang; Jan F. Prins; Jinze Liu. Nucleic Acids Research 2010; doi: 10.1093/nar/gkq622
Run Unix: python /usr/local/genome/MapSplice_1.15.2/bin/mapsplice_segments.py MapSplice.cfg

maq

Version 0.7.1, updated 2014-10-02
Maq is a software that builds mapping assemblies from short reads generated by the next-generation sequencing machines. It is particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has a preliminary functionality to handle AB SOLiD data.

Run Unix: maq

metagene

Version not given, updated 2007-05-04
Gene Finding Program for Metagenomics MetaGene predicts prokaryotic genes on anonymous genomic sequences. Fragmented sequences (longer than 100 bp) can be accepted.

Run Unix: metagene [multi-fasta]

MetaGeneAnnotator

Version not given, updated 2009-01-26
Improved version of MetaGene, the gene annotation program for metagenomic data. It predicts prokaryotic genes from a genome or a set of anonymous genomes, and is particularly suited to metagenomic analyses.

Run Unix: metageneannotator

minia

Version 1.4683, updated 2013-02-21
Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).

Citation: R. Chikhi, G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter. WABI 2012.
Run Unix: minia fasta_file kmer_size min_abundance estimated_genome_size prefix

mira

Version 4.0, updated 2014-11-18
MIRA is a Whole Genome Shotgun and EST Sequence Assembler for Sanger, 454 and Solexa / Illumina. It can perform Hybrid de-novo assemblies as well as SNP and mutations discovery for mapping assemblies.

Run Unix: mira

MIReNA

Version 2.0, updated 2012-09-05


mmseq

Version 0.11.2, updated 2012-11-20
MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. The MMSEQ pipeline obtains expression estimates from RNA-seq data. There are two routes, with starting points labelled A and B. Route A is quite fast and straightforward to run and uses pre-existing transcript sequences for alignment. Route B requires more time, as it involves the creation of custom transcript sequences based on the data.

Note: Please cite Turro et al. 2011 (Genome Biology) if you use MMSEQ in your work. http://dx.doi.org/10.1186/gb-2011-12-2-r13
Run Unix: mmseq [OPTIONS...] hits_file output_base

MMSEQ

Version 1.0.2, updated 2013-09-02
MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads

Note: Please cite Turro et al. 2011 (http://dx.doi.org/10.1186/gb-2011-12-2-r13)
Run Unix: mmseq / bam2hits

MOCAT

Version 1.1, updated 2012-08-01
MOCAT is a package for analyzing metagenomics datasets. Currently MOCAT supports Illumina single- and paired-end reads in raw FastQ format.

Note: Jens Roat Kultima & Shinichi Sunagawa (Bork Group, EMBL)
Run Unix: MOCAT.pl -sf|sample_file 'FILE' [Pipeline, Statistics, & Additional Options]

MOSAIK assembler

Version 1.1.0021, updated 2011-06-06
MOSAIK is a reference-guided assembler comprising four main modular programs: MosaikBuild, MosaikAligner, MosaikSort and MosaikAssembler. MosaikBuild converts various sequence formats into Mosaik's native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment, which is then saved into an assembly file format.

Run Unix: MosaikAligner, MosaikAssembler, MosaikBuild, MosaikCoverage, MosaikDupSnoop, MosaikJump, MosaikMerge, MosaikSort, MosaikText

MPscan

Version not given, updated 2013-08-26
MPscan: fast localisation of multiple reads in genomes

Note: Please cite this paper if you use MPscan: Rivals E., Salmela L., Kiiskinen P., Kalsi P., Tarhio J. Lecture Notes in BioInformatics (LNBI), Springer-Verlag, Vol. 5724, p. 246-260, 2009.
Run Unix: mpscan -h

mrFAST

Version 2.6.0.0, updated 2012-02-01
mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to a user-defined error threshold, including indels up to 4+4 bp. This manual describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find 'all' mappings for a given set of reads; however, it can return one "best" map location if the relevant parameter is invoked. NOTE: mrFAST is developed for Illumina, and thus requires all reads to be of the same length. For paired-end reads, the lengths of mates may differ from each other, but each "side" should have a uniform length.

Note: Personalized copy number and segmental duplication maps using next-generation sequencing. Can Alkan, Jeffrey M. Kidd, Tomas Marques-Bonet, Gozde Aksay, Francesca Antonacci, Fereydoun Hormozdiari, Jacob O. Kitzman, Carl Baker, Maika Malig, Onur Mutlu, S. Cenk Sahinalp, Richard A. Gibbs, Evan E. Eichler. Nature Genetics, Oct, 41(10):1061-1067, 2009.

Sample Set

A sample genome FASTA file, with simulated reads and a command line to map in paired-end mode, is supplied. Please download the sample set.

General

Please download the latest version from our download page and then unzip the downloaded file. Run 'make' to build mrFAST. mrFAST generates an index of the reference genome(s) and maps the reads to the reference genome.

Requirements:
* zlib, for the ability to read compressed FASTQ and write compressed SAM files.
* A C compiler (mrFAST is developed with gcc versions > 4.1.2).

Building: On Unix/Linux systems, we recommend using GNU gcc version > 4.1.2 as your compiler and typing 'make' to build. Example:

    linux> make
    gcc -c -O3 baseFAST.c -o baseFAST.o
    gcc -c -O3 CommandLineParser.c -o CommandLineParser.o
    gcc -c -O3 Common.c -o Common.o
    gcc -c -O3 HashTable.c -o HashTable.o
    gcc -c -O3 MrFAST.c -o MrFAST.o
    gcc -c -O3 Output.c -o Output.o
    gcc -c -O3 Reads.c -o Reads.o
    gcc -c -O3 RefGenome.c -o RefGenome.o
    gcc baseFAST.o CommandLineParser.o Common.o HashTable.o MrFAST.o Output.o Reads.o RefGenome.o -o mrFAST -lz -lm
    rm -rf *.o

Parallelization: The best way to optimize mrFAST is to split the reads into chunks that fit into the memory of the cluster nodes, and implement an MPI wrapper in an embarrassingly parallel fashion.
We recommend the following criteria to split the reads:

Single End Mode: The number of reads should be approximately ((M-600)/(4*L)) million, where M is the memory size of the cluster node (in megabytes) and L is the read length. If you have more nodes, you can make the chunks smaller to use the nodes efficiently. For example, if the read length is 50 bp and the memory of the nodes is 2 GB, each chunk should contain (2000-600)/(4*50) = 7 million reads.

Paired End Mode: The number of reads in each file should not exceed 1 million (500,000 pairs); however, a chunk size of 500,000 reads (250,000 pairs) is recommended.

To see the list of options, use "-h" or "--help". To see the version of mrFAST, use "-v" or "--version".

Indexing
mrFAST's indices can be generated in two modes (single, batch). In single mode, mrFAST indexes a fasta file (which may contain one or more reference genomes), while in batch mode it indexes a set of fasta files. By default, mrFAST uses a window size of 12 characters to generate its index. Please be advised that if you do not choose the window size carefully, you will lose sensitivity.

How to choose the right window size: For a given read length (l) and error threshold (e), the window size is floor(l/(e+1)). For example, if the read length is 36 and the maximum number of mismatches allowed is 2, the window size is 12. If your calculated window size is greater than the default, you can use the default window size without losing sensitivity. For example, for a read length of 64 and an error threshold of 2, the window size would be 21, so you can use the default window size of 12. However, you cannot use 12 as the window size for a read length of 30 and an error threshold of 2.

Single Genome Mode
To index a reference genome like "refgen.fasta", run the following command:

    $>./mrfast --index refgen.fasta

Upon completion of the indexing phase, you can find "refgen.fasta.index" in the same directory as "refgen.fasta".
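The two sizing rules from the manual (reads per chunk in single-end mode, and the index window size) are simple arithmetic. The sketch below restates them in Python so they can be checked against the worked examples in the text. The function names are hypothetical helpers, not part of mrFAST, and `suggested_window` encodes one reading of the manual (use the default of 12 whenever the computed window is at least 12, otherwise the computed value).

```python
DEFAULT_WINDOW = 12  # default index window size stated in the manual


def chunk_size_reads(memory_mb: int, read_length: int) -> int:
    """Approximate reads per chunk in single-end mode: ((M-600)/(4*L)) million."""
    if memory_mb <= 600:
        raise ValueError("node memory must exceed 600 MB for the formula to be positive")
    # Integer arithmetic: multiply first, then floor-divide.
    return (memory_mb - 600) * 1_000_000 // (4 * read_length)


def required_window(read_length: int, max_errors: int) -> int:
    """Window size floor(l/(e+1)) for read length l and error threshold e."""
    return read_length // (max_errors + 1)


def default_window_ok(read_length: int, max_errors: int) -> bool:
    """True when the default window (12) can be used without losing sensitivity."""
    return required_window(read_length, max_errors) >= DEFAULT_WINDOW


def suggested_window(read_length: int, max_errors: int) -> int:
    """Assumed policy: keep the default unless the computed window is smaller."""
    return min(required_window(read_length, max_errors), DEFAULT_WINDOW)


if __name__ == "__main__":
    print(chunk_size_reads(2000, 50))  # the manual's 2 GB / 50 bp example
    print(required_window(36, 2), required_window(64, 2), required_window(30, 2))
```

For reads of length 30 with an error threshold of 2, the computed window is smaller than the default, so the index would have to be built with a matching "--ws" value.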
mrFAST uses a window size of 12 (default) to build the index of the genome; this window size can be modified with "--ws". There is a restriction on the maximum window size, as the window size directly affects memory usage.

    $>./mrfast --index refgen.fasta --ws 13

Batch Mode
In batch mode, mrFAST gets a list of reference files and generates the index for each one of them. As in single mode, you can specify a different window size for indexing.

    $>./mrfast -b --index fasts.list --ws 13

Mapping
mrFAST can map single-end reads and paired-end reads to a reference genome. mrFAST can map in either single or batch mode. In single mode, it only maps to one index. In batch mode, it maps to a list of indices. mrFAST supports both fasta and fastq formats.

Single-end Reads - Single Mode
To map single-end reads to a reference genome in single mode, run the following command. Use "--seq" to specify the input file. refgen.fa and refgen.fa.index should be in the same folder. You can load a multi-sequence FASTA file as the reference genome.

    $>./mrfast --search refgen.fa --seq reads.fastq

The reported locations will be saved into "output" by default. If you want to save them somewhere else, use "-o" to specify another file. mrFAST can report the unmapped reads in fasta/fastq format.

    $>./mrfast --search refgen.fasta --seq reads.fastq -o my.map

By default, mrFAST reports all the locations per read. If you need one "best" mapping, add the "--best" parameter to the command line:

    $>./mrfast --search refgen.fasta --seq reads.fastq -e 3 --best

Single-end Reads - Batch Mode (Note: deprecated after version 2.1.0.6)
In batch mode, mrFAST uses a list of indices to find the mappings of the reads. "index.list" should contain the list of fasta files.

    $>./mrfast -b --search index.list --seq reads.fastq

Paired-end Reads
To map paired-end reads, use the "--pe" option. The mapping can be done in single/batch mode.
If the reads are in two different files, you have to use "--seq1/--seq2" to indicate the files. If the reads are interleaved, use "--seq" to indicate the file. The distance allowed between the paired-end reads should be specified with "--min" and "--max", which give the minimum and maximum of the inferred size (the distance between the outer edges of the mapped mates).

    $>./mrfast --search refgen.fasta --pe --seq reads.fastq --min 150 --max 250

Discordant Mapping
mrFAST can report discordant mappings for use with Variation Hunter. The --min and --max options define the minimum and maximum inferred size for concordant mapping. This is enabled by default since version 2.1.0.6.

    $>./mrfast --search refgen.fasta --pe --discordant-vh --seq reads.fastq --min 50 --max 75

Parameters

General Options:
    -v|--version      Shows the current version.
    -h                Shows the help screen.

Indexing Options:
    --index [file]    Generate an index from the specified fasta file.
    -b                Indicates the indexing will be done in batch mode. The file specified in --index should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
    --ws [int]        Set window size for indexing (default:12 max:14).

Searching Options:
    --search [file]   Search the specified genome. The index file should be in the same directory as the fasta file.
    -b                Indicates the mapping will be done in batch mode. The file specified in --search should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
    --pe              Search will be done in paired-end mode.
    --mp              Search will be done in matepair mode.
    --seq [file]      Input sequences in fasta/fastq format [file]. If paired-end reads are interleaved, use this option.
    --seq1 [file]     Input sequences in fasta/fastq format [file] (first file). Use this option to indicate the first file of paired-end reads.
    --seq2 [file]     Input sequences in fasta/fastq format [file] (second file). Use this option to indicate the second file of paired-end reads.
    -o [file]         Output of the mapped sequences (SAM format). The default is "output".
    -u [file]         FASTA/FASTQ file for the unmapped sequences. The default is "unmapped".
    -e [int]          Maximum allowed edit distance (default 4% of the read length). Note that although the current version is limited to indels of up to 4+4, it supports any number of substitution errors.
    --min [int]       Min inferred distance allowed between two paired-end sequences.
    --max [int]       Max inferred distance allowed between two paired-end sequences.
    --discordant-vh   Return all discordant map locations ready for the Variation Hunter program, and OEA map locations ready for NovelSeq.
    --best            Return "best" location only (single-end mode).
    --seqcomp         Indicates that the input sequences are compressed (gz).
    --outcomp         Indicates that the output file should be compressed (gz).
    --maxoea [int]    Max number of One End Anchored (OEA) locations returned for each read pair. A minimum of 100 is recommended for NovelSeq use.
    --maxdis [int]    Max number of discordant map locations returned for each read pair.
    --crop [int]      Crop the input reads at position [int].
    --sample [string] Sample name to be added to the SAM header (optional).
    --rg [string]     Read group ID to be added to the SAM header (optional).
    --lib [string]    Library name to be added to the SAM header (optional).

Output Files

Single-End Mode: In single-end mode, mrFAST will generate two files as specified by the "-o" and "-u" parameters. The default filename if the "-o" parameter is not specified is "output", and the default filename for the "-u" parameter is "unmapped".

output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. mrFAST returns all possible map locations within the given edit distance ("-e") by default. If the "--best" parameter is invoked, it will select one "best" location that has the minimum edit distance to the genome.
unmapped file ("-u"): Contains the unmapped reads in FASTQ or FASTA format, depending on the format of the input sequences.

Paired-End and Matepair Modes: In paired-end and matepair modes, mrFAST will generate a SAM file that stores the best mapping locations while utilizing the paired-end span information. In addition, it will generate a DIVET file and an OEA file (SAM format). See below:

output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. This file will include:
    If a read pair can be mapped concordantly, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.
    If the read pair cannot be mapped concordantly, again, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.

unmapped file ("-u"): Contains the orphan (both ends unmapped) reads in FASTQ or FASTA format, depending on the format of the input sequences.

output.DIVET.vh file (the "-o" option changes the prefix "output"): This file includes all possible map locations for the read pairs that cannot be concordantly mapped. This file can be loaded by the VariationHunter tool for structural variation discovery.

output_OEA file: Contains the OEA (One-End-Anchored) reads (paired-end reads where only one read can be mapped to the genome). The output is in SAM format and contains the map location of the read that can be mapped to the genome. The unmapped reads of an OEA read pair are not reported on separate lines; instead, the sequence and quality information is given on the line that specifies the map location of the mapped read. We use the optional fields NS and NQ to specify the unmapped sequence and unmapped quality information. This file can be loaded by the NovelSeq tool for novel sequence discovery; however, format conversion might be required. Please see the NovelSeq documentation.
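As an illustration of the OEA layout described above, here is a minimal Python sketch that pulls the unmapped mate's sequence and quality out of the NS and NQ optional fields of one SAM line. It relies only on the general SAM convention that optional fields follow the 11 mandatory columns as TAG:TYPE:VALUE. The example line is made up for illustration, and the `Z` (string) type shown for NS and NQ is an assumption, since the manual does not state the field type.

```python
def parse_oea_line(sam_line: str) -> dict:
    """Extract the mapped read and its unmapped mate from one OEA SAM line."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    tags = {}
    for optional in fields[11:]:
        # TAG:TYPE:VALUE; maxsplit=2 keeps any ':' inside a quality string intact.
        tag, _type, value = optional.split(":", 2)
        tags[tag] = value
    return {
        "read_name": fields[0],
        "reference": fields[2],
        "position": int(fields[3]),
        "mapped_seq": fields[9],
        "unmapped_seq": tags.get("NS"),   # None if the tag is absent
        "unmapped_qual": tags.get("NQ"),
    }


if __name__ == "__main__":
    # Hypothetical line: flag 73 = paired + mate unmapped + first in pair.
    example = "\t".join([
        "read1", "73", "chr1", "100", "255", "8M", "*", "0", "0",
        "ACGTACGT", "IIIIIIII", "NS:Z:TTGGCCAA", "NQ:Z:HHHH:HHH",
    ])
    print(parse_oea_line(example))
```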
NOTE: mrFAST will report many (up to 100 by default) possible map locations for the "mapped" read of OEA matepairs. This will generate a large file due to repeats and duplications. The size of this file can be limited through the --maxoea parameter (version 2.1.0.0 and above).

Output Format
mrFAST's mapping output is in SAM format. For details about the definition of the fields, please refer to the SAM Manual. We have not implemented the "MQUAL" field yet. All locations of discordant paired-end reads will be reported in DIVET format, as required by the VariationHunter package. Unmapped reads (or "orphan" read pairs in PE mode) will be output in FASTQ or FASTA format, depending on the input sequence file format.
Run Unix # mrfast [options]Run Web #
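The index and paired-end search commands shown in the mrFAST manual can be assembled programmatically, for example when scripting one run per read chunk. The sketch below only builds argument lists and does not execute anything. It uses only flags that appear in the manual above; the function names and the default "./mrfast" path are assumptions for illustration.

```python
import shlex


def index_command(reference, window_size=None, mrfast="./mrfast"):
    """Build the argv for indexing one reference fasta file."""
    cmd = [mrfast, "--index", reference]
    if window_size is not None:
        cmd += ["--ws", str(window_size)]
    return cmd


def paired_search_command(reference, min_size, max_size, seq=None, seq1=None,
                          seq2=None, output=None, discordant_vh=False,
                          mrfast="./mrfast"):
    """Build the argv for a paired-end search (interleaved or two-file input)."""
    if seq is not None and (seq1 is not None or seq2 is not None):
        raise ValueError("give either seq (interleaved) or seq1/seq2, not both")
    if seq is None and (seq1 is None or seq2 is None):
        raise ValueError("two-file input needs both seq1 and seq2")
    cmd = [mrfast, "--search", reference, "--pe"]
    if discordant_vh:
        cmd.append("--discordant-vh")
    if seq is not None:
        cmd += ["--seq", seq]
    else:
        cmd += ["--seq1", seq1, "--seq2", seq2]
    cmd += ["--min", str(min_size), "--max", str(max_size)]
    if output is not None:
        cmd += ["-o", output]
    return cmd


if __name__ == "__main__":
    print(shlex.join(index_command("refgen.fasta", window_size=13)))
    print(shlex.join(paired_search_command("refgen.fasta", 150, 250,
                                           seq="reads.fastq")))
```

The returned lists can be passed directly to `subprocess.run(cmd, check=True)` once the mrfast binary and the input files are in place.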

VersionMAJ

mrsFAST

2.5.0.42012-02-01DownloadDoc
mrsFAST is a cache-oblivious mapper designed to map short reads to a reference genome. mrsFAST maps short reads with respect to a user-defined error threshold. In this manual, we will show how to choose the parameters and tune mrsFAST with respect to the library settings. mrsFAST is designed to find 'all' the mappings for a given set of reads.

Remarque
Run Unix # mrsFAST -hRun Web #

VersionMAJ

multiqc

0.82016-11-09DownloadDoc
summarize analysis results for multiple tools and samples in a single report

Remarque
Run Unix # multiqc_envRun Web #

VersionMAJ

nesoni

0.402011-01-31DownloadDoc
Nesoni focusses on analysing the alignment of reads to a reference genome. We use the SHRiMP read aligner, as it is able to detect small insertions and deletions in addition to SNPs. Nesoni can call a consensus of read alignments, taking care to indicate ambiguity. This can then be used in various ways: to determine the protein-level changes resulting from SNPs and indels, to find differences between multiple strains, or to produce n-way comparison data suitable for phylogenetic analysis in SplitsTree4. Alternatively, the raw counts of bases at each position in the reference seen in two different sequenced strains can be compared using Fisher's Exact Test.

Remarque
Run Unix # nesoniRun Web #

VersionMAJ

newbler

2.62011-07-06DownloadDoc
Newbler is a package of three data analysis applications made by Roche 454: the GS De Novo Assembler (with or without contig scaffolding using Paired End reads), the GS Reference Mapper, and the GS Amplicon Variant Analyzer (AVA). An additional application, the GS Run Browser, is an interactive Run browser/troubleshooting tool that graphically displays the images, some intermediate data, and various output metrics from a sequencing Run. The software package also includes the SFF Tools commands for handling and using the data files (called Standard Flowgram Format or SFF files) that hold the sequencing trace data.

Remarque
Run Unix # newblerRun Web #

VersionMAJ

NGSToolsMIG

1.02011-02-04DownloadDoc
Tools developed in the MIG laboratory to help with the analysis of next-generation sequencing data: quality control, mapping, assembly, global statistics, etc. Available scripts (cf. Doc): adaptiveTrim.pl, alignmentStatistics.pl, contigsExtractionOnLength.pl, fastqQualityConverter.pl, gbk2Fasta.pl, globalTrim.pl, multiFasta2Fasta.pl, show2Fasta.pl, unmappedReadsExtraction.pl.

Remarque
Run Unix # ex.: contigsExtractionOnLength.pl -i fichier.fasta -do /Dir1/Dir11/Dir111/ -po fichierFiltre -l 1500 -rRun Web #

VersionMAJ

novoalign

2.08.012013-08-20DownloadDoc
Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds globally optimal alignments using the full Needleman-Wunsch algorithm with affine gap penalties.

Remarque
Run Unix # novoalign [options]Run Web #
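To make "Needleman-Wunsch with affine gap penalties" concrete, the following is a minimal, generic sketch of global alignment scoring with separate gap-open and gap-extend costs (the three-matrix formulation often attributed to Gotoh). It is not Novoalign's implementation: the scoring values are placeholder assumptions, and it returns only the optimal score, not the alignment itself.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Optimal global alignment score; a gap of length n costs open + (n-1)*extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap from M or Y, or extend an existing gap in X.
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            # Open a new gap from M or X, or extend an existing gap in Y.
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_global_score("ACGT", "ACGT"))
    print(affine_global_score("ACGT", "AGT"))
```

With affine penalties, one gap of length two is cheaper than two separate gaps of length one, which is why this scheme tends to group indels together.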

VersionMAJ

pindel

0.2.5a82015-02-11DownloadDoc
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

Remarque Cite Pindel: Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009 Nov 1;25(21):2865-71. Epub 2009 Jun 26.
Run Unix # Usage: pindel -f -p [and/or -i bam_configuration_file] -c -o Run Web #

VersionMAJ

poretools

0.6.02016-11-30DownloadDoc
poretools: a toolkit for working with nanopore sequencing data from Oxford Nanopore. The MinION (TM) from Oxford Nanopore Technologies (ONT) is the first nanopore sequencer to be commercialised and is now available to early-access users. The MinION (TM) is a USB-connected, portable nanopore sequencer which permits real-time analysis of streaming event data. Currently, the research community lacks a standardized toolkit for the analysis of nanopore datasets.

Remarque
Run Unix # poretools_envRun Web #

VersionMAJ

prinseq

0.17.12013-08-20DownloadDoc
PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format.

Remarque
Run Unix # prinseq-lite.pl -hRun Web #

VersionMAJ

ProbeMatch

-2010-05-11DownloadDoc
ProbeMatch is a sequence alignment program that finds sequence alignments for short DNA sequences (36-50 bp). Unlike other programs such as eland and soap, which perform ungapped alignment allowing up to 2 substitutions, ProbeMatch performs *gapped* alignment, allowing up to 3 errors including substitutions, insertions, and deletions.

Remarque
Run Unix # probematch [options] or # probematch --help Run Web #

VersionMAJ

prokka

1.102014-11-02DownloadDoc
Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and it scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and, ultimately, submission to GenBank/DDBJ/ENA.

Remarque
Run Unix # prokka [options] Run Web #

VersionMAJ

Quake

0.3.52014-10-02DownloadDoc
Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. Quake adopts the k-mer error correction framework, first introduced by the EULER genome assembly package. Unlike EULER and similar programs, Quake utilizes a robust mixture model of erroneous and genuine k-mer distributions to determine where errors are located. Quake then uses read quality values and learns the nucleotide-to-nucleotide error rates to determine what types of errors are most likely. This leads to more corrections and greater accuracy, especially with respect to avoiding mis-corrections, which create false sequence dissimilar to anything in the original genome sequence from which the read was taken.

Remarque Kelley DR, Schatz MC, Salzberg SL. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116 2010. (http://genomebiology.com/2010/11/11/R116/abstract)
Run Unix # quake.py --helpRun Web # 0.3.5
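The k-mer framework mentioned in the Quake description rests on counting how often each k-mer occurs across the reads: with deep coverage, genuine k-mers are seen many times, while k-mers containing a sequencing error are rare. The sketch below shows only that counting step with a fixed, hand-picked cutoff. This is a deliberate simplification and an assumption of this example: Quake itself is described as fitting a mixture model and using quality values rather than a fixed threshold.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmers(counts, cutoff):
    """K-mers seen fewer than `cutoff` times (candidate error k-mers)."""
    return {kmer for kmer, n in counts.items() if n < cutoff}


if __name__ == "__main__":
    # Five identical toy reads plus one read with a single substitution.
    reads = ["ACGTA"] * 5 + ["ACGAA"]
    counts = kmer_counts(reads, k=3)
    print(sorted(weak_kmers(counts, cutoff=2)))
```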

VersionMAJ

quip

1.1.42013-02-21DownloadDoc
Quip compresses next-generation sequencing data with extreme prejudice. It supports input and output in the FASTQ and SAM/BAM formats, compressing large datasets to as little as 15% of their original size.

Remarque Compression of next-generation sequencing reads aided by highly efficient de novo assembly Daniel C. Jones; Walter L. Ruzzo; Xinxia Peng; Michael G. Katze — Nucleic Acids Research 2012; doi: 10.1093/nar/gks754
Run Unix # quip Run Web #

VersionMAJ

ray

2.3.12014-06-19DownloadDoc
Ray is a parallel de novo genome assembler that utilises the message-passing interface everywhere and is implemented using peer-to-peer communication.

Remarque Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Sébastien Boisvert, François Laviolette, and Jacques Corbeil. Journal of Computational Biology (Mary Ann Liebert, Inc. publishers). November 2010, 17(11): 1519-1533. doi:10.1089/cmb.2009.0238
Run Unix # Ray -help Run Web #

VersionMAJ

ReAS

2.022011-06-30DownloadDoc
ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun

Remarque http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.0010043
Run Unix # Run Web #

VersionMAJ

reptile

2.02012-05-03DownloadDoc
Reptile is software developed in C++ for correcting sequencing errors in short reads from next-gen sequencing platforms.

Remarque
Run Unix # reptile-ompRun Web #

VersionMAJ

rna2map

0.5.02009-09-10DownloadDoc
The SOLiD System Small RNA Analysis Pipeline Tool (RNA2MAP) can be used to perform whole genome analysis of color space RNA library reads. It consists of three major procedures: filtering, matching against miRBase sequences (Sanger), and matching against a reference genome.

Remarque
Run Unix # Run Web #

VersionMAJ

RUM

1.12.012012-05-03DownloadDoc
RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data.

Remarque Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Mapper (RUM) Gregory R. Grant, Michael H. Farkas, Angel Pizarro, Nicholas Lahens, Jonathan Schug, Brian Brunk, Christian J. Stoeckert Jr, John B. Hogenesch and Eric A. Pierce.
Run Unix # RUM_runner.pl [options] Run Web #

VersionMAJ

samToFastq

1.62(1113) 2012-02-17DownloadDoc
Extracts read sequences and qualities from the input SAM/BAM file and writes them into the output file in Sanger fastq format. In the RC mode (default is True), if the read is aligned and the alignment is to the reverse strand of the genome, the read's sequence from the input SAM file will be reverse-complemented prior to writing it to fastq, in order to correctly restore the original read sequence as it was generated by the sequencer.

Remarque
Run Unix # Run Web #
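The RC mode described for samToFastq can be sketched in a few lines of Python. The example assumes the standard SAM flag bits (0x4 for an unmapped read, 0x10 for a read aligned to the reverse strand) and also reverses the quality string, since SAM stores qualities in the same orientation as the sequence. The function names are hypothetical; this is not the tool's own code.

```python
UNMAPPED = 0x4
REVERSE_STRAND = 0x10
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_read(seq, qual, flag, rc=True):
    """Return (sequence, quality) as the sequencer produced them."""
    aligned = not (flag & UNMAPPED)
    if rc and aligned and (flag & REVERSE_STRAND):
        # Reverse-complement the bases and reverse the per-base qualities.
        return seq.translate(_COMPLEMENT)[::-1], qual[::-1]
    return seq, qual


def to_fastq(name, seq, qual, flag, rc=True):
    """Format one SAM record as a four-line FASTQ entry."""
    s, q = original_read(seq, qual, flag, rc)
    return f"@{name}\n{s}\n+\n{q}\n"


if __name__ == "__main__":
    print(to_fastq("read1", "AACG", "ABCD", flag=16), end="")
```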

VersionMAJ

SAMtools

1.22015-04-15DownloadDoc
SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Remarque
Run Unix # samtools [options]Run Web #

VersionMAJ

seqmap

1.0.122009-02-19DownloadDoc
SeqMap is a tool for mapping large numbers of oligonucleotides to the genome. It is designed for finding all the places in a genome where an oligonucleotide could potentially come from. SeqMap can efficiently map as many as dozens of millions of short sequences to a genome of several billion nucleotides. During mapping, several mutations as well as insertions/deletions of nucleotide bases in the sequences can be tolerated and, furthermore, detected. Various input and output formats are supported, as well as many command line options for tuning almost every step in the mapping process.

Remarque Publication: http://dx.doi.org/10.1093/bioinformatics/btn429
Run Unix # seqmapRun Web #

VersionMAJ

seqtk

--2013-02-26DownloadDoc
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

Remarque
Run Unix # seqtk Run Web #
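The behaviour described for seqtk (one reader for FASTA and FASTQ, plain or gzip-compressed) can be illustrated with a short Python sketch. It is not seqtk's parser: it assumes four-line FASTQ records, reads the whole input into memory, and detects gzip by the two-byte magic number. The function names are hypothetical.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently decompressing it if it is gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_records(handle):
    """Yield (name, sequence, quality) tuples; quality is None for FASTA."""
    lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    if not lines:
        return
    if lines[0].startswith(">"):
        name, parts = None, []
        for line in lines:
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(parts), None
                name, parts = line[1:].strip(), []
            else:
                parts.append(line)  # FASTA sequences may span several lines
        yield name, "".join(parts), None
    elif lines[0].startswith("@"):
        # Assumes strict four-line FASTQ records: header, sequence, '+', quality.
        for i in range(0, len(lines) - 3, 4):
            yield lines[i][1:].strip(), lines[i + 1], lines[i + 3]
    else:
        raise ValueError("input does not look like FASTA or FASTQ")
```

A typical call would be `with open_maybe_gzip("reads.fastq.gz") as fh: records = list(read_records(fh))`, where the file name is only an example.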

VersionMAJ

sickle

1.2002013-02-26DownloadDoc
sickle - A windowed adaptive trimming tool for FASTQ files using quality

Remarque
Run Unix # sickle [options]Run Web #

VersionMAJ

snap

0.132012-07-16DownloadDoc
SNAP is a new sequence aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives SNAP up to 2x lower error rates than existing tools and lets it match larger mutations that they may miss.

Remarque Faster and More Accurate Sequence Alignment with SNAP. Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David Patterson, Scott Shenker, Ion Stoica, Richard M. Karp, and Taylor Sittler. arXiv:1111.5572v1, November 2011.
Run Unix # snapRun Web #

VersionMAJ

soap

2.202014-08-23DownloadDoc
SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program features fast and accurate alignment of huge amounts of short reads generated by the Illumina/Solexa Genome Analyzer. Compared to soap v1, it is one order of magnitude faster. It requires only 2 minutes to align one million single-end reads onto the human reference genome. Another notable improvement of SOAPaligner is that it now supports a wide range of read lengths.

Remarque To run SOAPaligner, first build index files for the reference genome (2bwt-builder), and then search reads against the formatted index files (soap).
Run Unix # soapRun Web #

VersionMAJ

soap.coverage

2.7.72011-12-14DownloadDoc
Utility for SOAP - soap.coverage can calculate sequencing coverage or physical coverage, as well as the duplication rate and details of specific blocks, for each segment and for the whole genome, using SOAP, BLAT, BLAST, BlastZ, MUMmer and MAQ alignment results, with multi-threading.

Remarque
Run Unix # soap.coverageRun Web #

VersionMAJ

SOAPdenovo

1.042010-08-23DownloadDoc
SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

Remarque
Run Unix # soapdenovo [option]Run Web #

VersionMAJ

SolexaQA

1.72011-05-30DownloadDoc
SolexaQA is a Perl-based software package that calculates quality statistics and creates visual representations of data quality from FASTQ files generated by Illumina second-generation sequencing technology (“Solexa”).

Remarque
Run Unix # SolexaQA.plRun Web #

VersionMAJ

SPAdes

3.9.02016-08-16DownloadDoc
SPAdes is a de Bruijn graph-based assembler. It integrates a read error corrector, a multiple k-mer de Bruijn graph assembler, an assembly merger, a scaffolder and a repeat resolver.

Remarque
Run Unix # spadesRun Web #
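For readers unfamiliar with the de Bruijn graph that SPAdes is described as building on, here is a minimal Python sketch of the basic construction for a single k: every k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. This illustrates the data structure only and is an assumption-level toy; it does not reflect SPAdes's multi-k strategy, error correction, or repeat resolution.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    # Two overlapping toy reads; walking the edges spells out ACGTA.
    print(de_bruijn_graph(["ACGT", "CGTA"], k=3))
```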

VersionMAJ

ssaha2

2.5.22014-08-20DownloadDoc
SSAHA (Sequence Search and Alignment by Hashing Algorithm) is an algorithm for very fast matching and alignment of DNA sequences. It achieves its fast search speed by encoding sequence information in a perfect hash function.

Remarque
Run Unix # ssaha2Run Web #
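The idea of fast matching through a hash of short words, as in the SSAHA description, can be sketched as follows. Each k-mer over A, C, G, T is encoded as a base-4 integer, which is a collision-free (perfect) hash for that alphabet; the reference is indexed by these codes, and a query votes for the reference offset where most of its k-mers agree. This is a simplified illustration under stated assumptions (overlapping reference k-mers, no handling of N or other symbols), not SSAHA's actual implementation.

```python
from collections import Counter, defaultdict

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Encode a k-mer as a base-4 integer, 2 bits per base."""
    value = 0
    for base in kmer:
        value = value * 4 + _CODE[base]  # KeyError for bases outside ACGT
    return value


def build_index(reference, k):
    """Map each k-mer code to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[encode(reference[pos:pos + k])].append(pos)
    return index


def best_offset(index, query, k):
    """Return (reference offset, number of supporting k-mers), or None."""
    votes = Counter()
    for qpos in range(len(query) - k + 1):
        for rpos in index.get(encode(query[qpos:qpos + k]), []):
            votes[rpos - qpos] += 1
    return votes.most_common(1)[0] if votes else None


if __name__ == "__main__":
    idx = build_index("TTACGGATCCA", k=3)
    print(best_offset(idx, "GGATC", k=3))
```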

VersionMAJ

ssake

3.22008-07-30DownloadDoc
SSAKE is a genomics application for assembling millions of very short DNA sequences. It is an easy-to-use, robust, reliable and tractable clustering algorithm for very short sequence reads, such as those generated by Illumina Ltd.

Remarque
Run Unix # ssake.plRun Web #

VersionMAJ

sspace

2.02013-02-21DownloadDoc
SSPACE is not a de novo assembler; it is used after a preassembly run. SSPACE is a script to extend and scaffold preassembled contigs using a number of mate-pair or paired-end libraries. It uses Bowtie to map all the reads to the preassembled contigs. Unmapped reads are used for extending, if desired, the preassembled contigs with the SSAKE assembler. Bowtie is then used again to map the reads to the extended contigs. Positions and orientations of the reads are stored and used for scaffolding. If both reads of a pair are found within the allowed distance, they are used for scaffolding to determine the orientation, pairing and ordering of the contigs.

Remarque
Run Unix # /usr/local/genome/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl Run Web #

VersionMAJ

stacks

0.99952012-08-21DownloadDoc
Stacks is a software pipeline for building loci out of a set of short-read sequenced samples. Stacks was developed for the purpose of building genetic maps from RAD-Tag Illumina sequence data, but can also be readily applied to population studies, and phylogeography.

Remarque Please cite this paper: J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011. [reprint]
Run Unix # Run Web #

VersionMAJ

tablet

1.14.10.202015-04-07DownloadDoc
Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.

Remarque
Run Unix # tabletRun Web #

VersionMAJ

tagdust

1.132013-09-13DownloadDoc
TagDust is a program to eliminate artifactual reads from next-generation sequencing data sets.

Remarque Lassmann T., et al. (2009) TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics.
Run Unix # tagdust [options] lib.fa read1.fa read2.fa ...Run Web #

VersionMAJ

TMAP

3.4.12013-10-25DownloadDoc
TMAP / Torrent Mapping Alignment Program - Alignment software for short and long nucleotide sequences produced by next-generation sequencing technologies.

Remarque
Run Unix # Run Web #

VersionMAJ

tophat

2.0.92013-07-10DownloadDoc
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Remarque
Run Unix # tophat -hRun Web #

VersionMAJ

Trimmomatic

0.322014-01-06DownloadDoc
Trimmomatic: A flexible read trimming tool for Illumina NGS data. Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

Remarque
Run Unix # trimmomaticRun Web #

VersionMAJ

Trinity

2.2.02016-07-01DownloadDoc
RNA-Seq De novo Assembly Using Trinity Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.

Remarque
Run Unix # TrinityRun Web #

VersionMAJ

vcake

1.02008-07-30DownloadDoc
VCAKE is a genetic sequence assembler capable of assembling millions of small nucleotide reads even in the presence of sequencing error. This software is currently geared towards de novo assembly of Illumina's Solexa Sequencing data.

Remarque
Run Unix # perl -S vcake.plRun Web #

VersionMAJ

velvet

1.2.072013-08-07DownloadDoc
Sequence assembler for very short reads. Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Remarque
Run Unix # velveth # velvetgRun Web #
