Tool catalog, sorted alphabetically

Version	MAJ	abyss
1.5.2	2014-11-18	abyss	Download	Doc

ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

Remarque

Run Unix # Usage: ABYSS [OPTION]... FILE...

Run Web #

Version	MAJ	acnuc
none	2003-08-09	acnuc	Download	Doc

ACNUC lets users select sequences by many criteria from three sequence databases, translate protein-coding genes into protein, and extract selected sequences to user files. ACNUC is very efficient at providing direct access to coding regions (e.g. protein coding regions, tRNA or rRNA coding regions) of DNA fragments present in GenBank.

Remarque

Run Unix # acnuc, or xacnuc for the X11 version

Run Web #

Version	MAJ	agmial
	2004-10-12	agmial	Download	Doc

Agmial is a microbial genome annotation pipeline made of two independent modules. The first handles protein sequences, the second nucleic acid sequences. Agmial upholds the principle that the human expert must be at the center of the annotation process. To help annotators with this complex and time-consuming task, the system is designed to automate the annotation process as much as possible and to provide user-friendly interfaces. It implements an annotation strategy. The system can work on unfinished (draft) sequences and supports collaborative annotation by teams of annotators. It is built on computing standards (web services, relational database management systems, Java, ...) and bioinformatics standards. The system is distributed under the GPL license. Agmial is currently used by several INRA laboratories for the annotation or reannotation of genomes of agri-food interest.

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/public-agmial

Version	MAJ	align
2.0u	2003-11-28	align	Download	Doc

align and align0 compute a global alignment of two sequences.

Remarque

Run Unix # align or align0

Run Web #
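
To make the idea of a global alignment of two sequences concrete, here is a minimal sketch of the classic Needleman-Wunsch scoring recurrence. It is an illustration only: the scoring values (match, mismatch, gap) and the linear gap penalty are assumptions chosen for the example, and nothing here is taken from the align or align0 programs themselves.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a linear gap penalty (illustrative values)."""
    # prev[j] is the best score for aligning the already-processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Because the alignment is global, both sequences are scored end to end, so a length difference always costs at least one gap.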

Version	MAJ	ALLPATHS-LG
	2012-03-13	ALLPATHS-LG	Download	Doc

ALLPATHS-LG is a de Bruijn graph-based de novo assembler for large (and small) genomes. ALLPATHS-LG is being developed by scientists at the Broad Institute.

Remarque

Run Unix #

Run Web #

Version	MAJ	amos
3.1.0	2013-08-12	amos	Download	Doc

AMOS: A Modular Open-Source Assembler

Remarque

Run Unix #

Run Web #

Version	MAJ	AnovArray
1.1	2003-10-20	AnovArray	Download	Doc

AnovArray quantifies biological factors and technical biases, and identifies genes that are differentially expressed between several experimental conditions (two or more), for transcriptomic experiments from macroarrays and microarrays within a balanced factorial experimental design and a full model. The package is developed in SAS (statistical software) and therefore benefits from all the statistical procedures of that software. The statistical methods in the package are analysis of variance (ANOVA) and FDR-type (False Discovery Rate) multiple testing.

Remarque

Run Unix # Used within SAS

Run Web #
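
The entry mentions FDR-type multiple testing without naming a specific procedure. As a hedged illustration, the sketch below implements the Benjamini-Hochberg step-up procedure, one widely used FDR method; whether AnovArray uses this exact procedure is an assumption, and the function name and the default alpha are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k / m * alpha
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In a gene expression setting, each p-value would come from the per-gene ANOVA test, and the returned flags mark the genes called differentially expressed.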

Version	MAJ	apollo
1.11.8	2013-08-11	apollo	Download	Doc

Apollo is a genomic annotation viewer and editor. There are currently two branches of Apollo, one primarily used for genome browsing and maintained at Ensembl, and the other primarily used for genome annotation and maintained at the Berkeley Drosophila Genome Center. The latter is part of the GMOD project.

Remarque

Run Unix # apollo

Run Web #

Version	MAJ	arachne
3.1	2008-07-29	arachne	Download	Doc

Arachne is a tool for assembling genome sequences from whole genome shotgun reads, mostly in forward-reverse pairs obtained by sequencing clone ends.

Remarque

Run Unix #

Run Web #

Version	MAJ	arb
none	2003-08-22	arb	Download	Doc

The ARB software is a graphically oriented package comprising various tools for sequence database handling and data analysis. A central database of processed (aligned) sequences and any type of additional data linked to the respective sequence entries is structured according to phylogeny or other user defined criteria.

Remarque

Run Unix # arb

Run Web #

Version	MAJ	ART
ChocolateCherryCake	2015-04-30	ART	Download	Doc

ART is a set of simulation tools that generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles. ART supports simulation of single-end and paired-end/mate-pair reads for three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can be used to test or benchmark a variety of methods or tools for next-generation sequencing data analysis, including read alignment, de novo assembly, and SNP and structural variation discovery. ART was used as a primary tool for the simulation study of the 1000 Genomes Project. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN format. ART can also generate alignments in the SAM alignment or UCSC BED file format.

Remarque Citation: Weichun Huang, Leping Li, Jason R Myers, and Gabor T Marth. ART: a next-generation sequencing read simulator, Bioinformatics (2012) 28 (4): 593-594

Run Unix # README FILES in http://genome.jouy.inra.fr/doc/genome/NGS/ART

Run Web #

Version	MAJ	artemis
15.0	2013-08-07	artemis	Download	Doc

Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation.

Remarque

Run Unix # art

Run Web #

Version	MAJ	Artemis Comparison Tool
-	2015-07-15	Artemis Comparison Tool	Download	Doc

ACT is a free tool for displaying pairwise comparisons between two or more DNA sequences. It can be used to identify and analyse regions of similarity and difference between genomes and to explore conservation of synteny, in the context of the entire sequences and their annotation.

Remarque

Run Unix #

Run Web #

Version	MAJ	asium
2.21		asium	Download	Doc

Asium builds conceptual hierarchies (ontologies) from parsed text. It is paired with the LP2LP software, which converts Link Parser output into Asium input, and with a tool that converts its output to RDF.

Remarque

Run Unix #

Run Web #

Version	MAJ	augustus
2.7	2013-12-12	augustus	Download	Doc

AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It can be run on the AUGUSTUS web server, on a newer web server for larger input files, or downloaded and run locally. It is open source, so you can compile it for your computing platform. AUGUSTUS can also be run on the German MediGRID, which lets you submit larger sequence files and use protein homology information in the prediction. The MediGRID requires a quick registration by email for first-time users.

Remarque

Run Unix # augustus [parameters] --species=SPECIES queryfilename

Run Web #

Version	MAJ	autodock
4.2.6	2015-03-09	autodock	Download	Doc

AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.

Remarque

Run Unix # autodock4 and autogrid4

Run Web #

Version	MAJ	autodock_vina
1.1.2	2015-05-12	autodock_vina	Download	Doc

AutoDock Vina is a new program for drug discovery, molecular docking and virtual screening, offering multi-core capability, high performance and enhanced accuracy and ease of use.

Remarque O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry (in press)

Run Unix # vina --help

Run Web #

Version	MAJ	base
1.2.12	2004-08-12	base	Download	Doc

BioArray Software Environment (BASE) is a database for managing the large amount of data generated by microarray analyses. BASE manages biological information, raw data, and images. BASE also provides tools for data normalization, visualization, and analysis.

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/basejouy

Version	MAJ	bcftools
1.2	2015-04-15	bcftools	Download	Doc

bcftools: tools for variant calling and manipulating VCFs and BCFs.

Remarque

Run Unix # bcftools

Run Web #

Version	MAJ	BCM trace viewer
1.5		BCM trace viewer	Download	Doc

A Java application/applet to display .scf traces and phred quality values.

Remarque

Run Unix # bcm-trace-view -s { -q }

Run Web #

Version	MAJ	bedtools
2.16.2	2012-10-09	bedtools	Download	Doc

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together.

Remarque Please cite the following article if you use BEDTools in your research: Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.

Run Unix #

Run Web #
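
Finding feature overlaps, the first task named in this entry, reduces to an interval test. The sketch below is a naive pairwise version in Python, assuming BED-style 0-based, half-open coordinates; it is not how BEDTools is implemented, and the function name and tuple layout are hypothetical.

```python
def overlaps(a_features, b_features):
    """Yield (a, b) pairs of features that overlap on the same chromosome.

    Each feature is a (chrom, start, end) tuple using BED-style
    0-based, half-open coordinates.
    """
    for a_chrom, a_start, a_end in a_features:
        for b_chrom, b_start, b_end in b_features:
            # Half-open intervals overlap when each starts before the other ends
            if a_chrom == b_chrom and a_start < b_end and b_start < a_end:
                yield (a_chrom, a_start, a_end), (b_chrom, b_start, b_end)
```

With half-open coordinates, two features that merely touch (one ends at 200, the other starts at 200) do not overlap. The quadratic loop is fine for small inputs; large files call for sorted or indexed approaches.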

Version	MAJ	Beluga
		Beluga	Download	Doc

The core approach relies on techniques from machine learning (classification) and natural language processing, as well as on a sociological method called GST (Graphe SocioTechnique, sociotechnical graph), in order to build indicators of how innovation evolves from the terminology used over time.

Remarque

Run Unix #

Run Web #

Version	MAJ	bfast
0.7.0	2013-08-12	bfast	Download	Doc

BFAST: Blat-like Fast Accurate Search Tool. BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:
* Speed: enables billions of short reads to be mapped quickly.
* Accuracy: a priori probabilities for mapping reads with a defined set of variants.
* An easy way to measurably tune accuracy at the expense of speed.

Remarque

Run Unix # bfast [options]

Run Web #

Version	MAJ	bioprospector
2004	2014-01-01	bioprospector	Download	Doc

Program that searches DNA sequences for exceptional one- or two-block motifs (Gibbs sampler). Background sequences can be supplied. Input sequences must be shorter than 32765 nt, in FASTA format with the sequence on a single line (and a header of the form >sequence1 genename). Can search specifically for palindromes.

Remarque

Run Unix # BioProspector

Run Web #
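
The input constraints in this entry (one line per sequence, fewer than 32765 nt) are easy to violate with ordinary wrapped FASTA files. The following is a small preprocessing sketch in Python, not part of BioProspector; the function name is hypothetical, and it assumes the length limit applies to each sequence individually.

```python
MAX_LENGTH = 32765  # limit quoted in the catalog entry


def to_single_line_fasta(text, max_length=MAX_LENGTH):
    """Rewrite FASTA text so each record's sequence sits on a single line."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    for name, seq in records:
        # The entry asks for fewer than 32765 nt, so the limit itself is rejected
        if len(seq) >= max_length:
            raise ValueError(f"{name}: {len(seq)} nt, limit is {max_length}")
    return "".join(f"{name}\n{seq}\n" for name, seq in records)
```

Headers are passed through unchanged, so a header such as ">sequence1 geneA" keeps the form described above.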

Version	MAJ	bismark
0.14.3	2015-06-05	bismark	Download	Doc

Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.

Remarque

Run Unix # bismark [options] {-1 -2 | }

Run Web #

Version	MAJ	blast
2.2.26	2012-03-07	blast	Download	Doc

Remarque

Run Unix # blastall

Run Web # https://migale.jouy.inra.fr/?q=blast

Version	MAJ	blast+
2.2.31	2015-08-24	blast+	Download	Doc

The Basic Local Alignment Search Tool (BLAST) is the most widely used sequence similarity tool. There are versions of BLAST that compare protein queries to protein databases, nucleotide queries to nucleotide databases, as well as versions that translate nucleotide queries or databases in all six frames and compare to protein databases or queries. PSI-BLAST produces a position-specific scoring matrix (PSSM) starting with a protein query, and then uses that PSSM to perform further searches. It is also possible to compare a protein or nucleotide query to a database of PSSMs. The NCBI supports a BLAST web page at blast.ncbi.nlm.nih.gov as well as a network service. The NCBI also distributes stand-alone BLAST applications for users who wish to run BLAST on their own machines or with their own databases. This document describes the stand-alone BLAST applications and will concentrate on the latest generation of such applications included in the BLAST+ package.

Remarque

Run Unix # /usr/local/genome/ncbi-blast-2.2.31+/bin/

Run Web #

Version	MAJ	blat
34	2008-01-11	blat	Download	Doc

BLAT is a DNA/Protein Sequence Analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 22 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. In practice DNA BLAT works well on primates, and protein blat on land vertebrates.

Remarque

Run Unix # blat

Run Web #

Version	MAJ	BMGE
1.1	2012-12-19	BMGE	Download	Doc

BMGE (Block Mapping and Gathering with Entropy) is a program that selects regions in a multiple sequence alignment that are suited for phylogenetic inference. BMGE selects characters that are biologically relevant, thanks to the use of standard similarity matrices such as PAM or BLOSUM. Moreover, BMGE provides other character- or sequence-removal operations, such as stationary-based character trimming (which provides a subset of compositionally homogeneous characters) or removal of sequences containing too large a proportion of gaps. Finally, BMGE can simply be used to perform standard conversion operations among DNA-, codon-, RY- and amino acid-coding sequences.

Remarque

Run Unix # BMGE or BMGE -?

Run Web #

Version	MAJ	bowtie
1.1.2	2016-07-24	bowtie	Download	Doc

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).

Remarque

Run Unix # bowtie [options]* {-1 -2 | --12 | } []

Run Web #
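
The Burrows-Wheeler index mentioned in this entry is built on the Burrows-Wheeler transform. As a toy illustration only, the sketch below computes the transform by sorting all rotations of the text; this is far too slow and memory-hungry for a genome and is not how Bowtie builds its index. The function name and the "$" terminator are assumptions for the example.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (toy-sized inputs only)."""
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    # Sort every rotation of s, then read off the last column
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)
```

The transform tends to group identical characters together, which is what makes the resulting index compressible and searchable in a small memory footprint.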

Version	MAJ	bowtie2
2.2.5	2015-04-07	bowtie2	Download	Doc

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Remarque

Run Unix # bowtie2 [options]* -x {-1 -2 | -U } [-S ]

Run Web #

Version	MAJ	breakdancer
1.4.5	2015-03-06	breakdancer	Download	Doc

BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.

Remarque

Run Unix # Usage: breakdancer-max

Run Web #

Version	MAJ	bsmap
2.90	2015-03-17	bsmap	Download	Doc

BSMAP is a mapping program for short bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, which provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.

Remarque Citation: Xi Y, Li W: BSMAP: whole genome Bisulfite Sequence MAPping program. BMC Bioinformatics (2009) 10:232.

Run Unix # bsmap

Run Web #
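
The matching rule described in this entry (a T in the read may align to either a C or a T in the reference) can be sketched as an asymmetric comparison. This Python snippet is a simplified illustration, assuming an ungapped comparison on the forward strand only; it is not BSMAP's algorithm, and the function name is hypothetical.

```python
def bisulfite_match(read, reference):
    """Return True if the read matches the reference under bisulfite rules.

    A T in the read may pair with either C or T in the reference (an
    unmethylated C is read as T); every other base must match exactly.
    """
    if len(read) != len(reference):
        return False
    for r, g in zip(read.upper(), reference.upper()):
        if r == g:
            continue
        if r == "T" and g == "C":
            continue  # converted (unmethylated) cytosine
        return False
    return True
```

The rule is deliberately one-directional: a C in the read against a T in the reference is still a mismatch, because bisulfite treatment never turns T into C.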

Version	MAJ	buster
2.10.3 <2016-12-07>	2016-12-08	buster	Download	Doc

BUSTER structure refinement package. Includes the refine program for running BUSTER refinement and loads of useful utilities.

Remarque How to cite use of BUSTER : https://www.globalphasing.com/buster/wiki/index.cgi?BusterCite

Run Unix #

Run Web #

Version	MAJ	bwa
0.7.12	2015-04-07	bwa	Download	Doc

BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence, except for disallowing gaps close to the end of the query. It can also be tuned to find a fraction of longer gaps at the cost of speed and of more false alignments.

Remarque

Run Unix # bwa [options]

Run Web #
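
For readers unfamiliar with the "edit distance 2" criterion in this entry, the sketch below computes the standard Levenshtein edit distance (substitutions, insertions, and deletions each cost 1) by dynamic programming. It only illustrates the metric; BWA itself searches an index rather than comparing strings pairwise, and the function name is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

Under this metric, a read with one mismatch and one single-base deletion relative to the reference sits at distance 2, the default bound quoted above.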

Version	MAJ	CaliFlopp
3.0	2010-08-03	CaliFlopp	Download	Doc

CaliFloPP is a software that calculates flows of particles between pairs of polygons, when given a so-called individual dispersal function. The individual dispersal function describes the particle dispersion between pairs of points, and CaliFloPP deduces the total flows between pairs of polygons.

Remarque

Run Unix # califlopp -i polygons-filename [-p parameters-filename] [-r result-filename]

Run Web #

Version	MAJ	canu
1.3	2016-10-18	canu	Download	Doc

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION).

Remarque Citation: Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv. (2016).

Run Unix # canu

Run Web #

Version	MAJ	cap3
3.0	2014-05-06	cap3	Download	Doc

Similar to phrap, CAP3 takes individual sequences and assembles them into sequences.

Remarque

Run Unix # cap3

Run Web #

Version	MAJ	carthagene
1.2	2010-10-15	carthagene	Download	Doc

CarthaGène is genetic and radiation hybrid mapping software. CarthaGène looks for multiple-population maximum likelihood consensus maps using a fast EM algorithm for maximum likelihood estimation and powerful ordering algorithms.

Remarque

Run Unix # carthagene

Run Web #

Version	MAJ	CATCh
v1	2015-03-11	CATCh	Download	Doc

CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies

Remarque If you are going to use CATCh, please cite it together with the included software (Mothur, WEKA, RDP MultiClassifier 1.1 and DECIPHER):
* Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P. 2014. CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Under review.
* Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537-41.
* Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, et al. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations 11:10-18.
* Wang Q, Garrity GM, Tiedje JM, Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73(16):5261-7.
* Wright ES et al. (2012). DECIPHER, A Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Applied and Environmental Microbiology, doi:10.1128/AEM.06516-11.

Run Unix # CATCh.run

Run Web #

Version	MAJ	ccp4
6.3.0	2012-11-27	ccp4	Download	Doc

CCP4 exists to produce and support a world-leading, integrated suite of programs that allows researchers to determine macromolecular structures by X-ray crystallography, and other biophysical techniques. CCP4 aims to develop and support the development of cutting edge approaches to experimental determination and analysis of protein structure, and integrate these approaches into the suite. CCP4 is a community based resource that supports the widest possible researcher community, embracing academic, not for profit, and for profit research. CCP4 aims to play a key role in the education and training of scientists in experimental structural biology. It encourages the wide dissemination of new ideas, techniques and practice.

Remarque

Run Unix # ccp4i

Run Web #

Version	MAJ	cd-hit
4.6.1	2013-08-12	cd-hit	Download	Doc

CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a fasta format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output.

Remarque Usage example: cd-hit -n 5 -i /db/fasta/nr90/nr90.fsa -o nr80 -M 2048 -c 0.8 -u clstr.lastweek

Run Unix # cd-hit [Options]

Run Web #

Version	MAJ	cd-hit-454
-	2013-08-05	cd-hit-454	Download	Doc

454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cdhit-454 is a fast program that identifies exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cdhit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.

Remarque

Run Unix # cd-hit-454

Run Web # 4.6.1

Version	MAJ	Celera Assembler (wgs)
5.4	2009-10-29	Celera Assembler (wgs)	Download	Doc

Celera Assembler is scientific software for DNA research. It can reconstruct long sequences of genomic DNA from the fragmentary data produced by whole-genome shotgun sequencing. The Celera Assembler is mature, efficient, open-source software written mostly in C for unix operating systems.

Remarque This whole-genome shotgun (WGS) assembler software suite, also known as Celera Assembler, implements sophisticated algorithms for the reconstruction of genomic DNA sequence from data produced by a WGS sequencing experiment.

Run Unix #

Run Web #

Version	MAJ	censor
4.2.10	2008-07-02	censor	Download	Doc

CENSOR is a software tool which screens query sequences against a reference collection of repeats and "censors" (masks) homologous portions with masking symbols, as well as generating a report classifying all found repeats.

Remarque

Run Unix # censor

Run Web #

Version	MAJ	cgview
-	2011-05-27	cgview	Download	Doc

CGView is a Java package for generating high quality, zoomable maps of circular genomes. Its primary purpose is to serve as a component of sequence annotation pipelines, as a means of generating visual output suitable for the web. Feature information and rendering options are supplied to the program using an XML file, a tab delimited file, or an NCBI ptt file. CGView converts the input into a graphical map (PNG, JPG, or Scalable Vector Graphics format), complete with labels, a title, legends, and footnotes. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any web browser, allowing rapid genome browsing, and facilitating data sharing. The feature labels in maps can be hyperlinked to external resources, allowing CGView maps to be integrated with existing web site content or databases. For examples of the various output types, see the CGView gallery.

Remarque

Run Unix # cgview

Run Web #

Version	MAJ	circos
0.64	2013-01-20	circos	Download	Doc

Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.

Remarque

Run Unix # circos

Run Web #

Version	MAJ	class2g
1.0	2006-04-04	class2g	Download	Doc

Class2G classifies genes into two groups using a mixture model. Its main features are that each gene assignment comes with a probability, and that the analysis of a macroarray does not depend on a reference. Class2G is integrated into the BASE system (BioArray Software Environment) through a Perl plug-in and is developed in the R statistical environment. BASE provides a user-friendly web interface and a single environment for data storage and analysis. Class2G was used to detect genes present in and absent from E. faecalis in the analysis of about thirty macroarrays (P. Serror, INRA Jouy-en-Josas, UBLO).

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/basejouy
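
To illustrate how a mixture model can split genes into two groups and attach a probability to each assignment, here is a minimal sketch in Python: a two-component, one-dimensional Gaussian mixture fitted by EM. The Gaussian form, the initialization from the extreme values, and all names are assumptions made for the example; the entry does not describe Class2G's actual model, which is written in R.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_groups(values, iterations=50):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns, for each value, the posterior probability of belonging to
    the component that starts at the highest value.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]  # start the two components at the extremes
    spread = max((hi - lo) ** 2 / 4, 1e-6)
    variances = [spread, spread]
    weights = [0.5, 0.5]
    post = [0.5] * len(values)
    for _ in range(iterations):
        # E-step: posterior probability of the second component for each value
        post = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], variances[0])
            p1 = weights[1] * normal_pdf(x, means[1], variances[1])
            total = p0 + p1
            post.append(p1 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and variance of each component
        for k, resp in enumerate(([1 - p for p in post], post)):
            n_k = sum(resp)
            if n_k < 1e-9:
                continue  # component is empty; keep its previous parameters
            weights[k] = n_k / len(values)
            means[k] = sum(r * x for r, x in zip(resp, values)) / n_k
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, values)) / n_k
            variances[k] = max(var, 1e-6)  # floor avoids a zero variance
    return post
```

For hybridization signals, a posterior near 1 would suggest a gene in the "present" group and a posterior near 0 the "absent" group, with intermediate values flagging uncertain calls.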

Version	MAJ	CLC Sequence Viewer
6.4	2010-09-12	CLC Sequence Viewer	Download	Doc

A Sequence Viewer for basic bioinformatics. CLC Sequence Viewer creates a software environment enabling users to make a large number of bioinformatics analyses, combined with smooth data management, and excellent graphical viewing and output options.

Remarque

Run Unix # clcseqview6

Run Web #

Version	MAJ	clustal-omega
1.1.0	2012-07-17	clustal-omega	Download	Doc

Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It will also make use of multiple processors, where present. In addition, the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks.

Remarque Citing Clustal: Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7.

Run Unix # clustalo --help

Run Web #

Version	MAJ	clustalx
2.1	2013-12-29	clustalx	Download	Doc

Multiple sequence alignment program. It provides an integrated environment for performing multiple sequence and profile alignments and analysing the results.

Remarque

Run Unix # clustalx (graphical mode) or clustalw2 (command-line mode)

Run Web # http://www.ebi.ac.uk/Tools/msa/clustalw2/

Version	MAJ	cluster-3.0
3.0	2013-05-24	cluster-3.0	Download	Doc

The open source clustering software available here implements the most commonly used clustering methods for gene expression data analysis. The clustering methods can be used in several ways. Cluster 3.0 provides a graphical user interface to the clustering routines. It is available for Windows, Mac OS X, and Linux/Unix. Python users can access the clustering routines through Pycluster, an extension module for Python. People who want to use the clustering algorithms in their own C, C++, or Fortran programs can download the source code of the C Clustering Library.

Remarque

Run Unix # cluster

Run Web #

Version	MAJ	CNVnator
0.3	2015-02-13	CNVnator	Download	Doc

CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

Remarque

Run Unix # cnvnator

Run Web #

Version	MAJ	COLONY
2.0.6.3	2017-05-02	COLONY	Download	Doc

COLONY is a Fortran program written by Jinliang Wang. It implements a maximum likelihood method to assign sibship and parentage jointly, using individual multilocus genotypes at a number of codominant or dominant marker loci.

Remarque

Run Unix #

Run Web #

Version	MAJ	concaterpillar
1.5	2013-01-15	concaterpillar	Download	Doc

Concaterpillar is a hierarchical likelihood-ratio test for phylogenetic congruence.

Remarque If you use Concaterpillar for a publication please cite: Leigh JW, Susko E, Baumgartner M, Roger AJ. Testing congruence in phylogenomic analysis. Syst Biol. 2008 Feb; 57(1): 104-15.

Run Unix # concaterpillar.py

Run Web #

Version	MAJ	consed
22.0	2014-04-30	consed	Download	Doc

Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap. Finishing capabilities include allowing the user to pick primers and templates, suggesting additional sequencing reactions to perform, and facilitating checking the accuracy of the assembly using digest and forward/reverse pair information.

Remarque See also Autofinish (http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11282977s)

Run Unix # consed

Run Web #

Version	MAJ	consel
0.2	2012-08-23	consel	Download	Doc

CONSEL is a program package consisting of small programs written in C. It calculates the probability value (i.e., p-value) to assess the confidence in the selection problem. Although CONSEL is applicable to any selection problem, it is mainly designed for phylogenetic tree selection. CONSEL does not estimate the phylogenetic tree by itself; instead it reads the output of other phylogenetic packages, such as Molphy, PAML, PAUP*, TREE-PUZZLE, and PhyML. CONSEL calculates the p-value using several testing procedures: the bootstrap probability, the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, and the weighted Shimodaira-Hasegawa test. In addition to these conventional tests, CONSEL calculates the p-value based on the approximately unbiased test using the multi-scale bootstrap technique. This newly developed method gives less biased results than the conventional methods.

Remarque

Run Unix #

Run Web #

Version	MAJ	coot
0.7	2014-05-20	coot	Download	Doc

Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data. Coot displays maps and models and allows model manipulations such as idealization, real space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers, Ramachandran plots, skeletonization, non-crystallographic symmetry and more.

Remarque Citing Coot and Friends: if you have found this software useful, you are requested (if appropriate) to cite: "Features and Development of Coot", P Emsley, B Lohkamp, W Scott, and K Cowtan, Acta Crystallographica Section D: Biological Crystallography (2010) 66: 486-501.

Run Unix # coot

Run Web #

Version	MAJ	CopyRighter
0.46	2015-12-21	CopyRighter	Download	Doc

Parses microbial profiles and, because gene copy number (GCN) estimates are pre-computed for all taxa in the reference taxonomy, rapidly corrects GCN bias. The CopyRighter bioinformatic tool permits rapid correction of GCN in microbial surveys, resulting in improved estimates of microbial abundance and of alpha and beta diversity.

Remarque

Run Unix # copyrighter -i [optional arguments]

Run Web #
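
The arithmetic behind a gene copy number correction can be shown in a few lines: divide each taxon's read count by its copy number, then renormalize. The Python sketch below assumes this simple scheme and uses made-up taxon names and copy numbers; it is not CopyRighter's code, and the function name is hypothetical.

```python
def correct_gcn(read_counts, copy_numbers):
    """Divide each taxon's read count by its gene copy number, then renormalize.

    Returns relative abundances that sum to 1.
    """
    corrected = {
        taxon: count / copy_numbers[taxon]
        for taxon, count in read_counts.items()
    }
    total = sum(corrected.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    return {taxon: value / total for taxon, value in corrected.items()}
```

For example, a taxon with 700 reads and 7 gene copies ends up with the same corrected abundance as a taxon with 100 reads and a single copy, whereas the raw counts would suggest a sevenfold difference.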

Version	MAJ	corona
4.2.2	2009-09-10	corona	Download	Doc

The SOLiD System Analysis Pipeline Tool (Corona Lite) is an off-instrument SOLiD data analysis software package. It supports functionality for mapping color space reads to large or small genomes, pairing for mate-pair runs, SNP calling and generating consensus sequences.

Remarque

Run Unix #

Run Web #

Version	MAJ	count_base
cc 30/06/2004	2004-06-16	count_base	Download	Doc

Program that counts the A, T, G, and C bases in a sequence.

Remarque

Run Unix # count_base.sh

Run Web #
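
The source of count_base.sh is not shown in this catalog. As an equivalent sketch in Python, counting the four bases of a sequence could look like the following; the function name and the choice to ignore case and skip other symbols (such as N) are assumptions.

```python
from collections import Counter


def count_bases(sequence):
    """Count A, T, G and C in a sequence, ignoring case and other symbols."""
    counts = Counter(sequence.upper())
    return {base: counts[base] for base in "ATGC"}
```

Symbols other than the four bases are simply left out of the result rather than raising an error.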

Version	MAJ	count_codon
none	2004-08-01	count_codon	Download	Doc

Remarque

Run Unix # count_codon.pl

Run Web #

Version	MAJ	cross_match
0.990329	2002-11-06	cross_match	Download	Doc

Cross_Match uses the same algorithm as Swat but also allows the comparison of a pair of sequences to be constrained to bands of the Smith-Waterman matrix that surround one or more matching words in the sequences. This substantially increases speed for large-scale nucleotide sequence comparisons without compromising sensitivity.

Remarque

Run Unix # cross_match

Run Web #

Version	MAJ	cufflinks
2.2.0	2014-05-06	cufflinks	Download	Doc

Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Remarque

Run Unix # cufflinks [options]*

Run Web #

Version	MAJ	cutadapt
1.7.1	2015-03-11	cutadapt	Download	Doc

cutadapt is used to remove adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.

Remarque

Run Unix # cutadapt [options] []

Run Web #

Version	MAJ	cytoscape
2.7.0	2010-05-07	cytoscape	Download	Doc

Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data. Although Cytoscape was originally designed for biological research, now it is a general platform for complex network analysis and visualization. Cytoscape core distribution provides a basic set of features for data integration and visualization.

Remarque

Run Unix # cytoscape

Run Web #

Version	MAJ	dadi
1.7	2016-07-18	dadi	Download	Doc

Diffusion Approximation for Demographic Inference. ∂a∂i implements methods for demographic history and selection inference from genetic data, based on diffusion approximations to the allele frequency spectrum. One of ∂a∂i's main benefits is speed: fitting a two-population model typically takes around 10 minutes, and run time is independent of the number of SNPs in your data set. ∂a∂i is also flexible, handling up to three simultaneous populations, with arbitrary timecourses for population size and migration, plus the possibility of admixture and population-specific selection.

Remarque If you use ∂a∂i in your research, please cite RN Gutenkunst, RD Hernandez, SH Williamson, CD Bustamante "Inferring the joint demographic history of multiple populations from multidimensional SNP data" PLoS Genetics 5:e1000695 (2009).

Run Unix #

Run Web #

Version	MAJ	debarcer
0.3.1	2017-03-21	debarcer	Download	Doc

Debarcer (De-Barcoding and Error Correction) is a package for working with next-gen sequencing data that contains molecular barcodes. As it stands, it supports targeted sequencing libraries generated by SimSenSeq, a method of creating multiplexed barcoded sequencing libraries using PCR.

Remarque

Run Unix # runDebarcer.sh -u

Run Web #

Version	MAJ	delly
0.6.3	2015-02-25	delly	Download	Doc

DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.

Remarque Citation: Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel. Delly: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012 28: i333-i339.

Run Unix # Usage: delly [OPTIONS] ...

Run Web #

Version	MAJ	DESeq2
1.6.3		DESeq2	Download	Doc

SARTools is an R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well-known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis.

Remarque

Run Unix #

Run Web #

Version	MAJ	dialign
2.2.1	2005-12-06	dialign	Download	Doc

DIALIGN is a software program for multiple alignment developed by Burkhard Morgenstern et al. While standard alignment methods rely on comparing single residues and imposing gap penalties, DIALIGN constructs pairwise and multiple alignments by comparing whole segments of the sequences. No gap penalty is used. This approach is especially efficient where sequences are not globally related but share only local similarities, as is the case with genomic DNA and with many protein families.

Remarque

Run Unix # dialign

Run Web #

Version	MAJ	diamond
0.7.9	2015-12-02	diamond	Download	Doc

DIAMOND is a high-throughput program for aligning short DNA sequencing reads against a protein reference database such as NCBI-NR. On Illumina reads of length 100-150 bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX and finds more than 94% of all matches.

Remarque

Run Unix # diamond COMMAND [OPTIONS]

Run Web #

Version	MAJ	DisplayMUM
1.05	2005-06-30	DisplayMUM	Download	Doc

Remarque

Run Unix # displaymums

Run Web #

Version	MAJ	Dizzy
1.11.4	2007-09-04	Dizzy	Download	Doc

Simulation of stochastic systems.

Remarque An article describing Dizzy has been published: Ramsey S., Orrell D. and Bolouri H. Dizzy: stochastic simulation of large-scale genetic regulatory networks. J. Bioinf. Comp. Biol. 3(2) 415-436, 2005.

Run Unix # Dizzy

Run Web #

Version	MAJ	DOMIRE
-	2014-01-20	DOMIRE	Download	Doc

DOMIRE (DOMain Identification from REcurrence) is a server that uses VAST (Vector Alignment Search Tool, protein 3D structure comparison) to define the domain boundaries in proteins from their 3D structures (Tai et al., 2010). It also provides a list of structural neighbours.

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/domire

Version	MAJ	dotur
1.53	2007-08-30	dotur	Download	Doc

DOTUR is a program that takes as input a matrix describing the genetic distances between DNA sequences and assigns the sequences to operational taxonomic units (OTUs). DOTUR uses the composition of the OTUs to compute rarefaction and collector's curves in order to assess the sampling intensity, richness and diversity of the sample.

Remarque

Run Unix # dotur

Run Web #

Version	MAJ	dsrc2
2.0	2016-10-17	dsrc2	Download	Doc

DNA Sequence Reads Compression is an application designed for compressing data files that contain reads from DNA sequencing in FASTQ format. Such files can be huge, e.g., a few gigabytes or tens of gigabytes, so the need for a robust data compression tool is clear. Universal compression programs such as gzip or bzip2 are usually used for this purpose, but a specialized tool can work better.

Remarque

Run Unix # usage: dsrc [options]

Run Web #

Version	MAJ	dssp
2.0.3		dssp	Download	Doc

DSSP assigns the secondary structure of proteins from PDB files.

Remarque

Run Unix # dssp

Run Web #

Version	MAJ	dwgsim
0.1.10	2013-08-02	dwgsim	Download	Doc

Whole genome simulation can be performed with dwgsim. dwgsim is based on wgsim, found in SAMtools and written by Heng Li. It was modified to handle ABI SOLiD data, as well as various assumptions about aligners and positions of indels. The documentation below is for the latest dwgsim (not DNAA) release.

Remarque

Run Unix # dwgsim [options]

Run Web #

Version	MAJ	EDGE-pro
1.3.1	2013-07-02	EDGE-pro	Download	Doc

EDGE-pro, Estimated Degree of Gene Expression in PROkaryots is an efficient software system to estimate gene expression levels in prokaryotic genomes from RNA-seq data. EDGE-pro uses Bowtie2 for alignment and then estimates expression directly from the alignment results. EDGE-pro includes routines to assign reads aligning to overlapping gene regions accurately. 15% or more of bacterial genes overlap other genes, making this a significant problem for bacterial RNA-seq, one that is generally ignored by programs designed for eukaryotic RNA-seq experiments.

Remarque Please reference our paper: T. Magoc, D. Wood, and S.L. Salzberg. EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evolutionary Bioinformatics vol.9, pp.127-136, 2013.

Run Unix # edge.pl <-g genome> <-p ptt> <-r rnt> <-u reads>

Run Web #

Version	MAJ	edgeR
3.8.6		edgeR	Download	Doc

Remarque

Run Unix #

Run Web #

Version	MAJ	ELPH
1.0.1	2012-10-01	ELPH	Download	Doc

ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif. We have used ELPH to find patterns such as ribosome binding sites (RBSs) and exon splicing enhancers (ESEs).

Remarque

Run Unix # elph [options] OR elph [-t ]

Run Web #

Version	MAJ	eLSA
81a2ee0	2017-02-01	eLSA	Download	Doc

The Extended Local Similarity Analysis (ELSA) tools subsequently F-transform and normalize the raw data (matrices of time series) and then calculate the Local Similarity (LS) Scores and/or Local Trend Scores. The tools then assess the statistical significance (P-values) of these correlation statistics using either permutation test or theoretical p-value approximation and filter out insignificant results. Finally, the tools construct a partially directed association network from significant associations.

Remarque

Run Unix # eLSA_env

Run Web #

Version	MAJ	emboss
6.6.0.0	2013-08-19	emboss	Download	Doc

Within EMBOSS you will find around 100 programs (applications). These are just some of the areas covered: sequence alignment; rapid database searching with sequence patterns; protein motif identification, including domain analysis; nucleotide sequence pattern analysis, for example to identify CpG islands or repeats; codon usage analysis for small genomes; rapid identification of sequence patterns in large-scale sequence sets; presentation tools for publication.

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/cgi-bin/emboss-explorer/emboss.pl

Version	MAJ	ESAP
1.0		ESAP	Download	Doc

Program for predicting loop conformations in proteins.

Remarque

Run Unix #

Run Web #

Version	MAJ	ESPript
2.3	2009-01-11	ESPript	Download	Doc

ESPript, Easy Sequencing in Postscript, is a utility to generate a pretty PostScript output from aligned sequences.

Remarque

Run Unix # ESPript

Run Web #

Version	MAJ	esprit
	2009-07-07	esprit	Download	Doc

ESPRIT is a pipeline for estimating species richness using large collections of 16S rRNA pyrosequences.

Remarque

Run Unix # esprit_pc

Run Web #

Version	MAJ	fasta
3.6	2014-02-21	fasta	Download	Doc

A set of sequence comparison tools (fasta36, ggsearch...) used for alignment and database searching. For example, fasta compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.

Remarque

Run Unix # fasta36

Run Web #

Version	MAJ	fastPHASE
1.4.0	2015-03-10	fastPHASE	Download	Doc

fastPHASE: software for haplotype reconstruction and for estimating missing genotypes from population data. The program fastPHASE implements methods described in Scheet, P and Stephens, M (2006). A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. fastPHASE can handle larger data sets than PHASE (e.g., hundreds of thousands of markers in thousands of individuals), but does not provide estimates of recombination rates. Our experiments suggest that haplotype estimates are slightly less accurate than those from PHASE, but missing genotype estimates appear to be similar to or even slightly better than those from PHASE.

Remarque

Run Unix # fastPHASE [options]

Run Web #

Version	MAJ	FastQC
0.10.0	2012-03-05	FastQC	Download	Doc

FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

Remarque

Run Unix # fastqc or fastqc seqfile1 seqfile2 .. seqfileN

Run Web #

Version	MAJ	fastqp
0.1.9.1	2017-02-27	fastqp	Download	Doc

Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.

Remarque

Run Unix # fastqp [-h]

Run Web #

Version	MAJ	Fastq_Screen
0.4.4	2014-07-09	Fastq_Screen	Download	Doc

Fastq screen is a simple application which allows you to search a large sequence dataset against a panel of different databases to build up a picture of where the sequences in your data originate. It was built as a QC check for sequencing pipelines but may also have uses in metagenomics studies where mixed samples are expected. Although the program wasn't built with any particular technology in mind it is probably only really suitable for processing short reads due to the use of bowtie/bowtie2 as the searching application. The program generates both text and graphical output to tell you what proportion of your library was able to map, either uniquely or in more than one location, against each of the databases in your search set.

Remarque

Run Unix # fastq_screen [OPTION]... [FastQ FILE]...

Run Web #

Version	MAJ	FASTX-Toolkit
0.0.13	2013-04-30	FASTX-Toolkit	Download	Doc

The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. Next-Generation sequencing machines usually produce FASTA or FASTQ files, containing multiple short-reads sequences (possibly with quality information).

Remarque

Run Unix #

Run Web #

Version	MAJ	FigTree
1.4.0	2013-11-15	FigTree	Download	Doc

FigTree is designed as a graphical viewer of phylogenetic trees and as a program for producing publication-ready figures. As with most of my programs, it was written for my own needs so may not be as polished and feature-complete as a commercial program. In particular it is designed to display summarized and annotated trees produced by BEAST.

Remarque

Run Unix # figtree

Run Web #

Version	MAJ	Filter pileup
1.0.2		Filter pileup	Download	Doc

Allows one to find sequence variants and/or sites covered by a specified number of reads with bases above a set quality threshold. The tool works on six and ten column pileup formats produced with samtools pileup command. However, it also allows you to specify columns in the input file manually.

Remarque

Run Unix #

Run Web #

Version	MAJ	FinchTV
1.3.1	2008-10-14	FinchTV	Download	Doc

FinchTV (Finch Trace Viewer), a cross-platform graphical viewer for chromatogram files.

Remarque

Run Unix # finchtv

Run Web #

Version	MAJ	findtarget
none	2004-05-15	findtarget	Download	Doc

Findtarget is a comparative genomics tool for targeting genes of interest in a micro-organism whose genome has been sequenced. It uses data produced by BLAST.

Remarque

Run Unix #

Run Web # http://migale.jouy.inra.fr/outils/findtarget.html

Version	MAJ	FLASH
1.2.11	2014-11-13	FLASH	Download	Doc

FLASH (Fast Length Adjustment of SHort reads) is a fast and accurate tool for merging paired-end reads from fragments that are shorter than twice the read length. The resulting longer reads can significantly improve genome assemblies.
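The merging step can be pictured with a small sketch. It assumes a simplified model in which the end of the first read must match the start of the reverse-complemented mate exactly; FLASH itself scores candidate overlaps and tolerates mismatches, so the functions and the min_overlap parameter below are hypothetical illustrations rather than its actual algorithm.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair on the longest exact overlap, or return None if none is found."""
    mate = reverse_complement(read2)  # bring the second read onto the strand of read1
    longest = min(len(read1), len(mate))
    for overlap in range(longest, min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# A 25 bp fragment sequenced as two 15 bp reads shares a 5 bp overlap.
fragment = "ACGTTGCAAGGCTTAACCGGTTAGC"
forward = fragment[:15]
reverse = reverse_complement(fragment[10:])
print(merge_pair(forward, reverse, min_overlap=4))
```

The overlap only exists when the fragment is shorter than the two reads combined, which is why the tool targets fragments shorter than twice the read length.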

Remarque

Run Unix # flash [OPTIONS] MATES_1.FASTQ MATES_2.FASTQ Run `flash --help | less' for more information.

Run Web #

Version	MAJ	flux-simulator
1.2.1	2013-07-15	flux-simulator	Download	Doc

The Flux Simulator aims at modeling RNA-Seq experiments in silico: sequencing reads are produced from a reference genome according to annotated transcripts. The simulation pipeline models different steps as modules, each with a minimal set of parameters that can be estimated from experimental parameters. The first step is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of produced reads are simulated by in silico library preparation and sequencing.

Remarque

Run Unix # flux-simulator --help

Run Web #

Version	MAJ	fmtseq
1.2.2	2004-01-21	fmtseq	Download	Doc

Sequence format conversion. A reimplementation and extension of the Readseq program (conversion to and from the Clustalw format, and reporting of the input format).

Remarque Part of the seqio-1.2.2 package

Run Unix # fmtseq

Run Web #

Version	MAJ	fpc
8.9	2007-11-13	fpc	Download	Doc

FPC (fingerprinted contigs) is an interactive program for building contigs from fingerprinted clones, where the fingerprint for a clone is a set of restriction fragments.

Remarque

Run Unix # fpc

Run Web #

Version	MAJ	freebayes
v1.1.0-1-gf15e66e	2017-02-16	freebayes	Download	Doc

FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

Remarque Citing freebayes: Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN] 2012

Run Unix # freebayes -f [REFERENCE] [OPTIONS] [BAM FILES] >[OUTPUT]

Run Web #

Version	MAJ	FROGS
0.0.6		FROGS	Download	Doc

Find Rapidly OTUs with Galaxy Solution: FROGS is a Galaxy/CLI workflow designed to produce an OTU count matrix from high-depth amplicon sequencing data. This workflow focuses on: user-friendliness, with integration in Galaxy and many rich graphical outputs; accuracy, with clustering without a global similarity threshold, management of multi-affiliations and management of separate PCRs in the chimera removal step; speed, with fast algorithms and easy-to-use parallelisation; scalability, with algorithms designed to support data growth.

Remarque

Run Unix #

Run Web #

Version	MAJ	frost
0.4.3	2002-05-01	frost	Download	Doc

Fold recognition tools.

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/frost

Version	MAJ	FSA-BLAST
1.03	2005-12-16	FSA-BLAST	Download	Doc

FSA-BLAST is a new version of the popular BLAST (Basic Local Alignment Search Tool) bioinformatics tool, used to search genomic databases containing either protein or nucleotide sequences. FSA stands for Faster Search Algorithm; FSA-BLAST is twice as fast as NCBI-BLAST with no loss in accuracy.

Remarque

Run Unix # formatdb, cluster, blast, readdb, ssearch

Run Web #

Version	MAJ	GALF_P
-	2010-03-18	GALF_P	Download	Doc

GALF-P is a novel framework for TFBS identification (motif discovery) in DNA sequences. It consists of a Genetic Algorithm with Local Filtering (GALF) and a post-processing procedure based on adaptive adding and removing. GALF-P achieves both effectiveness and efficiency, and provides reliable performance compared with other state-of-the-art GA-based approaches. The post-processing procedure is designed for zero or more TFBSs in each sequence.

Remarque

Run Unix # GALF_P.o

Run Web #

Version	MAJ	GapCloser
1.12	2015-07-13	GapCloser	Download	Doc

GapCloser for SOAPdenovo. GapCloser is designed to close the gaps emerging during the scaffolding process of SOAPdenovo or another assembler, using the abundant pair relationships of short reads. GapCloser is aimed at large plant and animal genomes, although it also works well on bacterial and fungal genomes.

Remarque

Run Unix # GapCloser [options]

Run Web #

Version	MAJ	GASSST
1.28	2013-08-25	GASSST	Download	Doc

GASSST (Global Alignment Short Sequence Search Tool) finds global alignments of short DNA sequences against large DNA banks. Its strong point is its ability to perform fast gapped alignments. It works well for both short and longer reads, and has currently been tested for reads up to 500 bp. The software is freely available for download under the CECILL version 2 License.

Remarque http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract?keytype=ref&ijkey=f5zH80QsuCqixRH

Run Unix # Gassst -d -i -o -p

Run Web #

Version	MAJ	gatk
3.5	2016-01-25	gatk	Download	Doc

The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Remarque

Run Unix # java -jar /usr/local/genome/gatk/GenomeAnalysisTK.jar -h

Run Web #

Version	MAJ	Gblocks
0.91b	2006-07-19	Gblocks	Download	Doc

Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Gblocks eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences.

Remarque

Run Unix # Gblocks

Run Web #

Version	MAJ	GEM
20121106-022124	2013-07-25	GEM	Download	Doc

The GEM library (Also home to: The GEM mapper, The GEM RNA mapper, The GEM mappability, and others). Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful and very optimized tools to index/analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented.

Remarque

Run Unix #

Run Web #

Version	MAJ	geneclust
1-0.0	2007-03-27	geneclust	Download	Doc

GeneClust is a piece of computer software which can be used as a tool for exploratory analysis of gene expression microarray data. The development of GeneClust was motivated by surging interest to search for interpretable biological structure in gene expression microarray data.

Remarque

Run Unix # geneclust

Run Web #

Version	MAJ	genehunter
2.1_r2	2002-09-27	genehunter	Download	Doc

Multipoint analysis of pedigree data including: non-parametric linkage analysis, LOD-score computation, information-content mapping, haplotype reconstruction

Remarque

Run Unix # gh

Run Web #

Version	MAJ	GenePRIMP
0.3	2013-04-19	GenePRIMP	Download	Doc

Identification of anomalous gene calls. The GenePRIMP pipeline consists of a series of computational units that identify erroneous gene calls and missed genes, and then correct a subset of the identified anomalous features. The data input to GenePRIMP needs to be a file of gene calls in GenBank or EMBL format. As its output, GenePRIMP generates reports of identified anomalies, plus a corrected EMBL file.

Remarque

Run Unix # geneprimp

Run Web #

Version	MAJ	genewise
2.2.0	2008-12-10	genewise	Download	Doc

Genewise compares a protein to a DNA database and predicts the corresponding gene structure, while coping with problems caused by sequencing errors and introns.

Remarque

Run Unix # genewise

Run Web #

Version	MAJ	GenomeThreader
1.6.0	2013-03-12	GenomeThreader	Download	Doc

GenomeThreader is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments. GenomeThreader was motivated by disabling limitations in GeneSeqer, a popular gene prediction program which is widely used for plant genome annotation.

Remarque

Run Unix # gth [option ...] -genomic file [...] -cdna file [...] -protein file [...]

Run Web #

Version	MAJ	genscan
1.0	2007-10-24	genscan	Download	Doc

Remarque

Run Unix # genscan

Run Web #

Version	MAJ	gimsan
20100830	2011-01-10	gimsan	Download	Doc

GIMSAN (GIbbsMarkov with Significance ANalysis): a novel tool for de novo motif finding. GIMSAN combines GibbsMarkov, our variant of the Gibbs Sampler, described here for the first time, with our recently introduced significance analysis.

Remarque please cite: Patrick Ng, Uri Keich. GIMSAN: A Gibbs motif finder with significance analysis. Bioinformatics, 24 (19): 2256-2257, 2008.

Run Unix # gimsan_submit_job.pl

Run Web #

Version	MAJ	glimmer
glimmer-3.02	2008-12-12	glimmer	Download	Doc

Glimmer (Gene Locator and Interpolated Markov ModelER) predicts the position of genes in a DNA sequence (bacteria, archaea, viruses) using Markov models.

Remarque

Run Unix # glimmer3

Run Web #

Version	MAJ	GMAP/GSNAP
2013-10-25	2013-10-28	GMAP/GSNAP	Download	Doc

GMAP (genomic mapping and alignment program for mRNA and EST sequences): gmap, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. GSNAP (Genomic Short-read Nucleotide Alignment Program): GSNAP implements computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. It can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites.

Remarque

Run Unix # gmap [OPTIONS...]

Run Web #

Version	MAJ	gmorse
1.0	2009-08-05	gmorse	Download	Doc

G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads: the testing of junctions is directed by the information available in the RNA-Seq dataset rather than by a priori knowledge about the genome. Exons can thus be chained into stranded gene models.

Remarque

Run Unix # gmorse -h

Run Web #

Version	MAJ	goby
1.4.1	2010-04-15	goby	Download	Doc

Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. Goby provides compressed file formats that are time and space efficient. It also provides a few utilities that support the most common secondary data analyses

Remarque

Run Unix # goby

Run Web #

Version	MAJ	GORIV
1.0	2008-01-21	GORIV	Download	Doc

Method for predicting the secondary structure of proteins from the amino acid sequence.

Remarque

Run Unix # gorIV

Run Web #

Version	MAJ	grape
1.1	2008-10-21	grape	Download	Doc

GRAPe is a tool for computing genome re-alignment using marginalized posterior decoding. GRAPe uses the Marginalized Posterior Decoding (MPD) algorithm, which uses the posterior distribution of alignments to optimize the correct assignment of homology of individual nucleotides, instead of finding a single most probable alignment. Simulations show that the MPD algorithm has higher sensitivity and specificity than the Viterbi and Needleman-Wunsch algorithms.

Remarque

Run Unix # grape

Run Web #

Version	MAJ	grepseq
1.2.2	2004-01-21	grepseq	Download	Doc

The `grepseq' program takes a keyword which can contain ambiguous characters and character classes (also called a fixed-width motif) and then searches files and databases for exact or approximate matches to that keyword. The program produces one of two kinds of output, either a list of the matching sequences with the places where the keyword matched, or the complete entries of sequences containing matches, where each entry is annotated with the places where the matches occur.
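To make the notion of a fixed-width motif with ambiguous characters concrete, here is a minimal sketch of exact matching only. It assumes the ambiguous characters are IUPAC nucleotide codes and translates the motif into a regular expression; grepseq's own motif syntax and its approximate-matching mode are not reproduced, and the function name is hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes (an assumption about the motif alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(motif, sequence):
    """Return the 0-based start positions of every exact match, overlaps included."""
    pattern = "".join(IUPAC[symbol] for symbol in motif.upper())
    # A lookahead lets overlapping occurrences be reported as separate matches.
    return [m.start() for m in re.finditer("(?=(" + pattern + "))", sequence.upper())]


print(find_motif("GAYN", "GATGGACAGAT"))
```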

Remarque Part of seqio

Run Unix # grepseq

Run Web #

Version	MAJ	gril
1.0.0		gril	Download	Doc

GRIL is a tool to detect the locations of genomic rearrangements in a set of sequences.

Remarque

Run Unix # gril

Run Web #

Version	MAJ	HH-suite
2.0.16	2013-07-24	HH-suite	Download	Doc

The HH-suite is an open-source software package for highly sensitive sequence searching and sequence alignment. Its two most important programs are HHsearch and HHblits. Both are based on the pairwise comparison of profile hidden Markov models (HMMs).

Remarque

Run Unix #

Run Web #

Version	MAJ	HISAT2
2.0.4	2016-09-07	HISAT2	Download	Doc

HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). We have developed HISAT based on the Bowtie2 implementation to handle most of the operations on the FM index.

Remarque

Run Unix # hisat2 [options]* -x {-1 -2 | -U | --sra-acc } [-S ]

Run Web #

Version	MAJ	hmmer
3.1	2013-08-23	hmmer	Download	Doc

HMMER: profile HMMs for protein sequence analysis. Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus.

Remarque

Run Unix #

Run Web #

Version	MAJ	hmmtop
2.1	2004-09-25	hmmtop	Download	Doc

Prediction of transmembrane helices and topology for transmembrane proteins using hidden Markov models

Remarque

Run Unix # hmmtop

Run Web #

Version	MAJ	html4blast
1.6a	2003-05-15	html4blast	Download	Doc

Html4blast is a program for analyzing and presenting Blast results.

Remarque Used by findtarget

Run Unix # html4blast [options]

Run Web # http://bioweb.pasteur.fr/seqanal/interfaces/html4blast.html

Version	MAJ	i-ADHoRe
3.0.01	2013-10-30	i-ADHoRe	Download	Doc

This novel version of i-ADHoRe is designed to detect genomic homology in extremely large-scale data sets. Along with several under-the-hood improvements, resulting in a 30-fold reduction in runtime over previous versions, the implementation of multithreading and MPI now enables i-ADHoRe to take advantage of a parallel computing platform. As the scale of the data sets increased, the need for a new alignment algorithm able to cope with dozens of genomic segments became apparent. Therefore a new greedy graph-based alignment algorithm has been implemented (described in Fostier et al., 2011), allowing analysis of even the largest data sets currently available.

Remarque

Run Unix # i-adhore

Run Web #

Version	MAJ	ICORN
0.97	2010-11-03	ICORN	Download	Doc

iCORN (iterative correction of reference nucleotides) corrects genome sequences using short reads. Reads are mapped iteratively against the genome sequences, so far with SSAHA. Discrepancies between the reference and the multiple alignment of the mapped reads are corrected, provided that the correction does not decrease the number of perfectly mapping reads.

Remarque

Run Unix # cf. http://icorn.sourceforge.net/example.html

Run Web #

Version	MAJ	idba
1.1.1	2015-12-02	idba	Download	Doc

IDBA is a practical iterative De Bruijn Graph De Novo Assembler for sequence assembly in bioinformatics. Most assemblers based on de Bruijn graphs build the graph with one specific k to perform the assembly task, so finding a suitable value of k is crucial. If k is too large, there will be many gap problems in the graph. If k is too small, there will be many branch problems. IDBA uses not just one specific k but a range of k values to build the iterative de Bruijn graph, so it can keep the information in graphs with different k values. As a result, it is expected to perform better than assemblers that rely on a single k.
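The trade-off on k can be seen on a toy example. The sketch below builds a plain de Bruijn graph for a single k and counts branching nodes; it is only an illustration of why a small k creates branches, not IDBA's iterative algorithm, and the function names and the toy read are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Count nodes with more than one successor, i.e. ambiguous extensions."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


reads = ["ATGCGTGCA"]  # the repeat "TGC" causes a branch when k is small
for k in (3, 5):
    print(k, branching_nodes(de_bruijn(reads, k)))
```

With k=3 the repeated "TGC" makes node "GC" branch, while k=5 spans the repeat and the graph becomes a simple path; on real data a larger k would in turn leave gaps wherever coverage is too low to supply every k-mer.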

Remarque If you use our assembler in your research, please cite our papers. Peng, Y., et al. (2010) IDBA- A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB. Lisbon.

Run Unix # idba_ud -r read.fa -o output_dir

Run Web #

Version	MAJ	igv
2.3.67	2016-01-11	igv	Download	Doc

The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

Remarque To cite your use of IGV in your publication: James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011)

Run Unix # igv

Run Web #

Version	MAJ	Illumina CASAVA-1.8 FASTQ Filter
0.1	2014-04-30	Illumina CASAVA-1.8 FASTQ Filter	Download	Doc

The recent version of Illumina's CASAVA pipeline (version 1.8) produces FASTQ files containing both reads that pass filtering and reads that don't. The new read ID (the @ line) contains many new fields, one of which indicates whether the read is filtered or not. This program filters FASTQ files produced by CASAVA 1.8, keeping or discarding reads based on this filter flag.
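
As a rough illustration of the idea (not the code of fastq_illumina_filter itself), the sketch below assumes the CASAVA 1.8 header layout `@<instrument>:...:<y> <read>:<is_filtered>:<control>:<index>`, where the flag is `Y` for a read that was filtered out and `N` for a read that passed. The function names and the two sample records are made up for the example.

```python
def is_filtered(header):
    """Return True if the CASAVA 1.8 header marks the read as filtered (flag 'Y')."""
    # Assumed layout: the part after the first space is read:is_filtered:control:index.
    comment = header.split(" ", 1)[1]
    return comment.split(":")[1] == "Y"


def keep_passing(lines):
    """Yield the 4-line FASTQ records whose reads are not flagged as filtered."""
    for i in range(0, len(lines) - 3, 4):
        record = lines[i:i + 4]
        if not is_filtered(record[0]):
            yield record


# Two hypothetical records: the first passed filtering, the second did not.
lines = [
    "@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:ATCACG", "ACGT", "+", "IIII",
    "@EAS139:136:FC706VJ:2:2104:15343:197394 1:Y:18:ATCACG", "TTTT", "+", "IIII",
]
for record in keep_passing(lines):
    print("\n".join(record))
```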

Remarque

Run Unix # fastq_illumina_filter -h

Run Web #

Version	MAJ	IM-TORNADO
2.0.3.3	2016-02-22	IM-TORNADO	Download	Doc

Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for 16S rDNA hypervariable tag sequencing. However, when the two reads do not overlap, existing computational pipelines analyze data from each read separately and underutilize the information contained in the paired-end reads. IM-TORNADO is a tool for processing non-overlapping reads while retaining maximal information content.

Remarque If you use IM-TORNADO for your project, please cite the following manuscript: Jeraldo P, Kalari K, Chen X, Bhavsar J, Mangalam A, White B, et al. IM-TORNADO: A Tool for Comparison of 16S Reads from Paired-End Libraries. PLOS ONE 9 (12):e114804. Available from: http://dx.plos.org/10.1371/journal.pone.0114804

Run Unix #

Run Web #

Version	MAJ	indel-Seq-Gen
2.1.03	2012-08-24	indel-Seq-Gen	Download	Doc

indel-Seq-Gen (iSG) is a biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies. This is accomplished through the addition of subsequence length constraints and lineage- and site-specific evolution. iSG tracks insertion and deletion processes that occur during the simulation run. iSG records all evolutionary events and outputs the "true" multiple alignment of the sequences, and can generate a larger simulated sequence space by allowing the use of multiple related root sequences. iSG can be used to test the accuracy of multiple alignment methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein superfamily classification methods. iSG utilizes a highly modified version of the substitution engine from Seq-Gen v1.3.2.

Remarque

Run Unix # indel-seq-gen [-bdefghilmnoqsuwz] < [tree_file] (indel-seq-gen -h)

Run Web #

Version	MAJ	inGAP
2.7.8	2011-11-02	inGAP	Download	Doc

inGAP (Integrative Next-generation Genome Analysis Pipeline, 2009) is a mining pipeline guided by a Bayesian principle that detects single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) by comparing high-throughput pyrosequencing reads with a reference genome of related organisms. inGAP can be applied to the mapping of both Roche/454 and Illumina reads with no restriction on read length.

Remarque

Run Unix # inGAP

Run Web #

Version	MAJ	InParanoid
4.1	2011-01-21	InParanoid	Download	Doc

InParanoid is a program for automatic identification of orthologs while differentiating between inparalogs and outparalogs. An InParanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs are gathered independently, while outparalogs are excluded. The InParanoid database is a collection of pairwise ortholog groups aiming to include all 'completely sequenced' eukaryotic genomes. By this we mean above 6X coverage, and less than 1% X letters in the protein sequences.
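
The seeding step can be pictured with a minimal sketch of reciprocal best hits. This is only an illustration under simplifying assumptions, not InParanoid's code: similarity scores are given as plain dictionaries, and the protein names and scores are invented.

```python
def best_hits(scores):
    """Given {(query, target): score}, return {query: best-scoring target}."""
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_pairs(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


# Hypothetical similarity scores between proteins of species A and species B.
a_vs_b = {("a1", "b1"): 100, ("a1", "b2"): 50, ("a2", "b1"): 80, ("a2", "b2"): 60}
b_vs_a = {("b1", "a1"): 100, ("b1", "a2"): 80, ("b2", "a2"): 60, ("b2", "a1"): 50}
print(reciprocal_best_pairs(a_vs_b, b_vs_a))
```

Here a2's best hit is b1, but b1 prefers a1, so only the pair (a1, b1) is reciprocal and would seed a cluster.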

Remarque

Run Unix # Usage: inparanoid.pl [FASTAFILE with sequences of species C]

Run Web #

Version	MAJ	Insyght
	2014-01-01	Insyght	Download	Doc

Insyght is a genomic visualisation tool that combines a symbolic and a proportional view of genes, syntenies and genomic regions. Another of Insyght's features is synchronized navigation and zooming across multiple species.

Remarque

Run Unix #

Run Web # http://migale.jouy.inra.fr/IGO/

Version	MAJ	JAGS
3.4.0	2013-12-17	JAGS	Download	Doc

JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation, not wholly unlike BUGS. JAGS was written with three aims in mind: to have a cross-platform engine for the BUGS language; to be extensible, allowing users to write their own functions, distributions and samplers; and to be a platform for experimentation with ideas in Bayesian modelling.

Remarque

Run Unix # jags

Run Web #

Version	MAJ	jalview
2.9.0	2016-03-10	jalview	Download	Doc

Jalview is a multiple alignment editor

Remarque

Run Unix # jalview

Run Web #

Version	MAJ	jellyfish
1.1.3	2011-12-21	jellyfish	Download	Doc

JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism. JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation for more details.
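
To make the notion of k-mer counting concrete, here is a naive sketch using a standard Python counter. It shows only what is being counted; it does not reproduce JELLYFISH's lock-free hash table or its memory encoding, and the function name and sequence are assumptions for the example.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```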

Remarque If you use JELLYFISH in your research, please cite: Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 (first published online January 7, 2011) doi:10.1093/bioinformatics/btr011

Run Unix # jellyfish

Run Web #

Version	MAJ	Julia
0.5.0	2017-02-08	Julia	Download	Doc

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It is somewhat similar to R, Matlab or Python, but with performance approaching that of C/Fortran.

Remarque

Run Unix # julia

Run Web #

Version	MAJ	kaiju
1.5.0	2017-05-14	kaiju	Download	Doc

Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.

Remarque Citation Menzel P., Ng K.L., Krogh A. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257

Run Unix # kaiju -t nodes.dmp -f kaiju_db.fmi -i reads.fastq [-j reads2.fastq]

Run Web #

Version	MAJ	kaksi
2.3rc1	2008-07-01	kaksi	Download	Doc

Kaksi is a secondary structure assignment tool. From a PDB file containing the atomic coordinates of a protein, kaksi determines the positions of alpha helices and beta sheets. The assignment method uses the distances between alpha carbons and the phi/psi dihedral angles of the protein backbone. An axis calculation ensures the regularity of the assigned helices: a helix with a kink is described as two distinct helices. The detection parameters (tolerated values for distances and angles) can be changed on the command line (see Martin et al., BMC Structural Biology 2005 for a detailed discussion of the choice of parameters). Results are returned to the user as an XML file. A utility for extracting the main information in FASTA format is provided with the program.

Remarque

Run Unix # kaksi -pf my_pdb_file.pdb

Run Web #

Version	MAJ	kaskad
1.0		kaskad	Download	Doc

Tool for extracting temporal information about genes from text corpora.

Remarque

Run Unix #

Run Web #

Version	MAJ	kClust
1.0	2015-01-21	kClust	Download	Doc

kClust is a fast and sensitive clustering method for the clustering of protein sequences. It is able to cluster large protein databases down to 20-30% sequence identity. kClust generates a clustering where each cluster is represented by its longest sequence (representative sequence).

Remarque For generating one multiple sequence alignment file for each cluster, please use kClust_mkAln. Type kClust_mkAln

Run Unix # kClust -i [fasta-db-file] -d [directory] [options]

Run Web #

Version	MAJ	khmer
2.0	2015-11-25	khmer	Download	Doc

The khmer software is a set of command-line tools for working with DNA shotgun sequencing data from genomes, transcriptomes, metagenomes, and single cells. khmer can make de novo assemblies faster, and sometimes better. khmer can also identify (and fix) problems with shotgun data.

Remarque

Run Unix # -

Run Web #

Version	MAJ	Klast
4.4	2015-04-24	Klast	Download	Doc

KLAST is a fast, accurate and NGS scalable bank-to-bank sequence similarity search tool providing significant accelerations of seeds-based heuristic comparison methods, such as the Blast suite of algorithms. Relying on unique software architecture, KLAST takes full advantage of recent multi-core personal computers without requiring any additional hardware devices.

Remarque

Run Unix #

Run Web #

Version	MAJ	kmergenie
1.6663	2014-06-23	kmergenie	Download	Doc

KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie's choices lead to assemblies that are close to the best possible over all k-mer lengths.
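
The selection loop described above can be sketched as follows. KmerGenie's actual prediction of genomic k-mers comes from its own model of the histogram; the fixed abundance threshold used here is a crude, hypothetical stand-in, and all names and reads are assumptions for the example.

```python
from collections import Counter


def abundance_histogram(reads, k):
    """Return {abundance: number of distinct k-mers seen that many times}."""
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    return Counter(counts.values())


def pick_k(reads, ks, min_abundance=2):
    """Pick the k with the most distinct k-mers at or above min_abundance."""
    def genomic_kmers(k):
        # Assumption: k-mers seen fewer than min_abundance times are errors.
        hist = abundance_histogram(reads, k)
        return sum(n for abundance, n in hist.items() if abundance >= min_abundance)
    return max(ks, key=genomic_kmers)


# Hypothetical toy reads.
reads = ["ACGTACGA", "GTACGATT"]
print(dict(abundance_histogram(reads, 3)))
print(pick_k(reads, [3, 4, 5]))
```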

Remarque

Run Unix # kmergenie [options]

Run Web #

Version	MAJ	kraken
0.10.5	2015-11-25	kraken	Download	Doc

Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
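
The idea of classifying a read through exact k-mer matches can be sketched with a toy example. This is a deliberately simplified stand-in, not Kraken's classification algorithm: it ignores the taxonomy tree and counts only k-mers that occur in a single reference. The taxon names, sequences and function names are hypothetical.

```python
from collections import Counter


def build_index(references, k):
    """Map every k-mer to the set of taxa whose reference sequence contains it."""
    index = {}
    for taxon, seq in references.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(taxon)
    return index


def classify(read, index, k):
    """Assign the read to the taxon with the most unambiguous exact k-mer matches."""
    votes = Counter()
    for i in range(len(read) - k + 1):
        taxa = index.get(read[i:i + k], ())
        if len(taxa) == 1:  # skip k-mers shared by several taxa
            votes.update(taxa)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical reference sequences.
references = {"taxA": "ACGTACGTTT", "taxB": "GGGCCCAAAT"}
index = build_index(references, 4)
print(classify("CGTACG", index, 4))
print(classify("TTTTTT", index, 4))
```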

Remarque If you use Kraken in your research, please cite our paper; the citation is available on the Kraken website.

Run Unix # kraken [options]

Run Web #

Version	MAJ	KronaTools
2.6	2016-01-13	KronaTools	Download	Doc

Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Remarque

Run Unix #

Run Web #

Version	MAJ	kSNP
2.1.2	2014-04-24	kSNP	Download	Doc

Identify SNPs in a set of genome sequences without the requirement of a reference sequence or a multiple sequence alignment. Reconstruction of SNP-based phylogenies by maximum likelihood.

Remarque

Run Unix # kSNP -k kmer_length -f fasta -d output_directory [-p genomes4positions_list] [-u unassembled_genomes_list] [-m minimum_fraction_genomes_with_locus] [-G genbank.gbk] [-n num_CPU] [-j ] [-v ] [-c min_kmer_coverage]

Run Web #

Version	MAJ	lalnview
3.0	2005-07-01	lalnview	Download	Doc

LALNVIEW is a graphical program for visualizing local alignments between two sequences (protein or nucleic acids). Sequences are represented by colored rectangles to give an overall picture of the similarities between the two sequences. Blocks of similarity between the two sequences are colored according to the degree of identity between the two segments.

Remarque

Run Unix # lalnview

Run Web #

Version	MAJ	LAST
861	2017-06-02	LAST	Download	Doc

LAST: Genome-Scale Sequence Comparison. LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads).

Remarque

Run Unix #

Run Web #

Version	MAJ	LEfSe
	2014-12-24	LEfSe	Download	Doc

LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance. LEfSe is available as a Galaxy module, and as a bitbucket repository. For additional information, please refer to the LEfSe paper. We provide support for LEfSe users. Please join our Google group designated specifically for LEfSe users.

Remarque

Run Unix #

Run Web #

Version	MAJ	linkage
5.1	2002-11-04	linkage	Download	Doc

The core of the LINKAGE package is a series of programs for maximum likelihood estimation of recombination rates, calculation of lod score tables, and analysis of genetic risks.

Remarque linkmap, linkmap.trace, makeped, preplink, ilink, lodscore, mlink

Run Unix # preplink or linkmap ...

Run Web #

Version	MAJ	loco
0.990329		loco	Download	Doc

Remarque

Run Unix # loco

Run Web #

Version	MAJ	macs
1.4.2	2013-05-16	macs	Download	Doc

Next-generation parallel sequencing technologies have made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short-read sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available as open source, and can be used for ChIP-Seq with or without control samples.

Remarque

Run Unix # macs14 <-t tfile> [-n name] [-g genomesize] [options]

Run Web #

Version	MAJ	mafft
7.164	2014-08-12	mafft	Download	Doc

MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods.

Remarque

Run Unix # mafft [options] input > output

Run Web #

Version	MAJ	mallard
1.02	2007-08-09	mallard	Download	Doc

This program detects chimeric 16S ribosomal DNA sequences (a chimera is the fusion of several 16S rDNA sequences).

Remarque http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=16957188&ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum

Run Unix # mallard

Run Web #

Version	MAJ	mango
0.1.0	2008-02-12	mango	Download	Doc

MANGO (Multiple Alignment with N Gapped Oligos): a new approach to multiple sequence alignment

Remarque Please use the four scripts provided: mang8: MANGO with 8 seeds, without refinement; mang8r: MANGO with 8 seeds, with refinement; mang90: MANGO with 90 seeds, without refinement; mang90r: MANGO with 90 seeds, with refinement.

Run Unix # mang8 ; mang8r ; mang90 ; mang90r

Run Web #

Version	MAJ	mapsembler
1.3.21	2012-05-31	mapsembler	Download	Doc

Mapsembler is targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines if each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.

Remarque Citation: Peterlongo, P., & Chikhi, R. (2012). Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics, 13(1), 48. doi:10.1186/1471-2105-13-48.

Run Unix # mapsembler [-m value] [-o output] [-k value] [-i value] [-e value] [-d value] [-t value] [-E value] [-Clrgfcvsh]

Run Web #

Version	MAJ	MapSplice
1.15.2	2012-01-26	MapSplice	Download	Doc

MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. MapSplice is a second-generation algorithm for detecting alternative splice sites. Its goal is to detect splice sites sensitively and specifically while remaining efficient in CPU and memory usage. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. It depends neither on splice site features nor on intron length, and can therefore detect novel canonical and non-canonical splice sites. MapSplice relies on the quality and diversity of read alignments to increase the accuracy of splice site detection.

Remarque Publication MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery Kai Wang; Darshan Singh; Zheng Zeng; Stephen J. Coleman; Yan Huang; Gleb L. Savich; Xiaping He; Piotr Mieczkowski; Sara A. Grimm; Charles M. Perou; James N. MacLeod; Derek Y. Chiang; Jan F. Prins; Jinze Liu Nucleic Acids Research 2010; doi: 10.1093/nar/gkq622

Run Unix # python /usr/local/genome/MapSplice_1.15.2/bin/mapsplice_segments.py MapSplice.cfg

Run Web #

Version	MAJ	maq
0.7.1	2014-10-02	maq	Download	Doc

Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines. It is designed particularly for the Illumina-Solexa 1G Genetic Analyzer, and has preliminary functionality to handle AB SOLiD data.

Remarque

Run Unix # maq

Run Web #

Version	MAJ	mascot
1.9	2003-05-21	mascot	Download	Doc

Mascot is a powerful search engine that uses mass spectrometry data to identify proteins from primary sequence databases.

Remarque Restricted access

Run Unix # none

Run Web # http://genome.jouy.inra.fr/mascot

Version	MAJ	MaSuRCA
2.3.1	2014-10-01	MaSuRCA	Download	Doc

MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).

Remarque

Run Unix #

Run Web #

Version	MAJ	matrix2png
1.2.1	2011-05-30	matrix2png	Download	Doc

Matrix2png is a simple but powerful program for making visualizations of microarray data and many other data types. It generates PNG formatted images from text files of data. It is fast, easy to use, and reasonably flexible. It can be used to generate publication-quality images, or to act as an image generator for web applications. Our group has found it useful for imaging all kinds of matrix-based data, not just microarray data.

Remarque If you use images created with matrix2png for publication or presentation, please cite: Pavlidis, P. and Noble W.S. (2003) Matrix2png: A Utility for Visualizing Matrix Data. Bioinformatics 19: 295-296.

Run Unix # matrix2png

Run Web #

Version	MAJ	mauve
2.4.0	2015-01-07	mauve	Download	Doc

Multiple Alignment of Conserved Regions in Genome Sequences

Remarque

Run Unix # mauve

Run Web #

Version	MAJ	MaxBin
2.2.1	2017-01-17	MaxBin	Download	Doc

MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can understand the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and the read coverage information or sequencing reads. For users' convenience, MaxBin reports genome-related statistics, including estimated completeness, GC content and genome size, in the binning summary page. Users can use MEGAN or similar software on MaxBin bins to find out the taxonomy of each bin after the binning process is finished.

Remarque

Run Unix # run_MaxBin.pl -contig (contig file) -out (output file)

Run Web #

Version	MAJ	mcl
12-068	2013-08-22	mcl	Download	Doc

The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for networks (also known as graphs) based on simulation of (stochastic) flow in graphs.
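
A compact sketch of flow simulation on a graph is given below. It follows the commonly described structure of MCL, alternating expansion (squaring a column-stochastic matrix) and inflation (element-wise powering followed by column normalisation); that structure is not spelled out in the description above, and this is not the mcl program's implementation. The parameter defaults, function names and toy graph are assumptions for the example.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(m):
    """Expansion: square the matrix, i.e. let flow take two-step walks."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise entries to the power r and renormalise the columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=20):
    """Return clusters as a set of frozensets of node indices."""
    n = len(adjacency)
    # Self-loops are added so that every column has a non-zero sum.
    m = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    clusters = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical graph: two disconnected triangles (nodes 0-2 and 3-5).
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(adjacency))
```

Flow never crosses between the two triangles, so the rows of the final matrix split the nodes into two groups.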

Remarque

Run Unix # mcl <-|file name> [options], do 'mcl -h' or 'man mcl' for help

Run Web #

Version	MAJ	mega2
4.5.6	2012-08-06	mega2	Download	Doc

Mega2 is software that, from three input files (pedigree, map and locus), creates all the files needed to run linkage, haplotype, IBD and similar analysis programs such as simwalk2, genehunter, vitesse, TDT, SAGE, Allegro or Mendel. Without Mega2, all inputs have to be formatted by hand, which is slow, tedious and error-prone.

Remarque If you use Mega2 as part of a published work, please remember to reference Mega2. You may reference it by citing the following: Mukhopadhyay N, Almasy L, Schroeder M, Mulvihill WP, Weeks DE (2005) Mega2: data-handling for facilitating genetic linkage and association analyses. Bioinformatics. 2005 May 15;21(10):2556-7. PMID: 15746282

Run Unix # mega2

Run Web #

Version	MAJ	Megahit
1.0.4-beta	2016-03-17	Megahit	Download	Doc

MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Remarque

Run Unix # megahit [options] {-1 -2 | --12 | -r } [-o ]

Run Web #

Version	MAJ	megan
5.10.6	2015-07-12	megan	Download	Doc

MEGAN - Metagenome Analysis Software

Remarque

Run Unix # megan

Run Web #

Version	MAJ	memrec
1.11	2014-11-12	memrec	Download	Doc

The memrec (memory usage recorder) script is a tool we've written to watch the memory usage of a program.

Remarque

Run Unix # memrec [opts] prog

Run Web #

Version	MAJ	memsat3
3	2010-12-28	memsat3	Download	Doc

Transmembrane Protein Modelling

Remarque

Run Unix # memsat3 "query" "database" or runmemsat.sh "query" "database"

Run Web #

Version	MAJ	merlin
1.1.2	2008-10-15	merlin	Download	Doc

MERLIN is a package for fast genetic analysis of pedigrees (linkage analysis, association analysis, haplotypes...).

Remarque

Run Unix # merlin

Run Web #

Version	MAJ	metagene
	2007-05-04	metagene	Download	Doc

Gene finding program for metagenomics. MetaGene predicts prokaryotic genes on anonymous genomic sequences. Fragmented sequences (longer than 100 bp) can be accepted.

Remarque

Run Unix # metagene [multi-fasta]

Run Web #

Version	MAJ	MetaGeneAnnotator
-	2009-01-26	MetaGeneAnnotator	Download	Doc

Improved version of the MetaGene program for annotating metagenomic data. Predicts prokaryotic genes from a genome or a set of anonymous genomes. Particularly suited to metagenomic analyses.

Remarque

Run Unix # metageneannotator

Run Web #

Version	MAJ	MetaSim
0.9.5	2010-09-28	MetaSim	Download	Doc

MetaSim - A Sequencing Simulator for Genomics and Metagenomics

Remarque If you use this program for your own research, please cite our software. Publication: Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008) MetaSim: A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3(10): e3373. doi:10.1371/journal.pone.0003373

Run Unix # MetaSim

Run Web #

Version	MAJ	mga
none	2014-11-20	mga	Download	Doc

Multiple Genome Aligner (MGA for short) computes multiple genome alignments of large, closely related DNA sequences.

Remarque

Run Unix # mga

Run Web #

Version	MAJ	mgltools
1.5.6	2015-03-11	mgltools	Download	Doc

MGLTools is software developed at the Molecular Graphics Laboratory (MGL) of The Scripps Research Institute for visualization and analysis of molecular structures. It comprises three main applications. Please visit MGL Bugzilla to submit a bug report or to request a new feature.

Remarque

Run Unix # pmv, adt, vision

Run Web #

Version	MAJ	micca
1.5.0	2017-02-27	micca	Download	Doc

micca (MICrobial Community Analysis) is a software pipeline for the processing of amplicon sequencing data, from raw sequences to OTU tables, taxonomy classification and phylogenetic tree inference. The pipeline can be applied to a range of highly conserved genes/spacers, such as 16S rRNA gene, Internal Transcribed Spacer (ITS) and 28S rRNA.

Remarque

Run Unix # micca [--version] [--help] []

Run Web #

Version	MAJ	minia
1.4683	2013-02-21	minia	Download	Doc

Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).

Remarque PDF and Citation R. Chikhi, G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI 2012

Run Unix # minia fasta_file kmer_size min_abundance estimated_genome_size prefix

Run Web #

Version	MAJ	mira
4.0	2014-11-18	mira	Download	Doc

MIRA is a Whole Genome Shotgun and EST Sequence Assembler for Sanger, 454 and Solexa / Illumina. It can perform Hybrid de-novo assemblies as well as SNP and mutations discovery for mapping assemblies.

Remarque

Run Unix # mira

Run Web #

Version	MAJ	miranda
3.3a	2014-10-29	miranda	Download	Doc

miRanda is an algorithm for the detection of potential microRNA target sites in genomic sequences. miRanda reads RNA sequences (such as microRNAs) from file1 and genomic DNA/RNA sequences from file2. Both of these files should be in FASTA format.

Remarque

Run Unix # miranda file1 file2 [options..]

Run Web #

Version	MAJ	miRDeep2
2.0.0.7	2015-02-23	miRDeep2	Download	Doc

miRDeep2 is a software package for identification of novel and known miRNAs in deep sequencing data. Furthermore, it can be used for miRNA expression profiling across samples. Finally, a new module for preprocessing raw Illumina sequencing data produces files for downstream analysis with the miRDeep2 or quantifier module. Colorspace sequencing data is currently not supported by the preprocessing module, but support is planned. Preprocessing is performed with the mapper.pl script. Quantification and expression profiling are done by the quantifier.pl script. miRNA identification is done by the miRDeep2.pl script.

Remarque

Run Unix # miRDeep2.pl

Run Web #

Version	MAJ	MIReNA
2.0	2012-09-05	MIReNA	Download	Doc

Remarque

Run Unix #

Run Web #

Version	MAJ	mktrace
0.001017	2005-07-30	mktrace	Download	Doc

This program reads a FASTA file and creates a chromatogram stored in an SCF file and a corresponding phd file. The SCF file contains minimal information at this time. If a quality value FASTA file exists, mktrace uses those quality values in the phd file, otherwise it sets the quality values to the pre-determined values. mktrace produces a fake trace that could be used by Phred/Phrap packages.

Remarque Part of the consed package

Run Unix # mktrace G0771A003_114.s1.seq G0771A003_114.s1.scf

Run Web #

Version	MAJ	mmseq
0.11.2	2012-11-20	mmseq	Download	Doc

MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. The MMSEQ pipeline obtains expression estimates from RNA-seq data. There are two routes, with starting points labelled A and B. Route A is quite fast and straightforward to run and uses pre-existing transcript sequences for alignment. Route B requires more time, as it involves the creation of custom transcript sequences based on the data.

Remarque Please cite Turro et al. 2011 (Genome Biology) if you use MMSEQ in your work. http://dx.doi.org/10.1186/gb-2011-12-2-r13

Run Unix # mmseq [OPTIONS...] hits_file output_base

Run Web #

Version	MAJ	MMSEQ
1.0.2	2013-09-02	MMSEQ	Download	Doc

MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads

Remarque Please cite Turro et al. 2011 (http://dx.doi.org/10.1186/gb-2011-12-2-r13)

Run Unix # mmseq / bam2hits

Run Web #

Version	MAJ	MMTK
2.7.9	2016-06-21	MMTK	Download	Doc

The Molecular Modelling Toolkit (MMTK) is an Open Source program library for molecular simulation applications. In addition to providing ready-to-use implementations of standard algorithms, MMTK serves as a code basis that can be e

Remarque

Run Unix #

Run Web #

Version	MAJ	MOCAT
1.1	2012-08-01	MOCAT	Download	Doc

MOCAT is a package for analyzing metagenomics datasets. Currently MOCAT supports Illumina single- and paired-end reads in raw FastQ format.

Remarque Jens Roat Kultima & Shinichi Sunagawa (Bork Group, EMBL)

Run Unix # MOCAT.pl -sf|sample_file 'FILE' [Pipeline, Statistics, & Additional Options]

Run Web #

Version	MAJ	modelgenerator
0.85	2011-03-15	modelgenerator	Download	Doc

ModelGenerator is a model selection program that selects optimal amino acid and nucleotide substitution models from Fasta or Phylip alignments. ModelGenerator supports 56 nucleotide and 96 amino acid substitution models.

Remarque

Run Unix # modelgenerator

Run Web #

Version	MAJ	modeller
9.16	2016-01-21	modeller	Download	Doc

MODELLER is used for homology or comparative modeling of protein three-dimensional structures (1,2). The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms.

Remarque

Run Unix # usage: mod9.16 script [...]

Run Web #

Version	MAJ	modeltest
3.7	2006-11-20	modeltest	Download	Doc

Modeltest is a program that evaluates likelihood ratio tests between different models of evolution in order to choose the model that best fits the data.

Remarque

Run Unix # modeltest

Run Web #

Version	MAJ	mole
1.2	2011-05-12	mole	Download	Doc

MOLE is a universal toolkit for rapid and fully automated location and characterization of channels, tunnels and pores in molecular structures. The core of the MOLE algorithm is a Dijkstra path search algorithm, which is applied to a Voronoi mesh. MOLE is powerful software (overcoming some limitations of the CAVER tool) for exploring large molecular channels, complex networks of channels and molecular dynamics trajectories (AMBER ascii traj and parm7 are supported) in which analysis of a large number of snapshots is required.

Remarque

Run Unix # Mole.exe

Run Web #

Version	MAJ	molscript
2.1.2	2005-07-04	molscript	Download	Doc

MolScript is a program for displaying molecular 3D structures, such as proteins, in both schematic and detailed representations.

Remarque

Run Unix # molscript

Run Web #

Version	MAJ	MOSAIK assembler
1.1.0021	2011-06-06	MOSAIK assembler	Download	Doc

MOSAIK is a reference-guided assembler comprising four main modular programs: MosaikBuild, MosaikAligner, MosaikSort and MosaikAssembler. MosaikBuild converts various sequence formats into Mosaik’s native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment, which is then saved in an assembly file format.

Remarque

Run Unix # MosaikAligner MosaikAssembler MosaikBuild MosaikCoverage MosaikDupSnoop MosaikJump MosaikMerge MosaikSort MosaikText

Run Web #

Version	MAJ	mothur
1.34.4	2014-12-23	mothur	Download	Doc

The goal of mothur is to have a single resource to analyze molecular data that is used by microbial ecologists. Many of these tools are available elsewhere as individual programs and as scripts, which tend to be slow, or as web utilities, which limit your ability to analyze your data. mothur offers the ability to go from raw sequences to the generation of visualization tools to describe α and β diversity. Examples of each command are provided within their specific pages, and users have contributed several analysis examples that use these commands. An exhaustive list of the commands found in mothur is available within the commands category index.

Remarque

Run Unix # mothur

Run Web #

Version	MAJ	MPscan
-	2013-08-26	MPscan	Download	Doc

MPscan: fast localisation of multiple reads in genomes

Remarque Please cite this paper if you use MPscan: Rivals E., Salmela L., Kiiskinen P., Kalsi P., Tarhio J. Lecture Notes in BioInformatics (LNBI), Springer-Verlag, Vol. 5724, p. 246-260, 2009.

Run Unix # mpscan -h

Run Web #

Version	MAJ	MrBayes
3.2.5	2015-10-20	MrBayes	Download	Doc

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.

Remarque

Run Unix # mb

Run Web #

Version	MAJ	mrFAST
2.6.0.0	2012-02-01	mrFAST	Download	Doc

mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to a user-defined error threshold, including indels up to 4+4 bp. This manual describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find 'all' mappings for a given set of reads; however, it can return one "best" map location if the relevant parameter is invoked. NOTE: mrFAST is developed for Illumina, and thus requires all reads to be the same length. For paired-end reads, the lengths of mates may differ from each other, but each "side" should have a uniform length.

Remarque Personalized copy number and segmental duplication maps using next-generation sequencing. Can Alkan, Jeffrey M. Kidd, Tomas Marques-Bonet, Gozde Aksay, Francesca Antonacci, Fereydoun Hormozdiari, Jacob O. Kitzman, Carl Baker, Maika Malig, Onur Mutlu, S. Cenk Sahinalp, Richard A. Gibbs, Evan E. Eichler. Nature Genetics, Oct, 41(10):1061-1067, 2009.

Sample Set

A sample genome FASTA file, with simulated reads and a command line to map in paired-end mode, is supplied. Please download the sample set.

General

Please download the latest version from our download page and then unzip the downloaded file. Run 'make' to build mrFAST. mrFAST generates an index of the reference genome(s) and maps the reads to the reference genome.

Requirements: zlib, for the ability to read compressed FASTQ and write compressed SAM files; a C compiler (mrFAST is developed with gcc versions > 4.1.2).

Building: On Unix/Linux systems, we recommend using GNU gcc version > 4.1.2 as your compiler and typing 'make' to build. Example:

    linux> make
    gcc -c -O3 baseFAST.c -o baseFAST.o
    gcc -c -O3 CommandLineParser.c -o CommandLineParser.o
    gcc -c -O3 Common.c -o Common.o
    gcc -c -O3 HashTable.c -o HashTable.o
    gcc -c -O3 MrFAST.c -o MrFAST.o
    gcc -c -O3 Output.c -o Output.o
    gcc -c -O3 Reads.c -o Reads.o
    gcc -c -O3 RefGenome.c -o RefGenome.o
    gcc baseFAST.o CommandLineParser.o Common.o HashTable.o MrFAST.o Output.o Reads.o RefGenome.o -o mrFAST -lz -lm
    rm -rf *.o

Parallelization: The best way to optimize mrFAST is to split the reads into chunks that fit into the memory of the cluster nodes, and implement an MPI wrapper in an embarrassingly parallel fashion. We recommend the following criteria to split the reads:

    Single End Mode: The number of reads should be approximately ((M-600)/(4*L)) million, where M is the size of the memory for the cluster node (in megabytes) and L is the read length. If you have more nodes, you can make the chunks smaller to use the nodes efficiently. For example, if the library length is 50 bp and the memory of the nodes is 2 GB, each chunk should contain (2000-600)/(4*50) = 7 million reads.
    Paired End Mode: The number of reads in each file should not exceed 1 million (500,000 pairs); however, a chunk size of 500,000 reads (250,000 pairs) is recommended.

To see the list of options, use "-h" or "--help". To see the version of mrFAST, use "-v" or "--version".

Indexing

mrFAST's indices can be generated in two modes (single, batch). In single mode, mrFAST indexes a fasta file (which may contain one or more reference genomes), while in batch mode it indexes a set of fasta files. By default mrFAST uses a window size of 12 characters to generate its index. Please be advised that if you do not choose the window size carefully, you will lose sensitivity.

How to choose the right window size: For a given read length (l) and error threshold (e), the window size is floor(l/(e+1)). For example, if the read length is 36 and the maximum number of mismatches allowed is 2, the window size is 12. If your calculated window size is greater than the default, you can use the default window size without losing sensitivity. For example, for a read length of 64 and an error threshold of 2, the window size should be 21; you can use the default window size of 12. However, you cannot use 12 as the window size for a read length of 30 and an error threshold of 2.

Single Genome Mode: To index a reference genome like "refgen.fasta", run the following command:

    $>./mrfast --index refgen.fasta

Upon completion of the indexing phase, you can find "refgen.fasta.index" in the same directory as "refgen.fasta". mrFAST uses a window size of 12 (default) to make the index of the genome; this window size can be modified with "--ws". There is a restriction on the maximum window size, as the window size directly affects the memory usage.

    $>./mrfast --index refgen.fasta --ws 13

Batch Mode: In batch mode, mrFAST gets a list of reference files and generates the index for each one of them. Similar to single mode, you can specify a different window size for indexing.

    $>./mrfast -b --index fasts.list --ws 13

Mapping

mrFAST can map single-end reads and paired-end reads to a reference genome. mrFAST can map in either single or batch mode. In single mode, it only maps to one index. In batch mode, it maps to a list of indices. mrFAST supports both fasta and fastq formats.

Single-end Reads - Single Mode: To map single reads to a reference genome in single mode, run the following command. Use "--seq" to specify the input file. refgen.fa and refgen.fa.index should be in the same folder. You can load a multi-sequence FASTA file as the reference genome.

    $>./mrfast --search refgen.fa --seq reads.fastq

The reported locations will be saved into "output" by default. If you want to save them somewhere else, use "-o" to specify another file. mrFAST can report the unmapped reads in fasta/fastq format.

    $>./mrfast --search refgen.fasta --seq reads.fastq -o my.map

By default, mrFAST reports all the locations per read. If you need one "best" mapping, add the "--best" parameter to the command line:

    $>./mrfast --search refgen.fasta --seq reads.fastq -e 3 --best

Single-end Reads - Batch Mode (Note: deprecated after version 2.1.0.6): In batch mode, mrFAST uses a list of indices to find the mappings of the reads. "index.list" should contain the list of fasta files.

    $>./mrfast -b --search index.list --seq reads.fastq

Paired-end Reads: To map paired-end reads, use the "--pe" option. The mapping can be done in single/batch mode. If the reads are in two different files, you have to use "--seq1/--seq2" to indicate the files. If the reads are interleaved, use "--seq" to indicate the file. The distance allowed between the paired-end reads should be specified with "--min" and "--max". "--min" and "--max" specify the minimum and maximum of the inferred size (the distance between the outer edges of the mapping mates).

    $>./mrfast --search refgen.fasta --pe --seq reads.fastq --min 150 --max 250

Discordant Mapping: mrFAST can report the discordant mapping for use with Variation Hunter. The --min and --max options define the minimum and maximum inferred size for concordant mapping. This is enabled by default since version 2.1.0.6.

    $>./mrfast --search refgen.fasta --pe --discordant-vh --seq reads.fastq --min 50 --max 75

Parameters

General Options:
    -v|--version      Shows the current version.
    -h                Shows the help screen.

Indexing Options:
    --index [file]    Generate an index from the specified fasta file.
    -b                Indicates the indexing will be done in batch mode. The file specified in --search should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
    --ws [int]        Set window size for indexing (default:12 max:14).

Searching Options:
    --search [file]   Search the specified genome. The index file should be in the same directory as the fasta file.
    -b                Indicates the mapping will be done in batch mode. The file specified in --search should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
    --pe              Search will be done in paired-end mode.
    --mp              Search will be done in matepair mode.
    --seq [file]      Input sequences in fasta/fastq format [file]. If paired-end reads are interleaved, use this option.
    --seq1 [file]     Input sequences in fasta/fastq format [file] (first file). Use this option to indicate the first file of paired-end reads.
    --seq2 [file]     Input sequences in fasta/fastq format [file] (second file). Use this option to indicate the second file of paired-end reads.
    -o [file]         Output of the mapped sequences (SAM format). The default is "output".
    -u [file]         FASTA/FASTQ file for the unmapped sequences. The default is "unmapped".
    -e [int]          Maximum allowed edit distance (default 4% of the read length). Note that although the current version is limited to 4+4 indels, it supports any number of substitution errors.
    --min [int]       Min inferred distance allowed between two paired-end sequences.
    --max [int]       Max inferred distance allowed between two paired-end sequences.
    --discordant-vh   Return all discordant map locations ready for the Variation Hunter program, and OEA map locations ready for NovelSeq.
    --best            Return "best" location only (single-end mode).
    --seqcomp         Indicates that the input sequences are compressed (gz).
    --outcomp         Indicates that the output file should be compressed (gz).
    --maxoea [int]    Max number of One End Anchored (OEA) locations returned for each read pair. A minimum of 100 is recommended for NovelSeq use.
    --maxdis [int]    Max number of discordant map locations returned for each read pair.
    --crop [int]      Crop the input reads at position [int].
    --sample [string] Sample name to be added to the SAM header (optional).
    --rg [string]     Read group ID to be added to the SAM header (optional).
    --lib [string]    Library name to be added to the SAM header (optional).

Output Files

Single-End Mode: In single-end mode mrFAST will generate two files as specified by the "-o" and "-u" parameters. The default filename if the "-o" parameter is not specified is "output", and the default filename for the "-u" parameter is "unmapped".
    output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. mrFAST returns all possible map locations within the given edit distance ("-e") by default. If the "--best" parameter is invoked, then it will select one "best" location that has the minimum edit distance to the genome.
    unmapped file ("-u"): Contains the unmapped reads in FASTQ or FASTA format, depending on the format of the input sequences.

Paired-End and Matepair Modes: In paired-end and matepair modes, mrFAST will generate a SAM file that stores the best mapping locations while utilizing the paired-end span information. In addition, it will generate a DIVET file and an OEA file (SAM format). See below:
    output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. This file will include: if a read pair can be mapped concordantly, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair; if the read pair cannot be mapped concordantly, again, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.
    unmapped file ("-u"): Contains the orphan (both ends unmapped) reads in FASTQ or FASTA format, depending on the format of the input sequences.
    output.DIVET.vh file (the "-o" option changes the prefix "output"): This file includes all possible map locations for the read pairs that cannot be concordantly mapped. This file can be loaded by the VariationHunter tool for structural variation discovery.
    output_OEA file: Contains the OEA (One-End-Anchored) reads (paired-end reads where only one read can be mapped to the genome). The output is in SAM format and contains the map location of the read that can be mapped to the genome. The unmapped reads of an OEA read pair are not reported in separate lines; instead the sequence and quality information is given in the line that specifies the map location of the mapped read. We use the optional fields NS and NQ to specify the unmapped sequence and unmapped quality information. This file can be loaded by the NovelSeq tool for novel sequence discovery; however, format conversion might be required; please see the NovelSeq documentation.

NOTE: mrFAST will report many (up to 100 by default) possible map locations for the "mapped" read of OEA matepairs. This will generate a large file due to repeats and duplications. This file can be limited through the --maxoea parameter (version 2.1.0.0 and above).

Output Format

The mrFAST mapping output is in SAM format. For details about the definition of the fields please refer to the SAM Manual. We have not implemented the "MQUAL" field yet. All locations of discordant paired-end reads will be reported in DIVET format as required by the VariationHunter package. Unmapped reads (or "orphan" read pairs in PE mode) will be output in FASTQ or FASTA format, depending on the input sequence file format.
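
The two sizing rules quoted in the mrFAST notes above (reads per chunk in single-end mode, and the window size floor(l/(e+1))) are simple arithmetic. The Python sketch below restates them and reproduces the worked numbers from the text; the function names are illustrative only and are not part of mrFAST.

```python
def reads_per_chunk_millions(memory_mb, read_length):
    """Single-end chunk size, in millions of reads: (M - 600) / (4 * L)."""
    return (memory_mb - 600) / (4 * read_length)


def max_window_size(read_length, max_errors):
    """Window size for read length l and error threshold e: floor(l / (e + 1))."""
    return read_length // (max_errors + 1)


def default_window_is_safe(read_length, max_errors, default_ws=12):
    """True when the default window size (12) does not exceed the computed one."""
    return max_window_size(read_length, max_errors) >= default_ws


print(reads_per_chunk_millions(2000, 50))  # 2 GB node, 50 bp reads
print(max_window_size(36, 2))              # read length 36, 2 mismatches
print(max_window_size(64, 2))              # read length 64, 2 mismatches
print(default_window_is_safe(30, 2))       # read length 30, 2 mismatches
```

For the last case the computed window size is 10, which is below the default of 12, so the index would have to be built with a smaller "--ws" value.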

Run Unix # mrfast [options]

Run Web #

Version	MAJ	mrsFAST
2.5.0.4	2012-02-01	mrsFAST	Download	Doc

mrsFAST is a cache-oblivious mapper designed to map short reads to a reference genome. mrsFAST maps short reads with respect to a user-defined error threshold. In this manual, we will show how to choose the parameters and tune mrsFAST with respect to the library settings. mrsFAST is designed to find 'all' the mappings for a given set of reads.

Remarque

Run Unix # mrsFAST -h

Run Web #

Version	MAJ	MuGeN
20060919	2007-01-05	MuGeN	Download	Doc

MuGeN (Multi-Genome Navigator) is an interactive tool for exploring several complete annotated genomes together with in silico analysis results. It also has a batch execution mode in which it serves as an image generator for various formats, which makes it well suited to integration into web sites for displaying annotated physical maps. MuGeN is a software package for the visual exploration of multiple annotated genome portions. It is capable of simultaneously displaying genome portions loaded from various sources, both local and remote, and of mixing these with analysis result plots. It can also be used to generate images of these displays in a wide range of formats (PNG, PostScript, IMAP, XFig).

Remarque The command mugenv is enough to launch the graphical environment, but it loads no genome, so the windows will look rather empty. More commonly, run mugenv /chemin/vers/un/fichier/genbank.gbk to explore the file in question. MuGeN version numbers correspond to their release date and are displayed in the title bar of its graphical window. The latest is 20040726, which is the one installed on topaze and adm.

Run Unix # mugenv or mugenv /chemin/vers/un/fichier/genbank.gbk

Run Web #

Version	MAJ	Mugsy
1.2.3	2013-07-19	Mugsy	Download	Doc

Mugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.

Remarque To cite Mugsy, use: Angiuoli SV and Salzberg SL. Mugsy: Fast multiple alignment of closely related whole genomes. Bioinformatics 2011 27(3):334-4

Run Unix # mugsy [-p output prefix] multifasta_genome1.fsa multifasta_genome2.fsa ... multifasta_genomeN.fsa

Run Web #

Version	MAJ	multalin
5.4.1	2002-04-04	multalin	Download	Doc

This software aligns several biological sequences simultaneously.

Remarque

Run Unix # ma

Run Web # http://www.toulouse.inra.fr/multalin.html

Version	MAJ	multiqc
0.8	2016-11-09	multiqc	Download	Doc

Summarizes analysis results for multiple tools and samples in a single report.

Remarque

Run Unix # multiqc_env

Run Web #

Version	MAJ	mummer
3.23	2015-03-02	mummer	Download	Doc

MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.

Remarque

Run Unix # mummer [options]

Run Web #

Version	MAJ	muscle
3.8.31	2014-08-24	muscle	Download	Doc

MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation.

Remarque

Run Unix # muscle -in -out

Run Web #

Version	MAJ	mview
1.47.3	2003-05-16	mview	Download	Doc

MView is a tool for converting the results of a sequence database search (BLAST, FASTA, etc.) into the form of a coloured multiple alignment of hits stacked against the query. Alternatively, an existing multiple alignment (MSF, PIR, CLUSTALW, etc.) can be processed. In either case, the output is simply HTML, so the result is platform independent and does not require a separate application or applet to be loaded. MView is NOT a multiple alignment program, nor is it a general purpose alignment editor.

Remarque

Run Unix # mview [options] [file...]

Run Web #

Version	MAJ	naccess
2.1.1		naccess	Download	Doc

Remarque

Run Unix # naccess

Run Web #

Version	MAJ	ncoils
		ncoils	Download	Doc

Remarque

Run Unix # ncoils

Run Web #

Version	MAJ	nesoni
0.40	2011-01-31	nesoni	Download	Doc

Nesoni focusses on analysing the alignment of reads to a reference genome. We use the SHRiMP read aligner, as it is able to detect small insertions and deletions in addition to SNPs. Nesoni can call a consensus of read alignments, taking care to indicate ambiguity. This can then be used in various ways: to determine the protein level changes resulting from SNPs and indels, to find differences between multiple strains, or to produce n-way comparison data suitable for phylogenetic analysis in SplitsTree4. Alternatively, the raw counts of bases at each position in the reference seen in two different sequenced strains can be compared using Fisher's Exact Test.
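
To make the last point concrete, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table, assuming the base counts at one position have been reduced to "reference base" versus "other base" in each of two strains. The function name and the example counts are hypothetical, and this is not Nesoni's own code, which may handle all four bases at once.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two strains, columns are 'reference base' and 'other base'.
    The p-value sums the hypergeometric probabilities of every table with the
    same margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Hypothetical counts: strain 1 shows only the reference base, strain 2 only
# an alternative base, ten reads each.
print(fisher_exact_two_sided(10, 0, 0, 10))
```

A very small p-value, as in this made-up example, would suggest the two strains differ at that position; balanced counts give a p-value close to 1.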

Remarque

Run Unix # nesoni

Run Web #

Version	MAJ	netlogo
4.1	2010-03-16	netlogo	Download	Doc

NetLogo is a programmable modeling environment for simulating natural and social phenomena. It was authored by Uri Wilensky in 1999 and has been in continuous development ever since at the Center for Connected Learning and Computer-Based Modeling.

Remarque

Run Unix # netlogo

Run Web #

Version	MAJ	newbler
2.6	2011-07-06	newbler	Download	Doc

Newbler is a package of three data analysis applications made by Roche 454: the GS De Novo Assembler (with or without contig scaffolding using Paired End reads), the GS Reference Mapper, and the GS Amplicon Variant Analyzer (AVA). An additional application, the GS Run Browser, is an interactive Run browser/troubleshooting tool which graphically displays the images, some intermediate data, and various output metrics from a sequencing Run. The software package also includes the SFF Tools commands for handling and using the data files (called Standard Flowgram Format or SFF files) that hold the sequencing trace data.

Remarque

Run Unix # newbler

Run Web #

Version	MAJ	newicktopdf
-	2010-08-11	newicktopdf	Download	Doc

Converts a file describing a tree in Newick format into a PDF file (program from Manolo Gouy's group in Lyon).

Remarque

Run Unix # newicktopdf (produces a file with the same name and a pdf suffix)

Run Web #

Version	MAJ	NGSToolsMIG
1.0	2011-02-04	NGSToolsMIG	Download	Doc

Tools developed in the MIG laboratory to help with Next Generation Sequencing data analysis: quality control, mapping, assembly, global statistics, etc. Available scripts: adaptiveTrim.pl, alignmentStatistics.pl, contigsExtractionOnLength.pl, fastqQualityConverter.pl, gbk2Fasta.pl, globalTrim.pl, multiFasta2Fasta.pl, show2Fasta.pl, unmappedReadsExtraction.pl (cf. Doc).

Remarque

Run Unix # ex.: contigsExtractionOnLength.pl -i fichier.fasta -do /Dir1/Dir11/Dir111/ -po fichierFiltre -l 1500 -r

Run Web #

Version	MAJ	njplot
2009	2013-08-27	njplot	Download	Doc

NJplot is a tree drawing program able to draw any binary tree expressed in the standard phylogenetic tree format (e.g., the format used by the PHYLIP package). NJplot is especially convenient for rooting the unrooted trees obtained from parsimony, distance or maximum likelihood tree-building methods.

Remarque

Run Unix # njplot

Run Web #

Version	MAJ	novoalign
2.08.01	2013-08-20	novoalign	Download	Doc

Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds globally optimal alignments using the full Needleman-Wunsch algorithm with affine gap penalties.
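
For readers unfamiliar with the technique named above, the following Python sketch computes a global alignment score with affine gap penalties using the standard three-matrix (Gotoh) form of Needleman-Wunsch. It is a teaching example only: the scoring values are arbitrary assumptions, the function name is hypothetical, and nothing here reflects Novoalign's actual implementation or parameters.

```python
def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best global alignment score of a and b with affine gap penalties.

    A gap of length k costs gap_open + k * gap_extend (both negative here).
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap.
    M = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    X = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    Y = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend   # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend   # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Open a new gap (from M or the other gap state) or extend one.
            X[i][j] = max(M[i - 1][j] + gap_open + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend)
            Y[i][j] = max(M[i][j - 1] + gap_open + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("ACGT", "ACGT"))
print(affine_global_score("ACGT", "AGT"))
```

The affine model charges more for opening a gap than for extending it, so one long indel is preferred over several scattered single-base gaps.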

Remarque

Run Unix # novoalign [options]

Run Web #

Version	MAJ	novosnp
2.0.1		novosnp	Download	Doc

novoSNP is a program that will help you find variations (SNPs and short INDELs) in resequencing projects. It takes a reference sequence and a number of sequencing trace files as input, and generates a list of possible variations with a quality score. novoSNP allows you to easily filter, sort and check the variations found visually and keep track of your verifications.

Remarque

Run Unix # novosnp2.0.1

Run Web #

Version	MAJ	nupack
3.0	2010-12-01	nupack	Download	Doc

NUPACK is a growing software suite for the analysis and design of nucleic acid systems. The package currently enables thermodynamic analysis of dilute solutions of interacting nucleic acid strands, and sequence design for complexes of nucleic acid strands intended to adopt a target secondary structure at equilibrium. NUPACK algorithms are formulated in terms of nucleic acid secondary structure. In most cases, pseudo-knots are excluded from the structural ensemble. Much of this software may be conveniently run through the NUPACK web server at http://www.nupack.org (Zadeh et al., 2010b).

Remarque

Run Unix #

Run Web #

Version	MAJ	oases
0.2.08	2014-10-03	oases	Download	Doc

De novo transcriptome assembler for very short reads

Remarque

Run Unix # oases

Run Web #

Version	MAJ	OBO-Edit
1.101	2008-12-05	OBO-Edit	Download	Doc

OBO-Edit is an editor for ontologies in the OBO format. The OBO format was originally defined for the Gene Ontology and is spreading through the bioinformatics community. A few dozen ontologies are available in OBO format and can be edited with OBO-Edit.

Remarque

Run Unix # oboedit

Run Web #

Version	MAJ	ocount
0.4	2008-02-07	ocount	Download	Doc

OCOUNT is a fast C command-line utility that has been written in the course of TETRA's development. It counts oligonucleotides in DNA sequences and computes Markov-Model-based z-scores.
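
The sketch below illustrates the kind of computation described for OCOUNT: counting overlapping oligonucleotides and deriving a z-score from a maximal-order Markov model. The expected count and variance formulas are the approximation commonly used in tetranucleotide analyses; whether OCOUNT uses exactly these formulas is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def count_kmers(seq, k):
    """Count overlapping k-mers in seq; windows with non-ACGT letters are skipped."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


def markov_zscore(seq, word):
    """z-score of `word` (length >= 3) under a maximal-order Markov model.

    expected = N(prefix) * N(suffix) / N(middle)
    variance = expected * (N(middle) - N(prefix)) * (N(middle) - N(suffix)) / N(middle)^2
    Returns None when the z-score is undefined (zero count or zero variance).
    """
    k = len(word)
    if k < 3:
        raise ValueError("word must have at least 3 letters")
    observed = count_kmers(seq, k)[word]
    shorter = count_kmers(seq, k - 1)
    prefix, suffix = shorter[word[:-1]], shorter[word[1:]]
    middle = count_kmers(seq, k - 2)[word[1:-1]]
    if middle == 0:
        return None
    expected = prefix * suffix / middle
    variance = expected * (middle - prefix) * (middle - suffix) / middle ** 2
    if variance <= 0:
        return None
    return (observed - expected) / sqrt(variance)


print(count_kmers("ACGTACGT", 4))
print(markov_zscore("AACAAGAAT", "AAC"))
```

A positive z-score means the word occurs more often than its sub-words would predict, a negative one that it is under-represented.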

Remarque

Run Unix # ocount

Run Web #

Version	MAJ	opera
2.0	2015-03-02	opera	Download	Doc

Opera (Optimal Paired-End Read Assembler) is a sequence assembly program.

Remarque To cite Opera please use the following citation: Song Gao, Wing-Kin Sung, Niranjan Nagarajan. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology, Sept. 2011, doi:10.1089/cmb.2011.0170.

Run Unix # opera OR opera

Run Web #

Version	MAJ	orthomcl
2.0.2	2012-01-31	orthomcl	Download	Doc

OrthoMCL is a program that builds clusters of orthologs from multi-FASTA files containing CDS.

Remarque Change to the directory containing the data and run: orthomcl.pl --mode 1 --fa_files "Ath.fa,Hsa.fa,Sce.fa"

Run Unix # orthomcl.pl

Run Web #

Version	MAJ	otterlace
52.11	2011-01-27	otterlace	Download	Doc

Otterlace is an interactive, graphical client, which uses a local acedb database with Zmap and perl/Tk tools to curate genomic annotation. Annotation is stored in an extended Ensembl schema (the "otter" database), which presents the annotator with contiguous regions of a chromosome. The acedb database provides local persistent storage, so that if the software or desktop machine crashes, reboots or is exited, the editing session can be recovered. Since all communication goes through the Sanger web server, annotators can work wherever there is a network connection.

Remarque

Run Unix # otterlace

Run Web #

Version	MAJ	paml
4.9	2015-03-30	paml	Download	Doc

PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. ANSI C source codes are distributed for UNIX/Linux/Mac OSX, and executables are provided for MS Windows. PAML is not good for tree making. It may be used to estimate parameters and test hypotheses to study the evolutionary process, when you have reconstructed trees using other programs such as PAUP*, PHYLIP, MOLPHY, PhyML, RaxML, etc.

Remarque

Run Unix # baseml (basemlg codeml pamp evolver yn00 chi2)

Run Web #

Version	MAJ	pandoc
1.9.4.1	2016-02-23	pandoc	Download	Doc

Pandoc is a free and open-source software document converter, widely used as a writing tool and as a basis for publishing workflows. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to the following: HTML formats (XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides); word processor formats (Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML); ebooks (EPUB version 2 or 3, FictionBook2); documentation formats (DocBook, GNU TexInfo, Groff man pages, Haddock markup); page layout formats (InDesign ICML); outline formats (OPML); TeX formats (LaTeX, ConTeXt, LaTeX Beamer slides); PDF via LaTeX; lightweight markup formats (Markdown (including CommonMark), reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile); and custom formats (custom writers can be written in Lua).

Remarque

Run Unix # pandoc [OPTIONS] [FILES]

Run Web #

Version	MAJ	PatScan
	2007-12-12	PatScan	Download	Doc

PatScan is a pattern matcher which searches protein or nucleotide (DNA, RNA, tRNA etc.) sequence archives for instances of a pattern which you input.

Remarque patscan pat_file < input_file

Run Unix # patscan

Run Web #

Version	MAJ	pfam_scan.pl
	2012-11-29	pfam_scan.pl	Download	Doc

pfam_scan.pl - search protein fasta sequences against the Pfam library of HMMs.

Remarque

Run Unix # pfam_scan.pl -fasta -dir /usr/local/genome/PfamScan/databases

Run Web #

Version	MAJ	pftools
2.3.4	2004-04-10	pftools	Download	Doc

The pftools package is a collection of experimental programs for manipulating the generalized profile format; it implements the PROSITE search methods. The available commands are: gtop, pfsearch, pfscan, psa2msa, pfmake, pfw, ptoh, htop, pfscale, pftof.

Remarque

Run Unix # pfsearch

Run Web # pfsearch

Version	MAJ	phast
1.3	2013-07-23	phast	Download	Doc

PHAST is a freely available software package for comparative and evolutionary genomics. It consists of about half a dozen major programs, plus more than a dozen utilities for manipulating sequence alignments, phylogenetic trees, and genomic annotations. For the most part, PHAST focuses on two kinds of applications: the identification of novel functional elements, including protein-coding exons and evolutionarily conserved sequences; and statistical phylogenetic modeling, including estimation of model parameters, detection of signatures of selection, and reconstruction of ancestral sequences. It consists of over 60,000 lines of C code.

Remarque

Run Unix # phast

Run Web #

Version	MAJ	phd2fasta
0.990622.f	2005-07-22	phd2fasta	Download	Doc

Phd2fasta reads phd files and writes sequence and quality value FASTA files, which phrap and cross_match need as input. Phred and consed write sequence and quality value information in 'phd' output files. A phd file contains information in a header, the called bases, the base quality values, and the base call trace locations.

Remarque

Run Unix # phd2fasta -id ../phd_dir -os fasta_seq -oq fasta_seq.qual

Run Web #

Version	MAJ	phenix
1.8.1	2012-12-12	phenix	Download	Doc

PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.

Remarque Citing PHENIX: PHENIX: a comprehensive Python-based system for macromolecular structure solution. P. D. Adams, P. V. Afonine, G. Bunkóczi, V. B. Chen, I. W. Davis, N. Echols, J. J. Headd, L.-W. Hung, G. J. Kapral, R. W. Grosse-Kunstleve, A. J. McCoy, N. W. Moriarty, R. Oeffner, R. J. Read, D. C. Richardson, J. S. Richardson, T. C. Terwilliger and P. H. Zwart. Acta Cryst. D66, 213-221 (2010).

Run Unix # phenix

Run Web #

Version	MAJ	phobius
1.01	2010-03-23	phobius	Download	Doc

A combined transmembrane topology and signal peptide prediction method.

Remarque http://www.ncbi.nlm.nih.gov/pubmed/15111065?dopt=Abstract

Run Unix # phobius.pl [options] [infile]

Run Web #

Version	MAJ	phrap
1.090518	2010-01-18	phrap	Download	Doc

phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets.

Remarque Works with cross_match and swat, loco and cluster. The manyreads version can read more traces.

Run Unix # phrap

Run Web #

Version	MAJ	phrapview
		phrapview	Download	Doc

Viewer for assembly results produced by phrap.

Remarque

Run Unix #

Run Web #

Version	MAJ	phred
020425.c	2005-07-27	phred	Download	Doc

Phred reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the base calls and quality values to output files. Phred can read trace data from chromatogram files in the SCF, ABI, and ESD formats. It automatically determines the file format, and whether the chromatogram file was compressed using gzip, bzip2, or UNIX compress. After calling bases, phred writes the sequences to files in either FASTA format, the format suitable for XBAP, PHD format, or the SCF format. Quality values for the bases are written to FASTA format files or PHD files, which can be used by the phrap sequence assembly program in order to increase the accuracy of the assembled sequence. phred, phrap, and consed are Unix programs that work as a group for analysis of new DNA sequences. They do the following: phred performs base calling and quality assignments; phrap performs contig formation and new quality assignments; consed provides a visual X-Windows graphic interface to view and edit alignments and contigs, and to view the original traces.

Remarque

Run Unix # phred or phredPhrap

Run Web #

Version	MAJ	phredPhrap
030415		phredPhrap	Download	Doc

It runs phred on all *new* reads (reads for which there is no phd file). It runs determineReadTypes.perl so that consed, autofinish, and phrap will understand your read naming convention. Then it runs crossmatch to screen the reads for vector. Then it runs phd2fasta to create 2 fasta files (one containing read bases and one containing read quality). These are of the highest versions of each read (in case any editing has been done). It runs phrap. It runs transferConsensusTags to transfer any consensus tags from the newest old ace file to the one phrap created in step 4. It runs tagRepeats.perl to tag any common repeats (such as ALU) that you want to have automatically tagged for the benefit of consed users. See README.txt, "INSTALLING CONSED". Typically, you just type: phredPhrap. Within the project, there are 3 directories: chromat_dir (with the chromats), phd_dir (with the phd files) and edit_dir (with the ace files and other files). You type "phredPhrap" from within edit_dir.
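The project layout described above (chromat_dir, phd_dir and edit_dir) can be prepared with a few lines of Python. This is a sketch only: the helper name make_project_layout and the project name are hypothetical, and it does not run phredPhrap.

```python
import tempfile
from pathlib import Path

# Directory names taken from the description above.
PROJECT_SUBDIRS = ("chromat_dir", "phd_dir", "edit_dir")


def make_project_layout(project_root: Path) -> Path:
    """Create the three project directories and return the path to edit_dir."""
    for name in PROJECT_SUBDIRS:
        (project_root / name).mkdir(parents=True, exist_ok=True)
    return project_root / "edit_dir"


if __name__ == "__main__":
    # Demonstrate in a throwaway directory; "my_project" is a made-up name.
    with tempfile.TemporaryDirectory() as tmp:
        edit_dir = make_project_layout(Path(tmp) / "my_project")
        print(sorted(p.name for p in edit_dir.parent.iterdir()))
```

Once the chromatograms are copied into chromat_dir, phredPhrap would be launched with edit_dir as the working directory, assuming the script is on the PATH.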

Remarque Front end for the phred/phrap suite

Run Unix # phredPhrap

Run Web #

Version	MAJ	phusion
2.1c	2008-07-29	phusion	Download	Doc

Phusion Assembler: Phusion is a software package for assembling genome sequences from whole genome shotgun (WGS) reads. The Phusion assembler takes WGS reads, mostly paired with known insert sizes, as input along with the quality score assigned to each base, and produces a set of supercontigs (scaffolds).

Remarque

Run Unix #

Run Web #

Version	MAJ	phylip-3.69
3.69	2013-02-21	phylip-3.69	Download	Doc

PHYLIP is a free package of programs for inferring phylogenies.

Remarque

Run Unix # dnaml

Run Web # http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html

Version	MAJ	phylobayes
3.3f	2013-08-22	phylobayes	Download	Doc

PhyloBayes is a Bayesian Markov chain Monte Carlo (MCMC) sampler for phylogenetic reconstruction and molecular dating using protein and nucleic acid alignments. Compared to other phylogenetic MCMC samplers, the main distinguishing feature of PhyloBayes is the use of nonparametric methods for modelling site-specific features of sequence evolution.

Remarque

Run Unix #

Run Web #

Version	MAJ	phylo_win
2.0	2007-06-28	phylo_win	Download	Doc

Phylo_win (a program from Manolo Gouy's group in Lyon) provides a graphical interface for phylogenetics.

Remarque

Run Unix # phylo_win

Run Web #

Version	MAJ	PhyML
3.1	2014-08-06	PhyML	Download	Doc

PhyML is phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm to perform Nearest Neighbor Interchanges (NNIs), in order to improve a reasonable starting tree topology. Since the original publication (Guindon and Gascuel 2003), PhyML has been widely used (>1,250 citations in ISI Web of Science), due to its simplicity and a fair accuracy/speed compromise. In the meantime, research around PhyML has continued. We designed an efficient algorithm to search the tree space using Subtree Pruning and Regrafting (SPR) topological moves (Hordijk and Gascuel 2005), and proposed a fast branch test based on an approximate likelihood ratio test (Anisimova and Gascuel 2006). However, these novelties were not included in the official version of PhyML, and we found that improvements were still needed in order to make them effective in some practical cases. PhyML 3.0 achieves this task. It implements new algorithms to search the space of tree topologies with user-defined intensity. A non-parametric, Shimodaira-Hasegawa-like branch test is also available. The program provides a number of new evolutionary models and its interface was entirely re-designed. We tested PhyML 3.0 on a large collection of real data sets to ensure that the new version is stable, ready-to-use and still reasonably fast and accurate.

Remarque

Run Unix # phyml

Run Web #

Version	MAJ	picard-tools
2.0.1	2016-02-22	picard-tools	Download	Doc

A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library, which supports access to common file formats, such as SAM and VCF, used for high-throughput sequencing data.

Remarque

Run Unix # picard -h or PicardCommandLine [-h]

Run Web #

Version	MAJ	pindel
0.2.5a8	2015-02-11	pindel	Download	Doc

Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

Remarque Cite Pindel: Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009 Nov 1;25(21):2865-71. Epub 2009 Jun 26.

Run Unix # Usage: pindel -f -p [and/or -i bam_configuration_file] -c -o

Run Web #

Version	MAJ	PIPITS
1.4.0	2016-11-16	PIPITS	Download	Doc

PIPITS is an automated pipeline for analyses of fungal internal transcribed spacer (ITS) sequences from the Illumina sequencing platform. PIPITS is designed to work best on Bio-Linux (http://environmentalomics.org/bio-linux/) and Ubuntu. Unfortunately, it is NOT supported on Windows or Mac. If you are using Bio-Linux, most of the dependencies are already installed. Otherwise, you will have to set up the dependencies yourself. If you are using Ubuntu, instructions on how to set up dependencies are described below (1.8).

Remarque Citation Hyun S. Gweon, Anna Oliver, Joanne Taylor, Tim Booth, Melanie Gibbs, Daniel S. Read, Robert I. Griffiths and Karsten Schonrogge, PIPITS: an automated pipeline for analyses of fungal internal transcribed spacer sequences from the Illumina sequencing platform, Methods in Ecology and Evolution, DOI: 10.1111/2041-210X.12399

Run Unix # pipits_env

Run Web #

Version	MAJ	plast
1.0	2009-06-08	plast	Download	Doc

PLAST : Parallel Local Alignment Search Tool

Remarque

Run Unix # plastall

Run Web #

Version	MAJ	platanus
1.2.1	2015-03-06	platanus	Download	Doc

Platanus is a de novo assembler designed to assemble high-throughput data. It can handle highly heterozygous samples. The assembly proceeds in three steps. First, it constructs contigs using an algorithm based on the de Bruijn graph. Second, it determines the order of the contigs from paired-end (mate-pair) data and constructs scaffolds. Finally, paired-end reads localized on gaps in scaffolds are assembled and the gaps are closed.
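To make the first step concrete, here is a toy Python sketch of how a de Bruijn graph is built from reads: every k-mer contributes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. It only illustrates the general data structure and is not Platanus's implementation; the function name de_bruijn_edges is made up for this example.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in a read."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads; contigs correspond to unbranched paths.
    for node, successors in de_bruijn_edges(["ACGTA", "CGTAC"], k=3).items():
        print(node, "->", successors)
```

Real assemblers additionally collapse unbranched paths into contigs and prune edges caused by sequencing errors, which this sketch leaves out.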

Remarque To reference the Platanus assembler, please cite : Kajitani R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Research 24:1384-95.

Run Unix # Usage: platanus Command [options] Command: assemble, scaffold, gap_close

Run Web #

Version	MAJ	plink
1.90	2015-02-24	plink	Download	Doc

PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Remarque For documentation, citation & bug-report instructions: http://pngu.mgh.harvard.edu/purcell/plink/

Run Unix # plink

Run Web #

Version	MAJ	polya_svm
2.2	2013-01-29	polya_svm	Download	Doc

This program takes a file containing DNA/RNA sequences in FASTA format as input, and 1) predicts putative mRNA polyadenylation sites [or poly(A) sites] and/or 2) generates results indicating the occurrences of different cis-elements.

Remarque

Run Unix # polya_svm.pl

Run Web #

Version	MAJ	polyphred
5.02		polyphred	Download	Doc

PolyPhred is a program that compares fluorescence-based sequences across traces obtained from different individuals to identify heterozygous sites for single nucleotide substitutions.

Remarque

Run Unix # polyphred

Run Web #

Version	MAJ	poretools
0.6.0	2016-11-30	poretools	Download	Doc

poretools: a toolkit for working with nanopore sequencing data from Oxford Nanopore. The MinION (TM) from Oxford Nanopore Technologies (ONT) is the first nanopore sequencer to be commercialised and is now available to early-access users. The MinION (TM) is a USB-connected, portable nanopore sequencer which permits real-time analysis of streaming event data. Currently, the research community lacks a standardized toolkit for the analysis of nanopore datasets.

Remarque

Run Unix # poretools_env

Run Web #

Version	MAJ	primer3
2.2.3	2011-01-25	primer3	Download	Doc

Primer3 is a complete rewrite of the original PRIMER programs (Primer 0.5), written by Steve Lincoln, Mark Daly, and Eric Lander. See DIFFERENCES FROM EARLIER VERSIONS for a discussion of how Primer3 differs from its predecessors, Primer 0.5 and Primer v2. Primer3 picks primers for PCR reactions, considering as criteria:
- oligonucleotide melting temperature, size, GC content, and primer-dimer possibilities,
- PCR product size,
- positional constraints within the source sequence, and
- miscellaneous other constraints.

Remarque

Run Unix # primer3_core [-format_output] [-2x_compat] [-strict_tags]

Run Web # http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi

Version	MAJ	prinseq
0.17.1	2013-08-20	prinseq	Download	Doc

PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your sequences in graphical and tabular format.

Remarque

Run Unix # prinseq-lite.pl -h

Run Web #

Version	MAJ	probcons
1.12	2010-04-11	probcons	Download	Doc

PROBCONS is a novel tool for generating multiple alignments of protein sequences. Using a combination of probabilistic modeling and consistency-based alignment techniques, PROBCONS has achieved the highest accuracies of all alignment methods to date. On the BAliBASE benchmark alignment database, alignments produced by PROBCONS show statistically significant improvement over current programs, containing an average of 7% more correctly aligned columns than those of T-Coffee, 11% more correctly aligned columns than those of CLUSTAL W, and 14% more correctly aligned columns than those of DIALIGN.

Remarque Publications using the PROBCONS tool should cite: Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and Batzoglou, S. 2005. PROBCONS: Probabilistic Consistency-based Multiple Sequence Alignment. Genome Research 15:330-340.

Run Unix # probcons [OPTION]... [MFAFILE]...

Run Web #

Version	MAJ	ProbeMatch
-	2010-05-11	ProbeMatch	Download	Doc

ProbeMatch is a sequence alignment program that finds alignments for short DNA sequences (36-50 bp). Unlike other programs such as eland and soap, which perform ungapped alignment allowing up to 2 substitutions, ProbeMatch performs *gapped* alignment, allowing up to 3 errors including substitutions, insertions, and deletions.
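The notion of "up to 3 errors including substitutions, insertions, and deletions" corresponds to an edit-distance threshold. The sketch below computes the classic Levenshtein distance between a read and a candidate target of similar length and checks it against that threshold. It illustrates the criterion only, not ProbeMatch's actual search algorithm; the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def matches_within(read: str, target: str, max_errors: int = 3) -> bool:
    """True when the read aligns to the target with at most max_errors edits."""
    return edit_distance(read, target) <= max_errors


if __name__ == "__main__":
    # One substitution and one deletion separate these two toy sequences.
    print(edit_distance("ACGTACGTAC", "ACCTACGAC"))
    print(matches_within("ACGTACGTAC", "ACCTACGAC"))
```

An ungapped aligner would only count substitutions at fixed positions, so a single insertion or deletion would shift every downstream base and look like many mismatches.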

Remarque

Run Unix # probematch [options] or # probematch --help

Run Web #

Version	MAJ	procheck
3.5.4	2007-05-21	procheck	Download	Doc

Remarque

Run Unix #

Run Web #

Version	MAJ	prodigal
2.60	2013-03-26	prodigal	Download	Doc

Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee. Key features of Prodigal include:
- Speed: Prodigal is an extremely fast gene recognition tool (written in very vanilla C). It can analyze an entire microbial genome in 30 seconds or less.
- Accuracy: Prodigal is a highly accurate gene finder. It correctly locates the 3' end of every gene in the experimentally verified Ecogene data set (except those containing introns). It possesses a very sophisticated ribosomal binding site scoring system that enables it to locate the translation initiation site with great accuracy (96% of the 5' ends in the Ecogene data set are located correctly).
- Specificity: Prodigal's false positive rate compares favorably with other gene identification programs, and usually falls under 5%.
- GC-Content Indifferent: Prodigal performs well even in high GC genomes, with over a 90% perfect match (5'+3') to the Pseudomonas aeruginosa curated annotations.
- Metagenomic Version: Prodigal can run in metagenomic mode and analyze sequences even when the organism is unknown.
- Ease of Use: Prodigal can be run in one step on a single genomic sequence or on a draft genome containing many sequences. It does not need to be supplied with any knowledge of the organism, as it learns all the properties it needs to on its own.

Remarque Prodigal Reference: Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11(1):119. (Highly Accessed)

Run Unix # prodigal -h

Run Web #

Version	MAJ	prokka
1.10	2014-11-02	prokka	Download	Doc

Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.

Remarque

Run Unix # prokka [options]

Run Web #

Version	MAJ	PROSE
		PROSE	Download	Doc

The relational database PROSE contains protein sequences from Swiss-Prot and TrEMBL.

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/prose

Version	MAJ	ProtTest
3.0	2011-08-03	ProtTest	Download	Doc

PROTTEST3 is a high-performance computing program for selecting the model of protein evolution that best fits a given set of aligned sequences. This Java program uses PhyML (for maximum likelihood calculations and optimization of parameters), the PAL library for handling trees, and the ALTER library for reading alignment formats. The empirical models included are WAG, LG, mtREV, Dayhoff, DCMut, JTT, VT, Blosum62, CpREV, RtREV, MtMam, MtArt, HIVb/HIVw and FLU, plus +I: invariable sites, +G: rate heterogeneity among sites and +F: observed amino acid frequencies. ProtTest uses the Akaike Information Criterion (AIC) and other statistics (AICc, BIC and DT) to find which of the candidate models best fits the data at hand. It also implements the calculation of model-averaged phylogenies.
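As a reminder of what three of these statistics measure, the sketch below implements the textbook definitions AIC = 2k - 2 lnL, AICc = AIC + 2k(k+1)/(n-k-1) and BIC = k ln(n) - 2 lnL, where k is the number of free parameters, n the sample size and lnL the maximized log-likelihood. It is not ProtTest code, and the log-likelihoods and parameter counts in the example are invented for illustration.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion."""
    return 2 * k - 2 * log_lik


def aicc(log_lik: float, k: int, n: int) -> float:
    """AIC with the small-sample correction; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc needs n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion."""
    return k * math.log(n) - 2 * log_lik


def best_model(scores: dict) -> str:
    """Return the model name with the lowest criterion value."""
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical fits: model name -> (log-likelihood, free parameters).
    fits = {"JTT": (-5120.4, 37), "WAG+G": (-5098.7, 38), "LG+I+G": (-5097.9, 39)}
    scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
    print(best_model(scores))
```

Lower values are better for all three criteria; BIC penalizes extra parameters more heavily than AIC once n exceeds about 7.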

Remarque Citation: Darriba D, Taboada GL, Doallo R, Posada D. In press. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics.

Run Unix # runProtTestHPC or runXProtTestHPC

Run Web #

Version	MAJ	psicov
1.05	2012-07-26	psicov	Download	Doc

Accurate Contact Prediction from large protein alignments

Remarque

Run Unix # psicov [options] alnfile

Run Web #

Version	MAJ	psipred
3.5	2014-06-07	psipred	Download	Doc

PSIPRED is a simple and reliable secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST). Version 2.0 of PSIPRED includes a new algorithm which averages the output from up to 4 separate neural networks in the prediction process to further increase prediction accuracy.

Remarque Uses PSI-BLAST output

Run Unix # psipred

Run Web #

Version	MAJ	PSMC
0.6.5	2016-04-07	PSMC	Download	Doc

This software package infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model.

Remarque

Run Unix # psmc [options] input.txt

Run Web #

Version	MAJ	psort
3.0.3	2012-05-30	psort	Download	Doc

PSORT is a computer program for the prediction of protein localization sites in cells. It takes as input an amino acid sequence and its source origin, e.g., Gram-negative bacteria. Then, it analyzes the input sequence by applying the stored rules for various sequence features of known protein sorting signals. Finally, it reports the possibility for the input protein to be localized at each candidate site, with additional information.

Remarque

Run Unix # psort

Run Web #

Version	MAJ	pymol
0.99	2009-04-21	pymol	Download	Doc

PyMOL is molecular visualization software coupled with a Python interpreter. It supports real-time visualization as well as fast generation of high-quality animations and images of molecular assemblies.

Remarque

Run Unix # pymol

Run Web #

Version	MAJ	pynast
0.1	2012-07-23	pynast	Download	Doc

PyNAST: Python Nearest Alignment Space Termination tool. PyNAST is a reimplementation of the NAST sequence aligner, which has become a popular tool for adding new 16S rDNA sequences to existing 16S rDNA alignments. This reimplementation is more flexible, faster, and easier to install and maintain than the original NAST implementation. PyNAST is built using the PyCogent Bioinformatics Toolkit. The first versions of PyNAST (through PyNAST 1.0) were written to exactly match the results of the original NAST algorithm. Beginning with the post PyNAST 1.0 development code, PyNAST no longer exactly matches the NAST output but is instead focused on getting better alignments. Users who wish to exactly match the results of NAST should download PyNAST 1.0.

Remarque PyNAST: a flexible tool for aligning sequences to a template alignment. J. Gregory Caporaso, Kyle Bittinger, Frederic D. Bushman, Todd Z. DeSantis, Gary L. Andersen, and Rob Knight. January 15, 2010, DOI 10.1093/bioinformatics/btp636. Bioinformatics 26: 266-267.

Run Unix # pynast [options] {-i input_fp -t template_fp} or pynast -h

Run Web #

Version	MAJ	qiime
1.9.1	2016-11-14	qiime	Download	Doc

QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. QIIME has been applied to single studies based on billions of sequences from thousands of samples.

Remarque

Run Unix # qiime_env

Run Web #

Version	MAJ	qpdf
5.1.3	2015-08-26	qpdf	Download	Doc

QPDF is a command-line program that does structural, content-preserving transformations on PDF files. It could have been called something like pdf-to-pdf. It also provides many useful capabilities to developers of PDF-producing software or for people who just want to look at the innards of a PDF file to learn more about how they work.

Remarque

Run Unix # qpdf [options] infile outfile

Run Web #

Version	MAJ	Quake
0.3.5	2014-10-02	Quake	Download	Doc

Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. Quake adopts the k-mer error correction framework, first introduced by the EULER genome assembly package. Unlike EULER and similar programs, Quake utilizes a robust mixture model of erroneous and genuine k-mer distributions to determine where errors are located. Then Quake uses read quality values and learns the nucleotide to nucleotide error rates to determine what types of errors are most likely. This leads to more corrections and greater accuracy, especially with respect to avoiding mis-corrections, which create false sequence dissimilar to anything in the original genome sequence from which the read was taken.
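The k-mer error correction framework rests on a simple observation: with deep coverage, genuine k-mers occur many times across the reads, while k-mers containing a sequencing error occur rarely. The toy sketch below counts k-mers and flags the rare ones using a fixed cutoff. The cutoff is a simplifying assumption made for illustration: Quake itself fits a mixture model and also weighs read quality values. The function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_kmers(counts, min_count):
    """Return the k-mers seen fewer than min_count times (likely erroneous)."""
    return {kmer for kmer, n in counts.items() if n < min_count}


if __name__ == "__main__":
    # The third read differs from the other two at its last base.
    reads = ["ACGTA", "ACGTA", "ACGTT"]
    counts = count_kmers(reads, k=4)
    print(sorted(untrusted_kmers(counts, min_count=2)))
```

A corrector would then look for the smallest set of base changes that turns every untrusted k-mer in a read into a trusted one.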

Remarque Kelley DR, Schatz MC, Salzberg SL. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116 2010. (http://genomebiology.com/2010/11/11/R116/abstract)

Run Unix # quake.py --help

Run Web #

Version	MAJ	quantiNemo
1.0.4	2015-02-24	quantiNemo	Download	Doc

quantiNEMO is an individual-based, genetically explicit stochastic simulation program. It was developed to investigate the effects of selection, mutation, recombination, and drift on quantitative traits with varying architectures in structured populations connected by migration and located in a heterogeneous habitat. quantiNEMO is highly flexible at various levels: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, mating system, etc.

Remarque

Run Unix # quantiNemo

Run Web #

Version	MAJ	quast
3.2	2016-03-08	quast	Download	Doc

QUality ASsessment Tool for Genome Assemblies. QUAST evaluates the quality of genome assemblies by computing various metrics and producing reports.

Remarque Citation : Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi and Glenn Tesler, QUAST: quality assessment tool for genome assemblies, Bioinformatics (2013) 29 (8): 1072-1075. doi: 10.1093/bioinformatics/btt086 First published online: February 19, 2013

Run Unix # quast.py [options] or metaquast.py [options]

Run Web #

Version	MAJ	QuickTree
1.1	2006-02-21	QuickTree	Download	Doc

QuickTree is a program for the rapid reconstruction of phylogenies by the Neighbor-Joining method. For details, see the article published in the journal 'Bioinformatics' (18:1546-1547).

Remarque

Run Unix # quicktree

Run Web #

Version	MAJ	quip
1.1.4	2013-02-21	quip	Download	Doc

Quip compresses next-generation sequencing data with extreme prejudice. It supports input and output in the FASTQ and SAM/BAM formats, compressing large datasets to as little as 15% of their original size.

Remarque Compression of next-generation sequencing reads aided by highly efficient de novo assembly Daniel C. Jones; Walter L. Ruzzo; Xinxia Peng; Michael G. Katze — Nucleic Acids Research 2012; doi: 10.1093/nar/gks754

Run Unix # quip

Run Web #

Version	MAJ	R
3.1.1	2014-08-08	R	Download	Doc

R is a language and environment for statistical computing and graphics. For the analysis of genomic data, R includes statistical packages for clustering, linear models, ANOVA, and more (downloaded from CRAN). Other packages dedicated to microarray analysis are also available (downloaded from CRAN). The R-based project for bioanalysis is named Bioconductor (http://www.bioconductor.org/) and targets the analysis and comprehension of genomic data. The packages anapuce and varmixt, developed by the Statistique et génome team (OMIP department, INA P-G & INRA - http://www.inapg.fr/ens_rech/mathinfo/recherche/mathematique/outil.html) for differential analysis, are also available on the platform.

Remarque To open the help: # help.start(browser="mozilla"), or any other browser that is not already in use (open)

Run Unix # R

Run Web #

Version	MAJ	rainbow
2.0	2012-09-10	rainbow	Download	Doc

Rainbow package consists of several programs used for RAD-seq related clustering and de novo assembly.

Remarque

Run Unix # rainbow [options]

Run Web #

Version	MAJ	rasmol
2.7.2.1.1	2004-10-07	rasmol	Download	Doc

Software for looking at macromolecular structure and its relation to function

Remarque

Run Unix # rasmol

Run Web #

Version	MAJ	ratt
-	2010-10-29	ratt	Download	Doc

RATT is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.

Remarque

Run Unix # start.ratt.sh

Run Web #

Version	MAJ	raxml
7.3.0	2013-04-11	raxml	Download	Doc

RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood [1] based inference of large phylogenetic trees. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein’s dnaml, which is part of the PHYLIP [2] package.

Remarque If you use RAxML please always cite the following paper: Alexandros Stamatakis : “RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models”, Bioinformatics 22(21):2688–2690, 2006 [4].

Run Unix # raxmlHPC -h or raxmlHPC-MPI -h

Run Web #

Version	MAJ	ray
2.3.1	2014-06-19	ray	Download	Doc

Ray is a parallel de novo genome assembler that utilises the message-passing interface everywhere and is implemented using peer-to-peer communication.

Remarque Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Sébastien Boisvert, François Laviolette, and Jacques Corbeil. Journal of Computational Biology (Mary Ann Liebert, Inc. publishers). November 2010, 17(11): 1519-1533. doi:10.1089/cmb.2009.0238

Run Unix # Ray -help

Run Web #

Version	MAJ	rdp_classifier
2.2	2014-08-02	rdp_classifier	Download	Doc

Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy.

Remarque Example: classifier testQuerySeq.fasta mon_result train052008/rRNAClassifier.properties. The train052008 directory must first be present in your home directory (http://downloads.sourceforge.net/rdp-classifier/RDPClassifier_train052008.tar.gz) (http://rdp.cme.msu.edu/tmp_download/train3.tar.gz)

Run Unix # classifier

Run Web #

Version	MAJ	readseq2
2.1.30	2014-06-23	readseq2	Download	Doc

Read and reformat biosequences

Remarque

Run Unix # readseq2 [option]

Run Web #

Version	MAJ	ReAS
2.02	2011-06-30	ReAS	Download	Doc

ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun

Remarque http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.0010043

Run Unix #

Run Web #

Version	MAJ	RepeatMasker
3.3.0	2011-07-04	RepeatMasker	Download	Doc

Remarque To look up a species, for example bos_taurus: /usr/local/genome/RepeatMasker/util/queryTaxonomyDatabase.pl -species "bos taurus"

Run Unix # RepeatMasker

Run Web #

Version	MAJ	RepeatScout
1.05	2011-06-30	RepeatScout	Download	Doc

RepeatScout is a tool to discover repetitive substrings in DNA.

Remarque If you use RepeatScout, please cite the following paper: Price A.L., Jones N.C. and Pevzner P.A. 2005. De novo identification of repeat families in large genomes. To appear in Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05). Detroit, Michigan.

Run Unix # 1/ build_lmer_table -l -sequence -freq [opts] then 2/ RepeatScout -sequence -output -freq -l [opts]

Run Web #

Version	MAJ	reptile
2.0	2012-05-03	reptile	Download	Doc

Reptile is a software developed in C++ for correcting sequencing errors in short reads from next-gen sequencing platforms.

Remarque

Run Unix # reptile-omp

Run Web #

Version	MAJ	RHOM
31.5		RHOM	Download	Doc

R'HOM (Recherche de régions HOMogènes) is a program for segmenting DNA sequences into regions of homogeneous composition using hidden Markov chains. The user chooses the number of distinct composition types and the word length to take into account. The parameters are then estimated by maximum likelihood (EM algorithm), and the sequence is finally segmented with the forward-backward algorithm. R'HOM was initially developed during Florence Muri's doctoral thesis and was later largely re-implemented.

Remarque

Run Unix # rhom.em

Run Web #

Version	MAJ	riboPicker
0.4.3	2013-07-29	riboPicker	Download	Doc

Easy identification and removal of rRNA-like sequences. The riboPicker tool can be used to automatically identify and efficiently remove rRNA-like sequences from metatranscriptomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.

Remarque

Run Unix # ribopicker [options] -f -dbs ...

Run Web #

Version	MAJ	rmes
3.1.0	2014-08-20	rmes	Download	Doc

Program for detecting words or motifs whose frequency is statistically exceptional in a biological sequence. (R'MES stands for Recherche de Mots Exceptionnels dans les Séquences.)

Remarque What is new relative to version 3.01: Major changes: - significant improvement in computation time for Gaussian approximations, whatever the order of the model, - removal of the constraint on the length of word-family names. Minor changes: - renaming of the threshold selection options in the result formatting tool (--minthresh and --maxthresh become --tmin and --tmax), - change in the presentation order of bias computation results (sorted by score rather than alphabetically). For any questions, contact Sophie.Schbath@jouy.inra.fr

Run Unix # rmes [options] -s -o, or rmes --help

Run Web #

Version	MAJ	rmesplot
0.92	2007-10-31	rmesplot	Download	Doc

Remarque

Run Unix # rmesplot

Run Web #

Version	MAJ	rna2map
0.5.0	2009-09-10	rna2map	Download	Doc

The SOLiD System Small RNA Analysis Pipeline Tool (RNA2MAP) can be used to perform whole genome analysis of color space RNA library reads. It consists of three major procedures: filtering, matching against miRBase sequences (Sanger), and matching against a reference genome.

Remarque

Run Unix #

Run Web #

Version	MAJ	rnammer
1.2	2014-04-30	rnammer	Download	Doc

RNAmmer 1.2 predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences

Remarque

Run Unix # rnammer [options] (man rnammer)

Run Web #

Version	MAJ	RUM
1.12.01	2012-05-03	RUM	Download	Doc

RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data.

Remarque Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Mapper (RUM) Gregory R. Grant, Michael H. Farkas, Angel Pizarro, Nicholas Lahens, Jonathan Schug, Brian Brunk, Christian J. Stoeckert Jr, John B. Hogenesch and Eric A. Pierce.

Run Unix # RUM_runner.pl [options]

Run Web #

Version	MAJ	samToFastq
1.62(1113)	2012-02-17	samToFastq	Download	Doc

Extracts read sequences and qualities from the input SAM/BAM file and writes them into the output file in Sanger fastq format. In the RC mode (default is True), if the read is aligned and the alignment is to the reverse strand on the genome, the read's sequence from the input SAM file will be reverse-complemented prior to writing it to fastq, in order to correctly restore the original read sequence as it was generated by the sequencer.
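The RC step can be illustrated with a short Python sketch. In the SAM format, bit 0x10 of the FLAG field marks a read aligned to the reverse strand; for such reads the stored sequence is reverse-complemented and the quality string reversed to recover what the sequencer produced. The function names below are hypothetical, and this is not the tool's own code.

```python
# Translation table for complementing bases, keeping N and case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def is_reverse_strand(flag: int) -> bool:
    """SAM FLAG bit 0x10 is set when the read aligns to the reverse strand."""
    return bool(flag & 0x10)


def restore_original_read(seq: str, qual: str, flag: int):
    """Return (sequence, qualities) in the orientation the sequencer emitted."""
    if not is_reverse_strand(flag):
        return seq, qual
    # Complement each base, then reverse; qualities are only reversed.
    return seq.translate(_COMPLEMENT)[::-1], qual[::-1]


def fastq_record(name: str, seq: str, qual: str) -> str:
    """Format one four-line FASTQ record."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Toy read stored on the reverse strand (FLAG 16).
    seq, qual = restore_original_read("AACG", "IIH#", flag=16)
    print(fastq_record("read1", seq, qual), end="")
```

Reversing the quality string alongside the sequence keeps each quality value attached to the base it describes.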

Remarque

Run Unix #

Run Web #

Version	MAJ	SAMtools
1.2	2015-04-15	SAMtools	Download	Doc

SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Remarque

Run Unix # samtools [options]

Run Web #

Version	MAJ	SAS
9.2	2010-12-16	SAS	Download	Doc

SAS - Statistical Analysis System

Remarque

Run Unix # sasx

Run Web #

Version	MAJ	scilab
4.1.2	2007-11-29	scilab	Download	Doc

Scilab is scientific numerical computation software that provides a powerful development environment for scientific and engineering applications.

Remarque

Run Unix # scilab

Run Web #

Version	MAJ	scwrl3
3.0	2005-10-10	scwrl3	Download	Doc

SCWRL3.0 is a completely new version of the SCWRL program for prediction of protein side-chain conformations. SCWRL3.0 is based on a new algorithm based on graph theory that solves the combinatorial problem in side-chain prediction more rapidly than any other available program. SCWRL3.0 is more accurate than previous versions of SCWRL, while the new algorithm will allow for development of more sophisticated energy functions and for incorporation of side-chain flexibility around rotameric positions.

Remarque

Run Unix # scwrl3

Run Web #

Version	MAJ	seaview
4.2	20130-01-30	seaview	Download	Doc

SeaView is a graphical multiple sequence alignment editor. SeaView is able to read and write various alignment formats (NEXUS, MSF, CLUSTAL, FASTA, PHYLIP, MASE).

Remarque

Run Unix # seaview

Run Web #

Version	MAJ	seq-gen
1.3.3	2012-08-24	seq-gen	Download	Doc

Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution.

Remarque

Run Unix # seq-gen

Run Web #

Version	MAJ	seqmap
1.0.12	2009-02-19	seqmap	Download	Doc

SeqMap is a tool for mapping large numbers of oligonucleotides to a genome. It is designed for finding all the places in a genome where an oligonucleotide could potentially come from. SeqMap can efficiently map tens of millions of short sequences to a genome of several billion nucleotides. While doing the mapping, several mutations as well as insertions/deletions of nucleotide bases in the sequences can be tolerated and, furthermore, detected. Various input and output formats are supported, as well as many command line options for tuning almost every step in the mapping process.

Remarque Publication: http://dx.doi.org/10.1093/bioinformatics/btn429

Run Unix # seqmap

Run Web #

Version	MAJ	seqtk
--	2013-02-26	seqtk	Download	Doc

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

Remarque

Run Unix # seqtk

Run Web #

Version	MAJ	sequin
5.35	2005-07-01	sequin	Download	Doc

A DNA Sequence Submission and Update Tool

Remarque

Run Unix # sequin

Run Web # http://www.ncbi.nlm.nih.gov/Sequin/

Version	MAJ	SHOW
20111109	2011-11-11	SHOW	Download	Doc

SHOW (Structured HOmogeneity Watcher) allows flexible use of hidden Markov models. Users can build their own model, whose parameters can then be estimated by maximum likelihood with the EM algorithm. The model can then be used to make predictions with the forward-backward algorithm (posterior decoding) or with the Viterbi algorithm. It can also be used to simulate sequences. SHOW also implements a bacterial gene detector, in which case the user does not need to worry about the model or its parameters. SHOW has already been used to annotate published complete genomes.
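As an illustration of the Viterbi decoding mentioned above, here is a minimal Python sketch. It is not SHOW's code or interface: the function name `viterbi`, the two-state toy model (`H` for GC-rich, `L` for AT-rich) and all probabilities are assumptions made up for the example.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations.

    Log probabilities are used to avoid numerical underflow on long
    sequences. All probabilities must be strictly positive.
    """
    # scores[t][s]: best log probability of any path ending in state s at time t
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
               for s in states}]
    backptr = [{}]
    for t in range(1, len(observations)):
        scores.append({})
        backptr.append({})
        for s in states:
            best_prev = max(
                states,
                key=lambda p: scores[t - 1][p] + math.log(trans_p[p][s]),
            )
            scores[t][s] = (scores[t - 1][best_prev]
                            + math.log(trans_p[best_prev][s])
                            + math.log(emit_p[s][observations[t]]))
            backptr[t][s] = best_prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: H emits mostly G/C, L emits mostly A/T.
STATES = ["H", "L"]
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

if __name__ == "__main__":
    print(viterbi("GGCCAATT", STATES, START, TRANS, EMIT))
```

Posterior decoding with forward-backward would instead pick, at each position, the state with the highest marginal probability; the two decodings can differ on ambiguous regions.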

Remarque

Run Unix # show_viterbi # show2mugen.pl

Run Web #

Version	MAJ	showVenn
1.0	2010-02-26	showVenn	Download	Doc

This tool manipulates lists of identifiers in the form of a Venn diagram. It can find the elements that are shared by, or unique to, up to 5 different lists. By clicking on the different regions of the diagram, the user retrieves the identifiers belonging to the selected subset.
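The region lookup described above can be sketched with plain set operations. This is only an illustration, not showVenn's implementation; the function name `venn_regions` and the gene identifiers are hypothetical.

```python
from collections import defaultdict

def venn_regions(named_lists):
    """Group identifiers by the exact combination of lists they appear in.

    `named_lists` maps a list name to an iterable of identifiers. The result
    maps a frozenset of list names (one Venn region) to the set of
    identifiers found in exactly those lists and no others.
    """
    membership = defaultdict(set)
    for name, identifiers in named_lists.items():
        for ident in identifiers:
            membership[ident].add(name)
    regions = defaultdict(set)
    for ident, names in membership.items():
        regions[frozenset(names)].add(ident)
    return dict(regions)

if __name__ == "__main__":
    # Hypothetical identifier lists.
    lists = {
        "A": ["g1", "g2", "g3"],
        "B": ["g2", "g3", "g4"],
        "C": ["g3", "g5"],
    }
    for region, idents in venn_regions(lists).items():
        print(sorted(region), sorted(idents))
```

Each key corresponds to one clickable region of the diagram; with 5 lists there are at most 31 non-empty regions.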

Remarque

Run Unix #

Run Web # http://tomcat.jouy.inra.fr/Venn/

Version	MAJ	sickle
1.200	2013-02-26	sickle	Download	Doc

sickle - A windowed adaptive trimming tool for FASTQ files using quality

Remarque

Run Unix # sickle [options]

Run Web #

Version	MAJ	signalp
4.0	2014-05-07	signalp	Download	Doc

Detection of signal sequences and cleavage sites in protein sequences from Gram-positive bacteria, Gram-negative bacteria, and eukaryotes.

Remarque The SIGNALP package is the property of the Center for Biological Sequence Analysis. It may be downloaded only by special agreement (contact software@cbs.dtu.dk).

Run Unix # signalp

Run Web #

Version	MAJ	sim4
2003-09-21	2006-03-28	sim4	Download	Doc

SIM4 finds the best local alignments between a cDNA sequence (mRNA, EST) and a genomic DNA sequence containing that gene, allowing for introns and a small number of sequencing errors.

Remarque http://globin.cse.psu.edu/html/docs/sim4.html

Run Unix # sim4 mouse_cDNA human_genomic K=15 C=11 A=3 W=10

Run Web #

Version	MAJ	Simbac
"master"	2017-05-03	Simbac	Download	Doc

Simulation of whole bacterial genomes with homologous recombination

Remarque

Run Unix # SimBac [OPTIONS]

Run Web #

Version	MAJ	SIMPA
1.0		SIMPA	Download	Doc

SIMPA is a program for predicting the secondary structure of proteins. Three states are considered: alpha helix (H), beta strands (b), and aperiodic structures (C). The program is based on the "nearest neighbor" approach. It achieves a Q3 score of 67%.
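For readers unfamiliar with the Q3 measure quoted above: it is the percentage of residues whose predicted state (out of three) matches the observed state. A minimal sketch follows; the function name `q3_score` and the example strings are hypothetical and unrelated to SIMPA's own code.

```python
def q3_score(predicted, observed):
    """Return the Q3 accuracy in percent.

    Both arguments are strings of per-residue states (for example H, b, C)
    of equal, non-zero length.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

if __name__ == "__main__":
    # Hypothetical prediction versus reference assignment.
    print(q3_score("HHHbbCCC", "HHHbCCCC"))
```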

Remarque

Run Unix #

Run Web #

Version	MAJ	SimWalk2
2.91	2007-01-31	SimWalk2	Download	Doc

SimWalk2 is a statistical genetics computer application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on any size of pedigree. SimWalk2 uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms to perform these multipoint analyses.

Remarque These files are required to use SimWalk2 and must be present in the directory from which the program is launched (MAP.DAT, LOCUS.DAT, PEDIGREE.DAT, PEN.DAT).

Run Unix # simwalk2

Run Web #

Version	MAJ	SLICEMBLER
-	2015-01-26	SLICEMBLER	Download	Doc

SLICEMBLER is an iterative meta-assembler that takes advantage of the whole dataset, and significantly improves the final quality of the assembly. SLICEMBLER partitions the input data into optimal-sized “slices” and uses a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray) to assemble each slice individually. SLICEMBLER uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly. It extracts high-quality contigs from the slice assemblies, and prevents contigs containing mis-joins and calling errors from being included in the final assembly. SLICEMBLER has been designed and developed at the Algorithms and Computational Biology Lab, University of California, Riverside.

Remarque

Run Unix # slicembler.py -r -i -c -n -o

Run Web #

Version	MAJ	snap
0.13	2012-07-16	snap	Download	Doc

SNAP is a new sequence aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives SNAP up to 2x lower error rates than existing tools and lets it match larger mutations that they may miss.

Remarque Faster and More Accurate Sequence Alignment with SNAP. Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David Patterson, Scott Shenker, Ion Stoica, Richard M. Karp, and Taylor Sittler. arXiv:1111.5572v1, November 2011.

Run Unix # snap

Run Web #

Version	MAJ	soap
2.20	2014-08-23	soap	Download	Doc

SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program features very fast and accurate alignment for huge amounts of short reads generated by the Illumina/Solexa Genome Analyzer. Compared to soap v1, it is one order of magnitude faster. It requires only 2 minutes to align one million single-end reads onto the human reference genome. Another remarkable improvement of SOAPaligner is that it now supports a wide range of read lengths.

Remarque To run SOAPaligner, first build index files for the reference genome (2bwt-builder), then search reads against the formatted index files (soap).

Run Unix # soap

Run Web #

Version	MAJ	soap.coverage
2.7.7	2011-12-14	soap.coverage	Download	Doc

Utility for SOAP. soap.coverage calculates sequencing coverage or physical coverage, as well as duplication rate and block-level details, for each segment and for the whole genome, using SOAP, BLAT, BLAST, BlastZ, MUMmer and MAQ alignment results, with multi-threading support.
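To make the notion of sequencing coverage concrete, the sketch below computes mean depth and the fraction of covered positions from a list of alignment intervals. It does not reproduce soap.coverage or any of its input formats; the function name `coverage_stats` and the 0-based half-open `(start, end)` interval convention are assumptions for the example.

```python
def coverage_stats(genome_length, alignments):
    """Return (mean depth, covered fraction) for one reference segment.

    `alignments` is an iterable of (start, end) pairs, 0-based and
    half-open, each lying within [0, genome_length].
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Difference array: +1 where an alignment starts, -1 where it ends.
    delta = [0] * (genome_length + 1)
    for start, end in alignments:
        if not 0 <= start <= end <= genome_length:
            raise ValueError("alignment outside the reference")
        delta[start] += 1
        delta[end] -= 1
    total_depth = 0
    covered = 0
    depth = 0
    for pos in range(genome_length):
        depth += delta[pos]  # running sum gives the depth at this position
        total_depth += depth
        if depth > 0:
            covered += 1
    return total_depth / genome_length, covered / genome_length

if __name__ == "__main__":
    print(coverage_stats(10, [(0, 5), (3, 8)]))
```

The difference-array approach keeps the cost linear in the reference length plus the number of alignments, rather than touching every base of every read.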

Remarque

Run Unix # soap.coverage

Run Web #

Version	MAJ	SOAPdenovo
1.04	2010-08-23	SOAPdenovo	Download	Doc

SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

Remarque

Run Unix # soapdenovo [option]

Run Web #

Version	MAJ	SolexaQA
1.7	2011-05-30	SolexaQA	Download	Doc

SolexaQA is a Perl-based software package that calculates quality statistics and creates visual representations of data quality from FASTQ files generated by Illumina second-generation sequencing technology (“Solexa”).

Remarque

Run Unix # SolexaQA.pl

Run Web #

Version	MAJ	SortMeRNA
1.9	2014-02-20	SortMeRNA	Download	Doc

SortMeRNA is software designed to rapidly filter ribosomal RNA fragments from metatranscriptomic data produced by next-generation sequencers. It is capable of handling large RNA databases and sorting out all fragments matching the database with high accuracy and specificity.

Remarque If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.

Run Unix # sortmerna -h

Run Web #

Version	MAJ	SPAdes
3.9.0	2016-08-16	SPAdes	Download	Doc

SPAdes is a de Bruijn graph based assembler. It integrates a read error corrector, a multiple k-mer de Bruijn graph assembler, an assembly merger, a scaffolder and a repeat resolver.
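The de Bruijn graph idea can be illustrated in a few lines: every k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The sketch below uses a single value of k and no error correction, so it is far simpler than what SPAdes does; the function name `de_bruijn_graph` and the toy read are hypothetical.

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph as an adjacency list.

    Nodes are (k-1)-mers. Each k-mer found in a read adds one edge from its
    prefix to its suffix, so repeated k-mers yield repeated edges.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)

if __name__ == "__main__":
    for node, successors in de_bruijn_graph(["ACGTAC"], 3).items():
        print(node, "->", successors)
```

An assembler then looks for paths through this graph; contigs correspond to unambiguous, non-branching paths.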

Remarque

Run Unix # spades

Run Web #

Version	MAJ	SPatt
2.0-pre1 & 1.2.2	2007-10-02	SPatt	Download	Doc

SPatt (Statistic for Patterns) is a suite of C++ programs designed to compute p-values for pattern occurrences in a text. Assuming the text is generated by a Markov model, the p-value of a given observation is its probability of occurring. The lower the p-value, the more unlikely the observation. For example, these tools can be used to find patterns with unusual behaviour in DNA sequences.

Remarque

Run Unix # spatt (aspatt cpspatt gspatt ldspatt oldxspatt sspatt xspatt)

Run Web #

Version	MAJ	SPiD
2.1		SPiD	Download	Doc

Subtilis Protein interaction Database

Remarque

Run Unix #

Run Web # http://genome.jouy.inra.fr/cgi-bin/spid/index.cgi

Version	MAJ	splitsTree
4.13.1	2014-06-23	splitsTree	Download	Doc

SplitsTree4 is the leading application for computing unrooted phylogenetic networks from molecular sequence data. Given an alignment of sequences, a distance matrix or a set of trees, the program will compute a phylogenetic tree or network using methods such as split decomposition, neighbor-net, consensus network, super networks methods or methods for computing hybridization or simple recombination networks.

Remarque

Run Unix # SplitsTree

Run Web #

Version	MAJ	SRA ToolKit
2.5.7	2016-01-30	SRA ToolKit	Download	Doc

The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format (Note that this is not required for submission). The Toolkit source code is provided in the form of the SRA SDK, and may be compiled with GCC. However, pre-built software executables are available for Linux, Windows, and Mac OS X, and we highly recommend using these pre-built executables whenever possible.

Remarque

Run Unix #

Run Web #

Version	MAJ	ssaha2
2.5.2	2014-08-20	ssaha2	Download	Doc

SSAHA (Sequence Search and Alignment by Hashing Algorithm) is an algorithm for very fast matching and alignment of DNA sequences. It achieves its fast search speed by encoding sequence information in a perfect hash function.

Remarque

Run Unix # ssaha2

Run Web #

Version	MAJ	ssake
3.2	2008-07-30	ssake	Download	Doc

SSAKE is a genomics application for assembling millions of very short DNA sequences. It is an easy-to-use, robust, reliable and tractable clustering algorithm for very short sequence reads, such as those generated by Illumina Ltd.

Remarque

Run Unix # ssake.pl

Run Web #

Version	MAJ	sspace
2.0	2013-02-21	sspace	Download	Doc

SSPACE is not a de novo assembler, it is used after a preassembled run. SSPACE is a script to extend and scaffold preassembled contigs using a number of mate pairs or paired-end libraries. It uses Bowtie to map all the reads to the pre-assembled contigs. Unmapped reads are used for extending, if desired, the pre-assembled contigs with the SSAKE assembler. Again Bowtie is used to map the reads to the extended contigs. Positions and orientation of the reads are stored and used for scaffolding. If both reads of a pair are found within the allowed distance, they are used for scaffolding to determine the orientation, contig pairing and ordering of the contigs.

Remarque

Run Unix # /usr/local/genome/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl

Run Web #

Version	MAJ	ssu-align
0.1.1	2016-07-01	ssu-align	Download	Doc

SSU-ALIGN is a software package for identifying, aligning, masking and visualizing archaeal 16S, bacterial 16S and eukaryotic 18S small subunit ribosomal RNA (SSU rRNA) sequences. It includes and uses the Infernal software package for generating alignments based on the conserved secondary structure and sequence of SSU rRNA. SSU-ALIGN extends Infernal to make it easier for users to generate large-scale alignments of up to millions of SSU rRNA sequences that will ultimately be used as input to phylogenetic inference methods. (SSU-ALIGN is not capable of inferring phylogenetic trees itself.) Large SSU rRNA sequence datasets are commonly generated by environmental sequencing survey studies that use SSU rRNA as a phylogenetic marker of species in the environment being studied. While designed primarily for these SSU-based studies, SSU-ALIGN is a general tool that can be used to generate alignments of any type of structural RNA, including large subunit ribosomal RNA (LSU rRNA).

Remarque How to cite SSU-ALIGN: SSU-ALIGN does not yet have an associated publication, so please cite the INFERNAL software publication (Nawrocki et al., 2009a) if you find the package useful for work that you publish. Additionally, because SSU-ALIGN’s seed alignments were derived from the Comparative RNA Website, we ask that you cite that database as well (Cannone et al., 2002).

Run Unix #

Run Web #

Version	MAJ	stacks
0.9995	2012-08-21	stacks	Download	Doc

Stacks is a software pipeline for building loci out of a set of short-read sequenced samples. Stacks was developed for the purpose of building genetic maps from RAD-Tag Illumina sequence data, but can also be readily applied to population studies, and phylogeography.

Remarque Please cite this paper: J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011.

Run Unix #

Run Web #

Version	MAJ	staden
2.0.0b7	2011-02-03	staden	Download	Doc

The Staden Package is a set of tools covering sequence assembly, editing and analysis. Gap4 performs sequence assembly, contig ordering based on read pair data, contig joining based on sequence comparisons, assembly checking, repeat searching, experiment suggestion, read pair analysis and contig editing. Pregap4 provides a graphical user interface to set up the processing required to prepare trace data for assembly or analysis. Trev is a rapid and flexible viewer and editor for ABI, ALF, SCF and ZTR trace files. Prefinish analyses partially completed sequence assemblies and suggests the most efficient set of experiments to help finish the project. Tracediff and hetscan automatically locate mutations by comparing trace data against reference traces. Spin analyses nucleotide sequences to find genes, restriction sites, motifs, etc. It can perform translations, find open reading frames, count codons, etc.

Remarque

Run Unix # http://staden.sourceforge.net/overview.html

Run Web #

Version	MAJ	STAMP
1.1	2014-09-29	STAMP	Download	Doc

Similarity, Tree-building, & Alignment of Motifs and Profiles

Remarque

Run Unix # STAMP

Run Web #

Version	MAJ	STFilter
1.0		STFilter	Download	Doc

STFilter queries PubMed using a list of gene names or a species name, splits the abstracts into sentences, and classifies them according to a relevance criterion. This relevance criterion can be learned automatically from classified sentences.

Remarque

Run Unix #

Run Web #

Version	MAJ	stride
	2005-12-12	stride	Download	Doc

STRIDE = Protein secondary structure assignment from atomic coordinates. STRIDE is a program to recognize secondary structural elements in proteins from their atomic coordinates.

Remarque

Run Unix # stride

Run Web #

Version	MAJ	StringTie
1.3.0	2016-09-07	StringTie	Download	Doc

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).

Remarque

Run Unix # stringtie -h/--help

Run Web #

Version	MAJ	structure
2.3.4	2012-12-21	structure	Download	Doc

The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.

Remarque

Run Unix # structure

Run Web #

Version	MAJ	SUPER-FOCUS
0.26	2016-10-04	SUPER-FOCUS	Download	Doc

SUPER-FOCUS (SUbsystems Profile by databasE Reduction using FOCUS) is an agile homology-based approach that uses a reduced SEED database to report the subsystems present in metagenomic samples and profile their abundances. The tool was tested with over 70 real metagenomes, and the results show that the approach accurately predicts the subsystems present in microbial communities, and that it can be over 1,000 times faster than other tools.

Remarque

Run Unix #

Run Web #

Version	MAJ	surf
1.0	2006-01-31	surf	Download	Doc

SeqUence Repository and Feature detection. Nucleotide sequence production commonly involves several dedicated bioinformatics programs for sequence basecalling, vector detection, etc.

Remarque

Run Unix #

Run Web #

Version	MAJ	SurfG+
1.02	2012-07-13	SurfG+	Download	Doc

SurfG+ is a tool to predict protein localization in Gram-positive bacteria. Current protein localization protocols are not suited to this prediction task as they ignore the potential surface exposure of many membrane-associated proteins. Therefore, we developed a new flow scheme for the processing of protein sequence data, with the particular aim of identifying potentially surface exposed (PSE) proteins from Gram-positive bacteria.

Remarque See Barinov A, Loux V, Hammani A, Nicolas P, Langella P, Ehrlich D, et al. Prediction of surface exposed proteins in Streptococcus pyogenes, with a potential application to other Gram-positive bacteria. Proteomics. 2009 Jan.;9(1):61–73.

Run Unix # Surfg

Run Web #

Version	MAJ	SvcR
1.1		SvcR	Download	Doc

SvcR is an implementation of a clustering algorithm based on finding a separator, in a feature space, between points described in a data space. The data format is an attribute/value table (matrix). A kernel maps the data into the feature space, where they form a single cluster delimited by the radius of a ball and by support vectors. The radius of this ball can then be used in the data space to reconstruct the boundary, which now forms several clusters.

Remarque

Run Unix #

Run Web #

Version	MAJ	swat
		swat	Download	Doc

Remarque

Run Unix #

Run Web #

Version	MAJ	tablet
1.14.10.20	2015-04-07	tablet	Download	Doc

Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.

Remarque

Run Unix # tablet

Run Web #

Version	MAJ	tagdust
1.13	2013-09-13	tagdust	Download	Doc

TagDust is a program to eliminate artifactual reads from next-generation sequencing data sets.

Remarque Lassmann T., et al. (2009) TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics.

Run Unix # tagdust [options] lib.fa read1.fa read2.fa ...

Run Web #

Version	MAJ	Tandem Repeats Finder
4.07b	2013-08-20	Tandem Repeats Finder	Download	Doc

A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.

Remarque

Run Unix # trf

Run Web #

Version	MAJ	T-Coffee
11.00.8cbe486	2015-04-07	T-Coffee	Download	Doc

T-Coffee is a multiple sequence alignment package. Given a set of sequences (Proteins or DNA), T-Coffee generates a multiple sequence alignment. Version 2.00 and higher can mix sequences and structures.

Remarque

Run Unix # t_coffee sequence_file

Run Web #

Version	MAJ	TFM-Pvalue
-	2014-01-14	TFM-Pvalue	Download	Doc

TFM-Pvalue is a software suite providing tools for computing the score threshold associated with a given P-value and the P-value associated with a given score threshold. It uses Position Weight Matrices, such as those available in the Transfac or Jaspar databases.
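To show what "the P-value associated with a given score threshold" means, here is a naive sketch that builds the exact score distribution of a small matrix column by column under an independent background model. It is not the TFM-Pvalue algorithm (see the publication cited in the remark for that); the function name `pwm_pvalue`, the matrix layout and the rounding to 6 decimals are assumptions for the example.

```python
from collections import defaultdict

def pwm_pvalue(pwm, threshold, background):
    """Probability that a random word scores at least `threshold`.

    `pwm` is a list of columns, each a dict mapping a letter to its score.
    `background` maps each letter to its probability; letters are drawn
    independently at every position.
    """
    # dist maps a reachable score to the probability of reaching it.
    dist = {0.0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for letter, letter_prob in background.items():
                # Rounding merges scores that differ only by float noise.
                nxt[round(score + column[letter], 6)] += prob * letter_prob
        dist = nxt
    return sum(prob for score, prob in dist.items() if score >= threshold)

if __name__ == "__main__":
    # Hypothetical 2-column matrix favouring the word "AC".
    toy_pwm = [
        {"A": 1, "C": 0, "G": 0, "T": 0},
        {"A": 0, "C": 1, "G": 0, "T": 0},
    ]
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(pwm_pvalue(toy_pwm, 2, uniform))
```

The number of distinct scores can grow quickly with matrix length and score precision, which is why dedicated algorithms are needed for real matrices.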

Remarque Efficient and accurate P-value computation for Position Weight Matrices. H. Touzet and J.S. Varré. Algorithms for Molecular Biology 2007, 2:15.

Run Unix #

Run Web #

Version	MAJ	TGI
	2005-07-25	TGI	Download	Doc

TGI Clustering tools (TGICL): a software system for fast clustering of large EST datasets. This package automates clustering and assembly of a large EST/mRNA dataset. The clustering is performed by a slightly modified version of NCBI's megablast, and the resulting clusters are then assembled using the CAP3 assembly program. TGICL starts with a large multi-FASTA file (and an optional peer quality values file) and outputs the assembly files as produced by CAP3.

Remarque

Run Unix # tgicl , cap3...

Run Web #

Version	MAJ	TM-align
20160521	2017-02-28	TM-align	Download	Doc

TM-align is an algorithm for sequence-order independent protein structure comparisons. For two protein structures of unknown equivalence, TM-align first generates an optimized residue-to-residue alignment based on structural similarity using dynamic programming iterations. It then returns an optimal superposition of the two structures, as well as the TM-score value, which scales the structural similarity. TM-score takes values in (0,1], where 1 indicates a perfect match between two structures. According to strict statistics of structures in the PDB, scores below 0.2 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 generally have the same fold in SCOP/CATH.
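The TM-score itself is easy to evaluate once a superposition is fixed. The sketch below applies the standard published definition, with d0 = 1.24 * (L - 15)^(1/3) - 1.8, to a list of distances between aligned residue pairs. It does not search for the optimal superposition as TM-align does, and the function name `tm_score` and the restriction to targets of at least 19 residues (so that d0 stays positive) are simplifications assumed for this example.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` holds the distance in angstroms between each aligned residue
    pair; `target_length` is the length of the protein used for
    normalisation. Unaligned residues contribute nothing to the sum.
    """
    if target_length < 19:
        raise ValueError("this sketch requires target_length >= 19")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length

if __name__ == "__main__":
    # Hypothetical alignment: 80 of 100 residues aligned at 2 angstroms.
    print(tm_score([2.0] * 80, 100))
```

Because d0 grows with protein length, the score is less sensitive to size than a raw RMSD, which is what makes the 0.2 and 0.5 thresholds above meaningful across proteins.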

Remarque

Run Unix # TMalign PDB1.pdb PDB2.pdb [Options]

Run Web #

Version	MAJ	TMAP
3.4.1	2013-10-25	TMAP	Download	Doc

TMAP / Torrent Mapping Alignment Program - Alignment software for short and long nucleotide sequences produced by next-generation sequencing technologies.

Remarque

Run Unix #

Run Web #

Version	MAJ	tmhmm
2.0c	2007-11-22	tmhmm	Download	Doc

tmhmm is one of the better methods for predicting transmembrane helices in proteins.

Remarque Run tmhmm ma_sequence.fasta; the result is written to standard output (not very verbose) and to a directory named TMHMM_ followed by the PID of the process that generated it.

Run Unix # tmhmm

Run Web # http://www.cbs.dtu.dk/services/TMHMM/

Version	MAJ	tmmod
	2009-02-23	tmmod	Download	Doc

An Improved Hidden Markov Model for Transmembrane Protein Topology Prediction and Its Applications to Complete Genomes

Remarque

Run Unix # tmmod

Run Web #

Version	MAJ	tophat
2.0.9	2013-07-10	tophat	Download	Doc

TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Remarque

Run Unix # tophat -h

Run Web #

Version	MAJ	tree-puzzle
5.2	2012-08-22	tree-puzzle	Download	Doc

TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. Branch lengths can be calculated with and without the molecular-clock assumption. In addition, TREE-PUZZLE offers likelihood mapping, a method to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number of statistical tests on the data set (chi-square test for homogeneity of base composition, likelihood ratio to test the clock hypothesis, one and two-sided Kishino-Hasegawa test, Shimodaira-Hasegawa test, Expected Likelihood Weights). The models of substitution provided by TREE-PUZZLE are GTR, TN, HKY, F84, SH for nucleotides, Dayhoff, JTT, mtREV24, BLOSUM 62, VT, WAG for amino acids, and F81 for two-state data. Rate heterogeneity is modeled by a discrete Gamma distribution and by allowing invariable sites. The corresponding parameters (except for GTR) can be inferred from the data set.

Remarque

Run Unix # puzzle

Run Web #

Version	MAJ	TribeMCL
mcl-12-068	2012-03-09	TribeMCL	Download	Doc

TribeMCL is a method for clustering proteins into related groups, which are termed 'protein families'. This clustering is achieved by analysing similarity patterns between proteins in a given dataset, and using these patterns to assign proteins into related groups.

Remarque

Run Unix # mcl

Run Web #

Version	MAJ	Trimmomatic
0.32	2014-01-06	Trimmomatic	Download	Doc

Trimmomatic: A flexible read trimming tool for Illumina NGS data. Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

Remarque

Run Unix # trimmomatic

Run Web #

Version	MAJ	Trinity
2.2.0	2016-07-01	Trinity	Download	Doc

RNA-Seq De novo Assembly Using Trinity. Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.

Remarque

Run Unix # Trinity

Run Web #

Version	MAJ	uchime
4.2.40	2013-05-27	uchime	Download	Doc

UCHIME is an algorithm for detecting chimeric sequences.

Remarque

Run Unix # uchime --input query.fasta [--db db.fasta] [--uchimeout results.uchime] [--uchimealns results.alns]

Run Web #

Version	MAJ	uclust
1.2.22q	2012-11-06	uclust	Download	Doc

UCLUST is a high-performance clustering, alignment and search algorithm that is capable of handling millions of sequences.

Remarque

Run Unix # uclust --sort seqs.fasta --output seqs_sorted.fasta

Run Web #

Version	MAJ	usearch
8.0.1517	2015-03-10	usearch	Download	Doc

USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.

Remarque

Run Unix # usearch

Run Web #

Version	MAJ	VarScan
4.2.3	2017-02-16	VarScan	Download	Doc

Variant detection in massively parallel sequencing data.

Remarque

Run Unix # varscan [COMMAND] [OPTIONS]

Run Web #

Version	MAJ	VAST
		VAST	Download	Doc

Program for comparing and aligning the 3D structures of proteins. VAST is based on a two-step procedure. In the first step, a simplified description of the proteins is used, in which secondary structure elements are represented by vectors. The goal of this first step is to find the subset of vectors that superimpose best between the two structures. The significance of the result is assessed by computing the probability of observing this superimposition purely by chance. In the second step, the program returns to an atomic description of the 3D structures, describing the polypeptide chain by the positions of the CA (alpha carbon) atoms of each residue. The goal of this second step is to establish a one-to-one correspondence (alignment) between the CA atoms that play the same role in the two structures. The aim is to obtain the alignment with the most CA pairs and the lowest RMSD (root mean square deviation). To do so, the algorithm has to answer questions such as: which alignment is better, one with 100 CA pairs and an RMSD of 3 Å, or one with 60 CA pairs and an RMSD of 2 Å? This problem is resolved by choosing the alignment that has the lowest probability of being generated by chance.
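The RMSD used in the second step is simply the square root of the mean squared distance between paired CA atoms. A minimal sketch, assuming the two structures are already superimposed and the CA atoms already paired; the function name `rmsd` and the coordinates are hypothetical, and no part of VAST's significance calculation is reproduced.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z).

    The structures are assumed to be superimposed already; no optimal
    rotation or translation is computed here.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures need the same number of atoms")
    if not coords_a:
        raise ValueError("at least one atom pair is required")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

if __name__ == "__main__":
    # Hypothetical CA positions, in angstroms.
    first = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
    second = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
    print(rmsd(first, second))
```

RMSD alone cannot rank a 100-pair alignment at 3 Å against a 60-pair alignment at 2 Å, which is why the description above falls back on the probability of obtaining each alignment by chance.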

Remarque

Run Unix #

Run Web #

Version	MAJ	vcake
1.0	2008-07-30	vcake	Download	Doc

VCAKE is a genetic sequence assembler capable of assembling millions of small nucleotide reads even in the presence of sequencing error. This software is currently geared towards de novo assembly of Illumina's Solexa Sequencing data.

Remarque

Run Unix # perl -S vcake.pl

Run Web #

Version	MAJ	Vcflib
	2017-03-10	Vcflib	Download	Doc

A C++ library for parsing and manipulating VCF files.

Remarque

Run Unix #

Run Web #

Version	MAJ	vcftools
1.12	2015-07-13	vcftools	Download	Doc

vcftools is a suite of functions for use on genetic variation data in the form of VCF and BCF files. The tools provided will be used mainly to summarize data, run calculations on data, filter out data, and convert data into other useful file formats.

Remarque

Run Unix # vcftools [ --vcf FILE | --gzvcf FILE | --bcf FILE] [ --out OUTPUT PREFIX ] [ FILTERING OPTIONS ] [ OUTPUT OPTIONS ]

Run Web #

Version	MAJ	velvet
1.2.07	2013-08-07	velvet	Download	Doc

Sequence assembler for very short reads. Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Remarque

Run Unix # velveth # velvetg

Run Web #

Version	MAJ	Vienna
1.8.4	2010-12-02	Vienna	Download	Doc

The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures. RNA secondary structure prediction through energy minimization is the most used function in the package. We provide three kinds of dynamic programming algorithms for structure prediction: the minimum free energy algorithm of (Zuker & Stiegler 1981) which yields a single optimal structure, the partition function algorithm of (McCaskill 1990) which calculates base pair probabilities in the thermodynamic ensemble, and the suboptimal folding algorithm of (Wuchty et al. 1999) which generates all suboptimal structures within a given energy range of the optimal energy. For secondary structure comparison, the package contains several measures of distance (dissimilarities) using either string alignment or tree-editing (Shapiro & Zhang 1990). Finally, we provide an algorithm to design sequences with a predefined structure (inverse folding).

Remarque

Run Unix #

Run Web #

Version	MAJ	vmatch
2.0	2007-10-29	vmatch	Download	Doc

Vmatch replaces Reputer. It looks for all possible repeats in genomes, with the option to specify the kind of repeats to look for, such as identity percentage, minimal length, etc. It can also be used to mask repeats in sequences, to analyze repeat families, etc.

Remarque

Run Unix #

Run Web #

Version	MAJ	weeder
1.4.2	2009-12-07	weeder	Download	Doc

Searches for new TFBSs in a set of FASTA sequences, for several motif sizes and limits on the number of allowed mutations. Only outputs motifs that pass the statistical filter, unlike MEME, which returns as many motifs as specified in the parameters. By default the genome statistics are based on a 1000 bp promoter, but statistics based on the whole intergenic sequence can be used instead.

Remarque Pavesi G, Mereghetti P, Mauri G, Pesole G. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 2004 32:W199-W203

Run Unix # weederlauncher.out inputfilename speciescode analysistype

Run Web #

Version	MAJ	weederH
1.4.2	2009-12-07	weederH	Download	Doc

Searches for TFBSs and ECRs in homologous sequences. No alignment is needed as input, and no PWM is required. Measures relative conservation between sequences by searching for conserved oligos and scoring global similarity between two homologous sequences. Can also search for distal enhancers. Reportedly works on unannotated promoters (no known TSS).

Remarque Pavesi, G., Zambelli, F., Pesole, G. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences. BMC Bioinformatics 2007, 8:46

Run Unix # weederH.out -f inputfilename -O speciescode

Run Web #

Version	MAJ	WolfPsort
0.2	2007-04-02	WolfPsort	Download	Doc

WoLF PSORT predicts the subcellular localization sites of proteins based on their amino acid sequences.

Remarque

Run Unix # runWolfPsortSummary

Run Web #

Version	MAJ	wombat
23/02/11	2011-03-10	wombat	Download	Doc

WOMBAT is a program to facilitate analyses fitting a linear, mixed model via restricted maximum likelihood (REML). It is assumed that traits analysed are continuous and have a multivariate normal distribution.

Remarque http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064953/ ~ http://didgeridoo.une.edu.au/km/download.php?file=hangzhou.pdf

Run Unix # wombat

Run Web #

Version	MAJ	wu-blast
2.0	2005-02-14	wu-blast	Download	Doc

Washington University BLAST (WU BLAST) version 2.0 is a powerful software package for gene and protein identification, using sensitive, selective and rapid similarity searches of protein and nucleotide sequence databases.

Remarque

Run Unix # wu-blastall

Run Web #

Version	MAJ	xdigitise
3.5.10	2002-04-19	xdigitise	Download	Doc

Evaluation of hybridization experiments.

Remarque

Run Unix # xdigitise

Run Web #

Version	MAJ	Xgenovo
-	2016-03-17	Xgenovo	Download	Doc

Metagenomes present assembly challenges, because multiple genomes must be assembled from mixed reads of multiple species. An assembler designed for single genomes does not adapt well to this case. Genovo is a de novo assembler for metagenomes based on a generative probabilistic model. Genovo assembles all reads without discarding any in a preprocessing step, and is therefore able to extract more information from metagenomic data and, in principle, generate better assembly results. Paired-end sequencing is now widely used, yet Genovo was designed for 454 single-end reads. Xgenovo extends Genovo by incorporating paired-end information, so that it generates higher quality assemblies from paired-end reads.

Remarque

Run Unix # assemble - finalize

Run Web #

Version	MAJ	xgrail
1.3c	2002-09-20	xgrail	Download	Doc

GRAIL is a suite of tools designed to provide analysis and putative annotation of DNA sequences both interactively and through the use of automated computation.

Remarque

Run Unix # xgrail

Run Web # http://genome.jouy.inra.fr/

Version	MAJ	xplor-nih
2.30	2012-02-08	xplor-nih	Download	Doc

X-PLOR is a program system for computational structural biology. X-PLOR stands for exploration of conformational space of macromolecules restrained to regions allowed by combinations of empirical energy functions and experimental data. But it also stands for exploration of modern concepts of structured programming in macromolecular simulation.

Remarque

Run Unix # xplor

Run Web #

Version	MAJ	yass
1.14	2010-03-16	yass	Download	Doc

YASS is a tool for finding local similarities in DNA sequences.

Remarque

Run Unix # yass

Run Web #
