Tool reference, sorted alphabetically


abyss

Version: 1.5.2 | Updated: 2014-11-18
ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.

Remark
Run Unix # Usage: ABYSS [OPTION]... FILE...
Run Web #

acnuc

Version: none | Updated: 2003-08-09
ACNUC makes it possible to select sequences by many criteria from these three databases, to translate protein-coding genes into protein, and to extract selected sequences into user files. ACNUC is very efficient in providing direct access to coding regions (e.g. protein coding regions, tRNA or rRNA coding regions) of DNA fragments present in GenBank.

Remark
Run Unix # acnuc or, for the X11 version, xacnuc
Run Web #

agmial

Version: (not listed) | Updated: 2004-10-12
Agmial is a microbial genome annotation pipeline made up of two independent modules: the first handles protein sequences, the second nucleic acid sequences. Agmial rests on the principle that the human expert must be at the centre of the annotation process. To help annotators with this complex and time-consuming task, the system is designed to automate the annotation process as far as possible and to provide user-friendly interfaces. It implements an annotation strategy. The system can work on unfinished (draft) sequences and supports collaborative annotation by teams of annotators. It is based on computing standards (web services, relational database management systems, Java, ...) and bioinformatics standards. The system is distributed under the GPL licence. Agmial is currently used by several INRA laboratories to annotate or re-annotate genomes of agri-food interest.

Remark
Run Unix #
Run Web # http://genome.jouy.inra.fr/public-agmial

align

Version: 2.0u | Updated: 2003-11-28
align and align0 compute a global alignment of two sequences.

Remark
Run Unix # align or align0
Run Web #
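
To illustrate what a global alignment of two sequences computes, here is a minimal Python sketch of the Needleman-Wunsch score with a linear gap penalty. The function name and the scoring values (match, mismatch, gap) are assumptions made for this example; they are not necessarily the algorithm or the parameters that align and align0 use.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Score of the best end-to-end (global) alignment of a and b.

    Illustrative scoring scheme only (assumed values, linear gap cost).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Only the score is returned here; recovering the alignment itself would require keeping the full matrix and tracing back from the last cell.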

ALLPATHS-LG

Version: (not listed) | Updated: 2012-03-13
ALLPATHS-LG is a de Bruijn graph-based de novo assembler for large (and small) genomes. ALLPATHS-LG is being developed by scientists at the Broad Institute.

Remark
Run Unix #
Run Web #
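
For readers unfamiliar with the data structure named above, the following Python sketch builds a small de Bruijn graph from reads: each k-mer contributes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name and the toy reads are assumptions for illustration; this is not ALLPATHS-LG code and ignores error correction, reverse complements and everything else a real assembler does.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it."""
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # edge: prefix -> suffix
    return {node: sorted(successors) for node, successors in graph.items()}


if __name__ == "__main__":
    # Two overlapping toy reads; walking the edges spells out ACGTA.
    print(de_bruijn_graph(["ACGT", "CGTA"], k=3))
```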

amos

Version: 3.1.0 | Updated: 2013-08-12
AMOS: A Modular Open-Source Assembler

Remark
Run Unix #
Run Web #

AnovArray

Version: 1.1 | Updated: 2003-10-20
AnovArray quantifies biological factors and technical biases, and identifies genes that are differentially expressed between several experimental conditions (two or more), for transcriptomic experiments from macroarrays and microarrays in the setting of a balanced factorial experimental design and a full model. The package is developed in SAS (statistical software) and therefore benefits from all of SAS's statistical procedures. The statistical methods in the package are analysis of variance (ANOVA) and FDR-type (False Discovery Rate) multiple testing.

Remark
Run Unix # Used within SAS
Run Web #
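
As an illustration of FDR-type multiple testing, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values. The entry above does not say which FDR procedure AnovArray uses, so Benjamini-Hochberg is an assumption chosen because it is the most common one; the function name and the example p-values are hypothetical, and the code is Python rather than SAS.

```python
def benjamini_hochberg(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Return, for each p-value, whether its hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))
```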

apollo

Version: 1.11.8 | Updated: 2013-08-11
Apollo is a genomic annotation viewer and editor. There are currently two branches of Apollo, one primarily used for genome browsing and maintained at Ensembl, and the other primarily used for genome annotation and maintained at the Berkeley Drosophila Genome Center. The latter is part of the GMOD project.

Remark
Run Unix # apollo
Run Web #

arachne

Version: 3.1 | Updated: 2008-07-29
Arachne is a tool for assembling genome sequences from whole genome shotgun reads, mostly in forward-reverse pairs obtained by sequencing clone ends.

Remark
Run Unix #
Run Web #

arb

Version: none | Updated: 2003-08-22
The ARB software is a graphically oriented package comprising various tools for sequence database handling and data analysis. A central database of processed (aligned) sequences and any type of additional data linked to the respective sequence entries is structured according to phylogeny or other user defined criteria.

Remark
Run Unix # arb
Run Web #

ART

Version: ChocolateCherryCake | Updated: 2015-04-30
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles. ART supports simulation of single-end and paired-end/mate-pair reads of three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can be used to test or benchmark a variety of methods or tools for next-generation sequencing data analysis, including read alignment, de novo assembly, and SNP and structural variation discovery. ART was used as a primary tool for the simulation study of the 1000 Genomes Project. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN format. ART can also generate alignments in the SAM alignment or UCSC BED file format.

Remark Citation: Weichun Huang, Leping Li, Jason R Myers, and Gabor T Marth. ART: a next-generation sequencing read simulator, Bioinformatics (2012) 28 (4): 593-594
Run Unix # README files in http://genome.jouy.inra.fr/doc/genome/NGS/ART
Run Web #

artemis

Version: 15.0 | Updated: 2013-08-07
Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation.

Remark
Run Unix # art
Run Web #

Artemis Comparison Tool

Version: - | Updated: 2015-07-15
ACT is a free tool for displaying pairwise comparisons between two or more DNA sequences. It can be used to identify and analyse regions of similarity and difference between genomes and to explore conservation of synteny, in the context of the entire sequences and their annotation.

Remark
Run Unix #
Run Web #

asium

Version: 2.21 | Updated: (not listed)
Asium builds conceptual hierarchies (ontologies) from parsed text. It is paired with the LP2LP software, which converts Link Parser output into Asium input, and with a tool that converts its output to RDF.

Remark
Run Unix #
Run Web #

augustus

Version: 2.7 | Updated: 2013-12-12
AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It can be run on this web server, on a new web server for larger input files or be downloaded and run locally. It is open source so you can compile it for your computing platform. You can now run AUGUSTUS on the German MediGRID. This enables you to submit larger sequence files and to use protein homology information in the prediction. The MediGRID requires an instant easy registration by email for first-time users.

Remark
Run Unix # augustus [parameters] --species=SPECIES queryfilename
Run Web #

autodock

Version: 4.2.6 | Updated: 2015-03-09
AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.

Remark
Run Unix # autodock4 and autogrid4
Run Web #

autodock_vina

Version: 1.1.2 | Updated: 2015-05-12
AutoDock Vina is a new program for drug discovery, molecular docking and virtual screening, offering multi-core capability, high performance and enhanced accuracy and ease of use.

Remark O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry (in press)
Run Unix # vina --help
Run Web #

base

Version: 1.2.12 | Updated: 2004-08-12
BioArray Software Environment (BASE) is a database for managing the large volume of data generated by microarray analyses. BASE manages biological information, raw data and images. BASE also provides tools for data normalization, visualization and analysis.

Remark
Run Unix #
Run Web # http://genome.jouy.inra.fr/basejouy

bcftools

Version: 1.2 | Updated: 2015-04-15
bcftools (tools for variant calling and manipulating VCFs and BCFs)

Remark
Run Unix # bcftools
Run Web #

BCM trace viewer

Version: 1.5 | Updated: (not listed)
A Java application/applet to display .scf traces and phred quality values.

Remark
Run Unix # bcm-trace-view -s { -q }
Run Web #

bedtools

Version: 2.16.2 | Updated: 2012-10-09
The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together. The following are examples of common questions that one can address with BEDTools.

Remark Please cite the following article if you use BEDTools in your research: Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.
Run Unix #
Run Web #

Beluga

Version: (not listed) | Updated: (not listed)
The core approach is based on techniques from machine learning (classification) and natural language processing, and also on a sociological method called GST (Graphe SocioTechnique, sociotechnical graph), in order to build indicators of how innovation evolves from the terminology used over time.

Remark
Run Unix #
Run Web #

bfast

Version: 0.7.0 | Updated: 2013-08-12
BFAST: Blat-like Fast Accurate Search Tool. BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:
- Speed: enables billions of short reads to be mapped quickly.
- Accuracy: a priori probabilities for mapping reads with a defined set of variants.
- An easy way to measurably tune accuracy at the expense of speed.

Remark
Run Unix # bfast [options]
Run Web #

bioprospector

Version: 2004 | Updated: 2014-01-01
Program that searches DNA sequences for exceptional motifs made of one or two boxes (Gibbs sampler). Background sequences can be supplied. Input sequences must be shorter than 32765 nt, in FASTA format with each sequence on a single line (and a header of the form >sequence1 genename). It can search specifically for palindromes.

Remark
Run Unix # BioProspector
Run Web #
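
The input constraints stated above (one header line followed by one sequence line, each sequence shorter than 32765 nt) can be checked before a run. The Python sketch below is a hypothetical helper, not part of BioProspector; the function name and the wording of the messages are assumptions.

```python
def check_bioprospector_input(text: str, max_length: int = 32765) -> list[str]:
    """Return a list of problems; an empty list means the input looks acceptable."""
    problems = []
    lines = [line for line in text.splitlines() if line.strip()]
    expect_header = True
    for number, line in enumerate(lines, start=1):
        if expect_header:
            if not line.startswith(">"):
                # Typically a sequence wrapped over several lines.
                problems.append(f"non-blank line {number}: expected a '>' header")
                continue
            expect_header = False
        else:
            if line.startswith(">"):
                problems.append(f"non-blank line {number}: previous header has no sequence")
                continue
            if len(line) >= max_length:
                problems.append(f"non-blank line {number}: sequence is not shorter than {max_length} nt")
            expect_header = True
    if not expect_header:
        problems.append("last header has no sequence")
    return problems


if __name__ == "__main__":
    print(check_bioprospector_input(">sequence1 geneA\nACGTACGT\n"))
```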

bismark

Version: 0.14.3 | Updated: 2015-06-05
Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.

Remark
Run Unix # bismark [options] <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
Run Web #

blast

Version: 2.2.26 | Updated: 2012-03-07

Remark
Run Unix # blastall
Run Web # https://migale.jouy.inra.fr/?q=blast

blast+

Version: 2.2.31 | Updated: 2015-08-24
The Basic Local Alignment Search Tool (BLAST) is the most widely used sequence similarity tool. There are versions of BLAST that compare protein queries to protein databases, nucleotide queries to nucleotide databases, as well as versions that translate nucleotide queries or databases in all six frames and compare to protein databases or queries. PSI-BLAST produces a position-specific-scoring-matrix (PSSM) starting with a protein query, and then uses that PSSM to perform further searches. It is also possible to compare a protein or nucleotide query to a database of PSSM’s. The NCBI supports a BLAST web page at blast.ncbi.nlm.nih.gov as well as a network service. The NCBI also distributes stand-alone BLAST applications for users who wish to run BLAST on their own machines or with their own databases. This document describes the stand-alone BLAST applications and will concentrate on the latest generation of such applications included in the BLAST+ package.

Remark
Run Unix # /usr/local/genome/ncbi-blast-2.2.31+/bin/
Run Web #

blat

Version: 34 | Updated: 2008-01-11
BLAT is a DNA/Protein Sequence Analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 22 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. In practice DNA BLAT works well on primates, and protein blat on land vertebrates.

Remark
Run Unix # blat
Run Web #

BMGE

Version: 1.1 | Updated: 2012-12-19
BMGE (Block Mapping and Gathering with Entropy) is a program that selects regions in a multiple sequence alignment that are suited for phylogenetic inference. BMGE selects characters that are biologically relevant, thanks to the use of standard similarity matrices such as PAM or BLOSUM. Moreover, BMGE provides other character- or sequence-removal operations, such as stationary-based character trimming (which provides a subset of compositionally homogeneous characters) or removal of sequences containing too large a proportion of gaps. Finally, BMGE can simply be used to perform standard conversion operations among DNA-, codon-, RY- and amino acid-coding sequences.

Remark
Run Unix # BMGE or BMGE -?
Run Web #

bowtie

Version: 1.1.2 | Updated: 2016-07-24
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).

Remark
Run Unix # bowtie [options]* <ebwt> {-1 <m1> -2 <m2> | --12 <r> | <s>} [<hit>]
Run Web #
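
The Burrows-Wheeler index mentioned in the description is built on the Burrows-Wheeler transform (BWT) of the reference. The Python sketch below computes the BWT naively by sorting all rotations, which is only practical for toy inputs; it is an illustration of the transform, not Bowtie's index construction. The function name and the "$" terminator are assumptions of this example.

```python
def burrows_wheeler_transform(text: str, terminator: str = "$") -> str:
    """Naive BWT: last column of the sorted rotations of text + terminator.

    The terminator must not occur in text and must sort before every
    character of text (true for "$" with upper- or lower-case letters).
    """
    if terminator in text:
        raise ValueError("terminator must not occur in the text")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(burrows_wheeler_transform("banana"))  # annb$aa
```

The transform tends to group identical characters together and can be searched without decompressing it, which is what makes a compact index possible; real tools build it from a suffix array rather than from explicit rotations.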

bowtie2

Version: 2.2.5 | Updated: 2015-04-07
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.

Remark
Run Unix # bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} [-S <sam>]
Run Web #

breakdancer

Version: 1.4.5 | Updated: 2015-03-06
BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.

Remark
Run Unix # Usage: breakdancer-max
Run Web #

bsmap

Version: 2.90 | Updated: 2015-03-17
BSMAP is a short-read mapping program for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, and hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.

Remark Citation: Xi Y, Li W: BSMAP: whole genome Bisulfite Sequence MAPping program. BMC Bioinformatics (2009) 10:232.
Run Unix # bsmap
Run Web #
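
The matching rule described above (a T in the read may correspond to either a C or a T in the reference) can be sketched in a few lines of Python. Both functions are hypothetical illustrations, not BSMAP code; they handle only the forward strand and ungapped, equal-length comparisons.

```python
def bisulfite_convert(sequence: str, methylated_positions: frozenset = frozenset()) -> str:
    """Simulate bisulfite treatment: unmethylated C is read as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and position not in methylated_positions else base
        for position, base in enumerate(sequence)
    )


def bisulfite_match(read: str, reference_window: str) -> bool:
    """True if the read matches the reference, letting a read T pair with a reference C or T."""
    if len(read) != len(reference_window):
        return False
    return all(
        r == g or (r == "T" and g == "C")
        for r, g in zip(read, reference_window)
    )


if __name__ == "__main__":
    reference = "ACGTC"
    read = bisulfite_convert(reference, frozenset({1}))  # C at index 1 is methylated
    print(read, bisulfite_match(read, reference))
```

Note the asymmetry: a C in the read against a T in the reference is still a mismatch, because bisulfite treatment never turns T into C.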

buster

Version: 2.10.3 <2016-12-07> | Updated: 2016-12-08
BUSTER structure refinement package. Includes the refine program for running BUSTER refinement and loads of useful utilities.

Remark How to cite use of BUSTER: https://www.globalphasing.com/buster/wiki/index.cgi?BusterCite
Run Unix #
Run Web #

bwa

Version: 0.7.12 | Updated: 2015-04-07
BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence, except for disallowing gaps close to the end of the query. It can also be tuned to find a fraction of longer gaps at the cost of speed and of more false alignments.

Remark
Run Unix # bwa [options]
Run Web #

CaliFlopp

Version: 3.0 | Updated: 2010-08-03
CaliFloPP is a software that calculates flows of particles between pairs of polygons, when given a so-called individual dispersal function. The individual dispersal function describes the particle dispersion between pairs of points, and CaliFloPP deduces the total flows between pairs of polygons.

Remark
Run Unix # califlopp -i polygons-filename [-p parameters-filename] [-r result-filename]
Run Web #

canu

Version: 1.3 | Updated: 2016-10-18
Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION).

Remark Citation: Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv. (2016).
Run Unix # canu
Run Web #

cap3

Version: 3.0 | Updated: 2014-05-06
Similar to phrap, CAP3 takes individual sequences and assembles them into sequences.

Remark
Run Unix # cap3
Run Web #

carthagene

Version: 1.2 | Updated: 2010-10-15
CarthaGène is a genetic/radiation hybrid mapping software. CarthaGène looks for multiple-population maximum likelihood consensus maps using a fast EM algorithm for maximum likelihood estimation and powerful ordering algorithms.

Remark
Run Unix # carthagene
Run Web #

CATCh

Version: v1 | Updated: 2015-03-11
CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies

Remark If you are going to use CATCh, please cite it with the included software (Mothur, WEKA, RDP MultiClassifier 1.1 and DECIPHER):
- Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P. 2014. CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Under review.
- Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology 75:7537-41.
- Hall M, National H, Frank E, Holmes G, Pfahringer B, Reutemann P, et al. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations 11:10-18.
- Wang Q, Garrity GM, Tiedje JM, Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 09/2007; 73(16):5261-7.
- ES Wright et al. (2012). DECIPHER, A Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Applied and Environmental Microbiology, doi:10.1128/AEM.06516-11.
Run Unix # CATCh.run
Run Web #

ccp4

Version: 6.3.0 | Updated: 2012-11-27
CCP4 exists to produce and support a world-leading, integrated suite of programs that allows researchers to determine macromolecular structures by X-ray crystallography, and other biophysical techniques. CCP4 aims to develop and support the development of cutting edge approaches to experimental determination and analysis of protein structure, and integrate these approaches into the suite. CCP4 is a community based resource that supports the widest possible researcher community, embracing academic, not for profit, and for profit research. CCP4 aims to play a key role in the education and training of scientists in experimental structural biology. It encourages the wide dissemination of new ideas, techniques and practice.

Remark
Run Unix # ccp4i
Run Web #

cd-hit

Version: 4.6.1 | Updated: 2013-08-12
CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a fasta format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output.

Remark Usage example: cd-hit -n 5 -i /db/fasta/nr90/nr90.fsa -o nr80 -M 2048 -c 0.8 -u clstr.lastweek
Run Unix # cd-hit [Options]
Run Web #

cd-hit-454

Version: - | Updated: 2013-08-05
454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cdhit-454 is a fast program to identify exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cdhit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.

Remark
Run Unix # cd-hit-454
Run Web # 4.6.1

Celera Assembler (wgs)

Version: 5.4 | Updated: 2009-10-29
Celera Assembler is scientific software for DNA research. It can reconstruct long sequences of genomic DNA from the fragmentary data produced by whole-genome shotgun sequencing. The Celera Assembler is mature, efficient, open-source software written mostly in C for unix operating systems.

Remark This whole-genome shotgun (WGS) assembler software suite, also known as Celera Assembler, implements sophisticated algorithms for the reconstruction of genomic DNA sequence from data produced by a WGS sequencing experiment.
Run Unix #
Run Web #

censor

Version: 4.2.10 | Updated: 2008-07-02
CENSOR is a software tool which screens query sequences against a reference collection of repeats and "censors" (masks) homologous portions with masking symbols, as well as generating a report classifying all found repeats.

Remark
Run Unix # censor
Run Web #

cgview

Version: - | Updated: 2011-05-27
CGView is a Java package for generating high quality, zoomable maps of circular genomes. Its primary purpose is to serve as a component of sequence annotation pipelines, as a means of generating visual output suitable for the web. Feature information and rendering options are supplied to the program using an XML file, a tab delimited file, or an NCBI ptt file. CGView converts the input into a graphical map (PNG, JPG, or Scalable Vector Graphics format), complete with labels, a title, legends, and footnotes. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any web browser, allowing rapid genome browsing, and facilitating data sharing. The feature labels in maps can be hyperlinked to external resources, allowing CGView maps to be integrated with existing web site content or databases. For examples of the various output types, see the CGView gallery.

Remark
Run Unix # cgview
Run Web #

circos

Version: 0.64 | Updated: 2013-01-20
Circos is a software package for visualizing data and information. It visualizes data in a circular layout — this makes Circos ideal for exploring relationships between objects or positions. There are other reasons why a circular layout is advantageous, not the least being the fact that it is attractive.

Remark
Run Unix # circos
Run Web #

class2g

Version: 1.0 | Updated: 2006-04-04
Class2G classifies genes into two groups using a mixture model. Its main features are that each gene's assignment comes with a probability, and that the analysis of a macroarray does not depend on a reference. Class2G is integrated into the BASE system (BioArray Software Environment) through a Perl plug-in, and is developed in the R statistical environment. BASE provides a user-friendly web interface and a single environment for storing and analysing data. Class2G was used to detect present and absent genes of E. faecalis in the analysis of about thirty macroarrays (P. Serror - INRA Jouy-en-Josas - UBLO).

Remark
Run Unix #
Run Web # http://genome.jouy.inra.fr/basejouy

CLC Sequence Viewer

Version: 6.4 | Updated: 2010-09-12
A Sequence Viewer for basic bioinformatics. CLC Sequence Viewer creates a software environment enabling users to make a large number of bioinformatics analyses, combined with smooth data management, and excellent graphical viewing and output options.

Remark
Run Unix # clcseqview6
Run Web #

clustal-omega

Version: 1.1.0 | Updated: 2012-07-17
Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It will also make use of multiple processors, where present. In addition, the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks.

Remark Citing Clustal: Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7.
Run Unix # clustalo --help
Run Web #

clustalx

Version: 2.1 | Updated: 2013-12-29
Multiple sequence alignment program. It provides an integrated environment for performing multiple sequence and profile alignments and analysing the results.

Remark
Run Unix # clustalx (graphical mode) or clustalw2 (command-line mode)
Run Web # http://www.ebi.ac.uk/Tools/msa/clustalw2/

cluster-3.0

Version: 3.0 | Updated: 2013-05-24
The open source clustering software available here implements the most commonly used clustering methods for gene expression data analysis. The clustering methods can be used in several ways. Cluster 3.0 provides a graphical user interface to access the clustering routines. It is available for Windows, Mac OS X, and Linux/Unix. Python users can access the clustering routines by using Pycluster, which is an extension module to Python. People who want to make use of the clustering algorithms in their own C, C++, or Fortran programs can download the source code of the C Clustering Library.

Remark
Run Unix # cluster
Run Web #

CNVnator

Version: 0.3 | Updated: 2015-02-13
CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.

Remark
Run Unix # cnvnator
Run Web #

COLONY

Version: 2.0.6.3 | Updated: 2017-05-02
COLONY is a Fortran program written by Jinliang Wang. It implements a maximum likelihood method to assign sibship and parentage jointly, using individual multilocus genotypes at a number of codominant or dominant marker loci.

Remark
Run Unix #
Run Web #

concaterpillar

Version: 1.5 | Updated: 2013-01-15
Concaterpillar is a hierarchical likelihood-ratio test for phylogenetic congruence.

Remark If you use Concaterpillar for a publication please cite: Leigh JW, Susko E, Baumgartner M, Roger AJ. Testing congruence in phylogenomic analysis. Syst Biol. 2008 Feb; 57(1): 104-15.
Run Unix # concaterpillar.py
Run Web #

consed

Version: 22.0 | Updated: 2014-04-30
Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap. Finishing capabilities include allowing the user to pick primers and templates, suggesting additional sequencing reactions to perform, and facilitating checking the accuracy of the assembly using digest and forward/reverse pair information.

Remark See also autofinishs (http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11282977s)
Run Unix # consed
Run Web #

consel

Version: 0.2 | Updated: 2012-08-23
CONSEL is a program package consisting of small programs written in C. It calculates the probability value (i.e., p-value) to assess the confidence in the selection problem. Although CONSEL is applicable to any selection problem, it is mainly designed for phylogenetic tree selection. CONSEL does not estimate the phylogenetic tree by itself; instead it reads the output of other phylogenetic packages, such as Molphy, PAML, PAUP*, TREE-PUZZLE, and PhyML. CONSEL calculates the p-value using several testing procedures: the bootstrap probability, the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, and the weighted Shimodaira-Hasegawa test. In addition to these conventional tests, CONSEL calculates the p-value based on the approximately unbiased test using the multi-scale bootstrap technique. This newly developed method gives less biased results than the conventional methods.

Remark
Run Unix #
Run Web #

coot

Version: 0.7 | Updated: 2014-05-20
Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data. Coot displays maps and models and allows model manipulations such as idealization, real space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers, Ramachandran plots, skeletonization, non-crystallographic symmetry and more.

Remark Citing Coot and Friends: if you have found this software to be useful, you are requested (if appropriate) to cite: "Features and Development of Coot", P Emsley, B Lohkamp, W Scott, and K Cowtan, Acta Crystallographica Section D: Biological Crystallography (2010) 66: 486-501.
Run Unix # coot
Run Web #

CopyRighter

Version: 0.46 | Updated: 2015-12-21
Parses microbial profiles and, because gene copy number (GCN) estimates are pre-computed for all taxa in the reference taxonomy, rapidly corrects GCN bias. The CopyRighter bioinformatic tool permits rapid correction of GCN in microbial surveys, resulting in improved estimates of microbial abundance and of alpha and beta diversity.

Remark
Run Unix # copyrighter -i [optional arguments]
Run Web #

corona

Version: 4.2.2 | Updated: 2009-09-10
The SOLiD System Analysis Pipeline Tool (Corona Lite) is an off-instrument SOLiD data analysis software package. It supports functionality for mapping color space reads to large or small genomes, pairing for mate-pair runs, SNP calling and generating consensus sequences.

Remark
Run Unix #
Run Web #

count_base

Version: cc 30/06/2004 | Updated: 2004-06-16
Program that counts the A, T, G and C bases in a sequence.

Remark
Run Unix # count_base.sh
Run Web #
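
The same count can be sketched in a few lines of Python. This is an illustrative equivalent, not the count_base.sh script itself; the function name and the choice to ignore case and non-ATGC symbols are assumptions.

```python
from collections import Counter


def count_bases(sequence: str) -> dict[str, int]:
    """Count A, T, G and C in a nucleotide sequence, ignoring case."""
    counts = Counter(sequence.upper())
    # Report only the four canonical bases; other symbols (N, gaps) are skipped.
    return {base: counts.get(base, 0) for base in "ATGC"}


if __name__ == "__main__":
    print(count_bases("atgcGGNta"))  # {'A': 2, 'T': 2, 'G': 3, 'C': 1}
```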

count_codon

Version: none | Updated: 2004-08-01

Remark
Run Unix # count_codon.pl
Run Web #

cross_match

Version: 0.990329 | Updated: 2002-11-06
Cross_Match uses the same algorithm as Swat but also allows the comparison of a pair of sequences to be constrained to bands of the Smith-Waterman matrix that surround one or more matching words in the sequences. This substantially increases speed for large-scale nucleotide sequence comparisons without compromising sensitivity.

Remark
Run Unix # cross_match
Run Web #

cufflinks

Version: 2.2.0 | Updated: 2014-05-06
Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Remark
Run Unix # cufflinks [options]*
Run Web #

cutadapt

Version: 1.7.1 | Updated: 2015-03-11
cutadapt is used to remove adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.

Remark
Run Unix # cutadapt [options] []
Run Web #
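
To make the adapter-removal idea concrete, here is a minimal Python sketch of 3' adapter trimming using exact matching only. It removes the adapter and everything after it, and also a partial adapter at the very end of the read. The function name, the min_overlap parameter and the example sequences are assumptions for illustration; cutadapt itself tolerates mismatches and has many more options.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter (exact match) and everything that follows it."""
    # Case 1: the full adapter occurs somewhere in the read.
    position = read.find(adapter)
    if position != -1:
        return read[:position]
    # Case 2: the read ends with a prefix of the adapter (the read ran out
    # before the adapter did). Try the longest prefix first.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    # No adapter found: return the read unchanged.
    return read


if __name__ == "__main__":
    adapter = "TGGAATTCTCGG"
    print(trim_3prime_adapter("ACGTACGT" + adapter, adapter))  # ACGTACGT
    print(trim_3prime_adapter("ACGTACGTTGGAA", adapter))       # ACGTACGT
```

The minimum overlap guards against trimming one or two bases that match the start of the adapter by chance.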

cytoscape

Version: 2.7.0 | Updated: 2010-05-07
Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data. Although Cytoscape was originally designed for biological research, now it is a general platform for complex network analysis and visualization. Cytoscape core distribution provides a basic set of features for data integration and visualization.

Remarque
Run Unix # cytoscape Run Web #

VersionMAJ

dadi

1.7 2016-07-18 Download Doc
Diffusion Approximation for Demographic Inference (∂a∂i). ∂a∂i implements methods for demographic history and selection inference from genetic data, based on diffusion approximations to the allele frequency spectrum. One of ∂a∂i's main benefits is speed: fitting a two-population model typically takes around 10 minutes, and run time is independent of the number of SNPs in your data set. ∂a∂i is also flexible, handling up to three simultaneous populations, with arbitrary timecourses for population size and migration, plus the possibility of admixture and population-specific selection.

Remarque If you use ∂a∂i in your research, please cite RN Gutenkunst, RD Hernandez, SH Williamson, CD Bustamante "Inferring the joint demographic history of multiple populations from multidimensional SNP data" PLoS Genetics 5:e1000695 (2009).
Run Unix # Run Web #

VersionMAJ

debarcer

0.3.1 2017-03-21 Download Doc
Debarcer (De-Barcoding and Error Correction) is a package for working with next-gen sequencing data that contains molecular barcodes. As it stands, it supports targeted sequencing libraries generated by SimSenSeq, a method of creating multiplexed barcoded sequencing libraries using PCR.

Remarque
Run Unix # runDebarcer.sh -u Run Web #

VersionMAJ

delly

0.6.3 2015-02-25 Download Doc
DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.

Remarque Citation Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel. Delly: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 2012 28: i333-i339.
Run Unix # Usage: delly [OPTIONS] ... Run Web #

VersionMAJ

DESeq2

1.6.3 Download Doc
SARTools is an R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well-known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis.

Remarque
Run Unix # Run Web #

VersionMAJ

dialign

2.2.1 2005-12-06 Download Doc
DIALIGN is a software program for multiple alignment developed by Burkhard Morgenstern et al. While standard alignment methods rely on comparing single residues and imposing gap penalties, DIALIGN constructs pairwise and multiple alignments by comparing whole segments of the sequences. No gap penalty is used. This approach is especially efficient where sequences are not globally related but share only local similarities, as is the case with genomic DNA and with many protein families.

Remarque
Run Unix # dialign Run Web #

VersionMAJ

diamond

0.7.9 2015-12-02 Download Doc
DIAMOND is a new high-throughput alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds, with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX, finding more than 94% of all matches.

Remarque
Run Unix # diamond COMMAND [OPTIONS] Run Web #

VersionMAJ

DisplayMUM

1.05 2005-06-30 Download Doc

Remarque
Run Unix # displaymums Run Web #

VersionMAJ

Dizzy

1.11.4 2007-09-04 Download Doc
Simulation of stochastic systems.

Remarque An article describing Dizzy has been published: Ramsey S., Orrell D. and Bolouri H. Dizzy: stochastic simulation of large-scale genetic regulatory networks. J. Bioinf. Comp. Biol. 3(2) 415-436, 2005.
Run Unix # Dizzy Run Web #

VersionMAJ

DOMIRE

- 2014-01-20 Download Doc
DOMIRE (DOMain Identification from REcurrence) is a server that uses VAST (Vector Alignment Search Tool, protein 3D structure comparison) to define the domain boundaries in proteins from their 3D structures (Tai et al., 2010). It also provides a list of structural neighbours.

Remarque
Run Unix # Run Web # http://genome.jouy.inra.fr/domire

VersionMAJ

dotur

1.53 2007-08-30 Download Doc
DOTUR is a program that takes as input a matrix describing the genetic distances between DNA sequences and assigns the sequences to operational taxonomic units (OTUs). DOTUR uses the composition of the OTUs to compute rarefaction and collector's curves in order to assess the sampling intensity, richness, and diversity of the sample.

Remarque
Run Unix # dotur Run Web #

VersionMAJ

dsrc2

2.0 2016-10-17 Download Doc
DNA Sequence Reads Compression is an application designed for compression of data files containing reads from DNA sequencing in FASTQ format. Such files can be huge, e.g., a few (or tens of) gigabytes, so the need for a robust data compression tool is clear. Universal compression programs like gzip or bzip2 are usually used for this purpose, but a specialized tool can work better.

Remarque
Run Unix # usage: dsrc [options] Run Web #

VersionMAJ

dssp

2.0.3 Download Doc
DSSP assigns secondary structures in proteins from PDB files.

Remarque
Run Unix # dssp Run Web #

VersionMAJ

dwgsim

0.1.10 2013-08-02 Download Doc
Whole genome simulation can be performed with dwgsim. dwgsim is based on wgsim, found in SAMtools and written by Heng Li. It was modified to handle ABI SOLiD data, as well as various assumptions about aligners and positions of indels. The documentation below is for the latest dwgsim (not DNAA) release.

Remarque
Run Unix # dwgsim [options] Run Web #

VersionMAJ

EDGE-pro

1.3.1 2013-07-02 Download Doc
EDGE-pro, Estimated Degree of Gene Expression in PROkaryots is an efficient software system to estimate gene expression levels in prokaryotic genomes from RNA-seq data. EDGE-pro uses Bowtie2 for alignment and then estimates expression directly from the alignment results. EDGE-pro includes routines to assign reads aligning to overlapping gene regions accurately. 15% or more of bacterial genes overlap other genes, making this a significant problem for bacterial RNA-seq, one that is generally ignored by programs designed for eukaryotic RNA-seq experiments.

Remarque Please reference our paper: T. Magoc, D. Wood, and S.L. Salzberg. EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evolutionary Bioinformatics vol.9, pp.127-136, 2013.
Run Unix # edge.pl <-g genome> <-p ptt> <-r rnt> <-u reads> Run Web #

VersionMAJ

edgeR

3.8.6 Download Doc
SARTools is an R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well-known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis.

Remarque
Run Unix # Run Web #

VersionMAJ

ELPH

1.0.1 2012-10-01 Download Doc
ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif. We have used ELPH to find patterns such as ribosome binding sites (RBSs) and exon splicing enhancers (ESEs).

Remarque
Run Unix # elph [options] OR elph [-t ] Run Web #

VersionMAJ

eLSA

81a2ee0 2017-02-01 Download Doc
The Extended Local Similarity Analysis (ELSA) tools subsequently F-transform and normalize the raw data (matrices of time series) and then calculate the Local Similarity (LS) Scores and/or Local Trend Scores. The tools then assess the statistical significance (P-values) of these correlation statistics using either permutation test or theoretical p-value approximation and filter out insignificant results. Finally, the tools construct a partially directed association network from significant associations.

Remarque
Run Unix # eLSA_env Run Web #

VersionMAJ

emboss

6.6.0.0 2013-08-19 Download Doc
Within EMBOSS you will find around 100 programs (applications). These are just some of the areas covered: sequence alignment; rapid database searching with sequence patterns; protein motif identification, including domain analysis; nucleotide sequence pattern analysis, for example to identify CpG islands or repeats; codon usage analysis for small genomes; rapid identification of sequence patterns in large-scale sequence sets; and presentation tools for publication.

Remarque
Run Unix # Run Web # http://genome.jouy.inra.fr/cgi-bin/emboss-explorer/emboss.pl

VersionMAJ

ESAP

1.0 Download Doc
Program for predicting the conformation of loops in proteins.

Remarque
Run Unix # Run Web #

VersionMAJ

ESPript

2.3 2009-01-11 Download Doc
ESPript, Easy Sequencing in Postscript, is a utility to generate a pretty PostScript output from aligned sequences.

Remarque
Run Unix # ESPript Run Web #

VersionMAJ

esprit

2009-07-07 Download Doc
ESPRIT is a pipeline for estimating species richness using large collections of 16S rRNA pyrosequences.

Remarque
Run Unix # esprit_pc Run Web #

VersionMAJ

fasta

3.6 2014-02-21 Download Doc
A set of sequence comparison tools (fasta36, ggsearch...) used for alignment and database searching. For example, fasta compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.

Remarque
Run Unix # fasta36 Run Web #

VersionMAJ

fastPHASE

1.4.0 2015-03-10 Download Doc
fastPHASE: software for haplotype reconstruction and for estimating missing genotypes from population data. The program fastPHASE implements methods described in Scheet, P and Stephens, M (2006). A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. fastPHASE can handle larger data sets than PHASE (e.g., hundreds of thousands of markers in thousands of individuals), but does not provide estimates of recombination rates. The authors' experiments suggest that haplotype estimates are slightly less accurate than those from PHASE, but missing genotype estimates appear to be similar to or even slightly better than those from PHASE.

Remarque
Run Unix # fastPHASE [options] Run Web #

VersionMAJ

FastQC

0.10.0 2012-03-05 Download Doc
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.

Remarque
Run Unix # fastqc or fastqc seqfile1 seqfile2 .. seqfileN Run Web #

VersionMAJ

fastqp

0.1.9.1 2017-02-27 Download Doc
Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.

Remarque
Run Unix # fastqp [-h] Run Web #

VersionMAJ

Fastq_Screen

0.4.4 2014-07-09 Download Doc
Fastq screen is a simple application which allows you to search a large sequence dataset against a panel of different databases to build up a picture of where the sequences in your data originate. It was built as a QC check for sequencing pipelines but may also have uses in metagenomics studies where mixed samples are expected. Although the program wasn't built with any particular technology in mind it is probably only really suitable for processing short reads due to the use of bowtie/bowtie2 as the searching application. The program generates both text and graphical output to tell you what proportion of your library was able to map, either uniquely or in more than one location, against each of the databases in your search set.

Remarque
Run Unix # fastq_screen [OPTION]... [FastQ FILE]... Run Web #

VersionMAJ

FASTX-Toolkit

0.0.13 2013-04-30 Download Doc
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. Next-Generation sequencing machines usually produce FASTA or FASTQ files, containing multiple short-reads sequences (possibly with quality information).

Remarque
Run Unix # Run Web #

VersionMAJ

FigTree

1.4.0 2013-11-15 Download Doc
FigTree is designed as a graphical viewer of phylogenetic trees and as a program for producing publication-ready figures. As with most of my programs, it was written for my own needs so may not be as polished and feature-complete as a commercial program. In particular it is designed to display summarized and annotated trees produced by BEAST.

Remarque
Run Unix # figtree Run Web #

VersionMAJ

Filter pileup

1.0.2 Download Doc
Allows one to find sequence variants and/or sites covered by a specified number of reads with bases above a set quality threshold. The tool works on six and ten column pileup formats produced with samtools pileup command. However, it also allows you to specify columns in the input file manually.

Remarque
Run Unix # Run Web #

VersionMAJ

FinchTV

1.3.1 2008-10-14 Download Doc
FinchTV (Finch Trace Viewer) is a cross-platform graphical viewer for chromatogram files.

Remarque
Run Unix # finchtv Run Web #

VersionMAJ

findtarget

none 2004-05-15 Download Doc
Findtarget is a comparative genomics tool for targeting genes of interest in a micro-organism whose genome has been sequenced. It uses data derived from BLAST.

Remarque
Run Unix # Run Web # http://migale.jouy.inra.fr/outils/findtarget.html

VersionMAJ

FLASH

1.2.11 2014-11-13 Download Doc
FLASH, Fast Length Adjustment of SHort reads, is a very accurate and fast tool to merge paired-end reads from fragments that are shorter than twice the length of reads. The resulting longer reads have a significant positive impact on genome assemblies.

Remarque
Run Unix # flash [OPTIONS] MATES_1.FASTQ MATES_2.FASTQ Run `flash --help | less' for more information. Run Web #

VersionMAJ

flux-simulator

1.2.1 2013-07-15 Download Doc
The Flux Simulator aims at modeling RNA-Seq experiments in silico: sequencing reads are produced from a reference genome according to annotated transcripts. The simulation pipeline models different steps as modules, each with a minimal set of parameters that can be estimated by experimental parameters. The first step is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of produced reads are simulated by in silico library preparation and sequencing.

Remarque
Run Unix # flux-simulator --help Run Web #

VersionMAJ

fmtseq

1.2.2 2004-01-21 Download Doc
Sequence format conversion. A reimplementation and extension of the Readseq program (conversion to and from the Clustalw format, and indication of the input format).

Remarque Part of the seqio-1.2.2 package
Run Unix # fmtseq Run Web #

VersionMAJ

fpc

8.9 2007-11-13 Download Doc
FPC (fingerprinted contigs) is an interactive program for building contigs from fingerprinted clones, where the fingerprint for a clone is a set of restriction fragments.

Remarque
Run Unix # fpc Run Web #

VersionMAJ

freebayes

v1.1.0-1-gf15e66e 2017-02-16 Download Doc
FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

Remarque Citing freebayes: Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN] 2012
Run Unix # freebayes -f [REFERENCE] [OPTIONS] [BAM FILES] >[OUTPUT] Run Web #

VersionMAJ

FROGS

0.0.6 Download Doc
Find Rapidly OTUs with Galaxy Solution: FROGS is a Galaxy/CLI workflow designed to produce an OTU count matrix from high-depth amplicon sequencing data. The workflow focuses on: user-friendliness, with integration in Galaxy and many rich graphic outputs; accuracy, with clustering without a global similarity threshold, management of multi-affiliations, and management of separated PCRs in the chimera removal step; speed, with fast algorithms and easy-to-use parallelisation; and scalability, with algorithms designed to support data growth.

Remarque
Run Unix # Run Web #

VersionMAJ

frost

0.4.3 2002-05-01 Download Doc
Fold recognition tools.

Remarque
Run Unix # Run Web # http://genome.jouy.inra.fr/frost

VersionMAJ

FSA-BLAST

1.03 2005-12-16 Download Doc
FSA-BLAST is a new version of the popular BLAST (Basic Local Alignment Search Tool) bioinformatics tool, used to search genomic databases containing either protein or nucleotide sequences. FSA stands for Faster Search Algorithm; FSA-BLAST is twice as fast as NCBI-BLAST with no loss in accuracy.

Remarque
Run Unix # formatdb, cluster, blast, readdb, ssearch Run Web #

VersionMAJ

GALF_P

- 2010-03-18 Download Doc
GALF-P is a novel framework for TFBS identification (motif discovery) in DNA sequences. It consists of a Genetic Algorithm with Local Filtering (GALF) and a post-processing procedure based on adaptive adding and removing. GALF-P achieves both effectiveness and efficiency, and provides reliable performance compared with other state-of-the-art GA-based approaches. The post-processing procedure is designed for zero or more TFBSs in each sequence.

Remarque
Run Unix # GALF_P.o Run Web #

VersionMAJ

GapCloser

1.12 2015-07-13 Download Doc
GapCloser for SOAPdenovo. GapCloser is designed to close the gaps emerging during the scaffolding process by SOAPdenovo or another assembler, using the abundant pair relationships of short reads. GapCloser is aimed at large plant and animal genomes, although it also works well on bacterial and fungal genomes.

Remarque
Run Unix # GapCloser [options] Run Web #

VersionMAJ

GASSST

1.28 2013-08-25 Download Doc
GASSST (Global Alignment Short Sequence Search Tool) finds global alignments of short DNA sequences against large DNA banks. Its strong point is its ability to perform fast gapped alignments. It works well for both short and longer reads, and has currently been tested for reads up to 500bp. The software is freely available for download under the CECILL version 2 License.

Remarque http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract?keytype=ref&ijkey=f5zH80QsuCqixRH
Run Unix # Gassst -d -i -o -p Run Web #

VersionMAJ

gatk

3.5 2016-01-25 Download Doc
The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.

Remarque
Run Unix # java -jar /usr/local/genome/gatk/GenomeAnalysisTK.jar -h Run Web #

VersionMAJ

Gblocks

0.91b 2006-07-19 Download Doc
Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Gblocks eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences.

Remarque
Run Unix # Gblocks Run Web #

VersionMAJ

GEM

20121106-022124 2013-07-25 Download Doc
The GEM library (Also home to: The GEM mapper, The GEM RNA mapper, The GEM mappability, and others). Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful and very optimized tools to index/analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented.

Remarque
Run Unix # Run Web #

VersionMAJ

geneclust

1-0.0 2007-03-27 Download Doc
GeneClust is a piece of computer software which can be used as a tool for exploratory analysis of gene expression microarray data. The development of GeneClust was motivated by surging interest to search for interpretable biological structure in gene expression microarray data.

Remarque
Run Unix # geneclust Run Web #

VersionMAJ

genehunter

2.1_r2 2002-09-27 Download Doc
Multipoint analysis of pedigree data, including non-parametric linkage analysis, LOD-score computation, information-content mapping, and haplotype reconstruction.

Remarque
Run Unix # gh Run Web #

VersionMAJ

GenePRIMP

0.3 2013-04-19 Download Doc
Identification of anomalous gene calls. The GenePRIMP pipeline consists of a series of computational units that identify erroneous gene calls and missed genes, and then correct a subset of the identified anomalous features. The data input to GenePRIMP needs to be a file of gene calls in GenBank or EMBL format. As its output, GenePRIMP generates reports of identified anomalies, plus a corrected EMBL file.

Remarque
Run Unix # geneprimp Run Web #

VersionMAJ

genewise

2.2.0 2008-12-10 Download Doc
Genewise compares a protein to a DNA database and predicts its structure, while coping with problems related to sequencing errors and introns.

Remarque
Run Unix # genewise Run Web #

VersionMAJ

GenomeThreader

1.6.0 2013-03-12 Download Doc
GenomeThreader is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments. GenomeThreader was motivated by disabling limitations in GeneSeqer, a popular gene prediction program which is widely used for plant genome annotation.

Remarque
Run Unix # gth [option ...] -genomic file [...] -cdna file [...] -protein file [...] Run Web #

VersionMAJ

genscan

1.0 2007-10-24 Download Doc

Remarque
Run Unix # genscan Run Web #

VersionMAJ

gimsan

20100830 2011-01-10 Download Doc
GIMSAN (GIbbsMarkov with Significance ANalysis): a novel tool for de novo motif finding. GIMSAN combines GibbsMarkov, our variant of the Gibbs Sampler, described here for the first time, with our recently introduced significance analysis.

Remarque Please cite: Patrick Ng, Uri Keich. GIMSAN: A Gibbs motif finder with significance analysis. Bioinformatics, 24 (19): 2256-2257, 2008.
Run Unix # gimsan_submit_job.pl Run Web #

VersionMAJ

glimmer

glimmer-3.02 2008-12-12 Download Doc
Glimmer (Gene Locator and Interpolated Markov ModelER) predicts the positions of genes in a DNA sequence (bacteria, archaea, viruses) using Markov models.

Remarque
Run Unix # glimmer3 Run Web #

VersionMAJ

GMAP/GSNAP

2013-10-25 2013-10-28 Download Doc
GMAP (genomic mapping and alignment program for mRNA and EST sequences): gmap, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models. GSNAP (Genomic Short-read Nucleotide Alignment Program): GSNAP implements computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. It can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites.

Remarque
Run Unix # gmap [OPTIONS...] Run Web #

VersionMAJ

gmorse

1.0 2009-08-05 Download Doc
G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads : the testing of junctions is directed by the information available in the RNA-Seq dataset rather than a priori knowledge about the genome. Exons can thus be chained into stranded gene models.

Remarque
Run Unix # gmorse -h Run Web #

VersionMAJ

goby

1.4.1 2010-04-15 Download Doc
Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. Goby provides compressed file formats that are time and space efficient. It also provides a few utilities that support the most common secondary data analyses

Remarque
Run Unix # goby Run Web #

VersionMAJ

GORIV

1.0 2008-01-21 Download Doc
Method for predicting protein secondary structure from the amino acid sequence.

Remarque
Run Unix # gorIV Run Web #

VersionMAJ

grape

1.1 2008-10-21 Download Doc
GRAPe is a tool for computing genome re-alignment using marginalized posterior decoding. GRAPe uses the Marginalized Posterior Decoding (MPD) algorithm, which uses the posterior distribution of alignments to optimize the correct assignment of homology of individual nucleotides, instead of finding a single most probable alignment. Simulations show that the MPD algorithm has higher sensitivity and specificity than the Viterbi and Needleman-Wunsch algorithms.

Remarque
Run Unix # grape Run Web #

VersionMAJ

grepseq

1.2.2 2004-01-21 Download Doc
The `grepseq' program takes a keyword which can contain ambiguous characters and character classes (also called a fixed-width motif) and then searches files and databases for exact or approximate matches to that keyword. The program produces one of two kinds of output, either a list of the matching sequences with the places where the keyword matched, or the complete entries of sequences containing matches, where each entry is annotated with the places where the matches occur.

Remarque Part of seqio
Run Unix # grepseq Run Web #
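
The idea of a fixed-width keyword containing ambiguous characters can be sketched in Python as follows. This is not grepseq's implementation or syntax: the use of IUPAC nucleotide codes, the function name `find_motif`, and the restriction to exact matches are all assumptions made for illustration.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_motif(keyword: str, sequence: str) -> list[int]:
    """Return 0-based start positions of exact matches to an ambiguous keyword.

    Hypothetical helper for illustration; approximate matching is not covered.
    """
    pattern = "".join(IUPAC[ch] for ch in keyword.upper())
    # A lookahead is used so that overlapping matches are also reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


if __name__ == "__main__":
    print(find_motif("GAY", "GATGACGAA"))
```

Translating each keyword character into a character class keeps the motif fixed-width, which matches the description above; approximate matching would need a different approach, such as counting mismatches per window.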

VersionMAJ

gril

1.0.0 Download Doc
GRIL is a tool to detect the locations of genomic rearrangements in a set of sequences.

Remarque
Run Unix # gril Run Web #

VersionMAJ

HH-suite

2.0.16 2013-07-24 Download Doc
The HH-suite is an open-source software package for highly sensitive sequence searching and sequence alignment. Its two most important programs are HHsearch and HHblits. Both are based on the pairwise comparison of profile hidden Markov models (HMMs).

Remarque
Run Unix # Run Web #

VersionMAJ

HISAT2

2.0.4 2016-09-07 Download Doc
HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). We have developed HISAT based on the Bowtie2 implementation to handle most of the operations on the FM index.

Remarque
Run Unix # hisat2 [options]* -x {-1 -2 | -U | --sra-acc } [-S ] Run Web #

VersionMAJ

hmmer

3.1 2013-08-23 Download Doc
HMMER: profile HMMs for protein sequence analysis. Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus.

Remarque
Run Unix # Run Web #

VersionMAJ

hmmtop

2.1 2004-09-25 Download Doc
Prediction of transmembrane helices and topology for transmembrane proteins using hidden Markov models.

Remarque
Run Unix # hmmtop Run Web #

VersionMAJ

html4blast

1.6a 2003-05-15 Download Doc
Html4blast is a program for analysing and presenting Blast results.

Remarque Used by findtarget
Run Unix # html4blast [options] Run Web # http://bioweb.pasteur.fr/seqanal/interfaces/html4blast.html

VersionMAJ

i-ADHoRe

3.0.01 2013-10-30 Download Doc
This novel version of i-ADHoRe is designed to detect genomic homology in extremely large-scale data sets. Along with several under-the-hood improvements, resulting in a 30-fold reduction in runtime over previous versions, the implementation of multithreading and MPI now enables i-ADHoRe to take advantage of a parallel computing platform. As the scale of the data sets increased, the need for a new alignment algorithm able to cope with dozens of genomic segments became apparent. Therefore a new greedy graph-based alignment algorithm has been implemented (described in Fostier et al., 2011), allowing analysis of even the largest data sets currently available.

Remarque
Run Unix # i-adhore Run Web #

VersionMAJ

ICORN

0.97 2010-11-03 Download Doc
iCORN (iterative correction of reference nucleotides) can correct genome sequences with short reads. Reads are mapped iteratively against the genome sequences, currently using SSAHA. Discrepancies between the multiple alignments of the mapped reads and the reference are corrected if the correction does not decrease the number of perfectly mapping reads.

Remarque
Run Unix # cf. http://icorn.sourceforge.net/example.html Run Web #

VersionMAJ

idba

1.1.1 2015-12-02 Download Doc
IDBA is a practical iterative De Bruijn Graph De Novo Assembler for sequence assembly in bioinformatics. Most assemblers based on de Bruijn graphs build a de Bruijn graph with a specific k to perform the assembly task. For all of them, it is crucial to find a suitable value of k. If k is too large, there will be many gap problems in the graph. If k is too small, there will be many branch problems. IDBA uses not one specific k but a range of k values to build the iterative de Bruijn graph, so it can keep all the information in graphs with different k values. As a result, it is expected to perform better than other assemblers.

Remarque If you use our assembler in your research, please cite our papers. Peng, Y., et al. (2010) IDBA- A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB. Lisbon.
Run Unix # idba_ud -r read.fa -o output_dir Run Web #

VersionMAJ

igv

2.3.67 2016-01-11 Download Doc
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.

Remarque To cite your use of IGV in your publication: James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011)
Run Unix # igv Run Web #

VersionMAJ

Illumina CASAVA-1.8 FASTQ Filter

0.1 2014-04-30 Download Doc
The recent version of Illumina's CASAVA pipeline (Version 1.8) produces FASTQ files with both reads that pass filtering and reads that don't. The new READ-ID (the @ line) contains many new fields, one of them indicates whether the read is filtered or not. This program can filter FASTQ files produced by CASAVA 1.8, and keep/discard reads based on this filter flag.

Remarque
Run Unix # fastq_illumina_filter -hRun Web #
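As an illustration of the filter flag described above, here is a minimal Python sketch. It is not the fastq_illumina_filter implementation. It assumes the CASAVA 1.8 read-ID layout in which the part of the @ line after the space reads `<read>:<is_filtered>:<control>:<index>`, with `Y` meaning the read failed filtering and `N` meaning it passed. The function name `keep_passing_reads` and the sample records are hypothetical.

```python
def keep_passing_reads(lines, keep_flag="N"):
    """Yield 4-line FASTQ records whose CASAVA 1.8 filter flag equals keep_flag.

    Assumption: every header looks like
    '@<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<Y|N>:<control>:<index>'.
    Headers without the space-separated second part are not handled here.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header = record[0]
            # The filter flag is the second colon-separated field after the space.
            fields = header.split(" ", 1)[1].split(":")
            if fields[1] == keep_flag:
                yield record
            record = []


# Hypothetical two-read input: the first read passed filtering, the second did not.
sample = [
    "@M1:1:FC1:1:1101:100:200 1:N:0:ATCACG", "ACGT", "+", "IIII",
    "@M1:1:FC1:1:1101:101:201 1:Y:0:ATCACG", "TTTT", "+", "####",
]
passed = list(keep_passing_reads(sample))
failed = list(keep_passing_reads(sample, keep_flag="Y"))
```

Passing `keep_flag="Y"` instead selects the reads that failed filtering, which mirrors the keep/discard choice mentioned in the description.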

VersionMAJ

IM-TORNADO

2.0.3.32016-02-22DownloadDoc
Illumina paired-end sequencing, which produces two separate reads for each DNA fragment, has become the platform of choice for 16S rDNA hypervariable tag sequencing. However, when the two reads do not overlap, existing computational pipelines analyze the data from each read separately and underutilize the information contained in the paired-end reads. IM-TORNADO is a tool for processing non-overlapping reads while retaining maximal information content.

Remarque If you use IM-TORNADO for your project, please cite the following manuscript: Jeraldo P, Kalari K, Chen X, Bhavsar J, Mangalam A, White B, et al. IM-TORNADO: A Tool for Comparison of 16S Reads from Paired-End Libraries. PLOS ONE 9 (12):e114804. Available from: http://dx.plos.org/10.1371/journal.pone.0114804
Run Unix # Run Web #

VersionMAJ

indel-Seq-Gen

2.1.032012-08-24DownloadDoc
indel-Seq-Gen (iSG) is a biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies. This is accomplished through the addition of subsequence length constraints and lineage- and site-specific evolution. iSG tracks insertion and deletion processes that occur during the simulation run. iSG records all evolutionary events and outputs the "true" multiple alignment of the sequences, and can generate a larger simulated sequence space by allowing the use of multiple related root sequences. iSG can be used to test the accuracy of multiple alignment methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein superfamily classification methods. iSG utilizes a highly modified version of the substitution engine from Seq-Gen v1.3.2.

Remarque
Run Unix # indel-seq-gen [-bdefghilmnoqsuwz] < [tree_file] (indel-seq-gen -h)Run Web #

VersionMAJ

inGAP

2.7.82011-11-02DownloadDoc
inGAP (Integrative Next-generation Genome Analysis Pipeline, 2009) is a mining pipeline guided by a Bayesian principle that detects single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) by comparing high-throughput pyrosequencing reads with a reference genome of related organisms. inGAP can be applied to the mapping of both Roche/454 and Illumina reads with no restriction on read length.

Remarque
Run Unix # inGAPRun Web #

VersionMAJ

InParanoid

4.12011-01-21DownloadDoc
InParanoid is a program for automatic identification of orthologs while differentiating between inparalogs and outparalogs. An InParanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs are gathered independently, while outparalogs are excluded. The InParanoid database is a collection of pairwise ortholog groups aiming to include all 'completely sequenced' eukaryotic genomes. By this we mean above 6X coverage, and less than 1% X letters in the protein sequences.

Remarque
Run Unix # Usage: inparanoid.pl [FASTAFILE with sequences of species C] Run Web #

VersionMAJ

Insyght

2014-01-01DownloadDoc
Insyght is a genomic visualisation tool that combines a symbolic and a proportional view of the genes, syntenies and genomic regions. Another of Insyght's features is synchronized navigation and zooming across multiple species.

Remarque
Run Unix # Run Web # http://migale.jouy.inra.fr/IGO/

VersionMAJ

JAGS

3.4.02013-12-17DownloadDoc
JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation, not wholly unlike BUGS. JAGS was written with three aims in mind: (1) to have a cross-platform engine for the BUGS language; (2) to be extensible, allowing users to write their own functions, distributions and samplers; (3) to be a platform for experimentation with ideas in Bayesian modelling.

Remarque
Run Unix # jagsRun Web #

VersionMAJ

jalview

2.9.02016-03-10DownloadDoc
Jalview is a multiple alignment editor

Remarque
Run Unix # jalviewRun Web #

VersionMAJ

jellyfish

1.1.32011-12-21DownloadDoc
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism. JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation for more details.

Remarque If you use JELLYFISH in your research, please cite: Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 (first published online January 7, 2011) doi:10.1093/bioinformatics/btr011
Run Unix # jellyfish Run Web #
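To make the definition of a k-mer concrete, the following Python sketch counts every substring of length k in a DNA string with an ordinary in-memory counter. It only illustrates what is being counted; it does not reproduce JELLYFISH's hash-table encoding, its parallelism or its binary output format, and the function name `count_kmers` is hypothetical.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count occurrences of every substring of length k in sequence."""
    sequence = sequence.upper()
    counts = Counter()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


counts = count_kmers("ACGTACGT", 3)
```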

VersionMAJ

Julia

0.5.02017-02-08DownloadDoc
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. It is somewhat similar to R, Matlab or Python, but with performance approaching that of C/Fortran.

Remarque
Run Unix # juliaRun Web #

VersionMAJ

kaiju

1.5.02017-05-14DownloadDoc
Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.

Remarque Citation Menzel P., Ng K.L., Krogh A. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257
Run Unix # kaiju -t nodes.dmp -f kaiju_db.fmi -i reads.fastq [-j reads2.fastq]Run Web #

VersionMAJ

kaksi

2.3rc12008-07-01DownloadDoc
Kaksi is a secondary-structure assignment tool. From a PDB file containing the atomic coordinates of a protein, kaksi determines the positions of alpha helices and beta sheets. The assignment method uses the distances between alpha carbons and the phi/psi dihedral angles of the protein backbone. An axis calculation ensures the regularity of the assigned helices: a helix with a kink is described as two distinct helices. The detection parameters (tolerated values for distances and angles) can be changed on the command line (see Martin et al., BMC Structural Biology 2005, for a detailed discussion of parameter choice). Results are returned to the user as an XML file. A utility for extracting the main information in FASTA format is provided with the program.

Remarque
Run Unix # kaksi -pf my_pdb_file.pdbRun Web #

VersionMAJ

kaskad

1.0DownloadDoc
Tool for extracting temporal information about genes from text corpora.

Remarque
Run Unix # Run Web #

VersionMAJ

kClust

1.02015-01-21DownloadDoc
kClust is a fast and sensitive clustering method for the clustering of protein sequences. It is able to cluster large protein databases down to 20-30% sequence identity. kClust generates a clustering where each cluster is represented by its longest sequence (representative sequence).

Remarque For generating one multiple sequence alignment file for each cluster, please use kClust_mkAln. Type kClust_mkAln
Run Unix # kClust -i [fasta-db-file] -d [directory] [options]Run Web #

VersionMAJ

khmer

2.02015-11-25DownloadDoc
The khmer software is a set of command-line tools for working with DNA shotgun sequencing data from genomes, transcriptomes, metagenomes, and single cells. khmer can make de novo assemblies faster, and sometimes better. khmer can also identify (and fix) problems with shotgun data.

Remarque
Run Unix # -Run Web #

VersionMAJ

Klast

4.42015-04-24DownloadDoc
KLAST is a fast, accurate and NGS scalable bank-to-bank sequence similarity search tool providing significant accelerations of seeds-based heuristic comparison methods, such as the Blast suite of algorithms. Relying on unique software architecture, KLAST takes full advantage of recent multi-core personal computers without requiring any additional hardware devices.

Remarque
Run Unix # Run Web #

VersionMAJ

kmergenie

1.66632014-06-23DownloadDoc
KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie's choices lead to assemblies that are close to the best possible over all k-mer lengths.

Remarque
Run Unix # kmergenie [options]Run Web #
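The first step described above, building a k-mer abundance histogram for several values of k, can be sketched in a few lines of Python. This is only an illustration of what such a histogram contains: it does not implement KmerGenie's prediction of the number of genomic k-mers, and the function name `abundance_histogram` and the toy reads are hypothetical.

```python
from collections import Counter


def abundance_histogram(reads, k):
    """Return {abundance: number of distinct k-mers seen that many times}."""
    kmer_counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    # Histogram over the counts: how many distinct k-mers occur once, twice, ...
    return Counter(kmer_counts.values())


# Toy input: two overlapping reads, histograms computed for k = 3 and k = 4.
reads = ["ACGTACGT", "CGTACGTA"]
histograms = {k: abundance_histogram(reads, k) for k in (3, 4)}
```

On real data, low-abundance k-mers are dominated by sequencing errors, which is why a tool of this kind needs a statistical model on top of the raw histogram.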

VersionMAJ

kraken

0.10.52015-11-25DownloadDoc
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.

Remarque If you use Kraken in your research, please cite our paper; the citation is available on the Kraken website.
Run Unix # kraken [options] Run Web #

VersionMAJ

KronaTools

2.62016-01-13DownloadDoc
Krona Tools is a set of scripts to create Krona charts from several Bioinformatics tools as well as from text and XML files.

Remarque
Run Unix # Run Web #

VersionMAJ

kSNP

2.1.22014-04-24DownloadDoc
Identifies SNPs in a set of genome sequences without requiring a reference sequence or a multiple sequence alignment. Reconstructs SNP-based phylogenies by maximum likelihood.

Remarque
Run Unix # kSNP -k kmer_length -f fasta -d output_directory [-p genomes4positions_list] [-u unassembled_genomes_list] [-m minimum_fraction_genomes_with_locus] [-G genbank.gbk] [-n num_CPU] [-j ] [-v ] [-c min_kmer_coverage] Run Web #

VersionMAJ

lalnview

3.02005-07-01DownloadDoc
LALNVIEW is a graphical program for visualizing local alignments between two sequences (protein or nucleic acids) [reference]. Sequences are represented by colored rectangles to give an overall picture of the similarities between the two sequences. Blocks of similarity between the two sequences are colored according to the degree of identity between the two segments.

Remarque
Run Unix # lalnviewRun Web #

VersionMAJ

LAST

8612017-06-02DownloadDoc
LAST: Genome-Scale Sequence Comparison. LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads).

Remarque
Run Unix # Run Web #

VersionMAJ

LEfSe

2014-12-24DownloadDoc
LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance. LEfSe is available as a Galaxy module, and as a bitbucket repository. For additional information, please refer to the LEfSe paper. We provide support for LEfSe users. Please join our Google group designated specifically for LEfSe users.

Remarque
Run Unix # Run Web #

VersionMAJ

linkage

5.12002-11-04DownloadDoc
The core of the LINKAGE package is a series of programs for maximum likelihood estimation of recombination rates, calculation of lod score tables, and analysis of genetic risks.

Remarque linkmap, linkmap.trace, makeped, preplink, ilink, lodscore, mlink
Run Unix # preplink or linkmap ...Run Web #

VersionMAJ

loco

0.990329DownloadDoc

Remarque
Run Unix # locoRun Web #

VersionMAJ

macs

1.4.22013-05-16DownloadDoc
Next generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short reads sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.

Remarque
Run Unix # macs14 <-t tfile> [-n name] [-g genomesize] [options]Run Web #

VersionMAJ

mafft

7.164 2014-08-12DownloadDoc
MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods.

Remarque
Run Unix # mafft [options] input > outputRun Web #

VersionMAJ

mallard

1.022007-08-09DownloadDoc
This program detects chimeric 16S ribosomal DNA sequences (a chimera is a fusion of several 16S rDNA sequences).

Remarque http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=16957188&ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum
Run Unix # mallardRun Web #

VersionMAJ

mango

0.1.02008-02-12DownloadDoc
Multiple Alignment with N Gapped Oligos. MANGO: a new approach to multiple sequence alignment.

Remarque Please use the four scripts provided: mang8: MANGO with 8 seeds, without refinement; mang8r: MANGO with 8 seeds, with refinement; mang90: MANGO with 90 seeds, without refinement; mang90r: MANGO with 90 seeds, with refinement.
Run Unix # mang8 ; mang8r ; mang90 ; mang90rRun Web #

VersionMAJ

mapsembler

1.3.21 2012-05-31DownloadDoc
Mapsembler is a targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines if each starter is read-coherent, e.g. whether reads confirm the presence of each starter in the original sequence. Then for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user choice.

Remarque Citation: Peterlongo, P., & Chikhi, R. (2012). Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics, 13(1), 48. doi:10.1186/1471-2105-13-48.
Run Unix # mapsembler [-m value] [-o output] [-k value] [-i value] [-e value] [-d value] [-t value] [-E value] [-Clrgfcvsh]Run Web #

VersionMAJ

MapSplice

1.15.22012-01-26DownloadDoc
MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery. MapSplice is a second-generation algorithm for detecting alternative splice sites. Its goal is to detect splice sites sensitively and specifically while remaining efficient in CPU and memory usage. MapSplice can be applied to both short (<75 bp) and long (≥75 bp) reads. It does not depend on splice-site features or on intron length, and can therefore detect novel canonical and non-canonical splice sites. MapSplice relies on the quality and diversity of read alignments to increase the accuracy of splice-site detection.

Remarque Publication MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery Kai Wang; Darshan Singh; Zheng Zeng; Stephen J. Coleman; Yan Huang; Gleb L. Savich; Xiaping He; Piotr Mieczkowski; Sara A. Grimm; Charles M. Perou; James N. MacLeod; Derek Y. Chiang; Jan F. Prins; Jinze Liu Nucleic Acids Research 2010; doi: 10.1093/nar/gkq622
Run Unix # python /usr/local/genome/MapSplice_1.15.2/bin/mapsplice_segments.py MapSplice.cfgRun Web #

VersionMAJ

maq

0.7.12014-10-02DownloadDoc
Maq is a software that builds mapping assemblies from short reads generated by the next-generation sequencing machines. It is particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has a preliminary functionality to handle AB SOLiD data.

Remarque
Run Unix # maqRun Web #

VersionMAJ

mascot

1.92003-05-21DownloadDoc
Mascot is a powerful search engine that uses mass spectrometry data to identify proteins from primary sequence databases.

Remarque Restricted access
Run Unix # noneRun Web # http://genome.jouy.inra.fr/mascot

VersionMAJ

MaSuRCA

2.3.12014-10-01DownloadDoc
MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).

Remarque
Run Unix # Run Web #

VersionMAJ

matrix2png

1.2.12011-05-30DownloadDoc
Matrix2png is a simple but powerful program for making visualizations of microarray data and many other data types. It generates PNG formatted images from text files of data. It is fast, easy to use, and reasonably flexible. It can be used to generate publication-quality images, or to act as a image generator for web applications. Our group has found it useful for imaging all kinds of matrix-based data, not just microarray data.

Remarque If you use images created with matrix2png for publication or presentation, please cite: Pavlidis, P. and Noble W.S. (2003) Matrix2png: A Utility for Visualizing Matrix Data. Bioinformatics 19: 295-296 (abstract). Readers of the Bioinformatics application note: Here is the color version of the figure from the paper (pdf format).
Run Unix # matrix2pngRun Web #

VersionMAJ

mauve

2.4.02015-01-07DownloadDoc
Multiple Alignment of Conserved Regions in Genome Sequences

Remarque
Run Unix # mauve Run Web #

VersionMAJ

MaxBin

2.2.1 2017-01-17DownloadDoc
MaxBin is a software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users could understand the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and the reads coverage information or sequencing reads. For users' convenience MaxBin will report genome-related statistics, including estimated completeness, GC content and genome size in the binning summary page. Users could use MEGAN or similar software on MaxBin bins to find out the taxonomy of each bin after the binning process is finished.

Remarque
Run Unix # run_MaxBin.pl -contig (contig file) -out (output file) Run Web #

VersionMAJ

mcl

12-0682013-08-22DownloadDoc
The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for networks (also known as graphs) based on simulation of (stochastic) flow in graphs.

Remarque
Run Unix # mcl <-|file name> [options], do 'mcl -h' or 'man mcl' for helpRun Web #

VersionMAJ

mega2

4.5.62012-08-06DownloadDoc
Mega2 is a program that, starting from three input files (pedigree, map and locus), creates all the files needed to run linkage, haplotype, IBD and similar analysis software such as simwalk2, genehunter, vitesse, TDT, SAGE, Allegro or Mendel. Without Mega2, every input has to be formatted by hand, which is slow, tedious and error-prone.

Remarque If you use Mega2 as part of a published work, please remember to reference Mega2. You may reference it by citing the following: Mukhopadhyay N, Almasy L, Schroeder M, Mulvihill WP, Weeks DE (2005) Mega2: data-handling for facilitating genetic linkage and association analyses. Bioinformatics. 2005 May 15;21(10):2556-7. PMID: 15746282
Run Unix # mega2Run Web #

VersionMAJ

Megahit

1.0.4-beta2016-03-17DownloadDoc
MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph

Remarque
Run Unix # megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>] Run Web #

VersionMAJ

megan

5.10.62015-07-12DownloadDoc
MEGAN - Metagenome Analysis Software

Remarque
Run Unix # meganRun Web #

VersionMAJ

memrec

1.112014-11-12DownloadDoc
The memrec (memory usage recorder) script is a tool we've written to watch the memory usage of a program.

Remarque
Run Unix # memrec [opts] progRun Web #

VersionMAJ

memsat3

32010-12-28DownloadDoc
Transmembrane Protein Modelling

Remarque
Run Unix # memsat3 "query" "database" or runmemsat.sh "query" "database"Run Web #

VersionMAJ

merlin

1.1.22008-10-15DownloadDoc
MERLIN is a package for fast genetic analysis of pedigrees (linkage analysis, association analysis, haplotyping, ...).

Remarque
Run Unix # merlinRun Web #

VersionMAJ

metagene

2007-05-04DownloadDoc
Gene Finding Program for Metagenomics MetaGene predicts prokaryotic genes on anonymous genomic sequences. Fragmented sequences (longer than 100 bp) can be accepted.

Remarque
Run Unix # metagene [multi-fasta] Run Web #

VersionMAJ

MetaGeneAnnotator

- 2009-01-26DownloadDoc
Improved version of the MetaGene program for annotating metagenomic data. Predicts prokaryotic genes from a genome or a set of anonymous genomes. Particularly suited to metagenomic analyses.

Remarque
Run Unix # metageneannotatorRun Web #

VersionMAJ

MetaSim

0.9.52010-09-28DownloadDoc
MetaSim - A Sequencing Simulator for Genomics and Metagenomics

Remarque If you use this program for your own research please cite our software. Publication: Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008) MetaSim—A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3(10): e3373. doi:10.1371/journal.pone.0003373
Run Unix # MetaSimRun Web #

VersionMAJ

mga

none2014-11-20DownloadDoc
Multiple Genome Aligner (MGA for short) computes multiple genome alignments of large, closely related DNA sequences.

Remarque
Run Unix # mgaRun Web #

VersionMAJ

mgltools

1.5.62015-03-11DownloadDoc
MGLTools is a software developed at the Molecular Graphics Laboratory (MGL) of The Scripps Research Institute for visualization and analysis of molecular structures. Short description and demo of its three main applications are given below. Navigation portlet on the left has links to downloads, screenshots, documentation section of this website where you can find more information about MGLTools. Please visit MGL Bugzilla to submit a bug report or to request a new feature.

Remarque
Run Unix # pmv, adt, vision Run Web #

VersionMAJ

micca

1.5.02017-02-27DownloadDoc
micca (MICrobial Community Analysis) is a software pipeline for the processing of amplicon sequencing data, from raw sequences to OTU tables, taxonomy classification and phylogenetic tree inference. The pipeline can be applied to a range of highly conserved genes/spacers, such as 16S rRNA gene, Internal Transcribed Spacer (ITS) and 28S rRNA.

Remarque
Run Unix # micca [--version] [--help] [] Run Web #

VersionMAJ

minia

1.46832013-02-21DownloadDoc
Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).

Remarque PDF and Citation R. Chikhi, G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI 2012
Run Unix # minia fasta_file kmer_size min_abundance estimated_genome_size prefixRun Web #

VersionMAJ

mira

4.02014-11-18DownloadDoc
MIRA is a Whole Genome Shotgun and EST Sequence Assembler for Sanger, 454 and Solexa / Illumina. It can perform Hybrid de-novo assemblies as well as SNP and mutations discovery for mapping assemblies.

Remarque
Run Unix # miraRun Web #

VersionMAJ

miranda

3.3a2014-10-29DownloadDoc
miRanda is an algorithm for the detection of potential microRNA target sites in genomic sequences. miRanda reads RNA sequences (such as microRNAs) from file1 and genomic DNA/RNA sequences from file2. Both of these files should be in FASTA format.

Remarque
Run Unix # miranda file1 file2 [options..]Run Web #

VersionMAJ

miRDeep2

2.0.0.72015-02-23DownloadDoc
miRDeep2 is a software package for identification of novel and known miRNAs in deep sequencing data. Furthermore, it can be used for miRNA expression profiling across samples. Lastly, a new module for preprocessing of raw Illumina sequencing data produces files for downstream analysis with the miRDeep2 or quantifier module. Colorspace sequencing data is currently not supported by the preprocessing module, but support is planned. Preprocessing is performed with the mapper.pl script. Quantification and expression profiling are done by the quantifier.pl script. miRNA identification is done by the miRDeep2.pl script.

Remarque
Run Unix # miRDeep2.plRun Web #

VersionMAJ

MIReNA

2.02012-09-05DownloadDoc

Remarque
Run Unix # Run Web #

VersionMAJ

mktrace

0.001017 2005-07-30DownloadDoc
This program reads a FASTA file and creates a chromatogram stored in an SCF file and a corresponding phd file. The SCF file contains minimal information at this time. If a quality value FASTA file exists, mktrace uses those quality values in the phd file, otherwise it sets the quality values to the pre-determined values. mktrace produces a fake trace that could be used by Phred/Phrap packages.

Remarque Part of the consed package
Run Unix # mktrace G0771A003_114.s1.seq G0771A003_114.s1.scfRun Web #

VersionMAJ

mmseq

0.11.22012-11-20DownloadDoc
MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads. The MMSEQ pipeline obtains expression estimates from RNA-seq data. There are two routes, with starting points labelled A and B. Route A is quite fast and straightforward to run and uses pre-existing transcript sequences for alignment. Route B requires more time, as it involves the creation of custom transcript sequences based on the data.

Remarque Please cite Turro et al. 2011 (Genome Biology) if you use MMSEQ in your work. http://dx.doi.org/10.1186/gb-2011-12-2-r13
Run Unix # mmseq [OPTIONS...] hits_file output_base Run Web #

VersionMAJ

MMSEQ

1.0.22013-09-02DownloadDoc
MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads

Remarque Please cite Turro et al. 2011 (http://dx.doi.org/10.1186/gb-2011-12-2-r13)
Run Unix # mmseq / bam2hitsRun Web #

VersionMAJ

MMTK

2.7.92016-06-21DownloadDoc
The Molecular Modelling Toolkit (MMTK) is an Open Source program library for molecular simulation applications. In addition to providing ready-to-use implementations of standard algorithms, MMTK serves as a code basis that can be e

Remarque
Run Unix # Run Web #

VersionMAJ

MOCAT

1.12012-08-01DownloadDoc
MOCAT is a package for analyzing metagenomics datasets. Currently MOCAT supports Illumina single- and paired-end reads in raw FastQ format.

Remarque Jens Roat Kultima & Shinichi Sunagawa (Bork Group, EMBL)
Run Unix # MOCAT.pl -sf|sample_file 'FILE' [Pipeline, Statistics, & Additional Options]Run Web #

VersionMAJ

modelgenerator

0.852011-03-15DownloadDoc
ModelGenerator is a model selection program that selects optimal amino acid and nucleotide substitution models from Fasta or Phylip alignments. ModelGenerator supports 56 nucleotide and 96 amino acid substitution models.

Remarque
Run Unix # modelgeneratorRun Web #

VersionMAJ

modeller

9.162016-01-21DownloadDoc
MODELLER is used for homology or comparative modeling of protein three-dimensional structures (1,2). The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms.

Remarque
Run Unix # usage: mod9.16 script [...]Run Web #

VersionMAJ

modeltest

3.72006-11-20DownloadDoc
Modeltest is a program that evaluates likelihood ratio tests of evolutionary models in order to choose the model best suited to the data.

Remarque
Run Unix # modeltestRun Web #

VersionMAJ

mole

1.22011-05-12DownloadDoc
MOLE is a universal toolkit for rapid and fully automated location and characterization of channels, tunnels and pores in molecular structures. The core of the MOLE algorithm is a Dijkstra path search algorithm, which is applied to a Voronoi mesh. MOLE is a powerful program (overcoming some limitations of the CAVER tool) for exploring large molecular channels, complex networks of channels and molecular dynamics trajectories (AMBER ascii traj and parm7 are supported) in which analysis of a large number of snapshots is required.

Remarque
Run Unix # Mole.exeRun Web #

VersionMAJ

molscript

2.1.22005-07-04DownloadDoc
MolScript is a program for displaying molecular 3D structures, such as proteins, in both schematic and detailed representations.

Remarque
Run Unix # molscriptRun Web #

VersionMAJ

MOSAIK assembler

1.1.00212011-06-06DownloadDoc
MOSAIK is a reference-guided assembler comprising four main modular programs: MosaikBuild, MosaikAligner, MosaikSort and MosaikAssembler. MosaikBuild converts various sequence formats into Mosaik's native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment, which is then saved in an assembly file format.

Remarque
Run Unix # MosaikAligner MosaikAssembler MosaikBuild MosaikCoverage MosaikDupSnoop MosaikJump MosaikMerge MosaikSort MosaikTextRun Web #

VersionMAJ

mothur

1.34,42014-12-23DownloadDoc
The goal of mothur is to have a single resource to analyze molecular data that is used by microbial ecologists. Many of these tools are available elsewhere as individual programs and as scripts, which tend to be slow or as web utilities, which limit your ability to analyze your data. mothur offers the ability to go from raw sequences to the generation of visualization tools to describe α and β diversity. Examples of each command are provided within their specific pages, but several users have provided several analysis examples, which use these commands. An exhaustive list of the commands found in mothur is available within the commands category index.

Remarque
Run Unix # mothurRun Web #

VersionMAJ

MPscan

-2013-08-26DownloadDoc
MPscan: fast localisation of multiple reads in genomes

Remarque Please cite this paper if you use MPscan: Rivals E., Salmela L., Kiiskinen P., Kalsi P., Tarhio J. Lecture Notes in BioInformatics (LNBI), Springer-Verlag, Vol. 5724, p. 246-260, 2009.
Run Unix # mpscan -hRun Web #

VersionMAJ

MrBayes

3.2.52015-10-20DownloadDoc
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.

Remarque
Run Unix # mbRun Web #

VersionMAJ

mrFAST

2.6.0.02012-02-01DownloadDoc
mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to a user-defined error threshold, including indels up to 4+4 bp. This manual describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find 'all' mappings for a given set of reads; however, it can return one "best" map location if the relevant parameter is invoked. NOTE: mrFAST is developed for Illumina, and thus requires all reads to be the same length. For paired-end reads, the lengths of the mates may differ from each other, but each "side" should have a uniform length.

Remark Personalized copy number and segmental duplication maps using next-generation sequencing. Can Alkan, Jeffrey M. Kidd, Tomas Marques-Bonet, Gozde Aksay, Francesca Antonacci, Fereydoun Hormozdiari, Jacob O. Kitzman, Carl Baker, Maika Malig, Onur Mutlu, S. Cenk Sahinalp, Richard A. Gibbs, Evan E. Eichler. Nature Genetics, Oct, 41(10):1061-1067, 2009.

Table of Contents: Sample Set; General; Indexing (Single Genome Mode, Batch Mode); Mapping (Single-end Reads - Single Mode, Single-end Reads - Batch Mode, Paired-end Reads, Discordant Paired-end Reads); Output Format.

Sample Set
A sample genome FASTA file, with simulated reads and a command line to map in paired-end mode, is supplied. Please download the sample set.

General
Please download the latest version from our download page and then unzip the downloaded file. Run 'make' to build mrFAST. mrFAST generates an index of the reference genome(s) and maps the reads to the reference genome.

Requirements:
zlib, for the ability to read compressed FASTQ and write compressed SAM files.
C compiler (mrFAST is developed with gcc versions > 4.1.2).

Building: On Unix/Linux systems, we recommend using GNU gcc version > 4.1.2 as your compiler and typing 'make' to build. Example:
linux> make
gcc -c -O3 baseFAST.c -o baseFAST.o
gcc -c -O3 CommandLineParser.c -o CommandLineParser.o
gcc -c -O3 Common.c -o Common.o
gcc -c -O3 HashTable.c -o HashTable.o
gcc -c -O3 MrFAST.c -o MrFAST.o
gcc -c -O3 Output.c -o Output.o
gcc -c -O3 Reads.c -o Reads.o
gcc -c -O3 RefGenome.c -o RefGenome.o
gcc baseFAST.o CommandLineParser.o Common.o HashTable.o MrFAST.o Output.o Reads.o RefGenome.o -o mrFAST -lz -lm
rm -rf *.o

Parallelization: The best way to optimize mrFAST is to split the reads into chunks that fit into the memory of the cluster nodes, and to implement an MPI wrapper in an embarrassingly parallel fashion. We recommend the following criteria to split the reads:
Single End Mode: The number of reads should be approximately ((M-600)/(4*L)) million, where M is the memory size of the cluster node (in megabytes) and L is the read length. If you have more nodes, you can make the chunks smaller to use the nodes efficiently. For example, if the read length is 50 bp and the nodes have 2 GB of memory, each chunk should contain (2000-600)/(4*50) = 7 million reads.
Paired End Mode: The number of reads in each file should not exceed 1 million (500,000 pairs); however, a chunk size of 500,000 reads (250,000 pairs) is recommended.

To see the list of options, use "-h" or "--help". To see the version of mrFAST, use "-v" or "--version".

Indexing
mrFAST's indices can be generated in two modes (single, batch). In single mode, mrFAST indexes a fasta file (which may contain one or more reference genomes), while in batch mode it indexes a set of fasta files. By default mrFAST uses a window size of 12 characters to generate its index. Please be advised that if you do not choose the window size carefully, you will lose sensitivity.

How to choose the right window size: For a given read length (l) and error threshold (e), the window size is floor(l/(e+1)). For example, if the read length is 36 and the maximum number of mismatches allowed is 2, the window size is 12. If your calculated window size is greater than the default, you can use the default window size without losing sensitivity. For example, for a read length of 64 and an error threshold of 2, the window size should be 21, so you can use the default window size of 12. However, you cannot use 12 as the window size for a read length of 30 and an error threshold of 2.

Single Genome Mode: To index a reference genome like "refgen.fasta", run the following command:
$>./mrfast --index refgen.fasta
Upon completion of the indexing phase, you can find "refgen.fasta.index" in the same directory as "refgen.fasta". mrFAST uses a window size of 12 (default) to build the index of the genome; this window size can be modified with "--ws". There is a restriction on the maximum window size, as the window size directly affects memory usage.
$>./mrfast --index refgen.fasta --ws 13

Batch Mode: In batch mode, mrFAST gets a list of reference files and generates the index for each one of them. As in single mode, you can specify a different window size for indexing.
$>./mrfast -b --index fasts.list --ws 13

Mapping
mrFAST can map single-end reads and paired-end reads to a reference genome. mrFAST can map in either single or batch mode. In single mode, it only maps to one index. In batch mode, it maps to a list of indices. mrFAST supports both fasta and fastq formats.

Single-end Reads - Single Mode: To map single reads to a reference genome in single mode, run the following command. Use "--seq" to specify the input file. refgen.fa and refgen.fa.index should be in the same folder. You can load a multi-sequence FASTA file as the reference genome.
$>./mrfast --search refgen.fa --seq reads.fastq
The reported locations will be saved into "output" by default. If you want to save them somewhere else, use "-o" to specify another file. mrFAST can report the unmapped reads in fasta/fastq format.
$>./mrfast --search refgen.fasta --seq reads.fastq -o my.map
By default, mrFAST reports all the locations per read. If you need one "best" mapping, add the "--best" parameter to the command line:
$>./mrfast --search refgen.fasta --seq reads.fastq -e 3 --best

Single-end Reads - Batch Mode (Note: deprecated after version 2.1.0.6): In batch mode, mrFAST uses a list of indices to find the mappings of the reads. "index.list" should contain the list of fasta files.
$>./mrfast -b --search index.list --seq reads.fastq

Paired-end Reads: To map paired-end reads, use the "--pe" option. The mapping can be done in single/batch mode. If the reads are in two different files, you have to use "--seq1/--seq2" to indicate the files. If the reads are interleaved, use "--seq" to indicate the file. The distance allowed between the paired-end reads should be specified with "--min" and "--max". "--min" and "--max" specify the minimum and maximum of the inferred size (the distance between the outer edges of the mapping mates).
$>./mrfast --search refgen.fasta --pe --seq reads.fastq --min 150 --max 250

Discordant Mapping: mrFAST can report the discordant mappings for use by Variation Hunter. The --min and --max options define the minimum and maximum inferred size for concordant mapping. This is enabled by default since version 2.1.0.6.
$>./mrfast --search refgen.fasta --pe --discordant-vh --seq reads.fastq --min 50 --max 75

Parameters
General Options:
-v|--version Shows the current version.
-h Shows the help screen.
Indexing Options:
--index [file] Generate an index from the specified fasta file.
-b Indicates the indexing will be done in batch mode. The file specified in --index should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
--ws [int] Set window size for indexing (default:12 max:14).
Searching Options:
--search [file] Search the specified genome. The index file should be in the same directory as the fasta file.
-b Indicates the mapping will be done in batch mode. The file specified in --search should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
--pe Search will be done in paired-end mode.
--mp Search will be done in matepair mode.
--seq [file] Input sequences in fasta/fastq format [file]. If paired-end reads are interleaved, use this option.
--seq1 [file] Input sequences in fasta/fastq format [file] (first file). Use this option to indicate the first file of paired-end reads.
--seq2 [file] Input sequences in fasta/fastq format [file] (second file). Use this option to indicate the second file of paired-end reads.
-o [file] Output of the mapped sequences (SAM format). The default is "output".
-u [file] FASTA/FASTQ file for the unmapped sequences. The default is "unmapped".
-e [int] Maximum allowed edit distance (default 4% of the read length). Note that although the current version is limited to 4+4 indels, it supports any number of substitution errors.
--min [int] Min inferred distance allowed between two paired-end sequences.
--max [int] Max inferred distance allowed between two paired-end sequences.
--discordant-vh Returns all discordant map locations ready for the Variation Hunter program, and OEA map locations ready for NovelSeq.
--best Return "best" location only (single-end mode).
--seqcomp Indicates that the input sequences are compressed (gz).
--outcomp Indicates that the output file should be compressed (gz).
--maxoea [int] Max number of One End Anchored (OEA) locations returned for each read pair. A minimum of 100 is recommended for NovelSeq use.
--maxdis [int] Max number of discordant map locations returned for each read pair.
--crop [int] Crop the input reads at position [int].
--sample [string] Sample name to be added to the SAM header (optional).
--rg [string] Read group ID to be added to the SAM header (optional).
--lib [string] Library name to be added to the SAM header (optional).

Output Files
Single-End Mode: In single-end mode mrFAST will generate two files, as specified by the "-o" and "-u" parameters. The default filename if the "-o" parameter is not specified is "output", and the default filename for the "-u" parameter is "unmapped".
output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. mrFAST returns all possible map locations within the given edit distance ("-e") by default. If the "--best" parameter is invoked, it will select one "best" location that has the minimum edit distance to the genome.
unmapped file ("-u"): Contains the unmapped reads in FASTQ or FASTA format, depending on the format of the input sequences.

Paired-End and Matepair Modes: In paired-end and matepair modes, mrFAST will generate a SAM file that stores the best mapping locations while utilizing the paired-end span information. In addition, it will generate a DIVET file and an OEA file (SAM format). See below:
output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. This file will include: if a read pair can be mapped concordantly, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair; if the read pair cannot be mapped concordantly, again, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.
unmapped file ("-u"): Contains the orphan (both ends unmapped) reads in FASTQ or FASTA format, depending on the format of the input sequences.
output.DIVET.vh file (the "-o" option changes the prefix "output"): This file includes all possible map locations for the read pairs that cannot be concordantly mapped. This file can be loaded by the VariationHunter tool for structural variation discovery.
output_OEA file: Contains the OEA (One-End-Anchored) reads (paired-end reads where only one read can be mapped to the genome). The output is in SAM format and contains the map location of the read that can be mapped to the genome. The unmapped reads of an OEA read pair are not reported on separate lines; instead, the sequence and quality information is given on the line that specifies the map location of the mapped read. We use the optional fields NS and NQ to specify the unmapped sequence and unmapped quality information. This file can be loaded by the NovelSeq tool for novel sequence discovery; however, format conversion might be required; please see the NovelSeq documentation.
NOTE: mrFAST will report many (up to 100 by default) possible map locations for the "mapped" read of OEA matepairs. This will generate a large file due to repeats and duplications. This file can be limited through the --maxoea parameter (version 2.1.0.0 and above).

Output Format
mrFAST mapping output is in SAM format. For details about the definition of the fields, please refer to the SAM Manual. We have not implemented the "MQUAL" field yet. All locations of discordant paired-end reads will be reported in DIVET format, as required by the VariationHunter package. Unmapped reads (or "orphan" read pairs in PE mode) will be output in FASTQ or FASTA format, depending on the input sequence file format.
Run Unix # mrfast [options]
Run Web #
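The two sizing rules quoted in the mrFAST remark (reads per chunk in single-end mode, and the window size that keeps full sensitivity) are simple arithmetic, so a small helper can make them easier to apply. The following is a minimal sketch, not part of mrFAST: the function names are hypothetical, and only the formulas ((M-600)/(4*L)) million and floor(l/(e+1)), plus the default window size of 12, come from the remark.

```python
def single_end_chunk_reads(memory_mb: int, read_length: int) -> int:
    """Approximate reads per chunk: ((M - 600) / (4 * L)) million."""
    if memory_mb <= 600:
        raise ValueError("node memory must exceed 600 MB")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (memory_mb - 600) * 1_000_000 // (4 * read_length)


def max_window_size(read_length: int, max_errors: int) -> int:
    """Largest window size without loss of sensitivity: floor(l / (e + 1))."""
    return read_length // (max_errors + 1)


def default_window_is_safe(read_length: int, max_errors: int,
                           default_ws: int = 12) -> bool:
    """True when the default window size does not exceed the computed one."""
    return max_window_size(read_length, max_errors) >= default_ws


if __name__ == "__main__":
    print(single_end_chunk_reads(2000, 50))   # 50 bp reads, 2000 MB nodes
    print(max_window_size(36, 2))
    print(default_window_is_safe(30, 2))
```

With 50 bp reads on 2000 MB nodes the first function returns 7,000,000, which matches the worked example in the remark. A read length of 30 with an error threshold of 2 gives a computed window of 10, so the default of 12 is reported as unsafe.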

mrsFAST

Version: 2.5.0.4 | Updated: 2012-02-01
mrsFAST is a cache-oblivious mapper designed to map short reads to a reference genome. mrsFAST maps short reads with respect to a user-defined error threshold. In this manual, we will show how to choose the parameters and tune mrsFAST with respect to the library settings. mrsFAST is designed to find 'all' the mappings for a given set of reads.

Remark
Run Unix # mrsFAST -h
Run Web #

MuGeN

Version: 20060919 | Updated: 2007-01-05
MuGeN (Multi-Genome Navigator) is an interactive tool for exploring several complete annotated genomes together with in silico analysis results. It also has a batch mode that lets it act as an image generator for various formats, which makes it well suited to integration into web sites that display annotated physical maps. MuGeN is a software package for the visual exploration of multiple annotated genome portions. It is capable of simultaneously displaying genome portions loaded from various sources, both local and remote, and of mixing these with analysis result plots. It can also be used to generate images of these displays in a wide range of formats (PNG, PostScript, IMAP, XFig).

Remark The command mugenv is enough to start the graphical environment, but it loads no genome, so the windows will look rather empty. More commonly, you will run mugenv /chemin/vers/un/fichier/genbank.gbk to explore the file in question. MuGeN version numbers correspond to release dates and are displayed in the title bar of its graphical window. The latest is 20040726, which is the one installed on topaze and adm.
Run Unix # mugenv or mugenv /chemin/vers/un/fichier/genbank.gbk
Run Web #

Mugsy

Version: 1.2.3 | Updated: 2013-07-19
Mugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph-based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.

Remark To cite Mugsy, use: Angiuoli SV and Salzberg SL. Mugsy: Fast multiple alignment of closely related whole genomes. Bioinformatics 2011 27(3):334-4
Run Unix # mugsy [-p output prefix] multifasta_genome1.fsa multifasta_genome2.fsa ... multifasta_genomeN.fsa
Run Web #

multalin

Version: 5.4.1 | Updated: 2002-04-04
This software lets you align several biological sequences simultaneously.

Remark
Run Unix # ma
Run Web # http://www.toulouse.inra.fr/multalin.html

multiqc

Version: 0.8 | Updated: 2016-11-09
Summarizes analysis results for multiple tools and samples in a single report.

Remark
Run Unix # multiqc_env
Run Web #

mummer

Version: 3.23 | Updated: 2015-03-02
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.

Remark
Run Unix # mummer [options]
Run Web #

muscle

Version: 3.8.31 | Updated: 2014-08-24
MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation.

Remark
Run Unix # muscle -in -out
Run Web #

mview

Version: 1.47.3 | Updated: 2003-05-16
MView is a tool for converting the results of a sequence database search (BLAST, FASTA, etc.) into the form of a coloured multiple alignment of hits stacked against the query. Alternatively, an existing multiple alignment (MSF, PIR, CLUSTALW, etc.) can be processed. In either case, the output is simply HTML, so the result is platform independent and does not require a separate application or applet to be loaded. MView is NOT a multiple alignment program, nor is it a general purpose alignment editor.

Remark
Run Unix # mview [options] [file...]
Run Web #

naccess

Version: 2.1.1

Remark
Run Unix # naccess
Run Web #

ncoils

Remark
Run Unix # ncoils
Run Web #

nesoni

Version: 0.40 | Updated: 2011-01-31
Nesoni focuses on analysing the alignment of reads to a reference genome. We use the SHRiMP read aligner, as it is able to detect small insertions and deletions in addition to SNPs. Nesoni can call a consensus of read alignments, taking care to indicate ambiguity. This can then be used in various ways: to determine the protein-level changes resulting from SNPs and indels, to find differences between multiple strains, or to produce n-way comparison data suitable for phylogenetic analysis in SplitsTree4. Alternatively, the raw counts of bases at each position in the reference seen in two different sequenced strains can be compared using Fisher's Exact Test.

Remark
Run Unix # nesoni
Run Web #
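To illustrate the kind of comparison the Nesoni description mentions (base counts at one position in two strains, compared with Fisher's Exact Test), here is a minimal sketch of a two-sided test on a 2x2 table. It is an assumption-laden illustration rather than Nesoni's own code: the function name is hypothetical, and collapsing the counts to "reference base" versus "any other base" is a simplification chosen to keep the table 2x2.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two strains; columns are (reference base, other base).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_prob(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(
        p for p in (table_prob(x) for x in range(smallest, largest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: strain 1 shows only the reference base,
    # strain 2 shows only a different base.
    print(fisher_exact_two_sided(10, 0, 0, 10))
```

A very small p-value at a position suggests the two strains differ there. Identical count profiles such as (5, 5, 5, 5) give a p-value of 1.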

netlogo

Version: 4.1 | Updated: 2010-03-16
NetLogo is a programmable modeling environment for simulating natural and social phenomena. It was authored by Uri Wilensky in 1999 and has been in continuous development ever since at the Center for Connected Learning and Computer-Based Modeling.

Remark
Run Unix # netlogo
Run Web #

newbler

Version: 2.6 | Updated: 2011-07-06
Newbler is a package of three data analysis applications made by Roche 454: the GS De Novo Assembler (with or without contig scaffolding using Paired End reads), the GS Reference Mapper, and the GS Amplicon Variant Analyzer (AVA). An additional application, the GS Run Browser, is an interactive Run browser and troubleshooting tool that graphically displays the images, some intermediate data, and various output metrics from a sequencing Run. The software package also includes the SFF Tools commands for handling and using the data files (called Standard Flowgram Format or SFF files) that hold the sequencing trace data.

Remark
Run Unix # newbler
Run Web #

newicktopdf

Version: - | Updated: 2010-08-11
Converts a file containing a tree description in Newick format into a PDF file (program from Manolo Gouy's group in Lyon).

Remark
Run Unix # newicktopdf (produces the same file with a pdf suffix)
Run Web #

NGSToolsMIG

Version: 1.0 | Updated: 2011-02-04
Tools developed in the MIG laboratory to help with Next Generation Sequencing data analysis: quality control, mapping, assembly, global statistics, etc. Scripts: adaptiveTrim.pl, alignmentStatistics.pl, contigsExtractionOnLength.pl, fastqQualityConverter.pl, gbk2Fasta.pl, globalTrim.pl, multiFasta2Fasta.pl, show2Fasta.pl, unmappedReadsExtraction.pl (see Doc).

Remark
Run Unix # e.g.: contigsExtractionOnLength.pl -i fichier.fasta -do /Dir1/Dir11/Dir111/ -po fichierFiltre -l 1500 -r
Run Web #

njplot

Version: 2009 | Updated: 2013-08-27
NJplot is a tree drawing program able to draw any binary tree expressed in the standard phylogenetic tree format (e.g., the format used by the PHYLIP package). NJplot is especially convenient for rooting the unrooted trees obtained from parsimony, distance or maximum likelihood tree-building methods.

Remark
Run Unix # njplot
Run Web #

novoalign

Version: 2.08.01 | Updated: 2013-08-20
Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds globally optimal alignments using the full Needleman-Wunsch algorithm with affine gap penalties.

Remark
Run Unix # novoalign [options]
Run Web #

novosnp

Version: 2.0.1
novoSNP is a program that will help you find variations (SNPs and short INDELs) in resequencing projects. It takes a reference sequence and a number of sequencing trace files as input, and generates a list of possible variations with a quality score. novoSNP allows you to easily filter, sort and check the variations found visually and keep track of your verifications.

Remark
Run Unix # novosnp2.0.1
Run Web #

nupack

Version: 3.0 | Updated: 2010-12-01
NUPACK is a growing software suite for the analysis and design of nucleic acid systems. The package currently enables thermodynamic analysis of dilute solutions of interacting nucleic acid strands, and sequence design for complexes of nucleic acid strands intended to adopt a target secondary structure at equilibrium. NUPACK algorithms are formulated in terms of nucleic acid secondary structure. In most cases, pseudo-knots are excluded from the structural ensemble. Much of this software may be conveniently run through the NUPACK web server at http://www.nupack.org (Zadeh et al., 2010b).

Remark
Run Unix #
Run Web #

oases

Version: 0.2.08 | Updated: 2014-10-03
De novo transcriptome assembler for very short reads

Remark
Run Unix # oases
Run Web #

OBO-Edit

Version: 1.101 | Updated: 2008-12-05
OBO-Edit is an ontology editor for the OBO format. The OBO format was originally defined for Gene Ontology and is spreading through the bioinformatics community. A few dozen ontologies in OBO format are available and can be edited with OBO-Edit.

Remark
Run Unix # oboedit
Run Web #

ocount

Version: 0.4 | Updated: 2008-02-07
OCOUNT is a fast C command-line utility that has been written in the course of TETRA's development. It counts oligonucleotides in DNA sequences and computes Markov-Model-based z-scores.

Remark
Run Unix # ocount
Run Web #
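As a rough illustration of the first half of what OCOUNT is described as doing (counting oligonucleotides in a DNA sequence), the sketch below tallies overlapping words of length k. It is only an assumption about the general idea, not OCOUNT's implementation: the function name is hypothetical, skipping words that contain ambiguous bases is a choice made here, and the Markov-model z-scores are not computed.

```python
from collections import Counter


def count_oligos(sequence: str, k: int) -> Counter:
    """Count overlapping oligonucleotides of length k in a DNA sequence."""
    seq = sequence.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # Skip words containing anything other than A, C, G or T (e.g. N).
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


if __name__ == "__main__":
    for word, n in sorted(count_oligos("ACGTACGT", 4).items()):
        print(word, n)
```

For the sequence ACGTACGT and k = 4, the word ACGT is counted twice and CGTA, GTAC and TACG once each.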

opera

Version: 2.0 | Updated: 2015-03-02
Opera (Optimal Paired-End Read Assembler) is a sequence assembly program.

Remark To cite Opera please use the following citation: Song Gao, Wing-Kin Sung, Niranjan Nagarajan. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology, Sept. 2011, doi:10.1089/cmb.2011.0170.
Run Unix # opera OR opera
Run Web #

orthomcl

Version: 2.0.2 | Updated: 2012-01-31
OrthoMCL is a program that builds clusters of orthologs from multi-FASTA files containing CDS.

Remark Go to the directory containing the data and run: orthomcl.pl --mode 1 --fa_files "Ath.fa,Hsa.fa,Sce.fa"
Run Unix # orthomcl.pl
Run Web #

otterlace

Version: 52.11 | Updated: 2011-01-27
Otterlace is an interactive, graphical client, which uses a local acedb database with Zmap and perl/Tk tools to curate genomic annotation. Annotation is stored in an extended Ensembl schema (the "otter" database), which presents the annotator with contiguous regions of a chromosome. The acedb database provides local persistent storage, so that if the software or desktop machine crashes, reboots or is exited, the editing session can be recovered. Since all communication goes through the Sanger web server, annotators can work wherever there is a network connection.

Remark
Run Unix # otterlace
Run Web #

paml

Version: 4.9 | Updated: 2015-03-30
PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. ANSI C source codes are distributed for UNIX/Linux/Mac OSX, and executables are provided for MS Windows. PAML is not good for tree making. It may be used to estimate parameters and test hypotheses to study the evolutionary process, when you have reconstructed trees using other programs such as PAUP*, PHYLIP, MOLPHY, PhyML, RaxML, etc.

Remark
Run Unix # baseml (basemlg codeml pamp evolver yn00 chi2)
Run Web #

pandoc

Version: 1.9.4.1 | Updated: 2016-02-23
Pandoc is a free and open-source software document converter, widely used as a writing tool and as a basis for publishing workflows. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to the following output formats:
HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML.
Ebooks: EPUB version 2 or 3, FictionBook2.
Documentation formats: DocBook, GNU TexInfo, Groff man pages, Haddock markup.
Page layout formats: InDesign ICML.
Outline formats: OPML.
TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides.
PDF via LaTeX.
Lightweight markup formats: Markdown (including CommonMark), reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile.
Custom formats: custom writers can be written in Lua.

Remark
Run Unix # pandoc [OPTIONS] [FILES]
Run Web #

PatScan

Updated: 2007-12-12
PatScan is a pattern matcher which searches protein or nucleotide (DNA, RNA, tRNA etc.) sequence archives for instances of a pattern which you input.

Remark patscan pat_file < input_file
Run Unix # patscan
Run Web #

pfam_scan.pl

Updated: 2012-11-29
pfam_scan.pl - search protein fasta sequences against the Pfam library of HMMs.

Remark
Run Unix # pfam_scan.pl -fasta -dir /usr/local/genome/PfamScan/databases
Run Web #

pftools

Version: 2.3.4 | Updated: 2004-04-10
The pftools package is a collection of experimental programs for manipulating the generalized profile format; it implements the PROSITE search methods. The available commands are: gtop, pfsearch, pfscan, psa2msa, pfmake, pfw, ptoh, htop, pfscale, pftof.

Remark
Run Unix # pfsearch
Run Web # pfsearch

phast

Version: 1.3 | Updated: 2013-07-23
PHAST is a freely available software package for comparative and evolutionary genomics. It consists of about half a dozen major programs, plus more than a dozen utilities for manipulating sequence alignments, phylogenetic trees, and genomic annotations. For the most part, PHAST focuses on two kinds of applications: the identification of novel functional elements, including protein-coding exons and evolutionarily conserved sequences; and statistical phylogenetic modeling, including estimation of model parameters, detection of signatures of selection, and reconstruction of ancestral sequences. It consists of over 60,000 lines of C code.

Remark
Run Unix # phast
Run Web #

phd2fasta

Version: 0.990622.f | Updated: 2005-07-22
Phd2fasta reads phd files and writes sequence and quality value FASTA files, which phrap and cross_match need as input. Phred and consed write sequence and quality value information in 'phd' output files. A phd file contains information in a header, the called bases, the base quality values, and the base call trace locations.

Remark
Run Unix # phd2fasta -id ../phd_dir -os fasta_seq -oq fasta_seq.qual
Run Web #
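To make the conversion described for phd2fasta more concrete, the sketch below reads the called bases and quality values from one phd record and builds the two FASTA-style outputs (sequence and quality). It is an illustration only, not phd2fasta's code: the function name is hypothetical, and the record layout (a BEGIN_SEQUENCE line, then one "base quality position" triple per line between BEGIN_DNA and END_DNA) is assumed from commonly seen phd files.

```python
def phd_to_fasta(phd_text: str) -> tuple[str, str]:
    """Return (sequence FASTA, quality FASTA) for a single phd record."""
    name = "unknown"
    bases: list[str] = []
    quals: list[str] = []
    in_dna = False
    for raw in phd_text.splitlines():
        line = raw.strip()
        if line.startswith("BEGIN_SEQUENCE"):
            parts = line.split(maxsplit=1)
            if len(parts) == 2:
                name = parts[1]
        elif line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif in_dna and line:
            fields = line.split()
            if len(fields) >= 2:
                bases.append(fields[0])   # called base
                quals.append(fields[1])   # quality value
                # fields[2], the trace location, is not needed here.
    seq_fasta = f">{name}\n{''.join(bases)}\n"
    qual_fasta = f">{name}\n{' '.join(quals)}\n"
    return seq_fasta, qual_fasta


if __name__ == "__main__":
    example = (
        "BEGIN_SEQUENCE read1\n"
        "BEGIN_DNA\n"
        "t 9 6\n"
        "c 10 18\n"
        "g 30 25\n"
        "END_DNA\n"
        "END_SEQUENCE\n"
    )
    sequence, quality = phd_to_fasta(example)
    print(sequence, quality, sep="")
```

Header comments and anything outside the DNA block are ignored, which is enough for the sequence and quality outputs that phrap and cross_match take as input.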

phenix

Version: 1.8.1 | Updated: 2012-12-12
PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.

Remark Citing PHENIX: PHENIX: a comprehensive Python-based system for macromolecular structure solution. P. D. Adams, P. V. Afonine, G. Bunkóczi, V. B. Chen, I. W. Davis, N. Echols, J. J. Headd, L.-W. Hung, G. J. Kapral, R. W. Grosse-Kunstleve, A. J. McCoy, N. W. Moriarty, R. Oeffner, R. J. Read, D. C. Richardson, J. S. Richardson, T. C. Terwilliger and P. H. Zwart. Acta Cryst. D66, 213-221 (2010).
Run Unix # phenix
Run Web #

phobius

Version: 1.01 | Updated: 2010-03-23
A combined transmembrane topology and signal peptide prediction method.

Remark http://www.ncbi.nlm.nih.gov/pubmed/15111065?dopt=Abstract
Run Unix # phobius.pl [options] [infile]
Run Web #

phrap

Version: 1.090518 | Updated: 2010-01-18
phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets.

Remark Works with cross_match and swat, loco and cluster. The manyreads version can read more traces.
Run Unix # phrap
Run Web #

phrapview

Viewer for assembly results produced by phrap.

Remark
Run Unix #
Run Web #

phred

Version: 020425.c | Updated: 2005-07-27
Phred reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the base calls and quality values to output files. Phred can read trace data from chromatogram files in the SCF, ABI, and ESD formats. It automatically determines the file format, and whether the chromatogram file was compressed using gzip, bzip2, or UNIX compress. After calling bases, phred writes the sequences to files in either FASTA format, the format suitable for XBAP, PHD format, or the SCF format. Quality values for the bases are written to FASTA format files or PHD files, which can be used by the phrap sequence assembly program in order to increase the accuracy of the assembled sequence. phred, phrap, and consed are Unix programs that work as a group for the analysis of new DNA sequences. They do the following:
phred: Base calling and quality assignments.
phrap: Contig formation and new quality assignments.
consed: Visual X-Windows graphic interface, to view and edit alignments and contigs, and to view the original traces.

Remark
Run Unix # phred or phredPhrap
Run Web #
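The quality values that phred assigns are conventionally tied to the estimated base-calling error probability P by Q = -10 * log10(P). This relation is the usual Phred-scale definition rather than something stated in this entry, and the function names below are hypothetical. The sketch converts in both directions.

```python
import math


def error_prob_to_phred(p_error: float) -> int:
    """Convert an error probability to a rounded Phred quality value."""
    if not 0.0 < p_error <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10 * math.log10(p_error))


def phred_to_error_prob(quality: float) -> float:
    """Convert a Phred quality value back to an error probability."""
    return 10 ** (-quality / 10)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_prob(q))
```

On this scale a quality of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a quality of 30 to 1 in 1000.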

phredPhrap

Version: 030415
phredPhrap performs the following steps:
1. It runs phred on all *new* reads (reads for which there is no phd file).
2. It runs determineReadTypes.perl so that consed, autofinish, and phrap will understand your read naming convention.
3. It runs crossmatch to screen the reads for vector.
4. It runs phd2fasta to create 2 fasta files (one containing read bases and one containing read quality). These are of the highest versions of each read (in case any editing has been done).
5. It runs phrap.
6. It runs transferConsensusTags to transfer any consensus tags from the newest old ace file to the one phrap created in the preceding step.
7. It runs tagRepeats.perl to tag any common repeats (such as ALU) that you want to have automatically tagged for the benefit of consed users.
See README.txt, section "INSTALLING CONSED". Typically, you just type: phredPhrap
Within the project, there are 3 directories: chromat_dir (with the chromats), phd_dir (with the phd files) and edit_dir (with the ace files and other files). You type "phredPhrap" from within edit_dir.

Remark Front end for the phred/phrap suite
Run Unix # phredPhrap
Run Web #

phusion

Version: 2.1c | Updated: 2008-07-29
Phusion is a software package for assembling genome sequences from whole genome shotgun (WGS) reads. The Phusion assembler takes WGS reads, mostly paired with known insert sizes, as input along with a quality score assigned to each base, and produces a set of supercontigs (scaffolds).

Remark
Run Unix #
Run Web #

phylip-3.69

Version: 3.69 | Updated: 2013-02-21
PHYLIP is a free package of programs for inferring phylogenies.

Remark
Run Unix # dnaml
Run Web # http://bioweb.pasteur.fr/seqanal/phylogeny/phylip-uk.html

phylobayes

Version: 3.3f | Updated: 2013-08-22
PhyloBayes is a Bayesian Monte Carlo Markov Chain (MCMC) sampler for phylogenetic reconstruction and molecular dating using protein and nucleic acid alignments. Compared to other phylogenetic MCMC samplers, the main distinguishing feature of PhyloBayes is the use of nonparametric methods for modelling site-specific features of sequence evolution.

Remarque
Run Unix # Run Web #

VersionMAJ

phylo_win

2.0 2007-06-28 Download Doc
Phylo_win (a program from Manolo Gouy's group in Lyon) provides a graphical interface for phylogenetic analysis.

Remarque
Run Unix # phylo_win Run Web #

VersionMAJ

PhyML

3.1 2014-08-06 Download Doc
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm to perform Nearest Neighbor Interchanges (NNIs), in order to improve a reasonable starting tree topology. Since the original publication (Guindon and Gascuel 2003), PhyML has been widely used (>1,250 citations in ISI Web of Science), due to its simplicity and a fair accuracy/speed compromise. In the mean time research around PhyML has continued. We designed an efficient algorithm to search the tree space using Subtree Pruning and Regrafting (SPR) topological moves (Hordijk and Gascuel 2005), and proposed a fast branch test based on an approximate likelihood ratio test (Anisimova and Gascuel 2006). However, these novelties were not included in the official version of PhyML, and we found that improvements were still needed in order to make them effective in some practical cases. PhyML 3.0 achieves this task. It implements new algorithms to search the space of tree topologies with user-defined intensity. A non-parametric, Shimodaira-Hasegawa-like branch test is also available. The program provides a number of new evolutionary models and its interface was entirely re-designed. We tested PhyML 3.0 on a large collection of real data sets to ensure that the new version is stable, ready-to-use and still reasonably fast and accurate.

Remarque
Run Unix # phyml Run Web #

VersionMAJ

picard-tools

2.0.1 2016-02-22 Download Doc
A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats. Picard is implemented using the HTSJDK Java library, which supports access to common file formats used for high-throughput sequencing data, such as SAM and VCF.

Remarque
Run Unix # picard -h or PicardCommandLine [-h] Run Web #

VersionMAJ

pindel

0.2.5a8 2015-02-11 Download Doc
Pindel can detect breakpoints of large deletions, medium-sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.

Remarque Cite Pindel: Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009 Nov 1;25(21):2865-71. Epub 2009 Jun 26.
Run Unix # Usage: pindel -f -p [and/or -i bam_configuration_file] -c -o Run Web #

VersionMAJ

PIPITS

1.4.0 2016-11-16 Download Doc
PIPITS is an automated pipeline for analyses of fungal internal transcribed spacer (ITS) sequences from the Illumina sequencing platform. PIPITS is designed to work best on Bio-Linux (http://environmentalomics.org/bio-linux/) and Ubuntu. Unfortunately, it is NOT supported on Windows or Mac. If you are using Bio-Linux, most of the dependencies are already installed. Otherwise, you will have to set up the dependencies yourself. If you are using Ubuntu, instructions on how to set up the dependencies are given in the PIPITS documentation (section 1.8).

Remarque Citation Hyun S. Gweon, Anna Oliver, Joanne Taylor, Tim Booth, Melanie Gibbs, Daniel S. Read, Robert I. Griffiths and Karsten Schonrogge, PIPITS: an automated pipeline for analyses of fungal internal transcribed spacer sequences from the Illumina sequencing platform, Methods in Ecology and Evolution, DOI: 10.1111/2041-210X.12399
Run Unix # pipits_env Run Web #

VersionMAJ

plast

1.0 2009-06-08 Download Doc
PLAST : Parallel Local Alignment Search Tool

Remarque
Run Unix # plastall Run Web #

VersionMAJ

platanus

1.2.1 2015-03-06 Download Doc
Platanus is a de novo assembler designed to assemble high-throughput data. It can handle highly heterozygous samples. The assembly proceeds in three steps. First, it constructs contigs using an algorithm based on the de Bruijn graph. Second, it determines the order of the contigs from paired-end (mate-pair) data and constructs scaffolds. Finally, it assembles the paired-end reads located in scaffold gaps and closes the gaps.

Remarque To reference the Platanus assembler, please cite : Kajitani R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Research 24:1384-95.
Run Unix # Usage: platanus Command [options] Command: assemble, scaffold, gap_close Run Web #
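
The contig-construction step mentioned for Platanus relies on a de Bruijn graph. The following is a minimal sketch of that general idea only, not Platanus code: the function names `de_bruijn` and `extend` are hypothetical, reads are assumed to be error-free, and the walk ignores incoming branches for brevity.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges come from k-mers."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer links its prefix (k-1)-mer to its suffix (k-1)-mer.
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def extend(graph, start):
    """Follow unambiguous edges from `start` and spell out the contig."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (successor,) = graph[node]
        if successor in seen:  # stop on cycles
            break
        contig += successor[-1]
        seen.add(successor)
        node = successor
    return contig


graph = de_bruijn(["ACGT", "CGTA"], k=3)
contig = extend(graph, "AC")
```

Real assemblers add error correction, bubble removal and handling of heterozygous regions on top of this basic structure.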

VersionMAJ

plink

1.90 2015-02-24 Download Doc
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

Remarque For documentation, citation & bug-report instructions: http://pngu.mgh.harvard.edu/purcell/plink/
Run Unix # plink Run Web #

VersionMAJ

polya_svm

2.2 2013-01-29 Download Doc
This program takes a file containing DNA/RNA sequences in FASTA format as input, and 1) predicts putative mRNA polyadenylation sites [or poly(A) sites] and/or 2) generates results indicating the occurrences of different cis-elements.

Remarque
Run Unix # polya_svm.pl Run Web #

VersionMAJ

polyphred

5.02 Download Doc
PolyPhred is a program that compares fluorescence-based sequences across traces obtained from different individuals to identify heterozygous sites for single nucleotide substitutions.

Remarque
Run Unix # polyphred Run Web #

VersionMAJ

poretools

0.6.0 2016-11-30 Download Doc
poretools: a toolkit for working with nanopore sequencing data from Oxford Nanopore. The MinION (TM) from Oxford Nanopore Technologies (ONT) is the first nanopore sequencer to be commercialised and is now available to early-access users. The MinION (TM) is a USB-connected, portable nanopore sequencer which permits real-time analysis of streaming event data. Currently, the research community lacks a standardized toolkit for the analysis of nanopore datasets.

Remarque
Run Unix # poretools_env Run Web #

VersionMAJ

primer3

2.2.3 2011-01-25 Download Doc
Primer3 is a complete rewrite of the original PRIMER program (Primer 0.5), written by Steve Lincoln, Mark Daly, and Eric Lander. See DIFFERENCES FROM EARLIER VERSIONS for a discussion of how Primer3 differs from its predecessors, Primer 0.5 and Primer v2. Primer3 picks primers for PCR reactions, considering as criteria: oligonucleotide melting temperature, size, GC content, and primer-dimer possibilities; PCR product size; positional constraints within the source sequence; and miscellaneous other constraints.

Remarque
Run Unix # primer3_core [-format_output] [-2x_compat] [-strict_tags] Run Web # http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi

VersionMAJ

prinseq

0.17.1 2013-08-20 Download Doc
PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your [...] graphical and tabular format.

Remarque
Run Unix # prinseq-lite.pl -h Run Web #

VersionMAJ

probcons

1.12 2010-04-11 Download Doc
PROBCONS is a novel tool for generating multiple alignments of protein sequences. Using a combination of probabilistic modeling and consistency-based alignment techniques, PROBCONS has achieved the highest accuracies of all alignment methods to date. On the BAliBASE benchmark alignment database, alignments produced by PROBCONS show statistically significant improvement over current programs, containing an average of 7% more correctly aligned columns than those of T-Coffee, 11% more correctly aligned columns than those of CLUSTAL W, and 14% more correctly aligned columns than those of DIALIGN.

Remarque Publications using the PROBCONS tool should cite: Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and Batzoglou, S. 2005. PROBCONS: Probabilistic Consistency-based Multiple Sequence Alignment. Genome Research 15:330-340.
Run Unix # probcons [OPTION]... [MFAFILE]... Run Web #

VersionMAJ

ProbeMatch

- 2010-05-11 Download Doc
ProbeMatch is a sequence alignment program that finds alignments for short DNA sequences (36-50 bp). Unlike other programs such as eland and soap, which perform ungapped alignment allowing up to 2 substitutions, ProbeMatch performs *gapped* alignment, allowing up to 3 errors including substitutions, insertions, and deletions.

Remarque
Run Unix # probematch [options] or # probematch --help Run Web #

VersionMAJ

procheck

3.5.4 2007-05-21 Download Doc

Remarque
Run Unix # Run Web #

VersionMAJ

prodigal

2.60 2013-03-26 Download Doc
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee. Key features of Prodigal include: Speed: Prodigal is an extremely fast gene recognition tool (written in very vanilla C). It can analyze an entire microbial genome in 30 seconds or less. Accuracy: Prodigal is a highly accurate gene finder. It correctly locates the 3' end of every gene in the experimentally verified Ecogene data set (except those containing introns). It possesses a very sophisticated ribosomal binding site scoring system that enables it to locate the translation initiation site with great accuracy (96% of the 5' ends in the Ecogene data set are located correctly). Specificity: Prodigal's false positive rate compares favorably with other gene identification programs, and usually falls under 5%. GC-Content Indifferent: Prodigal performs well even in high GC genomes, with over a 90% perfect match (5'+3') to the Pseudomonas aeruginosa curated annotations. Metagenomic Version: Prodigal can run in metagenomic mode and analyze sequences even when the organism is unknown. Ease of Use: Prodigal can be run in one step on a single genomic sequence or on a draft genome containing many sequences. It does not need to be supplied with any knowledge of the organism, as it learns all the properties it needs to on its own.

Remarque Prodigal Reference: Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11(1):119. (Highly Accessed)
Run Unix # prodigal -h Run Web #

VersionMAJ

prokka

1.10 2014-11-02 Download Doc
Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.

Remarque
Run Unix # prokka [options] Run Web #

VersionMAJ

PROSE

Download Doc
The relational database PROSE contains protein sequences from Swiss-Prot and TrEMBL.

Remarque
Run Unix # Run Web # http://genome.jouy.inra.fr/prose

VersionMAJ

ProtTest

3.0 2011-08-03 Download Doc
PROTTEST3 is a high-performance computing program for selecting the model of protein evolution that best fits a given set of aligned sequences. This Java program uses PhyML (for maximum likelihood calculations and optimization of parameters), the PAL library for handling trees, and the ALTER library for reading alignment formats. The empirical models included are WAG, LG, mtREV, Dayhoff, DCMut, JTT, VT, Blosum62, CpREV, RtREV, MtMam, MtArt, HIVb/HIVw and FLU, plus +I (invariable sites), +G (rate heterogeneity among sites) and +F (observed amino acid frequencies). ProtTest uses the Akaike Information Criterion (AIC) and other statistics (AICc, BIC and DT) to find which of the candidate models best fits the data at hand. It also implements the calculation of model-averaged phylogenies.

Remarque Citation: Darriba D, Taboada GL, Doallo R, Posada D. In press. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics.
Run Unix # runProtTestHPC or runXProtTestHPC Run Web #
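
The information criteria named in the ProtTest entry can be illustrated with their textbook formulas. This is a sketch, not ProtTest code: the function names are hypothetical, and the model names, log-likelihoods and parameter counts in `candidates` are invented purely for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 lnL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def aicc(log_likelihood, n_params, sample_size):
    """AIC with the small-sample correction term."""
    correction = (2 * n_params * (n_params + 1)) / (sample_size - n_params - 1)
    return aic(log_likelihood, n_params) + correction


def bic(log_likelihood, n_params, sample_size):
    """Bayesian Information Criterion: k ln(n) - 2 lnL."""
    return n_params * math.log(sample_size) - 2 * log_likelihood


def best_model(candidates, criterion=aic, **kwargs):
    """Return the candidate name with the lowest criterion value."""
    return min(candidates, key=lambda name: criterion(*candidates[name], **kwargs))


# Hypothetical (log-likelihood, number of free parameters) per model.
candidates = {"JTT": (-5230.4, 41), "WAG+G": (-5198.7, 42), "LG+G": (-5185.2, 42)}
chosen_by_aic = best_model(candidates)
chosen_by_bic = best_model(candidates, criterion=bic, sample_size=300)
```

What counts as the sample size (for example, the alignment length) is a modelling choice that this sketch leaves to the caller.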

VersionMAJ

psicov

1.05 2012-07-26 Download Doc
Accurate Contact Prediction from large protein alignments

Remarque
Run Unix # psicov [options] alnfile Run Web #

VersionMAJ

psipred

3.5 2014-06-07 Download Doc
PSIPRED is a simple and reliable secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST). Version 2.0 of PSIPRED includes a new algorithm which averages the output from up to 4 separate neural networks in the prediction process to further increase prediction accuracy.

Remarque Uses PSI-BLAST output
Run Unix # psipred Run Web #

VersionMAJ

PSMC

0.6.5 2016-04-07 Download Doc
This software package infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model.

Remarque
Run Unix # psmc [options] input.txt Run Web #

VersionMAJ

psort

3.0.3 2012-05-30 Download Doc
PSORT is a computer program for the prediction of protein localization sites in cells. It receives an amino acid sequence and its source origin, e.g., Gram-negative bacteria, as inputs. Then, it analyzes the input sequence by applying the stored rules for various sequence features of known protein sorting signals. Finally, it reports the possibility for the input protein to be localized at each candidate site, with additional information.

Remarque
Run Unix # psort Run Web #

VersionMAJ

pymol

0.99 2009-04-21 Download Doc
PyMOL is a molecular visualization program coupled with a Python interpreter. It supports real-time visualization as well as the fast generation of high-quality animations and images of molecular assemblies.

Remarque
Run Unix # pymol Run Web #

VersionMAJ

pynast

0.1 2012-07-23 Download Doc
PyNAST: Python Nearest Alignment Space Termination tool. PyNAST is a reimplementation of the NAST sequence aligner, which has become a popular tool for adding new 16S rDNA sequences to existing 16S rDNA alignments. This reimplementation is more flexible, faster, and easier to install and maintain than the original NAST implementation. PyNAST is built using the PyCogent Bioinformatics Toolkit. The first versions of PyNAST (through PyNAST 1.0) were written to exactly match the results of the original NAST algorithm. Beginning with the post-1.0 development code, PyNAST no longer exactly matches the NAST output but instead focuses on producing better alignments. Users who wish to exactly match the results of NAST should download PyNAST 1.0.

Remarque PyNAST: a flexible tool for aligning sequences to a template alignment. J. Gregory Caporaso, Kyle Bittinger, Frederic D. Bushman, Todd Z. DeSantis, Gary L. Andersen, and Rob Knight. January 15, 2010, DOI 10.1093/bioinformatics/btp636. Bioinformatics 26: 266-267.
Run Unix # pynast [options] {-i input_fp -t template_fp} or pynast -h Run Web #

VersionMAJ

qiime

1.9.1 2016-11-14 Download Doc
QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. QIIME has been applied to single studies based on billions of sequences from thousands of samples.

Remarque
Run Unix # qiime_env Run Web #

VersionMAJ

qpdf

5.1.3 2015-08-26 Download Doc
QPDF is a command-line program that does structural, content-preserving transformations on PDF files. It could have been called something like pdf-to-pdf. It also provides many useful capabilities to developers of PDF-producing software or for people who just want to look at the innards of a PDF file to learn more about how they work.

Remarque
Run Unix # qpdf [options] infile outfile Run Web #

VersionMAJ

Quake

0.3.5 2014-10-02 Download Doc
Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. Quake adopts the k-mer error correction framework, first introduced by the EULER genome assembly package. Unlike EULER and similar programs, Quake utilizes a robust mixture model of erroneous and genuine k-mer distributions to determine where errors are located. Then Quake uses read quality values and learns the nucleotide-to-nucleotide error rates to determine what types of errors are most likely. This leads to more corrections and greater accuracy, especially with respect to avoiding mis-corrections, which create false sequence dissimilar to anything in the original genome sequence from which the read was taken.

Remarque Kelley DR, Schatz MC, Salzberg SL. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116 2010. (http://genomebiology.com/2010/11/11/R116/abstract)
Run Unix # quake.py --help Run Web # 0.3.5
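
The k-mer framework mentioned in the Quake entry starts from counting how often each k-mer occurs across all reads: at deep coverage, k-mers that are seen very rarely are likely to contain an error. The sketch below shows only that counting step, with a fixed count cutoff standing in for Quake's mixture model. It is not Quake code, and `count_kmers`, `weak_kmer_starts` and the toy reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read, counts, k, cutoff):
    """Return start positions of k-mers seen fewer than `cutoff` times."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < cutoff]


# Toy data: three identical reads and one read with a substitution at index 3.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
counts = count_kmers(reads, k=3)
suspect = weak_kmer_starts("ACGAAC", counts, k=3, cutoff=2)
```

Every weak k-mer in the last read overlaps index 3, which is how the position of a likely error can be narrowed down.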

VersionMAJ

quantiNemo

1.0.4 2015-02-24 Download Doc
quantiNEMO is an individual-based, genetically explicit stochastic simulation program. It was developed to investigate the effects of selection, mutation, recombination, and drift on quantitative traits with varying architectures in structured populations connected by migration and located in a heterogeneous habitat. quantiNEMO is highly flexible at various levels: population, selection, trait(s) architecture, genetic map for QTL and/or markers, environment, demography, mating system, etc.

Remarque
Run Unix # quantiNemo Run Web #

VersionMAJ

quast

3.2 2016-03-08 Download Doc
QUAST (QUality ASsessment Tool for Genome Assemblies) evaluates the quality of genome assemblies by computing various metrics and producing reports.

Remarque Citation : Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi and Glenn Tesler, QUAST: quality assessment tool for genome assemblies, Bioinformatics (2013) 29 (8): 1072-1075. doi: 10.1093/bioinformatics/btt086 First published online: February 19, 2013
Run Unix # quast.py [options] metaquast.py [options] Run Web #

VersionMAJ

QuickTree

1.1 2006-02-21 Download Doc
QuickTree is a program for the rapid reconstruction of phylogenies by the Neighbor-Joining method. For details, see the article published in the journal 'Bioinformatics' (18:1546-1547).

Remarque
Run Unix # quicktree Run Web #

VersionMAJ

quip

1.1.4 2013-02-21 Download Doc
Quip compresses next-generation sequencing data with extreme prejudice. It supports input and output in the FASTQ and SAM/BAM formats, compressing large datasets to as little as 15% of their original size.

Remarque Compression of next-generation sequencing reads aided by highly efficient de novo assembly Daniel C. Jones; Walter L. Ruzzo; Xinxia Peng; Michael G. Katze — Nucleic Acids Research 2012; doi: 10.1093/nar/gks754
Run Unix # quip Run Web #

VersionMAJ

R

3.1.1 2014-08-08 Download Doc
R is a language and environment for statistical computing and graphics. For the analysis of genomic data, R includes statistical packages for clustering, linear models, ANOVA, and so on (downloaded from CRAN). There are also other packages dedicated to microarray analysis (downloaded from CRAN). The R project dedicated to bioinformatics analysis is Bioconductor (http://www.bioconductor.org/), for the analysis and comprehension of genomic data. The anapuce and varmixt packages for differential analysis, developed by the Statistique et génome team (OMIP department, INA P-G & INRA - http://www.inapg.fr/ens_rech/mathinfo/recherche/mathematique/outil.html), are also available on the platform.

Remarque To get help: help.start(browser="mozilla"), or any other browser that is not already open
Run Unix # R Run Web #

VersionMAJ

rainbow

2.0 2012-09-10 Download Doc
The Rainbow package consists of several programs used for RAD-seq related clustering and de novo assembly.

Remarque
Run Unix # rainbow [options] Run Web #

VersionMAJ

rasmol

2.7.2.1.1 2004-10-07 Download Doc
Software for looking at macromolecular structure and its relation to function

Remarque
Run Unix # rasmol Run Web #

VersionMAJ

ratt

- 2010-10-29 Download Doc
RATT is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.

Remarque
Run Unix # start.ratt.sh Run Web #

VersionMAJ

raxml

7.3.0 2013-04-11 Download Doc
RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood [1] based inference of large phylogenetic trees. It was originally derived from fastDNAml, which in turn was derived from Joe Felsenstein's dnaml, which is part of the PHYLIP [2] package.

Remarque If you use RAxML please always cite the following paper: Alexandros Stamatakis : “RAxML-VI-HPC: Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models”, Bioinformatics 22(21):2688–2690, 2006 [4].
Run Unix # raxmlHPC -h or raxmlHPC-MPI -h Run Web #

VersionMAJ

ray

2.3.1 2014-06-19 Download Doc
Ray is a parallel de novo genome assembler that utilises the message-passing interface everywhere and is implemented using peer-to-peer communication.

Remarque Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Sébastien Boisvert, François Laviolette, and Jacques Corbeil. Journal of Computational Biology (Mary Ann Liebert, Inc. publishers). November 2010, 17(11): 1519-1533. doi:10.1089/cmb.2009.0238
Run Unix # Ray -help Run Web #

VersionMAJ

rdp_classifier

2.2 2014-08-02 Download Doc
Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy.

Remarque Example: classifier testQuerySeq.fasta mon_result train052008/rRNAClassifier.properties. The train052008 directory must first be present in your home directory (http://downloads.sourceforge.net/rdp-classifier/RDPClassifier_train052008.tar.gz) (http://rdp.cme.msu.edu/tmp_download/train3.tar.gz)
Run Unix # classifier Run Web #

VersionMAJ

readseq2

2.1.30 2014-06-23 Download Doc
Read and reformat biosequences

Remarque
Run Unix # readseq2 [option] Run Web #

VersionMAJ

ReAS

2.02 2011-06-30 Download Doc
ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun

Remarque http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.0010043
Run Unix # Run Web #

VersionMAJ

RepeatMasker

3.3.0 2011-07-04 Download Doc

Remarque To look up a species, for example bos_taurus: /usr/local/genome/RepeatMasker/util/queryTaxonomyDatabase.pl -species "bos taurus"
Run Unix # RepeatMasker Run Web #

VersionMAJ

RepeatScout

1.05 2011-06-30 Download Doc
RepeatScout is a tool to discover repetitive substrings in DNA.

Remarque If you use RepeatScout, please cite the following paper: Price A.L., Jones N.C. and Pevzner P.A. 2005. De novo identification of repeat families in large genomes. To appear in Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05). Detroit, Michigan.
Run Unix # 1/ build_lmer_table -l -sequence -freq [opts] 2/ RepeatScout -sequence -output -freq -l [opts] Run Web #

VersionMAJ

reptile

2.0 2012-05-03 Download Doc
Reptile is software written in C++ for correcting sequencing errors in short reads from next-gen sequencing platforms.

Remarque
Run Unix # reptile-omp Run Web #

VersionMAJ

RHOM

31.5 Download Doc
R'HOM (Recherche de régions HOMogènes, i.e. search for homogeneous regions) is a program for segmenting DNA sequences into regions of homogeneous composition using hidden Markov models. The user chooses the number of distinct composition types and the word length to take into account. The parameters are then estimated by maximum likelihood (EM algorithm), and the sequence is finally segmented with the forward-backward algorithm. R'HOM was initially developed during Florence Muri's doctoral thesis and has since been largely re-implemented.

Remarque
Run Unix # rhom.em Run Web #

VersionMAJ

riboPicker

0.4.3 2013-07-29 Download Doc
Easy identification and removal of rRNA-like sequences. The riboPicker tool can be used to automatically identify and efficiently remove rRNA-like sequences from metatranscriptomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.

Remarque
Run Unix # ribopicker [options] -f -dbs ... Run Web #

VersionMAJ

rmes

3.1.0 2014-08-20 Download Doc
Program for detecting words or motifs whose frequency in a biological sequence is statistically exceptional. (R'MES stands for Recherche de Mots Exceptionnels dans les Séquences, i.e. search for exceptional words in sequences.)

Remarque What is new compared with version 3.01. Major changes: significantly improved computation time for Gaussian approximations, whatever the order of the model; the constraint on the length of word family names has been lifted. Minor changes: the threshold selection options of the result formatting tool have been renamed (--minthresh and --maxthresh become --tmin and --tmax); bias calculation results are now presented sorted by score rather than alphabetically. For any questions, contact Sophie.Schbath@jouy.inra.fr
Run Unix # rmes [options] -s -o rmes --help Run Web #

VersionMAJ

rmesplot

0.92 2007-10-31 Download Doc

Remarque
Run Unix # rmesplot Run Web #

VersionMAJ

rna2map

0.5.0 2009-09-10 Download Doc
The SOLiD System Small RNA Analysis Pipeline Tool (RNA2MAP) can be used to perform whole genome analysis of color space RNA library reads. It consists of three major procedures: filtering, matching against miRBase sequences (Sanger), and matching against a reference genome.

Remarque
Run Unix # Run Web #

VersionMAJ

rnammer

1.2 2014-04-30 Download Doc
RNAmmer 1.2 predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences.

Remarque
Run Unix # rnammer [options] (man rnammer) Run Web #

VersionMAJ

RUM

1.12.01 2012-05-03 Download Doc
RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data.

Remarque Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Mapper (RUM) Gregory R. Grant, Michael H. Farkas, Angel Pizarro, Nicholas Lahens, Jonathan Schug, Brian Brunk, Christian J. Stoeckert Jr, John B. Hogenesch and Eric A. Pierce.
Run Unix # RUM_runner.pl [options] Run Web #

VersionMAJ

samToFastq

1.62(1113) 2012-02-17 Download Doc
Extracts read sequences and qualities from the input SAM/BAM file and writes them to the output file in Sanger fastq format. In RC mode (default is True), if the read is aligned and the alignment is to the reverse strand of the genome, the read's sequence from the input SAM file is reverse-complemented before it is written to fastq, in order to correctly restore the original read sequence as it was generated by the sequencer.

Remarque
Run Unix # Run Web #
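
The reverse-complement step described for samToFastq can be sketched as follows. This is an illustration rather than the tool's own code: the function names are hypothetical, and reversing the quality string together with the sequence is an assumption the entry does not spell out. The only SAM detail relied on is that bit 0x10 of the FLAG field marks a read aligned to the reverse strand.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def is_reverse(flag):
    """True when the SAM FLAG has the reverse-strand bit (0x10) set."""
    return bool(flag & 0x10)


def restore_read(seq, qual, flag):
    """Return (sequence, qualities) in the orientation produced by the sequencer."""
    if is_reverse(flag):
        # Complement each base, then reverse; qualities are reversed to match.
        return seq.translate(_COMPLEMENT)[::-1], qual[::-1]
    return seq, qual


def to_fastq(name, seq, qual, flag):
    """Format one read as a four-line FASTQ record."""
    seq, qual = restore_read(seq, qual, flag)
    return f"@{name}\n{seq}\n+\n{qual}\n"


record = to_fastq("read1", "AACG", "IIH#", flag=16)
```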

VersionMAJ

SAMtools

1.2 2015-04-15 Download Doc
SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.

Remarque
Run Unix # samtools [options] Run Web #

VersionMAJ

SAS

9.2 2010-12-16 Download Doc
SAS - Statistical Analysis System

Remarque
Run Unix # sasx Run Web #

VersionMAJ

scilab

4.1.2 2007-11-29 Download Doc
Scilab is scientific numerical computing software that provides a powerful development environment for scientific and engineering applications.

Remarque
Run Unix # scilab Run Web #

VersionMAJ

scwrl3

3.0 2005-10-10 Download Doc
SCWRL3.0 is a completely new version of the SCWRL program for prediction of protein side-chain conformations. SCWRL3.0 is based on a new algorithm based on graph theory that solves the combinatorial problem in side-chain prediction more rapidly than any other available program. SCWRL3.0 is more accurate than previous versions of SCWRL, while the new algorithm will allow for development of more sophisticated energy functions and for incorporation of side-chain flexibility around rotameric positions.

Remarque
Run Unix # scwrl3 Run Web #

VersionMAJ

seaview

4.220130-01-30 Download Doc
SeaView is a graphical multiple sequence alignment editor. SeaView is able to read and write various alignment formats (NEXUS, MSF, CLUSTAL, FASTA, PHYLIP, MASE).

Remarque
Run Unix # seaview Run Web #

VersionMAJ

seq-gen

1.3.3 2012-08-24 Download Doc
Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution.

Remarque
Run Unix # seq-gen Run Web #

VersionMAJ

seqmap

1.0.12 2009-02-19 Download Doc
SeqMap is a tool for mapping large numbers of oligonucleotides to the genome. It is designed for finding all the places in a genome where an oligonucleotide could potentially come from. SeqMap can efficiently map as many as dozens of millions of short sequences to a genome of several billion nucleotides. During mapping, several mutations as well as insertions/deletions of nucleotide bases in the sequences can be tolerated and detected. Various input and output formats are supported, as well as many command line options for tuning almost every step in the mapping process.

Remarque Publication: http://dx.doi.org/10.1093/bioinformatics/btn429
Run Unix # seqmap Run Web #

VersionMAJ

seqtk

-- 2013-02-26 Download Doc
Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

Remarque
Run Unix # seqtk Run Web #

VersionMAJ

sequin

5.35 2005-07-01 Download Doc
A DNA Sequence Submission and Update Tool

Remarque
Run Unix # sequin Run Web # http://www.ncbi.nlm.nih.gov/Sequin/

VersionMAJ

SHOW

20111109 2011-11-11 Download Doc
SHOW (Structured HOMogeneity Watcher) allows flexible use of hidden Markov models. Users can build their own model, whose parameters can then be estimated by maximum likelihood with the EM algorithm. The model can then be used to make predictions with the forward-backward algorithm (posterior decoding) or with the Viterbi algorithm. It can also be used to simulate sequences. SHOW also implements a bacterial gene detector, in which case the user does not need to worry about the model or its parameters. SHOW has already been used to annotate published complete genomes.

Remarque
Run Unix # show_viterbi # show2mugen.pl Run Web #
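
The Viterbi decoding mentioned in the SHOW entry can be sketched generically as below. This is the textbook algorithm, not SHOW's implementation. The two-state model (`AT_rich`, `GC_rich`) and all of its probabilities are invented for illustration, and every probability is assumed to be strictly positive so that logarithms are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for the observations."""
    if not observations:
        return []
    # Work in log space to avoid numerical underflow on long sequences.
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
              for s in states}
    backpointers = []
    for symbol in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s at this position.
            prev = max(states, key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[prev] + math.log(trans_p[prev][s])
                             + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state composition model.
states = ("AT_rich", "GC_rich")
start_p = {"AT_rich": 0.5, "GC_rich": 0.5}
trans_p = {"AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
           "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9}}
emit_p = {"AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
          "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}
path = viterbi("AATTGGCC", states, start_p, trans_p, emit_p)
```

With these toy parameters the decoded path switches from the AT-rich to the GC-rich state halfway through the sequence.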

VersionMAJ

showVenn

1.0 2010-02-26 Download Doc
This tool manipulates lists of identifiers in the form of a Venn diagram. It can find the elements shared by, or unique to, 5 different lists. By clicking on the regions of the diagram, the user retrieves the identifiers belonging to the selected subset.

Remarque
Run Unix # Run Web # http://tomcat.jouy.inra.fr/Venn/
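
The regions of a Venn diagram such as the one showVenn displays correspond to plain set operations. The sketch below is not the showVenn code: `venn_regions`, the list names and the identifiers are hypothetical. For each combination of lists, it returns the identifiers found in exactly those lists and in no other.

```python
from itertools import combinations


def venn_regions(lists):
    """Map each combination of list names to the identifiers exclusive to it."""
    sets = {name: set(ids) for name, ids in lists.items()}
    names = sorted(sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            inside = set.intersection(*(sets[n] for n in combo))
            outside = set().union(*(sets[n] for n in names if n not in combo))
            members = inside - outside
            if members:  # keep only non-empty regions
                regions[combo] = members
    return regions


# Hypothetical identifier lists.
lists = {
    "A": ["g1", "g2", "g3"],
    "B": ["g2", "g3", "g4"],
    "C": ["g3", "g5"],
}
regions = venn_regions(lists)
```

The same function works unchanged for 5 lists, which gives at most 31 regions.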

VersionMAJ

sickle

1.200 2013-02-26 Download Doc
sickle - A windowed adaptive trimming tool for FASTQ files using quality

Remarque
Run Unix # sickle [options]Run Web #
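
To illustrate what windowed quality trimming means, here is a simplified Python sketch. It is not sickle's exact algorithm or its option set: the window size, the threshold and the function names are assumptions made for the example.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming the Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def window_trim(qualities, window=4, threshold=20):
    """Return (start, end) so that qualities[start:end] is the kept region.

    The read starts at the first window whose mean quality reaches the
    threshold and ends where a later window first drops below it.
    """
    n = len(qualities)
    if n == 0:
        return (0, 0)
    window = min(window, n)
    start = None
    for i in range(n - window + 1):
        mean = sum(qualities[i:i + window]) / window
        if start is None:
            if mean >= threshold:
                start = i
        elif mean < threshold:
            return (start, i)
    if start is None:
        return (0, 0)  # no window is good enough: discard the whole read
    return (start, n)
```

The returned coordinates can be applied to both the sequence and the quality string of a FASTQ record so that the two stay in sync.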

VersionMAJ

signalp

4.02014-05-07DownloadDoc
Detection of signal sequences and cleavage sites in protein sequences from Gram-positive bacteria, Gram-negative bacteria and eukaryotes.

Remarque The SIGNALP package is the property of the Center for Biological Sequence Analysis. It may be downloaded only by special agreement (contact software@cbs.dtu.dk).
Run Unix # signalpRun Web #

VersionMAJ

sim4

2003-09-212006-03-28DownloadDoc
SIM4 finds the best local alignments between a cDNA sequence (mRNA, EST) and a genomic DNA sequence containing that gene, allowing for introns and a small number of sequencing errors.

Remarque http://globin.cse.psu.edu/html/docs/sim4.html
Run Unix # sim4 mouse_cDNA human_genomic K=15 C=11 A=3 W=10 Run Web #

VersionMAJ

Simbac

"master"2017-05-03DownloadDoc
Simulation of whole bacterial genomes with homologous recombination

Remarque
Run Unix # SimBac [OPTIONS] Run Web #

VersionMAJ

SIMPA

1.0DownloadDoc
SIMPA is a program for predicting protein secondary structure. Three states are considered: alpha helix (H), beta strands (b) and aperiodic structures (C). The program is based on the "nearest neighbor" concept. It achieves a Q3 score of 67%.

Remarque
Run Unix # Run Web #
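
The Q3 score quoted for SIMPA is the percentage of residues whose three-state assignment is predicted correctly. A minimal sketch of that calculation, with a hypothetical function name and made-up example strings:

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted state equals the observed one.

    Both arguments are strings of per-residue states, for example H, b and C.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty structure strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For instance, `q3("HHHbbbCCC", "HHHbbbbbb")` compares nine residues, six of which agree.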

VersionMAJ

SimWalk2

2.912007-01-31DownloadDoc
SimWalk2 is a statistical genetics computer application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on any size of pedigree. SimWalk2 uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms to perform these multipoint analyses.

Remarque These files are required to use SimWalk2 and must be present in the directory from which the program is launched: MAP.DAT, LOCUS.DAT, PEDIGREE.DAT, PEN.DAT.
Run Unix # simwalk2Run Web #

VersionMAJ

SLICEMBLER

-2015-01-26DownloadDoc
SLICEMBLER is an iterative meta-assembler that takes advantage of the whole dataset and significantly improves the final quality of the assembly. SLICEMBLER partitions the input data into optimal-sized “slices” and uses a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray) to assemble each slice individually. SLICEMBLER uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly. It extracts high-quality contigs from the slice assemblies and prevents contigs containing mis-joins and calling errors from being included in the final assembly. SLICEMBLER was designed and developed at the Algorithm and Computational Biology Lab, University of California, Riverside.

Remarque
Run Unix # slicembler.py -r -i -c -n -o Run Web #

VersionMAJ

snap

0.132012-07-16DownloadDoc
SNAP is a new sequence aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives SNAP up to 2x lower error rates than existing tools and lets it match larger mutations that they may miss.

Remarque Faster and More Accurate Sequence Alignment with SNAP. Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David Patterson, Scott Shenker, Ion Stoica, Richard M. Karp, and Taylor Sittler. arXiv:1111.5572v1, November 2011.
Run Unix # snapRun Web #

VersionMAJ

soap

2.202014-08-23DownloadDoc
SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program features very fast and accurate alignment of the huge numbers of short reads generated by the Illumina/Solexa Genome Analyzer. Compared to soap v1, it is one order of magnitude faster. It requires only 2 minutes to align one million single-end reads to the human reference genome. Another notable improvement of SOAPaligner is that it now supports a wide range of read lengths.

Remarque To run SOAPaligner, first build index files for the reference genome (2bwt-builder), then search the reads against the formatted index files (soap).
Run Unix # soapRun Web #

VersionMAJ

soap.coverage

2.7.72011-12-14DownloadDoc
Utility for SOAP. soap.coverage can calculate sequencing coverage or physical coverage, as well as the duplication rate and details of specific blocks, for each segment and for the whole genome, using SOAP, BLAT, BLAST, BlastZ, MUMmer and MAQ alignment results, with multi-threading.

Remarque
Run Unix # soap.coverageRun Web #

VersionMAJ

SOAPdenovo

1.042010-08-23DownloadDoc
SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

Remarque
Run Unix # soapdenovo [option] Run Web #

VersionMAJ

SolexaQA

1.72011-05-30DownloadDoc
SolexaQA is a Perl-based software package that calculates quality statistics and creates visual representations of data quality from FASTQ files generated by Illumina second-generation sequencing technology (“Solexa”).

Remarque
Run Unix # SolexaQA.plRun Web #

VersionMAJ

SortMeRNA

1.92014-02-20DownloadDoc
SortMeRNA is software designed to rapidly filter ribosomal RNA fragments from metatranscriptomic data produced by next-generation sequencers. It is capable of handling large RNA databases and sorting out all fragments matching the database with high accuracy and specificity.

Remarque If you use SortMeRNA, please cite: Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.
Run Unix # sortmerna -h Run Web #

VersionMAJ

SPAdes

3.9.02016-08-16DownloadDoc
SPAdes is a de Bruijn graph based assembler. It integrates a read error corrector, a multiple k-mer de Bruijn graph assembler, an assembly merger, a scaffolder and a repeat resolver.

Remarque
Run Unix # spadesRun Web #
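
For readers unfamiliar with the data structure, the following Python sketch builds a de Bruijn graph from reads for a single k. It only illustrates the graph itself; it is not how SPAdes is implemented, which additionally involves error correction, several k-mer sizes, scaffolding and repeat resolution. The function name and the reads are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers and every k-mer found
    in a read adds an edge from its prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two overlapping toy reads
graph = de_bruijn(["ACGT", "CGTA"], k=3)
```

An assembler then looks for paths through this graph; overlapping reads share edges, which is why the k-mer `CGT` appears only once even though both reads contain it.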

VersionMAJ

SPatt

2.0-pre1 & 1.2.22007-10-02DownloadDoc
SPatt (Statistic for Patterns) is a suite of C++ programs designed to compute p-values of pattern occurrences in a text. Assuming the text is generated according to a Markov model, the p-value of a given observation is its probability of occurring. The lower the p-value, the more unlikely the observation. For example, these tools can be used to find patterns with unusual behaviour in DNA sequences.

Remarque
Run Unix # spatt (aspatt cpspatt gspatt ldspatt oldxspatt sspatt xspatt)Run Web #

VersionMAJ

SPiD

2.1DownloadDoc
Subtilis Protein interaction Database

Remarque
Run Unix # Run Web # http://genome.jouy.inra.fr/cgi-bin/spid/index.cgi

VersionMAJ

splitsTree

4.13.12014-06-23DownloadDoc
SplitsTree4 is the leading application for computing unrooted phylogenetic networks from molecular sequence data. Given an alignment of sequences, a distance matrix or a set of trees, the program will compute a phylogenetic tree or network using methods such as split decomposition, neighbor-net, consensus network, super networks methods or methods for computing hybridization or simple recombination networks.

Remarque
Run Unix # SplitsTreeRun Web #

VersionMAJ

SRA ToolKit

2.5.72016-01-30DownloadDoc
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format (Note that this is not required for submission). The Toolkit source code is provided in the form of the SRA SDK, and may be compiled with GCC. However, pre-built software executables are available for Linux, Windows, and Mac OS X, and we highly recommend using these pre-built executables whenever possible.

Remarque
Run Unix # Run Web #

VersionMAJ

ssaha2

2.5.22014-08-20DownloadDoc
SSAHA (Sequence Search and Alignment by Hashing Algorithm) is an algorithm for very fast matching and alignment of DNA sequences. It achieves its fast search speed by encoding sequence information in a perfect hash function.

Remarque
Run Unix # ssaha2Run Web #
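
The idea of hashing sequence words can be sketched as follows. This is a simplified illustration and not the SSAHA2 implementation: it encodes each k-mer as a base-4 integer (two bits per base, so distinct k-mers never collide) and indexes every position of the subject sequence. All names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Encode a k-mer as a base-4 integer; distinct k-mers get distinct values."""
    value = 0
    for base in kmer:
        value = value * 4 + CODE[base]
    return value


def build_index(subject, k):
    """Map each encoded k-mer to the list of positions where it occurs."""
    index = {}
    for pos in range(len(subject) - k + 1):
        index.setdefault(encode(subject[pos:pos + k]), []).append(pos)
    return index


def seed_hits(query, index, k):
    """Return (query_pos, subject_pos) pairs of exact k-mer matches."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for spos in index.get(encode(query[qpos:qpos + k]), []):
            hits.append((qpos, spos))
    return hits
```

Exact k-mer hits found this way serve as seeds; a real aligner would then chain and extend them into full alignments.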

VersionMAJ

ssake

3.22008-07-30DownloadDoc
SSAKE is a genomics application for assembling millions of very short DNA sequences. It is an easy-to-use, robust, reliable and tractable clustering algorithm for very short sequence reads, such as those generated by Illumina Ltd.

Remarque
Run Unix # ssake.plRun Web #

VersionMAJ

sspace

2.02013-02-21DownloadDoc
SSPACE is not a de novo assembler, it is used after a preassembled run. SSPACE is a script to extend and scaffold preassembled contigs using a number of mate pairs or paired-end libraries. It uses Bowtie to map all the reads to the pre-assembled contigs. Unmapped reads are used for extending, if desired, the pre-assembled contigs with the SSAKE assembler. Again Bowtie is used to map the reads to the extended contigs. Positions and orientation of the reads are stored and used for scaffolding. If both reads of a pair are found within the allowed distance, they are used for scaffolding to determine the orientation, contig pairing and ordering of the contigs.

Remarque
Run Unix # /usr/local/genome/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl Run Web #

VersionMAJ

ssu-align

0.1.12016-07-01DownloadDoc
SSU-ALIGN is a software package for identifying, aligning, masking and visualizing archaeal 16S, bacterial 16S and eukaryotic 18S small subunit ribosomal RNA (SSU rRNA) sequences. It includes and uses the Infernal software package for generating alignments based on the conserved secondary structure and sequence of SSU rRNA. SSU-ALIGN extends Infernal to make it easier for users to generate large-scale alignments of up to millions of SSU rRNA sequences that will ultimately be used as input to phylogenetic inference methods. (SSU-ALIGN is not capable of inferring phylogenetic trees itself.) Large SSU rRNA sequence datasets are commonly generated by environmental sequencing survey studies that use SSU rRNA as a phylogenetic marker of species in the environment being studied. While designed primarily for these SSU-based studies, SSU-ALIGN is a general tool that can be used to generate alignments of any type of structural RNA, including large subunit ribosomal RNA (LSU rRNA).

Remarque How to cite SSU-ALIGN: SSU-ALIGN does not yet have an associated publication, so please cite the INFERNAL software publication (Nawrocki et al., 2009a) if you find the package useful for work that you publish. Additionally, because SSU-ALIGN's seed alignments were derived from the Comparative RNA Website, we ask that you cite that database as well (Cannone et al., 2002).
Run Unix # Run Web #

VersionMAJ

stacks

0.99952012-08-21DownloadDoc
Stacks is a software pipeline for building loci out of a set of short-read sequenced samples. Stacks was developed for the purpose of building genetic maps from RAD-Tag Illumina sequence data, but can also be readily applied to population studies, and phylogeography.

Remarque Please cite this paper: J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011.
Run Unix # Run Web #

VersionMAJ

staden

2.0.0b72011-02-03DownloadDoc
The Staden Package is a set of tools covering sequence assembly, editing and analysis. Gap4 performs sequence assembly, contig ordering based on read pair data, contig joining based on sequence comparisons, assembly checking, repeat searching, experiment suggestion, read pair analysis and contig editing. Pregap4 provides a graphical user interface to set up the processing required to prepare trace data for assembly or analysis. Trev is a rapid and flexible viewer and editor for ABI, ALF, SCF and ZTR trace files. Prefinish analyses partially completed sequence assemblies and suggests the most efficient set of experiments to help finish the project. Tracediff and hetscan automatically locate mutations by comparing trace data against reference traces. Spin analyses nucleotide sequences to find genes, restriction sites, motifs, etc. It can perform translations, find open reading frames, count codons, etc.

Remarque
Run Unix # http://staden.sourceforge.net/overview.html Run Web #

VersionMAJ

STAMP

1.12014-09-29DownloadDoc
Similarity, Tree-building, & Alignment of Motifs and Profiles

Remarque
Run Unix # STAMPRun Web #

VersionMAJ

STFilter

1.0DownloadDoc
STFilter queries PubMed using a list of gene names or a species name, splits the abstracts into sentences and classifies them according to a relevance criterion. This relevance criterion can be learned automatically from classified sentences.

Remarque
Run Unix # Run Web #

VersionMAJ

stride

2005-12-12DownloadDoc
STRIDE = Protein secondary structure assignment from atomic coordinates. STRIDE is a program to recognize secondary structural elements in proteins from their atomic coordinates.

Remarque
Run Unix # strideRun Web #

VersionMAJ

StringTie

1.3.02016-09-07DownloadDoc
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. In order to identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).

Remarque
Run Unix # stringtie -h/--help Run Web #

VersionMAJ

structure

2.3.42012-12-21DownloadDoc
The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.

Remarque
Run Unix # structureRun Web #

VersionMAJ

SUPER-FOCUS

0.262016-10-04DownloadDoc
SUPER-FOCUS (SUbsystems Profile by databasE Reduction using FOCUS) is an agile homology-based approach that uses a reduced SEED database to report the subsystems present in metagenomic samples and profile their abundances. The tool was tested with over 70 real metagenomes, and the results show that the approach accurately predicts the subsystems present in microbial communities and can be up to 1,000 times faster than other tools.

Remarque
Run Unix # Run Web #

VersionMAJ

surf

1.02006-01-31DownloadDoc
SeqUence Repository and Feature detection. Nucleotide sequence production commonly involves several dedicated bioinformatics programs for sequence basecalling, vector detection, etc.

Remarque
Run Unix # Run Web #

VersionMAJ

SurfG+

1.022012-07-13DownloadDoc
SurfG+ is a tool to predict protein localization in Gram-positive bacteria. Current protein localization protocols are not suited to this prediction task, as they ignore the potential surface exposure of many membrane-associated proteins. Therefore, we developed a new flow scheme for processing protein sequence data, with the particular aim of identifying potentially surface-exposed (PSE) proteins from Gram-positive bacteria.

Remarque See Barinov A, Loux V, Hammani A, Nicolas P, Langella P, Ehrlich D, et al. Prediction of surface exposed proteins in Streptococcus pyogenes, with a potential application to other Gram-positive bacteria. Proteomics. 2009 Jan.;9(1):61–73.  
Run Unix # SurfgRun Web #

VersionMAJ

SvcR

1.1DownloadDoc
SvcR is an implementation of a clustering algorithm based on finding a separator, in a feature space, between points described in a data space. The data format is defined by an attribute/value table (matrix). The data are mapped by a kernel into the feature space, where they form a single cluster delimited by the radius of a ball and by support vectors. The radius of this ball can then be used in the data space to reconstruct the boundary, which now forms several clusters.

Remarque
Run Unix # Run Web #

VersionMAJ

swat

DownloadDoc

Remarque
Run Unix # Run Web #

VersionMAJ

tablet

1.14.10.202015-04-07DownloadDoc
Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.

Remarque
Run Unix # tabletRun Web #

VersionMAJ

tagdust

1.132013-09-13DownloadDoc
TagDust is a program to eliminate artifactual reads from next-generation sequencing data sets.

Remarque Lassmann T., et al. (2009) TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics.
Run Unix # tagdust [options] lib.fa read1.fa read2.fa ...Run Web #

VersionMAJ

Tandem Repeats Finder

4.07b 2013-08-20DownloadDoc
A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.

Remarque
Run Unix # trfRun Web #
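
To make the definition concrete, the sketch below finds exact tandem repeats only. This is a deliberate simplification: Tandem Repeats Finder also detects approximate copies using a statistical model, which is not reproduced here. The function name and parameters are hypothetical.

```python
def exact_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Scan left to right and report (start, unit, copies) for runs of
    adjacent, identical copies of a short unit. Exact matches only."""
    results = []
    i = 0
    n = len(seq)
    while i < n:
        best = None  # (unit_string, copies) covering the longest span at i
        for size in range(min_unit, max_unit + 1):
            unit = seq[i:i + size]
            if len(unit) < size:
                break  # not enough sequence left for this unit size
            copies = 1
            while seq[i + copies * size:i + (copies + 1) * size] == unit:
                copies += 1
            if copies >= min_copies:
                if best is None or copies * size > best[1] * len(best[0]):
                    best = (unit, copies)
        if best:
            results.append((i, best[0], best[1]))
            i += best[1] * len(best[0])  # skip past the reported repeat
        else:
            i += 1
    return results
```

With the default settings a run such as `GG` counts as two copies of `G`; raising `min_unit` or `min_copies` filters out such short repeats.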

VersionMAJ

T-Coffee

11.00.8cbe4862015-04-07DownloadDoc
T-Coffee is a multiple sequence alignment package. Given a set of sequences (Proteins or DNA), T-Coffee generates a multiple sequence alignment. Version 2.00 and higher can mix sequences and structures.

Remarque
Run Unix # t_coffee sequence_fileRun Web #

VersionMAJ

TFM-Pvalue

-2014-01-14DownloadDoc
TFM-Pvalue is a software suite providing tools for computing the score threshold associated to a given P-value and the P-value associated to a given score threshold. It uses Position Weight Matrices, such as those available in the Transfac or Jaspar databases.

Remarque H. Touzet and J.S. Varré. Efficient and accurate P-value computation for Position Weight Matrices. Algorithms for Molecular Biology 2007, 2:15.
Run Unix # Run Web #

VersionMAJ

TGI

2005-07-25DownloadDoc
TGI Clustering tools (TGICL): a software system for fast clustering of large EST datasets. This package automates clustering and assembly of a large EST/mRNA dataset. The clustering is performed by a slightly modified version of NCBI's megablast, and the resulting clusters are then assembled using the CAP3 assembly program. TGICL starts with a large multi-FASTA file (and an optional peer quality values file) and outputs the assembly files as produced by CAP3.

Remarque
Run Unix # tgicl , cap3...Run Web #

VersionMAJ

TM-align

201605212017-02-28DownloadDoc
TM-align is an algorithm for sequence-order independent protein structure comparisons. For two protein structures of unknown equivalence, TM-align first generates an optimized residue-to-residue alignment based on structural similarity using dynamic programming iterations. An optimal superposition of the two structures, as well as the TM-score value which scales the structural similarity, is returned. TM-score takes values in (0,1], where 1 indicates a perfect match between two structures. Following strict statistics of structures in the PDB, scores below 0.2 correspond to randomly chosen unrelated proteins, whereas scores higher than 0.5 generally indicate the same fold in SCOP/CATH.

Remarque
Run Unix # TMalign PDB1.pdb PDB2.pdb [Options] Run Web #
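
As a rough illustration of the TM-score mentioned above, the sketch below evaluates the standard TM-score formula for one fixed superposition, given the distances (in Å) between aligned residue pairs. It is not the TM-align program: the real tool also searches for the alignment and superposition that maximize this value. Function names are hypothetical, and the sketch only accepts target lengths for which the distance scale d0 is positive.

```python
def d0_for_length(target_length):
    """Length-dependent distance scale: d0 = 1.24 * (L - 15)^(1/3) - 1.8."""
    if target_length <= 15:
        raise ValueError("target_length must exceed 15 in this sketch")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for a positive d0 in this sketch")
    return d0


def tm_score(distances, target_length):
    """TM-score of one superposition.

    distances: distances between aligned residue pairs after superposition.
    target_length: length of the protein used for normalization.
    """
    d0 = d0_for_length(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because the sum is divided by the target length rather than by the number of aligned pairs, unaligned residues lower the score, which is why a partial but perfect superposition scores below 1.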

VersionMAJ

TMAP

3.4.12013-10-25DownloadDoc
TMAP / Torrent Mapping Alignment Program - Alignment software for short and long nucleotide sequences produced by next-generation sequencing technologies.

Remarque
Run Unix # Run Web #

VersionMAJ

tmhmm

2.0c 2007-11-22DownloadDoc
tmhmm is one of the better prediction methods for transmembrane helices in proteins.

Remarque Run tmhmm ma_sequence.fasta; the result is written to standard output (not very verbose) and to a directory named TMHMM_ followed by the PID of the process that generated it.
Run Unix # tmhmm Run Web # http://www.cbs.dtu.dk/services/TMHMM/

VersionMAJ

tmmod

2009-02-23DownloadDoc
An Improved Hidden Markov Model for Transmembrane Protein Topology Prediction and Its Applications to Complete Genomes

Remarque
Run Unix # tmmodRun Web #

VersionMAJ

tophat

2.0.92013-07-10DownloadDoc
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.

Remarque
Run Unix # tophat -h Run Web #

VersionMAJ

tree-puzzle

5.22012-08-22DownloadDoc
TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. Branch lengths can be calculated with and without the molecular-clock assumption. In addition, TREE-PUZZLE offers likelihood mapping, a method to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number of statistical tests on the data set (chi-square test for homogeneity of base composition, likelihood ratio to test the clock hypothesis, one and two-sided Kishino-Hasegawa test, Shimodaira-Hasegawa test, Expected Likelihood Weights). The models of substitution provided by TREE-PUZZLE are GTR, TN, HKY, F84, SH for nucleotides, Dayhoff, JTT, mtREV24, BLOSUM 62, VT, WAG for amino acids, and F81 for two-state data. Rate heterogeneity is modeled by a discrete Gamma distribution and by allowing invariable sites. The corresponding parameters (except for GTR) can be inferred from the data set.

Remarque
Run Unix # puzzleRun Web #

VersionMAJ

TribeMCL

mcl-12-0682012-03-09DownloadDoc
TribeMCL is a method for clustering proteins into related groups, which are termed 'protein families'. This clustering is achieved by analysing similarity patterns between proteins in a given dataset, and using these patterns to assign proteins into related groups.

Remarque
Run Unix # mclRun Web #

VersionMAJ

Trimmomatic

0.322014-01-06DownloadDoc
Trimmomatic: A flexible read trimming tool for Illumina NGS data. Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters are supplied on the command line.

Remarque
Run Unix # trimmomaticRun Web #

VersionMAJ

Trinity

2.2.02016-07-01DownloadDoc
RNA-Seq De novo Assembly Using Trinity. Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.

Remarque
Run Unix # TrinityRun Web #

VersionMAJ

uchime

4.2.402013-05-27DownloadDoc
UCHIME is an algorithm for detecting chimeric sequences.

Remarque
Run Unix # uchime --input query.fasta [--db db.fasta] [--uchimeout results.uchime] [--uchimealns results.alns] Run Web #

VersionMAJ

uclust

1.2.22q2012-11-06DownloadDoc
UCLUST is a high-performance clustering, alignment and search algorithm that is capable of handling millions of sequences.

Remarque
Run Unix # uclust --sort seqs.fasta --output seqs_sorted.fasta Run Web #

VersionMAJ

usearch

8.0.15172015-03-10DownloadDoc
USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.

Remarque
Run Unix # usearch Run Web #

VersionMAJ

VarScan

4.2.32017-02-16DownloadDoc
Variant detection in massively parallel sequencing data.

Remarque
Run Unix # varscan [COMMAND] [OPTIONS] Run Web #

VersionMAJ

VAST

DownloadDoc
Program for comparing and aligning the 3D structures of proteins. VAST is based on a two-step procedure. The first step uses a simplified description of the proteins in which secondary structure elements are represented by vectors. The aim of this first step is to find the subset of vectors that superimpose best between the two structures. The significance of the result is assessed by computing the probability of observing this superimposition by chance alone. The second step returns to an atomic description of the 3D structures, describing the polypeptide chain by the positions of the CA atoms of each residue. The aim of this second step is to establish a one-to-one correspondence (alignment) between the CA atoms that play the same role in the two structures. The goal is the alignment with the most CA pairs and the lowest rmsd (root mean square deviation). To achieve this, the algorithm has to answer questions such as: which is better, an alignment of 100 CA pairs with an rmsd of 3 Å, or one of 60 CA pairs with an rmsd of 2 Å? This problem is resolved by choosing the alignment that has the lowest probability of being generated by chance.

Remarque
Run Unix # Run Web #

VersionMAJ

vcake

1.02008-07-30DownloadDoc
VCAKE is a genetic sequence assembler capable of assembling millions of small nucleotide reads even in the presence of sequencing error. This software is currently geared towards de novo assembly of Illumina's Solexa Sequencing data.

Remarque
Run Unix # perl -S vcake.plRun Web #

VersionMAJ

Vcflib

2017-03-10DownloadDoc
A C++ library for parsing and manipulating VCF files.

Remarque
Run Unix # Run Web #

VersionMAJ

vcftools

1.122015-07-13DownloadDoc
vcftools is a suite of functions for use on genetic variation data in the form of VCF and BCF files. The tools provided will be used mainly to summarize data, run calculations on data, filter out data, and convert data into other useful file formats.

Remarque
Run Unix # vcftools [ --vcf FILE | --gzvcf FILE | --bcf FILE] [ --out OUTPUT PREFIX ] [ FILTERING OPTIONS ] [ OUTPUT OPTIONS ] Run Web #

VersionMAJ

velvet

1.2.072013-08-07DownloadDoc
Sequence assembler for very short reads. Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.

Remarque
Run Unix # velveth # velvetgRun Web #

VersionMAJ

Vienna

1.8.42010-12-02DownloadDoc
The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures. RNA secondary structure prediction through energy minimization is the most used function in the package. We provide three kinds of dynamic programming algorithms for structure prediction: the minimum free energy algorithm of (Zuker & Stiegler 1981), which yields a single optimal structure; the partition function algorithm of (McCaskill 1990), which calculates base pair probabilities in the thermodynamic ensemble; and the suboptimal folding algorithm of (Wuchty et al. 1999), which generates all suboptimal structures within a given energy range of the optimal energy. For secondary structure comparison, the package contains several measures of distance (dissimilarities) using either string alignment or tree-editing (Shapiro & Zhang 1990). Finally, we provide an algorithm to design sequences with a predefined structure (inverse folding).

Remarque
Run Unix # Run Web #

VersionMAJ

vmatch

2.02007-10-29DownloadDoc
Vmatch replaces Reputer. It looks for all possible repeats in genomes, with a possibility to specify the kind of repeats to look for, such as identity percentage, minimal length, etc. It can also be used to mask repeats in sequences, to analyze repeat families, etc.

Remarque
Run Unix # Run Web #

VersionMAJ

weeder

1.4.22009-12-07DownloadDoc
Searches for new TFBSs in a set of FASTA sequences, for several motif lengths and limits on the number of mutations allowed. Reports only the motifs that pass the statistical filter, unlike MEME, which returns as many motifs as specified in the parameters. By default the genome statistics are based on a 1000 bp promoter, but statistics based on the whole intergenic sequence can be used instead.

Remarque Pavesi G, Mereghetti P, Mauri G, Pesole G. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 2004 32:W199-W203
Run Unix # weederlauncher.out inputfilename speciescode analysistype Run Web #

VersionMAJ

weederH

1.4.22009-12-07DownloadDoc
Searches for TFBSs and ECRs in homologous sequences. No alignment is required as input, and no PWM is needed beforehand. Measures relative conservation between the sequences by searching for conserved oligos and scoring the global similarity between two homologous sequences. Can also search for distal enhancers. Reportedly works on unannotated promoters (no known TSS).

Remarque Pavesi, G., Zambelli, F., Pesole, G. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences. BMC Bioinformatics 2007, 8:46
Run Unix # weederH.out -f inputfilename -O speciescode Run Web #

VersionMAJ

WolfPsort

0.22007-04-02DownloadDoc
WoLF PSORT predicts the subcellular localization sites of proteins based on their amino acid sequences.

Remarque
Run Unix # runWolfPsortSummary Run Web #

VersionMAJ

wombat

23/02/112011-03-10DownloadDoc
WOMBAT is a program to facilitate analyses fitting a linear, mixed model via restricted maximum likelihood (REML). It is assumed that traits analysed are continuous and have a multivariate normal distribution.

Remarque http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064953/ ~ http://didgeridoo.une.edu.au/km/download.php?file=hangzhou.pdf
Run Unix # wombatRun Web #

VersionMAJ

wu-blast

2.02005-02-14DownloadDoc
Washington University BLAST (WU BLAST) version 2.0 is a powerful software package for gene and protein identification, using sensitive, selective and rapid similarity searches of protein and nucleotide sequence databases.

Remarque
Run Unix # wu-blastallRun Web #

VersionMAJ

xdigitise

3.5.102002-04-19DownloadDoc
Evaluation of hybridization experiments

Remarque
Run Unix # xdigitiseRun Web #

VersionMAJ

Xgenovo

-2016-03-17DownloadDoc
Metagenomes present assembly challenges, when assembling multiple genomes from mixed reads of multiple species. An assembler for single genomes can’t adapt well when applied in this case. A metagenomic assembler, Genovo, is a de novo assembler for metagenomes under a generative probabilistic model. Genovo assembles all reads without discarding any reads in a preprocessing step, and is therefore able to extract more information from metagenomic data and, in principle, generate better assembly results. Paired end sequencing is currently widely-used yet Genovo was designed for 454 single end reads. In this research, we attempted to extend Genovo by incorporating paired-end information, named Xgenovo, so that it generates higher quality assemblies with paired end reads.

Remarque
Run Unix # assemble - finalize Run Web #

VersionMAJ

xgrail

1.3c2002-09-20DownloadDoc
GRAIL is a suite of tools designed to provide analysis and putative annotation of DNA sequences both interactively and through the use of automated computation.

Remarque
Run Unix # xgrailRun Web # http://genome.jouy.inra.fr/

VersionMAJ

xplor-nih

2.302012-02-08DownloadDoc
X-PLOR is a program system for computational structural biology. X-PLOR stands for exploration of conformational space of macromolecules restrained to regions allowed by combinations of empirical energy functions and experimental data. But it also stands for exploration of modern concepts of structured programming in macromolecular simulation.

Remarque
Run Unix # xplorRun Web #

VersionMAJ

yass

1.142010-03-16DownloadDoc
YASS is a tool for local similarity search in DNA sequences.

Remarque
Run Unix # yassRun Web #
