Version | MAJ | abyss | | |
1.5.2 | 2014-11-18 | Download | Doc |
ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads. The single-processor version is useful for assembling genomes up to 100 Mbases in size. The parallel version is implemented using MPI and is capable of assembling larger genomes.
Remarque Run Unix # Usage: ABYSS [OPTION]... FILE... | Run Web # |
ACNUC lets users select sequences by many criteria from these three databases, translate protein-coding genes into protein, and extract selected sequences into user files. ACNUC is very efficient in providing direct access to coding regions (e.g. protein coding regions, tRNA or rRNA coding regions) of DNA fragments present in GenBank.
Remarque Run Unix # acnuc, or xacnuc for the X11 version | Run Web # |
Agmial is a microbial genome annotation pipeline made up of two independent modules. The first handles protein sequences, the second nucleic acid sequences. Agmial upholds the principle that the human expert must be at the center of the annotation process. To help annotators with this complex and time-consuming task, the system is designed to automate the annotation process as far as possible and to provide user-friendly interfaces. It implements an annotation strategy. The system can work on unfinished (draft) sequences and supports collaborative annotation by teams of annotators. It is based on computing standards (web services, relational database management systems, Java, ...) and bioinformatics standards. The system is distributed under the GPL license. Agmial is currently used by several INRA laboratories to annotate or re-annotate genomes of agri-food interest.
Remarque
align and align0 compute a global alignment of two sequences.
Remarque Run Unix # align or align0 | Run Web # |
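As a rough illustration of what a global alignment of two sequences computes, here is a minimal Python sketch of a Needleman-Wunsch style score. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for this example; the entry above does not state which algorithm or scoring scheme align and align0 actually use.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best end-to-end alignment of a and b.

    Illustrative scoring only (assumed values, not those of align/align0).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only the score is returned here; recovering the alignment itself would need a traceback over the full matrix.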
ALLPATHS-LG is a de Bruijn graph-based de novo assembler for large (and
small) genomes. ALLPATHS-LG is being developed by scientists at the Broad
Institute.
Remarque
AMOS: A Modular Open-Source Assembler
Remarque
AnovArray quantifies biological factors and technical biases, and identifies genes that are differentially expressed between several experimental conditions (two or more), for transcriptomic experiments from macroarrays and microarrays within a balanced factorial experimental design and a full model. The package is developed in SAS (statistical software) and therefore benefits from all of that software's statistical procedures. The statistical methods in this package are analysis of variance (ANOVA) and FDR-type (False Discovery Rate) multiple testing.
Remarque Run Unix # Used within SAS | Run Web # |
Apollo is a genomic annotation viewer and editor. There are currently two branches of Apollo, one primarily used for genome browsing and maintained at Ensembl, and the other primarily used for genome annotation and maintained at the Berkeley Drosophila Genome Center. The latter is part of the GMOD project.
Remarque Run Unix # apollo | Run Web # |
Arachne is a tool for assembling genome sequences from whole genome shotgun reads, mostly in forward-reverse pairs obtained by sequencing clone ends.
Remarque
The ARB software is a graphically oriented package comprising various tools for sequence database handling and data analysis. A central database of processed (aligned) sequences and any type of additional data linked to the respective sequence entries is structured according to phylogeny or other user defined criteria.
Remarque
Version | MAJ | ART | | |
ChocolateCherryCake | 2015-04-30 | Download | Doc |
ART is a set of simulation tools to generate synthetic next-generation sequencing reads. ART simulates sequencing reads by mimicking the real sequencing process with empirical error models or quality profiles summarized from large recalibrated sequencing data. ART can also simulate reads using the user's own read error model or quality profiles. ART supports simulation of single-end and paired-end/mate-pair reads of three major commercial next-generation sequencing platforms: Illumina's Solexa, Roche's 454 and Applied Biosystems' SOLiD. ART can be used to test or benchmark a variety of methods or tools for next-generation sequencing data analysis, including read alignment, de novo assembly, and SNP and structural variation discovery. ART was used as a primary tool for the simulation study of the 1000 Genomes Project. ART is implemented in C++ with optimized algorithms and is highly efficient in read simulation. ART outputs reads in the FASTQ format, and alignments in the ALN format. ART can also generate alignments in the SAM alignment or UCSC BED file format.
Remarque Citation:
Weichun Huang, Leping Li, Jason R Myers, and Gabor T Marth. ART: a next-generation sequencing read simulator, Bioinformatics (2012) 28 (4): 593-594 Run Unix # README FILES in http://genome.jouy.inra.fr/doc/genome/NGS/ART | Run Web # |
Artemis is a free genome viewer and annotation tool that allows visualization of sequence features and the results of analyses within the context of the sequence, and its six-frame translation.
Remarque
Version | MAJ | Artemis Comparison Tool | | |
- | 2015-07-15 | Download | Doc |
ACT is a free tool for displaying pairwise comparisons between two or more
DNA sequences. It can be used to identify and analyse regions of similarity
and difference between genomes and to explore conservation of synteny, in the
context of the entire sequences and their annotation.
Remarque
Version | MAJ | asium | | |
2.21 | | Download | Doc |
Asium builds conceptual hierarchies (ontologies) from parsed text. It is associated with the LP2LP software, which converts Link Parser output into Asium input, and with software that converts its output to RDF.
Remarque
Version | MAJ | augustus | | |
2.7 | 2013-12-12 | Download | Doc |
AUGUSTUS is a program that predicts genes in eukaryotic genomic sequences. It can be run on this web server, on a new web server for larger input files or be downloaded and run locally. It is open source so you can compile it for your computing platform. You can now run AUGUSTUS on the German MediGRID. This enables you to submit larger sequence files and allows the use of protein homology information in the prediction. The MediGRID requires an instant easy registration by email for first-time users.
Remarque Run Unix # augustus [parameters] --species=SPECIES queryfilename | Run Web # |
AutoDock is a suite of automated docking tools. It is designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
Remarque Run Unix # autodock4 and autogrid4 | Run Web # |
AutoDock Vina is a new program for drug discovery, molecular docking and virtual screening, offering multi-core capability, high performance and enhanced accuracy and ease of use.
Remarque O. Trott, A. J. Olson, AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry (in press) Run Unix # vina --help | Run Web # |
BioArray Software Environment (BASE) is a database for managing the large amount of data generated by microarray analyses. BASE manages biological information, raw data and images. BASE also includes tools for data normalization, visualization and analysis.
Remarque
bcftools (Tools for variant calling and manipulating VCFs and BCFs)
Remarque Run Unix # bcftools | Run Web # |
A Java application/applet to display .scf traces and phred quality values.
Remarque Run Unix # bcm-trace-view -s { -q } | Run Web # |
The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM. Using BEDTools, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together. The following are examples of common questions that one can address with BEDTools.
Remarque Please cite the following article if you use BEDTools in your research:
Quinlan AR and Hall IM, 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26, 6, pp. 841–842.
The core approach is based on techniques from machine learning (classification) and natural language processing, as well as on a sociological method called GST (Graphe SocioTechnique, socio-technical graph), in order to build indicators of the evolution of innovation from the terminology used over time.
Remarque
BFAST: Blat-like Fast Accurate Search Tool. BFAST facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:
* Speed: enables billions of short reads to be mapped quickly.
* Accuracy: A priori probabilities for mapping reads with defined set of variants.
* An easy way to measurably tune accuracy at the expense of speed.
Remarque Run Unix # bfast [options] | Run Web # |
Version | MAJ | bioprospector | | |
2004 | 2014-01-01 | Download | Doc |
Program for finding exceptional one-block or two-block motifs (Gibbs sampler) in DNA sequences. Background sequences can be supplied. Input sequences must be shorter than 32765 nt, in FASTA format with the sequence on a single line (and a header of the form >sequence1 genename ). Can search specifically for palindromes.
Remarque Run Unix # BioProspector | Run Web # |
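The input constraints stated for BioProspector (one header line, the sequence on a single line, fewer than 32765 nt) can be checked before a run. The Python sketch below is a hypothetical helper written for this catalog entry, not part of BioProspector; the function name and the wording of the messages are assumptions.

```python
def check_single_line_fasta(text, max_len=32765):
    """Return a list of problems found in single-line FASTA text.

    Hypothetical pre-flight check: every '>' header must be followed by
    exactly one sequence line that is shorter than max_len.
    """
    problems = []
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    expect_header = True
    name = None
    for ln in lines:
        if ln.startswith(">"):
            if not expect_header:
                problems.append(f"{name}: header without a sequence line")
            fields = ln[1:].split()
            name = fields[0] if fields else "(unnamed)"
            expect_header = False
        else:
            if expect_header:
                # A second sequence line, or a sequence before any header.
                problems.append(
                    f"{name or '(start)'}: sequence spans several lines or lacks a header"
                )
            elif len(ln) >= max_len:
                problems.append(f"{name}: {len(ln)} nt, must be under {max_len}")
            expect_header = True
    if not expect_header:
        problems.append(f"{name}: header without a sequence line")
    return problems


if __name__ == "__main__":
    print(check_single_line_fasta(">sequence1 geneA\nACGT\n>sequence2 geneB\nAC\nGT\n"))
```

An empty list means the text satisfies the stated constraints; anything else names the offending record.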
Bismark is a program to map bisulfite treated sequencing reads to a genome of interest and perform methylation calls in a single step. The output can be easily imported into a genome viewer, such as SeqMonk, and enables a researcher to analyse the methylation levels of their samples straight away.
Remarque Run Unix # bismark [options] {-1 -2 | } | Run Web # |
Remarque
The Basic Local Alignment Search Tool (BLAST) is the most widely used sequence similarity tool. There are versions of BLAST that compare protein queries to protein databases, nucleotide queries to nucleotide databases, as well as versions that translate nucleotide queries or databases in all six frames and compare to protein databases or queries. PSI-BLAST produces a position-specific-scoring-matrix (PSSM) starting with a protein query, and then uses that PSSM to perform further searches. It is also possible to compare a protein or nucleotide query to a database of PSSM’s. The NCBI supports a BLAST web page at blast.ncbi.nlm.nih.gov as well as a network service. The NCBI also distributes stand-alone BLAST applications for users who wish to run BLAST on their own machines or with their own databases. This document describes the stand-alone BLAST applications and will concentrate on the latest generation of such applications included in the BLAST+ package.
Remarque Run Unix # /usr/local/genome/ncbi-blast-2.2.31+/bin/ | Run Web # |
BLAT is a DNA/Protein Sequence Analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 22 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. In practice DNA BLAT works well on primates, and protein blat on land vertebrates.
Remarque
BMGE (Block Mapping and Gathering with Entropy) is a program that selects regions in a
multiple sequence alignment that are suited for phylogenetic inference. BMGE selects
characters that are biologically relevant, thanks to the use of standard similarity matrices
such as PAM or BLOSUM. Moreover, BMGE provides other character- or sequence-removal
operations, such as stationary-based character trimming (that provides a subset of
compositionally homogeneous characters) or removal of sequences containing too large
a proportion of gaps. Finally, BMGE can simply be used to perform standard conversion
operations among DNA-, codon-, RY- and amino acid-coding sequences.
Remarque Run Unix # BMGE or BMGE -? | Run Web # |
Version | MAJ | bowtie | | |
1.1.2 | 2016-07-24 | Download | Doc |
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
Remarque Run Unix # bowtie [options]* {-1 -2 | --12 | } [] | Run Web # |
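To give an idea of what a Burrows-Wheeler index is, the following Python sketch builds the Burrows-Wheeler transform of a short text naively (by sorting all rotations) and counts exact occurrences of a pattern by backward search. It is a teaching sketch under simplifying assumptions: the function names are invented here, the construction is far too slow for a genome, and it is not how Bowtie builds or queries its index.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, using '$' as the end marker."""
    text += "$"  # '$' sorts before the letters A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bw, pattern):
    """Count exact matches of pattern in the original text via backward search."""
    # first_row[c]: index of the first sorted rotation that starts with c.
    first_row = {}
    for i, c in enumerate(sorted(bw)):
        first_row.setdefault(c, i)

    def occ(c, i):
        # Occurrences of c in bw[:i]; real indexes precompute this.
        return bw[:i].count(c)

    lo, hi = 0, len(bw)
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + occ(c, lo)
        hi = first_row[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACATTAC")
    print(index, count_occurrences(index, "TTAC"))
```

The transform keeps the text recoverable while grouping similar contexts together, which is what lets an index of this kind stay compact.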
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
Remarque Run Unix # bowtie2 [options]* -x {-1 -2 | -U } [-S ] | Run Web # |
BreakDancerMax predicts five types of structural variants: insertions, deletions, inversions, inter- and intra-chromosomal translocations from next-generation short paired-end sequencing reads using read pairs that are mapped with unexpected separation distances or orientation.
Remarque Run Unix # Usage: breakdancer-max | Run Web # |
Version | MAJ | bsmap | | |
2.90 | 2015-03-17 | Download | Doc |
BSMAP is short-read mapping software for bisulfite sequencing reads. Bisulfite treatment converts unmethylated cytosines into uracils (sequenced as thymine) and leaves methylated cytosines unchanged, and hence provides a way to study DNA cytosine methylation at single-nucleotide resolution. BSMAP aligns the Ts in the reads to both Cs and Ts in the reference.
Remarque Citation: Xi Y, Li W: BSMAP: whole genome Bisulfite Sequence MAPping program. BMC Bioinformatics (2009) 10:232. Run Unix # bsmap | Run Web # |
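The conversion rule and the asymmetric matching described for BSMAP can be sketched in a few lines of Python. This is an illustration of the principle only, assuming forward-strand reads of the same length as the reference window; the function names are hypothetical and this is not BSMAP's implementation.

```python
def bisulfite_convert(seq, methylated=()):
    """Simulate bisulfite treatment: unmethylated C is read as T.

    `methylated` lists 0-based positions of cytosines that stay unchanged.
    """
    protected = set(methylated)
    return "".join(
        "T" if base == "C" and i not in protected else base
        for i, base in enumerate(seq)
    )


def bisulfite_match(read, ref):
    """True if read matches ref when a T in the read may pair with C or T."""
    if len(read) != len(ref):
        return False
    return all(r == g or (r == "T" and g == "C") for r, g in zip(read, ref))


if __name__ == "__main__":
    reference = "ACGTCC"
    read = bisulfite_convert(reference, methylated=[1])
    print(read, bisulfite_match(read, reference))
```

The match is deliberately one-directional: a C in the read against a T in the reference is still a mismatch, since bisulfite treatment never turns T into C.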
Version | MAJ | buster | | |
2.10.3 <2016-12-07> | 2016-12-08 | Download | Doc |
BUSTER structure refinement package. Includes the refine program for running BUSTER refinement and loads of useful utilities.
Remarque How to cite use of BUSTER :
https://www.globalphasing.com/buster/wiki/index.cgi?BusterCite
BWA is a fast, lightweight tool that aligns short sequences to a sequence database, such as the human reference genome. By default, BWA finds an alignment within edit distance 2 to the query sequence, except for disallowing gaps close to the end of the query. It can also be tuned to find a fraction of longer gaps at the cost of speed and of more false alignments.
Remarque Run Unix # bwa [options] | Run Web # |
Version | MAJ | CaliFlopp | | |
3.0 | 2010-08-03 | Download | Doc |
CaliFloPP is software that calculates flows of particles between pairs of polygons, when given a so-called individual dispersal function. The individual dispersal function describes the particle dispersion between pairs of points, and CaliFloPP deduces the total flows between pairs of polygons.
Remarque Run Unix # califlopp -i polygons-filename [-p parameters-filename] [-r result-filename] | Run Web # |
Canu is a fork of the Celera Assembler, designed for high-noise
single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore
MinION).
Remarque Citation: Koren S, Walenz BP, Berlin K, Miller JR, Phillippy AM. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. bioRxiv. (2016).
Run Unix # canu | Run Web # |
Similar to phrap, CAP3 takes individual sequences and assembles them into sequences.
Remarque
CarthaGène is genetic/radiation hybrid mapping software. CarthaGène looks for multiple-population maximum likelihood consensus maps using a fast EM algorithm for maximum likelihood estimation and powerful ordering algorithms.
Remarque Run Unix # carthagene | Run Web # |
CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing
studies
Remarque If you are going to use CATCh, please cite it with the included software (Mothur, WEKA, RDP MultiClassifier 1.1 and DECIPHER):
* Mysara M., Saeys Y., Leys N., Raes J., Monsieurs P. 2014. CATCh: an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Under review.
* Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and environmental microbiology 75:7537-41.
* Hall M, National H, Frank E, Holmes G, Pfahringer B, Reutemann P, et al. (2009). The WEKA Data Mining Software: An Update. SIGKDD Explorations 11:10-18.
* Wang Q, Garrity GM, Tiedje JM, Cole JR (2007), Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 09/2007; 73(16):5261-7.
* ES Wright et al. (2012), DECIPHER, A Search-Based Approach to Chimera Identification for 16S rRNA Sequences. Applied and Environmental Microbiology, doi:10.1128/AEM.06516-11.
Run Unix # CATCh.run | Run Web # |
CCP4 exists to produce and support a world-leading, integrated suite of programs that allows researchers to determine macromolecular structures by X-ray crystallography, and other biophysical techniques. CCP4 aims to develop and support the development of cutting edge approaches to experimental determination and analysis of protein structure, and integrate these approaches into the suite. CCP4 is a community based resource that supports the widest possible researcher community, embracing academic, not for profit, and for profit research. CCP4 aims to play a key role in the education and training of scientists in experimental structural biology. It encourages the wide dissemination of new ideas, techniques and practice.
Remarque Run Unix # ccp4i | Run Web # |
CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a fasta format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output.
Remarque Usage example: cd-hit -n 5 -i /db/fasta/nr90/nr90.fsa -o nr80 -M 2048 -c 0.8 -u clstr.lastweek Run Unix # cd-hit [Options] | Run Web # |
Version | MAJ | cd-hit-454 | | |
- | 2013-08-05 | Download | Doc |
454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cdhit-454 is a fast program to identify exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cdhit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.
Remarque Run Unix # cd-hit-454 | Run Web # 4.6.1 |
Version | MAJ | Celera Assembler (wgs) | | |
5.4 | 2009-10-29 | Download | Doc |
Celera Assembler is scientific software for DNA research. It can reconstruct long sequences of genomic DNA from the fragmentary data produced by whole-genome shotgun sequencing. The Celera Assembler is mature, efficient, open-source software written mostly in C for unix operating systems.
Remarque This whole-genome shotgun (WGS) assembler software suite, also known as Celera Assembler, implements sophisticated algorithms for the reconstruction of genomic DNA sequence from data produced by a WGS sequencing experiment.
CENSOR is a software tool which screens query sequences against a reference collection of repeats and "censors" (masks) homologous portions with masking symbols, as well as generating a report classifying all found repeats.
Remarque Run Unix # censor | Run Web # |
CGView is a Java package for generating high quality, zoomable maps of circular genomes. Its primary purpose is to serve as a component of sequence annotation pipelines, as a means of generating visual output suitable for the web. Feature information and rendering options are supplied to the program using an XML file, a tab delimited file, or an NCBI ptt file. CGView converts the input into a graphical map (PNG, JPG, or Scalable Vector Graphics format), complete with labels, a title, legends, and footnotes. In addition to the default full view map, the program can generate a series of hyperlinked maps showing expanded views. The linked maps can be explored using any web browser, allowing rapid genome browsing, and facilitating data sharing. The feature labels in maps can be hyperlinked to external resources, allowing CGView maps to be integrated with existing web site content or databases. For examples of the various output types, see the CGView gallery.
Remarque Run Unix # cgview | Run Web # |
Circos is a software package for visualizing data and information. It
visualizes data in a circular layout — this makes Circos ideal for exploring
relationships between objects or positions. There are other reasons why a
circular layout is advantageous, not the least being the fact that it is
attractive.
Remarque Run Unix # circos | Run Web # |
Version | MAJ | class2g | | |
1.0 | 2006-04-04 | Download | Doc |
Class2G classifies genes into two groups using a mixture model. Its main features are that each gene assignment is associated with a probability, and that the analysis of a macroarray does not depend on a reference. Class2G is integrated into the BASE system (BioArray Software Environment) through a Perl plug-in, and is developed in the R statistical environment. BASE provides a user-friendly web interface and a single environment for data storage and analysis. Class2G was used to detect genes present in and absent from E. faecalis as part of the analysis of about thirty macroarrays (P. Serror - INRA Jouy-en-Josas - UBLO).
Remarque
A Sequence Viewer for basic bioinformatics. CLC Sequence Viewer creates a software environment enabling users to make a large number of bioinformatics analyses, combined with smooth data management, and excellent graphical viewing and output options.
Remarque Run Unix # clcseqview6 | Run Web # |
Clustal Omega is the latest addition to the Clustal family. It offers a significant increase in scalability over previous versions, allowing hundreds of thousands of sequences to be aligned in only a few hours. It will also make use of multiple processors, where present. In addition, the quality of alignments is superior to previous versions, as measured by a range of popular benchmarks.
Remarque Citing Clustal:
Sievers F, Wilm A, Dineen DG, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7. Run Unix # clustalo --help | Run Web # |
Multiple sequence alignment program. It provides an integrated environment for performing multiple sequence and profile alignments and analysing the results.
Remarque
Version | MAJ | cluster-3.0 | | |
3.0 | 2013-05-24 | Download | Doc |
The open source clustering software available here implements the most commonly used clustering methods for gene expression data analysis. The clustering methods can be used in several ways. Cluster 3.0 provides a Graphical User Interface to access the clustering routines. It is available for Windows, Mac OS X, and Linux/Unix. Python users can access the clustering routines by using Pycluster, which is an extension module to Python. People who want to make use of the clustering algorithms in their own C, C++, or Fortran programs can download the source code of the C Clustering Library.
Remarque Run Unix # cluster | Run Web # |
Version | MAJ | CNVnator | | |
0.3 | 2015-02-13 | Download | Doc |
CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing.
Remarque Run Unix # cnvnator | Run Web # |
Version | MAJ | COLONY | | |
2.0.6.3 | 2017-05-02 | Download | Doc |
COLONY is a Fortran program written by Jinliang Wang. It implements a maximum likelihood method to assign sibship and parentage jointly, using individual multilocus genotypes at a number of codominant or dominant marker loci.
Remarque
Concaterpillar is a hierarchical likelihood-ratio test for phylogenetic congruence.
Remarque If you use Concaterpillar for a publication please cite:
Leigh JW, Susko E, Baumgartner M, Roger AJ. Testing congruence in phylogenomic analysis. Syst Biol. 2008 Feb; 57(1): 104-15. Run Unix # concaterpillar.py | Run Web # |
Consed/Autofinish is a tool for viewing, editing, and finishing sequence assemblies created with phrap. Finishing capabilities include allowing the user to pick primers and templates, suggesting additional sequencing reactions to perform, and facilitating checking the accuracy of the assembly using digest and forward/reverse pair information.
Remarque See also autofinish (http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11282977s) Run Unix # consed | Run Web # |
CONSEL is a program package consisting of small programs written in C. It calculates the probability value (i.e., p-value) to assess the confidence in the selection problem. Although CONSEL is applicable to any selection problem, it is mainly designed for phylogenetic tree selection. CONSEL does not estimate the phylogenetic tree by itself, but reads the output of other phylogenetic packages, such as Molphy, PAML, PAUP*, TREE-PUZZLE, and PhyML. CONSEL calculates the p-value using several testing procedures: the bootstrap probability, the Kishino-Hasegawa test, the Shimodaira-Hasegawa test, and the weighted Shimodaira-Hasegawa test. In addition to these conventional tests, CONSEL calculates the p-value based on the approximately unbiased test using the multi-scale bootstrap technique. This newly developed method gives less biased results than the conventional methods.
Remarque
Coot is for macromolecular model building, model completion and validation, particularly suitable for protein modelling using X-ray data.
Coot displays maps and models and allows model manipulations such as idealization, real space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers, Ramachandran plots, skeletonization, non-crystallographic symmetry and more.
Remarque Citing Coot and Friends
If you have found this software to be useful, you are requested (if appropriate) to cite:
"Features and Development of Coot" P Emsley, B Lohkamp, W Scott, and
K Cowtan Acta Cryst. (2010). D66, 486-501 Acta Crystallographica Section
D-Biological Crystallography 66: 486-501
Version | MAJ | CopyRighter | | |
0.46 | 2015-12-21 | Download | Doc |
Parses microbial profiles and, because gene copy number (GCN) estimates are
pre-computed for all taxa in the reference taxonomy, rapidly corrects GCN
bias. The CopyRighter bioinformatic tool permits rapid correction of GCN in
microbial surveys, resulting in improved estimates of microbial abundance,
alpha and beta diversity.
Remarque
The SOLiD System Analysis Pipeline Tool (Corona Lite) is an off-instrument SOLiD data analysis software package. It supports functionality for mapping color space reads to large or small genomes, pairing for mate-pair runs, SNP calling and generating consensus sequences.
Remarque
Version | MAJ | count_base | | |
cc 30/06/2004 | 2004-06-16 | Download | Doc |
Program to count the A, T, G and C bases in a sequence.
Remarque Run Unix # count_base.sh | Run Web # |
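For readers without access to count_base.sh, the same kind of base count can be sketched in Python. This is an independent illustration, not the script itself; the function name, and the choices to ignore case and to skip characters other than A, T, G and C, are assumptions.

```python
from collections import Counter


def count_bases(sequence):
    """Count A, T, G and C in a sequence, ignoring case and other symbols."""
    counts = Counter(sequence.upper())
    return {base: counts.get(base, 0) for base in "ATGC"}


if __name__ == "__main__":
    print(count_bases("atgcGGn"))
```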
Version | MAJ | count_codon | | |
none | 2004-08-01 | Download | Doc |
Remarque Run Unix # count_codon.pl | Run Web # |
Cross_Match uses the same algorithm as Swat but also allows the comparison of a pair of sequences to be constrained to bands of the Smith-Waterman matrix that surround one or more matching words in the sequences. This substantially increases speed for large-scale nucleotide sequence comparisons without compromising sensitivity.
Remarque Run Unix # cross_match | Run Web # |
Cufflinks assembles transcripts and estimates their abundances in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.
Remarque Run Unix # cufflinks [options]* | Run Web # |
Version | MAJ | cutadapt | | |
1.7.1 | 2015-03-11 | Download | Doc |
cutadapt is used to remove adapter sequences from high-throughput sequencing data. This is usually necessary when the read length of the sequencing machine is longer than the molecule that is sequenced, for example when sequencing microRNAs.
Remarque Run Unix # cutadapt [options] [] | Run Web # |
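The idea of removing an adapter from the 3' end of a read can be illustrated with a small Python sketch. It uses exact string matching only and an assumed minimum overlap of 3 bases for adapters cut off by the end of the read; the function name and these parameters are hypothetical, and no claim is made here about the matching algorithm cutadapt itself applies.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter (and anything after it) using exact matching."""
    # Case 1: the whole adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends partway through the adapter, so only a
    # prefix of the adapter is present at the read's end.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter sequence, chosen for illustration
    print(trim_3prime_adapter("ACGTACGTAGATCGGAAG", adapter))
```

Requiring a minimum overlap avoids trimming reads whose last base or two happen to match the start of the adapter by chance.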
Cytoscape is an open source bioinformatics software platform for visualizing molecular interaction networks and biological pathways and integrating these networks with annotations, gene expression profiles and other state data. Although Cytoscape was originally designed for biological research, now it is a general platform for complex network analysis and visualization. Cytoscape core distribution provides a basic set of features for data integration and visualization.
Remarque Run Unix # cytoscape | Run Web # |
Diffusion Approximation for Demographic Inference
∂a∂i implements methods for demographic history and selection inference from genetic data, based on diffusion approximations to the allele frequency spectrum. One of ∂a∂i's main benefits is speed: fitting a two-population model typically takes around 10 minutes, and run time is independent of the number of SNPs in your data set. ∂a∂i is also flexible, handling up to three simultaneous populations, with arbitrary timecourses for population size and migration, plus the possibility of admixture and population-specific selection.
Remarque If you use ∂a∂i in your research, please cite RN Gutenkunst, RD Hernandez, SH Williamson, CD Bustamante "Inferring the joint demographic history of multiple populations from multidimensional SNP data" PLoS Genetics 5:e1000695 (2009).
Version | MAJ | debarcer | | |
0.3.1 | 2017-03-21 | Download | Doc |
Debarcer (De-Barcoding and Error Correction) is a package for working with
next-gen sequencing data that contains molecular barcodes.
As it stands, it supports targeted sequencing libraries generated by
SimSenSeq, a method of creating multiplexed barcoded sequencing libraries
using PCR.
Remarque Run Unix # runDebarcer.sh -u | Run Web # |
Version | MAJ | delly | | |
0.6.3 | 2015-02-25 | Download | Doc |
DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.
Remarque Citation
Tobias Rausch, Thomas Zichner, Andreas Schlattl, Adrian M. Stuetz, Vladimir Benes, Jan O. Korbel.
Delly: structural variant discovery by integrated paired-end and split-read analysis.
Bioinformatics 2012 28: i333-i339. Run Unix # Usage: delly [OPTIONS] ... | Run Web # |
SARTools is an R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis.
Remarque
DIALIGN is a software program for multiple alignment developed by Burkhard Morgenstern et al. While standard alignment methods rely on comparing single residues and imposing gap penalties, DIALIGN constructs pairwise and multiple alignments by comparing whole segments of the sequences. No gap penalty is used. This approach is especially efficient where sequences are not globally related but share only local similarities, as is the case with genomic DNA and with many protein families.
Remarque Run Unix # dialign | Run Web # |
DIAMOND is a new high-throughput program for aligning a file of short reads against a protein reference database such as NR, at 20,000 times the speed of BLASTX, with high sensitivity.
DIAMOND is a new alignment tool for aligning short DNA sequencing reads to a protein reference database such as NCBI-NR. On Illumina reads of length 100-150bp, in fast mode, DIAMOND is about 20,000 times faster than BLASTX, while reporting about 80-90% of all matches that BLASTX finds, with an e-value of at most 1e-5. In sensitive mode, DIAMOND is about 2,500 times faster than BLASTX, finding more than 94% of all matches.
Remarque Run Unix # diamond COMMAND [OPTIONS] | Run Web # |
Version | MAJ | DisplayMUM | | |
1.05 | 2005-06-30 | Download | Doc |
Remarque Run Unix # displaymums | Run Web # |
Simulation of stochastic systems.
Remarque An article describing Dizzy has been published: Ramsey S., Orrell D. and Bolouri H. Dizzy: stochastic simulation of large-scale genetic regulatory networks. J. Bioinf. Comp. Biol. 3(2) 415-436, 2005. Run Unix # Dizzy | Run Web # |
Version | MAJ | DOMIRE | | |
- | 2014-01-20 | Download | Doc |
DOMIRE (DOMain Identification from REcurrence) is a server that uses VAST (Vector Alignment Search Tool, protein 3D structure comparison) to define the domain boundaries in proteins from their 3D structures (Tai et al., 2010). It also provides a list of structural neighbours.
Remarque
DOTUR is a program that takes as input a matrix describing the genetic distances between DNA sequences and assigns the sequences to operational taxonomic units (OTUs). DOTUR uses the composition of the OTUs to compute rarefaction and collector's curves in order to assess the sampling intensity, richness and diversity of the sample.
Remarque Run Unix # dotur | Run Web # |
DNA Sequence Reads Compression is an application designed for compression of data files containing reads from DNA sequencing in FASTQ format. Such files can be huge, e.g., a few (or tens of) gigabytes, so the need for a robust data compression tool is clear. Universal compression programs such as gzip or bzip2 are usually used for this purpose, but a specialized tool can work better.
Remarque
DSSP defines the secondary structures in proteins from PDB files.
Remarque
Version | MAJ | dwgsim | | |
0.1.10 | 2013-08-02 | Download | Doc |
Whole genome simulation can be performed with dwgsim. dwgsim is based off of wgsim found in SAMtools written by Heng Li. It was modified to handle ABI SOLiD data, as well as various assumptions about aligners and positions of indels. The documentation below is for the latest dwgsim (not DNAA) release.
Remarque Run Unix # dwgsim [options] | Run Web # |
EDGE-pro, Estimated Degree of Gene Expression in PROkaryots is an efficient software system to estimate gene expression levels in prokaryotic genomes from RNA-seq data. EDGE-pro uses Bowtie2 for alignment and then estimates expression directly from the alignment results.
EDGE-pro includes routines to assign reads aligning to overlapping gene regions accurately. 15% or more of bacterial genes overlap other genes, making this a significant problem for bacterial RNA-seq, one that is generally ignored by programs designed for eukaryotic RNA-seq experiments.
Remarque Please reference our paper:
T. Magoc, D. Wood, and S.L. Salzberg. EDGE-pro: Estimated Degree of Gene Expression in Prokaryotic Genomes. Evolutionary Bioinformatics vol.9, pp.127-136, 2013. Run Unix # edge.pl <-g genome> <-p ptt> <-r rnt> <-u reads> | Run Web # |
SARTools is an R package dedicated to the differential analysis of RNA-seq data. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with one of the well-known DESeq2 or edgeR packages and to export the results into easily readable tab-delimited files. It also facilitates the generation of an HTML report which displays all the figures produced, explains the statistical methods and gives the results of the differential analysis.
Remarque
ELPH is a general-purpose Gibbs sampler for finding motifs in a set of DNA or protein sequences. The program takes as input a set containing anywhere from a few dozen to thousands of sequences, and searches through them for the most common motif, assuming that each sequence contains one copy of the motif. We have used ELPH to find patterns such as ribosome binding sites (RBSs) and exon splicing enhancers (ESEs).
Remarque Run Unix # elph [options] OR
elph [-t ]
| Run Web # |
Version | MAJ | eLSA | | |
81a2ee0 | 2017-02-01 | Download | Doc |
The Extended Local Similarity Analysis (ELSA) tools subsequently F-transform and normalize the raw data (matrices of time series) and then calculate the Local Similarity (LS) Scores and/or Local Trend Scores. The tools then assess the statistical significance (P-values) of these correlation statistics using either permutation test or theoretical p-value approximation and filter out insignificant results. Finally, the tools construct a partially directed association network from significant associations.
Remarque Run Unix # eLSA_env | Run Web # |
Within EMBOSS you will find around 100 programs (applications). These are just some of the areas covered: sequence alignment; rapid database searching with sequence patterns; protein motif identification, including domain analysis; nucleotide sequence pattern analysis, for example to identify CpG islands or repeats; codon usage analysis for small genomes; rapid identification of sequence patterns in large-scale sequence sets; presentation tools for publication...
Remarque
Program for predicting the conformation of loops in proteins.
Remarque
ESPript, Easy Sequencing in Postscript, is a utility to generate a pretty PostScript output from aligned sequences.
Remarque Run Unix # ESPript | Run Web # |
ESPRIT is a pipeline for estimating species richness using large collections of 16S rRNA pyrosequences.
Remarque Run Unix # esprit_pc | Run Web # |
A set of sequence comparison tools (fasta36, ggsearch...) used for alignment and database searching. For example, fasta compares a protein sequence to another protein sequence or to a protein database, or a DNA sequence to another DNA sequence or a DNA library.
Remarque Run Unix # fasta36 | Run Web # |
fastPHASE: software for haplotype reconstruction, and estimating missing genotypes from population data
The program fastPHASE implements methods described in
Scheet, P and Stephens, M (2006). A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet
fastPHASE can handle larger data-sets than PHASE (e.g., hundreds of thousands of markers in thousands of individuals), but does not provide estimates of recombination rates. Our experiments suggest that haplotype estimates are slightly less accurate than from PHASE, but missing genotype estimates appear to be similar or even slightly better than PHASE.
Remarque Run Unix # fastPHASE [options] | Run Web # |
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis.
Remarque Run Unix # fastqc or fastqc seqfile1 seqfile2 .. seqfileN | Run Web # |
Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.
Remarque Run Unix # fastqp [-h] | Run Web # |
Version | MAJ | Fastq_Screen | | |
0.4.4 | 2014-07-09 | Download | Doc |
Fastq screen is a simple application which allows you to
search a large sequence dataset against a panel of different
databases to build up a picture of where the sequences
in your data originate. It was built as a QC check for
sequencing pipelines but may also have uses in metagenomics
studies where mixed samples are expected.
Although the program wasn't built with any particular
technology in mind it is probably only really suitable for
processing short reads due to the use of bowtie/bowtie2 as
the searching application.
The program generates both text and graphical output to
tell you what proportion of your library was able to map,
either uniquely or in more than one location, against each
of the databases in your search set.
Remarque Run Unix # fastq_screen [OPTION]... [FastQ FILE]... | Run Web # |
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
Next-Generation sequencing machines usually produce FASTA or FASTQ files, containing multiple short-reads sequences (possibly with quality information).
Remarque
Version | MAJ | FigTree | | |
1.4.0 | 2013-11-15 | Download | Doc |
FigTree is designed as a graphical viewer of phylogenetic trees and as a program for producing publication-ready figures. As with most of my programs, it was written for my own needs so may not be as polished and feature-complete as a commercial program. In particular it is designed to display summarized and annotated trees produced by BEAST.
Remarque Run Unix # figtree | Run Web # |
Version | MAJ | Filter pileup | | |
1.0.2 | | Download | Doc |
Allows one to find sequence variants and/or sites covered by a specified number of reads with bases above a set quality threshold. The tool works on six and ten column pileup formats produced with samtools pileup command. However, it also allows you to specify columns in the input file manually.
Remarque
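The filtering idea described for Filter pileup can be sketched in a few lines of Python. This is a minimal illustration, not the tool's own code: it assumes the six-column samtools pileup layout (chromosome, position, reference base, depth, read bases, base qualities) and Phred+33 quality encoding, and the function names and default thresholds are hypothetical.

```python
def bases_above_quality(qualities, min_quality, offset=33):
    """Count quality characters whose Phred score is at least min_quality.

    The Phred+33 offset is an assumption about the encoding of the input.
    """
    return sum(1 for ch in qualities if ord(ch) - offset >= min_quality)


def filter_pileup(lines, min_quality=20, min_coverage=3):
    """Yield (chrom, pos, ref, good_bases) for sufficiently covered sites.

    Each line is expected to hold six tab-separated pileup columns;
    shorter lines are skipped.
    """
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        chrom, pos, ref, _depth, _bases, quals = cols[:6]
        good = bases_above_quality(quals, min_quality)
        if good >= min_coverage:
            yield chrom, int(pos), ref, good
```

A site is kept only when enough of its bases pass the quality threshold, which is the coverage criterion named in the description; variant calling itself is out of scope for this sketch.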
FinchTV (Finch Trace Viewer), a cross-platform graphical viewer for chromatogram files.
Remarque Run Unix # finchtv | Run Web # |
Version | MAJ | findtarget | | |
none | 2004-05-15 | Download | Doc |
Findtarget is a comparative genomics tool for targeting genes of interest in a micro-organism whose genome has been sequenced. It uses data produced by BLAST.
Remarque
Version | MAJ | FLASH | | |
1.2.11 | 2014-11-13 | Download | Doc |
FLASH, Fast Length Adjustment of SHort reads, is a very accurate fast tool
to merge paired-end reads from fragments that are shorter than twice the
length of reads. The extended length of reads has a significant positive
impact on improvement of genome assemblies.
Remarque Run Unix # flash [OPTIONS] MATES_1.FASTQ MATES_2.FASTQ
Run `flash --help | less' for more information. | Run Web # |
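To make the idea of merging overlapping mates concrete, here is a minimal Python sketch. It is not FLASH's actual algorithm or scoring scheme: it simply tries every overlap length between read 1 and the reverse complement of read 2 and keeps the one with the lowest mismatch ratio. The function names, `min_overlap` and `max_mismatch_ratio` are hypothetical, and base qualities are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def merge_pair(read1, read2, min_overlap=10, max_mismatch_ratio=0.25):
    """Merge a read pair into one extended read, or return None.

    read2 is given as sequenced (opposite strand), so it is reverse
    complemented before looking for an overlap with the end of read1.
    """
    mate = reverse_complement(read2)
    best = None  # (mismatch ratio, overlap length)
    longest = min(len(read1), len(mate))
    for olen in range(longest, min_overlap - 1, -1):
        tail, head = read1[-olen:], mate[:olen]
        mismatches = sum(1 for a, b in zip(tail, head) if a != b)
        ratio = mismatches / olen
        if ratio <= max_mismatch_ratio and (best is None or ratio < best[0]):
            best = (ratio, olen)
    if best is None:
        return None
    # Keep read1 whole and append the part of the mate beyond the overlap.
    return read1 + mate[best[1]:]
```

When the fragment is shorter than twice the read length the two mates overlap, so the merged read recovers the full fragment; pairs with no acceptable overlap are returned as `None` and would be left unmerged.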
Version | MAJ | flux-simulator | | |
1.2.1 | 2013-07-15 | Download | Doc |
The Flux Simulator aims at modeling RNA-Seq experiments in silico: sequencing reads are produced from a reference genome according to annotated transcripts. The simulation pipeline models different steps as modules, each with a minimal set of parameters that can be estimated from experimental parameters. The first step is, in fact, a transcriptome simulator. Subsequently, common sources of systematic bias in the abundance and distribution of produced reads are simulated by in silico library preparation and sequencing.
Remarque Run Unix # flux-simulator --help | Run Web # |
Version | MAJ | fmtseq | | |
1.2.2 | 2004-01-21 | Download | Doc |
Sequence format conversion. A reimplementation and extension of the Readseq program (conversion to and from the Clustalw format, and reporting of the input format).
Remarque Part of the seqio-1.2.2 package. Run Unix # fmtseq | Run Web # |
FPC (fingerprinted contigs) is an interactive program for building contigs from fingerprinted clones, where the fingerprint for a clone is a set of restriction fragments.
Remarque
Version | MAJ | freebayes | | |
v1.1.0-1-gf15e66e | 2017-02-16 | Download | Doc |
FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
Remarque Citing freebayes:
Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN] 2012
Run Unix # freebayes -f [REFERENCE] [OPTIONS] [BAM FILES] >[OUTPUT]
| Run Web # |
Find Rapidly OTUs with Galaxy Solution: FROGS is a galaxy/CLI workflow designed to produce an OTU count matrix from high depth sequencing amplicon data. This workflow is focused on: - User-friendliness with the integration in galaxy and lots of rich graphic outputs - Accuracy with a clustering without global similarity threshold, the management of multi-affiliations and management of separated PCRs in the chimera removal step - Speed with fast algorithms and an easy to use parallelisation - Scalability with algorithms designed to support the data growth
Remarque
Fold recognition tools.
Remarque
FSA-BLAST is a new version of the popular BLAST (Basic Local Alignment Search Tool) bioinformatics tool, used to search genomic databases containing either protein or nucleotide sequences. FSA stands for Faster Search Algorithm; FSA-BLAST is twice as fast as NCBI-BLAST with no loss in accuracy.
Remarque Run Unix # formatdb, cluster, blast, readdb, ssearch | Run Web # |
GALF-P is a novel framework for TFBS identification (motif discovery) in DNA sequences. It consists of Genetic Algorithm with Local Filtering (GALF) and the post-processing procedure based on adaptive adding and removing. GALF-P achieves both effectiveness and efficiency, and provides reliable performance over the other state-of-art GA based approaches. The post-processing procedure is designed for zero or more TFBSs in each sequence.
Remarque Run Unix # GALF_P.o | Run Web # |
Version | MAJ | GapCloser | | |
1.12 | 2015-07-13 | Download | Doc |
GapCloser for SOAPdenovo
The GapCloser is designed to close the gaps emerging during the scaffolding process by SOAPdenovo or other assemblers, using the abundant pair relationships of short reads.
GapCloser aims for large plant and animal genomes, although it also works well on bacteria and fungi genomes.
Remarque Run Unix # GapCloser [options] | Run Web # |
GASSST : Global Alignment Short Sequence Search Tool * GASSST finds global alignments of short DNA sequences against large DNA banks. * GASSST strong point is its ability to perform fast gapped alignments. * It works well for both short and longer reads. It currently has been tested for reads up to 500bp. * The software is freely available for download under the CECILL version 2 License.
Remarque http://bioinformatics.oxfordjournals.org/content/26/20/2534.abstract?keytype=ref&ijkey=f5zH80QsuCqixRH Run Unix # Gassst -d -i -o -p | Run Web # |
The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyze high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Remarque Run Unix # java -jar /usr/local/genome/gatk/GenomeAnalysisTK.jar -h | Run Web # |
Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis Gblocks eliminates poorly aligned positions and divergent regions of an alignment of DNA or protein sequences.
Remarque Run Unix # Gblocks | Run Web # |
The GEM library
(Also home to: The GEM mapper, The GEM RNA mapper, The GEM mappability, and others).
Next-generation sequencing platforms (Illumina/Solexa, ABI/SOLiD, etc.) call for powerful and very optimized tools to index/analyze huge genomes. The GEM library strives to be a true "next-generation" tool for handling any kind of sequence data, offering state-of-the-art algorithms and data structures specifically tailored to this demanding task. At the moment, efficient indexing and searching algorithms based on the Burrows-Wheeler transform (BWT) have been implemented.
Remarque
GeneClust is a piece of computer software which can be used as a tool for exploratory analysis of gene expression microarray data. The development of GeneClust was motivated by surging interest to search for interpretable biological structure in gene expression microarray data.
Remarque Run Unix # geneclust | Run Web # |
Multipoint analysis of pedigree data including: non-parametric linkage analysis, LOD-score computation, information-content mapping, haplotype reconstruction
Remarque
Version | MAJ | GenePRIMP | | |
0.3 | 2013-04-19 | Download | Doc |
Identification of anomalous gene calls
The GenePRIMP pipeline consists of a series of computational units that identify erroneous gene calls and missed genes, and then correct a subset of the identified anomalous features. The data input to GenePRIMP needs to be a file of gene calls in GenBank or EMBL format. As its output, GenePRIMP generates reports of identified anomalies, plus a corrected EMBL file.
Remarque Run Unix # geneprimp | Run Web # |
Version | MAJ | genewise | | |
2.2.0 | 2008-12-10 | Download | Doc |
Genewise compares a protein to a DNA database and predicts the corresponding gene structure, while coping with problems caused by sequencing errors and introns.
Remarque Run Unix # genewise | Run Web # |
GenomeThreader is a software tool to compute gene structure predictions. The gene structure predictions are calculated using a similarity-based approach where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments. GenomeThreader was motivated by disabling limitations in GeneSeqer, a popular gene prediction program which is widely used for plant genome annotation.
Remarque Run Unix # gth [option ...] -genomic file [...] -cdna file [...] -protein file [...]
| Run Web # |
Remarque Run Unix # genscan | Run Web # |
GIMSAN (GIbbsMarkov with Significance ANalysis): a novel tool for de novo motif finding. GIMSAN combines GibbsMarkov, our variant of the Gibbs Sampler, described here for the first time, with our recently introduced significance analysis.
Remarque please cite: Patrick Ng, Uri Keich. GIMSAN: A Gibbs motif finder with significance analysis. Bioinformatics, 24 (19): 2256-2257, 2008. Run Unix # gimsan_submit_job.pl | Run Web # |
Version | MAJ | glimmer | | |
glimmer-3.02 | 2008-12-12 | Download | Doc |
Glimmer (Gene Locator and Interpolated Markov ModelER) predicts the positions of genes in a DNA sequence (bacteria, archaea, viruses) using Markov models.
Remarque Run Unix # glimmer3 | Run Web # |
Version | MAJ | GMAP/GSNAP | | |
2013-10-25 | 2013-10-28 | Download | Doc |
GMAP (genomic mapping and alignment program for mRNA and EST sequences): gmap, a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with
minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms
and sequence errors, without using probabilistic splice site models.
GSNAP (Genomic Short-read Nucleotide Alignment Program): GSNAP implements computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained
search process of merging and filtering position lists from a genomic index. It can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and
long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites.
Remarque Run Unix # gmap [OPTIONS...]
| Run Web # |
Version | MAJ | gmorse | | |
1.0 | 2009-08-05 | Download | Doc |
G-Mo.R-Se is a method aimed at using RNA-Seq short reads to build de novo gene models. First, candidate exons are built directly from the positions of the reads mapped on the genome (without any ab initio assembly of the reads), and all the possible splice junctions between those exons are tested against unmapped reads : the testing of junctions is directed by the information available in the RNA-Seq dataset rather than a priori knowledge about the genome. Exons can thus be chained into stranded gene models.
Remarque Run Unix # gmorse -h | Run Web # |
Goby is a next-gen data management framework designed to facilitate the implementation of efficient next-gen data analysis pipelines. Goby provides compressed file formats that are time and space efficient. It also provides a few utilities that support the most common secondary data analyses
Remarque
Version | MAJ | GORIV | | |
1.0 | 2008-01-21 | Download | Doc |
Method for predicting the secondary structure of proteins from the amino acid sequence.
Remarque Run Unix # gorIV | Run Web # |
GRAPe is a tool for computing genome re-alignment using marginalized posterior decoding. GRAPe uses the Marginalized Posterior Decoding (MPD) algorithm, which uses the posterior distribution of alignments to optimize the correct assignment of homology of individual nucleotides, instead of finding a single most probable alignment. Simulations show that the MPD algorithm has higher sensitivity and specificity than the Viterbi and Needleman-Wunsch algorithms.
Remarque Run Unix # grape | Run Web # |
Version | MAJ | grepseq | | |
1.2.2 | 2004-01-21 | Download | Doc |
The `grepseq' program takes a keyword which can contain ambiguous characters and character classes (also called a fixed-width motif) and then searches files and databases for exact or approximate matches to that keyword. The program produces one of two kinds of output, either a list of the matching sequences with the places where the keyword matched, or the complete entries of sequences containing matches, where each entry is annotated with the places where the matches occur.
Remarque Part of seqio. Run Unix # grepseq | Run Web # |
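A keyword with ambiguous characters can be thought of as a fixed-width pattern in which each position allows a set of letters. The Python sketch below illustrates this for DNA by translating IUPAC ambiguity codes into a regular expression. It is an illustration only, not grepseq's implementation: the helper names are hypothetical, only exact matches are handled, and approximate matching is not covered.

```python
import re

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate a fixed-width IUPAC motif into a regex pattern string."""
    return "".join(IUPAC[symbol] for symbol in motif.upper())


def find_motif(motif, sequence):
    """Return (start, matched_text) for every match, overlaps included."""
    # A lookahead lets the scan report matches that overlap each other.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]
```

Reporting the start position of each match corresponds to the first output mode described above, a list of matching sequences with the places where the keyword matched.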
GRIL is a tool to detect the locations of genomic rearrangements in a set of sequences.
Remarque
The HH-suite is an open-source software package for highly sensitive sequence searching and sequence alignment. Its two most important programs are HHsearch and HHblits. Both are based on the pairwise comparison of profile hidden Markov models (HMMs).
Remarque
HISAT is a fast and sensitive spliced alignment program for mapping RNA-seq reads. In addition to one global FM index that represents a whole genome, HISAT uses a large set of small FM indexes that collectively cover the whole genome (each index represents a genomic region of ~64,000 bp and ~48,000 indexes are needed to cover the human genome). These small indexes (called local indexes) combined with several alignment strategies enable effective alignment of RNA-seq reads, in particular, reads spanning multiple exons. The memory footprint of HISAT is relatively low (~4.3GB for the human genome). We have developed HISAT based on the Bowtie2 implementation to handle most of the operations on the FM index.
Remarque Run Unix # hisat2 [options]* -x {-1 -2 | -U | --sra-acc } [-S ] | Run Web # |
HMMER: profile HMMs for protein sequence analysis Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus.
Remarque
Prediction of transmembrane helices and topology for transmembrane proteins using hidden Markov models
Remarque Run Unix # hmmtop | Run Web # |
Html4blast is a program for analysing and presenting Blast results.
Remarque Used by findtarget
This novel version of i-ADHoRe is designed to detect genomic homology in extremely large-scale data sets. Along with several under-the-hood improvements, resulting in a 30-fold reduction in runtime over previous versions, the
implementation of multithreading and MPI now enables i-ADHoRe to take advantage of a parallel computing platform. As the scale of the data sets increased, the need for a new alignment algorithm able to cope with dozens of genomic
segments became apparent. Therefore a new greedy graph based alignment algorithm has been implemented (described in Fostier et al., 2011), allowing analysis of even the largest data sets currently available.
Remarque Run Unix # i-adhore | Run Web # |
iCORN (iterative correction of reference nucleotides) can correct genome sequences with short reads. Reads are mapped iteratively against the genome sequences, so far using SSAHA. Discrepancies between the multiple alignments of the mapped reads and the reference are corrected, provided the correction does not decrease the number of perfectly mapping reads.
Remarque Run Unix # cf. http://icorn.sourceforge.net/example.html | Run Web # |
Version | MAJ | idba | | |
1.1.1 | 2015-12-02 | Download | Doc |
IDBA is a practical iterative De Bruijn Graph De Novo Assembler for sequence assembly in bioinformatics. Most assemblers based on the de Bruijn graph build the graph with one specific k to perform the assembly task. For all of them, finding a suitable value of k is crucial. If k is too large, there will be a lot of gap problems in the graph. If k is too small, there will be a lot of branch problems. IDBA uses not only one specific k but a range of k values to build the iterative de Bruijn graph. It can keep all the information in graphs with different k values. So, it will perform better than other assemblers.
Remarque If you use our assembler in your research, please cite our papers.
Peng, Y., et al. (2010) IDBA- A Practical Iterative de Bruijn Graph De Novo Assembler. RECOMB. Lisbon. Run Unix # idba_ud -r read.fa -o output_dir | Run Web # |
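The effect of k on branching can be seen with a toy de Bruijn graph. The Python sketch below is only an illustration of the trade-off described above, not IDBA's iterative algorithm: nodes are (k-1)-mers, each k-mer in a read adds an edge, and a node with more than one successor counts as a branch. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: (k-1)-mer prefix -> set of (k-1)-mer suffixes."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def count_branches(graph):
    """Count nodes that have more than one outgoing edge."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


if __name__ == "__main__":
    reads = ["ATGCGTGCA"]
    for k in (3, 4, 5):
        print(k, count_branches(de_bruijn(reads, k)))
```

On this toy read the repeated substring creates a branch for small k that disappears once k is large enough to span the repeat, while larger k values leave fewer k-mers per read, which is where gaps come from in real, error-containing data.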
The Integrative Genomics Viewer (IGV) is a high-performance visualization tool for interactive exploration of large, integrated genomic datasets. It supports a wide variety of data types, including array-based and next-generation sequence data, and genomic annotations.
Remarque To cite your use of IGV in your publication:
James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. Integrative Genomics Viewer. Nature Biotechnology 29, 24–26 (2011)
Version | MAJ | Illumina CASAVA-1.8 FASTQ Filter | | |
0.1 | 2014-04-30 | Download | Doc |
The recent version of Illumina's CASAVA pipeline (Version 1.8) produces FASTQ files with both reads that pass filtering and reads that don't.
The new READ-ID (the @ line) contains many new fields, one of them indicates whether the read is filtered or not.
This program can filter FASTQ files produced by CASAVA 1.8, and keep/discard reads based on this filter flag.
Remarque Run Unix # fastq_illumina_filter -h | Run Web # |
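The filter flag can be read directly from the read ID. The Python sketch below is not the `fastq_illumina_filter` source; it assumes the CASAVA 1.8 layout in which the part of the @ line after the space is `read:is_filtered:control_number:index`, with `Y` meaning the read was filtered out and `N` meaning it passed. The function names are hypothetical.

```python
def is_filtered(header):
    """Return True if a CASAVA 1.8 read ID carries the filter flag 'Y'."""
    parts = header.split(" ", 1)
    if len(parts) != 2:
        raise ValueError("no CASAVA 1.8 flag section in: " + header)
    flags = parts[1].split(":")
    if len(flags) < 2 or flags[1] not in ("Y", "N"):
        raise ValueError("unexpected filter flag in: " + header)
    return flags[1] == "Y"


def keep_passing(lines):
    """Yield 4-line FASTQ records (as tuples) whose filter flag is 'N'."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if not is_filtered(record[0]):
                yield tuple(record)
            record = []
```

Inverting the test in `keep_passing` would give the opposite behaviour, discarding passing reads and keeping the filtered ones, which mirrors the keep/discard choice mentioned above.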
Version | MAJ | IM-TORNADO | | |
2.0.3.3 | 2016-02-22 | Download | Doc |
Illumina paired-end sequencing, which produces two separate reads for each
DNA fragment, has become the platform of choice for 16S rDNA hypervariable
tag sequencing. However, when the two reads do not overlap, existing
computational pipelines analyze data from read separately and underutilize
the information contained in the paired-end reads. IM-TORNADO is a tool for
processing non-overlapping reads while retaining maximal information content.
Remarque If you use IM-TORNADO for your project, please cite the following manuscript:
Jeraldo P, Kalari K, Chen X, Bhavsar J, Mangalam A, White B, et al. IM-TORNADO: A Tool for Comparison of 16S Reads from Paired-End Libraries. PLOS ONE 9 (12):e114804. Available from: http://dx.plos.org/10.1371/journal.pone.0114804
indel-Seq-Gen (iSG) is a biological sequence simulation program that simulates highly divergent DNA sequences and protein superfamilies. This is accomplished through the addition of subsequence length constraints and lineage- and site-specific evolution. iSG tracks insertion and deletion processes that occur during the simulation run. iSG records all evolutionary events and outputs the "true" multiple alignment of the sequences, and can generate a larger simulated sequence space by allowing the use of multiple related root sequences. iSG can be used to test the accuracy of multiple alignment methods, evolutionary hypotheses, ancestral protein reconstruction methods, and protein superfamily classification methods. iSG utilizes a highly modified version of the substitution engine from Seq-Gen v1.3.2.
Remarque Run Unix # indel-seq-gen [-bdefghilmnoqsuwz] < [tree_file] (indel-seq-gen -h) | Run Web # |
This is a novel mining pipeline (2009), Integrative Next-generation Genome Analysis Pipeline (inGAP), guided by a Bayesian principle to detect single nucleotide polymorphisms (SNPs), insertion/deletions (indels) by comparing high-throughput pyrosequencing reads with a reference genome of related organisms. inGAP can be applied to the mapping of both Roche/454 and Illumina reads with no restriction of read length.
Remarque Run Unix # inGAP | Run Web # |
InParanoid is a program for automatic identification of orthologs while differentiating between inparalogs and outparalogs. An InParanoid cluster is seeded by a reciprocally bestmatching ortholog pair, around which inparalogs are gathered independently, while outparalogs are excluded. The InParanoid database is a collection of pairwise ortholog groups aiming to include all 'completely sequenced' eukaryotic genomes. By this we mean above 6X coverage, and less than 1% X letters in the protein sequences.
Remarque Run Unix # Usage: inparanoid.pl [FASTAFILE with sequences of species C] | Run Web # |
Version | MAJ | Insyght | | |
| 2014-01-01 | Download | Doc |
Insyght is a genomic visualisation tool that combines a symbolic and a proportional view of the genes, syntenies and genomic regions. Another Insyght feature is synchronized navigation and zooming across multiple species.
Remarque
JAGS is Just Another Gibbs Sampler. It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation not wholly unlike BUGS. JAGS was written with three aims in mind:
To have a cross-platform engine for the BUGS language
To be extensible, allowing users to write their own functions, distributions and samplers.
To be a platform for experimentation with ideas in Bayesian modelling
Remarque
Jalview is a multiple alignment editor
Remarque Run Unix # jalview | Run Web # |
JELLYFISH is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. JELLYFISH can count k-mers using an order of magnitude less memory and an order of magnitude faster than other k-mer counting packages by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
JELLYFISH is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation below for more details.
Remarque If you use JELLYFISH in your research, please cite:
Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 (first published online January 7, 2011) doi:10.1093/bioinformatics/btr011 Run Unix # jellyfish
| Run Web # |
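To make the definition of a k-mer given in the JELLYFISH entry concrete, here is a minimal Python sketch of naive k-mer counting. It only illustrates what is being counted; it does not reproduce JELLYFISH's compact hash table or its lock-free parallelism, and the function name is an assumption.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every substring of length k in a DNA sequence (naive approach)."""
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
```

For example, `count_kmers("ACGTACGT", 3)` counts six overlapping 3-mers, of which `ACG` and `CGT` each occur twice.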
Julia is a high-level, high-performance dynamic programming language for
technical computing, with syntax that is familiar to users of other technical
computing environments.
It is a high-performance programming language somewhat similar to R, Matlab or
Python, with performance approaching that of C/Fortran.
Remarque Run Unix # julia | Run Web # |
Version | MAJ | kaiju | | |
1.5.0 | 2017-05-14 | Download | Doc |
Kaiju is a program for the taxonomic classification of metagenomic high-throughput sequencing reads. Each read is directly assigned to a taxon within the NCBI taxonomy by comparing it to a reference database containing microbial and viral protein sequences.
Remarque Citation
Menzel P., Ng K.L., Krogh A. (2016) Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat. Commun. 7:11257 Run Unix # kaiju -t nodes.dmp -f kaiju_db.fmi -i reads.fastq [-j reads2.fastq] | Run Web # |
Version | MAJ | kaksi | | |
2.3rc1 | 2008-07-01 | Download | Doc |
Kaksi is a secondary-structure assignment tool. Given a PDB file containing the atomic coordinates of a protein, kaksi determines the positions of the alpha helices and beta sheets. The assignment method uses the distances between alpha carbons and the phi/psi dihedral angles of the protein backbone. An axis calculation ensures the regularity of the assigned helices: a helix with a kink is described as two distinct helices. The detection parameters (the tolerated values for distances and angles) can be changed on the command line (see Martin et al., BMC Structural Biology 2005, for a detailed discussion of the choice of parameters). Results are returned to the user as an XML file. A utility for extracting the main information in FASTA format is supplied with the program.
Remarque Run Unix # kaksi -pf my_pdb_file.pdb | Run Web # |
Tool for extracting temporal information about genes from text corpora.
Remarque
kClust is a fast and sensitive clustering method for the clustering of protein sequences. It is able to cluster large protein databases down to 20-30% sequence identity.
kClust generates a clustering where each cluster is represented by its longest sequence (representative sequence).
Remarque For generating one multiple sequence alignment file for each cluster, please use kClust_mkAln. Type kClust_mkAln Run Unix # kClust -i [fasta-db-file] -d [directory] [options] | Run Web # |
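The rule stated in the kClust entry, that each cluster is represented by its longest sequence, can be sketched in a few lines of Python. The input layout (cluster id mapped to named sequences) and the function name are assumptions for illustration only; kClust's actual output files are not parsed here.

```python
def representatives(clusters):
    """Pick the name of the longest sequence in each cluster.

    clusters maps a cluster id to {sequence name: sequence string}.
    If several sequences share the maximum length, the first one
    encountered is kept.
    """
    return {
        cluster_id: max(members, key=lambda name: len(members[name]))
        for cluster_id, members in clusters.items()
        if members
    }
```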
The khmer software is a set of command-line tools for working with DNA
shotgun sequencing data from genomes, transcriptomes, metagenomes, and
single cells. khmer can make de novo assemblies faster, and sometimes better. khmer can also identify (and fix) problems with shotgun data.
Remarque
Version | MAJ | Klast | | |
4.4 | 2015-04-24 | Download | Doc |
KLAST is a fast, accurate and NGS scalable bank-to-bank sequence similarity search tool providing significant accelerations of seeds-based heuristic comparison methods, such as the Blast suite of algorithms. Relying on unique software architecture, KLAST takes full advantage of recent multi-core personal computers without requiring any additional hardware devices.
Remarque
Version | MAJ | kmergenie | | |
1.6663 | 2014-06-23 | Download | Doc |
KmerGenie estimates the best k-mer length for genome de novo assembly. Given a set of reads, KmerGenie first computes the k-mer abundance histogram for many values of k. Then, for each value of k, it predicts the number of distinct genomic k-mers in the dataset, and returns the k-mer length which maximizes this number. Experiments show that KmerGenie's choices lead to assemblies that are close to the best possible over all k-mer lengths.
Remarque Run Unix # kmergenie [options] | Run Web # |
Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
Remarque If you use Kraken in your research, please cite our paper; the citation
is available on the Kraken website. Run Unix # kraken [options] | Run Web # |
Krona Tools is a set of scripts to create Krona charts from several
Bioinformatics tools as well as from text and XML files.
Remarque
Identify SNPs in a set of genome sequences without the requirement of a reference sequence or a multiple sequence alignment.
Reconstruction of SNP based phylogenies by maximum likelihood.
Remarque Run Unix # kSNP -k kmer_length -f fasta -d output_directory [-p genomes4positions_list] [-u unassembled_genomes_list] [-m minimum_fraction_genomes_with_locus] [-G genbank.gbk] [-n num_CPU] [-j ] [-v ] [-c min_kmer_coverage]
| Run Web # |
LALNVIEW is a graphical program for visualizing local alignments between two sequences (protein or nucleic acids) [reference]. Sequences are represented by colored rectangles to give an overall picture of the similarities between the two sequences. Blocks of similarity between the two sequences are colored according to the degree of identity between the two segments.
Remarque Run Unix # lalnview | Run Web # |
LAST: Genome-Scale Sequence Comparison
LAST finds similar regions between sequences, and aligns them. It is designed for comparing large datasets to each other (e.g. vertebrate genomes and/or large numbers of DNA reads). It can:
Remarque
Version | MAJ | LEfSe | | |
| 2014-12-24 | Download | Doc |
LEfSe (Linear discriminant analysis Effect Size) determines the features (organisms, clades, operational taxonomic units, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and effect relevance. LEfSe is available as a Galaxy module, and as a bitbucket repository. For additional information, please refer to the LEfSe paper. We provide support for LEfSe users. Please join our Google group designated specifically for LEfSe users.
Remarque
The core of the LINKAGE package is a series of programs for maximum likelihood estimation of recombination rates, calculation of lod score tables, and analysis of genetic risks.
Remarque linkmap, linkmap.trace, makeped, preplink, ilink, lodscore, mlink Run Unix # preplink or linkmap ... | Run Web # |
Version | MAJ | loco | | |
0.990329 | | Download | Doc |
Remarque
Version | MAJ | macs | | |
1.4.2 | 2013-05-16 | Download | Doc |
Next-generation parallel sequencing technologies made chromatin immunoprecipitation followed by sequencing (ChIP-Seq) a popular strategy to study genome-wide protein-DNA interactions, while creating challenges for analysis algorithms. We present Model-based Analysis of ChIP-Seq (MACS) on short-read sequencers such as Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, is publicly available open source, and can be used for ChIP-Seq with or without control samples.
Remarque Run Unix # macs14 <-t tfile> [-n name] [-g genomesize] [options] | Run Web # |
MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods.
Remarque Run Unix # mafft [options] input > output | Run Web # |
This program detects chimeric 16S ribosomal DNA sequences (a chimera is the fusion of several 16S rDNA sequences).
Remarque http://www.ncbi.nlm.nih.gov/sites/entrez?Db=pubmed&Cmd=ShowDetailView&TermToSearch=16957188&ordinalpos=1&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_RVDocSum Run Unix # mallard | Run Web # |
Multiple Alignment with N Gapped Oligos. MANGO: a new approach to multiple sequence alignment
Remarque Please use the four scripts provided: mang8: MANGO with 8 seeds, without refinement; mang8r: MANGO with 8 seeds, with refinement; mang90: MANGO with 90 seeds, without refinement; mang90r: MANGO with 90 seeds, with refinement. Run Unix # mang8 ; mang8r ; mang90 ; mang90r | Run Web # |
Mapsembler is a targeted assembly software. It takes as input a set of NGS raw reads and a set of input sequences (starters). It first determines if each starter is read-coherent, i.e. whether reads confirm the presence of each starter in the original sequence. Then for each read-coherent starter, Mapsembler outputs its sequence neighborhood as a linear sequence or as a graph, depending on the user's choice.
Remarque Citation:
Peterlongo, P., & Chikhi, R. (2012). Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer. BMC Bioinformatics, 13(1), 48. doi:10.1186/1471-2105-13-48. Run Unix # mapsembler [-m value] [-o output] [-k value] [-i value] [-e value] [-d value] [-t value] [-E value] [-Clrgfcvsh] | Run Web # |
MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery
MapSplice is a second-generation algorithm for detecting alternative splice
sites. Its goal is to detect splice sites with high sensitivity and
specificity while remaining efficient in CPU and memory use. MapSplice can be
applied to both short reads (<75 bp) and long reads (≥75 bp). It does not
depend on splice-site features or on intron length, so it can detect novel
canonical and non-canonical splice sites. MapSplice relies on the quality and
diversity of read alignments to increase the accuracy of splice-site
detection.
Remarque Publication
MapSplice: Accurate mapping of RNA-seq reads for splice junction discovery
Kai Wang; Darshan Singh; Zheng Zeng; Stephen J. Coleman; Yan Huang; Gleb L. Savich; Xiaping He; Piotr Mieczkowski; Sara A. Grimm; Charles M. Perou; James N. MacLeod; Derek Y. Chiang; Jan F. Prins; Jinze Liu
Nucleic Acids Research 2010; doi: 10.1093/nar/gkq622 Run Unix # python /usr/local/genome/MapSplice_1.15.2/bin/mapsplice_segments.py MapSplice.cfg | Run Web # |
Maq is software that builds mapping assemblies from short reads generated by next-generation sequencing machines. It is particularly designed for Illumina-Solexa 1G Genetic Analyzer, and has preliminary functionality to handle AB SOLiD data.
Remarque
Mascot is a powerful search engine that uses mass spectrometry data to identify proteins from primary sequence databases.
Remarque Restricted access
MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454).
Remarque
Matrix2png is a simple but powerful program for making visualizations of microarray data and many other data types. It generates PNG formatted images from text files of data. It is fast, easy to use, and reasonably flexible. It can be used to generate publication-quality images, or to act as an image generator for web applications. Our group has found it useful for imaging all kinds of matrix-based data, not just microarray data.
Remarque If you use images created with matrix2png for publication or presentation, please cite: Pavlidis, P. and Noble W.S. (2003) Matrix2png: A Utility for Visualizing Matrix Data. Bioinformatics 19: 295-296. Run Unix # matrix2png | Run Web # |
Multiple Alignment of Conserved Regions in Genome Sequences
Remarque Run Unix # mauve | Run Web # |
Version | MAJ | MaxBin | | |
2.2.1 | 2017-01-17 | Download | Doc |
MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. Users can recover the underlying bins (genomes) of the microbes in their metagenomes by simply providing assembled metagenomic sequences and the read coverage information or sequencing reads. For users' convenience MaxBin reports genome-related statistics, including estimated completeness, GC content and genome size, in the binning summary page. Users can run MEGAN or similar software on MaxBin bins to find out the taxonomy of each bin after the binning process is finished.
Remarque Run Unix # run_MaxBin.pl
-contig (contig file)
-out (output file)
| Run Web # |
The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for networks (also known as graphs) based on simulation of (stochastic) flow in graphs.
Remarque Run Unix # mcl <-|file name> [options], do 'mcl -h' or 'man mcl' for help | Run Web # |
Mega2 is a program that takes three input files (pedigree, map and locus) and creates all the files needed to run linkage, haplotype, IBD and similar analysis software such as simwalk2, genehunter, vitesse, TDT, SAGE, Allegro or Mendel. Without Mega2, every input has to be formatted by hand, which is slow, tedious and error-prone.
Remarque If you use Mega2 as part of a published work, please remember to reference Mega2. You may reference it by citing the following: Mukhopadhyay N, Almasy L, Schroeder M, Mulvihill WP, Weeks DE (2005) Mega2: data-handling for facilitating genetic linkage and association analyses. Bioinformatics. 2005 May 15;21(10):2556-7. PMID: 15746282 Run Unix # mega2 | Run Web # |
Version | MAJ | Megahit | | |
1.0.4-beta | 2016-03-17 | Download | Doc |
MEGAHIT: An ultra-fast single-node solution for large and complex
metagenomics assembly via succinct de Bruijn graph
Remarque Run Unix # megahit [options] {-1 -2 | --12 | -r } [-o ] | Run Web # |
MEGAN - Metagenome Analysis Software
Remarque Run Unix # megan | Run Web # |
Version | MAJ | memrec | | |
1.11 | 2014-11-12 | Download | Doc |
The memrec (memory usage recorder) script is a tool we've written to watch the memory usage of a program.
Remarque Run Unix # memrec [opts] prog | Run Web # |
Transmembrane Protein Modelling
Remarque Run Unix # memsat3 "query" "database" ou runmemsat.sh "query" "database" | Run Web # |
MERLIN is a package for fast genetic analysis of pedigrees (linkage analysis, association analysis, haplotyping...).
Remarque Run Unix # merlin | Run Web # |
Gene Finding Program for Metagenomics MetaGene predicts prokaryotic genes on anonymous genomic sequences. Fragmented sequences (longer than 100 bp) can be accepted.
Remarque Run Unix # metagene [multi-fasta] | Run Web # |
Improved version of MetaGene, the annotation program for metagenomic data. It predicts prokaryotic genes from a genome or a set of anonymous genomes and is particularly suited to metagenomic analyses.
Remarque Run Unix # metageneannotator | Run Web # |
MetaSim - A Sequencing Simulator for Genomics and Metagenomics
Remarque If you use this program for your own research please cite our software. Publication: Richter DC, Ott F, Auch AF, Schmid R, Huson DH (2008) MetaSim - A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3(10): e3373. doi:10.1371/journal.pone.0003373 Run Unix # MetaSim | Run Web # |
Multiple Genome Aligner (MGA for short) computes multiple genome alignments of large, closely related DNA sequences.
Remarque
MGLTools is software developed at the Molecular Graphics Laboratory (MGL) of The Scripps Research Institute for visualization and analysis of molecular structures.
Remarque Run Unix # pmv, adt, vision | Run Web # |
Version | MAJ | micca | | |
1.5.0 | 2017-02-27 | Download | Doc |
micca (MICrobial Community Analysis) is a software pipeline for the
processing of amplicon sequencing data, from raw sequences to OTU tables,
taxonomy classification and phylogenetic tree inference. The pipeline can be
applied to a range of highly conserved genes/spacers, such as 16S rRNA gene,
Internal Transcribed Spacer (ITS) and 28S rRNA.
Remarque Run Unix # micca [--version] [--help] []
| Run Web # |
Version | MAJ | minia | | |
1.4683 | 2013-02-21 | Download | Doc |
Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).
Remarque PDF and Citation
R. Chikhi, G. Rizk. Space-efficient and exact de Bruijn graph representation based on a Bloom filter, WABI 2012 Run Unix # minia fasta_file kmer_size min_abundance estimated_genome_size prefix | Run Web # |
MIRA is a Whole Genome Shotgun and EST Sequence Assembler for Sanger, 454 and Solexa / Illumina. It can perform Hybrid de-novo assemblies as well as SNP and mutations discovery for mapping assemblies.
Remarque
miRanda is an algorithm for the detection of potential microRNA target sites in genomic sequences. miRanda reads RNA sequences (such as microRNAs) from file1 and genomic DNA/RNA sequences from file2. Both of these files should be in FASTA format.
Remarque Run Unix # miranda file1 file2 [options..] | Run Web # |
miRDeep2 is a software package for identification of novel and known miRNAs in deep sequencing data. Furthermore, it can be used for miRNA expression profiling across samples. Last, a new module for preprocessing of raw Illumina sequencing data produces files for downstream analysis with the miRDeep2 or quantifier module. Colorspace sequencing data is currently not supported by the preprocessing module, but support is planned. Preprocessing is performed with the mapper.pl script. Quantification and expression profiling is done by the quantifier.pl script. miRNA identification is done by the miRDeep2.pl script.
Remarque Run Unix # miRDeep2.pl | Run Web # |
Version | MAJ | MIReNA | | |
2.0 | 2012-09-05 | Download | Doc |
Remarque
Version | MAJ | mktrace | | |
0.001017 | 2005-07-30 | Download | Doc |
This program reads a FASTA file and creates a chromatogram stored in an SCF file and a corresponding phd file. The SCF file contains minimal information at this time. If a quality value FASTA file exists, mktrace uses those quality values in the phd file, otherwise it sets the quality values to the pre-determined values. mktrace produces a fake trace that could be used by Phred/Phrap packages.
Remarque Part of the consed package Run Unix # mktrace G0771A003_114.s1.seq G0771A003_114.s1.scf | Run Web # |
Version | MAJ | mmseq | | |
0.11.2 | 2012-11-20 | Download | Doc |
MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads
The flowchart to the right depicts the MMSEQ pipeline for obtaining expression estimates from RNA-seq data. There are two routes, with starting points labelled A and B. Route A is quite fast and straightforward to run and uses pre-existing transcript sequences for alignment. Route B requires more time, as it involves the creation of custom transcript sequences based on the data.
Remarque Please cite Turro et al. 2011 (Genome Biology) if you use MMSEQ in your work.
http://dx.doi.org/10.1186/gb-2011-12-2-r13 Run Unix # mmseq [OPTIONS...] hits_file output_base | Run Web # |
Version | MAJ | MMSEQ | | |
1.0.2 | 2013-09-02 | Download | Doc |
MMSEQ: haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads
Remarque Please cite Turro et al. 2011 (http://dx.doi.org/10.1186/gb-2011-12-2-r13) Run Unix # mmseq / bam2hits | Run Web # |
The Molecular Modelling Toolkit (MMTK) is an Open Source program library for molecular simulation applications. In addition to providing ready-to-use implementations of standard algorithms, MMTK serves as a code basis that can be e
Remarque
MOCAT is a package for analyzing metagenomics datasets. Currently MOCAT supports Illumina single- and paired-end reads in raw FastQ format.
Remarque Jens Roat Kultima & Shinichi Sunagawa (Bork Group, EMBL) Run Unix # MOCAT.pl -sf|sample_file 'FILE' [Pipeline, Statistics, & Additional Options] | Run Web # |
Version | MAJ | modelgenerator | | |
0.85 | 2011-03-15 | Download | Doc |
ModelGenerator is a model selection program that selects optimal amino acid and nucleotide substitution models from Fasta or Phylip alignments. ModelGenerator supports 56 nucleotide and 96 amino acid substitution models.
Remarque Run Unix # modelgenerator | Run Web # |
MODELLER is used for homology or comparative modeling of protein three-dimensional structures (1,2). The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms.
Remarque Run Unix # usage: mod9.16 script [...] | Run Web # |
Modeltest is a program that runs various likelihood ratio tests on models of evolution in order to select the model most appropriate for the data.
Remarque Run Unix # modeltest | Run Web # |
MOLE is a universal toolkit for rapid and fully automated location and characterization of channels, tunnels and pores in molecular structures. The core of the MOLE algorithm is a Dijkstra path search, which is applied to a Voronoi mesh. MOLE is powerful software (overcoming some limitations of the CAVER tool) for exploring large molecular channels, complex networks of channels and molecular dynamics trajectories (AMBER ascii traj and parm7 are supported) in which analysis of a large number of snapshots is required.
Remarque Run Unix # Mole.exe | Run Web # |
MolScript is a program for displaying molecular 3D structures, such as proteins, in both schematic and detailed representations.
Remarque Run Unix # molscript | Run Web # |
Version | MAJ | MOSAIK assembler | | |
1.1.0021 | 2011-06-06 | Download | Doc |
MOSAIK is a reference-guided assembler comprising four main modular programs: MosaikBuild, MosaikAligner, MosaikSort and MosaikAssembler. MosaikBuild converts various sequence formats into Mosaik’s native read format. MosaikAligner pairwise aligns each read to a specified series of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by the reference sequence coordinates. Finally, MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment which is then saved into an assembly file format.
Remarque Run Unix # MosaikAligner MosaikAssembler MosaikBuild MosaikCoverage MosaikDupSnoop MosaikJump MosaikMerge MosaikSort MosaikText | Run Web # |
The goal of mothur is to have a single resource to analyze molecular data that is used by microbial ecologists. Many of these tools are available elsewhere as individual programs and as scripts, which tend to be slow or as web utilities, which limit your ability to analyze your data. mothur offers the ability to go from raw sequences to the generation of visualization tools to describe α and β diversity. Examples of each command are provided within their specific pages, but several users have provided several analysis examples, which use these commands. An exhaustive list of the commands found in mothur is available within the commands category index.
Remarque Run Unix # mothur | Run Web # |
MPscan: fast localisation of multiple reads in genomes
Remarque Please cite THIS paper if you use MPscan. Rivals E., Salmela L., Kiiskinen P., Kalsi P., Tarhio J. Lecture Notes in BioInformatics (LNBI), Springer-Verlag, Vol. 5724, p. 246-260, 2009. Run Unix # mpscan -h | Run Web # |
MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. MrBayes uses Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters.
Remarque
mrFAST is a read mapper designed to map short reads to a reference genome, with a special emphasis on the discovery of structural variation and segmental duplications. mrFAST maps short reads with respect to a user-defined error threshold, including indels up to 4+4 bp. This manual describes how to choose the parameters and tune mrFAST with respect to the library settings. mrFAST is designed to find 'all' mappings for a given set of reads; however, it can return one "best" map location if the relevant parameter is invoked.
NOTE: mrFAST is developed for Illumina and thus requires all reads to be the same length. For paired-end reads, lengths of mates may be different from each other, but each "side" should have a uniform length.
Remarque Personalized copy number and segmental duplication maps using next-generation sequencing. Can Alkan, Jeffrey M. Kidd, Tomas Marques-Bonet, Gozde Aksay, Francesca Antonacci, Fereydoun Hormozdiari, Jacob O. Kitzman, Carl Baker, Maika Malig, Onur Mutlu, S. Cenk Sahinalp, Richard A. Gibbs, Evan E. Eichler. Nature Genetics, Oct, 41(10):1061-1067, 2009.
Table of Contents
Sample Set
General
Indexing
Single Genome Mode
Batch Mode
Mapping
Single-end Reads - Single Mode
Single-end Reads - Batch Mode
Paired-end Reads
Discordant Paired-end Reads
Output Format
Sample Set
A sample genome FASTA file, with simulated reads and a command line to map in paired-end mode is supplied. Please download the sample set.
General
Please download the latest version from our download page and then unzip the downloaded file. Run 'make' to build mrFAST.
mrFAST generates an index of the reference genome(s) and maps the reads to the reference genome.
Requirements:
zlib for the ability to read compressed FASTQ and write compressed SAM files.
C compiler (mrFAST is developed with gcc versions > 4.1.2)
Building:
On Unix/Linux systems, we recommend using GNU gcc version > 4.1.2 as your compiler and type 'make' to build.
Example:
linux> make
gcc -c -O3 baseFAST.c -o baseFAST.o
gcc -c -O3 CommandLineParser.c -o CommandLineParser.o
gcc -c -O3 Common.c -o Common.o
gcc -c -O3 HashTable.c -o HashTable.o
gcc -c -O3 MrFAST.c -o MrFAST.o
gcc -c -O3 Output.c -o Output.o
gcc -c -O3 Reads.c -o Reads.o
gcc -c -O3 RefGenome.c -o RefGenome.o
gcc baseFAST.o CommandLineParser.o Common.o HashTable.o MrFAST.o Output.o Reads.o RefGenome.o
-o mrFAST -lz -lm
rm -rf *.o
Parallelization: The best way to optimize mrFAST is to split the reads into chunks that fit into the memory of the cluster nodes, and implement an MPI wrapper in an embarrassingly parallel fashion. We recommend the following criteria to split the reads:
Single End Mode: The number of reads should be approximately ((M-600)/(4*L)) million, where M is the memory of the cluster node (in megabytes) and L is the read length. If you have more nodes, you can make the chunks smaller to use the nodes efficiently. For example, if the read length is 50 bp and each node has 2 GB of memory, each chunk should contain (2000-600)/(4*50) = 7 million reads.
Paired End Mode: The number of reads in each file should not exceed 1 million (500,000 pairs); however, a chunk size of 500,000 reads (250,000 pairs) is recommended.
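The single-end rule of thumb above can be turned into a small Python helper. This is only a sketch of the arithmetic in the manual; the function names and the idea of deriving a chunk count from a total read count are assumptions added for illustration.

```python
import math


def single_end_chunk_reads(memory_mb, read_length):
    """Approximate reads per chunk: ((M - 600) / (4 * L)) million reads."""
    if memory_mb <= 600:
        raise ValueError("node memory must exceed 600 MB for this formula")
    millions = (memory_mb - 600) / (4 * read_length)
    return int(millions * 1_000_000)


def chunks_needed(total_reads, memory_mb, read_length):
    """Number of chunks required so that each chunk fits in node memory."""
    return math.ceil(total_reads / single_end_chunk_reads(memory_mb, read_length))
```

For the example in the text (2 GB nodes, 50 bp reads), `single_end_chunk_reads(2000, 50)` gives 7,000,000 reads per chunk, so 100 million reads would be split into 15 chunks.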
To see the list of options, use "-h" or "--help".
To see the version of mrFAST, use "-v" or "--version".
Indexing
mrFAST's indices can be generated in two modes (single, batch). In single mode, mrFAST indexes a fasta file (which may contain one or more reference genomes) while in batch mode it indexes a set of fasta files.
By default mrFAST uses the window size of 12 characters to generate its index. Please be advised that if you do not choose the window size carefully, you will lose sensitivity.
How to choose the right window size: For a given read length (l) and error threshold (e), the window size is floor(l/(e+1)). For example, if the read length is 36 and the maximum number of mismatches allowed is 2, the window size is 12. If your calculated window size is greater than the default, you can use the default window size without losing sensitivity. For example, for a read length of 64 and an error threshold of 2, the window size should be 21, so you can use the default window size of 12. However, you cannot use 12 as the window size for a read length of 30 and an error threshold of 2, because the calculated window size is only 10.
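The window-size rule above is simple enough to check in code. The following Python sketch reproduces the floor(l/(e+1)) calculation and the comparison against the default of 12; the function names are assumptions, and mrFAST itself does not expose such helpers.

```python
def window_size(read_length, max_errors):
    """Largest window size that keeps full sensitivity: floor(l / (e + 1))."""
    return read_length // (max_errors + 1)


def default_is_safe(read_length, max_errors, default=12):
    """True if the default window size can be used without losing sensitivity."""
    return window_size(read_length, max_errors) >= default
```

The three cases from the text come out as expected: reads of length 36 and 64 with 2 errors give window sizes 12 and 21 (the default is safe), while length 30 with 2 errors gives 10 (the default is not safe).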
Single Genome Mode:
To index a reference genome like "refgen.fasta" run the following command:
$>./mrfast --index refgen.fasta
Upon completion of the indexing phase, you can find "refgen.fasta.index" in the same directory as "refgen.fasta". mrFAST uses a window size of 12 (default) to build the index of the genome; this window size can be modified with "--ws". The maximum window size is restricted because the window size directly affects memory usage.
$>./mrfast --index refgen.fasta --ws 13
Batch Mode
In batch mode, mrFAST gets a list of reference files and generates the index for each one of them. Similar to single mode, you can specify a different window size for indexing.
$>./mrfast -b --index fasts.list --ws 13
Mapping
mrFAST can map single-end reads and paired-end reads to a reference genome. mrFAST can map in either single or batch mode. In single mode, it only maps to one index. In batch mode, it maps to a list of indices. mrFAST supports both fasta and fastq formats.
Single-end Reads - Single Mode
To map single reads to a reference genome in single mode, run the following command. Use "--seq" to specify the input file. refgen.fa and refgen.fa.index should be in the same folder. You can load a multi-sequence FASTA file as the reference genome.
$>./mrfast --search refgen.fa --seq reads.fastq
The reported locations will be saved into "output" by default. If you want to save it somewhere else, use "-o" to specify another file. mrFAST can report the unmapped reads in fasta/fastq format.
$>./mrfast --search refgen.fasta --seq reads.fastq -o my.map
By default, mrFAST reports all the locations per read. If you need one "best" mapping add the "--best" parameter to the command line:
$>./mrfast --search refgen.fasta --seq reads.fastq -e 3 --best
Single-end Reads - Batch Mode (Note: deprecated after version 2.1.0.6)
In batch mode, mrFAST uses a list of indices to find the mappings of the reads. "index.list" should contain the list of fasta files.
$>./mrfast -b --search index.list --seq reads.fastq
Paired-end Reads
To map paired-end reads, use the "--pe" option. The mapping can be done in single/batch mode. If the reads are in two different files, you have to use "--seq1/--seq2" to indicate the files. If the reads are interleaved, use "--seq" to indicate the file. The distance allowed between the paired-end reads should be specified with "--min" and "--max". "--min" and "--max" specify the minimum and maximum of the inferred size (the distance between outer edges of the mapping mates).
$>./mrfast --search refgen.fasta --pe --seq reads.fastq --min 150 --max 250
Discordant Mapping
mrFAST can report the discordant mappings for use with Variation Hunter. The --min and --max options define the minimum and maximum inferred size for concordant mapping. This is enabled by default since version 2.1.0.6.
$>./mrfast --search refgen.fasta --pe --discordant-vh --seq reads.fastq --min 50 --max 75
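How "--min" and "--max" separate concordant from discordant pairs by inferred size can be sketched in Python. This is an illustration only, not mrFAST's implementation: the (start, end) coordinate convention and function names are assumptions, and other criteria a mapper may apply (such as mate orientation) are ignored.

```python
def inferred_size(mate1, mate2):
    """Distance between the outer edges of two mapped mates.

    Each mate is a (start, end) pair on the same reference, with end
    exclusive (an assumed convention for this sketch).
    """
    return max(mate1[1], mate2[1]) - min(mate1[0], mate2[0])


def is_concordant(mate1, mate2, min_size, max_size):
    """True if the inferred size falls within [min_size, max_size]."""
    return min_size <= inferred_size(mate1, mate2) <= max_size
```

For mates mapped at (100, 136) and (264, 300), the inferred size is 200, which is concordant for "--min 150 --max 250" but would count as discordant by size for "--min 50 --max 75".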
Parameters
General Options:
-v|--version Shows the current version.
-h Shows the help screen.
Indexing Options:
--index [file] Generate an index from the specified fasta file.
-b Indicates the indexing will be done in batch mode. The file specified in --search should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
-ws [int] Set window size for indexing (default:12 max:14).
Searching Options:
--search [file] Search the specified genome. The index file should be in the same directory as the fasta file.
-b Indicates the mapping will be done in batch mode. The file specified in --search should contain the list of fasta files. (Note: deprecated after version 2.1.0.6)
--pe Search will be done in paired-end mode.
--mp Search will be done in matepair mode.
--seq [file] Input sequences in fasta/fastq format [file]. If paired-end reads are interleaved, use this option.
--seq1 [file] Input sequences in fasta/fastq format [file] (First file). Use this option to indicate the first file of paired-end reads.
--seq2 [file] Input sequences in fasta/fastq format [file] (Second file). Use this option to indicate the second file of paired-end reads.
-o [file] Output of the mapped sequences (SAM format). The default is "output".
-u [file] FASTA/FASTQ file for the unmapped sequences. The default is "unmapped".
-e [int] Maximum allowed edit distance (default 4% of the read length). Note that although the current version is limited to 4+4 indels, it supports any number of substitution errors.
--min [int] Min inferred distance allowed between two paired-end sequences.
--max [int] Max inferred distance allowed between two paired-end sequences.
--discordant-vh To return all discordant map locations ready for the Variation Hunter program, and OEA map locations ready for the NovelSeq.
--best Return "best" location only (single-end mode).
--seqcomp Indicates that the input sequences are compressed (gz).
--outcomp Indicates that output file should be compressed (gz).
--maxoea [int] Max number of One End Anchored (OEA) map locations returned for each read pair. A minimum of 100 is recommended for NovelSeq use.
--maxdis [int] Max number of discordant map locations returned for each read pair.
--crop [int] Crop the input reads at position [int].
--sample [string] Sample name to be added to the SAM header (optional).
--rg [string] Read group ID to be added to the SAM header (optional).
--lib [string] Library name to be added to the SAM header (optional).
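The options listed above can be assembled programmatically. The following Python sketch builds an mrFAST argument list using only flags documented in this section; the helper name and its keyword arguments are hypothetical, and the validation rules (requiring both --min and --max in paired-end mode, rejecting --best with --pe) are assumptions drawn from the descriptions above, not behaviour verified against the binary.

```python
import shlex


def build_mrfast_command(reference, reads, output=None, edit_distance=None,
                         best=False, paired=False, min_size=None, max_size=None):
    """Return an argument list for a single-file mrFAST search (hypothetical helper)."""
    cmd = ["./mrfast", "--search", reference]
    if paired:
        cmd.append("--pe")
    cmd += ["--seq", reads]
    if paired:
        # The manual says the allowed distance should be given in paired-end mode.
        if min_size is None or max_size is None:
            raise ValueError("paired-end mode needs both min_size and max_size")
        cmd += ["--min", str(min_size), "--max", str(max_size)]
    if edit_distance is not None:
        cmd += ["-e", str(edit_distance)]
    if best:
        # --best is documented for single-end mode only.
        if paired:
            raise ValueError("--best is documented for single-end mode")
        cmd.append("--best")
    if output is not None:
        cmd += ["-o", output]
    return cmd


single = build_mrfast_command("refgen.fasta", "reads.fastq", edit_distance=3, best=True)
print(shlex.join(single))
# prints: ./mrfast --search refgen.fasta --seq reads.fastq -e 3 --best
```

Returning a list rather than a string lets the result be passed directly to `subprocess.run` without shell quoting issues.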
Output Files
Single-End Mode: In single-end mode, mrFAST generates two files, as specified by the "-o" and "-u" parameters. If "-o" is not specified, the default filename is "output"; the default filename for "-u" is "unmapped".
output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. mrFAST returns all possible map locations within the given edit distance ("-e") by default. If the "--best" parameter is invoked, then it will select one "best" location that has the minimum edit distance to the genome.
unmapped file ("-u"): Contains the unmapped reads in FASTQ or FASTA format, depending on the format of the input sequences.
Paired-End and Matepair Modes: In paired-end and matepair modes, mrFAST generates a SAM file that stores the best mapping locations, using the paired-end span information. In addition, it generates a DIVET file and an OEA file (SAM format). See below:
output file ("-o"): Contains the map locations of the sequences in the specified genome in SAM format. This file will include:
If a read pair can be mapped concordantly, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.
If the read pair can not be mapped concordantly, again, the "best" (minimum total edit distance and minimum differential from the average span) map location for the pair.
unmapped file ("-u"): Contains the orphan (both ends unmapped) reads in FASTQ or FASTA format, depending on the format of the input sequences.
output.DIVET.vh file (the "-o" option changes the prefix "output"): This file includes all possible map locations for the read pairs that cannot be concordantly mapped. This file can be loaded by the VariationHunter tool for structural variation discovery.
output_OEA file: Contains the OEA (One-End-Anchored) reads (paired-end reads where only one read can be mapped to the genome). The output is in SAM format and contains the map location of the read that can be mapped to the genome. The unmapped reads of an OEA read pair are not reported on separate lines; instead, the sequence and quality information is given on the line that specifies the map location of the mapped read. The optional fields NS and NQ specify the unmapped sequence and unmapped quality information. This file can be loaded by the NovelSeq tool for novel sequence discovery; however, format conversion might be required; please see the NovelSeq documentation.
NOTE: mrFAST will report many (up to 100 by default) possible map locations for the "mapped" read of OEA matepairs. This will generate a large file due to repeats and duplications. The size of this file can be limited through the --maxoea parameter (version 2.1.0.0 and above).
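As an illustration of the OEA layout described above, the following Python sketch pulls the unmapped mate out of the NS and NQ optional fields of one SAM line. It assumes the fields use the standard SAM `TAG:TYPE:VALUE` syntax (for example `NS:Z:...`); the exact type code written by mrFAST is not stated here, so the parser ignores it. The sample line is invented for the example.

```python
def parse_oea_line(line):
    """Split one tab-separated SAM line and extract NS/NQ (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 mandatory fields")
    tags = {}
    for optional in fields[11:]:
        # maxsplit=2 keeps any ':' characters inside quality strings intact.
        tag, _type, value = optional.split(":", 2)
        tags[tag] = value
    return {
        "read_name": fields[0],
        "reference": fields[2],
        "position": int(fields[3]),
        "mapped_seq": fields[9],
        "mapped_qual": fields[10],
        "unmapped_seq": tags.get("NS"),    # sequence of the unmapped mate
        "unmapped_qual": tags.get("NQ"),   # qualities of the unmapped mate
    }


# Invented example line, not real mrFAST output.
example = "read1\t73\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\tNS:Z:TTGA\tNQ:Z:II:I"
record = parse_oea_line(example)
print(record["unmapped_seq"], record["unmapped_qual"])
```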
Output Format
mrFAST mapping output is in SAM format. For details about the definition of the fields, please refer to the SAM Manual. We have not implemented the "MQUAL" field yet. All locations of discordant paired-end reads will be reported in DIVET format as required by the VariationHunter package. Unmapped reads (or "orphan" read pairs in PE mode) will be output in FASTQ or FASTA format, depending on the input sequence file format.
Run Unix # mrfast [options] | Run Web # |
mrsFAST is a cache-oblivious mapper designed to map short reads to a reference genome. mrsFAST maps short reads with respect to a user-defined error threshold. In this manual, we will show how to choose the parameters and tune mrsFAST with respect to the library settings. mrsFAST is designed to find 'all' the mappings for a given set of reads.
Remarque Run Unix # mrsFAST -h | Run Web # |
MuGeN (Multi-Genome Navigator) is an interactive software package for the visual exploration of multiple annotated genome portions together with in silico analysis results. It is capable of simultaneously displaying genome portions loaded from various sources, both local and remote, and mixing these with analysis result plots. It also has a batch mode in which it generates images of these displays in a wide range of formats (PNG, PostScript, IMAP, XFig); this mode makes it well suited to integration into websites for displaying annotated physical maps.
Remarque The command mugenv is enough to launch the graphical environment, but it loads no genome, so the windows will look rather empty. More commonly, run mugenv /chemin/vers/un/fichier/genbank.gbk to explore the file in question. MuGeN version numbers correspond to release dates and are shown in the title bar of its graphical window. The latest is 20040726, which is the one installed on topaze and adm. Run Unix # mugenv or mugenv /chemin/vers/un/fichier/genbank.gbk | Run Web # |
Version | MAJ | Mugsy | | |
1.2.3 | 2013-07-19 | Download | Doc |
Mugsy is a multiple whole genome aligner. Mugsy uses Nucmer for pairwise alignment, a custom graph based segmentation procedure for identifying collinear regions, and the segment-based progressive multiple alignment strategy from Seqan::TCoffee. Mugsy accepts draft genomes in the form of multi-FASTA files and does not require a reference genome.
Remarque To cite Mugsy, use:
Angiuoli SV and Salzberg SL. Mugsy: Fast multiple alignment of closely related whole genomes. Bioinformatics 2011 27(3):334-4 Run Unix # mugsy [-p output prefix] multifasta_genome1.fsa multifasta_genome2.fsa ... multifasta_genomeN.fsa | Run Web # |
Version | MAJ | multalin | | |
5.4.1 | 2002-04-04 | Download | Doc |
This software will allow you to align simultaneously several biological sequences.
Remarque
MultiQC summarizes analysis results for multiple tools and samples in a single report.
Remarque Run Unix # multiqc_env | Run Web # |
MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. For example, MUMmer 3.0 can find all 20-basepair or longer exact matches between a pair of 5-megabase genomes in 13.7 seconds, using 78 MB of memory, on a 2.4 GHz Linux desktop computer. MUMmer can also align incomplete genomes; it can easily handle the 100s or 1000s of contigs from a shotgun sequencing project, and will align them to another set of contigs or a genome using the NUCmer program included with the system. If the species are too divergent for a DNA sequence alignment to detect similarity, then the PROmer program can generate alignments based upon the six-frame translations of both input sequences.
Remarque Run Unix # mummer [options] | Run Web # |
MUSCLE stands for MUltiple Sequence Comparison by Log-Expectation.
Remarque Run Unix # muscle -in -out | Run Web # |
MView is a tool for converting the results of a sequence database search (BLAST, FASTA, etc.) into the form of a coloured multiple alignment of hits stacked against the query. Alternatively, an existing multiple alignment (MSF, PIR, CLUSTALW, etc.) can be processed. In either case, the output is simply HTML, so the result is platform independent and does not require a separate application or applet to be loaded. MView is NOT a multiple alignment program, nor is it a general purpose alignment editor.
Remarque Run Unix # mview [options] [file...] | Run Web # |
Remarque Run Unix # naccess | Run Web # |
Remarque Run Unix # ncoils | Run Web # |
Nesoni focusses on analysing the alignment of reads to a reference genome. We use the SHRiMP read aligner, as it is able to detect small insertions and deletions in addition to SNPs. Nesoni can call a consensus of read alignments, taking care to indicate ambiguity. This can then be used in various ways: to determine the protein level changes resulting from SNPs and indels, to find differences between multiple strains, or to produce n-way comparison data suitable for phylogenetic analysis in SplitsTree4. Alternatively, the raw counts of bases at each position in the reference seen in two different sequenced strains can be compared using Fisher's Exact Test.
Remarque Run Unix # nesoni | Run Web # |
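To make the Fisher's Exact Test comparison mentioned above concrete, here is a minimal pure-Python sketch for a 2x2 table. It is not Nesoni's code: collapsing the base counts at one position into "reference base" versus "other bases" for each strain is an assumption made for this example, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Example reading (assumption): a, b = reference / other base counts in
    strain 1; c, d = the same counts in strain 2.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    p_value = sum(p for p in (prob(x) for x in range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-7))
    return min(p_value, 1.0)


print(fisher_exact_two_sided(3, 1, 1, 3))
```

For the table [[3, 1], [1, 3]] the p-value is 34/70, about 0.486, so counts this small give no evidence of a difference between the two strains.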
NetLogo is a programmable modeling environment for simulating natural and social phenomena. It was authored by Uri Wilensky in 1999 and has been in continuous development ever since at the Center for Connected Learning and Computer-Based Modeling.
Remarque Run Unix # netlogo | Run Web # |
Version | MAJ | newbler | | |
2.6 | 2011-07-06 | Download | Doc |
Newbler is a package of three data analysis applications made by Roche 454: the GS De Novo Assembler (with or without contig scaffolding using Paired End reads), the GS Reference Mapper, and the GS Amplicon Variant Analyzer (AVA). An additional application, the GS Run Browser, is an interactive Run browser and troubleshooting tool which graphically displays the images, some intermediate data, and various output metrics from a sequencing Run. The software package also includes the SFF Tools commands for handling and using the data files (called Standard Flowgram Format or SFF files) that hold the sequencing trace data.
Remarque Run Unix # newbler | Run Web # |
Version | MAJ | newicktopdf | | |
- | 2010-08-11 | Download | Doc |
Converts a file describing a tree in Newick format into a PDF file (program from Manolo Gouy's group in Lyon).
Remarque Run Unix # newicktopdf (produces the same file with a pdf suffix) | Run Web # |
Tools developed in the MIG laboratory to help with the analysis of next-generation sequencing data: quality control, mapping, assembly, global statistics, etc. Available scripts: adaptiveTrim.pl, alignmentStatistics.pl, contigsExtractionOnLength.pl, fastqQualityConverter.pl, gbk2Fasta.pl, globalTrim.pl, multiFasta2Fasta.pl, show2Fasta.pl, unmappedReadsExtraction.pl (cf. Doc).
Remarque Run Unix # e.g.: contigsExtractionOnLength.pl -i fichier.fasta -do /Dir1/Dir11/Dir111/ -po fichierFiltre -l 1500 -r | Run Web # |
NJplot is a tree drawing program able to draw any binary tree expressed in the standard phylogenetic tree format (e.g., the format used by the PHYLIP package). NJplot is especially convenient for rooting the unrooted trees obtained from parsimony, distance or maximum likelihood tree-building methods.
Remarque Run Unix # njplot | Run Web # |
Novoalign is an aligner for single-ended and paired-end reads from the Illumina Genome Analyser. Novoalign finds globally optimal alignments using the full Needleman-Wunsch algorithm with affine gap penalties.
Remarque Run Unix # novoalign [options] | Run Web # |
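The idea of Needleman-Wunsch alignment with affine gap penalties can be sketched in a few lines of Python. This is a textbook three-matrix formulation (often attributed to Gotoh) that returns only the optimal score; it is not Novoalign's implementation, and the scoring values are arbitrary assumptions chosen for the example.

```python
NEG_INF = float("-inf")


def global_affine_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best global alignment score of a and b.

    A gap of length L costs gap_open + (L - 1) * gap_extend.
    M: last column aligns a[i-1] with b[j-1]
    X: last column aligns a[i-1] with a gap
    Y: last column aligns b[j-1] with a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(global_affine_score("ACGT", "AGT"))  # three matches and one opened gap
```

Keeping separate matrices for "in a gap" and "not in a gap" is what lets extending an existing gap cost less than opening a new one.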
novoSNP is a program that will help you find variations (SNPs and short INDELs) in resequencing projects. It takes a reference sequence and a number of sequencing trace files as input, and generates a list of possible variations with a quality score. novoSNP allows you to easily filter, sort and check the variations found visually and keep track of your verifications.
Remarque Run Unix # novosnp2.0.1 | Run Web # |
Version | MAJ | nupack | | |
3.0 | 2010-12-01 | Download | Doc |
NUPACK is a growing software suite for the analysis and design of nucleic acid systems. The package currently enables thermodynamic analysis of dilute solutions of interacting nucleic acid strands, and sequence design for complexes of nucleic acid strands intended to adopt a target secondary structure at equilibrium. NUPACK algorithms are formulated in terms of nucleic acid secondary structure. In most cases, pseudo-knots are excluded from the structural ensemble. Much of this software may be conveniently run through the NUPACK web server at http://www.nupack.org (Zadeh et al., 2010b).
Remarque
De novo transcriptome assembler for very short reads
Remarque Run Unix # oases | Run Web # |
Version | MAJ | OBO-Edit | | |
1.101 | 2008-12-05 | Download | Doc |
OBO-Edit is an ontology editor for the OBO format. The OBO format was originally defined for the Gene Ontology and is spreading through the bioinformatics community. A few dozen ontologies in OBO format are available and can be edited with OBO-Edit.
Remarque Run Unix # oboedit | Run Web # |
Version | MAJ | ocount | | |
0.4 | 2008-02-07 | Download | Doc |
OCOUNT is a fast C command-line utility that has been written in the course of TETRA's development. It counts oligonucleotides in DNA sequences and computes Markov-Model-based z-scores.
Remarque Run Unix # ocount | Run Web # |
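As a rough illustration of oligonucleotide counting, the Python sketch below counts overlapping k-mers and derives an expected tetranucleotide count from a maximal-order Markov model, E(w1w2w3w4) = N(w1w2w3) * N(w2w3w4) / N(w2w3). Whether OCOUNT uses exactly this estimator is an assumption, the function names are hypothetical, and the variance term needed for a full z-score is left out because it is not described here.

```python
from collections import Counter


def count_oligos(sequence, k):
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if set(word) <= set("ACGT"):
            counts[word] += 1
    return counts


def expected_count(word, counts_km1, counts_km2):
    """Markov-model expectation from the (k-1)-mer and (k-2)-mer counts."""
    middle = counts_km2[word[1:-1]]
    if middle == 0:
        return 0.0
    return counts_km1[word[:-1]] * counts_km1[word[1:]] / middle


seq = "ACGTACGTAC"
tetra = count_oligos(seq, 4)
tri = count_oligos(seq, 3)
di = count_oligos(seq, 2)
print(tetra["ACGT"], expected_count("ACGT", tri, di))
```

Here ACGT is observed twice and its expected count is also 2.0, so the word is neither over- nor under-represented in this toy sequence.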
Opera (Optimal Paired-End Read Assembler) is a sequence assembly program.
Remarque To cite Opera please use the following citation:
Song Gao, Wing-Kin Sung, Niranjan Nagarajan. Opera: reconstructing optimal genomic scaffolds with high-throughput paired-end sequences. Journal of Computational Biology, Sept. 2011, doi:10.1089/cmb.2011.0170.
Run Unix # opera
OR
opera
| Run Web # |
OrthoMCL is a program that builds clusters of orthologs from multi-FASTA files containing CDSs.
Remarque Change to the directory containing the data and run: orthomcl.pl --mode 1 --fa_files "Ath.fa,Hsa.fa,Sce.fa" Run Unix # orthomcl.pl | Run Web # |
Otterlace is an interactive, graphical client, which uses a local acedb database with Zmap and perl/Tk tools to curate genomic annotation. Annotation is stored in an extended Ensembl schema (the "otter" database), which presents the annotator with contiguous regions of a chromosome. The acedb database provides local persistent storage, so that if the software or desktop machine crashes, reboots or is exited, the editing session can be recovered. Since all communication goes through the Sanger web server, annotators can work wherever there is a network connection.
Remarque Run Unix # otterlace | Run Web # |
PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. ANSI C source codes are distributed for UNIX/Linux/Mac OSX, and executables are provided for MS Windows. PAML is not good for tree making. It may be used to estimate parameters and test hypotheses to study the evolutionary process, when you have reconstructed trees using other programs such as PAUP*, PHYLIP, MOLPHY, PhyML, RaxML, etc.
Remarque Run Unix # #baseml (basemlg codeml pamp evolver yn00 chi2) | Run Web # |
Version | MAJ | pandoc | | |
1.9.4.1 | 2016-02-23 | Download | Doc |
Pandoc is a free and open-source software document converter, widely used as
a writing tool and as a basis for publishing workflows.
Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, OPML, Emacs Org-Mode, Txt2Tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to:
HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides.
Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
Ebooks: EPUB version 2 or 3, FictionBook2
Documentation formats: DocBook, GNU TexInfo, Groff man pages, Haddock markup
Page layout formats: InDesign ICML
Outline formats: OPML
TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
PDF via LaTeX
Lightweight markup formats: Markdown (including CommonMark), reStructuredText, AsciiDoc, MediaWiki markup, DokuWiki markup, Emacs Org-Mode, Textile
Custom formats: custom writers can be written in Lua.
Remarque Run Unix # pandoc [OPTIONS] [FILES]
| Run Web # |
PatScan is a pattern matcher which searches protein or nucleotide (DNA, RNA, tRNA etc.) sequence archives for instances of a pattern which you input.
Remarque patscan pat_file < input_file Run Unix # patscan | Run Web # |
Version | MAJ | pfam_scan.pl | | |
| 2012-11-29 | Download | Doc |
pfam_scan.pl - search protein fasta sequences against the Pfam
library of HMMs.
Remarque Run Unix # pfam_scan.pl -fasta -dir /usr/local/genome/PfamScan/databases
| Run Web # |
The pftools package is a collection of experimental programs for manipulating the generalized profile format; it implements the PROSITE search methods. The available commands are: gtop, pfsearch, pfscan, psa2msa, pfmake, pfw, ptoh, htop, pfscale, pftof.
Remarque
PHAST is a freely available software package for comparative and evolutionary genomics. It consists of about half a dozen major programs, plus more than a dozen utilities for manipulating sequence alignments, phylogenetic trees, and genomic annotations. For the most part, PHAST focuses on two kinds of applications: the identification of novel functional elements, including protein-coding exons and evolutionarily conserved sequences; and statistical phylogenetic modeling, including estimation of model parameters, detection of signatures of selection, and reconstruction of ancestral sequences. It consists of over 60,000 lines of C code.
Remarque Run Unix # phast | Run Web # |
Version | MAJ | phd2fasta | | |
0.990622.f | 2005-07-22 | Download | Doc |
Phd2fasta reads phd files and writes sequence and quality value FASTA files, which phrap and cross_match need as input. Phred and consed write sequence and quality value information in 'phd' output files. A phd file contains information in a header, the called bases, the base quality values, and the base call trace locations.
Remarque Run Unix # phd2fasta -id ../phd_dir -os fasta_seq -oq fasta_seq.qual | Run Web # |
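The structure of a phd file described above (header, called bases, quality values, trace locations) can be illustrated with a small Python reader that produces the two FASTA-style records phrap expects. This is a simplified sketch, not phd2fasta itself: it only handles the `BEGIN_SEQUENCE` and `BEGIN_DNA`/`END_DNA` markers, ignores the comment header, and the sample record is invented.

```python
def parse_phd(text):
    """Return (name, bases, qualities) from the text of one phd file."""
    name = None
    bases = []
    qualities = []
    in_dna = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("BEGIN_SEQUENCE"):
            name = line.split(None, 1)[1]
        elif line == "BEGIN_DNA":
            in_dna = True
        elif line == "END_DNA":
            in_dna = False
        elif in_dna and line:
            # Each DNA line holds: base, quality value, trace position.
            parts = line.split()
            bases.append(parts[0])
            qualities.append(int(parts[1]))
    return name, "".join(bases), qualities


def to_fasta_records(name, bases, qualities):
    """Build the sequence record and the matching quality record."""
    seq_record = ">%s\n%s\n" % (name, bases)
    qual_record = ">%s\n%s\n" % (name, " ".join(str(q) for q in qualities))
    return seq_record, qual_record


# Invented miniature phd record for illustration.
sample = """BEGIN_SEQUENCE read001
BEGIN_COMMENT
CHROMAT_FILE: read001
END_COMMENT
BEGIN_DNA
a 9 6
c 32 18
g 40 30
t 12 41
END_DNA
END_SEQUENCE
"""
name, bases, quals = parse_phd(sample)
print(to_fasta_records(name, bases, quals)[0], end="")
```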
PHENIX is a software suite for the automated determination of macromolecular structures using X-ray crystallography and other methods.
Remarque Citing PHENIX:
PHENIX: a comprehensive Python-based system for macromolecular structure solution. P. D. Adams, P. V. Afonine, G. Bunkóczi, V. B. Chen, I. W. Davis, N. Echols, J. J. Headd, L.-W. Hung, G. J. Kapral, R. W. Grosse-Kunstleve, A. J. McCoy, N. W. Moriarty, R. Oeffner, R. J. Read, D. C. Richardson, J. S. Richardson, T. C. Terwilliger and P. H. Zwart. Acta Cryst. D66, 213-221 (2010). Run Unix # phenix | Run Web # |
A combined transmembrane topology and signal peptide prediction method.
Remarque http://www.ncbi.nlm.nih.gov/pubmed/15111065?dopt=Abstract Run Unix # phobius.pl [options] [infile] | Run Web # |
Version | MAJ | phrap | | |
1.090518 | 2010-01-18 | Download | Doc |
phrap is a program for assembling shotgun DNA sequence data. Among other features, it allows use of the entire read and not just the trimmed high quality part, it uses a combination of user-supplied and internally computed data quality information to improve assembly accuracy in the presence of repeats, it constructs the contig sequence as a mosaic of the highest quality read segments rather than a consensus, it provides extensive assembly information to assist in trouble-shooting assembly problems, and it handles large datasets.
Remarque Works with cross_match and swat, loco and cluster. The manyreads version can read more traces. Run Unix # phrap | Run Web # |
Version | MAJ | phrapview | | |
| | Download | Doc |
Viewer for assembly results produced by phrap.
Remarque
Phred reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the base calls and quality values to output files. Phred can read trace data from chromatogram files in the SCF, ABI, and ESD formats. It automatically determines the file format, and whether the chromatogram file was compressed using gzip, bzip2, or UNIX compress. After calling bases, phred writes the sequences to files in either FASTA format, the format suitable for XBAP, PHD format, or the SCF format. Quality values for the bases are written to FASTA format files or PHD files, which can be used by the phrap sequence assembly program in order to increase the accuracy of the assembled sequence. phred, phrap, and consed are Unix programs that work as a group for analysis of new DNA sequences. They do the following:
phred: Base calling and quality assignments
phrap: Contig formation and new quality assignments
consed: Visual X-Windows graphic interface, to view and edit alignments and contigs, and to view the original traces
Remarque Run Unix # phred or phredPhrap | Run Web # |
Version | MAJ | phredPhrap | | |
030415 | | Download | Doc |
phredPhrap runs the following steps:
It runs phred on all *new* reads (reads for which there is no phd file).
It runs determineReadTypes.perl so consed, autofinish, and phrap will understand your read naming convention.
Then it runs crossmatch to screen them for vector.
Then it runs phd2fasta to create 2 fasta files (one containing read bases and one containing read quality). These are of the highest versions of each read (in case any editing has been done).
It runs phrap.
It runs transferConsensusTags to transfer any consensus tags from the newest old ace file to the one phrap created in the previous step.
It runs tagRepeats.perl to tag any common repeats (such as ALU) that you want to have automatically tagged for the benefit of consed users.
See README.txt, "INSTALLING CONSED". Typically, you just type: phredPhrap
Within the project, there are 3 directories: chromat_dir (with the chromats), phd_dir (with the phd files) and edit_dir (with the ace files and other files). You type "phredPhrap" from within edit_dir.
Remarque Front end for the phred/phrap suite. Run Unix # phredPhrap | Run Web # |
Phusion Assembler --- Phusion is a software package for assembling genome sequences from whole genome shotgun (WGS) reads. The Phusion assembler takes WGS reads, mostly paired with known insert sizes, as input, along with a quality score assigned to each base, and produces a set of supercontigs (scaffolds).
Remarque
PHYLIP is a free package of programs for inferring phylogenies.
Remarque
PhyloBayes is a Bayesian Monte Carlo Markov Chain (MCMC) sampler for phylogenetic reconstruction and molecular dating using protein and nucleic acid alignments. Compared to other phylogenetic MCMC samplers, the main distinguishing feature of PhyloBayes is the use of nonparametric methods for modelling site-specific features of sequence evolution.
Remarque
Phylo_win (program from Manolo Gouy's group in Lyon) provides a graphical interface for phylogenetics.
Remarque Run Unix # phylo_win | Run Web # |
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm to perform Nearest Neighbor Interchanges (NNIs), in order to improve a reasonable starting tree topology. Since the original publication (Guindon and Gascuel 2003), PhyML has been widely used (>1,250 citations in ISI Web of Science), due to its simplicity and a fair accuracy/speed compromise. In the meantime, research around PhyML has continued. We designed an efficient algorithm to search the tree space using Subtree Pruning and Regrafting (SPR) topological moves (Hordijk and Gascuel 2005), and proposed a fast branch test based on an approximate likelihood ratio test (Anisimova and Gascuel 2006). However, these novelties were not included in the official version of PhyML, and we found that improvements were still needed in order to make them effective in some practical cases. PhyML 3.0 achieves this task. It implements new algorithms to search the space of tree topologies with user-defined intensity. A non-parametric, Shimodaira-Hasegawa-like branch test is also available. The program provides a number of new evolutionary models and its interface was entirely re-designed. We tested PhyML 3.0 on a large collection of real data sets to ensure that the new version is stable, ready-to-use and still reasonably fast and accurate.
Remarque Run Unix # phyml | Run Web # |
A set of Java command line tools for manipulating high-throughput sequencing (HTS) data and formats.
Picard is implemented using the HTSJDK Java library, which supports
access to common file formats, such as SAM and VCF, used for
high-throughput sequencing data.
Remarque Run Unix # picard -h or
PicardCommandLine [-h]
| Run Web # |
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution from next-gen sequence data. It uses a pattern growth approach to identify the breakpoints of these variants from paired-end short reads.
Remarque Cite Pindel:
Ye K, Schulz MH, Long Q, Apweiler R, Ning Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics. 2009 Nov 1;25(21):2865-71. Epub 2009 Jun 26. Run Unix # Usage: pindel -f -p
[and/or -i bam_configuration_file]
-c -o
| Run Web # |
PIPITS is an automated pipeline for analyses of fungal internal transcribed
spacer (ITS) sequences from the Illumina sequencing platform.
PIPITS is designed to work best on Bio-Linux
(http://environmentalomics.org/bio-linux/) and Ubuntu. Unfortunately, it's
NOT supported on Windows or Mac.
If you are using Bio-Linux, most of the dependencies are already on
Bio-Linux. Otherwise, you will have to set up the dependencies yourself. If
you are using Ubuntu, then instructions on how to set up dependencies are
described below (1.8).
Remarque Citation
Hyun S. Gweon, Anna Oliver, Joanne Taylor, Tim Booth, Melanie Gibbs, Daniel S. Read, Robert I. Griffiths and Karsten Schonrogge, PIPITS: an automated pipeline for analyses of fungal internal transcribed spacer sequences from the Illumina sequencing platform, Methods in Ecology and Evolution, DOI: 10.1111/2041-210X.12399 Run Unix # pipits_env | Run Web # |
PLAST : Parallel Local Alignment Search Tool
Remarque Run Unix # plastall | Run Web # |
Platanus is a de novo assembler designed to assemble high-throughput data.
It can handle highly heterozygotic samples. The following is the assembly
outline. First, it constructs contigs using an algorithm based on the de Bruijn graph. Second, the order of contigs is determined according to paired-end (mate-pair) data and scaffolds are constructed. Finally, paired-end reads localized on gaps in scaffolds are assembled and the gaps are closed.
Remarque To reference the Platanus assembler, please cite :
Kajitani R. et al. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Research 24:1384-95. Run Unix # Usage: platanus Command [options]
Command: assemble, scaffold, gap_close
| Run Web # |
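The first step of the outline above, contig construction on a de Bruijn graph, can be illustrated with a minimal Python sketch. It only builds the graph (each k-mer becomes an edge between its prefix and suffix of length k-1); it is not Platanus code and leaves out everything that makes a real assembler work, such as error correction, coverage handling, and heterozygosity resolution. The function name and reads are made up for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two overlapping toy reads; with k = 3 they form one unbranched path.
graph = de_bruijn_graph(["ACGTC", "CGTCA"], 3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Because the two reads overlap, their k-mers chain into the single path AC, CG, GT, TC, CA, which spells the contig ACGTCA.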
PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
Remarque For documentation, citation & bug-report instructions: http://pngu.mgh.harvard.edu/purcell/plink/ Run Unix # plink | Run Web # |
Version | MAJ | polya_svm | | |
2.2 | 2013-01-29 | Download | Doc |
This program takes a file containing DNA/RNA sequences in the FASTA format as input, and 1) makes prediction for putative mRNA polyadenylation sites [or poly(A) sites] and/or 2) generates results indicating the occurrences of different cis-element
Remarque Run Unix # polya_svm.pl | Run Web # |
PolyPhred is a program that compares fluorescence-based sequences across traces obtained from different individuals to identify heterozygous sites for single nucleotide substitutions.
Remarque Run Unix # polyphred | Run Web # |
poretools: a toolkit for working with nanopore sequencing data from Oxford Nanopore.
The MinION (TM) from Oxford Nanopore Technologies (ONT) is the first nanopore sequencer to be commercialised and is now available to early-access users. The MinION (TM) is a USB-connected, portable nanopore sequencer which permits real-time analysis of streaming event data. Currently, the research community lacks a standardized toolkit for the analysis of nanopore datasets.
Remarque Run Unix # poretools_env | Run Web # |
Primer3 is a complete rewrite of the original PRIMER programs (Primer 0.5), written by Steve Lincoln, Mark Daly, and Eric Lander. See DIFFERENCES FROM EARLIER VERSIONS for a discussion of how Primer3 differs from its predecessors, Primer 0.5 and Primer v2. Primer3 picks primers for PCR reactions, considering as criteria:
oligonucleotide melting temperature, size, GC content, and primer-dimer possibilities,
PCR product size,
positional constraints within the source sequence, and
miscellaneous other constraints.
Remarque
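Two of the criteria listed above, GC content and melting temperature, can be illustrated with a short Python sketch. The melting temperature here uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides and is not the thermodynamic model Primer3 applies; the function names are hypothetical.

```python
def gc_content(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 100.0 * gc / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


primer = "ACGTACGTACGTACGTACGT"
print(len(primer), gc_content(primer), wallace_tm(primer))
```

For this 20-mer the sketch reports 50% GC and an estimated Tm of 60, the kind of values a primer picker would compare against user-set minimum and maximum bounds.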
Version | MAJ | prinseq | | |
0.17.1 | 2013-08-20 | Download | Doc |
PRINSEQ can be used to filter, reformat, or trim your genomic and metagenomic sequence data. It generates summary statistics of your [...] graphical and tabular format.
Remarque Run Unix # prinseq-lite.pl -h | Run Web # |
PROBCONS is a novel tool for generating multiple alignments of protein sequences. Using a combination of probabilistic modeling and consistency-based alignment techniques, PROBCONS has achieved the highest accuracies of all alignment methods to date. On the BAliBASE benchmark alignment database, alignments produced by PROBCONS show statistically significant improvement over current programs, containing an average of 7% more correctly aligned columns than those of T-Coffee, 11% more correctly aligned columns than those of CLUSTAL W, and 14% more correctly aligned columns than those of DIALIGN.
Remarque Publications using the PROBCONS tool should cite: Do, C.B., Mahabhashyam, M.S.P., Brudno, M., and Batzoglou, S. 2005. PROBCONS: Probabilistic Consistency-based Multiple Sequence Alignment. Genome Research 15:330-340. Run Unix # probcons [OPTION]... [MFAFILE]... | Run Web # |
Version | MAJ | ProbeMatch | | |
- | 2010-05-11 | Download | Doc |
ProbeMatch is a sequence alignment program that finds sequence alignments for short DNA sequences (36-50 bp). Unlike other programs such as eland and soap, which perform ungapped alignment allowing up to 2 substitutions, ProbeMatch performs *gapped* alignment, allowing up to 3 errors including substitutions, insertions, and deletions.
Remarque Run Unix # probematch [options] or # probematch --help | Run Web # |
Version | MAJ | procheck | | |
3.5.4 | 2007-05-21 | Download | Doc |
Remarque
Version | MAJ | prodigal | | |
2.60 | 2013-03-26 | Download | Doc |
Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program developed at Oak Ridge National Laboratory and the University of Tennessee. Key features of Prodigal include:
Speed: Prodigal is an extremely fast gene recognition tool (written in very vanilla C). It can analyze an entire microbial genome in 30 seconds or less.
Accuracy: Prodigal is a highly accurate gene finder. It correctly locates the 3' end of every gene in the experimentally verified Ecogene data set (except those containing introns). It possesses a very sophisticated ribosomal binding site scoring system that enables it to locate the translation initiation site with great accuracy (96% of the 5' ends in the Ecogene data set are located correctly).
Specificity: Prodigal's false positive rate compares favorably with other gene identification programs, and usually falls under 5%.
GC-Content Indifferent: Prodigal performs well even in high GC genomes, with over a 90% perfect match (5'+3') to the Pseudomonas aeruginosa curated annotations.
Metagenomic Version: Prodigal can run in metagenomic mode and analyze sequences even when the organism is unknown.
Ease of Use: Prodigal can be run in one step on a single genomic sequence or on a draft genome containing many sequences. It does not need to be supplied with any knowledge of the organism, as it learns all the properties it needs to on its own.
Remarque Prodigal Reference: Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11(1):119. Run Unix # prodigal -h | Run Web # |
Prokka is a software tool for the rapid annotation of prokaryotic genomes. A typical 4 Mbp genome can be fully annotated in less than 10 minutes on a quad-core computer, and the tool scales well to 32-core SMP systems. It produces GFF3, GBK and SQN files that are ready for editing in Sequin and ultimately for submission to GenBank/DDBJ/ENA.
Remarque Run Unix # prokka [options] | Run Web # |
Version | MAJ | PROSE | | |
| | Download | Doc |
The relational database PROSE contains protein sequences from Swissprot and Trembl
Remarque
PROTTEST3 is a high-performance computing program for selecting the model of protein evolution that best fits a given set of aligned sequences. This Java program uses the PhyML program (for maximum likelihood calculations and parameter optimization), the PAL library for handling trees, and the ALTER library for reading alignment formats. The empirical models included are WAG, LG, mtREV, Dayhoff, DCMut, JTT, VT, Blosum62, CpREV, RtREV, MtMam, MtArt, HIVb/HIVw and FLU, plus +I (invariable sites), +G (rate heterogeneity among sites) and +F (observed amino acid frequencies). ProtTest uses the Akaike Information Criterion (AIC) and other statistics (AICc, BIC and DT) to find which of the candidate models best fits the data at hand. It also implements the calculation of model-averaged phylogenies.
Remarque Citation:
Darriba D, Taboada GL, Doallo R, Posada D. In press. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. Run Unix # runProtTestHPC or runXProtTestHPC | Run Web # |
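The criteria named in the ProtTest description can be illustrated with their standard textbook formulas. The sketch below is not ProtTest code: the function names, the candidate log-likelihoods and parameter counts are hypothetical, and it assumes the sample size is supplied by the caller (for example the alignment length).

```python
import math


def information_criteria(log_likelihood, n_params, sample_size):
    """Return (AIC, AICc, BIC) for one fitted model.

    Standard definitions, with k free parameters and n observations:
      AIC  = 2k - 2 lnL
      AICc = AIC + 2k(k + 1) / (n - k - 1)   (requires n > k + 1)
      BIC  = k ln(n) - 2 lnL
    """
    k, n = n_params, sample_size
    aic = 2 * k - 2 * log_likelihood
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_likelihood
    return aic, aicc, bic


def best_model(candidates, sample_size, criterion="AIC"):
    """Pick the candidate with the lowest score under the chosen criterion.

    candidates maps a model name to (log_likelihood, n_params).
    """
    index = {"AIC": 0, "AICc": 1, "BIC": 2}[criterion]
    scores = {
        name: information_criteria(lnl, k, sample_size)[index]
        for name, (lnl, k) in candidates.items()
    }
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical log-likelihoods and parameter counts, for illustration only.
    fits = {"JTT": (-5230.4, 37), "WAG+G": (-5198.7, 38), "LG+I+G": (-5195.9, 39)}
    print(best_model(fits, sample_size=250, criterion="AIC"))
```

Lower scores are better for all three criteria; they differ only in how strongly extra parameters are penalized.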
Version | MAJ | psicov | | |
1.05 | 2012-07-26 | Download | Doc |
Accurate Contact Prediction from large protein alignments
Remarque Run Unix # psicov [options] alnfile | Run Web # |
PSIPRED is a simple and reliable secondary structure prediction method, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST (Position Specific Iterated - BLAST). Version 2.0 of PSIPRED includes a new algorithm which averages the output from up to 4 separate neural networks in the prediction process to further increase prediction accuracy.
Remarque Uses the output of psiblast. Run Unix # psipred | Run Web # |
Version | MAJ | PSMC | | |
0.6.5 | 2016-04-07 | Download | Doc |
This software package infers population size history from a diploid sequence
using the Pairwise Sequentially Markovian Coalescent (PSMC) model.
Remarque Run Unix # psmc [options] input.txt | Run Web # |
PSORT is a computer program for the prediction of protein localization sites in cells. It takes as input an amino acid sequence and its source of origin, e.g., Gram-negative bacteria. It then analyzes the input sequence by applying the stored rules for various sequence features of known protein sorting signals. Finally, it reports the possibility for the input protein to be localized at each candidate site, with additional information.
Remarque Run Unix # psort | Run Web # |
Version | MAJ | pymol | | |
0.99 | 2009-04-21 | Download | Doc |
PyMOL is molecular visualization software coupled with a Python interpreter. It provides real-time visualization as well as fast, high-quality generation of animations and images of molecular assemblies.
Remarque Run Unix # pymol | Run Web # |
Version | MAJ | pynast | | |
0.1 | 2012-07-23 | Download | Doc |
PyNAST: Python Nearest Alignment Space Termination tool
PyNAST is a reimplementation of the NAST sequence aligner, which has become a popular tool for adding new 16s rDNA sequences to existing 16s rDNA alignments. This reimplementation is more flexible, faster, and easier to install and maintain than the original NAST implementation. PyNAST is built using the PyCogent Bioinformatics Toolkit.
The first versions of PyNAST (through PyNAST 1.0) were written to exactly match the results of the original NAST algorithm. Beginning with the post PyNAST 1.0 development code, PyNAST no longer exactly matches the NAST output but is instead focused on getting better alignments. Users who wish to exactly match the results of NAST should download PyNAST 1.0.
Remarque PyNAST: a flexible tool for aligning sequences to a template alignment. J. Gregory Caporaso, Kyle Bittinger, Frederic D. Bushman, Todd Z. DeSantis, Gary L. Andersen, and Rob Knight. January 15, 2010, DOI 10.1093/bioinformatics/btp636. Bioinformatics 26: 266-267.
Run Unix # pynast [options] {-i input_fp -t template_fp} or pynast -h | Run Web # |
QIIME (pronounced "chime") stands for Quantitative Insights Into Microbial Ecology. QIIME is an open source software package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics. QIIME has been applied to single studies based on billions of sequences from thousands of samples.
Remarque Run Unix # qiime_env | Run Web # |
QPDF is a command-line program that does structural, content-preserving transformations on PDF files. It could have been called something like pdf-to-pdf. It also provides many useful capabilities to developers of PDF-producing software or for people who just want to look at the innards of a PDF file to learn more about how they work.
Remarque Run Unix # qpdf [options] infile outfile | Run Web # |
Quake is a package to correct substitution sequencing errors in experiments with deep coverage (e.g. >15X), specifically intended for Illumina sequencing reads. Quake adopts the k-mer error correction framework, first introduced by the EULER genome assembly package. Unlike EULER and similar programs, Quake utilizes a robust mixture model of erroneous and genuine k-mer distributions to determine where errors are located. Quake then uses read quality values and learns the nucleotide-to-nucleotide error rates to determine what types of errors are most likely. This leads to more corrections and greater accuracy, especially with respect to avoiding mis-corrections, which create false sequence dissimilar to anything in the original genome sequence from which the read was taken.
Remarque Kelley DR, Schatz MC, Salzberg SL. Quake: quality-aware detection and correction of sequencing errors. Genome Biology 11:R116 2010.
(http://genomebiology.com/2010/11/11/R116/abstract) Run Unix # quake.py --help | Run Web # 0.3.5 |
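The k-mer framework mentioned in the Quake description rests on counting k-mers across all reads: k-mers seen many times are probably genuine, while rare ones probably contain an error. The sketch below shows only that counting step. It is not Quake's code: the function names are hypothetical, and the fixed `min_count` cutoff is a simplifying assumption standing in for Quake's mixture model and quality-aware weighting.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every overlapping k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def untrusted_kmer_positions(read, counts, k, min_count):
    """Start positions in `read` of k-mers seen fewer than `min_count` times."""
    return [
        i
        for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < min_count
    ]


if __name__ == "__main__":
    # Five correct copies of a read and one copy with a single substitution.
    reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
    counts = count_kmers(reads, k=4)
    print(untrusted_kmer_positions("ACGTTCGT", counts, k=4, min_count=2))
```

The run of low-count k-mers overlaps the substituted base, which is what lets a corrector localize the error.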
quantiNEMO is an individual-based, genetically explicit stochastic
simulation program. It was developed to investigate the effects of
selection, mutation, recombination, and drift on quantitative traits with
varying architectures in structured populations connected by migration and
located in a heterogeneous habitat. quantiNEMO is highly flexible at various
levels: population, selection, trait(s) architecture, genetic map for QTL
and/or markers, environment, demography, mating system, etc.
Remarque Run Unix # quantiNemo | Run Web # |
QUality ASsessment Tool for Genome Assembly
QUAST evaluates the quality of genome assemblies by computing various metrics and generating reports.
Remarque Citation : Alexey Gurevich, Vladislav Saveliev, Nikolay Vyahhi and Glenn Tesler,
QUAST: quality assessment tool for genome assemblies,
Bioinformatics (2013) 29 (8): 1072-1075.
doi: 10.1093/bioinformatics/btt086
First published online: February 19, 2013 Run Unix # quast.py [options] or metaquast.py [options] | Run Web # |
Version | MAJ | QuickTree | | |
1.1 | 2006-02-21 | Download | Doc |
QuickTree is a program for the rapid reconstruction of phylogenies by the Neighbor-Joining method. For details, see the article published in the journal 'Bioinformatics' (18:1546-1547).
Remarque Run Unix # quicktree | Run Web # |
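As an illustration of the Neighbor-Joining method named in the QuickTree entry, the sketch below computes the standard selection criterion Q(i,j) = (n-2)·d(i,j) − Σ_k d(i,k) − Σ_k d(j,k) and returns the pair to join first. It covers a single step of the textbook algorithm, not QuickTree's optimized implementation; the function name and the example distance matrix are hypothetical.

```python
def nj_pair_to_join(names, dist):
    """Return ((name_i, name_j), q) for the pair minimizing the NJ Q-criterion.

    dist is a symmetric matrix (list of lists) with zeros on the diagonal.
    """
    n = len(names)
    totals = [sum(row) for row in dist]  # sum of distances from each taxon
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - totals[i] - totals[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (names[i], names[j]), q
    return best_pair, best_q


if __name__ == "__main__":
    taxa = ["a", "b", "c", "d", "e"]
    d = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pair_to_join(taxa, d))
```

Note that the pair with the smallest raw distance (d and e) is not the one selected: the criterion corrects each distance by how far both taxa are from everything else.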
Version | MAJ | quip | | |
1.1.4 | 2013-02-21 | Download | Doc |
Quip compresses next-generation sequencing data with extreme prejudice. It supports input and output in the FASTQ and SAM/BAM formats, compressing large datasets to as little as 15% of their original size.
Remarque Compression of next-generation sequencing reads aided by highly efficient de novo assembly
Daniel C. Jones; Walter L. Ruzzo; Xinxia Peng; Michael G. Katze — Nucleic Acids Research 2012; doi: 10.1093/nar/gks754 Run Unix # quip | Run Web # |
R is a language and environment for statistical computing and graphics. For the analysis of genomic data, R offers statistical packages for clustering, linear models, ANOVA, and more (downloaded from CRAN). Other packages dedicated to microarray analysis are also available (downloaded from CRAN). The R-based project devoted to bioanalysis is Bioconductor (http://www.bioconductor.org/), for the analysis and comprehension of genomic data. The packages anapuce and varmixt, developed by the Statistique et génome team (OMIP department, INA P-G & INRA - http://www.inapg.fr/ens_rech/mathinfo/recherche/mathematique/outil.html) for differential analysis, are also available on the platform.
Remarque To get help: # help.start(browser="mozilla"), or any other browser that is not already in use (open).
Version | MAJ | rainbow | | |
2.0 | 2012-09-10 | Download | Doc |
Rainbow package consists of several programs used for RAD-seq related
clustering and de novo assembly.
Remarque Run Unix # rainbow [options] | Run Web # |
Software for looking at macromolecular structure and its relation to function
Remarque Run Unix # rasmol | Run Web # |
RATT is software to transfer annotation from a reference (annotated) genome to an unannotated query genome.
Remarque Run Unix # start.ratt.sh | Run Web # |
RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum
Likelihood [1] based inference of large phylogenetic trees. It was originally derived from fastDNAml,
which in turn was derived from Joe Felsenstein’s dnaml, which is part of the PHYLIP [2] package.
Remarque If you use RAxML please always cite the following paper: Alexandros Stamatakis : “RAxML-VI-HPC:
Maximum Likelihood-based Phylogenetic Analyses with Thousands of Taxa and Mixed Models”, Bioinformatics
22(21):2688–2690, 2006 [4]. Run Unix # raxmlHPC -h or raxmlHPC-MPI -h | Run Web # |
Ray is a parallel de novo genome assembler that utilises the message-passing interface everywhere and is implemented using peer-to-peer communication.
Remarque Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies.
Sébastien Boisvert, François Laviolette, and Jacques Corbeil.
Journal of Computational Biology (Mary Ann Liebert, Inc. publishers).
November 2010, 17(11): 1519-1533.
doi:10.1089/cmb.2009.0238 Run Unix # Ray -help | Run Web # |
Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy.
Remarque Example: classifier testQuerySeq.fasta mon_result train052008/rRNAClassifier.properties --- The train052008 directory must first be present in your home directory (http://downloads.sourceforge.net/rdp-classifier/RDPClassifier_train052008.tar.gz) (http://rdp.cme.msu.edu/tmp_download/train3.tar.gz) Run Unix # classifier | Run Web # |
Read and reformat biosequences
Remarque Run Unix # readseq2 [option] | Run Web # |
ReAS: Recovery of Ancestral Sequences for Transposable Elements from the Unassembled Reads of a Whole Genome Shotgun
Remarque http://www.ploscompbiol.org/article/info:doi%2F10.1371%2Fjournal.pcbi.0010043
Remarque To look up a species, for example bos_taurus: /usr/local/genome/RepeatMasker/util/queryTaxonomyDatabase.pl -species "bos taurus" Run Unix # RepeatMasker | Run Web # |
RepeatScout is a tool to discover repetitive substrings in DNA.
Remarque If you use RepeatScout, please cite the following paper: Price A.L., Jones N.C. and Pevzner P.A. 2005. De novo identification of repeat families in large genomes. To appear in Proceedings of the 13th Annual International Conference on Intelligent Systems for Molecular Biology (ISMB-05). Detroit, Michigan.
Run Unix # *1/ build_lmer_table -l -sequence -freq | Run Web # |
Version | MAJ | reptile | | |
2.0 | 2012-05-03 | Download | Doc |
Reptile is a software developed in C++ for correcting sequencing errors in short reads from next-gen sequencing platforms.
Remarque Run Unix # reptile-omp | Run Web # |
R'HOM (Recherche de régions HOMogènes) is a program for segmenting DNA sequences into regions of homogeneous composition using hidden Markov models. The user chooses the number of distinct composition types and the word length to take into account. The parameters are then estimated by maximum likelihood (EM algorithm), and the sequence is finally segmented with the forward-backward algorithm. R'HOM was initially developed during Florence Muri's doctoral thesis and has since been largely reimplemented.
Remarque Run Unix # rhom.em | Run Web # |
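The segmentation step described for R'HOM can be illustrated with a minimal forward-backward pass that labels each base with its most probable hidden state. This is a sketch, not R'HOM's implementation: the function names and the two-state AT-rich/GC-rich parameters are hypothetical, the parameters are fixed by hand rather than estimated by EM, and emissions depend on single bases only (word length 1).

```python
def forward_backward(seq, init, trans, emit):
    """Posterior probabilities P(state at t = s | seq) for a discrete HMM.

    init[s], trans[r][s] and emit[s][symbol] are probabilities.
    Forward values are rescaled at every position to avoid underflow.
    """
    n_states, length = len(init), len(seq)
    fwd, scales = [], []
    for t, sym in enumerate(seq):
        if t == 0:
            row = [init[s] * emit[s][sym] for s in range(n_states)]
        else:
            row = [
                emit[s][sym]
                * sum(fwd[-1][r] * trans[r][s] for r in range(n_states))
                for s in range(n_states)
            ]
        scale = sum(row)
        scales.append(scale)
        fwd.append([x / scale for x in row])
    # Backward pass, divided by the same scaling factors.
    bwd = [[1.0] * n_states for _ in range(length)]
    for t in range(length - 2, -1, -1):
        nxt = seq[t + 1]
        bwd[t] = [
            sum(trans[s][r] * emit[r][nxt] * bwd[t + 1][r] for r in range(n_states))
            / scales[t + 1]
            for s in range(n_states)
        ]
    return [[fwd[t][s] * bwd[t][s] for s in range(n_states)] for t in range(length)]


def segment(seq, init, trans, emit):
    """Assign each position the state with the highest posterior probability."""
    posteriors = forward_backward(seq, init, trans, emit)
    return [max(range(len(init)), key=lambda s: row[s]) for row in posteriors]


if __name__ == "__main__":
    # Hypothetical model: state 0 is AT-rich, state 1 is GC-rich.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    emit = [
        {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
    ]
    print(segment("AAAATTTTAAAAGGGGCCCCGGGG", init, trans, emit))
```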
Easy identification and removal of rRNA-like sequences.
The riboPicker tool can be used to automatically identify and efficiently remove rRNA-like sequences from metatranscriptomic and metagenomic datasets. It is easily configurable and provides a user-friendly interface.
Remarque Run Unix # ribopicker [options] -f -dbs ... | Run Web # |
Program for detecting words or motifs whose frequency is statistically exceptional in a biological sequence. (R'MES stands for Recherche de Mots Exceptionnels dans les Séquences.)
Remarque What is new compared with version 3.01. Major changes: - significant improvement in computation time for Gaussian approximations, whatever the order of the model, - removal of the constraint on the length of word-family names. Minor changes: - renaming of the threshold-selection options in the result-formatting tool (--minthresh and --maxthresh become --tmin and --tmax), - change in the presentation order of bias computation results (sorted by score rather than alphabetically). For any questions, contact Sophie.Schbath@jouy.inra.fr Run Unix # rmes [options] -s -o rmes --help | Run Web # |
Remarque Run Unix # rmesplot | Run Web # |
The SOLiD System Small RNA Analysis Pipeline Tool (RNA2MAP) can be used to perform whole genome analysis of color space RNA library reads. It consists of three major procedures: filtering, matching against miRBase sequences (Sanger), and matching against a reference genome.
Remarque
RNAmmer 1.2 predicts 5s/8s, 16s/18s, and 23s/28s ribosomal RNA in full genome sequences
Remarque Run Unix # rnammer [options] (man rnammer) | Run Web # |
RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data.
Remarque Comparative Analysis of RNA-Seq Alignment Algorithms and the RNA-Seq Unified Mapper (RUM) Gregory R. Grant, Michael H. Farkas, Angel Pizarro, Nicholas Lahens, Jonathan Schug, Brian Brunk, Christian J. Stoeckert Jr, John B. Hogenesch and Eric A. Pierce. Run Unix # RUM_runner.pl | Run Web # |
Version | MAJ | samToFastq | | |
1.62(1113) | 2012-02-17 | Download | Doc |
Extracts read sequences and qualities from the input SAM/BAM file and writes them to the output file in Sanger FASTQ format. In RC mode (enabled by default), if the read is aligned to the reverse strand of the genome, its sequence from the input SAM file is reverse-complemented before being written to FASTQ, in order to correctly restore the original read sequence as it was generated by the sequencer.
Remarque
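The RC mode described for samToFastq amounts to undoing the reverse-complement that aligners apply to reverse-strand reads. The sketch below shows that transformation on plain strings; it is not the tool's code. The function names are hypothetical, and reversing the quality string alongside the sequence is an assumption (the entry only mentions the sequence), made so that each quality value stays attached to its base.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def restore_read(seq, qual, is_reverse):
    """Return (sequence, qualities) as originally produced by the sequencer.

    For reverse-strand alignments the stored sequence is reverse-complemented
    back, and the quality string is reversed to keep it aligned with the bases.
    """
    if is_reverse:
        return seq.translate(_COMPLEMENT)[::-1], qual[::-1]
    return seq, qual


def fastq_record(name, seq, qual):
    """Format one four-line FASTQ record."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    seq, qual = restore_read("AACG", "IIH#", is_reverse=True)
    print(fastq_record("read1", seq, qual), end="")
```

In a SAM record, the reverse-strand condition corresponds to bit 0x10 of the FLAG field.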
SAM Tools provide various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
Remarque Run Unix # samtools [options] | Run Web # |
Version | MAJ | SAS | | |
9.2 | 2010-12-16 | Download | Doc |
SAS - Statistical Analysis System
Remarque
Scilab is scientific numerical computation software that provides a powerful development environment for scientific and engineering applications.
Remarque Run Unix # scilab | Run Web # |
Version | MAJ | scwrl3 | | |
3.0 | 2005-10-10 | Download | Doc |
SCWRL3.0 is a completely new version of the SCWRL program for prediction of protein side-chain conformations. SCWRL3.0 is based on a new algorithm based on graph theory that solves the combinatorial problem in side-chain prediction more rapidly than any other available program. SCWRL3.0 is more accurate than previous versions of SCWRL, while the new algorithm will allow for development of more sophisticated energy functions and for incorporation of side-chain flexibility around rotameric positions.
Remarque Run Unix # scwrl3 | Run Web # |
Version | MAJ | seaview | | |
4.2 | 20130-01-30 | Download | Doc |
SeaView is a graphical multiple sequence alignment editor. SeaView is able to read and write various alignment formats (NEXUS, MSF, CLUSTAL, FASTA, PHYLIP, MASE).
Remarque Run Unix # seaview | Run Web # |
Seq-Gen is a program that will simulate the evolution of nucleotide or amino acid sequences along a phylogeny, using common models of the substitution process. A range of models of molecular evolution are implemented including the general reversible model. State frequencies and other parameters of the model may be given and site-specific rate heterogeneity may also be incorporated in a number of ways. Any number of trees may be read in and the program will produce any number of data sets for each tree. Thus large sets of replicate simulations can be easily created. It has been designed to be a general purpose simulator that incorporates most of the commonly used (and computationally tractable) models of molecular sequence evolution.
Remarque Run Unix # seq-gen | Run Web # |
SeqMap is a tool for mapping large numbers of oligonucleotides to a genome. It is designed for finding all the places in a genome where an oligonucleotide could potentially come from. SeqMap can efficiently map tens of millions of short sequences to a genome of several billion nucleotides. During mapping, several mutations as well as insertions/deletions of nucleotide bases in the sequences can be tolerated and detected. Various input and output formats are supported, as well as many command-line options for tuning almost every step of the mapping process.
Remarque Publication: http://dx.doi.org/10.1093/bioinformatics/btn429 Run Unix # seqmap | Run Web # |
Seqtk is a fast and lightweight tool for processing sequences
in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ
files which can also be optionally compressed by gzip.
Remarque Run Unix # seqtk | Run Web # |
A DNA Sequence Submission and Update Tool
Remarque
Version | MAJ | SHOW | | |
20111109 | 2011-11-11 | Download | Doc |
SHOW (Structured HOMogeneity Watcher) allows flexible use of hidden Markov models. Users can build their own model, whose parameters can then be estimated by maximum likelihood with the EM algorithm. The model can then be used to make predictions with the forward-backward algorithm (posterior decoding) or with the Viterbi algorithm. It can also be used to simulate sequences. SHOW also implements a bacterial gene detector, in which case the user does not need to worry about the model or its parameters. SHOW has already been used to annotate published complete genomes.
Remarque Run Unix # show_viterbi # show2mugen.pl | Run Web # |
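Viterbi decoding, one of the two prediction modes listed for SHOW, finds the single most probable state path, whereas posterior decoding picks the most probable state at each position independently. The following is a minimal log-space sketch of the textbook algorithm, not SHOW's code; the function name and the two-state model are hypothetical, and all probabilities are assumed to be strictly positive.

```python
import math


def viterbi(seq, init, trans, emit):
    """Most probable state path for a non-empty sequence under a discrete HMM.

    init[s], trans[r][s] and emit[s][symbol] must be strictly positive
    probabilities, since the recursion works with their logarithms.
    """
    n_states = len(init)
    score = [math.log(init[s]) + math.log(emit[s][seq[0]]) for s in range(n_states)]
    backpointers = []
    for sym in seq[1:]:
        prev, score, pointers = score, [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s at this position.
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][sym]))
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical model: state 0 is AT-rich, state 1 is GC-rich.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    emit = [
        {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
    ]
    print(viterbi("AAAATTTTGGGGCCCC", init, trans, emit))
```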
Version | MAJ | showVenn | | |
1.0 | 2010-02-26 | Download | Doc |
This tool handles lists of identifiers in the form of a Venn diagram. It can thus find the elements that are shared among, or unique to, 5 different lists. By clicking on the regions of the diagram, the user retrieves the identifiers belonging to the selected subset.
Remarque
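The regions of such a Venn diagram can be computed by recording, for each identifier, the exact set of lists it appears in. The sketch below is an illustration with plain Python sets, not showVenn's code; the function name and the example gene identifiers are hypothetical.

```python
def venn_regions(named_lists):
    """Map each non-empty Venn region to its identifiers.

    A region is keyed by the frozenset of list names that contain the
    identifier, so identifiers found only in list "A" are under
    frozenset({"A"}) and those shared by "A" and "B" only are under
    frozenset({"A", "B"}).
    """
    membership = {}
    for name, identifiers in named_lists.items():
        for identifier in identifiers:
            membership.setdefault(identifier, set()).add(name)
    regions = {}
    for identifier, names in membership.items():
        regions.setdefault(frozenset(names), set()).add(identifier)
    return regions


if __name__ == "__main__":
    lists = {
        "A": ["g1", "g2", "g3"],
        "B": ["g2", "g3", "g4"],
        "C": ["g3", "g5"],
    }
    for region, identifiers in venn_regions(lists).items():
        print(sorted(region), sorted(identifiers))
```

The same function works unchanged for 5 lists, which yields at most 31 non-empty regions.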
Version | MAJ | sickle | | |
1.200 | 2013-02-26 | Download | Doc |
sickle - A windowed adaptive trimming tool for FASTQ files using quality
Remarque Run Unix # sickle [options] | Run Web # |
Version | MAJ | signalp | | |
4.0 | 2014-05-07 | Download | Doc |
Detection of signal sequences and cleavage sites in protein sequences from Gram+ bacteria, Gram- bacteria, and eukaryotes.
Remarque The SIGNALP package is the property of the Center for Biological Sequence Analysis. It may be downloaded only by special agreement (contact software@cbs.dtu.dk). Run Unix # signalp | Run Web # |
SIM4 searches for the best local alignments between a cDNA sequence (mRNA, EST) and a genomic DNA sequence containing that gene, allowing for introns and a small number of sequencing errors.
Remarque http://globin.cse.psu.edu/html/docs/sim4.html Run Unix # sim4 mouse_cDNA human_genomic K=15 C=11 A=3 W=10 | Run Web # |
Simulation of whole bacterial genomes with homologous recombination
Remarque Run Unix # SimBac [OPTIONS] | Run Web # |
SIMPA is a program for predicting protein secondary structure. Three states are considered: alpha helix (H), beta strands (b), and aperiodic structures (C). The program is based on the "nearest neighbor" concept. It achieves a Q3 score of 67%.
Remarque
SimWalk2 is a statistical genetics computer application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on any size of pedigree. SimWalk2 uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms to perform these multipoint analyses.
Remarque These files are required to use SimWalk2 and must be located in the directory from which the software is launched (MAP.DAT, LOCUS.DAT, PEDIGREE.DAT, PEN.DAT). Run Unix # simwalk2 | Run Web # |
SLICEMBLER is an iterative meta-assembler that takes advantage of the whole
dataset, and significantly improves the final quality of the assembly.
SLICEMBLER partitions the input data into optimal-sized “slices” and uses
a standard assembly tool (e.g., Velvet, SPAdes, IDBA, Ray) to assemble each
slice individually. SLICEMBLER uses majority voting among the individual
assemblies to identify long contigs that can be merged to the consensus
assembly. It extracts high-quality contigs from the slice assemblies, and
prevents contigs containing mis-joins and calling errors from being included in
the final assembly.
SLICEMBLER was designed and developed at the Algorithms and Computational Biology Lab, University of California, Riverside.
Remarque
SNAP is a new sequence aligner that is 10-100x faster and simultaneously more accurate than existing tools like BWA, Bowtie2 and SOAP2. It runs on commodity x86 processors, and supports a rich error model that lets it cheaply match reads with more differences from the reference than other tools. This gives SNAP up to 2x lower error rates than existing tools and lets it match larger mutations that they may miss.
Remarque Faster and More Accurate Sequence Alignment with SNAP. Matei Zaharia, William J. Bolosky, Kristal Curtis, Armando Fox, David Patterson, Scott Shenker, Ion Stoica, Richard M. Karp, and Taylor Sittler. arXiv:1111.5572v1, November 2011.
SOAPaligner/soap2 is a member of SOAP (Short Oligonucleotide Analysis Package). It is an updated version of the SOAP software for short oligonucleotide alignment. The new program features very fast and accurate alignment of the huge amounts of short reads generated by the Illumina/Solexa Genome Analyzer. Compared to soap v1, it is one order of magnitude faster: it requires only 2 minutes to align one million single-end reads onto the human reference genome. Another notable improvement is that SOAPaligner now supports a wide range of read lengths.
Remarque To run SOAPaligner, first build index files for the reference genome (2bwt-builder), then search reads against the formatted index files (soap).
Version | MAJ | soap.coverage | | |
2.7.7 | 2011-12-14 | Download | Doc |
Utility for SOAP - soap.coverage can calculate sequencing coverage or physical coverage, as well as duplication rate and details of specific blocks, for each segment and for the whole genome, using SOAP, BLAT, BLAST, BlastZ, MUMmer and MAQ alignment results, with multi-threading.
Remarque Run Unix # soap.coverage | Run Web # |
SOAPdenovo is a novel short-read assembly method that can build a de novo draft assembly for the human-sized genomes. The program is specially designed to assemble Illumina GA short reads. It creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost effective way.
Remarque Run Unix # soapdenovo [option] | Run Web # |
SolexaQA is a Perl-based software package that calculates quality statistics and creates visual representations of data quality from FASTQ files generated by Illumina second-generation sequencing technology (“Solexa”).
Remarque Run Unix # SolexaQA.pl | Run Web # |
SortMeRNA is software designed to rapidly filter ribosomal RNA fragments from metatranscriptomic data produced by next-generation sequencers. It is capable of handling large RNA databases and sorting out all fragments matching the database with high accuracy and specificity.
Remarque If you use SortMeRNA, please cite:
Kopylova E., Noé L. and Touzet H., "SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611. Run Unix # sortmerna -h | Run Web # |
SPAdes is a de Bruijn graph based assembler. It integrates a read error corrector, a multiple k-mer de Bruijn graph assembler, an assembly merger, a scaffolder and a repeat resolver.
Remarque Run Unix # spades | Run Web # |
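For readers unfamiliar with the data structure behind SPAdes, the sketch below builds a de Bruijn graph from reads and spells out a contig along an unbranched path. It is a toy illustration only, not SPAdes' algorithm: the function names are hypothetical, a single k is used instead of multiple k-mer sizes, and reverse complements, sequencing errors, incoming branches and repeats are ignored.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: each k-mer is an edge from its (k-1)-mer
    prefix to its (k-1)-mer suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_unambiguous(graph, start):
    """Spell a contig from `start`, following edges while the current node
    has exactly one successor and no node is visited twice."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (successor,) = graph[node]
        if successor in seen:
            break
        contig += successor[-1]
        seen.add(successor)
        node = successor
    return contig


if __name__ == "__main__":
    graph = de_bruijn_graph(["ACGTC", "GTCAG"], k=3)
    print(extend_unambiguous(graph, "AC"))
```

The two reads overlap by three bases, and the walk through the graph merges them into one sequence without any pairwise read comparison.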
SPatt (Statistic for Patterns) is a suite of C++ programs designed to compute p-values of pattern occurrences in text. Assuming the text is generated by a Markov model, the p-value of a given observation is its probability of occurring. The lower the p-value, the more unlikely the observation. For example, these tools can be used to find patterns with unusual behaviour in DNA sequences.
Remarque Run Unix # spatt (aspatt cpspatt gspatt ldspatt oldxspatt sspatt xspatt) | Run Web # |
Version | MAJ | SPiD | | |
2.1 | | Download | Doc |
Subtilis Protein interaction Database
Remarque
SplitsTree4 is the leading application for computing unrooted phylogenetic networks from molecular sequence data. Given an alignment of sequences, a distance matrix or a set of trees, the program will compute a phylogenetic tree or network using methods such as split decomposition, neighbor-net, consensus network, super networks methods or methods for computing hybridization or simple recombination networks.
Remarque Run Unix # SplitsTree | Run Web # |
The NCBI SRA Toolkit enables reading ("dumping") of sequencing files from the SRA database and writing ("loading") files into the .sra format (Note that this is not required for submission). The Toolkit source code is provided in the form of the SRA SDK, and may be compiled with GCC. However, pre-built software executables are available for Linux, Windows, and Mac OS X, and we highly recommend using these pre-built executables whenever possible.
Remarque
SSAHA (Sequence Search and Alignment by Hashing Algorithm) is an algorithm for very fast matching and alignment of DNA sequences. It achieves its fast search speed by encoding sequence information in a perfect hash function.
Remarque Run Unix # ssaha2 | Run Web # |
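The idea of encoding sequence information with a perfect hash can be illustrated by mapping each DNA k-mer to a unique integer (2 bits per base) and using it as a direct index into a table of positions. The sketch below is an assumption-laden illustration, not SSAHA's code: the function names are hypothetical, every overlapping k-mer of the subject is indexed, and only exact seed matches are reported, with no alignment stage.

```python
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(kmer):
    """Map a k-mer over ACGT to a unique integer in [0, 4**k)."""
    value = 0
    for base in kmer:
        value = value * 4 + _CODE[base]
    return value


def build_index(subject, k):
    """Table of 4**k slots; slot encode(kmer) lists the k-mer's start positions."""
    table = [[] for _ in range(4 ** k)]
    for pos in range(len(subject) - k + 1):
        table[encode(subject[pos:pos + k])].append(pos)
    return table


def seed_hits(query, table, k):
    """(query_offset, subject_position) pairs that share an exact k-mer."""
    return [
        (q, s)
        for q in range(len(query) - k + 1)
        for s in table[encode(query[q:q + k])]
    ]


if __name__ == "__main__":
    index = build_index("ACGTACGA", k=3)
    print(seed_hits("TACG", index, k=3))
```

Because the encoding is collision-free, a lookup needs no key comparison; the trade-off is a table that grows as 4**k.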
SSAKE is a genomics application for assembling millions of very short DNA sequences. It is an easy-to-use, robust, reliable and tractable clustering algorithm for very short sequence reads, such as those generated by Illumina Ltd.
Remarque Run Unix # ssake.pl | Run Web # |
SSPACE is not a de novo assembler, it is used after a preassembled
run. SSPACE is a script to extend and scaffold preassembled
contigs using a number of mate pairs or paired-end libraries.
It uses Bowtie to map all the reads to the pre-assembled contigs.
Unmapped reads are used for extending, if desired, the pre-assembled
contigs with the SSAKE assembler. Again Bowtie is used to map the
reads to the extended contigs. Positions and orientation of the reads are
stored and used for scaffolding. If both reads of a pair are found within
the allowed distance, they are used for scaffolding to determine the
orientation, contig pairing and ordering of the contigs.
Remarque Run Unix # /usr/local/genome/SSPACE-BASIC-2.0_linux-x86_64/SSPACE_Basic_v2.0.pl | Run Web # |
SSU-ALIGN is a software package for identifying, aligning, masking and visualizing archaeal 16S, bacterial 16S and eukaryotic 18S small subunit ribosomal RNA (SSU rRNA) sequences. It includes and uses the Infernal software package for generating alignments based on the conserved secondary structure and sequence of SSU rRNA. SSU-ALIGN extends Infernal to make it easier for users to generate large-scale alignments of up to millions of SSU rRNA sequences that will ultimately be used as input to phylogenetic inference methods. (SSU-ALIGN is not capable of inferring phylogenetic trees itself.) Large SSU rRNA sequence datasets are commonly generated by environmental sequencing survey studies that use SSU rRNA as a phylogenetic marker of species in the environment being studied. While designed primarily for these SSU-based studies, SSU-ALIGN is a general tool that can be used to generate alignments of any type of structural RNA, including large subunit ribosomal RNA (LSU rRNA).
Remarque How to cite SSU-ALIGN
SSU-ALIGN does not yet have an associated publication, so please cite the INFERNAL software publication (Nawrocki et al., 2009a) if you find the package useful for work that you publish. Additionally, because SSU-ALIGN’s seed alignments were derived from the Comparative RNA Website, we ask that you cite that database as well (Cannone et al., 2002).
Version | MAJ | stacks | | |
0.9995 | 2012-08-21 | Download | Doc |
Stacks is a software pipeline for building loci out of a set of short-read sequenced samples. Stacks was developed for the purpose of building genetic maps from RAD-Tag Illumina sequence data, but can also be readily applied to population studies, and phylogeography.
Remarque Please cite this paper:
J. Catchen, A. Amores, P. Hohenlohe, W. Cresko, and J. Postlethwait. Stacks: building and genotyping loci de novo from short-read sequences. G3: Genes, Genomes, Genetics, 1:171-182, 2011.
The Staden Package is a set of tools covering sequence assembly, editing and analysis. Gap4 performs sequence assembly, contig ordering based on read pair data, contig joining based on sequence comparisons, assembly checking, repeat searching, experiment suggestion, read pair analysis and contig editing. Pregap4 provides a graphical user interface to set up the processing required to prepare trace data for assembly or analysis. Trev is a rapid and flexible viewer and editor for ABI, ALF, SCF and ZTR trace files. Prefinish analyses partially completed sequence assemblies and suggests the most efficient set of experiments to help finish the project. Tracediff and hetscan automatically locate mutations by comparing trace data against reference traces. Spin analyses nucleotide sequences to find genes, restriction sites, motifs, etc. It can perform translations, find open reading frames, count codons, etc.
Remarque Run Unix # http://staden.sourceforge.net/overview.html | Run Web # |
Similarity, Tree-building, & Alignment of Motifs and Profiles
Remarque Run Unix # STAMP | Run Web # |
Version | MAJ | STFilter | | |
1.0 | | Download | Doc |
STFilter queries PubMed using a list of gene names or a species name, splits the abstracts into sentences, and classifies them according to a relevance criterion. This relevance criterion can be learned automatically from already classified sentences.
Remarque
STRIDE = Protein secondary structure assignment from atomic coordinates. STRIDE is a program to recognize secondary structural elements in proteins from their atomic coordinates.
Remarque Run Unix # stride | Run Web # |
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length
transcripts representing multiple splice variants for each gene locus. Its
input can include not only the alignments of raw reads used by other
transcript assemblers, but also alignments of longer sequences that have been
assembled from those reads. In order to identify differentially expressed
genes between experiments, StringTie's output can be processed by specialized software like Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).
Remarque Run Unix # stringtie -h/--help | Run Web # |
Version | MAJ | structure | | |
2.3.4 | 2012-12-21 | Download | Doc |
The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.
Remarque Run Unix # structure | Run Web # |
Version | MAJ | SUPER-FOCUS | | |
0.26 | 2016-10-04 | Download | Doc |
SUPER-FOCUS (SUbsystems Profile by databasE Reduction using FOCUS) is an agile
homology-based approach that uses a reduced SEED database to report the
subsystems present in metagenomic samples and profile their abundances. The
tool was tested with over 70 real metagenomes, and the results show that our
approach accurately predicts the subsystems present in microbial communities
and that it can be over 1,000 times faster than other tools.
Remarque
SeqUence Repository and Feature detection. Nucleotide sequence production commonly involves several dedicated bioinformatics programs for sequence basecalling, vector detection, etc.
Remarque
Version | MAJ | SurfG+ | | |
1.02 | 2012-07-13 | Download | Doc |
SurfG+ is a tool to predict protein localization in Gram-positive bacteria. Current protein localization protocols are not suited to this prediction task, as they ignore the potential surface exposure of many membrane-associated proteins. Therefore, we developed a new flow scheme for processing protein sequence data, with the particular aim of identifying potentially surface exposed (PSE) proteins from Gram-positive bacteria.
Remarque See Barinov A, Loux V, Hammani A, Nicolas P, Langella P, Ehrlich D, et al. Prediction of surface exposed proteins in Streptococcus pyogenes, with a potential application to other Gram-positive bacteria.
Proteomics. 2009 Jan.;9(1):61–73.
Run Unix # Surfg | Run Web # |
SvcR is an implementation of a clustering algorithm based on finding a separator, in a feature space, between points described in a data space. The data format is defined by an attribute/value table (matrix). A kernel maps the data into the feature space as a single cluster delimited by the radius of a ball and by support vectors. The radius of this ball can then be used in the data space to reconstruct the boundary, which now forms several clusters.
Remarque
Version | MAJ | swat | | |
| | Download | Doc |
Remarque
Tablet is a lightweight, high-performance graphical viewer for next generation sequence assemblies and alignments.
Remarque Run Unix # tablet | Run Web # |
Version | MAJ | tagdust | | |
1.13 | 2013-09-13 | Download | Doc |
TagDust is a program to eliminate artifactual reads from next-generation sequencing data sets.
Remarque Lassmann T., et al. (2009) TagDust - A program to eliminate artifacts from next generation sequencing data. Bioinformatics.
Run Unix # tagdust [options] lib.fa read1.fa read2.fa ... | Run Web # |
Version | MAJ | Tandem Repeats Finder | | |
4.07b | 2013-08-20 | Download | Doc |
A tandem repeat in DNA is two or more adjacent, approximate copies of a pattern of nucleotides. Tandem Repeats Finder is a program to locate and display tandem repeats in DNA sequences.
Remarque
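The definition of a tandem repeat given for Tandem Repeats Finder can be illustrated with a small Python sketch. It is a deliberate simplification: it detects only exact adjacent copies, whereas the description above speaks of approximate copies, and it is not the algorithm used by Tandem Repeats Finder. The function name and its parameters are hypothetical.

```python
def find_exact_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=2):
    """Return (start, unit, copies) tuples for exact tandem repeats.

    Assumption: only exact copies are detected; approximate copies,
    which the real tool handles, are ignored in this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for k in range(min_unit, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for a unit of this size
            copies = 1
            # Count how many times the unit repeats back to back.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                span = copies * k
                # Keep the longest span; ties go to the shorter unit.
                if best is None or span > best[2] * len(best[1]):
                    best = (i, unit, copies)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])  # skip past the reported repeat
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GGACACACACTT", min_unit=2))
```

With `min_unit=2`, the example sequence yields one repeat: the unit `AC` starting at position 2 (0-based), copied four times.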
T-Coffee is a multiple sequence alignment package. Given a set of sequences (Proteins or DNA), T-Coffee generates a multiple sequence alignment. Version 2.00 and higher can mix sequences and structures.
Remarque Run Unix # t_coffee sequence_file | Run Web # |
Version | MAJ | TFM-Pvalue | | |
- | 2014-01-14 | Download | Doc |
TFM-Pvalue is a software suite providing tools for computing the score threshold associated with a given P-value and the P-value associated with a given score threshold. It uses Position Weight Matrices, such as those available in the Transfac or Jaspar databases.
Remarque Efficient and accurate P-value computation for Position Weight Matrices
H. Touzet and J.S. Varré
Algorithms for Molecular Biology 2007, 2:15
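The two conversions TFM-Pvalue provides (score threshold to P-value, and P-value to score threshold) can be sketched in Python under simplifying assumptions. This is not the TFM-Pvalue algorithm: it assumes integer matrix scores and an independent, identically distributed background, and it enumerates the full score distribution by dynamic programming. All function names and the toy matrix are hypothetical.

```python
from collections import defaultdict


def score_distribution(pwm, background):
    """Exact distribution of the total score of a random site.

    pwm: list of dicts (one per position) mapping base -> integer score.
    background: dict mapping base -> probability (assumed i.i.d.).
    """
    dist = {0: 1.0}
    for column in pwm:
        nxt = defaultdict(float)
        for score, prob in dist.items():
            for base, s in column.items():
                nxt[score + s] += prob * background[base]
        dist = dict(nxt)
    return dist


def pvalue_for_score(pwm, background, threshold):
    """P-value = probability that a random site scores >= threshold."""
    dist = score_distribution(pwm, background)
    return sum(p for s, p in dist.items() if s >= threshold)


def score_for_pvalue(pwm, background, target_p):
    """Lowest score threshold whose P-value does not exceed target_p.

    Returns None when even the maximum score is too likely.
    """
    dist = score_distribution(pwm, background)
    cumulative = 0.0
    best = None
    for s in sorted(dist, reverse=True):
        cumulative += dist[s]
        if cumulative > target_p:
            break
        best = s
    return best


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    toy_pwm = [{"A": 2, "C": -1, "G": -1, "T": -1}] * 2  # toy matrix
    print(pvalue_for_score(toy_pwm, uniform, 4))
    print(score_for_pvalue(toy_pwm, uniform, 0.1))
```

Exhaustive enumeration is only practical when the number of distinct scores stays small, which is why the sketch assumes integer scores.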
TGI Clustering tools (TGICL): a software system for fast clustering of large EST datasets. This package automates clustering and assembly of a large EST/mRNA dataset. The clustering is performed by a slightly modified version of NCBI's megablast, and the resulting clusters are then assembled using the CAP3 assembly program. TGICL starts with a large multi-FASTA file (and an optional peer quality values file) and outputs the assembly files as produced by CAP3.
Remarque Run Unix # tgicl , cap3... | Run Web # |
Version | MAJ | TM-align | | |
20160521 | 2017-02-28 | Download | Doc |
TM-align is an algorithm for sequence-order independent protein structure comparisons. For two protein structures of unknown equivalence, TM-align first generates an optimized residue-to-residue alignment based on structural similarity using dynamic programming iterations. An optimal superposition of the two structures, as well as the TM-score value which scales the structural similarity, is returned. TM-score takes values in (0,1], where 1 indicates a perfect match between two structures. Following strict statistics of structures in the PDB, scores below 0.2 correspond to randomly chosen unrelated proteins, whereas structures with a score higher than 0.5 generally assume the same fold in SCOP/CATH.
Remarque Run Unix # TMalign PDB1.pdb PDB2.pdb [Options] | Run Web # |
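The TM-score thresholds quoted for TM-align can be made concrete with a short Python sketch. The formula used is the commonly published TM-score definition, which is not stated in this catalog entry: the score is the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length, with d0 = 1.24 * (L - 15)^(1/3) - 1.8. The sketch scores a single, already superposed alignment; the real program also searches for the optimal superposition and treats very short chains specially. Function names are hypothetical.

```python
def d0_for_length(target_length):
    """Distance scale d0 (in angstroms) for a target of the given length."""
    if target_length < 22:
        # Assumption: very short chains are out of scope for this sketch.
        raise ValueError("this sketch does not handle very short chains")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length):
    """TM-score of one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: length of the protein used for normalization.
    """
    d0 = d0_for_length(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def interpret_tm_score(score):
    """Apply the thresholds quoted in the description above."""
    if score < 0.2:
        return "likely unrelated"
    if score > 0.5:
        return "likely same fold"
    return "intermediate"


if __name__ == "__main__":
    perfect = tm_score([0.0] * 100, 100)
    print(perfect, interpret_tm_score(perfect))
```

A perfect match (all distances zero, every residue aligned) scores exactly 1, the upper end of the (0,1] range.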
Version | MAJ | TMAP | | |
3.4.1 | 2013-10-25 | Download | Doc |
TMAP / Torrent Mapping Alignment Program - Alignment software for short and long nucleotide sequences produced by next-generation sequencing technologies.
Remarque
Version | MAJ | tmhmm | | |
2.0c | 2007-11-22 | Download | Doc |
tmhmm is one of the better methods for predicting transmembrane helices in proteins.
Remarque Run tmhmm ma_sequence.fasta; the result is written to standard output (not very verbose) and to a directory named TMHMM_ followed by the PID of the process that generated it.
Version | MAJ | tmmod | | |
| 2009-02-23 | Download | Doc |
An Improved Hidden Markov Model for Transmembrane Protein Topology Prediction and Its Applications to Complete Genomes
Remarque Run Unix # tmmod | Run Web # |
TopHat is a fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
Remarque Run Unix # tophat -h | Run Web # |
TREE-PUZZLE is a computer program to reconstruct phylogenetic trees from molecular sequence data by maximum likelihood. It implements a fast tree search algorithm, quartet puzzling, that allows analysis of large data sets and automatically assigns estimations of support to each internal branch. TREE-PUZZLE also computes pairwise maximum likelihood distances as well as branch lengths for user specified trees. Branch lengths can be calculated with and without the molecular-clock assumption. In addition, TREE-PUZZLE offers likelihood mapping, a method to investigate the support of a hypothesized internal branch without computing an overall tree and to visualize the phylogenetic content of a sequence alignment. TREE-PUZZLE also conducts a number of statistical tests on the data set (chi-square test for homogeneity of base composition, likelihood ratio to test the clock hypothesis, one and two-sided Kishino-Hasegawa test, Shimodaira-Hasegawa test, Expected Likelihood Weights). The models of substitution provided by TREE-PUZZLE are GTR, TN, HKY, F84, SH for nucleotides, Dayhoff, JTT, mtREV24, BLOSUM 62, VT, WAG for amino acids, and F81 for two-state data. Rate heterogeneity is modeled by a discrete Gamma distribution and by allowing invariable sites. The corresponding parameters (except for GTR) can be inferred from the data set.
Remarque Run Unix # puzzle | Run Web # |
TribeMCL is a method for clustering proteins into related groups, which are termed 'protein families'. This clustering is achieved by analysing similarity patterns between proteins in a given dataset, and using these patterns to assign proteins into related groups.
Remarque
Trimmomatic: A flexible read trimming tool for Illumina NGS data
Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data. The selection of trimming steps and their associated parameters are supplied on the command line.
Remarque Run Unix # trimmomatic | Run Web # |
Version | MAJ | Trinity | | |
2.2.0 | 2016-07-01 | Download | Doc |
RNA-Seq De novo Assembly Using Trinity
Trinity, developed at the Broad Institute and the Hebrew University of Jerusalem, represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads.
Remarque Run Unix # Trinity | Run Web # |
Version | MAJ | uchime | | |
4.2.40 | 2013-05-27 | Download | Doc |
UCHIME is an algorithm for detecting chimeric sequences.
Remarque Run Unix # uchime --input query.fasta [--db db.fasta] [--uchimeout results.uchime] [--uchimealns results.alns] | Run Web # |
UCLUST is a high-performance clustering, alignment and search algorithm that is capable of handling millions of sequences.
Remarque Run Unix # uclust --sort seqs.fasta --output seqs_sorted.fasta | Run Web # |
USEARCH is a unique sequence analysis tool with thousands of users world-wide. USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.
Remarque Run Unix # usearch | Run Web # |
variant detection in massively parallel sequencing data

Remarque Run Unix # varscan [COMMAND] [OPTIONS] | Run Web # |
Version | MAJ | VAST | | |
| | Download | Doc |
Program for comparing and aligning the 3D structures of proteins. VAST is based on a two-step procedure. In the first step, a simplified description of the proteins is used, in which secondary structure elements are represented by vectors. The goal of this first step is to find the subset of vectors that superimpose best between the two structures. The significance of the result is evaluated by computing the probability of observing this superimposition by chance alone. In the second step, the method returns to an atomic description of the 3D structures, describing the polypeptide chain by the positions of the CA atom of each residue. The objective of this second step is to establish a one-to-one correspondence (alignment) between the CA atoms that play the same role in the two structures. The aim is to obtain the alignment containing the most CA pairs and the lowest RMSD (root mean square deviation). To do so, the algorithm has to answer questions such as: which alignment is better, one comprising 100 CA pairs with an RMSD of 3 Å, or one comprising 60 CA pairs with an RMSD of 2 Å? This problem is resolved by choosing the alignment that has the lowest probability of being generated by chance.
Remarque
VCAKE is a genetic sequence assembler capable of assembling millions of small nucleotide reads even in the presence of sequencing error. This software is currently geared towards de novo assembly of Illumina's Solexa Sequencing data.
Remarque Run Unix # perl -S vcake.pl | Run Web # |
A C++ library for parsing and manipulating VCF files.
Remarque
Version | MAJ | vcftools | | |
1.12 | 2015-07-13 | Download | Doc |
vcftools is a suite of functions for use on genetic variation data in the form of VCF and BCF files. The tools provided will be used mainly to summarize data, run calculations on data, filter out data, and convert data into other useful file formats.
Remarque Run Unix # vcftools [ --vcf FILE | --gzvcf FILE | --bcf FILE] [ --out OUTPUT PREFIX ] [ FILTERING OPTIONS ] [ OUTPUT OPTIONS ] | Run Web # |
Sequence assembler for very short reads. Velvet is a de novo genomic assembler specially designed for short read sequencing technologies, such as Solexa or 454, developed by Daniel Zerbino and Ewan Birney at the European Bioinformatics Institute (EMBL-EBI), near Cambridge, in the United Kingdom.
Remarque Run Unix # velveth # velvetg | Run Web # |
Version | MAJ | Vienna | | |
1.8.4 | 2010-12-02 | Download | Doc |
The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures. RNA secondary structure prediction through energy minimization is the most used function in the package. We provide three kinds of dynamic programming algorithms for structure prediction: the minimum free energy algorithm of (Zuker & Stiegler 1981), which yields a single optimal structure; the partition function algorithm of (McCaskill 1990), which calculates base pair probabilities in the thermodynamic ensemble; and the suboptimal folding algorithm of (Wuchty et al. 1999), which generates all suboptimal structures within a given energy range of the optimal energy. For secondary structure comparison, the package contains several measures of distance (dissimilarities) using either string alignment or tree-editing (Shapiro & Zhang 1990). Finally, we provide an algorithm to design sequences with a predefined structure (inverse folding).
Remarque
Vmatch replaces Reputer. It looks for all possible repeats in genomes, with the possibility to specify the kind of repeats to look for, such as their identity percentage, minimal length, etc. It can also be used to mask repeats in sequences, to analyze repeat families, etc.
Remarque
Searches for new TFBSs in a set of FASTA sequences, trying several motif lengths with a limit on the number of mutations allowed. Only motifs that pass the statistical filter are reported, unlike MEME, which returns as many motifs as specified in the parameters. By default the genome statistics are based on a 1000 bp promoter, but statistics based on the whole intergenic sequence can be used instead.
Remarque Pavesi G, Mereghetti P, Mauri G, Pesole G. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 2004 32:W199-W203
Run Unix # weederlauncher.out inputfilename speciescode analysistype | Run Web # |
Version | MAJ | weederH | | |
1.4.2 | 2009-12-07 | Download | Doc |
Searches for TFBSs and ECRs in homologous sequences. No alignment is required as input, and no PWM is needed beforehand. Relative conservation between the sequences is measured by searching for conserved oligos and scoring the global similarity between two homologous sequences. Distal enhancers can also be searched for. Reportedly works on unannotated promoters (no known TSS).
Remarque Pavesi, G., Zambelli, F., Pesole, G. WeederH: an algorithm for finding conserved regulatory motifs and regions in homologous sequences. BMC Bioinformatics 2007, 8:46
Run Unix # weederH.out -f inputfilename -O speciescode | Run Web # |
WoLF PSORT predicts the subcellular localization sites of proteins based on their amino acid sequences.
Remarque Run Unix # runWolfPsortSummary | Run Web # |
WOMBAT is a program to facilitate analyses fitting a linear, mixed model via restricted maximum likelihood (REML). It is assumed that traits analysed are continuous and have a multivariate normal distribution.
Remarque http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2064953/ ~ http://didgeridoo.une.edu.au/km/download.php?file=hangzhou.pdf
Run Unix # wombat | Run Web # |
Washington University BLAST (WU BLAST) version 2.0 is a powerful software package for gene and protein identification, using sensitive, selective and rapid similarity searches of protein and nucleotide sequence databases.
Remarque Run Unix # wu-blastall | Run Web # |
Evaluation of hybridization experiments
Remarque Run Unix # xdigitise | Run Web # |
Metagenomes present assembly challenges, when assembling multiple genomes
from mixed reads of multiple species. An assembler for single genomes can’t
adapt well when applied in this case. A metagenomic assembler, Genovo, is a
de novo assembler for metagenomes under a generative probabilistic model.
Genovo assembles all reads without discarding any reads in a preprocessing
step, and is therefore able to extract more information from metagenomic data
and, in principle, generate better assembly results. Paired-end sequencing is
currently widely used, yet Genovo was designed for 454 single-end reads. In
this research, we attempted to extend Genovo by incorporating paired-end
information; the extended assembler, named Xgenovo, is intended to generate
higher quality assemblies with paired-end reads.
Remarque Run Unix # assemble - finalize | Run Web # |
GRAIL is a suite of tools designed to provide analysis and putative annotation of DNA sequences both interactively and through the use of automated computation.
Remarque
X-PLOR is a program system for computational structural biology. X-PLOR stands for exploration of conformational space of macromolecules restrained to regions allowed by combinations of empirical energy functions and experimental data. But it also stands for exploration of modern concepts of structured programming in macromolecular simulation.
Remarque Run Unix # xplor | Run Web # |
YASS is a tool for local similarity search in DNA sequences.
Remarque