Clustering

Version	Updated	cd-hit
4.6.1	2013-08-12	cd-hit	Download	Doc

CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a FASTA-format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output.
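To make the idea of reducing a database to representative sequences concrete, here is a toy sketch in Python. It is not CD-HIT's implementation: the `identity` function is a hypothetical stand-in that compares positions without any alignment, and the function names and the 0.8 threshold are assumptions chosen for illustration. It only shows the general longest-first greedy pattern, in which each sequence either joins the first representative it matches or becomes a new representative.

```python
def identity(a, b):
    """Fraction of matching positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / n


def greedy_representatives(seqs, threshold=0.8):
    """Map each representative id to the ids of the sequences it covers."""
    clusters = {}
    # Process the longest sequences first; ties are broken by id for determinism.
    for sid, seq in sorted(seqs.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        for rep in clusters:
            if identity(seqs[rep], seq) >= threshold:
                clusters[rep].append(sid)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters[sid] = [sid]
    return clusters


# Hypothetical toy input: s2 is a prefix of s1, s3 is unrelated.
toy = {
    "s1": "MKTAYIAKQRQISFVKSHFSRQ",
    "s2": "MKTAYIAKQRQISFVKSHF",
    "s3": "GGGGGGGGGGGGGGGGGGGG",
}
result = greedy_representatives(toy)
print(result)
```

The keys of `result` play the role of the non-redundant set; the values list which input sequences each representative stands for.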

Note Usage example: cd-hit -n 5 -i /db/fasta/nr90/nr90.fsa -o nr80 -M 2048 -c 0.8 -u clstr.lastweek

Run Unix # cd-hit [Options]

Run Web #

Version	Updated	cd-hit-454
-	2013-08-05	cd-hit-454	Download	Doc

454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cd-hit-454 is a fast program that identifies exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cd-hit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.
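The duplicate criterion described above (same start position, possibly different lengths, a few mismatches) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the cd-hit-454 algorithm: the function names, the `max_mismatches` parameter, and the majority-vote consensus are all hypothetical choices made for the example.

```python
from collections import Counter


def is_duplicate(a, b, max_mismatches=1):
    """Compare two reads from their first base over the shorter read's length."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches <= max_mismatches


def group_duplicates(reads, max_mismatches=1):
    """Greedily group reads, comparing each read with the longest read of a group."""
    groups = []
    for read in sorted(reads, key=lambda r: (-len(r), r)):
        for group in groups:
            if is_duplicate(group[0], read, max_mismatches):
                group.append(read)
                break
        else:
            groups.append([read])
    return groups


def consensus(group):
    """Majority base at each position, using only reads that reach that position."""
    length = max(len(r) for r in group)
    bases = []
    for i in range(length):
        column = Counter(r[i] for r in group if i < len(r))
        bases.append(column.most_common(1)[0][0])
    return "".join(bases)


# Hypothetical reads: three near-identical duplicates and one unrelated read.
reads = ["ACGTACGTAC", "ACGTACGT", "ACGTTCGTAC", "TTTTGGGG"]
groups = group_duplicates(reads)
for g in groups:
    print(len(g), consensus(g))
```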

Note

Run Unix # cd-hit-454

Run Web # 4.6.1

Version	Updated	cluster-3.0
3.0	2013-05-24	cluster-3.0	Download	Doc

The open source clustering software available here implements the most commonly used clustering methods for gene expression data analysis. The clustering methods can be used in several ways. Cluster 3.0 provides a graphical user interface to access the clustering routines. It is available for Windows, Mac OS X, and Linux/Unix. Python users can access the clustering routines through Pycluster, an extension module for Python. Users who want to call the clustering algorithms from their own C, C++, or Fortran programs can download the source code of the C Clustering Library.

Note

Run Unix # cluster

Run Web #

Version	Updated	InParanoid
4.1	2011-01-21	InParanoid	Download	Doc

InParanoid is a program for automatic identification of orthologs while differentiating between inparalogs and outparalogs. An InParanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs are gathered independently, while outparalogs are excluded. The InParanoid database is a collection of pairwise ortholog groups aiming to include all 'completely sequenced' eukaryotic genomes, meaning genomes with above 6X coverage and less than 1% X letters in the protein sequences.
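The seeding step, a reciprocally best-matching pair, can be illustrated with a short Python sketch. It assumes a hypothetical dictionary of symmetric similarity scores between proteins of two species and covers only the seed selection; the gathering of inparalogs and the exclusion of outparalogs are not shown, and nothing here reproduces InParanoid's own scoring.

```python
def reciprocal_best_pairs(scores):
    """Return pairs (a, b) where a's best match is b and b's best match is a.

    `scores` maps (protein_in_A, protein_in_B) to a similarity score
    (hypothetical input; higher means more similar).
    """
    best_for_a = {}
    best_for_b = {}
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Hypothetical scores between species A (a1..a3) and species B (b1, b2).
scores = {
    ("a1", "b1"): 900,
    ("a1", "b2"): 300,
    ("a2", "b1"): 400,
    ("a2", "b2"): 350,
    ("a3", "b2"): 800,
}
seeds = reciprocal_best_pairs(scores)
print(seeds)
```

Here a2 is not part of any seed: its best match is b1, but b1's best match is a1.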

Note

Run Unix # Usage: inparanoid.pl [FASTAFILE with sequences of species C]

Run Web #

Version	Updated	kClust
1.0	2015-01-21	kClust	Download	Doc

kClust is a fast and sensitive method for clustering protein sequences. It can cluster large protein databases down to 20-30% sequence identity. kClust generates a clustering in which each cluster is represented by its longest sequence (the representative sequence).
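As a small illustration of the "longest sequence as representative" convention, the following Python sketch picks one representative per cluster. The input structures (`clusters`, `sequences`) and the tie-breaking rule by identifier are assumptions made for the example; they do not describe kClust's file formats or internals.

```python
def pick_representatives(clusters, sequences):
    """For each cluster, return the id of its longest member.

    Ties are broken by the smallest id (an arbitrary, hypothetical rule).
    """
    return {
        cid: min(members, key=lambda m: (-len(sequences[m]), m))
        for cid, members in clusters.items()
    }


# Hypothetical data: two clusters over five protein ids.
sequences = {
    "p1": "MKV",
    "p2": "MKVLAA",
    "p3": "MKVL",
    "p4": "GGHH",
    "p5": "GGHW",
}
clusters = {0: ["p1", "p2", "p3"], 1: ["p5", "p4"]}
reps = pick_representatives(clusters, sequences)
print(reps)
```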

Note To generate one multiple sequence alignment file per cluster, use kClust_mkAln. Type kClust_mkAln

Run Unix # kClust -i [fasta-db-file] -d [directory] [options]

Run Web #

Version	Updated	mcl
12-068	2013-08-22	mcl	Download	Doc

The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for networks (also known as graphs) based on simulation of (stochastic) flow in graphs.
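The flow simulation can be sketched in pure Python as an alternation of two matrix operations on a column-stochastic matrix: expansion (squaring the matrix) and inflation (raising entries to a power and renormalizing each column). This is a minimal, unoptimized sketch for small graphs, not the mcl program itself; the function names, the default inflation of 2.0, the tolerances, and the simplified cluster readout (one cluster per non-empty row of the converged matrix) are assumptions.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (modifies and returns m)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            m[i][j] /= total
    return m


def expand(m):
    """Expansion step: plain matrix squaring."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: element-wise power followed by column normalization."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(edges, n, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    m = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = 1.0
    for i in range(n):
        m[i][i] = 1.0  # self-loops keep every column sum non-zero
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        delta = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if delta < tol:
            break
    # Simplified readout: the columns with mass in a row form one cluster.
    clusters = {frozenset(j for j in range(n) if row[j] > eps) for row in m}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Hypothetical graph: two triangles joined by a single edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = mcl(edges, 6)
print(clusters)
```

Higher inflation values tend to produce finer-grained clusterings, which is why the parameter is exposed here.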

Note

Run Unix # mcl <-|file name> [options], do 'mcl -h' or 'man mcl' for help

Run Web #

Version	Updated	mothur
1.34.4	2014-12-23	mothur	Download	Doc

The goal of mothur is to have a single resource to analyze molecular data that is used by microbial ecologists. Many of these tools are available elsewhere as individual programs and scripts, which tend to be slow, or as web utilities, which limit your ability to analyze your data. mothur offers the ability to go from raw sequences to the generation of visualization tools to describe α and β diversity. Examples of each command are provided within their specific pages, and users have contributed several analysis examples that use these commands. An exhaustive list of the commands found in mothur is available within the commands category index.
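For readers unfamiliar with the α and β diversity terms, here is a short Python sketch of one common measure of each: the Shannon index for diversity within a sample and the Bray-Curtis dissimilarity for differences between two samples. These two measures are chosen only as illustrations; the function names and count vectors are hypothetical, and the code does not reproduce mothur's calculators.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over the non-zero abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator


# Hypothetical per-taxon counts for two samples.
sample_a = [10, 10, 0]
sample_b = [0, 5, 15]
print(shannon(sample_a), shannon(sample_b))
print(bray_curtis(sample_a, sample_b))
```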

Note

Run Unix # mothur

Run Web #

Version	Updated	rainbow
2.0	2012-09-10	rainbow	Download	Doc

The Rainbow package consists of several programs used for RAD-seq-related clustering and de novo assembly.

Note

Run Unix # rainbow [options]

Run Web #

Version	Updated	uclust
1.2.22q	2012-11-06	uclust	Download	Doc

UCLUST is a high-performance clustering, alignment and search algorithm that is capable of handling millions of sequences.

Note

Run Unix # uclust --sort seqs.fasta --output seqs_sorted.fasta
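The command above is a sorting step applied before clustering. Assuming it orders sequences by decreasing length (an assumption, as this page does not say so), the equivalent operation can be sketched in Python as follows. The parser handles only simple FASTA text and the function names are hypothetical; this is an illustration of the operation, not a replacement for the tool.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def sort_by_length(records):
    """Longest sequences first; equal lengths keep their input order."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)


def to_fasta(records):
    """Serialize records back to FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


# Hypothetical input with one multi-line record.
text = ">a\nACG\n>b\nACGTAC\nGT\n>c\nACGT\n"
sorted_records = sort_by_length(parse_fasta(text))
print(to_fasta(sorted_records))
```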

Run Web #
