CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a FASTA-format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output.
Note: Usage example: cd-hit -n 5 -i /db/fasta/nr90/nr90.fsa -o nr80 -M 2048 -c 0.8 -u clstr.lastweek
|Run Unix # cd-hit [Options]||Run Web #|
454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cdhit-454 is a fast program that identifies exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cdhit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.
|Run Unix # cd-hit-454||Run Web # 4.6.1|
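The duplicate definition above (same start position, possibly different length, a few mismatches) can be illustrated with a short Python sketch. This is not cdhit-454's actual algorithm: the function names and the `max_mismatches` threshold are assumptions made for illustration, and insertions or deletions are not handled.

```python
def is_duplicate(read_a, read_b, max_mismatches=1):
    """Return True if the shorter read matches the start of the longer one.

    Both reads are compared from their first base, so they "begin at the
    same position"; up to max_mismatches substitutions are tolerated.
    (max_mismatches is a hypothetical parameter, not a cdhit-454 option.)
    """
    shorter, longer = sorted((read_a, read_b), key=len)
    mismatches = sum(1 for x, y in zip(shorter, longer) if x != y)
    return mismatches <= max_mismatches


def group_duplicates(reads, max_mismatches=1):
    """Greedily group reads; the first (longest) read of each group is its seed."""
    groups = []
    for read in sorted(reads, key=len, reverse=True):
        for group in groups:
            if is_duplicate(group[0], read, max_mismatches):
                group.append(read)
                break
        else:
            # No existing group accepted the read, so it seeds a new one.
            groups.append([read])
    return groups


reads = ["ACGTACGT", "ACGTAC", "ACGAACGT", "TTTTGGGG"]
for group in group_duplicates(reads):
    print(group)
```

Here the truncated read `ACGTAC` and the single-mismatch read `ACGAACGT` fall into the same group as `ACGTACGT`, while `TTTTGGGG` forms its own group.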
The open source clustering software available here implements the most commonly used clustering methods for gene expression data analysis. The clustering methods can be used in several ways. Cluster 3.0 provides a graphical user interface to the clustering routines. It is available for Windows, Mac OS X, and Linux/Unix. Python users can access the clustering routines through Pycluster, an extension module for Python. People who want to use the clustering algorithms in their own C, C++, or Fortran programs can download the source code of the C Clustering Library.
|Run Unix # cluster||Run Web #|
InParanoid is a program for automatic identification of orthologs while differentiating between inparalogs and outparalogs. An InParanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs are gathered independently, while outparalogs are excluded. The InParanoid database is a collection of pairwise ortholog groups aiming to include all 'completely sequenced' eukaryotic genomes. By this we mean above 6X coverage, and less than 1% X letters in the protein sequences.
|Run Unix # Usage: inparanoid.pl ||Run Web #|
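The seeding step described above, finding reciprocally best-matching pairs between two proteomes, can be sketched in Python as follows. This is a minimal illustration under assumptions, not InParanoid's implementation: the score dictionaries, protein names, and function names are hypothetical, and the later steps (gathering inparalogs, excluding outparalogs) are not shown.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores: dict mapping (query, target) -> similarity score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_pairs(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between proteins of species A and species B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b1"): 180, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b1", "a2"): 180, ("b2", "a1"): 150, ("b2", "a2"): 90}
print(reciprocal_best_pairs(a_vs_b, b_vs_a))
```

In this toy data only `a1` and `b1` pick each other as best match, so they form the single seed pair; `a2` and `b2` would be candidates for the inparalog or outparalog decision that the sketch leaves out.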
kClust is a fast and sensitive clustering method for the clustering of protein sequences. It is able to cluster large protein databases down to 20-30% sequence identity. kClust generates a clustering where each cluster is represented by its longest sequence (representative sequence).
Note: To generate one multiple sequence alignment file for each cluster, use kClust_mkAln. Type kClust_mkAln
|Run Unix # kClust -i [fasta-db-file] -d [directory] [options]||Run Web #|
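The rule that each cluster is represented by its longest sequence can be shown with a few lines of Python. The sketch assumes cluster membership is already known; the data structures and the function name are hypothetical and do not reflect kClust's output format.

```python
def pick_representatives(clusters, sequences):
    """Choose the longest member of each cluster as its representative.

    clusters:  dict mapping cluster id -> list of sequence names
    sequences: dict mapping sequence name -> sequence string
    Ties are resolved in favour of the member listed first.
    """
    return {
        cluster_id: max(members, key=lambda name: len(sequences[name]))
        for cluster_id, members in clusters.items()
    }


# Hypothetical toy data.
sequences = {"p1": "MKTAYIAK", "p2": "MKTAYIAKQRQ", "p3": "MSLL"}
clusters = {0: ["p1", "p2"], 1: ["p3"]}
print(pick_representatives(clusters, sequences))
```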
The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for networks (also known as graphs) based on simulation of (stochastic) flow in graphs.
|Run Unix # mcl <-|file name> [options], do 'mcl -h' or 'man mcl' for help||Run Web #|
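The flow simulation behind MCL alternates two matrix operations on a column-stochastic matrix: expansion (matrix squaring, which spreads flow along paths) and inflation (element-wise powering followed by column normalisation, which strengthens strong flows and weakens weak ones). The pure-Python sketch below illustrates that idea on a small dense matrix. It is not the mcl program's implementation; the function names, the added self-loops, the inflation value of 2.0, the fixed iteration count, and the cluster read-out threshold are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: square the matrix (two steps of random-walk flow)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation: raise each entry to a power, then renormalise columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50):
    """Cluster a graph given as a square adjacency matrix (list of lists)."""
    n = len(adjacency)
    # Self-loops are added here as an assumption of this sketch.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row that still carries flow defines a cluster: the columns it reaches.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Two disconnected triangles: nodes 0-2 and nodes 3-5.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(graph))  # [[0, 1, 2], [3, 4, 5]]
```

A dense list-of-lists matrix keeps the sketch dependency-free but scales poorly; it is meant only to make the expansion and inflation steps concrete.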
The goal of mothur is to provide a single resource for analyzing the molecular data used by microbial ecologists. Many of these tools are available elsewhere as individual programs, as scripts, which tend to be slow, or as web utilities, which limit your ability to analyze your data. mothur offers the ability to go from raw sequences to the generation of visualization tools to describe α and β diversity. Examples of each command are provided within their specific pages, and several users have contributed analysis examples that use these commands. An exhaustive list of the commands found in mothur is available within the commands category index.
|Run Unix # mothur||Run Web #|
The Rainbow package consists of several programs used for RAD-seq related clustering and de novo assembly.
|Run Unix # rainbow ||Run Web #|
UCLUST is a high-performance clustering, alignment and search algorithm that is capable of handling millions of sequences.
|Run Unix # uclust --sort seqs.fasta --output seqs_sorted.fasta||Run Web #|