Clustering


cd-hit

Version: 4.6.1 | Updated: 2013-08-12
CD-HIT stands for Cluster Database at High Identity with Tolerance. The program (cd-hit) takes a FASTA-format sequence database as input and produces a set of 'non-redundant' (nr) representative sequences as output.

Note: Usage example: cd-hit -n 5 -i /db/fasta/nr90/nr90.fsa -o nr80 -M 2048 -c 0.8 -u clstr.lastweek
Run Unix # cd-hit [Options]
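
CD-HIT is generally described as a greedy incremental clustering: sequences are processed longest first, and each one either joins the first representative it matches at or above the identity threshold (the -c value in the usage example, 0.8) or becomes a new representative. The Python sketch below illustrates that idea only. The function names are hypothetical, and `difflib.SequenceMatcher` is a stand-in similarity measure, not the word filtering and alignment that cd-hit itself performs.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Stand-in similarity (an assumption, not cd-hit's measure):
    matched characters divided by the length of the shorter sequence."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_cluster(seqs, threshold=0.8):
    """Return {representative id: [member ids]}.

    seqs maps sequence ids to sequences. Sequences are visited longest
    first, so every representative is the longest member of its cluster.
    """
    clusters = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[rep], seqs[name]) >= threshold:
                clusters[rep].append(name)  # close enough: join this cluster
                break
        else:
            clusters[name] = [name]  # no match: start a new cluster
    return clusters


if __name__ == "__main__":
    demo = {
        "a": "MKTAYIAKQRQISFVKSHFSRQ",
        "b": "MKTAYIAKQRQISFVKSHFS",   # prefix of "a"
        "c": "GGGGGGGGGGPPPPPPPPPP",   # unrelated
    }
    for rep, members in greedy_cluster(demo, threshold=0.8).items():
        print(rep, members)
```

In this toy input, "b" joins the cluster represented by "a", while "c" shares no residues with "a" and becomes its own representative.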

cd-hit-454

Version: - | Updated: 2013-08-05
454 pyrosequencing reads contain artificial duplicates, which might lead to misleading conclusions. cd-hit-454 is a fast program that identifies exact and nearly identical duplicates: reads that begin at the same position but may vary in length or bear mismatches. cd-hit-454 can process a dataset in ~10 minutes. It also provides a consensus sequence for each group of duplicates.

Run Unix # cd-hit-454
Run Web # 4.6.1

cluster-3.0

Version: 3.0 | Updated: 2013-05-24
The open source clustering software available here implements the most commonly used clustering methods for gene expression data analysis. The clustering methods can be used in several ways. Cluster 3.0 provides a graphical user interface to access the clustering routines. It is available for Windows, Mac OS X, and Linux/Unix. Python users can access the clustering routines through Pycluster, an extension module for Python. People who want to use the clustering algorithms in their own C, C++, or Fortran programs can download the source code of the C Clustering Library.

Run Unix # cluster

InParanoid

Version: 4.1 | Updated: 2011-01-21
InParanoid is a program for automatic identification of orthologs while differentiating between inparalogs and outparalogs. An InParanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs are gathered independently, while outparalogs are excluded. The InParanoid database is a collection of pairwise ortholog groups aiming to include all 'completely sequenced' eukaryotic genomes. By this we mean above 6X coverage, and less than 1% X letters in the protein sequences.

Run Unix # Usage: inparanoid.pl [FASTAFILE with sequences of species C]
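
The seeding step described in this entry, a reciprocally best-matching pair, can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not InParanoid's own code: the score tables, the protein ids, and the function names `best_hits` and `reciprocal_best_hits` are hypothetical, and the sketch covers only the seeding, not the gathering of inparalogs.

```python
def best_hits(scores):
    """Map each query to its highest-scoring target.

    scores is {(query, target): score}, a hypothetical table of
    pairwise similarity scores between two proteomes.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    a_vs_b = {("A1", "B1"): 200, ("A1", "B2"): 150,
              ("A2", "B1"): 180, ("A2", "B2"): 90}
    b_vs_a = {("B1", "A1"): 200, ("B1", "A2"): 180,
              ("B2", "A1"): 150, ("B2", "A2"): 90}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here only A1 and B1 pick each other as best match, so they form the single seed pair; A2 and B2 both point at a protein that prefers someone else.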

kClust

Version: 1.0 | Updated: 2015-01-21
kClust is a fast and sensitive method for clustering protein sequences. It can cluster large protein databases down to 20-30% sequence identity. kClust generates a clustering in which each cluster is represented by its longest sequence (the representative sequence).

Note: To generate one multiple sequence alignment file for each cluster, please use kClust_mkAln. Type kClust_mkAln
Run Unix # kClust -i [fasta-db-file] -d [directory] [options]

mcl

Version: 12-068 | Updated: 2013-08-22
The MCL algorithm is short for the Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for networks (also known as graphs) based on simulation of (stochastic) flow in graphs.

Run Unix # mcl <-|file name> [options], do 'mcl -h' or 'man mcl' for help
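
The flow simulation behind MCL is usually described as alternating two operators on a column-stochastic matrix: expansion (squaring the matrix) and inflation (raising each entry to a power and rescaling each column). The pure-Python sketch below shows that loop on a tiny graph. It is an illustration under stated assumptions, not the mcl program: the function names, the default inflation of 2.0, the self-loops added to every node, and the thresholds are all choices made for this example.

```python
def normalise(m):
    """Rescale every column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_sketch(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Return clusters (sorted lists of node indices) for a small graph."""
    n = len(adjacency)
    # Self-loops (an assumption of this sketch) keep every column non-zero.
    m = normalise([[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
                    for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)                               # expansion
        inflated = normalise([[v ** inflation for v in row]   # inflation
                              for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Two disconnected triangles: nodes 0-2 and nodes 3-5.
    graph = [[0, 1, 1, 0, 0, 0],
             [1, 0, 1, 0, 0, 0],
             [1, 1, 0, 0, 0, 0],
             [0, 0, 0, 0, 1, 1],
             [0, 0, 0, 1, 0, 1],
             [0, 0, 0, 1, 1, 0]]
    print(mcl_sketch(graph))
```

On this input flow never crosses between the two triangles, so each triangle comes out as one cluster. On connected graphs the granularity of the result depends on the inflation value.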

mothur

Version: 1.34.4 | Updated: 2014-12-23
The goal of mothur is to provide a single resource for analyzing the molecular data used by microbial ecologists. Many of these tools are available elsewhere as individual programs, as scripts, which tend to be slow, or as web utilities, which limit your ability to analyze your data. mothur lets you go from raw sequences to generating visualization tools that describe α and β diversity. Examples of each command are provided on their specific pages, and several users have contributed analysis examples that use these commands. An exhaustive list of the commands found in mothur is available in the commands category index.

Run Unix # mothur

rainbow

Version: 2.0 | Updated: 2012-09-10
The Rainbow package consists of several programs used for RAD-seq-related clustering and de novo assembly.

Run Unix # rainbow [options]

uclust

Version: 1.2.22q | Updated: 2012-11-06
UCLUST is a high-performance clustering, alignment and search algorithm that is capable of handling millions of sequences.

Run Unix # uclust --sort seqs.fasta --output seqs_sorted.fasta
