R'MES: Family File Format

The family file passed to the rmes command with the -f <filename> option must have the following format:

  • a title terminated by the # symbol,
  • an integer that specifies the number of families listed,
  • an integer that specifies the number of words in each family,
  • an integer that specifies the length of the words,
  • and then each family, listed as follows: the family name followed by all the words of the family.

Note that every family in a file must contain the same number of words, and all the words in all the families must have the same length. Here is an example:

4 families rny, rnr, ynr and yny of 16 trinucleotides #

4
16
3

rny
  aac  agc  acc  atc
  aat  agt  act  att
  gac  ggc  gcc  gtc
  gat  ggt  gct  gtt

rnr
  aaa  aga  aca  ata
  aag  agg  acg  atg
  gaa  gga  gca  gta
  gag  ggg  gcg  gtg

ynr
  caa  cga  cca  cta
  cag  cgg  ccg  ctg
  taa  tga  tca  tta
  tag  tgg  tcg  ttg

yny
  cac  cgc  ccc  ctc
  cat  cgt  cct  ctt
  tac  tgc  tcc  ttc
  tat  tgt  tct  ttt
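
The format rules listed above can be checked before running rmes. The following Python sketch is not part of R'MES. It assumes that everything after the # symbol is a sequence of whitespace-separated tokens and that each family name is a single token. The function name parse_family_file and the toy input are hypothetical, chosen only for illustration.

```python
def parse_family_file(text):
    """Parse family-file text; return (title, {family_name: [words]}).

    Assumption: after the '#', fields are whitespace-separated tokens and
    each family name is a single token. Raises ValueError when the counts
    or word lengths do not match the header.
    """
    title, sep, body = text.partition("#")
    if not sep:
        raise ValueError("the title must be terminated by '#'")

    tokens = body.split()
    if len(tokens) < 3:
        raise ValueError("expected three integers after the title")
    n_families, n_words, word_length = (int(t) for t in tokens[:3])

    rest = tokens[3:]
    per_family = 1 + n_words  # one name token plus the words
    if len(rest) != n_families * per_family:
        raise ValueError(
            f"expected {n_families} families of {n_words} words each, "
            f"found {len(rest)} tokens"
        )

    families = {}
    for i in range(n_families):
        chunk = rest[i * per_family:(i + 1) * per_family]
        name, words = chunk[0], chunk[1:]
        for word in words:
            if len(word) != word_length:
                raise ValueError(
                    f"word {word!r} in family {name!r} "
                    f"does not have length {word_length}"
                )
        families[name] = words
    return title.strip(), families


# Toy input (hypothetical): 2 families of 2 words of length 3.
sample = """two toy families #
2
2
3
fam1 aac agc
fam2 caa cga
"""
title, families = parse_family_file(sample)
print(title)
print(families)
```

Reading the file as a flat token stream keeps the check independent of how the words are laid out on each line, which matters only if rmes itself is equally tolerant of layout.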

Warning: a family pattern should not start or end with n. Such a family is equivalent to a shorter word, so the order of the Markov model must be chosen with care. For instance, the family natc reduces to the single word atc, which is too short to be analyzed in the M2 model.
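
To make the warning concrete, the following Python sketch expands a family pattern into its words and strips the n's at both ends. It is not part of R'MES, and the function names are hypothetical. The codes r (a or g), y (c or t) and n (any nucleotide) are the standard IUPAC ones. The rule that a word of length l allows a Markov model of order at most l - 2 is an assumption: it agrees with the natc example above but is not stated in this section.

```python
from itertools import product

# Standard IUPAC codes used in the family names of this section.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag",    # purine
    "y": "ct",    # pyrimidine
    "n": "acgt",  # any nucleotide
}


def expand_pattern(pattern):
    """List every word matched by a pattern such as 'rny' or 'natc'."""
    choices = [IUPAC[symbol] for symbol in pattern.lower()]
    return ["".join(letters) for letters in product(*choices)]


def effective_pattern(pattern):
    """Drop the n's at both ends; what remains is what is really studied."""
    return pattern.lower().strip("n")


def max_markov_order(pattern):
    """ASSUMPTION: a word of length l allows models of order at most l - 2."""
    return len(effective_pattern(pattern)) - 2


print(expand_pattern("natc"))     # four words, all ending in atc
print(effective_pattern("natc"))  # atc
print(max_markov_order("natc"))   # 1 under the assumption, so M2 is excluded
print(len(expand_pattern("rny"))) # 16
```

A pattern such as rny is unaffected by the stripping step, because its n is in the middle of the word.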

