The family file passed to rmes with the -f <filename> option must have the following format:
- a title terminated by the # symbol,
- an integer which specifies the number of enumerated families,
- an integer which specifies the number of words in each family,
- an integer which specifies the length of the words,
- and then each family is listed as follows: a family name followed by all the words of the family.
Note that all the families in a file must contain the same number of words, and all the words of all the families must have the same length. Here is an example:
    4 families rny, rnr, ynr and yny of 16 trinucleotides #
    4 16 3
    rny aac agc acc atc aat agt act att gac ggc gcc gtc gat ggt gct gtt
    rnr aaa aga aca ata aag agg acg atg gaa gga gca gta gag ggg gcg gtg
    ynr caa cga cca cta cag cgg ccg ctg taa tga tca tta tag tgg tcg ttg
    yny cac cgc ccc ctc cat cgt cct ctt tac tgc tcc ttc tat tgt tct ttt
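The format can be made concrete with a small parser. The sketch below is a hypothetical helper (`parse_families` and `FamilyFile` are not part of rmes). It assumes that everything after the first # is a whitespace-separated stream of tokens, so line breaks in the body are not significant; rmes itself may be stricter. It checks the consistency rules stated above: each family has the declared number of words, and every word has the declared length.

```python
from dataclasses import dataclass


@dataclass
class FamilyFile:
    title: str
    word_length: int
    families: dict[str, list[str]]  # family name -> its words, in file order


def parse_families(text: str) -> FamilyFile:
    """Parse the contents of a family file (hypothetical helper, not part of rmes)."""
    # Everything before the first '#' is the free-text title.
    title, sep, body = text.partition("#")
    if not sep:
        raise ValueError("missing '#' after the title")
    # Assumption: the rest of the file is a whitespace-separated token stream.
    tokens = body.split()
    if len(tokens) < 3:
        raise ValueError("expected three integers after the title")
    n_families, n_words, word_length = (int(t) for t in tokens[:3])
    rest = tokens[3:]
    block_size = 1 + n_words  # one family name followed by its words
    if len(rest) != n_families * block_size:
        raise ValueError(
            f"expected {n_families * block_size} tokens after the header, found {len(rest)}"
        )
    families: dict[str, list[str]] = {}
    for i in range(n_families):
        block = rest[i * block_size:(i + 1) * block_size]
        name, words = block[0], block[1:]
        if name in families:
            raise ValueError(f"duplicate family name {name!r}")
        for w in words:
            if len(w) != word_length:
                raise ValueError(
                    f"word {w!r} in family {name!r} has length {len(w)}, not {word_length}"
                )
        families[name] = words
    return FamilyFile(title.strip(), word_length, families)
```

Called on the contents of the example file above, `parse_families` returns four families, `rny`, `rnr`, `ynr` and `yny`, each holding 16 words of length 3.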
Warning: families should not start or end with n's. Such families are equivalent to studying shorter words, so care must be taken when choosing the order of the Markov model, since in the Mm model only words of length at least m + 2 can be analyzed. For instance, the family natc reduces to the single word atc, and obviously cannot be analyzed in the M2 model.
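The warning can be checked mechanically. The hypothetical helpers below (`effective_words` and `max_markov_order`, not part of rmes) treat the first or last position of a family as an n when the family contains every nucleotide at that position combined with every remaining part of the words, and strip such positions. They then apply the usual condition that a Markov model of order m can only assess words of length at least m + 2, since the counts of words of length m + 1 are essentially reproduced by the fitted model. Words are assumed to use the letters a, c, g, t.

```python
from collections.abc import Iterable

NUCLEOTIDES = "acgt"


def effective_words(words: Iterable[str]) -> set[str]:
    """Strip leading/trailing positions on which the family places no constraint (n).

    The first position is unconstrained when the family is exactly every nucleotide
    followed by every remaining suffix; the last position is handled symmetrically.
    """
    current = {w.lower() for w in words}
    if not current:
        raise ValueError("empty family")
    while len(next(iter(current))) > 1:
        suffixes = {w[1:] for w in current}
        if {x + s for x in NUCLEOTIDES for s in suffixes} == current:
            current = suffixes  # leading n removed
            continue
        prefixes = {w[:-1] for w in current}
        if {p + x for p in prefixes for x in NUCLEOTIDES} == current:
            current = prefixes  # trailing n removed
            continue
        break  # both ends are constrained
    return current


def max_markov_order(words: Iterable[str]) -> int:
    """Highest order m with effective word length h >= m + 2 (negative: no model applies)."""
    h = len(next(iter(effective_words(words))))
    return h - 2
```

For the family natc, that is the words aatc, catc, gatc and tatc, `effective_words` returns {atc} and `max_markov_order` returns 1, so M2 is excluded while M1 is still possible.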