Tutorial

Getting started

  • Connect to bibdev:
    ssh bibdev
  • Uncompress the archive:
    tar -x -z -f alvisnlp_stfilter.tar.gz
    This should create a directory named stfilter.
  • Enter that directory:
    cd stfilter
  • Load the environment to run AlvisNLP/ML:
    source environ.sh
  • Run:
    alvisnlp stfilter.plan

Exercises

Observe stfilter.plan and common.plan

  1. What are sequence and import useful for?
  2. Which files are read and written by AlvisNLP/ML?
  3. What is the meaning of taxid,pos,rank in module taxa?
  4. Add a module to extract all gene occurrences.
  5. Why are fixed-forms and fixed-forms-overlaps necessary?
  6. What is classified by train?
  7. What is the current discrimination performance? Try other classifiers instead of Naive Bayes and compare the results.
  8. Where are learning attributes specified?
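
When comparing classifiers in question 7, it helps to have the usual measures in mind. The sketch below is not part of AlvisNLP/ML: it is a small stand-alone Python illustration of how precision, recall and F1 can be computed from gold and predicted labels. The function name, the label values and the sample data are all hypothetical, and the figures reported by the tool itself may be defined or presented differently.

```python
def precision_recall_f1(gold, predicted, positive):
    """Return (precision, recall, F1) for the label `positive`."""
    pairs = list(zip(gold, predicted))
    # True positives: candidates correctly predicted as `positive`.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: predicted `positive` but the gold label differs.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: gold label is `positive` but the prediction differs.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted labels for six candidates.
gold = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
print(precision_recall_f1(gold, predicted, positive="yes"))
```

Two classifiers with similar accuracy can differ noticeably on precision and recall, so it is worth looking at both before deciding which one performs best.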

Observe attr/base.xml

  1. How many learning attributes are there?
  2. What is the meaning of the length attribute?
  3. What is the meaning of bag in attr/bow.xml and attr/vici.xml? What is the difference between the two?
  4. Which of the two performs best?
  5. Use word lemmas instead of surface forms. Does it improve the performance?
  6. Can you think of other attributes?

Using dependencies

  1. Change the POS tagging from TreeTagger to CCGPosTagger.
  2. Add CCGParser to common.plan.
  3. Wow! That takes too much time! Investigate how you could use -dumpModule and -resume to your advantage.
  4. Use dependencies in the learning attributes.

Cheating