Project

General

Profile

This page is obsolete, please go to https://github.com/Bibliome/alvisnlp/wiki/Pattern-matcher

Pattern Matcher

PatternMatcher is an AlvisNLP/ML module for searching sequences of annotations. It features a language similar to regular expressions to specify annotation sequence queries. It is also capable of several actions on matched sequences like adding annotations, removing annotations, setting features and adding tuples.

Pattern language

The PatternMatcher module requires the pattern parameter that specifies an annotation sequence query. This query is written in a regex-like language.

Single annotation clause

  [ EXPR ]

EXPR is an element expression (see Element Expression). It will be evaluated as a boolean with each annotation in the sequence as the context element. This query matches any annotation for which the expression evaluated to true. Matches for this clause are always a single annotation.

Sequence

  CLAUSE1 CLAUSE2 ... CLAUSE3

CLAUSE1 CLAUSE2 ... CLAUSE3 are clauses (single annotation or groups). This searches for subsequences of annotations that match all clauses in the specified order.

Groups

  ( CLAUSE )
  (NAME: CLAUSE )

CLAUSE is a clause (single annotation or sequence), and NAME is a name (see Element Expression).
The first form is a non capturing group, usually used to apply a quantifier to a sequence or an union.
The second form is a capturing group, the NAME can be referenced in PatternMatcher actions.

Union

  LEFT | RIGHT

LEFT and RIGHT are clauses (single annotation or group). This searches for a subsequence that match either LEFT or RIGHT.

Greedy quantifiers

  CLAUSE ?
  CLAUSE *
  CLAUSE +
  CLAUSE {N}
  CLAUSE {N,M}
  CLAUSE {N,}
  CLAUSE {,M}

CLAUSE is a clause (single annotation or group), N and M are integer constants.

? optional {0,1}
* kleene star {0,}
+ repeat {1,}
N exactly N
{N,M} at least N, at most M
{N,} at least N, no upper limit
{,M} at most M, possibly 0

Reluctant quantifiers

  CLAUSE ??
  CLAUSE *?
  CLAUSE +?
  CLAUSE {N}?
  CLAUSE {N,M}?
  CLAUSE {N,}?
  CLAUSE {,M}?

Reluctant quantifiers will not attempt to maximize the length of the match.

Examples

  [ @form == "," ]
  [ true ]{1,3}
  [ @form == "," ]

Two commas separated by one, two or three words.

  [ true ]
  [ @form == "(" ]
  [ @pos == before:words{-2}.@pos ]
  [ @form == ")" ]

Apposition; note that the word between parentheses must have the same POS tag than the word before the opening parenthesis.

Actions

The actions parameter specifies what should be done with the matches. PatternMatcher can perform several actions for the same match. Each action is specified by a specific tag.
All action tags accept an attribute group, if this attribute is specified, then the action concerns annotations in the specified capturing group. If this attribute is not specified, then the action concerns all annotations in the whole match.

In most actions, you can specify a set of features to add to one or several elements. The feature specification is a mapping of expression in the form:

  KEY1 = EXPR1, KEY2 = EXPR2, ..., KEYN = EXPRN

KEY1 KEY2 ... KEYN are feature keys and EXPR1 EXPR2 EXPRN are expressions. The element context for the evaluation of the expression is the element to which the features will be added. Additionally PatternMatcher defines a reference named after for each group that returns the annotations matched in the corresponding group. The match reference returns all annotations of the whole match.

Add to layer

  <addToLayer [group="GROUP"] layer="LAYER"/>

This action adds all annotations in the group or match into the layer named LAYER.

Create feature

  <createAnnotation [group="GROUP"] layer="LAYER" [features="FEATURES"]/>

This action creates an annotation that spans over all the group or match and adds this annotation in the layer named LAYER.
Additionally it adds to this annotation the features specified by FEATURES.

Remove annotations

  <removeAnnotations group="GROUP" layer="LAYER"/>

This action removes all annotations in the group or match from the layer named LAYER.

Set annotation features

  <setFeatures group="GROUP" features="FEATURES"/>

This action adds features specified by FEATURES to all annotations in the group or match.

Create a tuple

  <createTuple relation="RELATION" arguments="ARGS" features="FEATURES"/>

This action creates a tuple to the relation named RELATION with arguments specified by ARGS and features specified by FEATURES.
ARGS is a mapping of expressions (like FEATURES) though the expressions will be evaluated as a list: the argumebnt will be the first annotation of the list. PatternMatcher will issue a warning if the first element of the list is not an annotation or if the list is empty. The context element is the freshly created tuple and references for each groups are defined.