mtap.metrics

Computing metrics using document processing

Provides functionality for measuring processor performance against gold standards.

class mtap.metrics.Metric[source]

Base class for metrics.

class mtap.metrics.Metrics(*metrics: Metric, tested: str, target: str, tested_filter: Callable[[Label], bool] | None = None, target_filter: Callable[[Label], bool] | None = None)[source]

A document processor that computes a set of metrics.

Parameters:
  • tested – The name of the index to use as the hypothesis / predictions.

  • target – The name of the index to use as the ground truth / gold standard.

  • tested_filter – A filter to apply to the tested index.

  • target_filter – A filter to apply to the target index.
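A minimal sketch of constructing a Metrics processor (the label index names 'tested_sentences' and 'gold_sentences' are hypothetical; substitute the indices produced by your processor and by your gold-standard annotations):

    from mtap.metrics import Accuracy, Metrics

    # Any Metric instances can be passed; the index names are hypothetical.
    accuracy = Accuracy(name='sentence_accuracy')
    metrics = Metrics(
        accuracy,
        tested='tested_sentences',
        target='gold_sentences',
    )
    # Metrics is a document processor, so it can be run like any other
    # document processor; the wrapped metrics are updated as documents
    # pass through it.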

mtap.metrics.fields_match_test(fields: Sequence[str] | None = None)[source]

Creates an equivalence test that checks whether the specified fields are equal on both labels.

Parameters:

fields – The fields to test or None if all fields should be tested.
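For example, an equivalence test that compares only a single field could be built as follows (the field name 'category' is hypothetical):

    from mtap.metrics import Accuracy, fields_match_test

    # Only the hypothetical 'category' field must match for two labels to be
    # considered equivalent; all other fields are ignored.
    category_only = fields_match_test(['category'])
    accuracy = Accuracy(equivalence_test=category_only)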

class mtap.metrics.Accuracy(name: str = 'accuracy', mode: str = 'equals', print_debug: bool = False, boundary_fuzz: int = 0, fields: Sequence[str] | None = ..., equivalence_test: Callable[[Any, Any], bool] | None = fields_match_test(...))[source]

An accuracy metric with several options for equivalence.

Parameters:
  • name – An identifier for the metric.

  • mode – The equivalence mode. ‘equals’ counts a hit when there is exactly one label at the same location in the tested index as the target label and it has the same values for its fields. ‘location’ counts a hit when there is exactly one label at the same location in the tested index as the target label. ‘any’ counts a hit when there are one or more labels at the same location with the same values for their fields.

  • print_debug – If true, prints debug information about misses.

  • boundary_fuzz – How far the tested label boundaries may differ from the target label boundaries and still count as a match.

  • equivalence_test – A callable that takes two labels as arguments and returns true if the labels are equivalent for the purposes of the test.
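As a sketch, a location-only accuracy with a small boundary tolerance might be configured like this (the metric name is arbitrary):

    from mtap.metrics import Accuracy

    location_accuracy = Accuracy(
        name='concept_location_accuracy',
        mode='location',   # ignore field values, match on location only
        boundary_fuzz=2,   # allow boundaries to differ by up to 2
        print_debug=True,  # print information about misses
    )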

class mtap.metrics.ConfusionMatrix(true_positives: float = 0, false_positives: float = 0, false_negatives: float = 0)[source]

A representation of a confusion matrix.

true_positives: float

Count of true positive examples.

false_positives: float

Count of false positive examples.

false_negatives: float

Count of false negative examples.
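The three counts are sufficient to compute the precision, recall, and f1 formulas listed under FirstTokenConfusion below. A small worked example with made-up counts:

    from mtap.metrics import ConfusionMatrix

    # Hypothetical counts.
    cm = ConfusionMatrix(true_positives=80, false_positives=10, false_negatives=20)
    precision = cm.true_positives / (cm.true_positives + cm.false_positives)  # ~0.889
    recall = cm.true_positives / (cm.true_positives + cm.false_negatives)     # 0.8
    f1 = (2 * cm.true_positives
          / (2 * cm.true_positives + cm.false_positives + cm.false_negatives))  # ~0.842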

class mtap.metrics.FirstTokenConfusion(name: str = 'first_token_confusion', tested_filter: Callable[[Label], bool] | None = None, target_filter: Callable[[Label], bool] | None = None, print_debug: str | None = None, debug_range: int = 30, debug_handle: TextIO = sys.stdout)[source]

A metric which treats the first word token in every label as an example of the positive class and calculates the precision, recall, and f1 confusion matrix metrics for that positive class. Useful for evaluation of segmentation tasks.

precision = true positives / (true positives + false positives)

recall = true positives / (true positives + false negatives)

f1 = 2 * true positives / (2 * true positives + false positives + false negatives)

Parameters:
  • name – An identifying name for the metric.

  • tested_filter – A filter to apply to the tested index.

  • target_filter – A filter to apply to the target index.

  • print_debug – Controls printing of failing examples: ‘fp’ prints only false positive errors, ‘fn’ prints only false negative errors, and ‘all’ prints both false positive and false negative errors.

  • debug_range – The amount of surrounding text to print before and after each failing example.

  • debug_handle – A text io file handle to print the debug information to.
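A minimal sketch of wiring FirstTokenConfusion into a Metrics processor (the index names, the filter, and the 'internal' attribute are hypothetical):

    import sys

    from mtap.metrics import FirstTokenConfusion, Metrics

    confusion = FirstTokenConfusion(
        name='sentence_confusion',
        # Hypothetical filter: ignore tested labels flagged as internal.
        tested_filter=lambda label: not getattr(label, 'internal', False),
        print_debug='fn',   # print only false negative errors
        debug_range=40,     # context printed around each failing example
        debug_handle=sys.stderr,
    )
    metrics = Metrics(confusion, tested='tested_sentences', target='gold_sentences')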

property precision: float

Ratio of true positives to the total number of positive predictions.

property recall: float

Ratio of true positives to the total number of positive ground truths.

property f1: float

The harmonic mean of precision and recall.