mtap

Running Events Service

usage:

python -m mtap events [-h] [--address ADDRESS] [--port PORT]
                         [--workers WORKERS] [--register] [--config CONFIG]

optional arguments:
 -h, --help            show this help message and exit
 --address ADDRESS, -a ADDRESS
                       the address to serve the service on
 --port PORT, -p PORT  the port to serve the service on
 --workers WORKERS, -w WORKERS
                       number of worker threads to handle requests
 --register, -r        whether to register the service with the configured
                       service discovery
 --config CONFIG, -c CONFIG
                       path to config file
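
For scripting, the command above can be built programmatically. This is a minimal sketch that uses only the flags listed in the usage text. `events_service_command` is a hypothetical helper, not part of MTAP, and the address and port values are arbitrary:

```python
import sys
from typing import List, Optional


def events_service_command(address: Optional[str] = None,
                           port: Optional[int] = None,
                           workers: Optional[int] = None,
                           register: bool = False,
                           config: Optional[str] = None) -> List[str]:
    """Builds the argument list for `python -m mtap events` (hypothetical helper)."""
    cmd = [sys.executable, '-m', 'mtap', 'events']
    # Only pass flags the caller actually set, so the service defaults apply otherwise.
    if address is not None:
        cmd += ['--address', address]
    if port is not None:
        cmd += ['--port', str(port)]
    if workers is not None:
        cmd += ['--workers', str(workers)]
    if register:
        cmd.append('--register')
    if config is not None:
        cmd += ['--config', config]
    return cmd
```

Passing the returned list to subprocess.Popen would start the service in a child process. This requires mtap to be installed.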

Events service client, documents

mtap.events_client(address: str) → EventsClient
mtap.events_client(addresses: Iterable[str]) → EventsClient
mtap.events_client() → EventsClient

class mtap.types.EventsClient

Bases: ContextManager[EventsClient], ABC

Communicates with the events service.

abstract close()

Closes the events client.

class mtap.Event(event_id: str | None = None, *, client: EventsClient | None = None, only_create_new: bool = False, label_adapters: Mapping[str, ProtoLabelAdapter] | None = None, event_service_instance_id: str | None = None, lease: bool = True)

An object for interacting with a specific event locally or on the events service.

The Event object functions as a map from string document names to Document objects that can be used to access document data from the events server.

To connect to the events service and load an existing event, all of event_id, event_service_instance_id, and client must be specified.

Parameters:
  • event_id – A globally unique identifier for the event. Omit it or pass None to use a random UUID.

  • client – If specified, connects to the events service with id event_service_instance_id and either accesses the existing event with the event_id or creates a new one.

  • only_create_new – Fails if the event already exists on the events service.

label_adapters

A mapping of string label index names to ProtoLabelAdapter instances to perform custom mapping of label types.

Type:

Mapping[str, ProtoLabelAdapter]

Examples

Creating a new event locally.

>>> event = Event()

Creating a new event remotely.

>>> with events_client(address='localhost:50000') as c, \
...         Event(event_id='id', client=c) as event:
...     pass  # use event

Connecting to an existing event.

>>> with events_client(address='localhost:50000') as c, \
...         Event(event_id='id',
...               event_service_instance_id='events_sid',
...               client=c) as event:
...     pass  # use event

property event_id: str

The globally unique identifier for this event.

property event_service_instance_id: str

The unique instance identifier for this event’s paired event service.

property documents: Mapping[str, Document]

A mutable mapping of strings to Document objects that can be used to query and add documents to the event.

property metadata: MutableMapping[str, str]

A mutable mapping of strings to strings that can be used to query and add metadata to the event.

property binaries: MutableMapping[str, bytes]

A mutable mapping of strings to bytes that can be used to query and add binary data to the event.

property created_indices: Dict[str, List[str]]

A mapping of document names to a list of the names of all the label indices that have been added to that document.

close()

Closes this event. Lets the events service know that we are done with the event, allowing it to clean up the event if no other clients have open leases to it.
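
The lease behaviour behind close() can be pictured as reference counting per event. The following is a hypothetical toy model (`LeaseTable` is an illustrative name, not an MTAP class, and the real service's bookkeeping may differ): each client that opens the event holds one lease, and the event becomes eligible for cleanup once the last lease is released.

```python
from typing import Dict


class LeaseTable:
    """Toy model of lease counting on an events service (not MTAP's code)."""

    def __init__(self) -> None:
        self._leases: Dict[str, int] = {}

    def acquire(self, event_id: str) -> None:
        # Creating or connecting to an event takes out one lease on it.
        self._leases[event_id] = self._leases.get(event_id, 0) + 1

    def release(self, event_id: str) -> None:
        # Analogous to Event.close(): give back one lease.
        count = self._leases.get(event_id, 0)
        if count == 0:
            raise KeyError(f'no open lease for event {event_id!r}')
        if count == 1:
            # Last lease released: the event can be cleaned up.
            del self._leases[event_id]
        else:
            self._leases[event_id] = count - 1

    def is_alive(self, event_id: str) -> bool:
        return event_id in self._leases
```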

create_document(document_name: str, text: str) → Document

Adds a document to the event keyed by document_name and containing the specified text.

Parameters:
  • document_name – The event-unique identifier for the document, example: 'plaintext'.

  • text – The content of the document. This is a required field, and the document text is final and immutable.

Returns:

The added document.

Examples

>>> event = Event()
>>> doc = event.create_document('plaintext',
...                             text="The text of the document.")

add_document(document: Document)

Adds the document to this event, first uploading it to the events service if this event has a client connection to the events service.

Parameters:

document – The document to add to this event.

Examples

>>> event = Event()
>>> doc = Document('plaintext',
...                text="The text of the document.")
>>> event.add_document(doc)

class mtap.Document(document_name: str, *, text: str, label_adapters: LabelAdapters | None = None)
class mtap.Document(document_name: str, *, event: Event, label_adapters: LabelAdapters | None = None)
class mtap.Document(document_name: str, *, text: str, event: Event, label_adapters: LabelAdapters | None = None)

An object for interacting with text and labels stored on an Event.

Documents are keyed by their name, and pipelines can store different pieces of related text on a single processing event using multiple documents. For example, one document could hold text in one language and another its translation, or one document could hold the RTF or HTML encoding (which could also be stored as a binary in Event.binaries) while another holds the parsed plaintext.

Both the document text and any added label indices are immutable. This enables parallel and distributed processing, and it matters because label indices created later may depend on labels created earlier.

Parameters:
  • document_name – The document name identifier.

  • text – The document text, can be omitted if this is an existing document and text needs to be retrieved from the events service.

  • event – The parent event of this document. If the event has a client, then that client will be used to share changes to this document with all other clients of the Events service. In that case, text should only be specified if it is the known existing text of the document.

Examples

Local document:

>>> document = Document('plaintext', text='Some document text.')

Existing distributed object:

>>> with events_client(address='localhost:8080') as client, \
...         Event(event_id='1',
...               event_service_instance_id='events_sid',
...               client=client) as event:
...     document = event.documents['plaintext']
...     document.text
'Some document text fetched from the server.'

New distributed object:

>>> with events_client(address='localhost:8080') as client, \
...         Event(event_id='1', client=client) as event:
...     document = Document('plaintext', text='Some document text.')
...     event.add_document(document)

or

>>> with events_client(address='localhost:8080') as client, \
...         Event(event_id='1', client=client) as event:
...     document = event.create_document('plaintext',
...                                      text='Some document text.')

property event: Event | None

The parent event of this document.

property document_name: str

The unique identifier for this document on the event.

property text: str

The document text.

property created_indices: List[str]

A list of all the label index names that have been created on this document using a labeler either locally or by remote pipeline components invoked on this document.

property labels: Mapping[str, LabelIndex]

A mapping from label index names to their label index.

Items will be fetched from the events service if they are not cached locally when the document has an event with a client.

get_labeler(label_index_name: str, *, distinct: bool | None = None) → Labeler[GenericLabel]

Alias for labeler().

labeler(label_index_name: str, *, distinct: bool | None = None) → Labeler[GenericLabel]

Creates a function that can be used to add labels to a label index.

Parameters:
  • label_index_name – An identifying name for the label index.

  • distinct – Optional. If using generic labels, whether to use distinct or non-distinct generic labels; defaults to False. Distinct labels are non-overlapping and can use faster binary search indices.

Returns:

A callable labeler. When used as a context manager with the ‘with’ keyword, it automatically handles uploading any added labels to the server.

Examples

>>> with document.get_labeler('sentences',
...                           distinct=True) as labeler:
...     labeler(0, 25, sentence_type='STANDARD')
...     sentence = labeler(26, 34)
...     sentence.sentence_type = 'FRAGMENT'

add_labels(label_index_name: str, labels: Sequence[L], *, distinct: bool | None = None, label_adapter: ProtoLabelAdapter | None = None)

Skips using a labeler and adds the sequence of labels as a new label index.

Parameters:
  • label_index_name – The name of the label index.

  • labels – The labels to add.

  • distinct – Whether the index is distinct or non-distinct.

  • label_adapter – A label adapter to use.

Returns:

The new label index created from the labels.

class mtap.types.Labeler(document: Document, label_index_name: str, label_adapter: ProtoLabelAdapter[L])

Object provided by get_labeler() which is responsible for adding labels to a label index on a document.

done()

Finalizes the label index and uploads the added labels to the events service.

Normally called automatically on exit from a context manager block, but can be manually invoked if the labeler is not used in a context manager block.
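
A minimal sketch of the buffering pattern described for done(), assuming a labeler that collects labels in memory and hands them to an upload callback exactly once, either on exiting a with block or on an explicit done() call. `BufferingLabeler` and its `upload` callback are hypothetical stand-ins for the events service upload, not MTAP's Labeler:

```python
from typing import Callable, List, Tuple

Label = Tuple[int, int, dict]


class BufferingLabeler:
    """Hypothetical labeler: collects labels, uploads them once on done()."""

    def __init__(self, upload: Callable[[List[Label]], None]) -> None:
        self._upload = upload
        self._labels: List[Label] = []
        self._done = False

    def __call__(self, start: int, end: int, **fields) -> Label:
        if self._done:
            raise ValueError('label index already finalized')
        label = (start, end, fields)
        self._labels.append(label)
        return label

    def done(self) -> None:
        # Finalize only once: sort by location and hand the labels to the uploader.
        if not self._done:
            self._done = True
            self._upload(sorted(self._labels, key=lambda lbl: (lbl[0], lbl[1])))

    def __enter__(self) -> 'BufferingLabeler':
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.done()
```

Making done() idempotent means a manual call followed by the context manager exit still uploads only once.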

Labels

class mtap.types.Label

An abstract base class for a label of attributes on text.

abstract property document: Document

The parent document this label appears on.

abstract property label_index_name: str

The label index this label appears on.

abstract property identifier: int

The index of the label within its label index.

abstract property start_index: int

The index of the first character of the text covered by this label.

abstract property end_index: int

The index after the last character of the text covered by this label.

property location: Location

A tuple of (start_index, end_index) used to perform sorting and comparison first based on start_index, then based on end_index.

property text

The slice of document text covered by this label. Will retrieve from events server if it is not cached locally.

abstract shallow_fields_equal(other) → bool

Tests whether the fields on this label, and the locations of any referenced labels, are the same as on another label.

Parameters:

other – The other label to test.

Returns:

True if all the fields are equal and the references are at the same locations.

class mtap.Location(start_index: float, end_index: float)

A location in text, a tuple of (start_index, end_index).

Used to perform comparison of labels based on their locations.

start_index: float

The start index inclusive of the location in text.

end_index: float

The end index exclusive of the location in text.

covers(other: Location | Label)

Whether the span of text covered by this location completely overlaps the span of text covered by the other label or location.

Parameters:

other – A location or label to compare against.

Returns:

True if other is completely overlapped/covered

False otherwise.
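
As an illustration of covers(), here is a stand-in `Span` tuple (hypothetical, not mtap.Location), assuming that a location covers itself and that shared endpoints count as covered:

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Stand-in for a location: start inclusive, end exclusive."""
    start_index: float
    end_index: float

    def covers(self, other: 'Span') -> bool:
        # `other` is covered when it lies entirely within this span.
        return (self.start_index <= other.start_index
                and other.end_index <= self.end_index)
```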

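relative_to(location: Location | Label | float) → Location

Given a location measured from the same origin as this one, creates a copy of this location with indices relative to it, i.e. with the start_index of location (or location itself, if it is a number) subtracted from both indices.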

Parameters:

location – A location to relativize this location to.

Returns:

A copy with updated indices.

Examples

>>> sentence = Location(10, 20)
>>> token = Location(10, 15)
>>> token.relative_to(sentence)
Location(start_index=0, end_index=5)

offset_by(location: Location | Label | float) → Location

Creates a location by offsetting this location by a number or by the start_index of a location / label. This de-relativizes the location, reversing relative_to().

Parameters:

location – A location to offset this location by.

Returns:

A copy with updated indices.

Examples

>>> sentence = Location(10, 20)
>>> token_in_sentence = Location(0, 5)
>>> token_in_sentence.offset_by(sentence)
Location(start_index=10, end_index=15)
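
The two methods are inverses of each other when given the same origin. Here is a sketch on a hypothetical `Span` tuple (not mtap.Location) that reproduces both examples, assuming both methods shift by the origin's start_index, or by the number itself when a number is passed:

```python
from typing import NamedTuple, Union


class Span(NamedTuple):
    start_index: float
    end_index: float

    def relative_to(self, origin: Union['Span', float]) -> 'Span':
        # Subtract the origin's start index (or the number itself).
        shift = origin.start_index if isinstance(origin, Span) else origin
        return Span(self.start_index - shift, self.end_index - shift)

    def offset_by(self, origin: Union['Span', float]) -> 'Span':
        # Inverse of relative_to: add the origin's start index back.
        shift = origin.start_index if isinstance(origin, Span) else origin
        return Span(self.start_index + shift, self.end_index + shift)
```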

mtap.label(start_index: int, end_index: int, *, document: Document | None = None, **kwargs) → GenericLabel

An alias for GenericLabel.

Parameters:
  • start_index – The index of the first character in text to be included in the label.

  • end_index – The index after the last character in text to be included in the label.

  • document – The parent document of the label. This will be automatically set if the label is created via labeler.

  • **kwargs – Any other fields to add to the label; values must be JSON-serializable.

class mtap.GenericLabel(start_index: int, end_index: int, *, identifier: int | None = None, document: Document | None = None, label_index_name: str | None = None, fields: dict | None = None, reference_field_ids: dict | None = None, **kwargs)

Bases: Label

Default implementation of the Label class which uses a dictionary to store attributes.

Suitable for the majority of label use cases.

Parameters:
  • start_index – The index of the first character in text to be included in the label.

  • end_index – The index after the last character in text to be included in the label.

  • document – The parent document of the label. This will be automatically set if the label is created via labeler.

  • **kwargs – Any other fields to add to the label; values must be JSON-serializable.

Examples

>>> pos_tag = pos_tag_labeler(0, 5)
>>> pos_tag.tag = 'NNS'
>>> pos_tag.tag
'NNS'
>>> pos_tag2 = pos_tag_labeler(6, 10, tag='VB')
>>> pos_tag2.tag
'VB'
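
To illustrate the dictionary-backed design, here is a hypothetical `DictLabel` class (not GenericLabel's actual implementation) in which unknown attributes are read from and written to a field dictionary, and values are checked for JSON-serializability with json.dumps:

```python
import json
from typing import Any, Dict


class DictLabel:
    """Hypothetical dict-backed label in the spirit of GenericLabel."""

    def __init__(self, start_index: int, end_index: int, **fields: Any) -> None:
        json.dumps(fields)  # raises TypeError if a value is not JSON-serializable
        # object.__setattr__ bypasses the field-storing __setattr__ below.
        object.__setattr__(self, 'start_index', start_index)
        object.__setattr__(self, 'end_index', end_index)
        object.__setattr__(self, 'fields', dict(fields))

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails: consult the fields.
        fields: Dict[str, Any] = self.__dict__.get('fields', {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        if name in ('start_index', 'end_index'):
            object.__setattr__(self, name, value)
        else:
            json.dumps(value)  # reject values that are not JSON-serializable
            self.fields[name] = value
```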

Label Indices

Label indices are normally retrieved via the labels property, but they can be created independently of documents as well.

mtap.label_index(labels: List[L], distinct: bool = False, adapter: ProtoLabelAdapter | None = None) → LabelIndex[L]

Creates a label index from labels.

Parameters:
  • labels – Zero or more labels to create a label index from.

  • distinct – Whether the label index is distinct or not.

  • adapter – The label adapter for these labels.

Returns:

The newly created label index.

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index
label_index([GenericLabel(0, 5, x=1), GenericLabel(0, 10, x=2),
GenericLabel(5, 10, x=3), GenericLabel(5, 15, x=5),
GenericLabel(7, 10, x=4), GenericLabel(10, 15, x=6)], distinct=False)

class mtap.types.LabelIndex

An immutable Sequence of labels ordered by their location in text. By default, sorts by ascending start_index and then by ascending end_index.

abstract property distinct: bool

Whether this label index is distinct, i.e. all the labels in it are non-overlapping.

abstract filter(fn: Callable[[Label], bool]) → LabelIndex[L]

Filters the label index according to a filter function.

This function is less efficient for filtering based on indices than inside(), covering(), etc., which use a binary search method on the sorted index.

Parameters:

fn – A filter function that returns True if the label should be included and False if it should not.

Returns:

A view of this label index.

Return type:

LabelIndex
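
To see why the binary-search methods can be faster than filter(), here is a sketch over plain (start, end) tuples sorted the way a label index is. `inside_filter` and `inside_bisect` are hypothetical names, not MTAP functions: the first tests every label, while the second uses bisect to jump to the first label starting at the query start and stops scanning once labels start past the query end.

```python
from bisect import bisect_left
from typing import List, Tuple

Span = Tuple[int, int]


def inside_filter(index: List[Span], start: int, end: int) -> List[Span]:
    # Linear scan: tests every label with a predicate, like filter().
    return [s for s in index if start <= s[0] and s[1] <= end]


def inside_bisect(index: List[Span], start: int, end: int) -> List[Span]:
    # index is sorted by (start, end): jump to the first label starting at `start`.
    i = bisect_left(index, (start, float('-inf')))
    result = []
    # Labels starting after `end` cannot be inside, so stop there.
    while i < len(index) and index[i][0] <= end:
        if index[i][1] <= end:
            result.append(index[i])
        i += 1
    return result
```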

abstract at(location: Location | Label) → LabelIndex[L]
abstract at(start: float, end: float) → LabelIndex[L]

Returns the labels at the specified location in text.

Parameters:
  • location – A label or location.

  • start – The inclusive start index.

  • end – The inclusive end index of the location in text.

Returns:

A view of this label index.

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 10, x=1),
...                      label(0, 10, x=2),
...                      label(6, 20, x=3)])
>>> index.at(0, 10)
label_index([GenericLabel(0, 10, x=1), GenericLabel(0, 10, x=2)],
distinct=False)

abstract covering(location: Location | Label) → LabelIndex[L]
abstract covering(start: float, end: float) → LabelIndex[L]

A label index containing all labels that cover / contain the specified location in text.

Parameters:
  • start – The inclusive start of the location.

  • end – The inclusive end of the location.

  • location – A label or location.

Returns:

A view of this label index.

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.covering(5, 10)
label_index([GenericLabel(0, 10, x=2), GenericLabel(5, 10, x=3),
GenericLabel(5, 15, x=5)], distinct=False)

abstract inside(location: Location | Label) → LabelIndex[L]
abstract inside(start: float, end: float) → LabelIndex[L]

A label index containing all labels that are inside the specified location in text.

Parameters:
  • location – A label or location.

  • start – The inclusive start of the location.

  • end – The inclusive end of the location.

Returns:

A view of this label index.

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.inside(5, 10)
label_index([GenericLabel(5, 10, x=3), GenericLabel(7, 10, x=4)],
distinct=False)

abstract beginning_inside(location: Location | Label) → LabelIndex[L]
abstract beginning_inside(start: float, end: float) → LabelIndex[L]

A label index containing all labels whose begin index is inside the specified location in text.

Parameters:
  • location – A label or location.

  • start – The inclusive start of the location.

  • end – The inclusive end of the location.

Returns:

A view of this label index.

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.beginning_inside(6, 11)
label_index([GenericLabel(7, 10, x=4), GenericLabel(10, 15, x=6)],
distinct=False)

abstract overlapping(location: Location | Label) → LabelIndex[L]
abstract overlapping(start: float, end: float) → LabelIndex[L]

Returns all labels that overlap the specified location in text.

Parameters:
  • location – A label or location.

  • start – The inclusive start of the location.

  • end – The inclusive end of the location.

Returns:

A view of this label index.

Return type:

LabelIndex

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.overlapping(6, 10)
label_index([GenericLabel(0, 10, x=2), GenericLabel(5, 10, x=3),
GenericLabel(5, 15, x=5), GenericLabel(7, 10, x=4)],
distinct=False)
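
The selection rules shown by the at(), covering(), inside(), beginning_inside() and overlapping() examples can be summarized as predicates on (start, end) pairs. This is a hypothetical sketch inferred from those examples, not MTAP's implementation; the boundary handling (overlaps must share at least one character, and the query end is treated as inclusive for beginning_inside()) is an assumption:

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int]

# Each predicate decides whether label `s` is selected for the query (start, end).
PREDICATES: Dict[str, Callable[[Span, int, int], bool]] = {
    'at': lambda s, start, end: s == (start, end),
    'covering': lambda s, start, end: s[0] <= start and end <= s[1],
    'inside': lambda s, start, end: start <= s[0] and s[1] <= end,
    'beginning_inside': lambda s, start, end: start <= s[0] <= end,
    'overlapping': lambda s, start, end: s[0] < end and start < s[1],
}


def query(index: List[Span], name: str, start: int, end: int) -> List[Span]:
    # Results keep the ascending (start, end) order of a label index.
    pred = PREDICATES[name]
    return [s for s in sorted(index) if pred(s, start, end)]
```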

before(x: Location | Label | float) → LabelIndex[L]

A label index containing all labels that are before a label’s location in text or an index in text.

Parameters:

x – A label or location whose start_index will be used, or a float index in text.

Returns:

A view of this label index.

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.before(6)
label_index([GenericLabel(0, 5, x=1)], distinct=False)

after(x: Location | Label | float) → LabelIndex[L]

A label index containing all labels that are after a label’s location in text or an index in text.

Parameters:

x – A label or location whose end_index will be used, or a float index in text.

Returns:

A view of this label index.

Return type:

LabelIndex

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.after(6)
label_index([GenericLabel(7, 10, x=4), GenericLabel(10, 15, x=6)],
distinct=False)
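
A sketch of the before() and after() rules on plain (start, end) tuples, inferred from the two examples: before() keeps labels that end at or before the point, and after() keeps labels that start at or after it. The functions are hypothetical, and treating an equal index as "before" or "after" is an assumption:

```python
from typing import List, Tuple, Union

Span = Tuple[int, int]


def before(index: List[Span], x: Union[Span, float]) -> List[Span]:
    # A span argument contributes its start index; a number is used as-is.
    point = x[0] if isinstance(x, tuple) else x
    return [s for s in sorted(index) if s[1] <= point]


def after(index: List[Span], x: Union[Span, float]) -> List[Span]:
    # A span argument contributes its end index; a number is used as-is.
    point = x[1] if isinstance(x, tuple) else x
    return [s for s in sorted(index) if s[0] >= point]
```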

abstract ascending() → LabelIndex[L]

This label index sorted according to ascending start and end index.

Returns:

A view of this label index.

Return type:

LabelIndex

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index == index.ascending()
True

abstract descending() → LabelIndex[L]

This label index sorted by descending start index and then by descending end index, i.e. the reverse of ascending().

Returns:

A view of this label index.

Return type:

LabelIndex

Examples

>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.descending()
label_index([GenericLabel(10, 15, x=6), GenericLabel(7, 10, x=4),
GenericLabel(5, 15, x=5), GenericLabel(5, 10, x=3),
GenericLabel(0, 10, x=2), GenericLabel(0, 5, x=1)], distinct=False)
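
In the example output above, labels that share a start index appear with the longer label first, which is exactly the reverse of ascending(). Here is a sketch of both orderings on plain (start, end) tuples (hypothetical helper functions, not MTAP code):

```python
from typing import List, Tuple

Span = Tuple[int, int]


def ascending(index: List[Span]) -> List[Span]:
    # Ascending start index, ties broken by ascending end index.
    return sorted(index)


def descending(index: List[Span]) -> List[Span]:
    # The reverse of ascending(): descending start, then descending end.
    return sorted(index, reverse=True)
```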

Configuration

class mtap.Config(*args)

The MTAP configuration dictionary.

By default, configuration is loaded from the first of the following locations that exists, in priority order:

  • A file at the path of the ‘--config’ parameter passed into main methods.

  • A file at the path of the ‘MTAP_CONFIG’ environment variable.

  • $PWD/mtapConfig.yml

  • $HOME/.mtap/mtapConfig.yml

  • /etc/mtap/mtapConfig.yml
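
The search order above can be sketched as a function that returns the first existing file. `find_config_file` is a hypothetical helper, not MTAP's loader, and it assumes $PWD means the current working directory:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_config_file(cli_path: Optional[str] = None,
                     env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Returns the first existing config file in the documented priority order."""
    candidates = []
    if cli_path:
        candidates.append(Path(cli_path))              # --config argument
    if env.get('MTAP_CONFIG'):
        candidates.append(Path(env['MTAP_CONFIG']))    # environment variable
    candidates += [
        Path.cwd() / 'mtapConfig.yml',
        Path.home() / '.mtap' / 'mtapConfig.yml',
        Path('/etc/mtap/mtapConfig.yml'),
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None
```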

MTAP components use a global shared configuration object. By entering the context of a config object using “with”, all MTAP functions called on that thread will use that config object.

Examples

>>> import mtap
>>> with mtap.Config() as config:
...     config['key'] = 'value'
...     # other MTAP methods in this
...     # block will use the updated config object.
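
The per-thread behaviour can be modeled with threading.local. In this hypothetical sketch (`ScopedConfig` and `current_config` are illustrative names, not MTAP's Config), entering a config pushes it onto a per-thread stack, so other threads are unaffected and nested blocks restore the outer config on exit:

```python
import threading
from typing import Any, Dict, List, Optional

_state = threading.local()


def _stack() -> List['ScopedConfig']:
    # Each thread lazily gets its own, initially empty, stack of active configs.
    if not hasattr(_state, 'stack'):
        _state.stack = []
    return _state.stack


class ScopedConfig(dict):
    """Hypothetical config whose `with` block makes it current for this thread."""

    def __enter__(self) -> 'ScopedConfig':
        _stack().append(self)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        _stack().pop()


def current_config() -> Optional[Dict[str, Any]]:
    # What library code running on this thread would consult.
    stack = _stack()
    return stack[-1] if stack else None
```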