mtap
Running Events Service
usage:
python -m mtap events [-h] [--address ADDRESS] [--port PORT]
[--workers WORKERS] [--register] [--config CONFIG]
optional arguments:
-h, --help show this help message and exit
--address ADDRESS, -a ADDRESS
the address to serve the service on
--port PORT, -p PORT the port to serve the service on
--workers WORKERS, -w WORKERS
number of worker threads to handle requests
--register, -r whether to register the service with the configured
service discovery
--config CONFIG, -c CONFIG
path to config file
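As a rough illustration, the options listed above can be assembled into a command line from Python. The helper name `events_command` and the address, port, and worker values are assumptions made for this sketch; only the flags themselves come from the usage text.

```python
import shlex
import sys


def events_command(address=None, port=None, workers=None,
                   register=False, config=None):
    """Build the argv list for `python -m mtap events` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "mtap", "events"]
    if address is not None:
        cmd += ["--address", address]
    if port is not None:
        cmd += ["--port", str(port)]
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if register:
        cmd.append("--register")
    if config is not None:
        cmd += ["--config", config]
    return cmd


# Example values only; pick whatever address and port suit the deployment.
cmd = events_command(address="localhost", port=50000, workers=8)
print(shlex.join(cmd[1:]))  # arguments after the interpreter path
```

The resulting list could be passed to `subprocess.Popen(cmd)` to start the service in a child process.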
Events service client, documents
- mtap.events_client(address: str) -> EventsClient
- mtap.events_client(addresses: Iterable[str]) -> EventsClient
- mtap.events_client() -> EventsClient
- class mtap.types.EventsClient
Bases: ContextManager[EventsClient], ABC
Communicates with the events service.
- class mtap.Event(event_id: str | None = None, *, client: EventsClient | None = None, only_create_new: bool = False, label_adapters: Mapping[str, ProtoLabelAdapter] | None = None, event_service_instance_id: str | None = None, lease: bool = True)
An object for interacting with a specific event locally or on the events service.
The Event object functions as a map from string document names to Document objects that can be used to access document data from the events server.
To connect to the events service and load an existing event, all of event_id, event_service_instance_id, and client must be specified.
- Parameters:
event_id – A globally unique identifier for the event, or omit / None for a random UUID.
client – If specified, connects to the events service with id event_service_instance_id and either accesses the existing event with the event_id or creates a new one.
only_create_new – Fails if the event already exists on the events service.
- label_adapters
A mapping of string label index names to ProtoLabelAdapter instances to perform custom mapping of label types.
- Type:
Mapping[str, ProtoLabelAdapter]
Examples
Creating a new event locally.
>>> event = Event()
Creating a new event remotely.
>>> with EventsClient(address='localhost:50000') as c, \
...         Event(event_id='id', client=c) as event:
...     # use event
...     ...
Connecting to an existing event.
>>> with EventsClient(address='localhost:50000') as c, \
...         Event(event_id='id',
...               event_service_instance_id='events_sid',
...               client=c) as event:
...     ...  # use event
- property event_service_instance_id: str
The unique instance identifier for this event’s paired event service.
- property documents: Mapping[str, Document]
A mutable mapping of strings to
Document
objects that can be used to query and add documents to the event.
- property metadata: MutableMapping[str, str]
A mutable mapping of strings to strings that can be used to query and add metadata to the event.
- property binaries: MutableMapping[str, bytes]
A mutable mapping of strings to bytes that can be used to query and add binary data to the event.
- property created_indices: Dict[str, List[str]]
A mapping of document names to a list of the names of all the label indices that have been added to that document.
- close()
Closes this event. Lets the events service know that we are done with the event, allowing it to clean up the event if no other clients have open leases to it.
- create_document(document_name: str, text: str) -> Document
Adds a document to the event keyed by document_name and containing the specified text.
- Parameters:
document_name – The event-unique identifier for the document, for example 'plaintext'.
text – The content of the document. This is a required field, and the document text is final and immutable.
- Returns:
The added document.
Examples
>>> event = Event()
>>> doc = event.create_document('plaintext',
...                             text="The text of the document.")
- add_document(document: Document)
Adds the document to this event, first uploading it to the events service if this event has a client connection to the events service.
- Parameters:
document – The document to add to this event.
Examples
>>> event = Event()
>>> doc = Document('plaintext',
...                text="The text of the document.")
>>> event.add_document(doc)
- class mtap.Document(document_name: str, *, text: str, label_adapters: LabelAdapters | None = None)
- class mtap.Document(document_name: str, *, event: Event, label_adapters: LabelAdapters | None = None)
- class mtap.Document(document_name: str, *, text: str, event: Event, label_adapters: LabelAdapters | None = None)
An object for interacting with text and labels stored on an Event.
Documents are keyed by their name, and pipelines can store different pieces of related text on a single processing event using multiple documents. An example would be storing the text of one language on one document and a translation on another, or storing the RTF or HTML encoding on one document (or as a binary in Event.binaries) and the parsed plaintext on another document.
Both the document text and any added label indices are immutable. This is to enable parallelization and distribution of processing, and because other label indices might be downstream dependents on the earlier created labels.
- Parameters:
document_name – The document name identifier.
text – The document text, can be omitted if this is an existing document and text needs to be retrieved from the events service.
event – The parent event of this document. If the event has a client, then that client will be used to share changes to this document with all other clients of the Events service. In that case, text should only be specified if it is the known existing text of the document.
Examples
Local document:
>>> document = Document('plaintext', text='Some document text.')
Existing distributed object:
>>> with EventsClient(address='localhost:8080') as client, \
...         Event(event_id='1',
...               event_service_instance_id='events_sid',
...               client=client) as event:
...     document = event.documents['plaintext']
...     document.text
'Some document text fetched from the server.'
New distributed object:
>>> with EventsClient(address='localhost:8080') as client, \
...         Event(event_id='1', client=client) as event:
...     document = Document('plaintext', text='Some document text.')
...     event.add_document(document)
or
>>> with EventsClient(address='localhost:8080') as client, \
...         Event(event_id='1', client=client) as event:
...     document = event.create_document('plaintext',
...                                      text='Some document text.')
- property created_indices: List[str]
A list of all the label index names that have been created on this document using a labeler either locally or by remote pipeline components invoked on this document.
- property labels: Mapping[str, LabelIndex]
A mapping from label index names to their label index.
Items will be fetched from the events service if they are not cached locally when the document has an event with a client.
- get_labeler(label_index_name: str, *, distinct: bool | None = None) -> Labeler[GenericLabel]
Alias for labeler()
- labeler(label_index_name: str, *, distinct: bool | None = None) -> Labeler[GenericLabel]
Creates a function that can be used to add labels to a label index.
- Parameters:
label_index_name – An identifying name for the label index.
distinct – Optional. If using generic labels, whether to use distinct or non-distinct generic labels; defaults to False. Distinct labels are non-overlapping and can use faster binary search indices.
- Returns:
A callable that, when used with the 'with' keyword, automatically handles uploading any added labels to the server.
Examples
>>> with document.get_labeler('sentences',
...                           distinct=True) as labeler:
...     labeler(0, 25, sentence_type='STANDARD')
...     sentence = labeler(26, 34)
...     sentence.sentence_type = 'FRAGMENT'
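The 'with' behaviour of a labeler can be pictured with a small stand-in class. This is a sketch only, not MTAP's implementation: `SketchLabeler` and its `pending` and `uploaded` attributes are hypothetical, labels are plain dictionaries rather than GenericLabel objects, and skipping the upload when the block raises is an assumption.

```python
class SketchLabeler:
    """Hypothetical stand-in: collects labels and 'uploads' them on exit."""

    def __init__(self, index_name):
        self.index_name = index_name
        self.pending = []     # labels added inside the with block
        self.uploaded = None  # filled in once the block exits cleanly

    def __call__(self, start_index, end_index, **fields):
        label = {"start_index": start_index, "end_index": end_index, **fields}
        self.pending.append(label)
        return label

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # A real labeler would send the labels to the events service here.
            self.uploaded = sorted(
                self.pending,
                key=lambda lbl: (lbl["start_index"], lbl["end_index"]))
        return False  # never swallow exceptions


with SketchLabeler("sentences") as labeler:
    labeler(0, 25, sentence_type="STANDARD")
    sentence = labeler(26, 34)
    sentence["sentence_type"] = "FRAGMENT"  # edits made before exit are kept
```

Because nothing is finalized until the block exits, fields set on a label after it is created (as with `sentence` here) are still part of what gets stored.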
- add_labels(label_index_name: str, labels: Sequence[L], *, distinct: bool | None = None, label_adapter: ProtoLabelAdapter | None = None)
Skips using a labeler and adds the sequence of labels as a new label index.
- Parameters:
label_index_name – The name of the label index.
labels – The labels to add.
distinct – Whether the index is distinct or non-distinct.
label_adapter – A label adapter to use.
- Returns:
The new label index created from the labels.
Labels
- class mtap.types.Label
An abstract base class for a label of attributes on text.
- abstract property start_index: int
The index of the first character of the text covered by this label.
- abstract property end_index: int
The index after the last character of the text covered by this label.
- property location: Location
A tuple of (start_index, end_index) used to perform sorting and comparison first based on start_index, then based on end_index.
- property text
The slice of document text covered by this label. Will retrieve from events server if it is not cached locally.
- class mtap.Location(start_index: float, end_index: float)
A location in text, a tuple of (start_index, end_index).
Used to perform comparison of labels based on their locations.
- covers(other: Location | Label)
Whether the span of text covered by this label completely overlaps the span of text covered by the other label or location.
- Parameters:
other – A location or label to compare against.
- Returns:
True if other is completely overlapped/covered, False otherwise.
- relative_to(location: Location | Label | float) -> Location
Takes this location, which is relative to the same origin as location, and creates a copy that is relative to location.
- Parameters:
location – A location to relativize this location to.
- Returns:
A copy with updated indices.
Examples
>>> sentence = Location(10, 20)
>>> token = Location(10, 15)
>>> token.relative_to(sentence)
Location(start_index=0, end_index=5)
- offset_by(location: Location | Label | float) -> Location
Creates a location by offsetting this location by an integer or the start_index of a location / label. De-relativizes this location.
- Parameters:
location – A location to offset this location by.
- Returns:
A copy with updated indices.
Examples
>>> sentence = Location(10, 20)
>>> token_in_sentence = Location(0, 5)
>>> token_in_sentence.offset_by(sentence)
Location(start_index=10, end_index=15)
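The two operations undo each other, which a few lines of plain Python can make concrete. The sketch below does not use MTAP: `Loc` is a hypothetical stand-in for a (start_index, end_index) tuple, and it only handles the case where the argument is another location.

```python
from typing import NamedTuple


class Loc(NamedTuple):
    """Hypothetical stand-in for a (start_index, end_index) location."""
    start_index: float
    end_index: float

    def relative_to(self, other: "Loc") -> "Loc":
        # Shift so that other's start becomes the new origin.
        return Loc(self.start_index - other.start_index,
                   self.end_index - other.start_index)

    def offset_by(self, other: "Loc") -> "Loc":
        # Inverse shift: move back to the origin that other is measured from.
        return Loc(self.start_index + other.start_index,
                   self.end_index + other.start_index)


sentence = Loc(10, 20)
token = Loc(10, 15)
local = token.relative_to(sentence)   # token position within the sentence
restored = local.offset_by(sentence)  # back to document coordinates

# Being tuples, locations compare by start_index first, then end_index.
ordered = sorted([Loc(5, 10), Loc(0, 10), Loc(0, 5)])
```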
- mtap.label(start_index: int, end_index: int, *, document: Document | None = None, **kwargs) -> GenericLabel
An alias for GenericLabel.
- Parameters:
start_index – The index of the first character in text to be included in the label.
end_index – The index after the last character in text to be included in the label.
document – The parent document of the label. This will be automatically set if the label is created via labeler.
**kwargs – Arbitrary, any other fields that should be added to the label, values must be json-serializable.
- class mtap.GenericLabel(start_index: int, end_index: int, *, identifier: int | None = None, document: Document | None = None, label_index_name: str | None = None, fields: dict | None = None, reference_field_ids: dict | None = None, **kwargs)
Bases: Label
Default implementation of the Label class which uses a dictionary to store attributes.
Will be suitable for the majority of use cases for labels.
- Parameters:
start_index – The index of the first character in text to be included in the label.
end_index – The index after the last character in text to be included in the label.
document – The parent document of the label. This will be automatically set if the label is created via labeler.
**kwargs – Arbitrary, any other fields that should be added to the label, values must be json-serializable.
Examples
>>> pos_tag = pos_tag_labeler(0, 5)
>>> pos_tag.tag = 'NNS'
>>> pos_tag.tag
'NNS'
>>> pos_tag2 = pos_tag_labeler(6, 10, tag='VB')
>>> pos_tag2.tag
'VB'
Label Indices
Label indices are normally retrieved via the labels property, but they can be created independently of documents as well.
- mtap.label_index(labels: List[L], distinct: bool = False, adapter: ProtoLabelAdapter | None = None) -> LabelIndex[L]
Creates a label index from labels.
- Parameters:
labels – Zero or more labels to create a label index from.
distinct – Whether the label index is distinct or not.
adapter – The label adapter for these labels.
- Returns:
The newly created label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index
label_index([GenericLabel(0, 5, x=1), GenericLabel(0, 10, x=2), GenericLabel(5, 10, x=3), GenericLabel(5, 15, x=5), GenericLabel(7, 10, x=4), GenericLabel(10, 15, x=6)], distinct=False)
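The reordering visible in that output (x=5 moves ahead of x=4) follows from sorting by start index and then end index. A minimal sketch with plain tuples, assuming each label is reduced to a hypothetical (start_index, end_index, x) triple:

```python
# Same labels as the example above, in the order they were passed in.
labels = [(0, 5, 1), (0, 10, 2), (5, 10, 3), (7, 10, 4), (5, 15, 5), (10, 15, 6)]

# Sort by start index, breaking ties with the end index.
ordered = sorted(labels, key=lambda lbl: (lbl[0], lbl[1]))

# The x values in index order show which inputs changed position.
xs = [x for _, _, x in ordered]
```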
- class mtap.types.LabelIndex
An immutable Sequence of labels ordered by their location in text. By default, sorts by ascending start_index and then by ascending end_index.
- abstract property distinct: bool
Whether this label index is distinct, i.e. all the labels in it are non-overlapping.
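To see why non-overlapping labels allow faster lookups, here is a sketch in plain Python that checks distinctness and then finds the label containing a position with a single binary search. The function names are hypothetical, labels are reduced to (start, end) tuples, and treating labels that merely touch (one ends where the next starts) as non-overlapping is an assumption based on end_index being exclusive.

```python
from bisect import bisect_right


def is_distinct(spans):
    """True if no two (start, end) spans overlap."""
    ordered = sorted(spans)
    return all(prev_end <= start
               for (_, prev_end), (start, _) in zip(ordered, ordered[1:]))


def find_at(ordered_spans, position):
    """Return the span containing position in a sorted, distinct list."""
    starts = [start for start, _ in ordered_spans]
    i = bisect_right(starts, position) - 1  # last span starting at or before
    if i >= 0 and position < ordered_spans[i][1]:
        return ordered_spans[i]
    return None


sentences = [(0, 25), (26, 34)]
hit = find_at(sentences, 30)
miss = find_at(sentences, 25)  # falls in the gap between the two sentences
```

With overlapping labels, a position can fall inside several labels at once, so a single probe like this is not enough.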
- abstract filter(fn: Callable[[Label], bool]) -> LabelIndex[L]
Filters the label index according to a filter function.
This function is less efficient for filtering based on indices than inside(), covering(), etc., which use a binary search method on the sorted index.
- Parameters:
fn – A filter function that returns True if the label should be included and False if it should not.
- Returns:
A view of this label index.
- abstract at(location: Location | Label) -> LabelIndex[L]
- abstract at(start: float, end: float) -> LabelIndex[L]
Returns the labels at the specified location in text.
- Parameters:
location – A label or location.
start – The inclusive start index.
end – The inclusive end index of the location in text.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 10, x=1),
...                      label(0, 10, x=2),
...                      label(6, 20, x=3)])
>>> index.at(0, 10)
label_index([GenericLabel(0, 10, x=1), GenericLabel(0, 10, x=2)], distinct=False)
- abstract covering(location: Location | Label) -> LabelIndex[L]
- abstract covering(start: float, end: float) -> LabelIndex[L]
A label index containing all labels that cover / contain the specified location in text.
- Parameters:
start – The inclusive start of the location.
end – The inclusive end of the location.
location – A label or location.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.covering(5, 10)
label_index([GenericLabel(0, 10, x=2), GenericLabel(5, 10, x=3), GenericLabel(5, 15, x=5)], distinct=False)
- abstract inside(location: Location | Label) -> LabelIndex[L]
- abstract inside(start: float, end: float) -> LabelIndex[L]
A label index containing all labels that are inside the specified location in text.
- Parameters:
location – A label or location.
start – The inclusive start of the location.
end – The inclusive end of the location.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.inside(5, 10)
label_index([GenericLabel(5, 10, x=3), GenericLabel(7, 10, x=4)], distinct=False)
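covering() and inside() are mirror images of each other, which is easiest to see as two predicates over plain (start, end) tuples. This is a sketch of the semantics only: the function names are hypothetical, and the linear scans here stand in for the binary search that the library is described as using.

```python
# The six labels from the examples, already in index order.
LABELS = [(0, 5), (0, 10), (5, 10), (5, 15), (7, 10), (10, 15)]


def covering(labels, start, end):
    """Labels that contain the whole of [start, end]."""
    return [(s, e) for s, e in labels if s <= start and e >= end]


def inside(labels, start, end):
    """Labels that lie entirely within [start, end]."""
    return [(s, e) for s, e in labels if s >= start and e <= end]


covers_5_10 = covering(LABELS, 5, 10)
inside_5_10 = inside(LABELS, 5, 10)
```

A label equal to the query span, such as (5, 10) here, satisfies both predicates.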
- abstract beginning_inside(location: Location | Label) -> LabelIndex[L]
- abstract beginning_inside(start: float, end: float) -> LabelIndex[L]
A label index containing all labels whose begin index is inside the specified location in text.
- Parameters:
location – A label or location.
start – The inclusive start of the location.
end – The inclusive end of the location.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.beginning_inside(6, 11)
label_index([GenericLabel(7, 10, x=4), GenericLabel(10, 15, x=6)], distinct=False)
- abstract overlapping(location: Location | Label) -> LabelIndex[L]
- abstract overlapping(start: float, end: float) -> LabelIndex[L]
Returns all labels that overlap the specified location in text.
- Parameters:
location – A label or location.
start – The inclusive start of the location.
end – The inclusive end of the location.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.overlapping(6, 10)
label_index([GenericLabel(0, 10, x=2), GenericLabel(5, 10, x=3), GenericLabel(5, 15, x=5), GenericLabel(7, 10, x=4)], distinct=False)
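The overlap test can be written as a single comparison on each side. The sketch below uses plain (start, end) tuples and a hypothetical function name; it reproduces the result shown above, including the exclusion of labels that only touch the query span at a boundary.

```python
# The six labels from the example, already in index order.
LABELS = [(0, 5), (0, 10), (5, 10), (5, 15), (7, 10), (10, 15)]


def overlapping(labels, start, end):
    """Labels sharing at least one character position with [start, end)."""
    return [(s, e) for s, e in labels if s < end and e > start]


result = overlapping(LABELS, 6, 10)
# (0, 5) ends before 6 and (10, 15) starts at 10, so neither is included.
```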
- before(x: Location | Label | float) -> LabelIndex[L]
A label index containing all labels that are before a label’s location in text or an index in text.
- Parameters:
x – A label or location whose start_index will be used, or a float index in text.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.before(6)
label_index([GenericLabel(0, 5, x=1)], distinct=False)
- after(x: Location | Label | float) -> LabelIndex[L]
A label index containing all labels that are after a label’s location in text or an index in text.
- Parameters:
x – A label or location whose end_index will be used, or a float index in text.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.after(6)
label_index([GenericLabel(7, 10, x=4), GenericLabel(10, 15, x=6)], distinct=False)
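before() and after() split the index around a position: one keeps labels that have ended by it, the other keeps labels that have not yet started. A sketch with plain (start, end) tuples and hypothetical function names follows. Whether a label ending or starting exactly at the position is included is an assumption, since the examples do not exercise that boundary.

```python
# The six labels from the examples, already in index order.
LABELS = [(0, 5), (0, 10), (5, 10), (5, 15), (7, 10), (10, 15)]


def before(labels, x):
    """Labels that end at or before position x (boundary is an assumption)."""
    return [(s, e) for s, e in labels if e <= x]


def after(labels, x):
    """Labels that start at or after position x (boundary is an assumption)."""
    return [(s, e) for s, e in labels if s >= x]


before_6 = before(LABELS, 6)
after_6 = after(LABELS, 6)
# Labels spanning position 6, such as (0, 10), appear in neither result.
```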
- abstract ascending() -> LabelIndex[L]
This label index sorted according to ascending start and end index.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index == index.ascending()
True
- abstract descending() -> LabelIndex[L]
This label index sorted according to descending start index and ascending end index.
- Returns:
A view of this label index.
Examples
>>> from mtap import label_index, label
>>> index = label_index([label(0, 5, x=1),
...                      label(0, 10, x=2),
...                      label(5, 10, x=3),
...                      label(7, 10, x=4),
...                      label(5, 15, x=5),
...                      label(10, 15, x=6)])
>>> index.descending()
label_index([GenericLabel(10, 15, x=6), GenericLabel(7, 10, x=4), GenericLabel(5, 15, x=5), GenericLabel(5, 10, x=3), GenericLabel(0, 10, x=2), GenericLabel(0, 5, x=1)], distinct=False)
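The ordering printed in that example is the exact reverse of the ascending order, so among labels sharing a start index the one with the larger end index comes first. A sketch with plain (start, end) tuples, which illustrates the printed output rather than how the library computes it:

```python
# The six labels in ascending index order.
ascending = [(0, 5), (0, 10), (5, 10), (5, 15), (7, 10), (10, 15)]

# Reversing the ascending order reproduces the printed descending order.
descending = list(reversed(ascending))

# The same order expressed as a sort key: both indices descending.
by_key = sorted(ascending, key=lambda span: (-span[0], -span[1]))
```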
Configuration
- class mtap.Config(*args)
The MTAP configuration dictionary.
By default, configuration is loaded from one of a number of locations, in the following priority:
1. A file at the path of the '--config' parameter passed into main methods.
2. A file at the path of the 'MTAP_CONFIG' environment variable.
3. $PWD/mtapConfig.yml
4. $HOME/.mtap/mtapConfig.yml
5. /etc/mtap/mtapConfig.yml
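The lookup order can be sketched as a first-match search over candidate paths. This is an illustration under stated assumptions, not MTAP's loader: the function name `resolve_config_path` and its parameters are hypothetical, and falling through to the next candidate when a file is missing is assumed rather than documented.

```python
import os


def resolve_config_path(cli_config=None, environ=None, exists=os.path.exists):
    """Return the first existing candidate path, or None (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    pwd = environ.get("PWD", os.getcwd())
    home = environ.get("HOME", os.path.expanduser("~"))
    candidates = [
        cli_config,                     # value of the --config option
        environ.get("MTAP_CONFIG"),     # environment variable
        f"{pwd}/mtapConfig.yml",
        f"{home}/.mtap/mtapConfig.yml",
        "/etc/mtap/mtapConfig.yml",
    ]
    for path in candidates:
        if path and exists(path):
            return path
    return None


# Fake environment and file system so the sketch runs anywhere.
fake_env = {"PWD": "/work", "HOME": "/home/u", "MTAP_CONFIG": "/opt/env.yml"}
on_disk = {"/opt/env.yml", "/home/u/.mtap/mtapConfig.yml"}
chosen = resolve_config_path(environ=fake_env, exists=on_disk.__contains__)
```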
MTAP components use a global shared configuration object. After entering the context of a config object using "with", all of the MTAP functions called on that thread will make use of that config object.
Examples
>>> with mtap.Config() as config:
...     config['key'] = 'value'
...     # other MTAP methods in this
...     # block will use the updated config object.