mtap.processing

Processor Abstract Classes

class mtap.processing.Processor[source]

Mixin used by all processor abstract base classes that provides the ability to update serving status and use timers.

update_serving_status(status: str)[source]

Updates the serving status of the processor for health checking.

Parameters:

status (str) – One of “SERVING”, “NOT_SERVING”, “UNKNOWN”.

static started_stopwatch(key: str) Stopwatch[source]

An object that can be used to time aspects of processing. The stopwatch will be started at creation.

Parameters:

key – The key to store the time under.

Returns:

An object that is used to do the timing.

Examples

>>> # In a process method
>>> with self.started_stopwatch('key'):
...     # do work
...     ...
static unstarted_stopwatch(key: str) Stopwatch[source]

An object that can be used to time aspects of processing. The stopwatch will be stopped at creation.

Parameters:

key – The key to store the time under.

Returns:

An object that is used to do the timing.

Examples

>>> # In a process method
>>> with self.unstarted_stopwatch('key') as stopwatch:
...     for _ in range(10):
...         # work you don't want timed
...         ...
...         stopwatch.start()
...         # work you do want timed
...         ...
...         stopwatch.stop()
class mtap.EventProcessor[source]

Bases: Processor

Abstract base class for an event processor.

Examples

>>> class ExampleProcessor(EventProcessor):
...     def process(self, event, params):
...          # do work on the event
...          ...
property custom_label_adapters: Mapping[str, ProtoLabelAdapter]

Optional property used to provide non-standard proto label adapters for specific index names. The default implementation returns an empty dictionary.

Returns:

A mapping from strings to label adapters.

abstract process(event: Event, params: Dict[str, Any]) Dict[str, Any] | None[source]

Performs processing on an event, implemented by the subclass.

Parameters:
  • event – The event object to be processed.

  • params – Processing parameters. A dictionary of strings mapped to json-serializable values.

Returns:

An arbitrary dictionary of strings mapped to json-serializable values which will be returned to the caller, even remotely.

close()[source]

Can be overridden for cleaning up anything that needs to be cleaned up. Will be called by the framework after it’s done with the processor.
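The process/close contract described above can be sketched without MTAP installed. This is a minimal illustration, not the library's implementation: SketchEventProcessor and WordCounter are hypothetical stand-ins, and a plain dict is assumed in place of a real Event object.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class SketchEventProcessor(ABC):
    """Hypothetical stand-in that mirrors the documented process/close contract."""

    @abstractmethod
    def process(self, event: Any, params: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        ...

    def close(self) -> None:
        # Default cleanup does nothing; subclasses may override.
        pass


class WordCounter(SketchEventProcessor):
    def __init__(self):
        self.closed = False

    def process(self, event, params):
        # Assumption: `event` is a plain dict standing in for an Event object.
        text = event["text"]
        if params.get("lowercase", False):
            text = text.lower()
        # Only JSON-serializable values go into the result dictionary.
        return {"word_count": len(text.split())}

    def close(self):
        # Release anything acquired in __init__ here.
        self.closed = True


processor = WordCounter()
try:
    result = processor.process({"text": "One two three"}, {"lowercase": True})
    # A result that survives a JSON round trip can be returned to a remote caller.
    round_tripped = json.loads(json.dumps(result))
finally:
    processor.close()
```

In a real deployment the framework, not user code, is expected to call close() once it is done with the processor; the try/finally here only imitates that lifecycle.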

class mtap.DocumentProcessor[source]

Bases: EventProcessor

Abstract base class for a document processor.

Examples

>>> class ExampleProcessor(mtap.DocumentProcessor):
...     def process_document(self, document, params):
...         # do processing on document
...         ...

>>> class ExampleProcessor(mtap.DocumentProcessor):
...     def process_document(self, document, params):
...         with self.started_stopwatch('key'):
...             # use stopwatch on something
...             ...
abstract process_document(document: Document, params: Dict[str, Any]) Dict[str, Any] | None[source]

Performs processing of a document on an event, implemented by the subclass.

Parameters:
  • document – The document object to be processed.

  • params – Processing parameters. A dictionary of strings mapped to json-serializable values.

Returns:

An arbitrary dictionary of strings mapped to json-serializable values that will be returned to the caller of the processor.

close()[source]

Can be overridden for cleaning up anything that needs to be cleaned up. Will be called by the framework after it’s done with the processor.

Processor Utilities

class mtap.processing.Stopwatch(key: str | None = None, context: Optional = None)[source]

A class for timing runtime of components and returning the total runtime with the processor’s results.

Although it can be instantiated and used outside a processing context, the normal usage is to create one with the Processor.started_stopwatch() or Processor.unstarted_stopwatch() methods.

duration

The amount of time elapsed for this timer.

Type:

datetime.timedelta

Examples

>>> # in an EventProcessor or DocumentProcessor process method call
>>> with self.started_stopwatch('key'):
...     timed_routine()

>>> # in an EventProcessor or DocumentProcessor process method call
>>> with self.unstarted_stopwatch('key') as stopwatch:
...     for _ in range(10):
...         # work you don't want timed
...         ...
...         stopwatch.start()
...         # work you want timed
...         ...
...         stopwatch.stop()
start()[source]

Starts the timer.

stop()[source]

Stops / pauses the timer.
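To make the start/stop semantics concrete, here is a minimal stand-in written with the standard library only. It assumes that stop() pauses the timer and that duration accumulates across start/stop pairs; SketchStopwatch is a hypothetical class for illustration, not MTAP's Stopwatch, and it does not report times back to a processing context.

```python
import time
from datetime import timedelta


class SketchStopwatch:
    """Stand-in for illustration only; not MTAP's Stopwatch implementation."""

    def __init__(self, started=False):
        # Total elapsed time across all start/stop pairs.
        self.duration = timedelta()
        self._started_at = None
        if started:
            self.start()

    def start(self):
        # Starting an already-running stopwatch is a no-op.
        if self._started_at is None:
            self._started_at = time.perf_counter()

    def stop(self):
        # Add the time since the last start() to the running total.
        if self._started_at is not None:
            elapsed = time.perf_counter() - self._started_at
            self.duration += timedelta(seconds=elapsed)
            self._started_at = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with-block always stops the timer.
        self.stop()
        return False


with SketchStopwatch() as stopwatch:
    for _ in range(3):
        time.sleep(0.01)   # work that is not timed
        stopwatch.start()
        time.sleep(0.01)   # work that is timed
        stopwatch.stop()

total_seconds = stopwatch.duration.total_seconds()
```

Only the intervals between start() and stop() contribute to duration, which is why the unstarted variant suits loops that mix timed and untimed work.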

Processor Description Decorators

Descriptors for processor functionality.

mtap.descriptors.processor(name: str, human_name: str | None = None, description: str | None = None, parameters: List[ParameterDescriptor] | None = None, inputs: List[LabelIndexDescriptor] | None = None, outputs: List[LabelIndexDescriptor] | None = None, additional_data: Dict[str, Any] | None = None) None

Decorator which attaches a service name and metadata to a processor, which can then be used for runtime reflection of how the processor works.

Returns:

A decorator to be applied to subclasses of EventProcessor or DocumentProcessor. This decorator attaches the metadata so it can be reflected at runtime.

Examples

>>> from mtap.descriptors import processor
>>> from mtap.processing import EventProcessor
>>> @processor('example-text-converter')
... class TextConverter(EventProcessor):
...     ...

or

>>> from mtap.descriptors import processor
>>> from mtap.processing import DocumentProcessor
>>> @processor('example-sentence-detector')
... class SentenceDetector(DocumentProcessor):
...     ...

From our own example processor:

>>> from mtap.descriptors import label_property, labels, parameter, processor
>>> from mtap.processing import DocumentProcessor
>>> @processor('mtap-example-processor-python',
...            human_name="Python Example Processor",
...            description="counts the number of times the letters a "
...                        "and b occur in a document",
...            parameters=[
...                parameter(
...                    'do_work',
...                    required=True,
...                    data_type='bool',
...                    description="Whether the processor should do "
...                                "anything."
...                )
...            ],
...            outputs=[
...                labels('mtap.examples.letter_counts',
...                       properties=[label_property('letter',
...                                                  data_type='str'),
...                                   label_property('count',
...                                                  data_type='int')])
...            ])
... class ExampleProcessor(DocumentProcessor):
...     ...
mtap.descriptors.parameter(name: str, description: str | None = None, data_type: str | None = None, required: bool = False) None

Alias for ParameterDescriptor.

mtap.descriptors.labels(name: str, reference: str | None = None, name_from_parameter: str | None = None, optional: bool = False, description: str | None = None, properties: List[LabelPropertyDescriptor] | None = None) None

Alias for LabelIndexDescriptor.

mtap.descriptors.label_property(name: str, description: str | None = None, data_type: str | None = None, nullable: bool = False) None

Alias for LabelPropertyDescriptor.

class mtap.descriptors.ProcessorDescriptor(name: str, human_name: str | None = None, description: str | None = None, parameters: List[ParameterDescriptor] | None = None, inputs: List[LabelIndexDescriptor] | None = None, outputs: List[LabelIndexDescriptor] | None = None, additional_data: Dict[str, Any] | None = None)[source]

Decorator which attaches a service name and metadata to a processor, which can then be used for runtime reflection of how the processor works.

Returns:

A decorator to be applied to subclasses of EventProcessor or DocumentProcessor. This decorator attaches the metadata so it can be reflected at runtime.

Examples

>>> from mtap.descriptors import processor
>>> from mtap.processing import EventProcessor
>>> @processor('example-text-converter')
... class TextConverter(EventProcessor):
...     ...

or

>>> from mtap.descriptors import processor
>>> from mtap.processing import DocumentProcessor
>>> @processor('example-sentence-detector')
... class SentenceDetector(DocumentProcessor):
...     ...

From our own example processor:

>>> from mtap.descriptors import label_property, labels, parameter, processor
>>> from mtap.processing import DocumentProcessor
>>> @processor('mtap-example-processor-python',
...            human_name="Python Example Processor",
...            description="counts the number of times the letters a "
...                        "and b occur in a document",
...            parameters=[
...                parameter(
...                    'do_work',
...                    required=True,
...                    data_type='bool',
...                    description="Whether the processor should do "
...                                "anything."
...                )
...            ],
...            outputs=[
...                labels('mtap.examples.letter_counts',
...                       properties=[label_property('letter',
...                                                  data_type='str'),
...                                   label_property('count',
...                                                  data_type='int')])
...            ])
... class ExampleProcessor(DocumentProcessor):
...     ...
name: str

Identifying service name both for launching via command line and for service registration.

Should be a mix of alphanumeric characters and dashes so that it plays nice with the DNS name requirements of service discovery tools like Consul.
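A name can be checked against this guidance before registering a processor. The helper below is a hypothetical sketch, not part of MTAP: it assumes the stricter DNS label rules (letters, digits, and dashes only, no leading or trailing dash, at most 63 characters), which may be tighter than what MTAP itself enforces.

```python
import re

# One DNS label: starts and ends with an alphanumeric character,
# dashes allowed in between, 1 to 63 characters in total.
_DNS_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def looks_like_dns_safe_name(name: str) -> bool:
    """Return True if `name` would be a valid single DNS label."""
    return _DNS_LABEL.fullmatch(name) is not None


checks = {
    name: looks_like_dns_safe_name(name)
    for name in (
        "mtap-example-processor-python",  # alphanumerics and dashes
        "example_processor",              # underscore is not allowed
        "-leading-dash",                  # must not start with a dash
    )
}
```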

human_name: str | None = None

An optional human name for the processor.

description: str | None = None

A short description of the processor and what it does.

parameters: List[ParameterDescriptor] | None = None

The processor’s parameters.

inputs: List[LabelIndexDescriptor] | None = None

String identifiers for the label output from a previously-run processor that this processor requires as an input.

Takes the format "[processor-name]/[output]". Examples would be "tagger/pos_tags" or "sentence-detector/sentences".

outputs: List[LabelIndexDescriptor] | None = None

The label indices this processor outputs.

additional_data: Dict[str, Any] | None = None

Any other data that should be added to the processor’s metadata. It should be serializable to YAML and JSON.
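One way to catch unserializable metadata early is to try encoding it before attaching it. The helper below is a hypothetical sketch using only the standard library; it checks JSON only and assumes that data which passes is also acceptable as YAML, which is not verified here.

```python
import json
from typing import Any, Dict


def is_json_serializable(data: Dict[str, Any]) -> bool:
    """Return True if `data` can be encoded by the standard json module."""
    try:
        json.dumps(data)
    except (TypeError, ValueError):
        # TypeError: unsupported value type; ValueError: e.g. circular reference.
        return False
    return True


good = is_json_serializable({"version": "1.0", "tags": ["example", "demo"]})
bad = is_json_serializable({"tags": {"a", "b"}})  # sets are not JSON-serializable
```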

class mtap.descriptors.ParameterDescriptor(name: str, description: str | None = None, data_type: str | None = None, required: bool = False)[source]

A description of one of the processor’s parameters.

name: str

The parameter name / key.

description: str | None = None

A short description of the parameter and what it does.

data_type: str | None = None

The data type of the parameter: str, float, or bool, or a List[T] or Mapping[T1, T2] of those types.

required: bool = False

Whether the processor parameter is required.

class mtap.descriptors.LabelIndexDescriptor(name: str, reference: str | None = None, name_from_parameter: str | None = None, optional: bool = False, description: str | None = None, properties: List[LabelPropertyDescriptor] | None = None)[source]

A description for a label type.

name: str

The label index name.

reference: str | None = None

If this is an output of another processor, that processor’s name followed by a slash and the default output name of the index go here. Example: “sentence-detector/sentences”.
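The reference format can be split into its two parts with a small helper. This is a hypothetical sketch for illustration; IndexReference and parse_reference are not MTAP names, and the choice to split on the first slash is an assumption.

```python
from typing import NamedTuple


class IndexReference(NamedTuple):
    processor_name: str
    output: str


def parse_reference(reference: str) -> IndexReference:
    """Split a "[processor-name]/[output]" string into its two parts."""
    # Assumption: the first slash separates the processor name from the output.
    processor_name, separator, output = reference.partition("/")
    if not separator or not processor_name or not output:
        raise ValueError(
            f"expected '[processor-name]/[output]', got {reference!r}"
        )
    return IndexReference(processor_name, output)


sentences = parse_reference("sentence-detector/sentences")
```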

name_from_parameter: str | None = None

If the label index gets its name from a parameter of the processor, specify that name here.

optional: bool = False

Whether this label index is an optional input or output.

description: str | None = None

A short description of the label index.

properties: List[LabelPropertyDescriptor] | None = None

The properties of the labels in the label index.

class mtap.descriptors.LabelPropertyDescriptor(name: str, description: str | None = None, data_type: str | None = None, nullable: bool = False)[source]

Creates a description for a property on a label.

name: str

The property’s name.

description: str | None = None

A short description of the property.

data_type: str | None = None

The data type of the property. Options are "str", "float", or "bool"; "List[T]" or "Mapping[str, T]" where T is one of those types.

nullable: bool = False

Whether the property can have a valid value of null.

Running Services

mtap.processor_parser() ArgumentParser[source]

An ArgumentParser that can be used to parse the settings for run_processor().

Returns:

A parser containing server settings.

Examples

Using this as a parent parser:

>>> from argparse import ArgumentParser
>>> from mtap import processor_parser, run_processor
>>> parser = ArgumentParser(parents=[processor_parser()])
>>> parser.add_argument('--my-arg-1')
>>> parser.add_argument('--my-arg-2')
>>> args = parser.parse_args()
>>> processor = MyProcessor(args.my_arg_1, args.my_arg_2)
>>> run_processor(processor, options=args)
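The parent-parser pattern can be tried without MTAP installed. In the sketch below, sketch_processor_parser is a hypothetical stand-in for processor_parser(); the -p flag comes from the run_processor example, while its long name --port and its default are assumptions.

```python
from argparse import ArgumentParser


def sketch_processor_parser() -> ArgumentParser:
    """Hypothetical stand-in for processor_parser()."""
    # A parser meant to be used as a parent is created with add_help=False
    # so that the child parser can define -h/--help itself.
    parser = ArgumentParser(add_help=False)
    # Assumption: the server port option; only '-p' appears in the source.
    parser.add_argument('-p', '--port', type=int, default=0)
    return parser


# The child parser inherits the server settings and adds its own options.
parser = ArgumentParser(parents=[sketch_processor_parser()])
parser.add_argument('--my-arg-1')
parser.add_argument('--my-arg-2', default='fallback')

args = parser.parse_args(['-p', '8080', '--my-arg-1', 'value'])
```

The resulting namespace carries both the inherited server settings and the processor-specific options, so the same object can configure the processor and be handed on to the server.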
mtap.run_processor(proc: EventProcessor, *, options: Namespace | None = None, args: Sequence[str] | None = None, mp_context=None)[source]

Runs the processor as a gRPC service, blocking until an interrupt signal is received.

Parameters:
  • proc – The processor to host.

  • mp – If true, will create instances of proc on multiple forked processes to process events. This is useful if the processor is computationally intensive and would run into Python GIL issues on a single process.

  • options – The parsed arguments from the parser returned by processor_parser().

  • args – Arguments to parse server settings from if options was not supplied.

  • mp_context – A multiprocessing context that gets passed to the process pool executor in the case of mp = True.

Examples

Will automatically parse arguments:

>>> run_processor(MyProcessor())

Manual arguments:

>>> run_processor(MyProcessor(), args=['-p', '8080'])