mtap.processing
Processor Abstract Classes
- class mtap.processing.Processor[source]
Mixin used by all processor abstract base classes that provides the ability to update serving status and use timers.
- update_serving_status(status: str)[source]
Updates the serving status of the processor for health checking.
- Parameters:
status (str) – One of “SERVING”, “NOT_SERVING”, “UNKNOWN”.
- static started_stopwatch(key: str) Stopwatch [source]
An object that can be used to time aspects of processing. The stopwatch will be started at creation.
- Parameters:
key – The key to store the time under.
- Returns:
An object that is used to do the timing.
Examples
>>> # In a process method
>>> with self.started_stopwatch('key'):
>>>     # do work
>>>     ...
- static unstarted_stopwatch(key: str) Stopwatch [source]
An object that can be used to time aspects of processing. The stopwatch will not be running at creation; call start() to begin timing.
- Parameters:
key – The key to store the time under.
- Returns:
An object that is used to do the timing.
Examples
>>> # In a process method
>>> with self.unstarted_stopwatch('key') as stopwatch:
>>>     for _ in range(10):
>>>         # work you don't want timed
>>>         ...
>>>         stopwatch.start()
>>>         # work you do want timed
>>>         ...
>>>         stopwatch.stop()
- class mtap.EventProcessor[source]
Bases:
Processor
Abstract base class for an event processor.
Examples
>>> class ExampleProcessor(EventProcessor):
...     def process(self, event, params):
...         # do work on the event
...         ...
- property custom_label_adapters: Mapping[str, ProtoLabelAdapter]
Optional method used to provide non-standard proto label adapters for specific index names. Default implementation returns an empty dictionary.
- Returns:
A mapping from strings to label adapters.
- abstract process(event: Event, params: Dict[str, Any]) Dict[str, Any] | None [source]
Performs processing on an event, implemented by the subclass.
- Parameters:
event – The event object to be processed.
params – Processing parameters. A dictionary of strings mapped to json-serializable values.
- Returns:
An arbitrary dictionary of strings mapped to json-serializable values which will be returned to the caller, even if the caller is remote.
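Because the returned dictionary must hold only json-serializable values, it can help to check a result before returning it. The following is a minimal sketch using only the standard library; `is_json_serializable` is a hypothetical helper and not part of mtap.

```python
import json
from typing import Any, Dict, Optional


def is_json_serializable(result: Optional[Dict[str, Any]]) -> bool:
    """Hypothetical helper: True if result is None or can be encoded as JSON."""
    if result is None:
        return True
    try:
        json.dumps(result)
    except (TypeError, ValueError):
        return False
    return True


good = {"answer": 42, "tags": ["a", "b"]}
bad = {"when": object()}  # arbitrary objects cannot be JSON-encoded
```

A check like this would typically run in tests rather than on every call, since encoding the result twice adds overhead.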
- class mtap.DocumentProcessor[source]
Bases:
EventProcessor
Abstract base class for a document processor.
Examples
>>> class ExampleProcessor(mtap.DocumentProcessor):
...     def process_document(self, document, params):
...         # do processing on document
...         ...
>>> class ExampleProcessor(mtap.DocumentProcessor):
...     def process_document(self, document, params):
...         with self.started_stopwatch('key'):
...             # use stopwatch on something
...             ...
- abstract process_document(document: Document, params: Dict[str, Any]) Dict[str, Any] | None [source]
Performs processing of a document on an event, implemented by the subclass.
- Parameters:
document – The document object to be processed.
params – Processing parameters. A dictionary of strings mapped to json-serializable values.
- Returns:
An arbitrary dictionary of strings mapped to json-serializable values that will be returned to the caller of the processor.
Processor Utilities
- class mtap.processing.Stopwatch(key: str | None = None, context: Optional = None)[source]
A class for timing runtime of components and returning the total runtime with the processor’s results.
Although it can be instantiated and used outside a processing context, the normal usage is to instantiate it using the Processor.started_stopwatch() or Processor.unstarted_stopwatch() methods.
- duration
The amount of time elapsed for this timer.
- Type:
Examples
>>> # in an EventProcessor or DocumentProcessor process method call
>>> with self.started_stopwatch('key'):
>>>     timed_routine()
>>> # in an EventProcessor or DocumentProcessor process method call
>>> with self.unstarted_stopwatch('key') as stopwatch:
>>>     for _ in range(10):
>>>         # work you don't want timed
>>>         ...
>>>         stopwatch.start()
>>>         # work you want timed
>>>         ...
>>>         stopwatch.stop()
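To make the started/unstarted distinction concrete, here is a minimal sketch of a stopwatch that accumulates time only between start() and stop() calls. `SketchStopwatch` is a hypothetical stand-in written for illustration; it is not mtap's Stopwatch, and storing the duration as a `datetime.timedelta` is an assumption of this sketch.

```python
import time
from datetime import timedelta


class SketchStopwatch:
    """Hypothetical stand-in for illustration; not mtap's Stopwatch."""

    def __init__(self, started: bool = False):
        self.duration = timedelta()  # total time accumulated so far
        self._start = None           # perf_counter reading while running
        if started:
            self.start()

    def start(self):
        # Starting an already running stopwatch is a no-op.
        if self._start is None:
            self._start = time.perf_counter()

    def stop(self):
        # Add the elapsed interval to the total and mark as stopped.
        if self._start is not None:
            self.duration += timedelta(seconds=time.perf_counter() - self._start)
            self._start = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()  # make sure a running stopwatch is stopped on exit
        return False


with SketchStopwatch(started=False) as stopwatch:
    untimed = sum(range(1000))  # not timed: the stopwatch has not started
    stopwatch.start()
    timed = sum(range(1000))    # timed
    stopwatch.stop()
```

Accumulating into a running total is what lets the start/stop pair sit inside a loop and still report a single duration at the end.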
Processor Description Decorators
Descriptors for processor functionality.
- mtap.descriptors.processor(name: str, human_name: str | None = None, description: str | None = None, parameters: List[ParameterDescriptor] | None = None, inputs: List[LabelIndexDescriptor] | None = None, outputs: List[LabelIndexDescriptor] | None = None, additional_data: Dict[str, Any] | None = None) None
Decorator that attaches a service name and metadata to a processor, which can then be used for runtime reflection of how the processor works.
- Returns:
A decorator to be applied to subclasses of EventProcessor or DocumentProcessor. This decorator attaches the metadata so that it can be reflected at runtime.
Examples
>>> from mtap.processing import EventProcessor
>>> @processor('example-text-converter')
>>> class TextConverter(EventProcessor):
>>>     ...
or
>>> from mtap.processing import DocumentProcessor
>>> @processor('example-sentence-detector')
>>> class SentenceDetector(DocumentProcessor):
>>>     ...
From our own example processor:
>>> from mtap.processing import DocumentProcessor
>>> @processor('mtap-example-processor-python',
>>>            human_name="Python Example Processor",
>>>            description="counts the number of times the letters a "
>>>                        "and b occur in a document",
>>>            parameters=[
>>>                parameter(
>>>                    'do_work',
>>>                    required=True,
>>>                    data_type='bool',
>>>                    description="Whether the processor should do "
>>>                                "anything."
>>>                )
>>>            ],
>>>            outputs=[
>>>                labels('mtap.examples.letter_counts',
>>>                       properties=[label_property('letter',
>>>                                                  data_type='str'),
>>>                                   label_property('count',
>>>                                                  data_type='int')])
>>>            ])
>>> class ExampleProcessor(DocumentProcessor):
>>>     ...
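The general pattern of a decorator that attaches metadata to a class for later reflection can be sketched in plain Python. The names `describe`, `SketchDescriptor`, and `_descriptor` below are hypothetical and chosen for illustration; this is not necessarily how mtap stores its metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchDescriptor:
    """Hypothetical metadata record; not mtap's ProcessorDescriptor."""
    name: str
    human_name: Optional[str] = None
    description: Optional[str] = None
    additional_data: Dict[str, Any] = field(default_factory=dict)


def describe(name: str, **metadata):
    """Return a class decorator that attaches a SketchDescriptor to the class."""
    def decorator(cls):
        cls._descriptor = SketchDescriptor(name, **metadata)
        return cls  # the class is otherwise returned unchanged
    return decorator


@describe("example-sentence-detector", human_name="Sentence Detector")
class SentenceDetector:
    pass


# Runtime reflection: the metadata is reachable from the class or an instance.
meta = SentenceDetector._descriptor
```

Because the decorator returns the same class object, the decorated class behaves exactly as before apart from the extra attribute.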
- mtap.descriptors.parameter(name: str, description: str | None = None, data_type: str | None = None, required: bool = False) None
Alias for ParameterDescriptor.
- mtap.descriptors.labels(name: str, reference: str | None = None, name_from_parameter: str | None = None, optional: bool = False, description: str | None = None, properties: List[LabelPropertyDescriptor] | None = None) None
Alias for LabelIndexDescriptor.
- mtap.descriptors.label_property(name: str, description: str | None = None, data_type: str | None = None, nullable: bool = False) None
Alias for LabelPropertyDescriptor.
- class mtap.descriptors.ProcessorDescriptor(name: str, human_name: str | None = None, description: str | None = None, parameters: List[ParameterDescriptor] | None = None, inputs: List[LabelIndexDescriptor] | None = None, outputs: List[LabelIndexDescriptor] | None = None, additional_data: Dict[str, Any] | None = None)[source]
Decorator that attaches a service name and metadata to a processor, which can then be used for runtime reflection of how the processor works.
- Returns:
A decorator to be applied to subclasses of EventProcessor or DocumentProcessor. This decorator attaches the metadata so that it can be reflected at runtime.
- name: str
Identifying service name both for launching via command line and for service registration.
Should be a mix of alphanumeric characters and dashes so that it plays nice with the DNS name requirements of service discovery tools like Consul.
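A name can be checked against this convention before it is used. The sketch below encodes one assumed reading of "a mix of alphanumeric characters and dashes" (ASCII letters and digits in groups separated by single dashes, with no leading or trailing dash); mtap may enforce something different or nothing at all, and `is_valid_service_name` is a hypothetical helper.

```python
import re

# Assumed pattern: alphanumeric groups separated by single dashes.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def is_valid_service_name(name: str) -> bool:
    """Hypothetical helper: True if name uses only alphanumerics and dashes."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

Under this pattern a name such as 'mtap-example-processor-python' passes, while names containing underscores or dots do not.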
- parameters: List[ParameterDescriptor] | None = None
The processor’s parameters.
- inputs: List[LabelIndexDescriptor] | None = None
String identifiers for the label output from a previously-run processor that this processor requires as an input.
Takes the format "[processor-name]/[output]". Examples would be "tagger/pos_tags" or "sentence-detector/sentences".
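An identifier in the "[processor-name]/[output]" format can be split into its two parts with a few lines of Python. `split_reference` below is a hypothetical helper shown as a sketch; it assumes exactly one slash separates the processor name from the output name.

```python
from typing import Tuple


def split_reference(reference: str) -> Tuple[str, str]:
    """Hypothetical helper: split "processor-name/output" into its two parts."""
    processor_name, sep, output = reference.partition("/")
    # Reject a missing slash, an empty part, or more than one slash.
    if not sep or not processor_name or not output or "/" in output:
        raise ValueError(
            f"expected '[processor-name]/[output]', got {reference!r}"
        )
    return processor_name, output
```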
- outputs: List[LabelIndexDescriptor] | None = None
The label indices this processor outputs.
- class mtap.descriptors.ParameterDescriptor(name: str, description: str | None = None, data_type: str | None = None, required: bool = False)[source]
A description of one of the processor’s parameters.
- class mtap.descriptors.LabelIndexDescriptor(name: str, reference: str | None = None, name_from_parameter: str | None = None, optional: bool = False, description: str | None = None, properties: List[LabelPropertyDescriptor] | None = None)[source]
A description for a label type.
- reference: str | None = None
If this is an output of another processor, that processor’s name followed by a slash and the default output name of the index go here. Example: “sentence-detector/sentences”.
- name_from_parameter: str | None = None
If the label index gets its name from a parameter of the processor, specify that name here.
- properties: List[LabelPropertyDescriptor] | None = None
The properties of the labels in the label index.
- class mtap.descriptors.LabelPropertyDescriptor(name: str, description: str | None = None, data_type: str | None = None, nullable: bool = False)[source]
Creates a description for a property on a label.
Running Services
- mtap.processor_parser() ArgumentParser [source]
An ArgumentParser that can be used to parse the settings for run_processor().
- Returns:
A parser containing server settings.
Examples
Using this as a parent parser:
>>> from argparse import ArgumentParser
>>> parser = ArgumentParser(parents=[processor_parser()])
>>> parser.add_argument('--my-arg-1')
>>> parser.add_argument('--my-arg-2')
>>> args = parser.parse_args()
>>> processor = MyProcessor(args.my_arg_1, args.my_arg_2)
>>> run_processor(processor, options=args)
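The parent-parser pattern itself is standard argparse. The sketch below substitutes a hypothetical `fake_processor_parser()` for `processor_parser()` so that it runs without mtap; the `--port` long option and its default are assumptions made for illustration. A parser intended for use as a parent is normally created with `add_help=False` so that it does not conflict with the child's `-h` option.

```python
from argparse import ArgumentParser


def fake_processor_parser() -> ArgumentParser:
    """Hypothetical stand-in for processor_parser(); options are assumed."""
    # add_help=False lets a child parser add its own -h/--help option.
    parser = ArgumentParser(add_help=False)
    parser.add_argument("-p", "--port", type=int, default=0)
    return parser


parser = ArgumentParser(parents=[fake_processor_parser()])
parser.add_argument("--my-arg-1")
parser.add_argument("--my-arg-2")

# Parse an explicit list instead of sys.argv so the sketch is reproducible.
args = parser.parse_args(["-p", "8080", "--my-arg-1", "x"])
```

The resulting namespace carries both the inherited server settings and the custom arguments, which is why a single parsed object can serve both the processor constructor and the server.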
- mtap.run_processor(proc: EventProcessor, *, options: Namespace | None = None, args: Sequence[str] | None = None, mp_context=None)[source]
Runs the processor as a gRPC service, blocking until an interrupt signal is received.
- Parameters:
proc – The processor to host.
mp – If true, will create instances of proc on multiple forked processes to process events. This is useful if the processor is computationally intensive and would run into Python GIL issues on a single process.
options – The parsed arguments from the parser returned by processor_parser().
args – Arguments to parse server settings from if options was not supplied.
mp_context – A multiprocessing context that gets passed to the process pool executor in the case of mp = True.
Examples
Will automatically parse arguments:
>>> run_processor(MyProcessor())
Manual arguments:
>>> run_processor(MyProcessor(), args=['-p', '8080'])