Configuring the PESTO packaging of your algorithm
Once you have initialized a new PESTO project, a number of files have to be edited to describe the algorithm's input, output and requirements.
List of configuration files
| File | Description |
|---|---|
| algorithm/process.py | The Python file containing your algorithm with the Process.process() function |
| algorithm/input_output.py | The Python file containing the input and output dataclasses |
| pesto/api/config.json | A generic configuration file at the disposal of process.py for its own configuration |
| pesto/api/config_schema.json | A JSON schema file that specifies what config.json should look like |
| pesto/api/description.json | Description of your processing algorithm |
| pesto/api/input_schema.json | Specification of the algorithm's input format |
| pesto/api/output_schema.json | Specification of the algorithm's output format |
| pesto/api/version.json | Algorithm version description |
To package your algorithm with PESTO, you'll need to:
- Provide the implementation of your algorithm within the process() function in algorithm/process.py.
- Specify the API of your algorithm, in other words its input and output formats, in pesto/api/input_schema.json and pesto/api/output_schema.json.
- Describe and configure the dependencies in pesto/api/description.json and pesto/build/requirements.json.
Tip
Always start from the pesto-template as it is already a working PESTO project.
One of the main points of attention is to keep the schemas in input_schema.json and output_schema.json aligned with the signature of the process function. The input/output schemas can be defined in two different ways:
- either the process() function takes an Input object and returns an Output object, both specified in input_output.py: in that case, pesto schemagen can generate the schemas for you. This is the easiest and recommended way.
- or the process() function takes a set of parameters of your choice and returns an object: in that case you have to write the contents of input_schema.json and output_schema.json yourself. This is recommended for complex input/output structures that require a JSON schema that cannot be inferred by pesto schemagen.
Python algorithm
Look at algorithm/process.py. This is the module that will be loaded by PESTO inside our server and which will be called during preprocessing.
There is a Process class with on_start() and process() methods.
Process.on_start()
The algorithm/process.py file should contain a Process class with the Process.on_start() method.
The on_start() method will be called on the first processing request. It is used to load resources such as Machine Learning models that are then called in the Process.process() method.
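The load-once pattern described here can be sketched as follows. This is only an illustration: load_model() is a hypothetical stand-in for whatever resource your algorithm needs, the use of plain instance methods is an assumption, and the last lines merely simulate the calls that PESTO would make.

```python
class Process:
    """Sketch of the Process class expected in algorithm/process.py."""

    # Shared resource, filled in by on_start().
    model = None

    def on_start(self) -> None:
        # Load heavy resources once (hypothetical loader defined below).
        Process.model = load_model()

    def process(self, value: float) -> float:
        # Reuse the resource that on_start() loaded.
        return Process.model(value)


def load_model():
    # Hypothetical stand-in for loading a real Machine Learning model.
    return lambda x: 2.0 * x


# Simulated calls: on_start() once, then process() for a request.
service = Process()
service.on_start()
result = service.process(21.0)
```

Keeping the loaded resource on the class means later requests reuse it instead of paying the loading cost again.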
Process.process()
The algorithm/process.py file should contain a Process class with the Process.process() method.
The process() method is called on each call to /api/v1/process, when we actually want to process input data.
Warning
Depending on the signature of the Process.process() function, you may or may not be able to generate the input/output schemas automatically with pesto schemagen.
If you can encapsulate the input parameters in the algorithm.input_output.Input dataclass and the returned objects in the algorithm.input_output.Output dataclass, then you can benefit from the schema generation. Simply edit the algorithm/input_output.py file to specify the input parameters and the output structure. The signature of the algorithm must be
process(input: Input) -> Output
If your algorithm's Process.process() function takes a list of parameters or returns an object that is not an Output, then the signature is not compatible with pesto schemagen: you will have to write the schemas yourself.
process.py: schemagen compatible, or not
- Compatible with schemagen: process.py together with input_output.py
- Not compatible with schemagen: process.py alone
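The two kinds of signature can be contrasted with a short sketch. The field names (text, repeat, result, length) and the class names CompatibleProcess and IncompatibleProcess are hypothetical and chosen only so that both variants fit in one file; in a real project both classes would be called Process, and the dataclasses would live in algorithm/input_output.py. Plain dataclass fields are used here, without any PESTO-specific field helpers.

```python
from dataclasses import dataclass


@dataclass
class Input:
    # Hypothetical input parameters.
    text: str
    repeat: int


@dataclass
class Output:
    # Hypothetical output structure.
    result: str
    length: int


class CompatibleProcess:
    # Signature process(input: Input) -> Output: usable with pesto schemagen.
    def process(self, input: Input) -> Output:
        result = input.text * input.repeat
        return Output(result=result, length=len(result))


class IncompatibleProcess:
    # Free-form parameters and a plain dict result: schemas are written by hand.
    def process(self, text: str, repeat: int) -> dict:
        result = text * repeat
        return {"result": result, "length": len(result)}


compatible = CompatibleProcess().process(Input(text="ab", repeat=3))
incompatible = IncompatibleProcess().process("ab", 3)
```

Both variants compute the same thing; only the second one, with its free-form parameters and return value, falls outside what pesto schemagen can infer.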
Warning
In this case, you also have to define manually the input/output schemas. This is detailed in the next section.
Info
Images are converted to/from numpy arrays by PESTO. Thus, the Process.process() function should expect to receive numpy arrays and always return images as numpy arrays.
Input / Output specification
To run, PESTO needs the input and output schema files:
- pesto/api/input_schema.json
- pesto/api/output_schema.json
With schemagen
If the Process.process() function takes an Input and returns an Output, then you can generate the input_schema.json and output_schema.json:
pesto schemagen --force algo-service/
Success
[2022-12-21 13:50:33,830] 82809-INFO schemagen::__class2schema():l59:
Using the geojson user defined definition from PestoFiles.user_definitions_schema
[2022-12-21 13:50:33,830] 82809-INFO schemagen::__generate():l32:
The Input schema is now in algo-service/pesto/api/input_schema.json
[2022-12-21 13:50:33,831] 82809-INFO schemagen::__class2schema():l59:
Using the geojson user defined definition from PestoFiles.user_definitions_schema
[2022-12-21 13:50:33,832] 82809-INFO schemagen::__generate():l32:
The Output schema is now in algo-service/pesto/api/output_schema.json
See the documentation of pesto schemagen to see all the supported types.
Manually
If you do not, or cannot, use pesto schemagen, then you have to define the processing input and output yourself.
The REST API uses JSON to communicate with external services or users.
JSON Schema is then used to validate input payloads.
pesto/api/input_schema.json : specify the input validation schema
Example: input_schema.json
{
  "image": {
    "$ref": "#/definitions/Image",
    "description": "Input image"
  },
  "dict_parameter": {
    "$ref": "#/definitions/Metadata",
    "description": "A dict parameter"
  },
  "object_parameter": {
    "description": "A dict parameter with more spec, of the form {'key':'value'}",
    "type": "object",
    "properties": {
      "key": {
        "type": "string"
      }
    }
  },
  "number_parameter": {
    "type": "number",
    "description": "A (floating point) number parameter"
  },
  "integer_parameter": {
    "type": "integer",
    "description": "A (integer) number parameter"
  },
  "string_parameter": {
    "type": "string",
    "description": "A string parameter"
  },
  "required": [
    "image"
  ]
}
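To make the role of this file concrete, the sketch below checks a request payload against a trimmed copy of the example above. It is a deliberately simplified stand-in and not PESTO's validation code: it only understands the required list and the primitive type keyword, it skips $ref entries, the required field was changed to string_parameter so that no image is needed, and the name check_payload is hypothetical. A full JSON Schema validator covers far more.

```python
import json

# Trimmed copy of the example schema, in the same flat layout.
INPUT_SCHEMA = json.loads("""
{
  "number_parameter": {"type": "number"},
  "integer_parameter": {"type": "integer"},
  "string_parameter": {"type": "string"},
  "required": ["string_parameter"]
}
""")

# Python types accepted for each primitive JSON Schema type.
JSON_TYPES = {
    "number": (int, float),
    "integer": (int,),
    "string": (str,),
    "object": (dict,),
}


def check_payload(payload: dict, schema: dict) -> list:
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for name in schema.get("required", []):
        if name not in payload:
            problems.append(f"missing required field: {name}")
    for name, value in payload.items():
        spec = schema.get(name)
        if not isinstance(spec, dict) or "type" not in spec:
            continue  # unknown field or $ref: out of scope for this sketch
        expected = JSON_TYPES.get(spec["type"])
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            problems.append(f"{name}: expected {spec['type']}")
    return problems


ok = check_payload({"string_parameter": "hello", "integer_parameter": 3}, INPUT_SCHEMA)
bad = check_payload({"integer_parameter": "3"}, INPUT_SCHEMA)
```

Here ok is an empty list, while bad reports both the missing required field and the string passed where an integer was expected.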
pesto/api/output_schema.json : specify the output validation schema.
Example: output_schema.json
{
  "image": {
    "$ref": "#/definitions/Image"
  },
  "areas": {
    "$ref": "#/definitions/Polygons"
  },
  "number_output": {
    "type": "number"
  },
  "integer_output": {
    "type": "integer"
  },
  "dict_output": {
    "$ref": "#/definitions/Metadata"
  },
  "string_output": {
    "type": "string"
  },
  "image_list": {
    "$ref": "#/definitions/Images"
  },
  "geojson": {
    "description": "A Geojson.FeatureCollection containing only Polygons as geometries",
    "type": "object",
    "properties": {
      "features": {
        "type": "array",
        "items": {
          "$schema": "http://json-schema.org/draft-06/schema#",
          "title": "GeoJSON Feature",
          "type": "object",
          "required": [
            "type",
            "properties",
            "geometry"
          ],
          "properties": {
            "type": {
              "type": "string",
              "enum": [
                "Feature"
              ]
            },
            "properties": {
              "oneOf": [
                {
                  "type": "null"
                },
                {
                  "type": "object"
                }
              ]
            },
            "geometry": {
              "$ref": "#/definitions/Polygon"
            }
          }
        }
      },
      "type": {
        "type": "string"
      }
    }
  }
}
The JSON files contain the names of the input/output variables and their information (type/$ref, description).
Default PESTO types can be found in the source code : processing-factory/pesto-cli/pesto/cli/resources/schema/definitions.json
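As an illustration of a value shaped like the geojson entry of the example output_schema.json, the sketch below builds a FeatureCollection of polygons. It assumes that #/definitions/Polygon follows the standard GeoJSON Polygon geometry, which was not checked against definitions.json; the helper polygon_feature and the sample coordinates are hypothetical.

```python
import json


def polygon_feature(ring, properties=None):
    """Build a GeoJSON Feature holding one Polygon made of a single ring."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # GeoJSON rings must be closed
    return {
        "type": "Feature",
        "properties": properties,  # the schema allows null or an object
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


geojson_output = {
    "type": "FeatureCollection",
    "features": [
        polygon_feature([[0, 0], [10, 0], [10, 5], [0, 5]], {"label": "roof"}),
        polygon_feature([[2, 2], [4, 2], [4, 4], [2, 2]]),
    ],
}

# The REST API exchanges JSON, so the result must be serializable.
serialized = json.dumps({"geojson": geojson_output, "integer_output": 2})
```

Each feature carries the three keys that the schema marks as required (type, properties, geometry), and properties may be left as null.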
Requirements
PESTO provides a generic way to include any files in the final docker image using the pesto/build/requirements.json file.
The following fields are required:
- environments: some user-defined variables
- requirements: a list of (from, to) pairs, where "from" is a URI to some files and "to" is the target path in the docker image
- dockerBaseImage: the docker image to use as a base
Example: requirements.json
{
  "environments": {
    "DEEPWORK": "/deep/deliveries",
    "DEEPDELIVERY": "/deep/deliveries"
  },
  "requirements": {
    "lib1": {
      "from": "file:///tmp/my-lib1",
      "to": "/opt/lib1",
      "type": "python"
    },
    "lib2": {
      "from": "file:///tmp/my-lib2.tar.gz",
      "to": "/opt/lib2",
      "type": "pip"
    },
    "model": {
      "from": "gs://path/to/my-model.tar.gz",
      "to": "/opt/model"
    }
  },
  "dockerBaseImage": "python:3.8-buster"
}
PESTO can handle requirements in many formats. Each requirement accepts an optional type field:
- python: add the "to" path to the PYTHONPATH environment variable
- pip: run a pip install command on the provided wheel or setuptools-compatible tar.gz archive
- default: simply copy the files (uncompressing the tar.gz archive)
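The rules above lend themselves to a quick sanity check before building. The sketch below is a hypothetical helper and not part of PESTO: check_requirements is an invented name, a missing type is treated as the default copy behaviour, and accepting the literal value "default" is an assumption.

```python
import json

REQUIRED_FIELDS = ("environments", "requirements", "dockerBaseImage")
# A missing "type" means plain copy; the literal "default" is accepted
# here as well, which is an assumption.
KNOWN_TYPES = {"python", "pip", "default"}


def check_requirements(config: dict) -> list:
    """Return a list of problems found in a requirements.json document."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in config]
    for name, req in config.get("requirements", {}).items():
        for key in ("from", "to"):
            if key not in req:
                problems.append(f"{name}: missing '{key}'")
        if "type" in req and req["type"] not in KNOWN_TYPES:
            problems.append(f"{name}: unknown type '{req['type']}'")
    return problems


# Sample document with three deliberate mistakes.
config = json.loads("""
{
  "environments": {"DEEPWORK": "/deep/deliveries"},
  "requirements": {
    "lib1": {"from": "file:///tmp/my-lib1", "to": "/opt/lib1", "type": "python"},
    "model": {"from": "gs://path/to/my-model.tar.gz", "to": "/opt/model"},
    "broken": {"from": "file:///tmp/x.whl", "type": "wheel"}
  }
}
""")
problems = check_requirements(config)
```

With this sample, problems lists the missing dockerBaseImage field plus the two mistakes in the broken entry, while lib1 and model pass.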
Warning
Using a tar.gz archive with type 'python' is DEPRECATED and will fail with an archive built with setuptools. Such an archive contains a root folder that should be removed when adding the path to PYTHONPATH.
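The root folder issue can be seen by listing the top-level entries of an archive. The sketch below builds two tiny archives in memory; the package name mylib, its version and both helper functions are hypothetical, and the layouts are only meant to resemble a setuptools source archive and a flat one.

```python
import io
import tarfile


def top_level_entries(archive_bytes: bytes) -> set:
    """Return the names found at the root of a tar.gz archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as archive:
        return {name.split("/")[0] for name in archive.getnames()}


def make_archive(file_names) -> bytes:
    """Build a small in-memory tar.gz holding empty files (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name in file_names:
            archive.addfile(tarfile.TarInfo(name))
    return buffer.getvalue()


# Layout resembling a setuptools source archive: everything under one root folder.
sdist_like = make_archive(["mylib-1.0/setup.py", "mylib-1.0/mylib/__init__.py"])
# Layout where the package sits directly at the archive root.
flat = make_archive(["mylib/__init__.py"])

sdist_roots = top_level_entries(sdist_like)
flat_roots = top_level_entries(flat)
```

In the first layout the only top-level entry is the mylib-1.0 folder, so a PYTHONPATH pointing at the extraction directory would not expose the mylib package directly; in the flat layout it would.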