Configuring the PESTO packaging of your algorithm
Once you have initialized a new PESTO project, several files have to be edited to describe the algorithm's input, output and requirements.
List of configuration files
| File | Description |
|---|---|
| `algorithm/process.py` | The Python file containing your algorithm with the `Process.process()` method |
| `algorithm/input_output.py` | The Python file containing the input and output dataclasses |
| `pesto/api/config.json` | A generic configuration file available to `process.py` for its own configuration |
| `pesto/api/config_schema.json` | A JSON schema file that specifies what `config.json` should look like |
| `pesto/api/description.json` | Description of your processing algorithm |
| `pesto/api/input_schema.json` | Specification of the algorithm's input format |
| `pesto/api/output_schema.json` | Specification of the algorithm's output format |
| `pesto/api/version.json` | Algorithm version description |
To package your algorithm with PESTO, you need to:
- Provide the implementation of your algorithm within the `process()` function in `algorithm/process.py`.
- Specify the API of your algorithm, in other words its input and output formats, in `pesto/api/input_schema.json` and `pesto/api/output_schema.json`.
- Describe and configure the dependencies in `pesto/api/description.json` and `pesto/build/requirements.json`.
Tip
Always start from the pesto-template as it is already a working PESTO project.
One of the main points of attention is to keep the schemas in `input_schema.json` and `output_schema.json` aligned with the signature of the `process()` function. The input/output schemas can be defined in two different ways:
- either the `process()` function takes an `Input` object and returns an `Output` object, both specified in `input_output.py`: in that case, `pesto schemagen` can generate the schemas for you. This is the easiest and recommended way.
- or the `process()` function takes a set of parameters of your choice and returns an object: in that case, you have to write the contents of `input_schema.json` and `output_schema.json` yourself. This is recommended for complex input/output structures that require a JSON schema that cannot be inferred by `pesto schemagen`.
Python algorithm
Look at `algorithm/process.py`. This is the module that PESTO loads inside its server and calls during processing.
It contains a `Process` class with `on_start()` and `process()` methods.
Process.on_start()
`algorithm/process.py` should contain a `Process` class with a `Process.on_start()` method.
The `on_start()` method is called on the first processing request. It is used to load resources, such as Machine Learning models, that are then used in the `Process.process()` method.
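As an illustration of this lifecycle, here is a minimal sketch assuming a hypothetical `load_model()` loader and model path. The `ToyServer` class only mimics the behaviour described above (`on_start()` on the first request, `process()` on every request); it is not PESTO's server code, and the plain `float` signature just keeps the sketch short.

```python
load_calls = 0


def load_model(path: str) -> dict:
    """Hypothetical expensive loader standing in for a real ML framework call."""
    global load_calls
    load_calls += 1
    return {"path": path, "scale": 2.0}


class Process:
    def on_start(self) -> None:
        # Heavy resources are loaded once, before the first request is processed
        self.model = load_model("/opt/model/weights.bin")  # hypothetical path

    def process(self, value: float) -> float:
        # Every request reuses the model loaded in on_start()
        return value * self.model["scale"]


class ToyServer:
    """Toy stand-in for the PESTO server; not its actual implementation."""

    def __init__(self, algorithm: Process) -> None:
        self.algorithm = algorithm
        self.started = False

    def handle(self, value: float) -> float:
        # Mimic the documented behaviour: on_start() runs on the first request only
        if not self.started:
            self.algorithm.on_start()
            self.started = True
        return self.algorithm.process(value)


server = ToyServer(Process())
print(server.handle(1.5), server.handle(4.0), load_calls)  # 3.0 8.0 1
```

However many requests are handled, `load_model()` runs a single time.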
Process.process()
`algorithm/process.py` should contain a `Process` class with a `Process.process()` method.
The `process()` method is called on each call to `/api/v1/process`, that is, whenever input data is actually processed.
Warning
Depending on the signature of `Process.process()`, you may or may not be able to generate the input/output schemas automatically with `pesto schemagen`.
If you can encapsulate the input parameters in the `algorithm.input_output.Input` dataclass and the returned objects in the `algorithm.input_output.Output` dataclass, you can benefit from schema generation. Simply edit the `algorithm/input_output.py` file to specify the input parameters and the output structure. The signature of the algorithm must then be `process(input: Input) -> Output`.
If your `Process.process()` function takes a list of parameters or returns an object that is not an `Output`, its signature is not compatible with `pesto schemagen`: you will have to write the schemas yourself.
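A minimal schemagen-compatible sketch could look like the following. Both files are condensed into one listing so that it runs standalone; the `threshold`/`label` inputs and `count`/`message` outputs are hypothetical placeholders for your own fields.

```python
from dataclasses import dataclass


# --- algorithm/input_output.py ---
@dataclass
class Input:
    # Hypothetical input fields: replace them with your algorithm's parameters
    threshold: float
    label: str


@dataclass
class Output:
    # Hypothetical output fields: replace them with your algorithm's results
    count: int
    message: str


# --- algorithm/process.py ---
class Process:
    def on_start(self) -> None:
        # Stand-in for loading real resources such as a model
        self.values = [0.2, 0.5, 0.9]

    def process(self, input: Input) -> Output:
        # One Input in, one Output out: the signature pesto schemagen works with
        count = sum(1 for v in self.values if v >= input.threshold)
        return Output(count=count, message=f"{input.label}: {count} value(s) kept")


algo = Process()
algo.on_start()
print(algo.process(Input(threshold=0.5, label="demo")))
# Output(count=2, message='demo: 2 value(s) kept')
```

In a real project, `process.py` would import the dataclasses, e.g. `from algorithm.input_output import Input, Output`.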
process.py, schemagen compatible or not: [code listings of a compatible `process.py` with its `input_output.py`, and of a non-compatible `process.py`]
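By contrast, the following sketch (same hypothetical fields) takes plain parameters and returns a plain dict, so `pesto schemagen` cannot infer its schemas. Presumably the keys declared in `input_schema.json` and `output_schema.json` then have to match these parameter names and dict keys, which is the alignment with the `process()` signature mentioned above.

```python
class Process:
    def on_start(self) -> None:
        # Stand-in for loading real resources such as a model
        self.values = [0.2, 0.5, 0.9]

    def process(self, threshold: float, label: str) -> dict:
        # Plain parameters in, plain dict out: pesto schemagen cannot infer the
        # schemas, so input_schema.json and output_schema.json are written by hand
        count = sum(1 for v in self.values if v >= threshold)
        return {"count": count, "message": f"{label}: {count} value(s) kept"}


algo = Process()
algo.on_start()
result = algo.process(threshold=0.5, label="demo")
print(result)  # {'count': 2, 'message': 'demo: 2 value(s) kept'}
```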
Warning
If `Process.process()` does not follow the `process(input: Input) -> Output` signature, you also have to define the input/output schemas manually. This is detailed in the next section.
Info
Images are converted to/from numpy arrays by PESTO. Thus, the `Process.process()` function should expect to receive images as numpy arrays and must always return images as numpy arrays.
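For instance, a `process()` that inverts an 8-bit image could be sketched as follows. The parameter name `image`, the `uint8` check and the inversion itself are illustrative assumptions; the point is that both the received image and the returned image are numpy arrays.

```python
import numpy as np


class Process:
    def process(self, image: np.ndarray) -> np.ndarray:
        # The image arrives as a numpy array (assumed 8-bit here)
        if image.dtype != np.uint8:
            raise TypeError(f"expected a uint8 image, got {image.dtype}")
        # The result is returned as a numpy array too: an inverted uint8 copy
        return 255 - image


img = np.array([[0, 100], [200, 255]], dtype=np.uint8)
print(Process().process(img))
```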
Input / Output specification
To run, PESTO needs the input and output schema files:
- `pesto/api/input_schema.json`
- `pesto/api/output_schema.json`
With schemagen
If the `Process.process()` function takes an `Input` and returns an `Output`, you can generate `input_schema.json` and `output_schema.json` with:
```bash
pesto schemagen --force algo-service/
```
Success
```text
[2022-12-21 13:50:33,830] 82809-INFO schemagen::__class2schema():l59:
Using the geojson user defined definition from PestoFiles.user_definitions_schema
[2022-12-21 13:50:33,830] 82809-INFO schemagen::__generate():l32:
The Input schema is now in algo-service/pesto/api/input_schema.json
[2022-12-21 13:50:33,831] 82809-INFO schemagen::__class2schema():l59:
Using the geojson user defined definition from PestoFiles.user_definitions_schema
[2022-12-21 13:50:33,832] 82809-INFO schemagen::__generate():l32:
The Output schema is now in algo-service/pesto/api/output_schema.json
```
See the `pesto schemagen` documentation for the list of supported types.
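To give an intuition of what such a generator does, here is a toy converter from a dataclass to a JSON schema. It only handles `bool`, `int`, `float` and `str` fields and writes the standard `properties`/`required` layout; it is not the implementation of `pesto schemagen`, whose output layout and supported types (images, GeoJSON through user definitions, ...) may differ. The helper name `dataclass_to_schema` is hypothetical.

```python
import dataclasses
import json
from typing import get_type_hints

# Python primitive types -> JSON Schema types (subset, for illustration only)
PRIMITIVES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def dataclass_to_schema(cls) -> dict:
    """Toy dataclass -> JSON Schema converter; NOT pesto schemagen's implementation."""
    hints = get_type_hints(cls)
    properties, required = {}, []
    for field in dataclasses.fields(cls):
        # Raises KeyError for types outside the toy mapping
        properties[field.name] = {"type": PRIMITIVES[hints[field.name]]}
        # Fields without any default value are treated as required
        no_default = field.default is dataclasses.MISSING
        no_factory = field.default_factory is dataclasses.MISSING
        if no_default and no_factory:
            required.append(field.name)
    return {"type": "object", "properties": properties, "required": required}


@dataclasses.dataclass
class Input:
    image_count: int
    threshold: float
    label: str = "default"


print(json.dumps(dataclass_to_schema(Input), indent=2))
```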
Manually
If you cannot (or do not want to) use `pesto schemagen`, you have to define the processing input and output yourself.
The REST API uses JSON to communicate with external services and users, and JSON schemas are used to validate the input payloads.
`pesto/api/input_schema.json`: specifies the input validation schema.
Example: input_schema.json
```json
{
  "image": {
    "$ref": "#/definitions/Image",
    "description": "Input image"
  },
  "dict_parameter": {
    "$ref": "#/definitions/Metadata",
    "description": "A dict parameter"
  },
  "object_parameter": {
    "description": "A dict parameter with more spec, of the form {'key':'value'}",
    "type": "object",
    "properties": {
      "key": {
        "type": "string"
      }
    }
  },
  "number_parameter": {
    "type": "number",
    "description": "A (floating point) number parameter"
  },
  "integer_parameter": {
    "type": "integer",
    "description": "A (integer) number parameter"
  },
  "string_parameter": {
    "type": "string",
    "description": "A string parameter"
  },
  "required": [
    "image"
  ]
}
```
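To make the role of these schemas concrete, here is a deliberately tiny checker for the primitive `type` and `required` keywords, applied to a subset of the example above. It assumes the flat layout shown, where each top-level key except `required` describes one input field, and it does not resolve `$ref`. It is an illustration, not how PESTO validates payloads; a full validator such as the `jsonschema` package handles the whole specification. The image value is a hypothetical placeholder string.

```python
# Checks for the primitive "type" values used in the example schema (subset only)
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so it is excluded explicitly
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def check_payload(schema: dict, payload: dict) -> list:
    """Return error messages for `payload`; an empty list means it passed.

    Assumes the flat layout of the example: every top-level key except
    "required" describes one input field. "$ref" entries are not resolved.
    """
    errors = [
        f"missing required field '{name}'"
        for name in schema.get("required", [])
        if name not in payload
    ]
    for name, value in payload.items():
        spec = schema.get(name) if name != "required" else None
        if not isinstance(spec, dict):
            continue  # undeclared fields are ignored, as JSON Schema allows by default
        expected = spec.get("type")
        if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](value):
            errors.append(f"'{name}' should be of type {expected}")
    return errors


# Subset of the example input_schema.json above
schema = {
    "image": {"$ref": "#/definitions/Image", "description": "Input image"},
    "number_parameter": {"type": "number"},
    "integer_parameter": {"type": "integer"},
    "string_parameter": {"type": "string"},
    "required": ["image"],
}
good_payload = {"image": "<image placeholder>", "number_parameter": 0.5, "integer_parameter": 3}
bad_payload = {"integer_parameter": 2.5, "string_parameter": "x"}
print(check_payload(schema, good_payload))  # []
print(check_payload(schema, bad_payload))
```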
`pesto/api/output_schema.json`: specifies the output validation schema.
Example: output_schema.json
```json
{
  "image": {
    "$ref": "#/definitions/Image"
  },
  "areas": {
    "$ref": "#/definitions/Polygons"
  },
  "number_output": {
    "type": "number"
  },
  "integer_output": {
    "type": "integer"
  },
  "dict_output": {
    "$ref": "#/definitions/Metadata"
  },
  "string_output": {
    "type": "string"
  },
  "image_list": {
    "$ref": "#/definitions/Images"
  },
  "geojson": {
    "description": "A Geojson.FeatureCollection containing only Polygons as geometries",
    "type": "object",
    "properties": {
      "features": {
        "type": "array",
        "items": {
          "$schema": "http://json-schema.org/draft-06/schema#",
          "title": "GeoJSON Feature",
          "type": "object",
          "required": [
            "type",
            "properties",
            "geometry"
          ],
          "properties": {
            "type": {
              "type": "string",
              "enum": [
                "Feature"
              ]
            },
            "properties": {
              "oneOf": [
                {
                  "type": "null"
                },
                {
                  "type": "object"
                }
              ]
            },
            "geometry": {
              "$ref": "#/definitions/Polygon"
            }
          }
        }
      },
      "type": {
        "type": "string"
      }
    }
  }
}
```
The JSON files contain the names of the input/output variables and their information (`type`/`$ref`, `description`).
The default PESTO types are defined in the source code: `processing-factory/pesto-cli/pesto/cli/resources/schema/definitions.json`.
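On the output side, a `process()` that does not use schemagen could fill the `geojson` field of the output example by building a standard GeoJSON `FeatureCollection` (RFC 7946). This sketch assumes that `#/definitions/Polygon` follows the GeoJSON Polygon geometry and that the returned dict is keyed like `output_schema.json`; the helper `polygons_to_feature_collection` is hypothetical.

```python
import json


def polygons_to_feature_collection(polygons: list) -> dict:
    """Wrap polygon exterior rings into a GeoJSON FeatureCollection (RFC 7946)."""
    features = []
    for ring in polygons:
        # A GeoJSON linear ring is closed: its first and last positions are equal
        if ring[0] != ring[-1]:
            ring = ring + [ring[0]]  # new list, the caller's ring is left untouched
        features.append({
            "type": "Feature",
            "properties": None,  # the example schema accepts null or an object
            "geometry": {"type": "Polygon", "coordinates": [ring]},
        })
    return {"type": "FeatureCollection", "features": features}


square = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
# Hypothetical dict returned by process(), keyed like output_schema.json
output = {"geojson": polygons_to_feature_collection([square]), "number_output": 1.0}
print(json.dumps(output["geojson"]["features"][0]["geometry"]))
```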
Requirements
PESTO provides a generic way to include any files in the final Docker image through the `pesto/build/requirements.json` file.
The following fields are required:
- `environments`: user-defined variables
- `requirements`: a set of named (`from`, `to`) entries, where `from` is a URI pointing to some files and `to` is the target path in the Docker image
- `dockerBaseImage`: the Docker image to use as a base
Example: requirements.json
```json
{
  "environments": {
    "DEEPWORK": "/deep/deliveries",
    "DEEPDELIVERY": "/deep/deliveries"
  },
  "requirements": {
    "lib1": {
      "from": "file:///tmp/my-lib1",
      "to": "/opt/lib1",
      "type": "python"
    },
    "lib2": {
      "from": "file:///tmp/my-lib2.tar.gz",
      "to": "/opt/lib2",
      "type": "pip"
    },
    "model": {
      "from": "gs://path/to/my-model.tar.gz",
      "to": "/opt/model"
    }
  },
  "dockerBaseImage": "python:3.8-buster"
}
```
PESTO can handle requirements in several formats. Each requirement accepts an optional `type` field:
- `python`: add the `to` path to the `PYTHONPATH` environment variable
- `pip`: run a `pip install` command on the provided `wheel` or setuptools-compatible `tar.gz` archive
- default (no `type`): simply copy the files (`tar.gz` archives are uncompressed)
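The rules above can be summarised by the following hypothetical helper, `plan_requirements`, which checks the required top-level fields and describes what each `type` implies for the example `requirements.json`. It only produces descriptions; it does not reproduce PESTO's actual build steps.

```python
REQUIRED_FIELDS = ("environments", "requirements", "dockerBaseImage")


def plan_requirements(config: dict) -> list:
    """Describe what each documented `type` value implies for one requirement.

    Illustrates the rules listed above; it is not PESTO's build code.
    """
    missing = [key for key in REQUIRED_FIELDS if key not in config]
    if missing:
        raise ValueError(f"requirements.json is missing: {', '.join(missing)}")
    actions = []
    for name, req in config["requirements"].items():
        kind = req.get("type", "default")  # `type` is optional
        src, dst = req["from"], req["to"]
        if kind == "python":
            actions.append(f"{name}: {src} -> {dst}, then add {dst} to PYTHONPATH")
        elif kind == "pip":
            actions.append(f"{name}: pip install {src}")
        elif src.endswith(".tar.gz"):
            actions.append(f"{name}: extract {src} into {dst}")
        else:
            actions.append(f"{name}: copy {src} to {dst}")
    return actions


example = {
    "environments": {"DEEPWORK": "/deep/deliveries"},
    "requirements": {
        "lib1": {"from": "file:///tmp/my-lib1", "to": "/opt/lib1", "type": "python"},
        "lib2": {"from": "file:///tmp/my-lib2.tar.gz", "to": "/opt/lib2", "type": "pip"},
        "model": {"from": "gs://path/to/my-model.tar.gz", "to": "/opt/model"},
    },
    "dockerBaseImage": "python:3.8-buster",
}
for line in plan_requirements(example):
    print(line)
```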
Warning
Using a `tar.gz` archive with type `python` is DEPRECATED and fails with an archive built with setuptools: such an archive contains a root folder that would have to be removed when adding the path to `PYTHONPATH`.
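You can check whether an archive has such a root folder by listing its top-level entries with Python's standard `tarfile` module. The sketch builds a small archive shaped like a setuptools source distribution (the `my-lib-1.0` names are hypothetical). A single root folder like this one means the importable package is not directly under the `to` path, which is why `type: pip`, documented for setuptools-compatible archives, is the better fit for them.

```python
import io
import tarfile


def archive_root_folders(archive_bytes: bytes) -> set:
    """Return the set of top-level entries of a .tar.gz archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        return {member.name.split("/")[0] for member in tar.getmembers()}


def make_sdist_like_archive() -> bytes:
    """Build a small in-memory archive shaped like a setuptools sdist (hypothetical names)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in ("my-lib-1.0/setup.py", "my-lib-1.0/my_lib/__init__.py"):
            data = b"# placeholder\n"
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


print(archive_root_folders(make_sdist_like_archive()))  # {'my-lib-1.0'}
```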