Node providers¶
CATMAID uses so called node providers query tracing data from the database.
Depending on the query there are different query strategies that can be useful.
By default CATMAID uses a node provider called postgis3d
, which uses a 3D
bounding box query on a PostGIS representation of the tracing data. This works
well with very dense data and seems to be a good average. The NODE_PROVIDERS
setting in settings.py
gives access to the behavior and allows to configure
different node providers along with constraints that would mark them valid or
invalid for a particular request. This allows to e.g. use a caching node
provider that doesn’t update immediately for large field of views. For smaller
field of views a regular postgis3d
node provider might be used. These are
the available node providers:
postgis2d
- Nodes are queried using a PostGIS 2D index. This seems fast for sparser data and large field of views.
postgis2dblurry
- Like
postgis2d
, but one intersection test less, which includes more false positives for a given bounding box. It is however also faster.
postgis3d
- Nodes are queried using a PostGIS 3D index. This seems fast for denser data and smaller field of views.
postgis3dblurry
- Like
postgis3d
, but one intersection test less, which includes more false positives for a given bounding box. It is however also faster.
cached_json
- A cached version of the data for a given section using the
node_query_cache
table. It is stored as JSON database object.
cached_json_text
- A cached version of the data for a given section using the
node_query_cache
table. It is stored as JSON text string.
cached_msgpack
- A cached version of the data for a given section using the
node_query_cache
table. It is stored as msgpack encoded binary database object.
Cached node queries¶
The caches for use with the last three entries can be populated using the following management command:
manage.py catmaid_update_cache_tables
It allows to populate cache data for the formats json
, json_text
and
msgpack
based on a set of parameters. It can be run for all projects or a
subset. A typical call could look like this:
manage.py catmaid_update_cache_tables --project_id 1 --type msgpack --orientation xy --step 40 --node-limit 0
The --step
parameter sets the section thickness, i.e. Z resolution in xy
orientation. A --node-limit
of 0 will remove any existing node limits. The
type msgpack
turned out to be the fastest one in our tests so far.
It makes sense to automate this process to run once every night. This can be
done with a cron-job or with predefined Celery tasks <celery tasks>,
which can be added to settings.py
like this:
CELERY_BEAT_SCHEDULE['update-node-query-cache'] = {
'task': 'update_node_query_cache',
'schedule': crontab(hour=0, minute=30)
}
This would require Celery Beat to run. If it does, it would update all caches
defined in NODE_PROVIDERS
every night at 00:30.
Using multiple node providers¶
It is possible to define multiple node providers that are valid in different or the same situation. An example could look like this:
NODE_PROVIDERS = [
('cached_msgpack', {
'enabled': True,
'min_width': 200000,
'min_heigth': 120000,
'orientation': 'xy'
}),
('postgis3d', {
'project_id': 2
}),
# Fallback
'postgis2d'
]
For an incoming request, CATMAID will first find all valid node providers, depending on e.g. the project ID, or bounding box of the query. It will then iterate this list and return results from the first node provider that returns results. The following options are available for all node providers:
enabled
- Whether the node provider can be used at all.
project_id
:- For which project this node provider can be used.
orientation
- For which orientation this node provider can be used.
min_width
- Which minimum width the query bounding box must have for this node provider (in project coordinates).
min_height
- Which minimum height the query bounding box must have for this node provider (in project coordinates).
min_depth
- Which minimum depth the query bounding box must have for this node provider (in project coordinates).
max_width
- Which maximum width the query bounding box can have for this node provider (in project coordinates).
max_height
- Which maximum height the query bounding box can have for this node provider (in project coordinates).
max_depth
- Which maximum depth the query bounding box can have for this node provider (in project coordinates).