Warning This set of functions is not a getODK project, but a personal one. The ODK teams will not provide any support if you face an issue. Please post on Github to discuss any needs or difficulties you may encounter.
First version of functions that pull data from ODK Central, even using a filter, and automatically creates dedicated tables in your PostgreSQL database
-> for example from last submission_date known in the database
ODK Central returns all datas by default. We use ODK Collect every day to collect data that goes and is edited in our own PostGIS database. Each day we download hourly a lot of data with Central2pg. It works really fine but most of the downloaded data has already been consolidated into our GIS database. We just need the last ones.
Thanks to pyODK and pl/python we can now ask central for the only data that are not already in our database, so maybe 30 or 40 submissions instead of 5000 ;-)
pl/pyDOK in the workflow :
You have two options :
- install pl-pyODK and its requirements on your database -> here
- Or use the docker image (for testing purpose only) -> there
CREATE OR REPLACE PROCEDURAL LANGUAGE plpython3u;
On the host of the database server
pip install -U pyodk
Edit the .template_pyodk_config.toml file and save it as .pyodk_config.toml
.pyodk_config.toml conf file must exists in Postgresql directory (ie /var/lib/postgresql/)
[central]
base_url = "https://my_central_server.url"
username = "my_username"
password = "my_password"
default_project_id = 5
psql -f pl-pyODK.sql -U my_ser my_database
cd docker_postgis_curl_plpython_pgcron
Edit the .template_pyodk_config.toml file and save it as .pyodk_config.toml
sudo docker build -t postgis:test_pyodk .
sudo docker run --restart="always" --dns 1.1.1.1 --name test_plpyodk -e POSTGRES_DB=field_data -e POSTGRES_USER=tester -e POSTGRES_PASSWORD=testerpwd -d -p 5555:5432 postgis:test_pyodk
You can now connect to the database with your favorite client:
- host = localhost
- port = 5555
- user = tester
- password = testerpwd
- dbname = field_data
Test with the form you want on your central server :
SELECT plpyodk.odk_central_to_pg(
3, -- the project id,
'waypoint', -- form ID
'odk_central', -- schema where to create tables and store data
'filter_to_use', -- the filter "clause" used in the API call ex. '__system/submissionDate ge 2023-04-01'. Empty string ('') will get all the datas.
'point_auto_5,point_auto_10,point_auto_15,point,ligne,polygone' -- (geo)columns to ignore in json transformation to database attributes (geojson fields of GeoWidgets)
);
Or try the example above, which make use of this form : https://biodiversityforms.org/docs/ODK-CEN/donnees_opportunistes/ODK_waypoints
-
First upload it to you central server, note the project id (3 in our instance), the form_id (waypoint), and the name of each "geo" question in the form (point_auto_5,point_auto_10,point_auto_15,point,ligne,polygone) .
-
Send some submissions to central.
-
Now you are ready to make the first call, that will download all the data submitted for this form. The filter parameter may be set to an empty string.
SELECT plpyodk.odk_central_to_pg(
3, -- the project id
'waypoint'::text, -- form ID
'odk_central'::text, -- schema where to create tables and store data
'', -- the filter "clause" used in the API call
'point_auto_5,point_auto_10,point_auto_15,point,ligne,polygone'::text -- json (geo)columns to ignore
);
This will automatically :
- ask Central for the data that correspond to the filter
- get the data
- create the tables to stores those data in the schema "odk_data" (one table for the submissions and one for each repeat group.
- the last parameter lists the question to ignore in json exploration recusrion (geowidgets columns)
- feed those tables with the retrieved data
And at next call :
- check for new questions in the form
- then create the attributes in the table for the new questions
- insert new data
- Check the data you got from Central
SELECT * FROM odk_central.waypoint_submissions_data
SELECT * FROM odk_central.waypoint_emplacements_data;
- Now we can perform a query that uses last submission date (column "submissionDate") as a parameter in the function call.
-- or this to get only data collected since last known submissionDate in the database
CREATE MATERIALIZED VIEW IF NOT EXISTS odk_central.waypoint_last_submission_date AS
SELECT max("submissionDate")::text AS last_submission_date
FROM odk_central.waypoint_submissions_data;
REFRESH MATERIALIZED VIEW odk_central.waypoint_last_submission_date;
SELECT plpyodk.odk_central_to_pg(
3,
'waypoint'::text,
'odk_central'::text,
concat('__system/submissionDate ge ',last_submission_date),
'point_auto_5,point_auto_10,point_auto_15,point,ligne,polygone'::text
)
FROM odk_central.waypoint_last_submission_date;
- Send new submissions to central
- Run last query at the frequency you want, manually You can save your script in a sql file like get_waypoint_data.sql and then call it with psql :
psql -h localhost -p 5555 -U tester -f get_waypoint_data.sql -d field_data
- Or you may want to define a cron task Adapt and add a line as above to your cron list. See https://crontab.guru/ to learn about cron task scheduling.
crontab -e
For example, to run the script every day at 18:00, add this line to the crontab :
0 18 * * * psql -h localhost -p 5555 -U tester -f get_waypoint_data.sql -d field_data
CREATE VIEW waypoints AS
SELECT places.data_id, date_heure, mail_observateur as email, nom_observateur, etiquette, heure_localite,
st_force2d(st_geomfromgeojson(replace(COALESCE(ligne, point, point_auto_10, point_auto_15, point_auto_5, polygone),'\','')))::geometry(geometry, 4326) AS geom, prise_image, remarque
FROM odk_central.waypoint_submissions_data submissions JOIN odk_central.waypoint_emplacements_data places ON places."__Submissions-id" = submissions."__id"