v.dissolve: Compute attribute aggregate statistics #2388

Merged
merged 24 commits into from
Jul 22, 2023

@wenzeslaus wenzeslaus commented May 20, 2022

In addition to geometry dissolving, compute aggregate statistics for the attribute values of dissolved features with v.db.univar and SQL.

v.db.select with group is used to obtain the unique values of the column the dissolving is based on. Adding a column and updating it now happens for every value, column, and statistic.
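The per-value update described above can be sketched with Python's built-in sqlite3 module instead of GRASS itself. This is only an illustration under stated assumptions: the tables `original` and `dissolved`, the columns `zone` and `area`, and the result column `area_sum` are hypothetical, and the real module goes through v.db.select, v.db.univar, and the attribute table of the dissolved vector.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical attribute table of the input vector.
conn.execute("CREATE TABLE original (zone TEXT, area REAL)")
conn.executemany(
    "INSERT INTO original VALUES (?, ?)",
    [("A", 1.0), ("A", 2.5), ("B", 4.0)],
)

# Unique values of the column the dissolving is based on.
values = [row[0] for row in conn.execute("SELECT zone FROM original GROUP BY zone")]

# Hypothetical attribute table of the dissolved vector: one row per unique value.
conn.execute("CREATE TABLE dissolved (zone TEXT)")
conn.executemany("INSERT INTO dissolved VALUES (?)", [(value,) for value in values])

# Add one result column, then update it once per unique value.
conn.execute("ALTER TABLE dissolved ADD COLUMN area_sum REAL")
for value in values:
    (total,) = conn.execute(
        "SELECT sum(area) FROM original WHERE zone = ?", (value,)
    ).fetchone()
    conn.execute("UPDATE dissolved SET area_sum = ? WHERE zone = ?", (total, value))

result = dict(conn.execute("SELECT zone, area_sum FROM dissolved"))
conn.close()
```

With several columns and methods, the same add-and-update step would repeat for each combination, which is why the number of updates grows with values, columns, and statistics.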

Originally implemented with v.db.univar only because it has a good set of functions, but direct SQL is faster and can potentially offer more functions (although default SQLite has fewer).

Auto-generates names and combinations of column-method for convenience, but when all needed parameters are provided, uses them as is.
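A minimal sketch of how such a two-mode pairing could work. The function name `plan_aggregations`, the `column_method` naming scheme, and the equal-length check are assumptions made for illustration, not a copy of the module's code.

```python
from itertools import product


def plan_aggregations(columns, methods, result_columns=None):
    """Return (column, method, result column) triples.

    Explicit mode: when result columns are given, the three lists are
    paired position by position and used as is.
    Automatic mode: every column is combined with every method and the
    result column name is generated.
    """
    if result_columns:
        if not len(columns) == len(methods) == len(result_columns):
            raise ValueError(
                "columns, methods, and result columns must have the same length"
            )
        return list(zip(columns, methods, result_columns))
    return [
        (column, method, f"{column}_{method}")
        for column, method in product(columns, methods)
    ]


automatic = plan_aggregations(["area", "population"], ["sum", "avg"])
explicit = plan_aggregations(["area", "population"], ["sum", "avg"], ["total_area", "mean_pop"])
```

The automatic call yields four combinations, while the explicit call yields exactly the two pairs that were asked for.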

Has documentation, examples, image for original functionality, and test (image generated in notebook).

Uses plural for columns and methods.

Removes duplicate columns and methods for non-explicit automatic (interactive) result column handling.
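One common way to remove duplicates while keeping the order the user typed is `dict.fromkeys`. The helper name `unique_in_order` is hypothetical, and the module may do this differently.

```python
def unique_in_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    return list(dict.fromkeys(items))


columns = unique_in_order(["area", "population", "area"])
methods = unique_in_order(["sum", "sum", "avg"])
```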

Supports SQL expressions as columns (as in v.db.update query_column or v.db.select columns). Supports general SQL syntax just like v.db.select, at the price of fewer checks. Also supports text-returning aggregate functions and functions with multiple parameters, such as SQLite group_concat. Supports any layer, not just 1, for attributes.
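To show what an SQL expression as a column and a text-returning, multi-parameter aggregate look like at the SQL level, here is a small sketch using Python's sqlite3 module directly. The `roads` table and its columns are made up for the example; only `group_concat` and `sum` are real SQLite functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE roads (class TEXT, name TEXT, length REAL, lanes INTEGER)")
conn.executemany(
    "INSERT INTO roads VALUES (?, ?, ?, ?)",
    [
        ("highway", "I-40", 2.0, 4),
        ("highway", "I-440", 3.0, 2),
        ("local", "Main St", 1.5, 2),
    ],
)

# group_concat takes two parameters and returns text;
# length * lanes is an SQL expression used in place of a plain column.
query = (
    "SELECT class, group_concat(name, ';'), sum(length * lanes) "
    "FROM roads GROUP BY class"
)

# The order of concatenated items is not guaranteed, so names are sorted here.
summary = {
    road_class: (sorted(names.split(";")), lane_length)
    for road_class, names, lane_length in conn.execute(query)
}
conn.close()
```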

Uses a simple SQL escape function to double single quotes.
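Doubling single quotes is the standard SQL way to put a quote inside a string literal. A sketch of such a function, with a round trip through SQLite to check it; the name `sql_escape` is an assumption and may not match the module's helper.

```python
import sqlite3


def sql_escape(text):
    """Escape a value for use inside a single-quoted SQL string literal."""
    return text.replace("'", "''")


value = "O'Hare"
conn = sqlite3.connect(":memory:")
# The escaped text is embedded in a literal and read back unchanged.
(round_trip,) = conn.execute(f"SELECT '{sql_escape(value)}'").fetchone()
conn.close()
```

Where the database interface accepts bound parameters, those are generally preferable; escaping like this is for cases where the SQL has to be assembled as text.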

Requires v.db.univar JSON output and v.db.select column info in JSON output.

Handles cleanup from the main function. Removes global variables. Uses PID and node name for the temporary vector. Partially modernizes the existing code by using gs alias instead of grass alias. Improves author lists.
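A sketch of building a temporary map name from the process ID and the node name, so that concurrent runs on different machines or in different processes do not collide. The function name, the prefix, and the character replacement rule are assumptions for illustration.

```python
import os
import platform
import re


def temporary_vector_name(prefix="tmp_dissolve"):
    """Build a name from a prefix, the process ID, and the node name."""
    # Host names can contain dots or dashes; replace anything that is not
    # a letter, digit, or underscore.
    node = re.sub(r"[^A-Za-z0-9_]", "_", platform.node())
    return f"{prefix}_{os.getpid()}_{node}"


name = temporary_vector_name()
```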

@wenzeslaus wenzeslaus added this to the 8.4.0 milestone May 20, 2022
@wenzeslaus wenzeslaus added Python Related code is in Python enhancement New feature or request labels May 20, 2022
@wenzeslaus

This depends on JSON output in #2386.

@wenzeslaus wenzeslaus force-pushed the v_dissolve-attr-stats branch from 1c6fe46 to e440269 Compare June 9, 2022 20:20
@wenzeslaus

Notebook to test: https://mybinder.org/v2/gh/wenzeslaus/grass/v_dissolve-attr-stats?urlpath=lab%2Ftree%2Fscripts%2Fv.dissolve%2Fv_dissolve.ipynb

@HuidaeCho HuidaeCho left a comment

@wenzeslaus Overall, it looks good to me. Just have a few minor comments.

@wenzeslaus

I made some significant updates. The interface is now tailored to two use cases: interactive use, where a lot happens automatically, and scripting, where the user is expected to be very explicit about what should be computed. This is now described in the documentation.

I resolved some of the comments after writing a comment with resolution, but some may still require some discussion.

Thank you for the feedback, @HuidaeCho.

@wenzeslaus

Should the option names be in singular or plural? For this module, it is aggregate_column, aggregate_method, and result_column versus aggregate_columns, aggregate_methods, and result_columns.

Is it singular as in:

r.patch input=aaa,bbb,ccc output=xxx
v.patch input=aaa,bbb,ccc output=xxx
g.list type=raster,vector

or plural as in:

v.db.select map=aaa columns=xxx,yyy
v.db.addcolumn map=aaa columns="xxx double precision,yyy integer"
v.db.addtable map=aaa columns="xxx double precision,yyy integer"

?

Singular versus plural in option names

The following leaves out options that are special with respect to singular and plural, such as cats, coordinates, or GDAL options.

Standard options

Standard options with multiple set to yes.

| id | name | singular/plural |
| --- | --- | --- |
| G_OPT_V_INPUTS | input | singular |
| G_OPT_V_MAPS | map | singular |
| G_OPT_V_TYPE | type | singular |
| G_OPT_V3_TYPE | type | singular |
| G_OPT_DB_COLUMNS | columns | plural |
| G_OPT_R_INPUTS | input | singular |
| G_OPT_R_OUTPUTS | output | singular |
| G_OPT_R_MAPS | map | singular |
| G_OPT_R_ELEVS | elevation | singular |
| G_OPT_R3_INPUTS | input | singular |
| G_OPT_R3_MAPS | map | singular |
| G_OPT_M_DATATYPE | type | singular |
| G_OPT_STDS_INPUTS | inputs | plural |
| G_OPT_STRDS_INPUTS | inputs | plural |
| G_OPT_STRDS_OUTPUTS | outputs | plural |
| G_OPT_STVDS_INPUTS | inputs | plural |
| G_OPT_STR3DS_INPUTS | inputs | plural |

Notes: G_OPT_V_OUTPUTS does not exist, but given the other ones, it would be output. There are many temporal options, but they are not used as much as the other ones: all combined, they are used 10 times, while G_OPT_V_INPUTS alone is used 5 times and G_OPT_R3_INPUTS 3 times.

Vector modules in C

C modules with multiple = YES in the vector directory. Other modules not listed (too many).

| module | option | singular/plural |
| --- | --- | --- |
| v.normal | tests | plural |
| v.clean | tool | singular |
| v.net.iso | costs | plural |
| v.what | layer | singular |
| v.distance | column, upload | singular |
| v.build | option | singular |
| v.in.pdal | class_filter | singular |
| v.in.ogr | layer | singular |

Python modules

Python modules with multiple: yes.

| module | option | singular/plural |
| --- | --- | --- |
| v.db.select | columns | plural |
| v.db.univar, db.univar | percentile | singular |
| v.db.addtable | columns | plural |
| v.db.addcolumn | columns | plural |
| r.patch | input | singular |
| r.texture | method | singular |
| r.buffer.lowmem | distances | plural |
| r.semantic.label | semantic_label | singular |
| r.in.wms | layers, styles | plural |
| g.search.modules | keyword | singular |
| t.vect.db.select | columns | plural |
| t.merge | inputs | plural |

@wenzeslaus wenzeslaus marked this pull request as ready for review June 14, 2022 17:56
@wenzeslaus wenzeslaus added the C Related code is in C label Jul 26, 2022
@wenzeslaus wenzeslaus removed the C Related code is in C label Aug 28, 2022
@wenzeslaus wenzeslaus modified the milestones: 8.3.0, 8.4.0 Feb 10, 2023
@wenzeslaus wenzeslaus force-pushed the v_dissolve-attr-stats branch from 1f7058f to d2cd66f Compare May 4, 2023 16:42
@wenzeslaus wenzeslaus force-pushed the v_dissolve-attr-stats branch from d2cd66f to b646272 Compare July 17, 2023 13:28
@wenzeslaus wenzeslaus requested a review from HuidaeCho July 17, 2023 14:20
@wenzeslaus wenzeslaus force-pushed the v_dissolve-attr-stats branch from b646272 to d14d6f2 Compare July 18, 2023 13:48
@wenzeslaus

While this could be faster or parallel, the aggregation works well for simple cases and can deal with some complex cases, too. From my perspective, this is ready to be merged right after #3090.

In addition to geometry dissolving, compute aggregate statistics for the attribute values of dissolved features with v.db.univar.

Requires v.db.univar JSON output. v.db.select with group is used to obtain unique values of the column the dissolving is based on. Add column and update now happens for every value, column, and statistics.
…rom the main function. Remove global variables. Use PID and node name for the temporary vector.
…y dev null in cleanup code. Modernize and Pylint generic error message.
@wenzeslaus wenzeslaus force-pushed the v_dissolve-attr-stats branch from 7cd9500 to be56efd Compare July 19, 2023 13:53
@wenzeslaus wenzeslaus dismissed HuidaeCho’s stale review July 22, 2023 01:32

Issues addressed, code changed significantly, review is no longer relevant.

@wenzeslaus wenzeslaus merged commit 9d44603 into OSGeo:main Jul 22, 2023
@wenzeslaus wenzeslaus deleted the v_dissolve-attr-stats branch July 22, 2023 01:53
@wenzeslaus wenzeslaus removed the request for review from HuidaeCho July 22, 2023 01:54
landam pushed a commit to landam/grass that referenced this pull request Oct 25, 2023
neteler pushed a commit to nilason/grass that referenced this pull request Nov 7, 2023
Labels
enhancement New feature or request Python Related code is in Python
4 participants