get_SDA_interpretation/get_SDA_property: Add "NONE" aggregation method and improve details for arguments #181

Merged: brownag merged 8 commits into master from SODSDAtools on May 7, 2021

brownag commented Apr 9, 2021

I am working on some new tools for interpretation validation in the Southwest region, and I intend to use the new ssurgoOnDemand get_SDA* functions to handle some of the comparisons to "live" SSURGO. Here is my first cut at extending them beyond the standard "mukey" and "areasymbol" options and map unit-level aggregation methods.

I am considering ways to further improve and manipulate the output for higher-level purposes (i.e. some sort of fetch* function) and I expect new aggregation methods and options will be added to the ones that have already been implemented to improve parity with WSS and more.

Perhaps the most generic case of "aggregation" is to return un-aggregated component (or horizon) data back to the user. This is especially helpful when trying to interpret "Dominant Condition" where the result may be a product of several different components in a map unit that share a common rating.

Now you can pass method="NONE" to get_SDA_interpretation/get_SDA_property to skip the dominant/min/max/averaging that normally takes place when making "MUKEY thematic" output.

get_SDA_property checks whether you are requesting a horizon-level property. If so, the result has rows 1:1 with chorizon (and includes the chkey). Otherwise, the result is 1:1 with the components in the selected map units or soil survey areas.

Example usage:

library(soilDB)

# interpretation by component, no aggregation
get_SDA_interpretation("WMS - Pond Reservoir Area", method = "NONE", mukeys = 2403721)
#> single result set, returning a data.frame
#>   areasymbol musym                                                   muname
#> 1      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 2      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 3      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 4      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 5      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#>     MUKEY    cokey           compname comppct_r rating            class
#> 1 2403721 19586079             Flanly        18  1.000     Very limited
#> 2 2403721 19586080  Typic Fluvaquents         2  1.000     Very limited
#> 3 2403721 19586081 Ultic Haploxeralfs         5  0.047 Somewhat limited
#> 4 2403721 19586082             Sierra        25  1.000     Very limited
#> 5 2403721 19586083         Urban land        50     NA        Not rated
#>                             reason
#> 1 Slope; Seepage; Depth to bedrock
#> 2                          Seepage
#> 3        Seepage; Depth to bedrock
#> 4                   Slope; Seepage
#> 5                             <NA>

# property by component, no aggregation; using label from lookup table
get_SDA_property('Corrosion of Steel', method = 'NONE', mukeys = 2403721)
#> single result set, returning a data.frame
#>   areasymbol musym                                                   muname
#> 1      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 2      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 3      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 4      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 5      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#>     mukey           compname comppct_r    cokey corsteel
#> 1 2403721         Urban land        50 19586083     <NA>
#> 2 2403721             Sierra        25 19586082 Moderate
#> 3 2403721             Flanly        18 19586079      Low
#> 4 2403721 Ultic Haploxeralfs         5 19586081 Moderate
#> 5 2403721  Typic Fluvaquents         2 19586080     High

# property by horizon, no aggregation; using physical column name
get_SDA_property('sandtotal_r', method = 'NONE', mukeys = 2403721)
#> single result set, returning a data.frame
#>    areasymbol musym                                                   muname
#> 1       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 2       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 3       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 4       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 5       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 6       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 7       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 8       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 9       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 10      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 11      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 12      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 13      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 14      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 15      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 16      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 17      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 18      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#>      mukey    cokey    chkey           compname comppct_r sandtotal_r
#> 1  2403721 19586082 57324675             Sierra        25          65
#> 2  2403721 19586082 57324676             Sierra        25          45
#> 3  2403721 19586082 57324677             Sierra        25          40
#> 4  2403721 19586082 57324678             Sierra        25          40
#> 5  2403721 19586082 57324679             Sierra        25          40
#> 6  2403721 19586079 57324662             Flanly        18          40
#> 7  2403721 19586079 57324663             Flanly        18          40
#> 8  2403721 19586079 57324664             Flanly        18          40
#> 9  2403721 19586079 57324665             Flanly        18          40
#> 10 2403721 19586079 57324666             Flanly        18          45
#> 11 2403721 19586079 57324667             Flanly        18          NA
#> 12 2403721 19586081 57324671 Ultic Haploxeralfs         5          NA
#> 13 2403721 19586081 57324672 Ultic Haploxeralfs         5          69
#> 14 2403721 19586081 57324673 Ultic Haploxeralfs         5          79
#> 15 2403721 19586081 57324674 Ultic Haploxeralfs         5          80
#> 16 2403721 19586080 57324668  Typic Fluvaquents         2          94
#> 17 2403721 19586080 57324669  Typic Fluvaquents         2          93
#> 18 2403721 19586080 57324670  Typic Fluvaquents         2          92
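
As a rough illustration of the "Dominant Condition" point above, the sketch below sums comppct_r by rating class using the five component rows from the first example (values typed in by hand). The grouping logic is an assumption made for illustration only; the SQL behind the aggregated methods may treat "Not rated" components and ties differently.

```r
# Component rows copied from the get_SDA_interpretation(method = "NONE") output above
comp <- data.frame(
  compname  = c("Flanly", "Typic Fluvaquents", "Ultic Haploxeralfs",
                "Sierra", "Urban land"),
  comppct_r = c(18, 2, 5, 25, 50),
  class     = c("Very limited", "Very limited", "Somewhat limited",
                "Very limited", "Not rated"),
  stringsAsFactors = FALSE
)

# Assumed logic: total component percentage per rating class, largest first
pct_by_class <- sort(tapply(comp$comppct_r, comp$class, sum), decreasing = TRUE)

# The class covering the largest share of the map unit
dominant_class <- names(pct_by_class)[1]

print(pct_by_class)
print(dominant_class)
```

Three components share the "Very limited" class and total 45 percent, while the single "Not rated" component (Urban land) covers 50 percent. That breakdown is only visible in the component-level rows.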

brownag commented Apr 9, 2021

A minor issue I have noticed with the SOD translations (changed in this issue, as well as get_SDA_hydric) is there is a little inconsistency / variation in the use of MUKEY versus mukey in columns in query result. Of course there are different conventions depending on what data source we are talking about... but we should probably pick one way of doing this essentially package-wide. I think lowercase might be more prevalent at this point.
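
Until one convention is applied package-wide, a caller can normalize column names before combining results. This is a minimal sketch with two hand-built one-row data frames (values taken from the Flanly component in the examples above); the object names interp and prop are hypothetical.

```r
# One row shaped like each result: note MUKEY vs. mukey
interp <- data.frame(MUKEY = 2403721, cokey = 19586079, rating = 1)
prop   <- data.frame(mukey = 2403721, cokey = 19586079, corsteel = "Low")

# Lowercase all column names so the key columns line up
names(interp) <- tolower(names(interp))
names(prop)   <- tolower(names(prop))

# The join now works on consistently named keys
merged <- merge(interp, prop, by = c("mukey", "cokey"))
print(merged)
```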

brownag commented Apr 9, 2021

Added hzdept_r and hzdepb_r to non-aggregate, horizon-level get_SDA_property results.

Horizons are now sorted by hzdept_r, i.e. in ascending order of top depth within each component, rather than by key values, which often do not follow depth order.

library(soilDB)
get_SDA_property('sandtotal_r', method = 'NONE', mukeys = 2403721)
#> single result set, returning a data.frame
#>    areasymbol musym                                                   muname
#> 1       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 2       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 3       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 4       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 5       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 6       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 7       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 8       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 9       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 10      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 11      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 12      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 13      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 14      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 15      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 16      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 17      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 18      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#>      mukey    cokey    chkey           compname comppct_r hzdept_r hzdepb_r
#> 1  2403721 19586082 57324675             Sierra        25        0        9
#> 2  2403721 19586082 57324676             Sierra        25        9       21
#> 3  2403721 19586082 57324677             Sierra        25       21       52
#> 4  2403721 19586082 57324678             Sierra        25       52       82
#> 5  2403721 19586082 57324679             Sierra        25       82      150
#> 6  2403721 19586079 57324662             Flanly        18        0       10
#> 7  2403721 19586079 57324663             Flanly        18       10       20
#> 8  2403721 19586079 57324664             Flanly        18       20       33
#> 9  2403721 19586079 57324665             Flanly        18       33       66
#> 10 2403721 19586079 57324666             Flanly        18       66       84
#> 11 2403721 19586079 57324667             Flanly        18       84      109
#> 12 2403721 19586081 57324674 Ultic Haploxeralfs         5        0        7
#> 13 2403721 19586081 57324673 Ultic Haploxeralfs         5        7       28
#> 14 2403721 19586081 57324672 Ultic Haploxeralfs         5       28      117
#> 15 2403721 19586081 57324671 Ultic Haploxeralfs         5      117      142
#> 16 2403721 19586080 57324670  Typic Fluvaquents         2        0       15
#> 17 2403721 19586080 57324669  Typic Fluvaquents         2       15       63
#> 18 2403721 19586080 57324668  Typic Fluvaquents         2       63      150
#>    sandtotal_r
#> 1           65
#> 2           45
#> 3           40
#> 4           40
#> 5           40
#> 6           40
#> 7           40
#> 8           40
#> 9           40
#> 10          45
#> 11          NA
#> 12          80
#> 13          79
#> 14          69
#> 15          NA
#> 16          92
#> 17          93
#> 18          94
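
To make the effect of the new sort concrete, here is a small base R sketch using the four Ultic Haploxeralfs horizons from the output above (values typed in by hand). The real ordering happens in the query; the order() calls below only mimic it and are an assumption about how one might reproduce it client-side.

```r
# Ultic Haploxeralfs horizons, copied from the output above
hz <- data.frame(
  cokey       = 19586081,
  chkey       = c(57324671, 57324672, 57324673, 57324674),
  hzdept_r    = c(117, 28, 7, 0),
  hzdepb_r    = c(142, 117, 28, 7),
  sandtotal_r = c(NA, 69, 79, 80)
)

# Sorting on keys puts the deepest horizon first for this component
by_key <- hz[order(hz$cokey, hz$chkey), ]

# Sorting on top depth gives the profile from the surface down
by_depth <- hz[order(hz$cokey, hz$hzdept_r), ]

print(by_key$hzdept_r)
print(by_depth$hzdept_r)
```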

brownag commented Apr 12, 2021

The latest commits add support for vectorized input of requested properties or interpretations when the aggregation method is "NONE".

As before, you can use the human-readable lookup table names or their corresponding SQL column names, but now you can also specify columns that do not have a label, such as the low (*_l) or high (*_h) columns.

At this point, for a single call to get_SDA_property the vector needs to be either all from component or all from chorizon, but I plan to both expand the lookup table options and be a bit smarter about how the component vs. horizon column checking is done.

library(soilDB)
get_SDA_property(c("sieveno200_l", "sieveno200_r", "sieveno200_h"), method = 'NONE', mukeys = 2403721)
#> single result set, returning a data.frame
#>    areasymbol musym                                                   muname
#> 1       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 2       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 3       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 4       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 5       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 6       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 7       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 8       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 9       CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 10      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 11      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 12      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 13      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 14      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 15      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 16      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 17      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 18      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#>      mukey    cokey    chkey           compname comppct_r hzdept_r hzdepb_r
#> 1  2403721 19586082 57324675             Sierra        25        0        9
#> 2  2403721 19586082 57324676             Sierra        25        9       21
#> 3  2403721 19586082 57324677             Sierra        25       21       52
#> 4  2403721 19586082 57324678             Sierra        25       52       82
#> 5  2403721 19586082 57324679             Sierra        25       82      150
#> 6  2403721 19586079 57324662             Flanly        18        0       10
#> 7  2403721 19586079 57324663             Flanly        18       10       20
#> 8  2403721 19586079 57324664             Flanly        18       20       33
#> 9  2403721 19586079 57324665             Flanly        18       33       66
#> 10 2403721 19586079 57324666             Flanly        18       66       84
#> 11 2403721 19586079 57324667             Flanly        18       84      109
#> 12 2403721 19586081 57324674 Ultic Haploxeralfs         5        0        7
#> 13 2403721 19586081 57324673 Ultic Haploxeralfs         5        7       28
#> 14 2403721 19586081 57324672 Ultic Haploxeralfs         5       28      117
#> 15 2403721 19586081 57324671 Ultic Haploxeralfs         5      117      142
#> 16 2403721 19586080 57324670  Typic Fluvaquents         2        0       15
#> 17 2403721 19586080 57324669  Typic Fluvaquents         2       15       63
#> 18 2403721 19586080 57324668  Typic Fluvaquents         2       63      150
#>    sieveno200_l sieveno200_r sieveno200_h
#> 1            36           40           46
#> 2            44           55           60
#> 3            50           56           68
#> 4            56           64           64
#> 5            46           64           64
#> 6            52           71           83
#> 7            37           59           78
#> 8            57           65           76
#> 9            35           46           76
#> 10           44           62           68
#> 11           NA           NA           NA
#> 12            7           14           20
#> 13            6           13           36
#> 14           17           37           47
#> 15           NA           NA           NA
#> 16            8           11           15
#> 17            4           10           13
#> 18            1            3            9
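
Given the all-component or all-chorizon restriction, a caller with a mixed list of columns could split it by table and make one call per group. The sketch below is hypothetical: split_by_table and the small table_of lookup are made up for illustration (they are not part of soilDB), and the table assignments are inferred from the examples in this thread.

```r
# Hypothetical lookup: which SSURGO table each column lives in
table_of <- c(
  corsteel     = "component",
  sandtotal_r  = "chorizon",
  sieveno200_l = "chorizon",
  sieveno200_r = "chorizon",
  sieveno200_h = "chorizon"
)

# Split a mixed vector of column names into one group per table
split_by_table <- function(props, lookup) {
  unknown <- setdiff(props, names(lookup))
  if (length(unknown) > 0) {
    stop("no table known for: ", paste(unknown, collapse = ", "))
  }
  split(props, unname(lookup[props]))
}

groups <- split_by_table(c("corsteel", "sandtotal_r", "sieveno200_r"), table_of)
print(groups)

# Each group could then be requested separately (requires network access):
# res <- lapply(groups, get_SDA_property, method = "NONE", mukeys = 2403721)
```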

Demo of summarizing rating/class/reasons for multiple interpretations in the same query/call. I am still pondering the best way to retain information about the rule name while keeping column names manageable in this "wide" result format.

library(soilDB)
get_SDA_interpretation(c("FOR - Rutting Hazard by Season", "FOR - Soil Compactibility Risk"),
                       method = "NONE", mukeys = 2403721)
#> single result set, returning a data.frame
#>   areasymbol musym                                                   muname
#> 1      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 2      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 3      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 4      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#> 5      CA630  9011 Urban land-Sierra-Flanly complex, 3 to 25 percent slopes
#>     MUKEY    cokey           compname comppct_r rating_FORRuttingHazardbySeason
#> 1 2403721 19586079             Flanly        18                               0
#> 2 2403721 19586080  Typic Fluvaquents         2                               0
#> 3 2403721 19586081 Ultic Haploxeralfs         5                               0
#> 4 2403721 19586082             Sierra        25                               0
#> 5 2403721 19586083         Urban land        50                              NA
#>   class_FORRuttingHazardbySeason reason_FORRuttingHazardbySeason
#> 1                    Not limited                              NA
#> 2                    Not limited                              NA
#> 3                    Not limited                              NA
#> 4                    Not limited                              NA
#> 5                      Not rated                              NA
#>   rating_FORSoilCompactibilityRisk class_FORSoilCompactibilityRisk
#> 1                            0.865                          Medium
#> 2                            0.000                             Low
#> 3                            0.000                             Low
#> 4                            0.737                          Medium
#> 5                               NA                       Not rated
#>                                                                                                                                  reason_FORSoilCompactibilityRisk
#> 1 Rock fragments, 0-12 inches; Soil structure grade, 0-12 inches; Soil texture, 0-12 inches; Organic matter content, 0-30 cm; Bulk density-compactibility to 30cm
#> 2                            Bulk density-compactibility to 30cm; Rock fragments, 0-12 inches; Soil structure grade, 0-12 inches; Organic matter content, 0-30 cm
#> 3                            Organic matter content, 0-30 cm; Soil structure grade, 0-12 inches; Rock fragments, 0-12 inches; Bulk density-compactibility to 30cm
#> 4 Rock fragments, 0-12 inches; Soil structure grade, 0-12 inches; Soil texture, 0-12 inches; Organic matter content, 0-30 cm; Bulk density-compactibility to 30cm
#> 5                                                                                                                                                            <NA>
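
The wide column names above look like the rule name with every non-alphanumeric character removed, prefixed by rating_, class_ or reason_. The helper below is a sketch that reproduces those names under that assumption; wide_names is a hypothetical function and the actual implementation may differ.

```r
# Hypothetical helper: build the wide-format column names for a rule name
wide_names <- function(rulename) {
  suffix <- gsub("[^A-Za-z0-9]", "", rulename)  # drop spaces, hyphens, etc.
  paste0(c("rating_", "class_", "reason_"), suffix)
}

rules <- c("FOR - Rutting Hazard by Season", "FOR - Soil Compactibility Risk")
cols <- unlist(lapply(rules, wide_names))
print(cols)
```

A mapping like this could also be kept alongside the result (for example as a named vector from column suffix back to the full rule name), so that the readable rule name is not lost when the column names are shortened.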

dylanbeaudette commented

> A minor issue I have noticed with the SOD translations (changed in this issue, as well as get_SDA_hydric) is there is a little inconsistency / variation in the use of MUKEY versus mukey in columns in query result. Of course there are different conventions depending on what data source we are talking about... but we should probably pick one way of doing this essentially package-wide. I think lowercase might be more prevalent at this point.

We should use lower case whenever possible. SQL Server doesn't (currently) care, but there will be a time when it will cause problems (next 2-4 years with planned upgrades / possible shift from MSSQL -> PGSQL).

dylanbeaudette commented

This looks pretty good to me, but I don't have much time for a detailed review. Don't let that stop a timely merge when you are confident in the results.

It would be good to work with @cferguso in the coming months to ensure consistency across this, the original SOD, and his new Python module.

brownag merged commit 5b38bdf into master on May 7, 2021
brownag deleted the SODSDAtools branch on May 7, 2021 at 19:56