aqp::harmonize - method for profile level denormalization (of Lo-RV-Hi or similar) #164

Closed
brownag opened this issue Sep 3, 2020 · 3 comments

Comments

@brownag

brownag commented Sep 3, 2020

soilDB issue 140 raises the problem of comparing SPC data from different sources, as well as comparing "similar" attributes within a single data source:

ncss-tech/soilDB#140

If columns reflecting a similar property within a layer have different names (e.g. socQ05, socQ50, socQ95; clay, clayQ50, clay_spline, clayQ50_spline), it is inconvenient to show them on a common scale or to analyze them in long form within a single vector. In general this is by design, but for plots/sketches in particular it becomes an issue.

Such a method would encapsulate what I have seen @dylanbeaudette do before, and what I show in the Gist (https://gist.github.com/brownag/d69b899253eef505e1771e7adbef37ad). In my gist, rather than a range, I compare different data sources/splines of the RV by unioning separate SPCs that share a common [calculated] attribute finalclay. I don't think doing it manually is particularly difficult in a simple case, but it would become cumbersome when dealing with multiple properties or many data sources.

The three-statistic representation of variation / common scale for a property across Low, RV, High is important for showing "variation" in a concept. I think what Dylan described in part 3 of his comment on ncss-tech/soilDB#140 is a matter of dropping/renaming specific columns from the horizon data (those that match a set of patterns), appending a suffix to the profile IDs for each set to make them unique, then union-ing the result [or returning a list of SPCs]. This sequence of operations isn't "fetching" of the data or specific to the SoilGrids data model -- it is a view, and it can be implemented generically in terms of aqp methods (profile_id<-, union). The method would apply to any product [or stack of products] where multiple values are reported for a single property*layer and stored in a single "parent" SPC.
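
A minimal base-R sketch of that sequence (select and rename one column per statistic, make the profile IDs unique, then combine), using plain data.frames rather than SoilProfileCollection objects. The helper name `denormalize_hz`, its arguments, and the toy data are hypothetical; this is not the aqp::harmonize implementation, and `rbind` only stands in for the union of SPCs.

```r
# Toy horizon data: one profile, two layers, three statistics for clay.
hz <- data.frame(
  id     = c("p1", "p1"),
  top    = c(0, 10),
  bottom = c(10, 30),
  clay_l = c(10, 18),
  clay_r = c(15, 24),
  clay_h = c(20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one copy of the horizon data per source column.
# `columns` is a named character vector, e.g. c(low = "clay_l", ...).
denormalize_hz <- function(h, idname, depths, new.name, columns) {
  pieces <- lapply(names(columns), function(grp) {
    # keep profile ID, depths, and the single source column for this group
    d <- h[, c(idname, depths, columns[[grp]]), drop = FALSE]
    # rename the source column (last one selected) to the common name
    names(d)[ncol(d)] <- new.name
    # suffix the profile IDs so each copy becomes a distinct profile
    d[[idname]] <- paste(d[[idname]], grp, sep = "_")
    d
  })
  # stack the copies (stand-in for a union of SPCs)
  do.call(rbind, pieces)
}

long <- denormalize_hz(
  hz,
  idname   = "id",
  depths   = c("top", "bottom"),
  new.name = "clay",
  columns  = c(low = "clay_l", rv = "clay_r", high = "clay_h")
)
```

With aqp loaded, a data.frame shaped like `long` could then be promoted to an SPC with `depths(long) <- id ~ top + bottom`, so that all three statistics share the single `clay` column.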

I'll post a prototype of this soon.

@brownag brownag changed the title method for profile level denormalization (of Lo-RV-Hi or similar) aqp::harmonize - method for profile level denormalization (of Lo-RV-Hi or similar) Sep 4, 2020
brownag added a commit that referenced this issue Sep 4, 2020
brownag added a commit that referenced this issue Sep 4, 2020
@dylanbeaudette

Thanks. This is a far more general solution than what I had put together in my private soilgrids-related functions.

The following does not work as expected. Did I miss something?

library(aqp)
library(soilDB)

x <- fetchSDA(WHERE = "cokey = '19623334'", duplicates = TRUE)
z <- harmonize(x, x.names = list(clay = c(low = 'claytotal_l', rv = 'claytotal_r', high = 'claytotal_h')))

@brownag

brownag commented Sep 5, 2020

I appreciate you testing this out with a realistic example -- it points out a far more problematic issue with my recent work. I admittedly got so caught up in "generalizing" that I didn't try it with fetchSDA...

Incidentally, the above bug reveals a rather hefty inconsistency and assumption on my part.

aqp:::.data.frame.j was not properly designed for re-arranging the column names [tangential fix here: e59258a]. Presumably in some cases this could result in corrupt show output? I don't think I ever noticed that.

In harmonize, there is an explicit order of the "preserved" columns, and the fetchSDA result breaks it. The first column of the fetchSDA horizon data is chkey, with cokey much later [in the middle of the table]. This tangled things up where I was assuming the columns got rearranged correctly [idname, top depth, bottom depth, harmonized columns, keep columns, hzidname], as they would be with data.frame [, j].

Had I fully developed the tests to exercise these parts in situations that require re-arranging, I probably would have come across this... but that wouldn't have happened until some time next week at the earliest.

dat <- data.frame(a=1, b=2)
dat[,c("b","a")]
#  b a
#1 2 1
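
The column order described above can be built by name rather than by position, so it does not depend on how the source table happens to be arranged. A small base-R sketch, with a made-up table laid out like the fetchSDA case (horizon ID first, profile ID in the middle); the variable names are hypothetical and this is not the code from the fix.

```r
# Toy horizon table: horizon ID first, profile ID in the middle.
h <- data.frame(
  chkey   = c("h1", "h2"),
  top     = c(0, 10),
  clay    = c(15, 24),
  cokey   = c("c1", "c1"),
  bottom  = c(10, 30),
  texture = c("l", "cl"),
  stringsAsFactors = FALSE
)

idname     <- "cokey"
depths     <- c("top", "bottom")
harmonized <- "clay"
hzidname   <- "chkey"

# everything not explicitly placed is treated as a "keep" column
keep <- setdiff(names(h), c(idname, depths, harmonized, hzidname))

# target order: idname, top depth, bottom depth, harmonized, keep, hzidname
target <- c(idname, depths, harmonized, keep, hzidname)

# select by name; base data.frame [, j] returns columns in the order given
h2 <- h[, target, drop = FALSE]
names(h2)
```

Selecting by name this way gives the same layout whether cokey is the first column or sits in the middle of the table.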

The fix is here: c519a19; now works as expected:

library(aqp)
library(soilDB)

x <- fetchSDA(WHERE = "cokey = '19623334'", duplicates = TRUE)
z <- harmonize(x, x.names = list(clay = c(low = 'claytotal_l', rv = 'claytotal_r', high = 'claytotal_h')))

plot(z, color = "clay", plot.order = c(2,3,1))

[image: output of the plot() call above, profiles colored by clay]

@dylanbeaudette

Seems like we are done here, yes?
