Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interface of exported functions in package #41

Closed
xhdong-umd opened this issue Dec 4, 2017 · 24 comments
Closed

Interface of exported functions in package #41

xhdong-umd opened this issue Dec 4, 2017 · 24 comments

Comments

@xhdong-umd
Copy link
Contributor

xhdong-umd commented Dec 4, 2017

@jmcalabrese @chfleming
It took me a lot of time to adjust the website, update the help documents.

UPDATE: With most functions completed and documented, it's easier to just use the reference website page for current exported functions:

screen shot 2017-12-12 at 3 56 44 pm

features that not planned to export

  • I think the plot 4 individual in visualization page only make senses with a slider.
  • outlier page and time subsetting page are heavily interactive so I don't think it's very useful to generate command line functions for them. Though the function of calculating distance and speed outliers are exported.
@xhdong-umd
Copy link
Contributor Author

I implemented plot_loc and found a serious bug. After some plot in RStudio, the shiny app will have all the plots shown in RStudio plot windows instead of app itself.

It only appears when app is saving plot into .png. This should not happen, I'm not sure if it's problem with ggsave or shiny or RStudio. I have asked question in forums, hopefully can be fixed quickly.

@xhdong-umd
Copy link
Contributor Author

I found a workaround to avoid the problem. The problem is caused by ggsave and Shiny.

@xhdong-umd
Copy link
Contributor Author

plot functions in visualization page are finished, with help updated and linked in first post.

@xhdong-umd
Copy link
Contributor Author

plot for variograms is done.

@xhdong-umd
Copy link
Contributor Author

I moved individual code examples in each function into a vignette, since most of them are part of a workflow, and it's easier to explain them under one document.

@xhdong-umd
Copy link
Contributor Author

plot for home range and occurence are done.

Though it will take quite some efforts to write example code for this part, since I need to duplicate the in app process of maintaining model list and names etc with a different workflow as simple as possible.

@xhdong-umd
Copy link
Contributor Author

I have finished on map page, and updated the repo and reference website. With almost all functions finished and documented, I updated the first post with the reference website page. You can click the reference page to look at help on every function.

I decided to move code examples into a single vignette so there is no example in help now. It will take some time to write the vignette, because I need to reproduce a lot of user selections and a lot of operations in app which are not exported in package yet (and probably need a different usage).

@xhdong-umd
Copy link
Contributor Author

xhdong-umd commented Dec 21, 2017

@jmcalabrese @chfleming
I just finished the main vignette. To establish a workflow with package functions, I have to create quite some new functions for user's convenience, which duplicate some app behavior in different ways.

You can have a look at the vignette, hopefully it's not hard to read.

I have updated the repo, the package website, and set the package version to 0.1.0.

@xhdong-umd
Copy link
Contributor Author

I just updated the hosted app to match the package version.

Next I'll look at the map animation possibilities before our next meeting.

@jmcalabrese
Copy link

@xhdong-umd: Thanks for putting the package vignette together. Some comments and edits to the vignette are in the attached .pdf. Let me know if you have question or want to discuss any of it.
Package Usage • ctmmweb_jmc.pdf

@xhdong-umd
Copy link
Contributor Author

xhdong-umd commented Jan 3, 2018

@jmcalabrese Thanks for the edits, the vignettes is in a much better shape now. I probably should take more time to review and edit it again, though it was hard to resist the urge of finishing it before the holiday vacation starts 😅.

Here is a diff of the RMarkdown. You can choose among source diff mode, rich diff mode or view the file directly.

For some of your comments:

  • There was a print out of the information data.table in the code but lacked description text. I added a sentence and some references about it.
  • The dt variable name came from data.table usage convention. I changed it to loc_data.
  • There is a parameter to turn off parallel mode, I added it in comments. I can add another parameter to set the upper limit of cores used. Do you want it to be used_core = 6 or reserved_core = 2? I think a number of reserved cores is better since it can be used across different platforms. And maybe we should make it to reserve 1 core by default?
  • For multiple models fitted for each animal if they are validate candidates, I was referring the fact that ctmm.fit can return single best model (if all other models can be just ignored) or multiple candidate models for one animal. I think just removing if they are validate candidates and stating the multiple result possibility should be OK.
  • I added some simple text for home range and occurrence section.
  • The home range map function need some parameters, I added a brief description in code comments.

@xhdong-umd
Copy link
Contributor Author

I have added a reserved_cores parameter to the parallel functions.

Though one tricky point in parallel core setting is that there are physical cores and logical cores. A windows pc with 4 physical cores often have 8 logical cores via hyperthread. The usual advice is to use physical cores - 1, however I found it's better to use cores = list length if the number is not too big. Because running 8 jobs on 8 threads is always faster than 8 jobs on 3 threads, even if you only have 4 cores, since the jobs are much more balanced with more threads.

To make sure some physical cores are not used I cannot use logical core number in this mode. So it's possible the default mode used 8 logical cores with 4 physical cores machine, but with 1 reserved core it can only use 3 cores.

@chfleming
Copy link
Contributor

chfleming commented Jan 7, 2018

Hi @xhdong-umd , nice work. I have some minor suggestions. @jmcalabrese may feel free to chime in.

  1. There are some methods like plot_vario and plot_ud that have similar functionality but act on different class objects. I think the R way of doing this would be to define a generic and then define different dispatch methods. For lists of objects, which could be a list of variograms or a list of UDs, I create a list dispatch method to handle the final dispatch (see plot.list in generic.R of ctmm).
  2. For some methods like plot_vario that add bells and whistles to base ctmm functions (plot.variogram) and take the same objects as inputs, they could possibly be integrated into base ctmm functions---specifically if plot_vario is just using par(mfrow=...) and title, then we could add the mfrow argument to plot.variogram that would plot lists of variograms in a grid instead of ontop of each other with their names as titles.
  3. Following on (1), I think any class specific function like data_sample would be better as a class dispatch method for a generic, even if that's the only method with that functionality. In the case of data_sample, the logical thing would be making a sample.telemetry except that base sample does not have a ... argument allowing it to be properly overloaded (and sample.list already exists with different functionality). But you could make a differently named generic, like resample or something.
  4. Following on (1) & (3), I think most of the C-like underscores _ in the function names could be gotten rid of in favor of R-like functions with parsimoniously named generics that are overloaded for different object classes.

@xhdong-umd
Copy link
Contributor Author

xhdong-umd commented Jan 8, 2018

@chfleming I'll try to make some methods to generic. Some comments:

  1. Some function will have a generic name but additional parameters specific to itself. So user can use a generic name with default parameters which is very familiar. However if they want more control, they need to read help and use specific parameters. And user need to know to look for ?plot.vario for help when they are using plot(vario_list). Though I think there is no way to work around this...
# default usage
plot(vario_list)

# more parameters
plot(vario_list, model_list = NULL,
                       fraction = 0.5, relative_zoom = TRUE, cex = 0.65,
                       model_color = "blue", columns = 2)

# for home range
plot(UD_list)

# more parameters
plot(UD_list, level_vec = 0.95, columns = 2, cex = 0.65,
                    tele_list)
  1. plot_vario is taking a list of variograms, and the names of list items should have some information in empirical variogram mode. I have no problem to integrate it into base ctmm, though I'm not sure if that will bring too much additional requirements (see the help page of plot_vario), since there are 3 modes depend on the model_list parameter.
  2. For the _ in names. I saw extensive _ usages in some of R packages (like stringr and some of Hadley packages), so I adopted this style after comparing the other options. We could remove some of them with generic names, but there probably will be quite some left.

@chfleming
Copy link
Contributor

chfleming commented Jan 8, 2018

@xhdong-umd

  1. Yes, you make the generic as generic(universal_args,...) and then the class specific methods as generic.class(universal_args,class_args,...). The generic arguments need to be arguments that every method will always have. Unfortunately, there are many R functions that take up good names but don't make good generics because of their argument structure. E.g., sample.
    I have an example S3 generic emulate if you want to see. In 1.R I have the S3 generic defined (I keep this stuff separate in the first alphabetical file due to peculiarities with S4---some generics that I import are S4 generics). Then in emulate.R I define the class methods. In NAMESPACE I export the generic and the S3 methods. In man/emulate.Rd I document the generic and S3 methods inline with CRAN standards. S4 is a bit different and I actively avoid it.
  2. That's fine.
  3. Its a matter of taste. Personally, I think its good for internal code, but bad for the end user to see.

@xhdong-umd
Copy link
Contributor Author

xhdong-umd commented Jan 9, 2018

@chfleming I updated some dt parameters to loc_data to avoid the naming conflict mentioned by @jmcalabrese .

For the functions can be made generic, I went through the exported functions:

  • tele_list_info, this is similar to summary(buffalo) but with different columns. To make it a generic would conflict with summary in ctmm. I changed the name to report.
  • merge_tele, merge has a different meaning from merge in base R. I don't think it's a good idea to have it like merge(buffalo). We could do as.data.table(buffalo) since this is the expected function to convert an object into data.table. However they usually convert a list into data.table with each list item as a column, which is different from our behavior. We will need to override that, which is not optimal. I changed the name to combine now.
  • plot_loc, plot_loc_facet, plot_time have same parameter, so I don't think they can be made into generic. Unless we add additional parameter like this plot(loc_data), plot(loc_data, mode = "facet"), plot(loc_data, mode = "time") (another option is hist(loc_data) but the plot is made from ggplot instead of base histogram, I'm not sure if that's OK). Do you like this?
  • summary_model_fit, this can be made into generic and become summary(model_fit_res). Though the behavior of summary on a list of model fit result (a data frame) is different from summary on single model fit result (some summary from ctmm), it could be a little bit confusing. Now it's named summary_tried_models.
  • list_model_fit is renamed to flatten_models. There is a flatten in rlang.
  • plot_vario. If we integrate it into ctmm, what's the best approach?
    • currently plot.list in ctmm just dispatch to plot.variogram. I think it make sense to direct that to plot.variogram_list which is just plot_vario (I'm not sure if plot.variogram.list will work). The function should not be a part of plot.variogram because it called plot.variogram.
    • We can move the function to ctmm, but all my future possible changes will need to go through ctmm.
    • or ctmm could call ctmmweb::plot.variogram_list. This will introduce ctmmweb dependencies even plot_vario don't really need any dependencies.
    • or we can keep the function in ctmmweb, user need to load ctmmweb before using plot(vario_list)
  • plot_ud plot each home range separately, while plot(home_range_list) will plot them together. Making plot_ud generic will replace that behavior. Is that desired?
  • I'm not sure how to remove _ in map functions.
  • par_lapply cannot be renamed because there is parLapply in parallel.
  • I renamed calculate_speed into calc_speed to make it shorter.
  • resample is a good name because it means scale down images which is quite similar to our functions. We can make it into resample(buffalo[[1]]) and resample(buffalo).
    Though this will have same name with raster::resample, and loading raster package could override our functions. I'm using pick now since that kind of remind user to use a point count parameter.

@xhdong-umd
Copy link
Contributor Author

I realized if I move some functions to ctmm and I need to make some changes to them, I'll need the changes to go live in ctmm for the app to make use of those functions. That means the github master branch of ctmm must be updated immediately, instead of the original plan of only change ctmm infrequently in major versions.

One alternative could be make a copy to ctmm but still keep the local version in app. So my app can still update anytime when needed, and the changes in ctmm don't need to be as frequent. We can also only export the ctmm version of functions and hide the ctmmweb version, so user will only see one version to avoid confusions.

@chfleming
Copy link
Contributor

I think we can copy the code to ctmm after making some necessary changes, and then remove/refactor the corresponding ctmmweb code after some delay to prevent conflicts.

In addition to having a mc.cores like argument, maybe the code to calculate a good number of default cores/threads should be in a separate function, to be used like plapply(list,fn,cores=defaultCores()). In some cases with rapid function calls, I need to limit to 1 cores in Windows because of socket overhead, but I guess I could just stick to mclapply in those cases instead.

@xhdong-umd
Copy link
Contributor Author

I'm not sure if I understand the suggestion.

  • Approach 1: Move code to ctmm
    Let's say we have decided to move plot_vario into ctmm, and ctmmweb will just call ctmm::plot_vario. Later I need to make changes to plot_vario, the changes need to go live in ctmm master branch so that ctmmweb can use it properly.

  • Approach 2: Copy code to ctmm
    There will be plot_vario in both ctmm and ctmmweb. The ctmm one is exported, documented, open to user. The ctmmweb version is internal used and not exported. Usually they are kept synchronized, but I can make changes to ctmmweb version frequently, and only update ctmm version less frequently. Unless there is a serious bug need to be fixed, some delay in ctmm version should not create problem for user.

  • Approach 3: ?
    I'm not sure I understand it, did you mean the app use ctmm version when it's up to date, switch to a local version when changes are needed, then switch back to ctmm version when it's updated again?

In the process of creating interface and renaming/refactoring functions, I found I need to check every usage of changed functions, modify them and test the app thoroughly. That was time consuming and easy to introduce errors can go unfound if not tested.

For parallel function, there are some code detecting platform as windows or linux/mac. If we abstract the core calculating part, both the new function and the original function will need this platform detection code and create a little bit duplication (although not much).

If you want to limit to 1 cores, why don't just use cores = 1, or even just parallel = FALSE to avoid the overhead completely? I don't see how abstracting the core calculation function can be useful for this case. I'll just make cores = NULL as default. Any real input will override it, and no input means using our core calculation.

@chfleming
Copy link
Contributor

Sorry, I was just thinking about non-exported functions like plapply.

@xhdong-umd
Copy link
Contributor Author

xhdong-umd commented Jan 18, 2018

My plan with the integration:

  • There need to be some entry points in ctmm code. For example we can change the method dispatch on plot.variogram.list to our new function plot_vario (or whatever name you prefer).
  • I'll put all my actual code in separate files from ctmm code, even logically they could be put in some ctmm files like variogram.R. Later we should only need to update the actual code, not the entry points. And all the changes should only happen in files separate from ctmm code, so it'll easier for you to merge them.

@xhdong-umd
Copy link
Contributor Author

I found the package installation need to download the whole tar ball now, which include a 10M folder of package website. I tried to exclude that folder, finally got a solution to satisfied multiple requirement from git, github, devtools, which involves creating 3 branches and some jumping hoops inside git.

@xhdong-umd
Copy link
Contributor Author

There is still some problem with git. It often took a lot of time when my requirement in git is a little bit complex...

@xhdong-umd
Copy link
Contributor Author

xhdong-umd commented Feb 8, 2018

The parallel function is updated to reflect new changes on cores parameter. The core code is integrated into ctmm but ctmm and ctmmweb will maintain different customized versions since their needs are also different.

The package site is updated with the current version, and I tagged it as v0.1.0 since it's first major version with the package structure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants