Track R-GSOC-2016 Progress #2
Comments
Qin, congratulations! As you probably already know, the RE2 project has been accepted.
Project Status Report
May 19, 2016

Changes during Community Bonding

1. Set up continuous integration and code coverage tests
This package now runs CI on Mac, Linux, and Windows, and code coverage is tracked by codecov.io.

2. More docs and tests
Added more docs and test cases for new and existing functions.

3. Documentation Pages
Initial work on the documentation pages: https://qinwf.github.io/re2r_doc/

4. Parallel Support
All pattern matching routines have been implemented to work in parallel with RcppParallel.

5. Add split and locate functions
Add

6. Add regular expression visualization with regexper library
Add

7. Improve Performance
Use Google Performance Tools to profile the compiled C++ code. Rewrite some critical code using raw R-C API to avoid the overhead of

Issue Status

#3 Solaris build
There will be changes now and then. We can test Solaris in the future.

#4 Long Vector Tests
Initial test cases were added.

#5 Match failure when LC_COLLATE is not UTF-8
Use
Initial test cases were added.

#6 Question: argument order
Change order from

#7 Using SET_STRING_ELT and Rf_mkCharLenCE to handle output string encoding
Changes have landed. One case still needs care: Rcpp exception strings use the native encoding instead of UTF-8, so if a pattern cannot be parsed, the error message raised from Rcpp may contain strange characters. To fix this, we can remove the Rcpp dependency in the near future. Most of the code is already independent of Rcpp, so this should be easy to fix.

#8 Handle NA_STRING
All pattern matching routines have been implemented, including
Initial test cases were added.

Future Plan

1. Follow the timeline in the proposal
See the proposal.

2. Make functions vectorized
Make functions accept multiple patterns with multiple strings.

3. Add more test cases and close existing issues
Add more test cases and improve the test coverage ratio.

4. Maybe some new ideas and refined APIs

Thanks for any help and advice!
You're way ahead of the timeline! Theoretically, you should now "Look for examples of how regular expressions are used in existing R packages." 😛 Congrats!
About vectorizing: I think it is mainly necessary to vectorize the subject (not the pattern), since the typical usage is "apply this single regex to this set of subjects".
On the other hand, @qinwf could make the API as similar to stringi (and hence stringr) as possible. Who knows, maybe re2r will some day be wrapped by stringr too.
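For reference, the suggestion above to vectorize over the subject (one pattern, many strings) could look roughly like the sketch below. It is only an illustration under stated assumptions, not re2r's actual code: `detect_all`, `Subject`, and `Match` are hypothetical names, `std::regex` stands in for RE2 so the example compiles without extra libraries, and a missing subject (R's `NA_STRING`) is modelled as `std::nullopt`. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <vector>

// A missing subject (NA in R) is modelled as std::nullopt.
using Subject = std::optional<std::string>;
// Three-valued result: true, false, or missing (NA).
using Match = std::optional<bool>;

// Hypothetical helper, not part of re2r: apply ONE pattern to MANY subjects.
// std::regex is an assumption that stands in for RE2 in this sketch.
std::vector<Match> detect_all(const std::string& pattern,
                              const std::vector<Subject>& subjects) {
    const std::regex re(pattern);  // compile once, reuse for every subject
    std::vector<Match> out;
    out.reserve(subjects.size());
    for (const Subject& s : subjects) {
        if (!s) {
            out.push_back(std::nullopt);  // NA in, NA out
        } else {
            out.push_back(std::regex_search(*s, re));
        }
    }
    assert(out.size() == subjects.size());  // one result per subject
    return out;
}
```

Compiling the pattern once outside the loop is the point of vectorizing over the subject only: the per-string cost is then just the search itself. Propagating a missing input to a missing output mirrors how stringi-style functions generally treat `NA`.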