Commit

Merge pull request #125 from beniaminogreen/cran_resubmission
Preparing for CRAN Release
beniaminogreen authored Jul 2, 2024
2 parents 0815677 + bc05231 commit 4828a1a
Showing 8 changed files with 35 additions and 133 deletions.
9 changes: 4 additions & 5 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: zoomerjoin
Title: Superlatively Fast Fuzzy Joins
Version: 0.1.2.9000
Version: 0.1.5
Authors@R: c(
person("Beniamino", "Green", , "beniamino.green@yale.edu", role = c("aut", "cre", "cph")),
person("Etienne", "Bacher", email = "etienne.bacher@protonmail.com", role = "ctb",
@@ -13,7 +13,7 @@ Description: Empowers users to fuzzily-merge data frames with millions or tens o
License: GPL (>= 3)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.1
RoxygenNote: 7.3.2
SystemRequirements: Cargo (>= 1.56) (Rust's package manager), rustc
Imports:
collapse,
@@ -35,12 +35,11 @@ Suggests:
tidyverse,
vdiffr
Config/testthat/edition: 3
URL: https://beniamino.org/zoomerjoin/
BugReports: https://github.com/beniaminogreen/zoomerjoin/issues/
URL: https://beniamino.org/zoomerjoin/, https://github.com/beniaminogreen/zoomerjoin
BugReports: https://github.com/beniaminogreen/zoomerjoin/issues
VignetteBuilder: knitr
Depends:
R (>= 2.10)
LazyData: true
LazyDataCompression: xz
Config/rextendr/version: 0.3.1.9000

10 changes: 5 additions & 5 deletions NEWS.md
@@ -1,17 +1,17 @@
# zoomerjoin (development version)
# zoomerjoin 0.1.5

## New features

* Several performance improvements (#101, #104).
* Added support for joining based on hamming distance (#100).
* Bumped `extendr` to v0.7.0 (#121)

## Bug fixes

* When `clean = TRUE`, strings were not coerced to lower case. This is now the
case (#105).
* Fix argument `progress`, which didn't print anything when it was `TRUE` (#107).
* Fixed a bug where strings were not coerced to lower case when `clean = TRUE` (#105).
* Fixed the `progress` argument, which was inoperative (#107).

# zoomerjoin 0.1.2
# zoomerjoin 0.1.4

* Submitted Package to CRAN
* Add support for new `join_by()` syntax
1 change: 0 additions & 1 deletion R/extendr-wrappers.R
@@ -6,7 +6,6 @@
# This file was created with the following call:
# .Call("wrap__make_zoomerjoin_wrappers", use_symbols = TRUE, package_name = "zoomerjoin")

#' @docType package
#' @usage NULL
#' @useDynLib zoomerjoin, .registration = TRUE
NULL
48 changes: 24 additions & 24 deletions README.md
@@ -30,7 +30,7 @@ Please be aware that you will have to have Cargo (the rust toolchain and
compiler) installed to build the package from source.

``` r
install.packages('zoomerjoin')
install.packages("zoomerjoin")
```

### Installing from R-Universe:
@@ -138,7 +138,7 @@ I start with two corpuses I would like to combine, `corpus_1`:

``` r
corpus_1 <- dime_data %>%
head(500)
head(500)
names(corpus_1) <- c("a", "field")
corpus_1
```
@@ -162,7 +162,7 @@ And `corpus_2`:

``` r
corpus_2 <- dime_data %>%
tail(500)
tail(500)
names(corpus_2) <- c("b", "field")
corpus_2
```
@@ -205,7 +205,7 @@ vignette](https://beniamino.org/zoomerjoin/articles/guided_tour.html).
``` r
set.seed(1)
start_time <- Sys.time()
join_out <- jaccard_inner_join(corpus_1, corpus_2, n_gram_width=6, n_bands=20, band_width=6)
join_out <- jaccard_inner_join(corpus_1, corpus_2, n_gram_width = 6, n_bands = 20, band_width = 6)
```

## Warning in jaccard_join(a, b, mode = "inner", by = by, salt_by = block_by, : A pair of records at the threshold (0.7) have only a 92% chance of being compared.
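
The 92% figure in the warning above is consistent with the standard locality-sensitive-hashing banding formula, in which a pair with Jaccard similarity `s` is compared if it collides in at least one of `n_bands` bands of `band_width` hashes each. The sketch below is a rough check under that assumption; `lsh_collision_prob` is a hypothetical helper written for illustration and is not claimed to be how the package computes the value.

```r
# Hypothetical helper (not part of zoomerjoin): probability that two records
# with Jaccard similarity `s` share at least one LSH band, assuming the
# standard banding formula 1 - (1 - s^band_width)^n_bands.
lsh_collision_prob <- function(s, n_bands, band_width) {
  1 - (1 - s^band_width)^n_bands
}

# Settings taken from the README call: threshold 0.7, 20 bands of width 6.
p <- lsh_collision_prob(s = 0.7, n_bands = 20, band_width = 6)
print(round(p, 2))
```

With these settings the result rounds to 0.92, matching the warning. Under the same formula, raising `n_bands` increases the chance that a pair at the threshold is compared, at the cost of more candidate comparisons.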
@@ -217,7 +217,7 @@ join_out <- jaccard_inner_join(corpus_1, corpus_2, n_gram_width=6, n_bands=20, b
print(Sys.time() - start_time)
```

## Time difference of 0.01455116 secs
## Time difference of 0.03253984 secs

``` r
print(join_out)
@@ -226,25 +226,25 @@ print(join_out)
## # A tibble: 19 × 4
## a field.x b field.y
## <dbl> <chr> <dbl> <chr>
## 1 216 kent county republican finance committee 607 lake co
## 2 238 4th congressional district democratic party 518 16th co
## 3 292 bill bradley for u s senate '84 913 bill br
## 4 378 guarini for congress 1982 606 guarini
## 5 232 republican county committee of chester county 710 republi
## 6 387 committee to re elect congressman staton 805 committ
## 7 122 tarrant county republican victory fund 761 lake co
## 8 378 guarini for congress 1982 883 guarini
## 9 238 4th congressional district democratic party 792 8th con
## 10 88 scheuer for congress 1980 667 scheuer
## 11 45 dole for senate committee 623 riegle
## 12 87 kentucky state democratic central executive committee 639 arizona
## 13 319 7th congressional district democratic party of wisconsin 792 8th con
## 14 478 united democrats for better government 642 democra
## 15 163 davies county republican executive committee 852 warren
## 16 230 pipefitters local union 524 998 pipefit
## 17 216 kent county republican finance committee 719 harford
## 18 302 americans for good government inc 910 america
## 19 35 solarz for congress 82 671 solarz
## 1 88 scheuer for congress 1980 667 scheuer
## 2 35 solarz for congress 82 671 solarz
## 3 378 guarini for congress 1982 883 guarini
## 4 163 davies county republican executive committee 852 warren
## 5 87 kentucky state democratic central executive committee 639 arizona
## 6 302 americans for good government inc 910 america
## 7 216 kent county republican finance committee 719 harford
## 8 319 7th congressional district democratic party of wisconsin 792 8th con
## 9 122 tarrant county republican victory fund 761 lake co
## 10 238 4th congressional district democratic party 792 8th con
## 11 387 committee to re elect congressman staton 805 committ
## 12 478 united democrats for better government 642 democra
## 13 45 dole for senate committee 623 riegle
## 14 216 kent county republican finance committee 607 lake co
## 15 230 pipefitters local union 524 998 pipefit
## 16 232 republican county committee of chester county 710 republi
## 17 292 bill bradley for u s senate '84 913 bill br
## 18 378 guarini for congress 1982 606 guarini
## 19 238 4th congressional district democratic party 518 16th co

Zoomerjoin is able to quickly find the matching columns without
comparing all pairs of records. This saves more and more time as the
4 changes: 0 additions & 4 deletions _pkgdown.yml
@@ -52,7 +52,3 @@ reference:
- title: Data
contents:
- dime_data

- title: Miscellaneous
contents:
- zoomerjoin-package
13 changes: 2 additions & 11 deletions cran-comments.md
@@ -1,12 +1,3 @@
## Resubmission
This is a resubmission. In this version I have:

* Added DOI's and author names to DESCRIPTION file.
* Removed usage of installed.packages to detect if optional dependency `igraph` is installed.

There is one note about possibly misspelled words in the DESCRIPTION. These are author names.

Many thanks for your help!
Ben

## R CMD check results

0 errors | 0 warnings | 0 notes
30 changes: 0 additions & 30 deletions man/zoomerjoin-package.Rd

This file was deleted.
