Releases: tidyverse/dtplyr
dtplyr 1.3.1
dtplyr 1.3.0
Breaking changes
- dplyr and tidyr verbs no longer dispatch to dtplyr translations when used
  directly on data.table objects. `lazy_dt()` must now explicitly be called by
  the user (#312).
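As an illustration of the change above, a minimal sketch of the explicit wrapping step. The table `dt` and its columns `g` and `x` are made up for this example and do not come from the release notes.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical example data
dt <- data.table(g = c("a", "a", "b"), x = c(1, 2, 3))

# Wrap the data.table explicitly; the verbs that follow then build a
# dtplyr translation rather than operating on the data.table directly
totals <- dt %>%
  lazy_dt() %>%
  group_by(g) %>%
  summarise(total = sum(x)) %>%
  as.data.table()
```

Here `as.data.table()` ends the pipeline and returns an ordinary data.table with one row per group.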
New features
- `across()` output can now be used as a data frame (#341).
- `.by`/`by` has been implemented for `mutate()`, `summarise()`, `filter()`,
  and the `slice()` family (#399).
- New translations for `add_count()`, `pick()` (#341), and `unite()`.
- `min_rank()`, `dense_rank()`, `percent_rank()`, and `cume_dist()` are now
  mapped to their `data.table` equivalents (#396).
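A short sketch of per-operation grouping with `.by`, assuming dtplyr 1.3.0 or later and a dplyr version that supports `.by`. The `sales` table and its columns are hypothetical.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical example data
sales <- lazy_dt(data.table(
  region = c("north", "north", "south"),
  amount = c(10, 20, 5)
))

# Group for this one operation only; no group_by()/ungroup() pair is needed
by_region <- sales %>%
  summarise(total = sum(amount), .by = region) %>%
  as_tibble()

# Print the generated data.table call rather than assuming its exact shape
sales %>%
  summarise(total = sum(amount), .by = region) %>%
  show_query()
```

`show_query()` is a convenient way to check how a given verb was translated before relying on it.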
Performance improvements
- `arrange()` now utilizes `setorder()` when possible for improved performance
  (#364).
- `select()` now drops columns by reference when possible for improved
  performance (#367).
- `slice()` uses an intermediate variable to reduce computation time of row
  selection (#377).
Minor improvements and bug fixes
- dtplyr no longer directly depends on `ellipsis`.
- Chained operations properly prevent modify-by-reference (#210).
- `across()`, `if_any()`, and `if_all()` evaluate the `.cols` argument
  in the environment from which the function was called.
- `count()` properly handles grouping variables (#356).
- `desc()` now supports use of the `.data` pronoun inside `arrange()` (#346).
- `full_join()` now produces output with correctly named columns when a
  non-default value for `suffix` is supplied. Previously the `suffix` argument
  was ignored (#382).
- `if_any()` and `if_all()` now work without specifying the `.fns` argument
  (@mgirlich, #325) and for a list of functions specified in the `.fns`
  argument (@mgirlich, #335).
- `pivot_wider()`'s `names_glue` now works even when `names_from` contains
  `NA`s (#394).
- In `semi_join()` the `y` table is again coerced to a lazy table if
  `copy = TRUE` (@mgirlich, #322).
- `mutate()` can now use `.keep`.
- `mutate()`/`summarize()` correctly translate anonymous functions (#362).
- `mutate()`/`transmute()` now support `glue::glue()` and
  `stringr::str_glue()` without specifying `.envir`.
- `where()` now clearly errors because dtplyr doesn't support selection
  by predicate (#271).
dtplyr 1.2.2
- Hot patch release to resolve R CMD check failures.
dtplyr 1.2.1
- Fix for upcoming rlang release.
dtplyr 1.2.0
New authors
@markfairbanks, @mgirlich, and @eutwt are now dtplyr authors in recognition of their significant and sustained contributions. Along with @eutwt, they supplied the bulk of the improvements in this release!
New features
- dtplyr gains translations for many more tidyr verbs:
  - `drop_na()` (@markfairbanks, #194)
  - `complete()` (@markfairbanks, #225)
  - `expand()` (@markfairbanks, #225)
  - `fill()` (@markfairbanks, #197)
  - `pivot_longer()` (@markfairbanks, #204)
  - `replace_na()` (@markfairbanks, #202)
  - `nest()` (@mgirlich, #251)
  - `separate()` (@markfairbanks, #269)
- `ifelse()` is mapped to `fifelse()` (@markfairbanks, #220).
Minor improvements and bug fixes
- `slice()` helpers (`slice_head()`, `slice_tail()`, `slice_min()`,
  `slice_max()` and `slice_sample()`) now accept negative values for `n`
  and `prop`.
- `across()` defaults to `everything()` when `.cols` isn't provided
  (@markfairbanks, #231), and handles named selections (@eutwt #293).
  It now handles `.fns` arguments in more forms (@eutwt #288):
  - Anonymous functions, such as `function(x) x + 1`
  - Formulas which don't require a function call, such as `~ 1`
- `arrange(dt, desc(col))` is translated to `dt[order(-col)]` in order to
  take advantage of data.table's fast order (@markfairbanks, #227).
- `count()` applied to data.tables no longer breaks when dtplyr is loaded
  (@mgirlich, #201).
- `case_when()` supports use of `T` to specify the default (#272).
- `filter()` errors for named input, e.g. `filter(dt, x = 1)`
  (@mgirlich, #267) and works for negated logical columns (@mgirlich, #211).
- `group_by()` ungroups when no grouping variables are specified
  (@mgirlich, #248), and supports inline mutation like `group_by(dt, y = x)`
  (@mgirlich, #246).
- `if_else()` named arguments are translated to the correct arguments in
  `data.table::fifelse()` (@markfairbanks, #234). `if_else()` supports `.data`
  and `.env` pronouns (@markfairbanks, #220).
- `if_any()` and `if_all()` default to `everything()` when `.cols` isn't
  provided (@eutwt, #294).
- `intersect()`/`union()`/`union_all()`/`setdiff()` convert data.table inputs
  to `lazy_dt()` (#278).
- `lag()`/`lead()` are translated to `shift()`.
- `left_join()` produces the same column order as dplyr
  (@markfairbanks, #139).
- `left_join()`, `right_join()`, `full_join()`, and `inner_join()` perform a
  cross join for `by = character()` (@mgirlich, #242).
- `left_join()`, `right_join()`, and `inner_join()` are always translated to
  the `[.data.table` equivalent. For simple merges the translation gets a bit
  longer, but thanks to the simpler code base it helps to better handle
  names in `by` and duplicated variable names produced in the data.table join
  (@mgirlich, #222).
- `mutate()` and `transmute()` work when called without variables
  (@mgirlich, #248).
- `mutate()` gains new experimental arguments `.before` and `.after` that
  allow you to control where the new columns are placed (to match dplyr 1.0.0)
  (@eutwt #291).
- `mutate()` can modify grouping columns (instead of creating another
  column with the same name) (@mgirlich, #246).
- `n_distinct()` is translated to `uniqueN()`.
- `tally()` and `count()` follow the dplyr convention of creating a unique
  name if the default output `name` (n) already exists (@eutwt, #295).
- `pivot_wider()` names the columns correctly when `names_from` is a
  numeric column (@mgirlich, #214).
- `slice()` no longer returns excess rows (#10).
- `slice_*()` functions after `group_by()` are faster (@mgirlich, #216).
- `slice_max()` works when ordering by a character column (@mgirlich, #218).
- `summarise()` supports the `.groups` argument (@mgirlich, #245).
- `summarise()`, `tally()`, and `count()` can change the value of a grouping
  variable (@eutwt, #295).
- `transmute()` doesn't produce duplicate columns when assigning to the same
  variable (@mgirlich, #249). It correctly flags grouping variables so they
  are selected (@mgirlich, #246).
- `ungroup()` removes variables in `...` from grouping (@mgirlich, #253).
dtplyr 1.1.0
New features
- All verbs now have (very basic) documentation pointing back to the
  dplyr generic, and providing a (very rough) description of the translation
  accompanied with a few examples.
- Passing a data.table to a dplyr generic now converts it to a `lazy_dt()`,
  making it a little easier to move between data.table and dplyr syntax.
- dtplyr has been brought up to compatibility with dplyr 1.0.0. This includes
  new translations for:
  - `across()`, `if_any()`, `if_all()` (#154).
  - `count()` (#159).
  - `relocate()` (@smingerson, #162).
  - `rename_with()` (#160).
  - `slice_min()`, `slice_max()`, `slice_head()`, `slice_tail()`, and
    `slice_sample()` (#174).

  And `rename()` and `select()` now support dplyr 1.0.0 tidyselect syntax
  (apart from predicate functions which can't easily work on lazily evaluated
  data tables).
- We have begun the process of adding translations for tidyr verbs, beginning
  with `pivot_wider()` (@markfairbanks, #189).
Translation improvements
- `compute()` now creates an intermediate assignment within the translation.
  This will generally have little impact on performance but it allows you to
  use intermediate variables to simplify complex translations.
- `case_when()` is now translated to `fcase()` (#190).
- `cur_data()` (`.SD`), `cur_group()` (`.BY`), `cur_group_id()` (`.GRP`),
  and `cur_group_rows()` (`.I`) are now translated to their data.table
  equivalents (#166).
- `filter()` on grouped data now uses a much faster translation based on `.I`
  rather than `.SD` (and requiring an intermediate assignment) (#176). Thanks
  to a suggestion from @myoung3 and @ColeMiller1.
- Translation of individual expressions:
  - `x[[1]]` is now translated correctly.
  - Anonymous functions are now preserved (@smingerson, #155).
  - Environment variables used in the `i` argument of `[.data.table` are
    now correctly inlined when not in the global environment (#164).
  - `T` and `F` are correctly translated to `TRUE` and `FALSE` (#140).
Minor improvements and bug fixes
- Grouped filter, mutate, and slice no longer affect ordering of output (#178).
- `as_tibble()` gains a `.name_repair` argument (@markfairbanks).
- `as.data.table()` always calls `[]` so that the result will print (#146).
- `print.lazy_dt()` shows total rows, and grouping, if present.
- `group_map()` and `group_walk()` are now translated (#108).
dtplyr 1.0.1
- Better handling for `.data` and `.env` pronouns (#138).
- dplyr verbs now work with `NULL` inputs (#129).
- Joins do a better job at determining output variables in the presence of
  duplicated outputs (#128). When joining based on different variables in `x`
  and `y`, joins consistently preserve the column from `x`, not `y` (#137).
- `lazy_dt()` objects now have a useful `glimpse()` method (#132).
- `group_by()` now has an `arrange` parameter which, if set to `FALSE`, sets
  the data.table translation to use `by` rather than `keyby` (#85).
- `rename()` now works without `data.table` attached, as intended
  (@MichaelChirico, #123).
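To make the `arrange` parameter of `group_by()` concrete, here is a small sketch comparing the two settings. The data is invented for the example. In data.table, `keyby` sorts the result by group while `by` keeps groups in order of first appearance, so the two results should differ only in row order.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical example data; note that "b" appears before "a"
dt <- lazy_dt(data.table(g = c("b", "a", "b"), x = 1:3))

# Default (arrange = TRUE): grouping is translated with keyby
sorted <- dt %>%
  group_by(g) %>%
  summarise(n = n()) %>%
  as.data.table()

# arrange = FALSE: grouping is translated with by
unsorted <- dt %>%
  group_by(g, arrange = FALSE) %>%
  summarise(n = n()) %>%
  as.data.table()
```

Skipping the sort can be worthwhile on large tables when the order of groups in the output does not matter.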
dtplyr 1.0.0
- Converted from eager approach to lazy approach. You now must use `lazy_dt()`
  to begin a translation pipeline, and must use `collect()`,
  `as.data.table()`, `as.data.frame()`, or `as_tibble()` to finish the
  translation and actually perform the computation (#38).

  This represents a complete overhaul of the package replacing the eager
  evaluation used in the previous releases. This unfortunately breaks all
  existing code that used dtplyr, but frankly the previous version was
  extremely inefficient so offered little of data.table's impressive speed,
  and was used by very few people.
- dtplyr provides methods for data.tables that warn you that they use the
  data frame implementation and you should use `lazy_dt()` (#77).
- Joins now pass `...` on to data.table's merge method (#41).
- `ungroup()` now copies its input (@christophsax, #54).
- `mutate()` preserves grouping (@christophsax, #17).
- `if_else()` and `coalesce()` are mapped to data.table's `fifelse()` and
  `fcoalesce()` respectively (@MichaelChirico, #112).
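The lazy approach introduced in this release can be sketched as follows. The data and the column names `x`, `y`, and `z` are hypothetical, and the exact class names of intermediate objects may vary between dtplyr versions.

```r
library(data.table)
library(dplyr, warn.conflicts = FALSE)
library(dtplyr)

# Hypothetical example data
lazy <- lazy_dt(data.table(x = 1:5, y = c(2, 4, 6, 8, 10)))

# Building the pipeline only records the steps to translate;
# the filter and mutate have not been run on the data yet
step <- lazy %>%
  filter(x > 2) %>%
  mutate(z = x * y)

# Inspect the data.table code that the pipeline would run
show_query(step)

# Finishing with as_tibble() (or collect(), as.data.table(),
# as.data.frame()) performs the computation
out <- as_tibble(step)
```

Keeping the pipeline lazy lets dtplyr translate the whole sequence of verbs at once instead of evaluating each verb separately.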
dtplyr 0.0.3
- Maintenance release for CRAN checks.
- `inner_join()`, `left_join()`, `right_join()`, and `full_join()`: new
  `suffix` argument which allows you to control what suffix duplicated
  variable names receive, as introduced in dplyr 0.5 (#40, @christophsax).
- Joins use extended `merge.data.table()` and the `on` argument, introduced in
  data.table 1.9.6. Avoids copy and allows joins by different keys (#20, #21,
  @christophsax).
dtplyr 0.0.2
This is a compatibility release. It makes dtplyr compatible with dplyr 0.6.0 in addition to dplyr 0.5.0.