- FIXED
- Curly-brace text in variable labels, value labels, and values no longer raises errors by colliding with glue strings. Closes #17.
-
FIXED
- Removed `useHash = TRUE` from `sample.int()` inside `should_approx()` to match changes made in R 4.2.0. `sample.int()` now decides for itself whether to `useHash` or not. Closes #16.
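
As a rough illustration of the call shape after this change, here is a minimal base R sketch in which `sample.int()` is called without `useHash`, leaving the choice of algorithm to R. The values of `n` and `size` are made up for the example and are not taken from `should_approx()`.

```r
# Illustrative values only; should_approx() may use different ones.
n    <- 1e6   # size of the population to draw from
size <- 100   # number of indices wanted

set.seed(1)
# No useHash argument: sample.int() chooses its own strategy.
idx <- sample.int(n, size)

length(idx)          # 100
anyDuplicated(idx)   # 0, because sampling is without replacement
```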
-
ADDED
- `save_dictionary()` writes an empty `exclude` column to factor files.
-
CHANGED
- Changes to `sift()`'s announcement of a dictionary's contents. Closes #15:
  - Now says "{dataframe name} contains n columns..." instead of the generic "Dictionary contains n columns...".
  - Now shows head and tail of column names rather than just the head.
- `save_dictionary()` writes only one cell of the `ordered` column, to conform to what real-world users would do.
-
ADDED
- User is now warned if `.dist` is used with an orderless search (because `.dist` is ignored in those cases).
-
FIXED
- Numeric vectors with `NA`s give a correct peek of their non-`NA` contents, rather than being reported as `NA` entirely.
-
CHANGED
- Suggestion to increase `.dist` in the case of no matches is now hidden for orderless searches.
-
Renamed package to `siftr` to avoid a name collision with the existing `sift` package on CRAN that I somehow missed.
-
Initial CRAN submission.
-
ADDED
- `options_sift()` gets a new option: `sift_peeklength`. This controls the approximate length of the `rand_unique` entries in the data dictionary, i.e. a list of unique values in each column. This "full peek" is used as part of the "haystack" that actually gets searched by `sift()`. It defaults to 3000 characters, but the final length increases when separators are added. Previously, a length limit of only 500 characters was hard-coded in. 3000 characters is about the length of a 1-page Word document at default settings.
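
To show why the final length can exceed the limit once separators are added, here is a minimal sketch. `build_peek()` is a hypothetical helper written for this note, not siftr's implementation, and it assumes the length budget is counted over the values alone.

```r
# Hypothetical helper, not siftr's implementation: keep unique values
# until their combined width reaches `limit`, then join them.
build_peek <- function(x, limit = 3000, sep = "|") {
    vals <- unique(as.character(x[!is.na(x)]))
    keep <- cumsum(nchar(vals)) <= limit   # budget counts the values only
    paste(vals[keep], collapse = sep)      # separators are added afterwards
}

peek <- build_peek(c("alpha", "beta", "gamma", "beta"), limit = 9)
peek          # "alpha|beta"
nchar(peek)   # 10: one more than the limit, because of the separator
```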
-
FIXED
- Fixed a bug where columns that had multiple `class()` (e.g. `"labelled"` and `"integer"`) would create a dictionary with two entries per dataframe column.
- `has_class()` can deal with multi-classed variables now.
- Row names are now discarded from generated dictionaries.
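
The multi-class situation can be reproduced in base R. The sketch below only shows why a per-class lookup sees two values for one column; `has_class_sketch()` is a hypothetical stand-in and not the package's `has_class()`.

```r
# A column carrying two classes, as labelled data often does.
x <- structure(1:3, class = c("labelled", "integer"))

class(x)           # "labelled" "integer"
length(class(x))   # 2, so anything built per class value yields two rows

# Hypothetical stand-in for a multi-class-aware check.
has_class_sketch <- function(x, what) any(class(x) %in% what)

has_class_sketch(x, "integer")     # TRUE
has_class_sketch(x, "character")   # FALSE
```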
-
CHANGED
- `some_uniques()` has short-circuit routes for datatypes that don't need the full "random sampling to get a list of its unique values" treatment. So far this is: Factors, Logicals, and Numerics.
- Changed "peek" separator to vertical bar `|` from comma `,` because some data may use commas within string values.
- `save_dictionary()` generates factor files for each unique factor in the dataframe now, according to the `tsv2label` spec.
- Sample data (`mtcars_lab`) now has a list column added.
- `should_approx()` now uses `sample.int()` with the `useHash` argument, which performed better than `sample()`.
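
Two of the changes above can be sketched in base R. `uniques_sketch()` is a hypothetical illustration of short-circuiting for factors and logicals only (the numeric route is not shown, and siftr's `some_uniques()` may work differently). The second part shows why a comma separator is ambiguous when values contain commas.

```r
# Hypothetical sketch of short-circuiting; not siftr's some_uniques().
uniques_sketch <- function(x) {
    if (is.factor(x))  return(levels(x))   # levels are already unique
    if (is.logical(x)) return(as.character(unique(x[!is.na(x)])))
    as.character(unique(x))                # general fallback, sampling not shown
}

uniques_sketch(factor(c("b", "a", "b")))   # "a" "b"

# Why "|" rather than ",": values may contain commas themselves.
cities <- c("Perth, WA", "Hobart, TAS")
paste(cities, collapse = ", ")   # "Perth, WA, Hobart, TAS" (ambiguous)
paste(cities, collapse = "|")    # "Perth, WA|Hobart, TAS"
```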
-
ADDED
- `save_dictionary()` allows you to save a data dictionary in a form that my other package, `tsv2label`, will accept. Closes #11.
- `options_sift()` prints the status of all options when invoked with no arguments. Closes #12.
-
FIXED
- `sift()` returns a dataframe of all results, not just the first `n = sift_limit` results.
- Initial commit.