Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closes #2930 -- bugfix to as.matrix.data.table() #2939

Merged
merged 8 commits into from
Aug 16, 2018
Merged

Closes #2930 -- bugfix to as.matrix.data.table() #2939

merged 8 commits into from
Aug 16, 2018

Conversation

sritchie73
Copy link
Contributor

Migrated bugfix commits to new branch, and addressed all comments on the now defunct PR #2938

Summary of bugfix:
Supplying a column to use as the rownames in as.matrix.data.table() now also works when the input data.table has only one row. Previously the rownames argument was treated as a vector of user supplied rownames because the length of rownames was the same as the number of rows in x.

The user can now still supply there own length 1 vector of rownames to use when the input data.table has one row, provided it is not a column in the input data.table. This makes the case when the data.table has one row consistent with the functionality where the data.table has > 1 rows.

A new argument is gained, rownames.literal which is used when the input data.table only has 1 row and the user wishes to supply a vector of rownames to use. This enables the function to distinguish between cases where the uses wishes to manually supply a vector of rownames as opposed to using a column in the input data.table as the rownames.

Migrating to commits to a new branch.

Passing a column to use as the rownames to as.matrix.data.table() now works when the input data.table has a single row.
man/as.matrix.Rd Outdated
@@ -7,7 +7,7 @@ Converts a \code{data.table} into a \code{matrix}, optionally using one
of the columns in the \code{data.table} as the \code{matrix} \code{rownames}.
}
\usage{
\method{as.matrix}{data.table}(x, rownames, ...)}
\method{as.matrix}{data.table}(x, rownames, rownames.literal, ...)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This differ from function code, default value, and should raise a warning when checking package

Copy link
Member

@jangorecki jangorecki Jun 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it didn't raise a warning, maybe methods have less checks, still should be corrected to include default value, as in function definition. Also instead of ... use \dots keyword.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - i'm surprised the checks didnt pick this up (even R CMD check --as-cran misses this).

As an aside, is there any reason we don't use Roxygen to generate documentation? This would avoid a lot of these fiddly issues arising from manually creating the .Rd files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exact reasoning when we were discussing roxygen. As for me, as of now (I used to use roxygen everywhere), I don't like to "touch" source code when I only need to edit documentation. Model of keeping documentation in source files brings a lot of "noise" in git history. Not even mentioning stuff like this, because of which I had to use my own fork of roxygen, till RStudio stops to support such "old" version of roxygen. Then I realized the chain of dependencies of my development helpers was not actually helping as much as disturbing :)

man/as.matrix.Rd Outdated
@@ -7,7 +7,7 @@ Converts a \code{data.table} into a \code{matrix}, optionally using one
of the columns in the \code{data.table} as the \code{matrix} \code{rownames}.
}
\usage{
\method{as.matrix}{data.table}(x, rownames, rownames.literal, ...)}
\method{as.matrix}{data.table}(x, rownames, rownames.literal=FALSE, ...)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also rownames.literal=FALSE, ...)} into rownames.literal=FALSE, \dots)}
as per https://cran.r-project.org/doc/manuals/r-release/R-exts.html#index-_005cdots

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@mattdowle
Copy link
Member

mattdowle commented Aug 14, 2018

I agree with this bug fix but can we have a small change to the fix. The PR as it stands introduces a new argument (rownames.literal=TRUE|FALSE) which changes the meaning of the rownames= argument. When rownames= is intended to be rowname values, it will be advised good practice to always set rownames.literal to TRUE, just in case nrow(DT) is ever 1. But users won't have to and there is no nudge/warning to change. Existing code which passes length(rownames)>1 will continue to work, until nrow(DT)==1 when it will fail, and we won't have nudged any progress towards a better place. A change to existing user code (to pass rownames.literal=TRUE) will be needed and just be advisable; in the same way that data.frame drop=FALSE needs to be known and used and is good practice ... and we don't like that feature of base data.frame.

Suggestion is to deprecate rownames= length > 1. Which is much less commonly used anyway, is my guess. rownames= must always be length 1 and will always be a column name, column number, or, as before, TRUE meaning the first column of the key (which by the way doesn't seem very useful, but that's an aside). If row name values are to be supplied, then new argument rowname.values must be used. You won't be able to use both rownames and rowname.values at the same time.

It's quite easy to manage the migration on this one, slowly over time in the usual way :

  1. Introduce rowname.values= and suggest in NEWS to start using it.
  2. Warn when length(rownames)>1 and suggest rowname.values.
  3. Error when length(rownames)>1 and suggest rowname.values.

Views? (e.g. thumbs up / down)

@mattdowle mattdowle added this to the 1.11.6 milestone Aug 14, 2018
@MichaelChirico
Copy link
Member

@mattdowle sounds good to me.

Using vectors that are length-nrow(x) but aren't currently kept within the frame of x (which is the main use case for rowname.values as I understand it) seems error prone, so forcing users to be careful about this use case strikes me as best practice.

(that said, I've definitely used this sort of thing before 😅)

@sritchie73
Copy link
Contributor Author

I agree, that sounds like a much nicer solution

@mattdowle
Copy link
Member

Great. I've been looking at this today and hadn't seen your push -- my bad for not checking. Looks fine other than we can't change to warning straight away. We have to warn in NEWS there will be a warning, before adding the warning later. Few will read it and act, but at least we gave the best notice we could. Then we can point to NEWS and it's not our fault. Documentation can just go straight to the new usage without mentioning that old usage still works. To my eye rowname.values reads better without the double plural of rownames.values. I suppose rownames.value would avoid the double plural too but then it sounds too much like a single value. A single rowname can't have more than one value so I don't think any beginner would naturally think rowname.values meant the multiple values for one row name.
I'll tweak the PR ...

@sritchie73
Copy link
Contributor Author

I went with rownames.values over rowname.values for consistency with the rownames argument.

@mattdowle
Copy link
Member

Ok. I got another suggestion I'll commit. Btw, please create branches directly in the Rdatatable project rather than in your own; you can because you're a project member. That makes it easier for me and others to push to the same branch.

@mattdowle
Copy link
Member

mattdowle commented Aug 16, 2018

I can't work out how to push to this PR at all since it's in your repo. Thought that I could do it but with a few more steps, but I can't do it at all. I'm sure I've used the online editor within GitHub web interface to do this in the past even when the branch is not in Rdatatable, but these changes are more involved than I'm comfortable using GitHub edit button. I've changed locally and tested and I'm fairly confident you'll all be fine with my change. I'll merge to master, from my local branch. Hopefully my change will be included, this PR and issue will be closed. If it all goes horribly wrong, we can always have a new PR.

@mattdowle mattdowle merged commit 25b126e into Rdatatable:master Aug 16, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants