Desc #845

Closed
wants to merge 3 commits into from

rsaporta
Contributor

desc(DT) is a simple analogue of desc in SQL.
A print method is supplied that groups columns by class.

@rsaporta
Contributor Author

example for desc(DT)

L <- 1:5

DT <- data.table(ID = LETTERS[1L]
             ,   date = seq(Sys.Date(), length=max(L), by="day")
             ,   last.occurance = seq(Sys.Date(), length=max(L), by="-2 month")
             ,   value = rnorm(L)
             ,   distance = runif(L, 100, 1e5)
             ,   group = factor(letters[L])
             ,   days = as.integer(sample(30, length(L)))  # size must be a single number
             ,   user = c("tammy", "tommmy", "billie", "zoe", "chloe")
             ,   storeid = paste0("store", L)
             ,   mileage = {set.seed(1); rnorm(L, 100, 1e5)}
             ,   mileage.scaled = scale({set.seed(1); rnorm(L, 100, 1e5)})
)


desc(DT)
#
#     character :  ID, storeid, user
#
#     factor    :  group
#
#     integer   :  days
#
#     numeric   :  distance, mileage, mileage.scaled.V1,
#                  value
#
#     Date      :  date, last.occurance
#


desc.data.table(DT, tight.output=TRUE)
#     character :  ID, storeid, user
#     factor    :  group
#     integer   :  days
#     numeric   :  distance, mileage, mileage.scaled.V1,
#                  value
#     Date      :  date, last.occurance

@gsee

gsee commented Sep 30, 2014

Just an FYI, the output looks similar to something you'd get with str(). str() has a lot of parameters to handle things like columns that contain nested lists. For example, I have a package that has a print method for objects that are basically just lists that looks like this:

str(unclass(DT), comp.str="", no.list=TRUE, give.head=FALSE, 
   give.length=FALSE, give.attr=FALSE, next.lev=-1, indent.str="")

There are lots of possibilities for how the arguments could be tweaked if you prefer something different. For example, you can use vec.len=0 if you don't want to display any of the data. Just an idea for something you could use inside desc if you decide to add functionality to it.
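To illustrate the vec.len suggestion, here is a minimal sketch on a plain data frame. The data frame `df` and its columns are made up for illustration; `vec.len` and `give.attr` are documented arguments of `str()`.

```r
# Hypothetical example data, not the DT from this thread
df <- data.frame(days = 1:3, value = c(0.5, 1.5, 2.5))

# vec.len = 0 asks str() not to preview any of the data;
# give.attr = FALSE hides attributes
out <- capture.output(str(df, vec.len = 0, give.attr = FALSE))
print(out)
```

Each column still gets one line with its name and type, but columns are listed in their original order rather than grouped by class.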

@rsaporta
Contributor Author

@gsee the biggest difference in the output is that desc() groups by column class.

But also, if I want to grab all of my, say, Date columns I can simply do

   dateCols <- desc(DT)["Date"]
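The grouping idea can be sketched in base R roughly as follows. This is not the implementation from this PR: `desc_sketch` and `df` are hypothetical names, and the sketch assumes that only the first class of each column matters.

```r
# Hypothetical stand-in for desc(): group column names by their first class
desc_sketch <- function(x) {
  # class(col)[1L] gives one label per column, even for multi-class objects
  cls <- vapply(x, function(col) class(col)[1L], character(1L))
  split(names(x), cls)
}

df <- data.frame(
  id    = c("a", "b", "c"),
  days  = 1:3,
  date  = Sys.Date() + 0:2,
  last  = Sys.Date() - 0:2,
  value = c(0.5, 1.5, 2.5),
  stringsAsFactors = FALSE
)

by_class  <- desc_sketch(df)
date_cols <- by_class[["Date"]]   # character vector of the Date column names
```

Because the result is a named list keyed by class, picking out every column of one class is a single subsetting step.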

@matthieugomez
Contributor

As a mere data.table user, I like this function a lot!
desc is already the name of a function in dplyr, though (short for descending order). describe or descSQL may be better names.
I don't quite understand what it has to do with data.table, though. Why didn't you write the function using only data frames?

@rsaporta
Contributor Author

Ahh, that's a shame. I use this desc() all the time. Possible options for names for this function:

 describe
 Desc
 DESC
 descdt
 desctable
 desktablechair (I'm full of good jokes)
 des
 dsc

I feel this function will be most useful when working interactively, so less
typing = better.

@arunsrinivasan
Member

Hm, desc is -xtfrm(x) in dplyr.
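For context, a small base R sketch of what negating `xtfrm()` achieves; it does not call dplyr, and the vector `x` is made up for illustration.

```r
x <- c("b", "a", "c")

# xtfrm() maps values to numeric sort keys; negating them reverses the order
keys <- -xtfrm(x)
desc_sorted <- x[order(keys)]
print(desc_sorted)
```

So the existing desc in dplyr is about sort direction, which is unrelated to describing column classes and is why the name clashes.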

How about naming it glimpse (which is already a function in dplyr doing something like this, but slightly fancier), making it an S3 generic, and dispatching to dplyr::glimpse if the input isn't a data.table?
Edit: Now that I think of what desc does, glimpse may not be the way to go.

Or descDT - like setDT. I also like describe.

@rsaporta
Contributor Author

rsaporta commented Oct 1, 2014

I like both describe and descDT
The former has the advantage of being able to generalize to lists and other objects.
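One way the generalization to lists could look is an S3 generic with one method per input type. This is a rough sketch under assumptions: the name `describe_cols` is hypothetical (chosen so it does not clash with existing functions called describe), and list inputs are assumed to be named.

```r
# Hypothetical generic; dispatches on the class of x
describe_cols <- function(x, ...) UseMethod("describe_cols")

# Lists: group element names by the first class of each element
# (assumes the list is named)
describe_cols.list <- function(x, ...) {
  cls <- vapply(x, function(el) class(el)[1L], character(1L))
  split(names(x), cls)
}

# Data frames: treat the columns as a list and reuse the list method
describe_cols.data.frame <- function(x, ...) {
  describe_cols(as.list(x), ...)
}

# Anything else: fail with a clear message
describe_cols.default <- function(x, ...) {
  stop("describe_cols() has no method for class ", class(x)[1L])
}

res_list <- describe_cols(list(a = 1L, b = "x", c = 2L))
res_df   <- describe_cols(data.frame(n = 1:2, v = c(1.5, 2.5)))
```

Since a data.table also inherits from data.frame, it would reach the data frame method here without needing its own.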

@matthieugomez
Contributor

Shouldn't the returned object be wide instead of long, so that desc(DT)$integer gives a vector of column names?
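To make the long versus wide distinction concrete, here is a sketch with made-up data. The table `long` and its column names are assumptions for illustration, not the object returned by the PR.

```r
# "Long" shape: one row per column of the described table
long <- data.frame(
  column = c("days", "value", "distance", "date"),
  class  = c("integer", "numeric", "numeric", "Date"),
  stringsAsFactors = FALSE
)

# "Wide" shape: one list element per class, holding the column names
wide <- split(long$column, long$class)

wide$integer   # "days"
wide$numeric   # "value" "distance"
```

The wide shape supports `$` access by class name, while the long shape is easier to filter or join like any other table.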

@arunsrinivasan
Member

@rsaporta "describe" sounds great.
I'm not so sure about the need for the tight.output arg. I'd just keep the tight.output=TRUE output as the only format.

@gsee

gsee commented Oct 1, 2014

FWIW, describe is an S3 generic in Hmisc.


@jangorecki
Member

My Information schema branch looks closely related to this one. It is something of an extension of your proposal: it also collects the gathered metadata into a data.table object, so it is easier to work with.
It is not complete yet (so no PR), as I'm not sure whether anyone (incl. Matt and Arun) would be interested in merging those features. But it is stable, so you may want to check it out, @rsaporta.

@arunsrinivasan
Member

Please file an issue if this needs to be revisited. Closing this for now.
