Summary: the example dataset contains two columns, personID and CaseID, and one personID can have multiple CaseIDs.
The goal is to sample one case for each person, but the result is not correct: the check in step 3 returns rows that are not in the original dataset.
However, the result is correct if either of the two lines marked with #$$$$$$ is used instead:
1. every person has the same number of cases, or
2. .N is sampled instead of sampling CaseID directly.
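This looks like a documented property of base R's `sample()` rather than of data.table: when `x` is a single number of at least 1, `sample(x, 1)` draws from `1:x` instead of returning `x`. With the weighted generator, presumably some persons end up with exactly one case, so `sample(CaseID, 1)` returns a random integer between 1 and that case's ID. Equal group sizes (alternative 1) never produce a length-one group, and `CaseID[sample(.N, 1)]` (alternative 2) samples a position, which is safe for any group size. A minimal base-R illustration; the values 12 and 40 are arbitrary:

```r
set.seed(1)

# A person with a single case, CaseID 12: sample() treats the scalar 12 as 1:12
one_case <- 12L
draws_one <- replicate(1000, sample(one_case, 1))

# A person with two cases: sample() draws from the vector's elements as intended
two_cases <- c(12L, 40L)
draws_two <- replicate(1000, sample(two_cases, 1))

# Sampling a position, as in CaseID[sample(.N, 1)], works for any group size
draws_idx <- replicate(1000, one_case[sample(length(one_case), 1)])

mean(draws_one == one_case)    # far below 1: most draws are not 12
all(draws_two %in% two_cases)  # TRUE
all(draws_idx == one_case)     # TRUE
```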
library(data.table)
# 1. Create dataset
set.seed(1222)
weight = sample(c(0.1, 0.2, 0.7), 100, replace = TRUE)
dt = data.table(personID = sample(1:100, 500, replace = TRUE, prob = weight), CaseID = 1:500)
# dt = data.table(personID = rep(1:100, 5), CaseID = 1:500)  #$$$$$$$$$$ alternative 1: same number of cases per person (works)
table(dt$personID)
dt
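# 2. Randomly sample one case per person
set.seed(1222)
s = dt[, list(CaseID = sample(CaseID, 1)), by = personID]; s  # some CaseIDs returned here are not in the original data
# s = dt[, list(CaseID = CaseID[sample(.N, 1)]), by = personID]; s  #$$$$$$$$$$ alternative 2: sample .N (works)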
# 3. Check
library(sqldf)
sqldf("select * from s except select * from dt")  # rows that are not in the original dataset
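A sketch of a workaround that does not depend on how many cases a person has, written in base R so it runs without data.table or sqldf. The `resample()` helper follows the one shown in the examples of R's `?sample` help page; `cases`, `groups` and `pair_key` are illustrative names, not part of the original script. It rebuilds the issue's data with the same generator calls, draws one case per person, and repeats the step-3 check with `%in%` instead of SQL:

```r
# Rebuild the issue's data with the same generator calls (base R data.frame)
set.seed(1222)
weight <- sample(c(0.1, 0.2, 0.7), 100, replace = TRUE)
cases <- data.frame(personID = sample(1:100, 500, replace = TRUE, prob = weight),
                    CaseID = 1:500)

# Draw from the elements of x even when length(x) == 1, by sampling positions
resample <- function(x, ...) x[sample.int(length(x), ...)]

# One case per person: split CaseID by person, then draw one element per group
groups <- split(cases$CaseID, cases$personID)
s <- data.frame(personID = as.integer(names(groups)),
                CaseID = vapply(groups, resample, integer(1), size = 1,
                                USE.NAMES = FALSE))

# Step-3 check: every sampled (personID, CaseID) pair should exist in `cases`
pair_key <- function(d) paste(d$personID, d$CaseID, sep = "_")
missing_rows <- s[!pair_key(s) %in% pair_key(cases), ]
nrow(missing_rows)  # 0
```

In data.table, the same helper should drop into the original call as `dt[, list(CaseID = resample(CaseID, 1)), by = personID]`; the `CaseID[sample(.N, 1)]` form from the issue works for the same reason, since it samples positions rather than values. data.table's `fsetdiff(s, dt)` is another way to run the step-3 check without sqldf.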