GitHub - tdhock/nc-article

Paper

Title: Wide-to-tall data reshaping using regular expressions and the nc package.

Abstract: Regular expressions are powerful tools for extracting tables from non-tabular text data. Capturing regular expressions that describe information to extract from column names can be especially useful when reshaping a data table from wide (few rows with many regularly named columns) to tall (fewer columns with more rows). We present the R package nc (short for named capture), which provides functions for wide-to-tall data reshaping using regular expressions. We describe the main new ideas of nc, and provide detailed comparisons with related R packages (stats, utils, data.table, tidyr, tidyfast, tidyfst, reshape2, cdata).
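The core idea in the abstract can be illustrated with a minimal sketch. It is written in plain Python using only the standard library `re` module; it is not the nc API. The table contents, the function name `melt_single`, and the group names `part` and `dim` are assumptions made for illustration.

```python
import re

# Hypothetical wide table: one row per flower, with regularly named columns.
wide_rows = [
    {"Species": "setosa", "Sepal.Length": 5.1, "Sepal.Width": 3.5,
     "Petal.Length": 1.4, "Petal.Width": 0.2},
]

# Named capture groups describe what to extract from each column name.
name_pattern = re.compile(r"(?P<part>[^.]+)[.](?P<dim>.+)")

def melt_single(rows, pattern, value_name="value"):
    """Reshape wide to tall: one output row per (input row, matching column)."""
    tall = []
    for row in rows:
        # Columns whose names do not match the pattern are kept as identifiers.
        id_part = {k: v for k, v in row.items() if not pattern.fullmatch(k)}
        for col, val in row.items():
            m = pattern.fullmatch(col)
            if m:
                # Each capture group becomes an output column.
                tall.append({**id_part, **m.groupdict(), value_name: val})
    return tall

tall_rows = melt_single(wide_rows, name_pattern)
for r in tall_rows:
    print(r)
```

Here the four measurement columns become four rows with a single `value` column, and the text captured from each column name is stored in the new `part` and `dim` columns.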

Local output is RJwrapper.pdf.
The main input/source file to edit is hocking.Rnw.
The Makefile takes care of creating submission.zip.

TODOs

Compare with tidyfst::longer_dt? It should be the same as data.table::melt. See https://hope-data-science.github.io/tidyfst/articles/example3_reshape.html

8 Nov 2023

figures-iris-dt contains figures to explain melt, for LatinR data.table tutorial.

11 Oct 2020

figure-who-cols-new-data.R runs new timings and figure-who-cols-new.R makes the new figure (figure-who-cols-new.png).

5 Oct 2020

figure-who-rows-dt-data.R and figure-iris-rows-dt-data.R compute timings, and figure-who-rows-dt.R plots them (figure-who-rows-dt.png).

figure-who-cols-dt-data.R computes timings, and figure-who-cols-dt.R plots them (figure-who-cols-dt.png).

figure-iris-cols-dt-valgrind.R was run under valgrind, which reported no memory problems.

figure-iris-cols-dt-data.R computes timings of the new data.table methods, and figure-iris-cols-dt.R makes the figure (figure-iris-cols-dt.png).

17 May 2020

maybe add comparison with tidyfast::dt_pivot_longer?

29 Oct 2019

figure-iris-cols-new.R makes a new figure based on timings computed using updated R packages.

28 Oct 2019

figure-iris-cols.R makes a figure, based on data computed by figure-iris-cols-data.R, which shows that wide-to-tall data reshaping using either the data.table or nc package is much faster than using other packages (cdata, stats, tidyr). This experiment uses inputs with a fixed number of rows and a variable number of input reshape columns. Each function in the experiment outputs a table with multiple (2) reshape columns. The figure shows that the quadratic time complexity of cdata, stats, and tidyr results in significant slowdowns when there are at least 10,000 input reshape columns.

In contrast, every method in the second figure appears to take time linear in the number of input columns when the output has only a single reshape column:

source: figure, timings.

Note that stats::reshape is missing from the second plot, but its result for a smaller N.col can be seen at https://github.com/tdhock/nc-article/blob/master/figure-who-cols.png
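To make the distinction between the two experiments concrete, here is a minimal sketch of a reshape whose output has multiple (2) reshape columns, as opposed to a single one. It is plain Python using the standard library `re` module; it is not the code used for the timings and not the nc API. The table contents, the function name `melt_multiple`, and the group names `column` and `dim` are assumptions made for illustration.

```python
import re

# Hypothetical wide table: one row per flower, with regularly named columns.
wide_rows = [
    {"Species": "setosa", "Sepal.Length": 5.1, "Sepal.Width": 3.5,
     "Petal.Length": 1.4, "Petal.Width": 0.2},
]

# "column" names the output reshape column; "dim" is stored as a captured value.
name_pattern = re.compile(r"(?P<column>[^.]+)[.](?P<dim>.+)")

def melt_multiple(rows, pattern):
    """Reshape wide to tall with one output reshape column per `column` value."""
    tall = []
    for row in rows:
        # Columns whose names do not match the pattern are kept as identifiers.
        id_part = {k: v for k, v in row.items() if not pattern.fullmatch(k)}
        by_dim = {}  # maps dim -> {output column name: value}
        for col, val in row.items():
            m = pattern.fullmatch(col)
            if m:
                by_dim.setdefault(m["dim"], {})[m["column"]] = val
        # One output row per distinct dim, holding all output columns side by side.
        for dim, values in by_dim.items():
            tall.append({**id_part, "dim": dim, **values})
    return tall

tall_rows = melt_multiple(wide_rows, name_pattern)
for r in tall_rows:
    print(r)
```

With this input, the four measurement columns are reshaped into two rows (one per `dim`) and two output reshape columns, `Sepal` and `Petal`.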

25 Oct 2019

figure-who-both-rows.R makes figure-who-both-rows.png.

24 Oct 2019

figure-who-complex-rows.R makes figure-who-complex-rows.png.

23 Oct 2019

figure-who-rows.R makes figure-who-rows.png.

figure-who-cols.R makes figure-who-cols.png.

