👨 Josep Espasa Reig - Data Scientist @ LIS - Cross-National Data Center in Luxembourg
👋 Welcome to the repository of my presentation for the NTTS 2023 conference in Brussels!
This talk addresses the advantages and disadvantages of using Julia in the field of Official Statistics and Social Sciences.
Here you will find a short summary of the presentation, its recording, the slides in PDF format, links to the repositories, and the Docker images you can use to reproduce the computations.
❓ Should Data Scientists use Julia for Official Statistics and Social Sciences tasks?
- ⚡ Julia is substantially faster than R for most functions, with speedups that tend to range from 1.2x to ~20x.
- 🐘 Julia tends to be more memory efficient than R, but the improvements are much more moderate (it uses up to ~4.5x less memory).
- ❕ Calling Julia functions from R adds overhead that tends to halve the speed benefits of using Julia.
- The package ecosystem in Julia is less mature. Data scientists in the field will find:
  - ✅ great packages for general tasks such as reading and manipulating data;
  - 🔶 a limited range of options for more specific tasks such as computing estimates and variances with complex survey designs or imputing missing values;
  - 🔴 no packages for some of the most niche tasks, such as statistical matching.
- 📹 Recording of the presentation - ⏰ Starts at 16:19:00
- 📝 Presentation in PDF format
- 📦 Inequality.jl, a Julia package for computing income and wealth inequality indicators
- 📁 Repository for Julia computations
- 📁 Repository for R computations
To download the Docker images for the computations, run the following commands on the command line:
docker pull joseper/ntts_benchmarks_jl
and
docker pull joseper/ntts_benchmarks_r
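If you prefer to fetch both images in one step, the two pulls can be wrapped in a small shell function. This is only a sketch: the image names come from the commands above, while `pull_benchmark_images` and the `DOCKER_CMD` variable are hypothetical names introduced here for illustration and are not part of Docker or of the repositories.

```shell
#!/bin/sh
# Image names are copied from the pull commands above.
IMAGES="joseper/ntts_benchmarks_jl joseper/ntts_benchmarks_r"

# DOCKER_CMD is a hypothetical convenience variable, not a Docker feature.
# It defaults to the real CLI; setting it to "echo" prints the commands instead.
DOCKER_CMD="${DOCKER_CMD:-docker}"

# Pull every image in IMAGES, stopping at the first failure
# so that an error is not buried under later output.
pull_benchmark_images() {
    for image in $IMAGES; do
        $DOCKER_CMD pull "$image" || return 1
    done
}
```

After sourcing the snippet, calling `pull_benchmark_images` pulls both images. With `DOCKER_CMD=echo` set beforehand, it only prints the two pull commands, which is a cheap way to check the image names without contacting the registry.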