
How to export content to csv? #1

Open
anatoliivanov opened this issue Oct 31, 2021 · 6 comments

Comments

@anatoliivanov

Hey @Watchful1, I ran the script to iterate over the contents of the zst dumps, but the output only shows the number of lines it has iterated. How do I export the contents to a CSV file so that I can start using it for analysis and model building?

@Watchful1
Owner

That depends on what you're trying to do. Which script did you use? Which dump files do you have?

@aryashah2k

I think I have a similar question to @anatoliivanov's. What he is trying to say, and what I am also trying to achieve, is to export all of the lines to a comma-separated values (CSV) file, so that I can view the data as a spreadsheet and then use it for data analysis, etc.
@Watchful1, I would appreciate your help with this.
While running your script "single_file.py", all we get is the number of lines it has iterated over.
How should we use this data for further analysis?

@Watchful1
Owner

It's not a question with a single answer. It varies depending on what files you're processing, what filtering you want to do, what fields you want to output, etc.

But generally speaking, this code is just intended as an example of reading the compressed files; actually doing something with the data once it's read would have to be done by editing the script yourself.
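As a starting point, a CSV export along the lines discussed above might look like the sketch below. It assumes the third-party `zstandard` package (`pip install zstandard`); the field list and the file names (`relationships_submissions.zst`, `relationships_submissions.csv`) are hypothetical placeholders you would replace with your own.

```python
import csv
import io
import json

# Hypothetical field list -- adjust to whichever columns your analysis needs.
FIELDS = ["author", "created_utc", "score", "title", "selftext"]

def json_line_to_row(line, fields=FIELDS):
    """Parse one NDJSON line and pull out the chosen fields."""
    obj = json.loads(line)
    return [obj.get(f, "") for f in fields]

def write_csv(lines, out_file, fields=FIELDS):
    """Write an iterable of NDJSON lines to an open CSV file handle."""
    writer = csv.writer(out_file)
    writer.writerow(fields)
    for line in lines:
        if line.strip():
            writer.writerow(json_line_to_row(line, fields))

if __name__ == "__main__":
    # Decompression needs the third-party zstandard package; the dumps
    # are compressed with a large window, hence max_window_size.
    import zstandard
    with open("relationships_submissions.zst", "rb") as fh, \
         open("relationships_submissions.csv", "w",
              newline="", encoding="utf-8") as out:
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        stream = io.TextIOWrapper(dctx.stream_reader(fh), encoding="utf-8")
        write_csv(stream, out)
```

Because the decompressed stream is consumed line by line, this never holds the whole dump in memory; but, as noted below, the resulting CSV can still be far too large for Excel.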

@aryashah2k

aryashah2k commented Nov 28, 2021

Indeed, I am trying to figure that out. Just out of curiosity, are these files in NDJSON format (the files from the Academic Torrents Pushshift dumps)?
I am using the r/relationships data for my analysis.
Source: https://academictorrents.com/details/cbe9a74749406433ca5c7b29d0c003dafb91d02b

@Watchful1
Owner

Yes, these files are NDJSON compressed with ZStandard. But uncompressed, all together it's something like 30 gigabytes. Even if you put it all in a single CSV file, Excel couldn't open it. That's more RAM than most computers have, so unless you're using a program specifically suited to analyzing large amounts of data, it will struggle if it works at all.

With large amounts of data like this, it's important to have a specific plan for what analysis you want to do, then do it directly from the compressed files rather than trying to change it into some alternative format first.
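As one illustration of working directly from the compressed files, the sketch below streams the NDJSON lines and tallies a single field (for example, posts per author) without ever writing an intermediate CSV. The input file name is a hypothetical placeholder, and decompression again assumes the third-party `zstandard` package.

```python
import json
from collections import Counter

def count_by_field(json_lines, field="author"):
    """Stream NDJSON lines and tally occurrences of one field's values."""
    counts = Counter()
    for line in json_lines:
        if not line.strip():
            continue
        obj = json.loads(line)
        counts[obj.get(field, "[missing]")] += 1
    return counts

if __name__ == "__main__":
    import io
    import zstandard  # pip install zstandard
    # Hypothetical file name -- substitute your own dump file.
    with open("relationships_submissions.zst", "rb") as fh:
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        stream = io.TextIOWrapper(dctx.stream_reader(fh), encoding="utf-8")
        for author, n in count_by_field(stream, "author").most_common(10):
            print(author, n)
```

The same pattern extends to any streaming aggregation (counts per month, score histograms, keyword filters): parse one line, update a small accumulator, discard the line.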

@aryashah2k

Yes buddy, I realised that now, thanks. I will probably work out a way to analyze the data directly from the compressed files :)
