How to export content to csv? #1
Comments
That depends on what you're trying to do. Which script did you use? Which dump files do you have?
I think I have the same question as @anatoliivanov. What he is asking, and what I'm also trying to achieve, is to export all of the lines to a comma-separated values (CSV) file, so that I can view the data as an Excel sheet and then use it for data analysis, etc.
It's not a question with a single answer. It varies depending on what files you're processing, what filtering you want to do, what fields you want to output, etc. But generally speaking, this code is just intended as an example of reading the compressed files; actually doing something with the data once it's read has to be done by editing the script yourself.
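To make the "which fields, which filtering" point concrete, here is a minimal sketch of the kind of edit that comment describes. It is not the repository's script: the function name `ndjson_to_csv`, the `keep` filter, and the field names `author`, `subreddit`, and `score` in the usage note are all assumptions chosen for illustration. It takes any iterable of NDJSON text lines (for example, lines read from a decompressed dump) and writes only the chosen fields to CSV.

```python
import csv
import json


def ndjson_to_csv(lines, out_file, fields, keep=lambda obj: True):
    """Write the chosen fields of each NDJSON line to out_file as CSV.

    lines:    any iterable of text lines, one JSON object per line
    out_file: a writable text file object
    fields:   list of keys to output (hypothetical; pick your own)
    keep:     optional filter, called with each parsed object
    Returns the number of data rows written.
    """
    writer = csv.writer(out_file)
    writer.writerow(fields)  # header row
    written = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the run
        if not isinstance(obj, dict) or not keep(obj):
            continue
        # Missing keys become empty cells.
        writer.writerow([obj.get(field, "") for field in fields])
        written += 1
    return written
```

With a real output file, open it with `newline=""` as the `csv` module recommends, e.g. `open("out.csv", "w", newline="", encoding="utf-8")`, and pass something like `fields=["author", "score"]` and `keep=lambda o: o.get("subreddit") == "x"`. Filtering before writing keeps the CSV small enough to actually open.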
Indeed, I am trying to figure that out. Just out of curiosity, are these files (the Pushshift dumps from Academic Torrents) in NDJSON format?
Yes, these files are NDJSON compressed with Zstandard. But uncompressed all together it's something like 30 gigabytes. Even if you put it all in a single CSV file, Excel couldn't open it. That's more than most computers have RAM for, so unless you're using a program specifically suited to analyzing large amounts of data, it will struggle, if it works at all. With large amounts of data like this, it's important to have a specific plan for what analysis you want to do, then do it directly from the compressed files rather than trying to convert it into some alternative format first.
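As a rough sketch of what "directly from the compressed files" can look like, the code below streams a `.zst` file line by line and keeps only a running tally in memory. It assumes the third-party `zstandard` package is installed; the helper names `open_zst_text` and `count_by_field`, the large `max_window_size`, and the `subreddit` field in the usage comment are assumptions for illustration, not taken from the repository's scripts.

```python
import io
import json
from collections import Counter


def open_zst_text(path):
    """Return a text stream over a Zstandard-compressed file.

    Requires the third-party package: pip install zstandard
    """
    import zstandard  # imported lazily so count_by_field works without it

    fh = open(path, "rb")
    # Assumption: allow a large decompression window in case the file
    # was compressed with long-range settings.
    decompressor = zstandard.ZstdDecompressor(max_window_size=2**31)
    reader = decompressor.stream_reader(fh)
    return io.TextIOWrapper(reader, encoding="utf-8", errors="replace")


def count_by_field(lines, field):
    """Count how often each value of `field` occurs across NDJSON lines."""
    counts = Counter()
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or malformed lines
        if isinstance(obj, dict):
            counts[obj.get(field)] += 1
    return counts


# Hypothetical usage (file name and field are placeholders):
# with open_zst_text("dump.zst") as lines:
#     print(count_by_field(lines, "subreddit").most_common(10))
```

Only one line is held in memory at a time, so the size of the uncompressed data matters much less than it would for a spreadsheet.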
Yes buddy, I realised that now, thanks. I will probably work out some way to analyze the data directly from the compressed files :)
Hey @Watchful1, I ran the script to iterate over the contents of the zst dumps, but the output only shows the number of lines it has iterated over. How do I export the contents to a CSV file so that I can start using it for analysis and model building?