append a categorical with different categories to the existing #12509
This was discussed in #9927. I would be OK with adding this as a sub-section somewhere in the Cookbook. Would you like to do a pull request? You can point to the SO post and include a short inline version.
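A short inline version of the idea might look like the sketch below. The Series values are made up for illustration, and this is not necessarily the exact code from the SO answer. Concatenating categoricals whose categories differ gives an object-dtype result. If both sides are first given the union of their categories, the result keeps the `category` dtype.

```python
import pandas as pd

a = pd.Series(["daq", "ctrl", "daq"], dtype="category")
b = pd.Series(["trigger", "ctrl"], dtype="category")

# Concatenating categoricals whose categories differ falls back to object dtype.
naive = pd.concat([a, b], ignore_index=True)

# Build the union of both category sets, then give both Series the same categories.
union = a.cat.categories.union(b.cat.categories)
a2 = a.cat.set_categories(union)
b2 = b.cat.set_categories(union)

# With identical categories on both sides, the concatenation stays categorical.
combined = pd.concat([a2, b2], ignore_index=True)
```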
Thanks for the quick reply. I'm a physicist with no experience collaborating on a project as big as pandas. I would like to gain some experience by improving the docs, but I need to learn how.

Also, my problem goes a tiny step further than the SO question. I am parsing log files (1.5k files with 100M lines in total, 12 GB altogether) into a DataFrame, so I can get some insight into our experiment. I parse the log files one by one and would like to append each one to a table in an HDF5 file. Part of each log message is the name of the process that created it, and I know there are far fewer process names than lines, so I thought Categoricals would be a good fit here. (It might just be another form of efficient string storage... I don't know.)

I have no way of knowing the complete set of categories in advance. From your SO answer, I learned how to create subsequent Categoricals using an explicit set of categories. But I have not yet understood how, or whether, I can append those subsequent Categoricals to a table in an HDF5 file. (I should add an example here.)
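One way to handle categories that are not known in advance is to keep an append-only list of every process name seen so far and encode each file's chunk against that list. This is a sketch under our own assumptions. Because new names are only ever added at the end, the integer codes assigned to earlier chunks stay valid. `CategoryRegistry` is a hypothetical helper, not a pandas API, and the HDF5 side is not shown.

```python
import pandas as pd


class CategoryRegistry:
    """Hypothetical helper: an append-only list of categories shared by all chunks."""

    def __init__(self):
        self.categories = []
        self._seen = set()

    def encode(self, values):
        # Register unseen (non-missing) values at the end, so codes already
        # handed out to earlier chunks never change.
        for v in pd.Series(values, dtype=object).dropna().unique():
            if v not in self._seen:
                self._seen.add(v)
                self.categories.append(v)
        # Categorical against the full list seen so far; missing values get code -1.
        return pd.Categorical(values, categories=self.categories)


registry = CategoryRegistry()
chunk1 = registry.encode(["daq", "ctrl", "daq"])  # e.g. process names from file 1
chunk2 = registry.encode(["ctrl", "trigger"])     # file 2 introduces a new name
```

One option (untested here) is to store `chunk.codes` as a small integer column in the HDF5 table and save `registry.categories` alongside it. The strings can then be rebuilt later with `pd.Categorical.from_codes(codes, categories)`.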
The docs for contributing are here.
consistency across DataFrames Resolves pandas-dev#12509
Is this still open? I'm interested in working on this. Thanks!
It seems the overall goal @dneise wants to accomplish is to dynamically create and update a set of categories based on the data from multiple DataFrames. We haven't fully explored the codebase yet, but from a cursory look, there seem to be two potential ways to accomplish this.
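The sketch below shows one such approach. It may or may not match either of the two ways mentioned above. The idea is to collect the unique values of the column across all DataFrames, build one shared `CategoricalDtype`, and cast every frame to it before `pd.concat`, so the result keeps the `category` dtype. It assumes all frames (or at least their unique values) are available up front. `shared_dtype` and the column name `proc` are hypothetical.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def shared_dtype(frames, column):
    """Hypothetical helper: one CategoricalDtype covering `column` in every frame."""
    values = set()
    for frame in frames:
        values.update(frame[column].dropna().unique())
    # Sorted so the category order does not depend on which frame came first.
    return CategoricalDtype(sorted(values))


df1 = pd.DataFrame({"proc": ["daq", "ctrl"], "n": [1, 2]})
df2 = pd.DataFrame({"proc": ["trigger", "daq"], "n": [3, 4]})

dtype = shared_dtype([df1, df2], "proc")
# Cast each frame's column to the same dtype; identical dtypes survive pd.concat.
aligned = [df.astype({"proc": dtype}) for df in (df1, df2)]
combined = pd.concat(aligned, ignore_index=True)
```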
I just ran into the same problem as the person asking this question on SO:
http://stackoverflow.com/questions/29709918/pandas-and-category-replacement
Jeff gave an excellent answer as usual (I believe he is a pandas developer as well?).
So I was wondering whether something like his answer might be planned to become the default behaviour when appending Categoricals.
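For reference, pandas 0.19.0 and later provide `pandas.api.types.union_categoricals`, which combines Categoricals with different categories in a single call. The minimal sketch below uses made-up process names. It only illustrates that helper and does not answer whether this should become the default when appending.

```python
import pandas as pd
from pandas.api.types import union_categoricals

first = pd.Categorical(["daq", "ctrl", "daq"])
second = pd.Categorical(["trigger", "ctrl"])

# Concatenates the values and merges the two category sets in one call.
combined = union_categoricals([first, second])
```

Passing `sort_categories=True` sorts the merged categories lexically. Otherwise they are kept in the order they appear in the inputs' categories.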