Support customised category name instead of category 0, category 1 and etc #31

Open
GaaraZhu opened this issue Dec 20, 2017 · 8 comments

Comments

@GaaraZhu

For example, I want to generate a bar chart for the CSV below:

GaryZ@GaryZhus-MacBook-Pro-2 ~/Desktop $ cat test2.csv
,jack,jame
age,25,30
height,168,172

After running the command cat test2.csv | chart bar, it generates the chart below. Instead of showing jack and jame, it shows category 0 and category 1 at the bottom.

[Image: example]

@szaydel

szaydel commented Dec 20, 2017

I don't think column names are used as labels at the moment. I noticed something similar. Seems like a good enhancement. 👍

@marianogappa
Owner

This format is only common in CSVs, though; I wouldn't consider making this built-in behaviour.
It is also pretty much only useful in the context of multi-series categorical charts.
Nevertheless, I agree it's a good idea.

How would you guys have it implemented? I'm leaning towards a command-line option flag to interpret the first line of STDIN as category labels.
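
A minimal sketch of that idea in Go, assuming a comma separator. `splitHeader` is a hypothetical helper, not part of chart: it takes the first line as labels and returns the rest as data. Empty cells in the header, such as the leading one in the CSV from the original post, are dropped.

```go
package main

import (
	"fmt"
	"strings"
)

// splitHeader is a hypothetical helper, not part of chart. It treats the
// first line as category labels and returns the remaining lines as data.
func splitHeader(lines []string, sep string) (labels []string, data []string) {
	if len(lines) == 0 {
		return nil, nil
	}
	for _, cell := range strings.Split(lines[0], sep) {
		cell = strings.TrimSpace(cell)
		if cell != "" { // drop empty cells such as the corner of a labelled table
			labels = append(labels, cell)
		}
	}
	return labels, lines[1:]
}

func main() {
	input := []string{",jack,jame", "age,25,30", "height,168,172"}
	labels, data := splitHeader(input, ",")
	fmt.Println(labels)    // [jack jame]
	fmt.Println(len(data)) // 2
}
```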

@szaydel

szaydel commented Dec 21, 2017

I think there are a number of challenges with this. One obvious one to me is that even if one of the variables is discrete, it may still have many unique values. I think this should be capped somehow, e.g. top 5 or top 10. Implementing this as an option flag with the ability to specify the column number is one reasonable approach; either way, I would certainly allow passing in the column number to use.

@marianogappa
Owner

Another possibility is e.g.:

chart bar --categories "Jack,Jame"

This works for datasets that are not CSVs as well, but requires writing the categories manually.

In the original solution, there wouldn't be a "which column number to use" problem in my opinion, because chart parses and knows which columns are floats, so the category name for each float column would always be in the first row and the same column number as that float.
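
To illustrate the column-matching argument, here is a rough sketch in Go. `floatColumnLabels` is a hypothetical helper and not chart's actual parser; it assumes the rows are already split into cells and simply pairs each float column with the header cell at the same index.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// floatColumnLabels is a hypothetical helper, not chart's actual parser.
// For every column of dataRow that parses as a float, it returns the header
// cell found at the same column index.
func floatColumnLabels(headerRow, dataRow []string) []string {
	var labels []string
	for i, cell := range dataRow {
		if _, err := strconv.ParseFloat(strings.TrimSpace(cell), 64); err != nil {
			continue // not a float column, so it carries no category
		}
		if i < len(headerRow) {
			labels = append(labels, strings.TrimSpace(headerRow[i]))
		}
	}
	return labels
}

func main() {
	header := strings.Split(",jack,jame", ",")
	row := strings.Split("age,25,30", ",")
	fmt.Println(floatColumnLabels(header, row)) // [jack jame]
}
```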

@marianogappa
Owner

cc #17

@szaydel

szaydel commented Dec 22, 2017

I think my issue with having this information in the data is that you are technically requiring its use, instead of making it purely optional with an argument like --categories "Jack,Jame" or --categories="Jack,Jame". My preference, I think, would be to make it optional without the format dependency, because format dependencies may mean that other tools stop working on the same data, or that two or more copies of the data have to be kept, etc.

@Kuraio
Contributor

Kuraio commented Nov 11, 2018

Could we combine both by having --header when the input has the categories in the first row and --categories="Jack,Jame" when we want to manually add it?
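
One way the two flags could coexist, sketched with Go's standard `flag` package. This is an assumption about how it might be wired up, not chart's real option parsing; `resolveCategories` and the precedence rule (an explicit list wins over the header) are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// resolveCategories is a hypothetical sketch, not chart's real option parsing.
// An explicit --categories list wins; otherwise --header takes the labels
// from the first input line.
func resolveCategories(args []string, firstLine string) ([]string, error) {
	fs := flag.NewFlagSet("chart", flag.ContinueOnError)
	header := fs.Bool("header", false, "read category labels from the first input line")
	categories := fs.String("categories", "", "comma-separated category labels")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	switch {
	case *categories != "":
		return strings.Split(*categories, ","), nil
	case *header:
		return strings.Split(firstLine, ","), nil
	default:
		return nil, nil // keep the default "category N" names
	}
}

func main() {
	fromFlag, _ := resolveCategories([]string{"--categories=Jack,Jame"}, "a,b")
	fromHeader, _ := resolveCategories([]string{"--header"}, "jack,jame")
	fmt.Println(fromFlag, fromHeader) // [Jack Jame] [jack jame]
}
```

With a plain string flag like this, both `--categories "Jack,Jame"` and `--categories="Jack,Jame"` are accepted.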

@marianogappa
Owner

Note that the example in the original post is one weird csv: generally in a csv you'd have a data point per line, but instead there's one column per line. I'm not convinced we should support that format as it's very rare. Also note that at the moment csvs are barely supported, because field escaping is not supported (i.e. fields wrapped in quotes or other separators).
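
For reference, Go's standard `encoding/csv` reader already handles quoted fields. The sketch below only shows what that looks like; `parseCSV` is a hypothetical wrapper, the sample data is made up, and whether chart could adopt this reader is an open question.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseCSV reads all records with Go's standard encoding/csv reader, which
// handles quoted fields. It is a hypothetical wrapper, not part of chart.
func parseCSV(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	return r.ReadAll()
}

func main() {
	// The quoted field contains the separator and must stay one field.
	records, err := parseCSV("name,value\n\"Doe, Jane\",25\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(len(records[1]), records[1][0]) // 2 Doe, Jane
}
```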

However, I agree that support for category labels is desirable. @Kuraio's suggestion sounds good, although I'd first check if it's possible to give the two flags the same name, with the string being optional, and if not then to have a similar name for both.
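
If the options were parsed with Go's standard `flag` package (an assumption; this is not chart's actual code), a single flag with an optional string is possible through a custom `flag.Value` whose `IsBoolFlag` method returns true. The `categoriesFlag` type and `parseCategories` helper below are hypothetical.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// categoriesFlag is a hypothetical flag.Value. Because IsBoolFlag returns
// true, the flag package accepts it with or without "=value".
type categoriesFlag struct {
	fromHeader bool     // bare --categories: read labels from the first row
	labels     []string // --categories=a,b: use these labels
}

func (c *categoriesFlag) String() string   { return strings.Join(c.labels, ",") }
func (c *categoriesFlag) IsBoolFlag() bool { return true }

// Set receives "true" for the bare form and the literal text otherwise, so a
// single label literally named "true" cannot be expressed in this sketch.
func (c *categoriesFlag) Set(v string) error {
	if v == "true" {
		c.fromHeader = true
		return nil
	}
	c.labels = strings.Split(v, ",")
	return nil
}

func parseCategories(args []string) (*categoriesFlag, error) {
	c := &categoriesFlag{}
	fs := flag.NewFlagSet("chart", flag.ContinueOnError)
	fs.Var(c, "categories", "category labels, or no value to use the first row")
	return c, fs.Parse(args)
}

func main() {
	bare, _ := parseCategories([]string{"--categories"})
	explicit, _ := parseCategories([]string{"--categories=Jack,Jame"})
	fmt.Println(bare.fromHeader, explicit.labels) // true [Jack Jame]
}
```

The trade-off is that the space-separated form `--categories "Jack,Jame"` would no longer be read as a value: the `flag` package treats such a flag as boolean and leaves the string as a positional argument, so only the `=` form carries labels.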

4 participants