Improve the R code in parallel coordinates plot and correct the principal components dialog. #8985

Closed
rdstern opened this issue May 22, 2024 · 2 comments · Fixed by #8993

@rdstern
Collaborator

rdstern commented May 22, 2024

@Vitalis95 I would like to improve the R code that we generate and use in the Describe > Graphs > Parallel Coordinates Plot.

There is also a serious bug in the R code generated in the principal components dialog, which I hope can be corrected. The commands seem to be duplicated and don't work.

The bug is quite urgent to fix. We can't release a new version until that is done. At the same time, the multivariate dialogs (using the FactoMineR package) have a similar feature to the parallel coordinates plot function. Hence my suggestion below may also help the R code for those dialogs.

I use the decathlon data from FactoMineR, where I have added a special factor column. Here are the data:

decathlon3.zip

The problem in the R code we currently generate is that the GGally::ggparcoord function prefers to use the numbers of the columns in the data frame, rather than their names. (I think @Vitalis95 already found this to be a problem in the multivariate dialogs, and he may have a better solution than my suggestion below.)

@lilyclements could you please check my suggestion and improve the R code towards the bottom if you can? Here is the current R code for an example I produced here:

image

# Dialog: Parallel Coordinate Plot

decathlon <- data_book$get_data_frame(data_name="decathlon")

last_graph <- GGally::ggparcoord(data=decathlon, columns=c("X100m","Long.jump","Shot.put","High.jump","X400m","X110m.hurdle","Discus","Pole.vault","Javeline","X1500m"),
 groupColumn="rank3", scale="centerObs", missing="exclude", order="skewness") + theme_grey()

 data_book$add_object(data_name="decathlon", object_name="last_graph", object_type_label="graph", object_format="image", object=check_graph(graph_object=last_graph))
data_book$get_object_data(data_name="decathlon", object_name="last_graph", as_file=TRUE)
rm(list=c("last_graph", "decathlon"))

The problem with the code is the order="skewness" argument. This presents the columns (on the x-axis) in the order of their skewness, while I would like them to be in the order I gave in the dialog. I think what I want is the default if the columns argument is numeric, here c(1:10). So replacing the main lines by:

last_graph <- GGally::ggparcoord(data=decathlon, columns=c(1:10),
 groupColumn="rank3", scale="centerObs", centerObsID = 1, missing="exclude", order=c(1:10)) + theme_grey() 

gives the graph I would like.

Here is "my" solution (of course using Stack Overflow):

decathlon <- data_book$get_data_frame(data_name="decathlon")
column_numbers <- match(c("X100m","Long.jump","Shot.put","High.jump","X400m","X110m.hurdle","Discus","Pole.vault","Javeline","X1500m"),names(decathlon))

last_graph <- GGally::ggparcoord(data=decathlon, columns=column_numbers,
 groupColumn="rank3", scale="Std", centerObsID = 1, missing="exclude", order=column_numbers) + theme_grey()
 
 data_book$add_object(data_name="decathlon", object_name="last_graph", object_type_label="graph", object_format="image", object=check_graph(graph_object=last_graph))
data_book$get_object_data(data_name="decathlon", object_name="last_graph", as_file=TRUE)
rm(list=c("last_graph", "decathlon"))

@Vitalis95 If Lily is OK with the code, can you please make the changes? The main change is the line with the match command, which gives the variable holding the required column numbers. Note also:

  1. I have added centerObsID = 1 into the command. Could you also include this? It is ignored except for one scale option, but does no harm. (I don't want to complicate the dialog and this will be an easy tweak when needed.)
  2. I presume we should include the column_numbers variable in what is removed in the last line?
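
The name-to-number lookup and the cleanup mentioned in the two notes above can be sketched in base R. This is only an illustration on a small made-up data frame (`toy` and `selected` are hypothetical names, not part of the dialog's generated code). It shows that `match()` returns the positions in the order the names were given, and that a misspelled name gives `NA`, so a guard may be worth adding.

```r
# Hypothetical toy data frame standing in for decathlon (values are made up).
toy <- data.frame(
  rank3  = factor(c("a", "b", "a")),
  X100m  = c(11.0, 10.8, 11.2),
  Discus = c(43.7, 50.7, 49.0),
  X1500m = c(291.7, 301.5, 300.2)
)

# Column names in the order chosen in the dialog (assumed order, for illustration).
selected <- c("X1500m", "X100m", "Discus")

# match() gives the position of each name in names(toy), keeping the dialog order.
column_numbers <- match(selected, names(toy))
print(column_numbers)  # 4 2 3

# A name that is not in the data frame yields NA, so check before using the indices.
if (anyNA(column_numbers)) {
  stop("Unknown column(s): ", paste(selected[is.na(column_numbers)], collapse = ", "))
}

# column_numbers would then be passed as both `columns` and `order` to ggparcoord.
# Remove the helper variable along with the other temporary objects.
rm(list = c("column_numbers"))
```

Removing `column_numbers` in the same `rm()` call as `last_graph` and the data frame keeps the workspace clean after the dialog has run.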
@rdstern rdstern added this to the 0.7.20 milestone May 22, 2024
@rdstern
Collaborator Author

rdstern commented May 23, 2024

@lilyclements when we next meet, here is a topic for you to advise quickly.

image

In our plot, how could I emphasise some levels of a factor, so that those lines stand out more? The examples in the guide seem to indicate one way, and it is a general change, because the initial output is just a ggplot. I hope it will be just a few minutes' play when we next meet?
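
One possible approach, sketched here as an assumption rather than a tested R-Instat solution: because the output is a ggplot and the group column is mapped to colour, a manual colour scale can give the chosen levels a strong colour and mute the rest. `highlight_colours` is a hypothetical helper written for this sketch, and the built-in `iris` data stands in for the decathlon data.

```r
# Hypothetical helper (not part of GGally or R-Instat): build a named colour
# vector where the highlighted levels get a strong colour and the rest are grey.
highlight_colours <- function(levels, highlight, strong = "red", muted = "grey80") {
  cols <- ifelse(levels %in% highlight, strong, muted)
  names(cols) <- levels
  cols
}

# iris is used as a stand-in data set; "setosa" is the level to emphasise.
cols <- highlight_colours(levels(iris$Species), highlight = "setosa")

# Only build the plot if the packages are installed.
if (requireNamespace("GGally", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  p <- GGally::ggparcoord(data = iris, columns = 1:4, groupColumn = "Species") +
    ggplot2::scale_colour_manual(values = cols)
}
```

The same named vector could be reused for any factor column passed as `groupColumn`, so the dialog itself would not need to change.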

@Vitalis95
Contributor

@lilyclements, the line of code suggested by Roger for getting the column indexes is simple and straightforward. I can implement it that way.
