Metavariables are not excluded in feature selection methods #7009
Now visualization scoring (VizRank) uses all features that a user can choose manually. But I do agree: in some cases there can be lots of combinations that we are surely not interested in, and having some optional filtering would be nice. I remember the development team discussing this long ago. I am not quite sure, but I think we used to have the opposite problem: metas were ignored even when we wanted to have them considered.
Selecting metadata for visualization is useful, but having them permuted in
the model suggestions is not good! Metadata should remain as metadata,
serving solely for interpretation purposes. This includes decorating plots
with colors, labels, and annotations. Relevant predictors should be
included in the X-block. It might be worth considering placing variables
either in the X-block or in the metadata bucket. This approach would allow
any variable to be used for modeling or proposing features, or for
decorative purposes.
(Sent by email in reply to Marko Toplak's comment of Thu, 30 Jan 2025, 14:57.)
Coming from digital humanities, we often do data discovery, text mining and so on rather than prediction. There are many interesting variables that are not used for text mining (for example) but that we still wish to explore. Imagine having a corpus of parliamentary speeches where you would also like to explore the structure of the parliament (through the metadata on the speakers). Of course, this could be achieved by shuffling data back and forth with Select Columns, but that leads to more work, just like the issue above. I suggest having a checkbox in VizRank that excludes metas, on by default.
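The suggested option could be sketched roughly as follows. This is plain Python, not Orange's VizRank code: `candidate_pairs` and `exclude_metas` are hypothetical names, and the variable names in the example are invented for the parliamentary-speech scenario.

```python
from itertools import combinations


def candidate_pairs(attributes, metas, exclude_metas=True):
    """Return the variable pairs a ranking would score.

    Hypothetical helper: with exclude_metas=True (the proposed default),
    only regular attributes are combined; otherwise metas join the pool.
    """
    pool = list(attributes)
    if not exclude_metas:
        pool += list(metas)
    return list(combinations(pool, 2))


# Invented example variables for a corpus of parliamentary speeches.
attributes = ["word_count", "sentiment", "topic_score"]
metas = ["speaker", "party"]

default_pairs = candidate_pairs(attributes, metas)
all_pairs = candidate_pairs(attributes, metas, exclude_metas=False)
```

With the box checked, only pairs of regular attributes are scored; unchecking it brings pairs such as `("speaker", "party")` back into the ranking without a detour through Select Columns.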
I agree with all sides. @markotoplak remembers correctly. This is indeed a "longstanding issue" in the sense that it used to be different (many) years ago.

There are arguments for both sides; at this very moment I agree that visualizations are basically models, so meta variables should appear only as decoration. But I know I'll change my mind again. Without a convincing argument (or a solution that satisfies both sides), we shouldn't switch back and forth every few years.

I would prefer not to add any check boxes. Once we open the floodgates, their number will rise quickly. Furthermore, a checkbox like this would be error-prone because the implementation would be too difficult: the state of the checkbox changes the content of the model and thus affects context matching. On the other hand, the check box state is itself a part of the context. Even if we can decide how to resolve this in principle, I predict we'll never resolve all the edge cases in the code.
A longstanding issue is that metavariables are not excluded from feature selection methods. For example, in "find informative projections" for scatter plots, they appear as suggestions. Metas are also included in feature suggestions. If there are many metas, the automatic feature selection breaks down. This is a nuisance, as metas often contain the solution to a classification problem. "Find informative mosaics" has the same issue, as does the violin plot, where ordering by relevance also includes metas. Tree prediction does ignore them, though.
I am currently using version 338 on a Mac, and this error is present in the PC version as well.
This issue has existed in every version of Orange that I can recall.
Best,
larerooreal
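To illustrate why a meta that "contains the solution" crowds out genuine predictors in a relevance ranking, here is a toy sketch in plain Python. It does not reproduce Orange's scoring: `majority_accuracy` is a hypothetical stand-in for a relevance score, and the data and column names are invented.

```python
from collections import Counter, defaultdict


def majority_accuracy(values, labels):
    """Toy relevance score: share of rows classified correctly when each
    distinct value predicts the most common label seen with it."""
    by_value = defaultdict(Counter)
    for value, label in zip(values, labels):
        by_value[value][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_value.values())
    return correct / len(labels)


# Invented data: "ward" plays the role of a meta that mirrors the class.
labels = ["sick", "sick", "healthy", "healthy", "sick", "healthy"]
columns = {
    "age_group": ["old", "young", "old", "young", "old", "young"],
    "ward": ["A", "A", "B", "B", "A", "B"],
}

scores = {name: majority_accuracy(vals, labels) for name, vals in columns.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this example the meta-like column `ward` gets a perfect score and tops the ranking, even though it would normally serve only for interpretation.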