I'm interested in assembling a rule-based recommender for PennAI. There are a number of themes that I think will ultimately be important to a well-functioning recommender.
* Meta-features of the dataset (e.g., sample size, number of features) will be important for making early recommendations on a new dataset. At that point we know nothing about the dataset other than these meta-features: we don't know whether its signal is clean or noisy, simple or complex, univariate or multivariate, etc. Early recommendations should therefore weigh most heavily on the available meta-feature information.
* The recommender should always start with the simplest, fastest algorithm and parameter settings (or one of them), essentially assuming that the data pattern may be a simple one. If it turns out not to be simple, we then have a logical place to build from (see the first sketch after this list).
* After the first recommendation, I think it will be important for the recommender to focus on changes in metric performance as it transitions from one ML method to another or from one parameter setting to another. The system should both learn from these performance differences and apply them when deciding what to recommend next, based on the differences it has observed so far while modeling the current dataset of interest (see the performance-delta sketch below).
* One potentially good way to approach this is to maintain a number of evidence categories for each new dataset under analysis. These categories would be general themes understood to be important factors in the success of one ML algorithm or parameter setting over another, e.g., small vs. large feature space, noisy vs. clean problem, simple vs. complex associations, missing vs. complete data, classification vs. regression. Over the course of making recommendations, the algorithm would update its evidence about where the dataset lies in each category, and these probabilities would feed into choosing the next machine learner (and parameters); see the evidence-category sketch below.
* Ultimately there are two kinds of recommendations we want the system to make. The first is a starting-point recommendation: what single run or set of initial runs do we want to test on a new dataset? The second, after the first or first few analyses, is: what is the next best ML method or parameter setting to test? These are almost two separate prediction problems that will need to be handled differently by the recommender.
* There should be a mechanism built into the recommender that notices when performance improvements have stagnated despite educated recommendations. At that point the recommender switches to a random, exploratory recommendation mode, picking an ML method or parameter setting that it hasn't tried and that may have little or no evidence supporting its selection. Whether this random approach continues to be used would depend on whether any new performance improvements are observed after the random choice (see the stagnation sketch below).
* Regarding a rule-based method, I might look into chained rules, where one rule can activate another. I think this approach could be useful in this context (a small forward-chaining sketch closes out the list below).
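
To make the first two points concrete, here is a minimal sketch of what meta-feature-driven starting rules might look like. The meta-feature keys, thresholds, and learner choices are placeholders for illustration, not anything PennAI currently exposes:

```python
# Hypothetical sketch: pick a simple, fast first learner from dataset meta-features.
# The meta-feature keys, thresholds, and learner names are illustrative placeholders.

def recommend_first_run(metafeatures):
    """Return an (algorithm, parameters) pair for the very first run on a dataset."""
    n_samples = metafeatures["n_samples"]
    n_features = metafeatures["n_features"]

    if n_features > n_samples:
        # Wide data: a regularized linear model is cheap and tolerates p >> n.
        return ("LogisticRegression", {"penalty": "l2", "C": 1.0})
    # Otherwise default to a shallow tree: fast, simple, and easy to build on.
    return ("DecisionTreeClassifier", {"max_depth": 3})


recommend_first_run({"n_samples": 500, "n_features": 20})
# -> ("DecisionTreeClassifier", {"max_depth": 3})
```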
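For the performance-difference idea, one simple option is to key observed score deltas by the (from, to) transition and favor transitions that have paid off on the current dataset. This is only an illustrative sketch; the function names and data structure are assumptions:

```python
from collections import defaultdict

# Hypothetical sketch: record the score change each time the recommender switches
# algorithm or parameters, keyed by the transition, so later picks can favor
# transitions that have paid off on this particular dataset.

deltas = defaultdict(list)  # (from_config, to_config) -> list of observed score changes

def record_transition(deltas, prev, new, prev_score, new_score):
    deltas[(prev, new)].append(new_score - prev_score)

def best_next(deltas, current, options):
    """Among candidate options, prefer the transition with the best mean observed delta."""
    scored = [
        (sum(d) / len(d), opt)
        for opt in options
        if (d := deltas.get((current, opt)))
    ]
    return max(scored, key=lambda t: t[0])[1] if scored else None
```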
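The evidence-category idea could start as simply as a dictionary of beliefs that get nudged after each run. The category names, starting values, and the simple update rule below are only illustrative:

```python
# Hypothetical sketch of the evidence-category idea: each category starts at an
# uninformative 0.5 and is nudged toward 0 or 1 as runs complete. The category
# names and the exponential update are assumptions for illustration.

evidence = {
    "large_feature_space": 0.5,
    "noisy": 0.5,
    "complex_associations": 0.5,
    "has_missing_data": 0.5,
}

def update_evidence(evidence, category, observation, rate=0.2):
    """Nudge the belief for one category toward an observed value in [0, 1]."""
    evidence[category] += rate * (observation - evidence[category])
    return evidence

# Example: a big score jump from a linear model to a tree ensemble is weak
# evidence that the associations in this dataset are complex.
update_evidence(evidence, "complex_associations", observation=1.0)
```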
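For the stagnation check, something epsilon-greedy-ish might be enough: if the best score hasn't moved in a while, pick an untried configuration at random. Again, a rough sketch with made-up names and default thresholds:

```python
import random

# Hypothetical sketch of the stagnation check: if the best score hasn't improved by
# more than `tol` over the last `patience` runs, fall back to a random, untried
# (algorithm, parameters) choice. All names and defaults are placeholders.

def exploratory_pick(history, candidates, tried, patience=3, tol=0.005):
    """history: best-so-far scores, one entry per completed run."""
    stagnated = (
        len(history) > patience
        and history[-1] - history[-1 - patience] <= tol
    )
    if stagnated:
        untried = [c for c in candidates if c not in tried]
        if untried:
            return random.choice(untried)  # low/no-evidence exploratory pick
    return None  # None means: fall through to the usual evidence-driven pick
```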
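And for chained rules, a tiny forward-chaining loop shows the basic mechanism, where firing one rule can assert a fact that activates another. The facts and rules here are made up for illustration:

```python
# Hypothetical sketch of forward chaining: firing one rule can assert a fact that
# enables another rule on the next pass. The facts and rules are made up.

def forward_chain(facts, rules):
    """rules: list of (conditions, conclusion) pairs over a set of fact strings."""
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)  # this new fact may activate other rules
                changed = True
    return facts

rules = [
    ({"n_features > n_samples"}, "prefer_regularization"),
    ({"prefer_regularization", "classification"}, "try_penalized_logistic"),
]
print(forward_chain({"n_features > n_samples", "classification"}, rules))
# -> includes "try_penalized_logistic" via the chained rule
```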