Closed
Labels: perf (Performance and Benchmarking related), question (Further information is requested)
Description
I am attempting to train random forests on a fixed 6GB data set of features, with N different labels, using M different random forest parameter settings. The time appears to be overwhelmingly dominated by the disk transpose operation, which occurs N * M times, when ideally it should only be done once (the feature set is common to all models).
To rectify this, is there any way to either:
- train multiple random forests in the same pipeline, or,
- share the transposed data object between multiple training pipelines?
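For reference, here is a minimal sketch of the desired workflow using a scikit-learn-style API (the actual library in question is not named here, so this is only an illustration of the pattern): the feature matrix is prepared once, then reused across all N * M fits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the 6GB feature set: prepared ONCE, up front.
X, _ = make_classification(n_samples=200, n_features=10, random_state=0)

# N different label vectors (here derived synthetically for illustration).
labels = [(X[:, 0] > 0).astype(int), (X[:, 1] > 0).astype(int)]

# M different random forest parameter settings.
param_grid = [{"n_estimators": 10}, {"n_estimators": 25}]

# All N * M models reuse the same in-memory X; no per-model data prep.
models = []
for y in labels:
    for params in param_grid:
        clf = RandomForestClassifier(random_state=0, **params)
        models.append(clf.fit(X, y))
```

The point of the sketch is that the expensive data preparation happens outside the double loop; the question is whether the pipeline API in question allows the equivalent sharing of the transposed data object.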