Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Ballista does not currently support caching: df.cache() does not produce the expected behaviour. Until we settle on a caching strategy for Ballista, I suggest disabling the functionality.
Describe the solution you'd like
DataFusion 52 comes with a method for overriding the cache factory, SessionState::with_cache_factory. We should use it to override the cache behaviour when the SessionContext is created, and return an error if df.cache() is called.
EDIT: maybe we could introduce a logical plan extension that represents a distributed cache, so that users can plug in their own behaviour on the scheduler side.
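The first proposal, rejecting df.cache() through a custom cache factory, could look roughly like the sketch below. It is self-contained and uses stand-in types only: `Plan`, `PlanError`, `Session`, the `CacheFactory` trait shape and `RejectingCacheFactory` are all hypothetical names chosen for illustration, not DataFusion's real API. The actual trait and its signature should be taken from the DataFusion 52 docs and the example in datafusion#19139.

```rust
use std::fmt;
use std::sync::Arc;

// Stand-in for a logical plan (hypothetical; not DataFusion's LogicalPlan).
#[derive(Debug, Clone, PartialEq)]
struct Plan(String);

// Stand-in error type (hypothetical; not DataFusionError).
#[derive(Debug, PartialEq)]
enum PlanError {
    NotImplemented(String),
}

impl fmt::Display for PlanError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanError::NotImplemented(msg) => write!(f, "not implemented: {msg}"),
        }
    }
}

impl std::error::Error for PlanError {}

// Assumed shape of a cache factory: given the plan to cache, return the plan
// that should replace it. The real DataFusion trait may differ.
trait CacheFactory: Send + Sync {
    fn create(&self, plan: Plan) -> Result<Plan, PlanError>;
}

// Factory that refuses every cache request, so the user gets an explicit
// error instead of silently incorrect behaviour.
struct RejectingCacheFactory;

impl CacheFactory for RejectingCacheFactory {
    fn create(&self, _plan: Plan) -> Result<Plan, PlanError> {
        Err(PlanError::NotImplemented(
            "df.cache() is not supported by Ballista yet".to_string(),
        ))
    }
}

// Stand-in for the session: holds an optional cache factory override.
struct Session {
    cache_factory: Option<Arc<dyn CacheFactory>>,
}

impl Session {
    fn new() -> Self {
        Self { cache_factory: None }
    }

    // Mirrors the idea of installing the override when the session is built.
    fn with_cache_factory(mut self, factory: Arc<dyn CacheFactory>) -> Self {
        self.cache_factory = Some(factory);
        self
    }

    // Stand-in for df.cache(): delegates to the factory when one is set.
    fn cache(&self, plan: Plan) -> Result<Plan, PlanError> {
        match &self.cache_factory {
            Some(factory) => factory.create(plan),
            // Placeholder for the default behaviour: return the plan unchanged.
            None => Ok(plan),
        }
    }
}
```

With the rejecting factory installed, `Session::new().with_cache_factory(Arc::new(RejectingCacheFactory)).cache(plan)` returns the `NotImplemented` error, which is the behaviour proposed here for Ballista sessions until a distributed caching strategy exists.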
Describe alternatives you've considered
The alternative would be to implement cache behaviour. That should be done eventually, but it is a non-trivial task that depends on a few other things, and until it lands we would keep the incorrect behaviour.
Additional context
- This task depends on Update to DataFusion v.52 #1357
- There is also an example of how to create a cache factory in doc: add example for cache factory datafusion#19139