Vega, Datashader, and Holoviews Collaboration #67
That sounds pretty interesting. I don't know enough about datashader and holoviews to say anything smart before I take a closer look at their models.
I am just starting to look into them, so if I got any of my summary wrong, I would appreciate being corrected by any of the authors.
Could you clarify how you think holoviews could be translated to SQL for omnisci?
I don't think it would be. I think for this to help omnisci directly, there would have to be an omnisci backend for datashader, and I am not sure how that would work. Possibly as some UDFs that run on their server, but I would defer to the datashader devs on whether they have done any work running against existing databases. This would also be helpful for other datashader backends. In the example, it is running off of a Parquet file.
Based on that very interesting meeting (which has already started to fade from my memory, alas!), I think there were a few things that were clear about how Vega, Datashader, and HoloViews could relate:
Note that for option 6, this recent addition to HoloViews may be relevant: holoviz/holoviews#3967. It supports storing a HoloViews transformation pipeline in a re-playable, semi-declarative form. (Only semi-declarative because, although it is a text-based spec, it is really just a recipe for function calls; but at least it is constrained and introspectable, and thus potentially mappable between different declarative systems...)
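The idea of a re-playable, semi-declarative pipeline (a constrained recipe of named operations and keyword arguments, rather than opaque callbacks) can be sketched in a few lines of plain Python. This is a toy model only, not the actual HoloViews API; all names here are made up for illustration:

```python
# Toy sketch of a replayable pipeline: a whitelist of named operations,
# and a pipeline stored as (operation-name, kwargs) pairs. Because the
# recipe is just data, it can be serialized, inspected, and replayed
# against fresh input, unlike an arbitrary Python callback.
OPS = {
    "filter_range": lambda rows, lo, hi, col: [r for r in rows if lo <= r[col] <= hi],
    "count": lambda rows: len(rows),
}

def replay(pipeline, rows):
    """Replay a recorded pipeline (a list of steps) against new data."""
    out = rows
    for name, kwargs in pipeline:
        out = OPS[name](out, **kwargs)
    return out

rows = [{"x": 1}, {"x": 5}, {"x": 9}]
pipeline = [
    ("filter_range", {"lo": 0, "hi": 6, "col": "x"}),
    ("count", {}),
]
n = replay(pipeline, rows)  # keeps x=1 and x=5, so n == 2
```

Because every step is restricted to a known vocabulary of operations, a recipe like this is exactly the kind of thing that could, in principle, be mapped onto another declarative system's transform vocabulary.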
Ah great, yeah that is very useful. Just curious, what was the impetus for this addition? Is it a similar use case?
Pipeline capturing was added to support replaying the data transformations behind a visible plot, specifically for selecting a subset of the data in one plot and having that same subset reflected in various other plots derived from the same data. See holoviz/holoviews#3951.

E.g. if you have 6 columns and some Datashaded plots of various dimensions against other dimensions and see something interesting in one specific plot, the original data points leading to that plot are no longer available (having been rasterized away). But you can still select a region of that plot and replay the full pipeline to update each of the other plots to show only the points that fall in that region for those dimensions, without ever having to send the full data down to the browser, and whether or not those dimensions are actually shown in the other plots.

Moreover, linked selections like this can simply be enabled without any user-written callback code; they will simply be available if someone wants their plots to work like that. I think this support will cover many of the reasons that people set up a custom dashboard in the first place, with essentially zero code. But in general, having the full provenance and a reproducible recipe for each plot from a source dataset is likely to be valuable for lots of other purposes we haven't even contemplated yet. E.g. I'm hoping it can be extended to cover "drilling down" use cases with almost no coding as well, which is the second big reason people write custom dashboards (after linked selections). @jonmmease can comment on that one...
Oh, I guess the third big reason people write dashboards has always been covered by HoloViews already, which is to show a plot that shows a slice of a multidimensional dataset, with values for the dimensions not shown in the plot being selected by widgets. That's just always worked in HV but otherwise would require writing widget code, so I tend to forget about that even more common case. |
Yeah, this should be possible for many use cases. Here's a good overview of how Spotfire handles configuring custom drill-down dashboards using their GUI menus (https://www.youtube.com/watch?v=a5FMokQ2CR0). The machinery we'll have in place when holoviz/holoviews#3951 is finished should be a suitable foundation for these more flexible workflows. Of course, we'll need to work out a reasonable API for the user to provide the kind of marking/limiting/combining options that Spotfire's menus provide.
@jonmmease, sounds great!
Hey datashader folks! Just wanted to point you to this new issue where some discussion of adding rasterization primitives to Vega Lite is taking place: vega/vega-lite#6043 Since you all have a lot of experience designing this kind of API, I would be curious if you have any feedback on the proposal there. |
We had a call a few weeks ago with @jbednar @tonyfast @dharhas @philippjfr to discuss different ways datashader and holoviews could be useful to the work we are doing with Omnisci. I was particularly interested in whether all the work that has gone into creating these interactive rasterized geospatial plots (the NYC Taxi example) could be reused for our current work getting interactive vega visualizations to execute on a python backend.
My takeaway from the conversation is that datashader is all about taking some data and rasterizing it. If we want to think of this in terms of transformations on the data, it is like doing a groupby by pixel and then displaying some aggregate.
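That "groupby by pixel" framing can be sketched in plain NumPy. This is a toy stand-in, not Datashader's actual implementation: binning points onto a fixed raster and counting per bin is conceptually what a `count()` aggregation over a scatter plot does.

```python
import numpy as np

# Hypothetical data: 10,000 points scattered over the unit square.
rng = np.random.default_rng(0)
x = rng.random(10_000)
y = rng.random(10_000)

# "Group by pixel": bin the points onto a 100x100 raster and count how
# many fall in each bin. The resulting 2D array of counts is the
# aggregate that gets colormapped into an image.
counts, _, _ = np.histogram2d(x, y, bins=100, range=[[0, 1], [0, 1]])

# Every input point lands in exactly one pixel bin.
assert counts.sum() == 10_000
```

Swapping `count` for `mean`, `sum`, etc. over some value column gives the other per-pixel aggregates; the key point is that the reduction is keyed on pixel coordinates rather than on a data column.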
And Holoviews, despite its name, is at its core not about viewing data, but about transforming it. The key idea is to maintain enough semantic knowledge about the data as we transform it so that appropriate visualizations are implicit in the data encoding.
So if we think about holoviews as a way of transforming data, with datashader being one particular type of transform that is heavily optimized, then we can see where this fits in our current pipeline. What do we use currently for transforming data? We take Vega transforms and map them to ibis expressions. Instead, we could take Vega transforms and map them to holoviews calls. So holoviews wouldn't be used on the frontend for visualizing at all; it would just be a backend library to do the appropriate transforms, which vega would call out to when it needed to transform data. If we wanted to use our existing pipeline directly, we could try to write an Ibis backend for holoviews. However, there might be too much impedance mismatch between the grammar of ibis and that of holoviews, so instead we could write a different python backend for vega, one that translates directly to holoviews instead of ibis.
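As a rough illustration of what such a mapping layer might look like, here is a toy translator from a Vega-style aggregate transform spec to a pandas groupby, with pandas standing in for the holoviews/datashader operations. The function and table names are hypothetical, and the spec shape is only loosely modeled on Vega's aggregate transform:

```python
import pandas as pd

# Hypothetical translation table from Vega-style aggregate op names to
# pandas reductions (which would instead target holoviews/datashader
# operations in the real pipeline).
VEGA_TO_PANDAS = {"count": "count", "mean": "mean", "sum": "sum"}

def apply_vega_aggregate(df, spec):
    """Apply a Vega-style aggregate transform spec to a DataFrame.

    spec is shaped like:
      {"aggregate": [{"op": "mean", "field": "fare", "as": "avg_fare"}],
       "groupby": ["borough"]}
    """
    return df.groupby(spec["groupby"], as_index=False).agg(
        **{a["as"]: (a["field"], VEGA_TO_PANDAS[a["op"]])
           for a in spec["aggregate"]}
    )

df = pd.DataFrame({"borough": ["A", "A", "B"], "fare": [10.0, 20.0, 5.0]})
spec = {"aggregate": [{"op": "mean", "field": "fare", "as": "avg_fare"}],
        "groupby": ["borough"]}
result = apply_vega_aggregate(df, spec)
# result has one row per borough with the mean fare
```

The point is not the specific target library but that the transform spec is declarative data, so the same spec could be dispatched to ibis, pandas, or a holoviews operation depending on the backend.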
What would be the payoff here? Well, users would get to use the Altair API to construct interactive visualizations, and they would get the efficiency built into datashader for rasterizing data.
The next steps here would be to explore how vega transforms like groupbys and aggregates could be mapped to datashader. Before that, we should come up with a particular use case for interactive visualization with datashader and holoviews, try to replicate it with altair, and then see how we would map the vega transforms to the holoviews expressions.
Taking a step back, what we are doing here is mapping one domain specific language, Vega transforms, to another, Holoviews operations.
cc @ian-r-rose @domoritz