Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Equivalence classes on requests. #40

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

kantp
Copy link

@kantp kantp commented Jan 27, 2016

Hi,

this pull request proposes a new function, dataFetchEquiv, that allows to define equivalent requests, i.e. classes of requests where it can be known a priori that they will yield the same result.

For a given equivalence class, at most one request will be performed, and all equivalent requests will yield the same result.

I use this facility to stretch Haxl a bit, and allow to perform write operations that are disguised as reads: my request type will look like

data MyRequest a where
  EnterOrRetrieveData :: MyData -> MyRequest MyData
  RetrieveData :: MyKey -> MyRequest MyData

I will also have a function that maps MyData to MyKey,

key :: MyData -> MyKey

The query that I like to perform is this: given some value x of type MyData, is there already some y in my data source that has the same key as x? If yes, give me that y, if not, enter x into my data source, and give me x as the result of the query. All future queries for a value z with key z == key x should yield the same result (the goal is to find similar pieces of data, and the function key is chosen so as to map similar data to equal keys).

This can be achieved with the function dataFetchEquiv,

getSimilar :: MyData -> (GenHaxl UserEnv) MyData
getSimilar =
  let equiv :: MyRequest a -> MyRequest a -> Bool
      equiv (EnterOrRetrieveData x) (EnterOrRetrieveData y) = key x == key y
      representative :: MyRequest a -> MyRequest a
      representative (EnterOrRetrieveData x) = RetrieveData (key x)
  in dataFetchEquiv equiv representative . EnterOrRetrieveData

The write operation is idempotent: Haxl will perform at most one write for each key. It is not observable whether anything was written at all, or if the data was already there in the first place. This ensures that the caching and re-ordering made by Haxl are safe (as long as it is not important which data is written for a given key, since that may well be affected by the re-ordering).

This might be a pretty narrow use case, but since the changes are non-invasive, I thought I'd offer a pull-request, in case it is useful to somebody else. Any feedback is greatly appreciated.

@kantp
Copy link
Author

kantp commented Jan 27, 2016

Strange, it gets a Kind mis-match on ghc-7.6.3, but is accepted by ghc-7.8.4 and 7.10.2. I'll have to look into this.

Philipp Kant added 2 commits January 30, 2016 21:30
This commit introduces a new function dataFetchEquiv, that allows to
define equivalent requests, i.e. classes of requests where it can be
known a priori that they will yield the same result.

For a given equivalence class, at most one request will be performed,
and equivalent requests will give the same result.
The type signature needs to be slightly different on older ghc
versions (Typeable vs. Typeable1).
@kantp kantp force-pushed the equivalence-classes branch from acff66e to 46d9dfd Compare January 30, 2016 20:31
For illustration and testing.
@kantp kantp force-pushed the equivalence-classes branch from 3f10d2e to 32827ba Compare February 2, 2016 12:20
@kantp
Copy link
Author

kantp commented Feb 4, 2016

Ok, with 46d9dfd, it works with older ghc versions as well.

Also, for added context (because the example I give in the test is not particularly useful), I described how I use this new functionality to solve a real problem in a lightning talk at the last Haskell eXchange: https://skillsmatter.com/skillscasts/6538-data-deduplication-in-haskell-an-experience-report.

@simonmar
Copy link
Contributor

simonmar commented Feb 5, 2016

I started at this for a while and I don't think I understand all the ramifications. You can certainly do very strange things with dataFetchEquiv. What are the properties under which dataFetchEquiv is safe to use?

I'm also concerned that there's an /O(n)/ search in there when the request is not already cached.

This needs more thought IMO. We've come across cases that are similar but I'm not sure if they're catered for by dataFetchEquiv. For example, a query that counts the number of Xs is satisfied by a previous query that fetches all the Xs.

@kantp
Copy link
Author

kantp commented Feb 5, 2016

Thanks for your feedback, Simon. I think the documentation needs some improvement.

In order to safely use dataFetchEquiv, the following conditions have to hold for the two functions equiv and representative that you supply to it:

  • equiv has to be an equivalence relation (i.e., a equiv b and b equiv c imply a equiv c, a equiv b is the same as b equiv a, and a equiv a always holds)
  • representative a == representative b iff a equiv b
  • Two requests for which equiv is true always give the same results.

If you can define these two functions on your requests, you can avoid some redundant data fetching, since for any two requests a, b with a equiv b, only one will be performed, and the result used for both.

Regarding the O(n) search, this might be problematic if there are many BlockedFetches in the queue. However, I don't see a way to avoid this, without changing the internals of the RequestStore.

The case you described is interesting, but I don't think it is covered by this as is. Maybe it could be solved in a somewhat similar fashion, though. You are not trying to identify requests that give the same answer, but for two requests where the answer to one can be inferred from the answer to the other.

I think it could be possible to write down a function that is similar to dataFetchEquiv, but instead of the function equiv, you would have to provide one function that can identify a request that provides enough information to get the answer to the request you try to perform, and another function that uses the result from the other request to get the answer to this one.

In your example, before entering a request Count xs, it would check if Get xs was already there (in which case it could use the length of the result of Get xs). What do you think?

@kantp kantp mentioned this pull request Feb 5, 2016
@kantp
Copy link
Author

kantp commented Feb 5, 2016

@simonmar I just made a draft of how I think the use case you described can be handled (see #41). It could probably benefit from some refactoring, as it introduces a little code redundancy, but it demonstrates the general idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants