Move fs2 Catenable into cats.data #2358

Closed
LukaJCB opened this issue Aug 2, 2018 · 40 comments · Fixed by #2371

Comments

@LukaJCB
Member

LukaJCB commented Aug 2, 2018

Catenable is a really nice data structure that gives great performance for things like appending. It'd be great to have it in cats.data. @mpilquist agreed that this would be a good idea.
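For readers unfamiliar with the structure, here is a minimal sketch of the idea behind a catenable sequence. The names `Cat`, `Single`, and `Append` are hypothetical, and this is not the fs2 implementation; it only illustrates why appending and concatenating can be constant-time, with the cost deferred to traversal.

```scala
// Hypothetical sketch of a catenable sequence; NOT the fs2 implementation.
sealed trait Cat[+A] {
  // O(1): concatenation only records that two sequences are joined.
  def ++[B >: A](that: Cat[B]): Cat[B] = (this, that) match {
    case (Cat.Empty, _) => that
    case (_, Cat.Empty) => this
    case _              => Cat.Append(this, that)
  }

  // O(1): appending one element is concatenation with a singleton.
  def :+[B >: A](b: B): Cat[B] = this ++ Cat.Single(b)

  // O(n): walks the tree left to right with an explicit work list,
  // so deeply nested appends do not overflow the call stack.
  def toList: List[A] = {
    val out = List.newBuilder[A]
    var todo: List[Cat[A]] = this :: Nil
    while (todo.nonEmpty) {
      todo.head match {
        case Cat.Empty        => todo = todo.tail
        case Cat.Single(a)    => out += a; todo = todo.tail
        case Cat.Append(l, r) => todo = l :: r :: todo.tail
      }
    }
    out.result()
  }
}

object Cat {
  case object Empty extends Cat[Nothing]
  final case class Single[+A](a: A) extends Cat[A]
  final case class Append[+A](left: Cat[A], right: Cat[A]) extends Cat[A]

  def empty[A]: Cat[A] = Empty
}
```

In this sketch, `(Cat.empty[Int] :+ 1 :+ 2) ++ (Cat.empty[Int] :+ 3)` builds a small tree in three constant-time steps, and the elements are only laid out in order when `toList` is called.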

@djspiewak
Member

I'm leery of putting fancy data structures into cats just to have a place to put them. To me, this seems like something that belongs more in a project like dogs. Even NonEmptyList is kind of pushing it, IMO. And unlike NEL, there's no strong integration between Catenable and other structures/classes.

@LukaJCB
Member Author

LukaJCB commented Aug 2, 2018

@djspiewak There's been some similar discussion in #2058.
I'm still open to the idea of a separate module, but I don't think dogs has really worked out.

@djspiewak
Member

I feel like dogs mostly hasn't worked out for other reasons, one of which being differentiation on the data structures. Not a lot of people care about IList or ISet, but if you gave them a HashMap backed by a bitmapped trie that works via typeclasses, or a Catenable, etc etc, now suddenly it's a more interesting dependency. Pushing Catenable into cats-data just perpetuates the root problem rather than addressing it while also scope-creeping the API surface area.

@johnynek
Contributor

johnynek commented Aug 2, 2018

I’m +1 on taking it in core. I have previously made my views on this known in #2058

@djspiewak
Member

@johnynek A lot of this is the poly/mono debate. I will note that I am in agreement with you on things like cats-free, which doesn't really serve a purpose at all since it a) has the same upstream dependencies, and b) is version- and compatibility-locked to cats-core. If either of those were false, there would be value in having it separated. In the case of a hypothetical data structures project (such as dogs), b is clearly false, and desirably so.

@tpolecat
Member

tpolecat commented Aug 2, 2018

👍 for Catenable because it's not at all speculative. I like it when generic stuff gets pushed upstream from proven libraries.

And more generally I'm 👍 for more functional data structures in cats.data … dogs isn't working and we need this stuff.

@djspiewak
Member

djspiewak commented Aug 2, 2018

dogs isn't working

Let's dwell on that point for a second: why isn't dogs working? Is it the limited data structures currently implemented? @stew's restricted availability of free time? The sheer fact that it's a separate dependency? What exactly?

@tpolecat
Member

tpolecat commented Aug 2, 2018

Yeah I think it hasn't gotten enough attention for people to feel confident using it. If it were a cats module (or just got GC'd over into cats.data) or otherwise demonstrated signs of life we might get to a critical mass where there would be some uptake. But it's often easier to pile onto an already-critical mass.

@djspiewak
Member

But it's often easier to pile onto an already-critical mass.

@tpolecat That's part of what I'm afraid of. Huge surface area on API makes compatibility extremely hard, releases slow and bulky and painful. It means that data structure fixes and tweaks (which are quite common!) get conflated with core typeclass fixes and tweaks (which are unbelievably rare). I think cats-effect demonstrates that separate-but-closely-related ecosystem projects can work, it just takes a bit of maintenance attention and a bit more marketing effort.

A temporarily unattended project is a temporary problem. Increased API surface area is forever.

@tpolecat
Member

tpolecat commented Aug 2, 2018

I don't get the sense that it has been a problem for scalaz, although Kenji is some kind of mighty space robot so he may have managed a lot of hassles we never heard about.

@djspiewak
Member

djspiewak commented Aug 2, 2018

It's been hugely annoying for scalaz, resulting in issues not actually being addressed in things like Task and Actor over the years. It also kind of freezes IList/IMap in place in a lot of unfortunate ways, especially from the standpoint of optimization and API tweaks.

Also yeah, Kenji is definitely a mighty space robot.

@johnynek
Contributor

johnynek commented Aug 2, 2018

we (Stripe) just finally executed a fairly painful update to cats 1.1. It was a lot of work, and no one thought it was fun.

To me, locking APIs is exactly the value add of stable libraries. I am willing to compromise API beauty for reduced cost on the ecosystem to upgrade.

I'm already looking at 2.13, and I'm a bit skeptical I will ever use it. Getting cats to work with it was hard enough. I can't imagine how long it's going to take Stripe (much less someplace like Twitter) to make all the needed changes. Stripe is still mostly on 2.11 (a few repos on 2.12).

My point is: the scala community is far too willing to break compatibility, if you ask me. If bringing things into cats makes it harder to change the APIs, I'd say that's a win.

As for the objection of bad design and starting to use types in inappropriate places, all I can say there is I hope we have good code review.

@johnynek
Contributor

johnynek commented Aug 2, 2018

side note: I wish we could communicate the eng-hours cost of breaking APIs back to library authors. Currently there is no real mechanism except hearing that adoption isn't great. I think people would be shocked at the collective eng-hours these breakages cause.

@kailuowang
Contributor

kailuowang commented Aug 2, 2018

We haven't been able to agree upon a principled standard to decide whether something should go into cats-core or a separate module or a separate repo. Every time we needed to make such a decision, we spent a lot of time re-debating what this principled standard should be.
Maybe we should consider giving up debating such a principled standard (I think it's really a bunch of trade-offs that differ case by case). Rather, we just evaluate case by case, i.e. we focus very specifically on whether we should move Catenable into cats-core.

  • How likely is Catenable to change?
  • How applicable/useful is it to cats-core users?

If it's very unlikely to change and very useful/applicable, then we'll probably get more benefit out of moving it in than the risk of future cost.

We also need some tooling to quickly start something that is potentially useful but also likely to change. I suggest that we should make sbt-catalyst so easy to use that it takes no more than 10 minutes to

  1. start a project using sbt new sbt-catalyst.g8, with all the common settings cats-xxxx projects use.
  2. copy the code and tests in.
  3. run the tests.
  4. publish to Maven Central.

@ghost

ghost commented Aug 2, 2018

Perhaps one solution to help ease the pain and suffering for multi-billion dollar companies might be that said companies invest some of their hard-earned pocket money into the open source projects?

It's just not true that there is "no real mechanism"...it's an open community. Just join in the fun.

@djspiewak
Member

djspiewak commented Aug 2, 2018

Perhaps one solution to help ease the pain and suffering for multi-billion dollar companies might be that said companies invest some of their hard-earned pocket money into the open source projects?

This is a bigger can of worms but the short answer is that they often do. Objectively not as much as they should but it still happens, especially in areas of upgrades.

In my experience, the problem often boils down to the developers and immediate team leads "in the trenches": they're just not comfortable jumping into OSS like that. Most people don't have a long history of work done on github. Most people aren't on a first-name basis with project maintainers across the ecosystem. Most people are actually pretty intimidated by all of that. It's really no one's fault – this is a very natural reaction, and certainly most Scala project maintainers have done everything in their power to break down those walls – but it is what it is.

So what happens then is a company starts working on slowly turning its version of the Titanic from bow heading 2.11 to 2.12. This usually takes the form of some sort of directive or tooling version bump that strongly encourages/pushes/mandates/whatevers. Ultimately, the individual teams end up doing all of the work, and when they ultimately run into missing versions, forced major upgrades to get Scala compatibility (looking at you, ScalaCheck), and so on, their solution is more often than not to punt. Sometimes they remove dependencies entirely. Sometimes they use Ivy force() and hope that god never notices. Most often they just report back that it's too hard and just not going to happen in the midst of all their other deadlines.

Very, very, very rarely do they submit a PR that fixes things. It does happen but almost exclusively when the developer/team-lead in question is someone who has a moderate-to-strong background in OSS and the Scala ecosystem in particular. Stripe has quite a few of those people. Verizon IPTV had even more of them at its peak, and upgrading 2.10 -> 2.11 was a nearly impossible task for them (forget about 2.12).

So tldr, what you're suggesting does happen, and when it doesn't it's not necessarily the fault of the company itself (most C-level executives really don't care if you have to put some time into nameless (to them) OSS in order to deliver product). However, even when it does happen, it doesn't solve the problem.

Upgrades are hard. Especially in the Scala ecosystem.

@ghost

ghost commented Aug 2, 2018

Well I've done my time in companies ten times+ the worth of Twitter, so for sure I get your points. Often, people just aren't allowed to engage with OSS - it really is that simple.

But here's a flip: I get paid zero for any work in any programming language, so really, why should I care? I do care about some things, but not about faceless, imaginary things...it's just not possible.

@LukaJCB
Member Author

LukaJCB commented Aug 3, 2018

Answering Kai's questions:

  • How likely is Catenable to change?

I think Catenable can be considered fairly stable. AFAICS it's been in fs2 for over two years and has barely changed since; its algebra is very straightforward and very unlikely to change. I can envision additions to its API, but I don't think compatibility-breaking changes are likely at all.

  • How applicable/useful is it to cats-core users?

I'd argue it's really useful when working with monoids, e.g. in WriterT: you want a data structure that supports O(1) appends, and I'd argue Vector just isn't the best structure to go with. I recently got some really nice speedups switching from Vector to Catenable. I think cats is at its best when we can sell our abstractions in a performant way. Monoidal accumulation is super common with things like Writer, Const, Validated or Ior, and having a good answer to performance questions is really useful IMO. :)
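As an illustration of the monoidal accumulation described here, the following is a rough, dependency-free sketch. `Monoid`, `Log`, and `Writer` below are simplified stand-ins written for this example, not the cats or fs2 types; the point is only that each `flatMap` combines two logs with a single constant-time allocation, however long the logs already are.

```scala
// Simplified stand-ins for illustration only; NOT the cats or fs2 API.
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

// A tiny catenable log: combining two logs never copies either one.
sealed trait Log[+A]
case object NoLog extends Log[Nothing]
final case class One[+A](a: A) extends Log[A]
final case class Both[+A](left: Log[A], right: Log[A]) extends Log[A]

object Log {
  implicit def logMonoid[A]: Monoid[Log[A]] = new Monoid[Log[A]] {
    def empty: Log[A] = NoLog
    def combine(x: Log[A], y: Log[A]): Log[A] = Both(x, y) // O(1)
  }

  // O(n) traversal, done once at the end, using an explicit work list.
  def toVector[A](log: Log[A]): Vector[A] = {
    val out = Vector.newBuilder[A]
    var todo: List[Log[A]] = log :: Nil
    while (todo.nonEmpty) {
      todo.head match {
        case NoLog      => todo = todo.tail
        case One(a)     => out += a; todo = todo.tail
        case Both(l, r) => todo = l :: r :: todo.tail
      }
    }
    out.result()
  }
}

// A Writer-like pair of an accumulated log and a value.
final case class Writer[W, A](log: W, value: A) {
  def map[B](f: A => B): Writer[W, B] = Writer(log, f(value))
  def flatMap[B](f: A => Writer[W, B])(implicit W: Monoid[W]): Writer[W, B] = {
    val next = f(value)
    Writer(W.combine(log, next.log), next.value)
  }
}

object Demo {
  // Hypothetical step: logs what it saw and increments the value.
  def step(n: Int): Writer[Log[String], Int] = Writer(One(s"saw $n"), n + 1)

  val result: Writer[Log[String], Int] =
    for {
      a <- step(1)
      b <- step(a)
      c <- step(b)
    } yield c
}
```

Calling `Log.toVector(Demo.result.log)` flattens the accumulated log once, after all the steps have run.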

@djspiewak
Member

@kailuowang On the subject of Catenable in cats-core specifically… One problem we're going to hit if we do pull it into core is we sort of obviate the whole notion of dogs. It doesn't make any sense to have one data structure in core and all the others somewhere else, and once Catenable is in core, it's very very difficult to get it back out again. So this is part of why I think the general principle argument is still relevant here.

Let's take a step back from what has happened with dogs specifically and think about what we really want from an ecosystem organizational standpoint. Here's the question I would pose to you: In an ideal world, is it better to have dogs, or cats-data (as part of core)? Imagine a dogs that is as actively maintained and as widely-known as cats-effect. My argument is that this situation is preferable over having a cats-data as part of the core project. If, however, the broad consensus is that having the data structures (not just Catenable) all pulled into the core project is optimal for the ecosystem, then that really does resolve the Catenable question.

What I really want is to summon spare time from a hat so that I can give dogs some of the love it deserves…

@tpolecat
Member

tpolecat commented Aug 3, 2018

Maybe we can convince Haoyi to let us borrow his time machine.

@kailuowang
Contributor

kailuowang commented Aug 3, 2018

@djspiewak I see your point.
Let's take one more step back and state the specific problem as follows:
Catenable is useful, we want to enable more projects to use it without fs2 dep.

Right now we have mainly 3 options:

  1. A new project just for Catenable.
    Sounds like too much infrastructure cost to provide a single feature.
  2. Put it in cats-core, the foundation of an ecosystem.
    If something is widely useful, we'd like it to be part of the foundation of an ecosystem so that a) people can use it and b) people wouldn't worry too much about it being changed. On the other hand, it's very costly to have something suboptimal stuck in the foundation, so it has to be stable and proven, which it is.
    But what if dogs becomes mature and it would then be logical to have it in dogs?
  3. Put it in dogs.
    But dogs is not that well maintained and not that stable, so people could not rely on Catenable as a foundational data structure without worrying about binary compatibility.

Does that summarize our current situation with this specific problem?

@djspiewak
Member

Does that summarize our current situation with this specific problem?

It does. Just so my preferences are clear at a high level:

  • 3
  • 2
  • 1

(1 is a distant last place)

Obviously 3 is contingent on being able to resolve the maintenance and stability problems. I really wish I could just straight up volunteer to take it on. I certainly have the interest, I just don't have the time right now (similar to the situation @stew is in, I would assume).

@johnynek
Contributor

johnynek commented Aug 3, 2018

2, 3, 1

PS: I think Haoyi's time machine is that he doesn't really engage in democratically run projects, and he can quickly make decisions/changes. Secondly, he gets things in a decent shape, and moves on to the next project.

I think for long term governance, cats' approach works well, it won't die if any one person leaves, but it does create costs as we struggle through areas where we disagree.

I would argue we should have a project lead (Kai?) with a term of say 1 year, then a formal set of committers that can vote on the next maintainer. If we had something like this, we could just agree the maintainer makes the call on an unclear decision.

@LukaJCB
Member Author

LukaJCB commented Aug 3, 2018

I also vote 2,3,1.

I'm not sure I like the idea of a "benevolent dictator". I'd rather we actually have something like a formal vote instead. :)

@mpilquist
Member

mpilquist commented Aug 3, 2018 via email

@djspiewak
Member

@johnynek In fairness, I think we would all pretty much defer to @kailuowang. I'm not necessarily averse to what you're saying, just pointing out that, at least for the time being, things aren't exactly ambiguous.

@kailuowang
Contributor

I have to say I am honored to be mentioned by @johnynek in regards to such a role - I am not so sure about that, but let's discuss in a separate thread (created #2362)

@LukaJCB
Member Author

LukaJCB commented Aug 8, 2018

So what is the best way to unblock this issue, given that this should probably happen before fs2 1.0 rolls out?

@kailuowang
Contributor

Looks like 2 got the majority vote here; also, 3 has a contingency issue (dogs maintenance) yet to resolve.
I'd say let's move it into cats.data

@LukaJCB LukaJCB mentioned this issue Aug 8, 2018
@SystemFw
Contributor

SystemFw commented Aug 8, 2018

Maybe it's really stupid, but I feel that dogs would benefit from being called cats-data instead. It kinda gives it that stamp of approval that cats-effect has

@martin-g

martin-g commented Aug 9, 2018

What about mouse, kittens, mainecoon? Do they need better names too? In some documentation I read that cats is short for categories, but the names of the related libraries suggest we are talking about pets or a zoo :-)

@ChristopherDavenport
Member

I would personally prefer a cats-data library rather than loading these up into cats. If everything moves into cats, then we start to lose the modularity that has so far been a feature of the ecosystem.

@martin-g

martin-g commented Aug 9, 2018

I meant just their names, in response to "dogs would benefit from being called cats-data instead. It kinda gives it that stamp of approval."
I like the renaming of dogs to cats-data; it makes it much clearer what to expect from this library.
kittens just moved to the typelevel organization on GitHub. If it is stamped, then maybe its name should be improved too, to cats-derive or something.

@mpilquist
Member

👍 for an official cats collections library. I'd avoid calling it cats-data as that conflicts with cats.data package in cats-core and packages shouldn't be split across jars/modules. And we can't remove cats.data from core b/c it's too intertwined with other parts of core. Hence, I'd rather just put everything in 1 JAR and move on.

@djspiewak
Member

IMO, anything that uses the cats top-level package should be named cats-something as a project. This was the rationale behind cats-effect, originally. Using the cats.data namespace for a cats collections library would be awesome, but there's simply no way to do it without package splitting, which is a non-starter.

I don't think dogs is necessarily a bad name. I agree cats-data would be better but that's not really the problem. The problem is there's no one to really take on the mantle of maintaining it, adding what is missing, promoting, etc. As I said, I would, but I really don't have time. If someone were to choose to drive dogs forward though, I'm sure it would be embraced by the community. The time is really now, too, especially given the scala collections rework.

@johnynek
Contributor

I would argue that Catenable (or Chain) should potentially be in core because it would show up in typeclasses. For instance, on Foldable we have toList, but that forces an order of construction.

With Catenable/Chain we could do toChain and many structures could more efficiently be created (e.g. pretty much any tree structure not already a linear graph)
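To make the tree case concrete, here is a small sketch under stated assumptions: `Chain`, `Tree`, and `toChain` are hypothetical names defined only for this example, not an existing cats API. It shows an in-order conversion that performs a constant-time concatenation per node instead of copying partial lists.

```scala
// Hypothetical sketch; neither Chain nor toChain here is an existing cats API.
sealed trait Chain[+A]
object Chain {
  case object Empty extends Chain[Nothing]
  final case class One[+A](a: A) extends Chain[A]
  final case class Concat[+A](left: Chain[A], right: Chain[A]) extends Chain[A]

  // O(1): joins two chains without copying either of them.
  def concat[A](l: Chain[A], r: Chain[A]): Chain[A] = (l, r) match {
    case (Empty, _) => r
    case (_, Empty) => l
    case _          => Concat(l, r)
  }

  // O(n): a single left-to-right pass using an explicit work list.
  def toList[A](c: Chain[A]): List[A] = {
    val out = List.newBuilder[A]
    var todo: List[Chain[A]] = c :: Nil
    while (todo.nonEmpty) {
      todo.head match {
        case Empty        => todo = todo.tail
        case One(a)       => out += a; todo = todo.tail
        case Concat(l, r) => todo = l :: r :: todo.tail
      }
    }
    out.result()
  }
}

sealed trait Tree[+A]
case object Leaf extends Tree[Nothing]
final case class Node[+A](left: Tree[A], value: A, right: Tree[A]) extends Tree[A]

object Tree {
  // In-order conversion with at most two O(1) concatenations per node.
  // Subtrees can be converted in any order, since joining them is cheap.
  // Plain recursion: fine for a sketch, not stack-safe for very deep trees.
  def toChain[A](t: Tree[A]): Chain[A] = t match {
    case Leaf          => Chain.Empty
    case Node(l, v, r) =>
      Chain.concat(toChain(l), Chain.concat(Chain.One(v), toChain(r)))
  }
}
```

By contrast, a naive `toList(l) ++ (v :: toList(r))` on an immutable List copies the left result at every node, which is the kind of cost a `toChain` along these lines would avoid.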

@stew
Contributor

stew commented Aug 31, 2018

dogs doesn't get attention because nobody is using it, including myself. (I'm not using scala for anything at the moment and haven't been for some time). The reason nobody is using it is likely that it doesn't get any attention, etc. I would love for someone to pay attention to it; I don't have the time or motivation currently.

@johnynek
Contributor

It would be nice if we had tooling to version and publish all "the typelevel stack" together. We could do that in one large repo, or we could have some tooling and maybe an sbt plugin that allows you to get the right version of all projects in the typelevel stack such that you know they were tested together.

I think adding modules to cats proper is a good solution in the meantime, especially for projects that aren't changing super fast.

@kailuowang
Contributor

@johnynek the vision for sbt-catalysts is to become such a tool to coordinate the typelevel stack releases: with a single bump of the sbt-catalysts version, you automatically get the latest releases of all your typelevel dependencies that work together properly. Though we never had the time to implement such a community build in it.

@tpolecat
Member

tpolecat commented Sep 4, 2018

I have not used sbt-catalysts because it does a lot of stuff I don't understand and it kind of weirds me out. If the only thing it did was provide versions then maybe it would get more uptake.


10 participants