Move fs2 Catenable into cats.data #2358

LukaJCB · 2018-08-02T14:35:50Z

Catenable is a really nice data structure that gives great performance for things like appending. It'd be great to have it in cats.data. @mpilquist agreed that this would be a good idea.


djspiewak · 2018-08-02T15:00:41Z

I'm leery of putting fancy data structures into cats just to have a place to put them. To me, this seems like something that belongs more in a project like dogs. Even NonEmptyList is kind of pushing it, IMO. And unlike NEL, there's no strong integration between Catenable and other structures/classes.

LukaJCB · 2018-08-02T15:23:27Z

@djspiewak There's been some similar discussion in #2058.
I'm still open to the idea of a separate module, but I don't think dogs has really worked out.

djspiewak · 2018-08-02T15:41:27Z

I feel like dogs mostly hasn't worked out for other reasons, one of which being differentiation on the data structures. Not a lot of people care about IList or ISet, but if you gave them a HashMap backed by a bitmapped trie that works via typeclasses, or a Catenable, etc etc, now suddenly it's a more interesting dependency. Pushing Catenable into cats-data just perpetuates the root problem rather than addressing it while also scope-creeping the API surface area.

johnynek · 2018-08-02T15:45:45Z

I’m +1 on taking it in core. I have previously made my views on this known in #2058

djspiewak · 2018-08-02T15:50:05Z

@johnynek A lot of this is the poly/mono debate. I will note that I am in agreement with you on things like cats-free, which doesn't really serve a purpose at all since it a) has the same upstream dependencies, and b) is version- and compatibility-locked to cats-core. If either of those were false, there would be value in having it separated. In the case of a hypothetical data structures project (such as dogs), b is clearly false, and desirably so.

tpolecat · 2018-08-02T16:20:13Z

👍 for Catenable because it's not at all speculative. I like it when generic stuff gets pushed upstream from proven libraries.

And more generally I'm 👍 for more functional data structures in cats.data … dogs isn't working and we need this stuff.

djspiewak · 2018-08-02T16:44:59Z

dogs isn't working

Let's dwell on that point for a second: why isn't dogs working? Is it the limited data structures currently implemented? @stew's restricted availability of free time? The sheer fact that it's a separate dependency? What exactly?

tpolecat · 2018-08-02T17:05:55Z

Yeah I think it hasn't gotten enough attention for people to feel confident using it. If it were a cats module (or just got GC'd over into cats.data) or otherwise demonstrated signs of life we might get to a critical mass where there would be some uptake. But it's often easier to pile onto an already-critical mass.

djspiewak · 2018-08-02T17:55:52Z

But it's often easier to pile onto an already-critical mass.

@tpolecat That's part of what I'm afraid of. Huge surface area on API makes compatibility extremely hard, releases slow and bulky and painful. It means that data structure fixes and tweaks (which are quite common!) get conflated with core typeclass fixes and tweaks (which are unbelievably rare). I think cats-effect demonstrates that separate-but-closely-related ecosystem projects can work, it just takes a bit of maintenance attention and a bit more marketing effort.

A temporarily unattended project is a temporary problem. Increased API surface area is forever.

tpolecat · 2018-08-02T18:00:11Z

I don't get the sense that it has been a problem for scalaz, although Kenji is some kind of mighty space robot so he may have managed a lot of hassles we never heard about.

djspiewak · 2018-08-02T18:11:21Z

It's been hugely annoying for scalaz, resulting in issues not actually being addressed in things like Task and Actor over the years. It also kind of freezes IList/IMap in place in a lot of unfortunate ways, especially from the standpoint of optimization and API tweaks.

Also yeah, Kenji is definitely a mighty space robot.

johnynek · 2018-08-02T18:39:37Z

we (Stripe) just finally executed a fairly painful update to cats 1.1. It was a lot of work, and no one thought it was fun.

To me, locking APIs is exactly the value add of stable libraries. I am willing to compromise API beauty for reduced cost on the ecosystem to upgrade.

I'm already looking at 2.13, and I'm a bit skeptical I will ever use it. Getting cats to work with it was hard enough. I can't imagine how long it's going to take Stripe (much less someplace like Twitter) to make all the needed changes. Stripe is still mostly on 2.11 (a few repos on 2.12).

My point is: the scala community is far too willing to break compatibility, if you ask me. If bringing things into cats makes it harder to change the APIs, I'd say that's a win.

As for the objection of bad design and starting to use types in inappropriate places, all I can say there is I hope we have good code review.

johnynek · 2018-08-02T18:41:06Z

side note: I wish we could communicate the eng-hours cost of breaking APIs back to library authors. Currently there is no real mechanism except hearing that adoption isn't great. I think people would be shocked at the collective eng-hours these breakages cause.

kailuowang · 2018-08-02T19:20:45Z

We haven't been able to agree upon a principled standard to decide whether something should go into cats-core, a separate module, or a separate repo. Every time we needed to make such a decision, we spent a lot of time re-debating what this principled standard should be.
Maybe we should consider giving up on debating such a principled standard (I think it's really a bunch of trade-offs that differ case by case). Rather, we just evaluate case by case, i.e. we focus very specifically on whether we should move Catenable into cats-core.

How likely is Catenable to change?
How applicable/useful is it to cats-core users?

If it's very unlikely to change and very useful/applicable, then we'll probably get more benefit out of moving it in than the risk of future cost.

We also need some tooling to quickly start something that is potentially useful but also likely to change. I suggest that we make sbt-catalyst so easy to use that it takes no more than 10 minutes to:

1. start a project using sbt new sbt-catalyst.g8, with all the common settings cats-xxxx projects use,
2. copy the code and tests in,
3. run the tests,
4. publish to Maven Central.

ghost · 2018-08-02T20:52:37Z

Perhaps one solution to help ease the pain and suffering for multi-billion dollar companies might be that said companies invest some of their hard-earned pocket money into the open source projects?

It's just not true that there is "no real mechanism"...it's an open community. Just join in the fun.

djspiewak · 2018-08-02T21:18:16Z

Perhaps one solution to help ease the pain and suffering for multi-billion dollar companies might be that said companies invest some of their hard-earned pocket money into the open source projects?

This is a bigger can of worms but the short answer is that they often do. Objectively not as much as they should but it still happens, especially in areas of upgrades.

In my experience, the problem often boils down to the developers and immediate team leads "in the trenches": they're just not comfortable jumping into OSS like that. Most people don't have a long history of work done on github. Most people aren't on a first-name basis with project maintainers across the ecosystem. Most people are actually pretty intimidated by all of that. It's really no one's fault – this is a very natural reaction, and certainly most Scala project maintainers have done everything in their power to break down those walls – but it is what it is.

So what happens then is a company starts working on slowly turning its version of the Titanic from bow heading 2.11 to 2.12. This usually takes the form of some sort of directive or tooling version bump that strongly encourages/pushes/mandates/whatevers. Ultimately, the individual teams end up doing all of the work, and when they ultimately run into missing versions, forced major upgrades to get Scala compatibility (looking at you, ScalaCheck), and so on, their solution is more often than not to punt. Sometimes they remove dependencies entirely. Sometimes they use Ivy force() and hope that god never notices. Most often they just report back that it's too hard and just not going to happen in the midst of all their other deadlines.

Very, very, very rarely do they submit a PR that fixes things. It does happen but almost exclusively when the developer/team-lead in question is someone who has a moderate-to-strong background in OSS and the Scala ecosystem in particular. Stripe has quite a few of those people. Verizon IPTV had even more of them at its peak, and upgrading 2.10 -> 2.11 was a nearly impossible task for them (forget about 2.12).

So tldr, what you're suggesting does happen, and when it doesn't it's not necessarily the fault of the company itself (most C-level executives really don't care if you have to put some time into nameless (to them) OSS in order to deliver product). However, even when it does happen, it doesn't solve the problem.

Upgrades are hard. Especially in the Scala ecosystem.

ghost · 2018-08-02T21:38:25Z

Well I've done my time in companies ten times+ the worth of Twitter, so for sure I get your points. Often, people just aren't allowed to engage with OSS - it really is that simple.

But here's a flip: I get paid zero for any work in any programming language, so really, why should I care? I do care about some things, but not about faceless, imaginary things...it's just not possible.

LukaJCB · 2018-08-03T09:46:11Z

Answering Kai's questions:

How likely is Catenable to change?

I think Catenable can be considered fairly stable. AFAICS it's been in fs2 for over two years and has barely changed since. Its algebra is very straightforward and very unlikely to change. I can envision additions to its API, but I don't think compatibility-breaking changes are likely at all.

How applicable/useful is it to cats-core users?

I'd argue it's really useful when working with monoids, e.g. in WriterT. It helps to have a data structure that supports O(1) appends, and I'd argue Vector just isn't the best structure to go with. I recently got some really nice speedups switching from Vector to Catenable. I think cats is at its best when we can sell our abstractions in a performant way. Monoidal accumulation is super common with things like Writer, Const, Validated or Ior, and having a good answer to performance questions is really useful IMO. :)
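To make the O(1)-append point concrete, here is a minimal sketch of a catenable sequence in plain Scala. The names `Cat`, `Single`, `Append`, and `CatDemo` are hypothetical and chosen for illustration; this is not the fs2 or cats implementation, only the general idea of deferring work by recording concatenations as tree nodes.

```scala
// Hypothetical minimal sketch of a catenable sequence.
// Not the fs2 Catenable or any cats type; names are illustrative only.
sealed trait Cat[+A] {
  // O(1): concatenation allocates at most one node.
  def ++[B >: A](that: Cat[B]): Cat[B] =
    if (this eq Cat.Empty) that
    else if (that eq Cat.Empty) this
    else Cat.Append(this, that)

  // O(1) append of a single element.
  def :+[B >: A](b: B): Cat[B] = this ++ Cat.Single(b)

  // O(n) traversal. An explicit stack of pending subtrees keeps it
  // stack-safe even for deeply left-nested trees.
  def toList: List[A] = {
    val out = List.newBuilder[A]
    var stack: List[Cat[A]] = List(this)
    while (stack.nonEmpty) {
      val head = stack.head
      stack = stack.tail
      head match {
        case Cat.Empty        => ()
        case Cat.Single(a)    => out += a
        case Cat.Append(l, r) => stack = l :: r :: stack
      }
    }
    out.result()
  }
}

object Cat {
  case object Empty extends Cat[Nothing]
  final case class Single[+A](a: A) extends Cat[A]
  final case class Append[+A](left: Cat[A], right: Cat[A]) extends Cat[A]
  def empty[A]: Cat[A] = Empty
}

object CatDemo {
  // Writer-style log accumulation: every step is a constant-time append.
  val log: Cat[String] =
    (1 to 100000).foldLeft(Cat.empty[String])((acc, i) => acc :+ s"step $i")
}
```

The cost is paid once, when the accumulated value is finally traversed, which suits the accumulate-then-read pattern of Writer-like types.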

djspiewak · 2018-08-03T16:57:38Z

@kailuowang On the subject of Catenable in cats-core specifically… One problem we're going to hit if we do pull it into core is we sort of obviate the whole notion of dogs. It doesn't make any sense to have one data structure in core and all the others somewhere else, and once Catenable is in core, it's very very difficult to get it back out again. So this is part of why I think the general principle argument is still relevant here.

Let's take a step back from what has happened with dogs specifically and think about what we really want from an ecosystem organizational standpoint. Here's the question I would pose to you: In an ideal world, is it better to have dogs, or cats-data (as part of core)? Imagine a dogs that is as actively maintained and as widely known as cats-effect. My argument is that this situation is preferable to having a cats-data as part of the core project. If, however, the broad consensus is that having the data structures (not just Catenable) all pulled into the core project is optimal for the ecosystem, then that really does resolve the Catenable question.

What I really want is to summon spare time from a hat so that I can give dogs some of the love it deserves…

tpolecat · 2018-08-03T16:58:44Z

Maybe we can convince Haoyi to let us borrow his time machine.

kailuowang · 2018-08-03T17:43:57Z

@djspiewak I see your point.
Let's take one more step back and state the specific problem as follows:
Catenable is useful, we want to enable more projects to use it without fs2 dep.

Right now we have mainly 3 options:

1. A new project just for Catenable. This sounds like too much infrastructure cost to provide a single feature.
2. Put it in cats-core, the foundation of an ecosystem. If something is widely useful, we'd like it to be part of the foundation of an ecosystem so that a) people can use it and b) people wouldn't worry too much about it being changed. On the other hand, it's very costly to have something suboptimal stuck in the foundation, so it has to be stable and proven, which it is. But, hey, what if dogs becomes mature and it would then be logical to have it in dogs?
3. Put it in dogs. But dogs is not that well maintained and not that stable, so people might not be able to use Catenable as a foundational data structure without worrying about binary compatibility.

Does that summarize our current situation with this specific problem?

djspiewak · 2018-08-03T17:48:19Z

Does that summarize our current situation with this specific problem?

It does. Just so my preferences are clear at a high level:

3, 2, 1

(1 is a distant last place)

Obviously 3 is contingent on being able to resolve the maintenance and stability problems. I really wish I could just straight up volunteer to take it on. I certainly have the interest, I just don't have the time right now (similar to the situation @stew is in, I would assume).

johnynek · 2018-08-03T18:43:41Z

2, 3, 1

PS: I think Haoyi's time machine is that he doesn't really engage in democratically run projects, and he can quickly make decisions/changes. Secondly, he gets things in a decent shape, and moves on to the next project.

I think for long term governance, cats' approach works well, it won't die if any one person leaves, but it does create costs as we struggle through areas where we disagree.

I would argue we should have a project lead (Kai?) with a term of say 1 year, then a formal set of committers that can vote on the next maintainer. If we had something like this, we could just agree the maintainer makes the call on an unclear decision.

LukaJCB · 2018-08-03T19:05:44Z

I also vote 2,3,1.

I'm not sure I like the idea of a "benevolent dictator". I'd rather we actually have something like a formal vote instead. :)

mpilquist · 2018-08-03T19:24:49Z

2, 1, 3 personally with 1 and 3 close.


djspiewak · 2018-08-03T19:43:58Z

@johnynek In fairness, I think we would all pretty much defer to @kailuowang. I'm not necessarily averse to what you're saying, just pointing out that, at least for the time being, things aren't exactly ambiguous.

kailuowang · 2018-08-03T19:45:54Z

I have to say I am honored to be mentioned by @johnynek in regards to such a role - I am not so sure about that, but let's discuss in a separate thread (created #2362)

LukaJCB · 2018-08-08T08:37:12Z

So what is the best way to unblock this issue, given that this should probably happen before fs2 1.0 rolls out?

kailuowang · 2018-08-08T10:46:28Z

Looks like 2 got the majority vote here; also, 3 has a contingency issue (dogs maintenance) yet to be resolved.
I'd say let's move it into cats.data.

SystemFw · 2018-08-08T16:47:50Z

Maybe it's really stupid, but I feel that dogs would benefit from being called cats-data instead. It kinda gives it that stamp of approval that cats-effect has

martin-g · 2018-08-09T13:20:53Z

What about mouse, kittens, mainecoon? Do they need better names too? In some documentation I read that cats is short for categories, but the names of the related libraries suggest we are talking about pets/zoo :-)

ChristopherDavenport · 2018-08-09T13:31:35Z

I would personally prefer a cats-data library rather than loading these up into cats. If everything moves into cats, then we start to lose the modularity that has so far been a feature to the ecosystem.

martin-g · 2018-08-09T13:49:03Z

I meant just their names, because of "dogs would benefit from being called cats-data instead. It kinda gives it that stamp of approval."
I like the renaming of dogs to cats-data; it makes it much clearer what to expect from this library.
kittens just moved to the typelevel organization on GitHub. If it is stamped, then maybe its name should be improved too, to cats-derive or something.

mpilquist · 2018-08-09T14:44:14Z

👍 for an official cats collections library. I'd avoid calling it cats-data as that conflicts with cats.data package in cats-core and packages shouldn't be split across jars/modules. And we can't remove cats.data from core b/c it's too intertwined with other parts of core. Hence, I'd rather just put everything in 1 JAR and move on.

djspiewak · 2018-08-11T17:57:43Z

IMO, anything that uses the cats top-level package should be named cats-something as a project. This was the rationale behind cats-effect, originally. Using the cats.data namespace for a cats collections library would be awesome, but there's simply no way to do it without package splitting, which is a non-starter.

I don't think dogs is necessarily a bad name. I agree cats-data would be better but that's not really the problem. The problem is there's no one to really take on the mantle of maintaining it, adding what is missing, promoting, etc. As I said, I would, but I really don't have time. If someone were to choose to drive dogs forward though, I'm sure it would be embraced by the community. The time is really now, too, especially given the scala collections rework.

johnynek · 2018-08-11T18:04:39Z

I would argue that Catenable (or Chain) should potentially be in core because it would show up in typeclasses. For instance, on Foldable we have toList, but that forces an order of construction.

With Catenable/Chain we could do toChain and many structures could more efficiently be created (e.g. pretty much any tree structure not already a linear graph)
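As a rough illustration of this point, the sketch below converts a binary tree into a concatenation-based sequence. All names here (`MiniChain`, `Tree`, `toMiniChain`) are hypothetical and are not part of cats or Foldable. The idea is that each node contributes only constant-time concatenations, so subtrees can be combined in whatever shape the tree already has, instead of being threaded through a single list in a fixed order.

```scala
// Hypothetical sketch; MiniChain, Tree and toMiniChain are illustrative names,
// not cats APIs.
sealed trait MiniChain[+A] {
  // O(1): record the concatenation as a node.
  def ++[B >: A](that: MiniChain[B]): MiniChain[B] = MiniChain.Append(this, that)

  // O(n) traversal with an explicit stack of pending subtrees.
  def toList: List[A] = {
    val out = List.newBuilder[A]
    var stack: List[MiniChain[A]] = List(this)
    while (stack.nonEmpty) {
      val head = stack.head
      stack = stack.tail
      head match {
        case MiniChain.Empty        => ()
        case MiniChain.Single(a)    => out += a
        case MiniChain.Append(l, r) => stack = l :: r :: stack
      }
    }
    out.result()
  }
}

object MiniChain {
  case object Empty extends MiniChain[Nothing]
  final case class Single[+A](a: A) extends MiniChain[A]
  final case class Append[+A](left: MiniChain[A], right: MiniChain[A]) extends MiniChain[A]
}

sealed trait Tree[+A]
case object Leaf extends Tree[Nothing]
final case class Node[+A](left: Tree[A], value: A, right: Tree[A]) extends Tree[A]

object Tree {
  // In-order conversion: two O(1) concatenations per node, O(n) overall.
  // Plain recursion, so very deep trees would need a trampolined version.
  def toMiniChain[A](t: Tree[A]): MiniChain[A] = t match {
    case Leaf          => MiniChain.Empty
    case Node(l, v, r) => toMiniChain(l) ++ MiniChain.Single(v) ++ toMiniChain(r)
  }
}
```

A caller that really needs a List can still get one with a single `toList` at the end.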

stew · 2018-08-31T00:06:04Z

dogs doesn't get attention because nobody is using it, including myself. (I'm not using scala for anything at the moment and haven't been for some time). The reason nobody is using it is likely that it doesn't get any attention, etc. I would love for someone to pay attention to it, I don't have the time or motivation currently

johnynek · 2018-08-31T00:16:56Z

It would be nice if we had tooling to version and publish all "the typelevel stack" together. We could do that in one large repo, or we could have some tooling and maybe an sbt plugin that allows you to get the right version of all projects in the typelevel stack such that you know they were tested together.

I think adding modules to cats proper is a good solution in the meantime, especially for projects that aren't changing super fast.

kailuowang · 2018-09-04T15:00:56Z

@johnynek the vision for sbt-catalysts is to become such a tool to coordinate the typelevel stack releases: with a single bump of the sbt-catalysts version, you automatically get the latest releases of all your typelevel dependencies that work together properly. Though we never had the time to implement such a community build in it.

tpolecat · 2018-09-04T17:56:56Z

I have not used sbt-catalysts because it does a lot of stuff I don't understand and it kind of weirds me out. If the only thing it did was provide versions then maybe it would get more uptake.

LukaJCB added the help wanted label Aug 2, 2018

kailuowang added the enhancement label Aug 2, 2018

djspiewak mentioned this issue Aug 3, 2018

[Meta] how to make decisions as a team during disagreement #2362

Closed

LukaJCB mentioned this issue Aug 8, 2018

Add Chain #2371

Merged

6 tasks

LukaJCB added the in progress label Aug 14, 2018

LukaJCB closed this as completed in #2371 Aug 15, 2018

mijicd mentioned this issue Nov 1, 2018

Catenable stack-ended queue typelevel/cats-collections#76

Closed
