Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distinct method on NonEmptyList and NonEmptyVector #1243

Merged
merged 2 commits into from
Aug 12, 2016
Merged

Add distinct method on NonEmptyList and NonEmptyVector #1243

merged 2 commits into from
Aug 12, 2016

Conversation

Tvaroh
Copy link
Contributor

@Tvaroh Tvaroh commented Jul 29, 2016

Initially submitted as #1240 (see the discussion there for details).

Adds distinct method to NonEmptyList and NonEmptyVector that uses Order typeclass instance to keep track of duplicates (Scala's immutable TreeSet under the hood).

@frosforever
Copy link
Contributor

LGTM super useful!

/**
* Remove duplicates. Duplicates are checked using `Order[_]` instance.
*/
def distinct(implicit O: Order[A]): NonEmptyList[A] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only question: there is an O(N^2) algorithm that uses only Eq[A] I wonder is it makes sense to also implement that?

we could use something like: https://github.com/non/algebra/blob/master/core/src/main/scala/algebra/Priority.scala

(or we could use Xor for that) and have something like:

def distinct(implicit oe: Xor[Eq[A], Order[A]]): NonEmptyList[A] = oe match {
  case Xor.Right(ord) => // do the tree set which is `O(log N)` per check
  case Xor.Left(eq) => // do a "listset" approach of checking each item, this incurs `O(N)` per check
}

If we had a Hash[A] type that potentially extended Eq[A] we could even have something like:

def distinct(implicit oe: Xor[Eq[A], Xor[Order[A], Hash[A]]]): NonEmptyList[A]

which would prefer to use hash sets, then tree sets, then list sets.

This is perhaps best served with different method names so readers can be more clear which complexity they get, but it is interesting that the semantics of the method don't care.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is Priority recursive, i.e. can support > 2 implicits? Regarding hash sets and Hash typeclass, since hash sets implementations in Scala are based on Object.hashcode() it would require some wrapping of the elements, wouldn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is perhaps best served with different method names so readers can be more clear which complexity they get

I strongly agree with this statement. Especially since IntelliJ has a nasty habit of telling people that imports are unused when they are only used to bring in implicit instances. It would be really unfortunate if deleting an import brought the Hash or Order instance out of scope and some code went from O(n) to O(n^2).

I don't feel a strong need to add an O(n^2) version that requires Eq[A] instead of Order[A], because O(n^2) operations are often impractical, and I suspect that most of the cases in which you want to do something like this you probably can have an Order[A] available. I could definitely see some value in a hash-based approach, but that's a higher barrier given what's currently in Cats.

Personally I'm pretty happy to go forward with this approach and leave open the possibility of another approach in the future.

@ceedubs ceedubs closed this Aug 3, 2016
@ceedubs ceedubs reopened this Aug 3, 2016
@codecov-io
Copy link

codecov-io commented Aug 3, 2016

Current coverage is 90.60% (diff: 100%)

Merging #1243 into master will increase coverage by 0.02%

@@             master      #1243   diff @@
==========================================
  Files           243        243          
  Lines          3288       3298    +10   
  Methods        3231       3237     +6   
  Messages          0          0          
  Branches         54         58     +4   
==========================================
+ Hits           2978       2988    +10   
  Misses          310        310          
  Partials          0          0          

Sunburst

Powered by Codecov. Last update 3fb3ed3...5f2483f

@non
Copy link
Contributor

non commented Aug 12, 2016

I agree with @ceedubs -- for now let's just have .distinct require Order and go from there.

# Conflicts:
#	tests/src/test/scala/cats/tests/NonEmptyVectorTests.scala
@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

I've resolved conflicts. Let's merge this before any other appear. Frankly, it's counter-motivating for contributing when a PR hangs for some weeks with no reason.

@johnynek
Copy link
Contributor

👍

sorry for the delay.

@kailuowang
Copy link
Contributor

kailuowang commented Aug 12, 2016

I apologize in advance @Tvaroh for possible further delay but I have a thought that I have to speak out.
Does the duplication between NonEmptyList and NonEmptyVector bother anyone other than me? Could this be implemented as in Reducible? Or does it call for a CanBuildFrom 😭 ? I mean for now the duplication looks minor but what if people keep finding other useful scala.collection methods? I just want to make sure that we didn't accidentally start our own collection lib without planing ahead.

@johnynek
Copy link
Contributor

can we merge this one and add an issue to maybe make a generic one using Foldable and MonoidK? Would that be okay @kailuowang ?

@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

Will be happy to generify this if needed.

@johnynek
Copy link
Contributor

johnynek commented Aug 12, 2016

So, I think something like:

  def distinct[A](f: F[A])(implicit ord: Order[A], mk: MonoidK[F]): F[A] = 

on Foldable[F] and maybe a:

def distinct[A](f: F[A])(implicit ord: Order[A], semigroup: SemigroupK[F]): F[A]

on Reducible[F] is possible but it will be less efficient than the ones you have here, so we should specialize the implementation for Reducible[NonEmptyList] and Reducible[NonEmptyVector] (and probably Foldable[List] Foldable[Vector]).

@johnynek
Copy link
Contributor

actually, since you can't promise the MonoidK is a certain form, I don't think you can ever be really efficient there.

I think we should separate the generic versions, from this PR. The ones in the current PR are very efficient, which is nice.

@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

I agree (about separating generification from this PR).

@kailuowang
Copy link
Contributor

👍 . I am okay with having a separate issue for generic distinct.

@johnynek johnynek merged commit cb7e2df into typelevel:master Aug 12, 2016
@stew stew removed the in progress label Aug 12, 2016
@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

Thank you. Should I create an issue to discuss generic stuff there?

@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

@johnynek I think we also need A: Applicative[F] implicit to be able to do:

val as: F[A] = ??? // distinct values
mk.combineK(as, A.pure(a))) // in foldLeft body

Not sure if it could be made more efficient.

@johnynek
Copy link
Contributor

@Tvaroh there is MonadCombine. Can we think of a lawful way do define ApplicativeCombine?

@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

MonadCombine is fine, thanks. So, on Foldable it could be something like this:

def distinct[A](fa: F[A])(implicit O: Order[A], F: MonadCombine[F]): F[A] = {
  implicit val ord = O.toOrdering

  val (_, result) = foldLeft(fa, (TreeSet.empty[A], F.empty[A])) { case (acc@(elementsSoFar, as), a) =>
    if (elementsSoFar(a)) acc else (elementsSoFar + a, F.combineK(as, F.pure(a)))
  }

  result
}

@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

Though we don't have MonadCombine for NonEmptyList or NonEmptyVector.

@johnynek
Copy link
Contributor

yeah, no empty. We need something like MonadSemigroupK....

maybe we are barking up the wrong tree here.

@Tvaroh
Copy link
Contributor Author

Tvaroh commented Aug 12, 2016

Also MonoidK.empty cannot be implemented for "non empty" collections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants