Regarding stdlib functions #100
There is already …
@asm0dey Yes, that's right. That's already possible, but it might be heavy, since it needs to first collect everything in memory and then run the stdlib functions on the resulting list. However, if we reimplemented all or most stdlib functions for Datasets separately, using the `map` and `filter` functions already present in Spark, it might be possible to make them more efficient. I'll try a few out to see if it works. In the best case, we get some extra helpful functions like `mapNotNull {}` etc.

It sounds interesting, but TBH I'm not sure whether it can be implemented with Kotlin sequences.
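A minimal sketch of the idea above: a `mapNotNull` that never leaves Spark. The names `mapNotNull` and `parsedInts` are hypothetical, the explicit `Encoder` parameter is an assumption, and the sketch swaps in Spark's `flatMap` for the `map` + `filter` pair so that null results never have to pass through an encoder.

```kotlin
import org.apache.spark.api.java.function.FlatMapFunction
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.SparkSession

/**
 * Hypothetical Dataset counterpart of the stdlib's mapNotNull.
 * All work happens inside Spark's flatMap, so nothing is collected to the driver.
 * inline + crossinline inlines the Kotlin lambda into the FlatMapFunction, so the
 * object Spark serializes is the Java function interface (which extends Serializable).
 */
inline fun <T, R : Any> Dataset<T>.mapNotNull(
    encoder: Encoder<R>,
    crossinline transform: (T) -> R?,
): Dataset<R> =
    flatMap(
        // Emit zero elements for a null result and exactly one element otherwise.
        FlatMapFunction<T, R> { listOfNotNull(transform(it)).iterator() },
        encoder,
    )

// Usage: keep only the strings that parse as integers.
fun parsedInts(spark: SparkSession, raw: List<String>): List<Int> =
    spark.createDataset(raw, Encoders.STRING())
        .mapNotNull(Encoders.INT()) { it.toIntOrNull() }
        .collectAsList()
```

Using `flatMap` keeps it to a single pass; a `map` followed by a null `filter` would also work, but only with encoders that accept nulls.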
@asm0dey Yeah, you're right. It's somewhere in between. However, for some functions, I'm not sure how efficient they will be. For instance, …
Yes, in some cases it could really make sense to implement such wrappers!
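@asm0dey For example:
```kotlin
// Filters inside Spark, then checks whether any row matched.
inline operator fun <reified T> Dataset<T>.contains(element: T): Boolean = !filter { it == element }.isEmpty
```
vs
```kotlin
// Streams rows to the driver one partition at a time and searches them there.
inline operator fun <reified T> Dataset<T>.contains(element: T): Boolean = Iterable<T> { toLocalIterator() }.contains(element)
```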
Both, of course :)
@asm0dey Alright, but one needs to be the default, because it's an operator function in this case :)
I would go with the non-OOMing implementation as the default :)
@asm0dey I agree! Is there an annotation that can give tips to users aside from …
`ReplaceWith`, IIRC, but it's barely documented :(
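For context, `ReplaceWith` is only accepted as an argument of `@Deprecated`, so it can steer users away from a library's own functions (IntelliJ then offers a "Replace with" quick-fix), but it cannot flag arbitrary call patterns such as collecting a Dataset first. A small pure-Kotlin sketch; both function names are hypothetical:

```kotlin
/** Hypothetical preferred variant: stops at the first matching element. */
fun <T> Sequence<T>.containsLazily(element: T): Boolean = any { it == element }

/** Hypothetical discouraged variant: copies every element into a list before searching. */
@Deprecated(
    message = "Materializes the whole sequence; use containsLazily instead.",
    replaceWith = ReplaceWith("this.containsLazily(element)"),
    level = DeprecationLevel.WARNING,
)
fun <T> Sequence<T>.containsEagerly(element: T): Boolean = toList().contains(element)
```

Calls to `containsEagerly` still compile, but with a deprecation warning whose quick-fix rewrites them to `containsLazily(element)`.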
But what you're describing is an inspection, not an annotation, isn't it?
@asm0dey Yes, I think you're right... Inspections need an IntelliJ plugin, don't they?
Hypothetically, they may be user-provided, but we can't provide them from inside a library.
@asm0dey But it's also possible to make a small plugin: https://plugins.jetbrains.com/docs/intellij/code-inspections.html
Impressive work!

On Mon, Jul 26, 2021 at 11:55 PM Jolan Rensen wrote:
This is actually quite interesting. It's a bit hard due to the lack of documentation, but by using samples from SimplifiableCallInspection.kt, for example, I did manage to create simple hints for users.
I'll probably finish the stdlib functions themselves first and look at the plugin again afterward, but as a proof of concept, it does work :).
One of the things that makes Kotlin so great to work with, compared to other languages, is its extensive and declarative standard library, with functions like `mapNotNull { }` and `first { it > 4 }`. To promote Kotlin for Spark, it might be helpful to bring the standard library closer to Dataset and RDD calculations. There are multiple ways we could achieve this.
The first way is to simply convert Datasets to Iterables and Sequences:
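A minimal sketch of this first option, assuming Spark's `Dataset.toLocalIterator()`; the adapter names `asIterable`/`asSequence` and the `firstAboveFour` usage function are hypothetical:

```kotlin
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.SparkSession

/**
 * Hypothetical adapters that view a Dataset as a Kotlin Iterable or Sequence.
 * toLocalIterator() streams rows to the driver one partition at a time, so the
 * driver needs about as much memory as the largest partition, not the whole Dataset.
 * Each new iteration calls toLocalIterator() again, which re-runs the Spark job.
 */
fun <T> Dataset<T>.asIterable(): Iterable<T> = Iterable { toLocalIterator() }

fun <T> Dataset<T>.asSequence(): Sequence<T> = Sequence { toLocalIterator() }

// Usage: the stdlib's firstOrNull stops consuming the iterator at the first match.
fun firstAboveFour(spark: SparkSession): Int? =
    spark.createDataset(listOf(1, 3, 5, 7), Encoders.INT())
        .asSequence()
        .firstOrNull { it > 4 }
```

The trade-off is that the predicate runs on the driver rather than on the executors.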
However, I am not sure how this would affect performance, since Spark functions like `filter`, `map`, etc. are probably optimized. The second option would be to copy the standard library functions for Sequences/Iterables and put them in place as extensions for Datasets and RDDs.
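A minimal sketch of this second option, assuming Spark 2.4 or newer (for `Dataset.isEmpty()`). The extension names mirror the stdlib but are hypothetical, and `demo` exists only for illustration. Because each predicate goes through Spark's own `filter`, it runs on the executors and only the small result reaches the driver:

```kotlin
import org.apache.spark.api.java.function.FilterFunction
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.SparkSession

// Hypothetical Dataset counterparts of stdlib functions, built on Spark's filter.
// inline + crossinline inlines the predicate into the FilterFunction, so the
// serialized closure is the Java function interface rather than a Kotlin lambda.

inline fun <T> Dataset<T>.firstOrNull(crossinline predicate: (T) -> Boolean): T? =
    // "First" follows Spark's partition order, not any sort order.
    filter(FilterFunction<T> { predicate(it) }).takeAsList(1).firstOrNull()

inline fun <T> Dataset<T>.any(crossinline predicate: (T) -> Boolean): Boolean =
    !filter(FilterFunction<T> { predicate(it) }).isEmpty()

inline fun <T> Dataset<T>.count(crossinline predicate: (T) -> Boolean): Long =
    filter(FilterFunction<T> { predicate(it) }).count()

// Usage on a small in-memory Dataset.
fun demo(spark: SparkSession): Triple<Int?, Boolean, Long> {
    val ds = spark.createDataset(listOf(1, 3, 5, 7), Encoders.INT())
    return Triple(ds.firstOrNull { it > 4 }, ds.any { it > 7 }, ds.count { it % 2 == 1 })
}
```

Here `demo` returns `Triple(5, false, 4L)`.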
What do you think, @asm0dey ?