@zero323
Member

@zero323 zero323 commented Sep 26, 2020

What changes were proposed in this pull request?

This PR:

  • removes annotations for modules which are not part of the public API.
  • removes __init__.pyi files, if no annotations, beyond exports, are present.
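As a rough illustration of the second bullet (a sketch only, not the procedure actually used in this PR), a stub can be treated as "exports only" when its top level contains nothing but imports and an optional `__all__` assignment. The helper name `is_export_only_stub` is hypothetical and exists only for this example.

```python
import ast


def is_export_only_stub(source: str) -> bool:
    """Return True if the stub's top level holds only imports and an optional __all__.

    Hypothetical helper for illustration; comments (e.g. license headers)
    are ignored by the parser, so they do not affect the result.
    """
    for node in ast.parse(source).body:
        # Plain re-exports: `import x` / `from x import y as y`.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            continue
        # `__all__ = [...]` only lists exports, it adds no annotations.
        if isinstance(node, ast.Assign) and all(
            isinstance(target, ast.Name) and target.id == "__all__"
            for target in node.targets
        ):
            continue
        # Anything else (def, class, annotated name) counts as a real annotation.
        return False
    return True
```

A stub for which this returns True would be a candidate for removal under the rule above; any `def`, `class`, or annotated name makes it return False.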

Why are the changes needed?

Primarily to reduce maintenance overhead, and as requested in the comments on #29591.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests and additional MyPy checks:

mypy --no-incremental --config python/mypy.ini python/pyspark
MYPYPATH=python/ mypy --no-incremental --config python/mypy.ini examples/src/main/python/ml examples/src/main/python/sql examples/src/main/python/sql/streaming

@SparkQA

SparkQA commented Sep 26, 2020

Test build #129137 has finished for PR 29879 at commit d996c6e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 26, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33753/

@SparkQA

SparkQA commented Sep 26, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33753/

@SparkQA

SparkQA commented Sep 29, 2020

Test build #129253 has finished for PR 29879 at commit 037acdd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323 zero323 marked this pull request as ready for review September 29, 2020 22:17
@zero323
Member Author

zero323 commented Sep 29, 2020

I think this is as much as we can remove here.

@SparkQA

SparkQA commented Sep 29, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33870/

@SparkQA

SparkQA commented Sep 29, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33870/

@SparkQA

SparkQA commented Sep 30, 2020

Test build #129263 has finished for PR 29879 at commit 065a99f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 30, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33880/

@SparkQA

SparkQA commented Sep 30, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33880/

@dongjoon-hyun
Member

Retest this please

@SparkQA

SparkQA commented Oct 1, 2020

Test build #129300 has finished for PR 29879 at commit 065a99f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 1, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33916/

@SparkQA

SparkQA commented Oct 1, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33916/

@SparkQA

SparkQA commented Oct 2, 2020

Test build #129361 has finished for PR 29879 at commit 341ab2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zero323
Member Author

zero323 commented Oct 2, 2020

Retest this please

@SparkQA

SparkQA commented Oct 2, 2020

Test build #129364 has finished for PR 29879 at commit 341ab2a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33974/

@SparkQA

SparkQA commented Oct 2, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/33974/

Member

@HyukjinKwon HyukjinKwon left a comment

Should we remove python/join.pyi, python/rddsampler.pyi and python/shell.pyi too?

LGTM otherwise.

@zero323
Member Author

zero323 commented Oct 6, 2020

Should we remove python/join.pyi,

I considered that but overall I'd prefer to keep it as-is:

  • It is exposed to end users through RDD joins, but can also be used directly (this was fairly useful back when we still supported Python 2, less so in Python 3, where we can use methods without any caveats).
  • It is stable, has useful annotations, and nothing indicates that it is non-public.

python/rddsampler.pyi

I don't have a strong opinion about it. However, same as join, it is stable, has meaningful annotations, and was never marked as internal (I have used it directly once or twice, so I suspect that others might have as well). Nothing indicates that it will cause serious maintenance overhead or that it is just dead weight.

and python/shell.pyi too?

Yes, that's an omission.

@HyukjinKwon
Member

Hmm, but technically python/join.pyi and python/rddsampler.pyi are not documented so far. I believe they are for internal purposes. I get that many undocumented things can be useful (on the Scala and Java sides as well), but I would prefer to hide them unless we explicitly document and expose them.

@SparkQA

SparkQA commented Oct 6, 2020

Test build #129438 has finished for PR 29879 at commit 045834e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 6, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34045/

@SparkQA

SparkQA commented Oct 6, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34045/

@zero323
Member Author

zero323 commented Oct 6, 2020

Hmm, but technically python/join.pyi and python/rddsampler.pyi are not documented so far. I believe they are for internal purposes. I get that many undocumented things can be useful (on the Scala and Java sides as well), but I would prefer to hide them unless we explicitly document and expose them.

Fair point. If you feel strongly about these, I'll be happy to remove them. But I think we need more precise criteria for inclusion / exclusion in the long run. In general, the most important factors are:

  • Stability, which influences cost of maintenance.
  • Precision of annotations, which influences potential usefulness (I avoided annotating many parts of the "internal" API, primarily because possible annotations would be far too generic to be useful).
  • Likelihood that a given part of the API will be used by end users.
  • Direct cost of not including it (the number of ignores required for things to type check now, and possibly in the future, if we decide to switch to inline variants).

(I am primarily thinking about SPARK-33003 here)
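The last factor can be estimated mechanically. The following is a minimal sketch, not something taken from this PR: it counts `# type: ignore` comments in a piece of Python source, and the helper name `count_type_ignores` is hypothetical. It uses the standard `tokenize` module so that the same text inside a string literal is not counted.

```python
import io
import re
import tokenize

# Matches both bare ignores and ones with an error code, e.g. "# type: ignore[attr-defined]".
_IGNORE = re.compile(r"#\s*type:\s*ignore\b")


def count_type_ignores(source: str) -> int:
    """Count comments that start with "# type: ignore" in Python source text.

    Hypothetical helper for illustration only.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(
        1
        for tok in tokens
        if tok.type == tokenize.COMMENT and _IGNORE.match(tok.string)
    )
```

Comparing the count with and without a given stub file in place would give a rough measure of the direct cost of excluding it.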

@HyukjinKwon
Member

HyukjinKwon commented Oct 7, 2020

I personally tend to agree with what you listed. However, one concern is that, in my experience, it doesn't work well out of the box when the criteria become verbose, especially for new contributors: the same criteria are often interpreted differently. So I ended up focusing on simplifying them unless more is required.

Also, I would like to be explicit about what we expose and what we hide as APIs. Sometimes something was internal and was therefore changed, but users ended up complaining. This doesn't happen very often, but it is not that rare either. I would like to avoid this kind of cost.

To sum up, we can keep python/join.pyi and python/rddsampler.pyi, but let's make sure these are special cases. Does the removal of both, for example, cause many ignores? If that's the case, we can probably keep them and leave some comments. Otherwise, let's remove both and start with the simpler set.

@HyukjinKwon
Member

Thanks @zero323.

@zero323
Member Author

zero323 commented Oct 7, 2020

I personally tend to agree with what you listed. However, one concern is that, in my experience, it doesn't work well out of the box when the criteria become verbose, especially for new contributors: the same criteria are often interpreted differently. So I ended up focusing on simplifying them unless more is required.

Makes sense. I am still thinking about this from the perspective of the main contributor :)

Does the removal of both, for example, cause many ignores?

These two have rather limited scope, so we're good for now. I didn't run a merge test, but if it becomes a problem later, we can always restore things; stability works both ways here.

If that's the case, we can probably keep them and leave some comments. Otherwise, let's remove both and start with the simpler set.

Sounds good.

@SparkQA

SparkQA commented Oct 7, 2020

Test build #129499 has finished for PR 29879 at commit c714a1c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

LGTM

@SparkQA

SparkQA commented Oct 7, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34105/

@SparkQA

SparkQA commented Oct 7, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34105/

@HyukjinKwon
Member

Merged to master.

@zero323
Member Author

zero323 commented Oct 7, 2020

Thanks @HyukjinKwon!

@zero323 zero323 deleted the SPARK-33002 branch October 7, 2020 11:31