This repository has been archived by the owner on Nov 22, 2022. It is now read-only.

Add missing methods #464

Closed · wants to merge 14 commits

Conversation


@Fokko Fokko commented Aug 27, 2020

When adding the files to the Spark project, we got errors because it was unable to find some methods.

@@ -29,7 +29,8 @@ T = TypeVar("T")
U = TypeVar("U", bound=SupportsIAdd)

import socketserver as SocketServer
from typing import Any

_accumulatorRegistry: Dict = {}
zero323 (Owner):

If we annotate this, could we be more precise?

zero323 (Owner):

Just wondering ‒ what is the reason for including this? I'm not strictly against it, but I haven't seen any need for this one so far, and in general we seem to lean towards excluding internal APIs unless strictly necessary.

Fokko (Author):

The _accumulatorRegistry is used by other files, so we have to include it. Otherwise, mypy complains that it can't find the dict.

zero323 (Owner):

Interesting. Could you provide some details on the test setup? I haven't seen any mypy failures in apache/spark#29591 (still work in progress, though I don't expect it will make a huge impact on type checks).

Fokko (Author):

I'll whip up an example first thing tomorrow morning!

zero323 (Owner):

Thanks. Have a good night.

Fokko (Author):

This should be easily reproducible on your machine as well. I'm working on my branch from apache/spark#29180:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ git branch
* SPARK-17333

Right now everything passes:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ ./dev/lint-python 
starting python compilation test...
python compilation succeeded.

starting pycodestyle test...
pycodestyle checks passed.

starting flake8 test...
flake8 checks passed.

starting mypy test...
mypy checks passed.

The sphinx-build command was not found. Skipping Sphinx build for now.


all lint-python tests passed!

If we remove the _accumulatorRegistry:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ nano python/pyspark/accumulators.pyi 
MacBook-Pro-van-Fokko:spark fokkodriesprong$ git diff
diff --git a/python/pyspark/accumulators.pyi b/python/pyspark/accumulators.pyi
index f60de25704..6eafe46a46 100644
--- a/python/pyspark/accumulators.pyi
+++ b/python/pyspark/accumulators.pyi
@@ -30,7 +30,7 @@ U = TypeVar("U", bound=SupportsIAdd)
 
 import socketserver as SocketServer
 
-_accumulatorRegistry: Dict[int, Accumulator]
+# _accumulatorRegistry: Dict[int, Accumulator]
 
 class Accumulator(Generic[T]):
     aid: int

Then it fails:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ ./dev/lint-python 
starting python compilation test...
python compilation succeeded.

starting pycodestyle test...
pycodestyle checks passed.

starting flake8 test...
flake8 checks passed.

starting mypy test...
mypy checks failed:
python/pyspark/worker.py:34: error: Module 'pyspark.accumulators' has no attribute '_accumulatorRegistry'
Found 1 error in 1 file (checked 185 source files)
1

It looks like _accumulatorRegistry is private, since it starts with an underscore, but it is actually imported in worker.py.
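The underscore is only a naming convention, which a small sketch makes concrete (using a hypothetical stand-in module, since pyspark is not importable here): Python happily resolves the import at runtime, so only the type checker needs the stub entry.

```python
import types

# Hypothetical stand-in for pyspark.accumulators; names are illustrative.
accumulators = types.ModuleType("accumulators")
accumulators._accumulatorRegistry = {}  # "private" by convention only

# worker.py effectively does
#     from pyspark.accumulators import _accumulatorRegistry
# which Python allows at runtime; the leading underscore is advisory.
registry = getattr(accumulators, "_accumulatorRegistry")
registry[1] = "accumulator with aid 1"
print(accumulators._accumulatorRegistry)  # {1: 'accumulator with aid 1'}
```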

zero323 (Owner):

> This should be easily reproducible on your machine as well

Now I see... This won't happen on my side, as worker.py has a dynamic stub.

> It looks like _accumulatorRegistry is private, since it starts with an underscore, but it is actually imported in worker.py.

There are quite a few of these imports all over the place. In general I tend to use specific ignores for these, as they're not user-facing API.

If you don't mind, I'll keep this open for now and finish syncing things for SPARK-32714, as it might require some further discussion about the scope of annotations.
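As a sketch of the "specific ignores" approach: mypy's error-code suffix suppresses exactly one class of diagnostic on one line. The worker.py form is shown in a comment (it needs pyspark to run); the runnable line below demonstrates the same mechanism with a deliberate mismatch.

```python
# For the worker.py import, a targeted ignore would look like:
#     from pyspark.accumulators import _accumulatorRegistry  # type: ignore[attr-defined]
# The comment is inert at runtime; only mypy reads it, and the [error-code]
# suffix keeps it from masking unrelated errors on the same line.

# Runnable illustration of the same mechanism with a deliberate mismatch:
x: int = "not an int"  # type: ignore[assignment]
print(type(x).__name__)  # str
```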

Fokko (Author):

Sure, let me know if I can help somewhere.

third_party/3/pyspark/broadcast.pyi (outdated; resolved)
@Fokko (Author) commented Sep 1, 2020

Sorry for the late reply. Caught up in daily work :) I'm also installing pandas and numpy, since they are imported in the new PyArrow example: https://github.com/apache/spark/blob/master/examples/src/main/python/sql/arrow.py


@zero323 (Owner) commented Oct 9, 2020

I think we already picked everything that was required for the project port. I didn't experience any problems with the remaining bits, so let's keep these out for the time being and include them directly upstream if it ever proves necessary.

Thanks for all your work @Fokko!

@zero323 zero323 closed this Oct 9, 2020