
Add missing methods #464

Closed
wants to merge 14 commits
5 changes: 3 additions & 2 deletions third_party/3/pyspark/accumulators.pyi
@@ -19,7 +19,7 @@
 # Stubs for pyspark.accumulators (Python 3.7)
 #
 
-from typing import Callable, Generic, Tuple, Type, TypeVar
+from typing import Callable, Generic, Tuple, Type, TypeVar, Dict
 
 import socketserver.BaseRequestHandler # type: ignore

@@ -29,7 +29,8 @@ T = TypeVar("T")
 U = TypeVar("U", bound=SupportsIAdd)
 
 import socketserver as SocketServer
 from typing import Any
 
+_accumulatorRegistry: Dict = {}

Owner:
If we annotate this, could we be more precise?
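
A more precise annotation would key the registry by accumulator id, which is the form that appears in the lint example further down in this thread:

# accumulators.pyi sketch -- maps each accumulator's aid to its Accumulator instance
_accumulatorRegistry: Dict[int, Accumulator]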

Owner:
Just wondering ‒ what is the reason for including this? I am not strictly against this, but I haven't seen any need for this one so far and in general we seem to lean towards excluding internal APIs, unless strictly necessary.

Author:
The _accumulatorRegistry is used by other files, so we have to include it; otherwise, mypy will complain that it can't find the dict.
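
For context, the dependency is a plain import in worker.py; the mypy failure quoted below points at exactly this line (shown as a one-line excerpt, assuming the usual from-import form):

# pyspark/worker.py:34 -- imports a leading-underscore name, so the stub must declare it
from pyspark.accumulators import _accumulatorRegistry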

Owner:
Interesting. Could you provide some details on the test setup? Haven't seen any mypy failures in apache/spark#29591 (still work in progress, though I don't expect it will make a huge impact on type checks).

Author:
I'll whip up an example first thing tomorrow morning!

Owner:
Thanks. Have a good night.

Author:
This should be easy to reproduce on your machine as well. I'm working on my branch of apache/spark#29180:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ git branch
* SPARK-17333

Right now everything passes:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ ./dev/lint-python 
starting python compilation test...
python compilation succeeded.

starting pycodestyle test...
pycodestyle checks passed.

starting flake8 test...
flake8 checks passed.

starting mypy test...
flake8 checks passed.

The sphinx-build command was not found. Skipping Sphinx build for now.


all lint-python tests passed!

If we remove the _accumulatorRegistry:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ nano python/pyspark/accumulators.pyi 
MacBook-Pro-van-Fokko:spark fokkodriesprong$ git diff
diff --git a/python/pyspark/accumulators.pyi b/python/pyspark/accumulators.pyi
index f60de25704..6eafe46a46 100644
--- a/python/pyspark/accumulators.pyi
+++ b/python/pyspark/accumulators.pyi
@@ -30,7 +30,7 @@ U = TypeVar("U", bound=SupportsIAdd)
 
 import socketserver as SocketServer
 
-_accumulatorRegistry: Dict[int, Accumulator]
+# _accumulatorRegistry: Dict[int, Accumulator]
 
 class Accumulator(Generic[T]):
     aid: int

Then it fails:

MacBook-Pro-van-Fokko:spark fokkodriesprong$ ./dev/lint-python 
starting python compilation test...
python compilation succeeded.

starting pycodestyle test...
pycodestyle checks passed.

starting flake8 test...
flake8 checks passed.

starting mypy test...
mypy checks failed:
python/pyspark/worker.py:34: error: Module 'pyspark.accumulators' has no attribute '_accumulatorRegistry'
Found 1 error in 1 file (checked 185 source files)
1

It looks like _accumulatorRegistry is private, since it starts with an underscore, but it is actually imported in worker.py.

Owner:
> This should be easy to reproduce on your machine as well

Now I see... This won't happen on my side, as worker.py has a dynamic stub.

> It looks like _accumulatorRegistry is private, since it starts with an underscore, but it is actually imported in worker.py.

There are quite a few of these imports all over the place. In general I tend to use specific ignores for these, as they're not user-facing API.
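
A specific ignore at such a call site would look roughly like this (a sketch, assuming mypy's per-error-code suppression syntax):

# worker.py -- suppress just this error instead of exposing the internal name in the stub
from pyspark.accumulators import _accumulatorRegistry  # type: ignore[attr-defined]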

If you don't mind, I'll keep this open for now and finish syncing things for SPARK-32714, as it might require some further discussion about the scope of annotations.

Author:
Sure, let me know if I can help somewhere


 class Accumulator(Generic[T]):
     aid: int
10 changes: 6 additions & 4 deletions third_party/3/pyspark/broadcast.pyi
@@ -16,14 +16,13 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# Stubs for pyspark.broadcast (Python 3.5)
-#
-
 import threading
-from typing import Any, Generic, Optional, TypeVar
+from typing import Any, Generic, Optional, TypeVar, Dict
 
 T = TypeVar("T")
 
+_broadcastRegistry: Dict
+
 class Broadcast(Generic[T]):
     def __init__(
         self,
@@ -47,3 +46,6 @@ class BroadcastPickleRegistry(threading.local):
     def __iter__(self) -> None: ...
     def add(self, bcast: Any) -> None: ...
     def clear(self) -> None: ...
+
+class InheritableThread:
+    def __init__(self) -> None: ...
9 changes: 8 additions & 1 deletion third_party/3/pyspark/serializers.pyi
@@ -16,12 +16,14 @@
 # specific language governing permissions and limitations
 # under the License.
 
-from typing import Any, Optional
+from typing import Any, Dict
 
 basestring = str
 unicode = str
 xrange = range
 
+__cls: Dict
+
 class SpecialLengths:
     END_OF_DATA_SECTION: int = ...
     PYTHON_EXCEPTION_THROWN: int = ...
@@ -117,5 +119,10 @@ class ChunkedStream:
     @property
     def closed(self): ...
 
 def write_with_length(obj: Any, stream: Any): ...
+def pack_long(value): ...
+def read_int(stream): ...
+def read_long(stream): ...
+def read_bool(stream): ...
+def write_int(value, stream): ...
+def write_long(value, stream): ...
18 changes: 13 additions & 5 deletions third_party/3/pyspark/util.pyi
@@ -15,11 +15,19 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
 
-# Stubs for pyspark.util (Python 3.7)
-#
-# NOTE: This dynamically typed stub was automatically generated by stubgen.
 
-from typing import Any
+from typing import List, Callable, Any, Tuple
 
+__all__: List[str]
+
+def fail_on_stopiteration(f: Callable) -> Callable: ...
+def _parse_memory(s: str) -> int: ...
+def _print_missing_jar(lib_name: str, pkg_name: str, jar_name: str, spark_version: str) -> None: ...
 
+class InheritableThread:
+    def __init__(self, target: Any, *args: Any, **kwargs: Any): ...
 
-def fail_on_stopiteration(f: Any): ...
+class VersionUtils:
+    @staticmethod
+    def majorMinorVersion(sparkVersion: str) -> Tuple[int, int]: ...