[SPARK-32714][PYTHON] Initial pyspark-stubs port. #29591
dev/.rat-excludes:

```diff
@@ -124,3 +124,4 @@ GangliaReporter.java
 application_1578436911597_0052
 config.properties
 app-20200706201101-0003
+py.typed
```
examples/src/main/python/sql/arrow.py:

```diff
@@ -32,8 +32,8 @@
 def dataframe_with_arrow_example(spark):
-    import numpy as np
-    import pandas as pd
+    import numpy as np  # type: ignore[import]
+    import pandas as pd  # type: ignore[import]

     # Enable Arrow-based columnar data transfers
     spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
```

Review comment: NumPy has type hinting as well, why ignore it? I got this working in zero323/pyspark-stubs#464.

Reply: Because, AFAIK, released versions are not typed ‒ there don't seem to be any typed releases (nothing for 1.19.1 ‒ https://github.com/numpy/numpy/tree/v1.19.1/numpy) since numpy/numpy@11b95d1.
python/MANIFEST.in:

```diff
@@ -22,4 +22,5 @@ recursive-include deps/data *.data *.txt
 recursive-include deps/licenses *.txt
 recursive-include deps/examples *.py
 recursive-include lib *.zip
+recursive-include pyspark *.pyi py.typed
 include README.md
```

Review comment: Note: is it possible to do this through

```python
package_data={
    '': ['*.pyi', 'py.typed'],
    ...
```

Doesn't seem to work here...
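For context on the question above, a minimal sketch of what the `package_data` route would look like (hypothetical fragment, illustrative only; PySpark's real setup.py is more involved, and as the comment notes, this approach did not work here, so the PR keeps the MANIFEST.in line):

```python
# Hypothetical setup.py fragment -- not the approach this PR ships.
from setuptools import setup, find_packages

setup(
    name="pyspark",
    packages=find_packages(),
    package_data={
        # ship the stubs and the PEP 561 marker alongside every package
        "": ["*.pyi", "py.typed"],
    },
    zip_safe=False,  # PEP 561 type information is not readable from zipped installs
)
```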
python/mypy.ini (new file):

```ini
;
; Licensed to the Apache Software Foundation (ASF) under one or more
; contributor license agreements.  See the NOTICE file distributed with
; this work for additional information regarding copyright ownership.
; The ASF licenses this file to You under the Apache License, Version 2.0
; (the "License"); you may not use this file except in compliance with
; the License.  You may obtain a copy of the License at
;
;    http://www.apache.org/licenses/LICENSE-2.0
;
; Unless required by applicable law or agreed to in writing, software
; distributed under the License is distributed on an "AS IS" BASIS,
; WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
; See the License for the specific language governing permissions and
; limitations under the License.
;

[mypy]

[mypy-pyspark.cloudpickle.*]
ignore_errors = True

[mypy-py4j.*]
ignore_missing_imports = True

[mypy-numpy]
ignore_missing_imports = True

[mypy-scipy.*]
ignore_missing_imports = True

[mypy-pandas.*]
ignore_missing_imports = True

[mypy-pyarrow]
ignore_missing_imports = True
```
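The two kinds of overrides above do different things (a generic mypy illustration, not from the diff): `ignore_errors` silences errors reported *inside* the matched modules (here, the vendored cloudpickle), while `ignore_missing_imports` silences the "cannot find library stub" error at *import sites* and types the imported module as `Any`:

```python
# With [mypy-py4j.*] ignore_missing_imports = True in effect:
from py4j.java_gateway import JavaGateway  # no "missing library stubs" error

gateway = JavaGateway()  # py4j names are treated as Any by the checker
```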
python/pyspark/__init__.pyi (new file):

```python
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

from typing import Callable, Optional, TypeVar

from pyspark.accumulators import (  # noqa: F401
    Accumulator as Accumulator,
    AccumulatorParam as AccumulatorParam,
)
from pyspark.broadcast import Broadcast as Broadcast  # noqa: F401
from pyspark.conf import SparkConf as SparkConf  # noqa: F401
from pyspark.context import SparkContext as SparkContext  # noqa: F401
from pyspark.files import SparkFiles as SparkFiles  # noqa: F401
from pyspark.profiler import (  # noqa: F401
    BasicProfiler as BasicProfiler,
    Profiler as Profiler,
)
from pyspark.rdd import RDD as RDD, RDDBarrier as RDDBarrier  # noqa: F401
from pyspark.serializers import (  # noqa: F401
    MarshalSerializer as MarshalSerializer,
    PickleSerializer as PickleSerializer,
)
from pyspark.status import (  # noqa: F401
    SparkJobInfo as SparkJobInfo,
    SparkStageInfo as SparkStageInfo,
    StatusTracker as StatusTracker,
)
from pyspark.storagelevel import StorageLevel as StorageLevel  # noqa: F401
from pyspark.taskcontext import (  # noqa: F401
    BarrierTaskContext as BarrierTaskContext,
    BarrierTaskInfo as BarrierTaskInfo,
    TaskContext as TaskContext,
)
from pyspark.util import InheritableThread as InheritableThread  # noqa: F401

# Compatibility imports
from pyspark.sql import (  # noqa: F401
    SQLContext as SQLContext,
    HiveContext as HiveContext,
    Row as Row,
)

T = TypeVar("T")
F = TypeVar("F", bound=Callable)

def since(version: str) -> Callable[[T], T]: ...
def copy_func(
    f: F,
    name: Optional[str] = ...,
    sinceversion: Optional[str] = ...,
    doc: Optional[str] = ...,
) -> F: ...
def keyword_only(func: F) -> F: ...
```

Review comment (on the `Accumulator as Accumulator` import): Quick question: why should we do `Accumulator as Accumulator` instead of just importing the name?

Reply: With mypy one can disable implicit re-export ‒ that's the default configuration for typeshed. If we replaced the above with

```python
from pyspark.accumulators import Accumulator
```

we couldn't do

```python
from pyspark import Accumulator
```

Technically speaking, we could also provide `__all__` instead.
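The re-export rule the reply describes, spelled out on hypothetical module names (illustration only, checked with mypy's `implicit_reexport = False`, the typeshed default mentioned above):

```python
# pkg/__init__.pyi
from pkg.impl import Helper            # noqa: F401  # NOT re-exported
from pkg.impl import Widget as Widget  # noqa: F401  # explicitly re-exported

# client.py
from pkg import Widget  # OK
from pkg import Helper  # error: "pkg" does not explicitly export "Helper"
```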
python/pyspark/_globals.pyi (new file):

```python
# (standard ASF license header)

# NOTE: This dynamically typed stub was automatically generated by stubgen.

from typing import Any

__ALL__: Any

class _NoValueType:
    def __new__(cls): ...
    def __reduce__(self): ...
```
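`_NoValueType` backs PySpark's internal `_NoValue` sentinel. A minimal sketch of the pattern (the `_conf` dict and `get_conf` helper below are hypothetical, purely for illustration): a pickling-safe singleton lets an API distinguish "argument omitted" from "argument explicitly set to None".

```python
class _NoValueType:
    _singleton = None

    def __new__(cls):
        # process-wide singleton
        if cls._singleton is None:
            cls._singleton = super().__new__(cls)
        return cls._singleton

    def __reduce__(self):
        # unpickling calls _NoValueType() and yields the same singleton
        return (self.__class__, ())

_NoValue = _NoValueType()
_conf = {}

def get_conf(key, default=_NoValue):
    """Return _conf[key]; fall back to `default` only if one was given."""
    if key in _conf:
        return _conf[key]
    if default is _NoValue:  # caller passed nothing, not even None
        raise KeyError(key)
    return default
```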
python/pyspark/_typing.pyi (new file):

```python
# (standard ASF license header)

from typing import Callable, Iterable, Sized, TypeVar, Union
from typing_extensions import Protocol

F = TypeVar("F", bound=Callable)
T = TypeVar("T", covariant=True)

PrimitiveType = Union[bool, float, int, str]

class SupportsIAdd(Protocol):
    def __iadd__(self, other: SupportsIAdd) -> SupportsIAdd: ...

class SupportsOrdering(Protocol):
    def __le__(self, other: SupportsOrdering) -> bool: ...

class SizedIterable(Protocol, Sized, Iterable[T]): ...
```
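These are structural (duck-typed) protocols: compatibility is decided by method signatures, not inheritance. A small hypothetical illustration of how `SupportsIAdd` is meant to be consumed (works at runtime; whether a given class satisfies the protocol statically depends on its own `__iadd__` signature):

```python
from pyspark._typing import SupportsIAdd

def merge_into(target: SupportsIAdd, other: SupportsIAdd) -> SupportsIAdd:
    target += other  # only __iadd__ is required by the protocol
    return target

merge_into([1], [2, 3])  # lists provide __iadd__; no registration needed
```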
python/pyspark/accumulators.pyi (new file):

```python
# (standard ASF license header)

import socketserver as SocketServer
from typing import Callable, Generic, Tuple, Type, TypeVar

from pyspark._typing import SupportsIAdd

T = TypeVar("T")
U = TypeVar("U", bound=SupportsIAdd)

class Accumulator(Generic[T]):
    aid: int
    accum_param: AccumulatorParam[T]
    def __init__(
        self, aid: int, value: T, accum_param: AccumulatorParam[T]
    ) -> None: ...
    def __reduce__(
        self,
    ) -> Tuple[
        Callable[[int, int, AccumulatorParam[T]], Accumulator[T]],
        Tuple[int, int, AccumulatorParam[T]],
    ]: ...
    @property
    def value(self) -> T: ...
    @value.setter
    def value(self, value: T) -> None: ...
    def add(self, term: T) -> None: ...
    def __iadd__(self, term: T) -> Accumulator[T]: ...

class AccumulatorParam(Generic[T]):
    def zero(self, value: T) -> T: ...
    def addInPlace(self, value1: T, value2: T) -> T: ...

class AddingAccumulatorParam(AccumulatorParam[U]):
    zero_value: U
    def __init__(self, zero_value: U) -> None: ...
    def zero(self, value: U) -> U: ...
    def addInPlace(self, value1: U, value2: U) -> U: ...

class _UpdateRequestHandler(SocketServer.StreamRequestHandler):
    def handle(self) -> None: ...

class AccumulatorServer(SocketServer.TCPServer):
    auth_token: str
    def __init__(
        self,
        server_address: Tuple[str, int],
        RequestHandlerClass: Type[SocketServer.BaseRequestHandler],
        auth_token: str,
    ) -> None: ...
    server_shutdown: bool
    def shutdown(self) -> None: ...
```
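With `Accumulator` generic in `T`, the element type flows through `add`, `+=`, and `.value`. A minimal sketch (assumes a live SparkContext; the exact inferred type depends on the `SparkContext.accumulator` stub, which is not part of this hunk):

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

acc = sc.accumulator(0)  # an Accumulator over ints
acc.add(1)
acc += 10                # __iadd__ keeps the Accumulator type
# acc.add("one")         # a type checker can now flag this mismatch
print(acc.value)         # .value comes back as the element type
```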
python/pyspark/broadcast.pyi (new file):

```python
# (standard ASF license header)

import threading
from typing import Any, Generic, Optional, TypeVar

T = TypeVar("T")

class Broadcast(Generic[T]):
    def __init__(
        self,
        sc: Optional[Any] = ...,
        value: Optional[T] = ...,
        pickle_registry: Optional[Any] = ...,
        path: Optional[Any] = ...,
        sock_file: Optional[Any] = ...,
    ) -> None: ...
    def dump(self, value: Any, f: Any) -> None: ...
    def load_from_path(self, path: Any): ...
    def load(self, file: Any): ...
    @property
    def value(self) -> T: ...
    def unpersist(self, blocking: bool = ...) -> None: ...
    def destroy(self, blocking: bool = ...) -> None: ...
    def __reduce__(self): ...

class BroadcastPickleRegistry(threading.local):
    def __init__(self) -> None: ...
    def __iter__(self) -> None: ...
    def add(self, bcast: Any) -> None: ...
    def clear(self) -> None: ...
```
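Since `Broadcast` is generic in its value type, `.value` gives back the broadcast value's own type. A brief sketch (again assuming a live SparkContext, and that `SparkContext.broadcast` is stubbed as `(self, value: T) -> Broadcast[T]`):

```python
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

lookup = sc.broadcast({"a": 1, "b": 2})  # a Broadcast of the dict's type
n: int = lookup.value["a"]               # .value preserves that type
lookup.unpersist()
```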
Review comment: I'm fine with ignoring these for now; maybe we can re-enable them later on.

Reply: In general I see your point; however, I wanted to avoid duplicating your work from #29180, and we cannot set up the environment anyway.