-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dropnull to FirstK #556
Conversation
Your free trial has expired. To keep using Ellipsis, sign up at https://app.ellipsis.dev for $20/seat/month or reach us at help@ellipsis.dev |
limit=10, | ||
dedup=False, | ||
window=Continuous("1d"), | ||
drop_nulls=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please make this a single word dropnull
so that it reads similar to existing dropnull
operator?
This is WIP. Unit tests pass, but integration test fails right now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you land the first pr, rebase and then send this for review ?
Aggregation to computes a rolling list of the earliest values for each group | ||
within a window. | ||
|
||
#### Parameters | ||
<Expandable title="of" type="str"> | ||
Name of the field in the input dataset over which the aggregation should be | ||
computed. | ||
</Expandable> | ||
|
||
<Expandable title="window" type="Window"> | ||
The continuous window within which aggregation needs to be computed. Possible | ||
values are `"forever"` or any [time duration](/api-reference/data-types/duration). | ||
</Expandable> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will all go once you rebase right?
4a84f49
to
8d55c0e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add to changelog and bump pyproject.toml
fennel/client_tests/test_dataset.py
Outdated
@mock | ||
def test_firstk_null(client): | ||
@dataset | ||
@source(webhook.endpoint("A"), disorder="14d", cdc="append") | ||
class A: | ||
id: int | ||
value: Optional[int] | ||
ts: datetime | ||
|
||
@dataset(index=True) | ||
class B: | ||
id: int = field(key=True) | ||
value: List[Optional[int]] | ||
ts: datetime | ||
|
||
@pipeline | ||
@inputs(A) | ||
def pipeline(cls, event: Dataset): | ||
return event.groupby("id").aggregate( | ||
value=FirstK( | ||
of="value", | ||
window=Continuous("forever"), | ||
dedup=False, | ||
limit=2, | ||
dropnull=False | ||
) | ||
) | ||
|
||
client.commit(datasets=[A, B], message="test") | ||
|
||
now = datetime.now(timezone.utc) | ||
yesterday = now - timedelta(days=1) | ||
day_before_yesterday = now - timedelta(days=2) | ||
df = pd.DataFrame( | ||
{ | ||
"id": [1, 1, 2, 2, 3, 3], | ||
"value": [None, 1, None, 2, None, 3], | ||
"ts": [yesterday, now, now, day_before_yesterday, now, yesterday], | ||
} | ||
) | ||
df["value"] = df["value"].astype("Int64") | ||
client.log("fennel_webhook", "A", df) | ||
client.sleep(30) | ||
|
||
results, _ = client.lookup( | ||
B, | ||
keys=pd.DataFrame({"id": [1, 2, 3]}), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could parameterize this test and combine the two into 1.
you can explore using - @pytest.mark.parametrize(
only the assert would change according to the param i believe ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to make parameterize work. First I ran into pytest-dev/pytest#6810. Then I ran into:
$ pytest fennel/client_tests/test_dataset.py -k test_firstk_dropnull
========================================================= test session starts =========================================================
platform darwin -- Python 3.11.5, pytest-7.1.3, pluggy-1.5.0
rootdir: /Users/satrana42/Codes/client, configfile: pyproject.toml
plugins: rerunfailures-13.0, anyio-4.4.0, timeout-2.3.1, mock-3.14.0
collected 54 items / 52 deselected / 2 selected
fennel/client_tests/test_dataset.py EE [100%]
=============================================================== ERRORS ================================================================
________________________________ ERROR at setup of test_firstk_dropnull[False-Optional-values_firstk0] ________________________________
file /Users/satrana42/Codes/client/fennel/client_tests/test_dataset.py, line 4518
@pytest.mark.integration
@pytest.mark.parametrize(
"dropnull,value_type,values_firstk",
[
(False, Optional[int], [[pd.NA, 1], [2, pd.NA], [3, pd.NA]]),
(True, int, [[1], [2], [3]])
]
)
@mock
def test_firstk_dropnull(client, dropnull, value_type, values_firstk):
E fixture 'client' not found
> available fixtures: anyio_backend, anyio_backend_name, anyio_backend_options, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, class_mocker, doctest_namespace, mocker, module_mocker, monkeypatch, package_mocker, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, session_mocker, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/Users/satrana42/Codes/client/fennel/client_tests/test_dataset.py:4518
___________________________________ ERROR at setup of test_firstk_dropnull[True-int-values_firstk1] ___________________________________
file /Users/satrana42/Codes/client/fennel/client_tests/test_dataset.py, line 4518
@pytest.mark.integration
@pytest.mark.parametrize(
"dropnull,value_type,values_firstk",
[
(False, Optional[int], [[pd.NA, 1], [2, pd.NA], [3, pd.NA]]),
(True, int, [[1], [2], [3]])
]
)
@mock
def test_firstk_dropnull(client, dropnull, value_type, values_firstk):
E fixture 'client' not found
> available fixtures: anyio_backend, anyio_backend_name, anyio_backend_options, cache, capfd, capfdbinary, caplog, capsys, capsysbinary, class_mocker, doctest_namespace, mocker, module_mocker, monkeypatch, package_mocker, pytestconfig, record_property, record_testsuite_property, record_xml_attribute, recwarn, session_mocker, tmp_path, tmp_path_factory, tmpdir, tmpdir_factory
> use 'pytest --fixtures [testpath]' for help on them.
/Users/satrana42/Codes/client/fennel/client_tests/test_dataset.py:4518
======================================================= short test summary info =======================================================
ERROR fennel/client_tests/test_dataset.py::test_firstk_dropnull[False-Optional-values_firstk0]
ERROR fennel/client_tests/test_dataset.py::test_firstk_dropnull[True-int-values_firstk1]
================================================== 52 deselected, 2 errors in 9.22s ===================================================
Do you have an example where it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed on call: Don't use pytest's parametrize but pass the parameters in a loop or function with the deduplicated code of the two tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ran into further issues with loop based parametrization. Reuse of client is the likely root cause because separate tests passed. I am skipping parametrization of these two tests.
93c3349
to
b375f50
Compare
Add dropnull to FirstK aggregation to filter out None values from Optional input.