[SCHEMATIC-183] Update tests - Use magic mock and add parentId #1554

thomasyu888 · 2024-11-23T02:13:10Z

I noticed the same thing Gianna observed: the SynapseStorage class automatically fires off a query on instantiation, which isn't great, but fixing that is outside the scope of this ticket. It is most likely due to the perform_query=True parameter passed when the class is instantiated.
You don't have to nest the patch.object calls, but the alternative syntax is a bit awkward (pick your poison).
Tests are somewhat scattered, but whenever you modify code, try to run the test suites locally for the parts you changed (and perhaps the integration tests your changes affect). The tests on GitHub take 6 hours to run, so relying on CI alone slows development. (It's completely reasonable to miss some; that's what the CI is for.)
query_fileview is the function called to get the fileview, so that is what you want to mock.
Nit: Use snake_case for everything.
Adding the HACK adds more lines to the output, which is why Gianna also increased the row count in the text output: she also added a fileview query. The positioning of the row output is actually a bit unstable, so I updated some of the tests to not always check the position of the output. It's more important that the output is there.
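
The point about unstable row positioning can be illustrated with a small sketch. The helper name and the output text below are made up for illustration and are not taken from schematic's test suite; the idea is only to assert that each expected line appears somewhere in the captured output rather than at a fixed index.

```python
def assert_lines_present(output: str, expected_lines: list) -> None:
    """Check that every expected line occurs in the output, at any position."""
    actual = {line.strip() for line in output.splitlines()}
    missing = [line for line in expected_lines if line.strip() not in actual]
    assert not missing, f"missing from output: {missing}"


# Hypothetical CLI output: an extra fileview query adds a line and shifts the rest.
captured = "\n".join(
    [
        "querying fileview",
        "querying fileview",
        "manifest submitted",
    ]
)

# Passes no matter how many query lines come before the line we care about.
assert_lines_present(captured, ["manifest submitted"])
```

A check like this trades strictness for stability: it will not notice reordered output, which is acceptable when only the presence of the line matters.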

Notes about the hack. This is what happens before the HACK:

The SynapseStorage class is instantiated; when this happens, it queries the latest state of the fileview.
New files or folders are added within the project, which is contained within the scope of the fileview.
A lookup for the new dataset throws a "Dataset not found" error because the fileview isn't re-queried.
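
The sequence above can be reproduced with a toy model. Everything here is a stand-in: `FakeStorage`, its `project` set, and the IDs are hypothetical, and the caching behaviour is an assumption made for illustration. Only the names `query_fileview`, `force_requery`, and `perform_query` come from this PR.

```python
class FakeStorage:
    """Toy stand-in for SynapseStorage; not the real class."""

    def __init__(self, project: set, perform_query: bool = True):
        self.project = project      # stands in for the live Synapse project
        self.fileview = None        # cached snapshot of the fileview
        if perform_query:
            self.query_fileview()   # step 1: snapshot taken at instantiation

    def query_fileview(self, force_requery: bool = False) -> None:
        # Refresh the snapshot only if nothing is cached or a requery is forced.
        if self.fileview is None or force_requery:
            self.fileview = set(self.project)

    def get_dataset(self, dataset_id: str) -> str:
        if dataset_id not in self.fileview:
            raise LookupError(f"Dataset {dataset_id} could not be found in fileview")
        return dataset_id


project = {"syn1"}
store = FakeStorage(project)
project.add("syn2")                       # step 2: new folder created afterwards

try:
    store.get_dataset("syn2")             # step 3: stale snapshot, lookup fails
except LookupError as err:
    print(err)

store.query_fileview(force_requery=True)  # the HACK: refresh the snapshot
print(store.get_dataset("syn2"))
```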

thomasyu888 · 2024-11-23T18:27:09Z

schematic/store/synapse.py

+        # HACK: must requery the fileview to get new files, since SynapseStorage will query the last state
+        # of the fileview which may not contain any new folders in the fileview.
+        # This is a workaround to fileviews not always containing the latest information
+        self.query_fileview(force_requery=True)


Post in the team channel to discuss this hack; there's also a message in the #synapse channel for us to chime in on.

I wonder if there is something that we can do here with this API:

https://rest-docs.synapse.org/rest/POST/entity/id/table/query/async/start.html

Specifically:
"The last updated on date of the table (lastUpdatedOn) = 0x80"

If this gets updated as data is indexed into the table, we could fetch the lastUpdatedOn field before we query the table to know whether we need to re-query it.
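
A minimal sketch of that idea, under the unverified assumption that lastUpdatedOn really does change whenever new data is indexed. The class name and both callables are hypothetical placeholders: `fetch_last_updated_on` stands for a cheap request that returns only lastUpdatedOn, and `run_query` stands for the full fileview query. Neither is a real synapseclient or schematic function.

```python
from typing import Callable, Optional


class FreshnessCheckedView:
    """Re-run the expensive query only when lastUpdatedOn has changed."""

    def __init__(
        self,
        fetch_last_updated_on: Callable[[], str],  # hypothetical cheap metadata call
        run_query: Callable[[], list],             # hypothetical full fileview query
    ):
        self._fetch_last_updated_on = fetch_last_updated_on
        self._run_query = run_query
        self._seen: Optional[str] = None   # lastUpdatedOn value at the last full query
        self._rows: Optional[list] = None  # cached query result

    def rows(self) -> list:
        current = self._fetch_last_updated_on()
        # Query on first use, or whenever the table reports a newer update time.
        if self._rows is None or current != self._seen:
            self._rows = self._run_query()
            self._seen = current
        return self._rows
```

Every call to `rows()` still costs one metadata request, so this only pays off if that request is much cheaper than the full query.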

Another thing to note: if we look at the failing test that was the motivation behind this hack, it's because it falls outside of the overall schematic flow.

So for example: when people upload a bunch of files, they usually first run manifest generate. Manifest generate will hit a "this dataset doesn't exist" situation, but afterwards the fileview should, in theory, always contain the data. There have been incidents when it doesn't, but those are rare in the grand scheme of things.

So we could remove the hack and trigger the fileview indexing within the test, but that's probably best as a team decision. (It also complicates things a bit, since it requires re-querying within the synapse storage context that is passed along to each testing function.)

@GiaJordan After a team discussion, we decided that it would be better to modify the test to re-query the fileview since this is such an edge case. It does throw an error already, and upon multiple runs of this code outside of the test, it would work. It only fails in the test because resources are created and destroyed.

FAILED tests/integration/test_metadata_model.py::TestMetadataModel::test_submit_filebased_manifest_file_and_entities_valid_manifest_submitted - LookupError: Dataset syn64313762 could not be found in fileview syn23643253.
FAILED tests/integration/test_metadata_model.py::TestMetadataModel::test_submit_filebased_manifest_file_and_entities_mock_filename - LookupError: Dataset syn64313765 could not be found in fileview syn23643253.

I commented out the HACK for now, the CI will run, and I moved the HACK into the test: b8360cb

@thomasyu888 thanks for the information and the update!

thomasyu888 · 2024-11-23T18:27:48Z

schematic/store/synapse.py

@@ -705,7 +705,10 @@ def getFilesInStorageDataset(
            ValueError: Dataset ID not found.
        """
        file_list = []


Work with Bryan to see the difference in speeds between dev branch and prod branch within signoz.

#1552 is necessary to filter out differences between branches in GH runs. It's possible to get the data now; it's just a bit more difficult to filter. As of now, the average duration of this function plotted over time:

Feel free to pull develop into this feature branch and we can compare the develop branch to this feature branch performance.

Thanks @BryanFauble. Done

@thomasyu888 These are the results over the past 5 days. We can see your branch has a much better average execution time for this function, although it is similar to bwmac/SCHEMATIC-163/error-message-update for some reason.

We can select on a few fields to filter for what we want, average the duration for the function, then group by a few fields to get these results.
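
The filter, average, and group-by steps can be sketched outside of SigNoz as well. The records below are invented placeholders, not measurements from this PR, and the field names (`branch`, `function`, `duration_ms`) are assumptions about what a trace export might contain.

```python
from collections import defaultdict
from statistics import mean

# Placeholder records for illustration only; not real timings.
spans = [
    {"branch": "branch-a", "function": "target_function", "duration_ms": 100.0},
    {"branch": "branch-a", "function": "target_function", "duration_ms": 300.0},
    {"branch": "branch-b", "function": "target_function", "duration_ms": 50.0},
    {"branch": "branch-b", "function": "other_function", "duration_ms": 900.0},
]


def average_duration_by(records, function_name, group_key):
    """Filter to one function, group by a field, and average the durations."""
    grouped = defaultdict(list)
    for record in records:
        if record["function"] == function_name:
            grouped[record[group_key]].append(record["duration_ms"])
    return {key: mean(values) for key, values in grouped.items()}


print(average_duration_by(spans, "target_function", "branch"))
```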

Some other tests:

thomasyu888 · 2024-11-23T20:27:39Z

tests/integration/test_store_synapse.py

@@ -400,38 +400,31 @@ def test_mock_get_files_in_storage_dataset(
        with patch(
            "schematic.store.synapse.CONFIG", return_value=TEST_CONFIG
        ) as mock_config:
-            with patch.object(synapse_store, "syn") as mock_synapse_client:


This is me misleading Gianna, but there are times when we want a connection with Synapse and there are times we don't.

This is probably a case where it is OK for there to be a direct connection with Synapse to simplify the mocking, since you've included the mocked version in your other test/test_store.py.
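
For the tests where a connection is not wanted, nested patches like the ones in the diff above can be flattened into a single `with` statement. The sketch below uses a hypothetical `Store` class and a hypothetical `get_children` method as stand-ins so that it runs on its own; only `query_fileview` and the `syn` attribute name come from this PR.

```python
from unittest.mock import patch


class Store:
    """Hypothetical stand-in for the object under test."""

    def __init__(self):
        self.syn = object()  # would be a Synapse client in the real code

    def query_fileview(self):
        raise RuntimeError("would hit the network")

    def list_files(self):
        self.query_fileview()
        return self.syn.get_children()  # get_children is a made-up method name


store = Store()

# Several patches can share one `with` statement instead of being nested.
with patch.object(store, "query_fileview") as mock_query, \
        patch.object(store, "syn") as mock_client:
    mock_client.get_children.return_value = ["file_a"]
    assert store.list_files() == ["file_a"]
    mock_query.assert_called_once()
```

`patch.object` replaces the attribute with a `MagicMock` by default and restores the original when the block exits.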

thomasyu888 · 2024-11-23T20:32:27Z

schematic/store/synapse.py

@@ -19,12 +19,10 @@
 import numpy as np
 import pandas as pd
 import synapseclient
-import synapseutils


If you're using VS Code, you may see unused imports grayed out. It's OK to remove them in PRs; the intent is that each PR we contribute should leave the codebase in a better place.

Pylint would also catch this, but this module hasn't yet been gone through and fixed so that it passes, so it isn't included in the pylint check.

BryanFauble

I agree with the changes you've made on this so far, although, as you mention, they include some hacks.


sonarqubecloud · 2024-12-03T01:38:00Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

linglp

Looks good to me. Thanks Tom!

thomasyu888 · 2024-12-03T15:13:06Z

@GiaJordan feel free to merge this after you review it since I may be on a flight. Thanks everyone for their reviews!

Let's make any additional changes on your branch

GiaJordan

Thanks for the changes @thomasyu888 looks good to me!

thomasyu888 added 3 commits November 22, 2024 18:10

Use magic mock and add parentId

6ea80d8

Clean up mocking

98552d3

Remove comments

040cc01

thomasyu888 marked this pull request as ready for review November 23, 2024 02:17

thomasyu888 requested a review from a team as a code owner November 23, 2024 02:17

thomasyu888 changed the title ~~Use magic mock and add parentId~~ [SCHEMATIC-183] Use magic mock and add parentId Nov 23, 2024

thomasyu888 mentioned this pull request Nov 23, 2024

[SCHEMATIC-183] Use paths from file view for manifest generation #1529

Merged

thomasyu888 added 2 commits November 22, 2024 18:23

Remove unused code

8f454ff

Edit mock patch

6a0383c

thomasyu888 changed the title ~~[SCHEMATIC-183] Use magic mock and add parentId~~ [SCHEMATIC-183] Update tests - Use magic mock and add parentId Nov 23, 2024

Add HACK

498af2b

thomasyu888 commented Nov 23, 2024


some tests should connect with synapse

98d345b

thomasyu888 commented Nov 23, 2024


Remove unused imports

c4676cb

thomasyu888 commented Nov 23, 2024


Update tests to be more robust around cli output checking

8dd2d09

BryanFauble approved these changes Nov 25, 2024


thomasyu888 added 2 commits November 25, 2024 12:14

Merge branch 'fds-2293-file-paths-for-manifest-gen' into fds-2293-fix…

0ee2fd2

…-tests

Merge branch 'fds-2293-file-paths-for-manifest-gen' into fds-2293-fix…

0d24776

…-tests

andrewelamb approved these changes Dec 2, 2024


thomasyu888 added 2 commits December 2, 2024 13:21

Remove the hack

a4a5dba

Force a query with a HACK within the tests

b8360cb

linglp approved these changes Dec 3, 2024


GiaJordan approved these changes Dec 3, 2024


GiaJordan merged commit 3261dcc into fds-2293-file-paths-for-manifest-gen Dec 3, 2024
8 checks passed

GiaJordan deleted the fds-2293-fix-tests branch December 3, 2024 17:20
