New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Simpler stats timing #1

Merged

TeachMeTW merged 2 commits into TeachMeTW:feature/stats_timing from JGreenlee:simpler_stats_timing

Oct 17, 2024

JGreenlee commented Oct 17, 2024

No description provided.

JGreenlee added 2 commits

October 16, 2024 21:22


          simplify tests for store_dashboard_time / store_dashboard_error

a24b03b


          move TestFunctionTiming -> TestStatsQueries

39f763d

JGreenlee commented

View reviewed changes

Author

JGreenlee left a comment

These are the changes I would recommend to make this test file more concise, more comprehensible, and more functional.
I documented my thought process in comments – I hope you will find them useful in your learning

emission/storage/decorations/stats_queries.py

                       metadata_key="stats/dashboard_time",
                       name=code_fragment_name,
                       ts=timestamp,
-                      reading=elapsed_ms
+                      reading=timer.elapsed_ms

Author

JGreenlee Oct 17, 2024

If you look at the source of emission.core.timer, you'll see that it actually has elapsed_ms as a property. No need to do the conversion yourself

emission/tests/funcTests/TestFunctionTiming.py Outdated

    
            @@ -1,32 +1,14 @@
          
              # emission/tests/funcTests/TestFunctionTiming.py

              # rename/move this to emission/tests/storageTests/TestStatsQueries.py

Author

JGreenlee Oct 17, 2024

The whole point of this file is to test store_dashboard_time and store_dashboard_error, right?
And those functions come from emission.storage.decorations.stats_queries.py.

So, this test file should be called TestStatsQueries and it should go under emission.tests.storageTests. Look at the other test files under emission.tests and you will see they follow the same convention.

emission/tests/funcTests/TestFunctionTiming.py Outdated

Comment on lines 8 to 11

+              import emission.core.timer as ect
+              import emission.storage.decorations.stats_queries as esdsq
+              import emission.storage.timeseries.abstract_timeseries as esta
+              import emission.tests.common as etc

Author

JGreenlee Oct 17, 2024

I adjusted these to the convention we use for imports within emission. We abbreviate from the full module path (including e for emission).

ect instead of ec_timer
esdsq instead of sdq

Author

JGreenlee Oct 17, 2024

I recommend listing the emission.* imports alphabetically for tidiness

emission/tests/funcTests/TestFunctionTiming.py Outdated

                   def test_single_function_timing(self):
                       """
                       Test execution and timing of a single function.
                       """
                       def sample_function():
-                          logging.info("Executing sample_function")
+                          logging.debug("Executing sample_function")

Author

JGreenlee Oct 17, 2024

Use logging.info sparingly; most of your less-important log statements should be logging.debug

emission/tests/funcTests/TestFunctionTiming.py Outdated

Comment on lines 39 to 42

+                      logging.debug(f"Ensuring {len(expected_entries)} entries exist in DB.")
+                      key_list = [key for (key, _, _) in expected_entries]
+                      stored_entrys = list(self.timeseries_db.find_entries(key_list))
+                      self.assertEqual(len(stored_entrys), len(expected_entries))

Author

JGreenlee Oct 17, 2024

Retrieving all entries with the relevant keys can be simplified to this

emission/tests/funcTests/TestFunctionTiming.py Outdated

Comment on lines 80 to 87

+                      with ect.Timer() as timer_both:
+                          with ect.Timer() as timer_one:
+                              function_one()
+                          esdsq.store_dashboard_time("function_one", timer_one)
-                          logging.info("Step 2: Data processing")
-                          time.sleep(1)  # Simulate processing time
-                          processed_data = {k: str(v).upper() for k, v in data.items()}
-                          logging.info(f"Processed data: {processed_data}")
+                          with ect.Timer() as timer_two:
+                              function_two()
+                          esdsq.store_dashboard_time("function_two", timer_two)

Author

JGreenlee Oct 17, 2024

You might have nested timers. This will be necessary if you want to time the "stages" of some larger operation, as well as the total time of the operation.

emission/tests/funcTests/TestFunctionTiming.py Outdated

Comment on lines 91 to 95

+                      # function_one should've taken ~1000ms; function_two should've taken ~1500ms;
+                      # both functions together should've taken ~2500ms
+                      self.assertAlmostEqual(timer_one.elapsed_ms, 1000, delta=100)
+                      self.assertAlmostEqual(timer_two.elapsed_ms, 1500, delta=100)
+                      self.assertAlmostEqual(timer_both.elapsed_ms, 2500, delta=100)

Author

JGreenlee Oct 17, 2024

Below, we will call verify_stats_entries, which will make sure that the recorded entries line up with what the timers recorded.

But I also wanted to verify that what the timers recorded is actually reasonable. function_one sleeps for 1s so it should finish in a little over 1000ms. I allowed 100ms tolerance by using assertAlmostEqual with delta=100. This was convenient because it saves us from writing out our own if / else blocks to handle tolerance

Note that I didn't specifically know that delta existed. I also couldn't remember the name for assertAlmostEqual. But I googled "unittest close to" which led me to documentation for assertAlmostEqual and I read about the delta parameter there.
My point here is that you don't need to know everything, but if you learn where to find things you will save time and write cleaner code

Owner

TeachMeTW Oct 17, 2024

I thought we didnt need tolerance?

emission/tests/funcTests/TestFunctionTiming.py Outdated

Comment on lines 114 to 120

+                          with ect.Timer() as timer:
+                              faulty_function()
+                      except ValueError as e:
+                          logging.info(f"Caught expected error: {e}")
+                          esdsq.store_dashboard_error('faulty_function', timer)
+                          pass

Author

JGreenlee Oct 17, 2024 •

edited

Loading

pass prevents the test from failing

emission/tests/funcTests/TestFunctionTiming.py Outdated

Comment on lines 128 to 130

		etc.configLogging()
		unittest.main()

Author

JGreenlee Oct 17, 2024

You had this before and I had mistakenly told you to remove it. I went back and looked at other files in emission.tests and saw that they do have it, and they also call etc.configLogging() which sets a logging format and the level to DEBUG. So you won't need to configure the logger in setUpClass

emission/tests/funcTests/TestFunctionTiming.py Outdated

Comment on lines 32 to 38

+                  def verify_stats_entries(self, expected_entries: list[tuple[str, str, float]]):
                       """
-                      Verifies that an entry for the given name exists under the specified key and matches the expected elapsed time.
+                      Verifies that each of these expected entries, in the form of (key, name, elapsed_ms),
+                      are stored in the database.
-                      Parameters:
-                      - key (str): The database key to query ("stats/dashboard_time" or "stats/dashboard_error").
-                      - name (str): The name of the function or code block.
-                      - expected_elapsed_ms (float): The expected elapsed time in milliseconds.
+                      :param expected_entries: A list of tuples which have (key, name, expected_elapsed_ms).
                       """

Author

JGreenlee Oct 17, 2024

verify_entry was good because it reduced the repeated lines of code that you had in a previous version.
However, this approach prevents you from checking for the existence of multiple entries, which is something Shankari asked for.

You need to be able to check for multiple expected entries, so I changed this to accept a list and renamed it to verify_stats_entries
After it queries the timeseries for relevant keys, it iterates and compares stored vs. expected values at each index

TeachMeTW marked this pull request as ready for review

October 17, 2024 15:56

TeachMeTW merged commit 1bdc922 into TeachMeTW:feature/stats_timing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet