
Caching: factorizing / toughening #270

Merged
7 commits merged on Oct 25, 2017

Conversation

valschneider
Contributor

I've been struggling to reproduce the bugs generated by caching (I did encounter them in the past, so I'm not doubting their existence); as such, I mostly went through the code and improved what I thought needed improving.

I'm not re-enabling caching in this PR, but I will try to convince some people to re-enable it on their side and do some bug-finding for me.

Contributor

@bjackman bjackman left a comment

Thanks a lot for this, I've just reviewed it hastily and I'll start testing it. Once I've been using it for a while I'll do a more careful review and then we can go ahead.

I hope you don't mind my being super conservative here, but broken trace parsing is really scary, so hopefully it's justified.

trappy/ftrace.py Outdated
# Check if cache is valid
if cache_info["md5sum"] != trace_md5sum:
warnstr = "Cached data is from another trace, invalidating cache."
warnings.warn(warnstr)
Contributor

I don't think we need these warnings - I think I supported them when we first did the caching thing, but actually they're quite irritating.

Contributor Author

I agree that on a "finalized" version it should go, but since I'm still sort of hunting for cache issues I find it quite helpful.
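
(For context on the hunk at the top of this thread, here is a minimal sketch of the md5-based validity check being discussed; the helper name and file layout are illustrative, not the PR's exact code.)

import hashlib
import json

def cache_is_valid(trace_path, cache_info_path):
    # Hash the raw trace file and compare it to the md5sum recorded when
    # the cache was written; a mismatch means the cache belongs to a
    # different trace and must be invalidated.
    with open(trace_path, 'rb') as f:
        trace_md5sum = hashlib.md5(f.read()).hexdigest()
    with open(cache_info_path) as f:
        cache_info = json.load(f)
    return cache_info["md5sum"] == trace_md5sum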

trappy/ftrace.py Outdated

# Check if cache can be used with given parameters
# Convert to a json string for comparison
if json.dumps(cache_info["params"]) != json.dumps(params):
Contributor

Lazy question: is 'basetime' redundant in the cache_info, i.e. if it changes won't 'params' also be different?

Contributor Author

It probably is. It was already saved in the previous implementation - it's mostly to quickly restore it, as it's obtained when parsing the trace.txt file.

trappy/ftrace.py Outdated
from subprocess import check_output

cmd = ["rm", self.trace_path]
check_output(cmd)
Contributor

@bjackman bjackman Oct 12, 2017

Instead of overwriting and deleting trace.txt (which someone else might have created), can't we just use a NamedTemporaryFile or similar? i.e. totally ignore every file other than the cache and trace.dat, if we need to parse the trace then write the textual version to a tempfile, set up the cache, and then throw away the temp file.

Contributor Author

I suppose we could. I introduced the txt_deletable flag to detect whether or not the trace.txt has been created by a user, but maybe there are cases that are not covered?
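
(A rough sketch of the tempfile approach being discussed, assuming the parser keeps using trace-cmd report; the function name and structure are illustrative, not the PR's code.)

import os
import subprocess
import tempfile

def parse_dat_via_tempfile(trace_dat_path, parse_fn):
    # Write the textual trace to a temp file instead of ./trace.txt,
    # so nothing a user may have created is ever touched.
    with tempfile.NamedTemporaryFile(mode='w', suffix='.txt',
                                     delete=False) as tmp:
        # "trace-cmd report" prints the textual trace on stdout
        subprocess.check_call(["trace-cmd", "report", trace_dat_path],
                              stdout=tmp)
        txt_path = tmp.name
    try:
        # Parse / populate the cache from the intermediate text file
        return parse_fn(txt_path)
    finally:
        # Throw the intermediate file away once parsing is done
        os.remove(txt_path)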

trappy/ftrace.py Outdated
shutil.rmtree(cache_path)
return False

self.cache_window_differs = True
Contributor

Since we have had correctness issues with this stuff in the past I would not really be comfortable with all this window checking unless there is really good test coverage.. if you have time to add that, then let's go ahead, but otherwise I am more than happy to live with overly-aggressive cache invalidation in return for a little bit more confidence about correctness. After all I don't think the window param even gets used very often..

Contributor Author

@valschneider valschneider Oct 12, 2017

I don't use the window parameters that much myself either - I manually trim the dataframes if I have to. Thing is, if someone does use them, this makes quite a big difference in terms of parsing time.

Also, if windows aren't that useful, maybe we could consider getting rid of them (or simplifying them to a single window parameter instead of two). As you can see in the code, handling these windows can get very confusing very fast.

self.assertGreater(
trace2.sched_wakeup.data_frame.index[-1],
trace1.sched_wakeup.data_frame.index[-1]
)
Contributor

@bjackman bjackman Oct 12, 2017

Ah, I didn't spot this, it should probably go in the same commit as the functionality it tests. This coverage definitely helps, but as I said I'd rather be really conservative here and only go for the window checking if we have tests for multiple corners of the input space.

@valschneider
Contributor Author

  • Removed noise
  • Squashed window test commit into window-aware commit

@valschneider
Contributor Author

I'm looking into using a tempfile instead of a trace.txt whenever possible. Sadly, I realised my "Delete trace.txt if caching is enabled" commit isn't actually working: even if the txt file is correctly deleted after the cache is created, it gets re-generated on the next run even though the cache exists. I can fix this but it requires a fair amount of rework, stay tuned.

@valschneider
Contributor Author

GenericFTrace has gone through a facelift, and I've broken systrace (I haven't tested it, but I'm sure I did). However, it looks like there are no systrace tests, so I'll add some next week.

This "v2" doesn't have any window check as @bjackman suggested, but it has:

  • Shiny new factorized code for GenericFtrace
  • trace-cmd report outputted to a tempfile that is deleted once parsing is complete

@bjackman
Contributor

Cool, thanks, I'll start testing this first and then review later.

@valschneider
Contributor Author

Surprisingly, I hadn't broken systrace. I did make it parse the file twice, though; this is fixed now. I also realized that removing the file extension from the cache dir name wasn't such a smart idea: there could be a trace.dat and a trace.html living in the same directory, and they should each get their own cache.

tl;dr:

  • Fixed Systrace and added Systrace tests
  • Reverted cache directory format from ".trace.cache/" to ".trace.{txt/dat/html}.cache/"

@valschneider
Contributor Author

valschneider commented Oct 17, 2017

@bjackman: I had a chat with @derkling, we could have a fairly simple handling of the windows. If caching is enabled, we would disregard windows and parse the whole trace to cache it, and then return the requested window. As such, whatever window may be requested later on, no more trace parsing would be required.
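
(At the dataframe level, that boils down to something like the sketch below: parse and cache everything, then slice on the time index when a window is requested. Illustrative only, not the PR's code.)

def apply_window(df, window):
    # window is (start, end); end=None means "up to the end of the trace".
    # Cached dataframes are stored unwindowed, so the requested window is
    # simply sliced off the time index after loading from the cache.
    start, end = window
    return df.loc[start:] if end is None else df.loc[start:end]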

@bjackman
Contributor

OK, makes sense

@valschneider
Contributor Author

Pushed that cached-window-handling thing, but for some reason the PR page won't update.

@valschneider
Contributor Author

Okay looks like it eventually made its way there.

Contributor

@bjackman bjackman left a comment

Thanks a lot for this effort, it's making me feel a lot more comfortable, I think this will end up being really worthwhile.

Also, I say add a commit at the end enabling caching by default. I haven't seen any issues while I've been testing.

trappy/ftrace.py Outdated
@@ -253,7 +255,8 @@ def _do_parse(self):
self._parsing_teardown()

def _parsing_setup(self):
pass
# Parsing can require an intermediate file
self.file_to_parse = self.trace_path
Contributor

So the default is that file_to_parse is trace_path, but a subclass may override it by creating an intermediate file and setting file_to_parse to the path to that, right? Makes sense but could you extend the comment a bit?

"By default we parse the file at trace_path directly, but subclasses can override this and set file_to_parse to an intermediate file" or something

trappy/ftrace.py Outdated
trace_name = os.path.basename(self.trace_path)
trace_dir = os.path.dirname(os.path.abspath(self.trace_path))
cache_dir = '.' + trace_name + '.cache'
cache_path = os.path.join(trace_dir, cache_dir)
Contributor

I might be missing something.. what does this hunk do?

Contributor Author

I think this is a piece of orphaned code that ended up here after a few rewrites. I'll clean that up.

trappy/ftrace.py Outdated
elif os.path.isfile(trace_txt):
# Warn users if txt file is all we have
warnstr = "Reading from .txt file, .dat is preferred " +\
"as it takes less disk space"
Contributor

Not just disk space - we also don't know the format of trace.txt due to some events needing raw formatting and not others - it will only work if generated with a specific trace-cmd command. So I think this warning needs to be a bit shoutier.

trappy/ftrace.py Outdated
if self.read_from_dat:
from subprocess import check_output
cmd = ["rm", self.file_to_parse]
check_output(cmd)
Contributor

os.remove
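
(i.e. something along these lines; a sketch assuming this lives in the parsing teardown path, with read_from_dat and file_to_parse as set up earlier in the PR.)

import os

def _parsing_teardown(self):
    # Remove the intermediate text file via the stdlib instead of
    # shelling out to `rm`.
    if self.read_from_dat:
        os.remove(self.file_to_parse)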

trappy/ftrace.py Outdated
@@ -680,24 +683,22 @@ def __init__(self, path=".", name="", normalize_time=True, scope="all",
def _parsing_setup(self):
super(FTrace, self)._parsing_setup()

self.__generate_trace_txt(self.trace_path)
if self.read_from_dat:
self.file_to_parse = self.__generate_trace_txt(self.trace_path)
Contributor

I'd say add a comment here to explain that if not self.read_from_dat, we're just leaving self.file_to_parse as the superclass set it.

trappy/ftrace.py Outdated
return trace_to_read

def __generate_trace_txt(self, trace_dat):
return self.__run_trace_cmd_report(trace_dat)
Contributor

Let's move the body of __run_trace_cmd_report in here, or remove this method and just call __run_trace_cmd_report directly.

trappy/ftrace.py Outdated
for trace_class in self.trace_classes:
try:
csv_file = self._get_csv_path(trace_class)
trace_class.read_csv(csv_file)
self._windowify_class(trace_class, self.max_window)
Contributor

Let's have a comment here explaining that the cache was written without the window applied, so we need to apply it after reading.

trappy/ftrace.py Outdated
(timestamp < self.abs_window[0]):
# Disregard windows if updating/creating cache
# This way the cache is ready for later window requests
if not self.__class__.disable_cache:
Contributor

I think this is unnecessary complexity: disabled caching is supposed to be a niche feature for testing and debugging, so we don't care about performance in that case. So I think when parsing a trace file we can always ignore the window, and then always apply the window after parsing is finished, using the new windowify methods. Does that work?

Contributor Author

That would work, but the incentive for doing this with a cache is that we're paying a cost (parsing time) that will later on be cushioned by the use of the cache. If caching is disabled, we should still mind the window so we stop parsing the trace once we're after the window. I understand disabled cache should become an exotic use-case but I felt like this was still somewhat justified.

Contributor

I see where you're coming from but I think we should make our task as easy as possible by only having one use case that we really optimise, i.e. cache_disabled=False

Contributor Author

@valschneider valschneider Oct 19, 2017

Okay, I'll remove the if check.

trappy/ftrace.py Outdated
# Disregard windows if updating/creating cache
# This way the cache is ready for later window requests
if not self.__class__.disable_cache:
self.max_window = self._max_window()
Contributor

IIUC this is set in two separate places depending on whether we're reading from the cache, but it's actually independent of that. Can we just do it in one place at the beginning? Or even make self.max_window a property?

Contributor Author

That's what I wanted to do; sadly, it needs to be set in two separate places. If we have the cache, we can read the basetime parameter from it and create the max window. If not, we must start parsing the file to finally get the basetime, and only then can we create that max window value.

Contributor

Ahhh I see. Can we have a comment like "now that we know the basetime we can derive max_window"?

trappy/ftrace.py Outdated
@@ -273,6 +319,14 @@ def __get_trace_class(self, line, cls_word):
return trace_class
return trace_class

def __before_window(self, timestamp):
return (timestamp < self.window[0] + self.basetime) or \
(timestamp < self.abs_window[0])
Contributor

Sorry, too lazy to figure this out myself: can this be simplified using the new max_window thing?

Contributor Author

So at first the code was:

if (timestamp < self.window[0] + self.basetime) or \
   (timestamp < self.abs_window[0]):
    self.lines += 1
    continue

if (self.window[1] and timestamp > self.window[1] + self.basetime) or \
   (self.abs_window[1] and timestamp > self.abs_window[1]):
    return

I then turned each if condition into a method (__before_window and __after_window) in the "Methodify window inclusion check" commit. These methods are used in the computation of max_window, but it still makes sense to me to keep them, as they are used in __populate_data for two different checks.

Contributor Author

Wait no, the __{before/after}_window methods are no longer used in the computation of max_window, that was in a previous version.

@valschneider
Contributor Author

@bjackman I think I addressed all of your comments.

I also made a slight change to _do_parse: I factorized the windowification and time normalization into an _apply_user_parameters method.

@valschneider
Contributor Author

valschneider commented Oct 19, 2017

Food for thought: I had a chat with @derkling; following the "disregard window" initiative, we could also parse all possible events and store them in the cache, so we would only need to cache a trace once, no matter the parameters. This might be a bit more extreme than simply disregarding windows, so it might be better to have it in another PR - but it's a thing to consider, at least.

Compute the maximum window of the 'window' & 'abs_window' intersection
"""
max_window = [0, None]
max_window[0] = max(self.window[0] + self.basetime, self.abs_window[0])
Contributor

Huh, so an event is dropped if it's earlier than either window[0]s or later than both window[1]s? That doesn't sound good. But OK, let's leave it for another day.

trappy/ftrace.py Outdated
# Disregard windows if updating/creating cache
# This way the cache is ready for later window requests
if not self.__class__.disable_cache:
self.max_window = self._max_window()
Contributor

Ahhh I see. Can we have a comment like "now that we know the basetime we can derive max_window"?

trappy/ftrace.py Outdated
# window requests
self.max_window = self._max_window()
self.window = (0, None)
self.abs_window = (0, None)
Contributor

Instead of setting a window that does nothing here, why don't we just delete the __before/after_window checks (and methods themselves) below?

AFAICT self.window and self.abs_window are no longer used except to calculate self.max_window.

Contributor Author

If we want to truly commit to having cache enabled by default, I suppose we could, yes.

trappy/ftrace.py Outdated

# file_to_parse is already set to trace_path in the superclass,
# so no "else" is needed here

self.__populate_metadata()
Contributor

I think it would be clearer to put self.__populate_metadata at the end of __init__ directly, putting it here gives a false impression that it needs to be done at this specific stage of parsing.

Contributor Author

SysTrace is also a subclass of GenericFtrace but does not have any __populate_metadata method, which is why I put it here.

Contributor

@bjackman bjackman Oct 19, 2017

But can't it just go at the end of FTrace.__init__?

Contributor Author

@valschneider valschneider Oct 19, 2017

Woops, good thing you told me to look at that. Turns out I hadn't updated __populate_metadata to use file_to_parse instead of trace_path, so it wasn't doing anything actually. I'll fix that and add a test.

Also, yes, you're right, it can be moved there. I thought you were talking about the superclass.

Contributor Author

Argh! Metadata isn't being saved in the cache! This means we were missing out on things like trace._cpus, which would explain some of the issues we had with the cache. Thanks for pointing me in the right direction, I've got some more work to do...

@bjackman bjackman closed this Oct 19, 2017
@bjackman bjackman reopened this Oct 19, 2017
@bjackman
Contributor

bjackman commented Oct 19, 2017

Sorry, that "Close and comment" button always looks so much like a "cancel posting this comment".

Food for thought: I had a chat with @derkling; following the "disregard window" initiative, we could also parse all possible events and store them in the cache, so we would only need to cache a trace once, no matter the parameters. This might be a bit more extreme than simply disregarding windows, so it might be better to have it in another PR - but it's a thing to consider, at least.

Basically: yeah, agree.

@valschneider
Contributor Author

valschneider commented Oct 19, 2017

Can't seem to reply directly to 2 of your comments, replying here:

Huh, so an event is dropped if it's earlier than either window[0]s or later than both window[1]s? That doesn't sound good. But OK, let's leave it for another day.

We never encounter this because we either use no window, or just one (never both at the same time). But yes, it's a bit weird.

Ahhh I see. Can we have a comment like "now that we know the basetime we can derive max_window"?

Will do.

@valschneider
Contributor Author

Removed the window checks altogether from _do_parse()

@valschneider
Contributor Author

valschneider commented Oct 20, 2017

Alright, we're getting there - thanks a lot @bjackman for the careful review.

I've (once again) re-arranged the commits and made some pedantic changes here and there, but it's roughly what it was before with the following changes:

  • Fiddling with trace metadata made me realise what I called 'cache_info' is really just metadata, so the file is now named 'metadata.json' and it makes more sense to me.
  • Extra data can be appended to this file by overriding the _get_extra_data_to_cache and _load_extra_data_from_cache methods (see the sketch after this comment). In the case of FTrace, this is used to store trace metadata (lines of data that appear in the trace but are not actual trace events).
  • Added cached/uncached FTrace metadata test

I think it would be clearer to put self.__populate_metadata at the end of __init__ directly, putting it here gives a false impression that it needs to be done at this specific stage of parsing.

Right now it has to be in there, otherwise metadata will not be cached. Might not be the most obvious implementation though.
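
(As referenced in the list above, a rough sketch of how FTrace could use these extra-data hooks; the method names follow the description at this point in the discussion and are reworked later in the PR. Illustrative only.)

class FTrace(GenericFTrace):

    def _get_extra_data_to_cache(self):
        # Whatever is returned here is stored under the "extra" key of
        # metadata.json when the cache is written.
        return {"_version": self._version, "_cpus": self._cpus}

    def _load_extra_data_from_cache(self, extra):
        # Called when reading from the cache, so fields normally filled
        # in while parsing the text trace are restored.
        self._version = extra["_version"]
        self._cpus = extra["_cpus"]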

@valschneider valschneider force-pushed the caching/bugfix branch 2 times, most recently from 8cfa918 to 9aba7e1 on October 20, 2017 at 11:21
@valschneider
Contributor Author

Fixed an annoying warning that said cache was invalid when it simply hadn't been created yet.

Contributor

@bjackman bjackman left a comment

Sweet, getting there.

trappy/ftrace.py Outdated

with open(trace_metadata_path, 'w') as f:
json.dump(trace_metadata, f)

def _get_csv_path(self, trace_class):
path = self._trace_cache_path()
return os.path.join(path, trace_class.__class__.__name__ + '.csv')
Contributor

I think _get_csv_path is supposed to be down with the rest of the cache bits.

trappy/ftrace.py Outdated
@@ -138,14 +139,41 @@ def unregister_parser(cls, cobject):
if cobject == obj:
del scope_classes[name]

def _max_window(self):
Contributor

Some pedantry just occurred to me: this should be called _calc_max_window

Contributor Author

Oh yes this is a terrible name, I'll change it.

def test_ftrace_metadata(self):
"""Test that caching keeps trace metadata"""
GenericFTrace.disable_cache = False
trace = trappy.FTrace()
Contributor

@bjackman bjackman Oct 23, 2017

I think technically you need to construct the trace twice to ensure there's a cache - tests are supposed to be self-contained but IIRC this is relying on other tests to create the cache? In fact you may find that TRAPpy's infra is causing this to not really test anything.

I'd actually go so far as to suggest just calling self.test_cache_created() at the beginning of this test.
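
(Roughly this shape, i.e. construct the trace twice so the second construction definitely comes from the cache; a sketch that assumes TRAPpy's existing test fixtures.)

def test_ftrace_metadata(self):
    """Test that caching keeps trace metadata"""
    GenericFTrace.disable_cache = False

    # First construction parses the trace and writes the cache...
    # (alternatively, self.test_cache_created() could be called here,
    # as suggested above)
    uncached = trappy.FTrace()
    # ...the second one must come back from the cache with the same metadata.
    cached = trappy.FTrace()

    self.assertEqual(cached._cpus, uncached._cpus)
    self.assertEqual(cached._version, uncached._version)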

trappy/ftrace.py Outdated

# Load metadata
self.basetime = metadata["basetime"]
self.max_window = self._max_window()
Contributor

Nit: IMO setting max_window doesn't belong here, if possible it should go after _load_cache() in _do_parse

Contributor Author

@valschneider valschneider Oct 23, 2017

If we're parsing (and not using the cache), it has to be set in __populate_data. As such, I would like to keep a "symmetry", in that it is not done in _do_parse but in each appropriate submethod. Otherwise it feels a bit confusing to me: in the scope of _do_parse, a newcomer would see max_window is only being set after the cache is loaded (though that is my own opinion).

Contributor

Ahh yep, that does make sense actually.

trappy/ftrace.py Outdated
open(self.trace_path, 'rb').read()
).hexdigest()
metadata["basetime"] = self.basetime
metadata["extra"] = self._get_extra_data_to_cache()
Contributor

Instead of having an "extra" key in metadata and _get_extra_data_to_cache and _load_extra_data_from_cache, have you considered just having _get_metadata_to_cache and _load_metadata_from_cache, then having the subclasses use super then extend the result?

So, GenericFTrace._get_metadata_to_cache just returns {"md5sum": <hash>, "basetime": self.basetime}, then FTrace overrides it like:

def _get_metadata_to_cache(self):
    ret = super(FTrace, self)._get_metadata_to_cache()
    ret['_version'] = self._version
    ret['_cpus'] = self._cpus
    return ret

Then GenericFTrace._load_metadata_from_cache (called before _is_cache_valid) just sets self.basetime, and FTrace overrides it like:

def _load_metadata_from_cache(self, metadata):
    super(FTrace, self)._load_metadata_from_cache(metadata)
    self._cpus = metadata['_cpus']
    self._version = metadata['_version']

It feels more in the noble spirit of OOP, rather than drawing a line in the inheritance tree after which everything is 'extra'. Do you think that's clearer?

Contributor

PS one benefit of writing it this way is you can grep for "self._cpus =" when debugging. That's orthogonal to the inheritance design though.

Contributor Author

I did think about that - it would have worked the same way as _parsing_setup & _parsing_teardown.

The thing is, file_to_parse is set in _parsing_setup but it is meant to be changed. So you can override it and not even call the super method, and it's fine.
You could break _get_metadata_to_cache by not calling the super method (though that'd just be stupid) OR by writing over pre-existing metadata values (unlikely, but I wanted to play safe). The idea of that "extra" key is to provide a "safe" dict where any key can be used without compromising the required base metadata.

Contributor Author

Arguably this "extra" dict could be introduced by whatever subclass is overriding this method... So if you're okay with it we could have the metadata handling be done in overridable methods as you suggested.

Contributor

As I considered your points I became more sure that super is better 😁

You could break _get_metadata_to_cache by not calling the super method (..) OR by writing over pre-existing metadata values (..). The idea of that "extra" key is to provide a "safe" dict where any key can be used without compromising the required base metadata.

I guess this is a question of how to do coupling between parent and child classes - should the child be aware of the parent (must call the super method and be aware of keys it shouldn't touch) or the parent be aware of the child (must define and call an "extra" method and write an "extra" key). super is a language feature and all child classes must be aware of their parents anyway, so I think the former is more harmonious.

On the other hand, the thing about the child having to know which keys it mustn't touch is a good point. I'm not sure how much it matters when the classes are this tightly coupled anyway, though (they live in the same file).

Anyway, this is probably getting a bit too philosophical so I'll leave the decision to you :)

Contributor Author

I think that, all things considered, overriding a base _get/load_metadata_from/to_cache is the least confusing way to go. I'll still use an "extra" key, but defined in the child.

@valschneider
Contributor Author

valschneider commented Oct 23, 2017

@valschneider valschneider force-pushed the caching/bugfix branch 2 times, most recently from d3655cd to 978fd87 on October 24, 2017 at 09:14
@valschneider
Contributor Author

I think I've addressed all of your comments:

  • Fixed the metadata caching test
  • Turned the caching metadata methods into proper OOP citizens

Valentin Schneider added 5 commits October 24, 2017 10:49
Cache information was spread out across several files: md5sum, params.json,
basetime. This commit makes it so a single file, metadata.json, is created
and used.
_do_parse() was defined in BaseFtrace, and was used in all of its
daughter classes' __init__ methods.

This commit flips this around:
_do_parse() is now called in BaseFtrace's __init__, and its daughter
classes can override the newly created _parsing_setup() and
_parsing_teardown() methods to tailor the parsing to their needs.
FTrace metadata (cpus, version) were not being saved to the cache.
This meant that this information would be missing when reading a
trace from the cache.
@valschneider
Contributor Author

trappy/ftrace.py Outdated

self.__populate_trace_metadata(self.__get_trace_metadata())


Contributor

I've just discovered GitHub doesn't have a way to highlight it (??? wtf github ???) but I happened to view this in my terminal and there are 8 spaces on this empty line

Valentin Schneider added 2 commits October 25, 2017 17:52
This change also means we will ignore any pre-existing trace.txt
file, as long as a trace.dat is available. The tempfile is removed
after parsing.
@valschneider valschneider force-pushed the caching/bugfix branch 2 times, most recently from b9ebbef to 93d4548 on October 25, 2017 at 16:58
@valschneider
Contributor Author

Undesired whitespaces have been trialled and sentenced.
