Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make parsing log filenames future-proof #4518

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

europaul
Copy link
Contributor

Main commit is a8725c7. Instead of relying on a certain structure of log filenames, we should just use regexp to look for timestamps or UUIDs - this way the filename structure can change in the future, but it will still be backwards compatible with our filename parsing logic.

This PR contains the changes for pillar. In a later PR those changes will also be applied to newlog and edge-view.

@europaul europaul force-pushed the newlog-filenames branch 2 times, most recently from 02876f6 to 58d6ee9 Compare January 10, 2025 18:31
@europaul
Copy link
Contributor Author

the yetus warning can probably be ignored since we do the same in other files that import gomega

yetus: pkg/pillar/types/newlogtypes_test.go#L10
revive: should not use dot imports https://revive.run/r#dot-imports

@rene
Copy link
Contributor

rene commented Jan 13, 2025

the yetus warning can probably be ignored since we do the same in other files that import gomega

yetus: pkg/pillar/types/newlogtypes_test.go#L10 revive: should not use dot imports https://revive.run/r#dot-imports

Actually not in all places, like here gomega it's not dot imported: https://github.com/lf-edge/eve/blob/master/pkg/pillar/types/patchenvelopestypes_test.go#L10

I would avoid as much as we can to introduce new Yetus errors and in this case it's really not hard to fix it...

@rene
Copy link
Contributor

rene commented Jan 13, 2025

@europaul , LGTM and I liked the flexibility added by using regex. However, I would fix the Yetus error in order to avoid YAYEE (Yet Another Yetus Error on EVE 😃 )

Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we must address the same functions in the EdgeView package; otherwise, the next time we upgrade the Pillar package used by EdgeView, it will be broken.

pkg/pillar/types/newlogtypes.go Outdated Show resolved Hide resolved
@@ -85,21 +86,43 @@ type NewlogMetrics struct {
AppMetrics logfileMetrics // App metrics
}

// GetTimestampFromGzipName - get timestamp from gzip file name
func GetTimestampFromGzipName(fName string) (time.Time, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see, we still use this func in another package, edgeveiew:

timestamp, err := types.GetTimestampFromGzipName(f.Name())
and
timestamp, err := types.GetTimestampFromGzipName(l.Name())

So, we should be careful here...

@europaul
Copy link
Contributor Author

I think we must address the same functions in the EdgeView package; otherwise, the next time we upgrade the Pillar package used by EdgeView, it will be broken.

The new functions are (mostly) backward compatible with the old ones. The only problem that I see is with logs that get created before the devices sets the correct time (probably through NTP), so the initial log files look like dev.log.123.gz instead of dev.log.1731491904032.gz. The new function will not pick up a timestamp from them since it doesn't fit into \b\d{13,16}\b regexp.

@europaul
Copy link
Contributor Author

As far as I see, we still use this func in another package, edgeveiew:

@OhmSpectator yes, I will change them once this is merged and I can update the pillar dependency.

Instead of parsing filenames according to a certain schema,
we should use regexp to look for timestamps or UUIDs. This
way the file names can change in the future without affecting
backward compatibility with the existing code for parsing
filenames.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
Since not all logs in persist are to be sent, we rename
LogRemainToSendMBytes to MaxGzipLogMBytesInPersist. This
parameter represents the maxinum size of all gzipped logs
in persist.

Signed-off-by: Paul Gaiduk <paulg@zededa.com>
@rene
Copy link
Contributor

rene commented Jan 14, 2025

I think we must address the same functions in the EdgeView package; otherwise, the next time we upgrade the Pillar package used by EdgeView, it will be broken.

The new functions are (mostly) backward compatible with the old ones. The only problem that I see is with logs that get created before the devices sets the correct time (probably through NTP), so the initial log files look like dev.log.123.gz instead of dev.log.1731491904032.gz. The new function will not pick up a timestamp from them since it doesn't fit into \b\d{13,16}\b regexp.

@europaul , can we affirm that the timestamp will always precede the file extension, like this <any file name>.<TIMESTAMP>.<extension (.gz, etc)> ?

If so, the following regex could be used \b(\d{3,16}).\w{2,3}, then the first occurrence will be the timestamp and it will cover the dev.log.123.gz case. If you want to be more picky with the timestamps, another options is to use this regex: \b(\d{13,16}|\d{3}).\w{2,3}, which will cover only valid timestamps and the case for 3 digits....

@OhmSpectator
Copy link
Member

As far as I see, we still use this func in another package, edgeveiew:

@OhmSpectator yes, I will change them once this is merged and I can update the pillar dependency.

So, we are expecting a new PR very after this one is merged... Have you tested the changes in edge view somehow?

Copy link
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run the tests

@europaul
Copy link
Contributor Author

If so, the following regex could be used \b(\d{3,16}).\w{2,3}, then the first occurrence will be the timestamp and it will cover the dev.log.123.gz case. If you want to be more picky with the timestamps, another options is to use this regex: \b(\d{13,16}|\d{3}).\w{2,3}, which will cover only valid timestamps and the case for 3 digits....

it's actually not just 3 digits - the timestamp can be anything between 0 and now microseconds.

@europaul
Copy link
Contributor Author

As far as I see, we still use this func in another package, edgeveiew:

@OhmSpectator yes, I will change them once this is merged and I can update the pillar dependency.

So, we are expecting a new PR very after this one is merged... Have you tested the changes in edge view somehow?

ok, let's postpone this merge until we can execute it really in sync with the PR for edge-view and newlog. I'll convert it to draft for now

@europaul europaul marked this pull request as draft January 15, 2025 05:31
appUUID = fStr[0]
appUUID, err := types.GetUUIDFromFileName(fName)
if err != nil {
log.Errorf("buildAppUUIDMap: cannot parse app file name %s: %v", fName, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this generate/log an error when it is run on a device log filename?

// GetTimestampFromFileName extracts a microsecond timestamp from a filename
func GetTimestampFromFileName(filename string) (time.Time, error) {
// Regular expression to match a microsecond timestamp (sequence of 13-16 digits)
timestampRegex := regexp.MustCompile(`\b\d{13,16}\b`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it better to declare this as a variable in the file so that it is compiled at init time and not for each Get call?

return time.Time{}, fmt.Errorf("getTimestampFromGzipName: invalid log file name %s", fName)
// GetTimestampFromFileName extracts a microsecond timestamp from a filename
func GetTimestampFromFileName(filename string) (time.Time, error) {
// Regular expression to match a microsecond timestamp (sequence of 13-16 digits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment below says millisecond and here you said microsecond. Which one is it?

// GetUUIDFromFileName extracts a UUID from a filename with dot-delimited parts
func GetUUIDFromFileName(filename string) (string, error) {
// UUID regex pattern (supports v4 UUIDs like "123e4567-e89b-12d3-a456-426614174000")
uuidRegex := regexp.MustCompile(`[a-fA-F0-9]{8}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{4}-[a-fA-F0-9]{12}`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define as a local variable in the file?


// Check each part for a UUID match
for _, part := range parts {
if uuidRegex.MatchString(part) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or call the UUID parser function we use elsewhere?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants