Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ext/har: add HAR logger extension #610

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

CameronBadman
Copy link

ext/har: add HAR logger extension

Port HAR (HTTP Archive) logging functionality from abourget/goproxy
to the ext/ package. Features include:

  • Request and response capture
  • Content body logging with toggle
  • HAR 1.2 spec compliance
  • Thread-safe operation
  • Comprehensive test coverage

This is a self-contained extension that doesn't modify core functionality.
Credit to abourget/goproxy for the original implementation.

Closes #609

Port HAR logging from abourget/goproxy to ext package.
Closes elazarl#609
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/types.go Outdated Show resolved Hide resolved
ext/har/logger.go Outdated Show resolved Hide resolved
ext/har/logger.go Outdated Show resolved Hide resolved
ext/har/logger.go Outdated Show resolved Hide resolved
ext/har/logger_test.go Outdated Show resolved Hide resolved
Comment on lines 81 to 110
func (l *Logger) SaveToFile(filename string) error {
l.mu.Lock()
defer l.mu.Unlock()

file, err := os.Create(filename)
if err != nil {
return err
}
defer file.Close()

encoder := json.NewEncoder(file)
encoder.SetIndent("", " ")
return encoder.Encode(l.har)
}

// Clear resets the HAR log
func (l *Logger) Clear() {
l.mu.Lock()
defer l.mu.Unlock()
l.har = New()
}

// GetEntries returns a copy of the current HAR entries
func (l *Logger) GetEntries() []Entry {
l.mu.Lock()
defer l.mu.Unlock()
entries := make([]Entry, len(l.har.Log.Entries))
copy(entries, l.har.Log.Entries)
return entries
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HAR exporting part must be redesigned from scratch, here there are a bunch of issues.
Let me know if you want to do that, or in case I can fix this personally.

Basically, instead of this, the most common use will be the periodically send (e.g. every 5 minutes) of the HAR entries to an external logger API service or graphana or whatever.
So we have to accept a function from the user that, periodically, will consume the list of entries, pass it to this user function, and delete it from our queue.
The user will do whatever he wants with this list of entries, because he passed us its own function.

A goroutine will read from the entries channel and append to a local slice.
For the function call, there can be two options that the user can select:

  • every N requests we call the function (we implement this using a counter)
  • every time.Duration (e.g. 5 minutes) we call the user function

Let me know about this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually fixing all the issues you have pointed out right now, I can do this part tonight, but if not tomorrow. I haven't worked with go routines before, but I think I understand the general idea for the saving function. thank you.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah don't worry, it's holiday time, so feel free to take your time.
Your help is really appreciated here!
It's not an urgent feature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nah I should be thanking you! the feedback is appreciated.

Yeah don't worry, it's holiday time, so feel free to take your time. Your help is really appreciated here! It's not an urgent feature.

Copy link
Author

@CameronBadman CameronBadman Jan 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ErikPelli went ahead and updated the code to the specification you asked for, I found it actually pretty difficult and it required a surprising amount of updates to the rest of the package, but I think it works correctly. Hope to hear from you soon.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CameronBadman you're doing a 100 milliseconds polling with mutex locks, this will dangerously increase the CPU usage and it's not the correct approach for Go.
You should create a channel where the OnReponse() function can send the parsed data.
The function that you called exportLoop() will then contonuously collect from this channel (no polling at all).
If the user specified a duration, in the exportLoop() you will have a select between a ticker with this duration and the data channel read.
When you read from the data channel, you append to a local response slice.
If the user specified a length amount (> 0) and it has been reached, or If the ticker has been triggered and length is > 0, you pass the slice to the callback and set the value of the local variable to nil.
You don't need mutexes at all with this approach, nor polling.

Comment on lines +190 to +216
// If it's not an IP address, perform a DNS lookup with a timeout
resolver := &net.Resolver{}
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

ips, err := resolver.LookupIP(ctx, "ip", host)
if err != nil {
// If lookup fails, just use the hostname
entry.ServerIpAddress = host
return
}

// Prefer IPv4, but fall back to IPv6 if necessary
for _, ip := range ips {
if ipv4 := ip.To4(); ipv4 != nil {
entry.ServerIpAddress = ipv4.String()
return
}
}

// If no IPv4 address found, use the first IP (IPv6) in the list
if len(ips) > 0 {
entry.ServerIpAddress = ips[0].String()
} else {
// If no IPs found, fall back to the hostname
entry.ServerIpAddress = host
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Server IP address is an optional field, leave it empty if net.ParseIP() fails, as I explained in the previous comment (#610 (comment)).
Don't call net.Resolver at all.

harPostData.MimeType = mediaType

if err := req.ParseForm(); err != nil {
log.Printf("Error parsing form: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using "log" package, directly use the proxy logger.
You can access it by using ctx.Proxy.Logger.
Apply this change also to the other logging calls.

Comment on lines +326 to +329
statusText := resp.Status
if len(resp.Status) > 4 {
statusText = resp.Status[4:]
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this manual stuff, use http.StatusText(resp.StatusCode), directly inside the Response{} creation in the next lines

Comment on lines +56 to +57
const startingEntrySize int = 1000;
return make([]Entry, 0, startingEntrySize)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove ;
Never use semicolons!

Reformat the lines

Comment on lines +47 to +53
func (har *Har) AppendEntry(entry ...Entry) {
har.Log.Entries = append(har.Log.Entries, entry...)
}

func (har *Har) AppendPage(page ...Page) {
har.Log.Pages = append(har.Log.Pages, page...)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you exposing these functions if they are never used and the Har struct members are public?

Comment on lines +141 to +168
func (l *Logger) SaveToFile(filename string) error {
l.mu.Lock()
defer l.mu.Unlock()
file, err := os.Create(filename)
if err != nil {
return err
}
defer file.Close()

har := &Har{
Log: Log{
Version: "1.2",
Creator: Creator{
Name: "GoProxy",
Version: "1.0",
},
Entries: l.entries,
},
}

jsonData, err := json.Marshal(har)
if err != nil {
return err
}

_, err = file.Write(jsonData)
return err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this function, it's unused

Comment on lines +171 to +185
func (l *Logger) Clear() {
l.mu.Lock()
defer l.mu.Unlock()
l.entries = make([]Entry, 0)
l.currentCount = 0
}

// GetEntries returns a copy of the current HAR entries
func (l *Logger) GetEntries() []Entry {
l.mu.Lock()
defer l.mu.Unlock()
entries := make([]Entry, len(l.entries))
copy(entries, l.entries)
return entries
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete these two functions, user will handle entries only inside its custom handler

captureContent bool
exportFunc ExportFunc
exportInterval time.Duration
exportCount int
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
exportCount int
exportThreshold int

Comment on lines +23 to +24
currentCount int
lastExport time.Time
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these variables, they are useless

Comment on lines +101 to +110
time.Sleep(200 * time.Millisecond)

// Verify HAR entry
entries := logger.GetEntries()
require.Len(t, entries, 1, "Should have one log entry")
entry := entries[0]
assert.Equal(t, tc.expectedMethod, entry.Request.Method, "Request method should match")

// Verify exported entries
assert.Len(t, exportedEntries, 0, "Should not have exported entries yet")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big NO, don't call GetEntries, this will lead to flaky tests.
Instead, you can add the test entries directly inside the callback handler, and it will be deterministic.

}

// Wait for export interval
time.Sleep(600 * time.Millisecond)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove time.Sleep from all your tests and use testify directly inside the callback handler to have a determinisitc behavior (maybe using request counter instead of time based calllback call).
And maybe use a sync.WaitGroup to wait for the finish of the callback, to make sure it's executed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HAR dumper
2 participants