Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ERR panic: runtime error: invalid memory address or nil pointer dereference^ #204

Open
filidav opened this issue Sep 26, 2018 · 47 comments
Open

Comments

@filidav
Copy link

filidav commented Sep 26, 2018

2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"service-instance_908b922d-8768-4769-b9fd-8e8a383cef11","event_type":"ValueMetric","ip":"","job":"mongod_node","job_index":"fe539b85-6546-4098-ac9f-cd98e4eafcb1","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","system.swap.percent":1,"time":"2018-09-26T15:07:47Z","unit":"Percent"}
2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","route_lookup_time":4900,"time":"2018-09-26T15:07:47Z","unit":"ns"}
2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","latency":8,"level":"info","msg":"","origin":"gorouter","time":"2018-09-26T15:07:47Z","unit":"ms"}
2018-09-26T11:07:47.66-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","LeasesRenewRequestTime":47.699400000000004,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"silk-controller","time":"2018-09-26T15:07:47Z","unit":"ms"}
2018-09-26T11:07:47.75-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","LeasesIndexRequestTime":10.113199999999999,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"silk-controller","time":"2018-09-26T15:07:47Z","unit":"ms"}
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x764298]^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR goroutine 27 [running]:^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehose-to-syslog/caching.(*CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc420194480, 0x9b0aa1, 0x4, 0xc4201fc5d0, 0x24, 0x0, 0x0, 0x0)^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR ........../caching/caching_lazyfill.go:145 +0x88^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehose-to-syslog/caching.(*CacheLazyFill).getEntity(0xc420194480, 0x9b0aa1, 0x4, 0xc4201fc390, 0x24, 0xc420231cf0, 0xc420231cf0, 0x4bd94e)^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........caching/caching_lazyfill.go:121 +0x2bc^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/caching.(*CacheLazyFill).GetApp(0xc420194480, 0xc4201fc390, 0x24, 0x1, 0x1, 0xc4201fc390)^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........caching/caching_lazyfill.go:242 +0x8a^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/events.(*Event).AnnotateWithAppData(0xc420414510, 0xa1dca0, 0xc420194480)^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........events/events.go:129 +0x12f^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git./eventRouting.(*EventRoutingDefault).RouteEvent(0xc42009b680, 0xc4203242d0)^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........eventRouting/eventrouting_default.go:68 +0x14d^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR git/firehoseclient.(*FirehoseNozzle).ReadLogsBuffer(0xc4203e28c0, 0xa21960, 0xc4203e44c0)^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........firehoseclient/firehoseclient.go:119 +0x1b3^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR created by git/firehoseclient.(*FirehoseNozzle).Start^[[0m
^[[31m2018-09-26T11:07:47.77-0400 [APP/PROC/WEB/1] ERR .........firehoseclient/firehoseclient.go:77 +0xb4^[[0m
2018-09-26T11:07:47.80-0400 [APP/PROC/WEB/1] OUT Exit status 2
2018-09-26T11:07:47.80-0400 [CELL/SSHD/1] OUT Exit status 0
2018-09-26T11:07:47.84-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 790d8b14-45b7-49aa-6d29-257f
2018-09-26T11:07:47.84-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 790d8b14-45b7-49aa-6d29-257f
2018-09-26T11:07:48.20-0400 [CELL/1] OUT Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 790d8b14-45b7-49aa-6d29-257f

##################\

We think it is this.

import errors

if nv == nil {
			return errors.New("invalid nv entity")
		}
@shinji62
Copy link
Contributor

It seems is there https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L145 and according to the stacktrace nv is 0x0 (nil)

Not sure why this happen, @aeijdenberg any idea ?

@filidav
Copy link
Author

filidav commented Sep 27, 2018 via email

@shinji62
Copy link
Contributor

same no ? https://github.com/cloudfoundry-community/firehose-to-syslog/blob/master/caching/caching_lazyfill.go#L145

@filidav
Copy link
Author

filidav commented Sep 27, 2018 via email

@shinji62
Copy link
Contributor

I will propose a fix soon. But the problem is I don't know how you get this issue.

So I can reproduce in my env.

Thanks.

@filidav
Copy link
Author

filidav commented Sep 27, 2018 via email

@shinji62
Copy link
Contributor

Can you checkout the branch fix/nil_error and try in your env.
Thanks

@aeijdenberg
Copy link
Contributor

@shinji62 - that might fix the symptom, but I don't think helps with the cause. e.g. I can't understand why https://github.com/cloudfoundry-community/firehose-to-syslog/blob/develop/caching/caching_lazyfill.go#L180 would ever see a nil r.Entity. A caller shouldn't need to check for a nil, it simply shouldn't be returning one.

ie why would the API ever return a GUID for an app, but no entity record? I'd to see the full response that came back from the server....

@shinji62
Copy link
Contributor

@aeijdenberg I don't know neither ... I can't reproduce..
I just provided a quick / dirty fix.

@aeijdenberg
Copy link
Contributor

It's not a fix though - it just masks what could be a lurking and legitimate problem on either the server, or our understanding of how the API call is meant to work.

Maybe could try something like this first: #205

@shinji62
Copy link
Contributor

yeah quick and dirty :) booohhh

@elliottpolk
Copy link
Contributor

Hi, I'm a colleague of @filidav who's been advising on this issue. I recommend a couple of things for this:

Do the nil check in both places. For FilleCache() this fixes that specific nil issue. Including if nz == nil {...} in the normaliseAndSaveEntityToDatabase(...) will future proof this for any other callers trying to do this.

I'd also recommend updating the fix/nil_error patch to look like:

// Populate bolt with all apps
for guid, app := range allApps {
    // escape early to avoid unnecessary allocations
    if app == nil {
        continue
    }

    // Canonicalise the guid
    u, err := uuid.FromString(guid)
    if err != nil {
        return err
    }

    uuid := u.String()

    // ...
}
 

@aeijdenberg
Copy link
Contributor

It is counter-productive to check for nil and then continue. Please don't do that.

Why is that a bad idea? Because it's masking a different problem - these should never be nil to begin with - that they are indicates a bug or API misunderstanding elsewhere.

I think it's reasonable to check and raise an error that can help diagnose the issue, but to check and continue as though there's no issue is worse than crashing.

I would be interested to see what log messages result if you merge #205. That should print the URL which you could curl to look at the response and see if we can figure out why key elements of the JSON aren't being populated that we expect to be.

@filidav
Copy link
Author

filidav commented Sep 28, 2018

If I do test this will ever be merged into master?

@aeijdenberg
Copy link
Contributor

I think if you test it, then you'd be helping us figure out what the actual bug is and where it might lie.

@filidav
Copy link
Author

filidav commented Sep 28, 2018

Sounds good
I can assume there have no changes to this branch since yesterday?

@elliottpolk
Copy link
Contributor

It is counter-productive to check for nil and then continue. Please don't do that.

Why is that a bad idea? Because it's masking a different problem - these should never be nil to begin with - that they are indicates a bug or API misunderstanding elsewhere.

I think it's reasonable to check and raise an error that can help diagnose the issue, but to check and continue as though there's no issue is worse than crashing.

I would be interested to see what log messages result if you merge #205. That should print the URL which you could curl to look at the response and see if we can figure out why key elements of the JSON aren't being populated that we expect to be.

I agree that "hiding" this issue with a continue is bad practice depending on the situation. In this case, it is probably not good to continue with only part of the data.

@filidav
Copy link
Author

filidav commented Sep 28, 2018

Deployed the firehose nozzle app with this branch
Error is back

2018-09-28T08:03:13.561-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:13.561663524 +0000 UTC m=+0.050649701] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint
2018-09-28T08:03:13.561-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:13.561752924 +0000 UTC m=+0.050739801] Pre-filling cache...
2018-09-28T08:03:18.011-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 12:03:18.011354613 +0000 UTC m=+4.617422301] Cache filled.
2018-09-28T08:03:18.115-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 12:03:18.115699613 +0000 UTC m=+4.721767701] Connected to Syslog Server! Connecting to Firehose...
2018-09-28T08:03:18.175-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:18.175488127 +0000 UTC m=+4.664474004] Cache filled.
2018-09-28T08:03:18.256-04:00 [APP/PROC/WEB/1] [OUT] [2018-09-28 12:03:18.256154427 +0000 UTC m=+4.745140904] Connected to Syslog Server! Connecting to Firehose...
2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] panic: runtime error: invalid memory address or nil pointer dereference
2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]
2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] goroutine 66 [running]:
2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc0001751a0, 0x9e481d, 0x4, 0xc00022e420, 0x24, 0x0, 0x0, 0x0)
2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:148 +0x84
2018-09-28T09:21:33.918-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).getEntity(0xc0001751a0, 0x9e481d, 0x4, 0xc00022e390, 0x24, 0xc0002efcf0, 0xc0002efcf0, 0x4c3540)
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:124 +0x2b8
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).GetApp(0xc0001751a0, 0xc00022e390, 0x24, 0x1, 0x1, 0xc00022e390)
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(*Event).AnnotateWithAppData(0xc00022ba10, 0xa82e80, 0xc0001751a0)
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(*EventRoutingDefault).RouteEvent(0xc000236ff0, 0xc0001d0b40)
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x34d
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).ReadLogsBuffer(0xc000400150, 0xa86be0, 0xc0004e0300)
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start
2018-09-28T09:21:33.919-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4
2018-09-28T09:21:33.930-04:00 [APP/PROC/WEB/0] [OUT] Exit status 2
2018-09-28T09:21:33.933-04:00 [CELL/SSHD/0] [OUT] Exit status 0
2018-09-28T09:21:33.951-04:00 [API/1] [OUT] Process has crashed with type: "web"
2018-09-28T09:21:33.965-04:00 [API/1] [OUT] App instance exited with guid 3b8e29d9-b161-4c14-b9fc-21db53e63e25 payload: {"instance"=>"66ce6fbf-705b-44ff-43d3-54c2", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538140893928354200, "version"=>"dd24d491-efea-442c-9563-4b5f48567d03"}
2018-09-28T09:21:33.972-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 66ce6fbf-705b-44ff-43d3-54c2
2018-09-28T09:21:33.972-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 66ce6fbf-705b-44ff-43d3-54c2
2018-09-28T09:21:34.005-04:00 [CELL/0] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 36edab2d-ac3b-4660-7dd4-3d26
2018-09-28T09:21:34.231-04:00 [CELL/0] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 66ce6fbf-705b-44ff-43d3-54c2
2018-09-28T09:21:34.384-04:00 [CELL/0] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 36edab2d-ac3b-4660-7dd4-3d26
2018-09-28T09:21:35.353-04:00 [APP/PROC/WEB/0] [OUT] [2018-09-28 13:21:35.351654476 +0000 UTC m=+0.005510501] Starting firehose-to-syslog 0.0.0
2018-09-28T09:21:35.381-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443

@filidav
Copy link
Author

filidav commented Sep 28, 2018

What else do you require?

@filidav
Copy link
Author

filidav commented Sep 28, 2018

changed manifest to debug : true

Not much more detail

2018-09-28T12:06:33.72-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"p-mysql-8f4fb8de90f6a3757e58","event_type":"ValueMetric","ip":"","job":"monitoring","job_index":"c92ca7a8-a416-445d-9af2-a51f72ea816b","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","time":"2018-09-28T16:06:33Z","unit":"Percent","valMetric.GetName":0}
2018-09-28T12:06:33.72-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"p-mysql-8f4fb8de90f6a3757e58","event_type":"ValueMetric","ip":"","job":"monitoring","job_index":"c92ca7a8-a416-445d-9af2-a51f72ea816b","level":"info","msg":"","origin":"bosh-system-metrics-forwarder","time":"2018-09-28T16:06:33Z","unit":"Load","valMetric.GetName":0.3}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":-1}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1263625}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":-1}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.30","job":"router","job_index":"f5c4821a-a54c-47d1-b548-ec33d47d3491","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":3100}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"seconds","valMetric.GetName":129496}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":0}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1263626}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":1261013}
2018-09-28T12:06:33.74-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":3400}
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR goroutine 67 [running]:^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc00010e600, 0x9e481d, 0x4, 0xc0004e6120, 0x24, 0x0, 0x0, 0x0)^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:148 +0x84^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).getEntity(0xc00010e600, 0x9e481d, 0x4, 0xc0004e60c0, 0x24, 0xc00030bcf0, 0xc00030bcf0, 0x4c3540)^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:124 +0x2b8^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).GetApp(0xc00010e600, 0xc0004e60c0, 0x24, 0x1, 0x1, 0xc0004e60c0)^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(*Event).AnnotateWithAppData(0xc000242c90, 0xa82e80, 0xc00010e600)^[[0m
^[[31m2018-09-28T12:06:33.78-0400 [APP/PROC/WEB/1] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129^[[0m
2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.29","job":"cloud_controller","job_index":"99244cb7-e921-44e7-b6be-e67b9cc0782b","level":"info","msg":"","origin":"cc","time":"2018-09-28T16:06:33Z","unit":"counter","valMetric.GetName":0}
2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":23}
2018-09-28T12:06:33.82-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":39}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":19}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ns","valMetric.GetName":4300}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":1368552}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":4}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.54","job":"diego_cell","job_index":"f9db5322-ba4d-407f-8c66-b99f143a3b19","level":"info","msg":"","origin":"netmon","time":"2018-09-28T16:06:33Z","unit":"count","valMetric.GetName":3601817}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":23}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":39}
2018-09-28T12:06:33.89-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.30","job":"router","job_index":"f5c4821a-a54c-47d1-b548-ec33d47d3491","level":"info","msg":"","origin":"gorouter","time":"2018-09-28T16:06:33Z","unit":"ms","valMetric.GetName":147}

@filidav
Copy link
Author

filidav commented Sep 28, 2018

I thought with this branch you put in a fix that would avoid this issue. Seem like the issue is still present.

@filidav
Copy link
Author

filidav commented Sep 28, 2018

I need to merge #205 and then test again.

@shinji62
Copy link
Contributor

shinji62 commented Oct 1, 2018

@filidav Please let me know the status of your test.

You will need to merge #205 to get some error message. From that we will be able to understand a bit better what is going on.

@filidav
Copy link
Author

filidav commented Oct 1, 2018

I merged pull #205

018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] panic: runtime error: invalid memory address or nil pointer dereference
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] goroutine 28 [running]:
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000150300, 0x9e481d, 0x4, 0xc0000c6090, 0x24, 0x0, 0x0, 0x0)
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).getEntity(0xc000150300, 0x9e481d, 0x4, 0xc0000c6030, 0x24, 0xc000605d28, 0xc000605d28, 0x4c3540)
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).GetApp(0xc000150300, 0xc0000c6030, 0x24, 0x1, 0x1, 0xc0000c6030)
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(*Event).AnnotateWithAppData(0xc0002722a0, 0xa82ec0, 0xc000150300)
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(*EventRoutingDefault).RouteEvent(0xc000294eb0, 0xc0001127e0)
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).ReadLogsBuffer(0xc00023c0e0, 0xa86c20, 0xc0002f61c0)
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start
2018-10-01T11:46:34.204-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4
2018-10-01T11:46:34.217-04:00 [APP/PROC/WEB/1] [OUT] Exit status 2

@filidav
Copy link
Author

filidav commented Oct 1, 2018

Does it help ^^ ??

@shinji62
Copy link
Contributor

shinji62 commented Oct 1, 2018

Not really you don't have some logs before this one ?

@filidav
Copy link
Author

filidav commented Oct 2, 2018 via email

@filidav
Copy link
Author

filidav commented Oct 2, 2018

Nothing additional you can use.

Just the connection before and the destroy/recreate of the container

2018-10-01T20:43:27.221-04:00 [API/1] [OUT] Starting app with guid 4cc5493a-8559-4d28-a489-cf6d972d348f
2018-10-01T20:43:27.393-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff creating container for instance 737c769b-7519-41dd-531e-1111
2018-10-01T20:43:27.410-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 creating container for instance 1b74d5f1-a4ed-4d50-452c-2d4c
2018-10-01T20:43:27.722-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully created container for instance 737c769b-7519-41dd-531e-1111
2018-10-01T20:43:28.094-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully created container for instance 1b74d5f1-a4ed-4d50-452c-2d4c
2018-10-01T20:43:28.517-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.516795289 +0000 UTC m=+0.006308301] Starting firehose-to-syslog 0.0.0
2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443
2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.544562889 +0000 UTC m=+0.034074801] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint
2018-10-01T20:43:28.544-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:28.544708989 +0000 UTC m=+0.034221101] Pre-filling cache...
2018-10-01T20:43:29.190-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.187993954 +0000 UTC m=+0.008754101] Starting firehose-to-syslog 0.0.0
2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443
2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.243482054 +0000 UTC m=+0.064241201] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint
2018-10-01T20:43:29.244-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:29.243571554 +0000 UTC m=+0.064330801] Pre-filling cache...
2018-10-01T20:43:32.903-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:32.903195291 +0000 UTC m=+4.392707003] Cache filled.
2018-10-01T20:43:32.984-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 00:43:32.984479391 +0000 UTC m=+4.473991503] Connected to Syslog Server! Connecting to Firehose...
2018-10-01T20:43:33.576-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:33.575924054 +0000 UTC m=+4.396682601] Cache filled.
2018-10-01T20:43:33.665-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 00:43:33.665286754 +0000 UTC m=+4.486046001] Connected to Syslog Server! Connecting to Firehose...
2018-10-01T21:56:40.135-04:00 [APP/PROC/WEB/1] [ERR] panic: runtime error: invalid memory address or nil pointer dereference
2018-10-01T21:56:40.135-04:00 [APP/PROC/WEB/1] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]
2018-10-01T21:56:40.136-04:00 [APP/PROC/WEB/1] [ERR] goroutine 41 [running]:
2018-10-01T21:56:40.137-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000101b30, 0x9e481d, 0x4, 0xc000254390, 0x24, 0x0, 0x0, 0x0)
2018-10-01T21:56:40.137-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84
2018-10-01T21:56:40.138-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).getEntity(0xc000101b30, 0x9e481d, 0x4, 0xc0002541e0, 0x24, 0xc00034dd28, 0xc00034dd28, 0x4c3540)
2018-10-01T21:56:40.138-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8
2018-10-01T21:56:40.139-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).GetApp(0xc000101b30, 0xc0002541e0, 0x24, 0x1, 0x1, 0xc0002541e0)
2018-10-01T21:56:40.139-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a
2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(*Event).AnnotateWithAppData(0xc0001439b0, 0xa82ec0, 0xc000101b30)
2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129
2018-10-01T21:56:40.140-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(*EventRoutingDefault).RouteEvent(0xc00018c140, 0xc00029f320)
2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e
2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).ReadLogsBuffer(0xc0000982a0, 0xa86c20, 0xc00020a0c0)
2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189
2018-10-01T21:56:40.141-04:00 [APP/PROC/WEB/1] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start
2018-10-01T21:56:40.142-04:00 [APP/PROC/WEB/1] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4
2018-10-01T21:56:40.165-04:00 [APP/PROC/WEB/1] [OUT] Exit status 2
2018-10-01T21:56:40.167-04:00 [CELL/SSHD/1] [OUT] Exit status 0
2018-10-01T21:56:40.201-04:00 [API/0] [OUT] Process has crashed with type: "web"
2018-10-01T21:56:40.205-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 stopping instance 1b74d5f1-a4ed-4d50-452c-2d4c
2018-10-01T21:56:40.205-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 destroying container for instance 1b74d5f1-a4ed-4d50-452c-2d4c
2018-10-01T21:56:40.215-04:00 [API/0] [OUT] App instance exited with guid 4cc5493a-8559-4d28-a489-cf6d972d348f payload: {"instance"=>"1b74d5f1-a4ed-4d50-452c-2d4c", "index"=>1, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538445400183506084, "version"=>"32f2bc3c-2fc2-4b7f-a5d4-76e9de73d352"}
2018-10-01T21:56:40.248-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 885b4b0c-2fcd-40f0-533f-bba0
2018-10-01T21:56:40.554-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 1b74d5f1-a4ed-4d50-452c-2d4c
2018-10-01T21:56:40.609-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 885b4b0c-2fcd-40f0-533f-bba0
2018-10-01T21:56:41.568-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.568107473 +0000 UTC m=+0.003780401] Starting firehose-to-syslog 0.0.0
2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443
2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.596951573 +0000 UTC m=+0.032622901] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint
2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.597034973 +0000 UTC m=+0.032707001] Pre-filling cache...
2018-10-01T21:56:45.952-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:45.952673773 +0000 UTC m=+4.388345001] Cache filled.
2018-10-01T21:56:46.033-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:46.033287973 +0000 UTC m=+4.468959701] Connected to Syslog Server! Connecting to Firehose...

@filidav
Copy link
Author

filidav commented Oct 2, 2018

I thought I would be able to see a URL that I could curl that is emitting the nil pointer.
I don't see any URL

@filidav
Copy link
Author

filidav commented Oct 2, 2018

failed again

2018-10-01T21:56:40.248-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 creating container for instance 885b4b0c-2fcd-40f0-533f-bba0
2018-10-01T21:56:40.554-04:00 [CELL/1] [OUT] Cell db1be119-8594-481d-92bc-59eb0dc30522 successfully destroyed container for instance 1b74d5f1-a4ed-4d50-452c-2d4c
2018-10-01T21:56:40.609-04:00 [CELL/1] [OUT] Cell 8e12ecd0-fb5c-4cde-acda-a3df0326c641 successfully created container for instance 885b4b0c-2fcd-40f0-533f-bba0
2018-10-01T21:56:41.568-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.568107473 +0000 UTC m=+0.003780401] Starting firehose-to-syslog 0.0.0
2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443
2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.596951573 +0000 UTC m=+0.032622901] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint
2018-10-01T21:56:41.597-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:41.597034973 +0000 UTC m=+0.032707001] Pre-filling cache...
2018-10-01T21:56:45.952-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:45.952673773 +0000 UTC m=+4.388345001] Cache filled.
2018-10-01T21:56:46.033-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-02 01:56:46.033287973 +0000 UTC m=+4.468959701] Connected to Syslog Server! Connecting to Firehose...
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] panic: runtime error: invalid memory address or nil pointer dereference
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] goroutine 40 [running]:
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc000128480, 0x9e481d, 0x4, 0xc000267500, 0x24, 0x0, 0x0, 0x0)
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).getEntity(0xc000128480, 0x9e481d, 0x4, 0xc0002674a0, 0x24, 0xc000233d28, 0xc000233d28, 0x4c3540)
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).GetApp(0xc000128480, 0xc0002674a0, 0x24, 0x1, 0x1, 0xc0002674a0)
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(*Event).AnnotateWithAppData(0xc00047be00, 0xa82ec0, 0xc000128480)
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129
2018-10-01T22:16:33.818-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(*EventRoutingDefault).RouteEvent(0xc00008ac80, 0xc0000f0c60)
2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e
2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).ReadLogsBuffer(0xc0001f3b90, 0xa86c20, 0xc00014db00)
2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189
2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start
2018-10-01T22:16:33.819-04:00 [APP/PROC/WEB/0] [ERR] /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4
2018-10-01T22:16:33.831-04:00 [APP/PROC/WEB/0] [OUT] Exit status 2
2018-10-01T22:16:33.834-04:00 [CELL/SSHD/0] [OUT] Exit status 0
2018-10-01T22:16:33.855-04:00 [API/1] [OUT] Process has crashed with type: "web"
2018-10-01T22:16:33.871-04:00 [API/1] [OUT] App instance exited with guid 4cc5493a-8559-4d28-a489-cf6d972d348f payload: {"instance"=>"737c769b-7519-41dd-531e-1111", "index"=>0, "reason"=>"CRASHED", "exit_description"=>"APP/PROC/WEB: Exited with status 2", "crash_count"=>1, "crash_timestamp"=>1538446593828050115, "version"=>"32f2bc3c-2fc2-4b7f-a5d4-76e9de73d352"}
2018-10-01T22:16:33.872-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff stopping instance 737c769b-7519-41dd-531e-1111
2018-10-01T22:16:33.872-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff destroying container for instance 737c769b-7519-41dd-531e-1111
2018-10-01T22:16:33.900-04:00 [CELL/0] [OUT] Cell f8c6141c-d989-4040-b27c-e891461b69e4 creating container for instance da5b65f7-9dc5-4900-51fd-841c
2018-10-01T22:16:34.219-04:00 [CELL/0] [OUT] Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully destroyed container for instance 737c769b-7519-41dd-531e-1111
2018-10-01T22:16:34.369-04:00 [CELL/0] [OUT] Cell f8c6141c-d989-4040-b27c-e891461b69e4 successfully created container for instance da5b65f7-9dc5-4900-51fd-841c
2018-10-01T22:16:35.489-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.488065621 +0000 UTC m=+0.006920801] Starting firehose-to-syslog 0.0.0
2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] wss://doppler.sys.use.sandbox.pcf.manulife.com:443
2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.534180321 +0000 UTC m=+0.053035001] Using wss://doppler.sys.use.sandbox.pcf.manulife.com:443 as doppler endpoint
2018-10-01T22:16:35.534-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:35.534245521 +0000 UTC m=+0.053100301] Pre-filling cache...
2018-10-01T22:16:39.800-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-02 02:16:39.800264524 +0000 UTC m=+4.319119004] Cache filled.

@shinji62
Copy link
Contributor

shinji62 commented Oct 2, 2018

Can you try with debug mode enabled ?

@filidav
Copy link
Author

filidav commented Oct 2, 2018

With debug

"job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.outstanding":3,"time":"2018-10-02T03:26:40Z","unit":"counter"}
2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.start":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"}
2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.logs":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"}
2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf-47abe4dc6b20d063f500","event_type":"ValueMetric","health.check.cliCommand.stop":1,"ip":"","job":"healthwatch-forwarder","job_index":"09bc9061-de37-46f1-bdb9-7409a2e9c6b1","level":"info","msg":"","origin":"healthwatch","time":"2018-10-02T03:26:40Z","unit":"Metric"}
2018-10-01T23:26:40.14-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency":13,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:40Z","unit":"ms"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.73","job":"router","job_index":"79a95419-2bf9-4f69-be41-da1da70ffcc3","level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:40Z","unit":"seconds","uptime":35804}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","numGoRoutines":3,"origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","numCPUS":4,"origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.numFrees":1481,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.numBytesAllocatedStack":589824,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"}
2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.72","job":"router","job_index":"dcea8fb8-5b51-4510-9873-fd1551c8d912","level":"info","msg":"","origin":"gorouter","route_lookup_time":3200,"time":"2018-10-02T03:26:39Z","unit":"ns"}
2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency":41,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:39Z","unit":"ms"}
2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.74","job":"router","job_index":"c2d0977d-7eb6-46ca-a37a-ba01b9f150bf","latency.CloudController":41,"level":"info","msg":"","origin":"gorouter","time":"2018-10-02T03:26:39Z","unit":"ms"}
2018-10-01T23:26:39.77-0400 [APP/PROC/WEB/0] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.40","job":"windows_diego_cell","job_index":"7b377737-6019-4661-b95f-829aeed7381b","level":"info","memoryStats.lastGCPauseTimeNS":0,"msg":"","origin":"rep","time":"2018-10-02T03:26:39Z","unit":"ns"}
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR panic: runtime error: invalid memory address or nil pointer dereference^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x78dae4]^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR goroutine 45 [running]:^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).normaliseAndSaveEntityToDatabase(0xc0001323f0, 0x9e481d, 0x4, 0xc00036ec90, 0x24, 0x0, 0x0, 0x0)^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:145 +0x84^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).getEntity(0xc0001323f0, 0x9e481d, 0x4, 0xc00036ec30, 0x24, 0xc0003abd28, 0xc0003abd28, 0x4c3540)^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:121 +0x2b8^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/caching.(*CacheLazyFill).GetApp(0xc0001323f0, 0xc00036ec30, 0x24, 0x1, 0x1, 0xc00036ec30)^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/caching/caching_lazyfill.go:245 +0x8a^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/events.(*Event).AnnotateWithAppData(0xc00050e390, 0xa82ec0, 0xc0001323f0)^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/events/events.go:129 +0x129^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting.(*EventRoutingDefault).RouteEvent(0xc00027adc0, 0xc000514bd0)^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/eventRouting/eventrouting_default.go:68 +0x14e^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).ReadLogsBuffer(0xc0003d40e0, 0xa86c20, 0xc0002991c0)^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:119 +0x189^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR created by git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient.(*FirehoseNozzle).Start^[[0m
^[[31m2018-10-01T23:26:39.80-0400 [APP/PROC/WEB/0] ERR /tmp/build/63c5f882/src/git.platform.manulife.io/platformreliability/firehose-to-syslog/firehoseclient/firehoseclient.go:77 +0xb4^[[0m
2018-10-01T23:26:39.81-0400 [APP/PROC/WEB/0] OUT Exit status 2
2018-10-01T23:26:39.82-0400 [CELL/SSHD/0] OUT Exit status 0
2018-10-01T23:26:39.86-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff stopping instance 234b126c-dce9-4754-69b6-ae62
2018-10-01T23:26:39.86-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff destroying container for instance 234b126c-dce9-4754-69b6-ae62
2018-10-01T23:26:40.15-0400 [CELL/0] OUT Cell 04c9251e-4555-4f96-8553-d66d852316ff successfully destroyed container for instance 234b126c-dce9-4754-69b6-ae62
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","memoryStats.lastGCPauseTimeNS":0,"msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"count"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","ImageDeletionTime":21269100,"cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.37","job":"diego_cell","job_index":"04c9251e-4555-4f96-8553-d66d852316ff","level":"info","msg":"","origin":"grootfs","time":"2018-10-02T03:26:40Z","unit":"nanos"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.34","job":"diego_database","job_index":"6babf9d0-2623-40ae-943a-50fee1eca036","level":"info","msg":"","origin":"policy-server","time":"2018-10-02T03:26:40Z","totalPolicies":0,"unit":""}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.outstanding":2,"time":"2018-10-02T03:26:40Z","unit":"counter"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","requests.completed":2176833,"time":"2018-10-02T03:26:40Z","unit":"counter"}
2018-10-01T23:26:40.19-0400 [APP/PROC/WEB/1] OUT {"ENV":"pcf_sandbox","cf_origin":"firehose","deployment":"cf","event_type":"ValueMetric","http_status.4XX":3708,"ip":"10.234.250.89","job":"cloud_controller","job_index":"16bad1d6-4d25-40b8-8116-32809fb17d70","level":"info","msg":"","origin":"cc","time":"2018-10-02T03:26:40Z","unit":"counter"}
@ @

@filidav
Copy link
Author

filidav commented Oct 2, 2018

Any other suggestions?

@aeijdenberg
Copy link
Contributor

@filidav - sorry for the delayed response, have been at an offsite without laptop earlier this week. Looking at the new stack trace, it seems to be hitting a slightly different code path. I've added a second commit to PR #205 to catch this and log an appropriate message.

Would be possible to rebuild with this patch and then look for log messages containing nil entity returned?

@filidav
Copy link
Author

filidav commented Oct 4, 2018

Added your changes

2018-10-03T20:26:06.820-04:00 [APP/PROC/WEB/1] [OUT] [2018-10-04 00:26:06.819826368 +0000 UTC m=+4.128743401] Connected to Syslog Server! Connecting to Firehose...
2018-10-03T20:26:07.648-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-04 00:26:07.648295102 +0000 UTC m=+3.917315101] Cache filled.
2018-10-03T20:26:07.729-04:00 [APP/PROC/WEB/0] [OUT] [2018-10-04 00:26:07.729628602 +0000 UTC m=+3.998649301] Connected to Syslog Server! Connecting to Firehose...
2018-10-03T22:43:39.978-04:00 [APP/PROC/WEB/1] [ERR] [2018-10-04 02:43:39.978537972 +0000 UTC m=+8257.287454505] Exception occurred! Message: Error while reading from the Firehose: Details: websocket: close 1006 (abnormal closure): unexpected EOF
2018-10-03T22:43:39.979-04:00 [APP/PROC/WEB/1] [ERR] [2018-10-04 02:43:39.979025572 +0000 UTC m=+8257.287942105] Exception occurred! Message: Error while reading from the Firehose: Details: websocket: close 1006 (abnormal closure): unexpected EOF

@aeijdenberg
Copy link
Contributor

@filidav - I think it's unlikely the change would have introduced that particular error. Are there any instances of nil entity returned present in the logs?

(if I'm reading the above correctly, is it mixing messages from 2 different processes, WEB/0 and WEB/1 ?)

@filidav
Copy link
Author

filidav commented Oct 4, 2018

No other errors
I noticed someone else reported the same error I just did as well.

#180

@filidav
Copy link
Author

filidav commented Oct 10, 2018

Any other suggestions on how we can trap for this condition
Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it?

@filidav
Copy link
Author

filidav commented Oct 10, 2018

Why would the API ever return a GUID for an app, but no entity record which would cause this panic error?
Anyway we can get closer to this issue and identify how this is happening?

@shinji62
Copy link
Contributor

Why would the API ever return a GUID for an app, but no entity record which would cause this panic error? ==> We don't know, that the all problem here.

Ever since we installed this fix I don't see the nil pointer. Are we now ignoring it? ==> We don't ignore it, we trap the error and ignore the record not the error.

@filidav
Copy link
Author

filidav commented Oct 10, 2018 via email

@shinji62
Copy link
Contributor

btw, @aeijdenberg add more change to #205 did you include all of them ?

@filidav
Copy link
Author

filidav commented Oct 10, 2018

Yes all those changes are in.

@aeijdenberg
Copy link
Contributor

The changes in #205 should stop the panics, but I would have expected that something would be catching and logging the error that is now returned. I'm surprised there isn't, but then again, I was surprised that the underlying API would ever have returned that type of data to begin with, so perhaps it was a transient error and has resolved itself?

Either way, I think #205 should be safe to merge.

@elliottpolk
Copy link
Contributor

Just a side note, it is still technically possible to hit the nil pointer issue when calling normaliseAndSaveEntityToDatabase. I would suggest checking for nv == nil and throwing an error to ensure all bases are covered.

@aeijdenberg
Copy link
Contributor

@elliottpolk - you're correct that if you pass nil to that method, there will be a panic. However a quick glance over the code shows this is true of nearly every single function / method definition in this and other Go packages. I think there's a fairly well known convention that one shouldn't pass nil as an argument to a function / method unless the documentation for that function explicitly calls out that it's OK to do so. I don't see a compelling reason to special case this particular method.

@jgeorge300
Copy link

Has this been resolved? I've been seeing this issue (or something very similar) in my PCF environments. Firehose has been crashing on a regular basis. Occasionally, when PCF restarts firehose it fails to start processing the logs again. We've had to set a schedule to automatically restart firehose every couple hours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants