Skip to content
This repository has been archived by the owner on May 21, 2020. It is now read-only.

"Reload Metrics Library and Update" doesn't finish update? #70

Closed
PlexPro opened this issue Jan 5, 2017 · 18 comments
Closed

"Reload Metrics Library and Update" doesn't finish update? #70

PlexPro opened this issue Jan 5, 2017 · 18 comments
Assignees

Comments

@PlexPro
Copy link

PlexPro commented Jan 5, 2017

I initially successfully executed "Reload Metrics Library App" and then "Update Metrics in Apps"; both completed, although the latter took fifteen or twenty minutes.

Later, I added another dimension definition to my source table, and this time I decided to use the "Reload Metrics Library and Update" button. The "reload" step successfully completed; however, after perhaps an hour, the "update" step still has not indicated completion. Did it fail to run? Did it abend? Is it still correctly executing? Is it in an infinite loop? Is there any way that I can tell? Any suggestions?

Thanks. Below is the output of the status window:

Status output:

Testing connection...
...connection test successful.

Starting metrics library reload...
...Task Completed in 4698 milliseconds with message Changing task state from Started to FinishedSuccess

Updating metrics in all the apps...
...All apps processed.

Reloading metrics library and update apps started...
...reloading app was successful, now updating the master library...

@PlexPro
Copy link
Author

PlexPro commented Jan 5, 2017

I hope when the server reboots at midnight it won't cause any corruption in Qlik apps being updated by GMS...

@PlexPro
Copy link
Author

PlexPro commented Jan 5, 2017

This may or may not be helpful. I am providing it in case it be relevant.

About the time that the GMS app update may have abended, this event appeared in the Windows Application event log. Event ID 300 is common with Qlik; however, it is usually Level "Information", not "Error". The core of the matter is this:

RestClient: HTTP request for url /qrs/app/24baa5f1-206d-4c7f-973d-792463e56d9f/bulk?xrfkey=ichBinEinBerlinR failed with error code 12002 (source: 1)

The Qlik app referenced in this URL is the app that I was reviewing when I decided to add another dimension and reload everything. I did go back into it, possibly while the GMS app update was executing, though I don't think that I made any changes.

A second error had this associated information:

CDocSerializer: Flush caught exception -1

Here are the details of the two errors that were recorded.

Log Name: Application
Source: Engine
Date: 1/5/2017 1:47:23 PM
Event ID: 300
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: VM-Win-Analytics.polywoodinc.local
Description:
The description for Event ID 300 from source Engine cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

RestClient: HTTP request for url /qrs/app/24baa5f1-206d-4c7f-973d-792463e56d9f/bulk?xrfkey=ichBinEinBerlinR failed with error code 12002 (source: 1)

Log Name: Application
Source: Engine
Date: 1/5/2017 1:47:23 PM
Event ID: 300
Task Category: None
Level: Error
Keywords: Classic
User: N/A
Computer: VM-Win-Analytics.polywoodinc.local
Description:
The description for Event ID 300 from source Engine cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

CDocSerializer: Flush caught exception -1

@goldbergjeffrey
Copy link
Collaborator

GMS generates a log in %programfiles%\qlik\sense\eapowertools\governedmetricsservice\logs. you may want to have a look at that to see what caused the hold up.

@PlexPro
Copy link
Author

PlexPro commented Jan 5, 2017

Okay, I decided to kick off the "Update Metrics in Apps" function and watch. (This was before your comment about logs, I got interrupted here.) It immediately reproduced the two errors recorded above for "Analytics R&D --dev", and another pair for a different app, "Sales - Chicago 2016 September --dev".

I found the log that you mention. A lot of it appears to be informational, but I am seeing some "operation timed out" errors:

{"module":"checkRepo","app":"Sales - Chicago 2016 September --dev","level":"debug","message":"Number of AppObjects owned by sa_repository inside app 27d1204a-95c0-434b-8354-7ecd2c83d634: {"statusCode":200,"body":{"value":0}}","timestamp":"2017-01-05T20:35:04.060Z"}
{"module":"objectManagement","app":"Sales - Chicago 2016 September --dev","level":"info","message":"Checking repository for the addition of objects: 0","timestamp":"2017-01-05T20:35:04.061Z"}
{"module":"objectManagement","app":"Sales - Chicago 2016 September --dev","level":"info","message":"I timed out. ","timestamp":"2017-01-05T20:35:04.140Z"}
{"module":"updateMetrics","app":"Sales - Chicago 2016 September --dev","level":"debug","message":"[TimeoutError: operation timed out]","timestamp":"2017-01-05T20:35:04.142Z"}
{"module":"objectManagement","app":"Sales - Chicago 2016 September --dev","level":"error","message":"[TimeoutError: operation timed out]","timestamp":"2017-01-05T20:35:04.141Z"}
{"module":"objectManagement","app":"Sales - Chicago 2016 September --dev","level":"info","message":"undefined","timestamp":"2017-01-05T20:35:04.142Z"}
{"module":"objectManagement","app":"Sales - Chicago 2016 September --dev","level":"info","message":"Object Management Complete","timestamp":"2017-01-05T20:35:04.143Z"}

I need to leave shortly, so I'll have to pick this up again tomorrow. Thanks!

@PlexPro
Copy link
Author

PlexPro commented Jan 6, 2017

I am mystified. I checked the connection this morning; just fine. I ran the reload, and it ran fine in about 5.6 seconds, same as several times yesterday. Today's log shows all this. Then I kicked off the app update.

The first of four apps ran yesterday the first time in about 64 seconds / 958 log entries; the update ran today in about 18 seconds / 570 log entries:

{"module":"doWork","method":"updateAll","app":"Analytics R&D --dev","level":"info","message":"Process took 64013 milliseconds","timestamp":"2017-01-05T17:56:41.090Z"}

{"module":"doWork","method":"updateAll","app":"Analytics R&D --dev","level":"info","message":"Process took 17683 milliseconds","timestamp":"2017-01-06T16:55:45.011Z"}

The second, 59 seconds / 904 log entries and 26 seconds / 557 log entries:

{"module":"doWork","method":"updateAll","app":"Sales - Chicago 2016 September --dev","level":"info","message":"Process took 58667 milliseconds","timestamp":"2017-01-05T17:57:39.758Z"}

{"module":"doWork","method":"updateAll","app":"Sales - Chicago 2016 September --dev","level":"info","message":"Process took 25619 milliseconds","timestamp":"2017-01-06T16:56:10.630Z"}

The third, yesterday at 61 seconds / 1010 log entries, and today unfinished at 191 seconds / 2536 log entries when it apparently abended for no obvious reason at the last pair of log entries shown:

{"module":"doWork","method":"updateAll","app":"Sales - All Customers --dev","level":"info","message":"Process took 60912 milliseconds","timestamp":"2017-01-05T17:58:40.670Z"}

{"module":"checkRepo","app":"Sales - All Customers --dev","level":"debug","message":"Number of AppObjects owned by sa_repository inside app 9bb9e833-0705-4ef2-89e3-aa8616fddbaa: {"statusCode":200,"body":{"value":33}}","timestamp":"2017-01-06T16:59:21.651Z"}
{"module":"objectManagement","app":"Sales - All Customers --dev","level":"info","message":"Checking repository for the addition of objects: 33","timestamp":"2017-01-06T16:59:21.652Z"}

Even with the apps that apparently were processed normally, the log entries appear to show about three dozen metrics being created, even though no new ones had been added in the source table. Further, I didn't find these "new" ones in the Qlik app when I searched by the name given, not even in the tags:

{"module":"createDimension","app":"Analytics R&D --dev","level":"debug","message":"Dim: general_128 does not exist. Creating","timestamp":"2017-01-06T16:55:30.033Z"}

@PlexPro
Copy link
Author

PlexPro commented Jan 6, 2017

Another attempt to update the apps results in the first three apps completing with the completion message "Process took nnnn milliseconds".

The fourth one appears to go into an infinite loop, except that it terminates after about four minutes and 3463 lines in the log (instead of 65 seconds and 1004 lines when it completes normally).

If you would like to see any of the log files, or a fresh one, I can do that.

@PlexPro
Copy link
Author

PlexPro commented Jan 16, 2017

I noticed the resolution to Issue # 66: "...to do a calculated dimension in GMS today, make sure to place an equal sign = before your expression."

I made that change to all of my calculated dimensions (2) and measures (68). I have not attempted anything between my last comment (ten days ago) and this action this afternoon.

Then I did this attempted connecting, reloading, and updating:

Status output:

Testing connection...
...connection test successful.

Starting metrics library reload...
...Task Completed in 6496 milliseconds with message Changing task state from Started to FinishedSuccess

Updating metrics in all the apps...
...All apps processed.

As you can see, it worked, the update process taking about 6.5 minutes. Perhaps the lack of an '=' in front of a calculation could have caused the earlier failures?

@PlexPro
Copy link
Author

PlexPro commented Jan 18, 2017

Well, today, after adding a couple of new metrics, I ran the reload successfully. The update ran until it timed out after 60 seconds, and stopped recording log entries about 40 seconds later.

I've copied in a few of the log lines below, which have at the core the timeout messages, but this was preceded by 467 duplicate line pairs (the only variation being the timestamp), and followed by another 222 identical pairs, before it quit recording.

So, it wasn't adding the '=' to the formulae that permitted it to work yesterday.

{"module":"checkRepo","app":"Analytics R&D --dev","level":"debug","message":"Number of AppObjects owned by sa_repository inside app 24baa5f1-206d-4c7f-973d-792463e56d9f: {"statusCode":200,"body":{"value":35}}","timestamp":"2017-01-18T17:56:56.262Z"}
{"module":"objectManagement","app":"Analytics R&D --dev","level":"info","message":"Checking repository for the addition of objects: 35","timestamp":"2017-01-18T17:56:56.263Z"}
{"module":"objectManagement","app":"Analytics R&D --dev","level":"info","message":"I timed out. ","timestamp":"2017-01-18T17:56:56.413Z"}
{"module":"updateMetrics","app":"Analytics R&D --dev","level":"debug","message":"[TimeoutError: operation timed out]","timestamp":"2017-01-18T17:56:56.415Z"}
{"module":"objectManagement","app":"Analytics R&D --dev","level":"error","message":"[TimeoutError: operation timed out]","timestamp":"2017-01-18T17:56:56.414Z"}
{"module":"objectManagement","app":"Analytics R&D --dev","level":"info","message":"undefined","timestamp":"2017-01-18T17:56:56.416Z"}
{"module":"objectManagement","app":"Analytics R&D --dev","level":"info","message":"Object Management Complete","timestamp":"2017-01-18T17:56:56.417Z"}
{"module":"checkRepo","app":"Analytics R&D --dev","level":"debug","message":"Number of AppObjects owned by sa_repository inside app 24baa5f1-206d-4c7f-973d-792463e56d9f: {"statusCode":200,"body":{"value":35}}","timestamp":"2017-01-18T17:56:56.424Z"}
{"module":"objectManagement","app":"Analytics R&D --dev","level":"info","message":"Checking repository for the addition of objects: 35","timestamp":"2017-01-18T17:56:56.424Z"}

@goldbergjeffrey
Copy link
Collaborator

I'm rewriting it and taking the promise while timeout out because clearly it's not working. I'm going to be implementing the notification service from QRS to trigger ownership change post creation. There should be no appreciable increase in the time it takes to push the metrics, however, I'm trying to figure out the best way to log and message the process because it's going to require some waiting between creating in the app, updating the repo (the bottleneck), and then triggering the remaining work.

It's in process and will be part of late Jan early Feb release.

@PlexPro
Copy link
Author

PlexPro commented Jan 18, 2017

Hi Jeff,

I reran the update a bit ago. It apparently updated three of my four apps, from what I could tell from the log, but the change that I made wasn't such that I could positively conclude that. However, on the fourth app, it timed out 60 seconds after it started processing it. (What's curious to me, but you probably know what's going on, is that it is writing log lines during the entire time. They appear duplicate to me; perhaps because they are, but perhaps the messages don't include the data that would distinguish them.)

Now, I again ran the reload and update, and it succeeded now, so the timeout doesn't always happen.

What I want to ask is whether increasing the "objectManagementTimeout" setting in the config.js file may be of help?

René

@goldbergjeffrey
Copy link
Collaborator

Doubtful. The repeated lines are the promisewhile waiting for content to be added to repo. It does time out because of the timer setting, but it's a shitty implementation now that I reflect on it. You can try to increase the timeout, and it may indeed fix the problem, but I'm changing it anyway because the repo needs to tell GMS when it's ok to go, not GMS plays a waiting game that it can't be guessing at.

@PlexPro
Copy link
Author

PlexPro commented Jan 18, 2017

When it doesn't time out, it processes fairly quickly, four large apps and one smallish one in 3m38s.

Thanks Jeff!

@PlexPro
Copy link
Author

PlexPro commented Jan 24, 2017

When the process experiences problems, what happens? Does it actually terminate, or has it only slowed down to an excruciatingly slow pace? I have found that after thinking that a process had abended and kicking it off again, that it actually hadn't. What is happening then? Are there two processes running concurrently attempting to update the same set of apps?

Is there any way to know whether and to ensure that a process have completed? Could a process be terminated safely? What happens to the apps?

Thanks,

René

@PlexPro
Copy link
Author

PlexPro commented Jan 24, 2017

The repository service is consuming over half the system resources... I will reboot and try again.

@goldbergjeffrey
Copy link
Collaborator

@PlexPro I think I've indicated my plan to attempt to resolve the issues related to reload and update. It's planned for first full week in February.

@PlexPro
Copy link
Author

PlexPro commented Jan 24, 2017

Yes, thanks. I'm providing info in case it might affect or be affected by something other than what you've thought about.

Apparently, though, you would recommend not trying the load again until the next update be out?

René

@goldbergjeffrey
Copy link
Collaborator

I've instituted the QRS notification system to start the change ownership and publishing process for metrics. This should resolve the issue with multiple apps not updating properly.

@goldbergjeffrey
Copy link
Collaborator

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants