Delay in viewing specimen records after entry #6705

ufarrell · 2023-09-04T10:34:28Z

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Describe the bug
I entered two individual mineral specimens (catalogue numbers 1575 and 1595) into the Trinity College Dublin Mineral collection (guid_prefix: TCDGM:Mineral). They both made it through 'autoload_core' but then went missing.

To Reproduce
Steps to reproduce the behavior:

Go to 'Data Entry > Enter Records'
Click on 'TCDGM:Mineral'
Enter record (Accn: [TCDGM:Mineral]9999)...
Go to 'Data Entry>Browse and Edit'
Load all selected records

Expected behavior
The specimen is entered and appears in searches

Screenshots
This is a screenshot of what I get when I search for the record by catalogue number. The blue arrow goes to this address: 'https://arctos.database.museum/guid' (i.e. missing the actual guid) before redirecting to the search page. If I search by the collection 'TCDGM:Mineral' I get no results.

Data
N/A

Desktop (please complete the following information):

OS: MacOS Ventura 13.5
Browser: Firefox
Version: 117.0

Additional context

Priority

dustymc · 2023-09-04T15:01:31Z

https://arctos.database.museum/info/flat_status.cfm

DerekSikes · 2023-09-04T17:42:48Z

On Friday, Sep 1, after Arctos came back online, I added a new identification to 4 specimens, but I can't find them now, even after trying many different search strategies. It's as if the IDs I added vanished.

ufarrell · 2023-09-06T09:37:16Z

One has appeared now. Probably a rookie question on my part @dustymc - but can you help me understand what happened? Was it just a general backlog, or some error in data entry that slowed it down?

The reason being I was hoping to enter a mineral a day for the "Mineral Cup" 2023 bracket! My fossil records all appeared pretty quickly. But if I can't link to the mineral record on the same day I'll abandon that ship and focus on the real work (random data entry by bracket not exactly the most efficient approach anyway)

dustymc · 2023-09-06T13:39:08Z

@ufarrell I'm not sure how to say it better than what's on the top of the page, but clearly someone needs to!

I have limited resources so I have to pay the costs up front (when something changes, rather than when something's requested) by caching the core of the complex data into a simpler (so cheaper to query) structure. 90-some-odd-something percent of the time, that happens within a minute. When someone makes some huge change (or occasionally when there's some sort of problem) it can take days to catch up. I'm not sure if this is related or not, but around this time I noticed that someone messed with https://arctos.database.museum/name/unidentifiable#Arctos which set a millionish records to update. If anything ever happens on #6111 it'll take several days to catch up. Etc.

I usually have some flexibility in my side of that, and I can prioritize one (or a dozen, or MAYBE a hundred) records ahead of some huge mass by request.
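For anyone trying to picture the mechanics described above, here is a minimal sketch of a first-in/first-out refresh queue in which a handful of records can be bumped ahead of a large backlog. It is an illustration only, not the actual Arctos code: the class `RefreshQueue`, its method names, and the record identifiers are all hypothetical.

```python
import heapq
import itertools


class RefreshQueue:
    """Toy refresh queue: first in / first out, with an optional priority bump."""

    def __init__(self):
        self._heap = []                # entries are (priority, sequence, record_id)
        self._seq = itertools.count()  # keeps arrival order within one priority level

    def mark_stale(self, record_id, urgent=False):
        # Urgent entries (priority 0) sort ahead of the normal backlog (priority 1).
        priority = 0 if urgent else 1
        heapq.heappush(self._heap, (priority, next(self._seq), record_id))

    def next_batch(self, size):
        # Pop up to `size` records whose cached copy should be rebuilt next.
        batch = []
        while self._heap and len(batch) < size:
            batch.append(heapq.heappop(self._heap)[2])
        return batch


queue = RefreshQueue()
for n in range(5):
    queue.mark_stale(f"bulk-{n}")                    # a large change queues many records
queue.mark_stale("TCDGM:Mineral:1575", urgent=True)  # one record prioritized by request
first = queue.next_batch(3)
```

In this sketch the prioritized record comes out first, followed by the oldest bulk entries in arrival order. A real system would also have to decide how many urgent requests it can absorb before they start delaying everyone else.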

@ArctosDB/documentation (I think maybe that's not the current project - help @ebraker @Jegelewicz !) - help making the blurb at the top of https://arctos.database.museum/info/flat_status.cfm better (or something to use as pagehelp or whatever) please?

Jegelewicz · 2023-09-06T13:55:59Z

I don't see a way to make the blurb better, and right now it looks like stuff is current

I never check this thing so I don't know what other statuses appear here, but I think that perhaps a description of the status or a way for me to know what data is included in "error_in_processing" might help.

ufarrell · 2023-09-06T14:21:24Z

Thanks Dusty, I get it now. Yes, I think a definition of status would be good - 'status X = in progress'

It seems obvious now, but when I returned to this I had a vague memory of a third status and couldn't remember what it said, and I wasn't sure whether the error-in-processing indicated some error that I needed to go away and find. I also had a brief moment of wondering if "current" meant "currently in progress".

If possible, maybe keeping that third status visible, even when NumberRecords = 0 would be informative as well

dustymc · 2023-09-06T14:53:01Z

How's this?

Jegelewicz · 2023-09-06T14:55:22Z

That's helpful, but the next step for me is what are the errors? Are they mine? what should I be doing?

Any way to give us that?

dustymc · 2023-09-06T15:02:31Z

"errors"

If you can do anything about them then you'll have no problem finding them....

Just ping me. I try to keep on top of errors, but it doesn't always work. (That one was what almost all of them are: someone swaparooing a GUID around. I don't have a technical and sustainable solution for that, and the situation would not be allowed if I had my way, so I just cringe and force-refresh.)

Jegelewicz · 2023-09-06T15:07:32Z

"swaparooing a GUID"

What does that mean exactly?

campmlc · 2023-09-06T15:19:08Z

Can we get some description on the status page as to what current, flat, filtered flat etc mean, and what it means if there are large vs small numbers there? It still isn't obvious what numbers in those various columns mean in terms of potential wait times.

dustymc · 2023-09-06T15:37:00Z

I've gathered some related conversations below. If I had any better ideas for how any of this could work, it'd already work like that. If someone else has better ideas then PLEASE spell them out here!

#6320
#4714
#4659
#6700
#6106
#5511
#6146
#6121

mkoo · 2023-10-06T06:10:36Z

The form help text at the top of the page is better (except for a typo)
currently:

 FLAT is the cache which supports most searches and provides the data available in many Arctos views. Changes to underlying data should propate to FLAT and become visible.

Updates and catalog record entries are processed in the order they enter the queue. Most updates are available everywhere within a minute, occasionally large changes can necessitate days or even weeks of processing.

Very occasionally, a change to underlying data is not reflected in FLAT. This should be reported in Issues. Individual records may be marked for refresh by any Operator with access. Larger jobs may be coordinated with the DBA team.

FILTERED_FLAT serves the same purpose to public users.

More discussion: https://github.com/ArctosDB/arctos/issues/6705

My only comment is to fix the typo ("propate" should be "propagate").
When completed, please close this issue! @dustymc

dustymc · 2024-07-17T15:06:13Z

Commenting here because this Issue is linked from flat status.

Someone (or some intersection, I'm not sure) loaded a bunch of 'no data' records, which resulted in a very large locality (and maybe event) merge, after which tens (hundreds?) of thousands of records need to refresh. This resulted in

How can we better communicate any of this?

dustymc · 2024-07-17T16:25:06Z

Addressing @AdrienneRaniszewski's comment from #7946 (comment) here in an attempt to centralize information.

"better explanations right on the flat_status page would be great, so I don't have to sort through github discussions."

Can someone please help with appropriate text?

"A way to figure out how long I need to wait would also be super useful, I think."

I don't have that information. This is often caused by incoming changes - the count is going up and down at the same time, occasionally from multiple sources. The rate of change also influences both the rate at which other things can change, and the rate at which refreshes can be processed.

"how many records are processed per minute."

I don't have that information. The processor just uses what resources it can, and the data across records is wildly different. I could make it more predictable, but I'd have to aim that at the worst case scenario. Wild guess, that's maybe 500 records per minute, and for comparison it's running at around 3000 right now. I've done everything I can to make things more efficient, and part of the cost is a lack of transparency.
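As a back-of-the-envelope illustration of those two figures (and nothing more precise than that), the sketch below divides a backlog by a processing rate. The function name `estimate_wait_minutes` is hypothetical, the backlog size borrows the "millionish" figure from earlier in this thread, and the calculation assumes no new changes arrive while the queue drains, which is rarely true in practice.

```python
def estimate_wait_minutes(queued_records, records_per_minute):
    """Rough time to drain a backlog, ignoring changes that arrive meanwhile."""
    if records_per_minute <= 0:
        raise ValueError("records_per_minute must be positive")
    return queued_records / records_per_minute


backlog = 1_000_000                          # assumed backlog size, for illustration
slow = estimate_wait_minutes(backlog, 500)   # the worst-case wild guess
fast = estimate_wait_minutes(backlog, 3000)  # the rate mentioned as current
print(f"{slow / 60:.1f} h to {fast / 60:.1f} h")
```

Under those assumptions the same backlog could take anywhere from a few hours to more than a day, which is why a single wait-time number would be misleading.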

"where are my records in the queue."

Absolutely no idea. It tries for first in / first out, but that's not very informative.

"Until then, I'm just periodically searching for my records and crossing my fingers."

Very open to better ideas, no matter how radical. (Not allowing public access to the primary DB might mean - after a LOT of code rewrites - that we don't need a cache, or we could potentially try to make updates less often in larger chunks, or ????????? Or throwing massive amounts of hardware at the current environment is usually the cheapest option, if someone wants to tackle that.)

"But luckily this extreme delay doesn't happen often. Thanks Dusty!"

I think still something well over 90% of the time the cache is less than a minute stale. I understand that's not very comforting when you're waiting for something to happen.

campmlc · 2024-07-17T16:38:19Z

We hear this over and over that the cause of so many problems is "limited resources". What do we do to fix this? What "resources" do we need to acquire? What grant do we need to write or what hardware or cloud storage do we need to pay for? Do we have a list of this somewhere?
And shouldn't things like locality merges on a subset of records take lower priority than new entry? Are we dealing with one single pipeline or lane for all inputs, so that a single issue can delay everything for everyone?

dustymc · 2024-07-17T19:17:03Z

"limited resources"

https://github.com/ArctosDB/internal/issues/330

"priority"

https://github.com/ArctosDB/PG/issues/30

and from #7946 (comment):

"collection_cde basis"

#7952

Also #7953 from the same comment.

ufarrell added the Priority-High (Needed for work) and Bug labels Sep 4, 2023

dustymc mentioned this issue Sep 19, 2023

Feature Request - more user friendly info on cache status #5511

Closed

mkoo changed the title ~~Missing specimen records after entry~~ Delay in viewing specimen records after entry Oct 6, 2023

mkoo added the Display/Interface label and removed the Bug label Oct 6, 2023

dustymc added this to the 3.2 milestone Oct 6, 2023

dustymc closed this as completed Oct 6, 2023

This was referenced Feb 14, 2024

Records appearing blank after bulkloading #7415

Closed

search using attribute remark fails #7421

Closed

dustymc mentioned this issue Mar 11, 2024

Blank/Missing Records From Bulkloader #7517

Closed

dustymc mentioned this issue Jul 17, 2024

disappearing bulkloaded records #7946

Closed
