Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add basic support for setStatusMessage #1790

Merged
merged 4 commits into from
Mar 2, 2023
Merged

Conversation

barjin
Copy link
Contributor

@barjin barjin commented Feb 15, 2023

No description provided.

@barjin barjin marked this pull request as draft February 15, 2023 12:35
packages/basic-crawler/src/internals/basic-crawler.ts Outdated Show resolved Hide resolved
packages/memory-storage/src/memory-storage.ts Outdated Show resolved Hide resolved

const log = async () => {
const currentStats = this.stats.calculate();
await client.setStatusMessage?.(`Crawled ${this.handledRequestsCount}/${currentStats.requestsTotal} pages.`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we wanted to have the messages a bit smarter, our initial thoughts were this:

  • as long as the crawler runs normally, we could write something like Crawling 5/123 pages, 8 requests failed, 5 results in default dataset., we’d have to define what means the “normally”, e.g. as long as there are no hard failures (requests that reached all retries), we could consider normal
  • when we see higher error rate, we could print something about the errors, e.g. the most common error, maybe something from the retry histogram or statistics object?
  • maybe some stats about average response time?

https://apifier.slack.com/archives/CG3EJESUX/p1671462495090689

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit updated / more granular now, but the final format is still WIP - imo in this case, less is more - status message should inform you about the run status as fast as possible - having long, convoluted messages filled with data isn't the way to go (again, just my opinion, I've seen the Slack thread)

@B4nan B4nan changed the title feat: setStatusMessage for MemoryStorage feat: add basic support for setStatusMessage Mar 2, 2023
@B4nan B4nan merged commit c318980 into master Mar 2, 2023
@B4nan B4nan deleted the feat/setStatusMessage branch March 2, 2023 10:02
@barjin
Copy link
Contributor Author

barjin commented Mar 3, 2023

Let's release a new version so the changes can be implemented in the sdk | local-storage ?

@B4nan
Copy link
Member

B4nan commented Mar 3, 2023

Let me check the E2E tests first, I'd like to see how this works on the platform.

@B4nan
Copy link
Member

B4nan commented Mar 3, 2023

I was expecting the status message to be printed to the console logs, which apparently it is not. I'd like the behaviour to be the same regardless of the storage being used, it should always log to console, and if apify client is used, it will also make the API call.

Other than that, it seems to be working fine.

@barjin
Copy link
Contributor Author

barjin commented Mar 3, 2023

Well, we haven't released yet, this is still doable. I would argue that the output will be a bit duplicit on the Platform (seeing the new status message both in the log and the status message pill), but it's probably better than having different logs locally and on the platform 🤔 Let's log from Crawlee directly and noop the statusMessage in the local storage implementations

@B4nan
Copy link
Member

B4nan commented Mar 3, 2023

Well, we haven't released yet, this is still doable. I would argue that the output will be a bit duplicit on the Platform (seeing the new status message both in the log and the status message pill)

The log should be the main source of truth, and only there you can see the history - status message changes over time.

@B4nan
Copy link
Member

B4nan commented Mar 3, 2023

Yeah I don't mind logging from crawlee. I think it sounds sane to have setStatusMessage on storage level that only handles API call (and is no-op in memory/local), and handle the actual logging to console on higher level, e.g. we already do that with Actor.exit, that is the method that logs the message to console + calls setStatusMessage to make the API call.

@barjin
Copy link
Contributor Author

barjin commented Mar 3, 2023

pr in here #1808

@metalwarrior665
Copy link
Member

Agree with log everywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants