Sorting/Ordering is changed, but no change in content #189

Closed
Hans-Maulwurf opened this issue Sep 26, 2023 · 11 comments · Fixed by #190
Labels
pinned (prevent from automatically closing due to inactivity), question (further information is requested)

Comments

@Hans-Maulwurf

Hans-Maulwurf commented Sep 26, 2023

Hey,

First of all, I'm happy I found your tool; it's very useful.

I have a special case for change tracking. I want to keep track of changes on the page https://antcheck.info/species/Colobopsis_leonardi
It shows cards with prices. The cards seem to be generated, so their order changes with every request, and website-stalker naturally reports this as a change. Is there a way to reorder or sort the elements so that a change is only detected when the content really changes? At the moment I use this config:

  - url: https://antcheck.info/species/Colobopsis_leonardi
    editors:
      - css_select: .card-body
      - css_remove: img
      - regex_replace:
          pattern: "Last updated: \\d+ (hour|minute)(s)? ago"
          replace: ""
      - html_prettify

@Hans-Maulwurf added the question label on Sep 26, 2023

@EdJoPaTo
Owner

I thought about something like this but didn't continue with it, as I had no important use case.

An idea I had was something like this:

editors:
  - css_sort: .card-body
  # or
  - css_select:
      selector: .card-body
      sort: true
  # or
  - css_select:
      selector: .card-body
      sort: .badge-pill

The second and third options would be a bit more complicated to implement but seem more natural to use.

Basically this would need a selector to get a list of the items that should be sorted. This list can then be sorted by outerHTML or by a selector.
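A minimal Python sketch of that idea, not website-stalker code: the selected elements are represented as plain HTML strings, and sorting them gives the same order no matter how the page shuffled them. The `sort_fragments` helper, the regex used in place of a real CSS selector, and the sample cards are all assumptions made up for illustration.

```python
import re


def sort_fragments(fragments, key_pattern=None):
    """Return the fragments in a deterministic order.

    Without key_pattern the whole fragment (its outer HTML) is the sort key.
    With key_pattern the first regex match inside each fragment is the key,
    and the full fragment breaks ties.
    """
    def key(fragment):
        if key_pattern is None:
            return (fragment, "")
        match = re.search(key_pattern, fragment)
        return (match.group(0) if match else "", fragment)

    return sorted(fragments, key=key)


# Invented sample data: the same two cards, delivered in a different order.
cards_run_1 = [
    '<div class="card-body"><span class="badge-pill">Shop B</span> 12.00</div>',
    '<div class="card-body"><span class="badge-pill">Shop A</span> 15.00</div>',
]
cards_run_2 = list(reversed(cards_run_1))

# Regex stand-in for a sub-selector such as ".badge-pill" (hypothetical).
BADGE = r'<span class="badge-pill">.*?</span>'

by_outer_html = sort_fragments(cards_run_1) == sort_fragments(cards_run_2)
by_badge = sort_fragments(cards_run_1, BADGE) == sort_fragments(cards_run_2, BADGE)
print(by_outer_html, by_badge)  # prints: True True
```

Sorting by the whole fragment is the simplest option. Sorting by a sub-element keeps the order stable even when other parts of a card, such as the price, change.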

This would also allow to add something like unique to filter out duplicated items:

editors:
  - css_select:
      selector: .card-body
      unique: true

I haven't thought much about how to implement it. Basically it's just an idea how it could be used afterwards. Any thoughts on this?
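As a rough illustration of what `unique` could mean, here is a small Python sketch that drops repeated fragments while keeping the first occurrence of each. The helper name and the sample strings are assumptions; how website-stalker would implement this is not specified in this thread.

```python
def unique(fragments):
    """Drop duplicate fragments, keeping the first occurrence of each."""
    seen = set()
    result = []
    for fragment in fragments:
        if fragment not in seen:
            seen.add(fragment)
            result.append(fragment)
    return result


# Invented sample data with one repeated card.
cards = ["<div>Shop A</div>", "<div>Shop B</div>", "<div>Shop A</div>"]
print(unique(cards))  # prints: ['<div>Shop A</div>', '<div>Shop B</div>']
```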

@Hans-Maulwurf
Author

Well, something like that would help, I guess ;)

But on the other hand, I found that this site has a separate API
https://antcheck.info/api/docs

The response is JSON, so I use json_prettify. But it contains "id" values that change from time to time, so again a change is detected when the content hasn't really changed ;) If there were an editor like json_remove (similar to css_remove), I could use the API.
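A sketch of what such a removal step could do, written as standalone Python rather than as a website-stalker editor (json_remove does not exist as of this thread). The function names and the sample payloads are invented for illustration; the real API response may be shaped differently.

```python
import json


def remove_keys(value, keys):
    """Recursively drop the given keys from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: remove_keys(v, keys) for k, v in value.items() if k not in keys}
    if isinstance(value, list):
        return [remove_keys(item, keys) for item in value]
    return value


def normalize(raw, keys=frozenset({"id"})):
    """Parse JSON text, drop volatile keys, and pretty-print the rest."""
    cleaned = remove_keys(json.loads(raw), keys)
    return json.dumps(cleaned, indent=2, sort_keys=True)


# Invented payloads that differ only in the volatile "id" value.
first = '[{"id": 1, "shop": "Example Shop", "price": "12.00"}]'
second = '[{"id": 2, "shop": "Example Shop", "price": "12.00"}]'
print(normalize(first) == normalize(second))  # prints: True
```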

By the way, is there a way to get more details on an error? I often get
ERROR ... expected value at line 1 column 1

but I can't see what went wrong. Maybe it's rate limiting or something? The delay between calls to the same URL (with different parameters) isn't configurable, is it?

@EdJoPaTo
Owner

Currently the delay between calls to the same domain is 5 seconds, and it is not configurable. Would it be interesting to be able to configure that?

An HTTP error would already cause a failure. So the request seems to get a successful but empty response, which cannot be parsed as JSON?
There is currently no more information than the error printout. I am not sure what could be done better there. Maybe print which editor is failing?
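To illustrate why an empty or non-JSON body fails at the very first character, here is a small Python sketch. It uses Python's own `json` module, so the wording of the message differs from the one website-stalker prints; the `describe_json_error` helper is made up for this example.

```python
import json


def describe_json_error(body):
    """Return a short description of the JSON parse error, or None if body parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as error:
        return f"{error.msg} at line {error.lineno} column {error.colno}"
    return None


print(describe_json_error(""))          # Expecting value at line 1 column 1
print(describe_json_error("<html>"))    # Expecting value at line 1 column 1
print(describe_json_error('{"a": 1}'))  # None
```

An empty body and an HTML error page both fail at line 1 column 1, so the position alone does not tell which of the two happened.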

json_select / json_remove is the goal of #77, but it was never really needed, so I haven't approached it yet. Looks like you are the first one who is interested in it.

@Hans-Maulwurf
Author

Well, I think the "best" improvement would be sorting/ordering HTML elements. If that worked, I wouldn't need the debug option or json_remove. It would have to be designed so that elements can be sorted by a sort key that sits inside the element, in some sub-element.

Sorting/reordering with regex doesn't really seem possible.

@Hans-Maulwurf
Author

@EdJoPaTo do you know whether you'll be able to develop this feature in the near future?

@EdJoPaTo
Owner

I would like this feature myself, so it's definitely on my todo list. I'm not sure exactly when I'll have time for it, but I'd like to say sooner rather than later. Thank you for reminding me to increase its priority 😇

@EdJoPaTo
Owner

> btw is there a way to get more details or an error? I often get ERROR ... expected value at line 1 column 1

I improved the error message on editor errors (see cea6b73). It now looks like this:

ERROR: https://edjopato.de/post/ in editor[4] json_prettify: expected value at line 1 column 1

The sorting is its own part which I haven't approached yet.

@EdJoPaTo
Owner

Current working state is something like this:

- css_sort:
    selector: article
    sort_by: # here you can use every editor again which is applied to every selected html element
      - css_select: a
      - html_sanitize

In my test case I found that I need html_sanitize because the links are irregular: their attributes differ from link to link. But I only found that out because I could add debug prints to website-stalker while trying to understand what was happening. Users of website-stalker can't do that. I'm not sure how to deal with something like that in a useful way.
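The effect described here can be reproduced with a small Python sketch, assuming that the sort key is derived by running each element through a list of steps. Everything in it is hypothetical: `select_links` and `strip_attributes` are crude regex stand-ins for css_select and html_sanitize, whose real behaviour in website-stalker is not described in this thread, and the sample articles are invented.

```python
import re


def select_links(fragment):
    """Stand-in for `css_select: a`: keep only the <a> elements."""
    return "".join(re.findall(r"<a\b[^>]*>.*?</a>", fragment, flags=re.S))


def strip_attributes(fragment):
    """Stand-in for a sanitize step: drop all attributes from opening tags."""
    return re.sub(r"<(\w+)\b[^>]*>", r"<\1>", fragment)


def sort_by_pipeline(fragments, steps):
    """Sort fragments by the value produced by applying every step in order."""
    def key(fragment):
        value = fragment
        for step in steps:
            value = step(value)
        return value

    return sorted(fragments, key=key)


def link_texts(fragments):
    """Extract the link text of each fragment to make the order easy to read."""
    return [re.search(r">(\w+)</a>", f).group(1) for f in fragments]


# Invented sample data: the links carry different attributes.
articles = [
    '<article><a class="x" href="/b">Beta</a> text</article>',
    '<article><a href="/a">Alpha</a> text</article>',
]

raw_order = link_texts(sort_by_pipeline(articles, [select_links]))
clean_order = link_texts(sort_by_pipeline(articles, [select_links, strip_attributes]))
print(raw_order)    # prints: ['Beta', 'Alpha']
print(clean_order)  # prints: ['Alpha', 'Beta']
```

Without the attribute-stripping step, the order is decided by the attributes (`class` sorts before `href`) rather than by the link text, which is hard to spot without printing the intermediate keys.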

This comment was marked as outdated.

@github-actions bot added the stale label on Dec 19, 2023
@EdJoPaTo removed the stale label on Dec 19, 2023
@EdJoPaTo
Owner

I am still not happy with the current approach as it’s hard to understand what’s going on…

This comment was marked as outdated.

@github-actions bot added the stale label on Feb 18, 2024
@EdJoPaTo added the pinned label and removed the stale label on Feb 20, 2024