
update the response result format #407

Closed
mirsella opened this issue Oct 19, 2024 · 9 comments

Comments

@mirsella

The response result format documented in the README is no longer correct:

{
  url: String,
  title: String,
  description: String,
  image: String,
  author: String,
  favicon: String,
  content: String,
  published: Date String,
  type: String, // page type
  source: String, // original publisher
  links: Array, // list of alternative links
  ttr: Number, // time to read in seconds, 0 = unknown
}

The latest release broke my application. For example, author is now an object like:

  "author": {
    "@type": "Person",
    "name": "Chris Baraniuk"
  },

and published, when no date is found, is now null instead of an empty string.
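A minimal consumer-side workaround, assuming only the two shapes reported above: author as either a plain string or a JSON-LD Person-like object with a name key, and published as either a string or null. The helper names normalizeAuthor and normalizePublished are hypothetical and are not part of the library; they simply coerce both fields back to the documented string format.

```typescript
// Hypothetical helper: returns the author as a string.
// Assumes author is either a string or a JSON-LD Person-like object ({ "@type": "Person", name: "..." }).
function normalizeAuthor(author: unknown): string {
  if (typeof author === "string") return author;
  if (author !== null && typeof author === "object") {
    const name = (author as Record<string, unknown>)["name"];
    // Only trust a string name; any other shape is treated as unknown.
    if (typeof name === "string") return name;
  }
  return "";
}

// Hypothetical helper: the documented format uses an empty string when no date is found.
function normalizePublished(published: unknown): string {
  return typeof published === "string" ? published : "";
}

console.log(normalizeAuthor({ "@type": "Person", name: "Chris Baraniuk" })); // Chris Baraniuk
console.log(JSON.stringify(normalizePublished(null))); // ""
```

Other JSON-LD shapes (for example an array of authors) are not handled by this sketch and fall back to an empty string.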

@ndaidong (Collaborator) commented Oct 19, 2024

@mirsella Yes, the latest update affects published: it should be an empty string instead of null when no date value is detected.
However, it has nothing to do with the author property unless the target page contains JSON-LD data.
Could you share your article URL?

@mirsella (Author)

Hey!
For example, this URL: https://www.bbcearth.com/news/the-winged-giant-that-was-bigger-than-t-rex

returns this as author:

  "author": {
    "@type": "Person",
    "name": "Chris Baraniuk"
  },

And for published, it returns null instead of the usual empty string when no date is found. This is on the latest version of the library, tested with both the Bun and Deno versions:

  "published": null,

@mirsella (Author)

On your demo site https://extractor-demos.pages.dev/article-extractor this is still an empty string, so I guess the change is in a recent version that hasn't been deployed to the site yet.

@ndaidong (Collaborator)

@mirsella Yes, we should avoid the inconsistency. I will release a new version.

@mirsella (Author)

Personally, I've adapted my code to match what the API returns, so I don't "care" anymore. Still, updating the documented schema would help new users, and breaking changes like this should be avoided in a minor version bump (the changes to both author and published are breaking).

Thanks, have a good day!

@ndaidong (Collaborator)

@mirsella You are right, I didn't check the recent PR carefully!

ndaidong added a commit that referenced this issue Oct 19, 2024
- Fix inconsistent output (#407)
- Modify some stuff at LdJson extraction (#405)
  - Only use value from LdJson if missed from meta tags
  - Only accept string value from LdJson
  - Stop converting LdJson value to lowercase
ndaidong mentioned this issue Oct 19, 2024
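The LdJson rules in the commit message above can be illustrated with a small precedence function. This is a sketch under stated assumptions, not the library's actual code: pickValue, metaValue and ldJsonValue are hypothetical names, an empty meta tag value is assumed to count as "missed", and the empty-string fallback is assumed to match the convention for missing values.

```typescript
// Hypothetical sketch of the precedence rules from the commit message:
//  1. Use the meta tag value when it is present.
//  2. Fall back to the LD-JSON value only if the meta tag value is missing.
//  3. Accept the LD-JSON value only when it is a string (objects such as a
//     JSON-LD Person are ignored).
//  4. Keep the LD-JSON string as-is (no lowercasing).
function pickValue(metaValue: string | undefined, ldJsonValue: unknown): string {
  if (metaValue) return metaValue; // rule 1 (assumes "" counts as missing)
  if (typeof ldJsonValue === "string") return ldJsonValue; // rules 2 to 4
  return ""; // assumed fallback: empty string when nothing usable is found
}

console.log(pickValue("Meta Author", "LD Author")); // Meta Author
console.log(pickValue(undefined, "Chris Baraniuk")); // Chris Baraniuk
console.log(JSON.stringify(pickValue(undefined, { "@type": "Person", name: "Chris Baraniuk" }))); // ""
```

Under these rules, a JSON-LD author object like the one from the bbcearth.com page can never leak into the output; at worst the field falls back to an empty string.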
@mirsella (Author) commented Oct 19, 2024

Awesome!

The only thing missing is an updated response format in the README. I would have opened a PR, but I don't know the project well enough to say exactly what can be assigned to the author field (such as the @type object).

@ndaidong (Collaborator)

@mirsella I'm keeping the output structure as before: published and author must be strings, so there is no change.
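A minimal TypeScript rendering of the documented result format from the top of this thread, plus a runtime check a consumer could use to catch a regression like the one reported here. The interface name ArticleData and the guard hasStringFields are hypothetical, and the string[] element type for links is an assumption (the schema only says Array).

```typescript
// Assumed TypeScript rendering of the documented response format.
interface ArticleData {
  url: string;
  title: string;
  description: string;
  image: string;
  author: string;
  favicon: string;
  content: string;
  published: string; // date string, or "" when no date is found
  type: string; // page type
  source: string; // original publisher
  links: string[]; // list of alternative links (element type assumed)
  ttr: number; // time to read in seconds, 0 = unknown
}

// Hypothetical guard: checks that the two fields discussed in this issue are strings.
function hasStringFields(result: unknown): result is Pick<ArticleData, "author" | "published"> {
  if (result === null || typeof result !== "object") return false;
  const r = result as Record<string, unknown>;
  return typeof r["author"] === "string" && typeof r["published"] === "string";
}

console.log(hasStringFields({ author: "Chris Baraniuk", published: "" })); // true
console.log(hasStringFields({ author: { "@type": "Person", name: "Chris Baraniuk" }, published: null })); // false
```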

@mirsella (Author)

Oh OK, so author is getting its previous behavior back too. Nice!
