Parsing structured data (ld+json) #16

Open
spekulatius opened this issue Aug 24, 2020 · 4 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@spekulatius
Owner

It would make sense to parse the structured data (JSON-LD) that some sites provide within the head tag. This way, the information already read from the meta tags could be made more robust and possibly extended later on.

Ref: https://developers.google.com/search/docs/data-types/article

@spekulatius
Owner Author

Context: https://json-ld.org/

@spekulatius added the enhancement and help wanted labels Sep 11, 2020
@spekulatius added this to the 2.0 milestone Oct 28, 2022
@eposjk

eposjk commented Dec 8, 2022

Some thoughts:

A page can contain multiple JSON-LD blocks. It seems possible to combine them (https://stackoverflow.com/a/48295719); we should probably use the array notation:

[
  {
     "@context": "http://schema.org",
     "@type": "Organization"
  },
  {
     "@context": "http://schema.org",
     "@type": "BreadcrumbList"
  }
]

Would it make sense to always return an array, even if the page contains only one JSON-LD block? (Probably yes.)
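
A minimal sketch of the "always return an array" idea, assuming PHP's built-in DOM extension. The function name `extractJsonLdBlocks` is hypothetical and not part of this library's API:

```php
<?php
// Hypothetical helper: returns a list of decoded JSON-LD blocks,
// even when the page contains exactly one block (or none at all).
function extractJsonLdBlocks(string $html): array
{
    // Real-world HTML is rarely strict; collect libxml warnings internally.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $blocks = [];
    foreach ($dom->getElementsByTagName('script') as $script) {
        if ($script->getAttribute('type') !== 'application/ld+json') {
            continue;
        }
        $decoded = json_decode($script->textContent, true);
        if (is_array($decoded)) {
            // Each block becomes one list entry, so callers can always iterate.
            $blocks[] = $decoded;
        }
    }
    return $blocks;
}

$html = '<html><head><script type="application/ld+json">'
      . '{"@context": "http://schema.org", "@type": "Organization"}'
      . '</script></head><body></body></html>';

$blocks = extractJsonLdBlocks($html);
```

With this shape, a page with a single block yields a one-element list, so calling code does not need a special case.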

@spekulatius
Owner Author

Hey @eposjk,

good point on the multiple ld+json blocks.

Yeah, if data exists in multiple positions we should go for an array. It might be only one element, but at least it's future-proof. Merging blocks into one might be an option too.

Cheers,
Peter
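
A rough sketch of the "merge blocks into one" option, assuming the blocks have already been decoded into PHP arrays. `mergeJsonLdBlocks` is a hypothetical name, and hoisting a shared `@context` is only one possible design, not something this thread has settled on:

```php
<?php
// Hypothetical helper: wraps decoded JSON-LD blocks in a single "@graph".
// If every block declares the same "@context", it is hoisted to the top level;
// otherwise each node keeps its own "@context".
function mergeJsonLdBlocks(array $blocks): array
{
    $blocks = array_values($blocks);

    // Serialize contexts so strings and nested arrays compare reliably.
    $encoded = array_unique(array_map(
        static fn (array $block): string => json_encode($block['@context'] ?? null),
        $blocks
    ));

    if (count($encoded) === 1 && $encoded[0] !== 'null') {
        $context = $blocks[0]['@context'];
        $nodes = array_map(static function (array $block): array {
            unset($block['@context']);
            return $block;
        }, $blocks);
        return ['@context' => $context, '@graph' => $nodes];
    }

    return ['@graph' => $blocks];
}

$merged = mergeJsonLdBlocks([
    ['@context' => 'http://schema.org', '@type' => 'Organization'],
    ['@context' => 'http://schema.org', '@type' => 'BreadcrumbList'],
]);
```

Returning a plain list of blocks stays simpler for consumers; a merged document like this would mainly help if the result needs to be re-emitted as one JSON-LD script.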

@joshua-bn

This is what I'm using:

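        $jsonLd = [];
        foreach ($dom->getElementsByTagName('script') as $script) {
            if ($script->getAttribute('type') === 'application/ld+json') {
                // Strip /* ... */ comments; the "s" modifier lets "." span line breaks.
                $json_txt = preg_replace('@/\*.*?\*/@s', '', $script->textContent);
                // Replace raw line breaks, which are invalid inside JSON strings.
                $json_txt = preg_replace("/\r|\n/", ' ', trim($json_txt));
                $schema = json_decode($json_txt, true);
                if (!is_array($schema)) {
                    continue; // skip empty or invalid JSON blocks
                }
                if (isset($schema['@graph']) && is_array($schema['@graph'])) {
                    // array_merge appends; "+=" would skip numeric keys that already exist.
                    $jsonLd = array_merge($jsonLd, array_values($schema['@graph']));
                } else {
                    $jsonLd[] = $schema;
                }
            }
        }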
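
One detail worth keeping in mind when collecting `@graph` nodes from several blocks: PHP's array union operator (`+` / `+=`) keeps entries whose keys already exist on the left side, whereas `array_merge` appends and renumbers numeric keys. A small illustration with made-up sample data:

```php
<?php
// Nodes already collected from an earlier block (numeric key 0).
$collected = [['@type' => 'Organization']];
// Nodes from a later block's "@graph" (numeric keys 0 and 1).
$graph = [['@type' => 'BreadcrumbList'], ['@type' => 'WebSite']];

// Union: key 0 already exists, so the BreadcrumbList node is silently dropped.
$union = $collected + $graph;

// array_merge renumbers numeric keys, so every node is kept.
$merged = array_merge($collected, $graph);
```

Here `$union` ends up with two nodes while `$merged` has all three.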

3 participants