
[ BUG ] Premature end of data in tag url line 1 #413

Closed
miladmeidanshahi opened this issue Jun 3, 2023 · 2 comments

Comments

@miladmeidanshahi

Hi,

I wrote a script that generates custom sitemap data, but it sometimes fails: with around 500 items it works perfectly, but once the data grows to around 3,000 items the error below is raised. When I manually append the missing </url></urlset> closing tags to the generated file the problem goes away, but why?

xml is invalid Error: Command failed: xmllint --schema /home/sitemap-generator/node_modules/.pnpm/sitemap@7.1.1/node_modules/sitemap/schema/all.xsd --noout -
-:1: parser error : Premature end of data in tag url line 1
AF%DB%8C%D8%AF%DB%8C%20%D9%86%D8%B4%D8%B1%20%D8%B4%D9%85%D8%B4%D8%A7%D8%AF</loc>
                                                                               ^

    at ChildProcess.exithandler (node:child_process:419:12)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1091:16)
    at ChildProcess._handle.onexit (node:internal/child_process:302:5) {
  code: 1,
  killed: false,
  signal: null,
  cmd: 'xmllint --schema /home/milad/Public/Projects/sitemap-generator/node_modules/.pnpm/sitemap@7.1.1/node_modules/sitemap/schema/all.xsd --noout -'
} -:1: parser error : Premature end of data in tag url line 1
AF%DB%8C%D8%AF%DB%8C%20%D9%86%D8%B4%D8%B1%20%D8%B4%D9%85%D8%B4%D8%A7%D8%AF</loc>

My script:

#!/usr/bin/env node
import { createWriteStream, createReadStream } from 'fs'

import yargs from 'yargs'
import { hideBin } from 'yargs/helpers'
import { createGzip } from 'zlib'
import { xmlLint, parseSitemap, SitemapStream } from 'sitemap'

class HTTPResponseError extends Error {
  constructor(response) {
    super(`HTTP Error Response: ${response.status} ${response.statusText}`)
    this.response = response
  }
}

const checkStatus = response => {
  if (response.ok) {
    // response.status >= 200 && response.status < 300
    return response
  } else {
    throw new HTTPResponseError(response)
  }
}

const argv = yargs(hideBin(process.argv)).argv

if (argv.url) {
  if (!/^(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,=.]+$/.test(argv.url)) {
    console.log('URL is not valid!')
    process.exit(0)
  }

  try {
    const URL = argv.url
    const sitemap = new SitemapStream({
      hostname: URL,
      lastmodDateOnly: true,
      xmlns: { // XML namespaces to turn on - all by default
        news: true,
        xhtml: true
      }
    })

    sitemap.pipe(createGzip())

    const writeStream = createWriteStream(argv.output ?? './sitemap.xml')

    sitemap.pipe(writeStream)

    const request = await fetch(`${URL}/api/sandbox/settings/sitemap`)

    checkStatus(request)
    
    const data = await request.json()

    data.products.forEach(({ id, name }) => {
      sitemap.write({
        url: `${URL}/products/${id}/${name}`,
        lastmod: new Date(),
        changefreq: 'weekly',
        priority: 0.9
      })
    })

    data.categories.forEach(category => {
      const typeOfCategory = () => {
        if (category.is_tag) return 'tag'
        if (category.is_brand) return 'brand'
        if (!category.is_brand && !category.is_tag) return 'category'
      }
      sitemap.write({
        url: `${URL}/collections?filter=${typeOfCategory()}&filter_title=${category.name}`,
        changefreq: 'weekly',
        priority: 0.9
      })
    })

    sitemap.end()

    console.log(URL)
    console.log('Products', data.products.length)
    console.log('Categories', data.categories.length)
    console.log('Successfully generated.')

    xmlLint(createReadStream(argv.output ?? './sitemap.xml')).then(
      () => console.log('xml is valid'),
      ([err, stderr]) => console.error('xml is invalid', err, stderr)
    )
  } catch (error) {
    console.error(error)

    const errorBody = await error.response.text()
    console.error(`Error body: ${errorBody}`)
  }
} else {
  console.log('URL is required! pass --url https://example.com')
}
@huntharo
Contributor

It's happening because you are not waiting for the streams to close, so any buffered contents are never flushed to the file. Streams are async, but most Node.js devs do not seem to realize that. Even write() is technically async and needs you to wait for its callback (or a 'drain' event) before throwing another 10k writes at it.
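The point about write() being async can be sketched with Node's stdlib alone. This is an illustrative example, not code from the script above; the file name is made up:

```javascript
import { createWriteStream, readFileSync } from 'fs';
import { once } from 'events';
import { finished } from 'stream/promises';

const out = createWriteStream('./backpressure-demo.txt');

for (let i = 0; i < 50000; i++) {
  // write() returns false when the internal buffer is full;
  // wait for 'drain' before writing more instead of ignoring the signal.
  if (!out.write(`line ${i}\n`)) {
    await once(out, 'drain');
  }
}
out.end();
await finished(out); // everything is flushed to disk past this point

const count = readFileSync('./backpressure-demo.txt', 'utf8').trim().split('\n').length;
console.log(count); // 50000
```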

To fix the race condition you need to add this:

import { finished } from 'stream/promises';

// [...]

await finished(sitemap);
await finished(writeStream);
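As a minimal, self-contained sketch of why those awaits matter (Node stdlib only; the PassThrough stands in for the SitemapStream here, and the file name is illustrative):

```javascript
import { createWriteStream, readFileSync } from 'fs';
import { PassThrough } from 'stream';
import { finished } from 'stream/promises';

const source = new PassThrough();
const writeStream = createWriteStream('./finished-demo.txt');
source.pipe(writeStream);

for (let i = 0; i < 10000; i++) {
  source.write(`<url><loc>https://example.com/${i}</loc></url>\n`);
}
source.end();

// Without these two awaits, reading the file back here could observe a
// truncated document - which is exactly what xmllint is complaining about.
await finished(source);
await finished(writeStream);

const lines = readFileSync('./finished-demo.txt', 'utf8').trim().split('\n');
console.log(lines.length); // 10000
```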

@huntharo

@derduher - I think we can close this as a duplicate of #362
