gatsby-transformer-xml README with wrong example #13773

violetamenendez · 2019-05-01T15:30:49Z

Summary

The 'How to query' section of the README for gatsby-transformer-xml package is wrong.

Relevant information

I ran the example from this page to better understand how queries work. I saved the book.xml file into my project, and when I query it with GraphQL using the query from the example:

{
  allBooks {
    edges {
      node {
        content
      }
    }
  }
}

what I get is:

{
  "data": {
    "allBooksXml": {
      "edges": [
        {
          "node": {
            "content": ""
          }
        },
        {
          "node": {
            "content": ""
          }
        }
      ]
    }
  }
}

when the example says I should be getting this:

{
  allBooks: {
    edges: [
      {
        node: {
          content: "Gambardella, Matthew",
        },
      },
      {
        node: {
          content: "XML Developer's Guide",
        },
      },
    ]
  }
}

As far as I understand, the books are the nodes, so it makes sense to get back content: "" from the nodes themselves; you have to go one layer deeper to get the content.
So with the following query:

{
  allBooksXml {
    edges {
      node {
        name
        content
        xmlChildren {
          content
        }
      }
    }
  }
}

I get all of this data, which is more than just the author and title shown in the example:

{
  "data": {
    "allBooksXml": {
      "edges": [
        {
          "node": {
            "name": "book",
            "content": "",
            "xmlChildren": [
              {
                "content": "Gambardella, Matthew"
              },
              {
                "content": "XML Developer's Guide"
              },
              {
                "content": "Computer"
              },
              {
                "content": "44.95"
              },
              {
                "content": "2000-10-01"
              },
              {
                "content": "An in-depth look at creating applications\n      with XML."
              }
            ]
          }
        },
        {
          "node": {
            "name": "book",
            "content": "",
            "xmlChildren": [
              {
                "content": "Ralls, Kim"
              },
              {
                "content": "Midnight Rain"
              },
              {
                "content": "Fantasy"
              },
              {
                "content": "5.95"
              },
              {
                "content": "2000-12-16"
              },
              {
                "content": "A former architect battles corporate zombies,\n      an evil sorceress, and her own childhood to become queen\n      of the world."
              }
            ]
          }
        }
      ]
    }
  }
}

Is this example wrong, or am I completely misunderstanding how this should work and doing something wrong?

DSchau · 2019-05-01T18:09:05Z

Hi Violet!

Thanks for reporting this--this definitely seems to be a gap in the current documentation.

Could you provide a reproduction so we could experiment with this, ourselves?

Thank you!

eclectic-coding · 2019-05-01T18:22:14Z

Very good. I stepped through this one as well, and you are correct. It looks like the How to Query section is incorrect and needs to be fixed. Why don't you submit a PR?

Make sure you read the How to Contribute.

Or you could put up an example repo for testing.

eclectic-coding · 2019-05-01T18:27:03Z

@DSchau I just posted that. 👍

If it would help, here is a bare-bones repository I set up to test this one locally.

jonniebigodes · 2019-05-01T18:32:38Z

@violetamenendez I've just created a Gatsby website using the hello world starter and added the needed dependencies.

Upon testing it, I found that the example is not entirely accurate.

Following the documentation, I created a folder called content and added the XML content inside a file called book.xml. I then configured gatsby-source-filesystem to point to where the file is held.
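For reference, a minimal gatsby-config.js for the setup described above might look like the sketch below. It assumes book.xml sits in a `content` folder at the project root; the `name` value is an arbitrary label picked for this example, not something the XML plugin requires.

```javascript
// gatsby-config.js (sketch): assumes the XML file lives at <project root>/content/book.xml.
module.exports = {
  plugins: [
    {
      // Exposes the files in ./content as File nodes.
      resolve: `gatsby-source-filesystem`,
      options: {
        name: `content`, // arbitrary label chosen for this example
        path: `${__dirname}/content`,
      },
    },
    // Turns .xml File nodes into queryable XML nodes.
    `gatsby-transformer-xml`,
  ],
}
```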

First of all, the following query will not work in GraphiQL:

{
  allBooks {
    edges {
      node {
        content
      }
    }
  }
}

It should be:

{
  allBookXml {
    edges {
      node {
        content
      }
    }
  }
}
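The query field is named after the XML file, which is why book.xml gives allBookXml here (the allCatalogXml query later in this thread presumably comes from a file named catalog.xml). The helper below is a hypothetical approximation of that naming for simple file names only; the plugin's exact casing rules may differ for unusual names.

```javascript
// Hypothetical helper: approximates how an XML file name maps to GraphQL names.
// Only intended for simple names such as "book.xml" or "my-books.xml".
function xmlQueryNames(fileName) {
  const base = fileName.replace(/\.xml$/i, "");
  const typeName =
    base
      .split(/[^A-Za-z0-9]+/) // break on dashes, underscores, spaces, dots
      .filter(Boolean)
      .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
      .join("") + "Xml";
  return { typeName, queryField: `all${typeName}` };
}

console.log(xmlQueryNames("book.xml").queryField); // allBookXml
console.log(xmlQueryNames("catalog.xml").queryField); // allCatalogXml
```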

The example is based on this type of XML node structure:

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
</catalog>

To actually fetch the data you need, you'll have to "go down one level" (pardon the bad pun).
You'll have to modify the query to:

{
  allBookXml {
    edges {
      node {
        name
        content
        xmlChildren {
          content
        }
      }
    }
  }
}

That query retrieves all of the child elements of book, namely author, title, and so on.

The original query

{
  allBookXml {
    edges {
      node {
        name
        content
      }
    }
  }
}

will only work on xml node structures like the one below:

<?xml version="1.0"?>
<note>
  <to>Tobi</to>
  <from>Loki</from>
  <heading>Reminder</heading>
  <body>You're a ferret</body>
</note>
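To see why the shorter query works for the note document, here is a simplified model, written as a hypothetical sketch rather than the plugin's code: one node is created per child of the root element. For <note>, each node is then a leaf element (to, from, heading, body) whose content is its text. Applied to the catalog, the same rule yields book nodes with empty content, which matches the results reported above.

```javascript
// Hand-written approximation of the parsed <note> document (not real parser output).
const parsedNote = {
  root: {
    name: "note",
    children: [
      { name: "to", content: "Tobi", children: [] },
      { name: "from", content: "Loki", children: [] },
      { name: "heading", content: "Reminder", children: [] },
      { name: "body", content: "You're a ferret", children: [] },
    ],
  },
};

// Assumed rule: one node per child of the root element.
function nodesFromRoot(parsed) {
  return parsed.root.children.map(({ name, content }) => ({ name, content }));
}

for (const node of nodesFromRoot(parsedNote)) {
  console.log(`${node.name}: ${node.content}`); // e.g. "to: Tobi"
}
```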

eclectic-coding · 2019-05-01T19:56:23Z

@jonniebigodes As usual, you are absolutely correct, and your documentation is impeccable.

I verified your documentation with the simple repo I published. If @violetamenendez is willing, I think this would make a good first PR to Gatsby for her, if I'm right that it would be her first.

jonniebigodes · 2019-05-01T20:18:17Z

@polishedwp I was kind of in a hurry when I was writing my comment and forgot to mention that. @violetamenendez, once again, if you're willing, go for it. It was a nice catch on your end 👍

violetamenendez · 2019-05-02T14:19:23Z

Thanks all! I will read the How to Contribute page and prepare a PR (yes, my first one to Gatsby!) to fix this.

violetamenendez · 2019-05-03T15:34:20Z

Hi,

I've tried reproducing this with the repo @polishedwp set up, and while doing so I've come up with more questions.
First note that to reproduce it I had to delete

    {
      resolve: `gatsby-source-filesystem`,
      options: {
        path: `${__dirname}/src/images`,
        name: `images`
      }
    },

from the gatsby-config.js file.

So then, the doc section about XML parsing here says that two nodes are created, which I guess refers to the books. But in the example you can see

{
  "root": {
    "name": "catalog",
    "attributes": {},
    "children": [
      {
        "name": "book",
...

and when I actually query it, I can't see anything called root, or anything with the name catalog. As I'm completely new to this and still struggling to grasp how it's all structured, I want to make sure I'm not missing anything and that my changes make sense.
In the XML there is indeed a "catalog" tag. Does this get lost to eternity or something? And if so... why?

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
...

There's also the case that in the root element allBooksXml we have an edges field that contains node, but also a nodes field that contains the same nodes, but just with one layer of depth out of the way. Why does this happen? What's the use? And what should I use for querying?
Example:

{
  allBooksXml {
    nodes {
      id
      name
    }
    edges {
      node {
        id
        name
      }
    }
  }
}

gives

{
  "data": {
    "allBooksXml": {
      "nodes": [
        {
          "id": "bk101",
          "name": "book"
        },
        {
          "id": "bk102",
          "name": "book"
        }
      ],
      "edges": [
        {
          "node": {
            "id": "bk101",
            "name": "book"
          }
        },
        {
          "node": {
            "id": "bk102",
            "name": "book"
          }
        }
      ]
    }
  }
}

And going back to the original query in the docs, which only fetches the author and the title content: how would I fetch just those fields with the query, if I'm not interested in price or any other field?

Thanks for your help

jonniebigodes · 2019-05-03T17:12:20Z

@violetamenendez regarding your questions:
1- Regarding this configuration

    {
      resolve: `gatsby-source-filesystem`,
      options: {
        path: `${__dirname}/src/images`,
        name: `images`
      }
    },

That's "baked" into the starter itself: it points to that specific folder, and if the folder does not exist, things start to break. You can safely remove it for now for your testing purposes.

2- Regarding this code

{
  "root": {
    "name": "catalog",
    "attributes": {},
    "children": [
      {
        "name": "book",
...

That's the structure returned by the package responsible for reading and parsing the XML. It is then transformed as the plugin in question does its processing.

3- Regarding this item:

and when I actually query it, I can't see anything called root, or anything with the name catalog. As I'm completely new to this and I'm still struggling to grasp how it's all structured, I want to make sure I'm not missing anything, and that if I modify it it makes sense.
In the xml there is indeed a "catalog" tag, does this get lost to eternity or something? and if so... why?

You're right, the catalog element is not injected. Node generation starts with the children of the root element, so since catalog is the root, it is skipped, as you can see here.
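A rough sketch of the behavior described here, assuming node generation simply maps over root.children of the parse result. The input is a trimmed, hand-written version of the parse structure quoted above. The ids bk101 and bk102 in the query output earlier suggest the id attribute is reused as the node id; the fallback id in the sketch is made up purely for illustration, and toNodes is a hypothetical name, not the plugin's source.

```javascript
// Trimmed, hand-written stand-in for the parse result of the catalog document.
const parsedCatalog = {
  root: {
    name: "catalog",
    attributes: {},
    children: [
      {
        name: "book",
        attributes: { id: "bk101" },
        content: "",
        children: [
          { name: "author", attributes: {}, children: [], content: "Gambardella, Matthew" },
          { name: "title", attributes: {}, children: [], content: "XML Developer's Guide" },
        ],
      },
      {
        name: "book",
        attributes: { id: "bk102" },
        content: "",
        children: [
          { name: "author", attributes: {}, children: [], content: "Ralls, Kim" },
          { name: "title", attributes: {}, children: [], content: "Midnight Rain" },
        ],
      },
    ],
  },
};

// Sketch: start from root.children, so the root element ("catalog") never becomes a node.
// Element children are exposed as xmlChildren, as in the GraphQL results above.
function toNodes(parsed) {
  return parsed.root.children.map((el, i) => ({
    id: el.attributes.id ?? `fallback-${i}`, // fallback id is purely illustrative
    name: el.name,
    content: el.content,
    xmlChildren: el.children,
  }));
}

const catalogNodes = toNodes(parsedCatalog);
console.log(catalogNodes.map((n) => `${n.id} ${n.name}`).join(", ")); // bk101 book, bk102 book
console.log(catalogNodes.some((n) => n.name === "catalog")); // false
```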

4- Regarding this item:

There's also the case that in the root element allBooksXml we have an edges field that contains node, but also a nodes field that contains the same nodes, but just with one layer of depth out of the way. Why does this happen? What's the use? And what should I use for querying?

To the best of my knowledge, the nodes array is always created for the node types a Gatsby plugin adds. In this specific case, a node is the XML element book, which contains all of the attributes of that element as well as other data, some internal to Gatsby and some from the result of the parsing algorithm. Personally, from my experience with Gatsby, I always go with the edges level, so to speak.
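The two fields carry the same data. Using the output posted above, this sketch shows that unwrapping edge.node for every edge reproduces the nodes list. As far as I know, the extra edges layer exists because each edge can also expose next and previous fields, which is handy when linking between items; if you don't need those, querying nodes is simply shorter.

```javascript
// Query result copied from the allBooksXml example earlier in this thread.
const data = {
  allBooksXml: {
    nodes: [
      { id: "bk101", name: "book" },
      { id: "bk102", name: "book" },
    ],
    edges: [
      { node: { id: "bk101", name: "book" } },
      { node: { id: "bk102", name: "book" } },
    ],
  },
};

// Unwrapping each edge gives the same list that `nodes` already provides.
const unwrapped = data.allBooksXml.edges.map((edge) => edge.node);
const sameContent =
  JSON.stringify(unwrapped) === JSON.stringify(data.allBooksXml.nodes);
console.log(sameContent); // true
```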

5- Regarding this item

And going back to the original query in the docs, which only fetches the author and the title content: how would I fetch just those fields with the query, if I'm not interested in price or any other field?

I was testing this out, and in the plugin's current state, extracting only those elements will require some work.

Based on the following query:

{
  allCatalogXml {
    edges {
      node {
        xmlChildren {
          name
          content
        }
      }
    }
  }
}

You would have to iterate over the edges array and then over each node's xmlChildren, extracting the elements you need.
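A minimal sketch of that iteration in plain JavaScript, assuming the query above returns a data object shaped like the results in this thread. The function name pickChildren and the trimmed sample data are hypothetical; in a page component you would pass it the data prop returned by the page query.

```javascript
// Hypothetical helper: keep only the named child elements of each book node.
function pickChildren(data, wanted) {
  return data.allCatalogXml.edges.map(({ node }) => {
    const picked = {};
    for (const child of node.xmlChildren) {
      if (wanted.includes(child.name)) picked[child.name] = child.content;
    }
    return picked;
  });
}

// Sample result trimmed from the catalog example (only some fields shown).
const data = {
  allCatalogXml: {
    edges: [
      { node: { xmlChildren: [
        { name: "author", content: "Gambardella, Matthew" },
        { name: "title", content: "XML Developer's Guide" },
        { name: "price", content: "44.95" },
      ] } },
      { node: { xmlChildren: [
        { name: "author", content: "Ralls, Kim" },
        { name: "title", content: "Midnight Rain" },
        { name: "price", content: "5.95" },
      ] } },
    ],
  },
};

const books = pickChildren(data, ["author", "title"]);
console.log(books[1].title); // Midnight Rain
```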

drillep · 2019-05-04T17:50:19Z

Is there any particular reason why xml-parser was used for this plugin? Losing key fields adds an additional layer of complexity to resolvers, and the dependency hasn't been updated for a while.

jonniebigodes · 2019-05-04T18:37:53Z

@drillep My take on this is that when the plugin was being developed, that specific package was suitable for the case at hand, and it has stayed that way until today. If you want to and are willing, read the contribution docs, make the changes, and submit a pull request. It would probably involve some rework: using a different package to handle the XML file parsing and finding a better way to structure the generated nodes. It could probably also incorporate handling of CDATA, as, if memory serves me right, it's still not implemented and someone asked for it here. I started fiddling around with it but dropped it, as I'm a bit short on time to take that on.

violetamenendez · 2019-05-07T14:13:54Z

I've created a PR, #13907, with the documentation changes I think are necessary.

I agree that xml-parser seems to be a pain to work with, and according to the description of their npm package: "...you probably don't want to use this unless you also have similar needs".

I'm also not convinced about dropping the root element from the XML structure. The catalog example seems to be a good one to show why that isn't ideal.

violetamenendez · 2019-05-08T07:48:54Z

Here's the actual PR, #13921, which passes all the tests 0:-)

gatsbot bot added the type: documentation label May 1, 2019

DSchau added the status: needs reproduction label May 1, 2019

violetamenendez mentioned this issue May 7, 2019: Update gatsby-transformer-xml README.md #13907 (Closed)

violetamenendez mentioned this issue May 8, 2019: chore(gatsby-transformer-xml): Update README #13921 (Merged)

LekoArts closed this as completed in #13921 May 8, 2019
