gatsby-transformer-xml README with wrong example #13773

Closed
violetamenendez opened this issue May 1, 2019 · 13 comments · Fixed by #13921
Labels
status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. type: documentation An issue or pull request for improving or updating Gatsby's documentation

Comments

@violetamenendez
Contributor

Summary

The 'How to query' section of the README for the gatsby-transformer-xml package is wrong.

Relevant information

I ran the example from this page to better understand how queries work. I saved the book.xml file into my project, and when querying with GraphQL using the query from the example:

{
  allBooks {
    edges {
      node {
        content
      }
    }
  }
}

what I get is:

{
  "data": {
    "allBooksXml": {
      "edges": [
        {
          "node": {
            "content": ""
          }
        },
        {
          "node": {
            "content": ""
          }
        }
      ]
    }
  }
}

when the example says I should be getting this:

{
  allBooks: {
    edges: [
      {
        node: {
          content: "Gambardella, Matthew",
        },
      },
      {
        node: {
          content: "XML Developer's Guide",
        },
      },
    ]
  }
}

As far as I understand, the books are the nodes, so I'd expect to get back content: "" from the nodes themselves and to have to go one layer deeper to get the content.
So with the following query:

{
  allBooksXml {
    edges {
      node {
        name
        content
        xmlChildren {
          content
        }
      }
    }
  }
}

I get all of this data, which is more than just the author and title shown in the example:

{
  "data": {
    "allBooksXml": {
      "edges": [
        {
          "node": {
            "name": "book",
            "content": "",
            "xmlChildren": [
              {
                "content": "Gambardella, Matthew"
              },
              {
                "content": "XML Developer's Guide"
              },
              {
                "content": "Computer"
              },
              {
                "content": "44.95"
              },
              {
                "content": "2000-10-01"
              },
              {
                "content": "An in-depth look at creating applications\n      with XML."
              }
            ]
          }
        },
        {
          "node": {
            "name": "book",
            "content": "",
            "xmlChildren": [
              {
                "content": "Ralls, Kim"
              },
              {
                "content": "Midnight Rain"
              },
              {
                "content": "Fantasy"
              },
              {
                "content": "5.95"
              },
              {
                "content": "2000-12-16"
              },
              {
                "content": "A former architect battles corporate zombies,\n      an evil sorceress, and her own childhood to become queen\n      of the world."
              }
            ]
          }
        }
      ]
    }
  }
}

Is this example wrong, or am I completely misunderstanding how this should work and doing something wrong?

@gatsbot gatsbot bot added the type: documentation An issue or pull request for improving or updating Gatsby's documentation label May 1, 2019
@DSchau
Contributor

DSchau commented May 1, 2019

Hi Violet!

Thanks for reporting this--this definitely seems to be a gap in the current documentation.

Could you provide a reproduction so we could experiment with this, ourselves?

Thank you!

@DSchau DSchau added the status: needs reproduction This issue needs a simplified reproduction of the bug for further troubleshooting. label May 1, 2019
@eclectic-coding
Contributor

Very good. I stepped through this one as well, and you are correct. It looks like the How to Query section is incorrect and needs to be fixed. Why don't you submit a PR?

Make sure you read the How to Contribute guide.

Or you could put up an example repo for testing.

@eclectic-coding
Contributor

@DSchau I just posted that. 👍

If it would help, here is a barebones Repository I set up to test this one locally.

@jonniebigodes

@violetamenendez I've just created a Gatsby website using the hello-world starter and added the needed dependencies.

Upon testing it, I found that the example is not entirely accurate.

Following the documentation, I created a folder called content, added the XML content to a file called book.xml, and configured gatsby-source-filesystem to point to where the file is held.
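A minimal gatsby-config.js for this kind of reproduction might look like the sketch below. It assumes the XML file sits in a `content` folder at the project root; the `name` value is an arbitrary label chosen for illustration.

```javascript
// gatsby-config.js (sketch): source files from ./content and let
// gatsby-transformer-xml turn each XML file into nodes.
module.exports = {
  plugins: [
    {
      resolve: `gatsby-source-filesystem`,
      options: {
        // Assumed location of book.xml; adjust to your project layout.
        path: `${__dirname}/content`,
        name: `content`,
      },
    },
    `gatsby-transformer-xml`,
  ],
}
```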

First of all, the following query will not work in GraphiQL:

{
  allBooks {
    edges {
      node {
        content
      }
    }
  }
}

It should be:

{
  allBookXml {
    edges {
      node {
        content
      }
    }
  }
}
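Why `allBookXml` rather than `allBooks`? One plausible explanation, assuming the node type name is derived from the source file's name plus "Xml" (which would also account for the `allBooksXml` and `allCatalogXml` names that appear elsewhere in this thread), is sketched below. `xmlTypeName` and `allQueryField` are hypothetical helpers and a simplified approximation, not the plugin's code.

```javascript
// Hypothetical helper approximating the naming rule:
// "<file name> xml" -> camelCase -> upper-case first letter.
function xmlTypeName(fileName) {
  const words = `${fileName} xml`.split(/[^A-Za-z0-9]+/).filter(Boolean)
  const camel = words
    .map((w, i) =>
      i === 0
        ? w.charAt(0).toLowerCase() + w.slice(1)
        : w.charAt(0).toUpperCase() + w.slice(1)
    )
    .join("")
  return camel.charAt(0).toUpperCase() + camel.slice(1)
}

// Gatsby exposes a list query named `all<TypeName>` for each node type.
function allQueryField(fileName) {
  return `all${xmlTypeName(fileName)}`
}

console.log(allQueryField("book")) // allBookXml
console.log(allQueryField("books")) // allBooksXml
console.log(allQueryField("catalog")) // allCatalogXml
```

If this assumption holds, the query field changes whenever the XML file is renamed, so the README example only works for a file with the matching name.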

The example is based on this type of XML structure:

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies,
      an evil sorceress, and her own childhood to become queen
      of the world.</description>
   </book>
</catalog>

To actually fetch the data you need, you'll have to "go down one level" (pardon the bad pun) and modify the query to:

{
  allBooksXml {
    edges {
      node {
        name
        content
        xmlChildren {
          content
        }
      }
    }
  }
}

That query retrieves all of the child elements of book, namely author, title and so on.

The original query

{
  allBookXml {
    edges {
      node {
        name
        content
      }
    }
  }
}

will only work on XML structures like the one below:

<?xml version="1.0"?>
<note>
  <to>Tobi</to>
  <from>Loki</from>
  <heading>Reminder</heading>
  <body>You're a ferret</body>
</note>

@eclectic-coding
Contributor

@jonniebigodes As usual, you are absolutely correct, and your documentation is impeccable.

I verified your documentation with the simple repo I published. If @violetamenendez is willing, I think it would be fitting for this to be her first PR to Gatsby, which I believe it would be.

@jonniebigodes

@polishedwp I was in a bit of a hurry when I was writing my comment and forgot to mention that. @violetamenendez, once again, if you're willing, go for it. It was a nice catch on your end 👍

@violetamenendez
Contributor Author

Thanks all! I will read the How to Contribute page and prepare a PR (yes, my first one to Gatsby!) to fix this.

@violetamenendez
Contributor Author

Hi,

I've tried reproducing this with the repo @polishedwp set up, and while doing so I've come up with more questions.
First, note that to reproduce it I had to delete

    {
      resolve: `gatsby-source-filesystem`,
      options: {
        path: `${__dirname}/src/images`,
        name: `images`
      }
    },

from the gatsby-config.js file.

Then, the docs section about XML parsing here says that two nodes are created, which I guess refers to the books. But in the example you can see

{
  "root": {
    "name": "catalog",
    "attributes": {},
    "children": [
      {
        "name": "book",
...

and when I actually query it, I can't see anything called root, or anything named catalog. As I'm completely new to this and still struggling to grasp how it's all structured, I want to make sure I'm not missing anything and that my changes make sense.
The XML does indeed contain a "catalog" tag. Does it get lost to eternity or something? And if so... why?

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
...

There's also the fact that the top-level allBooksXml field has an edges field containing node, but also a nodes field containing the same nodes, just with one layer of nesting removed. Why does this happen? What's the use? And which should I use for querying?
Example:

{
  allBooksXml {
    nodes {
      id
      name
    }
    edges {
      node {
        id
        name
      }
    }
  }
}

gives

{
  "data": {
    "allBooksXml": {
      "nodes": [
        {
          "id": "bk101",
          "name": "book"
        },
        {
          "id": "bk102",
          "name": "book"
        }
      ],
      "edges": [
        {
          "node": {
            "id": "bk101",
            "name": "book"
          }
        },
        {
          "node": {
            "id": "bk102",
            "name": "book"
          }
        }
      ]
    }
  }
}

And going back to the original query in the docs, which only fetches the author and title content: how would I actually fetch just those fields with the query, if I'm not interested in price or any other field?

Thanks for your help

@jonniebigodes

@violetamenendez, regarding your questions:
1- Regarding this configuration

    {
      resolve: `gatsby-source-filesystem`,
      options: {
        path: `${__dirname}/src/images`,
        name: `images`
      }
    },

That's "baked" into the starter itself: it points to that specific folder, and if the folder does not exist, things start to break. You can safely remove it for now, for your testing purposes.

2- Regarding this code

{
  "root": {
    "name": "catalog",
    "attributes": {},
    "children": [
      {
        "name": "book",
...

That's the output of the package responsible for reading and parsing the XML: it returns a structure similar to that, which is then transformed as the plugin in question does its processing.

3- Regarding this item:

and when I actually query it, I can't see anything called root, or anything named catalog. As I'm completely new to this and still struggling to grasp how it's all structured, I want to make sure I'm not missing anything and that my changes make sense.
The XML does indeed contain a "catalog" tag. Does it get lost to eternity or something? And if so... why?

You're right, the catalog element is not turned into a node. Node generation starts with the children of the root element, so because catalog is the root, it is skipped. As you can see here
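To illustrate, here is a rough sketch (not the plugin's actual source) of a transformation consistent with what this thread observes: only the root's children become nodes, nested elements end up under `xmlChildren`, and an `id` attribute such as `bk101` becomes the node id. `toNodes` and its fallback id format are hypothetical.

```javascript
// Sketch: turn a parsed document of the shape { root: { name, attributes, children } }
// into node-like objects, one per child of the root element.
function toNodes(parsed, fileName) {
  // The root element itself (<catalog> here) never becomes a node.
  return parsed.root.children.map((child, i) => ({
    name: child.name,
    attributes: child.attributes,
    content: child.content,
    // Nested elements are exposed under xmlChildren.
    xmlChildren: child.children || [],
    // Use the element's id attribute when present; the fallback is hypothetical.
    id: child.attributes.id || `${fileName}-${i}`,
  }))
}

// Trimmed-down version of the parsed catalog discussed in this thread.
const parsed = {
  root: {
    name: "catalog",
    attributes: {},
    children: [
      {
        name: "book",
        attributes: { id: "bk101" },
        content: "",
        children: [
          { name: "author", attributes: {}, content: "Gambardella, Matthew", children: [] },
          { name: "title", attributes: {}, content: "XML Developer's Guide", children: [] },
        ],
      },
    ],
  },
}

const nodes = toNodes(parsed, "book")
console.log(nodes.map((n) => `${n.id}:${n.name}`))
```

Under this model, the `catalog` name is simply never copied anywhere, which matches the query results above where only `book` nodes appear.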

4- Regarding this item:

There's also the fact that the top-level allBooksXml field has an edges field containing node, but also a nodes field containing the same nodes, just with one layer of nesting removed. Why does this happen? What's the use? And which should I use for querying?

To the best of my knowledge, the nodes field is always generated by Gatsby itself for these list queries, not by this specific plugin. In this case, a node is the XML element book, which contains all of that element's attributes as well as other data, some internal to Gatsby and some coming from the parsing. Personally, from my experience with Gatsby, I always query at the edges level, so to speak.
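Using the response shown in the question above, the relationship between the two fields can be checked directly: `nodes` holds the same objects as `edges[].node`, just without the `node` wrapper. This sketch only demonstrates the data shape; either field works for reading these results.

```javascript
// Result copied from the nodes/edges query in the question above.
const data = {
  allBooksXml: {
    nodes: [
      { id: "bk101", name: "book" },
      { id: "bk102", name: "book" },
    ],
    edges: [
      { node: { id: "bk101", name: "book" } },
      { node: { id: "bk102", name: "book" } },
    ],
  },
}

// Unwrapping each edge yields the same list as the `nodes` field.
const fromEdges = data.allBooksXml.edges.map((edge) => edge.node)
const same = JSON.stringify(fromEdges) === JSON.stringify(data.allBooksXml.nodes)
console.log(same) // true
```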

5- Regarding this item

And going back to the original query in the docs, which only fetches the author and title content: how would I actually fetch just those fields with the query, if I'm not interested in price or any other field?

I was testing this out, and in the plugin's current state, extracting just those elements will require some work.

Based on the following query:

{
  allCatalogXml {
    edges {
      node {
        xmlChildren {
          name
          content
        }
      }
    }
  }
}

You would have to iterate over the edges array, then over each node's xmlChildren, and extract the elements you need.
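A minimal sketch of that iteration, run against a hand-trimmed copy of the data from this thread. `pickFields` is a hypothetical helper, not part of the plugin; it keeps only the requested child elements of each book, keyed by element name.

```javascript
// Shape returned by the allCatalogXml query above (sample values taken
// from the responses earlier in this thread, trimmed for brevity).
const data = {
  allCatalogXml: {
    edges: [
      {
        node: {
          xmlChildren: [
            { name: "author", content: "Gambardella, Matthew" },
            { name: "title", content: "XML Developer's Guide" },
            { name: "price", content: "44.95" },
          ],
        },
      },
      {
        node: {
          xmlChildren: [
            { name: "author", content: "Ralls, Kim" },
            { name: "title", content: "Midnight Rain" },
            { name: "price", content: "5.95" },
          ],
        },
      },
    ],
  },
}

// For each edge, copy the content of the wanted child elements into a plain object.
function pickFields(edges, wanted) {
  return edges.map(({ node }) => {
    const picked = {}
    for (const child of node.xmlChildren) {
      if (wanted.includes(child.name)) picked[child.name] = child.content
    }
    return picked
  })
}

const books = pickFields(data.allCatalogXml.edges, ["author", "title"])
// books[0] → { author: "Gambardella, Matthew", title: "XML Developer's Guide" }
```

The filtering happens in JavaScript after the query runs, since, as noted above, the query itself returns every child element.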

@drillep

drillep commented May 4, 2019

Is there any particular reason why xml-parser was used for this plugin? Losing key fields adds an additional layer of complexity to resolvers, and the dependency hasn't been updated for a while.

@jonniebigodes

jonniebigodes commented May 4, 2019

@drillep My take on this is that when the plugin was being developed, that specific package was suitable for the case at hand, and it has stayed that way until today. If you want and are willing, read the contribution docs, make the changes and submit a pull request. That would probably mean some rework: using a different package to handle the XML parsing and finding a better way to structure the generated nodes. It could probably also incorporate handling of CDATA, which, if memory serves me right, is still not implemented, and someone asked for it in here. I started fiddling around with it but dropped it, as I'm a bit short on time to take that on.

@violetamenendez
Contributor Author

I've created a PR here, #13907, with the documentation changes I think are necessary.

I agree that xml-parser seems to be a pain to work with, and according to the description of their npm package: "...you probably don't want to use this unless you also have similar needs".

I'm also not convinced about losing the root element from the XML structure. The catalog example seems to be a good one for showing why that wouldn't be ideal.

@violetamenendez
Contributor Author

The actual PR is here, #13921, and this one actually passes all the tests 0:-)

5 participants