XMLNodeToXMLDocument [XMLNodeToXMLDocumentHelp]
** Overview
This Help covers the protocol from XMLNode to XMLDocument (and siblings). It provides Doits and discussion.
An XMLDocument is provided for exploration in every protocol by evaluating:
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
#+END_EXAMPLE
or, if you do not have an internet connection:
#+BEGIN_EXAMPLE
| tree |
tree := (XMLDOMParser on: '<?xml version="1.0" encoding="UTF-8"?>
<cardset>
  <card>
    <cardname lang="en">Arcane Lighthouse</cardname>
    <types>Land</types>
    <year>2014</year>
    <rarity>Uncommon</rarity>
    <expansion>Commander 2014</expansion>
    <cardtext>Tap: Add 1 uncolor to your mana pool. 1 uncolor + Tap: Until end of turn, creatures your opponents control lose hexproof and shroud and can''t have hexproof or shroud.</cardtext>
  </card>
  <card>
    <cardname lang="en">Desolate Lighthouse</cardname>
    <types>Land</types>
    <year>2013</year>
    <rarity>Rare</rarity>
    <expansion>Avacyn Restored</expansion>
    <cardtext>Tap: Add Colorless to your mana pool. 1BlueRed, Tap: Draw a card, then discard a card.</cardtext>
  </card>
</cardset>') parseDocument.
tree explore
#+END_EXAMPLE
** Getting The XML Packages
First, unload the default Squeak XML package (topa) using the Monticello Browser, then evaluate the following:
#+BEGIN_EXAMPLE
Installer ss project: 'MetacelloRepository'; install: 'ConfigurationOfXMLParser'.
(Smalltalk at: #ConfigurationOfXMLParser) loadDefault.
Installer ss project: 'MetacelloRepository'; install: 'ConfigurationOfXMLSupport'.
(Smalltalk at: #ConfigurationOfXMLSupport) loadDefault.
Installer ss project: 'MetacelloRepository'; install: 'ConfigurationOfXMLWriter'.
(Smalltalk at: #ConfigurationOfXMLWriter) load.
Installer ss project: 'MetacelloRepository'; install: 'ConfigurationOfXMLParserHTML'.
(Smalltalk at: #ConfigurationOfXMLParserHTML) load.
Installer ss project: 'MetacelloRepository'; install: 'ConfigurationOfXMLParserStAX'.
(Smalltalk at: #ConfigurationOfXMLParserStAX) load.
Installer ss project: 'MetacelloRepository'; install: 'ConfigurationOfXMLRPC'.
(Smalltalk at: #ConfigurationOfXMLRPC) load.
Installer ss project: 'MetacelloRepository'; install: 'ConfigurationOfXPath'.
(Smalltalk at: #ConfigurationOfXPath) load.
#+END_EXAMPLE
** XMLNode [XMLNodeToXMLDocumentXMLNodeHelp]
*** Comment
This is a base class for XML nodes. It has
1. testing messages
2. messages to access
   1. the parent
   2. sibling nodes
   3. ancestor nodes
3. messages to control printing.
#+BEGIN_EXAMPLE
Not from the class comment, but informative:
https://stackoverflow.com/questions/132564/whats-the-difference-between-an-element-and-a-node-in-xml
#+END_EXAMPLE
*** XPath Core accessing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self followingNodes.
self followingSiblingNodes.
self precedingNodes.
self precedingSiblingNodes.
self stringValue.
#+END_EXAMPLE
*** XPath Core converting
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self asXPathBoolean. "throws error"
self asXPathComparable. "throws error"
self asXPathFilterExpressionLocationPathRoot.
self asXPathFilterExpressionPredicateRoot. "throws error"
self asXPathNodeSet. "throws error"
self asXPathNumber. "throws error"
self asXPathString. "throws error"
self asXPathUnionable. "throws error"
#+END_EXAMPLE
*** XPath Core deprecated
Omitted.
*** XPath Core enumerating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self allNodesReverseDo: [:node | node].
self followingNodesDo: [:node | node].
self followingSiblingNodesDo: [:node | node].
self precedingNodesDo: [:node | node].
self precedingSiblingNodesDo: [:node | node].
#+END_EXAMPLE
*** XPath Core enumerating axis
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self % 'credit'.
self %% 'credit'.
self %%~ 'credit'.
self / 'credit'.
self // 'credit'.
self //~ 'credit'.
self << 'credit'.
self <<< 'credit'.
self >> 'credit'.
self >>> 'credit'.
self @ 'credit'.
self @@ 'credit'.
self ~ 'credit'.
#+END_EXAMPLE
*** XPath Core evaluating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self xpath: '.'. "add your XPath expressions right here"
self xpath: '.' context: XPathContext. "this works; XPathContext to be explored later"
#+END_EXAMPLE
*** XPath Core private
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
Omitted. May add after XPath study.
#+END_EXAMPLE
*** XPath Core testing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self hasExpandedName.
self isNamespace.
#+END_EXAMPLE
*** Accessing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self ancestorElements. "every element is a node"
self ancestorNodes. "not all nodes are elements"
self configuration.
self contentString. "prints content, no tags"
self document.
self documentRoot.
self nextNode.
self nodeFactory.
self parent.
self previousNode.
self rawContentString.
self sortKey.
#+END_EXAMPLE
*** Converting
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self asString.
#+END_EXAMPLE
*** Copying
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self copy.
self copySharingConfiguration.
self postCopy.
self postCopyConfiguration.
#+END_EXAMPLE
*** Defaults
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self parserHandlerClass.
self xmlWriterClassOrNil.
#+END_EXAMPLE
*** Deprecated
Omitted.
*** Enumerating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self allNodesDo: [:node | node].
"not all nodes are elements. A node is the primary data type of XML"
self ancestorElementsDo: [:element | element]. "all elements are nodes"
self ancestorNodesDo: [:node | node].
self descendantNodesDo: [:node | node].
#+END_EXAMPLE
*** Instance Creation
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
"many/all of these have no parent"
self newCData: 'dude'.
self newComment: 'dude'.
self newDocument.
self newElement.
self newElementNamed: 'foo'.
self newElementNamed: 'dude' attributes: {'foo' -> 'bar'}.
self newElementNamed: 'dude' namespaceURI: 'foo'.
self newElementNamed: 'dude' namespaceURI: 'foo' attributes: {'foo' -> 'bar'}.
self newListForCollect.
self newListForSelect.
self newPI.
self newPITarget: 'dude' data: 'foo'.
self newStringNode: 'Dude'.
#+END_EXAMPLE
*** Printing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self canonicallyPrintOn: ios.
self canonicallyPrintToFileNamed: 'dude.xml'. "broken; see workaround below"
self canonicallyPrinted.
self prettyPrintToFileNamed: 'dude.xml'. "see workaround below"
self prettyPrinted.
self printContentOn: ios. "see workaround below"
self printOn: aStream.
self printOn: aStream beforeWritingDo: aBlock.
self printRawContentOn: ios.
self printToFileNamed: 'dude.xml'. "see workaround below"
self printToFileNamed: 'dude.xml' beforeWritingDo: aBlock.
self printWithoutSelfClosingTagsOn: ios.
self printWithoutSelfClosingTagsToFileNamed: 'dude.xml'.
self printedWithoutSelfClosingTags.
self writeXMLOn: aWriter. "see method comment"

"Workaround: print to an in-memory stream, then write the stream's contents to a file."
| ios |
ios := ReadWriteStream on: String new.
self canonicallyPrintOn: ios.
FileStream forceNewFileNamed: 'dude.xml' do: [:file | file nextPutAll: ios contents].
#+END_EXAMPLE
*** Private
Not tested; included for completeness.
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self errorXMLWritingUnsupported.
self escapesContentEntitiesWhenWriting.
self hasNodeList: aNodeList. "returns false unequivocally"
self hasParentWithNodeList: aNodeList. "looks like it is used in maintaining internal state"
self initializeFileWriteStream.
self isCoalescingStringNode.
self parent: aNode.
self withNewWriteStreamOnFileNamed: aFileName do: aBlock.
self withNewXMLWriterOn: aStream do: aOneArgBlock.
self withNewXMLWriterOn: aStream do: aOneArgBlock whenAbsent: aZeroArgBlock.
#+END_EXAMPLE
*** Testing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self canHaveChildren.
self canonicallyEquals: self.
self hasChildren.
self hasParent.
self isAttribute.
self isCData.
self isComment.
self isContentNode.
self isDeclaration.
self isDocument.
self isElement.
self isElementNamed: 'foo'.
self isElementNamedAny: {'foo'. 'bar'}. "returns false unequivocally"
self isInLanguage: 'en-*'. "see method comment"
self isPI. "XMLProcessingInstruction"
self isStringNode.
self isStringNode: 'foo'. "returns false unequivocally"
#+END_EXAMPLE
*** Validating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self validate.
self validateWith: aValidator. "different from XMLDocument>>validateWith:"
#+END_EXAMPLE
*** Visiting
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
acceptNodeVisitor: aNodeVisitor
	^self
#+END_EXAMPLE
** XMLNamespaceNode
This class is a subclass of XMLNode. From the class comment:
"This class models an element namespace prefix and URI mapping as a DOM node for compatibility with the XPath standard. Namespace nodes are equal only if they have the same name, namespace URI, and belong to the same element."
I am omitting documentation of this class for now.
** XMLNodeWithChildren [XMLNodeToXMLDocumentXMLNodeWithChildrenHelp]
*** Class Comment
#+BEGIN_EXAMPLE
This is an abstract class for nodes that can contain child nodes.
1. It has messages to
   1. Access child nodes
   2. Add child nodes
   3. Remove child nodes.
2. The nodes are stored in a kind of XMLObservableList returned by #nodes,
   1. which can be modified directly to add or remove nodes from the owner of #nodes
   2. (copy it first if that isn't what you want).
3. There are three types of "enumerating" messages:
   1. The #nodes* messages enumerate child nodes of the receiver.
   2. The #allNode* forms enumerate (using depth-first traversal) the receiver and all descendant nodes.
   3. The #descendantNode* forms enumerate only descendant nodes.
#+END_EXAMPLE
*** XPath Core enumerating
#+BEGIN_EXAMPLE
After study of XPath, return to these.
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
allNodesReverseDo: aBlock
#+END_EXAMPLE
*** XPath Core private
After study of XPath, return to these.
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
childAxisAnySatisfy: aNodeTest
childAxisSelect: aNodeTest at: aPosition into: aNodeSet
childAxisSelect: aNodeTest into: aNodeSet
descendantAxisAnySatisfy: aNodeTest
descendantAxisSelect: aNodeTest at: aPosition into: aNodeSet
descendantAxisSelect: aNodeTest ifNotPresentInto: aNodeSet
descendantAxisSelect: aNodeTest into: aNodeSet
descendantOrSelfAxisAnySatisfy: aNodeTest
descendantOrSelfAxisSelect: aNodeTest at: aPosition into: aNodeSet
descendantOrSelfAxisSelect: aNodeTest ifNotPresentInto: aNodeSet
descendantOrSelfAxisSelect: aNodeTest into: aNodeSet
#+END_EXAMPLE
*** Accessing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self allNodes.
self descendantNodes.
self firstNode.
self innerXML.
self innerXMLPrettyPrinted.
self lastNode.
self nodeAfter: 'credit'. "need working example"
self root nodeAfter: 'credit'. "need working example"
self nodeAt: 1.
self nodeAt: 2 ifAbsent: [self inform: 'none'].
self nodeAt: 1 put: (self newStringNode: 'dude'). "destroys existing"
self nodeAt: 1 put: (self newElementNamed: 'dude').
"destroys existing"
self nodeBefore: (self root). "need more nodes!"
self nodes.
self replaceNode: (self firstNode) with: (self newElementNamed: 'dude').
#+END_EXAMPLE
*** Adding
#+BEGIN_EXAMPLE
TODO: Figure out and document the intricacies here.
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self addComment: 'Dude!'.
self addNode: (self newStringNode: 'dude').
self addNode: (self newStringNode: 'foo') after: 'dude'. "fix me"
self addNode: (self newElementNamed: 'bar') before: 'credit'. "fix me"
self addNodeFirst: (self newStringNode: 'baz').
self addNodes: ''. "see senders; do not understand yet"
self addPITarget: 'target' data: 'data'. "This is an XML processing instruction (class XMLPI)"
#+END_EXAMPLE
*** Copying
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self postCopy.
self = self postCopy
#+END_EXAMPLE
*** Defaults
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self nodeListClass
#+END_EXAMPLE
*** Deprecated
Omitted.
*** Enumerating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self allNodesCollect: [:each | each]. "interesting to see what 'nodes' are"
self allNodesDetect: [:each | each isElementNamed: 'credit'].
self allNodesDetect: [:each | each isElementNamed: 'credit_'] ifNone: [self inform: 'dude'].
| count |
count := 0.
self allNodesDo: [:each | count := count + 1].
self inform: count asString.
self allNodesSelect: [:each | each isElement not]. "these are interesting for the distinction between Element and Child"
self allNodesSelect: [:each | each hasChildren].
self allNodesSelect: [:each | each hasChildren not].
self descendantNodesCollect: [:each | each].
self descendantNodesDetect: [:each | each hasChildren].
self descendantNodesDetect: [:each | each hasChildren not] ifNone: [self inform: 'not'].
self descendantNodesDo: [:each | "something interesting here"].
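"A minimal offline sketch of the three enumeration families from the class comment. It uses a small made-up document instead of the weather feed.
 For <a><b>text</b><c/></a>, with the root element a as the receiver:
	#nodesDo: visits the children only (b, c);
	#descendantNodesDo: visits everything below the receiver (b, the string node 'text', c);
	#allNodesDo: visits the receiver plus its descendants (a, b, 'text', c)."
| root childCount descendantCount allCount |
root := (XMLDOMParser on: '<a><b>text</b><c/></a>') parseDocument root.
childCount := 0.
descendantCount := 0.
allCount := 0.
root nodesDo: [:each | childCount := childCount + 1].
root descendantNodesDo: [:each | descendantCount := descendantCount + 1].
root allNodesDo: [:each | allCount := allCount + 1].
{childCount. descendantCount. allCount} inspect. "expected: {2. 3. 4}"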
self descendantNodesSelect: [:each | "something interesting here"].
self nodesCollect: [:each | each isElement].
self nodesDetect: [:each | each isElement not].
self nodesDetect: [:each | each isElement not] ifNone: [self inform: 'none'].
self nodesDo: [:each | "something interesting"].
self nodesSelect: [:each | true].
TODO: What is the difference between allNodesFoo and nodesFoo?
#+END_EXAMPLE
*** Notifying
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self addedNode: aNode. "probably done automatically when a node is added. Document this design pattern later"
self removedNode: aNode.
#+END_EXAMPLE
*** Printing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
| ios |
ios := ReadWriteStream on: String new.
self printInnerXMLOn: ios.
ios contents inspect.
self printInnerXMLOn: aStream beforeWritingDo: aBlock. "see senders"
self writeInnerXMLOn: aWriter. "need example"
self writeXMLOn: aWriter. "need example"
#+END_EXAMPLE
*** Private
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self hasNodeList: (XMLNodeList new).
#+END_EXAMPLE
*** Removing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self removeNode: (self root). "oops"
self explore.
self removeNode: 'credit' ifAbsent: [self inform: 'absent'].
self removeNodes.
self removeNodes: aNodeCollection. "need a good example here; the tests have some"
#+END_EXAMPLE
*** Testing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self canHaveChildren.
self hasChildren.
self includesNode: 'aNode'. "get good example here"
#+END_EXAMPLE
** XMLNodeWithElements [XMLNodeToXMLDocumentXMLNodeWithElementsHelp]
*** Class Comment
#+BEGIN_EXAMPLE
This is an abstract class for nodes with elements.
1. Instances provide "accessing" messages to retrieve child elements by their name and namespace information.
   1. The #elementAt: forms return the first matching element.
   2. The #elementsAt: forms return all matching child elements.
2. There are three different modes of enumeration:
   1. The #elements* enumerating messages enumerate child elements.
   2. The #allElements* forms enumerate the receiver (if it's an element) and all descendant elements.
   3. The #descendantElement* forms enumerate descendant elements only.
3. The #findElementNamed:* forms search the receiver (if it's an element) and descendants for a specific element.
4. Element name matching is done on the qualified and local name,
   1. so 'prefix:element-name' will only match 'prefix:element-name',
   2. while 'element-name' will match 'element-name', 'prefix:element-name', 'different-prefix:element-name', and so on.
5. The inner XML can be accessed as a string using #innerXML and set (reparsed) using #innerXML:.
#+END_EXAMPLE
*** XMLElement Definition
#+BEGIN_EXAMPLE
These examples and notes are taken from:
https://www.w3schools.com/xml/xml_elements.asp

An XML element is everything from (including) the element's start tag to (including) the element's end tag.

<price>29.99</price>

An element can contain:
1. text
2. attributes
3. other elements
4. or a mix of the above

<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title>Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>

In the example above:
<title>, <author>, <year>, and <price> have text content because they contain text (like 29.99).
<bookstore> and <book> have element contents, because they contain elements.
<book> has an attribute (category="children").
#+END_EXAMPLE
*** Example Document
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
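"An offline alternative, assuming a trimmed copy of the card XML from the Overview.
 It contrasts three forms described in the class comment above:
	#elementAt: answers the first matching child element;
	#elementsAt: answers all matching child elements;
	#allElementsNamed: answers matching elements anywhere below the receiver."
| doc cardset names |
doc := (XMLDOMParser on: '<cardset><card><cardname lang="en">Arcane Lighthouse</cardname><year>2014</year></card><card><cardname lang="en">Desolate Lighthouse</cardname><year>2013</year></card></cardset>') parseDocument.
cardset := doc root.
((cardset elementAt: 'card') contentStringAt: 'cardname') inspect. "expected: 'Arcane Lighthouse'"
(cardset elementsAt: 'card') size inspect. "expected: 2"
names := (doc allElementsNamed: 'cardname') collect: [:each | each contentString].
names inspect.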
#+END_EXAMPLE
*** Accessing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
"XPath-Core accessing"
self stringValue.
"XPath-Core enumerating"
self / 'credit'.
self // 'credit'.
"accessing"
self allElements inspect.
self allElementsNamed: 'xml_url'.
self allElementsNamed: 'xml_url' namespaceURI: 'absent'.
self configuration.
self configuration: 'todo'. "yikes"
self contentNodes.
self contentStringAt: 'wx_station_index'.
self descendantElements.
self descendantElementsNamed: 'station_id'.
self descendantElementsNamed: 'station_id' namespaceURI: 'dude'.
self elementAfter: 'station'. "todo figure out"
self root elementAfter: 'credit'. "todo figure out"
self elementAt: 'station'. "todo figure out"
self root elementAt: 'station'.
self elementAt: 'station' ifAbsent: [self inform: 'dude']. "todo figure out"
self root elementAt: 'station' ifAbsent: [self inform: 'dude'].
self root elementAt: 'station' namespaceURI: 'dude'.
self root elementAt: 'station' namespaceURI: 'dude' ifAbsent: [self inform: 'dude'].
self elementBefore: (self root elementAt: 'station_id').
self elementNames.
self elements.
self elements first.
self elementsAt: (self elements first localName).
self elementsAt: (self elements first localName) namespaceURI: 'dude'.
self firstElement.
self lastElement.
(self firstElement) = (self lastElement).
self nodeFactory.
self nodeFactory: 'no clue'. "no idea"
self rawContentStringAt: (self elements first localName).
self stringNodes.
self strings.
self usesNamespaces: aBoolean. "no clue"
#+END_EXAMPLE
*** Adding
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self addCData: 'dude'.
self addElementNamed: 'squeak_weather'. "figure out later"
self addElementNamed: 'squeak_weather' attributes: 'no clue'. "figure out later"
self addElementNamed: 'squeak_weather' namespaceURI: 'no clue'. "figure out later"
self addElementNamed: 'squeak_weather' namespaceURI: 'no clue' attributes: 'no clue'. "figure out later"
self addString: 'squeak_weather, dude'. "figure out later"
#+END_EXAMPLE
*** Copying
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self postCopyConfiguration. "no idea"
#+END_EXAMPLE
*** Defaults
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self configurationClass.
self nodeListClass.
self parserHandlerClass.
#+END_EXAMPLE
*** Deprecated
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self addContent: 'No Clue'.
self contentString: 'No Clue'.
self newString: 'No Clue'.
#+END_EXAMPLE
*** Enumerating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self allElementsCollect: [:each | each localName].
self allElementsDetect: [:each | each localName = 'station_id'].
self allElementsDetect: [:each | each localName = 'station_id'] ifNone: [self inform: 'none'].
self allElementsDo: [:each | Transcript show: (each localName); cr]. "takes a long time"
self allElementsNamed: 'station_id' do: [:each | Transcript show: (each localName); cr].
self allElementsSelect: [:each | each localName = 'station_id'].
self contentNodesDo: [:each | self break].
self descendantElementsCollect: [:each | each localName].
self descendantElementsDetect: [:each | each localName = 'station_id'].
self descendantElementsDetect: [:each | each localName = 'station_id'] ifNone: [self inform: 'none'].
self descendantElementsDo: [:each | Transcript show: (each localName); cr]. "takes a long time"
self descendantElementsNamed: 'station_id' do: [:each | Transcript show: (each localName); cr].
self descendantElementsSelect: [:each | each localName = 'station_id'].
self elementsAt: 'wx_station_index' do: [:each | self break].
self elementsDetect: [:each | each localName = 'wx_station_index'].
self elementsDetect: [:each | each localName = 'wx_station_index'] ifNone: [self inform: 'none'].
self elementsDo: [:each | Transcript show: (each localName); cr].
self elementsSelect: [:each | each localName = 'wx_station_index'].
self stringNodesDo: [:each | self break].
self stringsDo: [:each | self break].
self root stringNodesDo: [:each | self break].
self root stringsDo: [:each | self break].
#+END_EXAMPLE
*** Notifying
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self addedNode: 'aNode'. "no clue. Check out the senders; looks like a way of maintaining internal state"
self renamedElement: 'anElement' from: 'anOldName' to: 'aNewName'. "no clue. Check out the senders; looks like a way of maintaining internal state"
#+END_EXAMPLE
*** Parsing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
"TODO: flesh this out"
self innerXML: 'aStringOrStream'.
self innerXMLParsedWith: 'aParser'.
self outerXML: 'aStringOrStream' forNode: 'aNode'.
self outerXMLForNode: 'aNode' parsedWith: 'aParser'.
#+END_EXAMPLE
*** Printing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
| ios |
ios := ReadWriteStream on: String new.
self printRawContentOn: ios.
ios contents inspect.
#+END_EXAMPLE
*** Private
Not shown.
*** Removing
#+BEGIN_EXAMPLE
TODO: come up with a before and after case.
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self removeAllFormattingNodes.
self explore.
#+END_EXAMPLE
*** Searching
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self findElementNamed: 'wx_station_index'.
self findElementNamed: 'credit_URL'.
self findElementNamed: 'wx_station_index' namespaceURI: 'no clue'.
self findElementNamed: 'wx_station_index' namespaceURI: 'no clue' with: [:each | 1 = 1].
self findElementNamed: 'wx_station_index' with: [:each | 1 = 1].
self findElementWithID: 'foo'.
#+END_EXAMPLE
*** Testing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self hasContentNodes.
self hasElements.
self hasStringNodes.
self root hasStringNodes.
self includesElement: 'wx_station_index'.
self root includesElement: 'wx_station_index'.
self includesElement: 'wx_station_index' namespaceURI: 'no clue'.
self isContentNode.
self root isContentNode.
self usesNamespaces.
#+END_EXAMPLE
*** Visiting
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self acceptNodeVisitor: aNodeVisitor.
"no clue"
#+END_EXAMPLE
** XMLDocument [XMLNodeToXMLDocumentXMLDocumentHelp]
*** Class Comment
#+BEGIN_EXAMPLE
This class represents a document node, which is often the root of a DOM tree. Nodes can access their document ancestor with #document.
#+END_EXAMPLE
*** Example Document
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
#+END_EXAMPLE
*** Manually Adding Nodes
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode attributeAt: 'lang' ifAbsentPut: 'en'.
#+END_EXAMPLE
*** Doits
Some handy Doits to copy and paste into the explorer:
#+BEGIN_EXAMPLE
"class side"
XMLDocument root: (self root).
"instance side"
self doctypeDeclaration.
self doctypeDefinition.
self documentRoot = self.
self encoding.
self errorCannotHaveNonElementRoot. "throws an error"
self hasDoctypeDeclaration.
self hasEncoding.
self hasRoot.
self innerXMLStateClass explore.
self isDocument.
self isStandalone.
self postCopy explore.
self postCopy = self.
self root explore.
self validate.
self validateWith: aValidator. "TODO: add a DTD or other validator"
self version.
self writeDoctypeDeclarationOn: aWriter. "TODO"
self writeXMLDeclarationOn: aWriter. "TODO"
self writeXMLOn: aWriter. "TODO"
#+END_EXAMPLE
** XMLDocumentWithCachingNodeList [XMLNodeToXMLDocumentXMLDocumentWithCachingNodeListHelp]
*** Class Comment
"A class for testing documents that use XMLCachingNodeList instead of XMLNodeList."
I am assuming it is only used for testing.
** XMLFDocument [XMLNodeToXMLDocumentXMLFDocumentHelp]
*** Class Comment
A dummy subclass of XMLDocument.
** XMLElement [XMLNodeToXMLDocumentXMLElementHelp]
*** Class Comment
#+BEGIN_EXAMPLE
The class represents an element node, which has a qualified or unqualified name and optionally attributes, namespace declarations and child nodes. Element names can be tested using #isNamed: and #isNamedAny:, which test both the qualified and local name.
If the name is qualified and namespace support is enabled (the default), then the prefix must be mapped to a namespace URI in the element or an ancestor. The class-side instance creation #name:namespaceURI:* and #name:namespaces:* messages and the instance-side #name:namespaceURI: message can set both simultaneously. If namespace support is disabled, prefixes are not checked.

The #attribute* messages provide a Dictionary-like protocol for manipulating attribute nodes. Unlike the #elementAt:* messages, they match qualified names only, and attribute value accessors return empty strings if the attribute is absent. The underlying attribute node list can be accessed using #attributeNodes (copy before modifying if you don't want to change the element's attributes), and the names/values can be obtained as an (order-preserving) dictionary using #attributes.

See the superclasses for more info.
#+END_EXAMPLE
*** XPath Core accessing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode namespaceNodes. "yes, it does"
#+END_EXAMPLE
*** XPath Core enumerating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode namespaceNodesDo: [:each | Transcript show: each].
#+END_EXAMPLE
*** XPath Core private
Omitting.
*** XPath Core testing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode hasExpandedName.
self firstNode hasNamespaceNodes.
#+END_EXAMPLE
*** accessing
#+BEGIN_EXAMPLE
Note: if you would like to add some attributes for experimentation, use these:
self firstNode attributeAt: 'lang' ifAbsentPut: 'en'.
self firstNode attributeAt: 'encoding' ifAbsentPut: 'UTF-8'.
self firstNode attributeAt: 'biz' ifAbsentPut: 'baz'.
self firstNode attributeNodes.
#+END_EXAMPLE
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode attributeAssociations.
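"A minimal offline sketch of the Dictionary-like #attribute* protocol from the class comment.
 It assumes a one-element document adapted from the Overview's card XML.
 Per the class comment, value accessors answer an empty string when the attribute is absent."
| card |
card := (XMLDOMParser on: '<cardname lang="en">Arcane Lighthouse</cardname>') parseDocument root.
(card attributeAt: 'lang') inspect. "expected: 'en'"
(card attributeAt: 'rarity') isEmpty inspect. "expected: true, because the attribute is absent"
card attributeAt: 'rarity' put: 'Uncommon'.
(card includesAttribute: 'rarity') inspect. "expected: true"
card removeAttribute: 'rarity'.
(card includesAttribute: 'rarity') inspect. "expected: false"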
self firstNode attributeAt: 'lang'.
self firstNode attributeAt: 'encoding' ifAbsent: [].
self firstNode attributeAt: 'zert' ifAbsentPut: 'zounds'.
self firstNode attributeAt: 'lang' put: 'rus'.
self firstNode attributeNames.
self firstNode attributeNodeAt: 'zert'.
self firstNode attributeNodeAt: 'lang' ifAbsent: [].
self firstNode attributeNodeAt: 'lang' namespaceURI: 'defaultNS'.
self firstNode attributeNodeAt: 'lang' namespaceURI: 'defaultNS' ifAbsent: [self inform: 'not'].
self firstNode attributeNodes.
self firstNode attributes.
self firstNode expandedName.
self firstNode localName.
self firstNode name.
self firstNode name: 'dude_index'.
self firstNode name: 'wx_station_index'.
self firstNode name: 'dude_index' namespaceURI: 'dude'.
self firstNode expandedName.
self firstNode name: 'wx_station_index' namespaceURI: ''.
self firstNode expandedName.
self firstNode namespaceURI.
self firstNode nextElement.
self firstNode prefix.
self firstNode prefix: 'dude'. "what is this for?"
self firstNode previousElement.
self firstNode sortKey.
#+END_EXAMPLE
*** copying
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode postCopy.
#+END_EXAMPLE
*** defaults
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode attributeListClass.
#+END_EXAMPLE
*** enumerating
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode allElementsDo: [:each | Transcript show: each name; cr].
self firstNode attributeNamesAndValuesDo: [:n :v | Transcript show: n, '->', v; cr].
self firstNode attributeNamesDo: [:each | Transcript show: each; cr]. "each is already a name string"
self firstNode attributeNodesDo: [:each | Transcript show: each name; cr].
#+END_EXAMPLE
*** initialization
Omitting.
*** namespacing
Omitting.
*** notifying
Omitting.
*** printing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
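"An offline printing sketch, assuming a small made-up element and assuming the XMLWriter package from the installation section is loaded. Without XMLWriter, printing may not produce markup.
 #printString should answer the element's XML markup with its tags.
 #contentStringAt: answers only the text of the named child element."
| card |
card := (XMLDOMParser on: '<card><year>2014</year></card>') parseDocument root.
card printString inspect.
(card contentStringAt: 'year') inspect. "expected: '2014'"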
self writeXMLOn: aWriter. "need example"
#+END_EXAMPLE
*** private
Omitted.
*** removing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self firstNode removeAttribute: 'biz'.
self firstNode removeAttribute: 'biz' ifAbsent: [self inform: 'no biz'].
self firstNode removeAttributeNode: (self firstNode attributeNodeAt: 'encoding').
self firstNode removeAttributeNode: (self firstNode attributeNodeAt: 'biz') ifAbsent: [self inform: 'no biz'].
self firstNode removeAttributes.
#+END_EXAMPLE
*** testing
#+BEGIN_EXAMPLE
(XMLDOMParser parseURL: 'https://w1.weather.gov/xml/current_obs/index.xml') explore.
self root declaresDefaultNamespace.
self firstNode declaresDefaultNamespace.
self firstNode declaresPrefix: 'xml' uri: 'defaultNS'.
self firstNode hasAttributes.
self firstNode hasID: '123'.
self firstNode hasNamespaceURI.
self firstNode hasNamespaces.
self firstNode hasPrefix.
self firstNode includesAttribute: 'lang'.
self firstNode includesAttributeNode: 'lang'.
self firstNode includesAttributeNode: 'encoding' namespaceURI: 'ns'.
self firstNode isDeclaredPrefix: 'xml'.
self firstNode isDeclaredPrefix: 'xml' uri: 'ns'.
self isElement.
self firstNode isElement.
self firstNode isElementNamed: 'wx_station_index'.
self firstNode isElementNamedAny: {'credit'. 'wx_station_index'}.
self firstNode isInLanguage: 'en'.
self firstNode isNamed: 'wx_station_index'.
self firstNode isNamedAny: {'credit'. 'wx_station_index'}.
self firstNode isRoot.
#+END_EXAMPLE
*** visiting
Not sure how this is used yet.
#+BEGIN_EXAMPLE
acceptNodeVisitor: aNodeVisitor
	^ aNodeVisitor visitElement: self
#+END_EXAMPLE
** Interoperability Fix
#+BEGIN_EXAMPLE
The examples in this document use the following to retrieve and parse the XML:
(XMLDOMParser onURL: 'https://w1.weather.gov/xml/current_obs/index.xml' upToLimit: nil) parseDocument explore.
If that does not work for you, then the following should:
| tree url |
url := 'https://w1.weather.gov/xml/current_obs/index.xml'.
tree := (XMLDOMParser on: (HTTPLoader default retrieveContentsFor: url) contents) parseDocument.
tree explore.

For the adventurous, a fix is to edit XMLHTTPWebClientRequest>>basicSend and add the two top lines. The second line is commented out, but it may come in handy some day.

XMLHTTPWebClientRequest>>basicSend
	self webClientClient userAgent ifNotNil: [:ua | webClientRequest headerAt: 'User-Agent' put: ua].
	"self webClientClient contentDecoders ifNotNil: [:decoders | webClientRequest headerAt: 'Accept-Encoding' put: decoders]."
	^ self responseClass
		request: self
		webClientResponse:
			(self webClientClient
				"#sendRequest: unfortunately requires #initializeFromUrl: to be sent first"
				initializeFromUrl: self url;
				sendRequest: self webClientRequest)

This is a patch I made, and I am awaiting approval from the package maintainers.
#+END_EXAMPLE
** Bibliography [XMLNodeToXMLDocumentXMLNodeBibliographyHelp]
*** Sources
#+BEGIN_EXAMPLE
http://books.pharo.org/booklet-Scraping/pdf/2020-02-04-scrapingbook.pdf
https://stackoverflow.com/questions/132564/whats-the-difference-between-an-element-and-a-node-in-xml
#+END_EXAMPLE