This repository has been archived by the owner on Oct 22, 2022. It is now read-only.

HTMLFileContent and component for extracting IDL fragments #46

Merged (27 commits) on Jun 27, 2017

m-cheung (Contributor) commented May 30, 2017

This work is toward #41. This PR depends on foam-framework/foam2#415 and foam-framework/foam2#423.

@m-cheung m-cheung requested a review from arobins May 30, 2017 18:27
});
} else {
// Currently not doing anything for:
// - Self closing tags (item.type.name === OPEN_CLOSE)
Contributor

OPEN_CLOSE that aren't pre


if (foam.core.FObject.isInstance(item)) {
var top = tags[tags.length - 1];
if (top === undefined || item.type.name === OPEN) {
Contributor

I don't think you need the check that top is defined. Only open tags should be pushed, so if you somehow end up with an empty stack and either an extra close tag or plain text (if parseString returns text), not pushing is probably the right thing to do.

});

it('should parse a pre tag with no content', function() {
var content = '<pre class="idl"></pre>';
Contributor

Missing the parse step here

Contributor Author

Whoops, this was actually one of the old tests that I am no longer using. I will likely modify or remove this test in the next commit.

// Determine if node is of class IDL
var isIDL = false;
item.attributes.forEach(function(attr) {
if (attr.name === 'class' && attr.value.split(' ').includes('idl')) {
Contributor

Wasn't there something about having to scan back up the stack for a parent tag with the right class?

Contributor Author

I thought about this for a little while and decided to implement it slightly differently.

Scanning the stack every time we hit a tag we might be interested in seemed like it could be very costly (e.g. when tags are deeply nested). So I am using a variable (skipStack) to keep track of the levels in the tag stack at which excluded tags occur. It behaves as a stack itself, since excluded tags may be nested.

Please let me know if you see any flaws / problems with this approach.
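
For readers following the thread, here is a minimal sketch of the skipStack idea, written independently of the PR's code. The token shape ({type, nodeName, classes, text}), the EXCLUDED_CLASSES list, and the function names are assumptions made for illustration; the actual implementation works on HTMLLexer output and may differ.

```javascript
// Hypothetical token shape: { type: 'open' | 'close' | 'text', nodeName, classes, text }.
// EXCLUDED_CLASSES is an illustrative assumption, not taken from the PR.
var EXCLUDED_CLASSES = ['example', 'note'];

function isExcluded(classes) {
  return (classes || []).some(function(cls) {
    return EXCLUDED_CLASSES.indexOf(cls) !== -1;
  });
}

function collectPreText(tokens) {
  var tagStack = [];   // open tags seen so far
  var skipStack = [];  // depths in tagStack at which excluded tags were opened
  var results = [];
  var current = null;  // buffer for the pre tag being collected, if any

  tokens.forEach(function(tok) {
    if (tok.type === 'open') {
      tagStack.push(tok);
      if (isExcluded(tok.classes)) {
        // Remember the depth instead of rescanning tagStack later.
        skipStack.push(tagStack.length);
      } else if (tok.nodeName === 'pre' && skipStack.length === 0 &&
                 current === null) {
        current = { depth: tagStack.length, text: '' };
      }
    } else if (tok.type === 'close') {
      var top = tagStack[tagStack.length - 1];
      // Ignore close tags that do not match the top of the stack.
      if (!top || top.nodeName !== tok.nodeName) return;
      // Leaving an excluded tag: pop only the innermost exclusion.
      if (skipStack[skipStack.length - 1] === tagStack.length) skipStack.pop();
      if (current !== null && current.depth === tagStack.length) {
        results.push(current.text);
        current = null;
      }
      tagStack.pop();
    } else if (current !== null && skipStack.length === 0) {
      current.text += tok.text;
    }
  });
  return results;
}
```

Checking whether we are inside an excluded region is then a constant-time look at skipStack.length, and closing an inner excluded tag does not re-enable extraction while an outer one is still open.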

var expectedContent = fs.readFileSync(`${testDirectory}/${filename}`).toString();
expect(preBlocks[testNum].content.trim()).toBe(expectedContent.trim());
} else if (filename !== 'spec.html') {
console.warn(`${filename} was not used in ${testName} spec test`);
Contributor

This should probably fail the test.
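
One way to turn the warning into a failure, sketched under assumptions: findUnusedFixtures and the usedFiles list are hypothetical names, and how the spec records which files it consumed is not shown in this thread.

```javascript
// Hypothetical helper: returns fixture files that no expectation consumed.
// `usedFiles` is an assumed list of filenames matched during the test run.
function findUnusedFixtures(allFiles, usedFiles) {
  return allFiles.filter(function(filename) {
    // spec.html is the input document, not an expected-output fixture.
    return filename !== 'spec.html' && usedFiles.indexOf(filename) === -1;
  });
}

// Inside a Jasmine spec this could replace the console.warn call:
//   expect(findUnusedFixtures(files, usedFiles)).toEqual([]);
```

An assertion on the whole list reports every stray file in a single failure message rather than one warning per file.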

@@ -0,0 +1,552 @@
typedef ([AllowShared] Uint32Array or sequence<GLuint>) Uint32List;
m-cheung (Contributor Author), May 31, 2017

This line is currently causing problems in parsing. The W3C Grammar does seem to have a rule for [ExtendedAttributes] preceding a type in a typedef definition. It is present in HeyCam's Grammar.

Contributor

I think the approach taken so far has been to extend the IDL parser for each source of IDL content if something is encountered that doesn't follow the spec.

Contributor Author

The comment in Parser.js seems to imply that it was designed around HeyCam's Grammar. Perhaps the spec / grammar changed between the time the Parser was written and the current specs. I am currently working on adding this change and hope to have a PR up before the end of the day.

Contributor Author

Resolved by #47

m-cheung (Contributor Author) commented Jun 7, 2017

Refactor is complete. The test will not pass until foam-framework/foam2#415 and foam-framework/foam2#423 are merged into the beta-1 branch.

@m-cheung m-cheung changed the title from "Adding HTMLFileContent Class for parsing HTML files" to "HTMLFileContent and component for extract IDL fragments" Jun 8, 2017
codecov-io commented Jun 8, 2017

Codecov Report

Merging #46 into master will increase coverage by 0.59%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #46      +/-   ##
==========================================
+ Coverage   94.32%   94.92%   +0.59%     
==========================================
  Files          81       84       +3     
  Lines         564      630      +66     
==========================================
+ Hits          532      598      +66     
  Misses         32       32

Impacted Files                                    Coverage Δ
lib/org/chromium/webidl/IDLFileContents.js        100% <ø> (ø) ⬆️
config/files.js                                   100% <ø> (ø) ⬆️
lib/org/chromium/webidl/IDLFragmentExtractor.js   100% <100%> (ø)
lib/org/chromium/webidl/HTMLFileContents.js       100% <100%> (ø)
lib/org/chromium/webidl/URLExtractor.js           100% <0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 441f1c5...c93dec6.

@m-cheung m-cheung changed the title from "HTMLFileContent and component for extract IDL fragments" to "HTMLFileContent and component for extracting IDL fragments" Jun 8, 2017

if (!tagMatching) {
// Ignoring all tags. Only extracting text within pre tags.
if (isTag && item.nodeName === 'pre') {
Contributor

Does this need to handle nested pre tags?

Contributor Author

It is hard to say at this point whether nested pre tags affect the information we care about. From what I have observed so far, there haven't been any IDL fragments within nested pre tags (they seem to be used mostly for formatting). Thus, the current approach seems sufficient for our purposes (at least I hope so).

We could attempt to put the content through another round of processing or implement a proper HTML parser (which was my first attempt at this problem, but was scrapped since it did a lot more than it needed to and likely had other issues too).

tagStack.push(item);
} else if (top && item.type.name === CLOSE && top.nodeName === item.nodeName) {
var parentCls = extractAttr(top, 'class');
if (isExcluded(parentCls)) exclude = false;
Contributor

Is it possible for there to be an excluded tag inside an excluded tag?

Contributor Author

I have not yet observed an instance of an excluded tag nested within another, but there is a relatively simple fix for this issue (using a stack to track excluded tags instead of a bool), so that has been implemented. It will be in the next set of changes.

m-cheung (Contributor Author)

All requested changes should be made by now. @arobins please let me know if you have any feedback on the latest set of changes and comments.

arobins (Contributor) commented Jun 13, 2017

LGTM. @mdittmer should probably look over it as well, because I'm definitely not a FOAM expert.

class: 'String',
name: 'url',
required: true,
final: true
Contributor

nit: add trailing comma

mdittmer (Contributor) left a comment

I'm comfortable with this design once comments are addressed. A future alternative is also proposed at #53.

class: 'String',
name: 'url',
required: true,
final: true,
Contributor

Here and elsewhere: May not be able to get away with "final: true" anymore; DatastoreDAO (which we will use eventually) instantiates objects first, then sets properties. I think "final: true" will create setters that silently fail.

},
{
class: 'String',
name: 'content',
Contributor

HTMLFileContents.contents would be a more consistent name.

package: 'org.chromium.webidl',
name: 'HTMLFileContents',

documentation: 'An HTML file that stores it contents.',
Contributor

More docs: Is the HTMLFileContents.contents pre-processed in any way? (E.g., &foo;-escaped?) or is it the raw request body?

foam.CLASS({
package: 'org.chromium.webidl',
name: 'IDLFragmentExtractor',
documentation: 'extracts IDL Fragments from HTML files',
Contributor

nit: Full sentence. (with capital and period)

var lexer = self.HTMLLexer.create();
var OPEN = lexer.TagType.OPEN.name;
var CLOSE = lexer.TagType.CLOSE.name;
var extractAttr = function(node, attrName) {
Contributor

I think it's legitimate to have:

<node-name attr="value1
    value2" attr="value3">

to yield {attr: ['value1', 'value2', 'value3']}

I assume HTMLLexer doesn't collapse whitespace, so I think we need to revise this.
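
A minimal sketch of what that revision could look like, assuming each lexed node exposes attributes as an array of {name, value} objects (as the attr.name / attr.value usage earlier in this PR suggests). This is an illustration, not the change that was eventually committed.

```javascript
// Collects every whitespace-separated value of `attrName` across all
// matching attributes on `node`. The node shape is an assumption:
//   { attributes: [{ name: 'class', value: 'idl\n    extra' }, ...] }
function extractAttr(node, attrName) {
  var values = [];
  (node.attributes || []).forEach(function(attr) {
    if (attr.name !== attrName) return;
    // Split on any run of whitespace (spaces, tabs, newlines) and
    // drop the empty pieces that leading/trailing whitespace produces.
    attr.value.split(/\s+/).forEach(function(piece) {
      if (piece !== '') values.push(piece);
    });
  });
  return values;
}
```

Splitting on /\s+/ rather than a single space covers values that wrap across lines, and iterating over every matching attribute covers the repeated-attribute case described above.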

Contributor Author

I have made minor changes to the extraction code to allow for this. They will be part of the next set of changes.

// As of this writing, there has not been any IDL fragments
// that has been found within nested pre tags.
if (!tagMatching) {
// Ignoring all tags. Only extracting text within pre tags.
Contributor

I like this comment. Can we get a comment at the top of each if branch in this method? The logic is pretty complex.

}
tagStack.push(item);
} else if (top && item.type.name === CLOSE && top.nodeName === item.nodeName) {
var parentCls = extractAttr(top, 'class');
Contributor

This isn't parent, right? It's openTag? Maybe the code would be easier to read if this branch started with:

var openTag = top;
var closeTag = item;

and open and close prefixes are used to refer to tag-related things.

expect(file.content).toBe(content);
});

it('should fail to set HTMLFileContent props after creation', function() {
Contributor

We should probably nix this due to how DatastoreDAO works.

@@ -0,0 +1,51 @@
// Copyright 2017 The Chromium Authors. All rights reserved.
Contributor

file name: we just test HTMLFileContents, yes? Let's name this file after that: HTMLFileContents-test.js.

var IDLFragmentExtractor;
var Parser;

function cmpTest(testName, testDirectory, expectedIDL) {
Contributor

expectedIDL is a count/length, right? numExpectedIDLFragments?

mdittmer (Contributor) left a comment

Just a few minor things, and I think we can drop the test.

class: 'Array',
of: 'String',
name: 'references',
factory: function() { return []; },
Contributor

This is the default factory for Array; you can leave it out.

},
{
class: 'Array',
of: 'String',
Contributor

documentation?

Contributor

Nits: usually in order: class, of, documentation, name. Please uncomment documentation.

}
});
return retVal;
};

- var results = lexer.parseString(self.file.content).value;
+ var results = lexer.parseString(this.file.contents).value;
if (!results) throw "IDL Parse was not successful.";
Contributor

nit: throw new Error(<msg>).
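
A short illustration of why this nit matters in plain JavaScript. The describeThrown helper is hypothetical and exists only to compare the two styles; the message string is the one from the line under review.

```javascript
// Hypothetical helper: runs `fn` and reports what kind of value it threw.
function describeThrown(fn) {
  try {
    fn();
  } catch (e) {
    return {
      isError: e instanceof Error,
      // Error objects carry a stack trace; thrown strings do not.
      hasStack: typeof e.stack === 'string',
      message: e instanceof Error ? e.message : String(e)
    };
  }
  return null;
}

var fromString = describeThrown(function() {
  throw 'IDL Parse was not successful.';
});
var fromError = describeThrown(function() {
  throw new Error('IDL Parse was not successful.');
});
```

Both carry the same message, but only the Error instance passes an instanceof Error check and records where it was thrown, which makes failures much easier to trace from a test runner.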

@@ -0,0 +1,28 @@
// Copyright 2017 The Chromium Authors. All rights reserved.
Contributor

This test doesn't seem worthwhile. It amounts to checking that FOAM's create() implementation is correct.

If you intend to guard against mistakenly finaling something that may be set, we could switch the test to do foo.create(); foo.bar = 'bar'; expect(foo.bar).toBe('bar'), but if that's not what you meant to test, then I'd say we can drop this test entirely.

mdittmer (Contributor) left a comment

Please merge after addressing last nits.

@m-cheung m-cheung merged commit f5bdb2e into GoogleChromeLabs:master Jun 27, 2017
@m-cheung m-cheung deleted the htmlContent branch June 27, 2017 14:27
@m-cheung m-cheung mentioned this pull request Jun 27, 2017
4 participants