-
-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFP Flowifier – HTML to HtmlFlow translator #43
Comments
@lcduarte What about using JSoup for parsing? I'll have to write this kind of tool as I plan to use HtmlFlow to replace my blog (with hundreds of articles) and the website of my game by a single static website. JSoup can sanitize and output tidy HTML. The DOM traversal would allow to go through all nodes and elements to produce their HtmlFlow equivalents. I plan to leave Github, would you accept a patch? Is there any mean to contact you both outside Github? |
@gouessej Thanks for your proposal. I agree that Jsoup may be the best option for parsing. I am not used to deal with patch life cycle and coexistence with further releases of the main repository. A pull request looks a better option. I am not sure if I have understood how do you plan to integrate your contributions!? You can always contact me out of Github to my outlook.com address with username miguelgamboa. Thanks |
@fmcarvalho You're welcome. Yesterday, I switched my website from HTML 4 to HTML 5 in order to work on converting some clean HTML 5 source code into "flowified" Java source code first. I suggest to put the Flowifier into a dedicated package "flowifier" in this project as a first step. When the project is a bit more mature, we could move it into a separate project in order to avoid adding JSoup into the dependencies of HtmlFlow (it's a dependency only in the "test" scope right now) whereas only its flowifier will really need it. Ok, I'll try making a pull request before leaving Github. By the way, I suggested to use JSoup instead of DOM, JDOM, Sax, Stax, etc because the input is in HTML format, not in XHTML. |
Yes, it seems to be the better choice.
Yes, you are right. |
@fmcarvalho The first blueprint produces some valid Java source code. I'll test it on some much more complicated webpages. |
@gouessej WoW you were fast! Great! Your approach sounds nicely. From my analysis of your code I have only a few questions/suggestions:
|
@fmcarvalho I plan to improve the lifecycle of the visitor so that I can safely accept an object instead of accepting a class and creating an instance. My first try was a bit overcomplicated. I have improved the look of the code, I'll look at javapoet. Good idea, I'll return an HtmlView. Ok, I'll move it into htmlflow.flowifier. |
@fmcarvalho Now I use an object instead of a class, I return an HtmlView, the test is in a separate class and I support partial conversion (conversion starting from a node instead of the whole document). I have some problems with comments and scripts. When they're fixed, I'll be able to convert Wikipedia's homepage. I don't use Javapoet yet but it might help to produce prettier Java source code. |
@fmcarvalho Only a very few content types are currently supported :s |
@fmcarvalho Please can you clarify which elements need end tags in HtmlFlow? It would help to implement this method correctly. I borrow 2 methods of javapoet. I still need to pass Long instances to attrWidth and attrHeight and to use addAttr for unknown attributes. |
Hello, The elements without end tags are the following:
All the remaining elements should have end tags. I haven't had much time to analyze your work but from what I've seen it's on the right track, thanks for your help :) |
@lcduarte Thank you. Eclipse's workbench often crashes when the resulting Java class is opened. I'll take your suggestion into account as soon as possible. |
I had the same issue myself with IntelliJ. Since HtmlFlow relies heavily on types to enforce the language rules the IDEs sometimes have some issues validating the multiple chained calls because of the depth of HTML tree. The only solution I've found to that was to extract variables every now and then, which eases the load from the IDE. |
@gouessej I will try to check your work tomorrow and return some feedback. Thank you |
So does Flowifier expect a well formed html5 document as input? Does the document need to pass strict validation first? If yes, does that mean it would fail to run on html files built to use Mustache, Handlebars, Thymeleaf, FreeMarker, and Velocity? |
@bwfrieds I think so. This is still under development. And for a first release I would not include support for template specific tags. Since this current proposal is using JSoup to parse the HTML input document I am not aware if it is hard or not to add that feature. But from a quick search in Stackoverflow it looks that JSoup has not seamless support for template engine tags. In your case, I would plan to use only the HTML resulting from your template engines to be used via Thanks |
@bwfrieds Yes but you can use an HTML4 document, use JSoup to clean the mess and pass it to the flowifier but it would fail on template specific tags. |
@gouessej Great! I took a look to your branch patch-1. It is very nice, organized and very well commented. I see that you are still not using Javapoet or any similar tool. I tried to manually compile the resulting It would be good if your Unit test asserts the result of Ideally the unit test should compile at run-time and load the resulting |
@fmcarvalho I use 2 methods of Javapoet. I'm not against using it later. Yes, it's a known limitation, the same happens with Eclipse. Yes, it's an excellent idea, I can use javax.tools.ToolProvider. I have to see how to pass the dependencies to the Java compiler task: |
@gouessej Regarding JavaPoet I was thinking that you could use its Taking a closer look into I do not sympathies either with the I think you can move this array into a static field. Try to create more auxiliary methods with intuitive names that suggest the purpose of their actions. For instance, it is hard to read: https://github.com/gouessej/HtmlFlow/blob/patch-1/src/main/java/htmlflow/flowifier/AbstractHtmlToJavaHtmlFlowNodeVisitor.java#L216 It would be easier to understand if you had invoked a method like Thanks, |
@fmcarvalho Thank you for your feedbacks, I followed your advice, it's easier to read now and there is no longer any I haven't had enough spare time to look at JavaPoet, is it this method you were talking about? |
@fmcarvalho Now the unit test check if the generated Java class can be successfully compiled. I get this exception trace when testing the flowifier on Wikipedia's homepage:
|
@fmcarvalho I've just implemented the comparison between the original HTML and the generated HTML, disabling the indentation doesn't seem to work, I call |
@fmcarvalho I've just found my mistake, I have to call |
It's now possible to enable/disable indentation. |
I've just fixed this bug, I had to escape the content of the text nodes. To sum up, I have to remove the nasty warning message caused by the use of the reflection API to modify the class loader and I have to work around the StackOverflowError occurring in the compiler by splitting the generated Java source code into small pieces. Then, I'll be ready to make a pull request (except if I decide to spend some time in using Java Poet or if I find some new bugs). The first JUnit test passes :) |
I've just got rid of the warnings :) The last remaining bug is the crash occurring when there are too many chained method calls. |
I just have to use a more recent version of OpenJDK, I suspect that it's fixed in OpenJDK 11.0.5: I'll give it a try as soon as possible. |
@gouessej WoW you have been working hard. I just checked out your branch and I see that |
@fmcarvalho Thank you for the compliment :) The second JUnit test still crashes, it's possibly coming from an unfixed compiler bug, I tested with OpenJDK 11.0.5 yesterday, I obtained the same stack trace (except the version number of course). I'll provide a small Maven project with a single Java class to reproduce this bug, I'll fill a bug report against OpenJDK. If it's not enough, I'll try to write a smaller reproducer, some kind of SSCCE. In my humble opinion, the safest option is to fix the bug in the compiler instead of implementing a dirty kludge in our project. While I don't know exactly why it crashes, implementing a workaround only risks to drive the source code harder to read and to maintain on the long term, it would be like using a solution without knowing the problem to solve. |
@gouessej I think the bellow tips may help. Before that, I am not sure if that compilation behavior is a BUG but a feature instead. I made the following observations:
I changed a little bit this assertion statement to the following to figure out where the assertion is failing: Iterator<String> actual = Arrays
.stream(generatedHtmlSourceCode.split("<"))
.iterator();
Arrays
.stream(originalHtmlSourceCode.split("<"))
.forEach(expected -> {
Assert.assertEquals(
expected.replaceAll("\\s", "").replaceAll("\\h", "").replaceAll("\\v", "").toLowerCase(),
actual.next().replaceAll("\\s", "").replaceAll("\\h", "").replaceAll("\\v", "").toLowerCase());
}); It fails on the following comparison:
|
@gouessej BTW we may add the following configuration on <plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<argLine>-Xss16m</argLine>
</configuration>
</plugin> |
@fmcarvalho Thank you very much. Maybe I have to escape the content of DataNode in order to turn |
@fmcarvalho The second JUnit test passes :) I'll just fix the indentation in the classes I added into the project and I'll make a pull request. Thank you for your valuable help. Please let me know which branch I have to use as a target of the pull request, master or development? |
@gouessej please provide your Pull Request to the development branch. Only a few points:
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<configuration>
<argLine>-Xss16m</argLine>
<systemProperties>
<property>
<name>java.util.logging.config.file</name>
<value>src/test/resources/logging.properties</value>
</property>
</systemProperties>
</configuration>
</plugin>
Thank you, |
I'll make a request for enhancement against JSoup as I need a public method to escape characters inside an attribute. |
Done: jhy/jsoup#1278 |
@gouessej I ran sonar on flowifier and there a couple of code smells. I have pushed results to sonarcloud. You may check it here: https://sonarcloud.io/project/issues?id=com.github.xmlet%3Ahtmlflow&resolved=false&sinceLeakPeriod=true&types=CODE_SMELL |
@fmcarvalho The first code smell comes from Java Poet and is very easy to fix by using some conditional instructions (it will become easier to read even for me), the others are quite trivial to fix too. I'll try to move the management of attributes into a dedicated method to reduce the complexity. This one requires the intervention of the JSoup's maintainer(s): |
@fmcarvalho Done: #52 |
@fmcarvalho I'll have to fix a tiny bug in the comparison of actual and expected results, I think that it doesn't work when the number of lines differs. |
RFP Flowifier – HTML to HtmlFlow translator: issue #43
@gouessej I have already merged your pull request to development branch. Because I am moving my development environment to my new laptop I need to reconfigure GPG and other stuff. So, I will deploy a new release only on next week. For now I am planning to close this Issue which has already tackled many different aspects. Any new bug or feature should be opened on new issues. Thank you, |
@fmcarvalho JSoup's maintainer hasn't replied yet, I might have to provide a less clumsy workaround, preferably before the end of the week. |
In order to setup an HtmlFlow template it would be useful to have a translator tool – Flowifier – that is able to convert an HTML document into the equivalent template definition with HtmlFlow.
This RFP aims to discuss implementation approaches and requirements for Flowifier.
Some of the topics under discussion (not limited):
The text was updated successfully, but these errors were encountered: