Malformed URL Exception #1067

Closed
pcolmer opened this issue May 23, 2017 · 0 comments

pcolmer commented May 23, 2017

Some of our sites are accessible via both http and https. To avoid browsers complaining about mixed content, links within the pages omit the http/https scheme (for example //www.linaro.org/downloads/), so that the browser uses whichever scheme the current page was loaded with.
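To illustrate the browser behaviour described above, here is a minimal Java sketch (not taken from Fess) that resolves a protocol-relative reference against the URL of the page containing it, using java.net.URI.resolve. The class and method names are hypothetical. The same href picks up http or https depending on the page it appears on.

```java
import java.net.URI;

public class ProtocolRelativeDemo {
    // Hypothetical helper: resolve a scheme-less ("protocol-relative") href
    // against the page it appears on. The result inherits the page's scheme,
    // while host and path come from the href itself.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String href = "//www.linaro.org/downloads/";
        // Same link, served from an http page and from an https page.
        System.out.println(resolve("http://www.linaro.org/", href));
        System.out.println(resolve("https://www.linaro.org/", href));
    }
}
```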

However, Fess does not handle these scheme-less links and logs the following warning:

2017-05-23 00:20:44,314 [Crawler-20170523000000-8-3] INFO  Crawling URL: https://www.linaro.org/downloads/
2017-05-23 00:20:44,949 [Crawler-20170523000000-8-3] WARN  Could not parse anchor tags.
java.net.MalformedURLException: no protocol: //www.linaro.org/downloads/
	at java.net.URL.<init>(URL.java:593)
	at java.net.URL.<init>(URL.java:490)
	at java.net.URL.<init>(URL.java:439)
	at org.codelibs.fess.crawler.transformer.FessXpathTransformer.getAnchorList(FessXpathTransformer.java:584)
	at org.codelibs.fess.crawler.transformer.FessXpathTransformer.putAdditionalData(FessXpathTransformer.java:320)
	at org.codelibs.fess.crawler.transformer.FessXpathTransformer.storeData(FessXpathTransformer.java:171)
	at org.codelibs.fess.crawler.transformer.impl.HtmlTransformer.transform(HtmlTransformer.java:120)
	at org.codelibs.fess.crawler.processor.impl.DefaultResponseProcessor.process(DefaultResponseProcessor.java:77)
	at org.codelibs.fess.crawler.CrawlerThread.processResponse(CrawlerThread.java:330)
	at org.codelibs.fess.crawler.CrawlerThread.run(CrawlerThread.java:176)
	at java.lang.Thread.run(Thread.java:745)
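The exception comes from the java.net.URL constructor, which throws "no protocol" when it is given a scheme-less spec with no base URL to resolve it against. A minimal sketch of one possible fix, assuming the crawler knows the URL of the page being parsed, is to use the two-argument URL(URL context, String spec) constructor so the link inherits the page's scheme. This is not taken from the Fess source or from the 11.2.0 change; AnchorUrlDemo, toAbsolute and failsWithoutContext are hypothetical names. Both URL constructors are deprecated since Java 20 but still compile and run.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class AnchorUrlDemo {
    // Hypothetical helper: turn an anchor's href into an absolute URL by
    // resolving it against the page URL, so "//host/path" takes the page's scheme.
    static URL toAbsolute(String pageUrl, String href) throws MalformedURLException {
        return new URL(new URL(pageUrl), href);
    }

    // Returns true when the one-argument constructor rejects the href,
    // which is the failure seen in the log above.
    static boolean failsWithoutContext(String href) {
        try {
            new URL(href);
            return false;
        } catch (MalformedURLException e) {
            return true; // "no protocol: //www.linaro.org/downloads/"
        }
    }

    public static void main(String[] args) throws MalformedURLException {
        String page = "https://www.linaro.org/downloads/";
        String href = "//www.linaro.org/downloads/";
        System.out.println(failsWithoutContext(href)); // true
        System.out.println(toAbsolute(page, href));    // https://www.linaro.org/downloads/
    }
}
```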

marevol added this to the 11.2.0 milestone May 23, 2017