-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Send data URI as base64 #22
Conversation
this prevents invalid URI characters from messing with page rendering, as seen in westmonroe#21
@@ -19,7 +19,7 @@ let convertHTMLToPDF = async (html, callback, options = null, puppeteerArgs=null | |||
} | |||
|
|||
if (remoteContent === true) { | |||
await page.goto(`data:text/html,${html}`, { | |||
await page.goto(`data:text/html;base64,${Buffer.from(html).toString('base64')}`, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrfrase3 This would be great functionality to include in pdf-puppetteer. Two comments on the PR. First, base64-encoding should be an opt-in feature so that users can explicitly determine how their html should be intepreted. Base64-encoded strings can get rather long and some users may not be willing to accept that tradeoff. Secondly, adding a unit test to this feature would be valuable to ensure that these two methods for creating a pdf return the same results. I wrote up a sample unit test below. Feel free to tweak if you have another way to test this functionality and let me know if you have any comments or questions regarding my feedback.
Thanks
- Updated function signature:
convertHTMLToPDF = async (html, callback, puppeteerOptions = null, puppeteerArgs=null, remoteContent=true, options = {})
then in the body, something like this could be done:
if (remoteContent === true) {
if(options.base64Encode){
console.log('base 64 encoded');
await page.goto(`data:text/html;base64,${Buffer.from(html).toString('base64')}`, {
waitUntil: 'networkidle0'
});
}
else {
console.log('regular');
await page.goto(`data:text/html,${html}`);
}
}
- Sample unit test:
test('Converts remote Base64 encoded HTML To PDF with Puppeteer Arguments', async done => {
let base64 = null;
let reg = null;
let options = { 'base64Encode': true };
let t1 = convertHTMLToPDF(remoteTemplate, pdf => {
expect(Object.prototype.toString.call(pdf)).toBe('[object Uint8Array]');
base64 = pdf;
}, {format: 'A4'}, {
args: [
'--no-sandbox',
'--disable-setuid-sandbox'
]
}, true, options);
let t2 = convertHTMLToPDF(remoteTemplate, pdf => {
expect(Object.prototype.toString.call(pdf)).toBe('[object Uint8Array]');
reg = pdf;
}, {format: 'A4'}, {
args: [
'--no-sandbox',
'--disable-setuid-sandbox'
]
}, true. options);
await Promise.all([t1, t2]);
expect(base64.length).toBe(reg.length);
let result = true;
for(let i = 0; i < base64.length; i++){
if(base64[i] !== reg[i]){
result = false;
}
}
expect(result).toBe(true);
done();
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @cr7boulos,
When you say base64 is long, do you mean time-wise or storage space-wise? Storage space is effectively irrelevant for sending to a local process, and the increase in size is ~1.37x the original.
The # error is an invisible problem, this took me hours of debugging to track down. At the end of the day, HTML can contain illegal characters for a URI, so your default should always operate on the assumption that it does contain those characters. One could implement something that scans and sends a warning, but that would be costly. If someone wants to take on the risk for a speed up, then they should knowingly enable that.
I'm pushed on a deadline at the moment, so I can't devote more time to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I am a new maintainer of the package so still learning how the code works. Please disregard my previous comments. I will approve and push out an updated version to npm. Thanks for submitting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the comment on the PR.
@@ -19,7 +19,7 @@ let convertHTMLToPDF = async (html, callback, options = null, puppeteerArgs=null | |||
} | |||
|
|||
if (remoteContent === true) { | |||
await page.goto(`data:text/html,${html}`, { | |||
await page.goto(`data:text/html;base64,${Buffer.from(html).toString('base64')}`, { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I am a new maintainer of the package so still learning how the code works. Please disregard my previous comments. I will approve and push out an updated version to npm. Thanks for submitting.
this prevents invalid URI characters from messing with page rendering, as seen in #21