Review Ticket: Working with batches of PDF files #258
Thanks @maehr for this submission 👋 👋
Hi @maehr, before posting my initial feedback on the tutorial, may I ask you about the lesson's layout? Have you used a different template to generate the lesson's preview? As you can see, there are a number of formatting issues related to this, and it might be worth correcting them before moving to the next phase of peer review, as the preview is really hard to read (especially the last sections). Cheers!
Hi @amsichani
From 'Text recognition in PDF files' until the end, the preview is really messed up; it is very difficult even to read.
I removed the alert divs containing inline code or code blocks and fixed the YAML errors; hopefully that helps. I cannot build the Jekyll site locally because of parsing errors in other lessons. (The repo's GitHub Pages site is not being rebuilt at the moment, so the change is only visible here: https://github.com/programminghistorian/ph-submissions/blob/gh-pages/lessons/working-with-batches-of-pdf-files.md)
@amsichani, please refer to the updated guidelines (https://programminghistorian.org/en/editor-guidelines#3-add-yaml-metadata-to-the-lesson-file). Unfortunately, what Adam had briefly posted included a lot of square brackets.
The formatting issue after 'Text Recognition in PDF files' is that you are trying to put Markdown inside the HTML block of the
Many thanks @mdlincoln for this! @maehr, could you amend this part so that we have a clearly readable preview of the lesson?
Also, we're going to use this editorial process to help familiarize a newer member of the editorial team with the process and workflow, so @fdlaramee will be shadowing along as I work with you.
To my knowledge, I changed everything accordingly, and Jekyll now builds locally without warnings. Please let me know if any other problem pops up or if I forgot to fix something.
@mdlincoln In the lesson template, the endnote formatting is invalid. It currently looks like this:

```markdown
#### An End Note:
This is some text.[^1]
This is some more text.[^2]
##### Endnotes
[^1] Properly formatted citation using Chicago Manual of Style
[^2] Properly formatted citation using Chicago Manual of Style
```

It should look like this, with a colon after each footnote label in the endnote list:

```markdown
#### An End Note:
This is some text.[^1]
This is some more text.[^2]
##### Endnotes
[^1]: Properly formatted citation using Chicago Manual of Style
[^2]: Properly formatted citation using Chicago Manual of Style
```
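If several lesson files inherited the template's format, the missing colons could be added mechanically. Below is a minimal Python sketch, assuming footnote definitions always start at the beginning of a line, as they do in the template; the helper name `fix_footnote_definitions` is hypothetical and not part of any Programming Historian tooling.

```python
import re

# A footnote *definition* is a line that starts with a label such as [^1];
# kramdown expects a colon right after the label. Inline references like
# "some text.[^1]" do not start a line, so the pattern leaves them untouched.
DEFINITION_WITHOUT_COLON = re.compile(r"^(\[\^[^\]]+\])(?!:)", re.MULTILINE)


def fix_footnote_definitions(markdown: str) -> str:
    """Insert the missing colon after footnote labels that begin a line."""
    return DEFINITION_WITHOUT_COLON.sub(r"\1:", markdown)


template = """#### An End Note:
This is some text.[^1]
This is some more text.[^2]
##### Endnotes
[^1] Properly formatted citation using Chicago Manual of Style
[^2] Properly formatted citation using Chicago Manual of Style
"""

print(fix_footnote_definitions(template))
```

Because the negative lookahead `(?!:)` skips labels that already have a colon, running the function a second time changes nothing.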
Hi @maehr,
Let me know if there is anything unclear. Given that my remarks are minor, we could aim for a quick turnaround. Once you have made these revisions, I can contact reviewers and move things forward.
solved most of the issues noted in #258 (comment)
Hi @amsichani, thanks for your feedback. I fixed everything mentioned above, including retro-digitisation (which is a rather literal translation of the German Retrodigitalisierung) and the images. I found another small issue in the YAML front matter of the lesson template. The original:

```yaml
LEAVE BLANK
review-ticket: LEAVE BLANK
difficulty: LEAVE BLANK
activity: LEAVE BLANK
topics: LEAVE BLANK
abstract: LEAVE BLANK
```
Fantastic @maehr! I will now try to contact reviewers for your lesson, and I'll get back to you here once I have some news. Stay tuned!
@maehr and @amsichani, thank you for the opportunity to review this lesson in advance. It will be a great addition to The Programming Historian. It provides a really nice walkthrough of the various steps a researcher might take when working with text files. I especially appreciated that it was structured around a case study, since that made clear the driving goal of each step. The code snippets are also concise and accessible and will be terrific to have on hand. In the notes that follow, I have listed my top three thoughts and suggestions first, followed by light edits and the error messages I received. I would be happy to clarify or discuss any of them. For reference, I was using a computer running macOS Mojave.
As per the guidelines, I include a summary of my main observations, followed by point-specific observations and edits. Thank you very much for this opportunity.
Summary:
Navigating file structures:
Topic Modelling:
'Each word has a probability to belong to one or more topics. The algorithm finds the corresponding probabilities of the individual words.' Technically, every word appears in every topic with some probability; a word's probability is simply higher in some topics than in others, and the high-probability words are what define a topic.
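The reviewer's point can be made concrete with latent Dirichlet allocation (LDA), assuming that is the kind of model the lesson uses. With collapsed Gibbs sampling, a common estimate of the probability of word w in topic k is φ = (n_kw + β) / (n_k + Vβ), where n_kw counts how often w was assigned to k, n_k is the total count for topic k, V is the vocabulary size and β is the Dirichlet prior. Because β > 0, every word receives a non-zero probability in every topic, and a topic is characterised by its highest-ranked words. The counts, vocabulary and β value below are invented for illustration.

```python
# Hypothetical topic-word counts, e.g. as left by a Gibbs-sampling run.
vocab = ["labour", "wage", "union", "court", "law", "judge"]
counts = {
    "topic_0": [40, 25, 15, 0, 0, 1],
    "topic_1": [0, 1, 0, 30, 35, 20],
}
BETA = 0.01  # symmetric Dirichlet prior on topic-word distributions (assumed value)


def topic_word_distribution(word_counts, beta):
    """Smoothed estimate phi_w = (n_w + beta) / (n + V * beta) for one topic."""
    total = sum(word_counts)
    v = len(word_counts)
    return [(n + beta) / (total + v * beta) for n in word_counts]


def top_words(phi, vocab, n=3):
    """The n most probable words, which is how a topic is usually labelled."""
    ranked = sorted(zip(vocab, phi), key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in ranked[:n]]


phis = {topic: topic_word_distribution(c, BETA) for topic, c in counts.items()}
for topic, phi in phis.items():
    # Even words never assigned to a topic keep a small, non-zero probability.
    print(topic, top_words(phi, vocab), f"smallest p = {min(phi):.5f}")
```

Words with a count of zero (for example "court" in `topic_0`) still get probability β / (n_k + Vβ), which is why "all words appear in every topic" holds for this kind of model.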
Thanks @jackpay for this!
Thank you all very much. I agree that a summary is not necessary. I will upload a revised version by Monday, 2 December 2019, at the latest.
Fantastic @maehr! Looking forward to the edited version of the lesson. If you have any questions, or if there is anything you need to clarify with the reviewers, please don't hesitate to ask me here.
Hi @amsichani, in P12 I tried to help the user navigate the file system. Do you expect more specific explanations?
P37 and P39
P39: The ILO got back to me with more specific questions. Hopefully we can put the dataset on Zenodo; I should be able to get a definitive answer before Christmas.
My ORCID is 0000-0002-1367-1618.
Fantastic @maehr! Zenodo should work fine; we are currently exploring this option for hosting large assets, so this might be an interesting case study for us.
Hi @svmelton, here are the lesson files you'll need:
Please note that there isn't an asset folder for this lesson; instead, we are still waiting for a dataset to be deposited on Zenodo. @maehr will let us know when it is up, and I guess he will also need to update the lesson accordingly. Also note:
Let me know if I'm missing anything.
Hi @amsichani, I'm just waiting on the dataset, and then we can move forward with publication. Thanks!
@svmelton The ILO got back to me, and I was able to publish the dataset on Zenodo (https://doi.org/10.5281/zenodo.3582736). I added the link to the dataset to the lesson. IMO we can move forward. PS: Sorry, my commit message closed the issue automatically.
Excellent! I'll work on it this weekend and ping y'all if I need anything.
Hi all, I've just run through the lesson, and it looks good! @amsichani, we're just missing a bit of metadata (reviewers, editors, review ticket, difficulty, activity, topics, abstract, avatar_alt).
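One quick way to spot metadata that is still missing is to scan the lesson's front matter for the fields listed above and for the template placeholder `LEAVE BLANK`. The Python sketch below is hypothetical: it is a line-based scan rather than a real YAML parser, and apart from `review-ticket` and `avatar_alt`, which appear in this thread, the exact key spellings (`reviewers`, `editors`, and so on) are assumptions. The sample front matter is made up.

```python
import re

# Fields named in this thread; spellings other than review-ticket and
# avatar_alt are assumptions about the lesson template.
REQUIRED_KEYS = ["reviewers", "editors", "review-ticket", "difficulty",
                 "activity", "topics", "abstract", "avatar_alt"]
PLACEHOLDER = "LEAVE BLANK"
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):(.*)$")


def front_matter(markdown: str) -> dict[str, str]:
    """Collect top-level `key: value` pairs between the opening and closing '---'.

    Indented lines (e.g. list items under reviewers:) are skipped, so a key
    whose value is a list is recorded with an empty string.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        match = TOP_LEVEL_KEY.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def missing_metadata(markdown: str) -> list[str]:
    """Return required keys that are absent or still hold the placeholder."""
    fields = front_matter(markdown)
    return [key for key in REQUIRED_KEYS
            if fields.get(key, PLACEHOLDER) == PLACEHOLDER]


lesson = """---
title: Working with batches of PDF files
review-ticket: LEAVE BLANK
difficulty: 2
---
Lesson body...
"""
print(missing_metadata(lesson))
```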
Happy New Year, everyone! Many thanks for the heads-up, @svmelton! I'm now working on these. @maehr, could you provide a short abstract or description of the lesson (have a look at https://programminghistorian.org/en/lessons/ for examples)?
@amsichani I have tried to capture the essence. Feel free to enhance or correct my version: Learn how to perform OCR and text extraction with free command-line tools like Tesseract and Poppler, and how to get an overview of large numbers of PDF documents using topic modeling.
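The two routes to text named in the abstract can be sketched in Python for a whole folder of PDFs: Poppler's `pdftotext` copies an existing text layer, while scanned pages need OCR, for which Poppler's `pdftoppm` first renders each page as an image that Tesseract can read. This is a minimal sketch, assuming both tools are installed and on the PATH; the folder layout and function names are hypothetical rather than the lesson's own commands, and only the basic `-r`, `-png` and `-l` options are used.

```python
import shutil
import subprocess
from pathlib import Path


def text_layer_command(pdf: Path, out_dir: Path) -> list[str]:
    """pdftotext (Poppler) writes the PDF's embedded text to <name>.txt."""
    return ["pdftotext", str(pdf), str(out_dir / f"{pdf.stem}.txt")]


def render_pages_command(pdf: Path, img_dir: Path, dpi: int = 300) -> list[str]:
    """pdftoppm (Poppler) renders every page as <name>-<page number>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(img_dir / pdf.stem)]


def ocr_command(image: Path, out_dir: Path, lang: str = "eng") -> list[str]:
    """tesseract OCRs one page image; it appends .txt to the output base name."""
    return ["tesseract", str(image), str(out_dir / image.stem), "-l", lang]


def require(*tools: str) -> None:
    """Fail early if a command-line tool is not on the PATH."""
    for tool in tools:
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} is not installed or not on the PATH")


def extract_text_layers(pdf_dir: Path, out_dir: Path) -> None:
    """Born-digital PDFs: one .txt file per PDF."""
    require("pdftotext")
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        subprocess.run(text_layer_command(pdf, out_dir), check=True)


def ocr_scanned_pdfs(pdf_dir: Path, img_dir: Path, out_dir: Path,
                     lang: str = "eng") -> None:
    """Scanned PDFs: render pages to PNG, then one .txt file per page image."""
    require("pdftoppm", "tesseract")
    img_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        subprocess.run(render_pages_command(pdf, img_dir), check=True)
    for image in sorted(img_dir.glob("*.png")):
        subprocess.run(ocr_command(image, out_dir, lang), check=True)
```

For example, `extract_text_layers(Path("pdfs"), Path("txt"))` would process every PDF in a hypothetical `pdfs` folder. Building each command as a list rather than a shell string avoids quoting problems with file names that contain spaces.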
avatar_alt is the title of the lesson's avatar image, filled in once the editor has selected the image (as per this PR).
Hi @svmelton, all metadata is now in place.
Thanks so much, @amsichani! Exciting news: we'll be able to pilot our new external copyediting with this piece! We're getting it set up, but I'll let you know ASAP when we have a timeline. Thanks for everyone's patience; I'm excited to have this piece as our first professionally copyedited publication!
Thanks for your patience everyone. The copyeditor has now had a chance to look through the text and make suggestions based on the styleguide. I've attached the PDF with comments to this ticket. Her instructions include:
This is our first copyedited lesson, so I think the best thing is for @amsichani and @maehr to incorporate the suggestions and discuss between themselves anywhere they disagree or need further conversation. Once you're both happy, you can proceed with the rest of the publication process.
Hi @amsichani and @acrymble, I really love the change requests made by the copyeditor. As a non-native speaker, this is a blessing! I corrected everything according to the notes. The last section (Mueller Report) needs some more attention. Thanks a lot!
Fantastic! I will work on this over the next couple of days and ping you if I have any questions. :)
And we're published! Thanks to everyone for your work, I'm excited to see this live!
The Programming Historian has received the following proposal for a lesson on 'Working with batches of PDF files' by @maehr. This lesson is now under review and can be read at:
http://programminghistorian.github.io/ph-submissions/lessons/working-with-batches-of-pdf-files
Please feel free to use the line numbers provided on the preview if that helps with anchoring your comments, although you can structure your review as you see fit.
@amsichani will act as editor. Her role is to solicit two reviews from the community and to manage the discussions, which should be held here on this forum.
Members of the wider community are also invited to offer constructive feedback, which should be posted to this message thread, but they are asked to first read our Reviewer Guidelines (http://programminghistorian.org/reviewer-guidelines) and to adhere to our anti-harassment policy (below). We ask that all reviews stop after the second formal review has been submitted so that the author can focus on any revisions. I will make an announcement on this thread when that has occurred.
I will endeavor to keep the conversation open here on Github. If anyone feels the need to discuss anything privately, you are welcome to email me. You can always turn to @amandavisconti if you feel there's a need for an ombudsperson to step in.
Anti-Harassment Policy
This is a statement of the Programming Historian's principles and sets expectations for the tone and style of all correspondence between reviewers, authors, editors, and contributors to our public forums.