Frances Harrington

Hi! This is my project where I will be analyzing documents scraped from the Latin Library website.

Visit (Emily)

  • I like the whole idea of your project; it's a really interesting concept. I also like how you commented your spider script, which made it easy to follow.
  • I know this is a work in progress, but it would be nice to see a notebook with a collection of the data.
  • I learned a lot about building a spider, and also that there are an awful lot of declensions in Latin!

Response:
  • Thank you for your comments! Yeah, Latin has 5 (6 if you count the vocative) cases!

Visit (Emma)

  • Based on the progress report, project plan, and notebook, you definitely seem to know what you're working with and what you're looking to get out of the data.
  • The data frame might be a little easier to read if the data was cleaned up a little (removing the ‘\n’ and ‘\t’), unless you want those in there for a reason!
  • I honestly didn't even think of making a scraper in a Jupyter notebook (I made mine using the command line), so this was definitely helpful to see in action.

Response:
  • Thanks for your response! I did end up cleaning the data (you're right, it makes it difficult to read); there's a rough sketch of that cleanup step below. I didn't even think about building a scraper through the command line, that's so neat!
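
A minimal sketch of the whitespace cleanup mentioned above, assuming the scraped text sits in a pandas DataFrame like latlib_df with a column named "text" (the column name and sample row are just placeholders):

```python
import pandas as pd

# Stand-in for a row of scraped Latin Library text
latlib_df = pd.DataFrame({"text": ["arma virumque cano\n\tTroiae qui primus ab oris"]})

# Replace newlines/tabs with spaces, collapse runs of whitespace, and trim the ends
latlib_df["text"] = (
    latlib_df["text"]
    .str.replace(r"[\n\t]+", " ", regex=True)
    .str.replace(r"\s{2,}", " ", regex=True)
    .str.strip()
)

print(latlib_df.loc[0, "text"])
# arma virumque cano Troiae qui primus ab oris
```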

Visit (Sonia)

  • Your code is really easy to follow since you show plenty of samples of the data structures and other objects as you're working with them. It's easy to skim through and still get the gist at a glance.
  • I was wondering, is there a way you're going to automatically label the works in your dataset for what era and genre they are? And similarly with metric conformity in the text as mentioned in your project plan. It might be time-consuming to do this manually unless you're just planning to pick a few examples.
  • For some reason I always imagined web crawling/scraping spiders as being really slow, so it was good/surprising to see yours only taking about 10 seconds.

Response:
  • Thank you for your response! Most of the author pages are labeled with the time period they wrote in, so I think I'll only have to fill in maybe 5 manually at most. The rest I'm trying not to do by hand, and I might try out some very basic machine learning to differentiate between verse and prose.

Visit (Abby)

  • Your overall plan/idea is extremely precise and pointed (get it??). I think by focusing on one word (or synset, since there are 5 separate words for it), you'll be able to draw some very interesting and definitive conclusions from the data.
  • I'm a little confused by the choice to do dataframe index separated by a "|" for each declension. Wouldn't it save a lot of typing to use for-looping and dictionaries?
  • I had no idea that "|" is just an "or" operator. I'm sure I learned it at one point, then forgot it somewhere along the way.

Response:
  • Thanks for your comments! A for-loop probably would have been less time-consuming, but I just chose to use the | because I remembered it from doing regular expressions (there's a small sketch of that below).
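
For anyone reading along, here is a rough sketch of how the "|" alternation works, with a made-up column name and a handful of declined forms as placeholders:

```python
import pandas as pd

# A few declined forms of one target word (illustrative only)
forms = ["carmen", "carminis", "carmini", "carmine", "carmina", "carminum", "carminibus"]

# "|" is regex alternation ("or"), so the joined pattern matches any of the forms
pattern = "|".join(forms)

latlib_df = pd.DataFrame({"text": ["carmina non prius audita canto", "odi et amo"]})

# Count how many times any of the forms appears in each text
latlib_df["matches"] = latlib_df["text"].str.count(pattern)
print(latlib_df)
```

Building the pattern with "|".join(...) over a list (or a dictionary mapping each lemma to its forms) is roughly the for-loop/dictionary shortcut Abby mentioned.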

Visit (Michael)

  • I like how specific this project is. It makes everything very easy to follow as it is very clear what the data is, what is being looked at, and what the analysis will cover. Good job!
  • I'm a little confused why ne_df is separate from latlib_df, as it seems to contain some relevant info for a text. I recommend combining those (there's a rough merge sketch at the end of this page), though it might mean altering the spider a touch.
  • I can appreciate the effort put in to capture all the declensions in the text as that seems like a nightmare to deal with!
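
A minimal sketch of the merge Michael suggested, assuming ne_df and latlib_df share a key column such as the work's title (all column names and sample values here are placeholders):

```python
import pandas as pd

# Hypothetical stand-ins for the two frames; the real columns may differ
latlib_df = pd.DataFrame({"title": ["Aeneid", "Metamorphoses"],
                          "text": ["arma virumque cano ...", "in nova fert animus ..."]})
ne_df = pd.DataFrame({"title": ["Aeneid", "Metamorphoses"],
                      "era": ["Augustan", "Augustan"]})

# Fold the extra per-text info into the main frame without touching the spider
combined_df = latlib_df.merge(ne_df, on="title", how="left")
print(combined_df)
```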