Figure out which images we are missing on AWS and copy them up #158
Comments
I did a very rough estimate based on the progress bar currently loading CardImages for guide cards and got 4,375,000. I think the original number reflects only the images for subguide cards, so the total may be something like 5.5 to 6 million.
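This assumes the README's ~1.5 million figure (from the issue description) counts only subguide-card images. Under that assumption, the rough arithmetic behind that range is:

$$
\underbrace{4{,}375{,}000}_{\text{guide cards (progress-bar estimate)}} + \underbrace{{\sim}1{,}500{,}000}_{\text{subguide cards (README)}} \approx 5{,}875{,}000
$$

That total falls inside the 5.5 to 6 million guess.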
```shell
deploy@imagecat-staging1:/var/tmp/imagecat$ for i in {1..22}; do echo disk${i}; find disk${i} -type f -name "*.tiff" | wc -l ; done
```
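The same per-disk count can be done in Ruby, which also produces a grand total in one step. The following is a minimal sketch. `tiff_counts` is a hypothetical helper, and the layout (`disk1` through `disk22` under one root) is taken from the shell loop above. It differs from `find` in two ways: it skips dotfiles, and a missing disk directory yields 0 instead of an error.

```ruby
# Hypothetical helper: count regular *.tiff files in each diskN directory
# under `root`, mirroring the find | wc -l loop above.
def tiff_counts(root, disks = 1..22)
  disks.each_with_object({}) do |i, counts|
    # "**/" also matches zero directories, so files directly in diskN count too.
    pattern = File.join(root, "disk#{i}", "**", "*.tiff")
    # Dir.glob skips dotfiles by default (find would count them) and returns
    # [] for a missing directory; File.file? drops directories named *.tiff.
    counts["disk#{i}"] = Dir.glob(pattern).count { |path| File.file?(path) }
  end
end
```

On the staging box, for example, `tiff_counts("/var/tmp/imagecat").values.sum` would give the total number of local TIFFs to compare against the README figure.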
From AWS: there's a report of all the content in the S3 bucket, which will be generated into the dls-transfer bucket. The first one might take up to two days to generate; after that it's daily.
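Ran the following (ignore the …):

```ruby
require "set"

# The CSVs in tmp/ list the S3 objects, with the object key in the last column.
files = Dir.glob("tmp/*.csv")
all_lines = files.flat_map do |f|
  File.read(f).split("\n").map do |entry|
    # Map the flattened key back to a local path: drop "imagecat-" and quotes,
    # turn "-" back into "/", and restore the ".tiff" extension.
    corrected = entry.split(",").last.gsub("imagecat-", "").gsub('"', "").gsub("-", "/").gsub(/tif$/, "tiff")
    "./#{corrected}"
  end
end; nil
set = Set.new(all_lines); nil
# Local file list, one path per line
local_files = Set.new(File.read("tmp/file_list.txt").split("\n")); nil
# Files present locally but not in S3
diff = local_files - set; nil
```

That gave 6,557 missed files.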
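If this check needs to be rerun once the daily report lands, the console session can be packaged as reusable methods. The following is a sketch, not the exact code that was run. The method names are hypothetical, and the key-to-path mapping copies the `gsub` chain above, including its assumption that every `-` in a key stands for `/`. Ruby's `CSV` handles the quoting that the original stripped by hand, and blank lines are skipped.

```ruby
require "csv"
require "set"

# Hypothetical helper: "imagecat-disk1-foo-bar.tif" -> "./disk1/foo/bar.tiff".
# Like the console session, assumes "-" in a key always stands for "/".
def key_to_local_path(key)
  "./" + key.delete_prefix("imagecat-").tr("-", "/").sub(/tif\z/, "tiff")
end

# Collect local-style paths for every object key in the CSV reports.
# key_column defaults to the last column, as in the original.
def s3_local_paths(csv_paths, key_column: -1)
  csv_paths.each_with_object(Set.new) do |path, set|
    CSV.foreach(path) do |row|
      key = row[key_column]
      set << key_to_local_path(key) if key
    end
  end
end

# Paths present on local disk but absent from S3.
def missing_from_s3(file_list_path, csv_paths)
  local = Set.new(File.readlines(file_list_path, chomp: true))
  local - s3_local_paths(csv_paths)
end
```

For example, `missing_from_s3("tmp/file_list.txt", Dir.glob("tmp/*.csv")).size`. With the same inputs, this should match `diff` as long as `imagecat-` appears only as a key prefix.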
Those 6,557 files have been uploaded to puliiif-production and puliiif-staging via #192. There were some permission problems (the files in tmp were owned by pulsys, but I had to run the commands as deploy), so I had to adjust permissions to get them to upload, but they're there now.
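The following is a rough sketch of what copying the missing files up to both buckets could look like. It is not how #192 did it. Two parts are assumptions: `local_path_to_key` is a hypothetical inverse of the `gsub` chain in the console session above, and the client is anything with an aws-sdk-s3 style `put_object(bucket:, key:, body:)` method, such as `Aws::S3::Client.new`. Paths are opened relative to the current directory, so run this from the directory the file list was built in.

```ruby
# Hypothetical inverse of the console session's mapping:
# "./disk3/d.tiff" -> "imagecat-disk3-d.tif" (naming scheme inferred, not confirmed).
def local_path_to_key(path)
  "imagecat-" + path.delete_prefix("./").tr("/", "-").sub(/tiff\z/, "tif")
end

# Upload each local path to every bucket. `client` only needs an
# aws-sdk-s3 style put_object(bucket:, key:, body:) method.
def upload_missing(client, paths, buckets: %w[puliiif-production puliiif-staging])
  paths.each do |path|
    key = local_path_to_key(path)
    buckets.each do |bucket|
      # Reopen per bucket so each upload reads the file from the start.
      File.open(path, "rb") { |io| client.put_object(bucket: bucket, key: key, body: io) }
    end
  end
end
```

Taking the client as a parameter keeps the loop testable with a fake and leaves credentials and region setup to whoever constructs the real client.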
Ah, the files weren't missing; they just aren't called …

The reason is that there are … There's also a …
The README says there are ~1.5 million images. This is probably just the images for the subguide cards. Figure out how many there actually are and update the README. As part of this, validate that we successfully got all the images.
Do all of the following; the numbers should align:
CardImage.count
If we do all 3 of these, we can validate that we successfully copied everything to AWS and then loaded all of it into the prod database.
If those 3 counts are off, figure out what's missing and get it copied or loaded.
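The following is a small sketch for comparing the counts and flagging which one falls short. Only `CardImage.count` survives in the list above. The other two names used here (local files and S3 objects) are assumptions based on the "copied everything to AWS and then loaded into the prod database" sentence. `count_report` is a hypothetical helper.

```ruby
# Hypothetical helper: given named counts, report whether they all agree
# and how far each one is below the largest (counts that match are omitted).
def count_report(counts)
  max = counts.values.max
  shortfalls = counts.transform_values { |n| max - n }.reject { |_, gap| gap.zero? }
  { aligned: shortfalls.empty?, shortfalls: shortfalls }
end
```

From the console session above, this could be called as `count_report(local: local_files.size, s3: set.size, db: CardImage.count)`. Any entry in `shortfalls` indicates which step to dig into.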