
Chapter 8 - data.py #7

Open
TheStoneMX opened this issue Jul 14, 2019 · 10 comments

@TheStoneMX

TheStoneMX commented Jul 14, 2019

Hi there,
I was trying to run the code, but it does not work. In line 41 you are looking for:

image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')]

But there are no PNG images in the dataset I downloaded from Kaggle; all the images are in JPEG format.
And in the list you build in:

for im in image_names:
    if im.endswith('.jpeg') and not im.startswith('cut_') and not 'cut_' + im in image_names:
        raw_images.append(im)

raw_images does not get used at all....

I am trying to understand why you are looking for 'cut_'; there is no image that starts or ends with 'cut_'.

Can you please help me get a working version?

Thanks.
Oscar.

@miaecle
Contributor

miaecle commented Jul 14, 2019

Hi Oscar,
Thanks for raising the issue! In the pipeline we first read all raw images (those without the cut_ prefix and in JPEG format) and run a preprocessing step that cuts them. This is done by calling the cut_raw_images function (line 35), which generates the cut_*.png files that are read in the next step. Let me know if you have any problems with this step.
Best,
Michael
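A minimal sketch of the file-selection logic described above, assuming a flat directory listing in which raw inputs are `*.jpeg` files and preprocessed outputs are named `cut_<stem>.png` in the same folder. The helper name `split_image_names` is hypothetical, not part of data.py. One detail worth noting: the quoted check `'cut_' + im in image_names` keeps the `.jpeg` extension, so it would not match a `cut_*.png` output; the sketch compares against the PNG name instead.

```python
import os


def split_image_names(names):
    """Split a directory listing into raw JPEG inputs and cut PNG outputs.

    Raw images: '*.jpeg' files without the 'cut_' prefix.
    Cut images: 'cut_*.png' files written by the preprocessing step.
    A raw image is only returned if its cut PNG is not in the listing yet.
    """
    cut_images = sorted(n for n in names if n.startswith('cut_') and n.endswith('.png'))
    cut_set = set(cut_images)
    raw_images = sorted(
        n for n in names
        if n.endswith('.jpeg') and not n.startswith('cut_')
        # Compare against the PNG name, e.g. '10_left.jpeg' -> 'cut_10_left.png'.
        and 'cut_' + os.path.splitext(n)[0] + '.png' not in cut_set
    )
    return raw_images, cut_images
```

For example, the listing `['10_left.jpeg', '10_right.jpeg', 'cut_10_left.png', 'trainLabels.csv']` yields `(['10_right.jpeg'], ['cut_10_left.png'])`: only the image that still needs cutting is treated as raw.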

@TheStoneMX
Author

TheStoneMX commented Jul 15, 2019

Hi Miaecle,

Thanks for your quick response! This is the problem I am having.....

image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')]
-- it returns zero images --

-- image_names before --
[screenshot: image_names_before]

-- image_names after running image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')] --
[screenshot: image_names_after]

And you can see in the lower-right pane that cut_raw_images is reading all the images.

Hope you can help me.

Thanks,
-Oscar-

@TheStoneMX
Author

Hi there,

I looked into it more, and this is what I found. Is there anything I am missing? Look at the screenshot
[screenshot: code_never_executed]
and at the code and the top-left pane to see the variable names.

I hope you can help me run this code; I am very interested in the DeepChem library.

Thanks,
-Oscar

@miaecle
Copy link
Contributor

miaecle commented Jul 15, 2019

@TheStoneMX Oh, I think I found what might be going wrong. I don't have the code with me right now, but I will try to run it later today. Can you check whether there is a new folder called cut under your raw-image path? All the preprocessed images (cut_*.png) might be stored there. If you move them into the root folder (the same folder as the raw images), it might run.

@TheStoneMX
Author

@miaecle, thanks for the response. There is a cut directory, but there is nothing in it because the code never gets executed....

if os.path.join(path, 'cut_' + os.path.splitext(img_path)[0] + '.png'):
    continue

#### THIS CODE BELOW NEVER GETS EXECUTED #####
img = cv2.imread(os.path.join(path, img_path))
edges = cv2.Canny(img, 10, 30)
coords = zip(*np.where(edges > 0))
n_p = len(coords)

coords.sort(key=lambda x: (x[0], x[1]))
center_0 = int((coords[int(0.01 * n_p)][0] + coords[int(0.99 * n_p)][0]) / 2)
coords.sort(key=lambda x: (x[1], x[0]))
center_1 = int((coords[int(0.01 * n_p)][1] + coords[int(0.99 * n_p)][1]) / 2)

edge_size = min([center_0, img.shape[0] - center_0, center_1, img.shape[1] - center_1])
img_cut = img[(center_0 - edge_size):(center_0 + edge_size), (center_1 - edge_size):(center_1 + edge_size)]
img_cut = cv2.resize(img_cut, (512, 512))
cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)
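Why the block above never runs can be shown without OpenCV: `os.path.join()` only builds a path string, and any non-empty string is truthy, so `if os.path.join(...): continue` skips every image. Below is a minimal sketch of the intended check, assuming the cut PNGs live in one output directory that is used for both the existence check and the write (the quoted snippet checks `path` but writes to `path + '/cut/'`). The helper names are hypothetical.

```python
import os


def cut_output_path(out_dir, img_name):
    """Output path for a raw image, e.g. '10_left.jpeg' -> '<out_dir>/cut_10_left.png'."""
    stem = os.path.splitext(img_name)[0]
    return os.path.join(out_dir, 'cut_' + stem + '.png')


def needs_cutting(out_dir, img_name):
    """True if the cut PNG for img_name has not been written yet."""
    # The quoted check `if os.path.join(...)` tests a non-empty string, which is
    # always True; os.path.exists() asks the filesystem instead.
    return not os.path.exists(cut_output_path(out_dir, img_name))
```

Inside the loop, `if not needs_cutting(cut_dir, img_path): continue` would then skip only images whose PNG already exists, where `cut_dir` is the same directory later passed to `cv2.imwrite`.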

Thanks,
I hope to get the updated code soon so I can run the sample chapter.

-Oscar.

@TheStoneMX
Author

@miaecle ,

Why do we need to convert every image to PNG instead of leaving it as JPEG?

Thanks,
-Oscar.

@miaecle miaecle mentioned this issue Jul 15, 2019
@miaecle
Contributor

miaecle commented Jul 15, 2019

@TheStoneMX So it is not a JPEG/PNG issue; basically, we need to cut the images so that they fit into the network. Please see PR #8 for the quick fix; the data loading part should be clean now. Let me know if you find any further issues.
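The cutting step in the quoted code finds edges with `cv2.Canny`, takes the 1st- and 99th-percentile edge rows and columns as the extent of the image content, and crops the largest square around their midpoint that stays inside the image before resizing to 512x512. Below is a minimal NumPy-only sketch of the center and crop-size computation, assuming `edges` is the 2-D array returned by `cv2.Canny`; the function name is hypothetical. It replaces the `zip()`/`list.sort()` approach, which fails on Python 3 (`zip()` returns an iterator with no `len()`) and sorts every edge pixel as a Python tuple, with `np.sort` on the row and column index arrays. Reading one coordinate at rank k gives the same value as the original sort on `(row, col)` or `(col, row)` tuples.

```python
import numpy as np


def estimate_crop_box(edges):
    """Return (center_0, center_1, edge_size) for a square crop around the edge pixels.

    Mirrors the quoted logic: the center along each axis is the midpoint of the
    1st- and 99th-percentile edge coordinates, and edge_size is the largest
    half-width that keeps the square inside the image.
    """
    rows, cols = np.nonzero(edges > 0)
    n_p = rows.size
    if n_p == 0:
        raise ValueError("no edge pixels found")
    lo, hi = int(0.01 * n_p), int(0.99 * n_p)
    # Sorting each index array on its own is enough, because only one
    # coordinate is read back after each of the original tuple sorts.
    r = np.sort(rows)
    c = np.sort(cols)
    center_0 = int((int(r[lo]) + int(r[hi])) / 2)
    center_1 = int((int(c[lo]) + int(c[hi])) / 2)
    height, width = edges.shape[:2]
    edge_size = min(center_0, height - center_0, center_1, width - center_1)
    return center_0, center_1, edge_size
```

With the returned values, the crop is `img[center_0 - edge_size:center_0 + edge_size, center_1 - edge_size:center_1 + edge_size]`, followed by `cv2.resize(img_cut, (512, 512))` as in the quoted code.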

@TheStoneMX
Author

@miaecle Thanks a lot for the fix! It is working now, but there is one more thing that needs fixing.... sorry.

I found that the code wasn't writing any images to disk, and that cv2.imwrite does not raise an exception when it can't find the path.

try:
    cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)
except:
    logger.critical("error - cv2.imwrite")
    continue

Looking at the code, I found that it creates the cut directory one level above train, but it tries to write to train/cut/, i.e. inside the train directory.
So I created the directory manually, and everything is working now: PNG images are being written to the cut directory.
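A minimal sketch of the two fixes described above, assuming the cut images should go to a `cut` folder inside the training directory: create that folder up front with `os.makedirs(..., exist_ok=True)`, and check the boolean that `cv2.imwrite` returns, since a failed write is reported through that return value rather than an exception, so the `try`/`except` never fires. To keep the sketch runnable without OpenCV, the writer is passed in as a parameter (e.g. `cv2.imwrite`); the helper names are hypothetical.

```python
import os


def prepare_cut_dir(train_dir):
    """Create <train_dir>/cut if it is missing and return its path."""
    cut_dir = os.path.join(train_dir, 'cut')
    os.makedirs(cut_dir, exist_ok=True)  # no error if it already exists
    return cut_dir


def checked_write(write_fn, out_path, img):
    """Call write_fn(out_path, img) and raise if it reports failure.

    cv2.imwrite returns False (instead of raising) when it cannot write the
    file, e.g. because the target directory does not exist.
    """
    if not write_fn(out_path, img):
        raise IOError(f"could not write {out_path}")
```

In the loop, this would look like `checked_write(cv2.imwrite, os.path.join(cut_dir, 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)`, with `cut_dir = prepare_cut_dir(path)` computed once before the loop.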

@TheStoneMX
Author

"So it is not a jpeg/png issue"

I never said it was a JPEG/PNG issue....

The question is why you need to feed the network PNGs and not just JPEGs. The way it is being done now, it takes about 4 days to write 35 thousand images to disk: it has been 12 hours since I started, and I have only written 2764 images.... And I have an X299 board with an X9960 processor and an SSD....

I can't imagine how long it would take someone with a slower computer and a regular hard drive, unless I am missing something here.

Thanks for your great support! I will make sure I mention it in my Amazon review of the book.

@miaecle
Contributor

miaecle commented Jul 16, 2019

@TheStoneMX I see what you mean, thanks for the feedback! I will try to optimize the pipeline to accelerate the preprocessing step.
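One possible way to speed up the preprocessing, sketched under assumptions: run the per-image work concurrently with `concurrent.futures.ThreadPoolExecutor`. OpenCV's Python bindings generally release the GIL inside calls such as `cv2.imread`, `cv2.Canny` and `cv2.imwrite`, so threads can overlap that work, while pure-Python steps such as sorting lists of tuples would not benefit. `process_one` is a hypothetical callable that cuts one image and returns `True` on success.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess_all(raw_names, process_one, max_workers=None):
    """Apply process_one to every raw image name concurrently.

    Returns the names for which process_one returned a falsy value, in input
    order. max_workers=None lets ThreadPoolExecutor pick its default pool size.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as raw_names.
        for name, ok in zip(raw_names, pool.map(process_one, raw_names)):
            if not ok:
                failures.append(name)
    return failures
```

If most of the per-image time is spent in pure Python, `ProcessPoolExecutor` is the usual alternative, at the cost of pickling arguments between processes.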
