Chapter 8 - data.py #7

TheStoneMX · 2019-07-14T01:15:49Z

Hi there,
I was trying to run the code, but it does not run. In line 41 you are looking for

image_names = [p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png')]

But there are no png images in the repo that I downloaded from Kaggle; all the images are in jpeg format.
And the list you build in:

for im in image_names:
    if im.endswith('.jpeg') and not im.startswith('cut_') and not 'cut_' + im in image_names:
        raw_images.append(im)

The raw_images list does not get used at all.

I am trying to understand why you are looking for 'cut_'; there is no image whose name starts or ends with 'cut_'.

Can you please help me get a working version.

Thanks.
Oscar.

miaecle · 2019-07-14T22:24:00Z

Hi Oscar,
Thanks for raising the issue! In the pipeline we first read all raw images (those without the cut_ prefix, in jpeg format) and run a preprocessing step that cuts them. This is done by calling the cut_raw_images function (line 35), which generates the cut_*.png files that are read in the next step. Let me know if you have any problems with this step.
Best,
Michael
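
As a rough illustration of the two-stage flow described above, the sketch below splits a directory listing into raw images that still need cutting and cut_*.png files that are ready to load. The function name `split_image_names` is hypothetical and not part of the chapter's data.py; only the cut_ prefix and the .jpeg/.png extensions come from this thread.

```python
import os


def split_image_names(names):
    """Split file names into (raw, cut) lists using the naming from this thread."""
    # Preprocessed images: cut_ prefix and .png extension.
    cut = [n for n in names if n.startswith('cut_') and n.endswith('.png')]
    cut_set = set(cut)

    # Raw images: .jpeg files without the prefix that have no cut_*.png yet.
    raw = []
    for n in names:
        if not n.endswith('.jpeg') or n.startswith('cut_'):
            continue
        expected = 'cut_' + os.path.splitext(n)[0] + '.png'
        if expected not in cut_set:
            raw.append(n)
    return raw, cut


raw, cut = split_image_names(['a.jpeg', 'b.jpeg', 'cut_a.png', 'notes.txt'])
```

Here `raw` holds only the images that still need the cutting step, so rerunning the pipeline would not redo finished work.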

TheStoneMX · 2019-07-15T03:25:50Z

Hi Miaecle,

Thanks for your quick response! This is the problem I am having:

image_names = [ p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png') ]
-- it returns zero items --

[Screenshot: image_name before]

[Screenshot: image_name after image_names = [ p for p in os.listdir(images_path) if p.startswith('cut_') and p.endswith('.png') ]]

You can see in the lower-right pane that 'cut_raw_images' is reading all the images.

Hope you can help me.

Thanks,
-Oscar-

TheStoneMX · 2019-07-15T16:57:19Z

Hi There,

I looked into it more, and this is what I found. Is there anything I am missing? Look at the screenshot and the code, and at the top-left pane to see the variable names.

I hope you can help me run this code, I am very interested in the DeepChem library.

Thanks,
-Oscar

miaecle · 2019-07-15T17:03:49Z

@TheStoneMX Oh, I think I found where it might go wrong. I don't have the code with me right now, but I will try to run it later today. Can you check whether there is a new folder called cut under your path for raw images? All the preprocessed images (cut_*.png) might be stored there. If you move these into their root folder (the same folder as the raw images), it might run.

TheStoneMX · 2019-07-15T17:26:20Z

@miaecle, thanks for the response. There is a cut directory, but there is nothing in it because the code never gets executed:

if os.path.join(path, 'cut_' + os.path.splitext(img_path)[0] + '.png'):
  continue

**#### THIS CODE BELOW NEVER GETS EXECUTED #####**
img = cv2.imread(os.path.join(path, img_path))
edges = cv2.Canny(img, 10, 30)
coords = zip(*np.where(edges > 0))
n_p = len(coords)

coords.sort(key=lambda x: (x[0], x[1]))
center_0 = int((coords[int(0.01 * n_p)][0] + coords[int(0.99 * n_p)][0]) / 2)
coords.sort(key=lambda x: (x[1], x[0]))
center_1 = int((coords[int(0.01 * n_p)][1] + coords[int(0.99 * n_p)][1]) / 2)

edge_size = min( [center_0, img.shape[0] - center_0, center_1, img.shape[1] - center_1])
img_cut = img[(center_0 - edge_size):(center_0 + edge_size), (center_1 - edge_size):(center_1 + edge_size)]
img_cut = cv2.resize(img_cut, (512, 512))
cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'),img_cut)
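
One possible reading of why the block above is skipped: `os.path.join` only builds a string, and any non-empty string is truthy, so the `if` always takes the `continue` branch. The sketch below contrasts that with `os.path.exists`, which actually checks the filesystem. Whether the eventual fix uses exactly this check is an assumption; the helper name `already_cut` and the file name `example.jpeg` are hypothetical.

```python
import os
import tempfile


def already_cut(path, img_path):
    """Return True only if the cut_*.png for img_path is really on disk."""
    target = os.path.join(path, 'cut_' + os.path.splitext(img_path)[0] + '.png')
    return os.path.exists(target)


with tempfile.TemporaryDirectory() as path:
    joined = os.path.join(path, 'cut_' + os.path.splitext('example.jpeg')[0] + '.png')
    always_true = bool(joined)                    # non-empty string is truthy
    on_disk = already_cut(path, 'example.jpeg')   # nothing has been written yet
```

With the `os.path.exists` version, the `continue` is only reached for images that were already preprocessed.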

Thanks,
Hope to get the updated code soon, so I can run the sample chapter.

-Oscar.

TheStoneMX · 2019-07-15T17:27:54Z

@miaecle ,

Why do we need to convert every image to png instead of leaving it as a jpeg?

Thanks,
-Oscar.

miaecle · 2019-07-15T21:49:21Z

@TheStoneMX So it is not a jpeg/png issue; basically, we need to cut the image so that it fits into the network. Please see PR #8 for the quick fix; the data loading part should be clean now. Let me know if you find any further issues.

TheStoneMX · 2019-07-16T14:42:11Z

@miaecle Thanks a lot for the fix! It is working now, but there is one more thing that needs fixing, sorry.

I found that the code wasn't writing any images to disk, and that cv2.imwrite does not raise an exception when it can't find the path.

try:
    cv2.imwrite(os.path.join(path + '/cut/', 'cut_' + os.path.splitext(img_path)[0] + '.png'), img_cut)
except:
    logger.critical("error - cv2.imwrite")
    continue

Looking at the code, I found that it creates the cut directory one level above train, but it tries to write to /train/cut/, with cut inside the train directory.
So I created the directory manually, and now everything works: the png images are written to the cut directory.
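
Since the write call reports failure through its return value rather than an exception, one way to surface the problem is to create the target directory first and check the result. The sketch below is a minimal illustration, not the repository's code; `write_checked` is a hypothetical helper, and the writer is passed in as an argument so that the example runs without OpenCV installed.

```python
import os


def write_checked(write_fn, out_path, img):
    """Create the output directory, call write_fn, and fail loudly on a falsy result."""
    out_dir = os.path.dirname(out_path)
    if out_dir:
        # Make sure the directory that is written to actually exists.
        os.makedirs(out_dir, exist_ok=True)
    ok = write_fn(out_path, img)
    if not ok:
        raise IOError('could not write ' + out_path)
    return out_path
```

With OpenCV this could be called as `write_checked(cv2.imwrite, os.path.join(path, 'cut', name), img_cut)`, where `name` is the cut_*.png file name.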

TheStoneMX · 2019-07-16T14:49:05Z

> So it is not a jpeg/png issue

I never said it was a jpeg/png issue.

The question is why you need to feed the network pngs and not just jpegs, because the way it is being done, it takes about 4 days to write 35 thousand images to disk. It has been 12 hours since I started and I have only written 2764 images to disk, and I have an x299 board with an X9960 processor and an SSD disk.

I can't imagine how long it will take someone with a slower computer and a regular hard drive, unless I am missing something here.
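
Taking the figures in this comment at face value (2764 images in 12 hours, about 35,000 images in total) and assuming the rate stays constant, the extrapolation can be sketched as below. The function name is hypothetical; at this rate the total works out to roughly 152 hours, or a little over six days.

```python
def estimated_total_hours(done, elapsed_hours, total):
    """Extrapolate total processing time from progress so far, assuming a constant rate."""
    rate = done / elapsed_hours   # images per hour
    return total / rate


hours = estimated_total_hours(done=2764, elapsed_hours=12, total=35000)
days = hours / 24
```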

Thanks for your great support! I will make sure I mention it in my Amazon review of the book.

miaecle · 2019-07-16T18:46:12Z

@TheStoneMX I see what you mean, thanks for the feedback! I will try to optimize the pipeline to accelerate the preprocessing step.
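
One possible direction for speeding this up, not necessarily what the author has in mind, is to process several images at once. The sketch below uses a thread pool from the standard library; `process_all` and `cut_one` are hypothetical names, where `cut_one` stands for whatever function cuts and writes a single image. Whether threads help in practice depends on how much of the per-image work releases the GIL, so a process pool may be worth trying instead.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(img_paths, cut_one, max_workers=4):
    """Apply cut_one to every path using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(cut_one, img_paths))
```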

miaecle mentioned this issue Jul 15, 2019

Fix for issue #8

Merged
