I am curious about the algorithm, especially because it is perceived as an
alternative to existing deep learning algorithms.
I went through the outline of the algorithm and line_patterns.py ("line_1d_alg" from "master")
to understand the algorithm a bit more. Here is my understanding, and my questions, per the code:
An image is read in grayscale, with pixel brightness at X,Y coordinates (X from left
to right; Y from top to bottom)
Cross Comparison
a. Pixels from each row (i.e., pixels from left to right) are selected.
i. The difference between every pair of adjacent pixels (of "brightness", because it is a
grayscale image) is calculated as "d" in "dert".
ii. The deviation of the absolute value of "d" from an "ave" value is computed. This is the "m" value for
that pixel.
("ave" is a constant = 15. Would this value change based on the
task/image?)
The "abs" value of "d" is taken to consider only the magnitude of the
difference, not its direction. So a slight deviation in brightness
(on either the -ve or +ve side) can still be clustered, because it looks the
same to the human eye and is most likely not part of an edge (i.e., a
boundary between two objects).
The "m" value is close to "ave" when the brightness deviation "d"
between adjacent pixels is small. This can happen when:
a. a white pixel is followed by a white pixel of roughly the same
brightness; similarly for a black pixel, or any value in the 0-255
range
The "m" value is low when there is a slightly larger deviation between
the brightness of adjacent pixels.
a. A white pixel is followed by a slightly paler or darker pixel,
e.g., 255 to 244 OR 10 to 25
b. This may indicate a gradual increase or decrease in
brightness (suggesting an edge), or could simply be a noisy
region, where such deviations fluctuate between
pixels over a short range
The "m" value is negative, and can be much larger in magnitude, when there is
strong contrast between the two adjacent pixels.
a. A white pixel to a gray or black pixel (255 to 150 OR
255 to 0 OR 55 to 85)
b. This may indicate a sharp edge between a light
background and a darker object, or could be a noisy speckle
in the image (consider a scanned copy of an application form:
such speckles of black dots can be seen in a poor-quality
scan)
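If my reading of the cross-comparison above is right, it could be sketched roughly as below. This is a minimal sketch of my understanding, assuming "d" is the difference to the previous pixel and m = ave - |d|; the actual code in line_patterns.py may compute these differently (e.g., bilaterally).

```python
AVE = 15  # the "ave" constant from line_patterns.py

def cross_comp(row):
    """Return a list of (p, d, m) 'dert' tuples for one pixel row.

    Sketch only: d = signed difference to the previous pixel,
    m = AVE - |d| (positive when adjacent pixels are similar).
    """
    dert_ = []
    prev_p = row[0]
    for p in row[1:]:
        d = p - prev_p   # signed brightness difference
        m = AVE - abs(d) # inverse deviation: similarity "match"
        dert_.append((p, d, m))
        prev_p = p
    return dert_

# A flat bright run, a sharp drop, then a flat dark run:
for dert in cross_comp([255, 254, 250, 120, 118, 119]):
    print(dert)
```

With these numbers, the small differences (-1, -4, -2, 1) give positive m close to ave, while the 250-to-120 drop gives a strongly negative m, matching the three cases described above.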
iii. form_Pm: as the name indicates, this method forms clusters of pixels using
the sign of "m" (+ve if m > 0, -ve otherwise).
1. When the pixel brightness does not deviate much, i.e., a series of
white or black pixels, or pixels where brightness is gradually
increasing or decreasing, the "m" value is +ve and all these pixels are
clustered into one bucket. This also includes pixels where the brightness
between adjacent pixels changes slightly more, but still within
"ave".
2. There could be singletons (single pixels) or a short run of pixels
where there is a sudden spike (where "m" is negative). In general,
this behavior does not continue over a long run of pixels in a
natural picture, so the cluster size (indicated by L, which is the
count of pixels in the cluster) would be small. But in a scanned copy
of a black-and-white form (e.g., invoices, application forms), there is
a constant flip between black and white pixels; in that case, there
could be very many small clusters.
3. The values "L", "D", "M", "dert" accumulated in a cluster are: 1. the
number of pixels, 2. the sum of all "d"s, 3. the sum of all "m"s, 4. the list of all
"dert"s (a structure with "d", "m", and pixel value "p").
4. Would it also be better to store statistics about the cluster, like the
median, the number of -ve or +ve deviations, etc.? Maybe the
nature of pixel brightness between two clusters could be
understood better by looking at the medians. E.g., a series of white
pixels with similar brightness would exhibit a median value very
close to that of another series of pixels with similar brightness.
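My understanding of the sign-based clustering in form_Pm, as a sketch (the tuple layout here is illustrative; the real code uses a cluster class rather than plain tuples):

```python
def form_Pm(dert_):
    """Cluster consecutive derts by the sign of m.

    Each pattern accumulates L (pixel count), D (sum of d),
    M (sum of m), and the list of its derts.
    """
    P_ = []
    sign, L, D, M, derts = None, 0, 0, 0, []
    for p, d, m in dert_:
        s = m > 0
        if sign is not None and s != sign:  # sign change: close current pattern
            P_.append((sign, L, D, M, derts))
            L, D, M, derts = 0, 0, 0, []
        sign = s
        L += 1
        D += d
        M += m
        derts.append((p, d, m))
    if L:  # close the last open pattern
        P_.append((sign, L, D, M, derts))
    return P_

# Two similar pixels, one contrast spike, one similar pixel -> 3 patterns:
P_ = form_Pm([(10, 1, 14), (11, 1, 14), (200, 189, -174), (201, 1, 14)])
print([(sign, L, D, M) for sign, L, D, M, _ in P_])
```

This matches the singleton behavior described in point 2: the lone negative-m dert becomes its own small cluster between two positive-m clusters.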
iv. If there are more than "4" (or some limit) clusters in a row, the algorithm finds
sub-clusters within each cluster. This is divided into two parts, one for each kind of
cluster:
Cluster with smaller deviation pixels
a. The sub-cluster is formed only when this cluster and the
adjacent cluster have high enough deviation (checked by
whether the value of "M", plus a borrowed "M" value from the adjacent
cluster, deviates by "ave_M" or more). Also, the
number of pixels in that cluster must be more than 4,
which means a sub-layer will not be formed for a cluster of
4 or fewer pixel elements.
b. If the deviation between the adjacent clusters is too small, no
sub-clusters or sub-layers are created.
c. The sub-layers are formed by form_Pm, but on a sparser pixel
range (in this code, every other, even-indexed pixel), via
range_comp. I did not understand the reason for this. Also,
the "m" values are accumulated.
d. intra_Pm is called recursively to create sub-layers within
layers and sub-layers, thus forming a hierarchical cluster?
Cluster with higher deviation pixels
a. A sub-layer is formed only if the M value of this cluster,
combined with the borrowed information from the adjacent cluster, is greater
than "ave_D" (here 5), and if the number of pixel elements
is greater than 3.
b. This indicates that sub-layers are being formed for highly
contrastive areas, where the deviation between every pixel is
very high. Say there is a series of 5 pixels where the
increase in brightness at each step is exactly 16: the m values
would be -1.5, -1.5, -1.5, -1.5, -1.5 and the M value -7.5. I
understand "ave_D" is carefully selected as "5", since
otherwise one of the conditions "L > 3" or "min(-P.M, adj_M) >
ave_D * rdn" wouldn't be satisfied. But what is the function of
rdn here? It is passed as 1.
c. These sub-layers are formed based on the sign of the "d" values rather
than the "m" values, using form_Pd and intra_Pd,
and the "d" values are unilateral, which is done in
deriv_comp.
d. The sub-layer on Pd (intra_Pd) is formed only when the
minimum of the unilaterally accumulated deviation D and the relative
adjacent M value is greater than ave_D (constant = 5). Does
it mean that the adjacent cluster's M value should be at
least 10 times or more? Maybe I am missing something
here.
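Putting the two branches together, here is how I currently read the intra_Pm recursion conditions, as an illustrative sketch only (the constants are placeholders and the real code tracks more state, e.g., rng and the accumulated rdn):

```python
AVE_M, AVE_D = 20, 5  # illustrative thresholds; line_patterns.py defines its own

def intra_Pm(P, adj_M, rdn=1):
    """Decide which kind of sub-layer (if any) a pattern spawns.

    P is a (sign, L, D, M, dert_) tuple as in my form_Pm sketch;
    adj_M is the M borrowed from the adjacent pattern.
    """
    sign, L, D, M, dert_ = P
    if sign:
        # Positive-m pattern: re-compare over a sparser range (range_comp),
        # then form_Pm again on the resulting derts.
        if M > AVE_M * rdn and L > 4:
            sub_dert_ = dert_[::2]  # every other dert, as in range_comp
            return ("range_sub_layer", sub_dert_)
    else:
        # Negative-m pattern: compare d values (deriv_comp), then form_Pd.
        if min(-M, adj_M) > AVE_D * rdn and L > 3:
            return ("deriv_sub_layer", dert_)
    return None  # deviation too weak: no sub-layer
```

So my question about rdn would be: it multiplies both thresholds, which looks like it raises the bar for deeper recursion levels, presumably to account for redundancy between layers?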
After the cross-comparison is done for every row, patterns are searched on these clusters. I
have not looked at this code yet.
I also looked at the cluster class in "class_cluster.py", which is used for defining the dert and
P clusters. But I think the methods in the classes "ClusterStructure" and "MetaCluster" are not
being used.
The above steps are based on my understanding of the code. Again, it is not perfect: some of the
intricate details of why certain values are used, or why some steps are done, are not fully
understood (even though the code is well commented). I would like to understand them in detail.
Maybe a good understanding of the objective of the algorithm and the mathematical concepts
behind it would fill the gaps.
With my limited understanding, I feel the algorithm would be very effective at finding image
similarities, and a good input for object detection or OCR? Maybe it can be extended to
audio/video inputs, which I think the document says. I am not sure about text data; maybe
it is useful for clustering important words (nouns, verbs) in a sentence by clustering all the
conjunctions, interjections, etc. into a similar cluster, provided the gradient between those words
is very small and the gradient between them and the nouns/verbs is higher? But I have limited
understanding of this. I would like to know more and am interested in being part of the team.
Thanks,
Hari