diff --git a/read/extraction-ground-truth/1601.03642.txt b/read/extraction-ground-truth/1601.03642.txt index dfca79e..6e2c10b 100644 --- a/read/extraction-ground-truth/1601.03642.txt +++ b/read/extraction-ground-truth/1601.03642.txt @@ -279,8 +279,8 @@ character by character. If the model is good, the text can have the correct punctuation. This would not be possible with a word predictor. -Character predictors can be implemented with RNNs. In con- -trast to standard feed-forward neural networks like multilayer +Character predictors can be implemented with RNNs. In contrast +to standard feed-forward neural networks like multilayer Perceptrons (MLPs) which was shown in Figure 1(b), those networks are trained to take their output at some point as well as the normal input. This means they can keep some information @@ -398,8 +398,8 @@ The new feature of Emily Howell compared to Emmy is that Emily Howell does not necessarily remain in a single, already known style. -Emily Howell makes use of association network. Cope empha- -sizes that this is not a form of a neural network. However, it +Emily Howell makes use of association network. Cope emphasizes +that this is not a form of a neural network. However, it is not clear from [Cop13] how exactly an association network is trained. Cope mentions that Emily Howell is explained in detail in [Cop05]. diff --git a/read/extraction-ground-truth/1602.06541.txt b/read/extraction-ground-truth/1602.06541.txt index d99f9dc..e24b55f 100644 --- a/read/extraction-ground-truth/1602.06541.txt +++ b/read/extraction-ground-truth/1602.06541.txt @@ -5,9 +5,9 @@ info@martin-thoma.de Abstract—This survey gives an overview over different techniques used for pixel-level semantic segmentation. -Metrics and datasets for the evaluation of segmenta- -tion algorithms and traditional approaches for segmen- -tation such as unsupervised methods, Decision Forests +Metrics and datasets for the evaluation of segmentation +algorithms and traditional approaches for segmentation +such as unsupervised methods, Decision Forests and SVMs are described and pointers to the relevant papers are given. Recently published approaches with convolutional neural networks are mentioned and typical @@ -19,22 +19,22 @@ I. INTRODUCTION Semantic segmentation is the task of clustering parts of images together which belong to the same -object class. This type of algorithm has several use- -cases such as detecting road signs [MBLAGJ+07], -detecting tumors [MBVLG02], detecting medical in- -struments in operations [WAH97], colon crypts segmen- -tation [CRSS14], land use and land cover classifica- -tion [HDT02]. In contrast, non-semantic segmentation -only clusters pixels together based on general character- -istics of single objects. Hence the task of non-semantic +object class. This type of algorithm has several use-cases +such as detecting road signs [MBLAGJ+07], +detecting tumors [MBVLG02], detecting medical instruments +in operations [WAH97], colon crypts segmentation +[CRSS14], land use and land cover classification +[HDT02]. In contrast, non-semantic segmentation +only clusters pixels together based on general characteristics +of single objects. Hence the task of non-semantic segmentation is not well-defined, as many different segmentations might be acceptable. Several applications of segmentation in medicine are listed in [PXP00]. 
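To make the task definition above concrete, a semantic segmentation is typically represented as a dense map of class IDs. A minimal sketch (not from either paper; numpy and the class names are assumptions):

```python
import numpy as np

# A semantic segmentation of an H x W RGB image assigns every pixel
# exactly one of k object classes, giving an H x W map of class IDs.
CLASSES = ["void", "road", "road sign", "tumor"]          # hypothetical class set
image = np.zeros((480, 640, 3), dtype=np.uint8)           # raw pixel input
segmentation = np.zeros(image.shape[:2], dtype=np.int64)  # one class ID per pixel

# A non-semantic segmentation, in contrast, would only hold region IDs
# (0, 1, 2, ...) without attaching an object class to any region.
```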
-Object detection, in comparison to semantic seg- -mentation, has to distinguish different instances of the +Object detection, in comparison to semantic segmentation, +has to distinguish different instances of the same object. While having a semantic segmentation is certainly a big advantage when trying to get object instances, there are a couple of problems: neighboring @@ -81,8 +81,8 @@ such, the classes on which the algorithm is trained is a central design decision. Most algorithms work with a fixed set of classes; -some even only work on binary classes like fore- -ground vs background [RM07], [CS10] or street vs +some even only work on binary classes like foreground +vs background [RM07], [CS10] or street vs no street [BKTT15]. However, there are also unsupervised segmentation @@ -105,8 +105,8 @@ is the glass and behind it the table, even if we only had a single image and were not allowed to move. This means we simultaneously two labels to the coordinates of the glass: Glass and table. Although there is much more -work being done on single class affiliation segmenta- -tion algorithms, there is a publication about multiple +work being done on single class affiliation segmentation +algorithms, there is a publication about multiple class affiliation segmentation [LRAL08]. Similarly, recent publications in pixel-level object segmentation used layered models [YHRF12]. @@ -121,18 +121,18 @@ inference of a segmentation varies by application. • Grayscale vs colored: Grayscale images are commonly used in medical imaging such as -magnetic resonance (MR) imaging or ultrasonog- -raphy whereas colored photographs are obviously +magnetic resonance (MR) imaging or ultrasonography +whereas colored photographs are obviously widespread. • Excluding or including depth data: RGB-D, -sometimes also called range [HJBJ+96] is avail- -able in robotics, autonomous cars and recently +sometimes also called range [HJBJ+96] is available +in robotics, autonomous cars and recently also in consumer electronics such as Microsoft Kinect [Zha12]. -• Single image vs stereo images vs co- -segmentation: Single image segmentation is the +• Single image vs stereo images vs co-segmentation: +Single image segmentation is the most wide-spread kind of segmentation, but using stereo images was already tried in [BVZ01]. It can be seen as a more natural way of segmentation as @@ -149,8 +149,8 @@ of information to find a meaningful segmentation. This idea can be extended to time series such as videos. -• 2D vs 3D: Segmenting images is a 2D segmenta- -tion task where the smallest unit is called a pixel. +• 2D vs 3D: Segmenting images is a 2D segmentation +task where the smallest unit is called a pixel. In 3D data, such as volumetric X-ray CT images as they were used in [HHR01], the smallest unit is called a voxel. @@ -169,11 +169,10 @@ the algorithm finds a fine-grained segmentation. [BJ00], [RKB04], [PS07] describe systems which work in an interactive mode. -(a) Example Scene (b) Visualization of a found seg- -mentation +(a) Example Scene (b) Visualization of a found segmentation -Figure 1: An example of a scene and a possible visu- -alization of a found segmentation. +Figure 1: An example of a scene and a possible visualization +of a found segmentation. III. EVALUATION AND DATASETS @@ -187,8 +186,8 @@ there are other measures of quality which matter when segmentation algorithms are compared. This section gives an overview of those quality measures. 
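All pixel-wise measures defined in the next subsection (per-pixel rate, mean accuracy, mean intersection over union and frequency weighted intersection over union) can be computed from a k × k confusion matrix, using the notation of [LSD14]: n_ij is the number of pixels of class i predicted as class j and t_i = ∑_j n_ij. A minimal numpy sketch, not taken from any of the cited papers:

```python
import numpy as np

def segmentation_metrics(conf):
    """Accuracy measures of [LSD14] from a k x k confusion matrix.

    conf[i, j] = n_ij = number of pixels of class i predicted as class j.
    Classes which do not occur in the ground truth (t_i = 0) would cause
    a division by zero and are assumed to be absent from conf.
    """
    conf = np.asarray(conf, dtype=float)
    n_ii = np.diag(conf)           # correctly classified pixels per class
    t = conf.sum(axis=1)           # t_i: pixels of class i in the ground truth
    pred = conf.sum(axis=0)        # sum_j n_ji: pixels predicted as class i
    iu = n_ii / (t - n_ii + pred)  # per-class intersection over union
    return {
        "per-pixel rate": n_ii.sum() / t.sum(),
        "mean accuracy": np.mean(n_ii / t),
        "mean IU": np.mean(iu),
        "frequency weighted IU": (t * iu).sum() / t.sum(),
    }
```

Unlike the raw per-pixel rate, the last three measures weight every class equally, which addresses the large-dominant-region problem discussed below.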
-1) Accuracy: Showing the correctness of the segmen-
-tation hypotheses is done in most publications about
+1) Accuracy: Showing the correctness of the segmentation
+hypotheses is done in most publications about
semantic segmentation. However, there are a couple
of different ways how this accuracy can be displayed.
One way to give readers a first qualitative impression
@@ -213,20 +212,14 @@
One way to compare segmentation algorithms is
by the pixel-wise accuracy of the predicted
segmentation as done in many publications [SWRC06], [CP08],
-[LSD14]. This is also called per-pixel rate and de-
-fined as
-
-∑k
-i=1 nii∑k
-i=1 ti
-
-. Taking the pixel-wise classification
+[LSD14]. This is also called per-pixel rate and defined
+as ∑^k_{i=1} n_{ii} / ∑^k_{i=1} t_i. Taking the pixel-wise classification
accuracy has two major drawbacks:

P1 Tasks like segmenting images for autonomous cars
have large regions which have one class. This
makes achieving classification accuracies of more
-than 30 % with a priori knowledge only possible.
+than 30% with a priori knowledge only possible.
For example, a system might learn that a certain
position of the image is most of the time “sky”
while another position is most of the time “road”.
@@ -240,45 +233,10 @@ car”
Three accuracy metrics which do not suffer from
problem P1 are used in [LSD14]:
-• mean accuracy: 1k ·
-
-∑k
-i=1
-
-nii
-ti
-∈ [0, 1]
-
-• mean intersection over union:
-1
-k ·
-
-∑k
-i=1
-
-nii
-ti−nii+
-
-∑k
-j=1 nji
-
-∈ [0, 1]
+• mean accuracy: 1/k · ∑^k_{i=1} n_{ii}/t_i ∈ [0, 1]
+• mean intersection over union: 1/k · ∑^k_{i=1} n_{ii}/(t_i − n_{ii} + ∑^k_{j=1} n_{ji}) ∈ [0, 1]
• frequency weighted intersection over union:
-
-(
-∑k
-i=1 ti)
-
-−1 ∑k
-i=1 ti ·
-
-nii
-ti−nii+
-
-∑k
-j=1 nji
-
-∈ [0, 1]
+(∑^k_{i=1} t_i)^{−1} · ∑^k_{i=1} t_i · n_{ii}/(t_i − n_{ii} + ∑^k_{j=1} n_{ji}) ∈ [0, 1]
Another problem might be pixels which cannot be
assigned to one of the known classes. For this reason,
[SWRC06] makes use of a void class. This class gets
@@ -291,8 +249,8 @@
is giving the confusion matrix as done in [SWRC06].
However, this approach is not feasible if many classes
are given.
-The F-measure is useful for binary classifica-
-tion task such as the KITTI road segmentation
+The F-measure is useful for binary classification
+task such as the KITTI road segmentation
benchmark [FKG13] or crypt segmentation as done
by [CRSS14]. It is calculated as “the harmonic mean of
the precision and recall” [PH05]:
@@ -309,12 +267,12 @@
Finally, it should be noted that a lot of other measures
for the accuracy of segmentations were proposed for
non-semantic segmentation. One of those accuracy
measures is Normalized Probabilistic Rand (NPR)
-index which was introduced in [UPH05] and eval-
-uated in [CSI+09] on dermoscopy images. Other
+index which was introduced in [UPH05] and evaluated
+in [CSI+09] on dermoscopy images. Other
non-semantic segmentation measures were introduced
in [MFTM01], but the reason for creating them seems to
-be to deal with the under-defined task description of non-
-semantic segmentation. These accuracy measures try to
+be to deal with the under-defined task description of non-semantic
+segmentation. These accuracy measures try to
deal with different levels of coarsity of the segmentation.
This is much less of a problem in semantic segmentation
and thus those measures are not explained here.
@@ -334,8 +292,8 @@
very hardware, implementation and in some cases even
data specific. For example, [HJBJ+96] notes that their
algorithm needs 10 s on a Sun SparcStation 20.
The fastest CPU ever produced for this system had 200 MHz.
-Comparing this directly with results which were ob-
-tained using an Intel i7-4820K with 3.9 GHz would not
+Comparing this directly with results which were obtained
+using an Intel i7-4820K with 3.9 GHz would not
be meaningful.

However, it does still make sense to mention the
@@ -421,22 +379,22 @@ the object boundaries” [SWRC06].
3) Medical Databases: The Warwick-QU Dataset
consists of 165 images with pixel-level annotation of
-5 classes: “healthy, adenomatous, moderately differen-
-tiated, moderately-to-poorly differentiated, and poorly
+5 classes: “healthy, adenomatous, moderately differentiated,
+moderately-to-poorly differentiated, and poorly
differentiated” [CSM09]. This dataset is part of the
Gland Segmentation (GlaS) challenge.

-The DIARETDB1 [KKV+14] is a dataset of 89 im-
-ages fundus images. Those images show the interior
+The DIARETDB1 [KKV+14] is a dataset of 89 images
+fundus images. Those images show the interior
surface of the eye. Fundus images can be used to detect
diabetic retinopathy. The images have four classes of
coarse annotations: hard and soft exudates, hemorrhages
and red small dots.

-20 test and additionally 20 training retinal fun-
-dus images are available through the DRIVE data
-set [SAN+04]. The vessels were annotated. Addition-
-ally, [AP11] added vascular features.
+20 test and additionally 20 training retinal fundus
+images are available through the DRIVE data
+set [SAN+04]. The vessels were annotated. Additionally,
+[AP11] added vascular features.

The Open-CAS Endoscopic Datasets [MHMK+14]
are 60 images taken from laparoscopic adrenalectomies
@@ -450,22 +408,7 @@
One crowd annotation was obtained for each image
by a majority vote on a pixel basis of 10 segmentations
given by 10 different KWs.

-Training
-Prediction
-
-Post-
-processing
-
-Window-wise
-Classification
-
-Window
-extraction
-
-Data
-augmentationFeature extraction
-
-Preprocessing
+[IMAGE]

Figure 2: A typical segmentation pipeline gets raw
pixel data, applies preprocessing techniques
@@ -487,14 +430,14 @@
classifier which operates on fixed-size feature inputs
and a sliding-window approach [DT05], [YBCK10],
[SCZ08]. This means a classifier is trained on images
of a fixed size. The trained classifier is then fed with
-rectangular regions of the image which are called win-
-dows. Although the classifier gets an image patch of e.g.
-51 px×51 px of the environment, it might only classify
+rectangular regions of the image which are called windows.
+Although the classifier gets an image patch of e.g.
+51 px × 51 px of the environment, it might only classify
the center pixel or a subset of the complete window.
This segmentation pipeline is visualized in Figure 2.

-This approach was taken by [BKTT15] and a major-
-ity of the VOC2007 participants [EVGW+a]. As this
+This approach was taken by [BKTT15] and a majority
+of the VOC2007 participants [EVGW+a]. As this
approach has to apply the patch classifier 512 · 512 =
262 144 times for images of size 512 px×512 px, there
are techniques for speeding it up such as applying a
@@ -510,8 +453,6 @@
Conditional Random Fields (CRFs) which take the
information of the complete image and segment it in
an holistic approach.

-http://host.robots.ox.ac.uk:8080/
-
V. TRADITIONAL APPROACHES

Image segmentation algorithms which use traditional
@@ -526,8 +467,8 @@
Fields in Section V-E and Support Vector Machines
(SVMs) in Section V-D. Postprocessing is covered in
Section V-G.
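The window-wise classification in this pipeline can be sketched in a few lines. This is a naive illustration, not the implementation of [BKTT15]; `classify_patch` stands in for any trained fixed-size classifier that returns the class of the center pixel:

```python
import numpy as np

def sliding_window_segmentation(image, classify_patch, patch_size=51):
    """Classify every pixel from the window centered on it (naive sketch)."""
    r = patch_size // 2
    h, w = image.shape[:2]
    # Reflect-pad so that windows at the image border are well-defined.
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    labels = np.empty((h, w), dtype=np.int64)
    for y in range(h):        # one classifier call per pixel, i.e.
        for x in range(w):    # 512 * 512 = 262144 calls for a 512 px image
            window = padded[y:y + patch_size, x:x + patch_size]
            labels[y, x] = classify_patch(window)
    return labels
```

The quadratic number of classifier calls is exactly why the speed-up techniques mentioned above matter in practice.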
-It should be noted that algorithms can use combina- -tion of methods. For example, [TNL14] makes use of a +It should be noted that algorithms can use combination +of methods. For example, [TNL14] makes use of a combination of a SVM and a MRF. Also, auto-encoders can be used to learn features which in turn can be used by any classifier. @@ -576,9 +517,9 @@ were proposed in [DT05] and are used in [BMBM10], 3) SIFT: Scale-invariant feature transform (SIFT) feature descriptors describe keypoints in an image. The -image patch of the size 16× 16 around the keypoint +image patch of the size 16×16 around the keypoint is taken. This patch is divided in 16 distinct parts of -the size 4× 4. For each of those parts a histogram of +the size 4×4. For each of those parts a histogram of 8 orientations is calculated similar as for HOG features. This results in a 128-dimensional feature vector for each keypoint. @@ -606,23 +547,23 @@ classes like humans. However, it is difficult for classes like airplanes, ships, organs or cells where the human annotators do not know the keypoints. Additionally, the keypoints have to be chosen for every single class. There -are strategies to deal with those problems like viewpoint- -dependent keypoints. Poselets were used in [BMBM10] +are strategies to deal with those problems like viewpoint-dependent +keypoints. Poselets were used in [BMBM10] to detect people and in [BBMM11] for general object detection of the PASCAL VOC dataset. 6) Textons: A texton is the minimal building block of vision. The computer vision literature does not give a strict definition for textons, but edge detectors could be -one example. One might argue that deep learning tech- -niques with Convolution Neuronal Networks (CNNs) +one example. One might argue that deep learning techniques +with Convolution Neuronal Networks (CNNs) learn textons in the first filters. An excellent explanation of textons can be found in [ZGWX05]. -7) Dimensionality Reduction: High-resolution im- -ages have a lot of pixels. Having one or more feature per +7) Dimensionality Reduction: High-resolution images +have a lot of pixels. Having one or more feature per pixel results in well over a million features. This makes training difficult while the higher resolution might not contain much more information. A simple approach @@ -662,8 +603,8 @@ directly be applied on the pixels, when one gives a feature vector per pixel. Two clustering algorithms are k-means and the mean-shift algorithm. -The k-means algorithm is a general-purpose cluster- -ing algorithm which requires the number of clusters to +The k-means algorithm is a general-purpose clustering +algorithm which requires the number of clusters to be given beforehand. Initially, it places the k centroids randomly in the feature space. Then it assigns each data point to the nearest centroid, moves the centroid @@ -673,10 +614,9 @@ described in [Har75]. k-means was applied by [CLP98] for medical image segmentation. -Another clustering algorithm is the mean-shift algo- - -rithm which was introduced by [CM02] for segmen- -tation tasks. The algorithm finds the cluster centers +Another clustering algorithm is the mean-shift algorithm +which was introduced by [CM02] for segmentation +tasks. The algorithm finds the cluster centers by initializing centroids at random seed points and iteratively shifting them to the mean coordinate within a certain range. 
Instead of taking a hard range constraint, @@ -692,8 +632,8 @@ as vertices and an edge weight is a measure of dissimilarity such as the difference in color [FH04], [Fel]. There are several different candidates for edges. -The 4-neighborhood (north, east, south west) or an 8- -neighborhood (north, north-east, east, south-east, south, +The 4-neighborhood (north, east, south west) or an 8-neighborhood +(north, north-east, east, south-east, south, south-west, west, north-west) are plausible choices. One way to cut the edges is by building a minimum spanning tree and removing edges above a threshold. @@ -703,8 +643,8 @@ step, the connected components are the segments. A graph-based method which ranked 2nd in the Pascal VOC 2010 challenge [EVGW+10] is described -in [CS10]. The system makes heavy use of the multi- -cue contour detector globalPb [MAFM08] and needs +in [CS10]. The system makes heavy use of the multi-cue +contour detector globalPb [MAFM08] and needs about 10 GB of main memory [CS11]. 3) Random Walks: Random walks belong to the @@ -770,8 +710,8 @@ branch to descend. Each leaf is a class. One strength of Random Decision Forests compared to many other classifiers like SVMs and neural networks is that the scale of measure of the features (nominal, -ordinal, interval, ratio) can be arbitrary. Another advan- -tage of Random Decision Forests compared to SVMs, +ordinal, interval, ratio) can be arbitrary. Another advantage +of Random Decision Forests compared to SVMs, for example, is the speed of training and classification. Decision trees were extensively studied in the past @@ -794,11 +734,11 @@ according to an error function. Random Decision Forests with texton features (see Section V-A6) are applied in [SJC08] for segmentation. In the [MSC] dataset, they report a per-pixel accuracy -rate of 66.9 % for their best system. This system +rate of 66.9% for their best system. This system requires 415 ms for the segmentation of 320 px×213 px images on a single 2.7 GHz core. On the Pascal VOC 2007 dataset, they report an average per-pixel -accuracy for their best segmentation system of 42 %. +accuracy for their best segmentation system of 42%. An excellent introduction to Random Decision Forests for semantic segmentation is given by [SCZ08]. @@ -807,9 +747,9 @@ D. SVMs SVMs are well-studied binary classifiers which can be described by five central ideas. For those ideas, the -training data is represented as (xi, yi) where xi is the -feature vector and yi ∈ { −1, 1 } the binary label for -training example i ∈ { 1, . . . ,m }. +training data is represented as (x_i, y_i) where x_i is the +feature vector and yi ∈ {−1, 1} the binary label for +training example i ∈ {1, ... , m}. 1) If data is linearly separable, it can be separated by a hyperplane. There is one hyperplane which @@ -817,22 +757,13 @@ maximizes the distance to the next datapoints (support vectors). This hyperplane should be taken: minimize -w,b - -1 - -2 -‖w‖2 - -s.t. ∀mi=1yi · (〈w,xi〉+ b)︸ ︷︷ ︸ -sgn applied to this gives the classification - -≥ 1 +w,b1 2 ‖w‖2 s.t. ∀mi=1yi · (〈w,xi〉+ b)︸ ︷︷ ︸ +sgn applied to this gives the classification ≥ 1 2) Even if the underlying process which generates the features for the two classes is linearly separable, -noise can make the data not separable. The intro- -duction of slack variables to relax the requirement +noise can make the data not separable. The introduction +of slack variables to relax the requirement of linear separability solves this problem. 
The trade-off between accepting some errors and a more complex model is weighted by a parameter @@ -840,19 +771,7 @@ C ∈ R+0 . The bigger C, the more errors are accepted. The new optimization problem is: minimize -w - -1 - -2 -‖w‖2 + C · - -m∑ -i=1 - -ξi - -s.t. ∀mi=1yi · (〈w,xi〉+ b) ≥ 1− ξi +w1 2 ‖w‖2 + C · m∑ i=1 ξi s.t. ∀mi=1yi · (〈w,xi〉+ b) ≥ 1− ξi Note that 0 ≤ ξi ≤ 1 means that the data point is within the margin, whereas ξi ≥ 1 means it is @@ -863,12 +782,7 @@ a soft-margin SVM. w and the bias b. The dual problem is to express w as a linear combination of the training data xi: -w = - -m∑ -i=1 - -αiyixi +w = m∑ i=1 αiyixi where yi ∈ { −1, 1 } represents the class of the training example and αi are Lagrange multipliers. @@ -886,14 +800,7 @@ maximize αi m∑ -i=1 - -αi − -1 - -2 - -m∑ +i=1 αi −1 2 m∑ i=1 m∑ @@ -909,8 +816,7 @@ i=1 αiyi = 0 -4) Not every dataset is linearly separable. This prob- -lem is approached by transforming the feature +4) Not every dataset is linearly separable. This problem is approached by transforming the feature vectors x with a non-linear mapping Φ into a higher dimensional (probably ∞-dimensional) space. As the feature vectors x are only used @@ -1174,13 +1080,12 @@ sigmoid activation functions e−x + 1 -Krizhevsky et al. implemented those ideas and partici- -pated in the ImageNet Large-Scale Visual Recognition +Krizhevsky et al. implemented those ideas and participated +in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). The best other system, which -used SIFT features and Fisher Vectors, had a perfor- -mance of about 25.7 % while the network by Alex -Krizhevsky et al. got 17.0 % error rate on the ILSVRC- -2010 dataset. As a preprocessing step, they downsam- +used SIFT features and Fisher Vectors, had a performanceof about 25.7% while the network by Alex +Krizhevsky et al. got 17.0% error rate on the ILSVRC-2010 +dataset. As a preprocessing step, they downsam- pled all images to a fixed size of 256 px×256 px before they fed the features into their network. This network is commonly known as AlexNet. @@ -1214,8 +1119,8 @@ which should be tested. Those cases might not occur often in the training data, but it could still happen in the productive system. -I am not aware of any systematic work which exam- -ined the influence of problems such as the following. +I am not aware of any systematic work which examined +the influence of problems such as the following. A. Lens Flare @@ -1424,27 +1329,24 @@ user/gustavor/chen_isbi_11.pdf [CP08] G. Csurka and F. Perronnin, “A simple high performance approach to semantic segmentation.” -in BMVC, 2008, pp. 1–10. [Online]. Avail- -able: http://www.xrce.xerox.com/layout/set/print/ +in BMVC, 2008, pp. 1–10. [Online]. Available: +http://www.xrce.xerox.com/layout/set/print/ content/download/16654/118653/file/2008-023.pdf [CRSS] A. Cohen, E. Rivlin, I. Shimshoni, and -E. Sabo, “Colon crypt segmentation website.” [On- -line]. Available: http://mis.haifa.ac.il/~ishimshoni/ +E. Sabo, “Colon crypt segmentation website.” [Online]. +Available: http://mis.haifa.ac.il/~ishimshoni/ SegmentCrypt/Download.htm [CRSS14] ——, “Memory based active contour algorithm using pixel-level classified images for colon crypt segmentation,” Computerized Medical Imaging and Graphics, Nov. 2014. [Online]. 
Available: -http://mis.haifa.ac.il/~ishimshoni/SegmentCrypt/ -Active%20contour%20based%20on%20pixel- -level%20classified%20image%20for%20colon% -20crypts%20segmentation.pdf +http://mis.haifa.ac.il/~ishimshoni/SegmentCrypt/Active%20contour%20based%20on%20pixel-level%20classified%20image%20for%20colon%20crypts%20segmentation.pdf [CS10] J. Carreira and C. Sminchisescu, “Constrained -parametric min-cuts for automatic object segmenta- -tion,” in Computer Vision and Pattern Recognition +parametric min-cuts for automatic object segmentation,” +in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 3241–3248. @@ -1461,8 +1363,8 @@ and Technology, vol. 15, no. 4, pp. 444–450, 2009. [Online]. Available: http://arxiv.org/abs/1009.1020 [CSM09] L. P. Coelho, A. Shariff, and R. F. Murphy, “Nuclear -segmentation in microscope cell images: a hand- -segmented dataset and comparison of algorithms,” +segmentation in microscope cell images: a hand-segmented +dataset and comparison of algorithms,” in Biomedical Imaging: From Nano to Macro, 2009. ISBI’09. IEEE International Symposium on. IEEE, 2009, pp. 518–521. [Online]. Available: @@ -1476,8 +1378,8 @@ in Computer Vision and Pattern Recognition 2012, pp. 1656–1663. [Online]. Available: http: //pages.cs.wisc.edu/~jiaxu/pub/rwcoseg.pdf -[DHS15] J. Dai, K. He, and J. Sun, “Instance-aware seman- -tic segmentation via multi-task network cascades,” +[DHS15] J. Dai, K. He, and J. Sun, “Instance-aware semantic +segmentation via multi-task network cascades,” arXiv preprint arXiv:1512.04412, 2015. [DT05] N. Dalal and B. Triggs, “Histograms of oriented @@ -1492,14 +1394,12 @@ abs_all.jsp?arnumber=1467360 [EVGW+a] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge -2007 (VOC2007) Results,” http://www.pascal- -network.org/challenges/VOC/voc2007/workshop/index.html. +2007 (VOC2007) Results,” http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html. [Online]. Available: http://host.robots.ox.ac.uk: 8080/pascal/VOC/voc2007/index.html -[EVGW+b] ——, “The PASCAL Visual Object Classes Chal- -lenge 2012 (VOC2012) Results,” http://www.pascal- -network.org/challenges/VOC/voc2012/workshop/index.html. +[EVGW+b] ——, “The PASCAL Visual Object Classes Challenge +2012 (VOC2012) Results,” http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html. [Online]. Available: http://host.robots.ox.ac.uk: 8080/pascal/VOC/voc2012/index.html @@ -1579,15 +1479,13 @@ Fisher, “An experimental comparison of range image segmentation algorithms,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 18, no. 7, pp. 673–689, Jul. 1996. -[Online]. Available: http://ieeexplore.ieee.org/xpls/ -abs_all.jsp?arnumber=506791 +[Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=506791 [Ho95] T. K. Ho, “Random decision forests,” in Document Analysis and Recognition, 1995., Proceedings of the Third International Conference on, vol. 1. IEEE, 1995, pp. 278–282. -[Online]. Available: http://ect.bell-labs.com/who/ -tkh/publications/papers/odt.pdf +[Online]. Available: http://ect.bell-labs.com/who/tkh/publications/papers/odt.pdf [Hus07] Hustvedt, “File:cctv lens flare.jpg,” Wikipedia Commons, Nov. 2007. [Online]. Avail- @@ -1600,16 +1498,14 @@ labeling,” in Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, vol. 2, Jun. 2004, pp. II–695–II–702 Vol.2. -[Online]. 
Available: http://ieeexplore.ieee.org/xpl/ -login.jsp?tp=&arnumber=1315232 +[Online]. Available: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1315232 [JLD03] K. Jiang, Q.-M. Liao, and S.-Y. Dai, “A novel white blood cell segmentation scheme using scale-space filtering and watershed clustering,” in Machine Learning and Cybernetics, 2003 International Conference on, vol. 5, Nov 2003, pp. 2820–2825 -Vol.5. [Online]. Available: http://ieeexplore.ieee.org/ -xpl/login.jsp?tp=&arnumber=1260033 +Vol.5. [Online]. Available: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1260033 [Kaf07] L. Kaffer, “File:great male leopard in south afrika- jd.jpg,” Wikipedia Commons, Jul. 2007. [Online]. @@ -1650,7 +1546,6 @@ visual recognition,” 2015. [Online]. Available: http://cs231n.stanford.edu/ [Low04] D. Lowe, “Distinctive image features from scale- - invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004. [Online]. Available: http://dx.doi.org/10.1023/B% @@ -1684,14 +1579,14 @@ IEEE Conference on, June 2008, pp. 1–8. abs_all.jsp?arnumber=4587420 [Man12] M. Manske, “File:randabschattung mikroskop -kamera 6.jpg,” Wikipedia Com- -mons, Dec. 2012. [Online]. Avail- -able: https://commons.wikimedia.org/wiki/File: +kamera 6.jpg,” Wikipedia Commons, +Dec. 2012. [Online]. Available: +https://commons.wikimedia.org/wiki/File: Randabschattung_Mikroskop_Kamera_6.JPG [MBLAGJ+07] S. Maldonado-Bascon, S. Lafuente-Arroyo, P. Gil- -Jimenez, H. Gomez-Moreno, and F. Lopez- -Ferreras, “Road-sign detection and recognition +Jimenez, H. Gomez-Moreno, and F. Lopez-Ferreras, +“Road-sign detection and recognition based on support vector machines,” Intelligent Transportation Systems, IEEE Transactions on, vol. 8, no. 2, pp. 264–278, Jun. 2007. @@ -1786,8 +1681,8 @@ on, vol. 16, no. 4, pp. 1046–1057, 2007. [Online]. Available: http://ieeexplore.ieee.org/xpls/ abs_all.jsp?arnumber=4130436 -[PTN09] N. Plath, M. Toussaint, and S. Nakajima, “Multi- -class image segmentation using conditional random +[PTN09] N. Plath, M. Toussaint, and S. Nakajima, “Multi-class +image segmentation using conditional random fields and global classification,” in Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009, pp. 817–824. @@ -1804,8 +1699,8 @@ Machine learning, vol. 1, no. 1, pp. 81–106, Aug. 1986. [Online]. Available: http://dx.doi.org/ 10.1023/A%3A1022643204877 -[Qui93] ——, C4.5: Programs for Machine Learning, P. Lan- -gley, Ed. Morgan Kaufmann Publishers, Inc., 1993. +[Qui93] ——, C4.5: Programs for Machine Learning, P. Langley, +Ed. Morgan Kaufmann Publishers, Inc., 1993. [RKB04] C. Rother, V. Kolmogorov, and A. Blake, “Grabcut: Interactive foreground extraction using iterated @@ -1913,8 +1808,8 @@ Conference on. IEEE, 2005, pp. 34–34. viewcontent.cgi?article=1365&context=robotics [vdMPvdH09] L. J. van der Maaten, E. O. Postma, and H. J. -van den Herik, “Dimensionality reduction: A com- -parative review,” Journal of Machine Learning +van den Herik, “Dimensionality reduction: A comparative +review,” Journal of Machine Learning Research, vol. 10, no. 1-41, pp. 66–71, 2009. [VOC10] “Voc2010 preliminary results,” 2010. [Online]. 
diff --git a/read/extraction-ground-truth/1707.09725.txt b/read/extraction-ground-truth/1707.09725.txt index 03b7bbd..19145c0 100644 --- a/read/extraction-ground-truth/1707.09725.txt +++ b/read/extraction-ground-truth/1707.09725.txt @@ -49,8 +49,8 @@ FZI Research Center for Information Technology Affirmation -Ich versichere wahrheitsgemäß, die Arbeit selbstständig angefertigt, alle benutzten Hilfs- -mittel vollständig und genau angegeben und alles kenntlich gemacht zu haben, was aus +Ich versichere wahrheitsgemäß, die Arbeit selbstständig angefertigt, alle benutzten Hilfsmittel +vollständig und genau angegeben und alles kenntlich gemacht zu haben, was aus Arbeiten anderer unverändert oder mit Abänderungen entnommen wurde. Karlsruhe, Martin Thoma @@ -66,7 +66,7 @@ Abstract Convolutional Neural Networks (CNNs) dominate various computer vision tasks since Alex Krizhevsky showed that they can be trained effectively and reduced the top-5 error -from 26.2 % to 15.3 % on the ImageNet large scale visual recognition challenge. Many +from 26.2% to 15.3% on the ImageNet large scale visual recognition challenge. Many aspects of CNNs are examined in various publications, but literature about the analysis and construction of neural network architectures is rare. This work is one step to close this gap. A comprehensive overview over existing techniques for CNN analysis and topology @@ -86,16 +86,16 @@ Modelle welche auf Convolutional Neural Networks (CNNs) basieren sind in verschi Aufgaben der Computer Vision dominant seit Alex Krizhevsky gezeigt hat dass diese effektiv trainiert werden können und er den Top-5 Fehler in dem ImageNet large scale visual recognition challenge Benchmark von 26.2 % auf 15.3 % drücken konnte. Viele Aspekte -von CNNs wurden in verschiedenen Publikationen untersucht, aber es wurden vergleich- -sweise wenige Arbeiten über die Analyse und die Konstruktion von Neuronalen Netzen +von CNNs wurden in verschiedenen Publikationen untersucht, aber es wurden vergleichsweise +wenige Arbeiten über die Analyse und die Konstruktion von Neuronalen Netzen geschrieben. Diese Masterarbeit stellt einen Schritt dar um diese Lücke zu schließen. Eine umfassende Überblick über Analyseverfahren und Topologielernverfahren wird gegeben. Ein neues Verfahren zur Visualisierung der Klassifikationsfehler mit Konfusionsmatrizen wurde entwickelt. Basierend auf diesem Verfahren wurden hierarchische Klassifizierer eingeführt -und evaluiert. Zusätzlich wurden einige bereits in der Literatur beschriebene Beobachtun- -gen wie z.B. der positive Einfluss von kleinen Batch-Größen, Ensembles, Erhöhung der -Trainingsdatenmenge durch künstliche Transformationen (Data Augmentation) und die In- -varianzbildung durch künstliche Transformationen zur Test-Zeit (Test-time transformations) +und evaluiert. Zusätzlich wurden einige bereits in der Literatur beschriebene Beobachtungen +wie z.B. der positive Einfluss von kleinen Batch-Größen, Ensembles, Erhöhung der +Trainingsdatenmenge durch künstliche Transformationen (Data Augmentation) und die Invarianzbildung +durch künstliche Transformationen zur Test-Zeit (Test-time transformations) experimentell bestätigt. Andere Beobachtungen, wie beispielsweise der positive Einfluss gelernter Farbraumtransformationen konnten nicht bestätigt werden. 
Ein Modell welches weniger als eine Millionen Parameter nutzt und auf den Benchmark-Datensätzen Asirra, @@ -257,10 +257,9 @@ Computer vision is the academic field which aims to gain a high-level understand low-level information given by raw pixels from digital images. Robots, search engines, self-driving cars, surveillance agencies and many others have -applications which include one of the following six problems in computer vision as sub- -problems: +applications which include one of the following six problems in computer vision as subproblems: -• Classification:1 The algorithm is given an image and k possible classes. The task is +• Classification: 1 The algorithm is given an image and k possible classes. The task is to decide which of the k classes the image belongs to. For example, an image from a self-driving cars on-board camera contains either paved road, unpaved road or no road: Which of those given three classes is in the image? @@ -321,7 +320,7 @@ transition layers in Section 2.4 and nine ways to analyze CNNs are described in A linear image filter (also called a filter bank or a kernel) is an element F ∈ Rkw×kh×d, where kw represents the filter’s width, kh the filter’s height and d the number of input -channels. The filter F is convolved with the image I ∈ Rw×h×d to produce a new image I ′. +channels. The filter F is convolved with the image I ∈ Rw×h×d to produce a new image I′. The output image I ′ has only one channel. Each pixel I ′(x, y) of the output image gets calculated by point-wise multiplication of one filter element with one element of the original image I: @@ -361,7 +360,7 @@ output image, k2 multiplications and k2 additions of the products have to be cal One important detail is how boundaries are treated. There are four common ways of boundary treatment: -• don’t compute: The image I ′ will be smaller than the original image. I ′ ∈ +• don’t compute: The image I′ will be smaller than the original image. I′ ∈ R(w−kw+1)×(h−kh+1)×d3 , to be exact. • zero padding: The image I is padded by zeros where the filter would access elements which do not exist. This will result in edges being detected at the border if the border @@ -699,8 +698,8 @@ where ⊙ is the Hadamard product (A⊙B)i,j := (A)i,j(B)i,j Hence every value of the input gets set to zero with a dropout probability of p. Typically, -Dropout is used with p = 0.5. Layers closer to the input usually have a lower dropout prob- -ability than later layers. In order to keep the expected output at the same value, the +Dropout is used with p = 0.5. Layers closer to the input usually have a lower dropout probability +than later layers. In order to keep the expected output at the same value, the output of a dropout layer is multiplied with 1 1−p when dropout is enabled [Las17, tf-16b]. @@ -712,8 +711,8 @@ layers as it usually increases the test error as pointed out in [GG16]. Models which use Dropout can be interpreted as an ensemble of models with different numbers of neurons in each layer, but also with weight sharing. -Conceptually similar are DropConnect and networks with stochastic depth. DropCon- -nect [WZZ+13] is a generalization of Dropout, which sets weights to zero in contrast to +Conceptually similar are DropConnect and networks with stochastic depth. DropConnect +[WZZ+13] is a generalization of Dropout, which sets weights to zero in contrast to setting the output of a neuron to zero. Networks with stochastic depth as introduced in [HSL+16] dropout only complete layers. 
This can be done by having Residual networks which have one identity connection and one residual feature connection. Hence the residual @@ -906,8 +905,8 @@ but dense blocks have L(L+1) 2 connections between layers. The input feature maps are -concatenated in depth. According to the authors, this prevents features from being re- -learned and allows much fewer filters per convolutional layer. Where AlexNet and VGG-16 +concatenated in depth. According to the authors, this prevents features from being re-learned +and allows much fewer filters per convolutional layer. Where AlexNet and VGG-16 have several hundred filters per convolutional layer (see Tables D.2 and D.3), the authors used only on the order of 12 feature maps per layer. @@ -1106,8 +1105,8 @@ training data and loses its capability to generalize. At this point the quality the training set and the validation set diverge. While the classifier is still improving on the training set, it gets worse on the validation and the test set. -When the epoch-loss validation curve has plateaus as in Figure 2.8, this means the opti- -mization process did not improve for several epochs. Three possible ways to reduce the +When the epoch-loss validation curve has plateaus as in Figure 2.8, this means the optimization +process did not improve for several epochs. Three possible ways to reduce the problem of plateaus are (i) to change weight initialization if the plateau was at the beginning, (ii) regularizing the model or (iii) changing the optimization algorithm. @@ -1180,8 +1179,8 @@ The optimization process might also be stuck in a local minimum. • Loss being NAN might be due to too high learning rates. Another reason is division by zero or taking the logarithm of zero. In both cases, adding a small constant like 10−7 fixes the problem. -• If the loss-epoch validation curve has a plateau at the beginning, the weight initializa- -tion might be bad. +• If the loss-epoch validation curve has a plateau at the beginning, the weight initialization +might be bad. 18 @@ -1500,8 +1499,8 @@ the necessary number of input nodes and the number of output nodes which are det by the application and the features of the input. They then apply a criterion to insert new layers / neurons into the network. -In the following, Cascade-Correlation, Meiosis Networks and Automatic Structure Opti- -mization are introduced. +In the following, Cascade-Correlation, Meiosis Networks and Automatic Structure Optimization +are introduced. 3.1.1. Cascade-Correlation @@ -1515,14 +1514,10 @@ defined by the problem. Create a minimal, fully connected network for those. 2. Training: Train the network until the error no longer decreases. -3. Candidate Generation: Generate candidate nodes. Each candidate node is con- -nected to all inputs. They are not connected to other candidate nodes and not +3. Candidate Generation: Generate candidate nodes. Each candidate node is connected +to all inputs. They are not connected to other candidate nodes and not connected to the output nodes. -27 - - - 3. Topology Learning 4. Correlation Maximization: Train the weights of the candidates by maximizing S, @@ -1601,8 +1596,8 @@ layers or add skip connections. 3.1.3. Automatic Structure Optimization -Automatic Structure Optimization (ASO) was introduced in [BM93] for the task of on- -line handwriting recognition. It makes use of the confusion matrix C = (cij) ∈ Nk×k≥0 +Automatic Structure Optimization (ASO) was introduced in [BM93] for the task of online +handwriting recognition. 
It makes use of the confusion matrix C = (cij) ∈ Nk×k≥0 (see Section 2.5.2) to guide the topology learning. They define a confusion-symmetry matrix S with sij = sji = cij · cji. The maximum of S defines where the ASO algorithm adds more parameters. The details how the resources are added are not transferable to CNNs. @@ -1666,13 +1661,13 @@ algorithm achieves only 23.9 % accuracy [VH13]. Kocmánek shows in [Koc15] that HyperNEAT approaches can achieve 96.47 % accuracy on MNIST. Kocmánek mentions that HyperNEAT becomes slower with each hidden layer -so that not more than three hidden layers could be trained. At the same time, VGG- -19 [SZ14] already has 19 hidden layers and ResNets are successfully trained with 1202 layers +so that not more than three hidden layers could be trained. At the same time, VGG-19 +[SZ14] already has 19 hidden layers and ResNets are successfully trained with 1202 layers in [HZRS15a]. [LX17] shows that Genetic algorithms can achieve competitive results on MNIST and -SVHN, but the best results on CIFAR-10 were 7.10 % error whereas the state of the art is -at 3.74 % [HLW16]. Similarly, the Genetic algorithm achieves 29.03 % error on CIFAR-100, +SVHN, but the best results on CIFAR-10 were 7.10% error whereas the state of the art is +at 3.74 % [HLW16]. Similarly, the Genetic algorithm achieves 29.03% error on CIFAR-100, but the state of the art is 17.18 % [HLW16]. 3.4. Reinforcement Learning @@ -2203,9 +2198,9 @@ values are, the less information is lost if the filters are replaced by smaller 5. Experimental Evaluation -Figure 5.2.: Violin plots of the distribution of filter weights of a baseline model trained on CIFAR- -100. The weights of the first layer are relatively evenly spread in the interval [−0.4,+0.4]. -With every layer the interval which contains 95 % of the weights and is centered around +Figure 5.2.: Violin plots of the distribution of filter weights of a baseline model trained on CIFAR-100. +The weights of the first layer are relatively evenly spread in the interval [−0.4,+0.4]. +With every layer the interval which contains 95% of the weights and is centered around the mean becomes smaller, especially with layer 11 where the feature maps are of size 1× 1. In contrast to the other layers, the last convolutional layer has a bimodal distribution. @@ -2524,8 +2519,8 @@ wardrobe + dinosaur + lizard + snake, worm + turtle 9 crocodile, lizard, lobster, cater- -pillar + dinosaur + snake + tur- -tle, crab +pillar + dinosaur + snake + turtle, +crab 6 @@ -2585,9 +2580,9 @@ be due to limited training data, overfitting or the small size of 32 px× 32 px The experiment also shows that most of the errors are due to not identifying the correct cluster. Hence, in this case, more work in improving the root classifier is necessary rather than improving the discrimination of classes within a cluster. -Although the classes within a cluster capture most of the classifications, many misclassifica- -tions happen outside of the clusters. For example, in cluster 3, a perfect leaf classifier would -push the accuracy in the full column only to 63.50 % due to errors of the root classifier +Although the classes within a cluster capture most of the classifications, many misclassifications +happen outside of the clusters. For example, in cluster 3, a perfect leaf classifier would +push the accuracy in the full column only to 63.50% due to errors of the root classifier where the root classifier does not predict the correct cluster. 
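The root/leaf setup evaluated here can be summarized in a short sketch (all names are placeholders, not from the thesis code):

```python
def predict_hierarchical(x, root, leaves, clusters):
    """Two-level classification: pick a cluster, then a class inside it.

    root        -- classifier over the clusters found in the confusion matrix
    leaves[c]   -- classifier over only the classes of cluster c
    clusters[c] -- maps the local output of leaves[c] to global class IDs

    A wrong cluster prediction cannot be corrected by any leaf classifier,
    which is why even a perfect leaf pushes the overall accuracy only up to
    the fraction of examples whose cluster the root predicts correctly.
    """
    c = root.predict(x)           # step 1: which cluster of classes?
    local = leaves[c].predict(x)  # step 2: class within that cluster
    return clusters[c][local]     # global class ID
```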
The leaf classifiers use the same topology as the root classifier. By initializing them with the root classifiers weights their performance can be pushed at about the inner accuracy. @@ -2919,12 +2914,12 @@ of the Batch Normalization layers did not noticeably change. 5.11. Learned Color Space Transformation -In [MSM16] it is described that placing one convolutional layer with 10 filters of size 1× 1 -directly after the input and then another convolutional layer with 3 filters of size 1× 1 acts +In [MSM16] it is described that placing one convolutional layer with 10 filters of size 1×1 +directly after the input and then another convolutional layer with 3 filters of size 1×1 acts as a learned transformation in another color space and boosts the accuracy. -This approach was evaluated on CIFAR-100 by adding a convolutional layer with ELU ac- -tivation and 10 filters followed by another convolutional layer with ELU activation and +This approach was evaluated on CIFAR-100 by adding a convolutional layer with ELU activation +and 10 filters followed by another convolutional layer with ELU activation and 3 filters. The mean accuracy of 10 models was 63.31 % with a standard deviation of 1.37. The standard deviation is noticeable higher than the standard deviation of the baseline model (0.55) and the accuracy also decreased by 0.07 percentage points. The accuracy of @@ -2938,11 +2933,11 @@ Hence it is not advisable to use the learned color space transformation. 5.12. Pooling -An alternative to max pooling with stride 2 with a 2× 2 kernel is using a 3× 3 kernel with +An alternative to max pooling with stride 2 with a 2×2 kernel is using a 3×3 kernel with stride 2. This approach was evaluated on CIFAR-100 by replacing all max pooling layers with the -3× 3 kernel max pooling (and SAME padding). The mean accuracy of 10 models was 63.32 % +3×3 kernel max pooling (and SAME padding). The mean accuracy of 10 models was 63.32 % (−0.06) and the standard deviation was 0.57 (+0.02). The ensemble achieved 65.15 % test accuracy (+0.45). @@ -2970,8 +2965,8 @@ other comparisons of eleven activation functions are given in Table B.3. Theoretical explanations why one activation function is preferable to another in some scenarios are the following: -• Vanishing Gradient: Activation functions like tanh and the logistic function sat- -urate outside of the interval [−5, 5]. This means weight updates are very small for +• Vanishing Gradient: Activation functions like tanh and the logistic function saturate +outside of the interval [−5, 5]. This means weight updates are very small for preceding neurons, which is especially a problem for very deep or recurrent networks as described in [BSF94]. Even if the neurons learn eventually, learning is slower [KSH12]. @@ -2995,7 +2990,7 @@ As expected, PReLU and ELU performed best. Unexpected was that the logistic func tanh and softplus performed worse than the identity and it is unclear why the pure-softmax network performed so much better than the logistic function. One hypothesis why the logistic function performs so bad is that it cannot produce negative outputs. Hence the -logistic− function was developed: +logistic−function was developed: logistic−(x) = 1 @@ -3487,10 +3482,10 @@ algorithm in Chapter 4 and evaluated in Sections 4.2 and 5.4. The important insi • Ordering the classes in the confusion matrix allows to display the relevant parts even for several hundred classes. -• A hierarchy of classifiers based on the classes does not improve the results on CIFAR- -100. 
There are three possible reasons for this: +• A hierarchy of classifiers based on the classes does not improve the results on CIFAR-100. + There are three possible reasons for this: -– 32 px× 32 px is too low dimensional +– 32 px × 32 px is too low dimensional – 100 classes are not enough for this approach @@ -4247,8 +4242,8 @@ used. D. Common Architectures -In the following, some of the most important CNN architectures are explained. Understand- -ing the development of these architectures helps understanding critical insights the machine +In the following, some of the most important CNN architectures are explained. Understanding +the development of these architectures helps understanding critical insights the machine learning community got in the past years for convolutional networks for image recognition. It starts with LeNet-5 from 1998, continues with AlexNet from 2012, VGG-16 D from @@ -4303,8 +4298,8 @@ than fully connected layers. D.2. AlexNet The first CNN which achieved major improvements on the ImageNet dataset was AlexNet [KSH12]. -Its architecture is shown in Figure D.2 and described in Table D.2. It has about 60·106 param- -eters. A trained AlexNet can be downloaded at www.cs.toronto.edu/g̃uerzhoy/tf_alexnet. +Its architecture is shown in Figure D.2 and described in Table D.2. It has about 60·106 parameters. +A trained AlexNet can be downloaded at www.cs.toronto.edu/g̃uerzhoy/tf_alexnet. Note that the uncompressed size is at least 60 965 224 floats · 32 bit float @@ -4777,9 +4772,8 @@ gradient descent,” in Advances in Neural Information Processing Systems 29 learning-to-learn-by-gradient-descent-by-gradient-descent.pdf [AM15] M. T. Alexander Mordvintsev, Christopher Olah, “Inceptionism: -Going deeper into neural networks,” Jun. 2015. [Online]. Avail- -able: https://research.googleblog.com/2015/06/inceptionism-going-deeper- -into-neural.html +Going deeper into neural networks,” Jun. 2015. [Online]. Available: +https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html [Asi17] “Kaggle cats and dogs dataset,” Oct. 2017. [Online]. Available: https: //www.microsoft.com/en-us/download/details.aspx?id=54765 @@ -4801,21 +4795,6 @@ Université de Montréal, Tech. Rep. 1337, 2009. reinforcement learning,” arXiv preprint arXiv:1611.02167, Nov. 2016. [Online]. Available: https://arxiv.org/abs/1611.02167 -103 - -https://arxiv.org/abs/1603.04467 -http://papers.nips.cc/paper/6461-learning-to-learn-by-gradient-descent-by-gradient-descent.pdf -http://papers.nips.cc/paper/6461-learning-to-learn-by-gradient-descent-by-gradient-descent.pdf -https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html -https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html -https://www.microsoft.com/en-us/download/details.aspx?id=54765 -https://www.microsoft.com/en-us/download/details.aspx?id=54765 -http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf -http://jmlr.csail.mit.edu/papers/volume13/bergstra12a/bergstra12a.pdf -https://arxiv.org/abs/1703.10155 -https://arxiv.org/abs/1611.02167 - - [BM93] U. Bodenhausen and S. Manke, Automatically Structured Neural Networks For Handwritten Character And Word Recognition. London: Springer London, Sep. 1993, pp. 956–961. [Online]. Available: http: @@ -4863,28 +4842,11 @@ preprint arXiv:1511.07289, Nov. 2015. [Online]. Available: https: learning,” arXiv preprint arXiv:1410.0759, Oct. 2014. [Online]. 
Available: https://arxiv.org/abs/1410.0759 -104 - -http://dx.doi.org/10.1007/978-1-4471-2063-6_283 -http://dx.doi.org/10.1007/978-1-4471-2063-6_283 -http://yann.lecun.com/exdb/publis/pdf/boureau-icml-10.pdf -http://ieeexplore.ieee.org/document/143326/ -https://github.com/fchollet/keras -http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf -http://cs.stanford.edu/~acoates/papers/coatesleeng_aistats_2011.pdf -http://cs.stanford.edu/~acoates/stl10 -https://arxiv.org/abs/1202.2745v1 -https://arxiv.org/abs/1511.07289 -https://arxiv.org/abs/1511.07289 -https://arxiv.org/abs/1410.0759 - - [DBB+01] C. Dugas, Y. Bengio et al., “Incorporating second-order functional -knowledge for better option pricing,” in Advances in Neural Infor- -mation Processing Systems 13 (NIPS), T. K. Leen, T. G. Dietterich, +knowledge for better option pricing,” in Advances in Neural Information +Processing Systems 13 (NIPS), T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. MIT Press, 2001, pp. 472–478. [Online]. -Available: http://papers.nips.cc/paper/1920-incorporating-second-order- -functional-knowledge-for-better-option-pricing.pdf +Available: http://papers.nips.cc/paper/1920-incorporating-second-order-functional-knowledge-for-better-option-pricing.pdf [DDFK16] S. Dieleman, J. De Fauw, and K. Kavukcuoglu, “Exploiting cyclic symmetry in convolutional neural networks,” arXiv preprint arXiv:1602.02660, Feb. @@ -4923,23 +4885,7 @@ royal astronomical society, vol. 450, no. 2, pp. 1441–1459, 2015. exploits interest-aligned manual image categorization,” in ACM Con- ference on Computer and Communications Security (CCS), no. 14. Association for Computing Machinery, Inc., Oct. 2007. [Online]. - -105 - -http://papers.nips.cc/paper/1920-incorporating-second-order-functional-knowledge-for-better-option-pricing.pdf -http://papers.nips.cc/paper/1920-incorporating-second-order-functional-knowledge-for-better-option-pricing.pdf -https://arxiv.org/abs/1602.02660 -http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf -https://arxiv.org/abs/1512.04412 -ftp://ftp.icsi.berkeley.edu/pub/ai/jagota/vol2_6.pdf -http://cs229.stanford.edu/proj2015/054_report.pdf -http://cs229.stanford.edu/proj2015/054_report.pdf -http://papers.nips.cc/paper/5548-discriminative-unsupervised-feature-learning-with-convolutional-neural-networks.pdf -http://papers.nips.cc/paper/5548-discriminative-unsupervised-feature-learning-with-convolutional-neural-networks.pdf - - -Available: https://www.microsoft.com/en-us/research/publication/asirra-a- -captcha-that-exploits-interest-aligned-manual-image-categorization/ +Available: https://www.microsoft.com/en-us/research/publication/asirra-a-captcha-that-exploits-interest-aligned-manual-image-categorization/ [EKS+96] M. Ester, H.-P. Kriegel et al., “A density-based algorithm for discovering clusters in large spatial databases with noise.” in Kdd, vol. 96, no. 34, 1996, @@ -4961,8 +4907,8 @@ vol. 28, no. 4, pp. 594–611, Apr. 2006. [Online]. Available: http: [FFP03] R. F. Fei-Fei and P. Perona, “Caltech 101,” 2003. [Online]. Available: http: //www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html -[FGMR10] P. F. Felzenszwalb, R. B. Girshick et al., “Object detection with discrimina- -tively trained part-based models,” IEEE transactions on pattern analysis and +[FGMR10] P. F. Felzenszwalb, R. B. Girshick et al., “Object detection with discriminatively +trained part-based models,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627–1645, 2010. [FL89] S. E. 
Fahlman and C. Lebiere, “The cascade-correlation learning architecture,” @@ -5050,22 +4996,6 @@ preprint arXiv:1611.04231, Nov. 2016. [Online]. Available: https: based image classification,” arXiv preprint arXiv:1312.5402, Dec. 2013. [Online]. Available: https://arxiv.org/abs/1312.5402 -107 - -https://arxiv.org/abs/1506.02158v6 -https://arxiv.org/abs/1412.6071 -http://www.vision.caltech.edu/Image_Datasets/Caltech256/ -http://www.jmlr.org/proceedings/papers/v28/goodfellow13.pdf -http://www.jmlr.org/proceedings/papers/v28/goodfellow13.pdf -https://arxiv.org/abs/1608.08614 -http://papers.nips.cc/paper/227-meiosis-networks.pdf -https://devblogs.nvidia.com/parallelforall/new-features-cuda-7-5/ -https://arxiv.org/abs/1608.06993v1 -https://arxiv.org/abs/1611.04231 -https://arxiv.org/abs/1611.04231 -https://arxiv.org/abs/1312.5402 - - [HPK11] J. Han, J. Pei, and M. Kamber, Data mining: concepts and techniques. Elsevier, 2011. @@ -5112,23 +5042,6 @@ https://arxiv.org/abs/1502.01852 [Ima12] “Imagenet large scale visual recognition challenge 2012 (ILSVRC2012),” -108 - -https://arxiv.org/abs/1607.04381 -http://papers.nips.cc/paper/5784-learning-both-weights-and-connections-for-efficient-neural-network.pdf -http://papers.nips.cc/paper/5784-learning-both-weights-and-connections-for-efficient-neural-network.pdf -https://arxiv.org/abs/1207.0580 -https://arxiv.org/abs/1603.09382 -https://arxiv.org/abs/1603.09382 -http://ee.caltech.edu/Babak/pubs/conferences/00298572.pdf -http://ee.caltech.edu/Babak/pubs/conferences/00298572.pdf -https://arxiv.org/abs/1503.02531 -https://arxiv.org/abs/1406.4729 -https://arxiv.org/abs/1512.03385v1 -https://arxiv.org/abs/1512.03385v1 -https://arxiv.org/abs/1502.01852 - - 2012. [Online]. Available: http://www.image-net.org/challenges/LSVRC/ 2012/nonpub-downloads @@ -5175,26 +5088,6 @@ and neural network approximation,” IEEE Transactions on Information Theory, vol. 48, no. 1, pp. 264–275, Jan. 2002. [Online]. Available: http://ieeexplore.ieee.org/abstract/document/971754/ -109 - -http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads -http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads -https://arxiv.org/abs/1502.03167 -https://arxiv.org/abs/1512.07030 -http://karpathy.github.io/2011/04/27/manually-classifying-cifar10/ -http://karpathy.github.io/2011/04/27/manually-classifying-cifar10/ -https://arxiv.org/abs/1412.6980 -https://arxiv.org/abs/1412.6980 -https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf -https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf -https://arxiv.org/abs/1609.04836 -http://kocmi.tk/photos/DiplomaThesis.pdf -https://arxiv.org/abs/1511.06530 -https://www.cs.toronto.edu/~kriz/cifar.html -https://www.cs.toronto.edu/~kriz/cifar.html -http://ieeexplore.ieee.org/abstract/document/971754/ - - [KSH12] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25 (NIPS), F. Pereira, C. J. C. Burges @@ -5240,25 +5133,6 @@ processing. IEEE, 2013, pp. 8595–8598. [Online]. Available: http: [LG16] A. Lavin and S. 
Gray, “Fast algorithms for convolutional neural networks,” in
-110
-
-http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
-http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
-http://papers.nips.cc/paper/4133-learning-convolutional-feature-hierarchies-for-visual-recognition.pdf
-http://papers.nips.cc/paper/4133-learning-convolutional-feature-hierarchies-for-visual-recognition.pdf
-https://arxiv.org/abs/1512.02325
-http://lasagne.readthedocs.io/en/latest/modules/layers/noise.html#lasagne.layers.DropoutLayer
-http://lasagne.readthedocs.io/en/latest/modules/layers/noise.html#lasagne.layers.DropoutLayer
-http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
-http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
-http://www.nature.com/nature/journal/v521/n7553/abs/nature14539.html
-http://dx.doi.org/10.1007/3-540-49430-8
-http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
-http://yann.lecun.com/exdb/publis/pdf/lecun-90b.pdf
-http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6639343
-http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6639343
-
-
Conference on Computer Vision and Pattern Recognition (CVPR). IEEE,
Sep. 2016, pp. 4013–4021. [Online]. Available: https://arxiv.org/abs/1509.09308
@@ -5306,22 +5180,6 @@ relu_hybrid_icml2013_final.pdf
[MM15] D. Mishkin and J. Matas, “All you need is a good init,” arXiv
-111
-
-https://arxiv.org/abs/1509.09308
-https://arxiv.org/abs/1509.08985v2
-https://arxiv.org/abs/1608.03983
-https://arxiv.org/abs/1608.03983
-https://arxiv.org/abs/1603.06560
-https://arxiv.org/abs/1606.01885
-https://arxiv.org/abs/1411.4038v2
-https://arxiv.org/abs/1703.01513
-https://github.com/titu1994/DenseNet
-http://lear.inrialpes.fr/people/marszalek/data/ig02/
-https://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf
-https://web.stanford.edu/~awni/papers/relu_hybrid_icml2013_final.pdf
-
-
preprint arXiv:1511.06422, Nov. 2015. [Online]. Available: https:
//arxiv.org/abs/1511.06422
@@ -5369,21 +5227,6 @@ weight-sharing,” Neural computation, vol. 4, no. 4, pp. 473–493, 1992.
[NH02] R. T. Ng and J. Han, “CLARANS: A method for clustering objects for spatial
-112
-
-https://arxiv.org/abs/1511.06422
-https://arxiv.org/abs/1511.06422
-http://ieeexplore.ieee.org/abstract/document/7301739/
-http://ieeexplore.ieee.org/abstract/document/7301739/
-http://ieeexplore.ieee.org/document/4270110/
-http://ieeexplore.ieee.org/document/4270110/
-https://arxiv.org/abs/1606.02228
-https://arxiv.org/abs/1512.02017
-http://papers.nips.cc/paper/5073-learning-with-noisy-labels.pdf
-http://www1.icsi.berkeley.edu/Speech/faq/nn-train.html
-https://www.cs.toronto.edu/~hinton/absps/sunspots.pdf
-
-
data mining,” IEEE transactions on knowledge and data engineering,
vol. 14, no. 5, pp. 1003–1016, 2002.
@@ -5430,20 +5273,6 @@ evolutionary computation, no. 12. ACM, 2010, pp. 563–570.
Explaining the predictions of any classifier,” arXiv preprint arXiv:1602.04938,
Feb. 2016. [Online]. Available: https://arxiv.org/abs/1602.04938
-113
-
-http://ufldl.stanford.edu/housenumbers/nips2011_housenumbers.pdf
-http://ufldl.stanford.edu/housenumbers/
-https://arxiv.org/abs/1602.03616
-https://arxiv.org/abs/1608.08984
-https://arxiv.org/abs/1511.04508
-http://dx.doi.org/10.1007/3-540-49430-8_3
-http://dx.doi.org/10.1007/3-540-49430-8_3
-https://arxiv.org/abs/1409.0575
-https://arxiv.org/abs/1505.04597
-https://arxiv.org/abs/1602.04938
-
-
[Rud16] S.
Ruder, “An overview of gradient descent optimization algorithms,”
arXiv preprint arXiv:1609.04747, Sep. 2016. [Online]. Available: https:
//arxiv.org/abs/1609.04747
@@ -5490,22 +5319,6 @@ on Computer Vision and Pattern Recognition (CVPR). IEEE, Sep. 2015,
pp. 1–9. [Online]. Available: https://arxiv.org/abs/1409.4842
[SM02] K. O. Stanley and R. Miikkulainen, “Evolving neural networks through
-
-114
-
-https://arxiv.org/abs/1609.04747
-https://arxiv.org/abs/1609.04747
-https://arxiv.org/abs/1204.3968
-http://ieeexplore.ieee.org/document/6792316/
-https://arxiv.org/abs/1312.6229v4
-https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
-http://ieeexplore.ieee.org/document/6638963/?arnumber=6638963
-https://arxiv.org/abs/1602.07261
-https://arxiv.org/abs/1503.03832
-http://ieeexplore.ieee.org/document/6033589/
-https://arxiv.org/abs/1409.4842
-
-
augmenting topologies,” Evolutionary computation, vol. 10, no. 2, pp. 99–127,
2002. [Online]. Available: http://www.mitpressjournals.org/doi/abs/10.1162/
106365602320169811
@@ -5551,27 +5364,6 @@ https://arxiv.org/abs/1312.6199v4
[TF-16a] “MNIST for ML beginners,” Dec. 2016. [Online]. Available: https:
//www.tensorflow.org/tutorials/mnist/beginners/
-115
-
-http://www.mitpressjournals.org/doi/abs/10.1162/106365602320169811
-http://www.mitpressjournals.org/doi/abs/10.1162/106365602320169811
-https://arxiv.org/abs/1312.6120
-https://arxiv.org/abs/1312.6120
-https://arxiv.org/abs/1410.1165
-http://benchmark.ini.rub.de/?section=gtsrb&subsection=news
-http://benchmark.ini.rub.de/?section=gtsrb&subsection=news
-http://www.sciencedirect.com/science/article/pii/S0893608012000457
-http://www.sciencedirect.com/science/article/pii/S0893608012000457
-https://arxiv.org/abs/1606.02492
-https://arxiv.org/abs/1512.00567v3
-https://arxiv.org/abs/1312.6034
-https://arxiv.org/abs/1312.6034
-https://arxiv.org/abs/1409.1556
-https://arxiv.org/abs/1312.6199v4
-https://www.tensorflow.org/tutorials/mnist/beginners/
-https://www.tensorflow.org/tutorials/mnist/beginners/
-
-
[tf-16b] “tf.nn.dropout,” Dec. 2016. [Online]. Available: https://www.tensorflow.org/
api_docs/python/nn/activation_functions_#dropout
@@ -5618,24 +5410,6 @@ http://ieeexplore.ieee.org/document/21701/
tionist reinforcement learning,” Machine learning, vol. 8, no. 3-4,
pp. 229–256, 1992.
-116
-
-https://www.tensorflow.org/api_docs/python/nn/activation_functions_#dropout
-https://www.tensorflow.org/api_docs/python/nn/activation_functions_#dropout
-http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
-http://martin-thoma.com/write-math
-http://martin-thoma.com/write-math
-https://martin-thoma.com/twiddle/
-https://arxiv.org/abs/1602.06541
-https://arxiv.org/abs/1602.06541
-https://arxiv.org/abs/1701.08380
-https://martin-thoma.com/msthesis
-https://arxiv.org/abs/1312.5355
-http://dx.doi.org/10.1007/978-94-015-7744-1_2
-https://arxiv.org/abs/1702.00071
-http://ieeexplore.ieee.org/document/21701/
-
-
[WWQ13] X. Wang, L. Wang, and Y. Qiao, A Comparative Study of Encoding, Pooling
and Normalization Methods for Action Recognition. Berlin, Heidelberg:
Springer Berlin Heidelberg, Nov. 2013, no. 11, pp. 572–585. [Online].
@@ -5683,22 +5457,6 @@ M. Sugiyama et al., Eds. Curran Associates, Inc., Oct. 2016, pp.
1082–1090. [Online].
Available: http://papers.nips.cc/paper/6340-doubly-convolutional-
neural-networks.pdf
-117
-
-http://dx.doi.org/10.1007/978-3-642-37431-9_44
-https://arxiv.org/abs/1501.02876v4
-http://www.matthewzeiler.com/pubs/icml2013/icml2013.pdf
-http://www.matthewzeiler.com/pubs/icml2013/icml2013.pdf
-https://arxiv.org/abs/1611.05431v1
-https://arxiv.org/abs/1107.2490
-https://arxiv.org/abs/1505.00853
-https://www.sec.in.tum.de/assets/Uploads/ecai2.pdf
-http://yann.lecun.com/exdb/mnist/
-https://arxiv.org/abs/1611.03530
-http://papers.nips.cc/paper/6340-doubly-convolutional-neural-networks.pdf
-http://papers.nips.cc/paper/6340-doubly-convolutional-neural-networks.pdf
-
-
[ZDGD14] N. Zhang, J. Donahue et al., “Part-based R-CNNs for fine-grained category
detection,” in European Conference on Computer Vision (ECCV). Springer,
Jul. 2014, pp. 834–849. [Online]. Available: https://arxiv.org/abs/1407.3867
@@ -5742,24 +5500,6 @@ arXiv preprint arXiv:1506.02351, Jun. 2015. [Online]. Available: https:
units,” in International Joint Conference on Neural Networks (IJCNN),
Jul. 2015, pp. 1–4.
-118
-
-https://arxiv.org/abs/1407.3867
-https://arxiv.org/abs/1212.5701v1
-https://arxiv.org/abs/1212.5701v1
-https://arxiv.org/abs/1301.3557v1
-https://arxiv.org/abs/1311.2901
-http://places2.csail.mit.edu/download.html
-http://places2.csail.mit.edu/download.html
-https://arxiv.org/abs/1605.07146
-https://arxiv.org/abs/1605.07146
-https://arxiv.org/abs/1512.04150
-https://arxiv.org/abs/1610.02055
-https://arxiv.org/abs/1611.01578
-https://arxiv.org/abs/1506.02351v1
-https://arxiv.org/abs/1506.02351v1
-
-
I. Glossary
ANN artificial neural network. 4
@@ -5801,9 +5541,6 @@ NEAT NeuroEvolution of Augmenting Topologies. 83
OBD Optimal Brain Damage. 29
-119
-
-
PCA principal component analysis. 79
@@ -5814,5 +5551,3 @@ ReLU rectified linear unit. 5, 13, 60, 61, 63, 64, 72, 77, 78, 84
SGD stochastic gradient descent. 5, 30, 45, 46, 82
ZCA Zero Components Analysis. 79
-
-120
diff --git a/read/extraction-ground-truth/2201.00021.txt b/read/extraction-ground-truth/2201.00021.txt
index 2d55922..804b34e 100644
--- a/read/extraction-ground-truth/2201.00021.txt
+++ b/read/extraction-ground-truth/2201.00021.txt
@@ -54,8 +54,8 @@ regarded as a reliable thermometer of molecular clouds (e.g.,
Walmsley & Ungerechts 1983; Danby et al. 1988), ammonia
masers have attracted attention since the first detection of
maser action in the (J,K) = (3,3) metastable (J = K) line toward the
-massive star-forming region W33 (Wilson et al. 1982). Subse-
-quent observations have led to the detection of new metastable
+massive star-forming region W33 (Wilson et al. 1982). Subsequent
+observations have led to the detection of new metastable
ammonia masers, including 15NH3 (3,3) (Mauersberger et al.
1986), NH3 (1,1) (Gaume et al. 1996), NH3 (2,2) (Mills et al.
2018), NH3 (5,5) (Cesaroni et al. 1992), NH3 (6,6) (Beuther
@@ -117,8 +117,8 @@ J = 6 (e.g., Danby et al. 1988).
NH3 (9,6) masers are found to be strongly variable, similar
to H2O masers (Madden et al. 1986; Pratap et al. 1991; Henkel
et al. 2013). In W51-IRS2, Henkel et al. (2013) found that the (9,6)
-line showed significant variation in line shape within a time in-
-terval of only two days. Mapping of the (9,6) maser toward W51
+line showed significant variation in line shape within a time interval
+of only two days. Mapping of the (9,6) maser toward W51
with very long baseline interferometry (VLBI) suggests that the
masers are closer to the H2O masers than to the OH masers or to
ultracompact (UC) H ii regions (Pratap et al. 1991). While