results_old.json

[
    {
        "key": "I49GNHEB",
        "version": 573,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/I49GNHEB",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/I49GNHEB",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Karras et al.",
            "parsedDate": "2020-03-23",
            "numChildren": 2
        },
        "data": {
            "key": "I49GNHEB",
            "version": 573,
            "itemType": "journalArticle",
            "title": "Analyzing and Improving the Image Quality of StyleGAN",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Tero",
                    "lastName": "Karras"
                },
                {
                    "creatorType": "author",
                    "firstName": "Samuli",
                    "lastName": "Laine"
                },
                {
                    "creatorType": "author",
                    "firstName": "Miika",
                    "lastName": "Aittala"
                },
                {
                    "creatorType": "author",
                    "firstName": "Janne",
                    "lastName": "Hellsten"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jaakko",
                    "lastName": "Lehtinen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Timo",
                    "lastName": "Aila"
                }
            ],
            "abstractNote": "The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics as well as perceived image quality.",
            "publicationTitle": "arXiv:1912.04958 [cs, eess, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-03-23",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1912.04958",
            "accessDate": "2021-09-27T11:21:13Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1912.04958",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Neural and Evolutionary Computing",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Image and Video Processing",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-09-27T11:21:13Z",
            "dateModified": "2021-09-27T11:23:19Z"
        }
    },
    {
        "key": "56VXC5I9",
        "version": 568,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/56VXC5I9",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/56VXC5I9",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Vahdat et al.",
            "parsedDate": "2021-06-10",
            "numChildren": 2
        },
        "data": {
            "key": "56VXC5I9",
            "version": 568,
            "itemType": "journalArticle",
            "title": "Score-based Generative Modeling in Latent Space",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Arash",
                    "lastName": "Vahdat"
                },
                {
                    "creatorType": "author",
                    "firstName": "Karsten",
                    "lastName": "Kreis"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jan",
                    "lastName": "Kautz"
                }
            ],
            "abstractNote": "Score-based generative models (SGMs) have recently demonstrated impressive results in terms of both sample quality and distribution coverage. However, they are usually applied directly in data space and often require thousands of network evaluations for sampling. Here, we propose the Latent Score-based Generative Model (LSGM), a novel approach that trains SGMs in a latent space, relying on the variational autoencoder framework. Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space, resulting in fewer network evaluations and faster sampling. To enable training LSGMs end-to-end in a scalable and stable manner, we (i) introduce a new score-matching objective suitable to the LSGM setting, (ii) propose a novel parameterization of the score function that allows SGM to focus on the mismatch of the target distribution with respect to a simple Normal one, and (iii) analytically derive multiple techniques for variance reduction of the training objective. LSGM obtains a state-of-the-art FID score of 2.10 on CIFAR-10, outperforming all existing generative results on this dataset. On CelebA-HQ-256, LSGM is on a par with previous SGMs in sample quality while outperforming them in sampling time by two orders of magnitude. In modeling binary images, LSGM achieves state-of-the-art likelihood on the binarized OMNIGLOT dataset.",
            "publicationTitle": "arXiv:2106.05931 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-10",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.05931",
            "accessDate": "2021-09-27T09:13:45Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.05931",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-09-27T09:13:45Z",
            "dateModified": "2021-09-27T09:13:46Z"
        }
    },
    {
        "key": "2V993BEV",
        "version": 572,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/2V993BEV",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/2V993BEV",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Federici et al.",
            "parsedDate": "2020-02-18",
            "numChildren": 2
        },
        "data": {
            "key": "2V993BEV",
            "version": 572,
            "itemType": "journalArticle",
            "title": "Learning Robust Representations via Multi-View Information Bottleneck",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Marco",
                    "lastName": "Federici"
                },
                {
                    "creatorType": "author",
                    "firstName": "Anjan",
                    "lastName": "Dutta"
                },
                {
                    "creatorType": "author",
                    "firstName": "Patrick",
                    "lastName": "Forré"
                },
                {
                    "creatorType": "author",
                    "firstName": "Nate",
                    "lastName": "Kushman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zeynep",
                    "lastName": "Akata"
                }
            ],
            "abstractNote": "The information bottleneck principle provides an information-theoretic method for representation learning, by training an encoder to retain all information which is relevant for predicting the label while minimizing the amount of other, excess information in the representation. The original formulation, however, requires labeled data to identify the superfluous information. In this work, we extend this ability to the multi-view unsupervised setting, where two views of the same underlying entity are provided but the label is unknown. This enables us to identify superfluous information as that not shared by both views. A theoretical analysis leads to the definition of a new multi-view model that produces state-of-the-art results on the Sketchy dataset and label-limited versions of the MIR-Flickr dataset. We also extend our theory to the single-view setting by taking advantage of standard data augmentation techniques, empirically showing better generalization capabilities when compared to common unsupervised approaches for representation learning.",
            "publicationTitle": "arXiv:2002.07017 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-02-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2002.07017",
            "accessDate": "2021-09-27T08:47:30Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2002.07017",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/HT2HBU8C"
            },
            "dateAdded": "2021-09-27T08:47:31Z",
            "dateModified": "2021-09-27T08:47:31Z"
        }
    },
    {
        "key": "V8LURJYZ",
        "version": 562,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/V8LURJYZ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/V8LURJYZ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tishby et al.",
            "parsedDate": "2000-04-24",
            "numChildren": 2
        },
        "data": {
            "key": "V8LURJYZ",
            "version": 562,
            "itemType": "journalArticle",
            "title": "The information bottleneck method",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Naftali",
                    "lastName": "Tishby"
                },
                {
                    "creatorType": "author",
                    "firstName": "Fernando C.",
                    "lastName": "Pereira"
                },
                {
                    "creatorType": "author",
                    "firstName": "William",
                    "lastName": "Bialek"
                }
            ],
            "abstractNote": "We define the relevant information in a signal $x\\in X$ as being the information that this signal provides about another signal $y\\in \\Y$. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. Understanding the signal $x$ requires more than just predicting $y$, it also requires specifying which features of $\\X$ play a role in the prediction. We formalize this problem as that of finding a short code for $\\X$ that preserves the maximum information about $\\Y$. That is, we squeeze the information that $\\X$ provides about $\\Y$ through a `bottleneck' formed by a limited set of codewords $\\tX$. This constrained optimization problem can be seen as a generalization of rate distortion theory in which the distortion measure $d(x,\\x)$ emerges from the joint statistics of $\\X$ and $\\Y$. This approach yields an exact set of self consistent equations for the coding rules $X \\to \\tX$ and $\\tX \\to \\Y$. Solutions to these equations can be found by a convergent re-estimation method that generalizes the Blahut-Arimoto algorithm. Our variational principle provides a surprisingly rich framework for discussing a variety of problems in signal processing and learning, as will be described in detail elsewhere.",
            "publicationTitle": "arXiv:physics/0004057",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2000-04-24",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/physics/0004057",
            "accessDate": "2021-09-27T07:54:33Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: physics/0004057",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Condensed Matter - Disordered Systems and Neural Networks",
                    "type": 1
                },
                {
                    "tag": "Nonlinear Sciences - Adaptation and Self-Organizing Systems",
                    "type": 1
                },
                {
                    "tag": "Physics - Data Analysis, Statistics and Probability",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-09-27T07:54:33Z",
            "dateModified": "2021-09-27T07:54:33Z"
        }
    },
    {
        "key": "RT7KAHYC",
        "version": 572,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/RT7KAHYC",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/RT7KAHYC",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tian et al.",
            "parsedDate": "2021-04-06",
            "numChildren": 2
        },
        "data": {
            "key": "RT7KAHYC",
            "version": 572,
            "itemType": "journalArticle",
            "title": "Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Xudong",
                    "lastName": "Tian"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zhizhong",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Shaohui",
                    "lastName": "Lin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yanyun",
                    "lastName": "Qu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yuan",
                    "lastName": "Xie"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lizhuang",
                    "lastName": "Ma"
                }
            ],
            "abstractNote": "The Information Bottleneck (IB) provides an information theoretic principle for representation learning, by retaining all information relevant for predicting label while minimizing the redundancy. Though IB principle has been applied to a wide range of applications, its optimization remains a challenging problem which heavily relies on the accurate estimation of mutual information. In this paper, we present a new strategy, Variational Self-Distillation (VSD), which provides a scalable, flexible and analytic solution to essentially fitting the mutual information but without explicitly estimating it. Under rigorously theoretical guarantee, VSD enables the IB to grasp the intrinsic correlation between representation and label for supervised training. Furthermore, by extending VSD to multi-view learning, we introduce two other strategies, Variational Cross-Distillation (VCD) and Variational Mutual-Learning (VML), which significantly improve the robustness of representation to view-changes by eliminating view-specific and task-irrelevant information. To verify our theoretically grounded strategies, we apply our approaches to cross-modal person Re-ID, and conduct extensive experiments, where the superior performance against state-of-the-art methods are demonstrated. Our intriguing findings highlight the need to rethink the way to estimate mutual information.",
            "publicationTitle": "arXiv:2104.02862 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-06",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Farewell to Mutual Information",
            "url": "http://arxiv.org/abs/2104.02862",
            "accessDate": "2021-09-27T07:36:49Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.02862",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/KVPWSBSJ"
            },
            "dateAdded": "2021-09-27T07:36:49Z",
            "dateModified": "2021-09-27T07:37:26Z"
        }
    },
    {
        "key": "2ZXDF7SP",
        "version": 554,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/2ZXDF7SP",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/2ZXDF7SP",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "2ZXDF7SP",
            "version": 554,
            "itemType": "attachment",
            "linkMode": "imported_url",
            "title": "information.pdf",
            "accessDate": "2021-09-27T06:46:43Z",
            "url": "https://www.princeton.edu/~cuff/ele201/kulkarni_text/information.pdf",
            "note": "",
            "contentType": "application/pdf",
            "charset": "",
            "filename": "information.pdf",
            "md5": "c9c163d05a47dd760ede002eedd5e372",
            "mtime": 1632725203000,
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-09-27T06:46:43Z",
            "dateModified": "2021-09-27T06:46:43Z"
        }
    },
    {
        "key": "LI4EMUQM",
        "version": 546,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/LI4EMUQM",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/LI4EMUQM",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Vahdat et al.",
            "parsedDate": "2021-06-10",
            "numChildren": 2
        },
        "data": {
            "key": "LI4EMUQM",
            "version": 546,
            "itemType": "journalArticle",
            "title": "Score-based Generative Modeling in Latent Space",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Arash",
                    "lastName": "Vahdat"
                },
                {
                    "creatorType": "author",
                    "firstName": "Karsten",
                    "lastName": "Kreis"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jan",
                    "lastName": "Kautz"
                }
            ],
            "abstractNote": "Score-based generative models (SGMs) have recently demonstrated impressive results in terms of both sample quality and distribution coverage. However, they are usually applied directly in data space and often require thousands of network evaluations for sampling. Here, we propose the Latent Score-based Generative Model (LSGM), a novel approach that trains SGMs in a latent space, relying on the variational autoencoder framework. Moving from data to latent space allows us to train more expressive generative models, apply SGMs to non-continuous data, and learn smoother SGMs in a smaller space, resulting in fewer network evaluations and faster sampling. To enable training LSGMs end-to-end in a scalable and stable manner, we (i) introduce a new score-matching objective suitable to the LSGM setting, (ii) propose a novel parameterization of the score function that allows SGM to focus on the mismatch of the target distribution with respect to a simple Normal one, and (iii) analytically derive multiple techniques for variance reduction of the training objective. LSGM obtains a state-of-the-art FID score of 2.10 on CIFAR-10, outperforming all existing generative results on this dataset. On CelebA-HQ-256, LSGM is on a par with previous SGMs in sample quality while outperforming them in sampling time by two orders of magnitude. In modeling binary images, LSGM achieves state-of-the-art likelihood on the binarized OMNIGLOT dataset.",
            "publicationTitle": "arXiv:2106.05931 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-10",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.05931",
            "accessDate": "2021-09-27T05:17:53Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.05931",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-09-27T05:17:53Z",
            "dateModified": "2021-09-27T05:17:53Z"
        }
    },
    {
        "key": "AKAHQUX3",
        "version": 546,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/AKAHQUX3",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/AKAHQUX3",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Kong and Ping",
            "parsedDate": "2021-06-23",
            "numChildren": 3
        },
        "data": {
            "key": "AKAHQUX3",
            "version": 546,
            "itemType": "journalArticle",
            "title": "On Fast Sampling of Diffusion Probabilistic Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Zhifeng",
                    "lastName": "Kong"
                },
                {
                    "creatorType": "author",
                    "firstName": "Wei",
                    "lastName": "Ping"
                }
            ],
            "abstractNote": "In this work, we propose FastDPM, a unified framework for fast sampling in diffusion probabilistic models. FastDPM generalizes previous methods and gives rise to new algorithms with improved sample quality. We systematically investigate the fast sampling methods under this framework across different domains, on different datasets, and with different amount of conditional information provided for generation. We find the performance of a particular method depends on data domains (e.g., image or audio), the trade-off between sampling speed and sample quality, and the amount of conditional information. We further provide insights and recipes on the choice of methods for practitioners.",
            "publicationTitle": "arXiv:2106.00132 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-23",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.00132",
            "accessDate": "2021-09-27T05:17:50Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.00132",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-09-27T05:17:50Z",
            "dateModified": "2021-09-27T05:17:50Z"
        }
    },
    {
        "key": "9LNPDQAR",
        "version": 543,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/9LNPDQAR",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/9LNPDQAR",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tian et al.",
            "numChildren": 1
        },
        "data": {
            "key": "9LNPDQAR",
            "version": 543,
            "itemType": "journalArticle",
            "title": "Farewell to Mutual Information: Variational Distillation for Cross-Modal Person Re-Identification",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Xudong",
                    "lastName": "Tian"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zhizhong",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Shaohui",
                    "lastName": "Lin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yanyun",
                    "lastName": "Qu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yuan",
                    "lastName": "Xie"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lizhuang",
                    "lastName": "Ma"
                }
            ],
            "abstractNote": "The Information Bottleneck (IB) provides an information theoretic principle for representation learning, by retaining all information relevant for predicting label while minimizing the redundancy. Though IB principle has been applied to a wide range of applications, its optimization remains a challenging problem which heavily relies on the accurate estimation of mutual information. In this paper, we present a new strategy, Variational Self-Distillation (VSD), which provides a scalable, flexible and analytic solution to essentially fitting the mutual information but without explicitly estimating it. Under rigorously theoretical guarantee, VSD enables the IB to grasp the intrinsic correlation between representation and label for supervised training. Furthermore, by extending VSD to multi-view learning, we introduce two other strategies, Variational Cross-Distillation (VCD) and Variational Mutual-Learning (VML), which significantly improve the robustness of representation to view-changes by eliminating view-specific and task-irrelevant information. To verify our theoretically grounded strategies, we apply our approaches to cross-modal person Re-ID, and conduct extensive experiments, where the superior performance against state-of-the-art methods are demonstrated. Our intriguing findings highlight the need to rethink the way to estimate mutual information.",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "10",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-09-16T10:21:00Z",
            "dateModified": "2021-09-16T10:21:00Z"
        }
    },
    {
        "key": "MXZBFTHW",
        "version": 539,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/MXZBFTHW",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/MXZBFTHW",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Nachmani et al.",
            "parsedDate": "2021-06-14",
            "numChildren": 2
        },
        "data": {
            "key": "MXZBFTHW",
            "version": 539,
            "itemType": "journalArticle",
            "title": "Non Gaussian Denoising Diffusion Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Eliya",
                    "lastName": "Nachmani"
                },
                {
                    "creatorType": "author",
                    "firstName": "Robin San",
                    "lastName": "Roman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lior",
                    "lastName": "Wolf"
                }
            ],
            "abstractNote": "Generative diffusion processes are an emerging and effective tool for image and speech generation. In the existing methods, the underline noise distribution of the diffusion process is Gaussian noise. However, fitting distributions with more degrees of freedom, could help the performance of such generative models. In this work, we investigate other types of noise distribution for the diffusion process. Specifically, we show that noise from Gamma distribution provides improved results for image and speech generation. Moreover, we show that using a mixture of Gaussian noise variables in the diffusion process improves the performance over a diffusion process that is based on a single distribution. Our approach preserves the ability to efficiently sample state in the training diffusion process while using Gamma noise and a mixture of noise.",
            "publicationTitle": "arXiv:2106.07582 [cs, eess]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-14",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.07582",
            "accessDate": "2021-09-14T12:15:47Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.07582",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Sound",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Audio and Speech Processing",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-09-14T12:15:47Z",
            "dateModified": "2021-09-14T12:16:00Z"
        }
    },
    {
        "key": "HXNUEG2D",
        "version": 539,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HXNUEG2D",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HXNUEG2D",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lam et al.",
            "parsedDate": "2021-08-31",
            "numChildren": 2
        },
        "data": {
            "key": "HXNUEG2D",
            "version": 539,
            "itemType": "journalArticle",
            "title": "Bilateral Denoising Diffusion Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Max W. Y.",
                    "lastName": "Lam"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jun",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Rongjie",
                    "lastName": "Huang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Dan",
                    "lastName": "Su"
                },
                {
                    "creatorType": "author",
                    "firstName": "Dong",
                    "lastName": "Yu"
                }
            ],
            "abstractNote": "Denoising diffusion probabilistic models (DDPMs) have emerged as competitive generative models yet brought challenges to efficient sampling. In this paper, we propose novel bilateral denoising diffusion models (BDDMs), which take significantly fewer steps to generate high-quality samples. From a bilateral modeling objective, BDDMs parameterize the forward and reverse processes with a score network and a scheduling network, respectively. We show that a new lower bound tighter than the standard evidence lower bound can be derived as a surrogate objective for training the two networks. In particular, BDDMs are efficient, simple-to-train, and capable of further improving any pre-trained DDPM by optimizing the inference noise schedules. Our experiments demonstrated that BDDMs can generate high-fidelity samples with as few as 3 sampling steps and produce comparable or even higher quality samples than DDPMs using 1000 steps with only 16 sampling steps (a 62x speedup).",
            "publicationTitle": "arXiv:2108.11514 [cs, eess]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-08-31",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2108.11514",
            "accessDate": "2021-09-14T12:15:45Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2108.11514",
            "tags": [
                {
                    "tag": "Computer Science - Artificial Intelligence",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Sound",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Audio and Speech Processing",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Signal Processing",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-09-14T12:15:45Z",
            "dateModified": "2021-09-14T12:15:55Z"
        }
    },
    {
        "key": "ISYURXMF",
        "version": 542,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/ISYURXMF",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/ISYURXMF",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Song et al.",
            "parsedDate": "2020-10-06",
            "numChildren": 2
        },
        "data": {
            "key": "ISYURXMF",
            "version": 542,
            "itemType": "journalArticle",
            "title": "Denoising Diffusion Implicit Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Jiaming",
                    "lastName": "Song"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chenlin",
                    "lastName": "Meng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stefano",
                    "lastName": "Ermon"
                }
            ],
            "abstractNote": "Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. We empirically demonstrate that DDIMs can produce high quality samples $10 \\times$ to $50 \\times$ faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.",
            "publicationTitle": "arXiv:2010.02502 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-10-06",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2010.02502",
            "accessDate": "2021-09-13T16:55:46Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2010.02502",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/WBPEN649"
            },
            "dateAdded": "2021-09-13T16:55:46Z",
            "dateModified": "2021-09-13T16:55:46Z"
        }
    },
    {
        "key": "D4SAGRQA",
        "version": 526,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/D4SAGRQA",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/D4SAGRQA",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Jang and Agapito",
            "parsedDate": "2021-09-03",
            "numChildren": 3
        },
        "data": {
            "key": "D4SAGRQA",
            "version": 526,
            "itemType": "journalArticle",
            "title": "CodeNeRF: Disentangled Neural Radiance Fields for Object Categories",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Wonbong",
                    "lastName": "Jang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lourdes",
                    "lastName": "Agapito"
                }
            ],
            "abstractNote": "CodeNeRF is an implicit 3D neural representation that learns the variation of object shapes and textures across a category and can be trained, from a set of posed images, to synthesize novel views of unseen objects. Unlike the original NeRF, which is scene specific, CodeNeRF learns to disentangle shape and texture by learning separate embeddings. At test time, given a single unposed image of an unseen object, CodeNeRF jointly estimates camera viewpoint, and shape and appearance codes via optimization. Unseen objects can be reconstructed from a single image, and then rendered from new viewpoints or their shape and texture edited by varying the latent codes. We conduct experiments on the SRN benchmark, which show that CodeNeRF generalises well to unseen objects and achieves on-par performance with methods that require known camera pose at test time. Our results on real-world images demonstrate that CodeNeRF can bridge the sim-to-real gap. Project page: \\url{https://github.com/wayne1123/code-nerf}",
            "publicationTitle": "arXiv:2109.01750 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-09-03",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "CodeNeRF",
            "url": "http://arxiv.org/abs/2109.01750",
            "accessDate": "2021-09-07T05:41:28Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2109.01750",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Graphics",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-09-07T05:41:28Z",
            "dateModified": "2021-09-07T05:41:28Z"
        }
    },
    {
        "key": "G5B4VQNR",
        "version": 524,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/G5B4VQNR",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/G5B4VQNR",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sinha et al.",
            "parsedDate": "2021-06-12",
            "numChildren": 2
        },
        "data": {
            "key": "G5B4VQNR",
            "version": 524,
            "itemType": "journalArticle",
            "title": "D2C: Diffusion-Denoising Models for Few-shot Conditional Generation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Abhishek",
                    "lastName": "Sinha"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiaming",
                    "lastName": "Song"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chenlin",
                    "lastName": "Meng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stefano",
                    "lastName": "Ermon"
                }
            ],
            "abstractNote": "Conditional generative models of high-dimensional images have many applications, but supervision signals from conditions to images can be expensive to acquire. This paper describes Diffusion-Decoding models with Contrastive representations (D2C), a paradigm for training unconditional variational autoencoders (VAEs) for few-shot conditional image generation. D2C uses a learned diffusion-based prior over the latent representations to improve generation and contrastive self-supervised learning to improve representation quality. D2C can adapt to novel generation tasks conditioned on labels or manipulation constraints, by learning from as few as 100 labeled examples. On conditional generation from new labels, D2C achieves superior performance over state-of-the-art VAEs and diffusion models. On conditional image manipulation, D2C generations are two orders of magnitude faster to produce over StyleGAN2 ones and are preferred by 50% - 60% of the human evaluators in a double-blind study.",
            "publicationTitle": "arXiv:2106.06819 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-12",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "D2C",
            "url": "http://arxiv.org/abs/2106.06819",
            "accessDate": "2021-09-06T16:01:18Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.06819",
            "tags": [
                {
                    "tag": "Computer Science - Artificial Intelligence",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "aek"
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/766JR7U7"
            },
            "dateAdded": "2021-09-06T16:05:35Z",
            "dateModified": "2021-09-06T16:05:35Z"
        }
    },
    {
        "key": "2SKLEA7A",
        "version": 523,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/2SKLEA7A",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/2SKLEA7A",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Maennel et al.",
            "parsedDate": "2020-11-11",
            "numChildren": 3
        },
        "data": {
            "key": "2SKLEA7A",
            "version": 523,
            "itemType": "journalArticle",
            "title": "What Do Neural Networks Learn When Trained With Random Labels?",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Hartmut",
                    "lastName": "Maennel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ibrahim",
                    "lastName": "Alabdulmohsin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ilya",
                    "lastName": "Tolstikhin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Robert J. N.",
                    "lastName": "Baldock"
                },
                {
                    "creatorType": "author",
                    "firstName": "Olivier",
                    "lastName": "Bousquet"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sylvain",
                    "lastName": "Gelly"
                },
                {
                    "creatorType": "author",
                    "firstName": "Daniel",
                    "lastName": "Keysers"
                }
            ],
            "abstractNote": "We study deep neural networks (DNNs) trained on natural image data with entirely random labels. Despite its popularity in the literature, where it is often used to study memorization, generalization, and other phenomena, little is known about what DNNs learn in this setting. In this paper, we show analytically for convolutional and fully connected networks that an alignment between the principal components of network parameters and data takes place when training with random labels. We study this alignment effect by investigating neural networks pre-trained on randomly labelled image data and subsequently fine-tuned on disjoint datasets with random or real labels. We show how this alignment produces a positive transfer: networks pre-trained with random labels train faster downstream compared to training from scratch even after accounting for simple effects, such as weight scaling. We analyze how competing effects, such as specialization at later layers, may hide the positive transfer. These effects are studied in several network architectures, including VGG16 and ResNet18, on CIFAR10 and ImageNet.",
            "publicationTitle": "arXiv:2006.10455 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-11",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2006.10455",
            "accessDate": "2021-09-02T11:56:33Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.10455",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/286VVKTS"
            },
            "dateAdded": "2021-09-02T11:56:33Z",
            "dateModified": "2021-09-02T11:56:33Z"
        }
    },
    {
        "key": "A465FIQU",
        "version": 519,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/A465FIQU",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/A465FIQU",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Choi et al.",
            "parsedDate": "2021-08-03",
            "numChildren": 3
        },
        "data": {
            "key": "A465FIQU",
            "version": 519,
            "itemType": "journalArticle",
            "title": "Toward Spatially Unbiased Generative Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Jooyoung",
                    "lastName": "Choi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jungbeom",
                    "lastName": "Lee"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yonghyun",
                    "lastName": "Jeong"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sungroh",
                    "lastName": "Yoon"
                }
            ],
            "abstractNote": "Recent image generation models show remarkable generation performance. However, they mirror strong location preference in datasets, which we call spatial bias. Therefore, generators render poor samples at unseen locations and scales. We argue that the generators rely on their implicit positional encoding to render spatial content. From our observations, the generator's implicit positional encoding is translation-variant, making the generator spatially biased. To address this issue, we propose injecting explicit positional encoding at each scale of the generator. By learning the spatially unbiased generator, we facilitate the robust use of generators in multiple tasks, such as GAN inversion, multi-scale generation, generation of arbitrary sizes and aspect ratios. Furthermore, we show that our method can also be applied to denoising diffusion probabilistic models.",
            "publicationTitle": "arXiv:2108.01285 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-08-03",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2108.01285",
            "accessDate": "2021-08-30T04:10:08Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2108.01285",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/3JIF7LRC"
            },
            "dateAdded": "2021-08-30T04:10:08Z",
            "dateModified": "2021-08-30T04:10:08Z"
        }
    },
    {
        "key": "5TWWD7E9",
        "version": 513,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/5TWWD7E9",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/5TWWD7E9",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Kingma et al.",
            "parsedDate": "2021-07-12",
            "numChildren": 2
        },
        "data": {
            "key": "5TWWD7E9",
            "version": 513,
            "itemType": "journalArticle",
            "title": "Variational Diffusion Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Diederik P.",
                    "lastName": "Kingma"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tim",
                    "lastName": "Salimans"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Poole"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan",
                    "lastName": "Ho"
                }
            ],
            "abstractNote": "Diffusion-based generative models have demonstrated a capacity for perceptually impressive synthesis, but can they also be great likelihood-based models? We answer this in the affirmative, and introduce a family of diffusion-based generative models that obtain state-of-the-art likelihoods on standard image density estimation benchmarks. Unlike other diffusion-based models, our method allows for efficient optimization of the noise schedule jointly with the rest of the model. We show that the variational lower bound (VLB) simplifies to a remarkably short expression in terms of the signal-to-noise ratio of the diffused data, thereby improving our theoretical understanding of this model class. Using this insight, we prove an equivalence between several models proposed in the literature. In addition, we show that the continuous-time VLB is invariant to the noise schedule, except for the signal-to-noise ratio at its endpoints. This enables us to learn a noise schedule that minimizes the variance of the resulting VLB estimator, leading to faster optimization. Combining these advances with architectural improvements, we obtain state-of-the-art likelihoods on image density estimation benchmarks, outperforming autoregressive models that have dominated these benchmarks for many years, with often significantly faster optimization. In addition, we show how to turn the model into a bits-back compression scheme, and demonstrate lossless compression rates close to the theoretical optimum.",
            "publicationTitle": "arXiv:2107.00630 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-07-12",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2107.00630",
            "accessDate": "2021-08-30T03:51:56Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2107.00630",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-30T03:51:56Z",
            "dateModified": "2021-08-30T03:51:56Z"
        }
    },
    {
        "key": "PHYY97CD",
        "version": 510,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/PHYY97CD",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/PHYY97CD",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Keng",
            "parsedDate": "2019-02-06",
            "numChildren": 1
        },
        "data": {
            "key": "PHYY97CD",
            "version": 510,
            "itemType": "webpage",
            "title": "Importance Sampling and Estimating Marginal Likelihood in Variational",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Brian",
                    "lastName": "Keng"
                }
            ],
            "abstractNote": "A short post describing how to use importance sampling to estimate marginal likelihood in variational autoencoders.",
            "websiteTitle": "Bounded Rationality",
            "websiteType": "",
            "date": "2019-02-06T08:20:11-04:00",
            "shortTitle": "",
            "url": "http://bjlkeng.github.io/posts/importance-sampling-and-estimating-marginal-likelihood-in-variational-autoencoders/",
            "accessDate": "2021-08-29T12:51:23Z",
            "language": "en",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-29T12:51:23Z",
            "dateModified": "2021-08-29T12:51:23Z"
        }
    },
    {
        "key": "V5YVRHX3",
        "version": 509,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/V5YVRHX3",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/V5YVRHX3",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "V5YVRHX3",
            "version": 509,
            "itemType": "attachment",
            "linkMode": "imported_url",
            "title": "Ch-var-is.pdf",
            "accessDate": "2021-08-29T10:43:16Z",
            "url": "https://statweb.stanford.edu/~owen/mc/Ch-var-is.pdf",
            "note": "",
            "contentType": "application/pdf",
            "charset": "",
            "filename": "Ch-var-is.pdf",
            "md5": "593890cbd9e8a1ca790004a92f140b97",
            "mtime": 1630233796253,
            "tags": [],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/NFRIXL7W"
            },
            "dateAdded": "2021-08-29T10:43:28Z",
            "dateModified": "2021-08-29T10:43:28Z"
        }
    },
    {
        "key": "DX3FDTCL",
        "version": 506,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/DX3FDTCL",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/DX3FDTCL",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Papamakarios et al.",
            "parsedDate": "2021-04-08",
            "numChildren": 3
        },
        "data": {
            "key": "DX3FDTCL",
            "version": 506,
            "itemType": "journalArticle",
            "title": "Normalizing Flows for Probabilistic Modeling and Inference",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "George",
                    "lastName": "Papamakarios"
                },
                {
                    "creatorType": "author",
                    "firstName": "Eric",
                    "lastName": "Nalisnick"
                },
                {
                    "creatorType": "author",
                    "firstName": "Danilo Jimenez",
                    "lastName": "Rezende"
                },
                {
                    "creatorType": "author",
                    "firstName": "Shakir",
                    "lastName": "Mohamed"
                },
                {
                    "creatorType": "author",
                    "firstName": "Balaji",
                    "lastName": "Lakshminarayanan"
                }
            ],
            "abstractNote": "Normalizing flows provide a general mechanism for defining expressive probability distributions, only requiring the specification of a (usually simple) base distribution and a series of bijective transformations. There has been much recent work on normalizing flows, ranging from improving their expressive power to expanding their application. We believe the field has now matured and is in need of a unified perspective. In this review, we attempt to provide such a perspective by describing flows through the lens of probabilistic modeling and inference. We place special emphasis on the fundamental principles of flow design, and discuss foundational topics such as expressive power and computational trade-offs. We also broaden the conceptual framing of flows by relating them to more general probability transformations. Lastly, we summarize the use of flows for tasks such as generative modeling, approximate inference, and supervised learning.",
            "publicationTitle": "arXiv:1912.02762 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-08",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1912.02762",
            "accessDate": "2021-08-27T05:58:02Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1912.02762",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/SY4F5EG6"
            },
            "dateAdded": "2021-08-27T05:58:26Z",
            "dateModified": "2021-08-27T05:58:26Z"
        }
    },
    {
        "key": "BVCI8J9W",
        "version": 503,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/BVCI8J9W",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/BVCI8J9W",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lugmayr et al.",
            "parsedDate": "2020-07-31",
            "numChildren": 2
        },
        "data": {
            "key": "BVCI8J9W",
            "version": 503,
            "itemType": "journalArticle",
            "title": "SRFlow: Learning the Super-Resolution Space with Normalizing Flow",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Andreas",
                    "lastName": "Lugmayr"
                },
                {
                    "creatorType": "author",
                    "firstName": "Martin",
                    "lastName": "Danelljan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Luc",
                    "lastName": "Van Gool"
                },
                {
                    "creatorType": "author",
                    "firstName": "Radu",
                    "lastName": "Timofte"
                }
            ],
            "abstractNote": "Super-resolution is an ill-posed problem, since it allows for multiple predictions for a given low-resolution image. This fundamental fact is largely ignored by state-of-the-art deep learning based approaches. These methods instead train a deterministic mapping using combinations of reconstruction and adversarial losses. In this work, we therefore propose SRFlow: a normalizing flow based super-resolution method capable of learning the conditional distribution of the output given the low-resolution input. Our model is trained in a principled manner using a single loss, namely the negative log-likelihood. SRFlow therefore directly accounts for the ill-posed nature of the problem, and learns to predict diverse photo-realistic high-resolution images. Moreover, we utilize the strong image posterior learned by SRFlow to design flexible image manipulation techniques, capable of enhancing super-resolved images by, e.g., transferring content from other images. We perform extensive experiments on faces, as well as on super-resolution in general. SRFlow outperforms state-of-the-art GAN-based approaches in terms of both PSNR and perceptual quality metrics, while allowing for diversity through the exploration of the space of super-resolved solutions.",
            "publicationTitle": "arXiv:2006.14200 [cs, eess]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-07-31",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "SRFlow",
            "url": "http://arxiv.org/abs/2006.14200",
            "accessDate": "2021-08-26T14:46:21Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.14200",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Image and Video Processing",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-26T14:46:21Z",
            "dateModified": "2021-08-26T14:46:21Z"
        }
    },
    {
        "key": "TB6I7UG4",
        "version": 503,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/TB6I7UG4",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/TB6I7UG4",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Kobyzev et al.",
            "parsedDate": "2020",
            "numChildren": 3
        },
        "data": {
            "key": "TB6I7UG4",
            "version": 503,
            "itemType": "journalArticle",
            "title": "Normalizing Flows: An Introduction and Review of Current Methods",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ivan",
                    "lastName": "Kobyzev"
                },
                {
                    "creatorType": "author",
                    "firstName": "Simon J. D.",
                    "lastName": "Prince"
                },
                {
                    "creatorType": "author",
                    "firstName": "Marcus A.",
                    "lastName": "Brubaker"
                }
            ],
            "abstractNote": "Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact. The goal of this survey article is to give a coherent and comprehensive review of the literature around the construction and use of Normalizing Flows for distribution learning. We aim to provide context and explanation of the models, review current state-of-the-art literature, and identify open questions and promising future directions.",
            "publicationTitle": "IEEE Transactions on Pattern Analysis and Machine Intelligence",
            "volume": "",
            "issue": "",
            "pages": "1-1",
            "date": "2020",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "IEEE Trans. Pattern Anal. Mach. Intell.",
            "language": "",
            "DOI": "10.1109/TPAMI.2020.2992934",
            "ISSN": "0162-8828, 2160-9292, 1939-3539",
            "shortTitle": "Normalizing Flows",
            "url": "http://arxiv.org/abs/1908.09257",
            "accessDate": "2021-08-26T14:46:06Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1908.09257",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-26T14:46:06Z",
            "dateModified": "2021-08-26T14:46:06Z"
        }
    },
    {
        "key": "JYFX3L2W",
        "version": 500,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/JYFX3L2W",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/JYFX3L2W",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "JYFX3L2W",
            "version": 500,
            "itemType": "attachment",
            "linkMode": "imported_url",
            "title": "samplingPart1.pdf",
            "accessDate": "2021-08-25T04:57:26Z",
            "url": "http://www.cse.psu.edu/~rtc12/CSE586/lectures/samplingPart1.pdf",
            "note": "",
            "contentType": "application/pdf",
            "charset": "",
            "filename": "samplingPart1.pdf",
            "md5": "08af61f51da3a33d6e77a850df1eac2a",
            "mtime": 1629867446000,
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-25T04:57:26Z",
            "dateModified": "2021-08-25T04:57:26Z"
        }
    },
    {
        "key": "D6W2XS3A",
        "version": 492,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/D6W2XS3A",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/D6W2XS3A",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Skorokhodov et al.",
            "parsedDate": "2021-06-28",
            "numChildren": 3
        },
        "data": {
            "key": "D6W2XS3A",
            "version": 492,
            "itemType": "journalArticle",
            "title": "Adversarial Generation of Continuous Images",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ivan",
                    "lastName": "Skorokhodov"
                },
                {
                    "creatorType": "author",
                    "firstName": "Savva",
                    "lastName": "Ignatyev"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohamed",
                    "lastName": "Elhoseiny"
                }
            ],
            "abstractNote": "In most existing learning systems, images are typically viewed as 2D pixel arrays. However, in another paradigm gaining popularity, a 2D image is represented as an implicit neural representation (INR) - an MLP that predicts an RGB pixel value given its (x,y) coordinate. In this paper, we propose two novel architectural techniques for building INR-based image decoders: factorized multiplicative modulation and multi-scale INRs, and use them to build a state-of-the-art continuous image GAN. Previous attempts to adapt INRs for image generation were limited to MNIST-like datasets and do not scale to complex real-world data. Our proposed INR-GAN architecture improves the performance of continuous image generators by several times, greatly reducing the gap between continuous image GANs and pixel-based ones. Apart from that, we explore several exciting properties of the INR-based decoders, like out-of-the-box superresolution, meaningful image-space interpolation, accelerated inference of low-resolution images, an ability to extrapolate outside of image boundaries, and strong geometric prior. The project page is located at https://universome.github.io/inr-gan.",
            "publicationTitle": "arXiv:2011.12026 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-28",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.12026",
            "accessDate": "2021-08-23T13:27:44Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.12026",
            "tags": [
                {
                    "tag": "Computer Science - Artificial Intelligence",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-23T13:27:44Z",
            "dateModified": "2021-08-23T13:27:44Z"
        }
    },
    {
        "key": "TQH2EHNS",
        "version": 492,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/TQH2EHNS",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/TQH2EHNS",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Anokhin et al.",
            "parsedDate": "2020-11-27",
            "numChildren": 2
        },
        "data": {
            "key": "TQH2EHNS",
            "version": 492,
            "itemType": "journalArticle",
            "title": "Image Generators with Conditionally-Independent Pixel Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ivan",
                    "lastName": "Anokhin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kirill",
                    "lastName": "Demochkin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Taras",
                    "lastName": "Khakhulin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gleb",
                    "lastName": "Sterkin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Victor",
                    "lastName": "Lempitsky"
                },
                {
                    "creatorType": "author",
                    "firstName": "Denis",
                    "lastName": "Korzhenkov"
                }
            ],
            "abstractNote": "Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner. Here, we present a new architecture for image generators, where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. No spatial convolutions or similar operations that propagate information across pixels are involved during the synthesis. We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe the new generators to achieve similar generation quality to state-of-the-art convolutional generators. We also investigate several interesting properties unique to the new architecture.",
            "publicationTitle": "arXiv:2011.13775 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-27",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.13775",
            "accessDate": "2021-08-23T13:27:35Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.13775",
            "tags": [
                {
                    "tag": "Computer Science - Artificial Intelligence",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-23T13:27:35Z",
            "dateModified": "2021-08-23T13:27:35Z"
        }
    },
    {
        "key": "DUSDVTE8",
        "version": 491,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/DUSDVTE8",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/DUSDVTE8",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Perez et al.",
            "parsedDate": "2017-12-18",
            "numChildren": 3
        },
        "data": {
            "key": "DUSDVTE8",
            "version": 491,
            "itemType": "journalArticle",
            "title": "FiLM: Visual Reasoning with a General Conditioning Layer",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ethan",
                    "lastName": "Perez"
                },
                {
                    "creatorType": "author",
                    "firstName": "Florian",
                    "lastName": "Strub"
                },
                {
                    "creatorType": "author",
                    "firstName": "Harm",
                    "lastName": "de Vries"
                },
                {
                    "creatorType": "author",
                    "firstName": "Vincent",
                    "lastName": "Dumoulin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Aaron",
                    "lastName": "Courville"
                }
            ],
            "abstractNote": "We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.",
            "publicationTitle": "arXiv:1709.07871 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2017-12-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "FiLM",
            "url": "http://arxiv.org/abs/1709.07871",
            "accessDate": "2021-08-23T08:10:32Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1709.07871",
            "tags": [
                {
                    "tag": "Computer Science - Artificial Intelligence",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Computation and Language",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": [
                    "http://zotero.org/groups/4320173/items/SK233QWE",
                    "http://zotero.org/groups/4320173/items/P6ZETIVM"
                ]
            },
            "dateAdded": "2021-08-23T08:11:21Z",
            "dateModified": "2021-08-23T08:11:21Z"
        }
    },
    {
        "key": "JRXZ9IDX",
        "version": 488,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/JRXZ9IDX",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/JRXZ9IDX",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Chen et al.",
            "parsedDate": "2021-06-18",
            "numChildren": 3
        },
        "data": {
            "key": "JRXZ9IDX",
            "version": 488,
            "itemType": "journalArticle",
            "title": "WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Nanxin",
                    "lastName": "Chen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yu",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Heiga",
                    "lastName": "Zen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ron J.",
                    "lastName": "Weiss"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohammad",
                    "lastName": "Norouzi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Najim",
                    "lastName": "Dehak"
                },
                {
                    "creatorType": "author",
                    "firstName": "William",
                    "lastName": "Chan"
                }
            ],
            "abstractNote": "This paper introduces WaveGrad 2, a non-autoregressive generative model for text-to-speech synthesis. WaveGrad 2 is trained to estimate the gradient of the log conditional density of the waveform given a phoneme sequence. The model takes an input phoneme sequence, and through an iterative refinement process, generates an audio waveform. This contrasts to the original WaveGrad vocoder which conditions on mel-spectrogram features, generated by a separate model. The iterative refinement process starts from Gaussian noise, and through a series of refinement steps (e.g., 50 steps), progressively recovers the audio sequence. WaveGrad 2 offers a natural way to trade-off between inference speed and sample quality, through adjusting the number of refinement steps. Experiments show that the model can generate high fidelity audio, approaching the performance of a state-of-the-art neural TTS system. We also report various ablation studies over different model configurations. Audio samples are available at https://wavegrad.github.io/v2.",
            "publicationTitle": "arXiv:2106.09660 [cs, eess]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "WaveGrad 2",
            "url": "http://arxiv.org/abs/2106.09660",
            "accessDate": "2021-08-23T07:44:35Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.09660",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Sound",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Audio and Speech Processing",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/7UJJ2WYT"
            },
            "dateAdded": "2021-08-23T07:45:15Z",
            "dateModified": "2021-08-23T07:45:15Z"
        }
    },
    {
        "key": "5CL67G2S",
        "version": 477,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/5CL67G2S",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/5CL67G2S",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Nguyen-Phuoc et al.",
            "parsedDate": "2019-10-01",
            "numChildren": 3
        },
        "data": {
            "key": "5CL67G2S",
            "version": 477,
            "itemType": "journalArticle",
            "title": "HoloGAN: Unsupervised learning of 3D representations from natural images",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Thu",
                    "lastName": "Nguyen-Phuoc"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chuan",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lucas",
                    "lastName": "Theis"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Richardt"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yong-Liang",
                    "lastName": "Yang"
                }
            ],
            "abstractNote": "We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images. Most generative models rely on 2D kernels to generate images and make few assumptions about the 3D world. These models therefore tend to create blurry images or artefacts in tasks that require a strong 3D understanding, such as novel-view synthesis. HoloGAN instead learns a 3D representation of the world, and to render this representation in a realistic manner. Unlike other GANs, HoloGAN provides explicit control over the pose of generated objects through rigid-body transformations of the learnt 3D features. Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images with similar or higher visual quality than other generative models. HoloGAN can be trained end-to-end from unlabelled 2D images only. Particularly, we do not require pose labels, 3D shapes, or multiple views of the same objects. This shows that HoloGAN is the first generative model that learns 3D representations from natural images in an entirely unsupervised manner.",
            "publicationTitle": "arXiv:1904.01326 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2019-10-01",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "HoloGAN",
            "url": "http://arxiv.org/abs/1904.01326",
            "accessDate": "2021-08-17T08:37:33Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1904.01326",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-08-17T08:37:33Z",
            "dateModified": "2021-08-17T08:37:33Z"
        }
    },
    {
        "key": "9TGH4IPE",
        "version": 483,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/9TGH4IPE",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/9TGH4IPE",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Xie et al.",
            "parsedDate": "2021-04-16",
            "numChildren": 2
        },
        "data": {
            "key": "9TGH4IPE",
            "version": 483,
            "itemType": "journalArticle",
            "title": "FiG-NeRF: Figure-Ground Neural Radiance Fields for 3D Object Category Modelling",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Christopher",
                    "lastName": "Xie"
                },
                {
                    "creatorType": "author",
                    "firstName": "Keunhong",
                    "lastName": "Park"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ricardo",
                    "lastName": "Martin-Brualla"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Brown"
                }
            ],
            "abstractNote": "We investigate the use of Neural Radiance Fields (NeRF) to learn high quality 3D object category models from collections of input images. In contrast to previous work, we are able to do this whilst simultaneously separating foreground objects from their varying backgrounds. We achieve this via a 2-component NeRF model, FiG-NeRF, that prefers explanation of the scene as a geometrically constant background and a deformable foreground that represents the object category. We show that this method can learn accurate 3D object category models using only photometric supervision and casually captured images of the objects. Additionally, our 2-part decomposition allows the model to perform accurate and crisp amodal segmentation. We quantitatively evaluate our method with view synthesis and image fidelity metrics, using synthetic, lab-captured, and in-the-wild data. Our results demonstrate convincing 3D object category modelling that exceed the performance of existing methods.",
            "publicationTitle": "arXiv:2104.08418 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-16",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "FiG-NeRF",
            "url": "http://arxiv.org/abs/2104.08418",
            "accessDate": "2021-08-17T07:58:38Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.08418",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/8YGWBCHJ"
            },
            "dateAdded": "2021-08-17T07:58:38Z",
            "dateModified": "2021-08-17T07:58:38Z"
        }
    },
    {
        "key": "MDCBU9RP",
        "version": 467,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/MDCBU9RP",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/MDCBU9RP",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "MDCBU9RP",
            "version": 467,
            "itemType": "attachment",
            "linkMode": "imported_url",
            "title": "arXiv Fulltext PDF",
            "accessDate": "2021-08-04T07:03:04Z",
            "url": "https://arxiv.org/pdf/2011.13456.pdf",
            "note": "",
            "contentType": "application/pdf",
            "charset": "",
            "filename": "Song et al. - 2021 - Score-Based Generative Modeling through Stochastic.pdf",
            "md5": "3327d1e3196026d81e8ee11b39c49943",
            "mtime": 1628060584000,
            "tags": [],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/GPIBEZAU"
            },
            "dateAdded": "2021-08-04T07:03:04Z",
            "dateModified": "2021-08-04T07:06:15Z"
        }
    },
    {
        "key": "YGD6PLK9",
        "version": 466,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/YGD6PLK9",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/YGD6PLK9",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Song et al.",
            "parsedDate": "2021-02-10",
            "numChildren": 2
        },
        "data": {
            "key": "YGD6PLK9",
            "version": 466,
            "itemType": "journalArticle",
            "title": "Score-Based Generative Modeling through Stochastic Differential Equations",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yang",
                    "lastName": "Song"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jascha",
                    "lastName": "Sohl-Dickstein"
                },
                {
                    "creatorType": "author",
                    "firstName": "Diederik P.",
                    "lastName": "Kingma"
                },
                {
                    "creatorType": "author",
                    "firstName": "Abhishek",
                    "lastName": "Kumar"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stefano",
                    "lastName": "Ermon"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Poole"
                }
            ],
            "abstractNote": "Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.",
            "publicationTitle": "arXiv:2011.13456 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-02-10",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.13456",
            "accessDate": "2021-08-04T07:00:31Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.13456",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/73KSBICK"
            },
            "dateAdded": "2021-08-04T07:00:31Z",
            "dateModified": "2021-08-04T07:00:31Z"
        }
    },
    {
        "key": "V4GQUIZW",
        "version": 455,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/V4GQUIZW",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/V4GQUIZW",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Browne",
            "numChildren": 1
        },
        "data": {
            "key": "V4GQUIZW",
            "version": 455,
            "itemType": "journalArticle",
            "title": "An Introduction to MCMC methods and Bayesian Statistics",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "William",
                    "lastName": "Browne"
                }
            ],
            "abstractNote": "",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "69",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-26T09:24:47Z",
            "dateModified": "2021-07-26T09:24:47Z"
        }
    },
    {
        "key": "3GGK9957",
        "version": 452,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/3GGK9957",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/3GGK9957",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Welling and Teh",
            "numChildren": 1
        },
        "data": {
            "key": "3GGK9957",
            "version": 452,
            "itemType": "journalArticle",
            "title": "Bayesian Learning via Stochastic Gradient Langevin Dynamics",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Max",
                    "lastName": "Welling"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yee Whye",
                    "lastName": "Teh"
                }
            ],
            "abstractNote": "In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an inbuilt protection against overﬁtting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a “sampling threshold” and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "8",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-26T08:03:37Z",
            "dateModified": "2021-07-26T08:03:37Z"
        }
    },
    {
        "key": "CULAMFWA",
        "version": 450,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/CULAMFWA",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/CULAMFWA",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sinha et al.",
            "parsedDate": "2021-06-12",
            "numChildren": 1
        },
        "data": {
            "key": "CULAMFWA",
            "version": 450,
            "itemType": "journalArticle",
            "title": "D2C: Diffusion-Denoising Models for Few-shot Conditional Generation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Abhishek",
                    "lastName": "Sinha"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiaming",
                    "lastName": "Song"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chenlin",
                    "lastName": "Meng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stefano",
                    "lastName": "Ermon"
                }
            ],
            "abstractNote": "Conditional generative models of high-dimensional images have many applications, but supervision signals from conditions to images can be expensive to acquire. This paper describes Diffusion-Decoding models with Contrastive representations (D2C), a paradigm for training unconditional variational autoencoders (VAEs) for few-shot conditional image generation. D2C uses a learned diffusion-based prior over the latent representations to improve generation and contrastive self-supervised learning to improve representation quality. D2C can adapt to novel generation tasks conditioned on labels or manipulation constraints, by learning from as few as 100 labeled examples. On conditional generation from new labels, D2C achieves superior performance over state-of-the-art VAEs and diffusion models. On conditional image manipulation, D2C generations are two orders of magnitude faster to produce over StyleGAN2 ones and are preferred by 50% - 60% of the human evaluators in a double-blind study.",
            "publicationTitle": "arXiv:2106.06819 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-12",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "D2C",
            "url": "http://arxiv.org/abs/2106.06819",
            "accessDate": "2021-07-22T16:31:54Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.06819",
            "tags": [
                {
                    "tag": "Computer Science - Artificial Intelligence",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/8LW53DI8"
            },
            "dateAdded": "2021-07-22T16:31:55Z",
            "dateModified": "2021-07-26T06:53:48Z"
        }
    },
    {
        "key": "HFRNTGLQ",
        "version": 450,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HFRNTGLQ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HFRNTGLQ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Cheng et al.",
            "parsedDate": "2019",
            "numChildren": 1
        },
        "data": {
            "key": "HFRNTGLQ",
            "version": 450,
            "itemType": "conferencePaper",
            "title": "A Bayesian Perspective on the Deep Image Prior",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Zezhou",
                    "lastName": "Cheng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matheus",
                    "lastName": "Gadelha"
                },
                {
                    "creatorType": "author",
                    "firstName": "Subhransu",
                    "lastName": "Maji"
                },
                {
                    "creatorType": "author",
                    "firstName": "Daniel",
                    "lastName": "Sheldon"
                }
            ],
            "abstractNote": "The deep image prior [26] was recently introduced as a prior for natural images. It represents images as the output of a convolutional network with random inputs. For “inference”, gradient descent is performed to adjust network parameters to make the output match observations. This approach yields good performance on a range of image reconstruction tasks. We show that the deep image prior is asymptotically equivalent to a stationary Gaussian process prior in the limit as the number of channels in each layer of the network goes to infinity, and derive the corresponding kernel. This informs a Bayesian approach to inference. We show that by conducting posterior inference using stochastic gradient Langevin dynamics we avoid the need for early stopping, which is a drawback of the current approach, and improve results for denoising and inpainting tasks. We illustrate these intuitions on a number of 1D and 2D signal reconstruction tasks.",
            "date": "6/2019",
            "proceedingsTitle": "2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",
            "conferenceName": "2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",
            "place": "Long Beach, CA, USA",
            "publisher": "IEEE",
            "volume": "",
            "pages": "5438-5446",
            "series": "",
            "language": "en",
            "DOI": "10.1109/CVPR.2019.00559",
            "ISBN": "978-1-72813-293-8",
            "shortTitle": "",
            "url": "https://ieeexplore.ieee.org/document/8954206/",
            "accessDate": "2021-07-26T06:51:00Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/EH6JRM9E"
            },
            "dateAdded": "2021-07-26T06:52:53Z",
            "dateModified": "2021-07-26T06:52:53Z"
        }
    },
    {
        "key": "Z9FIEAFE",
        "version": 446,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/Z9FIEAFE",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/Z9FIEAFE",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Grill et al.",
            "parsedDate": "2020-09-10",
            "numChildren": 2
        },
        "data": {
            "key": "Z9FIEAFE",
            "version": 446,
            "itemType": "journalArticle",
            "title": "Bootstrap your own latent: A new approach to self-supervised Learning",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Jean-Bastien",
                    "lastName": "Grill"
                },
                {
                    "creatorType": "author",
                    "firstName": "Florian",
                    "lastName": "Strub"
                },
                {
                    "creatorType": "author",
                    "firstName": "Florent",
                    "lastName": "Altché"
                },
                {
                    "creatorType": "author",
                    "firstName": "Corentin",
                    "lastName": "Tallec"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pierre H.",
                    "lastName": "Richemond"
                },
                {
                    "creatorType": "author",
                    "firstName": "Elena",
                    "lastName": "Buchatskaya"
                },
                {
                    "creatorType": "author",
                    "firstName": "Carl",
                    "lastName": "Doersch"
                },
                {
                    "creatorType": "author",
                    "firstName": "Bernardo Avila",
                    "lastName": "Pires"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zhaohan Daniel",
                    "lastName": "Guo"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohammad Gheshlaghi",
                    "lastName": "Azar"
                },
                {
                    "creatorType": "author",
                    "firstName": "Bilal",
                    "lastName": "Piot"
                },
                {
                    "creatorType": "author",
                    "firstName": "Koray",
                    "lastName": "Kavukcuoglu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Rémi",
                    "lastName": "Munos"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michal",
                    "lastName": "Valko"
                }
            ],
            "abstractNote": "We introduce Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other. From an augmented view of an image, we train the online network to predict the target network representation of the same image under a different augmented view. At the same time, we update the target network with a slow-moving average of the online network. While state-of-the-art methods rely on negative pairs, BYOL achieves a new state of the art without them. BYOL reaches $74.3\\%$ top-1 classification accuracy on ImageNet using a linear evaluation with a ResNet-50 architecture and $79.6\\%$ with a larger ResNet. We show that BYOL performs on par or better than the current state of the art on both transfer and semi-supervised benchmarks. Our implementation and pretrained models are given on GitHub.",
            "publicationTitle": "arXiv:2006.07733 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-09-10",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Bootstrap your own latent",
            "url": "http://arxiv.org/abs/2006.07733",
            "accessDate": "2021-07-26T06:51:08Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.07733",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-26T06:51:08Z",
            "dateModified": "2021-07-26T06:51:08Z"
        }
    },
    {
        "key": "AR4NYQRM",
        "version": 440,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/AR4NYQRM",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/AR4NYQRM",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Watson et al.",
            "parsedDate": "2021-06-07",
            "numChildren": 2
        },
        "data": {
            "key": "AR4NYQRM",
            "version": 440,
            "itemType": "journalArticle",
            "title": "Learning to Efficiently Sample from Diffusion Probabilistic Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Daniel",
                    "lastName": "Watson"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan",
                    "lastName": "Ho"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohammad",
                    "lastName": "Norouzi"
                },
                {
                    "creatorType": "author",
                    "firstName": "William",
                    "lastName": "Chan"
                }
            ],
            "abstractNote": "Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a powerful family of generative models that can yield high-fidelity samples and competitive log-likelihoods across a range of domains, including image and speech synthesis. Key advantages of DDPMs include ease of training, in contrast to generative adversarial networks, and speed of generation, in contrast to autoregressive models. However, DDPMs typically require hundreds-to-thousands of steps to generate a high fidelity sample, making them prohibitively expensive for high dimensional problems. Fortunately, DDPMs allow trading generation speed for sample quality through adjusting the number of refinement steps as a post process. Prior work has been successful in improving generation speed through handcrafting the time schedule by trial and error. We instead view the selection of the inference time schedules as an optimization problem, and introduce an exact dynamic programming algorithm that finds the optimal discrete time schedules for any pre-trained DDPM. Our method exploits the fact that ELBO can be decomposed into separate KL terms, and given any computation budget, discovers the time schedule that maximizes the training ELBO exactly. Our method is efficient, has no hyper-parameters of its own, and can be applied to any pre-trained DDPM with no retraining. We discover inference time schedules requiring as few as 32 refinement steps, while sacrificing less than 0.1 bits per dimension compared to the default 4,000 steps used on ImageNet 64x64 [Ho et al., 2020; Nichol and Dhariwal, 2021].",
            "publicationTitle": "arXiv:2106.03802 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-07",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.03802",
            "accessDate": "2021-07-22T16:32:18Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.03802",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-22T16:32:18Z",
            "dateModified": "2021-07-22T16:32:18Z"
        }
    },
    {
        "key": "DSBTHNHU",
        "version": 433,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/DSBTHNHU",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/DSBTHNHU",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Karras et al.",
            "parsedDate": "2019-03-29",
            "numChildren": 2
        },
        "data": {
            "key": "DSBTHNHU",
            "version": 433,
            "itemType": "journalArticle",
            "title": "A Style-Based Generator Architecture for Generative Adversarial Networks",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Tero",
                    "lastName": "Karras"
                },
                {
                    "creatorType": "author",
                    "firstName": "Samuli",
                    "lastName": "Laine"
                },
                {
                    "creatorType": "author",
                    "firstName": "Timo",
                    "lastName": "Aila"
                }
            ],
            "abstractNote": "We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.",
            "publicationTitle": "arXiv:1812.04948 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2019-03-29",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1812.04948",
            "accessDate": "2021-07-22T05:42:42Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1812.04948",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Neural and Evolutionary Computing",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-22T05:42:42Z",
            "dateModified": "2021-07-22T05:42:42Z"
        }
    },
    {
        "key": "5AA4YFUK",
        "version": 428,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/5AA4YFUK",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/5AA4YFUK",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Shen and Zhou",
            "parsedDate": "2021-04-03",
            "numChildren": 2
        },
        "data": {
            "key": "5AA4YFUK",
            "version": 428,
            "itemType": "journalArticle",
            "title": "Closed-Form Factorization of Latent Semantics in GANs",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yujun",
                    "lastName": "Shen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Bolei",
                    "lastName": "Zhou"
                }
            ],
            "abstractNote": "A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images. In order to identify such latent dimensions for image editing, previous methods typically annotate a collection of synthesized samples and train linear classifiers in the latent space. However, they require a clear definition of the target attribute as well as the corresponding manual annotations, limiting their applications in practice. In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner. In particular, we take a closer look into the generation mechanism of GANs and further propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights. With a lightning-fast implementation, our approach is capable of not only finding semantically meaningful dimensions comparably to the state-of-the-art supervised methods, but also resulting in far more versatile concepts across multiple GAN models trained on a wide range of datasets.",
            "publicationTitle": "arXiv:2007.06600 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-03",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2007.06600",
            "accessDate": "2021-07-22T05:15:38Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2007.06600",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-22T05:15:39Z",
            "dateModified": "2021-07-22T05:15:39Z"
        }
    },
    {
        "key": "7FCRP639",
        "version": 424,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/7FCRP639",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/7FCRP639",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Guo et al.",
            "parsedDate": "2021-07-12",
            "numChildren": 3
        },
        "data": {
            "key": "7FCRP639",
            "version": 424,
            "itemType": "journalArticle",
            "title": "Fast and Explicit Neural View Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Pengsheng",
                    "lastName": "Guo"
                },
                {
                    "creatorType": "author",
                    "firstName": "Miguel Angel",
                    "lastName": "Bautista"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alex",
                    "lastName": "Colburn"
                },
                {
                    "creatorType": "author",
                    "firstName": "Liang",
                    "lastName": "Yang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Daniel",
                    "lastName": "Ulbricht"
                },
                {
                    "creatorType": "author",
                    "firstName": "Joshua M.",
                    "lastName": "Susskind"
                },
                {
                    "creatorType": "author",
                    "firstName": "Qi",
                    "lastName": "Shan"
                }
            ],
            "abstractNote": "We study the problem of novel view synthesis of a scene comprised of 3D objects. We propose a simple yet effective approach that is neither continuous nor implicit, challenging recent trends on view synthesis. We demonstrate that although continuous radiance field representations have gained a lot of attention due to their expressive power, our simple approach obtains comparable or even better novel view reconstruction quality comparing with state-of-the-art baselines while increasing rendering speed by over 400x. Our model is trained in a category-agnostic manner and does not require scene-specific optimization. Therefore, it is able to generalize novel view synthesis to object categories not seen during training. In addition, we show that with our simple formulation, we can use view synthesis as a self-supervision signal for efficient learning of 3D geometry without explicit 3D supervision.",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-07-12",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "https://arxiv.org/abs/2107.05775v1",
            "accessDate": "2021-07-20T05:01:51Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arxiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [
                {
                    "tag": "pure"
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/4LPNJ4WU"
            },
            "dateAdded": "2021-07-22T04:47:34Z",
            "dateModified": "2021-07-22T04:47:34Z"
        }
    },
    {
        "key": "VJ63H28S",
        "version": 418,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VJ63H28S",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VJ63H28S",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Spezialetti et al.",
            "parsedDate": "2019-09-15",
            "numChildren": 3
        },
        "data": {
            "key": "VJ63H28S",
            "version": 418,
            "itemType": "journalArticle",
            "title": "Learning an Effective Equivariant 3D Descriptor Without Supervision",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Riccardo",
                    "lastName": "Spezialetti"
                },
                {
                    "creatorType": "author",
                    "firstName": "Samuele",
                    "lastName": "Salti"
                },
                {
                    "creatorType": "author",
                    "firstName": "Luigi",
                    "lastName": "Di Stefano"
                }
            ],
            "abstractNote": "Establishing correspondences between 3D shapes is a fundamental task in 3D Computer Vision, typically addressed by matching local descriptors. Recently, a few attempts at applying the deep learning paradigm to the task have shown promising results. Yet, the only explored way to learn rotation invariant descriptors has been to feed neural networks with highly engineered and invariant representations provided by existing hand-crafted descriptors, a path that goes in the opposite direction of end-to-end learning from raw data so successfully deployed for 2D images. In this paper, we explore the benefits of taking a step back in the direction of end-to-end learning of 3D descriptors by disentangling the creation of a robust and distinctive rotation equivariant representation, which can be learned from unoriented input data, and the definition of a good canonical orientation, required only at test time to obtain an invariant descriptor. To this end, we leverage two recent innovations: spherical convolutional neural networks to learn an equivariant descriptor and plane folding decoders to learn without supervision. The effectiveness of the proposed approach is experimentally validated by outperforming hand-crafted and learned descriptors on a standard benchmark.",
            "publicationTitle": "arXiv:1909.06887 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2019-09-15",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1909.06887",
            "accessDate": "2021-07-19T10:33:05Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1909.06887",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "aek"
                }
            ],
            "collections": [],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/QD8NM686"
            },
            "dateAdded": "2021-07-19T10:33:06Z",
            "dateModified": "2021-07-19T10:33:28Z"
        }
    },
    {
        "key": "74XD9YIJ",
        "version": 415,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/74XD9YIJ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/74XD9YIJ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tewari et al.",
            "parsedDate": "2020-09-20",
            "numChildren": 2
        },
        "data": {
            "key": "74XD9YIJ",
            "version": 415,
            "itemType": "journalArticle",
            "title": "PIE: Portrait Image Embedding for Semantic Control",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ayush",
                    "lastName": "Tewari"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohamed",
                    "lastName": "Elgharib"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mallikarjun B.",
                    "lastName": "R."
                },
                {
                    "creatorType": "author",
                    "firstName": "Florian",
                    "lastName": "Bernard"
                },
                {
                    "creatorType": "author",
                    "firstName": "Hans-Peter",
                    "lastName": "Seidel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Patrick",
                    "lastName": "Pérez"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Zollhöfer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Theobalt"
                }
            ],
            "abstractNote": "Editing of portrait images is a very popular and important research topic with a large variety of applications. For ease of use, control should be provided via a semantically meaningful parameterization that is akin to computer animation controls. The vast majority of existing techniques do not provide such intuitive and fine-grained control, or only enable coarse editing of a single isolated control parameter. Very recently, high-quality semantically controlled editing has been demonstrated, however only on synthetically created StyleGAN images. We present the first approach for embedding real portrait images in the latent space of StyleGAN, which allows for intuitive editing of the head pose, facial expression, and scene illumination in the image. Semantic editing in parameter space is achieved based on StyleRig, a pretrained neural network that maps the control space of a 3D morphable face model to the latent space of the GAN. We design a novel hierarchical non-linear optimization problem to obtain the embedding. An identity preservation energy term allows spatially coherent edits while maintaining facial integrity. Our approach runs at interactive frame rates and thus allows the user to explore the space of possible edits. We evaluate our approach on a wide set of portrait photos, compare it to the current state of the art, and validate the effectiveness of its components in an ablation study.",
            "publicationTitle": "arXiv:2009.09485 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-09-20",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "PIE",
            "url": "http://arxiv.org/abs/2009.09485",
            "accessDate": "2021-07-16T08:58:35Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2009.09485",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Graphics",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/I6MPCQS2"
            },
            "dateAdded": "2021-07-16T08:58:39Z",
            "dateModified": "2021-07-17T05:01:49Z"
        }
    },
    {
        "key": "8PEEG5WI",
        "version": 412,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/8PEEG5WI",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/8PEEG5WI",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Saha et al.",
            "parsedDate": "2021-03-10",
            "numChildren": 1
        },
        "data": {
            "key": "8PEEG5WI",
            "version": 412,
            "itemType": "journalArticle",
            "title": "LOHO: Latent Optimization of Hairstyles via Orthogonalization",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Rohit",
                    "lastName": "Saha"
                },
                {
                    "creatorType": "author",
                    "firstName": "Brendan",
                    "lastName": "Duke"
                },
                {
                    "creatorType": "author",
                    "firstName": "Florian",
                    "lastName": "Shkurti"
                },
                {
                    "creatorType": "author",
                    "firstName": "Graham W.",
                    "lastName": "Taylor"
                },
                {
                    "creatorType": "author",
                    "firstName": "Parham",
                    "lastName": "Aarabi"
                }
            ],
            "abstractNote": "Hairstyle transfer is challenging due to hair structure differences in the source and target hair. Therefore, we propose Latent Optimization of Hairstyles via Orthogonalization (LOHO), an optimization-based approach using GAN inversion to infill missing hair structure details in latent space during hairstyle transfer. Our approach decomposes hair into three attributes: perceptual structure, appearance, and style, and includes tailored losses to model each of these attributes independently. Furthermore, we propose two-stage optimization and gradient orthogonalization to enable disentangled latent space optimization of our hair attributes. Using LOHO for latent space manipulation, users can synthesize novel photorealistic images by manipulating hair attributes either individually or jointly, transferring the desired attributes from reference hairstyles. LOHO achieves a superior FID compared with the current state-of-the-art (SOTA) for hairstyle transfer. Additionally, LOHO preserves the subject's identity comparably well according to PSNR and SSIM when compared to SOTA image embedding pipelines. Code is available at https://github.com/dukebw/LOHO.",
            "publicationTitle": "arXiv:2103.03891 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-10",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "LOHO",
            "url": "http://arxiv.org/abs/2103.03891",
            "accessDate": "2021-07-16T04:08:51Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.03891",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/KH3GZIYP"
            },
            "dateAdded": "2021-07-16T04:08:51Z",
            "dateModified": "2021-07-16T08:17:14Z"
        }
    },
    {
        "key": "XPVFILBL",
        "version": 406,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/XPVFILBL",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/XPVFILBL",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lee et al.",
            "parsedDate": "2021-07-09",
            "numChildren": 1
        },
        "data": {
            "key": "XPVFILBL",
            "version": 406,
            "itemType": "journalArticle",
            "title": "ViTGAN: Training GANs with Vision Transformers",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Kwonjoon",
                    "lastName": "Lee"
                },
                {
                    "creatorType": "author",
                    "firstName": "Huiwen",
                    "lastName": "Chang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lu",
                    "lastName": "Jiang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Han",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zhuowen",
                    "lastName": "Tu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ce",
                    "lastName": "Liu"
                }
            ],
            "abstractNote": "Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such observation can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). We observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce novel regularization techniques for training GANs with ViTs. Empirically, our approach, named ViTGAN, achieves comparable performance to state-of-the-art CNN-based StyleGAN2 on CIFAR-10, CelebA, and LSUN bedroom datasets.",
            "publicationTitle": "arXiv:2107.04589 [cs, eess]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-07-09",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "ViTGAN",
            "url": "http://arxiv.org/abs/2107.04589",
            "accessDate": "2021-07-12T06:00:16Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2107.04589",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Image and Video Processing",
                    "type": 1
                }
            ],
            "collections": [
                "LT39298Q"
            ],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/4WSSKLDR"
            },
            "dateAdded": "2021-07-12T06:00:20Z",
            "dateModified": "2021-07-13T05:47:42Z"
        }
    },
    {
        "key": "FJACULSK",
        "version": 406,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/FJACULSK",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/FJACULSK",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Siarohin et al.",
            "parsedDate": "2021-04-22",
            "numChildren": 0
        },
        "data": {
            "key": "FJACULSK",
            "version": 406,
            "itemType": "journalArticle",
            "title": "Motion Representations for Articulated Animation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Aliaksandr",
                    "lastName": "Siarohin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Oliver J.",
                    "lastName": "Woodford"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jian",
                    "lastName": "Ren"
                },
                {
                    "creatorType": "author",
                    "firstName": "Menglei",
                    "lastName": "Chai"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sergey",
                    "lastName": "Tulyakov"
                }
            ],
            "abstractNote": "We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. In contrast to the previous keypoint-based works, our method extracts meaningful and consistent regions, describing locations, shape, and pose. The regions correspond to semantically relevant and distinct object parts, that are more easily detected in frames of the driving video. To force decoupling of foreground from background, we model non-object related global motion with an additional affine transformation. To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space. Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks. We present a challenging new benchmark with high-resolution videos and show that the improvement is particularly pronounced when articulated objects are considered, reaching 96.6% user preference vs. the state of the art.",
            "publicationTitle": "arXiv:2104.11280 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-22",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2104.11280",
            "accessDate": "2021-07-13T04:20:14Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.11280",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/YX5KKJFZ",
                "dc:replaces": "http://zotero.org/users/7902311/items/DJ6YW5RF"
            },
            "dateAdded": "2021-07-13T04:20:16Z",
            "dateModified": "2021-07-13T05:47:40Z"
        }
    },
    {
        "key": "HA4JH92Q",
        "version": 399,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HA4JH92Q",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HA4JH92Q",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Bi et al.",
            "parsedDate": "2020-08-16",
            "numChildren": 0
        },
        "data": {
            "key": "HA4JH92Q",
            "version": 399,
            "itemType": "journalArticle",
            "title": "Neural Reflectance Fields for Appearance Acquisition",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Sai",
                    "lastName": "Bi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zexiang",
                    "lastName": "Xu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Mildenhall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kalyan",
                    "lastName": "Sunkavalli"
                },
                {
                    "creatorType": "author",
                    "firstName": "Miloš",
                    "lastName": "Hašan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yannick",
                    "lastName": "Hold-Geoffroy"
                },
                {
                    "creatorType": "author",
                    "firstName": "David",
                    "lastName": "Kriegman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ravi",
                    "lastName": "Ramamoorthi"
                }
            ],
            "abstractNote": "We present Neural Reflectance Fields, a novel deep scene representation that encodes volume density, normal and reflectance properties at any 3D point in a scene using a fully-connected neural network. We combine this representation with a physically-based differentiable ray marching framework that can render images from a neural reflectance field under any viewpoint and light. We demonstrate that neural reflectance fields can be estimated from images captured with a simple collocated camera-light setup, and accurately model the appearance of real-world scenes with complex geometry and reflectance. Once estimated, they can be used to render photo-realistic images under novel viewpoint and (non-collocated) lighting conditions and accurately reproduce challenging effects like specularities, shadows and occlusions. This allows us to perform high-quality view synthesis and relighting that is significantly better than previous methods. We also demonstrate that we can compose the estimated neural reflectance field of a real scene with traditional scene models and render them using standard Monte Carlo rendering engines. Our work thus enables a complete pipeline from high-quality and practical appearance acquisition to 3D scene composition and rendering.",
            "publicationTitle": "arXiv:2008.03824 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-08-16",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2008.03824",
            "accessDate": "2021-07-09T14:43:26Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2008.03824",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Graphics",
                    "type": 1
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/FHW6GZKG"
            },
            "dateAdded": "2021-07-09T14:43:27Z",
            "dateModified": "2021-07-10T05:02:09Z"
        }
    },
    {
        "key": "IPMUGLKE",
        "version": 395,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/IPMUGLKE",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/IPMUGLKE",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Wang et al.",
            "parsedDate": "2021-04-02",
            "numChildren": 4
        },
        "data": {
            "key": "IPMUGLKE",
            "version": 395,
            "itemType": "journalArticle",
            "title": "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ting-Chun",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Arun",
                    "lastName": "Mallya"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ming-Yu",
                    "lastName": "Liu"
                }
            ],
            "abstractNote": "We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing. Our model learns to synthesize a talking-head video using a source image containing the target person's appearance and a driving video that dictates the motion in the output. Our motion is encoded based on a novel keypoint representation, where the identity-specific and motion-related information is decomposed unsupervisedly. Extensive experimental validation shows that our model outperforms competing methods on benchmark datasets. Moreover, our compact keypoint representation enables a video conferencing system that achieves the same visual quality as the commercial H.264 standard while only using one-tenth of the bandwidth. Besides, we show our keypoint representation allows the user to rotate the head during synthesis, which is useful for simulating face-to-face video conferencing experiences.",
            "publicationTitle": "arXiv:2011.15126 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-02",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.15126",
            "accessDate": "2021-06-29T08:34:13Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.15126",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/CU6M3ILJ"
            },
            "dateAdded": "2021-06-29T08:34:13Z",
            "dateModified": "2021-07-08T16:00:54Z"
        }
    },
    {
        "key": "RF7ID2P5",
        "version": 395,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/RF7ID2P5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/RF7ID2P5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Anokhin et al.",
            "parsedDate": "2020-11-27",
            "numChildren": 4
        },
        "data": {
            "key": "RF7ID2P5",
            "version": 395,
            "itemType": "journalArticle",
            "title": "Image Generators with Conditionally-Independent Pixel Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ivan",
                    "lastName": "Anokhin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kirill",
                    "lastName": "Demochkin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Taras",
                    "lastName": "Khakhulin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gleb",
                    "lastName": "Sterkin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Victor",
                    "lastName": "Lempitsky"
                },
                {
                    "creatorType": "author",
                    "firstName": "Denis",
                    "lastName": "Korzhenkov"
                }
            ],
            "abstractNote": "Existing image generator networks rely heavily on spatial convolutions and, optionally, self-attention blocks in order to gradually synthesize images in a coarse-to-fine manner. Here, we present a new architecture for image generators, where the color value at each pixel is computed independently given the value of a random latent vector and the coordinate of that pixel. No spatial convolutions or similar operations that propagate information across pixels are involved during the synthesis. We analyze the modeling capabilities of such generators when trained in an adversarial fashion, and observe the new generators to achieve similar generation quality to state-of-the-art convolutional generators. We also investigate several interesting properties unique to the new architecture.",
            "publicationTitle": "arXiv:2011.13775 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-27",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.13775",
            "accessDate": "2021-06-20T16:03:51Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.13775",
            "tags": [],
            "collections": [
                "LT39298Q"
            ],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/X5PMFWUL",
                "owl:sameAs": "http://zotero.org/groups/4320173/items/563ANXWP"
            },
            "dateAdded": "2021-06-20T16:03:52Z",
            "dateModified": "2021-07-08T16:00:53Z"
        }
    },
    {
        "key": "PEW9ADIM",
        "version": 390,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/PEW9ADIM",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/PEW9ADIM",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Burda et al.",
            "parsedDate": "2018-10-30",
            "numChildren": 2
        },
        "data": {
            "key": "PEW9ADIM",
            "version": 390,
            "itemType": "journalArticle",
            "title": "Exploration by Random Network Distillation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yuri",
                    "lastName": "Burda"
                },
                {
                    "creatorType": "author",
                    "firstName": "Harrison",
                    "lastName": "Edwards"
                },
                {
                    "creatorType": "author",
                    "firstName": "Amos",
                    "lastName": "Storkey"
                },
                {
                    "creatorType": "author",
                    "firstName": "Oleg",
                    "lastName": "Klimov"
                }
            ],
            "abstractNote": "We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and occasionally completes the first level.",
            "publicationTitle": "arXiv:1810.12894 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2018-10-30",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1810.12894",
            "accessDate": "2021-07-08T09:24:05Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1810.12894",
            "tags": [
                {
                    "tag": "Computer Science - Artificial Intelligence",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-08T09:24:05Z",
            "dateModified": "2021-07-08T09:24:05Z"
        }
    },
    {
        "key": "CI2S3ESR",
        "version": 387,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/CI2S3ESR",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/CI2S3ESR",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Božicˇ et al.",
            "numChildren": 1
        },
        "data": {
            "key": "CI2S3ESR",
            "version": 387,
            "itemType": "journalArticle",
            "title": "TransformerFusion: Monocular RGB Scene Reconstruction using Transformers",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Aljaž",
                    "lastName": "Božicˇ"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pablo",
                    "lastName": "Palafox"
                },
                {
                    "creatorType": "author",
                    "firstName": "Justus",
                    "lastName": "Thies"
                },
                {
                    "creatorType": "author",
                    "firstName": "Angela",
                    "lastName": "Dai"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthias",
                    "lastName": "Nießner"
                }
            ],
            "abstractNote": "We introduce TransformerFusion, a transformer-based 3D scene reconstruction approach. From an input monocular RGB video, the video frames are processed by a transformer network that fuses the observations into a volumetric feature grid representing the scene; this feature grid is then decoded into an implicit 3D scene representation. Key to our approach is the transformer architecture that enables the network to learn to attend to the most relevant image frames for each 3D location in the scene, supervised only by the scene reconstruction task. Features are fused in a coarse-to-ﬁne fashion, storing ﬁne-level features only where needed, requiring lower memory storage and enabling fusion at interactive rates. The feature grid is then decoded to a higher-resolution scene reconstruction, using an MLP-based surface occupancy prediction from interpolated coarse-to-ﬁne 3D features. Our approach results in an accurate surface reconstruction, outperforming state-of-the-art multi-view stereo depth estimation methods, fully-convolutional 3D reconstruction approaches, and approaches using LSTM- or GRU-based recurrent networks for video sequence fusion.",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "17",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-07T15:38:20Z",
            "dateModified": "2021-07-07T15:38:22Z"
        }
    },
    {
        "key": "LFSDSB8N",
        "version": 384,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/LFSDSB8N",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/LFSDSB8N",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Rochow et al.",
            "parsedDate": "2021-06-24",
            "numChildren": 0
        },
        "data": {
            "key": "LFSDSB8N",
            "version": 384,
            "itemType": "journalArticle",
            "title": "FaDIV-Syn: Fast Depth-Independent View Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Andre",
                    "lastName": "Rochow"
                },
                {
                    "creatorType": "author",
                    "firstName": "Max",
                    "lastName": "Schwarz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Weinmann"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sven",
                    "lastName": "Behnke"
                }
            ],
            "abstractNote": "We introduce FaDIV-Syn, a fast depth-independent view synthesis method. Our multi-view approach addresses the problem that view synthesis methods are often limited by their depth estimation stage, where incorrect depth predictions can lead to large projection errors. To avoid this issue, we efficiently warp multiple input images into the target frame for a range of assumed depth planes. The resulting tensor representation is fed into a U-Net-like CNN with gated convolutions, which directly produces the novel output view. We therefore side-step explicit depth estimation. This improves efficiency and performance on transparent, reflective, and feature-less scene parts. FaDIV-Syn can handle both interpolation and extrapolation tasks and outperforms state-of-the-art extrapolation methods on the large-scale RealEstate10k dataset. In contrast to comparable methods, it is capable of real-time operation due to its lightweight architecture. We further demonstrate data efficiency of FaDIV-Syn by training from fewer examples as well as its generalization to higher resolutions and arbitrary depth ranges under severe depth discretization.",
            "publicationTitle": "arXiv:2106.13139 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-24",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "FaDIV-Syn",
            "url": "http://arxiv.org/abs/2106.13139",
            "accessDate": "2021-07-06T08:24:41Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.13139",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-07-06T08:24:44Z",
            "dateModified": "2021-07-06T08:24:46Z"
        }
    },
    {
        "key": "VGQBFZXW",
        "version": 380,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VGQBFZXW",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VGQBFZXW",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Richardson et al.",
            "numChildren": 1
        },
        "data": {
            "key": "VGQBFZXW",
            "version": 380,
            "itemType": "journalArticle",
            "title": "Encoding in Style: A StyleGAN Encoder for Image-to-Image Translation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Elad",
                    "lastName": "Richardson"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yuval",
                    "lastName": "Alaluf"
                },
                {
                    "creatorType": "author",
                    "firstName": "Or",
                    "lastName": "Patashnik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yotam",
                    "lastName": "Nitzan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yaniv",
                    "lastName": "Azar"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stav",
                    "lastName": "Shapiro"
                },
                {
                    "creatorType": "author",
                    "firstName": "Daniel",
                    "lastName": "Cohen-Or"
                }
            ],
            "abstractNote": "",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "10",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {},
            "dateAdded": "2021-07-06T03:38:38Z",
            "dateModified": "2021-07-06T03:38:38Z"
        }
    },
    {
        "key": "Y2QKW3AU",
        "version": 367,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/Y2QKW3AU",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/Y2QKW3AU",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tritrong et al.",
            "parsedDate": "2021-04-12",
            "numChildren": 3
        },
        "data": {
            "key": "Y2QKW3AU",
            "version": 367,
            "itemType": "journalArticle",
            "title": "Repurposing GANs for One-shot Semantic Part Segmentation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Nontawat",
                    "lastName": "Tritrong"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pitchaporn",
                    "lastName": "Rewatbowornwong"
                },
                {
                    "creatorType": "author",
                    "firstName": "Supasorn",
                    "lastName": "Suwajanakorn"
                }
            ],
            "abstractNote": "While GANs have shown success in realistic image generation, the idea of using GANs for other tasks unrelated to synthesis is underexplored. Do GANs learn meaningful structural parts of objects during their attempt to reproduce those objects? In this work, we test this hypothesis and propose a simple and effective approach based on GANs for semantic part segmentation that requires as few as one label example along with an unlabeled dataset. Our key idea is to leverage a trained GAN to extract pixel-wise representation from the input image and use it as feature vectors for a segmentation network. Our experiments demonstrate that GANs representation is \"readily discriminative\" and produces surprisingly good results that are comparable to those from supervised baselines trained with significantly more labels. We believe this novel repurposing of GANs underlies a new class of unsupervised representation learning that is applicable to many other tasks. More results are available at https://repurposegans.github.io/.",
            "publicationTitle": "arXiv:2103.04379 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-12",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2103.04379",
            "accessDate": "2021-07-05T13:21:22Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "All rights reserved",
            "extra": "arXiv: 2103.04379",
            "inPublications": true,
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-05T13:21:22Z",
            "dateModified": "2021-07-05T13:21:23Z"
        }
    },
    {
        "key": "D525LI7L",
        "version": 383,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/D525LI7L",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/D525LI7L",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Zhang et al.",
            "parsedDate": "2021-06-03",
            "numChildren": 3
        },
        "data": {
            "key": "D525LI7L",
            "version": 383,
            "itemType": "journalArticle",
            "title": "NeRFactor: Neural Factorization of Shape and Reflectance Under an Unknown Illumination",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Xiuming",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul P.",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Boyang",
                    "lastName": "Deng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Paul",
                    "lastName": "Debevec"
                },
                {
                    "creatorType": "author",
                    "firstName": "William T.",
                    "lastName": "Freeman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                }
            ],
            "abstractNote": "We address the problem of recovering the shape and spatially-varying reflectance of an object from posed multi-view images of the object illuminated by one unknown lighting condition. This enables the rendering of novel views of the object under arbitrary environment lighting and editing of the object's material properties. The key to our approach, which we call Neural Radiance Factorization (NeRFactor), is to distill the volumetric geometry of a Neural Radiance Field (NeRF) [Mildenhall et al. 2020] representation of the object into a surface representation and then jointly refine the geometry while solving for the spatially-varying reflectance and the environment lighting. Specifically, NeRFactor recovers 3D neural fields of surface normals, light visibility, albedo, and Bidirectional Reflectance Distribution Functions (BRDFs) without any supervision, using only a re-rendering loss, simple smoothness priors, and a data-driven BRDF prior learned from real-world BRDF measurements. By explicitly modeling light visibility, NeRFactor is able to separate shadows from albedo and synthesize realistic soft or hard shadows under arbitrary lighting conditions. NeRFactor is able to recover convincing 3D models for free-viewpoint relighting in this challenging and underconstrained capture setup for both synthetic and real scenes. Qualitative and quantitative experiments show that NeRFactor outperforms classic and deep learning-based state of the art across various tasks. Our code and data are available at people.csail.mit.edu/xiuming/projects/nerfactor/.",
            "publicationTitle": "arXiv:2106.01970 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-03",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeRFactor",
            "url": "http://arxiv.org/abs/2106.01970",
            "accessDate": "2021-07-05T13:17:37Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.01970",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Graphics",
                    "type": 1
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/MRZQCDDS"
            },
            "dateAdded": "2021-07-05T13:17:37Z",
            "dateModified": "2021-07-05T13:19:16Z"
        }
    },
    {
        "key": "HFJ4ZVXS",
        "version": 354,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HFJ4ZVXS",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HFJ4ZVXS",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Luo et al.",
            "parsedDate": "2020-08-26",
            "numChildren": 3
        },
        "data": {
            "key": "HFJ4ZVXS",
            "version": 354,
            "itemType": "journalArticle",
            "title": "Consistent Video Depth Estimation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Xuan",
                    "lastName": "Luo"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jia-Bin",
                    "lastName": "Huang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Richard",
                    "lastName": "Szeliski"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kevin",
                    "lastName": "Matzen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Johannes",
                    "lastName": "Kopf"
                }
            ],
            "abstractNote": "We present an algorithm for reconstructing dense, geometrically consistent depth for all pixels in a monocular video. We leverage a conventional structure-from-motion reconstruction to establish geometric constraints on pixels in the video. Unlike the ad-hoc priors in classical reconstruction, we use a learning-based prior, i.e., a convolutional neural network trained for single-image depth estimation. At test time, we fine-tune this network to satisfy the geometric constraints of a particular input video, while retaining its ability to synthesize plausible depth details in parts of the video that are less constrained. We show through quantitative validation that our method achieves higher accuracy and a higher degree of geometric consistency than previous monocular reconstruction methods. Visually, our results appear more stable. Our algorithm is able to handle challenging hand-held captured input videos with a moderate degree of dynamic motion. The improved quality of the reconstruction enables several applications, such as scene reconstruction and advanced video-based visual effects.",
            "publicationTitle": "arXiv:2004.15021 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-08-26",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2004.15021",
            "accessDate": "2021-07-01T09:38:55Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2004.15021",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-07-01T09:38:55Z",
            "dateModified": "2021-07-01T09:39:00Z"
        }
    },
    {
        "key": "SMZZ7DTF",
        "version": 348,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/SMZZ7DTF",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/SMZZ7DTF",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Wetzstein",
            "numChildren": 1
        },
        "data": {
            "key": "SMZZ7DTF",
            "version": 348,
            "itemType": "journalArticle",
            "title": "EE 367 / CS 448I Computational Imaging and Display Notes: Image Deconvolution (lecture 6)",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Gordon",
                    "lastName": "Wetzstein"
                }
            ],
            "abstractNote": "",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "8",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-29T13:33:26Z",
            "dateModified": "2021-06-29T13:33:26Z"
        }
    },
    {
        "key": "4HWNANVR",
        "version": 345,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/4HWNANVR",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/4HWNANVR",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "4HWNANVR",
            "version": 345,
            "itemType": "attachment",
            "linkMode": "imported_url",
            "title": "book-fall-07.pdf",
            "accessDate": "2021-06-29T13:14:36Z",
            "url": "https://see.stanford.edu/materials/lsoftaee261/book-fall-07.pdf",
            "note": "",
            "contentType": "application/pdf",
            "charset": "",
            "filename": "book-fall-07.pdf",
            "md5": "efdb368e3544db43d33d1bd8c6befd09",
            "mtime": 1624972476000,
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-29T13:14:36Z",
            "dateModified": "2021-06-29T13:14:36Z"
        }
    },
    {
        "key": "R38Y95JI",
        "version": 341,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/R38Y95JI",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/R38Y95JI",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 1
        },
        "data": {
            "key": "R38Y95JI",
            "version": 341,
            "itemType": "webpage",
            "title": "Properties of Fourier Transform",
            "creators": [],
            "abstractNote": "",
            "websiteTitle": "",
            "websiteType": "",
            "date": "",
            "shortTitle": "",
            "url": "http://fourier.eng.hmc.edu/e101/lectures/handout3/node2.html",
            "accessDate": "2021-06-29T12:39:13Z",
            "language": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-29T12:39:13Z",
            "dateModified": "2021-06-29T12:39:15Z"
        }
    },
    {
        "key": "Z9GH4CHH",
        "version": 395,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/Z9GH4CHH",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/Z9GH4CHH",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "Z9GH4CHH",
            "version": 395,
            "itemType": "webpage",
            "title": "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing",
            "creators": [],
            "abstractNote": "",
            "websiteTitle": "",
            "websiteType": "",
            "date": "",
            "shortTitle": "",
            "url": "https://nvlabs.github.io/face-vid2vid/",
            "accessDate": "2021-06-29T08:22:40Z",
            "language": "en",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-29T08:22:40Z",
            "dateModified": "2021-06-29T08:22:40Z"
        }
    },
    {
        "key": "J2N3Z7QS",
        "version": 334,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/J2N3Z7QS",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/J2N3Z7QS",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 1
        },
        "data": {
            "key": "J2N3Z7QS",
            "version": 334,
            "itemType": "webpage",
            "title": "face-vid2vid",
            "creators": [],
            "abstractNote": "",
            "websiteTitle": "",
            "websiteType": "",
            "date": "",
            "shortTitle": "",
            "url": "https://nvlabs.github.io/face-vid2vid/",
            "accessDate": "2021-06-29T08:22:37Z",
            "language": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {},
            "dateAdded": "2021-06-29T08:22:37Z",
            "dateModified": "2021-06-29T08:22:37Z"
        }
    },
    {
        "key": "2RP3YE93",
        "version": 329,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/2RP3YE93",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/2RP3YE93",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Thies et al.",
            "parsedDate": "2020-07-29",
            "numChildren": 3
        },
        "data": {
            "key": "2RP3YE93",
            "version": 329,
            "itemType": "journalArticle",
            "title": "Neural Voice Puppetry: Audio-driven Facial Reenactment",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Justus",
                    "lastName": "Thies"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohamed",
                    "lastName": "Elgharib"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ayush",
                    "lastName": "Tewari"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Theobalt"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthias",
                    "lastName": "Nießner"
                }
            ],
            "abstractNote": "We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the audio of the source input. This audio-driven facial reenactment is driven by a deep neural network that employs a latent 3D face model space. Through the underlying 3D representation, the model inherently learns temporal stability while we leverage neural rendering to generate photo-realistic output frames. Our approach generalizes across different people, allowing us to synthesize videos of a target actor with the voice of any unknown source actor or even synthetic voices that can be generated utilizing standard text-to-speech approaches. Neural Voice Puppetry has a variety of use-cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head. We demonstrate the capabilities of our method in a series of audio- and text-based puppetry examples, including comparisons to state-of-the-art techniques and a user study.",
            "publicationTitle": "arXiv:1912.05566 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-07-29",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Neural Voice Puppetry",
            "url": "http://arxiv.org/abs/1912.05566",
            "accessDate": "2021-06-29T05:15:03Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1912.05566",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Graphics",
                    "type": 1
                }
            ],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {},
            "dateAdded": "2021-06-29T05:15:03Z",
            "dateModified": "2021-06-29T05:15:03Z"
        }
    },
    {
        "key": "43SSUSDX",
        "version": 329,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/43SSUSDX",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/43SSUSDX",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Fried et al.",
            "parsedDate": "2019-06-04",
            "numChildren": 3
        },
        "data": {
            "key": "43SSUSDX",
            "version": 329,
            "itemType": "journalArticle",
            "title": "Text-based Editing of Talking-head Video",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ohad",
                    "lastName": "Fried"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ayush",
                    "lastName": "Tewari"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Zollhöfer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Adam",
                    "lastName": "Finkelstein"
                },
                {
                    "creatorType": "author",
                    "firstName": "Eli",
                    "lastName": "Shechtman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Dan B.",
                    "lastName": "Goldman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kyle",
                    "lastName": "Genova"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zeyu",
                    "lastName": "Jin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Theobalt"
                },
                {
                    "creatorType": "author",
                    "firstName": "Maneesh",
                    "lastName": "Agrawala"
                }
            ],
            "abstractNote": "Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.",
            "publicationTitle": "arXiv:1906.01524 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2019-06-04",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1906.01524",
            "accessDate": "2021-06-29T05:14:55Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1906.01524",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Graphics",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {},
            "dateAdded": "2021-06-29T05:14:55Z",
            "dateModified": "2021-06-29T05:14:57Z"
        }
    },
    {
        "key": "VIJCJGDN",
        "version": 327,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VIJCJGDN",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VIJCJGDN",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Chan and Koo",
            "parsedDate": "2008",
            "numChildren": 1
        },
        "data": {
            "key": "VIJCJGDN",
            "version": 327,
            "itemType": "journalArticle",
            "title": "AN INTRODUCTION TO SYNTHETIC APERTURE RADAR (SAR)",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yee Kit",
                    "lastName": "Chan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Voon Chet",
                    "lastName": "Koo"
                }
            ],
            "abstractNote": "This paper outlines basic principle of Synthetic Aperture Radar (SAR). Matched filter approaches for processing the received data and pulse compression technique are presented. Besides the SAR radar equation, the linear frequency modulation (LFM) waveform and matched filter response are also discussed. Finally the system design consideration of various parameters and aspects are also highlighted.",
            "publicationTitle": "Progress In Electromagnetics Research B",
            "volume": "2",
            "issue": "",
            "pages": "27-60",
            "date": "2008",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "PIER B",
            "language": "en",
            "DOI": "10.2528/PIERB07110101",
            "ISSN": "1937-6472",
            "shortTitle": "",
            "url": "http://www.jpier.org/PIERB/pier.php?paper=07110101",
            "accessDate": "2021-06-28T17:58:55Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T17:58:55Z",
            "dateModified": "2021-06-28T17:58:55Z"
        }
    },
    {
        "key": "UYE7ST24",
        "version": 323,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/UYE7ST24",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/UYE7ST24",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Ulyanov et al.",
            "parsedDate": "2020",
            "numChildren": 2
        },
        "data": {
            "key": "UYE7ST24",
            "version": 323,
            "itemType": "journalArticle",
            "title": "Deep Image Prior",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Dmitry",
                    "lastName": "Ulyanov"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andrea",
                    "lastName": "Vedaldi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Victor",
                    "lastName": "Lempitsky"
                }
            ],
            "abstractNote": "Deep convolutional networks have become a popular tool for image generation and restoration. Generally, their excellent performance is imputed to their ability to learn realistic image priors from a large number of example images. In this paper, we show that, on the contrary, the structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning. In order to do so, we show that a randomly-initialized neural network can be used as a handcrafted prior with excellent results in standard inverse problems such as denoising, super-resolution, and inpainting. Furthermore, the same prior can be used to invert deep neural representations to diagnose them, and to restore images based on flash-no flash input pairs. Apart from its diverse applications, our approach highlights the inductive bias captured by standard generator network architectures. It also bridges the gap between two very popular families of image restoration methods: learning-based methods using deep convolutional networks and learning-free methods based on handcrafted image priors such as self-similarity. Code and supplementary material are available at https://dmitryulyanov.github.io/deep_image_prior .",
            "publicationTitle": "International Journal of Computer Vision",
            "volume": "128",
            "issue": "7",
            "pages": "1867-1888",
            "date": "07/2020",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "Int J Comput Vis",
            "language": "",
            "DOI": "10.1007/s11263-020-01303-4",
            "ISSN": "0920-5691, 1573-1405",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1711.10925",
            "accessDate": "2021-06-28T16:40:33Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1711.10925",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-28T16:40:33Z",
            "dateModified": "2021-06-28T16:40:34Z"
        }
    },
    {
        "key": "UUTNYPIJ",
        "version": 320,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/UUTNYPIJ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/UUTNYPIJ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Wu et al.",
            "parsedDate": "2019-05-18",
            "numChildren": 3
        },
        "data": {
            "key": "UUTNYPIJ",
            "version": 320,
            "itemType": "journalArticle",
            "title": "Deep Compressed Sensing",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yan",
                    "lastName": "Wu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mihaela",
                    "lastName": "Rosca"
                },
                {
                    "creatorType": "author",
                    "firstName": "Timothy",
                    "lastName": "Lillicrap"
                }
            ],
            "abstractNote": "Compressed sensing (CS) provides an elegant framework for recovering sparse signals from compressed measurements. For example, CS can exploit the structure of natural images and recover an image from only a few random measurements. CS is flexible and data efficient, but its application has been restricted by the strong assumption of sparsity and costly reconstruction process. A recent approach that combines CS with neural network generators has removed the constraint of sparsity, but reconstruction remains slow. Here we propose a novel framework that significantly improves both the performance and speed of signal recovery by jointly training a generator and the optimisation process for reconstruction via meta-learning. We explore training the measurements with different objectives, and derive a family of models based on minimising measurement errors. We show that Generative Adversarial Nets (GANs) can be viewed as a special case in this family of models. Borrowing insights from the CS perspective, we develop a novel way of improving GANs using gradient information from the discriminator.",
            "publicationTitle": "arXiv:1905.06723 [cs, eess, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2019-05-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1905.06723",
            "accessDate": "2021-06-28T16:29:34Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1905.06723",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Signal Processing",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T16:29:34Z",
            "dateModified": "2021-06-28T16:29:34Z"
        }
    },
    {
        "key": "Y7XUZZP9",
        "version": 320,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/Y7XUZZP9",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/Y7XUZZP9",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Heckel and Soltanolkotabi",
            "parsedDate": "2020-05-07",
            "numChildren": 3
        },
        "data": {
            "key": "Y7XUZZP9",
            "version": 320,
            "itemType": "journalArticle",
            "title": "Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Reinhard",
                    "lastName": "Heckel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mahdi",
                    "lastName": "Soltanolkotabi"
                }
            ],
            "abstractNote": "Un-trained convolutional neural networks have emerged as highly successful tools for image recovery and restoration. They are capable of solving standard inverse problems such as denoising and compressive sensing with excellent results by simply fitting a neural network model to measurements from a single image or signal without the need for any additional training data. For some applications, this critically requires additional regularization in the form of early stopping the optimization. For signal recovery from a few measurements, however, un-trained convolutional networks have an intriguing self-regularizing property: Even though the network can perfectly fit any image, the network recovers a natural image from few measurements when trained with gradient descent until convergence. In this paper, we provide numerical evidence for this property and study it theoretically. We show that---without any further regularization---an un-trained convolutional neural network can approximately reconstruct signals and images that are sufficiently structured, from a near minimal number of random measurements.",
            "publicationTitle": "arXiv:2005.03991 [cs, eess, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-05-07",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Compressive sensing with un-trained neural networks",
            "url": "http://arxiv.org/abs/2005.03991",
            "accessDate": "2021-06-28T16:29:29Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2005.03991",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                },
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                },
                {
                    "tag": "Electrical Engineering and Systems Science - Image and Video Processing",
                    "type": 1
                },
                {
                    "tag": "Statistics - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T16:29:29Z",
            "dateModified": "2021-06-28T16:29:29Z"
        }
    },
    {
        "key": "NQ87R5AB",
        "version": 422,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/NQ87R5AB",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/NQ87R5AB",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Patole and Torlak",
            "parsedDate": "2013",
            "numChildren": 1
        },
        "data": {
            "key": "NQ87R5AB",
            "version": 422,
            "itemType": "journalArticle",
            "title": "Two Dimensional Array Imaging With Beam Steered Data",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Sujeet",
                    "lastName": "Patole"
                },
                {
                    "creatorType": "author",
                    "firstName": "Murat",
                    "lastName": "Torlak"
                }
            ],
            "abstractNote": "This paper discusses different approaches used for millimeter wave imaging of two-dimensional objects. Imaging of a two dimensional object requires reflected wave data to be collected across two distinct dimensions. In this paper, we propose a reconstruction method that uses narrowband waveforms along with two dimensional beam steering. The beam is steered in azimuthal and elevation direction, which forms the two distinct dimensions required for the reconstruction. The Reconstruction technique uses inverse Fourier transform along with amplitude and phase correction factors. In addition, this reconstruction technique does not require interpolation of the data in either wavenumber or spatial domain. Use of the two dimensional beam steering offers better performance in the presence of noise compared with the existing methods, such as switched array imaging system. Effects of RF impairments such as quantization of the phase of beam steering weights and timing jitter which add to phase noise, are analyzed.",
            "publicationTitle": "IEEE Transactions on Image Processing",
            "volume": "22",
            "issue": "12",
            "pages": "5181-5189",
            "date": "12/2013",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "IEEE Trans. on Image Process.",
            "language": "en",
            "DOI": "10.1109/TIP.2013.2282115",
            "ISSN": "1057-7149, 1941-0042",
            "shortTitle": "",
            "url": "http://ieeexplore.ieee.org/document/6600891/",
            "accessDate": "2021-06-28T14:35:02Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [
                {
                    "tag": "psf"
                }
            ],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T14:35:02Z",
            "dateModified": "2021-06-28T14:35:02Z"
        }
    },
    {
        "key": "2FY2I56Q",
        "version": 309,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/2FY2I56Q",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/2FY2I56Q",
                "type": "text/html"
            }
        },
        "meta": {
            "parsedDate": "2021-02-22",
            "numChildren": 1
        },
        "data": {
            "key": "2FY2I56Q",
            "version": 309,
            "itemType": "encyclopediaArticle",
            "title": "Weyl expansion",
            "creators": [],
            "abstractNote": "In physics, Weyl expansion, also known as Weyl identity or angular spectrum expansion, expresses an outgoing spherical wave as a linear combination of plane waves. In Cartesian coordinate system, it can be denoted as\n\n$$\\frac{e^{-jk_{0}r}}{r}=\\frac{1}{j2\\pi}\\int_{-\\infty}^{\\infty}\\int_{-\\infty}^{\\infty}dk_{x}dk_{y}e^{-j(k_{x}x+k_{y}y)}\\frac{e^{-jk_{z}|z|}}{k_{z}}$$\n\nwhere $k_{x}$, $k_{y}$ and $k_{z}$ are the wavenumbers in their respective coordinate axes:\n\n$$k_{0}=\\sqrt{k_{x}^{2}+k_{y}^{2}+k_{z}^{2}}$$\n\nThe expansion is named after Hermann Weyl, who published it in 1919. Weyl identity is largely used to characterize the reflection and transmission of spherical waves at planar interfaces; thus, it is often used to derive the Green's functions for Helmholtz equation in layered media. The expansion also covers evanescent wave components. It is often preferred to the Sommerfeld identity when the field representation is needed to be in Cartesian coordinates. The resulting Weyl integral is commonly encountered in microwave integrated circuit analysis and electromagnetic radiation over a stratified medium; as in the case for Sommerfeld integral, it is numerically evaluated. As a result, it is used in calculation of Green's functions for method of moments for such geometries. Other uses include the descriptions of dipolar emissions near surfaces in nanophotonics, holographic inverse scattering problems, Green's functions in quantum electrodynamics and acoustic or seismic waves.",
            "encyclopediaTitle": "Wikipedia",
            "series": "",
            "seriesNumber": "",
            "volume": "",
            "numberOfVolumes": "",
            "edition": "",
            "place": "",
            "publisher": "",
            "date": "2021-02-22T15:01:55Z",
            "pages": "",
            "ISBN": "",
            "shortTitle": "",
            "url": "https://en.wikipedia.org/w/index.php?title=Weyl_expansion&oldid=1008286290",
            "accessDate": "2021-06-28T14:27:57Z",
            "language": "en",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Wikipedia",
            "callNumber": "",
            "rights": "Creative Commons Attribution-ShareAlike License",
            "extra": "Page Version ID: 1008286290",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T14:27:57Z",
            "dateModified": "2021-06-28T14:27:57Z"
        }
    },
    {
        "key": "6E7M3H3R",
        "version": 302,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/6E7M3H3R",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/6E7M3H3R",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Yanik and Torlak",
            "parsedDate": "2019",
            "numChildren": 1
        },
        "data": {
            "key": "6E7M3H3R",
            "version": 302,
            "itemType": "journalArticle",
            "title": "Near-Field MIMO-SAR Millimeter-Wave Imaging With Sparsely Sampled Aperture Data",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Muhammet Emin",
                    "lastName": "Yanik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Murat",
                    "lastName": "Torlak"
                }
            ],
            "abstractNote": "The primary challenge of a cost-effective and low-complexity near-ﬁeld millimeter-wave (mmWave) imaging system is to achieve high resolution with a few antenna elements as possible. Multiple-input multiple-output (MIMO) radar using simultaneous operation of spatially diverse transmit and receive antennas is a good candidate to increase the number of available degrees of freedom. On the other hand, higher integration complexity of extremely dense transceiver electronics limits the use of MIMO only solutions within a relatively large imaging aperture. Hybrid concepts combining synthetic aperture radar (SAR) techniques and sparse MIMO arrays present a good compromise to achieve short data acquisition time and low complexity. However, compared with conventional monostatic sampling schemes, image reconstruction methods for MIMO-SAR are more complicated. In this paper, we propose a high-resolution mmWave imaging system combining 2-D MIMO arrays with SAR, along with a novel Fourier-based image reconstruction algorithm using sparsely sampled aperture data. The proposed algorithm is veriﬁed by both simulation and processing real data collected with our mmWave imager prototype utilizing commercially available 77-GHz MIMO radar sensors. The experimental results conﬁrm that our complete solution presents a strong potential in high-resolution imaging with a signiﬁcantly reduced number of antenna elements.",
            "publicationTitle": "IEEE Access",
            "volume": "7",
            "issue": "",
            "pages": "31801-31819",
            "date": "2019",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "IEEE Access",
            "language": "en",
            "DOI": "10.1109/ACCESS.2019.2902859",
            "ISSN": "2169-3536",
            "shortTitle": "",
            "url": "https://ieeexplore.ieee.org/document/8658136/",
            "accessDate": "2021-06-28T14:02:17Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T14:02:17Z",
            "dateModified": "2021-06-28T14:02:17Z"
        }
    },
    {
        "key": "5KJ7MVJ5",
        "version": 313,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/5KJ7MVJ5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/5KJ7MVJ5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Yanik and Torlak",
            "numChildren": 3
        },
        "data": {
            "key": "5KJ7MVJ5",
            "version": 313,
            "itemType": "journalArticle",
            "title": "Millimeter-Wave Near-Field Imaging with Two-Dimensional SAR Data",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Muhammet Emin",
                    "lastName": "Yanik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Murat",
                    "lastName": "Torlak"
                }
            ],
            "abstractNote": "In this paper, we present a low-cost high-resolution millimeter-wave (mmWave) imager that combines commercially available 77 GHz system-on-chip radar sensors and synthetic aperture radar (SAR) signal processing techniques. To create a synthetic aperture over the target scene, the imager is built with two-axis motorized rail system which can synthesize a large aperture in both the azimuth and elevation. For image reconstruction, we employ two signal processing techniques based on frequency-modulated continuous-wave (FMCW) radar signal model. Our prototype system is described in detail along with various imaging results. The experimental results conﬁrm that our low-cost system design demonstrates a great potential for high-resolution imaging tasks in various applications.",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "6",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T13:31:14Z",
            "dateModified": "2021-06-28T13:31:23Z"
        }
    },
    {
        "key": "T5TS9HIW",
        "version": 296,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/T5TS9HIW",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/T5TS9HIW",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 2
        },
        "data": {
            "key": "T5TS9HIW",
            "version": 296,
            "itemType": "webpage",
            "title": "Bayes in the sky: Bayesian inference and model selection in cosmology - Roberto Trotta",
            "creators": [],
            "abstractNote": "",
            "websiteTitle": "",
            "websiteType": "",
            "date": "",
            "shortTitle": "",
            "url": "https://ned.ipac.caltech.edu/level5/Sept13/Trotta/Trotta4.html",
            "accessDate": "2021-06-28T05:13:16Z",
            "language": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-28T05:13:16Z",
            "dateModified": "2021-06-28T05:13:16Z"
        }
    },
    {
        "key": "6BIBVQ7P",
        "version": 288,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/6BIBVQ7P",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/6BIBVQ7P",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Kong and Ping",
            "parsedDate": "2021-06-23",
            "numChildren": 3
        },
        "data": {
            "key": "6BIBVQ7P",
            "version": 288,
            "itemType": "journalArticle",
            "title": "On Fast Sampling of Diffusion Probabilistic Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Zhifeng",
                    "lastName": "Kong"
                },
                {
                    "creatorType": "author",
                    "firstName": "Wei",
                    "lastName": "Ping"
                }
            ],
            "abstractNote": "In this work, we propose FastDPM, a unified framework for fast sampling in diffusion probabilistic models. FastDPM generalizes previous methods and gives rise to new algorithms with improved sample quality. We systematically investigate the fast sampling methods under this framework across different domains, on different datasets, and with different amount of conditional information provided for generation. We find the performance of a particular method depends on data domains (e.g., image or audio), the trade-off between sampling speed and sample quality, and the amount of conditional information. We further provide insights and recipes on the choice of methods for practitioners.",
            "publicationTitle": "arXiv:2106.00132 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-23",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.00132",
            "accessDate": "2021-06-28T04:26:24Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.00132",
            "tags": [
                {
                    "tag": "Computer Science - Machine Learning",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-06-28T04:26:24Z",
            "dateModified": "2021-06-28T04:26:24Z"
        }
    },
    {
        "key": "6Q2MWQIW",
        "version": 285,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/6Q2MWQIW",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/6Q2MWQIW",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Li et al.",
            "parsedDate": "2021-05-18",
            "numChildren": 2
        },
        "data": {
            "key": "6Q2MWQIW",
            "version": 285,
            "itemType": "journalArticle",
            "title": "SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Haoying",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yifan",
                    "lastName": "Yang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Meng",
                    "lastName": "Chang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Huajun",
                    "lastName": "Feng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zhihai",
                    "lastName": "Xu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Qi",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yueting",
                    "lastName": "Chen"
                }
            ],
            "abstractNote": "Single image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from the given low-resolution (LR) ones, which is an ill-posed problem because one LR image corresponds to multiple HR images. Recently, learning-based SISR methods have greatly outperformed traditional ones, while suffering from over-smoothing, mode collapse or large model footprint issues for PSNR-oriented, GAN-driven and flow-based methods respectively. To solve these problems, we propose a novel single image super-resolution diffusion probabilistic model (SRDiff), which is the first diffusion-based model for SISR. SRDiff is optimized with a variant of the variational bound on the data likelihood and can provide diverse and realistic SR predictions by gradually transforming the Gaussian noise into a super-resolution (SR) image conditioned on an LR input through a Markov chain. In addition, we introduce residual prediction to the whole framework to speed up convergence. Our extensive experiments on facial and general benchmarks (CelebA and DIV2K datasets) show that 1) SRDiff can generate diverse SR results in rich details with state-of-the-art performance, given only one LR input; 2) SRDiff is easy to train with a small footprint; and 3) SRDiff can perform flexible image manipulation including latent space interpolation and content fusion.",
            "publicationTitle": "arXiv:2104.14951 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "SRDiff",
            "url": "http://arxiv.org/abs/2104.14951",
            "accessDate": "2021-06-26T16:33:13Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.14951",
            "tags": [
                {
                    "tag": "Computer Science - Computer Vision and Pattern Recognition",
                    "type": 1
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-06-26T16:33:13Z",
            "dateModified": "2021-06-26T16:33:13Z"
        }
    },
    {
        "key": "HT98Y9ZU",
        "version": 283,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HT98Y9ZU",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HT98Y9ZU",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Song et al.",
            "parsedDate": "2021-02-10",
            "numChildren": 3
        },
        "data": {
            "key": "HT98Y9ZU",
            "version": 283,
            "itemType": "journalArticle",
            "title": "Score-Based Generative Modeling through Stochastic Differential Equations",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yang",
                    "lastName": "Song"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jascha",
                    "lastName": "Sohl-Dickstein"
                },
                {
                    "creatorType": "author",
                    "firstName": "Diederik P.",
                    "lastName": "Kingma"
                },
                {
                    "creatorType": "author",
                    "firstName": "Abhishek",
                    "lastName": "Kumar"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stefano",
                    "lastName": "Ermon"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Poole"
                }
            ],
            "abstractNote": "Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (\\aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of 1024 x 1024 images for the first time from a score-based generative model.",
            "publicationTitle": "arXiv:2011.13456 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-02-10",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.13456",
            "accessDate": "2021-06-26T16:16:48Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.13456",
            "tags": [
                {
                    "tag": "read"
                }
            ],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-06-26T16:16:48Z",
            "dateModified": "2021-06-26T16:16:50Z"
        }
    },
    {
        "key": "SZ3C29IB",
        "version": 360,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/SZ3C29IB",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/SZ3C29IB",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Flynn et al.",
            "parsedDate": "2019",
            "numChildren": 1
        },
        "data": {
            "key": "SZ3C29IB",
            "version": 360,
            "itemType": "conferencePaper",
            "title": "DeepView: View Synthesis With Learned Gradient Descent",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "John",
                    "lastName": "Flynn"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Broxton"
                },
                {
                    "creatorType": "author",
                    "firstName": "Paul",
                    "lastName": "Debevec"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "DuVall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Graham",
                    "lastName": "Fyffe"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ryan",
                    "lastName": "Overbeck"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noah",
                    "lastName": "Snavely"
                },
                {
                    "creatorType": "author",
                    "firstName": "Richard",
                    "lastName": "Tucker"
                }
            ],
            "abstractNote": "We present a novel approach to view synthesis using multiplane images (MPIs). Building on recent advances in learned gradient descent, our algorithm generates an MPI from a set of sparse camera viewpoints. The resulting method incorporates occlusion reasoning, improving performance on challenging scene features such as object boundaries, lighting reﬂections, thin structures, and scenes with high depth complexity. We show that our method achieves high-quality, state-of-the-art results on two datasets: the Kalantari light ﬁeld dataset, and a new camera array dataset, Spaces, which we make publicly available.",
            "date": "6/2019",
            "proceedingsTitle": "2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",
            "conferenceName": "2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",
            "place": "Long Beach, CA, USA",
            "publisher": "IEEE",
            "volume": "",
            "pages": "2362-2371",
            "series": "",
            "language": "en",
            "DOI": "10.1109/CVPR.2019.00247",
            "ISBN": "978-1-72813-293-8",
            "shortTitle": "DeepView",
            "url": "https://ieeexplore.ieee.org/document/8953705/",
            "accessDate": "2021-06-26T16:09:33Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/VSRM4YS5"
            },
            "dateAdded": "2021-06-26T16:09:33Z",
            "dateModified": "2021-06-26T16:09:33Z"
        }
    },
    {
        "key": "UTETBLZE",
        "version": 273,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/UTETBLZE",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/UTETBLZE",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Niklasson et al.",
            "parsedDate": "2021-02-11",
            "numChildren": 0
        },
        "data": {
            "key": "UTETBLZE",
            "version": 273,
            "itemType": "journalArticle",
            "title": "Self-Organising Textures",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Eyvind",
                    "lastName": "Niklasson"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alexander",
                    "lastName": "Mordvintsev"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ettore",
                    "lastName": "Randazzo"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Levin"
                }
            ],
            "abstractNote": "Neural Cellular Automata learn to generate textures, exhibiting surprising properties.",
            "publicationTitle": "Distill",
            "volume": "6",
            "issue": "2",
            "pages": "e00027.003",
            "date": "2021/02/11",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "Distill",
            "language": "en",
            "DOI": "10.23915/distill.00027.003",
            "ISSN": "2476-0757",
            "shortTitle": "",
            "url": "https://distill.pub/selforg/2021/textures",
            "accessDate": "2021-06-26T15:58:08Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "distill.pub",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-06-26T15:58:08Z",
            "dateModified": "2021-06-26T15:58:14Z"
        }
    },
    {
        "key": "4EKYAFYR",
        "version": 285,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/4EKYAFYR",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/4EKYAFYR",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Park et al.",
            "parsedDate": "2021-06-24",
            "numChildren": 3
        },
        "data": {
            "key": "4EKYAFYR",
            "version": 285,
            "itemType": "journalArticle",
            "title": "HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Keunhong",
                    "lastName": "Park"
                },
                {
                    "creatorType": "author",
                    "firstName": "Utkarsh",
                    "lastName": "Sinha"
                },
                {
                    "creatorType": "author",
                    "firstName": "Peter",
                    "lastName": "Hedman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sofien",
                    "lastName": "Bouaziz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Dan B.",
                    "lastName": "Goldman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ricardo",
                    "lastName": "Martin-Brualla"
                },
                {
                    "creatorType": "author",
                    "firstName": "Steven M.",
                    "lastName": "Seitz"
                }
            ],
            "abstractNote": "Neural Radiance Fields (NeRF) are able to reconstruct scenes with unprecedented fidelity, and various recent works have extended NeRF to handle dynamic scenes. A common approach to reconstruct such non-rigid scenes is through the use of a learned deformation field mapping from coordinates in each input image into a canonical template coordinate space. However, these deformation-based approaches struggle to model changes in topology, as topological changes require a discontinuity in the deformation field, but these deformation fields are necessarily continuous. We address this limitation by lifting NeRFs into a higher dimensional space, and by representing the 5D radiance field corresponding to each individual input image as a slice through this \"hyper-space\". Our method is inspired by level set methods, which model the evolution of surfaces as slices through a higher dimensional surface. We evaluate our method on two tasks: (i) interpolating smoothly between \"moments\", i.e., configurations of the scene, seen in the input images while maintaining visual plausibility, and (ii) novel-view synthesis at fixed moments. We show that our method, which we dub HyperNeRF, outperforms existing methods on both tasks by significant margins. Compared to Nerfies, HyperNeRF reduces average error rates by 8.6% for interpolation and 8.8% for novel-view synthesis, as measured by LPIPS.",
            "publicationTitle": "arXiv:2106.13228 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-24",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "HyperNeRF",
            "url": "http://arxiv.org/abs/2106.13228",
            "accessDate": "2021-06-25T18:30:30Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.13228",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-25T18:30:30Z",
            "dateModified": "2021-06-25T18:30:35Z"
        }
    },
    {
        "key": "CYEZKAXX",
        "version": 359,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/CYEZKAXX",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/CYEZKAXX",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Nichol and Dhariwal",
            "parsedDate": "2021-02-18",
            "numChildren": 2
        },
        "data": {
            "key": "CYEZKAXX",
            "version": 359,
            "itemType": "journalArticle",
            "title": "Improved Denoising Diffusion Probabilistic Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Alex",
                    "lastName": "Nichol"
                },
                {
                    "creatorType": "author",
                    "firstName": "Prafulla",
                    "lastName": "Dhariwal"
                }
            ],
            "abstractNote": "Denoising diffusion probabilistic models (DDPM) are a class of generative models which have recently been shown to produce excellent samples. We show that with a few simple modifications, DDPMs can also achieve competitive log-likelihoods while maintaining high sample quality. Additionally, we find that learning variances of the reverse diffusion process allows sampling with an order of magnitude fewer forward passes with a negligible difference in sample quality, which is important for the practical deployment of these models. We additionally use precision and recall to compare how well DDPMs and GANs cover the target distribution. Finally, we show that the sample quality and likelihood of these models scale smoothly with model capacity and training compute, making them easily scalable. We release our code at https://github.com/openai/improved-diffusion",
            "publicationTitle": "arXiv:2102.09672 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-02-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2102.09672",
            "accessDate": "2021-06-25T08:12:48Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2102.09672",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/KWNUZBK2"
            },
            "dateAdded": "2021-06-25T08:12:49Z",
            "dateModified": "2021-06-25T08:12:49Z"
        }
    },
    {
        "key": "PVGU9IG5",
        "version": 488,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/PVGU9IG5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/PVGU9IG5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Chen et al.",
            "parsedDate": "2020-10-09",
            "numChildren": 2
        },
        "data": {
            "key": "PVGU9IG5",
            "version": 488,
            "itemType": "journalArticle",
            "title": "WaveGrad: Estimating Gradients for Waveform Generation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Nanxin",
                    "lastName": "Chen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yu",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Heiga",
                    "lastName": "Zen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ron J.",
                    "lastName": "Weiss"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohammad",
                    "lastName": "Norouzi"
                },
                {
                    "creatorType": "author",
                    "firstName": "William",
                    "lastName": "Chan"
                }
            ],
            "abstractNote": "This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian white noise signal and iteratively refines the signal via a gradient-based sampler conditioned on the mel-spectrogram. WaveGrad offers a natural way to trade inference speed for sample quality by adjusting the number of refinement steps, and bridges the gap between non-autoregressive and autoregressive models in terms of audio quality. We find that it can generate high fidelity audio samples using as few as six iterations. Experiments reveal WaveGrad to generate high fidelity audio, outperforming adversarial non-autoregressive baselines and matching a strong likelihood-based autoregressive baseline using fewer sequential operations. Audio samples are available at https://wavegrad.github.io/.",
            "publicationTitle": "arXiv:2009.00713 [cs, eess, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-10-09",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "WaveGrad",
            "url": "http://arxiv.org/abs/2009.00713",
            "accessDate": "2021-06-24T12:53:38Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2009.00713",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/2M8T36BS"
            },
            "dateAdded": "2021-06-24T12:53:38Z",
            "dateModified": "2021-06-24T12:53:38Z"
        }
    },
    {
        "key": "IMIGHLZD",
        "version": 285,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/IMIGHLZD",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/IMIGHLZD",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Huang et al.",
            "parsedDate": "2021-06-05",
            "numChildren": 2
        },
        "data": {
            "key": "IMIGHLZD",
            "version": 285,
            "itemType": "journalArticle",
            "title": "A Variational Perspective on Diffusion-Based Generative Models and Score Matching",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Chin-Wei",
                    "lastName": "Huang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jae Hyun",
                    "lastName": "Lim"
                },
                {
                    "creatorType": "author",
                    "firstName": "Aaron",
                    "lastName": "Courville"
                }
            ],
            "abstractNote": "Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) show that diffusion processes that transform data into noise can be reversed via learning the score function, i.e. the gradient of the log-density of the perturbed data. They propose to plug the learned score function into an inverse formula to define a generative diffusion process. Despite the empirical success, a theoretical underpinning of this procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case, and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing a lower bound of the likelihood of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.",
            "publicationTitle": "arXiv:2106.02808 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-05",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.02808",
            "accessDate": "2021-06-24T10:40:11Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.02808",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-06-24T10:40:11Z",
            "dateModified": "2021-06-24T10:40:14Z"
        }
    },
    {
        "key": "ILRZA9ZA",
        "version": 255,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/ILRZA9ZA",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/ILRZA9ZA",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 1
        },
        "data": {
            "key": "ILRZA9ZA",
            "version": 255,
            "itemType": "webpage",
            "title": "Variational Inference - Deriving the ELBO · Infinite n♾rm",
            "creators": [],
            "abstractNote": "",
            "websiteTitle": "",
            "websiteType": "",
            "date": "",
            "shortTitle": "",
            "url": "https://chrisorm.github.io/VI-ELBO.html",
            "accessDate": "2021-06-24T10:34:47Z",
            "language": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-24T10:34:47Z",
            "dateModified": "2021-06-24T10:34:47Z"
        }
    },
    {
        "key": "JRR7M9JQ",
        "version": 252,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/JRR7M9JQ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/JRR7M9JQ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Keng",
            "parsedDate": "2017-04-03",
            "numChildren": 1
        },
        "data": {
            "key": "JRR7M9JQ",
            "version": 252,
            "itemType": "webpage",
            "title": "Variational Bayes and The Mean-Field Approximation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Brian",
                    "lastName": "Keng"
                }
            ],
            "abstractNote": "A brief introduction to variational Bayes and the mean-field approximation.",
            "websiteTitle": "Bounded Rationality",
            "websiteType": "",
            "date": "2017-04-03T08:02:46-05:00",
            "shortTitle": "",
            "url": "http://bjlkeng.github.io/posts/variational-bayes-and-the-mean-field-approximation/",
            "accessDate": "2021-06-24T10:20:02Z",
            "language": "en",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-24T10:20:02Z",
            "dateModified": "2021-06-24T10:20:02Z"
        }
    },
    {
        "key": "PIV5FK3P",
        "version": 285,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/PIV5FK3P",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/PIV5FK3P",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Karras et al.",
            "parsedDate": "2021-06-23",
            "numChildren": 2
        },
        "data": {
            "key": "PIV5FK3P",
            "version": 285,
            "itemType": "journalArticle",
            "title": "Alias-Free Generative Adversarial Networks",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Tero",
                    "lastName": "Karras"
                },
                {
                    "creatorType": "author",
                    "firstName": "Miika",
                    "lastName": "Aittala"
                },
                {
                    "creatorType": "author",
                    "firstName": "Samuli",
                    "lastName": "Laine"
                },
                {
                    "creatorType": "author",
                    "firstName": "Erik",
                    "lastName": "Härkönen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Janne",
                    "lastName": "Hellsten"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jaakko",
                    "lastName": "Lehtinen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Timo",
                    "lastName": "Aila"
                }
            ],
            "abstractNote": "We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.",
            "publicationTitle": "arXiv:2106.12423 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-23",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2106.12423",
            "accessDate": "2021-06-24T08:56:28Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.12423",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-24T08:56:28Z",
            "dateModified": "2021-06-24T08:56:28Z"
        }
    },
    {
        "key": "AAHHRFU4",
        "version": 285,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/AAHHRFU4",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/AAHHRFU4",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Shaham et al.",
            "parsedDate": "2019-09-04",
            "numChildren": 3
        },
        "data": {
            "key": "AAHHRFU4",
            "version": 285,
            "itemType": "journalArticle",
            "title": "SinGAN: Learning a Generative Model from a Single Natural Image",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Tamar Rott",
                    "lastName": "Shaham"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tali",
                    "lastName": "Dekel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tomer",
                    "lastName": "Michaeli"
                }
            ],
            "abstractNote": "We introduce SinGAN, an unconditional generative model that can be learned from a single natural image. Our model is trained to capture the internal distribution of patches within the image, and is then able to generate high quality, diverse samples that carry the same visual content as the image. SinGAN contains a pyramid of fully convolutional GANs, each responsible for learning the patch distribution at a different scale of the image. This allows generating new samples of arbitrary size and aspect ratio, that have significant variability, yet maintain both the global structure and the fine textures of the training image. In contrast to previous single image GAN schemes, our approach is not limited to texture images, and is not conditional (i.e. it generates samples from noise). User studies confirm that the generated samples are commonly confused to be real images. We illustrate the utility of SinGAN in a wide range of image manipulation tasks.",
            "publicationTitle": "arXiv:1905.01164 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2019-09-04",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "SinGAN",
            "url": "http://arxiv.org/abs/1905.01164",
            "accessDate": "2021-06-23T17:35:44Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1905.01164",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-23T17:35:44Z",
            "dateModified": "2021-06-23T17:35:44Z"
        }
    },
    {
        "key": "K99WRYEI",
        "version": 308,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/K99WRYEI",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/K99WRYEI",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Smith et al.",
            "parsedDate": "2020-09-21",
            "numChildren": 1
        },
        "data": {
            "key": "K99WRYEI",
            "version": 308,
            "itemType": "conferencePaper",
            "title": "Near-Field MIMO-ISAR Millimeter-Wave Imaging",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Josiah Wayl",
                    "lastName": "Smith"
                },
                {
                    "creatorType": "author",
                    "firstName": "Muhammet Emin",
                    "lastName": "Yanik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Murat",
                    "lastName": "Torlak"
                }
            ],
            "abstractNote": "Multiple-input-multiple-output (MIMO) millimeterwave (mmWave) sensors for synthetic aperture radar (SAR) and inverse SAR (ISAR) address the fundamental challenges of costeffectiveness and scalability inherent to near-ﬁeld imaging. In this paper, near-ﬁeld MIMO-ISAR mmWave imaging systems are discussed and developed. The rotational ISAR (R-ISAR) regime investigated in this paper requires rotating the target at a constant radial distance from the transceiver and scanning the transceiver along a vertical track. Using a 77GHz mmWave radar, a high resolution three-dimensional (3-D) image can be reconstructed from this two-dimensional scanning taking into account the spherical near-ﬁeld wavefront. While prior work in literature consists of single-input-single-output circular synthetic aperture radar (SISO-CSAR) algorithms or computationally sluggish MIMO-CSAR image reconstruction algorithms, this paper proposes a novel algorithm for efﬁcient MIMO 3-D holographic imaging and details the design of a MIMO R-ISAR imaging system. The proposed algorithm applies a multistatic-tomonostatic phase compensation to the R-ISAR regime allowing for use of highly efﬁcient monostatic algorithms. We demonstrate the algorithm’s performance in real-world imaging scenarios on a prototyped MIMO R-ISAR platform. Our fully integrated system, consisting of an mechanical scanner and efﬁcient imaging algorithm, is capable of pairing the scanning efﬁciency of the MIMO regime with the computational efﬁciency of single pixel image reconstruction algorithms.",
            "date": "2020-9-21",
            "proceedingsTitle": "2020 IEEE Radar Conference (RadarConf20)",
            "conferenceName": "2020 IEEE Radar Conference (RadarConf20)",
            "place": "Florence, Italy",
            "publisher": "IEEE",
            "volume": "",
            "pages": "1-6",
            "series": "",
            "language": "en",
            "DOI": "10.1109/RadarConf2043947.2020.9266412",
            "ISBN": "978-1-72818-942-0",
            "shortTitle": "",
            "url": "https://ieeexplore.ieee.org/document/9266412/",
            "accessDate": "2021-06-23T10:23:16Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-23T10:23:16Z",
            "dateModified": "2021-06-23T10:23:16Z"
        }
    },
    {
        "key": "3BKWE2SR",
        "version": 236,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/3BKWE2SR",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/3BKWE2SR",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lin et al.",
            "parsedDate": "2021-04-13",
            "numChildren": 2
        },
        "data": {
            "key": "3BKWE2SR",
            "version": 236,
            "itemType": "journalArticle",
            "title": "BARF: Bundle-Adjusting Neural Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Chen-Hsuan",
                    "lastName": "Lin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Wei-Chiu",
                    "lastName": "Ma"
                },
                {
                    "creatorType": "author",
                    "firstName": "Antonio",
                    "lastName": "Torralba"
                },
                {
                    "creatorType": "author",
                    "firstName": "Simon",
                    "lastName": "Lucey"
                }
            ],
            "abstractNote": "Neural Radiance Fields (NeRF) have recently gained a surge of interest within the computer vision community for its power to synthesize photorealistic novel views of real-world scenes. One limitation of NeRF, however, is its requirement of accurate camera poses to learn the scene representations. In this paper, we propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect (or even unknown) camera poses -- the joint problem of learning neural 3D representations and registering camera frames. We establish a theoretical connection to classical image alignment and show that coarse-to-fine registration is also applicable to NeRF. Furthermore, we show that na\\\"ively applying positional encoding in NeRF has a negative impact on registration with a synthesis-based objective. Experiments on synthetic and real-world data show that BARF can effectively optimize the neural scene representations and resolve large camera pose misalignment at the same time. This enables view synthesis and localization of video sequences from unknown camera poses, opening up new avenues for visual localization systems (e.g. SLAM) and potential applications for dense 3D mapping and reconstruction.",
            "publicationTitle": "arXiv:2104.06405 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-13",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "BARF",
            "url": "http://arxiv.org/abs/2104.06405",
            "accessDate": "2021-06-23T09:02:54Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.06405",
            "tags": [
                {
                    "tag": "nerf-no pose"
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-23T09:02:54Z",
            "dateModified": "2021-06-23T09:04:10Z"
        }
    },
    {
        "key": "LTAMWVUW",
        "version": 235,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/LTAMWVUW",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/LTAMWVUW",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Meng et al.",
            "parsedDate": "2021-03-30",
            "numChildren": 2
        },
        "data": {
            "key": "LTAMWVUW",
            "version": 235,
            "itemType": "journalArticle",
            "title": "GNeRF: GAN-based Neural Radiance Field without Posed Camera",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Quan",
                    "lastName": "Meng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Anpei",
                    "lastName": "Chen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Haimin",
                    "lastName": "Luo"
                },
                {
                    "creatorType": "author",
                    "firstName": "Minye",
                    "lastName": "Wu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Hao",
                    "lastName": "Su"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lan",
                    "lastName": "Xu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Xuming",
                    "lastName": "He"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jingyi",
                    "lastName": "Yu"
                }
            ],
            "abstractNote": "We introduce GNeRF, a framework to marry Generative Adversarial Networks (GAN) with Neural Radiance Field reconstruction for the complex scenarios with unknown and even randomly initialized camera poses. Recent NeRF-based advances have gained popularity for remarkable realistic novel view synthesis. However, most of them heavily rely on accurate camera poses estimation, while few recent methods can only optimize the unknown camera poses in roughly forward-facing scenes with relatively short camera trajectories and require rough camera poses initialization. Differently, our GNeRF only utilizes randomly initialized poses for complex outside-in scenarios. We propose a novel two-phases end-to-end framework. The first phase takes the use of GANs into the new realm for coarse camera poses and radiance fields jointly optimization, while the second phase refines them with additional photometric loss. We overcome local minima using a hybrid and iterative optimization scheme. Extensive experiments on a variety of synthetic and natural scenes demonstrate the effectiveness of GNeRF. More impressively, our approach outperforms the baselines favorably in those scenes with repeated patterns or even low textures that are regarded as extremely challenging before.",
            "publicationTitle": "arXiv:2103.15606 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-30",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "GNeRF",
            "url": "http://arxiv.org/abs/2103.15606",
            "accessDate": "2021-06-20T16:04:03Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.15606",
            "tags": [
                {
                    "tag": "nerf-no pose"
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:03Z",
            "dateModified": "2021-06-23T09:03:44Z"
        }
    },
    {
        "key": "S9RRBVBF",
        "version": 235,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/S9RRBVBF",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/S9RRBVBF",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Wang et al.",
            "parsedDate": "2021-02-19",
            "numChildren": 3
        },
        "data": {
            "key": "S9RRBVBF",
            "version": 235,
            "itemType": "journalArticle",
            "title": "NeRF--: Neural Radiance Fields Without Known Camera Parameters",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Zirui",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Shangzhe",
                    "lastName": "Wu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Weidi",
                    "lastName": "Xie"
                },
                {
                    "creatorType": "author",
                    "firstName": "Min",
                    "lastName": "Chen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Victor Adrian",
                    "lastName": "Prisacariu"
                }
            ],
            "abstractNote": "This paper tackles the problem of novel view synthesis (NVS) from 2D images without known camera poses and intrinsics. Among various NVS techniques, Neural Radiance Field (NeRF) has recently gained popularity due to its remarkable synthesis quality. Existing NeRF-based approaches assume that the camera parameters associated with each input image are either directly accessible at training, or can be accurately estimated with conventional techniques based on correspondences, such as Structure-from-Motion. In this work, we propose an end-to-end framework, termed NeRF--, for training NeRF models given only RGB images, without pre-computed camera parameters. Specifically, we show that the camera parameters, including both intrinsics and extrinsics, can be automatically discovered via joint optimisation during the training of the NeRF model. On the standard LLFF benchmark, our model achieves comparable novel view synthesis results compared to the baseline trained with COLMAP pre-computed camera parameters. We also conduct extensive analyses to understand the model behaviour under different camera trajectories, and show that in scenarios where COLMAP fails, our model still produces robust results.",
            "publicationTitle": "arXiv:2102.07064 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-02-19",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeRF--",
            "url": "http://arxiv.org/abs/2102.07064",
            "accessDate": "2021-06-20T16:03:58Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2102.07064",
            "tags": [
                {
                    "tag": "nerf-no pose"
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:58Z",
            "dateModified": "2021-06-23T09:03:24Z"
        }
    },
    {
        "key": "AZZC3G4L",
        "version": 236,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/AZZC3G4L",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/AZZC3G4L",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Locatello et al.",
            "parsedDate": "2020-10-14",
            "numChildren": 3
        },
        "data": {
            "key": "AZZC3G4L",
            "version": 236,
            "itemType": "journalArticle",
            "title": "Object-Centric Learning with Slot Attention",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Francesco",
                    "lastName": "Locatello"
                },
                {
                    "creatorType": "author",
                    "firstName": "Dirk",
                    "lastName": "Weissenborn"
                },
                {
                    "creatorType": "author",
                    "firstName": "Thomas",
                    "lastName": "Unterthiner"
                },
                {
                    "creatorType": "author",
                    "firstName": "Aravindh",
                    "lastName": "Mahendran"
                },
                {
                    "creatorType": "author",
                    "firstName": "Georg",
                    "lastName": "Heigold"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jakob",
                    "lastName": "Uszkoreit"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alexey",
                    "lastName": "Dosovitskiy"
                },
                {
                    "creatorType": "author",
                    "firstName": "Thomas",
                    "lastName": "Kipf"
                }
            ],
            "abstractNote": "Learning object-centric representations of complex scenes is a promising step towards enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep learning approaches learn distributed representations that do not capture the compositional properties of natural scenes. In this paper, we present the Slot Attention module, an architectural component that interfaces with perceptual representations such as the output of a convolutional neural network and produces a set of task-dependent abstract representations which we call slots. These slots are exchangeable and can bind to any object in the input by specializing through a competitive procedure over multiple rounds of attention. We empirically demonstrate that Slot Attention can extract object-centric representations that enable generalization to unseen compositions when trained on unsupervised object discovery and supervised property prediction tasks.",
            "publicationTitle": "arXiv:2006.15055 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-10-14",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2006.15055",
            "accessDate": "2021-06-22T10:22:10Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.15055",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-22T10:22:10Z",
            "dateModified": "2021-06-22T10:22:10Z"
        }
    },
    {
        "key": "KRKX539K",
        "version": 236,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/KRKX539K",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/KRKX539K",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Stelzner et al.",
            "parsedDate": "2021-04-02",
            "numChildren": 4
        },
        "data": {
            "key": "KRKX539K",
            "version": 236,
            "itemType": "journalArticle",
            "title": "Decomposing 3D Scenes into Objects via Unsupervised Volume Segmentation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Karl",
                    "lastName": "Stelzner"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kristian",
                    "lastName": "Kersting"
                },
                {
                    "creatorType": "author",
                    "firstName": "Adam R.",
                    "lastName": "Kosiorek"
                }
            ],
            "abstractNote": "We present ObSuRF, a method which turns a single image of a scene into a 3D model represented as a set of Neural Radiance Fields (NeRFs), with each NeRF corresponding to a different object. A single forward pass of an encoder network outputs a set of latent vectors describing the objects in the scene. These vectors are used independently to condition a NeRF decoder, defining the geometry and appearance of each object. We make learning more computationally efficient by deriving a novel loss, which allows training NeRFs on RGB-D inputs without explicit ray marching. After confirming that the model performs equal or better than state of the art on three 2D image segmentation benchmarks, we apply it to two multi-object 3D datasets: A multiview version of CLEVR, and a novel dataset in which scenes are populated by ShapeNet models. We find that after training ObSuRF on RGB-D views of training scenes, it is capable of not only recovering the 3D geometry of a scene depicted in a single input image, but also to segment it into objects, despite receiving no supervision in that regard.",
            "publicationTitle": "arXiv:2104.01148 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-02",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2104.01148",
            "accessDate": "2021-06-22T10:14:53Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.01148",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-22T10:14:53Z",
            "dateModified": "2021-06-22T10:14:53Z"
        }
    },
    {
        "key": "TRKCUJS5",
        "version": 236,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/TRKCUJS5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/TRKCUJS5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Duan et al.",
            "parsedDate": "2020-07-16",
            "numChildren": 3
        },
        "data": {
            "key": "TRKCUJS5",
            "version": 236,
            "itemType": "journalArticle",
            "title": "Curriculum DeepSDF",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yueqi",
                    "lastName": "Duan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Haidong",
                    "lastName": "Zhu"
                },
                {
                    "creatorType": "author",
                    "firstName": "He",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Li",
                    "lastName": "Yi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ram",
                    "lastName": "Nevatia"
                },
                {
                    "creatorType": "author",
                    "firstName": "Leonidas J.",
                    "lastName": "Guibas"
                }
            ],
            "abstractNote": "When learning to sketch, beginners start with simple and flexible shapes, and then gradually strive for more complex and accurate ones in the subsequent training sessions. In this paper, we design a \"shape curriculum\" for learning continuous Signed Distance Function (SDF) on shapes, namely Curriculum DeepSDF. Inspired by how humans learn, Curriculum DeepSDF organizes the learning task in ascending order of difficulty according to the following two criteria: surface accuracy and sample difficulty. The former considers stringency in supervising with ground truth, while the latter regards the weights of hard training samples near complex geometry and fine structure. More specifically, Curriculum DeepSDF learns to reconstruct coarse shapes at first, and then gradually increases the accuracy and focuses more on complex local details. Experimental results show that a carefully-designed curriculum leads to significantly better shape reconstructions with the same training data, training epochs and network architecture as DeepSDF. We believe that the application of shape curricula can benefit the training process of a wide variety of 3D shape representation learning methods.",
            "publicationTitle": "arXiv:2003.08593 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-07-16",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2003.08593",
            "accessDate": "2021-06-22T10:12:49Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2003.08593",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-22T10:12:49Z",
            "dateModified": "2021-06-22T10:12:49Z"
        }
    },
    {
        "key": "9E7TRNAB",
        "version": 236,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/9E7TRNAB",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/9E7TRNAB",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Chong and Forsyth",
            "parsedDate": "2021-06-11",
            "numChildren": 3
        },
        "data": {
            "key": "9E7TRNAB",
            "version": 236,
            "itemType": "journalArticle",
            "title": "GANs N' Roses: Stable, Controllable, Diverse Image to Image Translation (works for videos too!)",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Min Jin",
                    "lastName": "Chong"
                },
                {
                    "creatorType": "author",
                    "firstName": "David",
                    "lastName": "Forsyth"
                }
            ],
            "abstractNote": "We show how to learn a map that takes a content code, derived from a face image, and a randomly chosen style code to an anime image. We derive an adversarial loss from our simple and effective definitions of style and content. This adversarial loss guarantees the map is diverse -- a very wide range of anime can be produced from a single content code. Under plausible assumptions, the map is not just diverse, but also correctly represents the probability of an anime, conditioned on an input face. In contrast, current multimodal generation procedures cannot capture the complex styles that appear in anime. Extensive quantitative experiments support the idea the map is correct. Extensive qualitative results show that the method can generate a much more diverse range of styles than SOTA comparisons. Finally, we show that our formalization of content and style allows us to perform video to video translation without ever training on videos.",
            "publicationTitle": "arXiv:2106.06561 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-11",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "GANs N' Roses",
            "url": "http://arxiv.org/abs/2106.06561",
            "accessDate": "2021-06-22T09:53:32Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2106.06561",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-22T09:53:32Z",
            "dateModified": "2021-06-22T09:53:32Z"
        }
    },
    {
        "key": "YNWI67BZ",
        "version": 360,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/YNWI67BZ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/YNWI67BZ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Barron et al.",
            "parsedDate": "2021-05-06",
            "numChildren": 2
        },
        "data": {
            "key": "YNWI67BZ",
            "version": 360,
            "itemType": "journalArticle",
            "title": "Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Mildenhall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Peter",
                    "lastName": "Hedman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ricardo",
                    "lastName": "Martin-Brualla"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul P.",
                    "lastName": "Srinivasan"
                }
            ],
            "abstractNote": "The rendering procedure used by neural radiance fields (NeRF) samples a scene with a single ray per pixel and may therefore produce renderings that are excessively blurred or aliased when training or testing images observe scene content at different resolutions. The straightforward solution of supersampling by rendering with multiple rays per pixel is impractical for NeRF, because rendering each ray requires querying a multilayer perceptron hundreds of times. Our solution, which we call \"mip-NeRF\" (a la \"mipmap\"), extends NeRF to represent the scene at a continuously-valued scale. By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while also being 7% faster than NeRF and half the size. Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. Mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22x faster.",
            "publicationTitle": "arXiv:2103.13415 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-06",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Mip-NeRF",
            "url": "http://arxiv.org/abs/2103.13415",
            "accessDate": "2021-06-20T16:04:02Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.13415",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/C7NV4DL3"
            },
            "dateAdded": "2021-06-20T16:04:02Z",
            "dateModified": "2021-06-22T07:00:06Z"
        }
    },
    {
        "key": "HSQJR92J",
        "version": 396,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HSQJR92J",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HSQJR92J",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lin et al.",
            "parsedDate": "2020-12-14",
            "numChildren": 2
        },
        "data": {
            "key": "HSQJR92J",
            "version": 396,
            "itemType": "journalArticle",
            "title": "Real-Time High-Resolution Background Matting",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Shanchuan",
                    "lastName": "Lin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andrey",
                    "lastName": "Ryabtsev"
                },
                {
                    "creatorType": "author",
                    "firstName": "Soumyadip",
                    "lastName": "Sengupta"
                },
                {
                    "creatorType": "author",
                    "firstName": "Brian",
                    "lastName": "Curless"
                },
                {
                    "creatorType": "author",
                    "firstName": "Steve",
                    "lastName": "Seitz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ira",
                    "lastName": "Kemelmacher-Shlizerman"
                }
            ],
            "abstractNote": "We introduce a real-time, high-resolution background replacement technique which operates at 30fps in 4K resolution, and 60fps for HD on a modern GPU. Our technique is based on background matting, where an additional frame of the background is captured and used in recovering the alpha matte and the foreground layer. The main challenge is to compute a high-quality alpha matte, preserving strand-level hair details, while processing high-resolution images in real-time. To achieve this goal, we employ two neural networks; a base network computes a low-resolution result which is refined by a second network operating at high-resolution on selective patches. We introduce two largescale video and image matting datasets: VideoMatte240K and PhotoMatte13K/85. Our approach yields higher quality results compared to the previous state-of-the-art in background matting, while simultaneously yielding a dramatic boost in both speed and resolution.",
            "publicationTitle": "arXiv:2012.07810 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-14",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2012.07810",
            "accessDate": "2021-06-22T05:53:26Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.07810",
            "tags": [
                {
                    "tag": "seminar"
                }
            ],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/4F7Q5SQQ"
            },
            "dateAdded": "2021-06-22T05:53:26Z",
            "dateModified": "2021-06-22T05:54:19Z"
        }
    },
    {
        "key": "WNN73FRH",
        "version": 205,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/WNN73FRH",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/WNN73FRH",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Finn et al.",
            "parsedDate": "2017-07-18",
            "numChildren": 3
        },
        "data": {
            "key": "WNN73FRH",
            "version": 205,
            "itemType": "journalArticle",
            "title": "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Chelsea",
                    "lastName": "Finn"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pieter",
                    "lastName": "Abbeel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sergey",
                    "lastName": "Levine"
                }
            ],
            "abstractNote": "We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on two few-shot image classification benchmarks, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies.",
            "publicationTitle": "arXiv:1703.03400 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2017-07-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1703.03400",
            "accessDate": "2021-06-22T03:41:34Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1703.03400",
            "tags": [],
            "collections": [
                "PKH8WMHP"
            ],
            "relations": {},
            "dateAdded": "2021-06-22T03:41:34Z",
            "dateModified": "2021-06-22T03:41:34Z"
        }
    },
    {
        "key": "5JXL5TAU",
        "version": 187,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/5JXL5TAU",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/5JXL5TAU",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Suwajanakorn et al.",
            "parsedDate": "2015",
            "numChildren": 1
        },
        "data": {
            "key": "5JXL5TAU",
            "version": 187,
            "itemType": "journalArticle",
            "title": "What Makes Tom Hanks Look Like Tom Hanks",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Supasorn",
                    "lastName": "Suwajanakorn"
                },
                {
                    "creatorType": "author",
                    "firstName": "Steven M.",
                    "lastName": "Seitz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ira",
                    "lastName": "Kemelmacher-Shlizerman"
                }
            ],
            "abstractNote": "We reconstruct a controllable model of a person from a large photo collection that captures his or her persona, i.e., physical appearance and behavior. The ability to operate on unstructured photo collections enables modeling a huge number of people, including celebrities and other well photographed people without requiring them to be scanned. Moreover, we show the ability to drive or puppeteer the captured person B using any other video of a different person A. In this scenario, B acts out the role of person A, but retains his/her own personality and character. Our system is based on a novel combination of 3D face reconstruction, tracking, alignment, and multi-texture modeling, applied to the puppeteering problem. We demonstrate convincing results on a large variety of celebrities derived from Internet imagery and video.",
            "publicationTitle": "2015 IEEE International Conference on Computer Vision (ICCV)",
            "volume": "",
            "issue": "",
            "pages": "3952-3960",
            "date": "12/2015",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "10.1109/ICCV.2015.450",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://ieeexplore.ieee.org/document/7410807/",
            "accessDate": "2021-06-21T18:37:06Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "All rights reserved",
            "extra": "",
            "inPublications": true,
            "tags": [],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {},
            "dateAdded": "2021-06-21T18:37:06Z",
            "dateModified": "2021-06-21T18:45:13Z"
        }
    },
    {
        "key": "GKRDIJD5",
        "version": 187,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/GKRDIJD5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/GKRDIJD5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Suwajanakorn et al.",
            "parsedDate": "2017-07-20",
            "numChildren": 1
        },
        "data": {
            "key": "GKRDIJD5",
            "version": 187,
            "itemType": "journalArticle",
            "title": "Synthesizing Obama: learning lip sync from audio",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Supasorn",
                    "lastName": "Suwajanakorn"
                },
                {
                    "creatorType": "author",
                    "firstName": "Steven M.",
                    "lastName": "Seitz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ira",
                    "lastName": "Kemelmacher-Shlizerman"
                }
            ],
            "abstractNote": "",
            "publicationTitle": "ACM Transactions on Graphics",
            "volume": "36",
            "issue": "4",
            "pages": "1-13",
            "date": "2017-07-20",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "ACM Trans. Graph.",
            "language": "en",
            "DOI": "10.1145/3072959.3073640",
            "ISSN": "0730-0301, 1557-7368",
            "shortTitle": "Synthesizing Obama",
            "url": "https://dl.acm.org/doi/10.1145/3072959.3073640",
            "accessDate": "2021-06-21T18:34:08Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "All rights reserved",
            "extra": "",
            "inPublications": true,
            "tags": [],
            "collections": [
                "F2HZKCVK"
            ],
            "relations": {},
            "dateAdded": "2021-06-21T18:34:08Z",
            "dateModified": "2021-06-21T18:34:08Z"
        }
    },
    {
        "key": "KZWM976P",
        "version": 146,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/KZWM976P",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/KZWM976P",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "KZWM976P",
            "version": 146,
            "itemType": "attachment",
            "linkMode": "imported_file",
            "title": "TPAMI-2021-05-0852_Proof_hi.pdf",
            "accessDate": "",
            "url": "",
            "note": "",
            "contentType": "application/pdf",
            "charset": "",
            "filename": "TPAMI-2021-05-0852_Proof_hi.pdf",
            "md5": "2782955e41fdb136c22039f2d5f37c2d",
            "mtime": 1624259395383,
            "tags": [
                {
                    "tag": "review"
                },
                {
                    "tag": "tpami"
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-21T07:34:41Z",
            "dateModified": "2021-06-21T12:18:37Z"
        }
    },
    {
        "key": "RCAS5DV9",
        "version": 146,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/RCAS5DV9",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/RCAS5DV9",
                "type": "text/html"
            }
        },
        "meta": {
            "numChildren": 0
        },
        "data": {
            "key": "RCAS5DV9",
            "version": 146,
            "itemType": "attachment",
            "linkMode": "imported_file",
            "title": "TPAMISI-2021-03-0432_Proof_hi.pdf",
            "accessDate": "",
            "url": "",
            "note": "",
            "contentType": "application/pdf",
            "charset": "",
            "filename": "TPAMISI-2021-03-0432_Proof_hi.pdf",
            "md5": "089e9942625a5b6ba3fbb8531a93cc74",
            "mtime": 1624259169691,
            "tags": [
                {
                    "tag": "review"
                },
                {
                    "tag": "tpami"
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-21T07:35:07Z",
            "dateModified": "2021-06-21T12:18:31Z"
        }
    },
    {
        "key": "QWRQKTJA",
        "version": 486,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/QWRQKTJA",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/QWRQKTJA",
                "type": "text/html"
            },
            "attachment": {
                "href": "https://api.zotero.org/users/7902311/items/HGQ9WVQ7",
                "type": "application/json",
                "attachmentType": "application/pdf",
                "attachmentSize": 27985746
            }
        },
        "meta": {
            "creatorSummary": "Ho et al.",
            "numChildren": 3
        },
        "data": {
            "key": "QWRQKTJA",
            "version": 486,
            "itemType": "journalArticle",
            "title": "Cascaded Diﬀusion Models for High Fidelity Image Generation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Jonathan",
                    "lastName": "Ho"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chitwan",
                    "lastName": "Saharia"
                },
                {
                    "creatorType": "author",
                    "firstName": "William",
                    "lastName": "Chan"
                },
                {
                    "creatorType": "author",
                    "firstName": "David J",
                    "lastName": "Fleet"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohammad",
                    "lastName": "Norouzi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tim",
                    "lastName": "Salimans"
                }
            ],
            "abstractNote": "We show that cascaded diﬀusion models are capable of generating high ﬁdelity images on the class-conditional ImageNet generation challenge, without any assistance from auxiliary image classiﬁers to boost sample quality. A cascaded diﬀusion model comprises a pipeline of multiple diﬀusion models that generate images of increasing resolution, beginning with a standard diﬀusion model at the lowest resolution, followed by one or more super-resolution diﬀusion models that successively upsample the image and add higher resolution details. We ﬁnd that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64×64, 3.52 at 128×128 and 4.88 at 256×256 resolutions, outperforming BigGAN-deep, and classiﬁcation accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256×256, outperforming VQ-VAE-2.",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "28",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/VBJP47EP"
            },
            "dateAdded": "2021-06-20T07:01:40Z",
            "dateModified": "2021-06-20T16:53:22Z"
        }
    },
    {
        "key": "GXUECHN2",
        "version": 94,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/GXUECHN2",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/GXUECHN2",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Zhang et al.",
            "parsedDate": "2020-10-21",
            "numChildren": 4
        },
        "data": {
            "key": "GXUECHN2",
            "version": 94,
            "itemType": "journalArticle",
            "title": "NeRF++: Analyzing and Improving Neural Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Kai",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gernot",
                    "lastName": "Riegler"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noah",
                    "lastName": "Snavely"
                },
                {
                    "creatorType": "author",
                    "firstName": "Vladlen",
                    "lastName": "Koltun"
                }
            ],
            "abstractNote": "Neural Radiance Fields (NeRF) achieve impressive view synthesis results for a variety of capture settings, including 360 capture of bounded scenes and forward-facing capture of bounded and unbounded scenes. NeRF fits multi-layer perceptrons (MLPs) representing view-invariant opacity and view-dependent color volumes to a set of training images, and samples novel views based on volume rendering techniques. In this technical report, we first remark on radiance fields and their potential ambiguities, namely the shape-radiance ambiguity, and analyze NeRF's success in avoiding such ambiguities. Second, we address a parametrization issue involved in applying NeRF to 360 captures of objects within large-scale, unbounded 3D scenes. Our method improves view synthesis fidelity in this challenging scenario. Code is available at https://github.com/Kai-46/nerfplusplus.",
            "publicationTitle": "arXiv:2010.07492 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-10-21",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeRF++",
            "url": "http://arxiv.org/abs/2010.07492",
            "accessDate": "2021-06-20T16:03:46Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2010.07492",
            "tags": [
                {
                    "tag": "view-synthesis"
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:46Z",
            "dateModified": "2021-06-20T16:47:50Z"
        }
    },
    {
        "key": "DK93ZEF2",
        "version": 358,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/DK93ZEF2",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/DK93ZEF2",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Wizadwongsa et al.",
            "parsedDate": "2021-04-12",
            "numChildren": 3
        },
        "data": {
            "key": "DK93ZEF2",
            "version": 358,
            "itemType": "journalArticle",
            "title": "NeX: Real-time View Synthesis with Neural Basis Expansion",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Suttisak",
                    "lastName": "Wizadwongsa"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pakkapon",
                    "lastName": "Phongthawee"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiraphon",
                    "lastName": "Yenphraphai"
                },
                {
                    "creatorType": "author",
                    "firstName": "Supasorn",
                    "lastName": "Suwajanakorn"
                }
            ],
            "abstractNote": "We present NeX, a new approach to novel view synthesis based on enhancements of multiplane image (MPI) that can reproduce next-level view-dependent effects -- in real time. Unlike traditional MPI that uses a set of simple RGB$\\alpha$ planes, our technique models view-dependent effects by instead parameterizing each pixel as a linear combination of basis functions learned from a neural network. Moreover, we propose a hybrid implicit-explicit modeling strategy that improves upon fine detail and produces state-of-the-art results. Our method is evaluated on benchmark forward-facing datasets as well as our newly-introduced dataset designed to test the limit of view-dependent modeling with significantly more challenging effects such as rainbow reflections on a CD. Our method achieves the best overall scores across all major metrics on these datasets with more than 1000$\\times$ faster rendering time than the state of the art. For real-time demos, visit https://nex-mpi.github.io/",
            "publicationTitle": "arXiv:2103.05606 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-12",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeX",
            "url": "http://arxiv.org/abs/2103.05606",
            "accessDate": "2021-06-20T16:04:00Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "All rights reserved",
            "extra": "arXiv: 2103.05606",
            "inPublications": true,
            "tags": [
                {
                    "tag": "real-time"
                },
                {
                    "tag": "view-synthesis"
                }
            ],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/7AFV33RH"
            },
            "dateAdded": "2021-06-20T16:04:00Z",
            "dateModified": "2021-06-20T16:47:40Z"
        }
    },
    {
        "key": "3JBZMNIE",
        "version": 360,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/3JBZMNIE",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/3JBZMNIE",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Yu et al.",
            "parsedDate": "2021-03-25",
            "numChildren": 2
        },
        "data": {
            "key": "3JBZMNIE",
            "version": 360,
            "itemType": "journalArticle",
            "title": "PlenOctrees for Real-time Rendering of Neural Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Alex",
                    "lastName": "Yu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ruilong",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Hao",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ren",
                    "lastName": "Ng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Angjoo",
                    "lastName": "Kanazawa"
                }
            ],
            "abstractNote": "We introduce a method to render Neural Radiance Fields (NeRFs) in real time using PlenOctrees, an octree-based 3D representation which supports view-dependent effects. Our method can render 800x800 images at more than 150 FPS, which is over 3000 times faster than conventional NeRFs. We do so without sacrificing quality while preserving the ability of NeRFs to perform free-viewpoint rendering of scenes with arbitrary geometry and view-dependent effects. Real-time performance is achieved by pre-tabulating the NeRF into a PlenOctree. In order to preserve view-dependent effects such as specularities, we factorize the appearance via closed-form spherical basis functions. Specifically, we show that it is possible to train NeRFs to predict a spherical harmonic representation of radiance, removing the viewing direction as an input to the neural network. Furthermore, we show that PlenOctrees can be directly optimized to further minimize the reconstruction loss, which leads to equal or better quality compared to competing methods. Moreover, this octree optimization step can be used to reduce the training time, as we no longer need to wait for the NeRF training to converge fully. Our real-time neural rendering approach may potentially enable new applications such as 6-DOF industrial and product visualizations, as well as next generation AR/VR systems. PlenOctrees are amenable to in-browser rendering as well; please visit the project page for the interactive online demo, as well as video and code: https://alexyu.net/plenoctrees",
            "publicationTitle": "arXiv:2103.14024 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-25",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2103.14024",
            "accessDate": "2021-06-20T16:04:02Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.14024",
            "tags": [
                {
                    "tag": "nerf"
                },
                {
                    "tag": "real-time"
                }
            ],
            "collections": [
                "8CLBFGR2",
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/B2UUWCG9"
            },
            "dateAdded": "2021-06-20T16:04:02Z",
            "dateModified": "2021-06-20T16:47:26Z"
        }
    },
    {
        "key": "4JSQ85CD",
        "version": 360,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/4JSQ85CD",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/4JSQ85CD",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Reiser et al.",
            "parsedDate": "2021-03-25",
            "numChildren": 2
        },
        "data": {
            "key": "4JSQ85CD",
            "version": 360,
            "itemType": "journalArticle",
            "title": "KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Reiser"
                },
                {
                    "creatorType": "author",
                    "firstName": "Songyou",
                    "lastName": "Peng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yiyi",
                    "lastName": "Liao"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andreas",
                    "lastName": "Geiger"
                }
            ],
            "abstractNote": "NeRF synthesizes novel views of a scene with unprecedented quality by fitting a neural radiance field to RGB images. However, NeRF requires querying a deep Multi-Layer Perceptron (MLP) millions of times, leading to slow rendering times, even on modern GPUs. In this paper, we demonstrate that significant speed-ups are possible by utilizing thousands of tiny MLPs instead of one single large MLP. In our setting, each individual MLP only needs to represent parts of the scene, thus smaller and faster-to-evaluate MLPs can be used. By combining this divide-and-conquer strategy with further optimizations, rendering is accelerated by two orders of magnitude compared to the original NeRF model without incurring high storage costs. Further, using teacher-student distillation for training, we show that this speed-up can be achieved without sacrificing visual quality.",
            "publicationTitle": "arXiv:2103.13744 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-25",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "KiloNeRF",
            "url": "http://arxiv.org/abs/2103.13744",
            "accessDate": "2021-06-20T16:04:03Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.13744",
            "tags": [
                {
                    "tag": "nerf"
                },
                {
                    "tag": "real-time"
                }
            ],
            "collections": [
                "8CLBFGR2",
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/B3M6EGUK"
            },
            "dateAdded": "2021-06-20T16:04:04Z",
            "dateModified": "2021-06-20T16:46:57Z"
        }
    },
    {
        "key": "C3MDVZBZ",
        "version": 84,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/C3MDVZBZ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/C3MDVZBZ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Kingma and Welling",
            "parsedDate": "2014-05-01",
            "numChildren": 2
        },
        "data": {
            "key": "C3MDVZBZ",
            "version": 84,
            "itemType": "journalArticle",
            "title": "Auto-Encoding Variational Bayes",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Diederik P.",
                    "lastName": "Kingma"
                },
                {
                    "creatorType": "author",
                    "firstName": "Max",
                    "lastName": "Welling"
                }
            ],
            "abstractNote": "How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contributions is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.",
            "publicationTitle": "arXiv:1312.6114 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2014-05-01",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1312.6114",
            "accessDate": "2021-06-20T16:38:23Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1312.6114",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:38:23Z",
            "dateModified": "2021-06-20T16:40:30Z"
        }
    },
    {
        "key": "29Y6HCH6",
        "version": 83,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/29Y6HCH6",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/29Y6HCH6",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Kellnhofer et al.",
            "numChildren": 1
        },
        "data": {
            "key": "29Y6HCH6",
            "version": 83,
            "itemType": "journalArticle",
            "title": "Neural Lumigraph Rendering",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Petr",
                    "lastName": "Kellnhofer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Lars C",
                    "lastName": "Jebe"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andrew",
                    "lastName": "Jones"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ryan",
                    "lastName": "Spicer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kari",
                    "lastName": "Pulli"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gordon",
                    "lastName": "Wetzstein"
                }
            ],
            "abstractNote": "Novel view synthesis is a challenging and ill-posed inverse rendering problem. Neural rendering techniques have recently achieved photorealistic image quality for this task. State-of-the-art (SOTA) neural volume rendering approaches, however, are slow to train and require minutes of inference (i.e., rendering) time for high image resolutions. We adopt high-capacity neural scene representations with periodic activations for jointly optimizing an implicit surface and a radiance field of a scene supervised exclusively with posed 2D images. Our neural rendering pipeline accelerates SOTA neural volume rendering by about two orders of magnitude and our implicit surface representation is unique in allowing us to export a mesh with view-dependent texture information. Thus, like other implicit surface representations, ours is compatible with traditional graphics pipelines, enabling real-time rendering rates, while achieving unprecedented image quality compared to other surface methods. We assess the quality of our approach using existing datasets as well as high-quality 3D face data captured with a custom multi-camera rig.",
            "publicationTitle": "",
            "volume": "",
            "issue": "",
            "pages": "11",
            "date": "",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "",
            "accessDate": "",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "Zotero",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:37:26Z",
            "dateModified": "2021-06-20T16:37:26Z"
        }
    },
    {
        "key": "PDRM7Y2K",
        "version": 83,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/PDRM7Y2K",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/PDRM7Y2K",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Riegler and Koltun",
            "parsedDate": "2020",
            "numChildren": 1
        },
        "data": {
            "key": "PDRM7Y2K",
            "version": 83,
            "itemType": "conferencePaper",
            "title": "Free View Synthesis",
            "creators": [
                {
                    "creatorType": "editor",
                    "firstName": "Andrea",
                    "lastName": "Vedaldi"
                },
                {
                    "creatorType": "editor",
                    "firstName": "Horst",
                    "lastName": "Bischof"
                },
                {
                    "creatorType": "editor",
                    "firstName": "Thomas",
                    "lastName": "Brox"
                },
                {
                    "creatorType": "editor",
                    "firstName": "Jan-Michael",
                    "lastName": "Frahm"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gernot",
                    "lastName": "Riegler"
                },
                {
                    "creatorType": "author",
                    "firstName": "Vladlen",
                    "lastName": "Koltun"
                }
            ],
            "abstractNote": "We present a method for novel view synthesis from input images that are freely distributed around a scene. Our method does not rely on a regular arrangement of input views, can synthesize images for free camera movement through the scene, and works for general scenes with unconstrained geometric layouts. We calibrate the input images via SfM and erect a coarse geometric scaffold via MVS. This scaffold is used to create a proxy depth map for a novel view of the scene. Based on this depth map, a recurrent encoder-decoder network processes reprojected features from nearby views and synthesizes the new view. Our network does not need to be optimized for a given scene. After training on a dataset, it works in previously unseen environments with no finetuning or per-scene optimization. We evaluate the presented approach on challenging real-world datasets, including Tanks and Temples, where we demonstrate successful view synthesis for the first time and substantially outperform prior and concurrent work.",
            "date": "2020",
            "proceedingsTitle": "Computer Vision – ECCV 2020",
            "conferenceName": "",
            "place": "Cham",
            "publisher": "Springer International Publishing",
            "volume": "12364",
            "pages": "623-640",
            "series": "",
            "language": "en",
            "DOI": "",
            "ISBN": "978-3-030-58528-0 978-3-030-58529-7",
            "shortTitle": "",
            "url": "https://link.springer.com/10.1007/978-3-030-58529-7_37",
            "accessDate": "2021-06-20T16:35:45Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "Series Title: Lecture Notes in Computer Science\nDOI: 10.1007/978-3-030-58529-7_37",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:35:45Z",
            "dateModified": "2021-06-20T16:36:24Z"
        }
    },
    {
        "key": "HRHXD8DU",
        "version": 83,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HRHXD8DU",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HRHXD8DU",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Bertel et al.",
            "parsedDate": "2020-12-04",
            "numChildren": 0
        },
        "data": {
            "key": "HRHXD8DU",
            "version": 83,
            "itemType": "conferencePaper",
            "title": "Deferred Neural Rendering for View Extrapolation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Tobias",
                    "lastName": "Bertel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yusuke",
                    "lastName": "Tomoto"
                },
                {
                    "creatorType": "author",
                    "firstName": "Srinivas",
                    "lastName": "Rao"
                },
                {
                    "creatorType": "author",
                    "firstName": "Rodrigo",
                    "lastName": "Ortiz-Cayon"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stefan",
                    "lastName": "Holzer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Richardt"
                }
            ],
            "abstractNote": "",
            "date": "2020-12-04",
            "proceedingsTitle": "SIGGRAPH Asia 2020 Posters",
            "conferenceName": "SA '20: SIGGRAPH Asia 2020",
            "place": "Virtual Event Republic of Korea",
            "publisher": "ACM",
            "volume": "",
            "pages": "1-2",
            "series": "",
            "language": "en",
            "DOI": "10.1145/3415264.3425441",
            "ISBN": "978-1-4503-8113-0",
            "shortTitle": "",
            "url": "https://dl.acm.org/doi/10.1145/3415264.3425441",
            "accessDate": "2021-06-20T16:35:55Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:35:55Z",
            "dateModified": "2021-06-20T16:35:55Z"
        }
    },
    {
        "key": "W2SDJE56",
        "version": 308,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/W2SDJE56",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/W2SDJE56",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sheen et al.",
            "parsedDate": "2001-09",
            "numChildren": 1
        },
        "data": {
            "key": "W2SDJE56",
            "version": 308,
            "itemType": "journalArticle",
            "title": "Three-dimensional millimeter-wave imaging for concealed weapon detection",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "D.M.",
                    "lastName": "Sheen"
                },
                {
                    "creatorType": "author",
                    "firstName": "D.L.",
                    "lastName": "McMakin"
                },
                {
                    "creatorType": "author",
                    "firstName": "T.E.",
                    "lastName": "Hall"
                }
            ],
            "abstractNote": "Millimeter-wave imaging techniques and systems have been developed at the Pacific Northwest National Laboratory (PNNL), Richland, WA, for the detection of concealed weapons and contraband at airports and other secure locations. These techniques were derived from microwave holography techniques that utilize phase and amplitude information recorded over a two-dimensional aperture to reconstruct a focused image of the target. Millimeter-wave imaging is well suited for the detection of concealed weapons or other contraband carried on personnel since millimeter-waves are nonionizing, readily penetrate common clothing material, and are reflected from the human body and any concealed items. In this paper, a wide-bandwidth three-dimensional holographic microwave imaging technique is described. Practical weapon detection systems for airport or other high-throughput applications require high-speed scanning on the order of 3 to 10 s. To achieve this goal, a prototype imaging system utilizing a 27–33 GHz linear sequentially switched array and a high-speed linear scanner has been developed and tested. This system is described in detail along with numerous imaging results.",
            "publicationTitle": "IEEE Transactions on Microwave Theory and Techniques",
            "volume": "49",
            "issue": "9",
            "pages": "1581-1592",
            "date": "Sept./2001",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "IEEE Trans. Microwave Theory Techn.",
            "language": "en",
            "DOI": "10.1109/22.942570",
            "ISSN": "00189480",
            "shortTitle": "",
            "url": "http://ieeexplore.ieee.org/document/942570/",
            "accessDate": "2021-06-20T16:29:57Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [
                {
                    "tag": "star"
                }
            ],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:29:57Z",
            "dateModified": "2021-06-20T16:30:45Z"
        }
    },
    {
        "key": "I976C3LC",
        "version": 308,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/I976C3LC",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/I976C3LC",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Yanik and Torlak",
            "parsedDate": "2019",
            "numChildren": 1
        },
        "data": {
            "key": "I976C3LC",
            "version": 308,
            "itemType": "conferencePaper",
            "title": "Near-Field 2-D SAR Imaging by Millimeter-Wave Radar for Concealed Item Detection",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Muhammet Emin",
                    "lastName": "Yanik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Murat",
                    "lastName": "Torlak"
                }
            ],
            "abstractNote": "Recent progress in complementary metal-oxide semiconductor (CMOS) based frequency-modulated continuous-wave (FMCW) radars has made it possible to design low-cost and low-power millimeter-wave (mmWave) sensors. As a result, there is a strong desire to exploit the progress in mmWave sensors in wide range of imaging applications including medical, automotive, and security. In this paper, we present a low-cost high-resolution mmWave imager prototype that combines commercially available 77 GHz system-on-chip FMCW radar sensors and synthetic aperture radar (SAR) signal processing techniques for concealed item detection. To create a synthetic aperture over a target scene, the imager is constructed with a two-axis motorized rail system which can synthesize a large aperture in both horizontal and vertical directions. Our prototype system is described in detail along with signal processing techniques for two-dimensional (2-D) image reconstruction. The imaging examples of concealed items in various scenarios confirm that our low-cost prototype has a great potential for high-resolution imaging tasks in security applications.",
            "date": "1/2019",
            "proceedingsTitle": "2019 IEEE Radio and Wireless Symposium (RWS)",
            "conferenceName": "2019 IEEE Radio and Wireless Symposium (RWS)",
            "place": "Orlando, FL, USA",
            "publisher": "IEEE",
            "volume": "",
            "pages": "1-4",
            "series": "",
            "language": "en",
            "DOI": "10.1109/RWS.2019.8714552",
            "ISBN": "978-1-5386-5944-1",
            "shortTitle": "",
            "url": "https://ieeexplore.ieee.org/document/8714552/",
            "accessDate": "2021-06-20T16:30:09Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "DOI.org (Crossref)",
            "callNumber": "",
            "rights": "",
            "extra": "",
            "tags": [],
            "collections": [
                "QP76V5CN"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:30:09Z",
            "dateModified": "2021-06-20T16:30:14Z"
        }
    },
    {
        "key": "WSXPLJSX",
        "version": 362,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/WSXPLJSX",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/WSXPLJSX",
                "type": "text/html"
            },
            "attachment": {
                "href": "https://api.zotero.org/users/7902311/items/F9DCKB7B",
                "type": "application/json",
                "attachmentType": "application/pdf",
                "attachmentSize": 6493563
            }
        },
        "meta": {
            "creatorSummary": "Yu et al.",
            "parsedDate": "2020-12-03",
            "numChildren": 3
        },
        "data": {
            "key": "WSXPLJSX",
            "version": 362,
            "itemType": "journalArticle",
            "title": "pixelNeRF: Neural Radiance Fields from One or Few Images",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Alex",
                    "lastName": "Yu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Vickie",
                    "lastName": "Ye"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Angjoo",
                    "lastName": "Kanazawa"
                }
            ],
            "abstractNote": "We propose pixelNeRF, a learning framework that predicts a continuous neural scene representation conditioned on one or few input images. The existing approach for constructing neural radiance fields involves optimizing the representation to every scene independently, requiring many calibrated views and significant compute time. We take a step towards resolving these shortcomings by introducing an architecture that conditions a NeRF on image inputs in a fully convolutional manner. This allows the network to be trained across multiple scenes to learn a scene prior, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views (as few as one). Leveraging the volume rendering approach of NeRF, our model can be trained directly from images with no explicit 3D supervision. We conduct extensive experiments on ShapeNet benchmarks for single image novel view synthesis tasks with held-out objects as well as entire unseen categories. We further demonstrate the flexibility of pixelNeRF by demonstrating it on multi-object ShapeNet scenes and real scenes from the DTU dataset. In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single image 3D reconstruction. For the video and code, please visit the project website: https://alexyu.net/pixelnerf",
            "publicationTitle": "arXiv:2012.02190 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-03",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "pixelNeRF",
            "url": "http://arxiv.org/abs/2012.02190",
            "accessDate": "2021-04-26T04:31:14Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.02190",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/PBCRUD8V",
                "owl:sameAs": "http://zotero.org/groups/4320173/items/EGRHI42D"
            },
            "dateAdded": "2021-04-26T04:31:14Z",
            "dateModified": "2021-06-20T16:24:59Z"
        }
    },
    {
        "key": "KSTKXVLX",
        "version": 77,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/KSTKXVLX",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/KSTKXVLX",
                "type": "text/html"
            },
            "attachment": {
                "href": "https://api.zotero.org/users/7902311/items/2JA6F8X6",
                "type": "application/json",
                "attachmentType": "application/pdf",
                "attachmentSize": 24114521
            }
        },
        "meta": {
            "creatorSummary": "Trevithick and Yang",
            "parsedDate": "2020-11-29",
            "numChildren": 4
        },
        "data": {
            "key": "KSTKXVLX",
            "version": 77,
            "itemType": "journalArticle",
            "title": "GRF: Learning a General Radiance Field for 3D Scene Representation and Rendering",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Alex",
                    "lastName": "Trevithick"
                },
                {
                    "creatorType": "author",
                    "firstName": "Bo",
                    "lastName": "Yang"
                }
            ],
            "abstractNote": "We present a simple yet powerful implicit neural function that can represent and render arbitrarily complex 3D scenes in a single network only from 2D observations. The function models 3D scenes as a general radiance field, which takes a set of posed 2D images with camera poses and intrinsics as input, constructs an internal representation for each 3D point of the scene, and renders the corresponding appearance and geometry of any 3D point viewing from an arbitrary angle. The key to our approach is to explicitly integrate the principle of multi-view geometry to obtain the internal representations from observed 2D views, such that the learned implicit representations empirically remain multi-view consistent. In addition, we introduce an effective neural module to learn general features for each pixel in 2D images, allowing the constructed internal 3D representations to be general as well. Extensive experiments demonstrate the superiority of our approach.",
            "publicationTitle": "arXiv:2010.04595 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-29",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "GRF",
            "url": "http://arxiv.org/abs/2010.04595",
            "accessDate": "2021-06-07T09:23:49Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2010.04595",
            "tags": [],
            "collections": [],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/Q8H2Z875"
            },
            "dateAdded": "2021-06-07T09:23:49Z",
            "dateModified": "2021-06-20T16:24:58Z"
        }
    },
    {
        "key": "ZN4Q97I6",
        "version": 94,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/ZN4Q97I6",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/ZN4Q97I6",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sohl-Dickstein et al.",
            "parsedDate": "2015-11-18",
            "numChildren": 1
        },
        "data": {
            "key": "ZN4Q97I6",
            "version": 94,
            "itemType": "journalArticle",
            "title": "Deep Unsupervised Learning using Nonequilibrium Thermodynamics",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Jascha",
                    "lastName": "Sohl-Dickstein"
                },
                {
                    "creatorType": "author",
                    "firstName": "Eric A.",
                    "lastName": "Weiss"
                },
                {
                    "creatorType": "author",
                    "firstName": "Niru",
                    "lastName": "Maheswaranathan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Surya",
                    "lastName": "Ganguli"
                }
            ],
            "abstractNote": "A central problem in machine learning involves modeling complex data-sets using highly ﬂexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both ﬂexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly ﬂexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.",
            "publicationTitle": "arXiv:1503.03585 [cond-mat, q-bio, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2015-11-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "en",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/1503.03585",
            "accessDate": "2021-06-20T14:32:45Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 1503.03585",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {
                "dc:replaces": "http://zotero.org/users/7902311/items/UBPKFFDI"
            },
            "dateAdded": "2021-06-20T14:32:45Z",
            "dateModified": "2021-06-20T16:24:57Z"
        }
    },
    {
        "key": "S8D7ZN3X",
        "version": 96,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/S8D7ZN3X",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/S8D7ZN3X",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sasaki et al.",
            "parsedDate": "2021-04-12",
            "numChildren": 3
        },
        "data": {
            "key": "S8D7ZN3X",
            "version": 96,
            "itemType": "journalArticle",
            "title": "UNIT-DDPM: UNpaired Image Translation with Denoising Diffusion Probabilistic Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Hiroshi",
                    "lastName": "Sasaki"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chris G.",
                    "lastName": "Willcocks"
                },
                {
                    "creatorType": "author",
                    "firstName": "Toby P.",
                    "lastName": "Breckon"
                }
            ],
            "abstractNote": "We propose a novel unpaired image-to-image translation method that uses denoising diffusion probabilistic models without requiring adversarial training. Our method, UNpaired Image Translation with Denoising Diffusion Probabilistic Models (UNIT-DDPM), trains a generative model to infer the joint distribution of images over both domains as a Markov chain by minimising a denoising score matching objective conditioned on the other domain. In particular, we update both domain translation models simultaneously, and we generate target domain images by a denoising Markov Chain Monte Carlo approach that is conditioned on the input source domain images, based on Langevin dynamics. Our approach provides stable model training for image-to-image translation and generates high-quality image outputs. This enables state-of-the-art Fr\\'echet Inception Distance (FID) performance on several public datasets, including both colour and multispectral imagery, significantly outperforming the contemporary adversarial image-to-image translation methods.",
            "publicationTitle": "arXiv:2104.05358 [cs, eess]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-12",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "UNIT-DDPM",
            "url": "http://arxiv.org/abs/2104.05358",
            "accessDate": "2021-06-20T16:22:50Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.05358",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:22:50Z",
            "dateModified": "2021-06-20T16:22:50Z"
        }
    },
    {
        "key": "SEIXRRIR",
        "version": 96,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/SEIXRRIR",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/SEIXRRIR",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Dhariwal and Nichol",
            "parsedDate": "2021-06-01",
            "numChildren": 3
        },
        "data": {
            "key": "SEIXRRIR",
            "version": 96,
            "itemType": "journalArticle",
            "title": "Diffusion Models Beat GANs on Image Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Prafulla",
                    "lastName": "Dhariwal"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alex",
                    "lastName": "Nichol"
                }
            ],
            "abstractNote": "We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128$\\times$128, 4.59 on ImageNet 256$\\times$256, and 7.72 on ImageNet 512$\\times$512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256$\\times$256 and 3.85 on ImageNet 512$\\times$512. We release our code at https://github.com/openai/guided-diffusion",
            "publicationTitle": "arXiv:2105.05233 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-06-01",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2105.05233",
            "accessDate": "2021-06-20T16:04:05Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2105.05233",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:05Z",
            "dateModified": "2021-06-20T16:04:05Z"
        }
    },
    {
        "key": "74A95424",
        "version": 360,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/74A95424",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/74A95424",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Hedman et al.",
            "parsedDate": "2021-03-26",
            "numChildren": 3
        },
        "data": {
            "key": "74A95424",
            "version": 360,
            "itemType": "journalArticle",
            "title": "Baking Neural Radiance Fields for Real-Time View Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Peter",
                    "lastName": "Hedman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul P.",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Mildenhall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Paul",
                    "lastName": "Debevec"
                }
            ],
            "abstractNote": "Neural volumetric representations such as Neural Radiance Fields (NeRF) have emerged as a compelling technique for learning to represent 3D scenes from images with the goal of rendering photorealistic images of the scene from unobserved viewpoints. However, NeRF's computational requirements are prohibitive for real-time applications: rendering views from a trained NeRF requires querying a multilayer perceptron (MLP) hundreds of times per ray. We present a method to train a NeRF, then precompute and store (i.e. \"bake\") it as a novel representation called a Sparse Neural Radiance Grid (SNeRG) that enables real-time rendering on commodity hardware. To achieve this, we introduce 1) a reformulation of NeRF's architecture, and 2) a sparse voxel grid representation with learned feature vectors. The resulting scene representation retains NeRF's ability to render fine geometric details and view-dependent appearance, is compact (averaging less than 90 MB per scene), and can be rendered in real-time (higher than 30 frames per second on a laptop GPU). Actual screen captures are shown in our video.",
            "publicationTitle": "arXiv:2103.14645 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-26",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2103.14645",
            "accessDate": "2021-06-20T16:04:03Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.14645",
            "tags": [],
            "collections": [
                "8CLBFGR2",
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/8YL49DGT"
            },
            "dateAdded": "2021-06-20T16:04:03Z",
            "dateModified": "2021-06-20T16:04:03Z"
        }
    },
    {
        "key": "MCKXIXQG",
        "version": 198,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/MCKXIXQG",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/MCKXIXQG",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "DeVries et al.",
            "parsedDate": "2021-04-01",
            "numChildren": 2
        },
        "data": {
            "key": "MCKXIXQG",
            "version": 198,
            "itemType": "journalArticle",
            "title": "Unconstrained Scene Generation with Locally Conditioned Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Terrance",
                    "lastName": "DeVries"
                },
                {
                    "creatorType": "author",
                    "firstName": "Miguel Angel",
                    "lastName": "Bautista"
                },
                {
                    "creatorType": "author",
                    "firstName": "Nitish",
                    "lastName": "Srivastava"
                },
                {
                    "creatorType": "author",
                    "firstName": "Graham W.",
                    "lastName": "Taylor"
                },
                {
                    "creatorType": "author",
                    "firstName": "Joshua M.",
                    "lastName": "Susskind"
                }
            ],
            "abstractNote": "We tackle the challenge of learning a distribution over complex, realistic, indoor scenes. In this paper, we introduce Generative Scene Networks (GSN), which learns to decompose scenes into a collection of many local radiance fields that can be rendered from a free moving camera. Our model can be used as a prior to generate new scenes, or to complete a scene given only sparse 2D observations. Recent work has shown that generative models of radiance fields can capture properties such as multi-view consistency and view-dependent lighting. However, these models are specialized for constrained viewing of single objects, such as cars or faces. Due to the size and complexity of realistic indoor environments, existing models lack the representational capacity to adequately capture them. Our decomposition scheme scales to larger and more complex scenes while preserving details and diversity, and the learned prior enables high-quality rendering from viewpoints that are significantly different from observed viewpoints. When compared to existing models, GSN produces quantitatively higher-quality scene renderings across several different scene datasets.",
            "publicationTitle": "arXiv:2104.00670 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-01",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2104.00670",
            "accessDate": "2021-06-20T16:04:03Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.00670",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:03Z",
            "dateModified": "2021-06-20T16:04:03Z"
        }
    },
    {
        "key": "EDN42UE6",
        "version": 96,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/EDN42UE6",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/EDN42UE6",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Chen et al.",
            "parsedDate": "2021-03-29",
            "numChildren": 2
        },
        "data": {
            "key": "EDN42UE6",
            "version": 96,
            "itemType": "journalArticle",
            "title": "MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Anpei",
                    "lastName": "Chen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zexiang",
                    "lastName": "Xu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Fuqiang",
                    "lastName": "Zhao"
                },
                {
                    "creatorType": "author",
                    "firstName": "Xiaoshuai",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Fanbo",
                    "lastName": "Xiang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jingyi",
                    "lastName": "Yu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Hao",
                    "lastName": "Su"
                }
            ],
            "abstractNote": "We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis. Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference. Our approach leverages plane-swept cost volumes (widely used in multi-view stereo) for geometry-aware scene reasoning, and combines this with physically based volume rendering for neural radiance field reconstruction. We train our network on real objects in the DTU dataset, and test it on three different datasets to evaluate its effectiveness and generalizability. Our approach can generalize across scenes (even indoor scenes, completely different from our training scenes of objects) and generate realistic view synthesis results using only three input images, significantly outperforming concurrent works on generalizable radiance field reconstruction. Moreover, if dense images are captured, our estimated radiance field representation can be easily fine-tuned; this leads to fast per-scene reconstruction with higher rendering quality and substantially less optimization time than NeRF.",
            "publicationTitle": "arXiv:2103.15595 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-29",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "MVSNeRF",
            "url": "http://arxiv.org/abs/2103.15595",
            "accessDate": "2021-06-20T16:04:03Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.15595",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:03Z",
            "dateModified": "2021-06-20T16:04:03Z"
        }
    },
    {
        "key": "BKNP8W5V",
        "version": 82,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/BKNP8W5V",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/BKNP8W5V",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sucar et al.",
            "parsedDate": "2021-03-23",
            "numChildren": 2
        },
        "data": {
            "key": "BKNP8W5V",
            "version": 82,
            "itemType": "journalArticle",
            "title": "iMAP: Implicit Mapping and Positioning in Real-Time",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Edgar",
                    "lastName": "Sucar"
                },
                {
                    "creatorType": "author",
                    "firstName": "Shikun",
                    "lastName": "Liu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Joseph",
                    "lastName": "Ortiz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andrew J.",
                    "lastName": "Davison"
                }
            ],
            "abstractNote": "We show for the first time that a multilayer perceptron (MLP) can serve as the only scene representation in a real-time SLAM system for a handheld RGB-D camera. Our network is trained in live operation without prior data, building a dense, scene-specific implicit 3D model of occupancy and colour which is also immediately used for tracking. Achieving real-time SLAM via continual training of a neural network against a live image stream requires significant innovation. Our iMAP algorithm uses a keyframe structure and multi-processing computation flow, with dynamic information-guided pixel sampling for speed, with tracking at 10 Hz and global map updating at 2 Hz. The advantages of an implicit MLP over standard dense SLAM techniques include efficient geometry representation with automatic detail control and smooth, plausible filling-in of unobserved regions such as the back surfaces of objects.",
            "publicationTitle": "arXiv:2103.12352 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-23",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "iMAP",
            "url": "http://arxiv.org/abs/2103.12352",
            "accessDate": "2021-06-20T16:04:01Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.12352",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:02Z",
            "dateModified": "2021-06-20T16:04:02Z"
        }
    },
    {
        "key": "J9YJJ2H5",
        "version": 360,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/J9YJJ2H5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/J9YJJ2H5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Garbin et al.",
            "parsedDate": "2021-04-15",
            "numChildren": 3
        },
        "data": {
            "key": "J9YJJ2H5",
            "version": 360,
            "itemType": "journalArticle",
            "title": "FastNeRF: High-Fidelity Neural Rendering at 200FPS",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Stephan J.",
                    "lastName": "Garbin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Marek",
                    "lastName": "Kowalski"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Johnson"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jamie",
                    "lastName": "Shotton"
                },
                {
                    "creatorType": "author",
                    "firstName": "Julien",
                    "lastName": "Valentin"
                }
            ],
            "abstractNote": "Recent work on Neural Radiance Fields (NeRF) showed how neural networks can be used to encode complex 3D environments that can be rendered photorealistically from novel viewpoints. Rendering these images is very computationally demanding and recent improvements are still a long way from enabling interactive rates, even on high-end hardware. Motivated by scenarios on mobile and mixed reality devices, we propose FastNeRF, the first NeRF-based system capable of rendering high fidelity photorealistic images at 200Hz on a high-end consumer GPU. The core of our method is a graphics-inspired factorization that allows for (i) compactly caching a deep radiance map at each position in space, (ii) efficiently querying that map using ray directions to estimate the pixel values in the rendered image. Extensive experiments show that the proposed method is 3000 times faster than the original NeRF algorithm and at least an order of magnitude faster than existing work on accelerating NeRF, while maintaining visual quality and extensibility.",
            "publicationTitle": "arXiv:2103.10380 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-15",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "FastNeRF",
            "url": "http://arxiv.org/abs/2103.10380",
            "accessDate": "2021-06-20T16:04:01Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.10380",
            "tags": [],
            "collections": [
                "8CLBFGR2",
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/XLVM7R6C"
            },
            "dateAdded": "2021-06-20T16:04:01Z",
            "dateModified": "2021-06-20T16:04:01Z"
        }
    },
    {
        "key": "Q3DXFQ7X",
        "version": 96,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/Q3DXFQ7X",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/Q3DXFQ7X",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Neff et al.",
            "parsedDate": "2021-05-11",
            "numChildren": 3
        },
        "data": {
            "key": "Q3DXFQ7X",
            "version": 96,
            "itemType": "journalArticle",
            "title": "DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Thomas",
                    "lastName": "Neff"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pascal",
                    "lastName": "Stadlbauer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mathias",
                    "lastName": "Parger"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andreas",
                    "lastName": "Kurz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Joerg H.",
                    "lastName": "Mueller"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chakravarty R. Alla",
                    "lastName": "Chaitanya"
                },
                {
                    "creatorType": "author",
                    "firstName": "Anton",
                    "lastName": "Kaplanyan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Markus",
                    "lastName": "Steinberger"
                }
            ],
            "abstractNote": "The recent research explosion around implicit neural representations, such as NeRF, shows that there is immense potential for implicitly storing high-quality scene and lighting information in neural networks. However, one major limitation preventing the use of NeRF in interactive and real-time rendering applications is the prohibitive computational cost of excessive network evaluations along each view ray, requiring dozens of petaFLOPS when aiming for real-time rendering on consumer hardware. In this work, we take a step towards bringing neural representations closer to practical rendering of synthetic content in interactive and real-time applications, such as games and virtual reality. We show that the number of samples required for each view ray can be significantly reduced when local samples are placed around surfaces in the scene. To this end, we propose a depth oracle network, which predicts ray sample locations for each view ray with a single network evaluation. We show that using a classification network around logarithmically discretized and spherically warped depth values is essential to encode surface locations rather than directly estimating depth. The combination of these techniques leads to DONeRF, a dual network design with a depth oracle network as a first step and a locally sampled shading network for ray accumulation. With our design, we reduce the inference costs by up to 48x compared to NeRF. Using an off-the-shelf inference API in combination with simple compute kernels, we are the first to render raymarching-based neural representations at interactive frame rates (15 frames per second at 800x800) on a single GPU. At the same time, since we focus on the important parts of the scene around surfaces, we achieve equal or better quality compared to NeRF to enable interactive high-quality rendering.",
            "publicationTitle": "arXiv:2103.03231 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-11",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "DONeRF",
            "url": "http://arxiv.org/abs/2103.03231",
            "accessDate": "2021-06-20T16:04:01Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.03231",
            "tags": [],
            "collections": [
                "8CLBFGR2",
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:01Z",
            "dateModified": "2021-06-20T16:04:01Z"
        }
    },
    {
        "key": "QNXL2EIJ",
        "version": 82,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/QNXL2EIJ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/QNXL2EIJ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lombardi et al.",
            "parsedDate": "2021-05-06",
            "numChildren": 3
        },
        "data": {
            "key": "QNXL2EIJ",
            "version": 82,
            "itemType": "journalArticle",
            "title": "Mixture of Volumetric Primitives for Efficient Neural Rendering",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Stephen",
                    "lastName": "Lombardi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tomas",
                    "lastName": "Simon"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gabriel",
                    "lastName": "Schwartz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Zollhoefer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yaser",
                    "lastName": "Sheikh"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jason",
                    "lastName": "Saragih"
                }
            ],
            "abstractNote": "Real-time rendering and animation of humans is a core function in games, movies, and telepresence applications. Existing methods have a number of drawbacks we aim to address with our work. Triangle meshes have difficulty modeling thin structures like hair, volumetric representations like Neural Volumes are too low-resolution given a reasonable memory budget, and high-resolution implicit representations like Neural Radiance Fields are too slow for use in real-time applications. We present Mixture of Volumetric Primitives (MVP), a representation for rendering dynamic 3D content that combines the completeness of volumetric representations with the efficiency of primitive-based rendering, e.g., point-based or mesh-based methods. Our approach achieves this by leveraging spatially shared computation with a deconvolutional architecture and by minimizing computation in empty regions of space with volumetric primitives that can move to cover only occupied regions. Our parameterization supports the integration of correspondence and tracking constraints, while being robust to areas where classical tracking fails, such as around thin or translucent structures and areas with large topological variability. MVP is a hybrid that generalizes both volumetric and primitive-based representations. Through a series of extensive experiments we demonstrate that it inherits the strengths of each, while avoiding many of their limitations. We also compare our approach to several state-of-the-art methods and demonstrate that MVP produces superior results in terms of quality and runtime performance.",
            "publicationTitle": "arXiv:2103.01954 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-06",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2103.01954",
            "accessDate": "2021-06-20T16:04:00Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.01954",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:00Z",
            "dateModified": "2021-06-20T16:04:00Z"
        }
    },
    {
        "key": "N9TACYGJ",
        "version": 82,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/N9TACYGJ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/N9TACYGJ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Xiang et al.",
            "parsedDate": "2021-03-01",
            "numChildren": 2
        },
        "data": {
            "key": "N9TACYGJ",
            "version": 82,
            "itemType": "journalArticle",
            "title": "NeuTex: Neural Texture Mapping for Volumetric Neural Rendering",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Fanbo",
                    "lastName": "Xiang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zexiang",
                    "lastName": "Xu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Miloš",
                    "lastName": "Hašan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yannick",
                    "lastName": "Hold-Geoffroy"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kalyan",
                    "lastName": "Sunkavalli"
                },
                {
                    "creatorType": "author",
                    "firstName": "Hao",
                    "lastName": "Su"
                }
            ],
            "abstractNote": "Recent work has demonstrated that volumetric scene representations combined with differentiable volume rendering can enable photo-realistic rendering for challenging scenes that mesh reconstruction fails on. However, these methods entangle geometry and appearance in a \"black-box\" volume that cannot be edited. Instead, we present an approach that explicitly disentangles geometry--represented as a continuous 3D volume--from appearance--represented as a continuous 2D texture map. We achieve this by introducing a 3D-to-2D texture mapping (or surface parameterization) network into volumetric representations. We constrain this texture mapping network using an additional 2D-to-3D inverse mapping network and a novel cycle consistency loss to make 3D surface points map to 2D texture points that map back to the original 3D points. We demonstrate that this representation can be reconstructed using only multi-view image supervision and generates high-quality rendering results. More importantly, by separating geometry and texture, we allow users to edit appearance by simply editing 2D texture maps.",
            "publicationTitle": "arXiv:2103.00762 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-01",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeuTex",
            "url": "http://arxiv.org/abs/2103.00762",
            "accessDate": "2021-06-20T16:04:00Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2103.00762",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:00Z",
            "dateModified": "2021-06-20T16:04:00Z"
        }
    },
    {
        "key": "BE56GYH3",
        "version": 82,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/BE56GYH3",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/BE56GYH3",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Wang et al.",
            "parsedDate": "2021-04-06",
            "numChildren": 3
        },
        "data": {
            "key": "BE56GYH3",
            "version": 82,
            "itemType": "journalArticle",
            "title": "IBRNet: Learning Multi-View Image-Based Rendering",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Qianqian",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Zhicheng",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kyle",
                    "lastName": "Genova"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Howard",
                    "lastName": "Zhou"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ricardo",
                    "lastName": "Martin-Brualla"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noah",
                    "lastName": "Snavely"
                },
                {
                    "creatorType": "author",
                    "firstName": "Thomas",
                    "lastName": "Funkhouser"
                }
            ],
            "abstractNote": "We present a method that synthesizes novel views of complex scenes by interpolating a sparse set of nearby views. The core of our method is a network architecture that includes a multilayer perceptron and a ray transformer that estimates radiance and volume density at continuous 5D locations (3D spatial locations and 2D viewing directions), drawing appearance information on the fly from multiple source views. By drawing on source views at render time, our method hearkens back to classic work on image-based rendering (IBR), and allows us to render high-resolution imagery. Unlike neural scene representation work that optimizes per-scene functions for rendering, we learn a generic view interpolation function that generalizes to novel scenes. We render images using classic volume rendering, which is fully differentiable and allows us to train using only multi-view posed images as supervision. Experiments show that our method outperforms recent novel view synthesis methods that also seek to generalize to novel scenes. Further, if fine-tuned on each scene, our method is competitive with state-of-the-art single-scene neural rendering methods. Project page: https://ibrnet.github.io/",
            "publicationTitle": "arXiv:2102.13090 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-06",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "IBRNet",
            "url": "http://arxiv.org/abs/2102.13090",
            "accessDate": "2021-06-20T16:03:59Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2102.13090",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:04:00Z",
            "dateModified": "2021-06-20T16:04:00Z"
        }
    },
    {
        "key": "UQPB3TH8",
        "version": 81,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/UQPB3TH8",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/UQPB3TH8",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Rematas et al.",
            "parsedDate": "2021-02-17",
            "numChildren": 3
        },
        "data": {
            "key": "UQPB3TH8",
            "version": 81,
            "itemType": "journalArticle",
            "title": "ShaRF: Shape-conditioned Radiance Fields from a Single View",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Konstantinos",
                    "lastName": "Rematas"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ricardo",
                    "lastName": "Martin-Brualla"
                },
                {
                    "creatorType": "author",
                    "firstName": "Vittorio",
                    "lastName": "Ferrari"
                }
            ],
            "abstractNote": "We present a method for estimating neural scenes representations of objects given only a single image. The core of our method is the estimation of a geometric scaffold for the object and its use as a guide for the reconstruction of the underlying radiance field. Our formulation is based on a generative process that first maps a latent code to a voxelized shape, and then renders it to an image, with the object appearance being controlled by a second latent code. During inference, we optimize both the latent codes and the networks to fit a test image of a new object. The explicit disentanglement of shape and appearance allows our model to be fine-tuned given a single image. We can then render new views in a geometrically consistent manner and they represent faithfully the input object. Additionally, our method is able to generalize to images outside of the training domain (more realistic renderings and even real photographs). Finally, the inferred geometric scaffold is itself an accurate estimate of the object's 3D shape. We demonstrate in several experiments the effectiveness of our approach in both synthetic and real images.",
            "publicationTitle": "arXiv:2102.08860 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-02-17",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "ShaRF",
            "url": "http://arxiv.org/abs/2102.08860",
            "accessDate": "2021-06-20T16:03:59Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2102.08860",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:59Z",
            "dateModified": "2021-06-20T16:03:59Z"
        }
    },
    {
        "key": "VDKL49H5",
        "version": 81,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VDKL49H5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VDKL49H5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Takikawa et al.",
            "parsedDate": "2021-01-26",
            "numChildren": 2
        },
        "data": {
            "key": "VDKL49H5",
            "version": 81,
            "itemType": "journalArticle",
            "title": "Neural Geometric Level of Detail: Real-time Rendering with Implicit 3D Shapes",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Towaki",
                    "lastName": "Takikawa"
                },
                {
                    "creatorType": "author",
                    "firstName": "Joey",
                    "lastName": "Litalien"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kangxue",
                    "lastName": "Yin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Karsten",
                    "lastName": "Kreis"
                },
                {
                    "creatorType": "author",
                    "firstName": "Charles",
                    "lastName": "Loop"
                },
                {
                    "creatorType": "author",
                    "firstName": "Derek",
                    "lastName": "Nowrouzezahrai"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alec",
                    "lastName": "Jacobson"
                },
                {
                    "creatorType": "author",
                    "firstName": "Morgan",
                    "lastName": "McGuire"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sanja",
                    "lastName": "Fidler"
                }
            ],
            "abstractNote": "Neural signed distance functions (SDFs) are emerging as an effective representation for 3D shapes. State-of-the-art methods typically encode the SDF with a large, fixed-size neural network to approximate complex shapes with implicit surfaces. Rendering with these large networks is, however, computationally expensive since it requires many forward passes through the network for every pixel, making these representations impractical for real-time graphics. We introduce an efficient neural representation that, for the first time, enables real-time rendering of high-fidelity neural SDFs, while achieving state-of-the-art geometry reconstruction quality. We represent implicit surfaces using an octree-based feature volume which adaptively fits shapes with multiple discrete levels of detail (LODs), and enables continuous LOD with SDF interpolation. We further develop an efficient algorithm to directly render our novel neural SDF representation in real-time by querying only the necessary LODs with sparse octree traversal. We show that our representation is 2-3 orders of magnitude more efficient in terms of rendering speed compared to previous works. Furthermore, it produces state-of-the-art reconstruction quality for complex shapes under both 3D geometric and 2D image-space metrics.",
            "publicationTitle": "arXiv:2101.10994 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-01-26",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Neural Geometric Level of Detail",
            "url": "http://arxiv.org/abs/2101.10994",
            "accessDate": "2021-06-20T16:03:58Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2101.10994",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:59Z",
            "dateModified": "2021-06-20T16:03:59Z"
        }
    },
    {
        "key": "IWDM4ZF5",
        "version": 95,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/IWDM4ZF5",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/IWDM4ZF5",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Wang et al.",
            "parsedDate": "2020-12-17",
            "numChildren": 2
        },
        "data": {
            "key": "IWDM4ZF5",
            "version": 95,
            "itemType": "journalArticle",
            "title": "Learning Compositional Radiance Fields of Dynamic Human Heads",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ziyan",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Timur",
                    "lastName": "Bagautdinov"
                },
                {
                    "creatorType": "author",
                    "firstName": "Stephen",
                    "lastName": "Lombardi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tomas",
                    "lastName": "Simon"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jason",
                    "lastName": "Saragih"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jessica",
                    "lastName": "Hodgins"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Zollhöfer"
                }
            ],
            "abstractNote": "Photorealistic rendering of dynamic humans is an important ability for telepresence systems, virtual shopping, synthetic data generation, and more. Recently, neural rendering methods, which combine techniques from computer graphics and machine learning, have created high-fidelity models of humans and objects. Some of these methods do not produce results with high-enough fidelity for driveable human models (Neural Volumes) whereas others have extremely long rendering times (NeRF). We propose a novel compositional 3D representation that combines the best of previous methods to produce both higher-resolution and faster results. Our representation bridges the gap between discrete and continuous volumetric representations by combining a coarse 3D-structure-aware grid of animation codes with a continuous learned scene function that maps every position and its corresponding local animation code to its view-dependent emitted radiance and local volume density. Differentiable volume rendering is employed to compute photo-realistic novel views of the human head and upper body as well as to train our novel representation end-to-end using only 2D supervision. In addition, we show that the learned dynamic radiance field can be used to synthesize novel unseen expressions based on a global animation code. Our approach achieves state-of-the-art results for synthesizing novel views of dynamic human heads and the upper body.",
            "publicationTitle": "arXiv:2012.09955 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-17",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2012.09955",
            "accessDate": "2021-06-20T16:03:58Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.09955",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:58Z",
            "dateModified": "2021-06-20T16:03:58Z"
        }
    },
    {
        "key": "75JSFXK4",
        "version": 81,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/75JSFXK4",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/75JSFXK4",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Peng et al.",
            "parsedDate": "2021-03-29",
            "numChildren": 3
        },
        "data": {
            "key": "75JSFXK4",
            "version": 81,
            "itemType": "journalArticle",
            "title": "Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Sida",
                    "lastName": "Peng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yuanqing",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yinghao",
                    "lastName": "Xu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Qianqian",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Qing",
                    "lastName": "Shuai"
                },
                {
                    "creatorType": "author",
                    "firstName": "Hujun",
                    "lastName": "Bao"
                },
                {
                    "creatorType": "author",
                    "firstName": "Xiaowei",
                    "lastName": "Zhou"
                }
            ],
            "abstractNote": "This paper addresses the challenge of novel view synthesis for a human performer from a very sparse set of camera views. Some recent works have shown that learning implicit neural representations of 3D scenes achieves remarkable view synthesis quality given dense input views. However, the representation learning will be ill-posed if the views are highly sparse. To solve this ill-posed problem, our key idea is to integrate observations over video frames. To this end, we propose Neural Body, a new human body representation which assumes that the learned neural representations at different frames share the same set of latent codes anchored to a deformable mesh, so that the observations across frames can be naturally integrated. The deformable mesh also provides geometric guidance for the network to learn 3D representations more efficiently. To evaluate our approach, we create a multi-view dataset named ZJU-MoCap that captures performers with complex motions. Experiments on ZJU-MoCap show that our approach outperforms prior works by a large margin in terms of novel view synthesis quality. We also demonstrate the capability of our approach to reconstruct a moving person from a monocular video on the People-Snapshot dataset. The code and dataset are available at https://zju3dv.github.io/neuralbody/.",
            "publicationTitle": "arXiv:2012.15838 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-29",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Neural Body",
            "url": "http://arxiv.org/abs/2012.15838",
            "accessDate": "2021-06-20T16:03:58Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.15838",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:58Z",
            "dateModified": "2021-06-20T16:03:58Z"
        }
    },
    {
        "key": "WMXXCCEA",
        "version": 171,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/WMXXCCEA",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/WMXXCCEA",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tretschk et al.",
            "parsedDate": "2021-02-26",
            "numChildren": 3
        },
        "data": {
            "key": "WMXXCCEA",
            "version": 171,
            "itemType": "journalArticle",
            "title": "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Edgar",
                    "lastName": "Tretschk"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ayush",
                    "lastName": "Tewari"
                },
                {
                    "creatorType": "author",
                    "firstName": "Vladislav",
                    "lastName": "Golyanik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Zollhöfer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christoph",
                    "lastName": "Lassner"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Theobalt"
                }
            ],
            "abstractNote": "We present Non-Rigid Neural Radiance Fields (NR-NeRF), a reconstruction and novel view synthesis approach for general non-rigid dynamic scenes. Our approach takes RGB images of a dynamic scene as input, e.g., from a monocular video recording, and creates a high-quality space-time geometry and appearance representation. In particular, we show that even a single handheld consumer-grade camera is sufficient to synthesize sophisticated renderings of a dynamic scene from novel virtual camera views, for example a `bullet-time' video effect. Our method disentangles the dynamic scene into a canonical volume and its deformation. Scene deformation is implemented as ray bending, where straight rays are deformed non-rigidly to represent scene motion. We also propose a novel rigidity regression network that enables us to better constrain rigid regions of the scene, which leads to more stable results. The ray bending and rigidity network are trained without any explicit supervision. In addition to novel view synthesis, our formulation enables dense correspondence estimation across views and time, as well as compelling video editing applications such as motion exaggeration. We demonstrate the effectiveness of our method using extensive evaluations, including ablation studies and comparisons to the state of the art. We urge the reader to watch the supplemental video for qualitative results. Our code will be open sourced.",
            "publicationTitle": "arXiv:2012.12247 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-02-26",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Non-Rigid Neural Radiance Fields",
            "url": "http://arxiv.org/abs/2012.12247",
            "accessDate": "2021-06-20T16:03:58Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.12247",
            "tags": [],
            "collections": [
                "CPYKW3PF",
                "I7H95BKE"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:58Z",
            "dateModified": "2021-06-20T16:03:58Z"
        }
    },
    {
        "key": "VYTC7KJ6",
        "version": 81,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VYTC7KJ6",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VYTC7KJ6",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Liu et al.",
            "parsedDate": "2020-12-18",
            "numChildren": 3
        },
        "data": {
            "key": "VYTC7KJ6",
            "version": 81,
            "itemType": "journalArticle",
            "title": "Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Andrew",
                    "lastName": "Liu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Richard",
                    "lastName": "Tucker"
                },
                {
                    "creatorType": "author",
                    "firstName": "Varun",
                    "lastName": "Jampani"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ameesh",
                    "lastName": "Makadia"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noah",
                    "lastName": "Snavely"
                },
                {
                    "creatorType": "author",
                    "firstName": "Angjoo",
                    "lastName": "Kanazawa"
                }
            ],
            "abstractNote": "We introduce the problem of perpetual view generation -- long-range generation of novel views corresponding to an arbitrarily long camera trajectory given a single image. This is a challenging problem that goes far beyond the capabilities of current view synthesis methods, which work for a limited range of viewpoints and quickly degenerate when presented with a large camera motion. Methods designed for video generation also have limited ability to produce long video sequences and are often agnostic to scene geometry. We take a hybrid approach that integrates both geometry and image synthesis in an iterative render, refine, and repeat framework, allowing for long-range generation that cover large distances after hundreds of frames. Our approach can be trained from a set of monocular video sequences without any manual annotation. We propose a dataset of aerial footage of natural coastal scenes, and compare our method with recent view synthesis and conditional video generation baselines, showing that it can generate plausible scenes for much longer time horizons over large camera trajectories compared to existing methods. Please visit our project page at https://infinite-nature.github.io/.",
            "publicationTitle": "arXiv:2012.09855 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Infinite Nature",
            "url": "http://arxiv.org/abs/2012.09855",
            "accessDate": "2021-06-20T16:03:57Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.09855",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:57Z",
            "dateModified": "2021-06-20T16:03:57Z"
        }
    },
    {
        "key": "9YFDKDRF",
        "version": 199,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/9YFDKDRF",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/9YFDKDRF",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Hu et al.",
            "parsedDate": "2021-04-16",
            "numChildren": 3
        },
        "data": {
            "key": "9YFDKDRF",
            "version": 199,
            "itemType": "journalArticle",
            "title": "Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ronghang",
                    "lastName": "Hu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Nikhila",
                    "lastName": "Ravi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alex",
                    "lastName": "Berg"
                },
                {
                    "creatorType": "author",
                    "firstName": "Deepak",
                    "lastName": "Pathak"
                }
            ],
            "abstractNote": "We present Worldsheet, a method for novel view synthesis using just a single RGB image as input. The main insight is that simply shrink-wrapping a planar mesh sheet onto the input image, consistent with the learned intermediate depth, captures underlying geometry sufficient to generate photorealistic unseen views with large viewpoint changes. To operationalize this, we propose a novel differentiable texture sampler that allows our wrapped mesh sheet to be textured and rendered differentiably into an image from a target viewpoint. Our approach is category-agnostic, end-to-end trainable without using any 3D supervision, and requires a single image at test time. We also explore a simple extension by stacking multiple layers of Worldsheets to better handle occlusions. Worldsheet consistently outperforms prior state-of-the-art methods on single-image view synthesis across several datasets. Furthermore, this simple idea captures novel views surprisingly well on a wide range of high-resolution in-the-wild images, converting them into navigable 3D pop-ups. Video results and code at https://worldsheet.github.io.",
            "publicationTitle": "arXiv:2012.09854 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-16",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Worldsheet",
            "url": "http://arxiv.org/abs/2012.09854",
            "accessDate": "2021-06-20T16:03:56Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.09854",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:57Z",
            "dateModified": "2021-06-20T16:03:57Z"
        }
    },
    {
        "key": "6L8GNRIL",
        "version": 174,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/6L8GNRIL",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/6L8GNRIL",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Gao et al.",
            "parsedDate": "2021-04-16",
            "numChildren": 3
        },
        "data": {
            "key": "6L8GNRIL",
            "version": 174,
            "itemType": "journalArticle",
            "title": "Portrait Neural Radiance Fields from a Single Image",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Chen",
                    "lastName": "Gao"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yichang",
                    "lastName": "Shih"
                },
                {
                    "creatorType": "author",
                    "firstName": "Wei-Sheng",
                    "lastName": "Lai"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chia-Kai",
                    "lastName": "Liang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jia-Bin",
                    "lastName": "Huang"
                }
            ],
            "abstractNote": "We present a method for estimating Neural Radiance Fields (NeRF) from a single headshot portrait. While NeRF has demonstrated high-quality view synthesis, it requires multiple images of static scenes and thus impractical for casual captures and moving subjects. In this work, we propose to pretrain the weights of a multilayer perceptron (MLP), which implicitly models the volumetric density and colors, with a meta-learning framework using a light stage portrait dataset. To improve the generalization to unseen faces, we train the MLP in the canonical coordinate space approximated by 3D face morphable models. We quantitatively evaluate the method using controlled captures and demonstrate the generalization to real portrait images, showing favorable results against state-of-the-arts.",
            "publicationTitle": "arXiv:2012.05903 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-16",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2012.05903",
            "accessDate": "2021-06-20T16:03:56Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.05903",
            "tags": [],
            "collections": [
                "CPYKW3PF",
                "KFHEDKK5"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:56Z",
            "dateModified": "2021-06-20T16:03:56Z"
        }
    },
    {
        "key": "ZT43H7LK",
        "version": 81,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/ZT43H7LK",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/ZT43H7LK",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Gafni et al.",
            "parsedDate": "2020-12-05",
            "numChildren": 3
        },
        "data": {
            "key": "ZT43H7LK",
            "version": 81,
            "itemType": "journalArticle",
            "title": "Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Guy",
                    "lastName": "Gafni"
                },
                {
                    "creatorType": "author",
                    "firstName": "Justus",
                    "lastName": "Thies"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Zollhöfer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthias",
                    "lastName": "Nießner"
                }
            ],
            "abstractNote": "We present dynamic neural radiance fields for modeling the appearance and dynamics of a human face. Digitally modeling and reconstructing a talking human is a key building-block for a variety of applications. Especially, for telepresence applications in AR or VR, a faithful reproduction of the appearance including novel viewpoints or head-poses is required. In contrast to state-of-the-art approaches that model the geometry and material properties explicitly, or are purely image-based, we introduce an implicit representation of the head based on scene representation networks. To handle the dynamics of the face, we combine our scene representation network with a low-dimensional morphable model which provides explicit control over pose and expressions. We use volumetric rendering to generate images from this hybrid representation and demonstrate that such a dynamic neural scene representation can be learned from monocular input data only, without the need of a specialized capture setup. In our experiments, we show that this learned volumetric representation allows for photo-realistic image generation that surpasses the quality of state-of-the-art video-based reenactment methods.",
            "publicationTitle": "arXiv:2012.03065 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-05",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2012.03065",
            "accessDate": "2021-06-20T16:03:56Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.03065",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:56Z",
            "dateModified": "2021-06-20T16:03:56Z"
        }
    },
    {
        "key": "HTAUNQ6X",
        "version": 95,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HTAUNQ6X",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HTAUNQ6X",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Guo et al.",
            "parsedDate": "2020-12-15",
            "numChildren": 3
        },
        "data": {
            "key": "HTAUNQ6X",
            "version": 95,
            "itemType": "journalArticle",
            "title": "Object-Centric Neural Scene Rendering",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Michelle",
                    "lastName": "Guo"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alireza",
                    "lastName": "Fathi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiajun",
                    "lastName": "Wu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Thomas",
                    "lastName": "Funkhouser"
                }
            ],
            "abstractNote": "We present a method for composing photorealistic scenes from captured images of objects. Our work builds upon neural radiance fields (NeRFs), which implicitly model the volumetric density and directionally-emitted radiance of a scene. While NeRFs synthesize realistic pictures, they only model static scenes and are closely tied to specific imaging conditions. This property makes NeRFs hard to generalize to new scenarios, including new lighting or new arrangements of objects. Instead of learning a scene radiance field as a NeRF does, we propose to learn object-centric neural scattering functions (OSFs), a representation that models per-object light transport implicitly using a lighting- and view-dependent neural network. This enables rendering scenes even when objects or lights move, without retraining. Combined with a volumetric path tracing procedure, our framework is capable of rendering both intra- and inter-object light transport effects including occlusions, specularities, shadows, and indirect illumination. We evaluate our approach on scene composition and show that it generalizes to novel illumination conditions, producing photorealistic, physically accurate renderings of multi-object scenes.",
            "publicationTitle": "arXiv:2012.08503 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-15",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2012.08503",
            "accessDate": "2021-06-20T16:03:56Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.08503",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:56Z",
            "dateModified": "2021-06-20T16:03:56Z"
        }
    },
    {
        "key": "4BWLNEMH",
        "version": 95,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/4BWLNEMH",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/4BWLNEMH",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Yen-Chen et al.",
            "parsedDate": "2021-04-01",
            "numChildren": 3
        },
        "data": {
            "key": "4BWLNEMH",
            "version": 95,
            "itemType": "journalArticle",
            "title": "iNeRF: Inverting Neural Radiance Fields for Pose Estimation",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Lin",
                    "lastName": "Yen-Chen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pete",
                    "lastName": "Florence"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alberto",
                    "lastName": "Rodriguez"
                },
                {
                    "creatorType": "author",
                    "firstName": "Phillip",
                    "lastName": "Isola"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tsung-Yi",
                    "lastName": "Lin"
                }
            ],
            "abstractNote": "We present iNeRF, a framework that performs mesh-free pose estimation by \"inverting\" a Neural Radiance Field (NeRF). NeRFs have been shown to be remarkably effective for the task of view synthesis - synthesizing photorealistic novel views of real-world scenes or objects. In this work, we investigate whether we can apply analysis-by-synthesis via NeRF for mesh-free, RGB-only 6DoF pose estimation - given an image, find the translation and rotation of a camera relative to a 3D object or scene. Our method assumes that no object mesh models are available during either training or test time. Starting from an initial pose estimate, we use gradient descent to minimize the residual between pixels rendered from a NeRF and pixels in an observed image. In our experiments, we first study 1) how to sample rays during pose refinement for iNeRF to collect informative gradients and 2) how different batch sizes of rays affect iNeRF on a synthetic dataset. We then show that for complex real-world scenes from the LLFF dataset, iNeRF can improve NeRF by estimating the camera poses of novel images and using these images as additional training data for NeRF. Finally, we show iNeRF can perform category-level object pose estimation, including object instances not seen during training, with RGB images by inverting a NeRF model inferred from a single view.",
            "publicationTitle": "arXiv:2012.05877 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-01",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "iNeRF",
            "url": "http://arxiv.org/abs/2012.05877",
            "accessDate": "2021-06-20T16:03:55Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.05877",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:55Z",
            "dateModified": "2021-06-20T16:03:55Z"
        }
    },
    {
        "key": "EQVWR6WG",
        "version": 80,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/EQVWR6WG",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/EQVWR6WG",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Xu et al.",
            "parsedDate": "2020-12-09",
            "numChildren": 3
        },
        "data": {
            "key": "EQVWR6WG",
            "version": 80,
            "itemType": "journalArticle",
            "title": "Positional Encoding as Spatial Inductive Bias in GANs",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Rui",
                    "lastName": "Xu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Xintao",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kai",
                    "lastName": "Chen"
                },
                {
                    "creatorType": "author",
                    "firstName": "Bolei",
                    "lastName": "Zhou"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chen Change",
                    "lastName": "Loy"
                }
            ],
            "abstractNote": "SinGAN shows impressive capability in learning internal patch distribution despite its limited effective receptive field. We are interested in knowing how such a translation-invariant convolutional generator could capture the global structure with just a spatially i.i.d. input. In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators. Such positional encoding is indispensable for generating images with high fidelity. The same phenomenon is observed in other generative architectures such as DCGAN and PGGAN. We further show that zero padding leads to an unbalanced spatial bias with a vague relation between locations. To offer a better spatial inductive bias, we investigate alternative positional encodings and analyze their effects. Based on a more flexible positional encoding explicitly, we propose a new multi-scale training strategy and demonstrate its effectiveness in the state-of-the-art unconditional generator StyleGAN2. Besides, the explicit spatial inductive bias substantially improves SinGAN for more versatile image manipulation.",
            "publicationTitle": "arXiv:2012.05217 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-09",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2012.05217",
            "accessDate": "2021-06-20T16:03:55Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.05217",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:55Z",
            "dateModified": "2021-06-20T16:03:55Z"
        }
    },
    {
        "key": "VMJXU2HQ",
        "version": 80,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VMJXU2HQ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VMJXU2HQ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Srinivasan et al.",
            "parsedDate": "2020-12-07",
            "numChildren": 3
        },
        "data": {
            "key": "VMJXU2HQ",
            "version": 80,
            "itemType": "journalArticle",
            "title": "NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Pratul P.",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Boyang",
                    "lastName": "Deng"
                },
                {
                    "creatorType": "author",
                    "firstName": "Xiuming",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Mildenhall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                }
            ],
            "abstractNote": "We present a method that takes as input a set of images of a scene illuminated by unconstrained known lighting, and produces as output a 3D representation that can be rendered from novel viewpoints under arbitrary lighting conditions. Our method represents the scene as a continuous volumetric function parameterized as MLPs whose inputs are a 3D location and whose outputs are the following scene properties at that input location: volume density, surface normal, material parameters, distance to the first surface intersection in any direction, and visibility of the external environment in any direction. Together, these allow us to render novel views of the object under arbitrary lighting, including indirect illumination effects. The predicted visibility and surface intersection fields are critical to our model's ability to simulate direct and indirect illumination during training, because the brute-force techniques used by prior work are intractable for lighting conditions outside of controlled setups with a single light. Our method outperforms alternative approaches for recovering relightable 3D scene representations, and performs well in complex lighting settings that have posed a significant challenge to prior work.",
            "publicationTitle": "arXiv:2012.03927 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-07",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeRV",
            "url": "http://arxiv.org/abs/2012.03927",
            "accessDate": "2021-06-20T16:03:55Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.03927",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:55Z",
            "dateModified": "2021-06-20T16:03:55Z"
        }
    },
    {
        "key": "5W2FTFLY",
        "version": 80,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/5W2FTFLY",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/5W2FTFLY",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Boss et al.",
            "parsedDate": "2021-05-19",
            "numChildren": 2
        },
        "data": {
            "key": "5W2FTFLY",
            "version": 80,
            "itemType": "journalArticle",
            "title": "NeRD: Neural Reflectance Decomposition from Image Collections",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Mark",
                    "lastName": "Boss"
                },
                {
                    "creatorType": "author",
                    "firstName": "Raphael",
                    "lastName": "Braun"
                },
                {
                    "creatorType": "author",
                    "firstName": "Varun",
                    "lastName": "Jampani"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ce",
                    "lastName": "Liu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Hendrik P. A.",
                    "lastName": "Lensch"
                }
            ],
            "abstractNote": "Decomposing a scene into its shape, reflectance, and illumination is a challenging but essential problem in computer vision and graphics. This problem is inherently more challenging when the illumination is not a single light source under laboratory conditions but is instead an unconstrained environmental illumination. Though recent work has shown that implicit representations can be used to model the radiance field of an object, these techniques only enable view synthesis and not relighting. Additionally, evaluating these radiance fields is resource and time-intensive. By decomposing a scene into explicit representations, any rendering framework can be leveraged to generate novel views under any illumination in real-time. NeRD is a method that achieves this decomposition by introducing physically-based rendering to neural radiance fields. Even challenging non-Lambertian reflectances, complex geometry, and unknown illumination can be decomposed into high-quality models. The datasets and code are available on the project page: https://markboss.me/publication/2021-nerd/",
            "publicationTitle": "arXiv:2012.03918 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-19",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeRD",
            "url": "http://arxiv.org/abs/2012.03918",
            "accessDate": "2021-06-20T16:03:54Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.03918",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:54Z",
            "dateModified": "2021-06-20T16:03:54Z"
        }
    },
    {
        "key": "43G9UVNV",
        "version": 121,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/43G9UVNV",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/43G9UVNV",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tancik et al.",
            "parsedDate": "2021-03-23",
            "numChildren": 3
        },
        "data": {
            "key": "43G9UVNV",
            "version": 121,
            "itemType": "journalArticle",
            "title": "Learned Initializations for Optimizing Coordinate-Based Neural Representations",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Mildenhall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Terrance",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Divi",
                    "lastName": "Schmidt"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul P.",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ren",
                    "lastName": "Ng"
                }
            ],
            "abstractNote": "Coordinate-based neural representations have shown significant promise as an alternative to discrete, array-based representations for complex low dimensional signals. However, optimizing a coordinate-based network from randomly initialized weights for each new signal is inefficient. We propose applying standard meta-learning algorithms to learn the initial weight parameters for these fully-connected networks based on the underlying class of signals being represented (e.g., images of faces or 3D models of chairs). Despite requiring only a minor change in implementation, using these learned initial weights enables faster convergence during optimization and can serve as a strong prior over the signal class being modeled, resulting in better generalization when only partial observations of a given signal are available. We explore these benefits across a variety of tasks, including representing 2D images, reconstructing CT scans, and recovering 3D shapes and scenes from 2D image observations.",
            "publicationTitle": "arXiv:2012.02189 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-23",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2012.02189",
            "accessDate": "2021-06-20T16:03:53Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.02189",
            "tags": [],
            "collections": [
                "VDV7CTDX"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:53Z",
            "dateModified": "2021-06-20T16:03:53Z"
        }
    },
    {
        "key": "WHVCIMNZ",
        "version": 165,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/WHVCIMNZ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/WHVCIMNZ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lindell et al.",
            "parsedDate": "2021-05-22",
            "numChildren": 3
        },
        "data": {
            "key": "WHVCIMNZ",
            "version": 165,
            "itemType": "journalArticle",
            "title": "AutoInt: Automatic Integration for Fast Neural Volume Rendering",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "David B.",
                    "lastName": "Lindell"
                },
                {
                    "creatorType": "author",
                    "firstName": "Julien N. P.",
                    "lastName": "Martel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gordon",
                    "lastName": "Wetzstein"
                }
            ],
            "abstractNote": "Numerical integration is a foundational technique in scientific computing and is at the core of many computer vision applications. Among these applications, neural volume rendering has recently been proposed as a new paradigm for view synthesis, achieving photorealistic image quality. However, a fundamental obstacle to making these methods practical is the extreme computational and memory requirements caused by the required volume integrations along the rendered rays during training and inference. Millions of rays, each requiring hundreds of forward passes through a neural network are needed to approximate those integrations with Monte Carlo sampling. Here, we propose automatic integration, a new framework for learning efficient, closed-form solutions to integrals using coordinate-based neural networks. For training, we instantiate the computational graph corresponding to the derivative of the network. The graph is fitted to the signal to integrate. After optimization, we reassemble the graph to obtain a network that represents the antiderivative. By the fundamental theorem of calculus, this enables the calculation of any definite integral in two evaluations of the network. Applying this approach to neural rendering, we improve a tradeoff between rendering speed and image quality: improving render times by greater than 10 times with a tradeoff of slightly reduced image quality.",
            "publicationTitle": "arXiv:2012.01714 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-22",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "AutoInt",
            "url": "http://arxiv.org/abs/2012.01714",
            "accessDate": "2021-06-20T16:03:53Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.01714",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:53Z",
            "dateModified": "2021-06-20T16:03:53Z"
        }
    },
    {
        "key": "B5TC93EQ",
        "version": 95,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/B5TC93EQ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/B5TC93EQ",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Pumarola et al.",
            "parsedDate": "2020-11-27",
            "numChildren": 2
        },
        "data": {
            "key": "B5TC93EQ",
            "version": 95,
            "itemType": "journalArticle",
            "title": "D-NeRF: Neural Radiance Fields for Dynamic Scenes",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Albert",
                    "lastName": "Pumarola"
                },
                {
                    "creatorType": "author",
                    "firstName": "Enric",
                    "lastName": "Corona"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gerard",
                    "lastName": "Pons-Moll"
                },
                {
                    "creatorType": "author",
                    "firstName": "Francesc",
                    "lastName": "Moreno-Noguer"
                }
            ],
            "abstractNote": "Neural rendering techniques combining machine learning with geometric reasoning have arisen as one of the most promising approaches for synthesizing novel views of a scene from a sparse set of images. Among these, stands out the Neural radiance fields (NeRF), which trains a deep network to map 5D input coordinates (representing spatial location and viewing direction) into a volume density and view-dependent emitted radiance. However, despite achieving an unprecedented level of photorealism on the generated images, NeRF is only applicable to static scenes, where the same spatial location can be queried from different images. In this paper we introduce D-NeRF, a method that extends neural radiance fields to a dynamic domain, allowing to reconstruct and render novel images of objects under rigid and non-rigid motions from a single camera moving around the scene. For this purpose we consider time as an additional input to the system, and split the learning process in two main stages: one that encodes the scene into a canonical space and another that maps this canonical representation into the deformed scene at a particular time. Both mappings are simultaneously learned using fully-connected networks. Once the networks are trained, D-NeRF can render novel images, controlling both the camera view and the time variable, and thus, the object movement. We demonstrate the effectiveness of our approach on scenes with objects under rigid, articulated and non-rigid motions. Code, model weights and the dynamic scenes dataset will be released.",
            "publicationTitle": "arXiv:2011.13961 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-27",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "D-NeRF",
            "url": "http://arxiv.org/abs/2011.13961",
            "accessDate": "2021-06-20T16:03:53Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.13961",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:53Z",
            "dateModified": "2021-06-20T16:03:53Z"
        }
    },
    {
        "key": "DQHUP86T",
        "version": 80,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/DQHUP86T",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/DQHUP86T",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Chan et al.",
            "parsedDate": "2021-04-05",
            "numChildren": 3
        },
        "data": {
            "key": "DQHUP86T",
            "version": 80,
            "itemType": "journalArticle",
            "title": "pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Eric R.",
                    "lastName": "Chan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Marco",
                    "lastName": "Monteiro"
                },
                {
                    "creatorType": "author",
                    "firstName": "Petr",
                    "lastName": "Kellnhofer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiajun",
                    "lastName": "Wu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gordon",
                    "lastName": "Wetzstein"
                }
            ],
            "abstractNote": "We have witnessed rapid progress on 3D-aware image synthesis, leveraging recent advances in generative visual models and neural rendering. Existing approaches however fall short in two ways: first, they may lack an underlying 3D representation or rely on view-inconsistent rendering, hence synthesizing images that are not multi-view consistent; second, they often depend upon representation network architectures that are not expressive enough, and their results thus lack in image quality. We propose a novel generative model, named Periodic Implicit Generative Adversarial Networks ($\\pi$-GAN or pi-GAN), for high-quality 3D-aware image synthesis. $\\pi$-GAN leverages neural representations with periodic activation functions and volumetric rendering to represent scenes as view-consistent 3D representations with fine detail. The proposed approach obtains state-of-the-art results for 3D-aware image synthesis with multiple real and synthetic datasets.",
            "publicationTitle": "arXiv:2012.00926 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-05",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "pi-GAN",
            "url": "http://arxiv.org/abs/2012.00926",
            "accessDate": "2021-06-20T16:03:52Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2012.00926",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:52Z",
            "dateModified": "2021-06-20T16:03:52Z"
        }
    },
    {
        "key": "93SRFTTC",
        "version": 79,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/93SRFTTC",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/93SRFTTC",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Li et al.",
            "parsedDate": "2021-04-20",
            "numChildren": 3
        },
        "data": {
            "key": "93SRFTTC",
            "version": 79,
            "itemType": "journalArticle",
            "title": "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Zhengqi",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Simon",
                    "lastName": "Niklaus"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noah",
                    "lastName": "Snavely"
                },
                {
                    "creatorType": "author",
                    "firstName": "Oliver",
                    "lastName": "Wang"
                }
            ],
            "abstractNote": "We present a method to perform novel view and time synthesis of dynamic scenes, requiring only a monocular video with known camera poses as input. To do this, we introduce Neural Scene Flow Fields, a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion. Our representation is optimized through a neural network to fit the observed input views. We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion. We conduct a number of experiments that demonstrate our approach significantly outperforms recent monocular view synthesis methods, and show qualitative results of space-time view synthesis on a variety of real-world videos.",
            "publicationTitle": "arXiv:2011.13084 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-20",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.13084",
            "accessDate": "2021-06-20T16:03:51Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.13084",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:52Z",
            "dateModified": "2021-06-20T16:03:52Z"
        }
    },
    {
        "key": "RXML26ZL",
        "version": 180,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/RXML26ZL",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/RXML26ZL",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Xian et al.",
            "parsedDate": "2020-11-25",
            "numChildren": 3
        },
        "data": {
            "key": "RXML26ZL",
            "version": 180,
            "itemType": "journalArticle",
            "title": "Space-time Neural Irradiance Fields for Free-Viewpoint Video",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Wenqi",
                    "lastName": "Xian"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jia-Bin",
                    "lastName": "Huang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Johannes",
                    "lastName": "Kopf"
                },
                {
                    "creatorType": "author",
                    "firstName": "Changil",
                    "lastName": "Kim"
                }
            ],
            "abstractNote": "We present a method that learns a spatiotemporal neural irradiance field for dynamic scenes from a single video. Our learned representation enables free-viewpoint rendering of the input video. Our method builds upon recent advances in implicit representations. Learning a spatiotemporal irradiance field from a single video poses significant challenges because the video contains only one observation of the scene at any point in time. The 3D geometry of a scene can be legitimately represented in numerous ways since varying geometry (motion) can be explained with varying appearance and vice versa. We address this ambiguity by constraining the time-varying geometry of our dynamic scene representation using the scene depth estimated from video depth estimation methods, aggregating contents from individual frames into a single global representation. We provide an extensive quantitative evaluation and demonstrate compelling free-viewpoint rendering results.",
            "publicationTitle": "arXiv:2011.12950 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-25",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.12950",
            "accessDate": "2021-06-20T16:03:51Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.12950",
            "tags": [],
            "collections": [
                "CPYKW3PF",
                "FJNUUAVD"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:51Z",
            "dateModified": "2021-06-20T16:03:51Z"
        }
    },
    {
        "key": "VSEJ2GUV",
        "version": 174,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VSEJ2GUV",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VSEJ2GUV",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Park et al.",
            "parsedDate": "2021-05-13",
            "numChildren": 3
        },
        "data": {
            "key": "VSEJ2GUV",
            "version": 174,
            "itemType": "journalArticle",
            "title": "Nerfies: Deformable Neural Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Keunhong",
                    "lastName": "Park"
                },
                {
                    "creatorType": "author",
                    "firstName": "Utkarsh",
                    "lastName": "Sinha"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sofien",
                    "lastName": "Bouaziz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Dan B.",
                    "lastName": "Goldman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Steven M.",
                    "lastName": "Seitz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ricardo",
                    "lastName": "Martin-Brualla"
                }
            ],
            "abstractNote": "We present the first method capable of photorealistically reconstructing deformable scenes using photos/videos captured casually from mobile phones. Our approach augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. We observe that these NeRF-like deformation fields are prone to local minima, and propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, we propose an elastic regularization of the deformation field that further improves robustness. We show that our method can turn casually captured selfie photos/videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints, which we dub \"nerfies.\" We evaluate our method by collecting time-synchronized data using a rig with two mobile phones, yielding train/validation images of the same pose at different viewpoints. We show that our method faithfully reconstructs non-rigidly deforming scenes and reproduces unseen views with high fidelity.",
            "publicationTitle": "arXiv:2011.12948 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-13",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Nerfies",
            "url": "http://arxiv.org/abs/2011.12948",
            "accessDate": "2021-06-20T16:03:51Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.12948",
            "tags": [],
            "collections": [
                "CPYKW3PF",
                "I7H95BKE",
                "KFHEDKK5"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:51Z",
            "dateModified": "2021-06-20T16:03:51Z"
        }
    },
    {
        "key": "XCPDD5N9",
        "version": 79,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/XCPDD5N9",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/XCPDD5N9",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Li et al.",
            "parsedDate": "2020-11-22",
            "numChildren": 3
        },
        "data": {
            "key": "XCPDD5N9",
            "version": 79,
            "itemType": "journalArticle",
            "title": "Multi-Plane Program Induction with 3D Box Priors",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Yikai",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiayuan",
                    "lastName": "Mao"
                },
                {
                    "creatorType": "author",
                    "firstName": "Xiuming",
                    "lastName": "Zhang"
                },
                {
                    "creatorType": "author",
                    "firstName": "William T.",
                    "lastName": "Freeman"
                },
                {
                    "creatorType": "author",
                    "firstName": "Joshua B.",
                    "lastName": "Tenenbaum"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noah",
                    "lastName": "Snavely"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiajun",
                    "lastName": "Wu"
                }
            ],
            "abstractNote": "We consider two important aspects in understanding and editing images: modeling regular, program-like texture or patterns in 2D planes, and 3D posing of these planes in the scene. Unlike prior work on image-based program synthesis, which assumes the image contains a single visible 2D plane, we present Box Program Induction (BPI), which infers a program-like scene representation that simultaneously models repeated structure on multiple 2D planes, the 3D position and orientation of the planes, and camera parameters, all from a single image. Our model assumes a box prior, i.e., that the image captures either an inner view or an outer view of a box in 3D. It uses neural networks to infer visual cues such as vanishing points, wireframe lines to guide a search-based algorithm to find the program that best explains the image. Such a holistic, structured scene representation enables 3D-aware interactive image editing operations such as inpainting missing pixels, changing camera parameters, and extrapolate the image contents.",
            "publicationTitle": "arXiv:2011.10007 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-22",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.10007",
            "accessDate": "2021-06-20T16:03:50Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.10007",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:50Z",
            "dateModified": "2021-06-20T16:03:50Z"
        }
    },
    {
        "key": "ZNDAS9TD",
        "version": 94,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/ZNDAS9TD",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/ZNDAS9TD",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Rebain et al.",
            "parsedDate": "2020-11-24",
            "numChildren": 2
        },
        "data": {
            "key": "ZNDAS9TD",
            "version": 94,
            "itemType": "journalArticle",
            "title": "DeRF: Decomposed Radiance Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Daniel",
                    "lastName": "Rebain"
                },
                {
                    "creatorType": "author",
                    "firstName": "Wei",
                    "lastName": "Jiang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Soroosh",
                    "lastName": "Yazdani"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ke",
                    "lastName": "Li"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kwang Moo",
                    "lastName": "Yi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andrea",
                    "lastName": "Tagliasacchi"
                }
            ],
            "abstractNote": "With the advent of Neural Radiance Fields (NeRF), neural networks can now render novel views of a 3D scene with quality that fools the human eye. Yet, generating these images is very computationally intensive, limiting their applicability in practical scenarios. In this paper, we propose a technique based on spatial decomposition capable of mitigating this issue. Our key observation is that there are diminishing returns in employing larger (deeper and/or wider) networks. Hence, we propose to spatially decompose a scene and dedicate smaller networks for each decomposed part. When working together, these networks can render the whole scene. This allows us near-constant inference time regardless of the number of decomposed parts. Moreover, we show that a Voronoi spatial decomposition is preferable for this purpose, as it is provably compatible with the Painter's Algorithm for efficient and GPU-friendly rendering. Our experiments show that for real-world scenes, our method provides up to 3x more efficient inference than NeRF (with the same rendering quality), or an improvement of up to 1.0~dB in PSNR (for the same inference cost).",
            "publicationTitle": "arXiv:2011.12490 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-24",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "DeRF",
            "url": "http://arxiv.org/abs/2011.12490",
            "accessDate": "2021-06-20T16:03:50Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.12490",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:50Z",
            "dateModified": "2021-06-20T16:03:50Z"
        }
    },
    {
        "key": "V5KTHHYD",
        "version": 177,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/V5KTHHYD",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/V5KTHHYD",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Niemeyer and Geiger",
            "parsedDate": "2021-04-29",
            "numChildren": 3
        },
        "data": {
            "key": "V5KTHHYD",
            "version": 177,
            "itemType": "journalArticle",
            "title": "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Niemeyer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andreas",
                    "lastName": "Geiger"
                }
            ],
            "abstractNote": "Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable. While several recent works investigate how to disentangle underlying factors of variation in the data, most of them operate in 2D and hence ignore that our world is three-dimensional. Further, only few works consider the compositional nature of scenes. Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis. Representing scenes as compositional generative neural feature fields allows us to disentangle one or multiple objects from the background as well as individual objects' shapes and appearances while learning from unstructured and unposed image collections without any additional supervision. Combining this scene representation with a neural rendering pipeline yields a fast and realistic image synthesis model. As evidenced by our experiments, our model is able to disentangle individual objects and allows for translating and rotating them in the scene as well as changing the camera pose.",
            "publicationTitle": "arXiv:2011.12100 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-29",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "GIRAFFE",
            "url": "http://arxiv.org/abs/2011.12100",
            "accessDate": "2021-06-20T16:03:50Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.12100",
            "tags": [
                {
                    "tag": "nerf"
                }
            ],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:50Z",
            "dateModified": "2021-06-20T16:03:50Z"
        }
    },
    {
        "key": "HSN56RWV",
        "version": 79,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HSN56RWV",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HSN56RWV",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Skorokhodov et al.",
            "parsedDate": "2020-11-24",
            "numChildren": 3
        },
        "data": {
            "key": "HSN56RWV",
            "version": 79,
            "itemType": "journalArticle",
            "title": "Adversarial Generation of Continuous Images",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ivan",
                    "lastName": "Skorokhodov"
                },
                {
                    "creatorType": "author",
                    "firstName": "Savva",
                    "lastName": "Ignatyev"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mohamed",
                    "lastName": "Elhoseiny"
                }
            ],
            "abstractNote": "In most existing learning systems, images are typically viewed as 2D pixel arrays. However, in another paradigm gaining popularity, a 2D image is represented as an implicit neural representation (INR) -- an MLP that predicts an RGB pixel value given its (x,y) coordinate. In this paper, we propose two novel architectural techniques for building INR-based image decoders: factorized multiplicative modulation and multi-scale INRs, and use them to build a state-of-the-art continuous image GAN. Previous attempts to adapt INRs for image generation were limited to MNIST-like datasets and do not scale to complex real-world data. Our proposed architectural design improves the performance of continuous image generators by x6-40 times and reaches FID scores of 6.27 on LSUN bedroom 256x256 and 16.32 on FFHQ 1024x1024, greatly reducing the gap between continuous image GANs and pixel-based ones. To the best of our knowledge, these are the highest reported scores for an image generator, that consists entirely of fully-connected layers. Apart from that, we explore several exciting properties of INR-based decoders, like out-of-the-box superresolution, meaningful image-space interpolation, accelerated inference of low-resolution images, an ability to extrapolate outside of image boundaries and strong geometric prior. The source code is available at https://github.com/universome/inr-gan",
            "publicationTitle": "arXiv:2011.12026 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-11-24",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.12026",
            "accessDate": "2021-06-20T16:03:49Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.12026",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:49Z",
            "dateModified": "2021-06-20T16:03:49Z"
        }
    },
    {
        "key": "86AX9V9R",
        "version": 362,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/86AX9V9R",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/86AX9V9R",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Liu et al.",
            "parsedDate": "2021-01-06",
            "numChildren": 3
        },
        "data": {
            "key": "86AX9V9R",
            "version": 362,
            "itemType": "journalArticle",
            "title": "Neural Sparse Voxel Fields",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Lingjie",
                    "lastName": "Liu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jiatao",
                    "lastName": "Gu"
                },
                {
                    "creatorType": "author",
                    "firstName": "Kyaw Zaw",
                    "lastName": "Lin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Tat-Seng",
                    "lastName": "Chua"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Theobalt"
                }
            ],
            "abstractNote": "Photo-realistic free-viewpoint rendering of real-world scenes using classical computer graphics techniques is challenging, because it requires the difficult step of capturing detailed appearance and geometry models. Recent studies have demonstrated promising results by learning scene representations that implicitly encode both geometry and appearance without 3D supervision. However, existing approaches in practice often show blurry renderings caused by the limited network capacity or the difficulty in finding accurate intersections of camera rays with the scene geometry. Synthesizing high-resolution imagery from these representations often requires time-consuming optical ray marching. In this work, we introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering. NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell. We progressively learn the underlying voxel structures with a differentiable ray-marching operation from only a set of posed RGB images. With the sparse voxel octree structure, rendering novel views can be accelerated by skipping the voxels containing no relevant scene content. Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF(Mildenhall et al., 2020)) at inference time while achieving higher quality results. Furthermore, by utilizing an explicit sparse voxel representation, our method can easily be applied to scene editing and scene composition. We also demonstrate several challenging tasks, including multi-scene learning, free-viewpoint rendering of a moving human, and large-scale scene rendering. Code and data are available at our website: https://github.com/facebookresearch/NSVF.",
            "publicationTitle": "arXiv:2007.11571 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-01-06",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2007.11571",
            "accessDate": "2021-06-20T16:03:48Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2007.11571",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/UCFBLXW4"
            },
            "dateAdded": "2021-06-20T16:03:48Z",
            "dateModified": "2021-06-20T16:03:48Z"
        }
    },
    {
        "key": "U962VCXI",
        "version": 183,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/U962VCXI",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/U962VCXI",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Lin et al.",
            "parsedDate": "2020-10-20",
            "numChildren": 3
        },
        "data": {
            "key": "U962VCXI",
            "version": 183,
            "itemType": "journalArticle",
            "title": "SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Chen-Hsuan",
                    "lastName": "Lin"
                },
                {
                    "creatorType": "author",
                    "firstName": "Chaoyang",
                    "lastName": "Wang"
                },
                {
                    "creatorType": "author",
                    "firstName": "Simon",
                    "lastName": "Lucey"
                }
            ],
            "abstractNote": "Dense 3D object reconstruction from a single image has recently witnessed remarkable advances, but supervising neural networks with ground-truth 3D shapes is impractical due to the laborious process of creating paired image-shape datasets. Recent efforts have turned to learning 3D reconstruction without 3D supervision from RGB images with annotated 2D silhouettes, dramatically reducing the cost and effort of annotation. These techniques, however, remain impractical as they still require multi-view annotations of the same object instance during training. As a result, most experimental efforts to date have been limited to synthetic datasets. In this paper, we address this issue and propose SDF-SRN, an approach that requires only a single view of objects at training time, offering greater utility for real-world scenarios. SDF-SRN learns implicit 3D shape representations to handle arbitrary shape topologies that may exist in the datasets. To this end, we derive a novel differentiable rendering formulation for learning signed distance functions (SDF) from 2D silhouettes. Our method outperforms the state of the art under challenging single-view supervision settings on both synthetic and real-world datasets.",
            "publicationTitle": "arXiv:2010.10505 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-10-20",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "SDF-SRN",
            "url": "http://arxiv.org/abs/2010.10505",
            "accessDate": "2021-06-20T16:03:48Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2010.10505",
            "tags": [],
            "collections": [
                "5XFEXSAE"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:48Z",
            "dateModified": "2021-06-20T16:03:48Z"
        }
    },
    {
        "key": "BVV9955T",
        "version": 167,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/BVV9955T",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/BVV9955T",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Riegler and Koltun",
            "parsedDate": "2021-05-02",
            "numChildren": 3
        },
        "data": {
            "key": "BVV9955T",
            "version": 167,
            "itemType": "journalArticle",
            "title": "Stable View Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Gernot",
                    "lastName": "Riegler"
                },
                {
                    "creatorType": "author",
                    "firstName": "Vladlen",
                    "lastName": "Koltun"
                }
            ],
            "abstractNote": "We present Stable View Synthesis (SVS). Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene. The method operates on a geometric scaffold computed via structure-from-motion and multi-view stereo. Each point on this 3D scaffold is associated with view rays and corresponding feature vectors that encode the appearance of this point in the input images. The core of SVS is view-dependent on-surface feature aggregation, in which directional feature vectors at each 3D point are processed to produce a new feature vector for a ray that maps this point into the new target view. The target view is then rendered by a convolutional network from a tensor of features synthesized in this way for all pixels. The method is composed of differentiable modules and is trained end-to-end. It supports spatially-varying view-dependent importance weighting and feature transformation of source images at each point; spatial and temporal stability due to the smooth dependence of on-surface feature aggregation on the target view; and synthesis of view-dependent effects such as specular reflection. Experimental results demonstrate that SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets, achieving unprecedented levels of realism in free-viewpoint video of challenging large-scale scenes. Code is available at https://github.com/intel-isl/StableViewSynthesis",
            "publicationTitle": "arXiv:2011.07233 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-05-02",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2011.07233",
            "accessDate": "2021-06-20T16:03:48Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2011.07233",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:48Z",
            "dateModified": "2021-06-20T16:03:48Z"
        }
    },
    {
        "key": "7LPDVVES",
        "version": 184,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/7LPDVVES",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/7LPDVVES",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Chibane et al.",
            "parsedDate": "2020-10-26",
            "numChildren": 3
        },
        "data": {
            "key": "7LPDVVES",
            "version": 184,
            "itemType": "journalArticle",
            "title": "Neural Unsigned Distance Fields for Implicit Function Learning",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Julian",
                    "lastName": "Chibane"
                },
                {
                    "creatorType": "author",
                    "firstName": "Aymen",
                    "lastName": "Mir"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gerard",
                    "lastName": "Pons-Moll"
                }
            ],
            "abstractNote": "In this work we target a learnable output representation that allows continuous, high resolution outputs of arbitrary shape. Recent works represent 3D surfaces implicitly with a Neural Network, thereby breaking previous barriers in resolution, and ability to represent diverse topologies. However, neural implicit representations are limited to closed surfaces, which divide the space into inside and outside. Many real world objects such as walls of a scene scanned by a sensor, clothing, or a car with inner structures are not closed. This constitutes a significant barrier, in terms of data pre-processing (objects need to be artificially closed creating artifacts), and the ability to output open surfaces. In this work, we propose Neural Distance Fields (NDF), a neural network based model which predicts the unsigned distance field for arbitrary 3D shapes given sparse point clouds. NDF represent surfaces at high resolutions as prior implicit models, but do not require closed surface data, and significantly broaden the class of representable shapes in the output. NDF allow to extract the surface as very dense point clouds and as meshes. We also show that NDF allow for surface normal calculation and can be rendered using a slight modification of sphere tracing. We find NDF can be used for multi-target regression (multiple outputs for one input) with techniques that have been exclusively used for rendering in graphics. Experiments on ShapeNet show that NDF, while simple, is the state-of-the art, and allows to reconstruct shapes with inner structures, such as the chairs inside a bus. Notably, we show that NDF are not restricted to 3D shapes, and can approximate more general open surfaces such as curves, manifolds, and functions. Code is available for research at https://virtualhumans.mpi-inf.mpg.de/ndf/.",
            "publicationTitle": "arXiv:2010.13938 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-10-26",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2010.13938",
            "accessDate": "2021-06-20T16:03:47Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2010.13938",
            "tags": [],
            "collections": [
                "5XFEXSAE"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:47Z",
            "dateModified": "2021-06-20T16:03:47Z"
        }
    },
    {
        "key": "F86LTWEM",
        "version": 78,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/F86LTWEM",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/F86LTWEM",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Attal et al.",
            "parsedDate": "2020-08-14",
            "numChildren": 3
        },
        "data": {
            "key": "F86LTWEM",
            "version": 78,
            "itemType": "journalArticle",
            "title": "MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Benjamin",
                    "lastName": "Attal"
                },
                {
                    "creatorType": "author",
                    "firstName": "Selena",
                    "lastName": "Ling"
                },
                {
                    "creatorType": "author",
                    "firstName": "Aaron",
                    "lastName": "Gokaslan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Christian",
                    "lastName": "Richardt"
                },
                {
                    "creatorType": "author",
                    "firstName": "James",
                    "lastName": "Tompkin"
                }
            ],
            "abstractNote": "We introduce a method to convert stereo 360{\\deg} (omnidirectional stereo) imagery into a layered, multi-sphere image representation for six degree-of-freedom (6DoF) rendering. Stereo 360{\\deg} imagery can be captured from multi-camera systems for virtual reality (VR), but lacks motion parallax and correct-in-all-directions disparity cues. Together, these can quickly lead to VR sickness when viewing content. One solution is to try and generate a format suitable for 6DoF rendering, such as by estimating depth. However, this raises questions as to how to handle disoccluded regions in dynamic scenes. Our approach is to simultaneously learn depth and disocclusions via a multi-sphere image representation, which can be rendered with correct 6DoF disparity and motion parallax in VR. This significantly improves comfort for the viewer, and can be inferred and rendered in real time on modern GPU hardware. Together, these move towards making VR video a more comfortable immersive medium.",
            "publicationTitle": "arXiv:2008.06534 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-08-14",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "MatryODShka",
            "url": "http://arxiv.org/abs/2008.06534",
            "accessDate": "2021-06-20T16:03:45Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2008.06534",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:45Z",
            "dateModified": "2021-06-20T16:03:45Z"
        }
    },
    {
        "key": "VR6DWTNT",
        "version": 94,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VR6DWTNT",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VR6DWTNT",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Martin-Brualla et al.",
            "parsedDate": "2021-01-06",
            "numChildren": 3
        },
        "data": {
            "key": "VR6DWTNT",
            "version": 94,
            "itemType": "journalArticle",
            "title": "NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ricardo",
                    "lastName": "Martin-Brualla"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noha",
                    "lastName": "Radwan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Mehdi S. M.",
                    "lastName": "Sajjadi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alexey",
                    "lastName": "Dosovitskiy"
                },
                {
                    "creatorType": "author",
                    "firstName": "Daniel",
                    "lastName": "Duckworth"
                }
            ],
            "abstractNote": "We present a learning-based method for synthesizing novel views of complex scenes using only unstructured collections of in-the-wild photographs. We build on Neural Radiance Fields (NeRF), which uses the weights of a multilayer perceptron to model the density and color of a scene as a function of 3D coordinates. While NeRF works well on images of static subjects captured under controlled settings, it is incapable of modeling many ubiquitous, real-world phenomena in uncontrolled images, such as variable illumination or transient occluders. We introduce a series of extensions to NeRF to address these issues, thereby enabling accurate reconstructions from unstructured image collections taken from the internet. We apply our system, dubbed NeRF-W, to internet photo collections of famous landmarks, and demonstrate temporally consistent novel view renderings that are significantly closer to photorealism than the prior state of the art.",
            "publicationTitle": "arXiv:2008.02268 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-01-06",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeRF in the Wild",
            "url": "http://arxiv.org/abs/2008.02268",
            "accessDate": "2021-06-20T16:03:45Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2008.02268",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:45Z",
            "dateModified": "2021-06-20T16:03:45Z"
        }
    },
    {
        "key": "PP3NI8CA",
        "version": 359,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/PP3NI8CA",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/PP3NI8CA",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Tancik et al.",
            "parsedDate": "2020-06-18",
            "numChildren": 3
        },
        "data": {
            "key": "PP3NI8CA",
            "version": 359,
            "itemType": "journalArticle",
            "title": "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul P.",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Mildenhall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Sara",
                    "lastName": "Fridovich-Keil"
                },
                {
                    "creatorType": "author",
                    "firstName": "Nithin",
                    "lastName": "Raghavan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Utkarsh",
                    "lastName": "Singhal"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ravi",
                    "lastName": "Ramamoorthi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ren",
                    "lastName": "Ng"
                }
            ],
            "abstractNote": "We show that passing input points through a simple Fourier feature mapping enables a multilayer perceptron (MLP) to learn high-frequency functions in low-dimensional problem domains. These results shed light on recent advances in computer vision and graphics that achieve state-of-the-art results by using MLPs to represent complex 3D objects and scenes. Using tools from the neural tangent kernel (NTK) literature, we show that a standard MLP fails to learn high frequencies both in theory and in practice. To overcome this spectral bias, we use a Fourier feature mapping to transform the effective NTK into a stationary kernel with a tunable bandwidth. We suggest an approach for selecting problem-specific Fourier features that greatly improves the performance of MLPs for low-dimensional regression tasks relevant to the computer vision and graphics communities.",
            "publicationTitle": "arXiv:2006.10739 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-06-18",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2006.10739",
            "accessDate": "2021-06-20T16:03:44Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.10739",
            "tags": [],
            "collections": [
                "CPYKW3PF",
                "J4345UQS"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/N64THQEK"
            },
            "dateAdded": "2021-06-20T16:03:44Z",
            "dateModified": "2021-06-20T16:03:44Z"
        }
    },
    {
        "key": "Z9RMBP5X",
        "version": 77,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/Z9RMBP5X",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/Z9RMBP5X",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Yariv et al.",
            "parsedDate": "2020-10-25",
            "numChildren": 2
        },
        "data": {
            "key": "Z9RMBP5X",
            "version": 77,
            "itemType": "journalArticle",
            "title": "Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Lior",
                    "lastName": "Yariv"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yoni",
                    "lastName": "Kasten"
                },
                {
                    "creatorType": "author",
                    "firstName": "Dror",
                    "lastName": "Moran"
                },
                {
                    "creatorType": "author",
                    "firstName": "Meirav",
                    "lastName": "Galun"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matan",
                    "lastName": "Atzmon"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ronen",
                    "lastName": "Basri"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yaron",
                    "lastName": "Lipman"
                }
            ],
            "abstractNote": "In this work we address the challenging problem of multiview 3D surface reconstruction. We introduce a neural network architecture that simultaneously learns the unknown geometry, camera parameters, and a neural renderer that approximates the light reflected from the surface towards the camera. The geometry is represented as a zero level-set of a neural network, while the neural renderer, derived from the rendering equation, is capable of (implicitly) modeling a wide set of lighting conditions and materials. We trained our network on real world 2D images of objects with different material properties, lighting conditions, and noisy camera initializations from the DTU MVS dataset. We found our model to produce state of the art 3D surface reconstructions with high fidelity, resolution and detail.",
            "publicationTitle": "arXiv:2003.09852 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-10-25",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2003.09852",
            "accessDate": "2021-06-20T16:03:43Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2003.09852",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:43Z",
            "dateModified": "2021-06-20T16:03:43Z"
        }
    },
    {
        "key": "865FJ5C2",
        "version": 77,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/865FJ5C2",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/865FJ5C2",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sitzmann et al.",
            "parsedDate": "2020-06-17",
            "numChildren": 3
        },
        "data": {
            "key": "865FJ5C2",
            "version": 77,
            "itemType": "journalArticle",
            "title": "Implicit Neural Representations with Periodic Activation Functions",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Vincent",
                    "lastName": "Sitzmann"
                },
                {
                    "creatorType": "author",
                    "firstName": "Julien N. P.",
                    "lastName": "Martel"
                },
                {
                    "creatorType": "author",
                    "firstName": "Alexander W.",
                    "lastName": "Bergman"
                },
                {
                    "creatorType": "author",
                    "firstName": "David B.",
                    "lastName": "Lindell"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gordon",
                    "lastName": "Wetzstein"
                }
            ],
            "abstractNote": "Implicitly defined, continuous, differentiable signal representations parameterized by neural networks have emerged as a powerful paradigm, offering many possible benefits over conventional representations. However, current network architectures for such implicit neural representations are incapable of modeling signals with fine detail, and fail to represent a signal's spatial and temporal derivatives, despite the fact that these are essential to many physical signals defined implicitly as the solution to partial differential equations. We propose to leverage periodic activation functions for implicit neural representations and demonstrate that these networks, dubbed sinusoidal representation networks or Sirens, are ideally suited for representing complex natural signals and their derivatives. We analyze Siren activation statistics to propose a principled initialization scheme and demonstrate the representation of images, wavefields, video, sound, and their derivatives. Further, we show how Sirens can be leveraged to solve challenging boundary value problems, such as particular Eikonal equations (yielding signed distance functions), the Poisson equation, and the Helmholtz and wave equations. Lastly, we combine Sirens with hypernetworks to learn priors over the space of Siren functions.",
            "publicationTitle": "arXiv:2006.09661 [cs, eess]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-06-17",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2006.09661",
            "accessDate": "2021-06-20T16:03:43Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.09661",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:43Z",
            "dateModified": "2021-06-20T16:03:43Z"
        }
    },
    {
        "key": "VYS6ZLXT",
        "version": 94,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/VYS6ZLXT",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/VYS6ZLXT",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Mildenhall et al.",
            "parsedDate": "2020-08-03",
            "numChildren": 3
        },
        "data": {
            "key": "VYS6ZLXT",
            "version": 94,
            "itemType": "journalArticle",
            "title": "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ben",
                    "lastName": "Mildenhall"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pratul P.",
                    "lastName": "Srinivasan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Jonathan T.",
                    "lastName": "Barron"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ravi",
                    "lastName": "Ramamoorthi"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ren",
                    "lastName": "Ng"
                }
            ],
            "abstractNote": "We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(\\theta, \\phi)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.",
            "publicationTitle": "arXiv:2003.08934 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-08-03",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "NeRF",
            "url": "http://arxiv.org/abs/2003.08934",
            "accessDate": "2021-06-20T16:03:43Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2003.08934",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:43Z",
            "dateModified": "2021-06-20T16:03:43Z"
        }
    },
    {
        "key": "5H9I5CSF",
        "version": 359,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/5H9I5CSF",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/5H9I5CSF",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Ho et al.",
            "parsedDate": "2020-12-16",
            "numChildren": 2
        },
        "data": {
            "key": "5H9I5CSF",
            "version": 359,
            "itemType": "journalArticle",
            "title": "Denoising Diffusion Probabilistic Models",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Jonathan",
                    "lastName": "Ho"
                },
                {
                    "creatorType": "author",
                    "firstName": "Ajay",
                    "lastName": "Jain"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pieter",
                    "lastName": "Abbeel"
                }
            ],
            "abstractNote": "We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion",
            "publicationTitle": "arXiv:2006.11239 [cs, stat]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-12-16",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "",
            "url": "http://arxiv.org/abs/2006.11239",
            "accessDate": "2021-06-20T16:03:42Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.11239",
            "tags": [],
            "collections": [
                "TNWL7M5C"
            ],
            "relations": {
                "owl:sameAs": "http://zotero.org/groups/4320173/items/RQT67FRB"
            },
            "dateAdded": "2021-06-20T16:03:42Z",
            "dateModified": "2021-06-20T16:03:42Z"
        }
    },
    {
        "key": "UPA65XTD",
        "version": 182,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/UPA65XTD",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/UPA65XTD",
                "type": "text/html"
            }
        },
        "meta": {
            "creatorSummary": "Sitzmann et al.",
            "parsedDate": "2020-06-17",
            "numChildren": 3
        },
        "data": {
            "key": "UPA65XTD",
            "version": 182,
            "itemType": "journalArticle",
            "title": "MetaSDF: Meta-learning Signed Distance Functions",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Vincent",
                    "lastName": "Sitzmann"
                },
                {
                    "creatorType": "author",
                    "firstName": "Eric R.",
                    "lastName": "Chan"
                },
                {
                    "creatorType": "author",
                    "firstName": "Richard",
                    "lastName": "Tucker"
                },
                {
                    "creatorType": "author",
                    "firstName": "Noah",
                    "lastName": "Snavely"
                },
                {
                    "creatorType": "author",
                    "firstName": "Gordon",
                    "lastName": "Wetzstein"
                }
            ],
            "abstractNote": "Neural implicit shape representations are an emerging paradigm that offers many potential benefits over conventional discrete representations, including memory efficiency at a high spatial resolution. Generalizing across shapes with such neural implicit representations amounts to learning priors over the respective function space and enables geometry reconstruction from partial or noisy observations. Existing generalization methods rely on conditioning a neural network on a low-dimensional latent code that is either regressed by an encoder or jointly optimized in the auto-decoder framework. Here, we formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task. We demonstrate that this approach performs on par with auto-decoder based approaches while being an order of magnitude faster at test-time inference. We further demonstrate that the proposed gradient-based method outperforms encoder-decoder based methods that leverage pooling-based set encoders.",
            "publicationTitle": "arXiv:2006.09662 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2020-06-17",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "MetaSDF",
            "url": "http://arxiv.org/abs/2006.09662",
            "accessDate": "2021-06-20T16:03:40Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2006.09662",
            "tags": [],
            "collections": [
                "5XFEXSAE"
            ],
            "relations": {},
            "dateAdded": "2021-06-20T16:03:40Z",
            "dateModified": "2021-06-20T16:03:40Z"
        }
    },
    {
        "key": "KHSD5WHK",
        "version": 94,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/KHSD5WHK",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/KHSD5WHK",
                "type": "text/html"
            },
            "attachment": {
                "href": "https://api.zotero.org/users/7902311/items/C952EH2V",
                "type": "application/json",
                "attachmentType": "application/pdf",
                "attachmentSize": 48381864
            }
        },
        "meta": {
            "creatorSummary": "Jain et al.",
            "parsedDate": "2021-04-01",
            "numChildren": 3
        },
        "data": {
            "key": "KHSD5WHK",
            "version": 94,
            "itemType": "journalArticle",
            "title": "Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Ajay",
                    "lastName": "Jain"
                },
                {
                    "creatorType": "author",
                    "firstName": "Matthew",
                    "lastName": "Tancik"
                },
                {
                    "creatorType": "author",
                    "firstName": "Pieter",
                    "lastName": "Abbeel"
                }
            ],
            "abstractNote": "We present DietNeRF, a 3D neural scene representation estimated from a few images. Neural Radiance Fields (NeRF) learn a continuous volumetric representation of a scene through multi-view consistency, and can be rendered from novel viewpoints by ray casting. While NeRF has an impressive ability to reconstruct geometry and fine details given many images, up to 100 for challenging 360° scenes, it often finds a degenerate solution to its image reconstruction objective when only a few input views are available. To improve few-shot quality, we propose DietNeRF. We introduce an auxiliary semantic consistency loss that encourages realistic renderings at novel poses. DietNeRF is trained on individual scenes to (1) correctly render given input views from the same pose, and (2) match high-level semantic attributes across different, random poses. Our semantic loss allows us to supervise DietNeRF from arbitrary poses. We extract these semantics using a pre-trained visual encoder such as CLIP, a Vision Transformer trained on hundreds of millions of diverse single-view, 2D photographs mined from the web with natural language supervision. In experiments, DietNeRF improves the perceptual quality of few-shot view synthesis when learned from scratch, can render novel views with as few as one observed image when pre-trained on a multi-view dataset, and produces plausible completions of completely unobserved regions.",
            "publicationTitle": "arXiv:2104.00677 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-04-01",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "Putting NeRF on a Diet",
            "url": "http://arxiv.org/abs/2104.00677",
            "accessDate": "2021-04-26T04:42:06Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2104.00677",
            "tags": [],
            "collections": [
                "CPYKW3PF"
            ],
            "relations": {},
            "dateAdded": "2021-04-26T04:42:06Z",
            "dateModified": "2021-04-26T04:42:06Z"
        }
    },
    {
        "key": "HNYWQ5PZ",
        "version": 77,
        "library": {
            "type": "user",
            "id": 7902311,
            "name": "supasorn",
            "links": {
                "alternate": {
                    "href": "https://www.zotero.org/supasorn",
                    "type": "text/html"
                }
            }
        },
        "links": {
            "self": {
                "href": "https://api.zotero.org/users/7902311/items/HNYWQ5PZ",
                "type": "application/json"
            },
            "alternate": {
                "href": "https://www.zotero.org/supasorn/items/HNYWQ5PZ",
                "type": "text/html"
            },
            "attachment": {
                "href": "https://api.zotero.org/users/7902311/items/TEECTE59",
                "type": "application/json",
                "attachmentType": "application/pdf",
                "attachmentSize": 5592753
            }
        },
        "meta": {
            "creatorSummary": "Schwarz et al.",
            "parsedDate": "2021-03-30",
            "numChildren": 2
        },
        "data": {
            "key": "HNYWQ5PZ",
            "version": 77,
            "itemType": "journalArticle",
            "title": "GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis",
            "creators": [
                {
                    "creatorType": "author",
                    "firstName": "Katja",
                    "lastName": "Schwarz"
                },
                {
                    "creatorType": "author",
                    "firstName": "Yiyi",
                    "lastName": "Liao"
                },
                {
                    "creatorType": "author",
                    "firstName": "Michael",
                    "lastName": "Niemeyer"
                },
                {
                    "creatorType": "author",
                    "firstName": "Andreas",
                    "lastName": "Geiger"
                }
            ],
            "abstractNote": "While 2D generative adversarial networks have enabled high-resolution image synthesis, they largely lack an understanding of the 3D world and the image formation process. Thus, they do not provide precise control over camera viewpoint or object pose. To address this problem, several recent approaches leverage intermediate voxel-based representations in combination with differentiable rendering. However, existing methods either produce low image resolution or fall short in disentangling camera and scene properties, e.g., the object identity may vary with the viewpoint. In this paper, we propose a generative model for radiance fields which have recently proven successful for novel view synthesis of a single scene. In contrast to voxel-based representations, radiance fields are not confined to a coarse discretization of the 3D space, yet allow for disentangling camera and scene properties while degrading gracefully in the presence of reconstruction ambiguity. By introducing a multi-scale patch-based discriminator, we demonstrate synthesis of high-resolution images while training our model from unposed 2D images alone. We systematically analyze our approach on several challenging synthetic and real-world datasets. Our experiments reveal that radiance fields are a powerful representation for generative image synthesis, leading to 3D consistent models that render with high fidelity.",
            "publicationTitle": "arXiv:2007.02442 [cs]",
            "volume": "",
            "issue": "",
            "pages": "",
            "date": "2021-03-30",
            "series": "",
            "seriesTitle": "",
            "seriesText": "",
            "journalAbbreviation": "",
            "language": "",
            "DOI": "",
            "ISSN": "",
            "shortTitle": "GRAF",
            "url": "http://arxiv.org/abs/2007.02442",
            "accessDate": "2021-04-26T04:30:42Z",
            "archive": "",
            "archiveLocation": "",
            "libraryCatalog": "arXiv.org",
            "callNumber": "",
            "rights": "",
            "extra": "arXiv: 2007.02442",
            "tags": [],
            "collections": [],
            "relations": {},
            "dateAdded": "2021-04-26T04:30:42Z",
            "dateModified": "2021-04-26T04:30:42Z"
        }
    }
]