
<!DOCTYPE html>
<html>


<!-- Mirrored from sfa.cs.columbia.edu/ by HTTrack Website Copier/3.x [XR&CO'2014], Fri, 21 Jul 2023 10:56:34 GMT -->
<head>
    <title>
        A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models
    </title>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=1000" />
    <link rel="stylesheet" href="assets/css/main.css" />
    <script>
        (function (i, s, o, g, r, a, m) {
            i["GoogleAnalyticsObject"] = r;
            (i[r] =
                i[r] ||
                function () {
                    (i[r].q = i[r].q || []).push(arguments);
                }),
                (i[r].l = 1 * new Date());
            (a = s.createElement(o)), (m = s.getElementsByTagName(o)[0]);
            a.async = 1;
            a.src = g;
            m.parentNode.insertBefore(a, m);
        })(
            window,
            document,
            "script",
            "https://www.google-analytics.com/analytics.js",
            "ga"
        );
        ga("create", "UA-89797207-1", "auto");
        ga("send", "pageview");
    </script>

    <meta property="og:url" content="https://sfa.cs.columbia.edu/" />
    <meta property="og:type" content="website" />
    <meta property="og:title"
        content="Structure From Action: Learning Interactions for Articulated Object 3D Structure Discovery" />
    <meta property="og:description"
        content="Articulated objects make up a significant portion of our environment. Discovering their parts, joints, and kinematics is crucial for robots to interact with these objects. We introduce Structure from Action (SfA), a framework that discovers the 3D part geometry and joint parameters of unseen articulated objects via a sequence of inferred interactions.  Our key insight is that 3D interaction and perception should be considered in conjunction to construct 3D articulated CAD models, especially in the case of categories not seen during training.  By selecting informative interactions, SfA discovers parts and reveals initially occluded surfaces, like the inside of a closed drawer.  By aggregating visual observations in 3D, SfA accurately segments multiple parts, reconstructs part geometry, and infers all joint parameters in a canonical coordinate frame. Our experiments demonstrate that a single SfA model trained in simulation can generalize to many unseen object categories with unknown kinematic structures and to real-world objects.  Code and data will be publicly available." />
</head>

<body id="top">
    <!-- Main -->
    <div id="main" style="
    padding-bottom: 1em;
    padding-top: 5em;
    width: 60em;
    max-width: 70em;
    margin-left: auto;
    margin-right: auto;
    ">
        <section id="four">
            <h1 style="text-align: center; margin-bottom: 0">
                <font color="4e79a7">CPDM: Copyright Protection in Diffusion Models</font>
            </h1>
            <h3 style="text-align: center">
                A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models
            </h3>

            <span class="image right" style="
        max-width: 50%;
        margin-top: 0.5em;
        margin-bottom: 0;
        border: 0px solid #415161;
        ">
                <!-- <img src="images/intra_1.jpg" width="20%" alt="" /> -->
            </span>
            <span class="image center" style="
            max-width: 100%;
            max-height: 50%;
            margin-top: 0.5em;
            margin-bottom: 0;
            border: 0px solid #415161;
            ">
                    <img src="images/intra_1.jpg" width="100%" alt="" />
            </span>
           

            <p >
                Copyright is a legal right that grants creators the exclusive authority to
                reproduce, distribute, and profit from their creative works. However, recent
                advances in text-to-image generation have posed significant challenges to
                copyright protection: these methods make it easy to learn from unauthorized
                content, artistic creations, and portraits, and then to generate and disseminate
                uncontrolled content based on them. In particular, Stable Diffusion, an emerging
                text-to-image model, increases the risk of unauthorized copyright infringement
                and distribution. Currently, there is a lack of systematic studies evaluating the
                potential correlation between content generated by Stable Diffusion and content
                under copyright protection. Conducting such studies faces several challenges,
                including i) the intrinsic ambiguity of copyright infringement in text-to-image
                models, ii) the absence of a comprehensive large-scale dataset, and iii) the lack
                of standardized metrics for defining copyright infringement.
                This work provides the first large-scale standardized dataset and benchmark on
                copyright protection. Specifically, we propose a pipeline that coordinates CLIP,
                ChatGPT, and diffusion models to generate a dataset containing anchor images,
                corresponding prompts, and images generated by text-to-image models, reflecting
                potential abuses of copyright. Furthermore, we explore a suite of evaluation
                metrics to judge the effectiveness of copyright protection methods. The proposed
                dataset, benchmark library, and evaluation metrics will be open-sourced to
                facilitate future research and application.
            </p>
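            <p>
                The metrics themselves are not listed on this page. As one illustrative possibility,
                the sketch below assumes that each anchor image and its generated counterpart have
                already been embedded by an image encoder such as CLIP, and scores them by cosine
                similarity. The function names, the embedding source, and the 0.8 threshold are
                hypothetical placeholders, not CPDM's metric definitions.
            </p>
            <pre><code>import numpy as np


def paired_cosine_similarity(a, b):
    """Row-wise cosine similarity between two (n, d) embedding arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)


def infringement_report(anchor_emb, generated_emb, threshold=0.8):
    """Hypothetical metric: mean similarity and fraction of flagged pairs.

    anchor_emb[i] embeds the i-th copyrighted (anchor) image and
    generated_emb[i] embeds an image generated from its prompt.
    The embedding model and the 0.8 threshold are placeholders.
    """
    sims = paired_cosine_similarity(anchor_emb, generated_emb)
    flagged = np.greater_equal(sims, threshold)  # pairs judged "too close"
    return {
        "mean_similarity": float(sims.mean()),
        "flag_rate": float(flagged.mean()),
    }


if __name__ == "__main__":
    anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
    generated = np.array([[1.0, 0.0], [1.0, 0.0]])
    print(infringement_report(anchors, generated))
    # {'mean_similarity': 0.5, 'flag_rate': 0.5}
</code></pre>
            <p>
                A single similarity threshold is the simplest possible rule; it ignores the
                ambiguity of infringement discussed above and is shown only to make the idea concrete.
            </p>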
            


            <hr />


            <!-- Reconstruction Results -->
            <h2>Pipeline of Dataset Generation</h2>
            <span class="image center" style="
            max-width: 100%;
            max-height: 100%;
            margin-top: 0.5em;
            margin-bottom: 0;
            border: 0px solid #415161;
            ">
                <img src="images/data-generation.png" height="90%" width="90%" alt="" style="margin: 0 auto;"/>
            </span>
            <p>

                <strong><em>Pipeline for generating CPDM datasets</em></strong>: The CLIP Interrogator converts
                copyrighted images into corresponding textual descriptions. This text is then refined into
                prompts, which are fed into a diffusion model to generate the corresponding infringing images.
        
            </p>
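            <p>
                The data flow of this pipeline can be sketched in Python. This is a minimal sketch under
                stated assumptions: the three callables are hypothetical stand-ins for the CLIP Interrogator,
                the prompt-refinement step (which the abstract attributes to ChatGPT), and the diffusion model,
                and <code>CPDMRecord</code> is an illustrative schema rather than the released dataset format.
            </p>
            <pre><code>from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CPDMRecord:
    """Illustrative dataset entry: anchor image, text, prompt, generated images."""
    anchor_path: str
    caption: str
    prompt: str
    generated_paths: List[str] = field(default_factory=list)


def build_records(
    anchor_paths: List[str],
    interrogate: Callable[[str], str],
    refine: Callable[[str], str],
    generate: Callable[[str, int], List[str]],
    images_per_prompt: int = 4,
):
    """Turn each copyrighted anchor image into a CPDMRecord."""
    records = []
    for path in anchor_paths:
        caption = interrogate(path)  # image to text (CLIP Interrogator stand-in)
        prompt = refine(caption)  # text refined into a prompt (ChatGPT stand-in)
        outputs = generate(prompt, images_per_prompt)  # diffusion model stand-in
        records.append(CPDMRecord(path, caption, prompt, outputs))
    return records


if __name__ == "__main__":
    # Toy stubs so the sketch runs without downloading any models.
    def fake_interrogate(path):
        return "a painting of a starry night, " + path

    def fake_refine(caption):
        return caption + ", in the style of the original artist"

    def fake_generate(prompt, n):
        return [f"generated_{i}.png" for i in range(n)]

    for record in build_records(
        ["anchors/0001.jpg"], fake_interrogate, fake_refine, fake_generate, 2
    ):
        print(record)
</code></pre>
            <p>
                Passing the three stages in as callables keeps the sketch independent of any particular
                model API; real captioning, language, and diffusion models would be wrapped to match these
                signatures.
            </p>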

            
                <hr style="margin-top: 0em" />

                <h2>Statistics and Details of the Dataset</h2>
                <span class="image center" style="
                max-width: 100%;
                max-height: 50%;
                margin-top: 0.5em;
                margin-bottom: 0;
                border: 0px solid #415161;
                ">
                        <img src="images/statistics.jpg" width="90%" alt="" style="margin: 0 auto;" />
                    </span>
                <p>
    
                    <strong><em>Style</em></strong>: Paintings often embody the distinctive style of the artist,
                    encompassing aspects such as brushstrokes, lines, colors, and composition.<br>
                    <strong><em>Portrait</em></strong>: An individual’s right to control the use of their own
                    portrait, including facial features, image, and posture.<br>
                    <strong><em>Artistic Creation Figure</em></strong>: Artistic creations, including characters
                    from animations and cartoons, are often protected by law.<br>
                    <strong><em>Licensed Illustration</em></strong>: We have obtained authorization to use a
                    portion of Yi Qu’s artworks in this study.
                </p>
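                <p>
                    For bookkeeping, the four categories above might be represented as an enumeration. The
                    sketch below is hypothetical: the label strings and the <code>category_counts</code> helper
                    are illustrative and are not taken from the released data.
                </p>
                <pre><code>from collections import Counter
from enum import Enum


class Category(Enum):
    STYLE = "style"  # an artist's painting style
    PORTRAIT = "portrait"  # a person's likeness
    ARTISTIC_CREATION = "artistic_creation"  # e.g. animation and cartoon characters
    LICENSED_ILLUSTRATION = "licensed_illustration"  # authorized Yi Qu artworks


def category_counts(labels):
    """Count entries per category; unknown labels raise ValueError."""
    counts = Counter(Category(label) for label in labels)
    return {category.value: counts[category] for category in Category}


if __name__ == "__main__":
    print(category_counts(["style", "portrait", "style"]))
    # {'style': 2, 'portrait': 1, 'artistic_creation': 0, 'licensed_illustration': 0}
</code></pre>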
    
                


            <!-- Reconstruction Results -->
            <hr style="margin-top: 0em" />
            <h2>Experiments</h2>
            <p>
                We tested model unlearning with a <strong>gradient ascent-based approach</strong> and a <strong>weight pruning-based approach</strong>, and used the results to assess the effectiveness of our metric.
            </p>
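            <p>
                The idea behind gradient ascent-based unlearning can be shown on a toy linear model: instead
                of minimizing the loss on the data to be forgotten, the model takes steps that increase it.
                This is a simplified sketch, not the implementation used in these experiments; for a diffusion
                model, the loss would presumably be the denoising loss on the images to be forgotten, and the
                parameters would be the network weights.
            </p>
            <pre><code>import numpy as np


def mse_and_grad(w, x, y):
    """Mean squared error of the linear model x.dot(w) and its gradient in w."""
    residual = x.dot(w) - y
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * x.T.dot(residual) / len(y)
    return loss, grad


def gradient_ascent_unlearn(w, x_forget, y_forget, lr=0.05, steps=20):
    """Move the weights up the loss surface of the forget set."""
    w = w.copy()
    for _ in range(steps):
        _, grad = mse_and_grad(w, x_forget, y_forget)
        w += lr * grad  # "+" instead of "-": ascent rather than descent
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_retain = rng.normal(size=(64, 3))
    x_forget = rng.normal(size=(16, 3))
    y_retain = x_retain.dot(np.array([1.0, -2.0, 0.5]))  # behaviour to keep
    y_forget = x_forget.dot(np.array([3.0, 1.0, -1.0]))  # behaviour to remove
    # "Trained" model: least-squares fit on retain and forget data together.
    x_all = np.vstack([x_retain, x_forget])
    y_all = np.concatenate([y_retain, y_forget])
    w, *_ = np.linalg.lstsq(x_all, y_all, rcond=None)
    w_new = gradient_ascent_unlearn(w, x_forget, y_forget)
    before = mse_and_grad(w, x_forget, y_forget)[0]
    after = mse_and_grad(w_new, x_forget, y_forget)[0]
    print(f"forget-set loss before: {before:.3f}, after: {after:.3f}")
</code></pre>
            <p>
                Because the forget-set loss in this toy is convex, every ascent step increases it. The same
                unbounded ascent can also degrade the retained behaviour, which is why practical variants
                often limit the number of steps or add a loss term on retained data.
            </p>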
            <span class="image center" style="
            max-width: 100%;
            max-height: 50%;
            margin-top: 0.5em;
            margin-bottom: 0;
            border: 0px solid #415161;
            ">
                    <img src="images/illustration_1.jpg" width="100%" alt="" />
                </span>
            <p>
                Experimental results of model unlearning. Each row shows one illustration sample; from left
                to right: the original image, the image generated by Stable Diffusion after fine-tuning, the
                image generated by SD-v2.1, and the images generated after unlearning with the gradient ascent
                and pruning methods.
            </p>
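            <p>
                Weight pruning-based unlearning removes the parameters most responsible for the content to be
                forgotten. One common way to rank them, assumed here for illustration and not necessarily the
                criterion used in these experiments, is the saliency |w · g|, where g is the gradient of the
                forget-set loss with respect to the weight w. The sketch below zeroes the top-k weights of a
                single parameter array; <code>prune_for_unlearning</code> is a hypothetical helper.
            </p>
            <pre><code>import numpy as np


def prune_for_unlearning(weights, forget_grad, k):
    """Zero the k weights with the largest saliency |w * g| on the forget set.

    forget_grad is the gradient of the forget-set loss with respect to
    weights. The saliency criterion is an illustrative choice.
    """
    saliency = np.abs(weights * forget_grad)
    top = np.argsort(saliency, axis=None)[::-1][:k]  # flat indices, most salient first
    pruned = weights.copy()
    pruned.flat[top] = 0.0
    return pruned


if __name__ == "__main__":
    w = np.array([[0.5, -2.0], [1.0, 0.1]])
    g = np.array([[1.0, 0.5], [3.0, 0.2]])  # forget-set gradient
    print(prune_for_unlearning(w, g, k=2))
</code></pre>
            <p>
                In a full model this would typically be applied per layer or with a global budget k across
                layers; both are design choices that the sketch leaves open.
            </p>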

            <!-- Reconstruction Results -->
            <hr style="margin-top: 0em" />
            <h2>SfA Interaction and Perception Pipeline</h2>
            <p>
                The video below demonstrates
                SfA's ability to infer informative multi-step interactions for an articulated object
                and to generate the object's articulated 3D CAD model over time. By first inferring push
                and hold actions, and then inferring the part reconstructions and joint parameters, SfA
                constructs the full 3D articulated CAD model in three steps.
            </p>

            <video autoplay="" loop="" muted="" style="width: 80%; margin-left: 10%;">
                <source src="videos/sim_sequence.mp4" type="video/mp4">
            </video>

            <!-- <hr /> -->


            <!-- Reconstruction Results -->
            <hr style="margin-top: 0em" />
            <h2>3D Articulated CAD Models Results</h2>

            <center>
                <img style="width:100%" src="images/sfa-fig-sequence-results-sim-cvpr-version.png" alt="" />
            </center>

            <p>
                At the top, we show the step-by-step results of the SfA pipeline. The inferred actions prioritize
                discovering new parts and exposing articulations. Below, we show SfA 3D reconstruction results
                on unseen objects with different shapes, sizes, and kinematic structures.
                The pipeline handles both large (furniture) and small (scissors) objects, as well as prismatic
                (drawers, pots) and revolute (microwave, chair) joints. Our method outperforms Ditto
                on both part reconstruction and joint estimation (revolute: red, prismatic: blue).
            </p>

            <p>
                We show the complete and animated 3D articulated CAD model from the SfA pipeline.
            </p>

            <video autoplay="" loop="" muted="" style="width: 90%; margin-left: 10%;">
                <source src="videos/results-1.mp4" type="video/mp4">
            </video>

            <video autoplay="" loop="" muted="" style="width: 90%; margin-left: 10%;">
                <source src="videos/results-2.mp4" type="video/mp4">
            </video>

            
            <hr style="margin-top: 0em" />
            <h3>Paper</h3>

            <!-- <hr/> -->
            <p style="margin-bottom: 1em">
                Latest version:
                <a href="http://arxiv.org/abs/2207.08997">arXiv: [cs.CV]</a>
                or <a href="paper.pdf">here</a>
            </p>

            <div class="12u$">
                <a href="https://arxiv.org/abs/2207.08997"><span class="image fit"
                        style="border: 1px solid; border-color: #888888"><img src="images/paper-thumbnail.jpg"
                            alt="" /></span></a>
            </div>

            <p style="margin-bottom: 1em">
                Code and instructions will be made available.
                <!-- <a href="https://github.com/columbia-ai-robotics/SfA">here</a>. -->
            </p>

            <hr style="margin-top: 0em" />

            <h2>Team</h2>

            <section>
                <div class="box alt" style="margin-bottom: 1em">
                    <div class="row 50% uniform" style="width: 90%">
                        <div class="2u" style="
                    font-size: 0.7em;
                    line-height: 1.5em;
                    text-align: center;
                    ">
                            <a href="https://www.neilnie.com/">
                                <span class="image fit" style="margin-bottom: 0.5em">
                                    <img src="images/neil-thumbnail.jpg" alt="" style="border-radius: 50%" />
                                </span>Neil Nie<sup>1</sup>
                            </a>
                        </div>

                        <div class="2u" style="
            font-size: 0.7em;
            line-height: 1.5em;
            text-align: center;
            ">
                            <a href="https://sagadre.github.io/">
                                <span class="image fit" style="margin-bottom: 0.5em">
                                    <img src="images/samir-thumbnail.jpg" alt="" style="border-radius: 50%" />
                                </span>Samir Yitzhak Gadre <sup>1</sup>
                            </a>
                        </div>

                        <div class="2u" style="
    font-size: 0.7em;
    line-height: 1.5em;
    text-align: center;
    ">
                            <a href="https://sites.google.com/view/ehsanik-personal-website">
                                <span class="image fit" style="margin-bottom: 0.5em">
                                    <img src="images/kiana-thumbnail.jpg" alt="" style="border-radius: 50%" />
                                </span>Kiana Ehsani<sup>2</sup>
                            </a>
                        </div>
                        <div class="2u" style="
font-size: 0.7em;
line-height: 1.5em;
text-align: center;
">
                            <a href="https://www.cs.columbia.edu/~shurans/"><span class="image fit"
                                    style="margin-bottom: 0.5em"><img src="images/shuran-thumbnail.jpg" alt=""
                                        style="border-radius: 50%" /></span>Shuran Song <sup>1</sup></a>
                        </div>
                    </div>
                </div>
            </section>
            <sup>1</sup> Columbia
            University&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<sup>2</sup>
            Allen Institute for AI

            <hr style="margin-top: 1em" />
            <h3>Acknowledgements</h3>
            <p>
                This work was supported in part by the National Science Foundation under 2143601, 2037101, and 2132519.
                Thank you Cheng Chi, Huy Ha, Zhenjia Xu, Zeyi Liu, and other colleagues of the CAIR lab for your
                valuable feedback and support. Thanks to Cheng Chi and Zhenjia Xu for your help with
                the UR5 robot experiments.
                We would like to thank Google for the UR5 robot hardware.
            </p>

            <hr style="margin-top: 1em">
            <h3>Contact</h3>
            <p>
                If you have any questions, please feel free to contact
                <a href="mailto:neil.nie@columbia.edu">Neil</a>
            </p>
            <hr />

            <div class="row" style="margin-top: 1em">
                <div class="12u$ 12u$(xsmall)">
                    <h3>Bibtex</h3>
                    <pre><code>@article{nie2022sfa,
  title={Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery},
  author={Nie, Neil and Gadre, Samir Yitzhak and Ehsani, Kiana and Song, Shuran},
  journal={arXiv},
  year={2022}
}</code></pre>
                </div>
            </div>

        </section>
    </div>

    <!-- Footer -->
    <footer id="footer">
        <div class="inner">
            <ul class="copyright">
                <li>
                    Meet
                    <a href="https://en.wikipedia.org/wiki/Danbo_(character)">Danbo</a>
                    the cardboard robot.
                </li>
            </ul>
        </div>
    </footer>

    <!-- Scripts -->
    <script src="assets/js/jquery.min.js"></script>
    <script src="assets/js/jquery.poptrox.min.js"></script>
    <script src="assets/js/skel.min.js"></script>
    <script src="assets/js/util.js"></script>
    <!--[if lte IE 8
        ]><script src="assets/js/ie/respond.min.js"></script
            ><![endif]-->
    <script src="assets/js/main.js"></script>
</body>


</html>