
GSoC 2024

Nikita Manovich edited this page Feb 12, 2024 · 25 revisions

CVAT Google Summer of Code 2024

GSoC 2024 Homepage

CVAT accepted projects


Date | Description | Comment
February 6, 2024 | Mentoring organization application deadline | 👍

Resources

CVAT project ideas list

Mailing list for discussion: cvat-gsoc-2024 mailing list

Index to Ideas Below

  1. Load and visualize 16-bit medical images
  2. Keyboard shortcuts customization
  3. Quality control: consensus
  4. Quality control: honeypot
  5. Internationalization and localization

Idea Template

All work is in Python and TypeScript unless otherwise noted.


Ideas

  1. IDEA: Load and visualize 16-bit medical images

    • Description: Digital projection X-ray images in DICOM use more than 8 bits per pixel and are therefore encoded in two bytes, even if not all 16 bits are used. CVAT currently converts 16-bit images to 8-bit. For medical images this loses important information and makes it impossible to annotate such data efficiently: to annotate these images, a doctor needs to adjust the contrast of some regions manually.
    • Expected Outcomes:
      • Upload digital projection X-ray in DICOM and convert it to 16-bit PNG.
      • Visualize 16-bit PNG image in the browser using WebGL.
      • Implement brightness, inversion, contrast, and saturation adjustments using WebGL.
      • Import/Export datasets in CVAT format.
      • Add functional tests and documentation
    • Resources:
    • Skills Required: Python, TypeScript, WebGL
    • Possible Mentors: Boris Sekachev
    • Difficulty: Hard
    • Duration: 350 hours
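
The information loss described in idea 1 can be illustrated with a minimal Python sketch. This is not CVAT code: the function names and the simplified linear window are assumptions made for illustration only, and the idea itself proposes doing such adjustments in WebGL in the browser.

```python
def naive_to_8bit(value16: int) -> int:
    """Keep only the high byte: 256 neighbouring 16-bit levels collapse into one."""
    return value16 >> 8


def window_to_8bit(value16: int, center: float, width: float) -> int:
    """Map [center - width/2, center + width/2] linearly onto 0..255.

    Hypothetical, simplified contrast window; values outside it are clamped.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    low = center - width / 2
    scaled = (value16 - low) / width * 255
    return int(round(min(255.0, max(0.0, scaled))))


# Two distinct tissue intensities in a 16-bit image (made-up sample values).
a, b = 1030, 1200
same_after_naive = naive_to_8bit(a) == naive_to_8bit(b)
windowed = (window_to_8bit(a, center=1100, width=400),
            window_to_8bit(b, center=1100, width=400))
```

With the naive conversion both sample values land on the same 8-bit level, while the window keeps them clearly apart. This is why the original 16-bit data has to reach the client for contrast adjustment to be useful.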
  2. IDEA: Keyboard shortcuts customization

    • Description: In many cases, annotating data quickly requires effective use of the mouse, keyboard, and other input devices. One way to achieve this is to customize keyboard shortcuts and adapt them to a specific use case. For example, if a task has several labels, it can be important to assign a shortcut to each label so that users can switch between labels quickly and annotate faster. Other users want to lock or unlock an object quickly.
    • Expected Outcomes:
      • It should be possible to configure shortcuts in settings and save them per user.
      • Add functional tests and documentation
    • Resources:
    • Skills Required: TypeScript, React
    • Possible Mentors: Maria Khrustaleva, Kirill Lakhov
    • Difficulty: Medium
    • Duration: 175 hours
  3. IDEA: Quality control: consensus

    • Description: If you use a crowd to annotate images, the easiest way to get high-quality annotations for a task is to have the same image annotated multiple times. You can then compare labels from multiple annotators to produce a high-quality result. Suppose you are trying to estimate people's ages. The task is very subjective, and an averaged answer from multiple annotators can give a more precise age estimate for a person.
    • Expected Outcomes:
      • It should be possible to create multiple jobs for the same segment of images (https://github.com/opencv/cvat/issues/125)
      • Support a number of built-in algorithms to merge annotations for a segment: voting, averaging, raw (put all annotations as is)
      • Add functional tests and documentation
    • Resources:
    • Skills Required: Python, Django
    • Possible Mentors: Maxim Zhiltsov, Maria Khrustaleva
    • Difficulty: Medium
    • Duration: 350 hours
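
The three merge strategies named in idea 3 (voting, averaging, raw) can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names are hypothetical, annotations are reduced to plain labels or numbers, and real CVAT annotations (boxes, polygons, masks) would first need to be matched between annotators.

```python
from collections import Counter
from statistics import mean


def merge_by_voting(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def merge_by_averaging(values):
    """Return the arithmetic mean of numeric answers (e.g. estimated ages)."""
    return mean(values)


def merge_raw(annotations):
    """Keep every annotation as is, without resolving disagreements."""
    return list(annotations)


voted = merge_by_voting(["car", "truck", "car"])
averaged_age = merge_by_averaging([30, 34, 38])
```

Voting suits categorical answers such as class labels, while averaging suits numeric attributes such as the age example above.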
  4. IDEA: Internationalization and localization

    • Description: Typical users of CVAT are data annotators from different countries who may not have a good knowledge of English. It is very difficult for them to work with a tool that cannot show messages and hints in their native language. The goal of internationalization and localization is to allow a single web application to offer its content in languages and formats tailored to the audience.
    • Expected Outcomes:
      • CVAT supports at least one more language. It should be easy for a non-technical person to add a new language.
      • It should be possible to choose a language in UI (e.g., en/fr).
      • Add functional tests and documentation
    • Resources:
    • Skills Required: Python, TypeScript
    • Possible Mentors: Andrey Zhavoronkov, Kirill Lakhov
    • Difficulty: Hard
    • Duration: 350 hours
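
One common way to make adding a language easy is to keep messages in per-language catalogs and fall back to a default language when a translation is missing. The sketch below assumes this approach; the catalog contents, keys, and the `translate` function are hypothetical and do not describe CVAT's code.

```python
# Hypothetical message catalogs; in practice these would live in editable files.
CATALOGS = {
    "en": {"save": "Save", "cancel": "Cancel"},
    "fr": {"save": "Enregistrer"},  # "cancel" is not translated yet
}


def translate(key: str, language: str, default_language: str = "en") -> str:
    """Look up a message, falling back to the default language, then to the key."""
    for lang in (language, default_language):
        message = CATALOGS.get(lang, {}).get(key)
        if message is not None:
            return message
    return key


french_save = translate("save", "fr")
french_cancel = translate("cancel", "fr")  # falls back to English
```

The fallback means a partially translated catalog never leaves the user with an empty label, so translators can contribute incrementally.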
  5. IDEA: Enhanced multi-object tracking

    • Description: CVAT (Computer Vision Annotation Tool) supports tracks, that is, annotation objects that follow a target across a range of frames (e.g. a person walking in a video). It would be useful to develop a feature that tracks a segmentation mask automatically using modern deep learning approaches. Currently the tool supports only single-object trackers, so running a tracker for many objects takes a lot of time. Moreover, it supports only bounding boxes and cannot be used for more complex shapes (e.g. polygons or binary masks).
    • Expected Outcomes:
      • A user uploads a video to CVAT and starts automatic tracking through the user interface (by drawing a bounding box or polygon around the object, or by pressing a dedicated button). A server-side algorithm performs tracking across multiple frames and returns the result to the client, which speeds up labeling significantly.
    • Resources:
    • Skills Required: Python, Computer Vision, Neural Networks, TypeScript
    • Possible Mentors: Boris Sekachev, Nikita Manovich
    • Difficulty: Medium
    • Duration: 175 hours
  6. IDEA: Annotate everything automatically

    • Description: The idea is to obtain instance segmentation for an image automatically for a wide range of classes. This may be achieved with state-of-the-art deep learning approaches (e.g. a combination of Grounding DINO and Segment Anything). These models could be integrated into CVAT to provide a powerful automatic annotation feature, allowing data researchers to annotate faster.
    • Expected Outcomes:
      • A user uploads a set of images to CVAT. For a given image, the user can give a text prompt to the model or just click a button in the user interface to get automatic predictions. The deep learning model runs on a server with a GPU.
    • Resources:
    • Skills Required: Python, Computer Vision, Neural Networks, TypeScript
    • Possible Mentors: Boris Sekachev, Nikita Manovich
    • Difficulty: Medium
    • Duration: 175 hours
  7. IDEA: API keys and token-based auth for SDK/CLI

    • Description: Currently, the only official way to authorize in the SDK/CLI is to provide your username and password in the requests. This approach works, but it has security issues. The idea is to let a user generate and manage API access keys. Such a key could be used as a replacement for the login/password pair.
    • Expected Outcomes:
      • Users can generate API access tokens in the account settings in UI
      • Users can revoke existing API access tokens in the account settings
      • Users can call API endpoints providing API access tokens
      • A token can be stored in the user profile files on their computer
      • A token can be used for auth in SDK/CLI
    • Skills Required: Python, Django, TypeScript, React
    • Possible Mentors: Maxim Zhiltsov, Roman Donchenko, Andrey Zhavoronkov
    • Difficulty: Medium
    • Duration: 175 hours
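
The generate, authenticate, and revoke lifecycle from idea 7 can be sketched with the Python standard library. Everything here is an assumption for illustration: `TokenStore` is a hypothetical in-memory stand-in for a server-side table, and storing only a SHA-256 digest of each token is one possible design choice, not something the idea prescribes.

```python
import hashlib
import secrets


class TokenStore:
    """Hypothetical in-memory registry of API tokens (not CVAT code)."""

    def __init__(self):
        # Maps the SHA-256 hex digest of a token to its owner's username,
        # so the plain token is never kept on the server side.
        self._owners = {}

    @staticmethod
    def _digest(token: str) -> str:
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, username: str) -> str:
        """Create a random token, remember its digest, return it once."""
        token = secrets.token_urlsafe(32)
        self._owners[self._digest(token)] = username
        return token

    def authenticate(self, token: str):
        """Return the owner's username, or None for unknown/revoked tokens."""
        return self._owners.get(self._digest(token))

    def revoke(self, token: str) -> bool:
        """Forget the token; return True if it existed."""
        return self._owners.pop(self._digest(token), None) is not None


store = TokenStore()
token = store.issue("alice")
```

A client would then send the token instead of a password. Because each token is random and individually revocable, a leaked token can be disabled without changing the account password.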
  8. IDEA: Extended annotation quality reporting

    • Description: CVAT has basic annotation quality reporting, but it can be extended in many ways, on both the UI and server sides. A short list includes: quality reporting for projects, more settings for the checks, task-specific metrics (e.g. pixel accuracy for segmentation), better visualization (displaying the confusion matrix in the UI), issue filters for simpler navigation between the problems found, and clearer display of problems in the annotation review mode.
    • Expected Outcomes:
      • Confusion matrix from a quality report can be shown in UI
      • Issue filters are available in the task review mode
      • Quality reporting for projects is available
      • Other possible improvements
    • Skills Required: Python, Django, TypeScript, React
    • Possible Mentors: Maxim Zhiltsov, Boris Sekachev, Kirill Lakhov
    • Difficulty: Medium
    • Duration: 350 hours
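
Two of the metrics mentioned in idea 8, the confusion matrix and pixel accuracy, can be illustrated with a short Python sketch. The function names are hypothetical and the inputs are simplified to flat lists of class labels (one per matched object or per pixel). CVAT's actual quality reports are not modeled here.

```python
from collections import Counter


def confusion_matrix(gt_labels, pred_labels):
    """Count (ground truth, prediction) pairs; keys are label tuples."""
    if len(gt_labels) != len(pred_labels):
        raise ValueError("inputs must have the same length")
    return Counter(zip(gt_labels, pred_labels))


def pixel_accuracy(gt_pixels, pred_pixels):
    """Fraction of positions where the predicted class equals the ground truth."""
    if len(gt_pixels) != len(pred_pixels) or not gt_pixels:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(g == p for g, p in zip(gt_pixels, pred_pixels))
    return correct / len(gt_pixels)


gt = ["cat", "cat", "dog", "dog"]
pred = ["cat", "dog", "dog", "dog"]
matrix = confusion_matrix(gt, pred)
accuracy = pixel_accuracy(gt, pred)
```

Off-diagonal entries such as `("cat", "dog")` show which classes annotators confuse, which is the kind of detail a UI view of the matrix would surface.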

Idea Template

1. #### _IDEA:_ <Descriptive Title>
   * ***Description:*** 3-7 sentences describing the task
   * ***Expected Outcomes:***
      * < Short bullet list describing what is to be accomplished >
      * <i.e. create a new module called "bla bla">
      * < Has method to accomplish X >
      * <...>
   * ***Resources:***
      * [For example a paper citation](https://arxiv.org/pdf/1802.08091.pdf)
      * [For example an existing feature request](https://github.com/opencv/cvat/pull/5608)
      * [Possibly an existing related module](https://github.com/opencv/cvat/tree/develop/cvat/apps/opencv) that includes OpenCV JavaScript library.
   * ***Skills Required:*** < for example mastery plus experience coding in Python, college course work in vision that covers AI topics, python. Best if you have also worked with deep neural networks. >
   * ***Possible Mentors:*** < your name goes here >
   * ***Difficulty:*** <Easy, Medium, Hard>
   * ***Duration:*** <90, 175 or 350 hours>

Potential mentors list

Nikita Manovich
Boris Sekachev
Maxim Zhiltsov
Roman Donchenko
Andrey Zhavoronkov
Maria Khrustaleva
Kirill Lakhov

Admins

Nikita Manovich
Boris Sekachev 