Changes from all commits (44 commits)
e4fd630
chore: PoC + ipynb
rishisurana-labelbox Sep 3, 2025
dbcc7bf
chore: use ms instead of s in sdk interface
rishisurana-labelbox Sep 8, 2025
dbb592f
:art: Cleaned
github-actions[bot] Sep 8, 2025
ff298d4
:memo: README updated
github-actions[bot] Sep 8, 2025
16896fd
chore: it works for temporal text/radio/checklist classifications
rishisurana-labelbox Sep 11, 2025
7a666cc
chore: clean up and organize code
rishisurana-labelbox Sep 11, 2025
ac58ad0
chore: update tests fail and documentation update
rishisurana-labelbox Sep 11, 2025
67dd14a
:art: Cleaned
github-actions[bot] Sep 11, 2025
a1600e5
:memo: README updated
github-actions[bot] Sep 11, 2025
b4d2f42
chore: improve imports
rishisurana-labelbox Sep 11, 2025
fadb14e
chore: restore py version
rishisurana-labelbox Sep 11, 2025
1e12596
chore: restore py version
rishisurana-labelbox Sep 11, 2025
c2a7b4c
chore: cleanup
rishisurana-labelbox Sep 12, 2025
26a35fd
chore: lint
rishisurana-labelbox Sep 12, 2025
b16f2ea
fix: failing build issue due to lint
rishisurana-labelbox Sep 12, 2025
943cb73
chore: simplify
rishisurana-labelbox Sep 19, 2025
a838513
chore: update examples - all tests passing
rishisurana-labelbox Sep 19, 2025
0ca9cd6
chore: use start frame instead of frame
rishisurana-labelbox Sep 22, 2025
7861537
chore: remove audio object annotation
rishisurana-labelbox Sep 22, 2025
6c3c50a
chore: change class shape for text and radio/checklist
rishisurana-labelbox Sep 22, 2025
68773cf
chore: stan comments
rishisurana-labelbox Sep 25, 2025
58b30f7
chore: top level + nested working
rishisurana-labelbox Sep 26, 2025
0a63def
feat: nested class for temporal annotations support
rishisurana-labelbox Sep 29, 2025
538ba66
chore: revert old change
rishisurana-labelbox Sep 29, 2025
9675c73
chore: update tests
rishisurana-labelbox Sep 29, 2025
327800b
chore: clean up and track test files
rishisurana-labelbox Sep 29, 2025
1174ad8
chore: update audio.ipynb to reflect breadth of use cases
rishisurana-labelbox Sep 29, 2025
2361ca3
chore: cursor reported bug
rishisurana-labelbox Sep 29, 2025
59f0cd8
chore: extract generic temporal nested logic
rishisurana-labelbox Sep 29, 2025
b186359
chore: update temporal logic to be 1:1 with v3 script
rishisurana-labelbox Sep 30, 2025
e63b306
chore: simplifiy drastically
rishisurana-labelbox Sep 30, 2025
6b54e26
chore: works perfectly
rishisurana-labelbox Sep 30, 2025
ccad765
:art: Cleaned
github-actions[bot] Sep 30, 2025
735bb09
:memo: README updated
github-actions[bot] Sep 30, 2025
db3fb5e
chore: update audio.ipynb
rishisurana-labelbox Sep 30, 2025
b0d5ee4
:art: Cleaned
github-actions[bot] Sep 30, 2025
1266338
chore: drastically simplify
rishisurana-labelbox Oct 1, 2025
66e4c44
chore: lint
rishisurana-labelbox Oct 1, 2025
471c618
chore: new new interface
rishisurana-labelbox Oct 2, 2025
478fb23
chore: final nail; interface is simple and works with frame arg
rishisurana-labelbox Oct 3, 2025
82e90e1
chore: lint
rishisurana-labelbox Oct 3, 2025
fb8df4a
:art: Cleaned
github-actions[bot] Oct 3, 2025
f202586
chore: revert init py file
rishisurana-labelbox Oct 3, 2025
1e424ef
chore: new new new interface for tempral classes
rishisurana-labelbox Oct 3, 2025
Files changed
168 changes: 84 additions & 84 deletions examples/README.md

Large diffs are not rendered by default.

34 changes: 33 additions & 1 deletion examples/annotation_import/audio.ipynb
@@ -170,7 +170,7 @@
},
{
"metadata": {},
"source": "ontology_builder = lb.OntologyBuilder(classifications=[\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"text_audio\"),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_audio\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_audio\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n])\n\nontology = client.create_ontology(\n \"Ontology Audio Annotations\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Audio,\n)",
"source": "ontology_builder = lb.OntologyBuilder(classifications=[\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"text_audio\"),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_audio\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_audio\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n # Temporal classification for token-level annotations\n lb.Classification(\n class_type=lb.Classification.Type.TEXT,\n name=\"User Speaker\",\n scope=lb.Classification.Scope.INDEX, # INDEX scope for temporal\n ),\n])\n\nontology = client.create_ontology(\n \"Ontology Audio Annotations\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Audio,\n)",
"cell_type": "code",
"outputs": [],
"execution_count": null
@@ -252,6 +252,31 @@
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"## Temporal Audio Annotations\n",
"\n",
"You can create temporal annotations for individual tokens (words) with precise timing.\n",
"\n",
"Additionally, you can create **nested temporal annotations** with hierarchical classifications at different frame ranges.\n"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Define tokens with precise timing (from demo script)\ntokens_data = [\n (\"Hello\", 586, 770), # Hello: frames 586-770\n (\"AI\", 771, 955), # AI: frames 771-955\n (\"how\", 956, 1140), # how: frames 956-1140\n (\"are\", 1141, 1325), # are: frames 1141-1325\n (\"you\", 1326, 1510), # you: frames 1326-1510\n (\"doing\", 1511, 1695), # doing: frames 1511-1695\n (\"today\", 1696, 1880), # today: frames 1696-1880\n]\n\n# Create temporal annotations for each token\ntemporal_annotations = []\nfor token, start_frame, end_frame in tokens_data:\n token_annotation = lb_types.AudioClassificationAnnotation(\n frame=start_frame,\n end_frame=end_frame,\n name=\"User Speaker\",\n value=lb_types.Text(answer=token),\n )\n temporal_annotations.append(token_annotation)\n\nprint(f\"Created {len(temporal_annotations)} temporal token annotations\")",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": "# Create label with both regular and temporal annotations\nlabel_with_temporal = []\nlabel_with_temporal.append(\n lb_types.Label(\n data={\"global_key\": global_key},\n annotations=[text_annotation, checklist_annotation, radio_annotation] +\n temporal_annotations,\n ))\n\nprint(\n f\"Created label with {len(label_with_temporal[0].annotations)} total annotations\"\n)\nprint(\" - Regular annotations: 3\")\nprint(f\" - Temporal annotations: {len(temporal_annotations)}\")",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
@@ -260,6 +285,13 @@
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Upload temporal annotations via MAL\ntemporal_upload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=f\"temporal_mal_job-{str(uuid.uuid4())}\",\n predictions=label_with_temporal,\n)\n\ntemporal_upload_job.wait_until_done()\nprint(\"Temporal upload completed!\")\nprint(\"Errors:\", temporal_upload_job.errors)\nprint(\"Status:\", temporal_upload_job.statuses)",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": "# Upload our label using Model-Assisted Labeling\nupload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=f\"mal_job-{str(uuid.uuid4())}\",\n predictions=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
libs/labelbox/src/labelbox/data/annotation_types/__init__.py
@@ -19,6 +19,10 @@
from .video import MaskInstance
from .video import VideoMaskAnnotation

from .temporal import TemporalClassificationText
from .temporal import TemporalClassificationQuestion
from .temporal import TemporalClassificationAnswer

from .ner import ConversationEntity
from .ner import DocumentEntity
from .ner import DocumentTextSelection
@@ -28,6 +32,7 @@
from .classification import ClassificationAnswer
from .classification import Radio
from .classification import Text
from .classification import FrameLocation

from .data import GenericDataRowData
from .data import MaskData
libs/labelbox/src/labelbox/data/annotation_types/classification/__init__.py
@@ -1 +1 @@
from .classification import Checklist, ClassificationAnswer, Radio, Text
from .classification import Checklist, ClassificationAnswer, Radio, Text, FrameLocation
libs/labelbox/src/labelbox/data/annotation_types/classification/classification.py
@@ -7,6 +7,12 @@
from ..feature import FeatureSchema


class FrameLocation(BaseModel):
"""Represents a temporal frame range with start and end times (in milliseconds)."""
start: int
end: int


class ClassificationAnswer(FeatureSchema, ConfidenceMixin, CustomMetricsMixin):
"""
- Represents a classification option.
@@ -17,11 +23,16 @@ class ClassificationAnswer(FeatureSchema, ConfidenceMixin, CustomMetricsMixin):
Each answer can have a keyframe independent of the others.
So unlike object annotations, classification annotations
track keyframes at a classification answer level.

- For temporal classifications (audio/video), optional frames can specify
one or more time ranges for this answer. Must be within root annotation's frame ranges.
Defaults to root frame ranges if not specified.
"""

extra: Dict[str, Any] = {}
keyframe: Optional[bool] = None
classifications: Optional[List["ClassificationAnnotation"]] = None
frames: Optional[List[FrameLocation]] = None


class Radio(ConfidenceMixin, CustomMetricsMixin, BaseModel):
@@ -69,8 +80,11 @@ class ClassificationAnnotation(
classifications (Optional[List[ClassificationAnnotation]]): Optional sub classification of the annotation
feature_schema_id (Optional[Cuid])
value (Union[Text, Checklist, Radio])
frames (Optional[List[FrameLocation]]): Frame ranges for temporal classifications (audio/video). Must be within root annotation's frame ranges. Defaults to root frames if not specified.
extra (Dict[str, Any])
"""

value: Union[Text, Checklist, Radio]
message_id: Optional[str] = None
frames: Optional[List[FrameLocation]] = None
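To make the new fields concrete, here is a minimal sketch of a temporal radio classification using FrameLocation (the "speaker"/"user" names are hypothetical, and the lb_types import assumes the usual labelbox.types re-export of these classes):

```python
import labelbox.types as lb_types

# A radio classification whose chosen answer is active on two
# discontinuous time ranges; FrameLocation start/end are milliseconds.
# Answer-level frames must stay within the root annotation's ranges
# and default to them when omitted.
speaker = lb_types.ClassificationAnnotation(
    name="speaker",  # hypothetical classification name
    value=lb_types.Radio(
        answer=lb_types.ClassificationAnswer(
            name="user",  # hypothetical answer option
            frames=[
                lb_types.FrameLocation(start=200, end=1600),
                lb_types.FrameLocation(start=2000, end=2400),
            ],
        )
    ),
    frames=[lb_types.FrameLocation(start=0, end=2400)],  # root range
)
```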

38 changes: 36 additions & 2 deletions libs/labelbox/src/labelbox/data/annotation_types/label.py
@@ -13,6 +13,10 @@
from .metrics import ScalarMetric, ConfusionMatrixMetric
from .video import VideoClassificationAnnotation
from .video import VideoObjectAnnotation, VideoMaskAnnotation
from .temporal import (
TemporalClassificationText,
TemporalClassificationQuestion,
)
from .mmc import MessageEvaluationTaskAnnotation
from pydantic import BaseModel, field_validator

@@ -44,6 +48,8 @@ class Label(BaseModel):
ClassificationAnnotation,
ObjectAnnotation,
VideoMaskAnnotation,
TemporalClassificationText,
TemporalClassificationQuestion,
ScalarMetric,
ConfusionMatrixMetric,
RelationshipAnnotation,
@@ -75,15 +81,43 @@ def _get_annotations_by_type(self, annotation_type):

def frame_annotations(
self,
) -> Dict[str, Union[VideoObjectAnnotation, VideoClassificationAnnotation]]:
) -> Dict[
int,
Union[
VideoObjectAnnotation,
VideoClassificationAnnotation,
TemporalClassificationText,
TemporalClassificationQuestion,
],
]:
Bug: Missing List Wrapper in frame_annotations Return Type

The return type is annotated as Dict[int, Union[...]], but the method builds a defaultdict(list) and appends annotations, so it actually returns Dict[int, List[Union[VideoObjectAnnotation, VideoClassificationAnnotation, TemporalClassificationText, TemporalClassificationQuestion]]]. The docstring below already describes the list-valued mapping correctly.
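A minimal sketch of the corrected signature, wrapping the union in List to match the defaultdict(list) implementation (body elided; assumes Dict, List, and Union are imported from typing as elsewhere in this file):

```python
def frame_annotations(
    self,
) -> Dict[
    int,
    List[
        Union[
            VideoObjectAnnotation,
            VideoClassificationAnnotation,
            TemporalClassificationText,
            TemporalClassificationQuestion,
        ]
    ],
]:
    ...
```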

"""Get temporal annotations organized by frame

Returns:
Dict[int, List]: Dictionary mapping frame (milliseconds) to list of temporal annotations

Example:
>>> label.frame_annotations()
{2500: [VideoClassificationAnnotation(...), TemporalClassificationText(...)]}

Note:
TemporalClassificationText/Question annotations are keyed by the start of their first frame range; a single annotation may span multiple discontinuous ranges.
"""
frame_dict = defaultdict(list)
for annotation in self.annotations:
if isinstance(
annotation,
(VideoObjectAnnotation, VideoClassificationAnnotation),
):
frame_dict[annotation.frame].append(annotation)
return frame_dict
elif isinstance(annotation, (TemporalClassificationText, TemporalClassificationQuestion)):
# For temporal annotations with multiple values/answers, use first frame
if isinstance(annotation, TemporalClassificationText) and annotation.value:
frame_dict[annotation.value[0][0]].append(annotation) # value[0][0] is start_frame
elif isinstance(annotation, TemporalClassificationQuestion) and annotation.value:
if annotation.value[0].frames:
frame_dict[annotation.value[0].frames[0][0]].append(annotation) # frames[0][0] is start_frame
return dict(frame_dict)
Bug: Audio Annotations Use Invalid Keys

The frame_annotations method uses annotation.start_frame as a dictionary key for AudioClassificationAnnotation. Since start_frame is Optional[int] and can be None, this can result in None keys in the returned dictionary. This conflicts with the Dict[int, ...] return type annotation and may cause unexpected behavior for consumers expecting integer frame keys.


def add_url_to_masks(self, signer) -> "Label":
"""
194 changes: 194 additions & 0 deletions libs/labelbox/src/labelbox/data/annotation_types/temporal.py
@@ -0,0 +1,194 @@
"""
Temporal classification annotations for audio, video, and other time-based media.

These classes provide a unified, recursive structure for temporal annotations with
frame-level precision. All temporal classifications support nested hierarchies.
"""

from typing import List, Optional, Tuple, Union
from pydantic import Field

from labelbox.data.annotation_types.annotation import ClassificationAnnotation
from labelbox.data.annotation_types.classification.classification import (
ClassificationAnswer,
FrameLocation,
)


class TemporalClassificationAnswer(ClassificationAnswer):
"""
Temporal answer for Radio/Checklist questions with frame ranges.

Represents a single answer option that can exist at multiple discontinuous
time ranges and contain nested classifications.

Args:
name (str): Name of the answer option
frames (List[Tuple[int, int]]): List of (start_frame, end_frame) ranges in milliseconds
classifications (Optional[List[Union[TemporalClassificationText, TemporalClassificationQuestion]]]):
Nested classifications within this answer
feature_schema_id (Optional[str]): Feature schema identifier
extra (dict): Additional metadata

Example:
>>> # Radio answer with nested classifications
>>> answer = TemporalClassificationAnswer(
>>> name="user",
>>> frames=[(200, 1600)],
>>> classifications=[
>>> TemporalClassificationQuestion(
>>> name="tone",
>>> value=[
>>> TemporalClassificationAnswer(
>>> name="professional",
>>> frames=[(1000, 1600)]
>>> )
>>> ]
>>> )
>>> ]
>>> )
"""

frames: List[Tuple[int, int]] = Field(
default_factory=list,
description="List of (start_frame, end_frame) tuples in milliseconds",
)
classifications: Optional[
List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None


class TemporalClassificationText(ClassificationAnnotation):
"""
Temporal text classification with multiple text values at different frame ranges.

Allows multiple text annotations at different time segments, each with precise
frame ranges. Supports recursive nesting of text and question classifications.

Args:
name (str): Name of the text classification
value (List[Tuple[int, int, str]]): List of (start_frame, end_frame, text_value) tuples
classifications (Optional[List[Union[TemporalClassificationText, TemporalClassificationQuestion]]]):
Nested classifications
feature_schema_id (Optional[str]): Feature schema identifier
extra (dict): Additional metadata

Example:
>>> # Simple text with multiple temporal values
>>> transcription = TemporalClassificationText(
>>> name="transcription",
>>> value=[
>>> (1600, 2000, "Hello, how can I help you?"),
>>> (2500, 3000, "Thank you for calling!"),
>>> ]
>>> )
>>>
>>> # Text with nested classifications
>>> transcription_with_notes = TemporalClassificationText(
>>> name="transcription",
>>> value=[
>>> (1600, 2000, "Hello, how can I help you?"),
>>> ],
>>> classifications=[
>>> TemporalClassificationText(
>>> name="speaker_notes",
>>> value=[
>>> (1600, 2000, "Polite greeting"),
>>> ]
>>> )
>>> ]
>>> )
"""

# Override parent's value field
value: List[Tuple[int, int, str]] = Field(
default_factory=list,
description="List of (start_frame, end_frame, text_value) tuples",
)
classifications: Optional[
List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None


class TemporalClassificationQuestion(ClassificationAnnotation):
"""
Temporal Radio/Checklist question with multiple answer options.

Represents a question with one or more answer options, each having their own
frame ranges. Radio questions have a single answer, Checklist can have multiple.

Args:
name (str): Name of the question/classification
value (List[TemporalClassificationAnswer]): List of answer options with frame ranges
feature_schema_id (Optional[str]): Feature schema identifier
extra (dict): Additional metadata

Note:
- Radio: single answer in the value list
- Checklist: multiple answers in the value list
The serializer automatically handles the distinction based on the number of answers.

Example:
>>> # Radio question (single answer)
>>> speaker = TemporalClassificationQuestion(
>>> name="speaker",
>>> value=[
>>> TemporalClassificationAnswer(
>>> name="user",
>>> frames=[(200, 1600)]
>>> )
>>> ]
>>> )
>>>
>>> # Checklist question (multiple answers)
>>> audio_quality = TemporalClassificationQuestion(
>>> name="audio_quality",
>>> value=[
>>> TemporalClassificationAnswer(
>>> name="background_noise",
>>> frames=[(0, 1500), (2000, 3000)]
>>> ),
>>> TemporalClassificationAnswer(
>>> name="echo",
>>> frames=[(2200, 2900)]
>>> )
>>> ]
>>> )
>>>
>>> # Nested structure: Radio > Radio > Radio
>>> speaker_with_tone = TemporalClassificationQuestion(
>>> name="speaker",
>>> value=[
>>> TemporalClassificationAnswer(
>>> name="user",
>>> frames=[(200, 1600)],
>>> classifications=[
>>> TemporalClassificationQuestion(
>>> name="tone",
>>> value=[
>>> TemporalClassificationAnswer(
>>> name="professional",
>>> frames=[(1000, 1600)]
>>> )
>>> ]
>>> )
>>> ]
>>> )
>>> ]
>>> )
"""

# Override parent's value field
value: List[TemporalClassificationAnswer] = Field(
default_factory=list,
description="List of temporal answer options",
)
classifications: Optional[
List[Union["TemporalClassificationText", "TemporalClassificationQuestion"]]
] = None


# Update forward references for recursive types
TemporalClassificationAnswer.model_rebuild()
TemporalClassificationText.model_rebuild()
TemporalClassificationQuestion.model_rebuild()
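Taken together with the Label changes above, a short end-to-end sketch (hypothetical names and global key; frame values are milliseconds; assumes labelbox.types re-exports Label as in the notebook above):

```python
import labelbox.types as lb_types
from labelbox.data.annotation_types.temporal import (
    TemporalClassificationAnswer,
    TemporalClassificationQuestion,
    TemporalClassificationText,
)

# Text classification with one timed value
transcription = TemporalClassificationText(
    name="transcription",
    value=[(1600, 2000, "Hello, how can I help you?")],
)

# Radio-style question: a single answer with its own frame range
speaker = TemporalClassificationQuestion(
    name="speaker",
    value=[TemporalClassificationAnswer(name="user", frames=[(200, 1600)])],
)

label = lb_types.Label(
    data={"global_key": "my-audio-global-key"},  # hypothetical key
    annotations=[transcription, speaker],
)

# Temporal annotations are keyed by the start of their first frame range
by_frame = label.frame_annotations()
print(sorted(by_frame))  # [200, 1600]
```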