storage-manage-find-blobs.md

title: Manage and find Azure Blob data with blob index tags
description: Learn how to use blob index tags to categorize, manage, and query for blob objects.
author: normesta
ms.author: normesta
ms.date: 06/14/2021
ms.service: storage
ms.subservice: common
ms.topic: conceptual
ms.reviewer: klaasl
ms.custom: references_regions, devx-track-azurepowershell

Manage and find Azure Blob data with blob index tags

As datasets get larger, finding a specific object in a sea of data can be difficult. Blob index tags provide data management and discovery capabilities by using key-value index tag attributes. You can categorize and find objects within a single container or across all containers in your storage account. As data requirements change, objects can be dynamically categorized by updating their index tags. Objects can remain in-place with their current container organization.

Blob index tags let you:

  • Dynamically categorize your blobs using key-value index tags
  • Quickly find specific tagged blobs across an entire storage account
  • Specify conditional behaviors for blob APIs based on the evaluation of index tags
  • Use index tags for advanced controls on features like blob lifecycle management

Consider a scenario where you have millions of blobs in your storage account, accessed by many different applications. You want to find all related data from a single project. You aren't sure what's in scope as the data can be spread across multiple containers with different naming conventions. However, your applications upload all data with tags based on their project. Instead of searching through millions of blobs and comparing names and properties, you can use Project = Contoso as your discovery criteria. Blob index will filter all containers across your entire storage account to quickly find and return just the set of 50 blobs from Project = Contoso.

To get started with examples on how to use blob index, see Use blob index tags to manage and find data.

Blob index tags and data management

Container and blob name prefixes are one-dimensional categorizations. Blob index tags allow for multi-dimensional categorization for blob data types (Block, Append, or Page). Multi-dimensional categorization is natively indexed by Azure Blob Storage so you can quickly find your data.

Consider the following five blobs in your storage account:

  • container1/transaction.csv
  • container2/campaign.docx
  • photos/bannerphoto.png
  • archives/completed/2019review.pdf
  • logs/2020/01/01/logfile.txt

These blobs are separated using a prefix of container/virtual folder/blob name. You can set an index tag attribute of Project = Contoso on these five blobs to categorize them together while maintaining their current prefix organization. Adding index tags eliminates the need to move data by exposing the ability to filter and find data using the index.

Setting blob index tags

Blob index tags are key-value attributes that can be applied to new or existing objects within your storage account. You can specify index tags during the upload process using Put Blob, Put Block List, or Copy Blob operations and the optional x-ms-tags header. If you already have blobs in your storage account, call Set Blob Tags passing a formatted XML document with the index tags in the body of the request.

Important

Setting blob index tags can be performed by the Storage Blob Data Owner and by anyone with a Shared Access Signature that has permission to access the blob's tags (the t SAS permission).

In addition, RBAC users with the Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags/write permission can perform this operation.

You can apply a single tag on your blob to describe when your data was finished processing.

"processedDate" = '2020-01-01'

You can apply multiple tags on your blob to be more descriptive of the data.

"Project" = 'Contoso'
"Classified" = 'True'
"Status" = 'Unprocessed'
"Priority" = '01'

To modify the existing index tag attributes, retrieve the existing tag attributes, modify the tag attributes, and replace with the Set Blob Tags operation. To remove all index tags from the blob, call the Set Blob Tags operation with no tag attributes specified. As blob index tags are a subresource to the blob data contents, Set Blob Tags doesn't modify any underlying content and doesn't change the blob's last-modified-time or eTag. You can create or modify index tags for all current base blobs and previous versions. However, tags on snapshots or soft deleted blobs cannot be modified.
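As a minimal sketch of the two ways to pass tags described above, the following Python builds a query-string encoded value for the x-ms-tags header and an XML body for Set Blob Tags. The helper names (tags_header_value, tags_xml_body) are hypothetical, and the XML layout (Tags, TagSet, Tag, Key, Value) is assumed from the Set Blob Tags REST reference rather than stated in this article. No request is sent.

```python
from urllib.parse import quote, urlencode
import xml.etree.ElementTree as ET

# Example tags taken from this article.
tags = {
    "Project": "Contoso",
    "Classified": "True",
    "Status": "Unprocessed",
    "Priority": "01",
}

def tags_header_value(tags):
    """Encode tags as a query string for the x-ms-tags header (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    return urlencode(tags, quote_via=quote)

def tags_xml_body(tags):
    """Build an XML document for the Set Blob Tags request body (assumed layout)."""
    root = ET.Element("Tags")
    tag_set = ET.SubElement(root, "TagSet")
    for key, value in tags.items():
        tag = ET.SubElement(tag_set, "Tag")
        ET.SubElement(tag, "Key").text = key
        ET.SubElement(tag, "Value").text = value
    return ET.tostring(root, encoding="unicode")

header_value = tags_header_value(tags)
xml_body = tags_xml_body(tags)
print(header_value)
print(xml_body)
```

Because Set Blob Tags replaces the full tag set, a caller that wants to change one tag would first read the existing tags, update the dictionary, and then send the complete set again.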

The following limits apply to blob index tags:

  • Each blob can have up to 10 blob index tags
  • Tag keys must be between one and 128 characters
  • Tag values must be between zero and 256 characters
  • Tag keys and values are case-sensitive
  • Tag keys and values only support string data types. Any numbers, dates, times, or special characters are saved as strings
  • Tag keys and values must adhere to the following naming rules:
    • Alphanumeric characters:
      • a through z (lowercase letters)
      • A through Z (uppercase letters)
      • 0 through 9 (numbers)
    • Valid special characters: space, plus, minus, period, colon, equals, underscore, forward slash ( +-.:=_/)
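The limits above can be checked on the client before a request is sent. The following is a sketch only: validate_tags is a hypothetical helper, it measures lengths in characters as the list does, and the service remains the authority on what it accepts.

```python
import re

MAX_TAGS = 10
# Letters, digits, and the special characters listed above: space + - . : = _ /
_ALLOWED = re.compile(r"[A-Za-z0-9 +\-.:=_/]*")

def validate_tags(tags):
    """Return a list of problems; an empty list means the tags pass these checks."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"{len(tags)} tags exceeds the limit of {MAX_TAGS}")
    for key, value in tags.items():
        if not 1 <= len(key) <= 128:
            problems.append(f"key {key!r} must be 1 to 128 characters")
        if len(value) > 256:
            problems.append(f"value for {key!r} must be 0 to 256 characters")
        if not _ALLOWED.fullmatch(key) or not _ALLOWED.fullmatch(value):
            problems.append(f"tag {key!r} contains an unsupported character")
    return problems

print(validate_tags({"Project": "Contoso", "processedDate": "2020-01-01"}))
print(validate_tags({"Cost$": "100"}))
```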

Getting and listing blob index tags

Blob index tags are stored as a subresource alongside the blob data and can be retrieved independently from the underlying blob data content. Blob index tags for a single blob can be retrieved with the Get Blob Tags operation. The List Blobs operation with the include=tags parameter will also return all blobs within a container along with their blob index tags.
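To illustrate reading tags back, the sketch below parses a tag document into a dictionary. It assumes the Get Blob Tags response body uses the same Tags/TagSet/Tag layout as the Set Blob Tags request body, and the sample bytes are made up for the example rather than captured from a real response. parse_tags_xml is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def parse_tags_xml(body):
    """Turn a tag XML document into a {key: value} dictionary (hypothetical helper)."""
    root = ET.fromstring(body)
    return {
        tag.findtext("Key"): tag.findtext("Value") or ""
        for tag in root.findall("./TagSet/Tag")
    }

# Illustrative body only; the layout is an assumption.
sample_body = b"""<?xml version="1.0" encoding="utf-8"?>
<Tags>
  <TagSet>
    <Tag><Key>Project</Key><Value>Contoso</Value></Tag>
    <Tag><Key>Status</Key><Value>Unprocessed</Value></Tag>
  </TagSet>
</Tags>"""

parsed_tags = parse_tags_xml(sample_body)
print(parsed_tags)
```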

Important

Getting and listing blob index tags can be performed by the Storage Blob Data Owner and by anyone with a Shared Access Signature that has permission to access the blob's tags (the t SAS permission).

In addition, RBAC users with the Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags/read permission can perform this operation.

For any blobs with at least one blob index tag, the x-ms-tag-count is returned in the List Blobs, Get Blob, and Get Blob Properties operations indicating the count of index tags on the blob.

Finding data using blob index tags

The indexing engine exposes your key-value attributes into a multi-dimensional index. After you set your index tags, they exist on the blob and can be retrieved immediately. It may take some time before the blob index updates. After the blob index updates, you can use the native query and discovery capabilities offered by Blob Storage.

The Find Blobs by Tags operation enables you to get a filtered set of blobs whose index tags match a given query expression. Find Blobs by Tags supports filtering across all containers within your storage account or you can scope the filtering to just a single container. Since all the index tag keys and values are strings, relational operators use a lexicographic sorting.

Important

Finding data using blob index tags can be performed by the Storage Blob Data Owner and by anyone with a Shared Access Signature that has permission to find blobs by tags (the f SAS permission).

In addition, RBAC users with the Microsoft.Storage/storageAccounts/blobServices/containers/blobs/filter/action permission can perform this operation.

The following criteria apply to blob index filtering:

  • Tag keys should be enclosed in double quotes (")
  • Tag values and container names should be enclosed in single quotes (')
  • The @ character is only allowed for filtering on a specific container name (for example, @container = 'ContainerName')
  • Filters are applied with lexicographic sorting on strings
  • Same sided range operations on the same key are invalid (for example, "Rank" > '10' AND "Rank" >= '15')
  • When using REST to create a filter expression, characters should be URI encoded
  • Tag queries are optimized for equality match using a single tag (for example, "StoreID" = '100'). Range queries using a single tag involving >, >=, <, <= are also efficient. Any query using AND with more than one tag will not be as efficient. For example, "Cost" > '01' AND "Cost" <= '100' is efficient. "Cost" > '01' AND "StoreID" = '2' is not as efficient.

The below table shows all the valid operators for Find Blobs by Tags:

Operator | Description | Example
= | Equal | "Status" = 'In Progress'
> | Greater than | "Date" > '2018-06-18'
>= | Greater than or equal | "Priority" >= '5'
< | Less than | "Age" < '32'
<= | Less than or equal | "Company" <= 'Contoso'
AND | Logical and | "Rank" >= '010' AND "Rank" < '100'
@container | Scope to a specific container | @container = 'videofiles' AND "status" = 'done'
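The quoting and encoding rules above can be sketched in a small helper that assembles a filter expression and then URI encodes it for use in a REST request. build_filter is a hypothetical name, and the helper does not escape quote characters inside keys or values, so treat it as an illustration of the syntax only.

```python
from urllib.parse import quote

# Operators accepted by Find Blobs by Tags, per the table above.
FIND_OPERATORS = {"=", ">", ">=", "<", "<="}

def build_filter(conditions, container=None):
    """Join (key, operator, value) tuples with AND, optionally scoped to a container."""
    clauses = []
    if container is not None:
        # Container names go in single quotes, like tag values.
        clauses.append(f"@container = '{container}'")
    for key, op, value in conditions:
        if op not in FIND_OPERATORS:
            raise ValueError(f"operator {op!r} is not valid for Find Blobs by Tags")
        # Tag keys go in double quotes, tag values in single quotes.
        clauses.append(f"\"{key}\" {op} '{value}'")
    return " AND ".join(clauses)

expression = build_filter([("status", "=", "done")], container="videofiles")
encoded = quote(expression, safe="")
print(expression)
print(encoded)
```

Passing an operator such as <> raises ValueError here, which mirrors the fact that not equal is unavailable in Find Blobs by Tags.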

Note

Be familiar with lexicographical ordering when setting and querying on tags.

  • Numbers are sorted before letters. Numbers are sorted based on the first digit.
  • Uppercase letters are sorted before lowercase letters.
  • Symbols aren't standard. Some symbols are sorted before numeric values. Other symbols are sorted before or after letters.
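The effect of string ordering is easy to see locally. The sketch below uses Python's built-in string comparison, which agrees with the rules above for ASCII digits and letters; it is an illustration only and says nothing about how the service orders symbols.

```python
# Unpadded numbers sort by their first character, not by numeric value.
priorities = ["5", "10", "100"]
print(sorted(priorities))        # ['10', '100', '5']

# Zero padding to a fixed width restores the numeric order.
padded = [p.zfill(3) for p in priorities]
print(sorted(padded))            # ['005', '010', '100']

# Digits sort before uppercase letters, which sort before lowercase letters.
mixed = ["contoso", "Contoso", "2020"]
print(sorted(mixed))             # ['2020', 'Contoso', 'contoso']
```

With unpadded values, a filter such as "Priority" >= '5' would exclude a blob tagged with '10', because '10' sorts before '5'.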

Conditional blob operations with blob index tags

In REST versions 2019-10-10 and higher, most blob service APIs now support a conditional header, x-ms-if-tags, such that the operation will only succeed if the specified blob index condition is met. If the condition isn't met, you'll get error 412: The condition specified using HTTP conditional header(s) is not met.

The x-ms-if-tags header may be combined with the other existing HTTP conditional headers (If-Match, If-None-Match, and so on). If multiple conditional headers are provided in a request, they all must evaluate true for the operation to succeed. All conditional headers are effectively combined with logical AND.

The below table shows the valid operators for conditional operations:

Operator | Description | Example
= | Equal | "Status" = 'In Progress'
<> | Not equal | "Status" <> 'Done'
> | Greater than | "Date" > '2018-06-18'
>= | Greater than or equal | "Priority" >= '5'
< | Less than | "Age" < '32'
<= | Less than or equal | "Company" <= 'Contoso'
AND | Logical and | "Rank" >= '010' AND "Rank" < '100'
OR | Logical or | "Status" = 'Done' OR "Priority" >= '05'

Note

There are two additional operators, not equal and logical or, that are allowed in the conditional x-ms-if-tags header for blob operations but do not exist in the Find Blobs by Tags operation.
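As a sketch of how a tag condition might be combined with another conditional header, the following builds a header dictionary for a request. conditional_headers is a hypothetical helper, the ETag value is a placeholder, and no request is sent; only the header names x-ms-if-tags and If-Match and the version 2019-10-10 come from this article.

```python
def conditional_headers(tag_condition, etag=None, api_version="2019-10-10"):
    """Build request headers so an operation succeeds only if all conditions hold."""
    headers = {
        "x-ms-version": api_version,
        "x-ms-if-tags": tag_condition,
    }
    if etag is not None:
        # Conditional headers are combined with logical AND by the service.
        headers["If-Match"] = etag
    return headers

headers = conditional_headers(
    "\"Status\" = 'Done' OR \"Priority\" >= '05'",
    etag='"placeholder-etag"',  # placeholder value for illustration
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

If either the tag condition or the If-Match check fails, the service is expected to return error 412, as described above.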

Platform integrations with blob index tags

Blob index tags not only help you categorize, manage, and search on your blob data, but also provide integration with other Blob Storage features, such as lifecycle management.

Lifecycle management

Using the blobIndexMatch as a rule filter in lifecycle management, you can move data to cooler tiers or delete data based on the index tags applied to your blobs. You can be more granular in your rules and only move or delete blobs if they match the specified tags criteria.

You can set a blob index match as a standalone filter set in a lifecycle rule to apply actions on tagged data. Or you can combine both a prefix and a blob index to match more specific data sets. Specifying multiple filters in a lifecycle rule applies a logical AND operation. The action will only apply if all filter criteria match.

The following sample lifecycle management rule applies to block blobs in a container called videofiles. The rule tiers blobs to archive storage only if the data matches the blob index tag criteria of "Status" == 'Processed' AND "Source" == 'RAW'.

Blob index match rule example for Lifecycle management in Azure portal

{
    "rules": [
        {
            "enabled": true,
            "name": "ArchiveProcessedSourceVideos",
            "type": "Lifecycle",
            "definition": {
                "actions": {
                    "baseBlob": {
                        "tierToArchive": {
                            "daysAfterModificationGreaterThan": 0
                        }
                    }
                },
                "filters": {
                    "blobIndexMatch": [
                        {
                            "name": "Status",
                            "op": "==",
                            "value": "Processed"
                        },
                        {
                            "name": "Source",
                            "op": "==",
                            "value": "RAW"
                        }
                    ],
                    "blobTypes": [
                        "blockBlob"
                    ],
                    "prefixMatch": [
                        "videofiles/"
                    ]
                }
            }
        }
    ]
}
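The logical AND across filters can be illustrated with a local check that mirrors the filters section of the rule above. This is a sketch for reasoning about a rule, not the service's implementation: rule_applies is a hypothetical helper, and it assumes prefixMatch is compared against the container name followed by the blob name.

```python
def rule_applies(blob_path, blob_type, tags, filters):
    """Return True only if every filter in the lifecycle rule matches the blob."""
    if blob_type not in filters.get("blobTypes", []):
        return False
    prefixes = filters.get("prefixMatch")
    if prefixes and not any(blob_path.startswith(p) for p in prefixes):
        return False
    for condition in filters.get("blobIndexMatch", []):
        if condition["op"] != "==":
            raise ValueError("only equality checks are supported with blob index match")
        if tags.get(condition["name"]) != condition["value"]:
            return False
    return True

filters = {
    "blobIndexMatch": [
        {"name": "Status", "op": "==", "value": "Processed"},
        {"name": "Source", "op": "==", "value": "RAW"},
    ],
    "blobTypes": ["blockBlob"],
    "prefixMatch": ["videofiles/"],
}

print(rule_applies("videofiles/clip.mp4", "blockBlob",
                   {"Status": "Processed", "Source": "RAW"}, filters))
print(rule_applies("videofiles/clip.mp4", "blockBlob",
                   {"Status": "Processed"}, filters))
```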

Permissions and authorization

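You can authorize access to blob index tags using Azure role-based access control (Azure RBAC) or a shared access signature (SAS). Both approaches are described in the sections that follow.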

Blob index tags are a subresource to the blob data. A user with permissions or a SAS token to read or write blobs may not have access to the blob index tags.

Role-based access control

Callers using an Azure AD identity may be granted the following permissions to operate on blob index tags.

Blob index tag operations | Azure RBAC action
Set Blob Tags | Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags/write
Get Blob Tags | Microsoft.Storage/storageAccounts/blobServices/containers/blobs/tags/read
Find Blobs by Tags | Microsoft.Storage/storageAccounts/blobServices/containers/blobs/filter/action

Additional permissions, separate from the underlying blob data, are required for index tag operations. The Storage Blob Data Owner role is granted permissions for all three blob index tag operations. The Storage Blob Data Reader is only granted permissions for Find Blobs by Tags and Get Blob Tags operations.

SAS permissions

Callers using a shared access signature (SAS) may be granted scoped permissions to operate on blob index tags.

Blob SAS

The following permissions may be granted in a blob SAS to allow access to blob index tags. Blob read and write permissions alone aren't enough to allow reading or writing its index tags.

Permission | URI symbol | Allowed operations
Index tags | t | Get and set index tags for a blob

Container SAS

The following permissions may be granted in a container SAS to allow filtering on blob tags. The Blob List permission isn't enough to allow filtering blobs by their index tags.

Permission | URI symbol | Allowed operations
Index tags | f | Find blobs with index tags

Choosing between metadata and blob index tags

Both blob index tags and metadata provide the ability to store arbitrary user-defined key-value properties alongside a blob resource. Both can be retrieved and set directly, without returning or altering the contents of the blob. It's possible to use both metadata and index tags.

Only index tags are automatically indexed and made searchable by the native Blob Storage service. Metadata can't be natively indexed or searched. You must use a separate service such as Azure Search. Blob index tags have additional permissions for reading, filtering, and writing that are separate from the underlying blob data. Metadata uses the same permissions as the blob and is returned as HTTP headers by the Get Blob and Get Blob Properties operations. Blob index tags are encrypted at rest using a Microsoft-managed key. Metadata is encrypted at rest using the same encryption key specified for blob data.

The following table summarizes the differences between metadata and blob index tags:

Category | Metadata | Blob index tags
Limits | No numerical limit; 8 KB total; case insensitive | 10 tags per blob max; 768 bytes per tag; case sensitive
Updates | Not allowed on archive tier; Set Blob Metadata replaces all existing metadata; Set Blob Metadata changes the blob’s last-modified-time | Allowed for all access tiers; Set Blob Tags replaces all existing tags; Set Blob Tags doesn't change the blob’s last-modified-time
Storage | Stored with the blob data | Subresource of the blob data
Indexing & Querying | Must use a separate service such as Azure Search | Indexing and querying capabilities built into Blob Storage
Encryption | Encrypted at rest with the same encryption key used for blob data | Encrypted at rest with a Microsoft-managed encryption key
Pricing | Size of metadata is included in the storage costs for a blob | Fixed cost per index tag
Header response | Metadata returned as headers in Get Blob and Get Blob Properties | Tag count returned by Get Blob or Get Blob Properties; tags returned only by Get Blob Tags and List Blobs
Permissions | Read or write permissions to blob data extend to metadata | Additional permissions are required to read, filter, or write index tags
Naming | Metadata names must adhere to the naming rules for C# identifiers | Blob index tags support a wider range of alphanumeric characters

Pricing

You're charged for the monthly average number of index tags within a storage account. There's no cost for the indexing engine. Requests to Set Blob Tags, Get Blob Tags, and Find Blobs by Tags are charged at the current respective transaction rates. The number of list transactions consumed by a Find Blobs by Tags request is equal to the number of clauses in the request. For example, the query ("StoreID" = '100') is one list transaction. The query ("StoreID" = '100' AND "SKU" = '10010') is two list transactions. See Block Blob pricing to learn more.
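The clause counting described above can be worked through with a few lines of Python. list_transactions is a hypothetical helper that splits a filter expression on the AND keyword; it assumes tag values don't themselves contain the word AND surrounded by spaces, and it estimates a count only, not a price.

```python
import re

def list_transactions(query):
    """Estimate list transactions as the number of AND-separated clauses."""
    clauses = [c for c in re.split(r"\s+AND\s+", query.strip()) if c]
    return len(clauses)

print(list_transactions("\"StoreID\" = '100'"))
print(list_transactions("\"StoreID\" = '100' AND \"SKU\" = '10010'"))
```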

Regional availability and storage account support

Blob index tags are only available on general-purpose v2 accounts with hierarchical namespace (HNS) disabled. General-purpose v1 accounts aren't supported, but you can upgrade any general-purpose v1 account to a general-purpose v2 account.

Index tags aren't supported on Premium storage accounts. For more information about storage accounts, see Azure storage account overview.

Blob index tags are currently available in all public regions.

To get started, see Use blob index tags to manage and find data.

Important

You must register your subscription before you can use the blob index on your storage accounts. See the Conditions and known issues section of this article.

Conditions and known issues

This section describes known issues and conditions.

  • Only general-purpose v2 accounts are supported. Premium block blob, legacy blob, and accounts with a hierarchical namespace enabled aren't supported. General-purpose v1 accounts won't be supported.
  • Uploading page blobs with index tags doesn't persist the tags. Set the tags after uploading a page blob.
  • When filtering is scoped to a single container, the @container can only be passed if all the index tags in the filter expression are equality checks (key=value).
  • When using the range operator with the AND condition, you can only specify the same index tag key name ("Age" > '013' AND "Age" < '100').
  • If Versioning is enabled, you can still use index tags on the current version. For previous versions, index tags are preserved for versions but aren't passed to the blob index engine. You cannot query index tags to retrieve previous versions.
  • There is no API to determine if index tags are indexed.
  • Lifecycle management only supports equality checks with blob index match.
  • Copy Blob doesn't copy blob index tags from the source blob to the new destination blob. You can specify the tags you want applied to the destination blob during the copy operation.

FAQ

Can blob index help me filter and query content inside my blobs?

No. If you need to search within your blob data, use query acceleration or Azure Search.

Are there any requirements on index tag values?

Blob index tags only support string data types and querying returns results with lexicographical ordering. For numbers, zero pad the number. For dates and times, store as an ISO 8601 compliant format.
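One way to follow this advice is to format values before they are stored as tags. The helpers below are hypothetical: number_tag assumes non-negative integers and a width chosen by the caller, and datetime_tag assumes a timezone-aware datetime and uses one particular ISO 8601 form in UTC.

```python
from datetime import datetime, timezone

def number_tag(value, width=6):
    """Zero pad a non-negative integer so string order matches numeric order."""
    return str(value).zfill(width)

def datetime_tag(moment):
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(number_tag(5, width=3))
print(number_tag(10, width=3))
print(datetime_tag(datetime(2020, 1, 1, tzinfo=timezone.utc)))
```

Every value stored under the same key needs the same width and the same date format, otherwise range queries compare strings of different shapes.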

Are blob index tags and Azure Resource Manager tags related?

No, Resource Manager tags help organize control plane resources such as subscriptions, resource groups, and storage accounts. Index tags provide blob management and discovery on the data plane.

Next steps

For an example of how to use blob index, see Use blob index to manage and find data.

Learn about lifecycle management and set a rule with blob index matching.