Storage

Storage services

  • Storage account is the top-level account for the following services:
    • Blob Storage
    • File Storage
    • Table Storage
    • Queue Storage

Blob Storage

  • Object and disk storage
  • Blob storage tiers
  • Azure Search integration
  • Blob Lease for exclusive write access
    • Pass in lease id to API to modify
    • E.g. IaaS VMs lease their Page Blob disks to ensure each disk is managed by a single VM
  • You can create snapshots on blob level and view snapshots.
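A minimal sketch of the lease and snapshot flow with the Python azure-storage-blob (v12) SDK; connection string, container, and blob names are placeholders:

```python
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="disks", blob_name="vm-disk.vhd"
)

# Acquire an exclusive lease (15-60 seconds, or -1 for infinite).
lease = blob.acquire_lease(lease_duration=60)

# Writes now fail unless the lease id is passed along.
blob.upload_blob(b"new contents", overwrite=True, lease=lease)

# Point-in-time snapshot; returns the snapshot's timestamp identifier.
snapshot = blob.create_snapshot()
print(snapshot["snapshot"])

lease.release()
```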

Azure Data Lake Storage

  • Uses blob storage to store data
  • Big data analytics
  • Analytics interface and APIs
  • Blob storage APIs
  • Hadoop compatible access to data
  • ❗ GPv2 Storage accounts only
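A minimal sketch of writing a file through the hierarchical Data Lake API, assuming the azure-storage-file-datalake Python SDK and a GPv2 account with hierarchical namespace enabled (names are placeholders):

```python
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient.from_connection_string("<connection-string>")
fs = service.get_file_system_client("analytics")  # file system ~ blob container

# Directory-like paths are first-class (Hadoop-compatible access).
file = fs.create_file("raw/2021/events.json")
data = b'{"event": "login"}'
file.append_data(data, offset=0, length=len(data))
file.flush_data(len(data))  # commit the appended bytes
```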

Blob Types

  • Block Blob
    • Composed of blocks of up to 100 MB each
    • Optimized for efficient upload
    • Insert, replace, and delete blocks
    • ❗ Up to 4.77 TB max file size
    • ❗ 50,000 max blocks per blob
  • Append blob
    • Can only append blocks
    • Ideal for log and audit files
    • ❗ 195 GB max file size
  • Page Blob
    • Optimized for frequent read/write operations
    • Good for VM disks and databases
      • Foundation for IaaS disks
      • Stores VHD files.
      • Underlying storage for Azure SQL
    • ❗ Standard (HDD) / Premium (SSD) storage
    • ❗ 8 TB max file size
    • ❗ Only offered in General Purpose account types
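A sketch contrasting block and append blobs with the Python azure-storage-blob SDK (container and blob names are placeholders):

```python
import base64, uuid
from azure.storage.blob import BlobBlock, BlobClient

# Block blob: stage blocks independently, then commit the list.
block_blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="files", blob_name="big-file.bin"
)
block_id = base64.b64encode(uuid.uuid4().hex.encode()).decode()
block_blob.stage_block(block_id=block_id, data=b"chunk of data")
block_blob.commit_block_list([BlobBlock(block_id=block_id)])

# Append blob: blocks can only be added at the end -- fits logs/audit files.
log_blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="logs", blob_name="app.log"
)
log_blob.create_append_blob()
log_blob.append_block(b"2021-01-01 INFO started\n")
```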

Blob Storage Access Tiers

  • Set on blob level.
  • Three tiers:
    1. Hot Tier: Frequent reads
      • Lower data access costs
      • Higher data storage costs
    2. Cool Tier: Accessed less frequently
      • Higher data access costs
      • Lower data storage costs
      • Optimized for data that's stored at least 30 days
    3. Archive Tier: Takes hours to make data available (rehydration)
      • Highest data access cost
      • Lowest data storage cost
      • Optimized for data that's stored at least 180 days
      • ❗ Only supported for Block Blobs
  • Changing storage tiers incurs charges
  • ❗ Can't change the Storage Tier of a Blob that has snapshots
  • Azure Blob Storage Lifecycle Management Policies
    • E.g. configure a policy to move a blob directly to the archive storage tier X days after it's uploaded
    • Storage Account -> Blob Service -> Lifecycle Management
    • Executed daily
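Tiers can also be changed per blob from code; a sketch with the Python SDK (names are placeholders):

```python
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="backups", blob_name="2020.tar.gz"
)
# Valid tiers are "Hot", "Cool", and "Archive"; an archived blob must be
# rehydrated back to Hot/Cool before its data can be read again.
blob.set_standard_blob_tier("Archive")
```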

WORM: Write Once Read Many

  • Data cannot be erased or modified for a set period of time.
  • Set on container level
  • Enable: Access Policy -> Add Policy -> Time-based retention (set retention period) / Legal hold (attach tags) -> Lock policy

Soft Delete

  • Retains deleted data for a specified period of time
    • Storage Account -> Blob Services -> Soft Delete
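A sketch of enabling soft delete and restoring a deleted blob with the Python SDK (container/blob names are placeholders):

```python
from azure.storage.blob import BlobServiceClient, RetentionPolicy

service = BlobServiceClient.from_connection_string("<connection-string>")

# Enable soft delete with a 7-day retention window.
service.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=7)
)

# A blob deleted within the window can be brought back.
blob = service.get_blob_client("docs", "report.pdf")
blob.delete_blob()
blob.undelete_blob()
```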

Static Website Hosting

  • When activated, it creates a $web container.
  • You need to have a default document and error page.
  • You can integrate Azure CDN
    • Azure Content Delivery Network (CDN)
      • Distributed network of cache servers
      • Provide data to user from closest source
      • Offload traffic from origin servers to CDN
        • Typically static data
      • Pricing tiers are Microsoft, Akamai, Verizon (Microsoft partners)
      • Supports HTTPS, large file download optimization, file compression, geo-filtering
      • Azure CDN Core Analytics is collected and can be exported to blob storage, event hubs, Log Analytics.
    • Azure Storage blob becomes origin server.
    • Azure CDN servers become edge servers
    • CDN can authenticate to Blob Storage with SAS tokens to be able to read private data.
    • Caching rules: On blobs you can set CDN caching rules, such as CacheControl:max-age=86400 in blob properties (see the sketch after this list).
    • Set up: Two alternatives
      • Create CDN and configure against blob service.
      • Storage account -> Blob service -> CDN
  • You can have custom domain
  • You can have CORS policies
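A sketch of setting the CacheControl property that CDN edge servers honor, using the Python SDK (names are placeholders):

```python
from azure.storage.blob import BlobClient, ContentSettings

blob = BlobClient.from_connection_string(
    "<connection-string>", container_name="$web", blob_name="site.css"
)
# set_http_headers replaces the whole header set, so re-supply
# content_type alongside cache_control.
blob.set_http_headers(
    content_settings=ContentSettings(
        content_type="text/css", cache_control="max-age=86400"
    )
)
```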

Azure Search

  • Blob Storage integrates with Azure Search.
  • You can provide metadata in blobs, they'll be used as fields in search index which helps categorize documents and aid with features like faceted search.
    • You can choose index content, content+metadata or just metadata.
  • Searchable blobs can be PDF, doc/docx, xls/xlsx, ppt/pptx, msg, HTML, XML, ZIP, EML, RTF, TXT, CSV, JSON.
  • Azure Search
    • Structure: Index, Fields, Documents
    • Data Load: Push data in yourself, pull data from Azure sources (SQL, Cosmos DB or blob storage)
    • Data Access: REST API, Simple Query, Lucene, .NET SDK
    • Features:
      • Fuzzy search handles misspelled words.
      • Suggestions from partial input.
      • Facets for categories.
      • Highlighting matched terms in the results.
      • Tune and rank search results
      • Paging
      • Geo-spatial search: if index data has latitude and longitude, user can get related data based on proximity
      • Synonyms
    • Lexical analysis done by Analyzers
    • You can combine following cognitive skills in pipelines: OCR, language detection, key phrase extraction, NER, sentiment analysis, merge/split/image analysis/shaper.
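A sketch of querying an index with the Python azure-search-documents SDK; the service, index, and field names (e.g. metadata_storage_name as produced by the blob indexer) are placeholders:

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<service>.search.windows.net",
    index_name="documents",
    credential=AzureKeyCredential("<query-key>"),
)

# Full Lucene syntax enables fuzzy matching ("~"); facets return
# per-category counts alongside the hits.
results = client.search(
    search_text="strorage~",  # misspelling still matches "storage"
    query_type="full",
    facets=["metadata_content_type"],
    highlight_fields="content",
)
for doc in results:
    print(doc["metadata_storage_name"])
```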

File Storage

  • SMB File Shares
  • Attach to Virtual Machines as file shares
  • Integrates with Azure File Sync
    • On-prem to Azure sync with caching strategy
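A sketch of uploading to an SMB file share with the Python azure-storage-file-share SDK (share and path names are placeholders):

```python
from azure.storage.fileshare import ShareFileClient

file = ShareFileClient.from_connection_string(
    "<connection-string>", share_name="reports", file_path="q1/summary.xlsx"
)
with open("summary.xlsx", "rb") as data:
    file.upload_file(data)  # the same share can be mounted by VMs over SMB
```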

Table Storage

  • NoSQL Data Store
  • Schema-less design
  • Azure Cosmos DB offers a premium experience through its Table API
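A sketch of the schema-less entity model with the Python azure-data-tables SDK (table and key names are placeholders):

```python
from azure.data.tables import TableClient

table = TableClient.from_connection_string("<connection-string>", table_name="orders")
table.create_table()  # raises if the table already exists

# Only PartitionKey and RowKey are required; other properties vary per entity.
table.create_entity({
    "PartitionKey": "customer-42",
    "RowKey": "order-001",
    "total": 19.99,
})
entity = table.get_entity(partition_key="customer-42", row_key="order-001")
```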

Queue Storage

  • Message based
  • For building asynchronous (decoupled) applications
  • URL format: e.g. https://storageaccount.queue.core.windows.net/queuename
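A sketch of the producer/consumer flow with the Python azure-storage-queue SDK (queue name and message are placeholders):

```python
from azure.storage.queue import QueueClient

queue = QueueClient.from_connection_string("<connection-string>", queue_name="tasks")
queue.create_queue()  # raises if the queue already exists

# Producer enqueues work; consumers process it whenever they are ready.
queue.send_message("resize-image:42")

for msg in queue.receive_messages():
    print(msg.content)
    queue.delete_message(msg)  # remove once processed
```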

Account Types

Blob Storage Account

  • Supported services: Blob storage
  • Supported blob types: Block blobs, append blobs
  • Supports blob storage access tiers (hot, cool, archive)

General Purpose V1

  • Supported services: Blob, File, Queue, and Table storage
  • ❗ Does not support blob storage access tiers (hot, cool, archive)
  • ❗ Classic deployment & ARM
  • ❗ Does not support ZRS (Zone Redundant Storage) replication
  • Slightly cheaper storage transaction costs; can be converted to V2.

General Purpose V2

  • Supports all latest features.
    • Including anything in General Purpose V1 and blob storage access tiers.
  • 💡 Recommended choice when creating storage account.
  • Lower storage costs than V1
  • ❗ Has a changing soft limit (as of now 500 TB)
    • You can contact Azure support and request higher limits (as of now 5 PB). Same for ingress/egress limits too.

Account Replication

  • Impacts SLA
  • Locally Redundant Storage (LRS)
    • Three copies of data in single data center.
    • Data is spread across multiple hardware racks.
  • Zone Redundant Storage (ZRS)
    • Three copies of data in different availability zones in same region.
    • ❗ Only available for GPv2 storage accounts
  • Geo-redundant Storage (GRS)
    • Three copies of data in two different data centers in two different regions.
    • ❗ You don't get to choose second region, they're paired regions decided by Microsoft.
    • ❗ Replication involves a delay.
      • RPO (recovery point objective) is typically lower than 15 minutes.
  • Read-access Geo-redundant Storage (RA-GRS)
    • Same as GRS, but you get read-only access to data in secondary region.
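With RA-GRS the read-only secondary follows the fixed "-secondary" endpoint convention; a sketch with the Python SDK (account name and key are placeholders):

```python
from azure.storage.blob import BlobServiceClient

secondary = BlobServiceClient(
    account_url="https://mystorageacct-secondary.blob.core.windows.net",
    credential="<account-key>",
)
# Reads work against the secondary; writes are only accepted by the primary.
data = secondary.get_blob_client("docs", "report.pdf").download_blob().readall()
```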

Azure Storage Explorer

  • Cross-platform client application to administer/view storage and Cosmos DB accounts.
    • Can be downloaded with Storage Account -> Open in Explorer in Portal.
    • Available in Azure portal as well (preview & simpler)
  • Can manage accounts across multiple subscriptions
  • Allows you to
    • Run storage emulator in local environment.
    • Manage SAS, CORS, access levels, meta data, files in File Share, stored procedures in Cosmos DB
    • Manage soft delete:
      • Enables recycle bin (retention period) for deleted items.
  • Connecting and authentication
    • Admin access with account log-in
    • Limited access with account level SAS
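A sketch of generating an account-level SAS for such limited access, using the Python SDK (account name is a placeholder):

```python
from datetime import datetime, timedelta
from azure.storage.blob import (
    AccountSasPermissions, ResourceTypes, generate_account_sas
)

# Read/list access across the account, valid for one hour.
sas = generate_account_sas(
    account_name="mystorageacct",
    account_key="<account-key>",
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=1),
)
```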

Pricing

  • Data storage cost (capacity)
  • Data operations
  • Outbound data transfer (bandwidth)
  • Geo-replication data transfer

Import and export data to Azure

  • You can use portal, PowerShell, REST API, Azure CLI, or .NET Storage SDKs.
  • You can upload files/folders using Azure Storage Explorer.
  • You can use the AzCopy command-line utility tool (a simplified SDK equivalent is sketched after this list).
    • No limit to # of files in batch
    • Pattern filters to select files
    • Can continue batch after connection interruption
      • Uses internal journal file to handle it
    • Copy newer/older source files.
    • Throttle # of concurrent connections
    • Modify file name and metadata during upload.
    • Generate log file
    • Authenticate with storage account key or SAS.
  • You can use physical drives
    • ❗ 64-bit operating systems only: Windows 8+ and Windows Server 2008+
    • Preparing the drive
      • ❗ NTFS only.
      • ❗ Drives must be encrypted using BitLocker
    • WAImportExportTool
      • Azure Import/Export tool
      • V1: Blob Storage and export jobs; V2: GPv1 and GPv2 accounts
      • Allows you to copy from on-prem.
    • Importing data
      • Create import job
        1. Create storage account
        2. Prepare the drives
          • Connect disk drives to the Windows system via SATA connectors
          • Create a single NTFS volume on each drive
          • Prepare data using WAImportExportTool
            • Modify dataset.csv to include files/folders
            • Modify driveset.csv to include disks & encryption settings
            • Copy access key from storage account
        3. In Azure -> Create import/export job -> Import into Azure -> Select container RG -> Upload JRN (journal) file created from WAImportExportTool -> Choose import destination to the storage account -> Fill return shipping info
        4. Ship the drives to the Azure datacenter & update status with tracking number
      • Costs:
        • Charged: fixed price per device, return shipping costs
        • Free: for the data transfer in Azure
      • ❗ No SLAs on shipping
        • Estimated: 7-10 days after arrival
    • Exporting data
      1. Azure -> Create Import/Export Job -> Choose Export from Azure
      2. Select storage account and optionally containers
      3. Type shipping info
      4. Ship blank drives to Azure
      5. Azure encrypts & copies files
        • Provides recovery key for encrypted drive.
    • Azure Databox
      • Microsoft ships Data Box storage device
        • Each storage device has a maximum usable storage capacity of 80 TB.
      • It lets you send terabytes of data into Azure in a quick, inexpensive, and reliable way
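As referenced in the AzCopy bullet above, a simplified SDK equivalent of a filtered batch upload (container and paths are placeholders; AzCopy itself adds journaling, throttling, and resume on top of this):

```python
import os
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string("<connection-string>", "backup")

for root, _dirs, files in os.walk("./data"):
    for name in files:
        if not name.endswith(".csv"):  # pattern filter
            continue
        path = os.path.join(root, name)
        with open(path, "rb") as fh:
            container.upload_blob(name=os.path.relpath(path, "./data"), data=fh)
```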