Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to write batch due to unsupported writer feature invariants (when datafusion flag is not enabled) #2292

Closed
bryan-c-castillo-ms opened this issue Mar 15, 2024 · 2 comments
Labels
binding/rust Issues for the Rust crate bug Something isn't working

Comments

@bryan-c-castillo-ms
Copy link

Environment

Delta-rs version:
0.17.1

Binding:
Rust

Environment:

  • Cloud provider: None
  • OS: Windows
  • Other:

Bug

While trying to write a batch to a new table, I receive an error: Error: Transaction { source: UnsupportedWriterFeatures([Invariants]) }

What happened:
I tried creating a new table followed by a batch record write. I received the error: Error: Transaction { source: UnsupportedWriterFeatures([Invariants]) }

What you expected to happen:

I expected the batch to write without error.

How to reproduce it:

test_case2.rs

use std::future::IntoFuture;
use std::sync::Arc;
use std::vec;

use deltalake::kernel::{Action, DataType, StructField};

use deltalake::arrow::array::{builder, StringBuilder};
use deltalake::arrow::array::RecordBatch;
use deltalake::arrow::datatypes::Field as AField;
use deltalake::arrow::datatypes::Schema as ASchema;
use deltalake::arrow::datatypes::DataType as ADataType;
use deltalake::operations::transaction::commit;
use deltalake::operations::writer::{DeltaWriter, WriterConfig};
use deltalake::protocol::{DeltaOperation, SaveMode};
use deltalake::{open_table, DeltaOps, DeltaTableBuilder};

use deltalake::checkpoints::create_checkpoint;

pub type AppResult<T> = Result<T, Box<dyn std::error::Error>>;


struct Person {
    pub name: String,
    pub favorite_number: Option<i32>,
}

impl Person {
    pub fn new(name: impl Into<String>, favorite_number: Option<i32>) -> AppResult<Self> {
        Ok(Self {
            name: name.into(),
            favorite_number,
        })
    }
}

struct PersonBatch {
    pub name: StringBuilder,
    pub favorite_number: builder::Int32Builder,
}

impl PersonBatch {
    pub fn new() -> Self {
        Self {
            name: StringBuilder::new(),
            favorite_number: builder::Int32Builder::new(),
        }
    }

    pub fn append(&mut self, person: &Person) {
        self.name.append_value(&person.name);
        self.favorite_number.append_option(person.favorite_number);
    }

    pub fn finish(&mut self) -> AppResult<RecordBatch> {
        let schema = PersonBatch::get_arrow_schema();
        let batch = RecordBatch::try_new(
            Arc::new(schema),
            vec![
                Arc::new(self.name.finish()),
                Arc::new(self.favorite_number.finish()),
            ],
        )?;
        Ok(batch)
    }

    pub fn get_table_schema() -> Vec<StructField> {
        vec![
            StructField::new("name", DataType::STRING, false),
            StructField::new("favorite_number", DataType::INTEGER, true),
        ]
    }
    
    pub fn get_arrow_schema() -> ASchema {
        ASchema::new(vec![
            AField::new("name", ADataType::Utf8, false),
            AField::new("favorite_number", ADataType::Int32, true),
        ])
    }

    pub async fn create_if_not_exists(table_uri: impl AsRef<str>) -> AppResult<()> {
        let table = DeltaTableBuilder::from_uri(table_uri)
            .build()?;
    
        let config: HashMap<String, Option<String>> = HashMap::new();
        let storage_options: HashMap<String, String> = HashMap::new();
    
        let builder = DeltaOps(table)
            .create()
            .with_save_mode(SaveMode::Ignore)
            .with_configuration(config)
            .with_storage_options(storage_options)
            .with_columns(PersonBatch::get_table_schema())
            .with_partition_columns(vec![String::from("name")]);
        
        let _ = builder.into_future().await?;
        Ok(())
    }
}

fn create_batch(people: Vec<Person>) -> AppResult<RecordBatch> {
    let mut person_batch = PersonBatch::new();

    for person in people {
        person_batch.append(&person);
    }

    let batch = person_batch.finish()?;
    Ok(batch)
}

async fn write_batch(table_uri: impl AsRef<str>, people: Vec<Person>) -> AppResult<()> {
    let mut table = open_table(table_uri).await?;
    table.load().await?;
    let batch = create_batch(people)?;

    let object_store = table.object_store();
    let log_store = table.log_store();

    let write_config = WriterConfig::new(batch.schema().clone(), vec![], None, None, None);
    let mut writer = DeltaWriter::new(object_store.clone(), write_config);

    writer.write(&batch).await?;
    let write_actions = writer.close().await?;

    let actions: Vec<Action> = write_actions
                .iter()
                .map(|add| Action::Add(add.clone()))
                .collect();

    let operation = DeltaOperation::Write {
        mode: SaveMode::Append,
        partition_by: Some(vec![]),
        predicate: None,
    };
    
    let _ = commit(&*log_store, &actions, operation, Some(table.snapshot()?), None).await?;
    Ok(())
}

pub async fn checkpoint(table_uri: impl AsRef<str>) -> AppResult<()> {
    let mut table = open_table(table_uri).await?;
    table.load().await?;

    create_checkpoint(&mut table).await?;
    Ok(())
}

pub async fn run() -> AppResult<()> {
    let table_path = "data/people_1";

    if let Ok(md) = std::fs::metadata(table_path) 
    {
        if md.is_dir() {
            std::fs::remove_dir_all(table_path)?;
        }
    }

    PersonBatch::create_if_not_exists(table_path).await?; 

    write_batch(table_path, vec![
        Person::new("Alice", Some(42))?,
        Person::new("Bob", None)?,
    ]).await?;

    checkpoint(table_path).await?;

    write_batch(table_path, vec![
        Person::new("Bailey", Some(12))?,
        Person::new("Ollie", None)?,
    ]).await?;

    Ok(())
}

main.rs

mod test_case2;
use async_std::task::block_on;

fn main() -> Result<(), Box<dyn std::error::Error>>{
    block_on(test_case2::run())?;
    Ok(())
}

cargo.toml

[package]
name = "dt-test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
deltalake = { version = "0.17.1", features = ["azure", "json"] }
async-std = { version = "1.12.0", features = ["attributes", "tokio1"] }

output

Error: Transaction { source: UnsupportedWriterFeatures([Invariants]) }
error: process didn't exit successfully: `target\debug\dt-test.exe` (exit code: 1)

More details:

I took a look through the source to see what might be happening. I see that a default writer version of 2 is being assigned and in protocol.rs, writer v2 requires Invariants

    static ref WRITER_V2: HashSet<WriterFeatures> =
        HashSet::from_iter([WriterFeatures::AppendOnly, WriterFeatures::Invariants]);

However, the protocol checker only add the feature for Invariants if the datafusion feature is on.

    #[cfg(feature = "datafusion")]
    {
        writer_features.insert(WriterFeatures::Invariants);
        writer_features.insert(WriterFeatures::CheckConstraints);
    }

If I turn on the datafusion tag on, the code does work though. It seems like the required writer features should be gated by that feature as well?

@bryan-c-castillo-ms bryan-c-castillo-ms added the bug Something isn't working label Mar 15, 2024
@castle8080
Copy link

I think this is a duplicate of 2204.

@rtyler rtyler added the binding/rust Issues for the Rust crate label Aug 10, 2024
@rtyler
Copy link
Member

rtyler commented Aug 10, 2024

#2204

@rtyler rtyler closed this as completed Aug 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binding/rust Issues for the Rust crate bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants