Store package names in arrow metadata #122

Merged
merged 22 commits into from
Oct 23, 2024
Changes from 9 commits
8 changes: 5 additions & 3 deletions Project.toml
@@ -18,9 +18,10 @@ ArrowTypes = "2.3"
 Compat = "3.34, 4"
 ConstructionBase = "1.5.7"
 DataFrames = "1"
+Pkg = "<0.0.1, 1"
 Tables = "1.4"
-Test = "1"
-UUIDs = "1"
+Test = "<0.0.1, 1"
+UUIDs = "<0.0.1, 1"
 julia = "1.6"

 [extensions]
@@ -31,11 +32,12 @@ Accessors = "7d9f7c33-5ae7-4f3b-8dc6-eff91059b697"
 Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
 Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
 DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
+Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
 Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
 UUIDs = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

 [targets]
-test = ["Accessors", "Aqua", "Compat", "DataFrames", "Test", "UUIDs"]
+test = ["Accessors", "Aqua", "Compat", "DataFrames", "Pkg", "Test", "UUIDs"]

 [weakdeps]
 ConstructionBase = "187b0558-2788-49d3-abe0-74a17ed4e7c9"
1 change: 1 addition & 0 deletions docs/src/index.md
@@ -19,6 +19,7 @@ Legolas.parse_identifier
 Legolas.name
 Legolas.version
 Legolas.identifier
+Legolas.schema_provider
 Legolas.parent
 Legolas.declared_fields
 Legolas.declaration
2 changes: 2 additions & 0 deletions src/Legolas.jl
@@ -3,6 +3,8 @@ module Legolas
 using Tables, Arrow, UUIDs

 const LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY = "legolas_schema_qualified"
+const LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY = "legolas_julia_schema_provider_name"
+const LEGOLAS_SCHEMA_PROVIDER_VERSION_METADATA_KEY = "legolas_julia_schema_provider_version"

 include("lift.jl")
 include("schemas.jl")
56 changes: 50 additions & 6 deletions src/schemas.jl
@@ -98,8 +98,12 @@

 struct UnknownSchemaVersionError <: Exception
 schema_version::SchemaVersion
+schema_provider_name::Union{Missing, Symbol}
+schema_provider_version::Union{Missing, VersionNumber}
 end

+UnknownSchemaVersionError(schema_version::SchemaVersion) = UnknownSchemaVersionError(schema_version, missing, missing)
+
 function Base.showerror(io::IO, e::UnknownSchemaVersionError)
 print(io, """
 UnknownSchemaVersionError: encountered unknown Legolas schema version:
@@ -110,13 +114,33 @@
 This generally indicates that this schema has not been declared (i.e.
 the corresponding `@schema` and/or `@version` statements have not been
 executed) in the current Julia session.
+""")
+println(io)
+
+if !ismissing(e.schema_provider_name)
+provider_string = string(e.schema_provider_name)
+if !ismissing(e.schema_provider_version)
+provider_string = string(provider_string, " (version: ", e.schema_provider_version, ")")
+end
+print(io, """
+The table's metadata indicates that the table was created with a schema defined in:

-In practice, this can arise if you try to read a Legolas table with a
-prescribed schema, but haven't actually loaded the schema definition
-(or commonly, haven't loaded the dependency that contains the schema
-definition - check the versions of loaded packages/modules to confirm
-your environment is as expected).
+$(provider_string)

+You likely need to load a compatible version of this package to populate your session with the schema definition.
+""")
+else
+print(io, """
+In practice, this can arise if you try to read a Legolas table with a
+prescribed schema, but haven't actually loaded the schema definition
+(or commonly, haven't loaded the dependency that contains the schema
+definition - check the versions of loaded packages/modules to confirm
+your environment is as expected).
+""")
+end
+println(io)
+
+print(io, """
 Note that if you're in this particular situation, you can still load the raw
 table as-is without Legolas (e.g. via `Arrow.Table(path_to_table)`).
 """)
@@ -165,6 +189,25 @@
 """
 identifier(sv::SchemaVersion) = throw(UnknownSchemaVersionError(sv))

+"""
+Legolas.schema_provider(::SchemaVersion)
+
+Returns a NamedTuple with keys `name` and `version`. The name is a `Symbol` corresponding to the package which defines the schema version, if known; otherwise `nothing`. Likewise the `version` is a `VersionNumber` or `nothing`.
+"""
+schema_provider(::SchemaVersion) = (; name=nothing, version=nothing)
+# shadow `pkgversion` so we don't fail on pre-1.9
+pkgversion(m::Module) = isdefined(Base, :pkgversion) ? Base.pkgversion(m) : nothing
+
+# Used in the implementation of `schema_provider`.
+function defining_package_version(m::Module)
+rootmodule = Base.moduleroot(m)
+# Check if this module was defined in a package.
+# If not, return `nothing`
+path = pathof(rootmodule)
+path === nothing && return (; name=nothing, version=nothing)
+return (; name=Symbol(rootmodule), version=pkgversion(rootmodule))
+end
+
 """
 Legolas.declared_fields(sv::Legolas.SchemaVersion)

@@ -375,7 +418,7 @@
 schema_prefix isa Symbol || return :(throw(ArgumentError(string("`Prefix` provided to `@schema` is not a valid type name: ", $(Base.Meta.quot(schema_prefix))))))
 return quote
 # This approach provides some safety against accidentally replacing another module's schema's name,
-# without making it annoying to reload code/modules in an interactice development context.
+# without making it annoying to reload code/modules in an interactive development context.
 m = $Legolas._schema_declared_in_module(Val(Symbol($schema_name)))
 if m isa Module && string(m) != string(@__MODULE__)
 throw(ArgumentError(string("A schema with this name was already declared by a different module: ", m)))
@@ -476,6 +519,7 @@
 return quote
 @inline $Legolas.declared(::$quoted_schema_version_type) = true
 @inline $Legolas.identifier(::$quoted_schema_version_type) = $identifier_string
+$Legolas.schema_provider(::$quoted_schema_version_type) = $Legolas.defining_package_version(@__MODULE__)
 @inline $Legolas.parent(::$quoted_schema_version_type) = $(Base.Meta.quot(parent))
 $Legolas.declared_fields(::$quoted_schema_version_type) = $declared_field_names_types
 $Legolas.declaration(::$quoted_schema_version_type) = $(Base.Meta.quot(schema_version_declaration))
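
The generated `schema_provider` method delegates to `defining_package_version`, which walks up to the root module and checks whether that module was loaded from a package. A minimal standalone sketch of that lookup follows, using only `Base` functions. The names `provider_of` and `safe_pkgversion` are hypothetical and are not part of Legolas.

```julia
# Hypothetical stand-in for the `pkgversion` shadow in the diff:
# `Base.pkgversion` only exists on Julia 1.9+, so fall back to `nothing`.
safe_pkgversion(m::Module) = isdefined(Base, :pkgversion) ? Base.pkgversion(m) : nothing

# Hypothetical re-creation of the lookup performed by `defining_package_version`.
function provider_of(m::Module)
    root = Base.moduleroot(m)
    # `pathof` returns `nothing` for modules that were not loaded from a package.
    pathof(root) === nothing && return (; name=nothing, version=nothing)
    return (; name=Symbol(root), version=safe_pkgversion(root))
end

# `Main` is not a package, so no provider information is available for it.
provider = provider_of(Main)
```

For a schema declared inside a real package, the same lookup would be expected to yield the package name as a `Symbol` and, on Julia 1.9 or later, its `VersionNumber`.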
33 changes: 31 additions & 2 deletions src/tables.jl
@@ -132,10 +132,16 @@
 Otherwise, return `nothing`.
 """
 function extract_schema_version(table)
+v = extract_metadata(table, LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY)
+isnothing(v) && return nothing
+return first(parse_identifier(v))
+end
+
+function extract_metadata(table, key)
 metadata = Arrow.getmetadata(table)
 if !isnothing(metadata)
 for (k, v) in metadata
-k == LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY && return first(parse_identifier(v))
+k == key && return v
 end
 end
 return nothing
@@ -165,6 +171,15 @@
 via `Legolas.read`; is it missing the expected custom metadata and/or the
 expected \"$LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY\" field?
 """))
+
+provider_name = lift(Symbol, extract_metadata(table, LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY))
+provider_version = lift(VersionNumber, extract_metadata(table, LEGOLAS_SCHEMA_PROVIDER_VERSION_METADATA_KEY))
+# If we don't have the schema declared in our session,
+# then throw an error with all the information we have available about where
+# the schema was defined.
+if !declared(sv)
+throw(UnknownSchemaVersionError(sv, provider_name, provider_version))
+end
 try
 Legolas.validate(Tables.schema(table), sv)
 catch
@@ -213,11 +228,26 @@
 end
 end
 schema_metadata = LEGOLAS_SCHEMA_QUALIFIED_METADATA_KEY => identifier(sv)
+provider_name, provider_version = schema_provider(sv)
+provider_name_metadata = LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY => string(provider_name)
+provider_version_metadata = LEGOLAS_SCHEMA_PROVIDER_VERSION_METADATA_KEY => string(provider_version)
 if isnothing(metadata)
 metadata = (schema_metadata,)
+if !isnothing(provider_name)
+metadata = (metadata..., provider_name_metadata)
+if !isnothing(provider_version)
+metadata = (metadata..., provider_version_metadata)
+end
+end
 else
 metadata = Set(metadata)
 push!(metadata, schema_metadata)
+if !isnothing(provider_name)
+push!(metadata, provider_name_metadata)
+if !isnothing(provider_version)
+push!(metadata, provider_version_metadata)
+end
+end
 end
 write_arrow(io_or_path, table; metadata=metadata, kwargs...)
 return table
@@ -237,4 +267,3 @@
 seekstart(io)
 return io
 end
-
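
The changes to `src/tables.jl` amount to a round trip: `Legolas.write` appends provider entries to the Arrow metadata, and `Legolas.read` looks them up again and lifts them to a `Symbol` and a `VersionNumber`. The sketch below imitates that flow on a plain vector of pairs, without Arrow or Legolas. All helper names are hypothetical, and `lift_or_missing` only assumes that `Legolas.lift` maps `nothing` to `missing`.

```julia
# Key names copied from the constants added in src/Legolas.jl.
const PROVIDER_NAME_KEY = "legolas_julia_schema_provider_name"
const PROVIDER_VERSION_KEY = "legolas_julia_schema_provider_version"

# Hypothetical helper: build the extra metadata pairs for a provider, mirroring
# the nesting in `Legolas.write` (a version is only recorded alongside a name).
function provider_metadata(provider)
    entries = Pair{String,String}[]
    isnothing(provider.name) && return entries
    push!(entries, PROVIDER_NAME_KEY => string(provider.name))
    isnothing(provider.version) || push!(entries, PROVIDER_VERSION_KEY => string(provider.version))
    return entries
end

# Hypothetical helper: linear scan over key/value pairs, like `extract_metadata`.
function find_metadata(metadata, key)
    for (k, v) in metadata
        k == key && return v
    end
    return nothing
end

# Hypothetical stand-in for `Legolas.lift` (assumption: `nothing` becomes `missing`).
lift_or_missing(f, x) = (x === nothing || x === missing) ? missing : f(x)

# Write side, then read side.
written = provider_metadata((; name=:TestProviderPkg, version=v"0.1.0"))
name = lift_or_missing(Symbol, find_metadata(written, PROVIDER_NAME_KEY))
version = lift_or_missing(VersionNumber, find_metadata(written, PROVIDER_VERSION_KEY))

# A table written on Julia pre-1.9 would carry a name but no version.
legacy = provider_metadata((; name=:TestProviderPkg, version=nothing))
legacy_version = lift_or_missing(VersionNumber, find_metadata(legacy, PROVIDER_VERSION_KEY))
```

Storing both values as strings keeps the metadata readable from non-Julia Arrow clients, which is presumably why the keys carry a `julia` infix.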

6 changes: 6 additions & 0 deletions test/TestProviderPkg/Project.toml
@@ -0,0 +1,6 @@
+name = "TestProviderPkg"
+uuid = "0abfdf01-ee0b-4279-9694-f097aec3ad32"
+version = "0.1.0"
+
+[deps]
+Legolas = "741b9549-f6ed-4911-9fbf-4a1c0c97f0cd"
11 changes: 11 additions & 0 deletions test/TestProviderPkg/src/TestProviderPkg.jl
@@ -0,0 +1,11 @@
+module TestProviderPkg
+
+using Legolas: @schema, @version
+
+@schema "test-provider-pkg.foo" Foo
+
+@version FooV1 begin
+a::Int
+end
+
+end # module TestProviderPkg
25 changes: 25 additions & 0 deletions test/runtests.jl
@@ -3,6 +3,31 @@ using Legolas, Test, DataFrames, Arrow, UUIDs
 using Legolas: SchemaVersion, @schema, @version, SchemaVersionDeclarationError, DeclaredFieldInfo
 using Accessors
 using Aqua
+using Pkg
+
+# This test set goes before we load `TestProviderPkg`
+@testset "#46: Informative errors when reading unknown schemas from packages" begin
+err = Legolas.UnknownSchemaVersionError(Legolas.SchemaVersion("test-provider-pkg.foo", 1), :TestProviderPkg, v"0.1.0")
+@test_throws err Legolas.read("test_provider_pkg.arrow")
+@test contains(sprint(Base.showerror, err), "TestProviderPkg")
+
+# Let's test some more error printing while we're here; if we did not have the VersionNumber
+# (e.g. since the table was generated on Julia pre-1.9), we should still print a reasonable message:
+err = Legolas.UnknownSchemaVersionError(Legolas.SchemaVersion("test-provider-pkg.foo", 1), :TestProviderPkg, missing)
+@test contains(sprint(Base.showerror, err), "TestProviderPkg")
+end
+
+# Now load the package, and verify we can write the tables with this metadata
+Pkg.develop(; path=joinpath(@__DIR__, "TestProviderPkg"))
+using TestProviderPkg
+
+@testset "#46: Writing informative metadata about packages providing schemas" begin
+table = [TestProviderPkg.FooV1(; a=1)]
+Legolas.write("test_provider_pkg.arrow", table, TestProviderPkg.FooV1SchemaVersion())
+table = Legolas.read("test_provider_pkg.arrow")
+v = Legolas.extract_metadata(table, Legolas.LEGOLAS_SCHEMA_PROVIDER_NAME_METADATA_KEY)
+@test v == "TestProviderPkg"
+end

 @test_throws SchemaVersionDeclarationError("no prior `@schema` declaration found in current module") @version(TestV1, begin x end)

Binary file added test/test_provider_pkg.arrow
Binary file not shown.
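
The first test set only checks that the rendered error mentions `TestProviderPkg`. For reference, the provider line that the new `showerror` branch assembles can be sketched in isolation as follows. `provider_line` is a hypothetical helper written for this illustration, not a Legolas function.

```julia
# Hypothetical helper reproducing how the new `showerror` branch formats the provider:
# the package name, followed by the version in parentheses when one was recorded.
function provider_line(name::Symbol, version::Union{Missing,VersionNumber})
    line = string(name)
    ismissing(version) || (line = string(line, " (version: ", version, ")"))
    return line
end

with_version = provider_line(:TestProviderPkg, v"0.1.0")
without_version = provider_line(:TestProviderPkg, missing)
```

The second call corresponds to the pre-1.9 case covered by the test, where only the package name was stored.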