Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensuring XML parser sets factory class then parses #705

Merged
merged 1 commit into from
Jan 12, 2023
Merged

Conversation

jeremyf
Copy link
Contributor

@jeremyf jeremyf commented Jan 12, 2023

This commit for changing the Bulkrax::XmlEntry is the sibling to the changes made to Bulkrax::OaiEntry (see SHA 35970ce). The salient information is pulled as a quote from that commit:

Prior to this commit, as we were reading through the record's metadata
when we were processing shape we would use the default
factory_class to check if shape was a valid field. Then we'd use
the value of model to derive the factory_class. And when we then
processed smell we'd be using the newly derived factory_class from
the model node.

With this commit, we first establish the factory_class from the
model node. Then we again process the all the nodes to parse the
metadata. This ensures that both shape and smell are parsed
against the expected factory_class.

In addition, I have refactored the spec to remove global state pollution of the parser config.

I have also renamed and documented a method to be more descriptive; the previous method name of xml_elements was opaque, as it returned an Array of strings (and not an array of XML elements).

Further steps could be to extract the now "ubiquitous" Avacado class into a spec fixture.

We may want to revisit the XML parser fundamental logic; namely we only parse nodes that are explicitly declared with in the from. This is a bit different than other parsers, in that they will make assumptions about each encountered column (in the case of CSV) or node (in the case of OAI). tl;dr - Here there be dragons.

Closes notch8/adventist_knapsack#210

Related to:

@jeremyf jeremyf added patch-ver for release notes bug-fix labels Jan 12, 2023
Copy link
Contributor

@bkiahstroud bkiahstroud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like rubocop is failing. Otherwise, approved!

This commit for changing the `Bulkrax::XmlEntry` is the sibling to the
changes made to `Bulkrax::OaiEntry` (see SHA 35970ce).  The salient
information is pulled as a quote from that commit:

> Prior to this commit, as we were reading through the record's metadata
> when we were processing `shape` we would use the default
> `factory_class` to check if `shape` was a valid field.  Then we'd use
> the value of `model` to derive the `factory_class`.  And when we then
> processed `smell` we'd be using the newly derived `factory_class` from
> the `model` node.
>
> With this commit, we first establish the `factory_class` from the
> `model` node.  Then we again process the all the nodes to parse the
> metadata.  This ensures that both `shape` and `smell` are parsed
> against the expected `factory_class`.

In addition, I have refactored the spec to remove global state pollution
of the parser config.

I have also renamed and documented a method to be more descriptive; the
previous method name of `xml_elements` was opaque, as it returned an
Array of strings (and not an array of XML elements).

Further steps could be to extract the now "ubiquitous" `Avacado` class
into a spec fixture.

We may want to revisit the XML parser fundamental logic; namely we only
parse nodes that are explicitly declared with in the `from`.  This is a
bit different than other parsers, in that they will make assumptions
about each encountered column (in the case of CSV) or node (in the case
of OAI).  tl;dr - Here there be dragons.

Closes #702

Related to:

- #703
- https://github.com/scientist-softserv/adventist-dl/issues/188
- https://github.com/scientist-softserv/adventist-dl/issues/187
- https://github.com/scientist-softserv/adventist-dl/issues/186

[1]:
@jeremyf jeremyf merged commit 8995b81 into main Jan 12, 2023
@jeremyf jeremyf deleted the i702-part-2 branch January 12, 2023 20:46
jeremyf added a commit that referenced this pull request Jan 23, 2023
Prior to this commit, if we had configured bulkrax such that we were
allowing for our model import to come from multiple fields
(e.g. `from: ['model', 'work_type']`) and the CSV had `work_type` as the
column name, we would get an error as follows:

```
NoMethodError:
  undefined method `method_defined?' for nil:NilClass
    # ./app/models/concerns/bulkrax/has_matchers.rb:128:in `field_supported?'
    # ./app/models/concerns/bulkrax/has_matchers.rb:157:in `multiple?'
    # ./app/models/concerns/bulkrax/has_matchers.rb:34:in `block in add_metadata'
    # ./app/models/concerns/bulkrax/has_matchers.rb:31:in `each'
    # ./app/models/concerns/bulkrax/has_matchers.rb:31:in `add_metadata'
    # ./app/models/bulkrax/csv_entry.rb:77:in `block in add_ingested_metadata'
    # ./app/models/bulkrax/csv_entry.rb:75:in `each'
    # ./app/models/bulkrax/csv_entry.rb:75:in `add_ingested_metadata'
    # ./app/models/bulkrax/csv_entry.rb:43:in `build_metadata'
```

Tracing this line it is because the `factory_class` is nil.  And if the
model column were to be a later column, any columns that occurred before
we established the factory class may have been skipped.

With this commit, we're echoing the functionality introduced in #705 and #703.

Related to:

- notch8/britishlibrary#211
- #705
- #703

Closes #718
jeremyf added a commit that referenced this pull request Jan 24, 2023
* Ensuring CSV sets factory class before metadata parsing

Prior to this commit, if we had configured bulkrax such that we were
allowing for our model import to come from multiple fields
(e.g. `from: ['model', 'work_type']`) and the CSV had `work_type` as the
column name, we would get an error as follows:

```
NoMethodError:
  undefined method `method_defined?' for nil:NilClass
    # ./app/models/concerns/bulkrax/has_matchers.rb:128:in `field_supported?'
    # ./app/models/concerns/bulkrax/has_matchers.rb:157:in `multiple?'
    # ./app/models/concerns/bulkrax/has_matchers.rb:34:in `block in add_metadata'
    # ./app/models/concerns/bulkrax/has_matchers.rb:31:in `each'
    # ./app/models/concerns/bulkrax/has_matchers.rb:31:in `add_metadata'
    # ./app/models/bulkrax/csv_entry.rb:77:in `block in add_ingested_metadata'
    # ./app/models/bulkrax/csv_entry.rb:75:in `each'
    # ./app/models/bulkrax/csv_entry.rb:75:in `add_ingested_metadata'
    # ./app/models/bulkrax/csv_entry.rb:43:in `build_metadata'
```

Tracing this line it is because the `factory_class` is nil.  And if the
model column were to be a later column, any columns that occurred before
we established the factory class may have been skipped.

With this commit, we're echoing the functionality introduced in #705 and #703.

Related to:

- notch8/britishlibrary#211
- #705
- #703

Closes #718

* Update spec/fixtures/csv/factory_class_test.csv

Co-authored-by: Benjamin Kiah Stroud <32469930+bkiahstroud@users.noreply.github.com>

Co-authored-by: Benjamin Kiah Stroud <32469930+bkiahstroud@users.noreply.github.com>
jeremyf added a commit that referenced this pull request Feb 3, 2023
With this commit, we're introducing
`Bulkrax::EntrySpecHelper.entry_for`, a public api method for downstream
Bulkrax spec support.

**Release Notes**

The newly provided `Bulkrax::EntrySpecHelper.entry_for` method is
intended to provide a consistent mechanism for testing your entry parser
logic without needing to jump through the entire import cycle for that
entry.

To use, will need to explicitly require it in your test suite:

```ruby
require 'bulkrax/entry_spec_helper'

RSpec.describe MyParser do
  let(:entry) { Bulkrax::EntrySpecHelper.entry_for(...) }
end
```

Each of the entry types have slightly different parameter requirements
for instantiation.  The best source to see those is in Bulkrax's
`./spec/bulkrax/entry_spec_helper_spec.rb` file

Closes: #714

Related to:

- notch8/adventist-dl#246
- #719
- #705
jeremyf added a commit that referenced this pull request Feb 3, 2023
With this commit, we're introducing
`Bulkrax::EntrySpecHelper.entry_for`, a public api method for downstream
Bulkrax spec support.

**Release Notes**

The newly provided `Bulkrax::EntrySpecHelper.entry_for` method is
intended to provide a consistent mechanism for testing your entry parser
logic without needing to jump through the entire import cycle for that
entry.

To use, will need to explicitly require it in your test suite:

```ruby
require 'bulkrax/entry_spec_helper'

RSpec.describe MyParser do
  let(:entry) { Bulkrax::EntrySpecHelper.entry_for(...) }
end
```

Each of the entry types have slightly different parameter requirements
for instantiation.  The best source to see those is in Bulkrax's
`./spec/bulkrax/entry_spec_helper_spec.rb` file

Closes: #714

Related to:

- notch8/adventist-dl#246
- #719
- #705
jeremyf added a commit that referenced this pull request Feb 6, 2023
With this commit, we're introducing
`Bulkrax::EntrySpecHelper.entry_for`, a public api method for downstream
Bulkrax spec support.

**Release Notes**

The newly provided `Bulkrax::EntrySpecHelper.entry_for` method is
intended to provide a consistent mechanism for testing your entry parser
logic without needing to jump through the entire import cycle for that
entry.

To use, will need to explicitly require it in your test suite:

```ruby
require 'bulkrax/entry_spec_helper'

RSpec.describe MyParser do
  let(:entry) { Bulkrax::EntrySpecHelper.entry_for(...) }
end
```

Each of the entry types have slightly different parameter requirements
for instantiation.  The best source to see those is in Bulkrax's
`./spec/bulkrax/entry_spec_helper_spec.rb` file

Closes: #714

Related to:

- notch8/adventist-dl#246
- #719
- #705
jeremyf added a commit to notch8/britishlibrary that referenced this pull request Mar 14, 2023
Prior to this commit, with Bulkrax 5 there was a deprecation notice
regarding `xml_elements`.

With this commit, we're favoring
`field_mapping_from_values_for_xml_element_names`.

Related to:

samvera/bulkrax#705
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix patch-ver for release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full text search is broken
2 participants