(PXP-7855): Feat/merge refactor (#85)
* chore(tests): add edge cases to tests and expected output, ensure no duplicate records

* feat(merge): refactored merge code and updated test calls to correctly handle duplicates and other edge cases

* Apply automatic documentation changes

* chore(merge): refactor again, breakout functions, improve readability

* Apply automatic documentation changes

* chore(tests): handle more edge cases when no guid is specified

* fix(merge): edge case with multiple empty guids, need to handle updating all previous records

* Apply automatic documentation changes

* chore(merge): don't make copies unnecessarily, cleanup getting values from dict

* Apply automatic documentation changes

* chore(merge): cleaner updating of headers

* Apply automatic documentation changes

* fix(merge): add headers back where needed

* Apply automatic documentation changes

* feat(merge-refactor-suggestion): remove headers arg from _get_updated_records, remove unused reference to "existing_urls" variable (#86)

Co-authored-by: Matthew Cannalte <mcannalte@uchicago.edu>

* Apply automatic documentation changes

* fix(merge): ensure no duplicates when GUID="", more test cases for handling commas and spaces in file names

* fix(merge): handle case with multiple duplicates but no guid

* Apply automatic documentation changes

Co-authored-by: Alexander VT <alexander.m.vantol@gmail.com>
Co-authored-by: Matthew Cannalte <mcannalte@uchicago.edu>
3 people authored Apr 29, 2021
1 parent 0d16282 commit e33a137
Showing 18 changed files with 364 additions and 162 deletions.
14 changes: 7 additions & 7 deletions .secrets.baseline
@@ -3,7 +3,7 @@
"files": "poetry.lock",
"lines": null
},
-"generated_at": "2021-04-16T20:42:51Z",
+"generated_at": "2021-04-28T19:37:37Z",
"plugins_used": [
{
"name": "AWSKeyDetector"
@@ -176,37 +176,37 @@
{
"hashed_secret": "96c9184fb19c9c1618ccf44d141f8029a739891c",
"is_verified": false,
-"line_number": 115,
+"line_number": 121,
"type": "Hex High Entropy String"
},
{
"hashed_secret": "e1da93616713812cb50e0ac845b1e9e305d949f1",
"is_verified": false,
-"line_number": 311,
+"line_number": 317,
"type": "Hex High Entropy String"
},
{
"hashed_secret": "47f42f4c34fddab383b817e689dc0fb75af81266",
"is_verified": false,
-"line_number": 335,
+"line_number": 341,
"type": "Hex High Entropy String"
},
{
"hashed_secret": "300d95dd5d30ab6928ffda6c08c6a129a23e5b39",
"is_verified": false,
-"line_number": 359,
+"line_number": 365,
"type": "Hex High Entropy String"
},
{
"hashed_secret": "f9e664db75c7f23a299b0b055c10e08d47073e93",
"is_verified": false,
-"line_number": 421,
+"line_number": 427,
"type": "Hex High Entropy String"
},
{
"hashed_secret": "7c35c215b326b9463b669b657c1ff9873ff53d9a",
"is_verified": false,
-"line_number": 446,
+"line_number": 452,
"type": "Hex High Entropy String"
}
]
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/_build/doctrees/tools/indexing.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/tools/metadata.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/_build/html/tools/indexing.html
@@ -364,7 +364,7 @@ <h1>Indexing Tools<a class="headerlink" href="#indexing-tools" title="Permalink

<dl class="py function">
<dt id="gen3.tools.indexing.verify_manifest.async_verify_object_manifest">
<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.indexing.verify_manifest.</span></code><code class="sig-name descname"><span class="pre">async_verify_object_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'acl':</span> <span class="pre">&lt;function</span> <span class="pre">_get_acl_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'authz':</span> <span class="pre">&lt;function</span> <span class="pre">_get_authz_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_name':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_name_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_size':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_size_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'guid':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'md5':</span> <span class="pre">&lt;function</span> <span class="pre">_get_md5_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'urls':</span> <span class="pre">&lt;function</span> <span class="pre">_get_urls_from_row&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='verify-manifest-errors-1619452575.9644923.log'</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/indexing/verify_manifest.html#async_verify_object_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a 
class="headerlink" href="#gen3.tools.indexing.verify_manifest.async_verify_object_manifest" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.indexing.verify_manifest.</span></code><code class="sig-name descname"><span class="pre">async_verify_object_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'acl':</span> <span class="pre">&lt;function</span> <span class="pre">_get_acl_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'authz':</span> <span class="pre">&lt;function</span> <span class="pre">_get_authz_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_name':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_name_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'file_size':</span> <span class="pre">&lt;function</span> <span class="pre">_get_file_size_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'guid':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'md5':</span> <span class="pre">&lt;function</span> <span class="pre">_get_md5_from_row&gt;</span></em>, <em class="sig-param"><span class="pre">'urls':</span> <span class="pre">&lt;function</span> <span class="pre">_get_urls_from_row&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='verify-manifest-errors-1619720217.934012.log'</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/indexing/verify_manifest.html#async_verify_object_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a 
class="headerlink" href="#gen3.tools.indexing.verify_manifest.async_verify_object_manifest" title="Permalink to this definition"></a></dt>
<dd><p>Verify all file object records into a manifest csv</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
2 changes: 1 addition & 1 deletion docs/_build/html/tools/metadata.html
@@ -102,7 +102,7 @@ <h1>Metadata Tools<a class="headerlink" href="#metadata-tools" title="Permalink

<dl class="py function">
<dt id="gen3.tools.metadata.ingest_manifest.async_ingest_metadata_manifest">
<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.metadata.ingest_manifest.</span></code><code class="sig-name descname"><span class="pre">async_ingest_metadata_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">metadata_source</span></em>, <em class="sig-param"><span class="pre">auth=None</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'guid_for_row':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_for_row&gt;</span></em>, <em class="sig-param"><span class="pre">'indexed_file_object_guid':</span> <span class="pre">&lt;function</span> <span class="pre">_query_for_associated_indexd_record_guid&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='ingest-metadata-manifest-errors-1619452576.3926728.log'</span></em>, <em class="sig-param"><span class="pre">get_guid_from_file=True</span></em>, <em class="sig-param"><span class="pre">metadata_type=None</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/metadata/ingest_manifest.html#async_ingest_metadata_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.metadata.ingest_manifest.async_ingest_metadata_manifest" title="Permalink to this definition"></a></dt>
<em class="property"><span class="pre">async</span> </em><code class="sig-prename descclassname"><span class="pre">gen3.tools.metadata.ingest_manifest.</span></code><code class="sig-name descname"><span class="pre">async_ingest_metadata_manifest</span></code><span class="sig-paren">(</span><em class="sig-param"><span class="pre">commons_url</span></em>, <em class="sig-param"><span class="pre">manifest_file</span></em>, <em class="sig-param"><span class="pre">metadata_source</span></em>, <em class="sig-param"><span class="pre">auth=None</span></em>, <em class="sig-param"><span class="pre">max_concurrent_requests=24</span></em>, <em class="sig-param"><span class="pre">manifest_row_parsers={'guid_for_row':</span> <span class="pre">&lt;function</span> <span class="pre">_get_guid_for_row&gt;</span></em>, <em class="sig-param"><span class="pre">'indexed_file_object_guid':</span> <span class="pre">&lt;function</span> <span class="pre">_query_for_associated_indexd_record_guid&gt;}</span></em>, <em class="sig-param"><span class="pre">manifest_file_delimiter=None</span></em>, <em class="sig-param"><span class="pre">output_filename='ingest-metadata-manifest-errors-1619720218.4036705.log'</span></em>, <em class="sig-param"><span class="pre">get_guid_from_file=True</span></em>, <em class="sig-param"><span class="pre">metadata_type=None</span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/gen3/tools/metadata/ingest_manifest.html#async_ingest_metadata_manifest"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#gen3.tools.metadata.ingest_manifest.async_ingest_metadata_manifest" title="Permalink to this definition"></a></dt>
<dd><p>Ingest all metadata records into a manifest csv</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters</dt>
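Note on the two docs hunks above: the only difference between the old and new rendered signatures is the number embedded in the default `output_filename`. That pattern is what one would expect if the default is built from the current time when the module is imported, so every docs rebuild produces a new value. The source of `verify_manifest.py` and `ingest_manifest.py` is not shown in this commit view, so this is an assumption. The sketch below uses hypothetical function names (`verify_frozen`, `verify_lazy`) only to illustrate the Python behaviour: a default argument is evaluated once, at definition time.

```python
import time


# Hypothetical stand-in, NOT the gen3 implementation: the default string is
# computed once, when the "def" statement runs (i.e. at import time).
def verify_frozen(output_filename=f"verify-manifest-errors-{time.time()}.log"):
    return output_filename


# Alternative sketch: use None as a sentinel and compute the name per call.
def verify_lazy(output_filename=None):
    if output_filename is None:
        output_filename = f"verify-manifest-errors-{time.time()}.log"
    return output_filename


if __name__ == "__main__":
    first = verify_frozen()
    time.sleep(0.05)
    # Same name on every call within one process: the default is frozen.
    print(first == verify_frozen())
    # The sentinel version picks up a fresh timestamp on each call.
    print(verify_lazy() == verify_lazy())
```

With a sentinel default, generated documentation would show `output_filename=None` and stay stable between builds. Whether that trade-off suits this project is a design decision for the maintainers.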
Expand Down
2 changes: 1 addition & 1 deletion gen3/tools/indexing/manifest_columns.py
@@ -180,7 +180,7 @@ def _parse_multiple_values(values):
['/a', '/b']
['/a', '/b']
"""
-values = values.translate(values.maketrans("[],\"'", "     "))
+values = values.translate(values.maketrans("[]\"'", "    "))
return values.split()
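
Note on the hunk above: the change drops the comma from the set of characters that are blanked out before splitting, which fits the commit message's mention of commas in file names. A minimal sketch of the before/after behaviour follows. The helper names `parse_old` and `parse_new` and the sample URLs are made up for illustration; only the two `translate`/`maketrans` expressions come from the diff. The two-argument form of `str.maketrans` requires both strings to have the same length, so the replacement strings are written as five and four spaces.

```python
def parse_old(values):
    # Before the change: brackets, quotes AND commas become spaces,
    # so a comma always acts as a separator.
    return values.translate(str.maketrans("[],\"'", " " * 5)).split()


def parse_new(values):
    # After the change: only brackets and quotes become spaces,
    # so a comma inside a value is preserved.
    return values.translate(str.maketrans("[]\"'", " " * 4)).split()


if __name__ == "__main__":
    sample = "s3://bucket/a,b.txt gs://bucket/c.txt"
    print(parse_old(sample))  # ['s3://bucket/a', 'b.txt', 'gs://bucket/c.txt']
    print(parse_new(sample))  # ['s3://bucket/a,b.txt', 'gs://bucket/c.txt']
```

One consequence of the new table is that a comma used purely as a separator is no longer removed: `parse_new("/a, /b")` gives `['/a,', '/b']`. Whitespace is still the only delimiter, so values that contain spaces are still split.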

