Release v0.1.1 #44

Merged
merged 308 commits into from
Mar 7, 2024
f71d778
bug fix for opersist mn dir creation
iannesbitt Dec 20, 2022
a7cba56
correcting node path call and dumping json
iannesbitt Dec 20, 2022
b2fcec3
moving two fields into correct sublevel
iannesbitt Dec 20, 2022
c04382e
bug fix
iannesbitt Dec 20, 2022
a18c7a0
using subprocess.run()
iannesbitt Dec 20, 2022
c79db0b
adding functions to utils
iannesbitt Dec 20, 2022
3db2204
refactoring so names are in their own variable
iannesbitt Dec 20, 2022
adfec74
adding structure to primary function
iannesbitt Dec 20, 2022
3908815
adding function to check metadata (placeholder for now)
iannesbitt Dec 20, 2022
b0e07ff
adding logging checkpoints
iannesbitt Dec 20, 2022
206ff05
logging in placeholder
iannesbitt Dec 20, 2022
5959dbb
adding checks placeholder
iannesbitt Dec 20, 2022
24471fe
adding base url validator
iannesbitt Dec 20, 2022
1ce6412
adding required input function and fixing bugs
iannesbitt Dec 20, 2022
f4a5a0e
changing to be compatible with default json
iannesbitt Dec 20, 2022
1ddc02b
bug fixes
iannesbitt Dec 20, 2022
ce233ff
bug fixes
iannesbitt Dec 20, 2022
ce44ba3
adding checks for loaded json
iannesbitt Dec 21, 2022
d99886d
adding orcid lookup function
iannesbitt Dec 23, 2022
5653d88
change meaning of mode to indicate production/staging; add production…
iannesbitt Dec 23, 2022
53ccefb
changing cn_url to func variable
iannesbitt Dec 23, 2022
ae2c080
changing mode to indicate production/staging
iannesbitt Dec 23, 2022
d2593b3
adding get_or_create, adding docstring
iannesbitt Dec 23, 2022
00a185f
adding reference to examples code
iannesbitt Dec 23, 2022
9d0a724
adding schedule choice function
iannesbitt Dec 23, 2022
a0583c1
adding schedules dict
iannesbitt Dec 23, 2022
fca3ca4
changing scheduling order
iannesbitt Dec 23, 2022
38edba8
adding data check
iannesbitt Dec 28, 2022
573de82
using default node configuration dictionary directly from mnlite
iannesbitt Dec 28, 2022
ce9f97f
changing calls for default config to go through utils.default_json()
iannesbitt Dec 28, 2022
63e2369
new structure for metadata tests. basic working structure is done
iannesbitt Dec 29, 2022
cef1d38
consolidating try/except
iannesbitt Dec 29, 2022
5a0a49d
changing default_json location to avoid circular import
iannesbitt Dec 29, 2022
43bfc70
adding state definition
iannesbitt Dec 29, 2022
be01665
bug fixes
iannesbitt Dec 29, 2022
a3d019d
adding exception handling for when mnlite is not installed and restar…
iannesbitt Dec 29, 2022
e3fe42d
return name instead of boolean
iannesbitt Jan 4, 2023
edab4b9
bug fix (subprocess doesn't like spaces)
iannesbitt Jan 4, 2023
2b83e08
subprocess scrapy call is finally working
iannesbitt Jan 4, 2023
84f0624
adding test conditions & handling more error types
iannesbitt Jan 4, 2023
9352f80
more helpful error reporting
iannesbitt Jan 4, 2023
a2b2e37
adding condition to logging
iannesbitt Jan 4, 2023
af2a797
adding conditions to logging
iannesbitt Jan 4, 2023
dcfdbe5
allowing either full orcid url or just number
iannesbitt Jan 5, 2023
954d895
add debugging option for record lookup
iannesbitt Jan 5, 2023
09c8d57
more concise notfound error
iannesbitt Jan 5, 2023
1933152
get_or_create_subj will handle asking for names
iannesbitt Jan 5, 2023
740ff12
ensure node_id has the appropriate prefix
iannesbitt Jan 5, 2023
e122563
changing uniqueness prompts
iannesbitt Jan 5, 2023
556fc1c
adding option to check specified number of metadata files
iannesbitt Jan 5, 2023
10b6446
adding option to check all records if necessary
iannesbitt Jan 5, 2023
5a73cb9
bug fix
iannesbitt Jan 5, 2023
15ee773
bug fix
iannesbitt Jan 5, 2023
2b57ba3
log message clarity
iannesbitt Jan 5, 2023
a8aed21
closing opersist instance and moving continue log msg
iannesbitt Jan 5, 2023
2890888
adding local subject lookup and renaming cn lookup
iannesbitt Jan 6, 2023
b2a515c
adding a way to turn off long debug strings
iannesbitt Jan 6, 2023
976758a
bug fix
iannesbitt Jan 6, 2023
93aafbf
making logging more informative
iannesbitt Jan 6, 2023
3c426be
bug fix
iannesbitt Jan 6, 2023
8e09278
categorization of shacl violations
iannesbitt Jan 6, 2023
74fe461
new violations categorization function
iannesbitt Jan 6, 2023
6b0a852
trailing newline
iannesbitt Jan 6, 2023
4e08815
adding prompt to limit number of tested metadata objects
iannesbitt Jan 11, 2023
7734493
adding docstring
iannesbitt Jan 11, 2023
e409aa6
comment clarity
iannesbitt Jan 12, 2023
4a46343
fixing a linting error
iannesbitt Jan 17, 2023
d87c90a
improving handling for violation dictionary
iannesbitt Jan 17, 2023
44610ad
adding to violation extractor
iannesbitt Jan 17, 2023
d504146
filling out violation extractor
iannesbitt Jan 18, 2023
8064529
adding to metadata testing/reporting ops
iannesbitt Jan 18, 2023
1c7bec4
adding debug verbosity level for file contents & violations
iannesbitt Jan 18, 2023
d31f731
adding local (no-scrape) mode
iannesbitt Jan 19, 2023
37e2221
adding local and verbose modes to help text
iannesbitt Jan 19, 2023
9292733
adding modes to getopt arguments
iannesbitt Jan 19, 2023
f826760
debugging
iannesbitt Jan 19, 2023
2020290
debugging
iannesbitt Jan 19, 2023
baaf135
debugging
iannesbitt Jan 19, 2023
cf2f725
reworking extractor to handle multiple-violation string
iannesbitt Jan 19, 2023
d7c1a0f
debugging
iannesbitt Jan 19, 2023
898ae8a
debugging
iannesbitt Jan 19, 2023
55c1016
removing verbose mode (debug goes to log file anyway)
iannesbitt Jan 19, 2023
8a0bf41
debugging
iannesbitt Jan 19, 2023
9f56d80
debugging
iannesbitt Jan 19, 2023
a3beba0
debugging
iannesbitt Jan 19, 2023
3bcb1ed
debugging
iannesbitt Jan 19, 2023
02d4fe1
debugging
iannesbitt Jan 19, 2023
02a57ee
debugging
iannesbitt Jan 19, 2023
252b7a8
bug fix
iannesbitt Jan 19, 2023
c1b6701
bug fix
iannesbitt Jan 19, 2023
198184b
simplifying hash
iannesbitt Jan 19, 2023
1e61d45
lowering some messages to debug
iannesbitt Jan 19, 2023
9c4038e
fixing logging levels
iannesbitt Jan 19, 2023
0870b32
logging levels
iannesbitt Jan 19, 2023
e18f797
changing logging structure
iannesbitt Jan 19, 2023
44e1c16
fixing logging
iannesbitt Jan 19, 2023
b4a29db
fixing logging
iannesbitt Jan 19, 2023
1734cfd
fixing namespace for logging
iannesbitt Jan 19, 2023
b1387d7
fixing logging
iannesbitt Jan 19, 2023
2b47a5c
fixing logging (last time)
iannesbitt Jan 19, 2023
6533d09
undoing namespace changes
iannesbitt Jan 19, 2023
c739448
moving init message into logger function
iannesbitt Jan 19, 2023
9964425
adding args/returns to docstrings
iannesbitt Jan 19, 2023
b589efa
set up a way to send name info to the CN server
iannesbitt Jan 30, 2023
16a0415
add xml creation (step 15) & attempt upload to cn
iannesbitt Feb 1, 2023
3c54e5e
fixing dir in ssh calls
iannesbitt Feb 2, 2023
15ff272
changing severity of keyword violation
iannesbitt Feb 10, 2023
ecb194d
correcting server location bug
iannesbitt Feb 10, 2023
db49831
fixing node path to use end of node_id for dir
iannesbitt Feb 17, 2023
b51fd4a
bug fix if -> elif
iannesbitt Feb 17, 2023
ce9128e
fixing lock file bugs
iannesbitt Feb 21, 2023
40624d2
sonormal now in parent dir (`../sonormal`)
iannesbitt Feb 21, 2023
de8774d
Merge branch 'develop' into feature/onboarding
iannesbitt Feb 22, 2023
33a012a
defining USER_AGENT
iannesbitt Mar 2, 2023
6efb96d
Merge branch 'develop' into feature/onboarding
iannesbitt Mar 2, 2023
5ba15c0
adding usage details to readme
iannesbitt Mar 20, 2023
1322e39
updating dependencies to fix install (#22)
iannesbitt Mar 24, 2023
8999ca6
Merge branch 'develop' into feature/onboarding
iannesbitt Mar 24, 2023
83f2859
essentials > optional (DataONEorg/member-repos#67)
iannesbitt Mar 28, 2023
926f036
fixing dependencies again
iannesbitt May 23, 2023
f63d7f5
fixing small name parsing bug
iannesbitt May 24, 2023
c84cf8e
modifying operation steps
iannesbitt May 24, 2023
4dcae2b
fixing imports for [tool.poetry.scripts]
iannesbitt May 25, 2023
da439c3
fixing imports for [tool.poetry.scripts]
iannesbitt May 25, 2023
215fa20
fixing scp call
iannesbitt May 25, 2023
bfbf929
hopefully fixing xml upload
iannesbitt May 25, 2023
7aeb27a
trying fix for ssh commands
iannesbitt May 25, 2023
24a072a
trying fix for ssh command
iannesbitt May 25, 2023
bac99a4
adding uwsgi requirement
iannesbitt May 26, 2023
960496f
adding paramiko and scp
iannesbitt May 26, 2023
28e3864
fix bug in xml creation
iannesbitt May 26, 2023
cad9685
fixing indentation
iannesbitt May 26, 2023
0f9bfa9
adding new file upload and CN integration framework
iannesbitt May 26, 2023
0fb7377
adding xmltodict
iannesbitt May 26, 2023
53bf84b
adding remote cn server actions
iannesbitt May 26, 2023
d15be7b
fixing missing positional arg
iannesbitt May 26, 2023
2631e67
adding sleep
iannesbitt May 30, 2023
1cce2e1
adding node id return to ssh function
iannesbitt May 30, 2023
9b1daea
adding node id to node capabilities function
iannesbitt May 30, 2023
9e2af4c
adding parentheses to a string format
iannesbitt May 30, 2023
f5da688
addressing #25
iannesbitt Jun 1, 2023
1775d4e
removing save_xml function (#25)
iannesbitt Jun 1, 2023
15e64de
adding partial fix for #24
iannesbitt Jun 1, 2023
0bcb07d
remove `/` from safe chars (#25)
iannesbitt Jun 5, 2023
00ff9b5
addressing #25
iannesbitt Jun 5, 2023
84c5b71
adding dataone-common
iannesbitt Jun 6, 2023
6b7f7a1
move node settings write before subject reg
iannesbitt Aug 14, 2023
e457c0f
adding `d1_python.cnclient` operations (#26)
iannesbitt Aug 14, 2023
c34a51c
fixing node document download (#24)
iannesbitt Aug 15, 2023
a12faec
correction under #24
iannesbitt Aug 15, 2023
581a1b1
addressing #27
iannesbitt Aug 23, 2023
f9437b3
working bugs out for #27
iannesbitt Aug 23, 2023
b474866
fixing node registration command
iannesbitt Aug 23, 2023
948e329
adding some cn/d1_python logic
iannesbitt Aug 23, 2023
447d83d
autothrottle for scrape and sitemap crawl (#23)
iannesbitt Aug 25, 2023
836136b
removing uwsgi from poetry req's (build fails)
iannesbitt Sep 5, 2023
058c4be
removing cache and increasing concurrency target
iannesbitt Sep 5, 2023
a7cff46
adding restart prompt for non-admin user
iannesbitt Sep 5, 2023
33c6ac1
adding hourly option to scheduing presets
iannesbitt Sep 5, 2023
f934859
removing admin tools from requirements
iannesbitt Sep 5, 2023
7062565
changing url prefix behavior to be more forgiving
iannesbitt Sep 5, 2023
9446d55
updating dependencies
iannesbitt Sep 6, 2023
dde116c
updating dependencies again
iannesbitt Sep 8, 2023
636539d
fixing mnlite service restart
iannesbitt Sep 14, 2023
fb814bc
avoiding IndexError when mnlite database is empty
iannesbitt Sep 14, 2023
2cd82ed
adding continue prompt after record checks
iannesbitt Sep 14, 2023
c58d981
replacing len inequality with try
iannesbitt Sep 14, 2023
3e8dd5e
updating prompt message and docstring
iannesbitt Sep 14, 2023
3a3fb61
defining json-ld version for sonormal and pyld (related to #31)
iannesbitt Sep 25, 2023
dc7bb23
no rigidly defined click ver; updates to lock file
iannesbitt Sep 25, 2023
8ed8402
cleaning up errors associated with #27
iannesbitt Sep 26, 2023
86c55ae
more cleaning up after #27
iannesbitt Sep 26, 2023
3c13edf
fixing bug on ssh connection failure
iannesbitt Oct 3, 2023
0cb814e
allow script to set D1_AUTH_TOKEN
iannesbitt Oct 3, 2023
311bdba
adding changes to address #30
iannesbitt Oct 4, 2023
97fee60
updating post-report message
iannesbitt Oct 4, 2023
6a0f3ac
simplifying continue loop ue
iannesbitt Oct 4, 2023
83aefdb
correction for #30
iannesbitt Oct 4, 2023
a5f0b98
Merge branch 'develop' into feature/onboarding
iannesbitt Oct 4, 2023
e8dc48b
fixing node doc download for #30
iannesbitt Oct 4, 2023
555e428
Merge branch 'feature/onboarding' of https://github.com/DataONEorg/mn…
iannesbitt Oct 4, 2023
7e5d78b
removing extra hash in first comment #30
iannesbitt Oct 4, 2023
570becd
adding mnlite user to scp command #30
iannesbitt Oct 4, 2023
61ab4a7
bumping version in prep for merging #17
iannesbitt Oct 4, 2023
dc5a44a
Merge pull request #17 from DataONEorg/feature/onboarding
iannesbitt Oct 4, 2023
0963560
addressing #33
iannesbitt Oct 5, 2023
86d8e6b
removing `sudo` from `curl` commands
iannesbitt Oct 6, 2023
ae56049
starting #32
iannesbitt Oct 6, 2023
f040c0b
work on #32
iannesbitt Oct 12, 2023
b8c3ae4
continuing to address #33
iannesbitt Oct 13, 2023
18c24ad
adding `application/ld+json` to request header #23
iannesbitt Oct 13, 2023
e6e9ae5
adding debug text for response Content-Type #23
iannesbitt Oct 13, 2023
c593591
disabling telnet console
iannesbitt Oct 13, 2023
379106a
adding handling of integer jsonld version
iannesbitt Oct 13, 2023
00669a4
adding response logic for #23
iannesbitt Oct 13, 2023
9bf06bb
enable RedirectMiddleware (#35)
iannesbitt Oct 24, 2023
ed0262c
enabling RobotsTxtMiddleware explicitly (#23)
iannesbitt Oct 24, 2023
c171809
more explicit error logging for jsonldspider (#23)
iannesbitt Oct 24, 2023
33cce69
decoding content-type from response (#23)
iannesbitt Oct 24, 2023
03e7079
adding to docs (#32)
iannesbitt Oct 24, 2023
0e64f64
adding to docs (#32)
iannesbitt Oct 24, 2023
d934fcc
adding to docs (#32)
iannesbitt Oct 24, 2023
10a8d22
adding more docs (#32)
iannesbitt Oct 24, 2023
9ea2163
adding redirect setting explicitly (#35)
iannesbitt Oct 26, 2023
a39d796
adding settings override (#38)
iannesbitt Oct 26, 2023
d14f32c
modifying handling of mn specific settings (#38)
iannesbitt Oct 26, 2023
95d4e14
minor bug fix for #38
iannesbitt Oct 26, 2023
782252c
dummy sitemap to debug #42
iannesbitt Oct 27, 2023
3b4f62d
fixing #42
iannesbitt Oct 27, 2023
e2be731
adding `lastmod_filter` handling (#41)
iannesbitt Oct 31, 2023
490ca0e
adding test Dryad sitemap
iannesbitt Nov 3, 2023
38c0156
setting `REQUEST_FINGERPRINTER_IMPLEMENTATION`
iannesbitt Nov 3, 2023
32c08cb
removing test Dryad sitemap
iannesbitt Nov 3, 2023
045f446
removing Accept-Language en; not relevant to all mns
iannesbitt Nov 6, 2023
62d6cd3
adding nohup output file to .gitignore
iannesbitt Nov 6, 2023
a302548
updating lock for Ubuntu 22.04 SSL version change
iannesbitt Nov 9, 2023
3e16aac
logic for #45
iannesbitt Nov 10, 2023
a2235cb
Added debug logging to `sitemap_filter` (#45)
iannesbitt Nov 10, 2023
08ac602
adding timestamp to debug message (#45)
iannesbitt Nov 10, 2023
ed7b18b
updating test sitemap (#45, DataONEorg/sonormal#4)
iannesbitt Nov 10, 2023
9b2c0c0
switching gt/lt signs (#45)
iannesbitt Nov 10, 2023
afee143
adding more verbose logging for #45
iannesbitt Nov 10, 2023
b6d98e5
fixing no identifier bug (#47)
iannesbitt Nov 10, 2023
836ad07
changing logging levels for #47
iannesbitt Nov 10, 2023
606b46a
changing logging levels for #47
iannesbitt Nov 10, 2023
7588204
Merge pull request #48 from DataONEorg/bugfix-47
iannesbitt Nov 10, 2023
01ed5d0
addressing #49
iannesbitt Nov 16, 2023
1a1e941
adding code for #50
iannesbitt Nov 20, 2023
8ca1987
adding logic for #51
iannesbitt Nov 20, 2023
888ee6c
adding enhanced logging for #51
iannesbitt Nov 20, 2023
f9fcc6f
debugging for #51
iannesbitt Nov 20, 2023
9cf0b94
debugging for #51
iannesbitt Nov 20, 2023
451dca4
more debugging for #51; reversing generator
iannesbitt Nov 20, 2023
737c923
enhanced logging for #51
iannesbitt Nov 20, 2023
157c72e
logic for #52
iannesbitt Nov 22, 2023
4494316
debugging for #52
iannesbitt Nov 22, 2023
eefc496
moving jsonld correction before normalization (#52)
iannesbitt Nov 22, 2023
5da7462
adding a first-time-through exception (#52)
iannesbitt Nov 22, 2023
41b6602
handling #52 in cases where jsonld has no `@graph`
iannesbitt Nov 27, 2023
142b6f7
addressing #54
iannesbitt Jan 10, 2024
7d1b97e
cleaning up #54 fix
iannesbitt Jan 10, 2024
f6f9cda
adding test USAP-DC sitemap and JSON-LD
iannesbitt Mar 7, 2024
cf37370
correcting test USAP sitemap url
iannesbitt Mar 7, 2024
b39b486
trying html embed instead of raw jsonld
iannesbitt Mar 7, 2024
7c7ab45
adding second USAP dataset page with recommended changes
iannesbitt Mar 7, 2024
863edb8
removing duplicate `@id` field
iannesbitt Mar 7, 2024
9a93767
removing duplicate `@id` field
iannesbitt Mar 7, 2024
986a171
adding a giant first jsonld datacenter descripto to test real repo co…
iannesbitt Mar 7, 2024
db5eaa0
adding debugging dumps
iannesbitt Mar 7, 2024
2698389
adding option to choose which jsonld to use if > 1 (#57)
iannesbitt Mar 7, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
instance/
dbs/
.vscode
nohup.out

docs/diagrams/C4-PlantUML

8 changes: 4 additions & 4 deletions docs/operation.md
@@ -9,10 +9,10 @@ Harvesting is implemented as a scrapy crawler[^scrapy]. Given a sitemap, crawls
## DataONE production and testing hosts

 - Test server: so.test.dataone.org
-  - Environment: ~vieglais
-  - Virtual env: mnlite
+  - Environment: `~mnlite`
+  - Virtual env: `mnlite`
 - Production server: sonode.dataone.org
-  - Environment: ``~mnlite`
+  - Environment: `~mnlite`
   - Virtual env: `mnlite`

## Testing
@@ -25,7 +25,7 @@ Harvesting is implemented as a scrapy crawler[^scrapy]. Given a sitemap, crawls

 1. Log in to sonode.dataone.org (or so.test.dataone.org for testing)
 2. `sudo su - mnlite`
-3. `workon mnlite`
+3. `workon mnlite` (`conda activate mnlite` on the test node)
 4. `cd WORK/mnlite`
 5. Initialize a new repository: `opersist -f instance/nodes/HAKAI_IYS init`
 6. Create a contact subject: `opersist -f instance/nodes/HAKAI_IYS sub -o create -n "Brett Johnson" -s "http://orcid.org/0000-0001-9317-0364"`
5 changes: 5 additions & 0 deletions mnlite/mnode.py
@@ -45,6 +45,11 @@
]
}
}
"""
Default node configuration dictionary. Defines the node document upon loading
into mnlite system service (see eg: the
`OpenTopography node document <https://sonode.dataone.org/OPENTOPO/v2/node>`_)
"""


def getMNodeNameFromRequest():
44 changes: 44 additions & 0 deletions mnlite/xmnlite.ini
@@ -0,0 +1,44 @@
#uWSGI configuration for mnlite
[uwsgi]
strict = true
master = true
processes = 5
enable-threads = true
vacuum = true ; Delete sockets during shutdown
single-interpreter = true
die-on-term = true ; Shutdown when receiving SIGTERM (default is respawn)
need-app = true

#disable-logging = true ; Disable built-in logging
#log-4xx = true ; but log 4xx's anyway
#log-5xx = true ; and 5xx's

##harakiri = 60 ; forcefully kill workers after 60 seconds
#py-callos-afterfork = true ; allow workers to trap signals

##max-requests = 1000 ; Restart workers after this many requests
##max-worker-lifetime = 3600 ; Restart workers after this many seconds
##reload-on-rss = 2048 ; Restart workers after this much resident memory
##worker-reload-mercy = 60 ; How long to wait before forcefully killing workers

#cheaper-algo = busyness
#processes = 128 ; Maximum number of workers allowed
#cheaper = 8 ; Minimum number of workers allowed
#cheaper-initial = 16 ; Workers created at startup
#cheaper-overload = 1 ; Length of a cycle in seconds
#cheaper-step = 16 ; How many workers to spawn at a time
#cheaper-busyness-multiplier = 30 ; How many cycles to wait before killing workers
#cheaper-busyness-min = 20 ; Below this threshold, kill workers (if stable for multiplier cycles)
#cheaper-busyness-max = 70 ; Above this threshold, spawn new workers
##cheaper-busyness-backlog-alert = 16 ; Spawn emergency workers if more than this many requests are waiting in the queue
##cheaper-busyness-backlog-step = 2 ; How many emergency workers to create if there are too many requests in the queue

##plugins = python
##virtualenv = /home/mnlite/miniconda3/envs/mnlite
module = mnlite:create_app()
socket = /home/mnlite/WORK/mnlite/mnlite/tmp/mnlite.sock
chmod-socket = 664

#stats = /tmp/stats.socket
##stats = 127.0.0.1:9191
##stats-http = true
66 changes: 66 additions & 0 deletions mnonboard/README.md
@@ -0,0 +1,66 @@
# mnonboard

This module wraps `opersist` and `mnlite` to streamline the [DataONE member node onboarding process](https://github.com/DataONEorg/mnlite/blob/feature/onboarding/docs/operation.md).
It either takes a JSON document manually edited from a template as input, or converts direct user input into a JSON document.

## Usage

This script requires working installations of both [sonormal](https://github.com/datadavev/sonormal) and [mnlite](https://github.com/DataONEorg/mnlite) to function properly.

### CLI options

```
Usage: cli [ OPTIONS ]
where OPTIONS := {
    -c | --check=[ NUMBER ]
            number of random metadata files to check for schema.org compliance
    -d | --dump=[ FILE ]
            dump default member node json file to configure manually
    -h | --help
            display this help message
    -i | --init
            initialize a new member node from scratch
    -l | --load=[ FILE ]
            initialize a new member node from a json file
    -P | --production
            run this script in production mode (uses the D1 cn API in searches)
    -L | --local
            run this script in local mode (will not scrape the remote site for new metadata)
}
```
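
As `cli.py` in this pull request shows, the value given to `--check` is either an integer or the literal string `all`. The sketch below illustrates that rule in isolation; `parse_check_value` is a hypothetical helper name used only for illustration and is not part of `mnonboard`.

```python
def parse_check_value(raw):
    """Hypothetical helper: interpret the argument of -c / --check.

    Returns an int for numeric input, or the string 'all' to request that
    every metadata record be checked. Anything else raises ValueError.
    """
    try:
        return int(raw)
    except ValueError:
        if raw == 'all':
            return raw
        raise ValueError("--check requires an integer or 'all', got %r" % raw)
```

Checking every record can be slow on large nodes, which is presumably why the source comments that `all` should only be used when necessary.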

### Onboarding process

Let's say you are in the `mnlite` base directory.
Start by activating the `mnlite` virtual environment and changing the working directory to `./mnonboard`:

```
workon mnlite
cd mnonboard
```

**Note:** Node data is stored in `instance/nodes/<NODENAME>`
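
In `cli.py`, `<NODENAME>` is taken from the last colon-separated part of the node's `node_id`. A minimal sketch of that mapping follows; `node_dir_from_id` is a hypothetical name, the `urn:node:` prefix in the example is an assumption about the identifier format, and the real `utils.node_path` may resolve the path differently.

```python
import os

# Same relative path as NODE_PATH_REL in mnonboard/__init__.py
NODE_PATH_REL = 'instance/nodes/'


def node_dir_from_id(node_id, base=NODE_PATH_REL):
    """Hypothetical helper: map a node identifier to its data directory."""
    # cli.py uses the part after the final ':' as the directory name
    end_node_subj = node_id.split(':')[-1]
    return os.path.join(base, end_node_subj)


# Example identifier; the 'urn:node:' prefix is assumed for illustration
print(node_dir_from_id('urn:node:HAKAI_IYS'))
```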

#### Using an existing `node.json`

To onboard a member node with an existing `node.json` file:

```
python cli.py -l ../instance/nodes/BONARES/node.json
```

The script will guide you through the steps to set up the node and harvest its metadata.
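
Before loading a hand-edited file, it can help to confirm that the fields `cli.py` reads are present. The sketch below checks only the keys visible in this pull request (`default_owner`, `default_submitter`, `node.node_id`, `node.contact_subject`); `missing_fields` is a hypothetical helper and is not the validation that `info_chx.input_test` performs.

```python
import json

# Keys that cli.py reads from the loaded document
REQUIRED_TOP = ('default_owner', 'default_submitter')
REQUIRED_NODE = ('node_id', 'contact_subject')


def missing_fields(fields):
    """Hypothetical helper: list required keys that are absent or empty."""
    missing = [k for k in REQUIRED_TOP if not fields.get(k)]
    node = fields.get('node') or {}
    missing += ['node.%s' % k for k in REQUIRED_NODE if not node.get(k)]
    return missing


# Illustrative document, not a real node.json
example = json.loads('{"node": {"node_id": "urn:node:EXAMPLE"}, "default_owner": "x"}')
print(missing_fields(example))
```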

#### No existing `node.json`

The script can also ask the user questions to set up the `node.json` file in an assisted manner. To do so, use the `-i` (initialize) flag:

```
python cli.py -i
```

Keep in mind that you should always check the `node.json` file to ensure correct values.

## Other functionality

Coming soon (see [#21](https://github.com/DataONEorg/mnlite/issues/21))
63 changes: 63 additions & 0 deletions mnonboard/__init__.py
@@ -0,0 +1,63 @@
import os
import logging
from datetime import datetime

from opersist.cli import LOG_DATE_FORMAT, LOG_FORMAT
from mnlite.mnode import DEFAULT_NODE_CONFIG

DEFAULT_JSON = DEFAULT_NODE_CONFIG

__version__ = 'v0.0.1'

LOG_FORMAT = "%(asctime)s %(funcName)s:%(levelname)s: %(message)s" # overrides import

FN_DATE = datetime.now().strftime('%Y-%m-%d')
HM_DATE = datetime.now().strftime('%Y-%m-%d-%H%M')
YM_DATE = datetime.now().strftime('%Y-%m')
LOG_DIR = '/var/log/mnlite/'
LOG_NAME = 'mnonboard-%s.log' % (FN_DATE)
LOG_LOC = os.path.join(LOG_DIR, LOG_NAME)

HARVEST_LOG_NAME = '-crawl-%s.log' % YM_DATE

def start_logging():
    """
    Initialize logger.

    :returns: The logger to use
    :rtype: logging.Logger
    """
    logger = logging.getLogger('mnonboard')
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter(fmt=LOG_FORMAT, datefmt=LOG_DATE_FORMAT)
    s = logging.StreamHandler()
    s.setLevel(logging.INFO)
    s.setFormatter(formatter)
    # this initializes logging to file
    f = logging.FileHandler(LOG_LOC)
    f.setLevel(logging.DEBUG)
    f.setFormatter(formatter)
    # warnings also go to file
    # initialize logging
    logger.addHandler(s) # stream
    logger.addHandler(f) # file
    logger.info('----- mnonboard %s start -----' % __version__)
    return logger

L = start_logging()

# absolute path of current file
CUR_PATH_ABS = os.path.dirname(os.path.abspath(__file__))

# relative path from root of mnlite dir to nodes directory
NODE_PATH_REL = 'instance/nodes/'

def default_json(fx='Unspecified'):
    """
    A function that spits out a dict to be used in onboarding.

    :returns: A dict of values to be used in member node creation
    :rtype: dict
    """
    L.info('%s function loading default json template.' % (fx))
    return DEFAULT_JSON
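
`default_json()` returns the module-level `DEFAULT_JSON` object itself, so every caller shares one dict. Whether any caller mutates the result is not visible in this diff; if one does, returning a deep copy would keep the template unchanged. A minimal sketch, with a placeholder template standing in for the real `DEFAULT_NODE_CONFIG` and a hypothetical function name:

```python
import copy

# Placeholder template for illustration only; not the real mnlite default
DEFAULT_JSON = {'node': {'node_id': '', 'state': 'down'}}


def default_json_copy():
    """Hypothetical variant that hands out an independent copy of the template."""
    return copy.deepcopy(DEFAULT_JSON)


fields = default_json_copy()
fields['node']['state'] = 'up'  # changes the copy only
print(DEFAULT_JSON['node']['state'])
```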
155 changes: 155 additions & 0 deletions mnonboard/cli.py
@@ -0,0 +1,155 @@
import os, sys
import getopt
import time

from mnonboard import utils
from mnonboard import info_chx
from mnonboard import data_chx
from mnonboard import cn
from mnonboard.defs import CFG, HELP_TEXT, SO_SRVR, CN_SRVR, CN_SRVR_BASEURL, CN_CERT_LOC, APPROVE_SCRIPT_LOC
from mnonboard import default_json, L

def run(cfg):
    """
    Wrapper around opersist that simplifies the process of onboarding a new
    member node to DataONE.

    :param dict cfg: Dict containing config variables
    """
    # auth
    if not cfg['token']:
        cfg['token'] = os.environ.get('D1_AUTH_TOKEN')
    if not cfg['token']:
        print('Your DataONE auth token is missing. Please enter it here and/or store it in the env variable "D1_AUTH_TOKEN".')
        cfg['token'] = info_chx.req_input('Please enter your DataONE authentication token: ')
        os.environ['D1_AUTH_TOKEN'] = cfg['token']
    cfg['cert_loc'] = CN_CERT_LOC[cfg['mode']]
    DC = cn.init_client(cn_url=cfg['cn_url'], auth_token=cfg['token'])
    if cfg['info'] == 'user':
        # do the full user-driven info gathering process
        ufields = info_chx.user_input()
        fields = info_chx.transfer_info(ufields)
    else:
        # grab the info from a json
        fields = utils.load_json(cfg['json_file'])
        info_chx.input_test(fields)
        # still need to ask the user for some names
    # now we're cooking
    # get the node path using the end of the path in the 'node_id' field
    end_node_subj = fields['node']['node_id'].split(':')[-1]
    loc = utils.node_path(nodedir=end_node_subj)
    # initialize a repository there (step 5)
    utils.init_repo(loc)
    names = {}
    for f in ('default_owner', 'default_submitter', 'contact_subject'):
        # add a subject for owner and submitter (may not be necessary if they exist already)
        # add subject for technical contact (step 6)
        # contact_subject lives under 'node'; the other two are top-level keys
        val = fields[f] if f != 'contact_subject' else fields['node'][f]
        name = utils.get_or_create_subj(loc=loc, value=val, cn_url=cfg['cn_url'], title=f)
        # store this for a few steps later
        names[val] = name
    # set the update schedule and set the state to up
    fields['node']['schedule'] = utils.set_schedule()
    fields['node']['state'] = 'up'
    # okay, now overwrite the default node.json with our new one (step 8)
    utils.save_json(loc=os.path.join(loc, 'node.json'), jf=fields)
    # add node as a subject (step 7)
    utils.get_or_create_subj(loc=loc, value=fields['node']['node_id'],
                             cn_url=cfg['cn_url'],
                             name=end_node_subj)
    # restart the mnlite process to pick up the new node.json (step 9)
    utils.restart_mnlite()
    # run scrapy to harvest metadata (step 10)
    if not cfg['local']:
        utils.harvest_data(loc, end_node_subj)
    # now run tests
    data_chx.test_mdata(loc, num_tests=cfg['check_files'])
    # create xml to upload for validation (step 15)
    files = utils.create_names_xml(loc, node_id=fields['node']['node_id'], names=names)
    # uploading xml (proceed to step 14 and ssh to find xml in ~/d1_xml)
    ssh, work_dir, node_id = utils.start_ssh(server=cfg['cn_url'],
                                             node_id=fields['node']['node_id'],
                                             loc=loc,
                                             ssh=cfg['ssh'])
    time.sleep(0.5)
    utils.upload_xml(ssh=ssh, server=CN_SRVR[cfg['mode']], files=files, node_id=node_id, loc=loc)
    # create and validate the subject in the accounts service (step 16)
    utils.create_subj_in_acct_svc(ssh=ssh, cert=cfg['cert_loc'], files=files, cn=cfg['cn_url'], loc=loc)
    utils.validate_subj_in_acct_svc(ssh=ssh, cert=cfg['cert_loc'], names=names, cn=cfg['cn_url'], loc=loc)
    # download the node capabilities and register the node
    node_filename = utils.dl_node_capabilities(ssh=ssh, baseurl=SO_SRVR[cfg['mode']], node_id=node_id, loc=loc)
    utils.register_node(ssh=ssh, cert=cfg['cert_loc'], node_filename=node_filename, cn=cfg['cn_url'], loc=loc)
    utils.approve_node(ssh=ssh, script_loc=APPROVE_SCRIPT_LOC, loc=loc)
    # close connection
    if ssh:
        ssh.close()

def main():
    """
    Uses getopt to set config values in order to call
    :py:func:`mnlite.mnonboard.cli.run`.

    :returns: Config variable dict to use in :py:func:`mnlite.mnonboard.cli.run`
    :rtype: dict
    """
    # get arguments
    try:
        # note the comma after 'local': without it Python concatenates the
        # adjacent literals into a single 'localdump=' option
        opts = getopt.getopt(sys.argv[1:], 'hiPvLd:l:c:',
            ['help', 'init', 'production', 'verbose', 'local', 'dump=', 'load=', 'check=']
            )[0]
    except Exception as e:
        L.error('Error: %s' % e)
        print(HELP_TEXT)
        exit(1)
    # testing case is the default; set it once so that other options
    # cannot reset the mode after -P has been seen
    CFG['cn_url'] = CN_SRVR_BASEURL % CN_SRVR['testing']
    CFG['mode'] = 'testing'
    for o, a in opts:
        if o in ('-h', '--help'):
            # help
            print(HELP_TEXT)
            exit(0)
        if o in ('-i', '--init'):
            # do data gathering
            CFG['info'] = 'user'
        if o in ('-P', '--production'):
            # production case
            CFG['cn_url'] = CN_SRVR_BASEURL % CN_SRVR['production']
            CFG['mode'] = 'production'
        if o in ('-d', '--dump'):
            # dump default json to file
            utils.save_json(a, default_json())
            exit(0)
        if o in ('-l', '--load'):
            # load from json file
            CFG['info'] = 'json'
            CFG['json_file'] = a
        if o in ('-c', '--check'):
            try:
                CFG['check_files'] = int(a)
            except ValueError:
                if a == 'all': # this should probably not be used unless necessary!
                    CFG['check_files'] = a
                else:
                    L.error('Option -c (--check) requires an integer number of files to check.')
                    print(HELP_TEXT)
                    exit(1)
        if o in ('-L', '--local'):
            CFG['local'] = True
            L.info('Local mode (-L) will not scrape the remote site and will only test local files.')
    L.info('running mnonboard in %s mode.\n\
data gathering from: %s\n\
cn_url: %s\n\
metadata files to check: %s' % (CFG['mode'],
                                CFG['info'],
                                CFG['cn_url'],
                                CFG['check_files']))
    try:
        run(CFG)
    except KeyboardInterrupt:
        print()
        L.error('Caught KeyboardInterrupt, quitting...')
        exit(1)

if __name__ == '__main__':
    main()
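
One way to keep the production/testing choice independent of the order in which options are given is to set the default before the option loop and override it only when `-P` is seen. The sketch below uses the standard `getopt` module with the option names from this file; `parse_mode` is a hypothetical helper for illustration and is not part of `mnonboard`.

```python
import getopt


def parse_mode(argv):
    """Hypothetical helper: return 'production' if -P/--production is present, else 'testing'."""
    opts, _args = getopt.getopt(
        argv, 'hiPLd:l:c:',
        ['help', 'init', 'production', 'local', 'dump=', 'load=', 'check='],
    )
    mode = 'testing'  # default chosen once, before looking at any option
    for o, _a in opts:
        if o in ('-P', '--production'):
            mode = 'production'
    return mode
```

With this structure, `parse_mode(['-P', '-L'])` and `parse_mode(['-L', '-P'])` give the same answer, because no later option can reset the mode.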