filesystem testing functions init #146
Conversation
yup, looks good. Make sure you have the no-commit-to-main pre-commit hook set in the repo just in case. OK, so now we would like to start writing tests for the methods, but also think about what we want the API to be able to do beyond what it was doing before (which was just getting data from yml and/or json files). We probably want to keep that functionality, but add cif reading, for example. Also, what will be handed back? I guess powder cif objects? This is what we would want tests for, I guess.
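To make the "keep yml/json, add cif reading" idea concrete, here is a minimal sketch of a loader that dispatches on file extension. `PowderCif` and `load_document` are hypothetical stand-ins for illustration, not the project's actual API:

```python
import json
from pathlib import Path

class PowderCif:
    """Toy stand-in for whatever object a parsed cif would hand back."""
    def __init__(self, text):
        self.text = text

def load_document(path):
    """Dispatch on extension: json documents stay dicts, cifs become objects."""
    path = Path(path)
    if path.suffix == ".json":
        return json.loads(path.read_text())
    if path.suffix == ".cif":
        # a real implementation would parse the cif here
        return PowderCif(path.read_text())
    raise ValueError(f"unsupported file type: {path.suffix}")
```

Tests could then assert on the returned type per extension, which pins down the "what gets handed back" question.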
Hello Professor,
I have a quick question about the database object. In regolith we used it to store data in a defaultdict(lambda: defaultdict(dict)) type, but since we're working only with cif files now I figured this nested structure is not necessary. Should we still stick to this datatype, or is this something we can change and test?
Also, what is chained_db used for? Is it used to chain multiple databases? I was going through the all_documents() function but still couldn't figure it out.
please see inline comments. When these are fixed I will merge this. We will work on each test individually on different branches.
tests/test_fsclient.py
Outdated
pass

def test_close():
How about this. I think it more explicitly tests that close is working because it ensures that something is open before it closes it. I think it is probably better to pass in rc explicitly too for greater readability (I should have done that with open, I guess):
def test_close():
    fsc = FileSystemClient(rc)
    assert fsc.open
    assert fsc.dbs == rc.databases
    fsc.close()
    assert fsc.dbs is None
    assert fsc.closed
I couldn't reference rc since I had to make a new class. For now, I just checked whether fsc.dbs is an instance of nested dicts.
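The adaptation described above (no rc available, so assert on structure instead of equality) could look like the following sketch; the `FileSystemClient` here is a toy stand-in, not the real class:

```python
class FileSystemClient:
    """Toy stand-in that mimics the open/close behavior under discussion."""
    def __init__(self, rc=None):
        self.open = True
        self.closed = False
        # nested-dict store standing in for loaded databases
        self.dbs = {"db1": {"coll1": {"id1": {"name": "doc"}}}}

    def close(self):
        self.dbs = None
        self.open = False
        self.closed = True

def test_close():
    fsc = FileSystemClient()
    assert fsc.open
    # without rc to compare against, assert on the nested-dict shape instead
    assert isinstance(fsc.dbs, dict)
    assert all(isinstance(v, dict) for v in fsc.dbs.values())
    fsc.close()
    assert fsc.dbs is None
    assert fsc.closed

test_close()
```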
tests/test_fsclient.py
Outdated
@pytest.mark.skip("Not written")
def test_update_one():
    pass
please make sure all your files have a trailing EOL at the end. Set up your PyCharm (or whatever) to do it automatically.
tests/test_fsclient.py
Outdated
def test_close():
    fsc = FileSystemClient(rc)
    fsc.close()
I think close this up? Not sure what pep8 has to say about this, but it keeps the tests grouped visually a bit better. Do what pep8 says, but close up if it has nothing to say.
Great questions. So now that it starts, let's have the conversation this way: what do we want the methods to take as inputs, and then return as outputs? I suggest we do the following. Now that this has a skeleton test rig and is passing tests, I will merge it. Then let's create a branch for each function we are working on. Let's say we start with … Under the carpet we have lots of questions. Our basic data are cif files, so how do we filter them? Do we want to store everything in json? What happens if someone adds a new cif file? We could have a flow like this:
If 1) fails, assume there is a new cif file. Parse the cif file into json and add it to the collection. Of course, there is quite a bit going on there, but this flow may be a good one. We could also check if cifs have been updated. If we obtain a SHA hash from our cif file and store it on disc, we could compare the SHA in the collection associated with each file against the SHA of each file. This will make the … Does this make sense?
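The SHA-based update check described above could be sketched like this; the collection layout (a dict keyed by file path, each entry carrying a stored `"sha"`) is an assumption for illustration:

```python
import hashlib

def file_sha(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def needs_reparse(path, collection):
    """True if the cif file is new or has changed since it was last parsed."""
    entry = collection.get(path)
    return entry is None or entry.get("sha") != file_sha(path)
```

On each run, files for which `needs_reparse` is True get parsed into json and their stored SHA refreshed; everything else is served from the existing collection.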
please see my comments. We want to minimize our use of cif files, even
when we are using the filesystem, because "find_one" would imply loading
all the cifs into memory, filtering them, finding one thing and returning
it.....v slow! So for sure we will keep some kind of collection like thing
(list of dicts) that stores some info about the cifs. Can you suggest a
design for how to handle this? I could see it that we json serialize
everything in the cifs, or that we save a summary of things from them in
json that we want to search over and when we match a filter we actually
load the cif file from disc and parse it. Not sure which is best tbh, but
why don't you think about it? We don't want to develop heavily on the fs
backend, so doing something simple but relatively robust is probably the
best option so everything works, but then put our focus on the mpcontribs
client.... With this in mind, I suggest saving a summary in the json
collection.
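The summary-in-json design suggested above might look like the following sketch: filter over lightweight summaries, and only go back to disc to parse a cif after a match. The summary fields (`formula`, `temperature`) are hypothetical examples of "things we want to search over":

```python
def build_summary(cif_path, formula, temperature):
    # store only what we want to search over, plus the path back to the file
    return {"path": cif_path, "formula": formula, "temperature": temperature}

def find_matches(summaries, filt):
    """Filter the lightweight summaries; no cif is loaded here."""
    return [s for s in summaries if all(s.get(k) == v for k, v in filt.items())]

summaries = [
    build_summary("ni.cif", "Ni", 300),
    build_summary("tio2.cif", "TiO2", 300),
]
hits = find_matches(summaries, {"formula": "Ni"})
# only now would we load and parse hits[0]["path"] from disc
```

This keeps `find_one` cheap (a scan over small dicts) while the expensive cif parse happens at most once per returned result.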
btw, the way the collections work in regolith, they are a dict of the format
`{'<id1>': {<document1_contents>}, '<id2>': {<document2_contents>}}` but
after loading it turns into `[{'_id': <id1>, <document1_contents>}, {'_id':
<id2>, <document2_contents>}]`, which is easier to iterate over, I guess. I
am not sure if we want to do the same or not.
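That load step (dict keyed by id flattened to a list of documents carrying their `_id`) is a one-liner; a sketch of the transformation being described:

```python
def load_collection(coll):
    """Flatten {'<id>': {<contents>}} into [{'_id': <id>, <contents>}, ...]."""
    return [{"_id": k, **v} for k, v in coll.items()]

on_disc = {"id1": {"name": "a"}, "id2": {"name": "b"}}
loaded = load_collection(on_disc)
# loaded == [{'_id': 'id1', 'name': 'a'}, {'_id': 'id2', 'name': 'b'}]
```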
btw2 the `chained_db` builds collections across dbs, so the `people`
collection may be in `bg_group` and in `bg_public` and after chaining it
becomes just one big combined collection. We don't need this complication
here so let's not use that.
…On Wed, Mar 1, 2023 at 4:03 AM Robin Lee ***@***.***> wrote:
Hello Professor,
I have a quick question about the database object. In regolith we used it
to store data in defaultdict(lambda: defaultdict(dict)) type but since
we're working only with cif files now I'd figured this nested structure is
not necessary. Should we still stick to this datatype or is this something
we can change and test?
Also, what is chained_db used for? Is it used to chain multiple databases?
I was going through the all_documents() function but still couldn't
figure out.
--
Simon Billinge
Professor, Columbia University
OK, I thought some more. Here is a concrete plan. Maybe move this to an issue so we don't lose it when this PR closes.
We need to bite the bullet and build the runcontrol infrastructure too. To keep things simple, make runcontrol a dict and pass in what we need. Use the schema in regolith (databases and so on). Later we may want to import runcontrol from regolith and reuse all the nice things, but that will add a lot of bloat to the dependencies (gooey etc. that we won't be using). Then we can merge that branch. Then we want to refactor the front end to use the client, so replace the … and so on.
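The "make runcontrol a dict and pass in what we need" plan could be sketched as below; the key names echo regolith's rc (`databases` etc.), but this minimal schema and `make_rc` helper are hypothetical stand-ins, not regolith's actual runcontrol:

```python
# minimal stand-in schema: required key -> expected type
RC_SCHEMA = {"databases": list, "backend": str}

def make_rc(**kwargs):
    """Build a runcontrol dict, validating required keys against the schema."""
    rc = {}
    for key, typ in RC_SCHEMA.items():
        if key not in kwargs:
            raise KeyError(f"runcontrol missing required key: {key}")
        if not isinstance(kwargs[key], typ):
            raise TypeError(f"{key} must be {typ.__name__}")
        rc[key] = kwargs[key]
    return rc

rc = make_rc(databases=[{"name": "local", "url": "."}], backend="filesystem")
```

Swapping this for an imported regolith runcontrol later would then mostly be a change in how `rc` is constructed, not in the client code that reads it.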
Yep, makes sense! I'll ask more questions if they come up while working on the new branch!
Sorry for the delay, but I was having a hard time reconfiguring my git workflow since my merges got entangled. I hope this branch gets the workflow back on track.