Use paths in codebase #42
* The variable `environment` is not used when fetching sdists Signed-off-by: Jono Yang <jyang@nexb.com>
Ensure that site-package dir exists. Other minor adjustments from a scancode-toolkit release Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
These were buggy in some corner cases. They have been updated such that:
* --latest-version works.
* we can reliably fetch combinations of wheels and sdists for multiple OS combos at once
* we now support macOS universal wheels (for ARM CPUs)

Caching is now simpler: we have essentially a single file-based cache under .cache. PyPI indexes are fetched and not cached, unless the new --use-cached-index option is used, which can be useful when fetching many thirdparty packages in a short timeframe.

The first PyPI repository in a list has precedence and we never fetch from other repositories if we find wheels and sdists there. This avoids pounding too much on the self-hosted repo.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
This is much faster Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
I'm running into errors when using the `VirtualCodebase`. I am passing two JSON scans into `VirtualCodebase` to test the functionality where we can create a `VirtualCodebase` from multiple scans. I am able to create the `VirtualCodebase`, but using the `walk()` method causes an exception to be raised:
Traceback (most recent call last):
  File "/home/jono/nexb/src/commoncode/src/commoncode/resource.py", line 1228, in children
    return sorted(children, key=_sorter)
  File "/home/jono/nexb/src/commoncode/src/commoncode/resource.py", line 1220, in <lambda>
    _sorter = lambda r: (r.has_children(), r.name.lower(), r.name)
AttributeError: 'NoneType' object has no attribute 'has_children'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jono/nexb/src/commoncode/src/commoncode/resource.py", line 867, in walk
    for res in root.walk(self, topdown=topdown, ignored=ignored):
  File "/home/jono/nexb/src/commoncode/src/commoncode/resource.py", line 1196, in walk
    ignored=ignored,
  File "/home/jono/nexb/src/commoncode/src/commoncode/resource.py", line 1187, in walk
    for child in self.children(codebase):
  File "/home/jono/nexb/src/commoncode/src/commoncode/resource.py", line 1230, in children
    raise Exception(f'Cannot sort children: {children!r}:\n{children_paths!r}') from e
Exception: Cannot sort children: [None, None]:
['codebase/package', 'codebase/django-audit-tools-0.4.0']
I took a look at the paths of the `VirtualCodebase` I created by using the `resources_by_path` attribute, and I saw that the paths of the root Resources of the two scans I used do not start with `virtual_root`, whereas the paths of all the other Resources in the `VirtualCodebase` do. I think the exception occurs because the root Resources do not have `virtual_root` in their paths, so their children cannot be found because of the difference in path prefix.
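To picture the hypothesis above, here is a minimal sketch, not commoncode's actual code. It assumes a plain dict as a stand-in for `resources_by_path`, strings as stand-ins for Resource objects, and a hypothetical `lookup_children` helper. The root is keyed without the `virtual_root` prefix while its descendants are keyed with it, so joining the root path with a child name misses the index.

```python
# Hypothetical, simplified stand-in for a {path: Resource} index.
# Strings stand in for Resource objects; this is not commoncode's code.
resources_by_path = {
    # The scan root is stored WITHOUT the shared prefix...
    "codebase": "root",
    # ...while its descendants are stored WITH it.
    "virtual_root/codebase/package": "package dir",
    "virtual_root/codebase/django-audit-tools-0.4.0": "django dir",
}


def lookup_children(parent_path, child_names, index):
    """Resolve children by joining the parent's path with each child name."""
    return [index.get(f"{parent_path}/{name}") for name in child_names]


children = lookup_children(
    "codebase",
    ["package", "django-audit-tools-0.4.0"],
    resources_by_path,
)

# A sort key that dereferences each child then fails on the None entries.
try:
    sorted(children, key=lambda r: r.lower())
    sort_error = None
except AttributeError as e:
    sort_error = e
```

Every lookup misses, so the list of children contains only `None` values and the sort key fails on the first one, which matches the shape of the traceback.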
Signed-off-by: Jono Yang <jyang@nexb.com>
These are no longer needed. Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
- When you create a VirtualCodebase with multiple scans, we now prefix each scan path with a codebase-1/, codebase-2/, etc. directory in addition to the "virtual_root" shared root directory. Otherwise, file data was overwritten and inconsistent when the "files" of each location shared leading path segments.
- When you create a VirtualCodebase with more than one Resource, we now recreate the directory tree for any intermediary directory used in a path that is otherwise missing from the files path list. In particular, this behaviour changed when you create a VirtualCodebase from a previous Codebase created with a "full_root" argument. Previously, the missing paths of a "full_root" Codebase were kept unchanged. Note that the VirtualCodebase has always ignored the "full_root" argument.
- The Resource has no rid (resource id) and no pid (parent id). Instead, we now use internally a simpler mapping of {path: Resource} objects.
- The Codebase and VirtualCodebase are now iterable. Iterating on a codebase is the same as a top-down walk.
- The Resource.path now never contains a leading or trailing slash. We also normalize the path everywhere. In particular, this behaviour is visible when you create a Codebase with a "full_root" argument. Previously, the paths of a "full_root" Codebase were prefixed with a slash "/".

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
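The first, second and last points above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `normalize_path` and `build_index` are hypothetical helpers, the inputs are plain lists of file paths, and the `virtual_root/codebase-N/...` layout is one reading of the commit message.

```python
import posixpath


def normalize_path(path):
    """Return a POSIX path with no leading or trailing slash."""
    path = path.replace("\\", "/").strip("/")
    return posixpath.normpath(path) if path else ""


def build_index(scans, root="virtual_root"):
    """
    Build a {path: kind} mapping from several lists of scanned FILE paths.

    With more than one scan, each scan gets its own codebase-N directory
    under the shared root so that equal leading segments cannot collide.
    """
    index = {root: "directory"}
    multiple = len(scans) > 1
    for number, paths in enumerate(scans, start=1):
        prefix = f"{root}/codebase-{number}" if multiple else root
        for raw in paths:
            full = f"{prefix}/{normalize_path(raw)}"
            index[full] = "file"
            # Recreate every ancestor directory missing from the listing.
            parent = posixpath.dirname(full)
            while parent and parent not in index:
                index[parent] = "directory"
                parent = posixpath.dirname(parent)
    return index


# Two scans that both start with a "codebase" directory.
index = build_index([["/codebase/setup.py"], ["codebase/src/app.py"]])
for path in sorted(index):
    print(index[path], path)
```

Both scans keep their own `codebase` directory because each sits under a distinct `codebase-N` parent, and `codebase/src` appears in the index even though only `codebase/src/app.py` was listed.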
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
@JonoYang the latest push fixes this, along with several other issues. Thanks!
This was removed in a previous commit. Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Looks great! Just a few nits.
  memory. Beyond this number, Resource are saved on disk instead. -1 means
  no memory is used and 0 means unlimited memory is used.

- `max_depth` is the maximum depth of subdirectories to descend below and
+ ``max_depth`` is the maximum depth of subdirectories to descend below and
  including `location`.
  """
We could add a line about `paths` in the docstring.
I think it looks okay. I like that you set the iter dunder on Codebase to yield its resources.
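For reference, the iteration behaviour mentioned here can be pictured with a small sketch. `TinyCodebase` is a hypothetical class that stores a directory tree as a dict of child names; it only shows `__iter__` delegating to a top-down walk and makes no claim about how commoncode implements it.

```python
class TinyCodebase:
    """Hypothetical codebase: maps each directory path to its child names."""

    def __init__(self, root, tree):
        self.root = root
        self.tree = tree

    def walk(self, topdown=True):
        """Yield paths; parents come before children when topdown is True."""
        yield from self._walk(self.root, topdown)

    def _walk(self, path, topdown):
        if topdown:
            yield path
        for name in sorted(self.tree.get(path, [])):
            yield from self._walk(f"{path}/{name}", topdown)
        if not topdown:
            yield path

    def __iter__(self):
        # Iterating is defined as a top-down walk.
        return self.walk(topdown=True)


codebase = TinyCodebase(
    root="codebase",
    tree={"codebase": ["src", "README"], "codebase/src": ["app.py"]},
)
paths = list(codebase)
```

With this in place, `for resource in codebase:` and `for resource in codebase.walk(topdown=True):` visit the same items in the same order.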
I feel that the solution we have for creating a codebase from multiple scans is a bit kludgy, but reasonable. Treating each scan as an individual codebase is easier than attempting to always merge scan directory structure/data together. I had not realized the implication of getting scancode to merge two directories named `codebase` from two different scans.
On a whim, I've updated commoncode in scancode and all the references to `rid` in `scancode/cli.py` will need to be updated.
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
The Codebase and VirtualCodebase no longer have "full_root" and "strip_root" constructor arguments and attributes. These can still be passed but they will be ignored. These were needed only for path output, and this is now where these arguments and code live.

- Resource.path is now always the plain path where the first segment is the last segment of the root location, e.g. the root filename.
- The Resource now has new "full_root_path" and "strip_root_path" properties that return the corresponding paths.
- The Resource.to_dict and the new Codebase.to_list both have new "full_root" and "strip_root" arguments.
- The Resource.get_path() method accepts "full_root" and "strip_root" arguments.
- The Resource.create_child() method has been removed.
- The "Codebase.original_location" attribute has been removed. No known users of commoncode used this.

Also format code and organize imports.

Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
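As a rough illustration of moving "full_root" and "strip_root" to output time, here is a sketch with a standalone, hypothetical `get_path` function. It is not the `Resource.get_path()` method itself: the `root_location` parameter, the absence of a leading slash in the full-root form, the precedence of `full_root` over `strip_root`, and the empty string returned for the root when stripping are all assumptions made for the example.

```python
import posixpath


def get_path(path, root_location, full_root=False, strip_root=False):
    """
    Return an output variant of a plain resource path.

    `path` is the plain path whose first segment is the last segment of
    `root_location`. In this sketch full_root wins if both flags are set.
    """
    if full_root:
        # Prepend the directories that lead to the root location.
        parent = posixpath.dirname(root_location.rstrip("/"))
        return posixpath.join(parent, path).strip("/")
    if strip_root:
        # Drop the first segment; the root itself becomes an empty string.
        _, _, rest = path.partition("/")
        return rest
    return path


plain = get_path("codebase/src/app.py", "/home/user/codebase")
full = get_path("codebase/src/app.py", "/home/user/codebase", full_root=True)
stripped = get_path("codebase/src/app.py", "/home/user/codebase", strip_root=True)
```

The stored path stays the same in all three cases; only the rendering changes, which is why the constructor arguments are no longer needed.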
Signed-off-by: Philippe Ombredanne <pombredanne@nexb.com>
Signed-off-by: Jono Yang <jyang@nexb.com>
This PR merges the latest skeleton and implements new internals for resource Codebase and Resource.
The key change is dropping numeric resource ids in favor of a simpler map of path->Resource.
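A minimal sketch of the idea, with hypothetical `Res` and `PathIndex` names that are not part of commoncode: once resources are keyed by path, the parent and the direct children can be derived from the path alone, with no rid or pid to maintain.

```python
import posixpath
from dataclasses import dataclass


@dataclass
class Res:
    """Hypothetical resource: identified by its path only."""
    path: str
    is_file: bool = True


class PathIndex:
    """Hypothetical {path: Res} index standing in for the codebase internals."""

    def __init__(self):
        self.by_path = {}

    def add(self, res):
        self.by_path[res.path] = res

    def parent(self, res):
        """The parent is the resource stored at the directory name of the path."""
        parent_path = posixpath.dirname(res.path)
        return self.by_path.get(parent_path) if parent_path else None

    def children(self, res):
        """Direct children are paths exactly one segment below this one."""
        prefix = res.path + "/"
        return [
            self.by_path[p]
            for p in sorted(self.by_path)
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        ]


idx = PathIndex()
root = Res("codebase", is_file=False)
src = Res("codebase/src", is_file=False)
app = Res("codebase/src/app.py")
readme = Res("codebase/README")
for r in (root, src, app, readme):
    idx.add(r)
```

The linear scan in `children` keeps the sketch short; a real implementation would likely keep a faster structure for child lookups.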
The not-yet-implemented part is focusing a codebase on the subset given by the new `Codebase(paths)` argument. This is still a work in progress because of this. But early feedback is welcomed and needed.