Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scanner changes #1

Closed
wants to merge 4 commits into from
Closed

Scanner changes #1

wants to merge 4 commits into from

Conversation

nitin-deamon
Copy link
Owner

There are 2 commits in this PR.

  1. Commit should go to azcopy repo as it export the different APIs.
  2. Commit where a sample scanner.go is there which utilize these exported APIs to give scanner functionality. We are utilizing azcopy creation of crawler code which is linked to traverser interface depending on type of source directory.

Need to try with /dev/NULL as destination if it works we can provide dummy destination to /dev/NULL.
Right now i tried with giving src-> local filesystem, dest-> blob and vice-versa.

- Changes here to export function from cmd package
- Changes to use ste/concurrency to load environment variable.
- This file has build in function which init enumerator and cook
  the structure required for enumerator.
- To scan call function Scanner(src, dest) with source and destination
  folder. Destination here can be dummy like blob where you want to copy
  the content. As of now azcopy use enum and enumerator require
  destination folder. So we need to provide dummy destination folder.
This is a sample code , for test purpose created this file under new
folder scanner.
Otherwise it will be part of go routine which call scanner.
- This file has build in function which init enumerator and cook
  the structure required for enumerator.
- To scan call function Scanner(src, dest) with source and destination
  folder. Destination here can be dummy like blob where you want to copy
  the content. As of now azcopy use enum and enumerator require
  destination folder. So we need to provide dummy destination folder.
@shalinijoshi19
Copy link
Collaborator

@nitin-deamon could you also include @johnmic in these PRs so he can build some context etc? Thanks!

@nitin-deamon
Copy link
Owner Author

@nitin-deamon could you also include @johnmic in these PRs so he can build some context etc? Thanks!

done @shalinijoshi19

@nitin-deamon nitin-deamon requested a review from johnmic August 4, 2021 04:29
@zezha-msft
Copy link
Collaborator

Could you please add mohsha-msft nakulkar-msft adreed-msft as reviewers and to this repo? Thanks

@nitin-deamon
Copy link
Owner Author

Could you please add mohsha-msft nakulkar-msft adreed-msft as reviewers and to this repo? Thanks

Hi @zezha-msft I added them to repo. Once they accept the invitation , i will added them as reviewer.

- Variable case change required for export some functionality,
  cause testcase broken.
- This patch fix broken testcases.
Copy link
Collaborator

@zezha-msft zezha-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To set the scope for the refactoring, we should clarify the intended usage. There are a couple of choices:

  1. Put in raw inputs with rawCopyCmdArgs and invoke the whole execution chain, this is essentially running AzCopy in the same process. Doing so means using AzCopy's schema for saving job plan files and job tracking. This is probably not what you want since the lifecycle manager will call os.exit when the job finishes.
  2. Leverage the existing enumerator to traverse, filter, and process stored objects in a customized way that you'd like. Doing so requires all the constructs to be exported. Several of them are missing right now.
  3. Only leverage certain traverser, filter to build your own enumerator or different concept entirely.

@@ -46,7 +46,7 @@ import (
// represent a local or remote resource object (ex: local file, blob, etc.)
// we can add more properties if needed, as this is easily extensible
// ** DO NOT instantiate directly, always use newStoredObject ** (to make sure its fully populated and any preprocessor method runs)
type storedObject struct {
type StoredObject struct {
name string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Were these properties missed?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Ze, Till now sample code not using any of its properties. May be as i add more functionality i need those properties. Then i will export them. Or i can write function in cmd which work on these properties.

@@ -0,0 +1,361 @@
// Copyright © 2017 Microsoft <nitinsingla@microsoft.com>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: please remove the date 2017, for new files we don't need to put a date.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure got it Ze, Thanks

package scanner

import (
"fmt"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please set up go fmt to run on file save in your IDE.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure Ze, i modified the file using vi, let me start using VSC and add go fmt to run on file save.

return nil
}

return cmd.NewCopyEnumerator(traverser, filters, processor, finalizer), nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not use the existing initEnumerator function?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for not using initEnumerator function is we want Processing function and Finalizer Function should work according to our requirement. That diverge lead me to use initResourceEnumerator and create our own initEnumerator function.

@@ -26,73 +26,73 @@ type BucketToContainerNameResolver interface {
ResolveName(bucketName string) (string, error)
}

func (cca *cookedCopyCmdArgs) initEnumerator(jobPartOrder common.CopyJobPartOrderRequest, ctx context.Context) (*copyEnumerator, error) {
var traverser resourceTraverser
func (cca *CookedCopyCmdArgs) initEnumerator(jobPartOrder common.CopyJobPartOrderRequest, ctx context.Context) (*CopyEnumerator, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be useful for you too I think

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes Ze, as i mentioned right now thought process is to create our own enumerator. So that it will not effect azcopy code base. I am trying not to change azcopy functionality as of now.

@@ -587,14 +587,14 @@ type preFilterProvider interface {
type syncEnumerator struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these enumerators should be useful too.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sync Enumerator may be not , we are trying to scan source only on the basis of Last modified time. As per initial thought we don't want to scan destination, try to keep some meta-data on premise. Which help in finding delta.

@shalinijoshi19
Copy link
Collaborator

There are 2 commits in this PR.

  1. Commit should go to azcopy repo as it export the different APIs.
  2. Commit where a sample scanner.go is there which utilize these exported APIs to give scanner functionality. We are utilizing azcopy creation of crawler code which is linked to traverser interface depending on type of source directory.

Need to try with /dev/NULL as destination if it works we can provide dummy destination to /dev/NULL.
Right now i tried with giving src-> local filesystem, dest-> blob and vice-versa.

@nitin-deamon do we have an updated set of commits at this point already? My assumption is we were going to split this so that #1 could be a new PR in the azcopy repo and #2 would be merged into our msazure instance?

Copy link
Collaborator

@shalinijoshi19 shalinijoshi19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @nitin-deamon when do you think we'd be able to close on this one? Are there more changes for us to look at since?

@nitin-deamon
Copy link
Owner Author

Hey @nitin-deamon when do you think we'd be able to close on this one? Are there more changes for us to look at since?

@shalinijoshi19 Already created the branch in azcopy repo, and cherry pick the changes to it. I need to push changes to our Repo.

nitin-deamon pushed a commit that referenced this pull request Oct 4, 2021
@linuxsmiths linuxsmiths closed this Mar 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants