Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow listing of directory contents #39

Merged
merged 16 commits into from
May 17, 2022

Conversation

Ruby184
Copy link

@Ruby184 Ruby184 commented Apr 10, 2022

Proposed changes

I propose to add listing of directory contents for drivers. To allow flexibility, we should implement it using async iterators which are natively supported from Node.js 10.x. But also to allow easy manipulation we should wrap async iterables and add some nice API for transforming the listing output and maybe allow extensions.

DirectoryListing class

To wrap listing and allow transformations of the output we are using the class DirectoryListing which:

  • implements AsyncIterable so it can be used directly using for-await-of loop
  • allows to pipe listing through series of transformation functions which takes current async iterable and returns one with transformed output
  • it is implementing some shortcut functions which are similar to array ones: filter and map
  • also it allows to use toArray() function to convert async iterable to array if we do not want to use for-await-of.
  • it is macroable and constructor is exposed on DriveManager instance as DirectoryListing to be extended from outside

DriveListItem interface

It is a type of items returned from driver listing. Currently it is specified as follows:

export interface DriveListItem<T = any> {
  /**
   * Location of list item on disk which can be used in driver methods
   */
  location: string

  /**
   * Flag to know if item represents file or directory
   */
  isFile: boolean

  /**
   * Original list item returned from underlying driver
   */
  original: T
}

Drivers can extend the interface and provide type for original or add additional properties:

export interface LocalDriveListItem extends DriveListItem<fsExtra.Dirent> {}

To prevent repeation I have introduced optional type parameter to DriverContract (it defaults to DriveListItem if not provided):

/**
 * Shape of the generic driver
 */
export interface DriverContract<T extends DriveListItem = DriveListItem> {
  /**
   * Return a listing directory iterator for given location.
   */
  list(location: string): DirectoryListingContract<this, T>
}

export interface LocalDriverContract extends DriverContract<LocalDriveListItem> {
  //
}

Driver list implementation

Disk driver needs to implement list(location: string): DirectoryListingContract<this, DriveListItem> function which returns instance of DirectoryListing class. It accepts to constructor instance of disk driver (this) and async generator function which yields entries in given location. Because generators cannot be arrow functions we use driver instance (this) so we can bind this inside generator function to it.
See example of implementation of LocalDriver (note that we are using this.adapter inside async generator):

public list(location: string): DirectoryListingContract<this, LocalDriveListItem> {
  return new DirectoryListing(this, async function* () {
    try {
      const dir = await this.adapter.opendir(this.makePath(location))

      for await (const dirent of dir) {
        yield {
          location: `${location}/${dirent.name}`.replace(/^\/+|\/+$/g, ''),
          isFile: dirent.isFile(),
          original: dirent,
        }
      }
    } catch (error) {
      throw CannotListDirectoryException.invoke(location, error)
    }
  })
}

Usage

Example usage of driver list function inside user code:

// listing files using for-await-of loop
for await (const item of driver.list('some/dir')) {
  console.log(item)
}

// get list as an array
const items = await driver.list('some/dir').toArray()

// filter only files and exclude directories
// (we can always call toArray to return result as array not as iterable)
for await (const item of driver.list('some/dir').filter((i) => i.isFile)) {
  console.log(item)
}

// get stats of every entry in directory
for await (const item of driver.list('some/dir').map((i, _, dr) => dr.getStats(i.location))) {
  console.log(item)
}

// pipe iterable to transformation function
// (filter and map are using pipe to add function to chain)
// here we are using it to do recursive listing of directory
async function* recursiveList(source) {
  for await (const item of source) {
    yield item

    if (!item.isFile) {
      yield* this.list(item.location).pipe(recursiveList)
    }
  }
}

for await (const item of driver.list('some/dir').pipe(recursiveList)) {
  console.log(item)
}

PathPrefixer

To solve problems with current implementation and centralize path manipulation we are introducing new class PathPrefixer which:

  • allows prefixing location with root path (or prefix)
  • always uses / as separator
  • normalizes location paths
    • absolute paths are no longer used as-is without prefixing them with disk root - instead they are treated as paths from the root - for example with root path /home/user using /some/directory/ as location has the same behaviour as some/directory and both resolves to path /home/user/some/directory not /some/directory/ in first case as before)
    • prevent path traversal beyond the root - throw Exception when passing location path like ../some/directory. Note that some/../directory is OK as it resolves to directory
  • is used to generate location paths for list items - even for cloud drivers just using empty '' prefix
  • allows us to introduce prefix config option for cloud providers to create disk where objects are always transparently prefixed with given prefix

Types of changes

What types of changes does your code introduce?

Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

BREAKING CHANGE: Other extending drivers implementing DriverContract now need to implement new
method list(location: string): DirectoryListingContract<this, DriveListItem>

BREAKING CHANGE: LocalDriver is now treating absolute paths differently and are always prefixed with disk root path to prevent accessing files beyond the disk root

Checklist

  • I have read the CONTRIBUTING doc
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works.
  • TODO: I have added necessary documentation (if appropriate)

TODO

  • Solve problems with absolute paths and relative paths using .. (add path prefixer to resolve paths)
  • Make DirectoryListing Macroable so it can be extended by users and packages
  • Consider adding recursive transformer to DirectoryListing
  • Write tests

Further comments

More description of further changes needed to related packages.

Implementaion of list for s3 and gcs

After we agree on the proposal of the API, merge PR, and release a prerelease version of @adonisjs/drive and @adonisjs/core, I will open a PRs for @adonisjs/drive-s3 and @adonisjs/drive-gcs for implementing list from changed DriverContract interface.
These cloud drivers are similar in concept of storing objects and not having traditional directories. But directories could be handled with prefixes of object keys and delimiter. Here is an example of implementation of list for s3 driver (not tested):

import { DirectoryListing, CannotListDirectoryException } from '@adonisjs/core/build/standalone'
import { ListObjectsV2Command, ListObjectsV2CommandInput } from '@aws-sdk/client-s3'
import type { DirectoryListingContract, DriveListItem } from '@ioc:Adonis/Core/Drive'

export class S3Driver implements S3DriverContract {
  //...
  public list(location: string): DirectoryListingContract<this, DriveListItem> {
    return new DirectoryListing(this, async function* () {
      const input: ListObjectsV2CommandInput = {
        Bucket: this.config.bucket,
        Prefix: location.replace(/^\/+|\/+$/g, '') + '/',
        Delimiter: '/'
      }

      while (true) {
        try {
          const response = await this.adapter.send(new ListObjectsV2Command(input))

          for (const file of response.Contents) {
            yield { isFile: true, location: file.Key }
          }
    
          for (const dir of response.CommonPrefixes) {
            yield { isFile: false, location: dir.Prefix }
          }
    
          if (!response.IsTruncated) {
            break
          }
    
          input.ContinuationToken = response.NextContinuationToken
        } catch (error) {
          throw CannotListDirectoryException.invoke(location, error)
        }
      }
    })
  }
}

It is still WIP, just first proof of concept implemented and tests are missing.

BREAKING CHANGE: Other extending drivers implementing DriverContract now need to implement new
method list(location: string): DirectoryListingContract<this>
@thetutlage
Copy link
Member

Hey @Ruby184 . Great proposal

Can you also investigate and add the listing support to s3 and gcs?

I would want all the core drivers to have feature parity

@Ruby184
Copy link
Author

Ruby184 commented Apr 11, 2022

Thanks @thetutlage, sure I will add the support also to these drivers. I have added the section according to implementation at the end of the proposal (Implementaion of list for s3 and gcs). But we need to agree first on the shape of the API to be stable, and then we can extend existing drivers.

… for drivers

To solve problems with current implementation and centralize path manipulation we are introducing
new class `PathPrefixer` which also changes behaviour for resolving paths.

BREAKING CHANGE: Absolute paths are treated differently and are always prefixed with disk root path
to prevent accessing files beyond the disk root
@thetutlage
Copy link
Member

thetutlage commented Apr 15, 2022

Looks good at the surface. I might share small modification once I try it in real.

One thing, what's the benefit/use case of having original property?

@Ruby184
Copy link
Author

Ruby184 commented Apr 15, 2022

I have added original so maybe extensions of DirectoryListing could use it for getting information that are returned from the underlying driver which are not available on DriveListItem and do not have to make another request, maybe for s3 where the metadata are returned for given objects. But we can drop it and every driver will just add the properties to extended DriveListItem.

@Ruby184 Ruby184 marked this pull request as ready for review April 18, 2022 11:00
@Ruby184 Ruby184 changed the title WIP: Allow listing of directory contents Allow listing of directory contents Apr 18, 2022
@thetutlage thetutlage self-requested a review May 5, 2022 07:10
@thetutlage thetutlage added the Type: Feature Request Request to add a new feature to the package label May 5, 2022
* Shape of the directory listing constructor, we export the constructor for others to add macros and getters to the class.
*/
export interface DirectoryListingConstructorContract
extends MacroableConstructorContract<DirectoryListingContract<DriverContract, DriveListItem>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want directory listing class to be macroable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a good idea for good DX to allow chainable one line calls which are readable. For example, maybe somebody wants to list files matching a glob pattern, so he can add a nice chainable method and use it like this:

// get json files in directory
const files = await driver.list('some/dir').glob('*.json').toArray()

Comment on lines +87 to +107
filter(
predicate: (item: T, index: number, driver: Driver) => Promise<boolean> | boolean
): DirectoryListingContract<Driver, T>

/**
* Transform generated items of listing with the given mapper function.
*/
map<M>(
mapper: (item: T, index: number, driver: Driver) => Promise<M> | M
): DirectoryListingContract<Driver, M>

/**
* Do recursive listing of items. Without the next function it will do listing of leaf nodes only.
* For advanced usage you can pass the next function which will get as parameter current item and it should
* return the next location for list or null if the recursion should stop and yield the current item.
* For advanced usage you can also limit the depth of recursion using the second argument of next function.
*/
recursive(
next?: (current: T, depth: number, driver: Driver) => Promise<string | null> | string | null
): DirectoryListingContract<Driver, T>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need these additional functions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you mean filter and map it is good for developers which prefer functional programming with higher-order functions over imperative style using for loop.
If you mean pipe it is added so you can add transforming functions - when extending listing with custom method.

@thetutlage
Copy link
Member

Okay, So I have been through the PR for the most part. Some bits looks complicated to me (because my understand of iterables is limited)

Whenever things looks complicated to me, I try to question the use case we are trying to serve. So let's do that in this case as well.

  • For me the primary reason for having the list function is to allow listing all the files inside a given directory.
  • Depending upon the use case, I might want the listing to be recursive as well.
  • Optionally, I might want to filter and only pick files of a given type. Like only .json files.
  • And finally, I think we are using iterables because of the performance. Instead of fetching everything at once and keep it as a Buffer in the memory, we are using iterables to render the first batch of files and then fetch the next set of files by calling the next function.

Are my assumptions correct?

@Ruby184
Copy link
Author

Ruby184 commented May 6, 2022

  • For me the primary reason for having the list function is to allow listing all the files inside a given directory.

Yes, it is the main point and use case for list.

  • Depending upon the use case, I might want the listing to be recursive as well.

Sure, sometimes we want the list of all files in given directory no matter how deep they are in given directory.

  • Optionally, I might want to filter and only pick files of a given type. Like only .json files.

Yes, filtering files is a common use case. With current implementation you can do it in many ways according to what you prefer.

// 1. just use the for loop and if
const files = []
for await (const item of driver.list('some/dir')) {
  if (item.location.endsWith('.json')) {
    files.push(item.location)
  }
}

// 2. use filter higher-order function
const files = []
for await (const item of driver.list('some/dir').filter((i) => i.location.endsWith('.json'))) {
  files.push(item.location)
}

// 3. do not use for-await-of at all and just use higher-order functions and convert result to array
const files = await driver.list('some/dir').filter((i) => i.location.endsWith('.json')).map((i) => i.location).toArray()
  • And finally, I think we are using iterables because of the performance. Instead of fetching everything at once and keep it as a Buffer in the memory, we are using iterables to render the first batch of files and then fetch the next set of files by calling the next function.

Yes, async iterables are great when you just need to fetch chunks of data on demand, as you said. It is maybe a little slower as you need to wait for each call for the next chunk of data, but not all the data are in memory the whole time. As usage is not as straightforward as with arrays, I tried to add nice API and allow working with iterables for people who do not like it and prefer working with directory listing as arrays. So it is a compromise.

@thetutlage
Copy link
Member

Cool. Thanks for the answers.

So what I am thinking is when exactly am I going to use iterables? If I am creating a JSON API, I will have to anyways collect the tree inside the buffer and then send it as response. Same is the case when rendering the tree inside edge templates

@Ruby184
Copy link
Author

Ruby184 commented May 8, 2022

Yeah, you are right. The main use case for iterables is to perform operations on the files, for example you want to delete all the files in the directory (or maybe move all files to other location etc.)

for await (const item of driver.list('some/dir').recursive())) {
  await driver.delete(item.location)
}

@thetutlage
Copy link
Member

Yeah, you are right. The main use case for iterables is to perform operations on the files, for example you want to delete all the files in the directory (or maybe move all files to other location etc.)

Yup, that was the thing I was missing. Totally makes sense.

Now, how should we approach with the same API for s3 and gcs. Does their SDKs provides everything we need?

@Ruby184
Copy link
Author

Ruby184 commented May 9, 2022

As I have mentioned in the proposal, s3 is good and also looked into gcs and we should be able to make a list also with "directories" using apiResponse.prefixes as it is shown here:
https://github.com/googleapis/nodejs-storage/blob/21cbce0459f82a4ad7b7741611cc97c9c3d8a6ca/src/bucket.ts#L2470-L2504

@thetutlage
Copy link
Member

@Ruby184 Thanks for being patient :)

The PR looks good to me. So let's merge it. A couple of questions before I hit the merge button

  • Can we just remove the Macroable from the DirectoryListing class. I don't see any immediate use case for extending this class. So I will wait for the moment to arrive and then make it macroable
  • How should we deal with the breaking change we will introduce by making DriverContract to accept generics and the list method

@Ruby184
Copy link
Author

Ruby184 commented May 9, 2022

  • Can we just remove the Macroable from the DirectoryListing class. I don't see any immediate use case for extending this class. So I will wait for the moment to arrive and then make it macroable

I like the idea the developers could add the methods what they consider as needed, for example mentioned glob or just other shortcuts but If you are not keen of that feel free to remove it.

  • How should we deal with the breaking change we will introduce by making DriverContract to accept generics and the list method

I think we should handle it by releasing the major version of the package as Semantic Versioning recommends for such changes.

@targos
Copy link
Member

targos commented May 9, 2022

I'm not sure it's a breaking change. The generic has a default value.

@Ruby184
Copy link
Author

Ruby184 commented May 9, 2022

I'm not sure it's a breaking change. The generic has a default value.

Yeah this one is not a breaking change and should be BC but there are 2 other breaking changes:

BREAKING CHANGE: Other extending drivers implementing DriverContract now need to implement new
method list(location: string): DirectoryListingContract<this, DriveListItem>

BREAKING CHANGE: LocalDriver is now treating absolute paths differently and are always prefixed with disk root path to prevent accessing files beyond the disk root

@targos
Copy link
Member

targos commented May 9, 2022

BREAKING CHANGE: Other extending drivers implementing DriverContract now need to implement new
method list(location: string): DirectoryListingContract<this, DriveListItem>

We can make it non-breaking by making the method optional (the default implementation could throw an error).

BREAKING CHANGE: LocalDriver is now treating absolute paths differently and are always prefixed with disk root path to prevent accessing files beyond the disk root

We could argue that this is a security fix and as such may be breaking in a patch/minor release.

@Ruby184
Copy link
Author

Ruby184 commented May 9, 2022

We can make it non-breaking by making the method optional (the default implementation could throw an error).

Yes, but I do not think it is a good idea as we cannot add default implementation which will throw error to extended drivers outside the core because use returns an instance of the driver directly. And by making it optional, users would always need to check for existence or access it with ! like this driver.list!('some/dir'). And later we will need to make it required, so in my opinion we will just delay the change for no added benefit. Are there any disadvantages of releasing it as major version?

We could argue that this is a security fix and as such may be breaking in a patch/minor release.

Yeah, I agree with that this is considered as security fix. And I do not expect people were using absolute paths with these methods when using local driver.

@targos
Copy link
Member

targos commented May 10, 2022

Are there any disadvantages of releasing it as major version?

The package is part of adonisjs/core. I don't know how often @thetutlage intents to make major releases of the core, but as a general thought: it would be unfortunate to require major releases when a new API is added (this new method doesn't make old code throw or change behavior).

@thetutlage
Copy link
Member

The absolute paths one is a bug fix and usually bug fix breaks existing code, since the old behavior was unexpected.

Also, I agree with @targos here, this is just a breaking change for other packages relying on Drive base interface and not the end user code. So, I will prefer not making a breaking change.

So, to conclude. Let's make the list method optional.

@Ruby184
Copy link
Author

Ruby184 commented May 10, 2022

Yeah, sure, I got it. Actually I have got an idea, let me try to make some changes. I think I can make it optional and prevent the need of checking for existence in user code when using core drivers.
@thetutlage should I also remove macroable or did you change your mind? :)

@thetutlage
Copy link
Member

@thetutlage should I also remove macroable or did you change your mind? :)

I don't think anyone will use it immediately. If you don't need then just remove it. I try to keep the API scope as slim as possible and expand as I get feature requests

@Ruby184
Copy link
Author

Ruby184 commented May 10, 2022

@thetutlage @targos All should be good and ready to merge.

@thetutlage
Copy link
Member

All looks good to me. One final thing, can we have tests for other listing methods as well? Like pipe, map and filter?

@Ruby184
Copy link
Author

Ruby184 commented May 13, 2022

@thetutlage I have added the tests you wanted :)

@thetutlage thetutlage self-requested a review May 17, 2022 05:45
@thetutlage
Copy link
Member

Looks all good to me and I am ready to merge. What you think @targos ?

@targos
Copy link
Member

targos commented May 17, 2022

Sorry I don't have time to look at it again. I trust your judgment 👍

@thetutlage thetutlage merged commit e98cb43 into adonisjs:develop May 17, 2022
@thetutlage
Copy link
Member

@Ruby184 Thanks a ton for taking the time to create this PR and also showing patience with all the back and forth we had. 🎉

@Akpagni5547
Copy link

Good evening everyone, really sorry if this is not where I have to express my concern. Indeed, I had to use the list functionality on Drive.use('s3').list!('images').toArray(),
it gives me the following error:
"message": "Drive_1.default.use(...)).list is not a function",
when I use it with local, there is no problem, but with s3 it does not work

@Ruby184
Copy link
Author

Ruby184 commented May 23, 2022

Hello @Akpagni5547, implementation for s3 and gcs is not ready yet, so list method does not exist for the drivers. I am working on it and should be available in the following days. Thanks for being patient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Request Request to add a new feature to the package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants