This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

feat: load SharePoint Pages content, feat: load docs from root folder in drive, feat: optionally only load specific file types. #930

Open
wants to merge 17 commits into
base: main
109 changes: 106 additions & 3 deletions llama_hub/microsoft_sharepoint/README.md
@@ -20,12 +20,114 @@ More info on Microsoft Graph APIs - [Refer here](https://learn.microsoft.com/en-

To use this loader, the `client_id`, `client_secret`, and `tenant_id` of the app registered in the Microsoft Azure Portal are required.
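
As a minimal sketch of how these three values might be supplied without hard-coding them, the helper below reads them from environment variables. The helper name `sharepoint_credentials` and the variable names (`SHAREPOINT_CLIENT_ID` and so on) are assumptions made for illustration; they are not part of the loader.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names; choose whatever fits your deployment.
REQUIRED_VARS = {
    "client_id": "SHAREPOINT_CLIENT_ID",
    "client_secret": "SHAREPOINT_CLIENT_SECRET",
    "tenant_id": "SHAREPOINT_TENANT_ID",
}


def sharepoint_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the three loader credentials, failing early if any is missing or empty."""
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # Map each constructor argument name to the value of its environment variable.
    return {arg: env[var] for arg, var in REQUIRED_VARS.items()}
```

The returned dictionary uses the same keyword names as the constructor shown in the examples in this README, so it could be unpacked as `SharePointLoader(**sharepoint_credentials())`.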

This loader can:
- Load files present in a specific folder in SharePoint
- Load all files present in the drive of a SharePoint site
- Load all pages under a SharePoint site


If the files are in the `Test` folder under the `root` directory of the SharePoint site, then the folder path to pass to the loader (`sharepoint_folder_path`) is `Test`.

![FilePath](file_path_info.png)

### Example loading all files and pages
If `sharepoint_folder_path` is not provided it defaults to `""`.
In that case, the root folder of the SharePoint Drive is used as the folder to load files from.

If `sharepoint_folder_path` is not provided and `recursive` is set to `True`, all files in the SharePoint Drive are loaded.
If `recursive` is not provided, it defaults to `False`, and files from subfolders are not loaded.

```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id = "<Client ID of the app>",
    client_secret = "<Client Secret of the app>",
    tenant_id = "<Tenant ID of the Microsoft Azure Directory>"
)

# No sharepoint_folder_path: start from the root folder of the drive
documents = loader.load_data(
    sharepoint_site_name = "<Sharepoint Site Name>",
    recursive = True,
)
```

### Example loading a single folder
To load a single folder, specify the `sharepoint_folder_path` with the name of the folder or path from the root directory.

Example: `sharepoint_folder_path = "my/folder/path"`

In order to load only the documents from this `sharepoint_folder_path`, and not the pages for the `sharepoint_site_name`,
you need to provide the `include` argument as `['documents']`. By default, `include` is equal to `['documents', 'pages']`.

If you do not want to include files from subfolders of the given `sharepoint_folder_path`, omit the `recursive` argument (it defaults to `False`).

```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id = "<Client ID of the app>",
    client_secret = "<Client Secret of the app>",
    tenant_id = "<Tenant ID of the Microsoft Azure Directory>"
)

# Load only documents from the given folder and its subfolders
documents = loader.load_data(
    sharepoint_site_name = "<Sharepoint Site Name>",
    sharepoint_folder_path = "<Folder Path>",
    recursive = True,
    include = ['documents']
)
```



### Example loading just pages
In order to load only the pages for the `sharepoint_site_name`,
you need to provide the `include` argument as `['pages']`. By default, `include` is equal to `['documents', 'pages']`.

Note: the `recursive` and `sharepoint_folder_path` arguments have no effect if `documents` is not in the `include` list.

```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id = "<Client ID of the app>",
    client_secret = "<Client Secret of the app>",
    tenant_id = "<Tenant ID of the Microsoft Azure Directory>"
)

# Load only the pages of the site
documents = loader.load_data(
    sharepoint_site_name = "<Sharepoint Site Name>",
    include = ['pages']
)
```
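
The interaction between `include`, `recursive`, and `sharepoint_folder_path` described in the note above can be summarized with a small helper. This is only a sketch: `build_load_kwargs` is a hypothetical function, not part of the loader, and it simply drops the arguments that the note says have no effect. The defaults it uses are the ones stated in this README.

```python
from typing import Any, Dict, List, Optional


def build_load_kwargs(
    site_name: str,
    include: Optional[List[str]] = None,
    folder_path: str = "",
    recursive: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for load_data (hypothetical helper)."""
    # README default: both documents and pages are loaded.
    include = ["documents", "pages"] if include is None else list(include)
    kwargs: Dict[str, Any] = {
        "sharepoint_site_name": site_name,
        "include": include,
    }
    # The folder path and recursion only matter when documents are requested.
    if "documents" in include:
        kwargs["sharepoint_folder_path"] = folder_path
        kwargs["recursive"] = recursive
    return kwargs
```

With such a helper, a call like `loader.load_data(**build_load_kwargs("<Sharepoint Site Name>", include=["pages"]))` would pass only the site name and the `include` list.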

### Example loading just documents
```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id = "<Client ID of the app>",
    client_secret = "<Client Secret of the app>",
    tenant_id = "<Tenant ID of the Microsoft Azure Directory>"
)

# Load all documents in the drive, but no pages
documents = loader.load_data(
    sharepoint_site_name = "<Sharepoint Site Name>",
    recursive = True,
    include = ['documents']
)
```

### Example loading just documents with filetype .docx or .pdf

To load only specific file types, provide the file extension names in `file_types`.
Example: to include only .pdf and .docx files, set `file_types` to `['docx', 'pdf']`.

```python
from llama_index import download_loader
SharePointLoader = download_loader("SharePointReader")

loader = SharePointLoader(
    client_id = "<Client ID of the app>",
    client_secret = "<Client Secret of the app>",
    tenant_id = "<Tenant ID of the Microsoft Azure Directory>"
)

# Load only .docx and .pdf documents from the whole drive
documents = loader.load_data(
    sharepoint_site_name = "<Sharepoint Site Name>",
    recursive = True,
    include = ['documents'],
    file_types = ['docx', 'pdf']
)
```
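
To make the `file_types` behaviour concrete, here is a rough sketch of how filtering by extension could work. The loader's actual matching rules are not shown in this README, so the function `matches_file_types`, its case-insensitive comparison, and its handling of a leading dot are all assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional, Sequence


def matches_file_types(filename: str, file_types: Optional[Sequence[str]] = None) -> bool:
    """Return True if the file's extension is in file_types (hypothetical helper)."""
    # Assumption: no filter means every file is accepted.
    if not file_types:
        return True
    # Normalize so that "pdf", ".pdf" and "PDF" are treated the same.
    wanted = {ft.lower().lstrip(".") for ft in file_types}
    extension = PurePosixPath(filename).suffix.lower().lstrip(".")
    return extension in wanted


names = ["a.docx", "b.pdf", "c.txt"]
selected = [name for name in names if matches_file_types(name, ["docx", "pdf"])]
# selected == ["a.docx", "b.pdf"]
```

Only the last suffix is compared, so under these assumptions a name like `archive.tar.gz` would match `gz` but not `tar`.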
