Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat (cli): initialize the CLI of GraphAr and support baseETL functions #616

Merged
merged 35 commits into from
Oct 31, 2024

Conversation

jasinliu
Copy link
Contributor

@jasinliu jasinliu commented Sep 3, 2024

As title.

@jasinliu jasinliu closed this Sep 3, 2024
@jasinliu jasinliu reopened this Sep 3, 2024
@jasinliu jasinliu marked this pull request as draft September 3, 2024 12:19
@jasinliu jasinliu changed the title [WIP] GraphAr Cli GraphAr Cli Oct 25, 2024
@jasinliu jasinliu marked this pull request as ready for review October 25, 2024 09:37
@jasinliu jasinliu requested review from acezen and Thespica October 25, 2024 09:37
cli/README.md Outdated Show resolved Hide resolved
@@ -0,0 +1,223 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we put these import configuration files to testing repo? I think the configuration files are better to put together with the data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I will move it.

cli/src/main.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@acezen acezen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can refer to 6a13df3 to fix the ORC support problem in CI.

cli/src/util.h Outdated

#pragma once

#include <arrow/adapters/orc/adapter.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

cli/src/util.h Outdated
#include <graphar/graph_info.h>
#include <parquet/arrow/reader.h>

#include <iostream>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the std library header should put at the begin.

Copy link
Member

@SemyonSinchenko SemyonSinchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a loot! It looks like a very big step forward for GraphAr. The main question in my mind at the moment is why are we going to re-define basic structures as pyantic models instead of using the code, generated from the protobuf messages? In my understanding the whole idea of proto was to avoid multiple definitions of the same things across the project.

cli/pyproject.toml Outdated Show resolved Hide resolved
cli/pyproject.toml Outdated Show resolved Hide resolved
cli/src/graphar_cli/config.py Outdated Show resolved Hide resolved

logger = getLogger("graphar_cli")

# TODO: convert to constants
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect to see these variables in UPPER_CASE

cli/src/graphar_cli/config.py Show resolved Hide resolved
cli/src/graphar_cli/config.py Show resolved Hide resolved
# under the License.

import logging
from typing import Union
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thing it should be moved to the TYPE_CHECKING block.


from logging import getLogger

from .config import ImportConfig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thing it can be moved to the TYPE_CHECKING block, isn't? I do not see runtime calls to this class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide some examples of using TYPE_CHECKING? It would help me a lot, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/3/library/typing.html#typing.TYPE_CHECKING

Thanks! I will try it with the protobuf changes.

cli/src/main.cc Show resolved Hide resolved
cli/src/main.cc Show resolved Hide resolved
cli/src/main.cc Show resolved Hide resolved
.github/workflows/cli.yml Outdated Show resolved Hide resolved
cli/README.md Show resolved Hide resolved
Comment on lines 84 to 114
# macos:
# name: ${{ matrix.architecture }} macOS ${{ matrix.macos-version }} CLI
# runs-on: macos-${{ matrix.macos-version }}
# if: ${{ !contains(github.event.pull_request.title, 'WIP') && github.event.pull_request.draft == false }}
# strategy:
# fail-fast: false
# matrix:
# include:
# - architecture: AMD64
# macos-version: "12"
# - architecture: ARM64
# macos-version: "14"
# steps:
# - uses: actions/checkout@v3
# with:
# submodules: true

# - name: Install dependencies
# run: |
# brew bundle --file=cpp/Brewfile


# - name: Build GraphAr And Run Tests
# working-directory: "cli"
# run: |
# pip install ./
# graphar --help
# graphar check -p ../testing/neo4j/MovieGraph.graph.yml
# graphar show -p ../testing/neo4j/MovieGraph.graph.yml -v Person
# graphar show -p ../testing/neo4j/MovieGraph.graph.yml -es Person -e ACTED_IN -ed Movie
# graphar import -c import.full.yml
Copy link
Contributor

@Thespica Thespica Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can disable the job with if: false, which is considered better to me, rather than ignore the code.

similar to example in this doc:

image

Ps. there should also be a note to explain with why we skip this job.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great suggestion! Thanks very much!

macos:
name: ${{ matrix.architecture }} macOS ${{ matrix.macos-version }} CLI
runs-on: macos-${{ matrix.macos-version }}
if: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note to explain why we skip this job?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a TODO

Copy link
Contributor

@Thespica Thespica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM~
Thank you!

@acezen
Copy link
Contributor

acezen commented Oct 30, 2024

Thanks a loot! It looks like a very big step forward for GraphAr. The main question in my mind at the moment is why are we going to re-define basic structures as pyantic models instead of using the code, generated from the protobuf messages? In my understanding the whole idea of proto was to avoid multiple definitions of the same things across the project.

Hi, Sam, The CI currently is based on the C++ library and the protobuf now not full integrate into c++ library yet, so this PR has a little re-define of the base structure but will update when the protobuf is ready.

@@ -0,0 +1,138 @@
graphar: # Global configuration of imported data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the import congfig files should move to testing repo too and it's better to use.the yaml as the only format to specify configurations.

Copy link
Contributor

@acezen acezen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the import congfig files should move to testing repo too and it's better to use.the yaml as the only format to specify configurations.

@acezen acezen changed the title GraphAr Cli feat (cli): initialize the CLI of GraphAr and support baseETL functions Oct 30, 2024
name: str
vertex_chunk_size: Optional[int] = 100
edge_chunk_size: Optional[int] = 1024
file_type: Literal["parquet", "orc", "csv", "json"] = DEFAULT_FILE_TYPE
Copy link
Member

@SemyonSinchenko SemyonSinchenko Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious what was the reason to use strings instead of creating a enumeration? I the file_type in at least four places and it may become challenging to support it in the future... What do you think about something like this:

from enum import Enum

class FileType(str, Enum):
    PARQUET = "parquet"
    ORC = "orc"
    CSV = "csv"
    JSON = "json"

and use it instead of the string literals?

That also allows to avoid checking it each time like this if file_type not in SUPPORT_FILE_TYPES:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FileType(str, Enum) with recent versions of pydantic will raise a warning. I think it is necessary to pin a minor version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@SemyonSinchenko SemyonSinchenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, I left a minor comment

@acezen acezen merged commit dcef976 into apache:main Oct 31, 2024
6 checks passed
Elssky pushed a commit to Elssky/incubator-graphar that referenced this pull request Nov 11, 2024
…ns (apache#616)

* init cli

* fix include

* add  ci

* add vertex info

* change config

* finish

* license

* fix license

* remove conda recipe

* enbale ci

* fix typo

* fix dependency

* update ci

* add data

* fix ci

* fix cmake

* add arrow in cmake

* add dependency

* fix review

* fix ci

* fix ci

* fix ci

* fix ci

* Update cli.yml

* Update cli.yml

* fix config type

* fix ci with new testing

* use enum

* use enum

* pin pydantic  version
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants