feat(CI):enable markdownlint and typos in docs.yml (#508)
ywh555hhh authored Aug 12, 2024
1 parent d460f3d commit 8bb741e
Showing 18 changed files with 94 additions and 97 deletions.
14 changes: 12 additions & 2 deletions .github/workflows/docs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,18 @@ jobs:
with:
node-version: '18'

- name: Run markdownlint
run: |
npm install -g markdownlint-cli
markdownlint 'docs/**/*.md' --fix --config 'docs/.markdownlint.yaml'
- name: Run typos
run: |
curl -sSL https://github.com/crate-ci/typos/releases/download/v1.23.6/typos-v1.23.6-x86_64-unknown-linux-musl.tar.gz -o typos.tar.gz
tar -xzf typos.tar.gz
chmod +x typos
./typos docs
- name: Checkout Website
uses: actions/checkout@v4
with:
Expand All @@ -74,5 +86,3 @@ jobs:
- name: Build
working-directory: website
run: pnpm build

# TODO: enable markdownlint & typos
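For reference, a minimal Python sketch of running the same two checks locally before pushing. It assumes `markdownlint` and `typos` are already installed and on `PATH`; the helper names `lint_commands` and `run_checks` are hypothetical and not part of the repository. Unlike the workflow step, the sketch leaves out `--fix`, on the assumption that a local run should report violations rather than rewrite files.

```python
import shutil
import subprocess


def lint_commands(docs_dir="docs", config="docs/.markdownlint.yaml"):
    """Return the argv lists for the two documentation checks."""
    return [
        # Same glob and config file as the workflow step, without --fix.
        ["markdownlint", f"{docs_dir}/**/*.md", "--config", config],
        # typos scans the given directory recursively.
        ["typos", docs_dir],
    ]


def run_checks(commands):
    """Run each command whose tool is installed; map tool name -> exit code.

    A value of None means the tool was not found and the check was skipped.
    """
    results = {}
    for argv in commands:
        if shutil.which(argv[0]) is None:
            results[argv[0]] = None
            continue
        results[argv[0]] = subprocess.run(argv, check=False).returncode
    return results
```

Calling `run_checks(lint_commands())` from the repository root would then give one exit code per tool, with a non-zero code indicating reported problems.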
8 changes: 8 additions & 0 deletions docs/.markdownlint.yaml
@@ -0,0 +1,8 @@
# Ignore MD013 because the document requires long lines to keep code examples intact
MD013: false

# Ignore MD033 because inline HTML is necessary in some cases, such as specific formatting needs
MD033: false

# Ignore MD025 because the document structure requires multiple top-level headings to reflect different chapters or sections
MD025: false
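To illustrate why MD013 is switched off, here is a simplified Python sketch of a line-length check. It is only an approximation: the real rule has further options (for code blocks, tables and so on), and the function name `long_lines` is made up for this example. The limit of 80 matches markdownlint's documented default.

```python
def long_lines(markdown_text, limit=80):
    """Return (line_number, length) for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if len(line) > limit
    ]


# A code line taken from docs/libraries/cpp/getting-started.md in this commit.
sample = "\n".join([
    "# Title",
    'auto expect = graphar::EdgesCollection::Make(graph_info, "person", '
    '"knows", "person", graphar::AdjListType::ordered_by_source);',
])
print(long_lines(sample))
```

The C++ call on the second line is well over 80 characters, so a check of this kind would flag it unless the example were wrapped, which is the situation the config comment describes.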
5 changes: 4 additions & 1 deletion docs/index.md
Expand Up @@ -8,10 +8,13 @@ sidebar_position: 0
Welcome to the documentation for Apache GraphAr. Here, you can find information about the GraphAr File Format, including specification and libraries.

### [Overview](/docs/overview)

Overview of the Apache GraphAr project.

### [Specification](/docs/category/specification)

Documentation about the Apache GraphAr file format.

### [Libraries](/docs/category/libraries)
Documentation about the libraries of Apache GraphAr.

Documentation about the libraries of Apache GraphAr.
2 changes: 1 addition & 1 deletion docs/libraries/cpp/examples/graphscope.md
Expand Up @@ -30,7 +30,7 @@ The time performance of *ArrowFragmentBuilder* and *ArrowFragmentWriter*
in GraphScope is heavily dependent on the partitioning of the graph into
GraphAr format files, that is, the *vertex chunk size* and *edge chunk size*, which
are specified in the vertex information file and in the edge information
file, respectively.
file, respectively.

Generally speaking, fewer chunks are created if the file size is large.
On small graphs, this can be disadvantageous as it reduces the degree of
1 change: 0 additions & 1 deletion docs/libraries/cpp/examples/out-of-core.md
Expand Up @@ -89,7 +89,6 @@ neighbors. Please refer to
[cc_push_example.cc](https://github.com/apache/incubator-graphar/blob/main/cpp/examples/cc_push_example.cc)
for the complete code.
:::tip
In this example, two kinds of edges are used. The
4 changes: 2 additions & 2 deletions docs/libraries/cpp/getting-started.md
Expand Up @@ -202,7 +202,7 @@ the above graph and outputs the end vertices for each edge.

```cpp
graph_info = ...
auto expect = graphar::EdgesCollection::Make(graph_info, "person", "konws", "person", graphar::AdjListType::ordered_by_source);
auto expect = graphar::EdgesCollection::Make(graph_info, "person", "knows", "person", graphar::AdjListType::ordered_by_source);
auto edges = expect.value();

for (auto it = edges->begin(); it != edges->end(); ++it) {
Expand Down Expand Up @@ -287,4 +287,4 @@ with URI schema, e.g., "s3://bucket-name/path/to/data" or "s3://\[access-key:sec

[Code example](https://github.com/apache/incubator-graphar/blob/main/cpp/test/test_info.cc#L777-L792) demonstrates how to read data from S3.

Note that once you use cloud storage, you need to call `graphar::InitalizeS3` to initialize S3 APIs before starting the work and call`graphar::FinalizeS3()` to shut down the APIs after the work finish.
Note that once you use cloud storage, you need to call `graphar::InitializeS3` to initialize S3 APIs before starting the work and call`graphar::FinalizeS3()` to shut down the APIs after the work finish.
10 changes: 5 additions & 5 deletions docs/libraries/java/how_to_develop_java.md
Expand Up @@ -10,7 +10,7 @@ GraphAr Java library based on GraphAr C++ library and an efficient FFI
for Java and C++ called
[FastFFI](https://github.com/alibaba/fastFFI).

### Source Code Level
### Source Code Level

- Interface
- Class
Expand Down Expand Up @@ -80,8 +80,8 @@ Please refer to
## How To Test

```bash
$ export GAR_TEST_DATA=$PWD/../../testing/
$ mvn clean test
export GAR_TEST_DATA=$PWD/../../testing/
mvn clean test
```

This will build GraphAr C++ library internally for Java. If you already
Expand All @@ -96,11 +96,11 @@ To ensure CI for checking code style will pass, please ensure check
below is success:

```bash
$ mvn spotless:check
mvn spotless:check
```

If there are violations, running command below to automatically format:

```bash
$ mvn spotless:apply
mvn spotless:apply
```
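The `$ ` prompts dropped in this file appear to be the pattern markdownlint reports as MD014 (dollar signs before commands when no output is shown); that rule attribution is an assumption, since the commit message does not name it. A minimal Python sketch of the same transformation, with a hypothetical helper name `strip_shell_prompts`:

```python
def strip_shell_prompts(block_lines):
    """Drop a leading "$ " from every line of a code block.

    The block is changed only when all non-blank lines carry the prompt,
    i.e. when no command output is shown alongside the commands.
    """
    non_blank = [line for line in block_lines if line.strip()]
    if not non_blank or not all(
        line.lstrip().startswith("$ ") for line in non_blank
    ):
        return list(block_lines)  # mixed commands and output: leave as-is

    stripped = []
    for line in block_lines:
        body = line.lstrip()
        indent = line[: len(line) - len(body)]
        # Blank lines have no prompt and are passed through unchanged.
        stripped.append(indent + body[2:] if body.startswith("$ ") else line)
    return stripped


print(strip_shell_prompts([
    "$ export GAR_TEST_DATA=$PWD/../../testing/",
    "$ mvn clean test",
]))
```

Blocks that mix prompted commands with their output are left untouched, because there the prompt is what separates the two.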
39 changes: 19 additions & 20 deletions docs/libraries/java/java.md
Expand Up @@ -11,19 +11,19 @@ Based on an efficient FFI for Java and C++ called
library allows users to write Java for generating, loading and
transforming GraphAr format files. It consists of several components:

- **Information Classes**: As same with in the C++ library, the
- **Information Classes**: As same with in the C++ library, the
information classes are implemented to construct and access the meta
information about the **graphs**, **vertices** and **edges** in
GraphAr.

- **Writers**: The GraphAr Java writer provides a set of interfaces
- **Writers**: The GraphAr Java writer provides a set of interfaces
that can be used to write Apache Arrow VectorSchemaRoot into GraphAr format
files. Every time it takes a VectorSchemaRoot as the logical table
for a type of vertices or edges, then convert it to ArrowTable, and
then dumps it to standard GraphAr format files (CSV, ORC or Parquet files) under
the specific directory path.

- **Readers**: The GraphAr Java reader provides a set of interfaces
- **Readers**: The GraphAr Java reader provides a set of interfaces
that can be used to read GraphAr format files. It reads a collection of vertices
or edges at a time and assembles the result into the ArrowTable.
Similar with the reader in the C++ library, it supports the users to
Expand All @@ -41,49 +41,48 @@ Firstly, install llvm-11. `LLVM11_HOME` should point to the home of
LLVM 11. In Ubuntu, it is at `/usr/lib/llvm-11`. Basically, the build
procedure the following binary:

- `$LLVM11_HOME/bin/clang++`
- `$LLVM11_HOME/bin/ld.lld`
- `$LLVM11_HOME/lib/cmake/llvm`
- `$LLVM11_HOME/bin/clang++`
- `$LLVM11_HOME/bin/ld.lld`
- `$LLVM11_HOME/lib/cmake/llvm`

Tips:

- Use Ubuntu as example:
- Use Ubuntu as example:

```bash
$ sudo apt-get install llvm-11 clang-11 lld-11 libclang-11-dev libz-dev -y
$ export LLVM11_HOME=/usr/lib/llvm-11
sudo apt-get install llvm-11 clang-11 lld-11 libclang-11-dev libz-dev -y
export LLVM11_HOME=/usr/lib/llvm-11
```

- Or compile from source with this [script](https://github.com/alibaba/fastFFI/blob/main/docker/install-llvm11.sh):
- Or compile from source with this [script](https://github.com/alibaba/fastFFI/blob/main/docker/install-llvm11.sh):

```bash
$ export LLVM11_HOME=/usr/lib/llvm-11
$ export LLVM_VAR=11.0.0
$ sudo ./install-llvm11.sh
export LLVM11_HOME=/usr/lib/llvm-11
export LLVM_VAR=11.0.0
sudo ./install-llvm11.sh
```

Make the graphar-java-library directory as the current working
directory:

```bash
$ git clone https://github.com/apache/incubator-graphar.git
$ cd incubator-graphar
$ git submodule update --init
$ cd maven-projects/java
git clone https://github.com/apache/incubator-graphar.git
cd incubator-graphar
git submodule update --init
cd maven-projects/java
```

Compile package:

```bash
$ mvn clean install -DskipTests
mvn clean install -DskipTests
```

This will build GraphAr C++ library internally for Java. If you already installed GraphAr C++ library in your system,
you can append this option to skip: `-DbuildGarCPP=OFF`.

Then set GraphAr as a dependency in maven project:


```xml
<dependencies>
<dependency>
Expand Down Expand Up @@ -212,4 +211,4 @@ StdPair<Long, Long> range = reader.getRange().value();

See [test for
readers](https://github.com/apache/incubator-graphar/blob/main/maven-projects/java/src/test/java/org/apache/graphar/readers)
for the complete example.
for the complete example.
27 changes: 12 additions & 15 deletions docs/libraries/pyspark/how-to.md
Expand Up @@ -30,7 +30,7 @@ spark = (
## GraphAr PySpark initialize

PySpark bindings are heavily relying on JVM-calls via ``py4j``. To
initiate all the neccessary things for it just call
initiate all the necessary things for it just call
``graphar_pyspark.initialize()``:

```python
Expand All @@ -53,15 +53,14 @@ from graphar_pyspark.enums import GarType, FileType

Main objects of GraphAr are the following:

- GraphInfo
- VertexInfo
- EdgeInfo
- GraphInfo
- VertexInfo
- EdgeInfo

You can check [Scala library documentation](../spark/spark.md)
for the more detailed information.


## Creating objects in graphar_pyspark
## Creating objects in graphar_pyspark

GraphAr PySpark package provide two main ways how to initiate
objects, like ``GraphInfo``:
Expand All @@ -71,7 +70,6 @@ objects, like ``GraphInfo``:
- ``from_scala(jvm_ref)`` when you create an object from the
corresponded JVM-object (``py4j.java_gateway.JavaObject``)


```python
help(Property.from_python)

Expand All @@ -95,7 +93,7 @@ print(type(python_property))

You can always get a reference to the corresponding JVM object. For
example, if you want to use it in your own code and need a direct link
to the underlaying instance of Scala Class, you can just call
to the underlying instance of Scala Class, you can just call
``to_scala()`` method:

```python
Expand Down Expand Up @@ -128,9 +126,9 @@ Each public property and method of the Scala API is provided in
python, but in a pythonic-naming convention. For example, in Scala,
``Property`` has the following fields:

- name
- data_type
- is_primary
- name
- data_type
- is_primary

For each of such a field in Scala API there is a getter and setter
methods. You can call them from the Python too:
Expand All @@ -142,7 +140,7 @@ python_property.get_name()
```

You can also modify fields, but be careful: when you modify field of
instance of the Python class, you modify the underlaying Scala Object
instance of the Python class, you modify the underlying Scala Object
at the same moment!

```python
Expand All @@ -168,7 +166,6 @@ modern_graph = GraphInfo.load_graph_info("../../testing/modern_graph/modern_grap
After that you can work with such an objects like regular python
objects:


```python
print(modern_graph_v_person.dump())

Expand All @@ -195,14 +192,14 @@ label: person
version: gar/v1
"
```

```python
print(modern_graph_v_person.contain_property("id") is True)
print(modern_graph_v_person.contain_property("bad_id?") is False)

True
True
```

Please, refer to Scala API and examples of GraphAr Spark Scala
library to see detailed and business-case oriented examples!
4 changes: 0 additions & 4 deletions docs/libraries/pyspark/pyspark.md
Expand Up @@ -65,7 +65,6 @@ GraphAr PySpark uses poetry as a build system. Please refer to
to find the manual how to install this tool. Currently GraphAr PySpark
is build with Python 3.9 and PySpark 3.2


Make the graphar-pyspark-library directory as the current working
directory:

Expand All @@ -75,7 +74,6 @@ cd incubator-graphar/pyspark

Build package:


```bash
poetry build
```
Expand All @@ -87,7 +85,6 @@ generated in the directory *pyspark/dist/*.

You cannot install graphar-pyspark from PyPi for now.


## How to Use

### Initialization
Expand All @@ -97,7 +94,6 @@ Scala. You need to have *spark-x.x.x.jar* in your *spark-jars*.
Please refer to [GraphAr scala documentation](../spark/spark.md) to get
this JAR.


```python
// create a SparkSession from pyspark.sql import SparkSession

5 changes: 1 addition & 4 deletions docs/libraries/spark/examples.md
Expand Up @@ -11,7 +11,6 @@ sidebar_position: 1

Examples of this co-working integration have been provided as showcases.


### Examples

### Transform GraphAr format files
Expand All @@ -24,7 +23,6 @@ the original data is first loaded into a Spark DataFrame using the GraphAr Spark
Then, the DataFrame is written into generated GraphAr format files through a GraphAr Spark Writer,
following the meta data defined in a new information file.


### Compute with GraphX

Another important use case of GraphAr is to use it as a data source for graph
Expand All @@ -33,7 +31,6 @@ a GraphX graph from reading GraphAr format files and executing a connected-compo
Also, executing queries with Spark SQL and running other graph analytic algorithms
can be implemented in a similar fashion.


### Import/Export graphs of Neo4j

[Neo4j](https://neo4j.com/product/neo4j-graph-database) graph database provides
Expand Down Expand Up @@ -210,4 +207,4 @@ See [GraphAr2Neo4j.scala][graphar2neo4j] for the complete example.
[transformer-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala
[compute-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala
[neo4j2graphar]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/Neo4j2GraphAr.scala
[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala
[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala