feat(CI):enable markdownlint and typos in docs.yml (#508)

apache · Aug 12, 2024 · 8bb741e
1 parent d460f3d
commit 8bb741e
Showing 18 changed files with 94 additions and 97 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -54,6 +54,18 @@ jobs:
         with:
           node-version: '18'
 
+      - name: Run markdownlint
+        run: |
+          npm install -g markdownlint-cli
+          markdownlint 'docs/**/*.md' --fix --config 'docs/.markdownlint.yaml'
+
+      - name: Run typos
+        run: |
+          curl -sSL https://github.com/crate-ci/typos/releases/download/v1.23.6/typos-v1.23.6-x86_64-unknown-linux-musl.tar.gz -o typos.tar.gz
+          tar -xzf typos.tar.gz
+          chmod +x typos
+          ./typos docs
+
       - name: Checkout Website
         uses: actions/checkout@v4
         with:
@@ -74,5 +86,3 @@ jobs:
       - name: Build
         working-directory: website
         run: pnpm build
-
-# TODO: enable markdownlint & typos
diff --git a/docs/.markdownlint.yaml b/docs/.markdownlint.yaml
@@ -0,0 +1,8 @@
+# Ignore MD013 because the document requires long lines to keep code examples intact
+MD013: false
+
+# Ignore MD033 because inline HTML is necessary in some cases, such as specific formatting needs
+MD033: false
+
+# Ignore MD025 because the document structure requires multiple top-level headings to reflect different chapters or sections
+MD025: false
diff --git a/docs/index.md b/docs/index.md
@@ -8,10 +8,13 @@ sidebar_position: 0
 Welcome to the documentation for Apache GraphAr. Here, you can find information about the GraphAr File Format, including specification and libraries.
 
 ### [Overview](/docs/overview)
+
 Overview of the Apache GraphAr project.
 
 ### [Specification](/docs/category/specification)
+
 Documentation about the Apache GraphAr file format.
 
 ### [Libraries](/docs/category/libraries)
-Documentation about the libraries of Apache GraphAr. 
+
+Documentation about the libraries of Apache GraphAr.
diff --git a/docs/libraries/cpp/examples/graphscope.md b/docs/libraries/cpp/examples/graphscope.md
@@ -30,7 +30,7 @@ The time performance of *ArrowFragmentBuilder* and *ArrowFragmentWriter*
 in GraphScope is heavily dependent on the partitioning of the graph into
 GraphAr format files, that is, the *vertex chunk size* and *edge chunk size*, which
 are specified in the vertex information file and in the edge information
-file, respectively. 
+file, respectively.
 
 Generally speaking, fewer chunks are created if the file size is large.
 On small graphs, this can be disadvantageous as it reduces the degree of

diff --git a/docs/libraries/cpp/examples/out-of-core.md b/docs/libraries/cpp/examples/out-of-core.md
@@ -89,7 +89,6 @@ neighbors. Please refer to
 [cc_push_example.cc](https://github.com/apache/incubator-graphar/blob/main/cpp/examples/cc_push_example.cc)
 for the complete code.
 
-
 :::tip
 
 In this example, two kinds of edges are used. The

diff --git a/docs/libraries/cpp/getting-started.md b/docs/libraries/cpp/getting-started.md
@@ -202,7 +202,7 @@ the above graph and outputs the end vertices for each edge.
 
 ```cpp
 graph_info = ...
-auto expect = graphar::EdgesCollection::Make(graph_info, "person", "konws", "person", graphar::AdjListType::ordered_by_source);
+auto expect = graphar::EdgesCollection::Make(graph_info, "person", "knows", "person", graphar::AdjListType::ordered_by_source);
 auto edges = expect.value();
 
 for (auto it = edges->begin(); it != edges->end(); ++it) {
@@ -287,4 +287,4 @@ with URI schema, e.g., "s3://bucket-name/path/to/data" or "s3://\[access-key:sec
 
 [Code example](https://github.com/apache/incubator-graphar/blob/main/cpp/test/test_info.cc#L777-L792) demonstrates how to read data from S3.
 
-Note that once you use cloud storage, you need to call `graphar::InitalizeS3` to initialize S3 APIs before starting the work and call`graphar::FinalizeS3()` to shut down the APIs after the work finish.
+Note that once you use cloud storage, you need to call `graphar::InitializeS3` to initialize S3 APIs before starting the work and call`graphar::FinalizeS3()` to shut down the APIs after the work finish.
diff --git a/docs/libraries/java/how_to_develop_java.md b/docs/libraries/java/how_to_develop_java.md
@@ -10,7 +10,7 @@ GraphAr Java library based on GraphAr C++ library and an efficient FFI
 for Java and C++ called
 [FastFFI](https://github.com/alibaba/fastFFI).
 
-### Source Code Level 
+### Source Code Level
 
 - Interface
 - Class
@@ -80,8 +80,8 @@ Please refer to
 ## How To Test
 
 ```bash
-$ export GAR_TEST_DATA=$PWD/../../testing/
-$ mvn clean test
+export GAR_TEST_DATA=$PWD/../../testing/
+mvn clean test
 ```
 
 This will build GraphAr C++ library internally for Java. If you already
@@ -96,11 +96,11 @@ To ensure CI for checking code style will pass, please ensure check
 below is success:
 
 ```bash
-$ mvn spotless:check
+mvn spotless:check
 ```
 
 If there are violations, running command below to automatically format:
 
 ```bash
-$ mvn spotless:apply
+mvn spotless:apply
 ```
diff --git a/docs/libraries/java/java.md b/docs/libraries/java/java.md
@@ -11,19 +11,19 @@ Based on an efficient FFI for Java and C++ called
 library allows users to write Java for generating, loading and
 transforming GraphAr format files. It consists of several components:
 
--  **Information Classes**: As same with in the C++ library, the
+- **Information Classes**: As same with in the C++ library, the
    information classes are implemented to construct and access the meta
    information about the **graphs**, **vertices** and **edges** in
    GraphAr.
 
--  **Writers**: The GraphAr Java writer provides a set of interfaces
+- **Writers**: The GraphAr Java writer provides a set of interfaces
    that can be used to write Apache Arrow VectorSchemaRoot into GraphAr format
    files. Every time it takes a VectorSchemaRoot as the logical table
    for a type of vertices or edges, then convert it to ArrowTable, and
    then dumps it to standard GraphAr format files (CSV, ORC or Parquet files) under
    the specific directory path.
 
--  **Readers**: The GraphAr Java reader provides a set of interfaces
+- **Readers**: The GraphAr Java reader provides a set of interfaces
    that can be used to read GraphAr format files. It reads a collection of vertices
    or edges at a time and assembles the result into the ArrowTable.
    Similar with the reader in the C++ library, it supports the users to
@@ -41,49 +41,48 @@ Firstly, install llvm-11. `LLVM11_HOME` should point to the home of
 LLVM 11. In Ubuntu, it is at `/usr/lib/llvm-11`. Basically, the build
 procedure the following binary:
 
--  `$LLVM11_HOME/bin/clang++`
--  `$LLVM11_HOME/bin/ld.lld`
--  `$LLVM11_HOME/lib/cmake/llvm`
+- `$LLVM11_HOME/bin/clang++`
+- `$LLVM11_HOME/bin/ld.lld`
+- `$LLVM11_HOME/lib/cmake/llvm`
 
 Tips:
 
--  Use Ubuntu as example:
+- Use Ubuntu as example:
 
 ```bash
-$ sudo apt-get install llvm-11 clang-11 lld-11 libclang-11-dev libz-dev -y
-$ export LLVM11_HOME=/usr/lib/llvm-11
+sudo apt-get install llvm-11 clang-11 lld-11 libclang-11-dev libz-dev -y
+export LLVM11_HOME=/usr/lib/llvm-11
 ```
 
--  Or compile from source with this [script](https://github.com/alibaba/fastFFI/blob/main/docker/install-llvm11.sh):
+- Or compile from source with this [script](https://github.com/alibaba/fastFFI/blob/main/docker/install-llvm11.sh):
 
 ```bash
-$ export LLVM11_HOME=/usr/lib/llvm-11
-$ export LLVM_VAR=11.0.0
-$ sudo ./install-llvm11.sh
+export LLVM11_HOME=/usr/lib/llvm-11
+export LLVM_VAR=11.0.0
+sudo ./install-llvm11.sh
 ```
 
 Make the graphar-java-library directory as the current working
 directory:
 
 ```bash
-$ git clone https://github.com/apache/incubator-graphar.git
-$ cd incubator-graphar
-$ git submodule update --init
-$ cd maven-projects/java
+git clone https://github.com/apache/incubator-graphar.git
+cd incubator-graphar
+git submodule update --init
+cd maven-projects/java
 ```
 
 Compile package:
 
 ```bash
-$ mvn clean install -DskipTests
+mvn clean install -DskipTests
 ```
 
 This will build GraphAr C++ library internally for Java. If you already installed GraphAr C++ library in your system,
 you can append this option to skip: `-DbuildGarCPP=OFF`.
 
 Then set GraphAr as a dependency in maven project:
 
-
 ```xml
 <dependencies>
     <dependency>
@@ -212,4 +211,4 @@ StdPair<Long, Long> range = reader.getRange().value();
 
 See [test for
 readers](https://github.com/apache/incubator-graphar/blob/main/maven-projects/java/src/test/java/org/apache/graphar/readers)
-for the complete example.
+for the complete example.
diff --git a/docs/libraries/pyspark/how-to.md b/docs/libraries/pyspark/how-to.md
@@ -30,7 +30,7 @@ spark = (
 ## GraphAr PySpark initialize
 
 PySpark bindings are heavily relying on JVM-calls via ``py4j``. To
-initiate all the neccessary things for it just call
+initiate all the necessary things for it just call
 ``graphar_pyspark.initialize()``:
 
 ```python
@@ -53,15 +53,14 @@ from graphar_pyspark.enums import GarType, FileType
 
 Main objects of GraphAr are the following:
 
--  GraphInfo
--  VertexInfo
--  EdgeInfo
+- GraphInfo
+- VertexInfo
+- EdgeInfo
 
 You can check [Scala library documentation](../spark/spark.md)
 for the more detailed information.
 
-
-##  Creating objects in graphar_pyspark
+## Creating objects in graphar_pyspark
 
 GraphAr PySpark package provide two main ways how to initiate
 objects, like ``GraphInfo``:
@@ -71,7 +70,6 @@ objects, like ``GraphInfo``:
 - ``from_scala(jvm_ref)`` when you create an object from the
    corresponded JVM-object (``py4j.java_gateway.JavaObject``)
 
-
 ```python
 help(Property.from_python)
 
@@ -95,7 +93,7 @@ print(type(python_property))
 
 You can always get a reference to the corresponding JVM object. For
 example, if you want to use it in your own code and need a direct link
-to the underlaying instance of Scala Class, you can just call
+to the underlying instance of Scala Class, you can just call
 ``to_scala()`` method:
 
 ```python
@@ -128,9 +126,9 @@ Each public property and method of the Scala API is provided in
 python, but in a pythonic-naming convention. For example, in Scala,
 ``Property`` has the following fields:
 
--  name
--  data_type
--  is_primary
+- name
+- data_type
+- is_primary
 
 For each of such a field in Scala API there is a getter and setter
 methods. You can call them from the Python too:
@@ -142,7 +140,7 @@ python_property.get_name()
 ```
 
 You can also modify fields, but be careful: when you modify field of
-instance of the Python class, you modify the underlaying Scala Object
+instance of the Python class, you modify the underlying Scala Object
 at the same moment!
 
 ```python
@@ -168,7 +166,6 @@ modern_graph = GraphInfo.load_graph_info("../../testing/modern_graph/modern_grap
 After that you can work with such an objects like regular python
 objects:
 
-
 ```python
 print(modern_graph_v_person.dump())
 
@@ -195,14 +192,14 @@ label: person
 version: gar/v1
 "      
 ```
-            
+
 ```python
 print(modern_graph_v_person.contain_property("id") is True)
 print(modern_graph_v_person.contain_property("bad_id?") is False)
 
 True
 True
 ```
-            
+
 Please, refer to Scala API and examples of GraphAr Spark Scala
 library to see detailed and business-case oriented examples!
diff --git a/docs/libraries/pyspark/pyspark.md b/docs/libraries/pyspark/pyspark.md
@@ -65,7 +65,6 @@ GraphAr PySpark uses poetry as a build system. Please refer to
 to find the manual how to install this tool. Currently GraphAr PySpark
 is build with Python 3.9 and PySpark 3.2
 
-
 Make the graphar-pyspark-library directory as the current working
 directory:
 
@@ -75,7 +74,6 @@ cd incubator-graphar/pyspark
 
 Build package:
 
-
 ```bash
 poetry build
 ```
@@ -87,7 +85,6 @@ generated in the directory *pyspark/dist/*.
 
 You cannot install graphar-pyspark from PyPi for now.
 
-
 ## How to Use
 
 ### Initialization
@@ -97,7 +94,6 @@ Scala. You need to have *spark-x.x.x.jar* in your *spark-jars*.
 Please refer to [GraphAr scala documentation](../spark/spark.md) to get
 this JAR.
 
-
 ```python
 // create a SparkSession from pyspark.sql import SparkSession
 

diff --git a/docs/libraries/spark/examples.md b/docs/libraries/spark/examples.md
@@ -11,7 +11,6 @@ sidebar_position: 1
 
 Examples of this co-working integration have been provided as showcases.
 
-
 ### Examples
 
 ### Transform GraphAr format files
@@ -24,7 +23,6 @@ the original data is first loaded into a Spark DataFrame using the GraphAr Spark
 Then, the DataFrame is written into generated GraphAr format files through a GraphAr Spark Writer,
 following the meta data defined in a new information file.
 
-
 ### Compute with GraphX
 
 Another important use case of GraphAr is to use it as a data source for graph
@@ -33,7 +31,6 @@ a GraphX graph from reading GraphAr format files and executing a connected-compo
 Also, executing queries with Spark SQL and running other graph analytic algorithms
 can be implemented in a similar fashion.
 
-
 ### Import/Export graphs of Neo4j
 
 [Neo4j](https://neo4j.com/product/neo4j-graph-database) graph database provides
@@ -210,4 +207,4 @@ See [GraphAr2Neo4j.scala][graphar2neo4j] for the complete example.
 [transformer-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/TransformExample.scala
 [compute-example]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/test/scala/org/apache/graphar/ComputeExample.scala
 [neo4j2graphar]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/Neo4j2GraphAr.scala
-[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala
+[graphar2neo4j]: https://github.com/apache/incubator-graphar/blob/main/maven-projects/spark/graphar/src/main/scala/org/apache/graphar/example/GraphAr2Neo4j.scala
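
Most of the documentation changes in this commit fall into a few mechanical categories: trailing whitespace, runs of blank lines, and `$ ` prompts inside shell code blocks. The sketch below is a simplified, hypothetical checker for those three patterns, useful for spotting them before pushing. It is not markdownlint's implementation and does not read `docs/.markdownlint.yaml`. The function name `find_style_issues` and the messages are made up for illustration, and the checks are stricter than the real rules (for example, it flags every prompt line and every trailing space).

```python
import re

# Opening or closing line of a fenced code block (``` or ~~~).
FENCE = re.compile(r"^\s*(```|~~~)")


def find_style_issues(text):
    """Return (line_number, message) pairs for a few simple Markdown style issues.

    Illustrative only: a rough approximation of three patterns fixed in this
    commit, not a reimplementation of markdownlint.
    """
    issues = []
    in_fence = False
    blank_run = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if FENCE.match(line):
            in_fence = not in_fence

        # Any whitespace left at the end of a line.
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))

        # Second blank line in a row outside code blocks (reported once per run).
        if line.strip() == "":
            blank_run += 1
            if blank_run == 2 and not in_fence:
                issues.append((number, "multiple consecutive blank lines"))
        else:
            blank_run = 0

        # Shell prompt left in front of a command inside a code block.
        if in_fence and line.startswith("$ "):
            issues.append((number, "shell prompt in code block"))
    return issues


if __name__ == "__main__":
    sample = "# Title\n\n\nText \n```bash\n$ mvn clean test\n```\n"
    for number, message in find_style_issues(sample):
        print(f"line {number}: {message}")
```

The CI step added in `docs.yml` remains the source of truth. A script like this only gives quick local feedback on the most common findings.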