Merge pull request #358 from Qbeast-io/update-md-files
Update .md qb-spark files
jorgeMarin1 authored Jul 24, 2024
2 parents b685bd3 + 5221277 commit 47f271e
Showing 4 changed files with 28 additions and 24 deletions.
30 changes: 17 additions & 13 deletions docs/CloudStorages.md
@@ -10,19 +10,23 @@ We currently support Hadoop 2.7 and 3.2 (recommended), so feel free to use any o
Nevertheless, if you use Hadoop 2.7 you'll need to add some **extra** configurations depending on the provider, which you can find below.
Note that some versions may not work for a cloud provider, so please read carefully.

### Configs for Hadoop 2.7
<details><summary>AWS S3</summary>
There's no known working version of Hadoop 2.7 for AWS S3. However, you can try to use it.<br />
Remember to include the following option if using Hadoop 2.7:<br />
<code>--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem</code>
</details>
## Configs for Hadoop 2.7

### AWS S3

There's no known working version of Hadoop 2.7 for AWS S3. However, you can try to use it.

Remember to include the following option if using Hadoop 2.7:
``` --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem ```
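
For reference, the same option can also be set when building the Spark session. Below is a minimal, hedged sketch (the app name, the toy DataFrame, and the bucket name are placeholders, not part of the original docs):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: set the S3A filesystem implementation programmatically,
// equivalent to passing --conf on the spark-shell command line.
val spark = SparkSession.builder()
  .appName("qbeast-s3-sketch")
  .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  .getOrCreate()

import spark.implicits._

// Toy data; the bucket name below is a placeholder.
val df = Seq((1, 10.0), (2, 20.0)).toDF("x", "y")
df.write
  .format("qbeast")
  .option("columnsToIndex", "x,y")
  .save("s3a://my-bucket/tmp/qbeast")
```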

### Azure Blob Storage

- You can use this provider with Hadoop 2.7. To do so, you need to change the Hadoop library to 2.7 (remember to change your Spark installation as well):
``` org.apache.hadoop:hadoop-azure:2.7.4 ```

- In addition, you must include the following config to use the _wasb_ filesystem:
``` --conf spark.hadoop.fs.AbstractFileSystem.wasb.impl=org.apache.hadoop.fs.azure.Wasb ```

<details><summary>Azure Blob Storage</summary>
- You can use this provider with Hadoop 2.7. To do so, you need to change the hadoop library to 2.7 (remember to change your Spark
installation as well):<br />
<code>org.apache.hadoop:hadoop-azure:2.7.4</code><br>
- In addition you must include the following config to use the _wasb_ filesystem:<br /><code>--conf spark.hadoop.fs.AbstractFileSystem.wasb.impl=org.apache.hadoop.fs.azure.Wasb</code>
</details>
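
The corresponding programmatic form, as a hedged sketch (container and account names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: register the wasb filesystem programmatically,
// equivalent to the --conf flag above.
val spark = SparkSession.builder()
  .appName("qbeast-wasb-sketch")
  .config("spark.hadoop.fs.AbstractFileSystem.wasb.impl",
          "org.apache.hadoop.fs.azure.Wasb")
  .getOrCreate()

// Container and account names below are placeholders.
val df = spark.read.format("qbeast")
  .load("wasb://container@account.blob.core.windows.net/tmp/qbeast")
```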

## AWS S3
Amazon Web Services S3 does not work with Hadoop 2.7. For this provider you'll need Hadoop 3.2.
@@ -66,4 +70,4 @@ $SPARK_HOME/bin/spark-shell \
--packages io.qbeast:qbeast-spark_2.12:0.3.2,\
io.delta:delta-core_2.12:1.2.0,\
org.apache.hadoop:hadoop-azure:3.2.0
```
6 changes: 3 additions & 3 deletions docs/FAQ.md
@@ -1,12 +1,12 @@
# FAQ: Frequently Asked Questions
<hr>
<hr />

Q - I get an error like this when first indexing with qbeast following the steps from Quickstart:
```
java.io.IOException: (null) entry in command string: null chmod 0644
```
A - You can find the solution [here](https://stackoverflow.com/questions/48010634/why-does-spark-application-fail-with-ioexception-null-entry-in-command-strin/48012285#48012285)
<hr>
<hr />

Q - I run into an "out of memory" error when indexing with qbeast format.

@@ -24,4 +24,4 @@ Try to `repartition` the `DataFrame` before writing in your Spark application:
```scala
df.repartition(200).write.format("qbeast").option("columnsToIndex", "x,y").save("/tmp/qbeast")
```
<hr>
<hr />
8 changes: 4 additions & 4 deletions docs/OTreeAlgorithm.md
@@ -8,7 +8,7 @@ The two primary goals of the **OTree algorithm** are
### Recursive Space Division
One of the most important techniques used to build a **multi-dimensional index** is through **recursive space division**; a bounded vector space initially containing all the data is **recursively divided** into **equal-sized**, **non-overlapping** subspaces, as long as they exceed the predefined **capacity**.

For a dataset indexed with `n` columns, the constructed index is an n-dimensional vector space composed of <img src="https://render.githubusercontent.com/render/math?math=2^n"> subspaces, or what we call `cubes`, with **non-overlapping** boundaries. Each cube can contain a predefined number of elements, `cap`, and exceeding it triggers **recursively dividing** the cube into child cubes by halving the ranges in all dimensions until the number of elements included no longer exceeds `cap`.
For a dataset indexed with `n` columns, the constructed index is an n-dimensional vector space composed of <img src="https://render.githubusercontent.com/render/math?math=2^n" /> subspaces, or what we call `cubes`, with **non-overlapping** boundaries. Each cube can contain a predefined number of elements, `cap`, and exceeding it triggers **recursively dividing** the cube into child cubes by halving the ranges in all dimensions until the number of elements included no longer exceeds `cap`.

Say that we use two columns, `x` and `y`, to build the index, and that the `cap` parameter for each cube is 2. The first image in the figure below is the **root cube**, containing more than two elements. The cube is split into four **equal-sized**, **non-overlapping** child cubes with one space division step, as shown in the middle image. Three of the four cubes are already within capacity as a result of the division.
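
To make the division rule concrete, here is a hedged sketch of a plain quadtree-style split. It illustrates recursive space division in general, not the exact OTree implementation (which handles overflow differently):

```scala
// Illustrative sketch only, not the actual qbeast-spark implementation.
case class Point(x: Double, y: Double)
case class Cube(xMin: Double, yMin: Double, xMax: Double, yMax: Double)

val cap = 2 // maximum number of elements per cube

// Split a cube into 2^n children (n = 2 here) by halving every dimension,
// recursing while a cube still holds more than `cap` elements.
def divide(cube: Cube, points: Seq[Point]): Seq[(Cube, Seq[Point])] =
  if (points.size <= cap) Seq((cube, points))
  else {
    val xMid = (cube.xMin + cube.xMax) / 2
    val yMid = (cube.yMin + cube.yMax) / 2
    val children = Seq(
      Cube(cube.xMin, cube.yMin, xMid, yMid),
      Cube(xMid, cube.yMin, cube.xMax, yMid),
      Cube(cube.xMin, yMid, xMid, cube.yMax),
      Cube(xMid, yMid, cube.xMax, cube.yMax))
    children.flatMap { child =>
      // Half-open boundaries keep the children non-overlapping:
      // each point belongs to exactly one child cube.
      val inside = points.filter(p =>
        p.x >= child.xMin && p.x < child.xMax &&
        p.y >= child.yMin && p.y < child.yMax)
      divide(child, inside)
    }
  }
```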

@@ -92,7 +92,7 @@ The rest of the page describes the theoretical details about the OTree, includin


<p align="center">
<img src="./images/proper-cube.png">
<img src="./images/proper-cube.png" />
</p>


@@ -113,7 +113,7 @@ The following image depicts the three possible states, and whether a cube is of


<p align="center">
<img src="./images/states-and-transitions.png">
<img src="./images/states-and-transitions.png" />
</p>


@@ -148,4 +148,4 @@ The following image depicts the three possible states, and whether a cube is of
- READ:
  - `f >= maxWeight`: don't read anything
  - `f < maxWeight`: read elements from the `payload` with `weight <= f` (see the sketch below)
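
A hedged sketch of this read rule, with `Block` and `Element` as simplified stand-ins for the real metadata classes:

```scala
// Hedged sketch of the READ rule above; Block and Element are simplified
// stand-ins, not the actual qbeast-spark classes.
case class Element(weight: Double, data: String)
case class Block(maxWeight: Double, payload: Seq[Element])

// For a sampling fraction f: skip the block entirely when f >= maxWeight,
// otherwise keep only the payload elements whose weight <= f.
def read(block: Block, f: Double): Seq[Element] =
  if (f >= block.maxWeight) Seq.empty
  else block.payload.filter(_.weight <= f)
```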


8 changes: 4 additions & 4 deletions docs/QbeastFormat.md
@@ -8,7 +8,7 @@ A **transaction log** in Delta Lake holds information about what objects compris


<p align="center">
<img src="./images/delta.png" width=600 height=500>
<img src="./images/delta.png" width="600" height="500" />
</p>
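
Since qbeast-spark writes on top of Delta Lake, you can peek at this log yourself. A hedged sketch for a spark-shell session (the table path is a placeholder):

```scala
// Hypothetical inspection of the Delta transaction log of a qbeast table;
// each JSON file under _delta_log is one commit. The path is a placeholder.
val log = spark.read.json("/tmp/qbeast/_delta_log/*.json")

// Rows carry different actions (add, remove, metaData, commitInfo);
// here we look at the files added to the table.
log.select("add.path", "add.size").where("add is not null").show(false)
```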


@@ -281,7 +281,7 @@ revisions.foreach(revision =>
```
> Note that **Revision ID number 0 is reserved for the Staging Area** (non-indexed files). This ensures compatibility with underlying table formats.

## Compaction (<v0.6.0)
## Compaction (&lt;v0.6.0)

> Compaction is **NOT available from version 0.6.0**. Although it is present, it calls the `optimize` command underneath.
> Read all the reasoning and changes on the [Qbeast Format 0.6.0](./QbeastFormat0.6.0.md) document and check the issue [#294](https://github.com/Qbeast-io/qbeast-spark/issues/294) for more info.
@@ -304,7 +304,7 @@ table.compact(0)
```


## Index Replication (<v0.6.0)
## Index Replication (&lt;v0.6.0)


> Analyze and Replication operations are **NOT available from version 0.6.0**. Read all the reasoning and changes on the [Qbeast Format 0.6.0](./QbeastFormat0.6.0.md) document.
