apache · xhochy · Nov 1, 2016 · Nov 4, 2016 · Nov 4, 2016 · Nov 9, 2016
diff --git a/python/doc/INSTALL.md b/python/doc/INSTALL.md
diff --git a/python/doc/index.rst b/python/doc/index.rst
@@ -31,14 +31,16 @@ additional functionality such as reading Apache Parquet files into Arrow
 structures.
 
 .. toctree::
-   :maxdepth: 4
-   :hidden:
+   :maxdepth: 2
+   :caption: Getting Started
 
+   Installing pyarrow <install.rst>
+   Pandas <pandas.rst>
    Module Reference <modules.rst>
 
-Indices and tables
-==================
+.. toctree::
+   :maxdepth: 2
+   :caption: Additional Features
+
+   Parquet format <parquet.rst>
 
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
diff --git a/python/doc/install.rst b/python/doc/install.rst
@@ -0,0 +1,151 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Install PyArrow
+===============
+
+Conda
+-----
+
+To install the latest version of PyArrow from conda-forge using conda:
+
+.. code-block:: bash
+
+    conda install -c conda-forge pyarrow
+
+Pip
+---
+
+Install the latest version from PyPI:
+
+.. code-block:: bash
+
+    pip install pyarrow
+
+.. note::
+    Currently there are only binary artifcats available for Linux and MacOS.
+    Otherwise this will only pull the python sources and assumes an existing
+    installation of the C++ part of Arrow.
+    To retrieve the binary artifacts, you'll need a recent ``pip`` version that
+    supports features like the ``manylinux1`` tag.
+
+Building from source
+--------------------
+
+First, clone the master git repository:
+
+.. code-block:: bash
+
+    git clone https://github.com/apache/arrow.git arrow
+
+System requirements
+~~~~~~~~~~~~~~~~~~~
+
+Building pyarrow requires:
+
+* A C++11 compiler
+
+  * Linux: gcc >= 4.8 or clang >= 3.5
+  * OS X: XCode 6.4 or higher preferred
+
+* `CMake <https://cmake.org/>`_
+
+Python requirements
+~~~~~~~~~~~~~~~~~~~
+
+You will need Python (CPython) 2.7, 3.4, or 3.5 installed. Earlier releases and
+are not being targeted.
+
+.. note::
+    This library targets CPython only due to an emphasis on interoperability with
+    pandas and NumPy, which are only available for CPython.
+
+The build requires NumPy, Cython, and a few other Python dependencies:
+
+.. code-block:: bash
+
+    pip install cython
+    cd arrow/python
+    pip install -r requirements.txt
+
+Installing Arrow C++ library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, you should choose an installation location for Arrow C++. In the future
+using the default system install location will work, but for now we are being
+explicit:
+
+.. code-block:: bash
+
+    export ARROW_HOME=$HOME/local
+
+Now, we build Arrow:
+
+.. code-block:: bash
+
+    cd arrow/cpp
+
+    mkdir dev-build
+    cd dev-build
+
+    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+
+    make
+
+    # Use sudo here if $ARROW_HOME requires it
+    make install
+
+To get the optional Parquet support, you should also build and install 
+`parquet-cpp <https://github.com/apache/parquet-cpp/blob/master/README.md>`_.
+
+Install `pyarrow`
+~~~~~~~~~~~~~~~~~
+
+
+.. code-block:: bash
+
+    cd arrow/python
+
+    # --with-parquet enable the Apache Parquet support in PyArrow
+    # --build-type=release disables debugging information and turns on
+    #       compiler optimizations for native code
+    python setup.py build_ext --with-parquet --build-type=release install
+    python setup.py install
+
+.. warning::
+    On XCode 6 and prior there are some known OS X `@rpath` issues. If you are
+    unable to import pyarrow, upgrading XCode may be the solution.
+
+.. note::
+    In development installations, you will also need to set a correct
+    ``LD_LIBARY_PATH``. This is most probably done with
+    ``export LD_LIBARY_PATH=$ARROW_HOME/lib:$LD_LIBARY_PATH``.
+
+
+.. code-block:: python
+
+    In [1]: import pyarrow
+
+    In [2]: pyarrow.from_pylist([1,2,3])
+    Out[2]:
+    <pyarrow.array.Int64Array object at 0x7f899f3e60e8>
+    [
+      1,
+      2,
+      3
+    ]
+
diff --git a/python/doc/pandas.rst b/python/doc/pandas.rst
@@ -0,0 +1,114 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+Pandas Interface
+================
+
+To interface with Pandas, PyArrow provides various conversion routines to
+consume Pandas structures and convert back to them.
+
+DataFrames
+----------
+
+The equivalent to a Pandas DataFrame in Arrow is a :class:`pyarrow.table.Table`.
+Both consist of a set of named columns of equal length. While Pandas only
+supports flat columns, the Table also provides nested columns, thus it can
+represent more data than a DataFrame, so a full conversion is not always possible.
+
+Conversion from a Table to a DataFrame is done by calling
+:meth:`pyarrow.table.Table.to_pandas`. The inverse is then achieved by using
+:meth:`pyarrow.from_pandas_dataframe`. This conversion routine provides the
+convience parameter ``timestamps_to_ms``. Although Arrow supports timestamps of
+different resolutions, Pandas only supports nanosecond timestamps and most
+other systems (e.g. Parquet) only work on millisecond timestamps. This parameter
+can be used to already do the time conversion during the Pandas to Arrow
+conversion.
+
+.. code-block:: python
+
+    import pyarrow as pa
+    import pandas as pd
+
+    df = pd.DataFrame({"a": [1, 2, 3]})
+    # Convert from Pandas to Arrow
+    table = pa.from_pandas_dataframe(df)
+    # Convert back to Pandas
+    df_new = table.to_pandas()
+
+
+Series
+------
+
+In Arrow, the most similar structure to a Pandas Series is an Array.
+It is a vector that contains data of the same type as linear memory. You can
+convert a Pandas Series to an Arrow Array using :meth:`pyarrow.array.from_pandas_series`.
+As Arrow Arrays are always nullable, you can supply an optional mask using
+the ``mask`` parameter to mark all null-entries.
+
+Type differences
+----------------
+
+With the current design of Pandas and Arrow, it is not possible to convert all
+column types unmodified. One of the main issues here is that Pandas has no
+support for nullable columns of arbitrary type. Also ``datetime64`` is currently
+fixed to nanosecond resolution. On the other side, Arrow might be still missing
+support for some types.
+
+Pandas -> Arrow Conversion
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
++------------------------+--------------------------+
+| Source Type (Pandas)   | Destination Type (Arrow) |
++========================+==========================+
+| ``bool``               | ``BOOL``                 |
++------------------------+--------------------------+
+| ``(u)int{8,16,32,64}`` | ``(U)INT{8,16,32,64}``   |
++------------------------+--------------------------+
+| ``float32``            | ``FLOAT``                |
++------------------------+--------------------------+
+| ``float64``            | ``DOUBLE``               |
++------------------------+--------------------------+
+| ``str`` / ``unicode``  | ``STRING``               |
++------------------------+--------------------------+
+| ``pd.Timestamp``       | ``TIMESTAMP(unit=ns)``   |
++------------------------+--------------------------+
+| ``pd.Categorical``     | *not supported*          |
++------------------------+--------------------------+
+
+Arrow -> Pandas Conversion
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
++-------------------------------------+--------------------------------------------------------+
+| Source Type (Arrow)                 | Destination Type (Pandas)                              |
++=====================================+========================================================+
+| ``BOOL``                            | ``bool``                                               |
++-------------------------------------+--------------------------------------------------------+
+| ``BOOL`` *with nulls*               | ``object`` (with values ``True``, ``False``, ``None``) |
++-------------------------------------+--------------------------------------------------------+
+| ``(U)INT{8,16,32,64}``              | ``(u)int{8,16,32,64}``                                 |
++-------------------------------------+--------------------------------------------------------+
+| ``(U)INT{8,16,32,64}`` *with nulls* | ``float64``                                            |
++-------------------------------------+--------------------------------------------------------+
+| ``FLOAT``                           | ``float32``                                            |
++-------------------------------------+--------------------------------------------------------+
+| ``DOUBLE``                          | ``float64``                                            |
++-------------------------------------+--------------------------------------------------------+
+| ``STRING``                          | ``str``                                                |
++-------------------------------------+--------------------------------------------------------+
+| ``TIMESTAMP(unit=*)``               | ``pd.Timestamp`` (``np.datetime64[ns]``)               |
++-------------------------------------+--------------------------------------------------------+
+