ARROW-484: Revise README to include more detail about software components #286

Closed
35 changes: 28 additions & 7 deletions README.md
@@ -32,17 +32,38 @@ Arrow is a set of technologies that enable big-data systems to process and move
Initial implementations include:

- [The Arrow Format](https://github.com/apache/arrow/tree/master/format)
- [Arrow Structures and APIs in C++](https://github.com/apache/arrow/tree/master/cpp)
- [Arrow Structures and APIs in Java](https://github.com/apache/arrow/tree/master/java)
- [Java implementation](https://github.com/apache/arrow/tree/master/java)
- [C++ implementation](https://github.com/apache/arrow/tree/master/cpp)
- [Python interface to C++ libraries](https://github.com/apache/arrow/tree/master/python)

Arrow is an [Apache Software Foundation](www.apache.org) project. More info can be found at [arrow.apache.org](http://arrow.apache.org).
Arrow is an [Apache Software Foundation](www.apache.org) project. Learn more at
[arrow.apache.org](http://arrow.apache.org).

#### What's in the Arrow libraries?

The reference Arrow implementations contain a number of distinct software
components:

- Columnar vector/array and table row batch containers supporting nested data
Member


Maybe we should drop DataFrame (or superset of a DataFrame) somewhere. That's at least what people in the PyData space are more used to.

Member Author


done

- Fast, language agnostic metadata messaging layer (using Google's Flatbuffers
library)
- Reference-counted off-heap buffer memory management, for zero-copy memory
sharing and handling memory-mapped files
- Low-overhead IO interfaces to files on disk, HDFS (C++ only)
- Self-describing binary wire formats (streaming and batch/file-like) for
remote procedure calls (RPC) and
interprocess communication (IPC)
- Integration tests for verifying binary compatibility between the
implementations (e.g. sending data from Java to C++)
- Conversions to and from other in-memory data structures (e.g. Python's pandas
library)

#### Getting involved

Right now the primary audience for Apache Arrow are the designers and
developers of data systems; most people will use Apache Arrow indirectly
through systems that use it for internal data handling and interoperating with
other Arrow-enabled systems.
Right now the primary audience for Apache Arrow are the developers of data
systems; most people will use Apache Arrow indirectly through systems that use
it for internal data handling and interoperating with other Arrow-enabled
systems.

Even if you do not plan to contribute to Apache Arrow itself or Arrow
integrations in other projects, we'd be happy to have you involved: