ARROW-352: [Format] Add time resolution to Interval, schema documentation #920

Closed
wants to merge 880 commits into from
880 commits
c10b7d7
ARROW-1003: [C++] Check flag _WIN32 instead of __WIN32
bgosztonyi May 11, 2017
84413b0
ARROW-901: [Python] Add Parquet unit test for fixed size binary
wesm May 11, 2017
1c6f3ef
ARROW-813: [Python] setup.py sdist must also bundle dependent cmake m…
xhochy May 11, 2017
a7722dc
ARROW-993: [GLib] Add missing error checks in Go examples
kou May 11, 2017
a8338f1
ARROW-995: [Website] Fix a typo
kou May 11, 2017
010bd22
ARROW-482 [Java] Exposing custom field metadata
elahrvivaz May 11, 2017
b066660
ARROW-996: [Website] Add 0.3.0 release announce in Japanese
kou May 11, 2017
a4f29f3
ARROW-29: [C++] FindRe2 cmake module
maxhora May 12, 2017
05e8f68
ARROW-1010: [Website] Provide for translations without repeating blog…
wesm May 12, 2017
95ee96b
ARROW-1016: Python: Include C++ headers (optionally) in wheels
xhochy May 12, 2017
9e875a6
ARROW-819: Public Cython and C++ API in the style of lxml, arrow::py:…
wesm May 13, 2017
dbbbc66
ARROW-988 [JS] Add entry to Travis CI matrix
May 13, 2017
99ff240
ARROW-1011: [FORMAT] fix typo and mistakes in Layout.md
cloud-fan May 14, 2017
5739e04
ARROW-1008: [C++] Add abstract stream writer and reader C++ APIs. Giv…
wesm May 14, 2017
d8d3d84
ARROW-1022: [Python] Add multithreaded read option to read_feather
wesm May 14, 2017
852ee4f
ARROW-1024: Python: Update build time numpy version to 1.10.1
xhochy May 14, 2017
c7839e9
ARROW-1017: [Python] Fix memory leaks in conversion to pandas.DataFrame
wesm May 14, 2017
393f46a
ARROW-1023: Python: Fix bundling of arrow-cpp for macOS
xhochy May 14, 2017
37dbddf
ARROW-1004: [Python] Add conversions for numpy object arrays with int…
wesm May 14, 2017
0543379
ARROW-1028: [Python] Fix IPC docs per API changes
wesm May 15, 2017
edfb2dc
ARROW-1027: [Python] Allow negative indexing in fields/columns on pya…
cpcloud May 15, 2017
ba9348f
ARROW-1031: [GLib] Support pretty print
kou May 15, 2017
4381845
ARROW-1033: [Python] pytest discovers scripts/test_leak.py
cpcloud May 15, 2017
681afab
ARROW-977: [java] Add Timezone aware timestamp vectors
julienledem May 15, 2017
b23b864
ARROW-1015 [Java] Schema-level metadata
elahrvivaz May 15, 2017
abbd815
ARROW-1025: [Website] Improved changelog for website, include git sho…
wesm May 16, 2017
222cbfe
ARROW-998: [Format] Clarify that the IPC file footer contains an addi…
wesm May 16, 2017
86a9055
ARROW-182: [C++] Factor out Array::Validate into a separate function
wesm May 16, 2017
47e289a
ARROW-961: [Python] Rename InMemoryOutputStream to BufferOutputStream
wesm May 16, 2017
ce0bb53
ARROW-1002: [C++] Fix inconsistency with padding at start of IPC file…
wesm May 16, 2017
8a8e7bb
ARROW-1037: [GLib] Follow reader name change
kou May 16, 2017
49c5398
ARROW-1038: [GLib] Follow writer name change
kou May 16, 2017
e7e8d61
ARROW-1040: [GLib] Support tensor IO
kou May 16, 2017
bed0197
ARROW-881: [Python] Reconstruct Pandas DataFrame indexes using metadata
cpcloud May 16, 2017
c4086fe
ARROW-997: [Java] Implementing transferPair for FixedSizeListVector
elahrvivaz May 17, 2017
a4f3259
ARROW-1030: Python: Account for library versioning in parquet-cpp
xhochy May 17, 2017
fee4475
ARROW-1029: [Python] Fixes for building pyarrow with Parquet support …
wesm May 17, 2017
62a17b7
ARROW-1044: [GLib] Support Feather
kou May 17, 2017
0eec40a
ARROW-1046: [Python] Reconcile pandas metadata spec
wesm May 17, 2017
37cdc6e
ARROW-970: [Python] Nicer experience if user accidentally calls pyarr…
wesm May 17, 2017
ff72951
ARROW-1053: [Python] Remove unnecessary Py_INCREF in PyBuffer causing…
wesm May 19, 2017
a8f8ba0
[maven-release-plugin] prepare release apache-arrow-0.4.0
wesm May 19, 2017
cf4ef5e
Increment version to 0.5.0-SNAPSHOT
wesm May 23, 2017
a6e77f4
ARROW-1054: [Python] Fix test failure on pandas 0.19.2, some refactoring
wesm May 19, 2017
b06602d
ARROW-1049: [java] vector template cleanup
julienledem May 22, 2017
d2cc199
ARROW-1062: [GLib] Follow API changes in examples
kou May 23, 2017
84b7ee1
ARROW-1057: Fix cmake warning and msvc debug asserts
rip-nsk May 23, 2017
aa652cb
ARROW-1060: [Python] Add unit tests for reference counts in memoryvie…
wesm May 23, 2017
33117d9
ARROW-1034: [PYTHON] Resolve wheel build issues on Windows
maxhora May 23, 2017
1cb18d5
ARROW-1061: [C++] Harden decimal parsing against invalid strings
cpcloud May 24, 2017
078357a
ARROW-1066: [Python] pandas 0.20.1 deprecation of pd.lib causes a war…
jreback May 24, 2017
4e4435e
ARROW-424: [C++] Make ReadAt, Write HDFS functions threadsafe
wesm May 24, 2017
8a700cc
ARROW-1063: [Website] Updates for 0.4.0 release, release posting
wesm May 24, 2017
03e8b54
ARROW-1069: Add instructions for publishing maven artifacts
julienledem May 26, 2017
51b6bf2
ARROW-897: [GLib] Extract CI configuration for GLib
kou May 26, 2017
530f0da
[Doc] Change cpp api doc, std:shared_pointer_cast to std::static_poin…
kumar-m-kiran May 29, 2017
8229688
ARROW-1078: [Python] Account for Apache Parquet shared library consol…
wesm May 31, 2017
5c155c3
ARROW-1075: [GLib] Fix build error on macOS
kou Jun 1, 2017
092afb6
ARROW-990: [JS] Add tslint support for linting TypeScript
Jun 2, 2017
ba97f34
ARROW-1084: Implementations of BufferAllocator should handle Netty's …
Jun 2, 2017
0576ff5
ARROW-1085: [java] Follow up on template cleanup. Missing method for …
julienledem Jun 2, 2017
931a877
ARROW-1070: [C++] Use physical types for Feather date/time types
wesm Jun 3, 2017
a81aefb
ARROW-1082: [GLib] Add CI on macOS
kou Jun 3, 2017
8f2b44b
ARROW-1051: [Python] Opt in to Parquet unit tests to avoid accidental…
wesm Jun 5, 2017
a44155d
ARROW-986: [Format] Add brief explanation of dictionary batches in IP…
wesm Jun 5, 2017
cfaddab
ARROW-1050: [C++] Export arrow::ValidateArray
wesm Jun 5, 2017
316930c
ARROW-1056: [Python] Ignore pandas index in parquet+hdfs test
wesm Jun 5, 2017
4e134e5
ARROW-1091: Decimal scale and precision are flipped
julienledem Jun 5, 2017
a367fd4
ARROW-1086: include additional pxd files during package build
Jun 6, 2017
44dba71
ARROW-1020: [Format] Revise language for Timestamp type in Schema.fbs…
wesm Jun 6, 2017
1a72acd
[Doc] Fix a few links for files moved in ARROW-957
tkelman May 24, 2017
5589dda
ARROW-1080: C++: Add tutorial about converting to/from row-wise repre…
xhochy Jun 6, 2017
c3e865d
ARROW-1090: Improve build_ext usability with --bundle-arrow-cpp
Jun 6, 2017
402baa4
ARROW-1092: More Decimal and scale flipped follow-up
julienledem Jun 6, 2017
ac54075
ARROW-1088: [Python] Only test unicode filenames if system supports them
jeffknupp Jun 6, 2017
2a12482
ARROW-1094: [C++] Always truncate buffer read in ReadableFile::Read i…
wesm Jun 6, 2017
4631543
[maven-release-plugin] prepare release apache-arrow-0.4.1
wesm Jun 7, 2017
41b58e4
[maven-release-plugin] prepare for next development iteration
wesm Jun 7, 2017
e344066
ARROW-1095: Add Arrow logo PNG to website img folder
wesm Jun 7, 2017
7a7b0c2
ARROW-1048: Use existing LD_LIBRARY_PATH in source release script to …
wesm Jun 7, 2017
a382034
ARROW-1101: Implement write(TypeHolder) methods in UnionListWriter
Jun 8, 2017
a44d584
ARROW-1102: Make MessageSerializer.serializeMessage() public
julienledem Jun 8, 2017
ae6142d
ARROW-1107: [JAVA] Fix getField() for NullableMapVector
StevenMPhillips Feb 3, 2017
06c26a2
ARROW-1108: [JAVA] Check if ArrowBuf is empty buffer in getActualCons…
StevenMPhillips Mar 27, 2017
0e680f0
ARROW-1109: [JAVA] transferOwnership fails when readerIndex is not 0
julienledem Apr 11, 2017
c6cf124
ARROW-1110: [JAVA] make union vector naming consistent
julienledem Apr 14, 2017
11deee6
ARROW-1111: [JAVA] Make aligning buffers optional, and allow -1 for u…
StevenMPhillips Jun 6, 2017
ac64853
ARROW-1112: [JAVA] Set lastSet for VarLength and List vectors when lo…
StevenMPhillips Jun 6, 2017
2a2b109
ARROW-742: [C++] std::wstring_convert exceptions handling
maxhora Jun 13, 2017
25ba44c
ARROW-460: [C++] JSON read/write for dictionaries
wesm Jun 13, 2017
d25ea63
ARROW-1115: [C++] use CCACHE_FOUND value for ccache path
Jun 13, 2017
697df1b
ARROW-1117: [Docs] Minor issues in GLib README
sekikn Jun 14, 2017
d1de66b
ARROW-1118: [Site] Website updates for 0.4.1
wesm Jun 15, 2017
5b66c25
ARROW-1122: [Website] Add turbodbc + arrow blog post
MathMagique Jun 15, 2017
3f26dfa
ARROW-1096: [C++] CreateFileMapping maximum size calculation issue
maxhora Jun 16, 2017
d54bf48
ARROW-1122: [Website] Change timestamp to yield correct Jekyll date
wesm Jun 16, 2017
1a23419
ARROW-1124: Increase numpy dependency to >=1.10.x
xhochy Jun 17, 2017
5be05ac
ARROW-742: [C++] Use gflags from toolchain; Resolve cmake FindGFlags …
maxhora Jun 18, 2017
d874d4e
ARROW-1081: Fill null_bitmap correctly in TestBase
xhochy Jun 19, 2017
b5e8a48
ARROW-1128: [Docs] command to build a wheel is not properly rendered
sekikn Jun 19, 2017
86c67d0
ARROW-1129: [C++] Fix gflags issue in Linux/macOS toolchain builds
wesm Jun 20, 2017
f0f1ca6
ARROW-1138: Travis: Use OpenJDK7 instead of OracleJDK7
xhochy Jun 22, 2017
ef579ca
ARROW-1123: Make jemalloc the default allocator
xhochy Jun 22, 2017
5e34309
ARROW-1104: Integrate in-memory object store into arrow
pcmoritz Jun 22, 2017
222628c
ARROW-1140: [C++] Allow optional build of plasma
cpcloud Jun 22, 2017
608b89e
ARROW-1073: C++: Adapative integer builder
xhochy Jun 22, 2017
a16c124
ARROW-1137: Python: Ensure Pandas roundtrip of all-None column
xhochy Jun 22, 2017
c1ec0c7
ARROW-1039: Python: Remove duplicate column
xhochy Jun 23, 2017
074dde4
ARROW-1143: C++: Fix comparison of NullArray
xhochy Jun 23, 2017
e209e58
ARROW-1144: [C++] Remove unused variable
cpcloud Jun 23, 2017
6768f52
ARROW-1139: Silence dlmalloc warning on clang-4.0
pcmoritz Jun 23, 2017
8bf567e
ARROW-1136: [C++] Add null checks for invalid streams
wesm Jun 23, 2017
b7befeb
ARROW-1132: [Python] Unable to write pandas DataFrame w/MultiIndex co…
cpcloud Jun 23, 2017
1514016
ARROW-1146: Add .gitignore for *_generated.h files in src/plasma/format
cpcloud Jun 23, 2017
98f7cac
ARROW-1142: [C++] Port over compression toolchain and interfaces from…
wesm Jun 23, 2017
73007de
ARROW-1147: [C++] Allow optional vendoring of flatbuffers in plasma
cpcloud Jun 23, 2017
41524d6
ARROW-1135: [C++] Use clang 4.0 in one of the Linux builds
wesm Jun 24, 2017
f3bcf76
ARROW-1145: [GLib] Add get_values()
kou Jun 24, 2017
bea30d6
ARROW-1113: [C++] Upgrade to gflags 2.2.0, use tarball instead of git…
wesm Jun 24, 2017
fc3f8c2
ARROW-1131: [Python] Enable the Parquet unit tests by default if the …
wesm Jun 26, 2017
5de6eb5
ARROW-978: [Python] - Change python documentation sphinx theme to boo…
jeffknupp Jun 26, 2017
ec6e183
ARROW-1151: [C++] Add branch prediction to RETURN_NOT_OK
wesm Jun 26, 2017
bfe15db
ARROW-1152: [Cython] read_tensor should work with a readable file
pcmoritz Jun 26, 2017
3e754a0
ARROW-1155: [Python] Add null check when user improperly instantiates…
wesm Jun 26, 2017
cb5f2b9
ARROW-1157: C++/Python: Decimal templates are not correctly exported …
xhochy Jun 27, 2017
b065228
ARROW-1154: [C++] Import miscellaneous computational utility code fro…
wesm Jun 27, 2017
a588938
ARROW-1159: [C++] Use dllimport for visibility when not building Arro…
wesm Jun 27, 2017
bddb219
ARROW-834: Python Support creating from iterables
holdenk Jun 28, 2017
65558db
ARROW-1162: Empty data vector transfer between list vectors should no…
sudheeshkatkam Jun 29, 2017
6958252
ARROW-1165: [C++] Refactor PythonDecimalToArrowDecimal to not use tem…
cpcloud Jun 29, 2017
af83c45
ARROW-1166: Fix errors in example and missing reference in Layout.md
fangzheng Jun 29, 2017
9f500af
ARROW-1170: C++: Link to pthread on ARROW_JEMALLOC=OFF
xhochy Jun 30, 2017
456330f
ARROW-599: CMake support of LZ4 compression lib
maxhora Jul 1, 2017
930db87
ARROW-1169: [C++] jemalloc externalproject doesn't build with CMake's…
xhochy Jun 29, 2017
c294ec3
ARROW-1125: partial schemas for Table.from_pandas
Jul 1, 2017
96e7e99
ARROW-960: Add section on how to develop with pip
xhochy Jun 27, 2017
e268ce8
ARROW-915: [Python] Struct Array reads limited support
itaiin Jul 2, 2017
9e4906f
ARROW-1160: C++: Implement DictionaryBuilder
xhochy Jul 2, 2017
2e5ddfe
ARROW-1179: C++: Add missing virtual destructors
xhochy Jul 3, 2017
2c3e8b0
ARROW-692: Integration test data generator for dictionary types
wesm Jul 3, 2017
a6d0c26
ARROW-1180: [GLib] Fix a returning invalid address bug in garrow_tens…
kou Jul 3, 2017
e18abac
ARROW-1181: [Python] Parquet multiindex test should be optional
BryanCutler Jul 3, 2017
cdee23c
ARROW-600: ZSTD compression lib support
maxhora Jul 3, 2017
681479d
ARROW-1182: C++: Specify BUILD_BYPRODUCTS for zlib and zstd
xhochy Jul 3, 2017
e5a08dd
ARROW-1098. [Format] modify document mistake
Jul 4, 2017
7c18ddd
ARROW-966: [Python] Also accept Field instance in pyarrow.list_
wesm Jul 4, 2017
edcded3
ARROW-1148: [C++] Raise minimum CMake version to 3.2
wesm Jul 4, 2017
cbbd04b
ARROW-1172: [C++] Refactor to use unique_ptr for builders
kou Jul 4, 2017
7d86c28
ARROW-693: [Java] Add dictionary support to JSON reader and writer
BryanCutler Jul 4, 2017
00a7d55
ARROW-1185: [C++] Status class cleanup, warn_unused_result attribute …
wesm Jul 5, 2017
83a4405
ARROW-599: [C++] Lz4 compression codec support
maxhora Jul 6, 2017
c398fda
ARROW-462: [C++] Implement in-memory conversions between non-nested p…
wesm Jul 7, 2017
3309d12
ARROW-1174: [GLib] Fix ListArray test failure
kou Jul 7, 2017
b6b876c
ARROW-1193: [C++] Support pkg-config for arrow_python.so
kou Jul 9, 2017
e894532
ARROW-1197: [GLib] Fix a bug that record batch related functions for …
kou Jul 9, 2017
7870804
ARROW-1074: Support lists and arrays in pandas DataFrames without exp…
Jul 10, 2017
f73c1c3
ARROW-1201: [Python] Incomplete Python types cause a core dump when r…
cpcloud Jul 10, 2017
cab07c2
ARROW-1202: [C++] Remove semicolons from status macros
cpcloud Jul 10, 2017
bc16e0e
ARROW-1196: [C++] Release, Debug, Toolchain, NMake Generator Appveyor…
maxhora Jul 10, 2017
471a85f
ARROW-1168: [Python] pandas metadata may contain "mixed" data types
cpcloud Jul 10, 2017
ad57ea8
ARROW-1125: Python: Add public C++ API to unwrap PyArrow object
xhochy Jul 11, 2017
8452071
ARROW-1199: [C++] Implement mutable POD struct for Array data
wesm Jul 11, 2017
dbedc8d
ARROW-1186: [C++] Add support to build only Parquet dependencies
Jul 11, 2017
e8c09c6
ARROW-1205: C++: Reference to type objects in ArrayLoader may cause s…
xhochy Jul 11, 2017
afb1928
ARROW-1206: [C++] Add finer grained control of compression library su…
wesm Jul 11, 2017
f0ecc06
ARROW-1208: [C++] Temporary remove conda's build of zstd from Toolcha…
maxhora Jul 12, 2017
28e06d8
ARROW-1194: [Python] Expose MockOutputStream in pyarrow.
robertnishihara Jul 13, 2017
74bc873
ARROW-1150: Silence AdaptiveIntBuilder compiler warning on MSVC
xhochy Jul 13, 2017
85892a2
ARROW-1187: Python: Feather: Serialize a DataFrame with None column
xhochy Jul 13, 2017
248a9d8
ARROW-1212: [GLib] Add garrow_binary_array_get_offsets_buffer()
kou Jul 13, 2017
c7e0995
ARROW-1208: [C++] Install zstd from conda for Toolchain Appveyor buil…
maxhora Jul 14, 2017
8cad26e
ARROW-1200: C++: Switch DictionaryBuilder to signed integers
xhochy Jul 14, 2017
cb31b8b
ARROW-1215: [Python] Generate documentation for class members in API …
pcmoritz Jul 14, 2017
bfe3959
ARROW-962: [Python] Add schema attribute to RecordBatchFileReader
wesm Jul 15, 2017
f62db83
ARROW-1100: [Python] Add mode property to NativeFile
wesm Jul 15, 2017
d46b7ea
ARROW-992: [Python] Try to set a __version__ in in-place local builds
wesm Jul 15, 2017
9ff39f3
ARROW-1216: [Python] Fix creating numpy array from arrow buffers on p…
pcmoritz Jul 15, 2017
099f61c
ARROW-1218: [C++] Fix arrow build if no compression library is used
pcmoritz Jul 15, 2017
bb0a758
ARROW-1214: [Python/C++] Add C++ functionality to more easily handle …
wesm Jul 15, 2017
dc4216f
ARROW-575: Python: Auto-detect nested lists and nested numpy arrays i…
xhochy Jul 15, 2017
e438e15
ARROW-1217: [GLib] Add GInputStream based arrow::io::RandomAccessFile
kou Jul 16, 2017
f266f17
ARROW-1220: [C++] Cmake script errors out if lib is not found under *…
maxhora Jul 16, 2017
bf01966
[Python] Correct function name in use with pandas documentation
MarkLavrynenko Jul 16, 2017
50b518a
ARROW-1183: [Python] Implement pandas conversions between Time32, Tim…
wesm Jul 16, 2017
cdf7db9
ARROW-1223: [GLib] Fix function name that returns wrapped object
kou Jul 16, 2017
d538426
ARROW-1228: [GLib] Fix test file name
kou Jul 17, 2017
8644ee1
ARROW-1227: [GLib] Support GOutputStream
kou Jul 17, 2017
e370174
ARROW-1222: [Python] Raise exception when passing unsupported Python …
wesm Jul 17, 2017
5fbfd8e
ARROW-597: [Python] Add read_pandas convenience to stream and file re…
wesm Jul 17, 2017
b474cac
ARROW-1221: [C++] Add run_clang_format.py script, exclusions file. Pi…
wesm Jul 17, 2017
ea9bc83
ARROW-1229: [GLib] Use "read" instead of "get" for reading record batch
kou Jul 17, 2017
0396240
ARROW-1190: [JAVA] Fixing VectorLoader for duplicate field names
antonymayi Jul 17, 2017
1541a08
ARROW-1177: [C++] Check for int32 offset overflow in ListBuilder, Bin…
wesm Jul 17, 2017
b4d34f8
ARROW-1191: [JAVA] Implement getField() method for complex readers
StevenMPhillips Jul 17, 2017
a1c8b83
ARROW-1079: [Python] Filter out private directories when building Par…
wesm Jul 18, 2017
6035d9b
ARROW-1233: [C++] Validate libs availability in conda toolchain
maxhora Jul 18, 2017
8152433
ARROW-1188: [Python] Handle Feather case where category values are nu…
wesm Jul 18, 2017
a73252d
ARROW-1235: [C++] Make operator<< for Array/Status and std::ostream i…
wesm Jul 18, 2017
362e754
ARROW-1103: [Python] Support read_pandas (with index metadata) on dir…
wesm Jul 18, 2017
c5a89b7
ARROW-1120: Support for writing timestamp(ns) to Int96
xhochy Jul 18, 2017
fe9c7ef
ARROW-1236: Fix lib path in pkg-config file
Zaharid Jul 19, 2017
6999dbd
ARROW-935: [Java] Build Javadoc and site with OpenJDK8 in Java CI build
wesm Jul 19, 2017
2c5b412
ARROW-1167: [Python] Support chunking string columns in Table.from_pa…
wesm Jul 19, 2017
5aa0809
[GLib] Update rat_exclusion_files.txt
wesm Jul 19, 2017
db181d1
ARROW-1244: Exclude C++ Plasma source tree when creating source release
wesm Jul 20, 2017
62ef2cd
[C++] Remove Plasma source tree for 0.5.0 release pending IP Clearance
wesm Jul 20, 2017
e9f76e1
[maven-release-plugin] prepare release apache-arrow-0.5.0
wesm Jul 20, 2017
9b26ed8
[maven-release-plugin] prepare for next development iteration
wesm Jul 20, 2017
2c81015
[C++] Restore Plasma source tree after 0.5.0 release
wesm Jul 23, 2017
fabf7fb
ARROW-1241: [C++] Appveyor build matrix extended with Visual Studio 2…
maxhora Jul 24, 2017
e1b098e
ARROW-1240: [JAVA] security: upgrade slf4j to 1.7.25 and logback to 1…
Jul 24, 2017
457bb07
ARROW-1237: [JAVA] expose the ability to set lastSet
siddharthteotia Jul 24, 2017
05f7058
ARROW-1239: [JAVA] upgrading git-commit-id-plugin
antonymayi Jul 24, 2017
a94f471
ARROW-1149: [Plasma] Create Cython client library for Plasma
pcmoritz Jul 24, 2017
6042c48
ARROW-1195: [C++] CpuInfo init with cores number, frequency and cache…
maxhora Jul 24, 2017
ecdc86b
ARROW-1249: [JAVA] expose fillEmpties from Nullable variable length v…
siddharthteotia Jul 25, 2017
886e2af
ARROW-1259: [Plasma] Speed up plasma tests
pcmoritz Jul 25, 2017
9e692af
ARROW-1245: [Integration] Enable JavaTester in Integration tests
BryanCutler Jul 25, 2017
11c92bf
ARROW-1246: [Format] Draft Flatbuffer metadata description for Map
wesm Jul 25, 2017
204f148
ARROW-1260: [Plasma] Use factory method to create Python PlasmaClient
pcmoritz Jul 25, 2017
07b89bf
ARROW-1219: [C++] Use Google C++ code formatting
wesm Jul 25, 2017
08cec90
ARROW-1252: [Website] Updates for 0.5.0 and short blog post summarizi…
wesm Jul 25, 2017
ed54dce
ARROW-1253: [C++/Python] Speed up C++ / Python builds by using conda-…
wesm Jul 25, 2017
f90fa49
[Website] Fix link to 0.5.0 post on install page
wesm Jul 25, 2017
e9e17b5
ARROW-1258: [C++] Suppress Clang dlmalloc compiler warnings
wesm Jul 26, 2017
2eeaa95
ARROW-1248: [Python] Suppress return-type-c-linkage warning in Cython…
wesm Jul 26, 2017
676a4a9
ARROW-1255: [Plasma] Fix typo in plasma protocol; add DCHECK for Read…
Yeolar Jul 26, 2017
5708cd1
[Java] Fix some typos in code comments and exception messages
rendel Jul 26, 2017
dca5d96
ARROW-1275: [C++] Deafult Snappy static lib suffix updated to "_static"
maxhora Jul 26, 2017
d76e43e
ARROW-1268: [WEBSITE] Added blog post for Spark integration toPandas()
BryanCutler Jul 27, 2017
cae3510
ARROW-1274: [C++] Fix CMake >= 3.3 warning. Also add option to suppre…
wesm Jul 27, 2017
7b3378f
ARROW-1204: [C++] Remove WholeProgramOptimization(/GL) compilation fl…
maxhora Jul 27, 2017
f72279b
ARROW-1288: Fix many license headers to use proper ASF one
wesm Jul 27, 2017
b7639c1
ARROW-1285: [Python] Delete any incomplete file when attempt to write…
wesm Jul 28, 2017
ff6c6e0
ARROW-1276: enable parquet serialization of empty DataFrames
crepererum Jul 28, 2017
8841bc0
ARROW-1281: [C++/Python] Add Docker setup for testing HDFS IO in C++ …
wesm Jul 28, 2017
33c85cd
[Java] Fix letter case in rat plugin config
Jul 27, 2017
4df2a0b
ARROW-1290: [C++] Double buffer size when exceeding capacity in arrow…
wesm Jul 28, 2017
44855bb
ARROW-1273: [Python] Add Parquet read_metadata, read_schema convenien…
wesm Jul 28, 2017
3b14765
ARROW-1289: [Python] Add PYARROW_BUILD_PLASMA CMake option, follow se…
wesm Jul 28, 2017
1dd0f5f
ARROW-1267: [Java] Handle zero length case in BitVector.splitAndTransfer
StevenMPhillips Jul 29, 2017
05af640
ARROW-276: [JAVA] Nullable Vectors should extend BaseValueVector and …
siddharthteotia Jul 29, 2017
ec32617
ARROW-1192: [JAVA] Use buffer slice for splitAndTransfer in List and …
siddharthteotia Jul 29, 2017
5aea3a3
ARROW-1287: [Python] Implement whence argument for pyarrow.NativeFile…
wesm Jul 29, 2017
b4e9ba1
ARROW-968: [Python] Support slices in RecordBatch.__getitem__
wesm Jul 29, 2017
ea1b67c
ARROW-1294: [C++] Pin cmake=3.8.0 in MSVC toolchain build
wesm Jul 29, 2017
4108bda
ARROW-1291: [Python] Cast non-string DataFrame columns to strings in …
wesm Jul 29, 2017
2288bfc
ARROW-1264: [Python] Raise exception in Python instead of aborting if…
wesm Jul 30, 2017
b4eec62
ARROW-932: [Python] Fix MSVC compiler warnings, build Python with /WX…
wesm Jul 30, 2017
4c6c8e7
Add time resolution to Interval, format documentation explaining Inte…
wesm Jul 31, 2017
45 changes: 42 additions & 3 deletions format/Schema.fbs
@@ -27,6 +27,10 @@ enum MetadataVersion:short {

/// These are stored in the flatbuffer in the Type union below

/// A scalar null type for data that cannot be classified as any other
/// type. For example, if a CSV or JSON file were parsed and a particular field
/// had all null values, then we could choose Null for this field. In some
/// systems, Null can be cast to any other type
table Null {
}

@@ -36,6 +40,8 @@ table Null {
table Struct_ {
}

/// Variable-length list value type. List lengths are encoded in the offsets
/// buffer. See Layout.md for more detail
table List {
}

@@ -95,18 +101,20 @@ table FloatingPoint {
precision: Precision;
}

/// Unicode with UTF-8 encoding
table Utf8 {
/// Variable-length binary type
table Binary {
}

table Binary {
/// Variable-length Unicode with UTF-8 encoding
table Utf8 {
}

table FixedSizeBinary {
/// Number of bytes per value
byteWidth: int;
}

/// Boolean type, 1 bit per value
table Bool {
}

@@ -175,8 +183,39 @@ table Timestamp {
}

enum IntervalUnit: short { YEAR_MONTH, DAY_TIME}


/// A type representing an absolute time difference, also called "timedelta" in
/// some systems.
///
/// SQL systems support many varieties of interval types. The PostgreSQL
/// interval type is a 16-byte type with microsecond resolution. Microsoft SQL
/// Server has yet different interval type representations. Some other
/// libraries, like NumPy, have a 64-bit integer-based timedelta type with time
/// resolution metadata
///
/// We support two styles of intervals encoded in a 64-bit integer
Contributor

Based on prior threads, I'm not clear whether both types are 64-bit, or only DAY_TIME is, with YEAR_MONTH being 32-bit.

Contributor

Arrow Java uses 4-byte vectors for IntervalYearVector and 8-byte vectors for IntervalDayVector.

We would have to use 64-bit for DAY_TIME and 32-bit for YEAR_MONTH.

Member Author

32 bits for YEAR_MONTH intervals is fine with me. The main purpose of this patch is to support interval resolutions other than milliseconds in DAY_TIME.

Contributor

If we have two different widths, would it make sense to make two separate interval tables?
It might make the contract a little less confusing: YEAR_MONTH has a fixed resolution, while DAY_TIME has a configurable one.

Contributor

Agree with @emkornfield. The two interval types differ in both width and resolution. It's probably cleaner to create two new types and deprecate this one (but not delete it, to preserve backward compatibility).

Contributor

Sorry for deleting the last comment; I missed the one white line among all the green :(

@wesm do you agree with the approach? Do you have bandwidth to update the PR?

Member Author

I can update this PR. I'd like to get @jacques-n or others to sign off on the format changes before it gets merged.

I don't think the current iteration of the metadata is implemented for IPC in Java, so this is not that disruptive.

Contributor

@emkornfield Feb 13, 2019

Actually, looking at this file, it seems there are other types whose widths differ based on metadata (e.g. dates in seconds since epoch are 32-bit and in milliseconds 64-bit), so we can probably keep it as is.

Still ramping up on the Java side of things, but it looks like the easiest path forward, one that does not break the existing class hierarchy or cause undue confusion, is to add a new IntervalUnit (maybe called EPOCH?) above and denote DAY_TIME as "optional" in the standard, while still having it reflect the current Java definition (days and milliseconds stored contiguously as two int32s, https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/IntervalDayVector.java).

It appears YEAR_MONTH already reflects the definition here (a 32-bit integer representing months, https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/IntervalYearVector.java).
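
To make the two physical layouts discussed in this thread concrete, here is a minimal Python sketch. It assumes little-endian byte order and the Java layout as described in the comment above (one int32 of months for YEAR_MONTH; an int32 of days followed by an int32 of milliseconds for DAY_TIME). The function names are hypothetical and are not part of any Arrow API.

```python
import struct
from typing import Tuple


def pack_year_month(months: int) -> bytes:
    """Encode a YEAR_MONTH interval as a single 32-bit signed integer (4 bytes)."""
    return struct.pack("<i", months)


def unpack_year_month(buf: bytes) -> int:
    """Decode a 4-byte YEAR_MONTH value back into a month count."""
    return struct.unpack("<i", buf)[0]


def pack_day_time(days: int, millis: int) -> bytes:
    """Encode a DAY_TIME interval as two contiguous 32-bit signed integers (8 bytes).

    Assumption: this mirrors the Java layout described in the thread
    (days first, then milliseconds); it is not the single 64-bit
    integer encoding proposed in the diff.
    """
    return struct.pack("<ii", days, millis)


def unpack_day_time(buf: bytes) -> Tuple[int, int]:
    """Decode an 8-byte DAY_TIME value into (days, milliseconds)."""
    days, millis = struct.unpack("<ii", buf)
    return days, millis
```

The width difference the reviewers point out shows up directly: a YEAR_MONTH value occupies 4 bytes and a DAY_TIME value occupies 8, which is why a single statement about "a 64-bit integer" cannot cover both kinds.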

///
/// - YEAR_MONTH, value indicates number of elapsed whole months
///
/// - DAY_TIME, value indicates absolute time offset according to the difference
/// in UNIX time (i.e. excluding leap seconds). The resolution defaults to
/// millisecond, but can be any of the other supported TimeUnit values as
/// with Timestamp and Time types
///
/// References
/// - https://www.postgresql.org/docs/9.6/static/datatype-datetime.html
/// - https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types
table Interval {
/// The kind of interval, YEAR_MONTH or DAY_TIME
///
/// TODO(wesm): Should this be renamed to kind and change resolution to be
/// "unit" for consistency with the other temporal types?
Contributor

I would be in favor of this.

unit: IntervalUnit;

/// The unit of time resolution for DAY_TIME. If null, assumed to be
/// milliseconds
resolution: TimeUnit = MILLISECOND;
}
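
As an illustration of how a reader might interpret the `resolution` field, the following Python sketch converts a DAY_TIME value into a `datetime.timedelta`. It assumes the single 64-bit integer encoding proposed in this diff. The `TimeUnit` enum below is a hypothetical stand-in that maps each unit to its ticks per second; it is not the enum generated from Schema.fbs.

```python
from datetime import timedelta
from enum import Enum


class TimeUnit(Enum):
    """Hypothetical stand-in: each member's value is its ticks per second."""
    SECOND = 1
    MILLISECOND = 1_000
    MICROSECOND = 1_000_000
    NANOSECOND = 1_000_000_000


def day_time_to_timedelta(
    value: int, resolution: TimeUnit = TimeUnit.MILLISECOND
) -> timedelta:
    """Interpret a DAY_TIME interval value in the given resolution.

    The default mirrors the schema default (MILLISECOND). timedelta only
    has microsecond precision, so nanosecond values are floored to the
    nearest microsecond by the integer division.
    """
    micros = value * 1_000_000 // resolution.value
    return timedelta(microseconds=micros)
```

For example, `day_time_to_timedelta(1500)` is read as 1500 milliseconds, while `day_time_to_timedelta(90, TimeUnit.SECOND)` is read as 90 seconds. The same stored integer means different durations depending on the metadata, which is the reason the field needs a defined default.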

/// ----------------------------------------------------------------------