ARROW-352: [Format] Add time resolution to Interval, schema documentation #920
Closed
4c6c8e7 Add time resolution to Interval, format documentation explaining Inte… (wesm)
@@ -27,6 +27,10 @@ enum MetadataVersion:short {

 /// These are stored in the flatbuffer in the Type union below

+/// A scalar null type for data that cannot be classified as any other
+/// type. For example, if a CSV or JSON file were parsed and a particular field
+/// had all null values, then we could choose Null for this field. In some
+/// systems, Null can be casted to any other type
 table Null {
 }

@@ -36,6 +40,8 @@ table Null {
 table Struct_ {
 }

+/// Variable-length list value type. List lengths are encoded in the offsets
+/// buffer. See Layout.md for more detail
 table List {
 }

@@ -95,18 +101,20 @@ table FloatingPoint {
   precision: Precision;
 }

-/// Unicode with UTF-8 encoding
-table Utf8 {
+/// Variable-length binary type
+table Binary {
 }

-table Binary {
+/// Variable-length Unicode with UTF-8 encoding
+table Utf8 {
 }

 table FixedSizeBinary {
   /// Number of bytes per value
   byteWidth: int;
 }

+/// Boolean type, 1 bit per value
 table Bool {
 }

@@ -175,8 +183,39 @@ table Timestamp {
 }

 enum IntervalUnit: short { YEAR_MONTH, DAY_TIME}
+
+
+/// A type representing an absolute time difference, also called "timedelta" in
+/// some systems.
+///
+/// SQL systems support many varieties of interval types. The PostgreSQL
+/// interval type is a 16-byte type with microsecond resolution. Microsoft SQL
+/// Server has yet different interval type representations. Some other
+/// libraries, like NumPy, have a 64-bit integer-based timedelta type with time
+/// resolution metadata
+///
+/// We support two styles of intervals encoded in a 64-bit integer
+///
+/// - YEAR_MONTH, value indicates number of elapsed whole months
+///
+/// - DAY_TIME, value indicate absolute time offset according to the difference
+///   in UNIX time (i.e. excluding leap seconds). The resolution defaults to
+///   millisecond, but can be any of the other supported TimeUnit values as
+///   with Timestamp and Time types
+///
+/// References
+/// - https://www.postgresql.org/docs/9.6/static/datatype-datetime.html
+/// - https://docs.microsoft.com/en-us/sql/odbc/reference/appendixes/interval-data-types
 table Interval {
+  /// The kind of interval, YEAR_MONTH or DAY_TIME
+  ///
+  /// TODO(wesm): Should this be renamed to kind and change resolution to be
+  /// "unit" for consistency with the other temporal types?
Review comment:
I would be in favor of this.
   unit: IntervalUnit;
+
+  /// The unit of time resolution for DAY_TIME. If null, assumed to be
+  /// milliseconds
+  resolution: TimeUnit = MILLISECOND;
 }

 /// ----------------------------------------------------------------------
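To make the two interval styles in the schema comments above concrete, here is a minimal Python sketch of how a consumer might interpret a raw value. It is an illustration only, not part of this patch or of any Arrow API: `decode_interval` is a hypothetical helper, and the `TimeUnit` members are assumed to be the SECOND / MILLISECOND / MICROSECOND / NANOSECOND set used by Timestamp.

```python
from datetime import timedelta
from enum import Enum


class IntervalUnit(Enum):
    YEAR_MONTH = 0
    DAY_TIME = 1


class TimeUnit(Enum):
    # Assumed members; each value is the number of units per second.
    SECOND = 1
    MILLISECOND = 1_000
    MICROSECOND = 1_000_000
    NANOSECOND = 1_000_000_000


def decode_interval(value, unit, resolution=TimeUnit.MILLISECOND):
    """Hypothetical helper: interpret one raw interval value."""
    if unit is IntervalUnit.YEAR_MONTH:
        # The value counts elapsed whole months; resolution does not apply.
        years, months = divmod(value, 12)
        return (years, months)
    # DAY_TIME: the value is an elapsed-time count in `resolution` units.
    # timedelta has microsecond precision, so finer values are floored.
    micros = value * 1_000_000 // resolution.value
    return timedelta(microseconds=micros)
```

For example, `decode_interval(14, IntervalUnit.YEAR_MONTH)` yields one year and two months, while `decode_interval(1500, IntervalUnit.DAY_TIME)` uses the default millisecond resolution and yields 1.5 seconds.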
Review comment:
Based on prior threads, I'm not clear whether both types are 64-bit or only DAY_TIME, with YEAR_MONTH being 32-bit.
Review comment:
Arrow Java uses 4 byte vectors for IntervalYearVector and 8 byte vectors for IntervalDayVector.
We would have to use 64-bit for DAY_TIME and 32-bit for YEAR_MONTH
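The width split described in this comment can be sketched as a small lookup that a reader might use to size a values buffer. The widths follow the comment (4 bytes for YEAR_MONTH, 8 bytes for DAY_TIME); `byte_width` and `values_buffer_size` are hypothetical helpers, the 8 bytes are modelled as a single int64 for simplicity, and padding and alignment are ignored.

```python
import struct

# Little-endian struct formats per interval unit (illustrative assumption):
# YEAR_MONTH is one int32, DAY_TIME occupies 8 bytes (modelled as one int64).
INTERVAL_FORMATS = {"YEAR_MONTH": "<i", "DAY_TIME": "<q"}


def byte_width(unit):
    """Width in bytes of a single value for the given interval unit."""
    return struct.calcsize(INTERVAL_FORMATS[unit])


def values_buffer_size(unit, length):
    """Hypothetical: bytes needed for `length` values, ignoring padding."""
    return byte_width(unit) * length
```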
Review comment:
32 bits for YEAR_MONTH intervals is fine with me. The main purpose of this patch is to support interval resolutions other than milliseconds in DAY_TIME.
Review comment:
If we have two different widths, would it make sense to make two separate interval tables?
It might make the contract a little less confusing: YEAR_MONTH has a fixed resolution, DAY_TIME has a configurable one.
Review comment:
Agree with @emkornfield. The two interval types differ in both width and resolution. It's probably cleaner to create two new types and deprecate this one (but not delete it, to preserve backward compatibility).
Review comment:
Sorry for deleting last comment, I missed the one white line among all the green :(
@wesm do you agree with the approach? Do you have bandwidth to update the PR?
Review comment:
I can update this PR. I'd like to get @jacques-n or others to sign off on the format changes before it gets merged.
I don't think the current iteration of the metadata is implemented for IPC in Java, so this is not that disruptive.
Review comment:
Actually, looking at this file, it seems there are other types that have different widths based on metadata (e.g. date in seconds since epoch is 32 bits and in milliseconds 64 bits), so we can probably keep it as is.
I'm still ramping up on the Java side of things, but it looks like the easiest path forward that does not break the existing class/structure hierarchy or cause undue confusion is to add a new IntervalUnit (maybe called EPOCH?) above and denote DAY_TIME as "optional" in the standard, while still having it reflect the current Java definition (days and milliseconds stored contiguously as 2 int32s, https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/IntervalDayVector.java).
It appears YEAR_MONTH already reflects the definition here (a 32-bit integer representing months, https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/IntervalYearVector.java).
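For reference, the Java-style DAY_TIME layout mentioned above (days and milliseconds as two contiguous int32 values) can be sketched as follows. This is an illustration, not the Java implementation: the little-endian byte order and the days-first field order are assumptions here, and `pack_day_time` / `unpack_day_time` are hypothetical names.

```python
import struct
from datetime import timedelta

# Assumed layout: int32 days followed by int32 milliseconds, little-endian.
DAY_TIME_LAYOUT = struct.Struct("<ii")


def pack_day_time(days, millis):
    """Encode one DAY_TIME value as 8 bytes (two int32 fields)."""
    return DAY_TIME_LAYOUT.pack(days, millis)


def unpack_day_time(raw):
    """Decode 8 bytes back into a timedelta."""
    days, millis = DAY_TIME_LAYOUT.unpack(raw)
    return timedelta(days=days, milliseconds=millis)
```

Under this layout the value is fixed at millisecond resolution, which is why a separate unit (or a configurable `resolution` field) would be needed to carry other resolutions.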