Consistent interface for stream reading and writing tabular data (csv/xls/json/etc).
- supports various formats: csv/tsv/xls/xlsx/json/native/etc
- reads data from variables, the filesystem or the Internet
- streams data instead of using a lot of memory
- processes data via simple user processors
- saves data using the same interface
To get started:
$ pip install tabulator
Open a tabular stream from a csv source:
from tabulator import Stream
with Stream('path.csv', headers=1) as stream:
    print(stream.headers)  # prints the headers taken from row 1
    for row in stream:
        print(row)  # prints the list of row values
Stream takes the source argument in the form <scheme>://path/to/file.<format> and uses the corresponding Loader and Parser to open the tabular stream and start iterating over it. The user can also pass scheme and format explicitly as constructor arguments, and can force Tabulator to open the table with a chosen encoding by passing the encoding argument.
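To make the source convention concrete, here is a minimal sketch of how a scheme and a format could be derived from a source string when they are not passed explicitly. It is not Tabulator's own code: the function name detect_scheme_and_format and the 'file' fallback are assumptions made for illustration only.

```python
import os
from urllib.parse import urlparse

# Assumed fallback for sources without a scheme (illustration only).
DEFAULT_SCHEME = 'file'


def detect_scheme_and_format(source, scheme=None, format=None):
    """Guess (scheme, format) from a source string unless given explicitly.

    Hypothetical helper; explicit arguments always win over guesses.
    """
    parsed = urlparse(source)
    if scheme is None:
        # A bare path such as 'path.csv' has no scheme part.
        scheme = parsed.scheme or DEFAULT_SCHEME
    if format is None:
        # Take the extension of the path part, without the leading dot.
        extension = os.path.splitext(parsed.path)[1]
        format = extension[1:].lower() or None
    return scheme, format


print(detect_scheme_and_format('path.csv'))
print(detect_scheme_and_format('http://example.com/source.xls'))
print(detect_scheme_and_format('data.txt', format='csv'))
```

Passing format='csv' for 'data.txt' mirrors the case where the file extension does not describe the real content, which is when explicit arguments are most useful.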
In this example we use a context manager to call stream.open() on enter and stream.close() on exit:
- the stream can be iterated like a file-like object, returning row by row
- the stream can be iterated manually with the iter(keyed/extended) function
- the stream can be read into memory using the read(keyed/extended) function with a row count limit
- headers can be accessed via the headers property
- a sample of rows can be accessed via the sample property
- the stream pointer can be set back to the start via the reset method
- the stream can be saved to the filesystem using the save method
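The three row shapes mentioned in the list (plain, keyed and extended) can be illustrated without the library. The sketch below is an assumption-laden stand-in, not Tabulator's implementation: iter_rows and read_rows are hypothetical helpers, and the sample data is made up.

```python
from itertools import islice


def iter_rows(extended_rows, keyed=False, extended=False):
    """Yield rows as lists (default), dicts (keyed) or tuples (extended)."""
    for number, headers, row in extended_rows:
        if extended:
            yield (number, headers, row)
        elif keyed:
            yield dict(zip(headers, row))
        else:
            yield row


def read_rows(extended_rows, keyed=False, extended=False, limit=None):
    """Collect rows into a list, stopping after `limit` rows if given."""
    return list(islice(iter_rows(extended_rows, keyed, extended), limit))


# Made-up sample data in the (number, headers, row) shape.
headers = ['id', 'name']
source = [(2, headers, [1, 'english']), (3, headers, [2, 'chinese'])]

print(read_rows(source))
# [[1, 'english'], [2, 'chinese']]
print(read_rows(source, keyed=True, limit=1))
# [{'id': 1, 'name': 'english'}]
print(read_rows(source, extended=True, limit=1))
# [(2, ['id', 'name'], [1, 'english'])]
```

Because iter_rows is a generator, a limit stops the work early instead of loading every row first, which is the point of reading with a row count limit.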
A more expanded example:
from tabulator import Stream

def skip_even_rows(extended_rows):
    # Post-parse processor: keep only rows with an odd row number
    for number, headers, row in extended_rows:
        if number % 2:
            yield (number, headers, row)

stream = Stream('http://example.com/source.xls',
    headers=1, encoding='utf-8', sample_size=1000,
    post_parse=[skip_even_rows], sheet=1)
stream.open()
print(stream.sample)  # will print sample
print(stream.headers)  # will print headers list
print(stream.read(limit=10))  # will print 10 rows
stream.reset()
for keyed_row in stream.iter(keyed=True):
    print(keyed_row)  # will print row dict
stream.reset()  # rewind so the next loop starts from the first row again
for extended_row in stream.iter(extended=True):
    print(extended_row)  # will print (number, headers, row)
stream.reset()
stream.save('target.csv')
stream.close()
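The post_parse option in the example takes a list of processors, each a generator function over extended rows. How such a list might be chained can be sketched as follows. This is an illustration under assumptions, not the library's internals: apply_processors and strip_cells are hypothetical names, and the rows are made up.

```python
def skip_even_rows(extended_rows):
    # Same processor as in the example above: keep odd row numbers only.
    for number, headers, row in extended_rows:
        if number % 2:
            yield (number, headers, row)


def strip_cells(extended_rows):
    # Hypothetical second processor: trim whitespace around string cells.
    for number, headers, row in extended_rows:
        cleaned = [c.strip() if isinstance(c, str) else c for c in row]
        yield (number, headers, cleaned)


def apply_processors(extended_rows, processors):
    # Wrap the row iterator in each processor in turn. Nothing runs
    # until the final generator is consumed, so rows stay streamed.
    for processor in processors:
        extended_rows = processor(extended_rows)
    return extended_rows


headers = ['id', 'name']
rows = [
    (1, headers, [1, ' a ']),
    (2, headers, [2, ' b ']),
    (3, headers, [3, ' c ']),
]
result = list(apply_processors(iter(rows), [skip_even_rows, strip_cells]))
print(result)
# [(1, ['id', 'name'], [1, 'a']), (3, ['id', 'name'], [3, 'c'])]
```

Processors run in list order, so filtering before cleaning avoids doing work on rows that are dropped anyway.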
For the full list of options, see https://github.com/frictionlessdata/tabulator-py/blob/master/tabulator/stream.py#L17
Stream(source,
       headers=None,
       scheme=None,
       format=None,
       encoding=None,
       post_parse=None,
       sample_size=None,
       **options)
closed/open/close/reset
headers -> list
sample -> rows
iter(keyed/extended=False) -> (generator) (keyed/extended)row[]
read(keyed/extended=False, limit=None) -> (keyed/extended)row[]
save(target, format=None, encoding=None, **options)
exceptions
~cli
Please read the contribution guideline:
Thanks!