- Fixed
  - `BulkFactTable.__init__` now sets the attributes `keyrefs`, `measures`, and `all`. These attributes are required by the `FactTablePartitioner`.
- Note
  - This is the last version to actively support Python 2. Support for it will slowly be reduced as we continue to develop pygrametl.
- Added
  - `drawntabletesting`, a new module for testing ETL flows. The module makes it easy to define the preconditions and postconditions for the database as part of each test. This is done simply by "drawing" the tables and their contents using strings.
  - `AccumulatingSnapshotFactTable`, a new class supporting accumulating snapshot fact tables where facts can be updated as a process progresses.
  - `BatchFactTable.__init__` now optionally takes the argument `usemultirow`. When this argument is `True` (the default is `False`), batches are loaded using `execute` with a single `INSERT INTO name VALUES` statement instead of `executemany()`. (GitHub issue #19)
  - `closecurrent` method added to `SlowlyChangingDimension` to make it possible to set an end date for the most current version without adding a new version.
  - A (read-only) property `awaitingrows` added to `BatchFactTable` and `_BaseBulkloadable` to get the number of inserted rows waiting to be loaded into the database table. (GitHub issue #23)
- Changed
  - `SlowlyChangingDimension.scdensure` now checks if the newest version has its `toatt` set to a value different from `maxto` (if `toatt` is defined). This can happen from a call to `closecurrent` or a manual update. If it is the case, a new version will be added when `scdensure` is called even if no other differences are present.
  - Generators in `datasources` don't raise `StopIteration` anymore as required by PEP 479.
  - `__author__` and `__maintainer__` removed from all .py files.
  - `__version__` removed from all .py files except `pygrametl/__init__.py`. The version of pygrametl is thus now available as `pygrametl.__version__` and will be updated for every release.
- Fixed
  - Outdated information stating that type 1 slowly changing dimensions are not supported has been removed from the documentation. In addition, minor errors and inconsistencies have been corrected throughout the documentation. (GitHub issue #27)
  - Wrong use of paramstyle in `ConnectionWrapper.executemany` fixed.
  - A call to an incorrect method in `aggregators.Avg.finish()` fixed.
  - The `datespan()` function now checks whether `fromdate` and `todate` are strings before calling `.split()`. In addition, the function now uses `dict.items()` instead of `dict.iteritems()`, which is not supported in Python 3.
  - Incorrect quotation of identifiers in `SlowlyChangingDimension` fixed.
  - Missing key value of root when calling `getbykey` of `SnowflakedDimension` fixed.
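
The `usemultirow` entry above contrasts two ways of loading a batch. The following is a minimal sketch of that difference using only the standard library's `sqlite3` module, not pygrametl's own code. The function `insert_batch`, the table `facts`, and its columns are hypothetical names chosen for illustration.

```python
import sqlite3


def insert_batch(cursor, table, columns, rows, usemultirow=False):
    """Insert a batch of row tuples into table (hypothetical helper).

    table and columns are interpolated into the SQL text, so they must come
    from trusted code, never from user input.
    """
    if not rows:
        return
    collist = ", ".join(columns)
    one_row = "(" + ", ".join("?" for _ in columns) + ")"
    if usemultirow:
        # One statement: INSERT INTO t (a, b) VALUES (?, ?), (?, ?), ...
        all_rows = ", ".join(one_row for _ in rows)
        sql = "INSERT INTO {} ({}) VALUES {}".format(table, collist, all_rows)
        cursor.execute(sql, [value for row in rows for value in row])
    else:
        # One parameterized statement executed once per row.
        sql = "INSERT INTO {} ({}) VALUES {}".format(table, collist, one_row)
        cursor.executemany(sql, rows)


connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE facts (bookid INTEGER, sale INTEGER)")

batch = [(1, 10), (2, 20), (3, 30)]
insert_batch(cursor, "facts", ["bookid", "sale"], batch, usemultirow=True)
insert_batch(cursor, "facts", ["bookid", "sale"], batch, usemultirow=False)

cursor.execute("SELECT COUNT(*), SUM(sale) FROM facts")
row_count, total = cursor.fetchone()
```

Both calls store the same rows. Whether the multi-row form is faster depends on the driver and the database, and databases typically cap the number of parameters per statement, so very large batches may need to be split.
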
- Added
  - `PandasSource`, a new class that, given a Pandas `DataFrame`, acts as a data source. Each row of the `DataFrame` is returned as a `dict` that can be loaded into a data warehouse using `tables`.
  - `MappingSource`, a new class that, given a data source and a dictionary of columns to callables, maps the callables over each element of the specified column before returning the row.
- Changed
  - `SlowlyChangingDimension` improved to make `versionatt` optional. (GitHub issue #12. Thanks to HereticSK)
  - `ConnectionWrapper.__init__` now optionally takes the argument `copyintonew`. When this argument is `True` (the default is `False`), a new `dict` with parameters is created when a statement is executed. The new `dict` only holds the key/value pairs needed by the statement. This is to avoid `DatabaseError: ORA-01036: illegal variable name/number` with cx_Oracle. (GitHub issue #9)
  - First argument to `TypedCSVSource.__init__` renamed from `csvfile` to `f` to be consistent with documentation and `CSVSource`.
- Fixed
  - `ConnectionWrapper.execute` does not pass the argument `arguments` to the underlying cursor's execute method if `arguments` is `None`. Some drivers raise an `Error` if `None` is passed, some don't.
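
The `MappingSource` entry above describes mapping callables over selected columns of each row. As a rough illustration of that idea, and not pygrametl's actual implementation, the sketch below uses a plain generator. The name `mapping_source` and the sample rows are assumptions made for this example.

```python
def mapping_source(source, callables):
    """Yield a copy of each row with callables[column] applied to row[column].

    source is any iterable of dicts; callables maps column names to functions.
    """
    for row in source:
        mapped = dict(row)  # copy so the caller's row is left untouched
        for column, function in callables.items():
            mapped[column] = function(mapped[column])
        yield mapped


rows = [{"name": " Ann ", "age": "42"}, {"name": "Bob", "age": "7"}]
mapped_rows = list(mapping_source(rows, {"name": str.strip, "age": int}))
```

Columns without an entry in the dictionary pass through unchanged, and a column named in the dictionary but missing from a row raises a `KeyError` in this sketch.
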
- Added
  - `TypedCSVSource`, a new class that reads a CSV file (by means of `csv.DictReader`) and performs user-specified casts (or other function calls) on the values before returning the rows.
  - `definequote` function added to enable quoting of SQL identifiers in all tables.
  - `getdbfriendlystr` function added to enable conversion of values into strings that are accepted by an RDBMS. Boolean values become `0` or `1`, and `None` values can be replaced by another value.
  - All Bulkloadables now accept the argument `strconverter` to their `__init__` methods. This should be a function that converts values into strings that are written to a temporary file and eventually bulkloaded. The default value is the new `getdbfriendlystr`.
  - `SlowlyChangingDimension` can now optionally be given the argument `useorderby` when instantiated. If `True` (the default), the SQL used by `lookup` uses `ORDER BY` (this is the same behaviour as before). If `False`, `ORDER BY` is not used and the SQL used by `lookup` will fetch all versions of the member and then find the key value for the newest version with Python code. For some systems, this can lead to significant performance improvements.
- Changed
  - Generator used in `ConnectionWrapper.fetchalltuples` to reduce memory consumption. (Thanks to Alexey Kuzmenko)
  - `SlowlyChangingDimension` can sometimes avoid deleting from the cache on updates, now checked in the same way as in `CachedDimension`.
  - `rowfactory` now tries to use `fetchmany`. (Suggested by Alexey Kuzmenko)
  - `_BaseBulkloadable` now has the method `insert` while the methods `_insertwithnull` and `_insertwithoutnull` have been removed (and subclasses do thus not pick one of them at runtime). The `insert` method will always call `strconverter` (see above) no matter if a `nullsubst` has been specified or not.
  - `_BaseBulkloadable` will now raise a `TypeError` if no `nullsubst` is specified and a `None` value is present. Before this change, the `None` value would silently be converted into the string `'None'`. Users must now give a `nullsubst` argument when instantiating a subclass of `_BaseBulkloadable` that should be able to handle `None` values.
  - `SubprocessFactTable` has been changed similarly to `_BaseBulkloadable` and does now define `insert` which uses `strconverter`. Thus `_insertwithnull` and `_insertwithoutnull` have been removed.
  - `getunderlyingmodule` has been changed and now tries different possible module names and looks for `'paramstyle'` and `'connect'`.
  - `ConnectionWrapper` now uses `getunderlyingmodule` in `__init__` when trying to determine the paramstyle to use.
- Fixed
  - Using `cachesize=0` with `SlowlyChangingDimension` no longer causes a crash.
  - Problem with double use of namemappings in `_before_update` in `CachedDimension` and `SlowlyChangingDimension` fixed. (Thanks to Alexey Kuzmenko)
  - Problem with `rowfactory` only returning one row fixed. (Thanks to Alexey Kuzmenko)
  - Problem with `JDBCConnectionWrapper.rowfactory` returning dictionaries with incorrect keys fixed. (GitHub issue #5)
  - Problem with `TypeOneSlowlyChangingDimension` caching `None` after an update if a namemapping mapped to an attribute not in the update row fixed.
  - Problem in `__init__.copy` fixed.
  - Namemapping is now used when comparing measure values in `FactTable.ensure` with `compare=True`.
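
The entries above on `getdbfriendlystr`, `strconverter`, and `nullsubst` describe how values become strings before bulk loading. The sketch below folds those described behaviours into a single hypothetical function to show how they fit together. It is not pygrametl's code, and the name `to_db_string` and its signature are assumptions.

```python
def to_db_string(value, nullsubst=None):
    """Convert a value to a string suitable for a bulk-load file (sketch).

    Booleans become "1" or "0". None is replaced by nullsubst, and a
    TypeError is raised if None is encountered without a nullsubst.
    """
    if value is None:
        if nullsubst is None:
            raise TypeError("None value found but no nullsubst was given")
        return nullsubst
    # bool must be tested explicitly: str(True) would give "True".
    if isinstance(value, bool):
        return "1" if value else "0"
    return str(value)


# Build one tab-separated line, using \N as the substitute for None.
values = (7, True, None, "Ann")
line = "\t".join(to_db_string(v, nullsubst="\\N") for v in values)
```

Raising on an unexpected `None` rather than writing the text `None` into the file means a missing `nullsubst` is noticed during loading instead of showing up later as bad data.
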
- Note
  - This is the last version to support versions of Python 2 older than 2.7.
- Added
  - `TypeOneSlowlyChangingDimension`, a new class that adds support for efficient loading and updating of a type 1 exclusive slowly changing dimension.
  - `CachedBulkLoadingDimension`, a new class that supports bulk loading a dimension without requiring the caching of all rows that are loaded.
  - Alternative implementation of `FIFODict` based on an `OrderedDict`. (Thanks to Alexey Kuzmenko)
  - Dimension classes with finite caches can now be prefilled more efficiently using the `FETCH FIRST` SQL statement for increased performance.
  - Examples on how to perform bulk loading in MySQL, Oracle Database, and Microsoft SQL Server. (Thanks to Alexey Kuzmenko)
- Changed
  - It is now verified that `lookupatts` is a subset of all attributes.
  - All method calls to a superclass constructor now use named parameters.
  - Made cosmetic changes, and added additional information about how to ensure cache coherency between pygrametl and the database to existing docstrings.
  - The entire codebase was updated to adhere more closely to PEP 8 using autopep8.
- Fixed
  - Using `dependson` no longer causes crashes due to multiple loads of a table. (Thanks to Alexey Kuzmenko)
  - Using `defaultidvalue` no longer causes `Dimension.ensure` to fail to insert correctly, or makes `CachedDimension.ensure` produce duplicates. (Thanks to Alexey Kuzmenko)
  - Using `SlowlyChangingDimension` with the cache disabled no longer causes a crash in `SlowlyChangingDimension.scdensure`.
  - Using `BulkDimension`, `CachedBulkDimension` or `BulkFactTable` with `tempdest` and `usefilename` no longer causes a crash in `_BaseBulkloadable._bulkloadnow`.
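
The entry above mentions a `FIFODict` built on an `OrderedDict`. To illustrate the general technique, here is a small first-in-first-out cache sketch. The class name `FIFOCache` and its interface are hypothetical and are not taken from pygrametl.

```python
from collections import OrderedDict


class FIFOCache:
    """A bounded mapping that evicts the oldest inserted key when full."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("size must be at least 1")
        self._size = size
        self._data = OrderedDict()

    def __setitem__(self, key, value):
        # Only a genuinely new key can push the cache over its limit.
        if key not in self._data and len(self._data) >= self._size:
            self._data.popitem(last=False)  # drop the oldest inserted entry
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


cache = FIFOCache(2)
cache["a"] = 1
cache["b"] = 2
cache["a"] = 10  # updating keeps "a" in its original (oldest) position
cache["c"] = 3   # the cache is full, so the oldest key "a" is evicted
```

Because `OrderedDict` remembers insertion order and updating an existing key does not move it, `popitem(last=False)` always removes the entry that was inserted first.
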
- Fixed
  - `SnowflakedDimension` no longer crashes due to `levellist` not being a list before the length of it is computed.
  - `FactTable` now inserts the correct number of commas into the SQL statements used for inserting rows, independent of the value of `keyrefs`.
- Fixed
  - Using other parameter styles than `pyformat` no longer causes a crash in `ConnectionWrapper`.
- Added
  - A new quick start guide was added to the documentation.
  - Added code examples for all classes in pygrametl except `Steps`.
  - pygrametl now officially supports Python 2.6.X, Python 2.7.X, Python 3, Jython 2.5.X and Jython 2.7.X.
  - `BulkDimension`, a new class that supports bulk loading of dimension tables.
  - `_BaseBulkloadable` with common functionality for `BulkFactTable` and `BulkDimension`.
  - `SQLSource` can now pass parameters to the cursor's `execute` function.
- Fixed
  - Importing everything from `tables` using a wildcard no longer causes a crash.
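
The `SQLSource` entry above refers to passing parameters through to the cursor's `execute` function. The sketch below shows the underlying DB-API pattern with the standard library's `sqlite3` module. It is not pygrametl's `SQLSource`; the generator `sql_rows` and the `book` table are hypothetical names for this example.

```python
import sqlite3


def sql_rows(connection, query, parameters=()):
    """Run query with parameters and yield each result row as a dict."""
    cursor = connection.cursor()
    # The driver binds the parameters, so values are never pasted into the SQL.
    cursor.execute(query, parameters)
    names = [column[0] for column in cursor.description]
    for values in cursor:
        yield dict(zip(names, values))


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE book (title TEXT, year INTEGER)")
connection.executemany(
    "INSERT INTO book VALUES (?, ?)",
    [("A", 1999), ("B", 2005), ("C", 2010)],
)

recent = list(
    sql_rows(
        connection,
        "SELECT title, year FROM book WHERE year > ? ORDER BY year",
        (2000,),
    )
)
```

The `?` placeholder is the `qmark` parameter style used by `sqlite3`; other drivers may expect a different style such as `pyformat`.
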
- Added
  - Created a PyPI package and uploaded it to pypi.python.org/project/pygrametl.
  - Added code examples for some of the classes in pygrametl.
- Changed
  - Documentation is now written in reStructuredText and compiled using Sphinx.