Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Doctrine examples #17

Merged
merged 11 commits into from
Apr 23, 2018

Conversation

programarivm
Copy link
Contributor

  1. Extraload\Extractor\Doctrine\QueryExtractor is added
  2. examples/05_default_doctrine_statement_console.php is created
  3. README update

@programarivm
Copy link
Contributor Author

  • Just added a Doctrine PreparedQueryExtractor

Copy link
Owner

@umpirsky umpirsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job, I would recommand two things:

  • Do one row at a time.
  • Merge this two implementations into one.

Thanks!

README.md Outdated
@@ -21,9 +21,9 @@
Powerful ETL library.


## Example
## 1. Examples
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why numbers? :)

Copy link
Contributor Author

@programarivm programarivm Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay no problem :)

use Symfony\Component\Console\Output\ConsoleOutput;

$config = new \Doctrine\DBAL\Configuration();
$connectionParams = array(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can share $connectionParams between your examples, it will be easier to configure only once and be able to run all examples that way.

Copy link
Contributor Author

@programarivm programarivm Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's do this. Thanks.


$stmt->execute();

$this->data = $stmt->fetchAll();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should fetch inside extract(), as late as possible.

Copy link
Contributor Author

@programarivm programarivm Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your feedback.


$stmt->execute();

$this->data = $stmt->fetchAll();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For large data sets, and that's common in ETL, fetching all at once can be impossible to process and lead to out of memory issues, especially if doing heavy transformations. It's better if we can extract row by row, and only keep one row in memory at a time.

public function __construct(Connection $conn, string $sql)
{
$this->position = 0;
$this->data = $conn->query($sql)->fetchAll();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same applies here.

use Doctrine\DBAL\Connection;
use Extraload\Extractor\ExtractorInterface;

class QueryExtractor implements ExtractorInterface
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of this two classes is redundant. I'd refactor it to only QueryExtractor.

(Connection $conn, string $sql, array $values) will do the same job if $values = [].

Copy link
Contributor Author

@programarivm programarivm Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe my original suggestion brings a bit of awareness to the API user about the fact that they are using either queries or prepared queries.

The logic being run under the hood is a bit different. QueryExtractor runs $conn->query($sql); whereas PreparedQueryExtractor runs $conn->prepare($sql);.

@programarivm
Copy link
Contributor Author

programarivm commented Apr 22, 2018

Thanks for the review! QueryExtractor and PreparedQueryExtractor are merged now into AbstractExtractor if this makes more sense.

@programarivm
Copy link
Contributor Author

programarivm commented Apr 22, 2018

Just added one more Doctrine example using a MapFieldsTransformer -- btw I won't push any more examples to this PR until the MapFieldsTransformer example is reviewed.

Copy link
Owner

@umpirsky umpirsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we have one class doing both query with and without parameters?

$this->fields = $fields;
}

public function transform($data)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this transformer, but we can make it better by using https://symfony.com/doc/current/components/property_access.html and make it work with multi-dimensional arrays and objects. What do you think?

Also, it would be great to extract transformet in a separate PR.

@programarivm
Copy link
Contributor Author

  • The MapFieldsTransformer is temporarily removed from this PR
  • QueryExtractor and PreparedQueryExtractor are merged into QueryExtractor -- which is now a prepared query with 3 params with the third one being optional

Copy link
Owner

@umpirsky umpirsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern is that $this->data can become huge. But I guess we can optimize later.

Thanks!

@programarivm
Copy link
Contributor Author

Yeah, it makes sense. $this->data can always be bigger and bigger. Perhaps that is a problem that may be addressed by splitting huge queries into smaller chunks to multiprocess them in parallel.

@umpirsky umpirsky merged commit 6fe47da into umpirsky:master Apr 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants