Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit collections #10

Open
zhil opened this issue May 15, 2017 · 13 comments
Open

Limit collections #10

zhil opened this issue May 15, 2017 · 13 comments

Comments

@zhil
Copy link

zhil commented May 15, 2017

In order to generate fixtures from live data in symfony2 I use trappar/AliceGeneratorBundle and code like

$fixturesContext = FixtureGenerationContext::create()->setMaximumRecursion(2);
$fixtureGenerator = $this->get('trappar_alice_generator.fixture_generator');
file_put_contents("data.yml",$fixtureGenerator->generateYaml($someEntity,$fixturesContext));

The problem is - when $someEntity have large collections - generator just hang up.

I didnt founded solution, so I have added simple wrapper

<?php
namespace AppBundle\Test;

use Doctrine\Common\Collections\Criteria;

class FixturesCollection
{
    public static function limitCollection($entity, $collectionName, $limit)
    {
        $refMatchTeams = new \ReflectionProperty($entity,$collectionName);

        $criteria = Criteria::create()->setMaxResults($limit);
        $refMatchTeams->setAccessible(true);
        $refMatchTeams->setValue($entity,$refMatchTeams->getValue($entity)->matching($criteria));
    }
}

And use it like

$fixturesContext = FixtureGenerationContext::create()->setMaximumRecursion(2);
$fixtureGenerator = $this->get('trappar_alice_generator.fixture_generator');
FixturesCollection::limitCollection($someEntity, "largeCollection1",5);
FixturesCollection::limitCollection($someEntity, "largeCollection2",1);
file_put_contents("data.yml",$fixtureGenerator->generateYaml($someEntity,$fixturesContext));

I have few questions

  1. Do trappar/AliceGenerator already have solution for this problem? addPersistedObjectConstraint do not works, because large collections are still loaded and everything is extremely slow.
  2. If library doesnt have solution - I would like to create a PR. Do you have any ideas/suggestions about where should I put this snippet in bundle? Class name ideas? It would be nice to add some kind of "maximumCollectionSize" to context and apply it to all collections in context - what do you think?
@zhil
Copy link
Author

zhil commented May 15, 2017

@trappar I have added WIP PR - please check #11

not sure about default value for maximumCollectionChilds. setting value would change current behavior, but probably it worth being changed.

@trappar
Copy link
Owner

trappar commented May 15, 2017

This is already handled in a couple ways, but I'm sure there is room for improvement. See Limiting Recursion, and specifically Limiting Recursion With Object Constraints.

This doesn't limit the total entities returned, like what you're suggesting, but that kind of arbitrary limiting is probably not desirable default behavior since it could be very confusing. I'll have a look at it though and get back to you.

@zhil
Copy link
Author

zhil commented May 15, 2017

@trappar using addPersistedObjectConstraint do not prevent large collections from being loaded. It create huge performance problem.

Please check PR - I just created dump for few GB database with 40 entities within a seconds. It have all entities and relations dumped. :)

agree, default limiting could be confusing. maximumCollectionChilds is not set by default.

@zhil
Copy link
Author

zhil commented May 15, 2017

@trappar PR does not work correctly - generated yaml have null value for limited collections. Working on patch now.

zhil added a commit to zhil/AliceGenerator that referenced this issue May 15, 2017
@zhil
Copy link
Author

zhil commented May 15, 2017

@trappar fixed

@zhil
Copy link
Author

zhil commented May 26, 2017

@trappar what do you think about this PR?

@trappar
Copy link
Owner

trappar commented May 26, 2017

I'm not sure. I feel like the idea is somewhat flawed. It limits results to totally arbitrary sets of entities - just whatever comes first. A similar but more controlled style of limiting can already be done by just constraining all the desired objects. So you can already say to return a specific set of comments rather than returning just the first X number of them. There's also the ignore annotation/metadata which can be used to arbitrarily prevent crawling.

Also, the way you implemented it doesn't quite fit in with the rest of the library. I went ahead and implemented it myself just to see how difficult it would be. It's a little sloppy and requires more testing, but here's what I came up with: db84b4e

This adds three methods to the FixtureGenerationContext: setMaximumObjects, setMaximumObjectsPerType, and setMaximumObjectsForType. What do you think?

@zhil
Copy link
Author

zhil commented May 26, 2017

well, main goal of this PR - add easy way for creating test data sample from very large database.
It could be either dump of all entity types for first tests (no matter what entities would be selected) OR dump of buggy data for isolated tests.

For example, I have an Api with tons of entities and GBs of data. Whenever I get some bug, like "in case X api provide wrong output" - I simply add test fixtures dump right after buggy data set is loaded. After that I cleanup generated file and got yml file for isolated tests within a few minutes.

Right now I use it like this

       // we have $team, $match, $someTrickyExtraDataWithTonsOfDependencies already preloaded
       // even using maximumRecursion=2 hung-out script because of tons of relations

        CollectionHandler::limitCollection($team,"match",250); // only for this team we will put in fixtures 250 matches
        CollectionHandler::limitCollection($match,"incidents",1000); // only for this match we will load 1000 incidents

        $fixturesContext = FixtureGenerationContext::create()
            ->setMaximumRecursion(3)
            ->setMaximumCollectionChilds(3) // all not listed collections will have only 3 childs
            ->setEntityCollectionLimit('Name\Space\Match',5) // by default only 5 matches would be loaded for collections
             ->setEntityCollectionLimit('Name\Space\Incident',10) // by default only 10 incidents would be loaded for collections
        ;
        $fixtureGenerator = $c->get('trappar_alice_generator.fixture_generator');
        file_put_contents(
            "data.yml",
            $fixtureGenerator->generateYaml(
[$team,$match,someTrickyExtraDataWithTonsOfDependencies],
                $fixturesContext
            )
        );

Looks like your commit is adding similar functionality, I dont really get how to use it though :)

@trappar
Copy link
Owner

trappar commented May 26, 2017

I see what you're doing. The code I came up with allows you to designate a total object limit, and a per-entity limit, but it doesn't do anything about limiting the number of entities returned on a per-relation basis like what you're doing (if I'm understanding what you're doing right). I get that in your case the database you're working with is massive and you don't want to just dump everything, but in my experience it's better to cherry pick specific test cases that you want using object constraints than it is to get a random sampling of the whole database - like what you're doing. If you just queried for a handful of useful matches/incidents and set those as object constraints then none of this would be an issue.

I don't feel like your changes are ready to be merged as is. I'd like to see the functionality entirely controlled by setting values in the FixtureGenerationContext, and I'd want to make sure that ValueVisitor wasn't visiting objects unnecessarily. I'm not 100% sure, but I'm guessing in your code that some objects get visited even if they aren't added to the results, which would be a performance problem.

You can try to make those changes if you want, or just keep using your fork. For now I'll just leave this issue/PR/my branch open.

@zhil
Copy link
Author

zhil commented May 29, 2017

Using own fork would be problematic - didnt founded any easy ways to load it in composer with "minimum-stability": "stable"
So, I would try to make this PR acceptable :)

regarding

I'd like to see the functionality entirely controlled by setting values in the FixtureGenerationContext

Are you about CollectionHandler::limitCollection( function? FixtureGenerationContext seems to be wrong place for it - should I place in some new helper class?

I'd want to make sure that ValueVisitor wasn't visiting objects unnecessarily. I'm not 100% sure, but
I'm guessing in your code that some objects get visited even if they aren't added to the results, which
would be a performance problem.
I using this code now and its very fast.
Collections are limited before ValueVisitor actually read them. Can you please give me more details about which one case are not covered?

zhil added a commit to zhil/AliceGenerator that referenced this issue May 30, 2017
@zhil
Copy link
Author

zhil commented May 30, 2017

I have located another approach of generating test data. Something like

 // api endpoint preload needed data
         $fixturesContext = FixtureGenerationContext::create()
            ->setMaximumRecursion(0)
        ;
        $entities = [];
        foreach ($this->getDoctrine()->getManager()->getUnitOfWork()->getIdentityMap() as $entityType) {
            $entities = array_merge($entities,$entityType);
        }
        file_put_contents("dump.yml",$this->get("trappar_alice_generator.fixture_generator")->generateYaml($entities,$fixturesContext));

Its pretty simple and dump data fixture, which I actually need.

The only problem - even with setMaximumRecursion(0) CollectionHandler still fetch all collections from database, which make script very slow (a couple of minutes)

with setMaximumRecursion(-1) only first entity is dumped, all other are skipped.

So, I added simple property, which add ability to do not preload from database collections, which was not used before.

Its in PR https://github.com/trappar/AliceGenerator/pull/13/files

With that PR generator works in less that a seconds (taked few minutes without that PR)

Let me know if its acceptable.

zhil added a commit to zhil/AliceGenerator that referenced this issue May 30, 2017
zhil added a commit to zhil/AliceGenerator that referenced this issue Jul 8, 2017
@zhil
Copy link
Author

zhil commented Jul 8, 2017

@trappar what do you think about second approach? Right not its very hard to build "default" fixtures set for large entities count.

zhil added a commit to zhil/AliceGenerator that referenced this issue Jul 8, 2017
@zhil
Copy link
Author

zhil commented Jul 27, 2017

@trappar can you please check https://github.com/trappar/AliceGenerator/pull/13/files and let me know if its acceptable?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants