Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Huge performance hit in my code? #2307

Closed
mithralaya opened this issue Sep 16, 2014 · 5 comments
Closed

Huge performance hit in my code? #2307

mithralaya opened this issue Sep 16, 2014 · 5 comments

Comments

@mithralaya
Copy link

I have a collection similar to below BSON. The whole table(baskets) count is 900k for about 60 stores. The largest store collection is 90k. I have a custom date filter for our dashboard to display some graphs and numbers.

/* 47 */
{
    "_id" : ObjectId("535ff14c2e441acf44708ec7"),
    "tip" : "0",
    "basketTransactionCash" : "2204",
    "basketTransactionCard" : "0",
    "completed" : ISODate("2014-04-07T14:35:17.000Z"),
    "consumerId" : null,
    "storeId" : 1,
    "basketId" : 210048,
    "basketProduct" : [ 
        {
            "_id" : ObjectId("535feffa2e441acf446facb7"),
            "name" : "Vanilla Spice Hot Chocolate",
            "productId" : 23,
            "basketProductInstanceId" : 838392,
            "quantity" : 1,
            "productModifiedPrice" : null,
            "productInstancePrice" : "633",
            "vatPercentage" : "0.2"
        }
    ],
    "created" : ISODate("2014-04-07T14:35:42.000Z"),
    "__v" : 0
}

/* 48 */
{
    "_id" : ObjectId("535ff14c2e441acf44708ede"),
    "tip" : "0",
    "basketTransactionCash" : "230",
    "basketTransactionCard" : "0",
    "completed" : ISODate("2014-04-07T14:41:51.000Z"),
    "consumerId" : 1,
    "storeId" : 1,
    "basketId" : 210072,
    "basketProduct" : [ 
        {
            "_id" : ObjectId("535feffa2e441acf446facf3"),
            "name" : "Melon",
            "productId" : 4544,
            "basketProductInstanceId" : 838430,
            "quantity" : 1,
            "productModifiedPrice" : null,
            "productInstancePrice" : "200",
            "vatPercentage" : "0"
        }, 
        {
            "_id" : ObjectId("535feffa2e441acf446facf4"),
            "name" : "30p",
            "productId" : 8496,
            "basketProductInstanceId" : 838431,
            "quantity" : 1,
            "productModifiedPrice" : null,
            "productInstancePrice" : "30",
            "vatPercentage" : "0"
        }
    ],
    "created" : ISODate("2014-04-07T14:42:16.000Z"),
    "__v" : 0
}

/* 49 */
{
    "_id" : ObjectId("535ff14c2e441acf44708ee2"),
    "tip" : "0",
    "basketTransactionCash" : "2204",
    "basketTransactionCard" : "0",
    "completed" : ISODate("2014-04-07T14:41:54.000Z"),
    "consumerId" : 2,
    "storeId" : 1,
    "basketId" : 210076,
    "basketProduct" : [ 

        {
            "_id" : ObjectId("535feffb2e441acf446fad02"),
            "name" : "Creamy Natural Yoghurt",
            "productId" : 69,
            "basketProductInstanceId" : 839800,
            "quantity" : 1,
            "productModifiedPrice" : null,
            "productInstancePrice" : "911.7",
            "vatPercentage" : "0.2"
        }, 
        {
            "_id" : ObjectId("535feffb2e441acf446fad03"),
            "name" : "Melon",
            "productId" : 4544,
            "basketProductInstanceId" : 839801,
            "quantity" : 1,
            "productModifiedPrice" : null,
            "productInstancePrice" : "200",
            "vatPercentage" : "0"
        }, 
        {
            "_id" : ObjectId("535feffb2e441acf446fad04"),
            "name" : "30p",
            "productId" : 8496,
            "basketProductInstanceId" : 839802,
            "quantity" : 1,
            "productModifiedPrice" : null,
            "productInstancePrice" : "30",
            "vatPercentage" : "0"
        }
    ],
    "created" : ISODate("2014-04-07T14:43:00.000Z"),
    "__v" : 0
}  

I have used native mongo driver to aggregate and reduce the result set and squeeze it to completed date hourly. The query I used in my nodejs application looks something similar to this.

var where = { "completed": {"$gte": fromDate, "$lt": toDate}, "storeId": storeId };


                        var resultArray = [];

                        var cursor = MongoBasket.collection.aggregate(
                            [
                                {
                                    "$match": where

                                },
                                {
                                    "$group": {
                                        "_id": {
                                            "year": {
                                                "$year": "$completed"
                                            },
                                            "month": {
                                                "$month": "$completed"
                                            },
                                            "day": {
                                                "$dayOfMonth": "$completed"
                                            },
                                            "hour": {
                                                "$hour": "$completed"
                                            }
                                        },
                                        "totalTip": {
                                            "$push": "$tip"
                                        },
                                        "basketTransactionCashTotal": {
                                            "$push": "$basketTransactionCash"
                                        },
                                        "basketTransactionCardTotal": {
                                            "$push": "$basketTransactionCard"
                                        },
                                        "consumerIds": {
                                            "$push": "$consumerId"
                                        },
                                        "storeId": {
                                            "$addToSet": "$storeId"
                                        },
                                        "basketIds": {
                                            "$push": "$basketId"
                                        },
                                        "basketProducts": {
                                            "$push": {
                                                "product": "$basketProduct",
                                                "associatedBasketId": "$basketId"
                                            }
                                        }
                                    }
                                },
                                {
                                    "$sort": {
                                        _id: 1
                                    }
                                }
                            ],
                            {
                                cursor: {
                                    batchSize: 2000000
                                },
                                allowDiskUse:true
                            }
                        );

                        // Use cursor as stream
                        cursor.on('data', function(data) {
                            resultArray.push(data);
                        });

                        cursor.on('end', function() {
                            var end = +new Date();
                            var executionTime_milliSeconds = end - req.startTime;
                            console.log(resultArray.length, JSON.stringify(resultArray).length, executionTime_milliSeconds);

                        });

I use native driver technique after the mongoose cursor failed to work(you can refer to my previous post in #2306).

Now I get results like

Aggregated hourly

{
        "totalTip" : "0",
        "basketTransactionCashTotal" : "4638",
        "basketTransactionCardTotal" : "0",
        "completed" : "2014-04-07 14",
        "consumerIds" : [null, 1, 2],
        "storeId" : 1,
        "basketIds" : [210048, 210072, 210076]
        "basketProduct" : [ 
            {
                "_id" : ObjectId("535feffa2e441acf446facf3"),
                "name" : "Melon",
                "productId" : 4544,
                "basketProductInstanceId" : 838430,
                "quantity" : 1,
                "productModifiedPrice" : null,
                "productInstancePrice" : "200",
                "vatPercentage" : "0"
            }, 
            {
                "_id" : ObjectId("535feffa2e441acf446facf4"),
                "name" : "30p",
                "productId" : 8496,
                "basketProductInstanceId" : 838431,
                "quantity" : 1,
                "productModifiedPrice" : null,
                "productInstancePrice" : "30",
                "vatPercentage" : "0"
            },
            {
                "_id" : ObjectId("535feffb2e441acf446fad02"),
                "name" : "Creamy Natural Yoghurt",
                "productId" : 69,
                "basketProductInstanceId" : 839800,
                "quantity" : 1,
                "productModifiedPrice" : null,
                "productInstancePrice" : "911.7",
                "vatPercentage" : "0.2"
            }, 
            {
                "_id" : ObjectId("535feffb2e441acf446fad03"),
                "name" : "Melon",
                "productId" : 4544,
                "basketProductInstanceId" : 839801,
                "quantity" : 1,
                "productModifiedPrice" : null,
                "productInstancePrice" : "200",
                "vatPercentage" : "0"
            }, 
            {
                "_id" : ObjectId("535feffb2e441acf446fad04"),
                "name" : "30p",
                "productId" : 8496,
                "basketProductInstanceId" : 839802,
                "quantity" : 1,
                "productModifiedPrice" : null,
                "productInstancePrice" : "30",
                "vatPercentage" : "0"
            },
            {
                "_id" : ObjectId("535feffa2e441acf446facb7"),
                "name" : "Vanilla Spice Hot Chocolate",
                "productId" : 23,
                "basketProductInstanceId" : 838392,
                "quantity" : 1,
                "productModifiedPrice" : null,
                "productInstancePrice" : "633",
                "vatPercentage" : "0.2"
            }
        ]
    }

Total record size: 90K documents.
Aggregated Size: 2897 documents
Time taken to execute the aggregated same raw query in mongo shell: 1300ms
Total string length for aggregate size: 40876933
Time taken to print the final pushed array: 79575ms(1.15 mins)
My internet speed at the time of querying: 70mbps

Please help me further optimise the query. Is my approach wrong? I am completely clueless.

Many thanks,

Karthik

@vkarpov15
Copy link
Collaborator

Hmm you may want to try reducing the batch size, its kinda huge. Also, do you have an index on completed?

@mithralaya
Copy link
Author

Hi vkarpov,

Yes I have index on completed. I need to think of changing the batch size.

@mithralaya
Copy link
Author

But when I run the same query on mongo shell, its really fast.

The streaming takes lot of time. Not sure if it is latency between servers or just something wrong with the code.

Many thanks,
karthik

@vkarpov15
Copy link
Collaborator

Are you using the same batch size with the shell? The tricky bit with the large batch size is that the node driver waits until it gets all the documents before it starts sending them to you. Furthermore, pushing all 90k documents into an array is also a fairly expensive operation, and defeats the point of streaming: with streaming, ideally you're processing one document at a time and then letting the garbage collector reclaim that memory. You could probably trim a fair bit off of your runtime by just printing in the on('data') handler :)

@mithralaya
Copy link
Author

Thank you so much for your help @vkarpov15

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants