This repository has been archived by the owner on Jul 20, 2023. It is now read-only.

asyncBatchAnnotateFiles filename output concatenates "output-x-to-x.json" #295

Closed
baslentfert opened this issue Dec 14, 2018 · 5 comments
Labels
api: vision Issues related to the googleapis/nodejs-vision API. 🚨 This issue needs some love. triage me I really want to be triaged.

Comments

@baslentfert

baslentfert commented Dec 14, 2018

Environment details

  • OS: windows 10
  • Node.js version: v9.2.0
  • npm version: 6.5.0
  • @google-cloud/vision version: ^0.23.0

Steps to reproduce

// Setup not shown in the original snippet. The client version is the one named
// in the follow-up comment below; the bucket name is a placeholder.
const vision = require('@google-cloud/vision').v1p2beta1;
const client = new vision.ImageAnnotatorClient();
const bucketName = 'my-bucket';

function processFilename(fileName) {
    // Path to the PDF file within the bucket
    // let gcsSourceUri = `gs://${bucketName}/pdfs/${fileName}`;
    let gcsSourceUri = `gs://${bucketName}/${fileName}`;
    // Destination as originally written; this is the value the issue is about
    let gcsDestinationUri = `gs://${bucketName}/${fileName}.json`;

    let inputConfig = {
        // Supported mime_types are: 'application/pdf' and 'image/tiff'
        mimeType: 'application/pdf',
        gcsSource: {
            uri: gcsSourceUri,
        },
    };
    let outputConfig = {
        gcsDestination: {
            uri: gcsDestinationUri,
        },
    };
    // let features = [{ type: 'DOCUMENT_TEXT_DETECTION', model: "builtin/latest" }];
    let features = [{ type: 'DOCUMENT_TEXT_DETECTION' }];
    let request = {
        requests: [
            {
                inputConfig: inputConfig,
                features: features,
                outputConfig: outputConfig,
            },
        ],
    };

    client
        .asyncBatchAnnotateFiles(request)
        .then(results => {
            const operation = results[0];
            // Get a Promise representation of the final result of the job
            return operation.promise();
        })
        .then(filesResponse => {
            // console.log(JSON.stringify(filesResponse));
            // The response echoes back the destination URI that was passed in
            let destinationUri = filesResponse[0].responses[0].outputConfig.gcsDestination.uri;
            console.log('Json saved to: ' + destinationUri);
        })
        .catch(error => {
            // Handles failures from both the request and the long-running operation
            console.log(error);
        });
}

For example, with the input filename:
aabb.pdf
the output is written to:
aabb.pdf.jsonoutput-1-to-1.json

(if the PDF contains one page)

Thanks!

@JustinBeckwith JustinBeckwith added the triage me I really want to be triaged. label Dec 15, 2018
@baslentfert
Author

baslentfert commented Dec 15, 2018

I forgot to mention that I used
const vision = require('@google-cloud/vision').v1p2beta1;

When I use:
const vision = require('@google-cloud/vision').v1;

I still get the extra filename "add-on" jsonoutput-1-to-1.json,
but I am not sure whether it is "jsonoutput-1-to-1.json" or "output-1-to-1.json" that is added to the given filename.

I made a screenshot:
https://www.lentfert.net/google.cloud-example.jpg

In the code above I do not add the extra string "jsonoutput-1-to-1.json" myself:
let gcsSourceUri = `gs://${bucketName}/${fileName}`;
let gcsDestinationUri = `gs://${bucketName}/${fileName}.json`;

@JustinBeckwith JustinBeckwith added the 🚨 This issue needs some love. label Dec 19, 2018
@nnegrey
Contributor

nnegrey commented Dec 27, 2018

Ah, looks like that code needs to be updated.
Should be:

const gcsDestinationUri = `gs://${bucketName}/

And the output-1-to-1.json is expected output.
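
To illustrate why the reported name looks concatenated, here is a minimal sketch. It assumes, based only on what is described in this thread, that the service appends "output-<first>-to-<last>.json" directly to whatever destination URI it is given. The helper outputObjectUri and the bucket name my-bucket are hypothetical and not part of the library.

```javascript
// Hypothetical helper: models the naming seen in this thread, where
// "output-<first>-to-<last>.json" is appended directly to the destination URI.
function outputObjectUri(destinationUri, firstPage, lastPage) {
  return `${destinationUri}output-${firstPage}-to-${lastPage}.json`;
}

const bucketName = 'my-bucket'; // placeholder bucket name
const fileName = 'aabb.pdf';

// Destination written as a file name: the suffix is glued onto it.
console.log(outputObjectUri(`gs://${bucketName}/${fileName}.json`, 1, 1));
// gs://my-bucket/aabb.pdf.jsonoutput-1-to-1.json

// Destination written as a prefix ending in "/": the output lands under it.
console.log(outputObjectUri(`gs://${bucketName}/${fileName}/`, 1, 1));
// gs://my-bucket/aabb.pdf/output-1-to-1.json
```

Ending the destination with a slash is what turns it into a folder-like prefix, which matches the suggested fix above.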

@nnegrey
Contributor

nnegrey commented Dec 27, 2018

#300

@nnegrey nnegrey self-assigned this Dec 27, 2018
@nnegrey nnegrey closed this as completed Dec 27, 2018
@mickdekkers

@nnegrey just a heads up, the docs here still state that GcsDestination can represent a single file.

Also, I'm not sure if this is the best place to ask, but what would be the best way to determine the final output location of the JSON file(s)? The AsyncBatchAnnotateFilesResponse only seems to contain the OutputConfig you pass to asyncBatchAnnotateFiles (so just the gs://${bucketName}/, no gs://${bucketName}/output-x-to-x.json). Is scanning the bucket with the Storage API's Bucket.getFiles method the recommended way?
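
For reference, the bucket scan asked about here could look roughly like the sketch below. It is not a confirmed recommendation: parseGcsUri and listOutputFiles are hypothetical helpers, the file-name pattern is inferred from the names reported in this thread, and the bucket argument is assumed to behave like a Bucket from @google-cloud/storage (its getFiles({prefix}) resolves to an array whose first element is the list of files).

```javascript
// Split "gs://bucket/some/prefix/" into its bucket name and object prefix.
function parseGcsUri(uri) {
  const match = /^gs:\/\/([^/]+)\/?(.*)$/.exec(uri);
  if (!match) {
    throw new Error(`Not a gs:// URI: ${uri}`);
  }
  return { bucketName: match[1], prefix: match[2] };
}

// List the JSON result objects under a prefix.
// `bucket` is expected to expose getFiles({prefix}) like a Storage Bucket.
async function listOutputFiles(bucket, prefix) {
  const [files] = await bucket.getFiles({ prefix });
  return files
    .map(file => file.name)
    // Keep only names that follow the pattern seen in this thread (assumption).
    .filter(name => /output-\d+-to-\d+\.json$/.test(name));
}

// Usage sketch (needs credentials, so it is left commented out):
// const { Storage } = require('@google-cloud/storage');
// const { bucketName, prefix } = parseGcsUri(destinationUri);
// listOutputFiles(new Storage().bucket(bucketName), prefix).then(console.log);
```

Passing the bucket in as an argument keeps the helper easy to try against a stub object before pointing it at a real bucket.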

@nnegrey
Contributor

nnegrey commented Jan 18, 2019

Ah, good catch, let me look into that. I wonder if you have to specify the batch_size as 1 in the OutputConfig to use a filename.

Yeah, because it may end up splitting your output into multiple files due to size, you'll have to check what's created. To simplify that check, it's recommended to use a prefix such as gs://${bucketName}/prefix/ so that you know only your output will be there. Less searching that way.
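
A minimal sketch of the prefix approach, under stated assumptions: buildOutputConfig and expectedOutputNames are hypothetical helpers, "ocr-output/" is an arbitrary folder name, and the expected names only hold if the service does not split files further because of size. batchSize is the Node.js spelling of the batch_size field in OutputConfig; whether setting it to 1 allows a single file name is not confirmed in this thread.

```javascript
// Hypothetical helper: builds an outputConfig whose destination is a prefix
// unique to one input file, so only that job's results appear under it.
function buildOutputConfig(bucketName, fileName, batchSize) {
  const config = {
    gcsDestination: {
      // "ocr-output/" is an arbitrary folder name chosen for this sketch.
      uri: `gs://${bucketName}/ocr-output/${fileName}/`,
    },
  };
  if (batchSize !== undefined) {
    // Maximum number of page responses written to each output JSON file.
    config.batchSize = batchSize;
  }
  return config;
}

// Names to expect when no extra size-based splitting occurs (assumption).
function expectedOutputNames(pageCount, batchSize) {
  const names = [];
  for (let first = 1; first <= pageCount; first += batchSize) {
    const last = Math.min(first + batchSize - 1, pageCount);
    names.push(`output-${first}-to-${last}.json`);
  }
  return names;
}

console.log(buildOutputConfig('my-bucket', 'aabb.pdf', 2).gcsDestination.uri);
// gs://my-bucket/ocr-output/aabb.pdf/
console.log(expectedOutputNames(5, 2).join(', '));
// output-1-to-2.json, output-3-to-4.json, output-5-to-5.json
```

Because the actual split can differ, the computed names are best treated as a hint, and listing the objects under the prefix remains the reliable check.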

@google-cloud-label-sync google-cloud-label-sync bot added the api: vision Issues related to the googleapis/nodejs-vision API. label Jan 31, 2020
4 participants