diff --git a/source/applications/gridfs.txt b/source/applications/gridfs.txt
new file mode 100644
index 00000000000..db3429f844b
--- /dev/null
+++ b/source/applications/gridfs.txt
@@ -0,0 +1,230 @@
+.. index:: GridFS
+
+======
+GridFS
+======
+
+.. default-domain:: mongodb
+
+:term:`GridFS` is a specification for storing and retrieving files that
+exceed the :term:`BSON`-document :ref:`size limit
+` of 16MB.
+
+Instead of storing a file in a single document, GridFS divides a file
+into chunks and stores each of those chunks as a separate document. By
+default GridFS limits chunk size to 256 kilobytes. GridFS uses two collections
+to store files. One collection stores the file chunks, and the other stores
+file metadata.
+
+When you query for a file stored through GridFS, GridFS reassembles the chunks
+as needed. You can perform range queries on files stored through GridFS.
+You can also access information from arbitrary sections of files, for
+example skipping into the middle of a video.
+
+GridFS is useful not only for storing files that exceed 16MB but also
+for storing any files that you want to access without having to load the
+entire file into memory. For more information on when to use GridFS, see
+:ref:`faq-developers-when-to-use-gridfs`.
+
+.. index:: GridFS; initialize
+.. _gridfs-implement:
+
+Implement GridFS
+----------------
+
+To store and retrieve files using :term:`GridFS`, use either of the following:
+
+- A MongoDB driver. See the :doc:`drivers`
+  documentation for information on using GridFS with your driver.
+
+- The :program:`mongofiles` command-line tool, which you run from the
+  system command line. See :doc:`/reference/mongofiles`.
+
+.. index:: GridFS; collections
+.. _gridfs-collections:
+
+GridFS Collections
+------------------
+
+:term:`GridFS` stores files in two collections:
+
+- ``chunks`` stores the binary chunks. For details, see
+  :ref:`gridfs-chunks-collection`.
+
+- ``files`` stores the file's metadata. For details, see
+  :ref:`gridfs-files-collection`.
+
+GridFS places the collections in a common bucket by prefixing each with
+the bucket name. By default, GridFS stores the collections in the ``fs``
+bucket:
+
+- ``fs.files``
+- ``fs.chunks``
+
+You can choose a default bucket name other than ``fs``, and you can
+create additional buckets.
+
+To access files, qualify the collection with the bucket name. For example,
+if you use GridFS to create a ``photos`` bucket, then to call :method:`findOne()
+` on its ``files`` collection from the :program:`mongo` shell, type:
+
+.. code-block:: javascript
+
+   db.photos.files.findOne()
+
+.. index:: GridFS; chunks collection
+.. _gridfs-chunks-collection:
+
+The chunks Collection
+~~~~~~~~~~~~~~~~~~~~~
+
+Each document in the ``chunks`` collection represents a distinct chunk
+of a file that has been divided by :term:`GridFS`. The following is a
+prototype document from the ``chunks`` collection:
+
+.. code-block:: javascript
+
+   {
+     "_id" : <ObjectID>,
+     "files_id" : <ObjectID>,
+     "n" : <num>,
+     "data" : <binary>
+   }
+
+A document from the ``chunks`` collection contains the following fields:
+
+.. data:: chunks._id
+
+   The unique :term:`ObjectID` of the chunk.
+
+.. data:: chunks.files_id
+
+   The ``_id`` of the "parent" document, as specified in the ``files``
+   collection.
+
+.. data:: chunks.n
+
+   The sequence number of the chunk. Chunks are numbered in order,
+   starting with 0.
+
+.. data:: chunks.data
+
+   The chunk's payload as a :term:`BSON` binary type.
+
+The ``chunks`` collection uses a :term:`compound index` on ``files_id`` and
+``n``, as described in :ref:`gridfs-index`.
+
+.. index:: GridFS; files collection
+.. _gridfs-files-collection:
+
+The files Collection
+~~~~~~~~~~~~~~~~~~~~
+
+Each document in the ``files`` collection represents a
+file that has been stored through :term:`GridFS`. The following is a
+prototype of a ``files`` collection document:
+
+.. code-block:: javascript
+
+   {
+     "_id" : <ObjectID>,
+     "length" : <num>,
+     "chunkSize" : <num>,
+     "uploadDate" : <Date>,
+     "md5" : <string>,
+
+     "filename" : <string>,
+     "contentType" : <string>,
+     "aliases" : <string array>,
+     "metadata" : <object>
+   }
+
+A document from the ``files`` collection contains some or all of the
+following fields. You can create additional fields:
+
+.. data:: files._id
+
+   The unique ID for this document. The ``_id`` is of the data type you
+   chose for the original document. The default type for MongoDB
+   documents is :term:`BSON` :term:`ObjectID`.
+
+.. data:: files.length
+
+   The size of the stored file in bytes.
+
+.. data:: files.chunkSize
+
+   The size of each chunk in bytes. GridFS divides the file into chunks of
+   the size specified here. The default size is 256 kilobytes.
+
+.. data:: files.uploadDate
+
+   The date the file was first stored by GridFS. This value has the
+   ``Date`` data type.
+
+.. data:: files.md5
+
+   An MD5 hash returned from the filemd5 API. This value has the ``String``
+   data type.
+
+.. data:: files.filename
+
+   A human-readable name for the file. This field is optional.
+
+.. data:: files.contentType
+
+   A valid MIME type for the file. This field is optional.
+
+.. data:: files.aliases
+
+   An array of alias strings. This field is optional.
+
+.. data:: files.metadata
+
+   Any additional information you want to store. This field is optional.
+
+.. index:: GridFS; index
+.. _gridfs-index:
+
+GridFS Index
+------------
+
+:term:`GridFS` uses a :term:`unique `, :term:`compound
+` index on the ``chunks`` collection for ``files_id``
+and ``n``. The index allows efficient retrieval of chunks using the
+``files_id`` and ``n`` values, as shown in the following example:
+
+.. code-block:: javascript
+
+   cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
+
+See the :doc:`/applications/drivers` documentation for your driver to
+learn whether this index is created by default.
+
+The following command creates this index from the shell:
+
+.. code-block:: javascript
+
+   db.fs.chunks.ensureIndex({files_id:1, n:1}, {unique: true});
+
+Example Interface
+-----------------
+
+The following is an example of the GridFS interface in Java. The example
+is for demonstration purposes only. For API specifics, see the
+:doc:`/applications/drivers` documentation for your driver.
+
+.. code-block:: java
+
+   /*
+    * default root collection usage - must be supported
+    */
+   GridFS myFS = new GridFS(myDatabase); // returns a default GridFS (e.g. "fs" bucket collection)
+   myFS.storeFile(new File("/tmp/largething.mpg")); // saves the file into the "fs" GridFS store
+
+   /*
+    * specified root collection usage - optional
+    */
+
+   GridFS myContracts = new GridFS(myDatabase, "contracts"); // returns a GridFS where "contracts" is root
+   myContracts.retrieveFile("smithco", new File("/tmp/smithco_20090105.pdf")); // retrieves object whose filename is "smithco"
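
The chunk layout this page describes can be illustrated without a database connection. The following JavaScript is a minimal sketch, not driver code: the helper names ``splitIntoChunks``, ``reassemble``, and ``chunksForRange`` and the ``example-file-id`` value are hypothetical, and it assumes a Node.js runtime for ``Buffer``. It shows how a file might be divided into documents shaped like ``chunks`` entries, how ordering by ``n`` restores the file, and why a byte range needs only some of the chunks.

```javascript
// Illustrative sketch only: mimics the chunk layout described in this page.
// It does not connect to MongoDB and is not how any particular driver works.

const DEFAULT_CHUNK_SIZE = 256 * 1024; // the 256 kilobyte default noted above

// Split `data` (a Buffer) into documents shaped like `chunks` entries.
// `filesId` stands in for the _id of the parent `files` document.
function splitIntoChunks(filesId, data, chunkSize = DEFAULT_CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: filesId,
      n: n, // sequence number, starting with 0
      data: data.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// Rebuild the file by ordering chunks on `n`, the same order that
// find({files_id: ...}).sort({n: 1}) would return.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

// Sequence numbers of the chunks covering bytes [start, end) of the file.
function chunksForRange(start, end, chunkSize = DEFAULT_CHUNK_SIZE) {
  const first = Math.floor(start / chunkSize);
  const last = Math.floor((end - 1) / chunkSize);
  const result = [];
  for (let n = first; n <= last; n++) result.push(n);
  return result;
}

const file = Buffer.alloc(600 * 1024, 7); // 600 kilobytes of sample bytes
const chunks = splitIntoChunks("example-file-id", file);
console.log(chunks.length); // 3: two full chunks and one partial chunk
console.log(chunksForRange(300 * 1024, 310 * 1024)); // [ 1 ]
```

Because each chunk carries its own sequence number, a reader that wants only part of a file can fetch the chunks whose ``n`` values cover that byte range and skip the rest, which is the behavior the range-query paragraph describes.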