Containers Expo Blog: Blog Post

GridFS and MongoDB: Pros and Cons

Should I use GridFS for file storage with MongoDB?

GridFS looks like a great idea on paper: a virtual filesystem held within MongoDB which allows files larger than 16MB to be stored, synced and replicated. When architecting your solutions, it is very tempting to consider using GridFS. It appears to be able to take on the problem of storing many thousands or millions of files without consuming file-system resources, where there are often hard limits on the number of file names. It also seems to allow for massive files to be stored without any obvious downsides.

It is important, though, to know what GridFS is under the hood. Any file stored with GridFS is chopped into chunks, 255KB each by default. Those chunks are saved in a bucket, called fs, in a collection in that bucket, fs.chunks. Metadata about the files is stored in another collection in the same bucket, fs.files, though you can have more buckets with different bucket names in the same database. An index makes retrieving the chunks quick. All this chunking and metadata management is not done by the MongoDB database though. It is a task performed by the client's driver, which wraps it in a GridFS API for that driver.
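To make that layout concrete, here is a minimal sketch in plain Python of the kind of splitting a driver performs. It does not talk to MongoDB. The function names `split_into_chunks` and `reassemble` are hypothetical, and the dictionaries only loosely mirror the fs.files and fs.chunks documents described above, so treat it as an illustration rather than any driver's actual implementation.

```python
import uuid

CHUNK_SIZE = 255 * 1024  # the 255KB default chunk size described above, in bytes


def split_into_chunks(data: bytes, filename: str, chunk_size: int = CHUNK_SIZE):
    """Return (file_doc, chunk_docs): one metadata document and its chunk documents."""
    # Stand-in identifier; a real driver would generate its own id for the file.
    file_id = uuid.uuid4().hex
    file_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    # Each chunk records which file it belongs to and its position n in that file.
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return file_doc, chunk_docs


def reassemble(chunk_docs):
    """Rebuild the original bytes by concatenating chunks in order of n."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)
```

Splitting a 600KB payload this way yields three chunk documents, two full ones and a shorter final one, and `reassemble` restores the original bytes from them.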

When you put or retrieve a large file of n KB in its entirety, the driver is handling all the relevant chunks, n/255 of them rounded up, as documents, assembling them at the client end and writing them out to wherever they are needed. So a 16MB file is retrieved as 65 documents: 64 full 255KB chunks plus a final 64KB one. Consider what would happen if you did that regularly on your MongoDB database outside of GridFS; there would be severe competition for the server's RAM between those documents and the rest of the database.
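The arithmetic behind that figure can be checked with a short sketch. The helper names `chunk_count` and `last_chunk_size` are made up for this example, and 16MB is taken to mean 16 x 1024 x 1024 bytes.

```python
CHUNK_SIZE = 255 * 1024  # default chunk size in bytes


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of chunk documents needed for a file of file_size bytes."""
    # Ceiling division using integers only, to avoid float rounding.
    return -(-file_size // chunk_size)


def last_chunk_size(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Size in bytes of the final, possibly partial, chunk."""
    if file_size == 0:
        return 0
    remainder = file_size % chunk_size
    return remainder if remainder else chunk_size


sixteen_mb = 16 * 1024 * 1024
print(chunk_count(sixteen_mb))      # 65
print(last_chunk_size(sixteen_mb))  # 65536, i.e. 64KB
```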

The chunking with GridFS, and the fact that it is done by the driver, also means that large operations like replacing an entire file within GridFS are not atomic, and there's no built-in versioning to fall back on. This may, or may not, be a problem for some applications where files are concurrently accessible by many users or their applications. You can, though, work around this by layering your own versioning scheme over GridFS, only making a replacement file the latest version once it has completed writing.
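One way such a versioning layer might look is sketched below. It uses in-memory dictionaries in place of GridFS and of a small "current version" pointer collection, and the class `VersionedStore` is purely hypothetical. It also assumes a single writer per file name; a real scheme would need to allocate version numbers safely under concurrency.

```python
class VersionedStore:
    """In-memory stand-in for file storage plus a 'current version' pointer."""

    def __init__(self):
        self._files = {}    # (name, version) -> bytes, plays the role of stored files
        self._current = {}  # name -> version, the pointer that readers follow

    def write(self, name: str, chunks) -> int:
        """Write a new version chunk by chunk, then publish it."""
        version = self._current.get(name, 0) + 1
        buffer = bytearray()
        for chunk in chunks:
            # If the source fails part-way, we never reach the publish step below.
            buffer.extend(chunk)
        self._files[(name, version)] = bytes(buffer)
        # Publish only after the whole file has been written.
        self._current[name] = version
        return version

    def read(self, name: str) -> bytes:
        """Return the latest fully written version."""
        return self._files[(name, self._current[name])]
```

Readers calling `read` keep seeing the previous version until a replacement has been written in full, which is the property the workaround above relies on.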

There is an upside to chunking though: it is remarkably cheap to access particular sections of files because they are broken up into manageable blocks during the chunking. If you need to access particular parts of large binary files, only the chunks covering those parts need to be read, so you won't be pushing the working set out of memory in the server. You can also adjust the chunk size, so if your application would work better with a smaller or larger chunk, you can tune your GridFS usage by requesting a particular size of chunk. Smaller chunks push less of the working set out of memory on each read; larger chunks push out more.
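As a rough illustration of why ranged access is cheap, the sketch below works out which chunk numbers cover a given byte range. The helper `chunks_for_range` is hypothetical, not part of any driver's API; drivers that support seeking within a file perform a similar calculation internally.

```python
CHUNK_SIZE = 255 * 1024  # default chunk size in bytes


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Return the chunk numbers needed to read `length` bytes starting at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size  # chunk holding the final byte
    return range(first, last + 1)


# Reading 1000 bytes at the 10MB mark touches a single chunk.
print(list(chunks_for_range(10 * 1024 * 1024, 1000)))  # [40]
```

With a smaller chunk size the same read would pull fewer bytes into memory, which is the tuning trade-off described above.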

You can avoid the entire issue of contention with your working set of data by having another MongoDB server dedicated to GridFS storage and optimized towards your file storage use patterns. This also lets you focus on tuning the best performance out of your core database instance without having to look over your shoulder for the march of the GridFS files through your working set. With MongoHQ, creating a separate database instance is easier than ever and there'll be a plan to suit your needs.

If you are dealing with files smaller than 16MB and want to handle them as atomic entities, it is also worth considering whether you need to use GridFS at all, because a single MongoDB document can be up to 16MB in size. You will have to ensure that when reading the database, you only pull the large fields into memory when you need to, but this does give you the atomic replacement writes and architecturally simpler system you may desire. The downside is that access to binary ranges within the file will likely require downloading the file, modifying it and re-writing it, but it is all about balance and matching your file storage and use patterns.

So, as we said at the beginning, whether GridFS is a good fit for your application's file storage needs comes down to "it depends". There are some pitfalls which you can avoid at planning time with some estimates of what quantity of file data you want to store and how you want to access it. There are also many benefits to a MongoDB and GridFS solution, especially in terms of replication and synchronization.

More Stories By DJ Walker-Morgan

Content Curator at MongoHQ, DJ has been both a developer and writer since Apples came in ][ flavors and Commodores had Pets.
