August 27, 2011

Ever since we launched Paprika Cloud Sync last December (a service that automatically keeps your recipes, grocery lists, and meals synced between the iPhone, iPad, and Mac versions of Paprika), people have been asking us why we decided to build our own sync solution instead of simply using Dropbox [1]. We did initially consider basing our sync on Dropbox, but quickly realized that it was not a feasible solution for the type of data we needed to sync, and ended up rolling our own.

In this post I want to explain the motivations behind our decision. There are many other apps that could benefit from a cloud sync solution, so hopefully this will be of interest to other developers as well.

Database driven apps vs File driven apps

Paprika is a database driven app that utilizes Apple’s Core Data framework to store its data. Core Data is very useful for storing large amounts of data, storing relationships between different types of data, and presenting it in an efficient manner. It is a great tool for the type of data that Paprika needs to store (albeit a bit quirky in some places). However, using a database driven approach does have certain implications when it comes to syncing.

In a database driven app, all of the data needs to be maintained in a coherent state in order for the app to be fully usable. In Paprika, recipes can be organized into categories, added to the grocery list, and added to meal plans. This means that there is a large web of relationships (between recipes, categories, groceries, and meals) that needs to be synced simultaneously.

In contrast, a file driven app treats each file as an individual unit: the file can be synced individually without affecting any other files, and the app can still be used even if not all of the files on the server have been properly synced down. If a sync gets interrupted on a certain file, none of the other files are affected, and you can simply re-sync the last file from where it previously left off.

As you can guess, Dropbox uses a file based approach to cloud storage, which means it works quite well for file driven apps, but is not so great for database driven apps like Paprika.

Goals

There were several major goals we wanted to achieve with the sync:

  1. Keep it fast.
  2. Allow multiple users to make simultaneous changes to the same data and intelligently resolve conflicts between users.
  3. Safely handle sync interruptions, and be able to resume the sync from where it left off.
  4. Allow the user to continue using the app while the sync runs in the background.
  5. Safely handle data migrations as users update to new versions of the app.

Keep it fast.

Obviously from a user experience perspective, the faster the sync runs, the better. But this is especially important for us because we expect our users to be syncing at the grocery store, and in other places where they might not be on a fast wifi connection. Because each recipe contains a decent amount of text (along with a photo), and recipe collections tend to grow rather large over time, there is potentially a large amount of data to be sent during each sync.

The basic strategy for keeping the sync fast is to minimize the amount of data that needs to be transferred during each sync. The first thing we need to do is compress all data that is sent to and from the sync server. Additionally, to prevent unnecessary data transfer, we need to keep track of which recipes have been modified between syncs, and only upload recipes that have actually changed. This ensures that we do not waste time uploading recipes or photos unless we are sure that they need to be transferred.

The techniques necessary to implement this strategy are beyond the scope of what is provided by Dropbox’s file based API.
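To make the strategy concrete, here is a minimal Python sketch of change tracking combined with compression. It is not Paprika's actual implementation: the `modified_at` field, the `last_sync_time` marker, the JSON encoding, and the function names are all assumptions made for illustration.

```python
import json
import zlib


def build_upload_payload(recipes, last_sync_time):
    """Return a compressed payload holding only recipes changed since the last sync."""
    # Assumption: every recipe carries a 'modified_at' timestamp that the app
    # updates whenever the recipe is edited.
    changed = [r for r in recipes if r["modified_at"] > last_sync_time]
    raw = json.dumps(changed, sort_keys=True).encode("utf-8")
    # Compress before sending to reduce the bytes on the wire.
    return zlib.compress(raw)


def read_payload(payload):
    """Inverse of build_upload_payload, as the receiving side might run it."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


recipes = [
    {"id": 1, "name": "Pancakes", "modified_at": 100},
    {"id": 2, "name": "Chili", "modified_at": 250},
    {"id": 3, "name": "Granola", "modified_at": 300},
]

payload = build_upload_payload(recipes, last_sync_time=200)
changed_ids = [r["id"] for r in read_payload(payload)]
```

Only the two recipes edited after the last sync end up in the payload; the untouched one is never sent.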

Allow multiple users to make simultaneous changes to the same data and intelligently resolve conflicts between users.

Because Paprika is a recipe and grocery list organizer, we expect that a single recipe database will frequently be shared between multiple users across multiple devices. For example: a husband and wife who share an iPad but have individual iPhones. This means that multiple users must be able to make simultaneous changes to the same database, and these changes need to be successfully merged together when a sync occurs.

There are many different types of conflicts that can result from two users editing the same data on two separate devices, and each conflict needs to be resolved individually. For example: if one user modifies a recipe while another user deletes the same recipe, the deletion needs to be undone, and the sync should preserve the recipe modifications.

By building our own sync service, we were able to design it to intelligently handle merging these types of conflicts during the sync process.
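The modify-versus-delete rule described above can be sketched as a small merge function. This is an illustrative Python sketch, not the service's real logic. It assumes each side's version of a record carries a `deleted` flag and a `modified_at` timestamp (both hypothetical names), and that the function is only called for records changed on both devices since their last common sync. The "most recent edit wins" fallback is also an assumption; the rule stated above only covers the edit-versus-delete case.

```python
def merge_recipe(local, remote):
    """Merge two conflicting versions of the same recipe record."""
    # One side edited the recipe, the other deleted it:
    # the edit wins and the deletion is undone.
    if local["deleted"] != remote["deleted"]:
        survivor = remote if local["deleted"] else local
        return dict(survivor)
    # Both sides edited (or both deleted): keep the most recent version.
    # This tie-break is an assumption made for the sketch.
    newer = local if local["modified_at"] >= remote["modified_at"] else remote
    return dict(newer)


edited = {"id": 7, "name": "Chili (extra beans)", "deleted": False, "modified_at": 120}
deleted = {"id": 7, "name": "Chili", "deleted": True, "modified_at": 150}

merged = merge_recipe(edited, deleted)
```

Even though the deletion happened later, the merged record keeps the edited recipe, and the result is the same whichever device's version is passed first.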

Safely handle sync interruptions, and be able to resume the sync from where it left off.

Interruptions can definitely occur during the syncing process (say, due to disconnection). There are two main issues to address after an interruption occurs:

  1. The interruption needs to be safely handled to ensure that the database is not left in an incoherent state, where half of the data has been synced but the other half has not. This must hold both for the data on the client device (iPhone/iPad) and for the data on the cloud server. Otherwise, it could cause strange behavior or even data loss.
  2. The next time a sync runs, it needs to be able to resume from where it left off. It would be massively inefficient if every time a sync was interrupted, it had to start over from the beginning.

Addressing these issues is difficult to impossible without fine grained control of both the client and server during the sync process.
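One common way to get both properties on a single side of the sync is to apply incoming changes in small batches, each inside a database transaction that also records how far the sync has progressed. The sketch below uses Python's built-in `sqlite3` module rather than Core Data, and the table names, the `next_batch` checkpoint, and the `fail_on` switch used to simulate a disconnect are all assumptions for illustration.

```python
import sqlite3


def apply_batches(conn, batches, fail_on=None):
    """Apply each batch atomically and record progress in the same transaction."""
    start = conn.execute(
        "SELECT value FROM sync_state WHERE key = 'next_batch'"
    ).fetchone()[0]
    for index in range(start, len(batches)):
        with conn:  # commits on success, rolls back if an exception escapes
            for recipe_id, name in batches[index]:
                conn.execute(
                    "INSERT OR REPLACE INTO recipes (id, name) VALUES (?, ?)",
                    (recipe_id, name),
                )
                if recipe_id == fail_on:
                    raise ConnectionError("simulated disconnect")
            # The checkpoint only advances if the whole batch was written.
            conn.execute(
                "UPDATE sync_state SET value = ? WHERE key = 'next_batch'",
                (index + 1,),
            )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sync_state (key TEXT PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO sync_state VALUES ('next_batch', 0)")
conn.commit()

batches = [
    [(1, "Pancakes"), (2, "Chili")],
    [(3, "Granola"), (4, "Ramen")],
]

try:
    apply_batches(conn, batches, fail_on=4)  # interrupted during the second batch
except ConnectionError:
    pass
count_after_interrupt = conn.execute("SELECT COUNT(*) FROM recipes").fetchone()[0]

apply_batches(conn, batches)  # resumes at the second batch
count_after_resume = conn.execute("SELECT COUNT(*) FROM recipes").fetchone()[0]
```

After the simulated disconnect the database contains only the first, fully applied batch, and the second run picks up at the interrupted batch instead of starting over.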

Allow the user to continue using the app while the sync runs in the background.

Despite the optimizations described above to keep the sync fast, there are definitely cases where running a sync will take a long time: for example, if you are syncing a lot of recipes for the first time, or if your internet connection happens to be unbearably slow. In order to maintain the best user experience, we wanted to make sure the user could still interact with the app while the sync runs in the background. The consequence of this decision is that the sync process must be fully integrated into the app itself, and not tacked on as an afterthought.

Safely handle data migrations as users update to new versions of the app.

As we continue to introduce new features to Paprika, database changes may occur between each version. Every time a user upgrades, their database must be safely migrated to the newest version. Even in the absence of a cloud sync, this process is already quite tricky to implement.

But with a cloud sync, it’s even more complicated: what if a user upgrades Paprika on their iPhone but not the iPad? Both versions still have to be able to safely sync together despite having different database models. This is another scenario that requires fine grained control of both the client and server during the sync process.
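One possible way to let mixed versions sync together is to tag every record with the schema version that produced it and have the receiving side upgrade older records step by step. The Python sketch below illustrates that idea only; the version numbers, the `rating` and `name` fields, and the function names are hypothetical, and it does not cover sending newer data back to an older client.

```python
LATEST_VERSION = 3


def upgrade_1_to_2(record):
    # Hypothetical change in version 2: recipes gain a "rating" field.
    return {**record, "rating": 0, "schema_version": 2}


def upgrade_2_to_3(record):
    # Hypothetical change in version 3: "title" is renamed to "name".
    upgraded = {k: v for k, v in record.items() if k != "title"}
    upgraded["name"] = record["title"]
    upgraded["schema_version"] = 3
    return upgraded


UPGRADES = {1: upgrade_1_to_2, 2: upgrade_2_to_3}


def normalize(record):
    """Upgrade a record one version at a time until it matches the latest schema."""
    while record["schema_version"] < LATEST_VERSION:
        record = UPGRADES[record["schema_version"]](record)
    return record


from_old_ipad = {"schema_version": 1, "title": "Chili"}
from_new_iphone = {"schema_version": 3, "name": "Granola", "rating": 4}

normalized_old = normalize(from_old_ipad)
normalized_new = normalize(from_new_iphone)
```

Chaining single-step upgrades means each new app version only has to add one function, and records from any older version still reach the latest schema.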

Our Decision

All of these factors led us to decide that Dropbox was not a feasible solution for our cloud sync service, and that we should build our own solution that could properly minimize data transfer, intelligently handle merging conflicts between users, safely handle interruptions during the sync process, and safely handle data migrations during upgrades.

Don’t get me wrong, Dropbox works great if your app needs to sync individual files, and there are a lot of apps that work with that model: text editors, document editors, comic book readers, PDF readers, etc. With those types of apps, the user is mostly operating on one file at a time, and the files are completely independent from one another. You will rarely have conflicts during the syncing process, and each file can be synced individually without affecting any of the other files in the collection.

But for Paprika (and most other database driven apps [2]), building a Dropbox based sync is not a feasible solution.

  1. To be specific, this applies not only to Dropbox, but any file based sync solution.
  2. We’re not the only ones to have reached this conclusion.