100% Integration Reliability
Zencoder is an essential software dependency for most of our customers. While we aim for 100% uptime, there may be times when you can't connect to Zencoder:
- Our service might be affected by problems at an upstream provider (e.g. Amazon Web Services)
- We occasionally need to perform system maintenance that requires temporary downtime
- You have exceeded your API rate limit
When Zencoder is down, your application will typically get a '503 Service Unavailable' response from Zencoder, but you could get a different error (like a 500). If you have exceeded your API rate limit, you will get a '403 Rate Limit Exceeded' response.
The good news: since video encoding is an asynchronous process, you can build your application to never experience downtime or problems related to our availability. If you do this, the worst-case scenario is that your jobs take a bit longer, but no errors occur. We highly recommend that you do this.
To put it more strongly, if you care about reliability, you should follow this approach to integration — for Zencoder, or for any critical API that you integrate with.
Steps to reliable app integration
1. Include a Secondary URL as a backup in case upload to your primary location fails.
2. If you get a non-successful response code from Zencoder — basically, something other than a 200 or 201 — don't fail the job. A response code of 503 doesn't mean that your video can't be processed. It just means that Zencoder is temporarily unavailable.
3. If you get a connection error when trying to connect to Zencoder, do the same thing.
4. Similarly, wrap your API requests in a timeout. We recommend a 30 second timeout; Zencoder usually responds in less than a second, so 30 seconds is usually plenty of time.
5. In all three of these cases — if you get a non-successful response code, can't connect, or the API request times out — flag the job as 'pending'.
6. Periodically, resubmit any jobs in the 'pending' state. You could use cron to do this every minute, for instance.
Once the jobs are resubmitted, everything behaves like normal. This way, a failed job submission only makes the job take a little longer rather than causing trouble for your application or your users.
OK, so this isn't really pseudocode: it's Ruby. But Ruby is pretty easy to read. This example uses SystemTimer (a reliable timeout library). You should be able to find equivalent libraries in other languages or frameworks.
1. Imagine a Videos table that includes these columns. (It will obviously have more, including columns to store a Zencoder job ID and a Zencoder output file ID.)
    create_table :videos do |t|
      t.string "state"
      t.integer "lock_version"
    end
2. A Video should include a state machine with the following states:
- pending (not yet submitted to Zencoder)
- submitting (currently submitting to Zencoder)
- transcoding (successfully submitted to Zencoder)
- finished (Zencoder finished transcoding, and the job is done)
- failed (Zencoder was unable to transcode the video)
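As a rough illustration, the states above can be written down as a table of allowed transitions. This is only a sketch: the transition table is inferred from the state descriptions and the rest of this walkthrough, and the names TRANSITIONS and can_transition? are hypothetical, not part of any Zencoder library.

```ruby
# Allowed transitions between video states.
# ASSUMPTION: inferred from the state descriptions, not an official list.
TRANSITIONS = {
  "pending"     => ["submitting"],              # picked up for (re)submission
  "submitting"  => ["transcoding", "pending"],  # accepted, or try again later
  "transcoding" => ["finished", "failed"],      # Zencoder reported a result
  "finished"    => [],
  "failed"      => []
}.freeze

# Hypothetical helper: true if a video may move from one state to another.
def can_transition?(from, to)
  TRANSITIONS.fetch(from, []).include?(to)
end
```

Note that a failed submission sends the video back to 'pending', not to 'failed'. The 'failed' state is reserved for videos that Zencoder could not transcode.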
3. When a new video is ingested, save the video in the 'submitting' state and trigger a background job to submit the video to Zencoder.
    # got a new video!
    video = Video.new(params)
    video.state = "submitting"
    video.save
    submit_to_zencoder(video)
You really should run the submit_to_zencoder method in the background (in Ruby, for example, with DelayedJob). But we'll stick with a direct submit_to_zencoder(video) call for example purposes.
4. The submit_to_zencoder function looks something like this. It should be run asynchronously, in the background.
    def submit_to_zencoder(video)
      begin
        SystemTimer.timeout_after(30.seconds) do
          response = Zencoder::Job.create(attributes)
          if response.code == 201
            video.state = "transcoding"
            video.save
          else
            video.state = "pending"
            video.save
          end
        end
      # Rescue any connection error. Our plugin abstracts these as
      # Zencoder::HTTPError.
      #
      # If you're not using the Zencoder plugin, this includes things
      # like Errno::ECONNRESET, Errno::ETIMEDOUT, Errno::ECONNREFUSED,
      # Errno::EHOSTDOWN, and SocketError.
      rescue Timeout::Error, Zencoder::HTTPError
        video.state = "pending"
        video.save
      end
    end
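If you're not using the Zencoder plugin, the same decision can be sketched with Ruby's standard library alone. The version below swaps SystemTimer for the stdlib Timeout module and rescues the low-level errors listed in the comment above. The helper name state_after_submission and the CONNECTION_ERRORS constant are hypothetical, and the block you pass in is assumed to perform the API request and return the HTTP status code as an integer.

```ruby
require "socket"   # defines SocketError
require "timeout"

# Low-level errors treated as "could not reach the service".
# ASSUMPTION: illustrative list taken from the comment in the example above.
CONNECTION_ERRORS = [
  Errno::ECONNRESET, Errno::ETIMEDOUT, Errno::ECONNREFUSED,
  Errno::EHOSTDOWN, SocketError
].freeze

# Hypothetical helper: runs the given block (which should make the API
# request and return the HTTP status code) and returns the state the
# video should be saved in afterwards.
def state_after_submission(timeout_seconds = 30)
  code = Timeout.timeout(timeout_seconds) { yield }
  # Anything other than a 200 or 201 means "try again later".
  [200, 201].include?(code) ? "transcoding" : "pending"
rescue Timeout::Error, *CONNECTION_ERRORS
  # Timed out or could not connect: also "try again later".
  "pending"
end
```

A caller would then do something like video.state = state_after_submission { ... } and save the record. Every failure path ends in 'pending', so the resubmission step can pick the video up later.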
5. Every so often — e.g. every minute — try to resubmit jobs that are in the 'pending' state.
    def resubmit_pending_jobs
      videos = Video.find(:all, :conditions => "state = 'pending'")
      videos.each do |video|
        begin
          video.state = "submitting"
          video.save
          submit_to_zencoder(video)
        rescue ActiveRecord::StaleObjectError
          # Another process updated this video first, so skip it.
        end
      end
    end
Also, by adding a 'lock_version' column to the videos table, we introduce optimistic locking. This means that if the record gets updated between the Video.find query and video.save, it won't submit the job to Zencoder. This prevents the job from being submitted to Zencoder twice by accident. You could use pessimistic or database locking or some other lock method to accomplish the same thing.
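To see why the version check prevents double submission, here is a small in-memory sketch of optimistic locking. It does not use ActiveRecord: VideoStore, Row, and StaleRecordError are hypothetical stand-ins for the videos table and ActiveRecord::StaleObjectError, written only to illustrate the idea.

```ruby
class StaleRecordError < StandardError; end

# Hypothetical in-memory stand-in for the videos table.
class VideoStore
  Row = Struct.new(:id, :state, :lock_version)

  def initialize
    @rows = {}
  end

  def insert(id, state)
    @rows[id] = Row.new(id, state, 0)
  end

  # Returns a copy, like loading a record from the database.
  def find(id)
    @rows.fetch(id).dup
  end

  # Saves only if nobody else has saved since this copy was loaded.
  def save(copy)
    current = @rows.fetch(copy.id)
    raise StaleRecordError if current.lock_version != copy.lock_version
    copy.lock_version += 1
    @rows[copy.id] = copy.dup
  end
end

store = VideoStore.new
store.insert(1, "pending")

# Two workers load the same pending video before either one saves.
worker_a = store.find(1)
worker_b = store.find(1)

submitted_by = []
{ "a" => worker_a, "b" => worker_b }.each do |name, video|
  begin
    video.state = "submitting"
    store.save(video)
    submitted_by << name # only reached when the save succeeded
  rescue StaleRecordError
    # Another worker claimed this video first, so skip it.
  end
end
```

Only the first worker's save succeeds. The second worker's copy carries an outdated lock_version, so its save raises and the video is never submitted a second time.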
It's that easy…
All things considered, this is a pretty simple approach to ensuring 100% integration reliability between Zencoder and your application. It's a few more steps than just naively submitting a job; but it ensures that no matter what happens — whether it's an occasional timeout, or unexpected downtime at Zencoder, or scheduled maintenance — your app runs reliably.