Locally on ye olde MacBook Air, I have twitter-based photo input working to a degree, and I'd like to take a moment to walk through the practical issues involved (and there are a bunch of them).
I'm mostly doing this to see if anyone has some suggestions about better ways to handle this (because the plan right now is considerably less than ideal).
On a gallery edit screen, you will be able to pick a hashtag on twitter to pull in photos from.
Once you do that, the gallery will get added to a queue of galleries that need twitter integration. Right now, the plan is to run a cron job every few minutes that pulls down a twitter search (one limited to containing links and without retweets) for each gallery and shuffles each tweet off to a resque job to process said tweet.
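Roughly, one pass of that cron job could look like the sketch below. It's a minimal sketch, not the real code: `Gallery`, `poll_galleries`, `search` and `enqueue` are hypothetical names, with `search` standing in for the twitter gem call and `enqueue` for handing the tweet to resque. The query syntax is an assumption too. Tracking a `since_id` per gallery keeps each pass from re-queuing tweets it has already seen.

```ruby
# Hypothetical sketch of one polling pass over the queue of galleries.
# `search` and `enqueue` are injected stand-ins (assumptions) for the
# twitter gem search call and the resque enqueue call.
Gallery = Struct.new(:id, :hashtag, :since_id)

def poll_galleries(galleries, search:, enqueue:)
  galleries.each do |gallery|
    # Assumed query syntax: only tweets containing links, no retweets.
    query = "##{gallery.hashtag} filter:links -filter:retweets"
    tweets = search.call(query, gallery.since_id)
    next if tweets.empty?

    tweets.each { |tweet| enqueue.call(gallery.id, tweet) }
    # Remember the newest tweet id so the next pass skips what we've seen.
    gallery.since_id = tweets.map { |t| t[:id] }.max
  end
end

# Usage with fake collaborators, so nothing touches the network:
fake_search = ->(_query, since_id) { since_id.nil? ? [{ id: 5 }, { id: 9 }] : [] }
queued = []
gallery = Gallery.new(1, "swindy", nil)
poll_galleries([gallery], search: fake_search,
                          enqueue: ->(gallery_id, tweet) { queued << [gallery_id, tweet[:id]] })
```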
The resque job will do a few things with the tweet:
- Find any links in the tweet
- Expand any of said links that are shortened (using the longurl gem)
- For each of those links, it will hit the embed.ly api (using the embedly gem) and get the oEmbed version.
- If that oEmbed version is a photo type, it'll then add the photo to the gallery. The tweet text becomes the caption, the twitter user becomes the from_user and the original link to the image is also stored.
- Then our duplicate checking kicks in. Right now it checks whether the image is the same size as another one added in the last 30 minutes. I'll probably add a check on whether the original link is the same, and run that one much earlier in the chain.
- If the photo turns out to be a duplicate, it will delete itself.
- If it isn't a duplicate, the job will fire off an email to the gallery owner saying they have a new submission and push the update out to any browsers currently listening for live updates on the frontend.
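The first four steps of that job can be sketched like this. It's a rough sketch under assumptions, not the actual job: `Photo`, `photos_from_tweet`, `expand` and `oembed` are hypothetical names, with `expand` standing in for the longurl gem and `oembed` for the embedly gem. The only thing it relies on from oEmbed itself is that a photo response carries `"type": "photo"` and a `"url"`. The link regex is deliberately naive.

```ruby
# Hypothetical sketch of the per-tweet steps. `expand` and `oembed` are
# injected stand-ins (assumptions) for the longurl and embedly gem calls.
Photo = Struct.new(:url, :caption, :from_user, :original_link)

def photos_from_tweet(tweet, expand:, oembed:)
  links = tweet[:text].scan(%r{https?://\S+})       # 1. find links (naive regex)
  links.filter_map do |link|
    long  = expand.call(link)                       # 2. expand shortened links
    embed = oembed.call(long)                       # 3. fetch the oEmbed version
    next unless embed && embed["type"] == "photo"   # 4. keep photo types only
    Photo.new(embed["url"], tweet[:text], tweet[:from_user], long)
  end
end

# Usage with fake collaborators (all URLs are made up):
expand = ->(url) { url == "http://short.example/abc" ? "http://photos.example/42" : url }
oembed = lambda do |url|
  if url == "http://photos.example/42"
    { "type" => "photo", "url" => "http://photos.example/42.jpg" }
  else
    { "type" => "link" }
  end
end
tweet = { text: "Demo time http://short.example/abc http://blog.example/post",
          from_user: "someone" }
photos = photos_from_tweet(tweet, expand: expand, oembed: oembed)
```

Injecting the two lookups keeps the http calls out of the logic, which also makes it easy to see what happens when one of those services returns nothing.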
It's a lot of work, with a lot of http calls in there (especially for links that have been shortened more than once). At this moment, I think my backend server's beefy enough to handle it, but I've been horribly wrong about this sort of thing in the past.
It's also fraught with potential points of failure. What if Twitter's search api endpoints are down? Or one of the many lengtheners longurl checks? Or embed.ly? That's on top of my usual failure points: MongoHQ, S3 and my general screwups from time to time (the last of those being a much bigger concern).
In the end, though, the twitter input combined with the realtime frontend interface is really amazing to behold. It's impressively demoable and can become a real selling point. It's something I feel I have to do to increase the value of the product to customers.
Of course, in an ideal world, I wouldn't have to do a couple of those steps.
- Polling: I hate polling. Hate it. But I don't want to be responsible for maintaining n+1 streams to the twitter streaming api right now (and I suspect they wouldn't much like me doing that, either). I'd also have to kick off streams on the fly (doable, but not fun with existing tools). Piping search rss into Superfeedr for Pubsubhubbub is another appealing option (and one I might consider a little later), but then I lose some of the niceties that come with the twitter gem (that's a minor quibble, but one that comes into play when you're trying to get a feature out the door).
- Expanding shortened urls: If you've ever been on the new Twitter interface with Firebug or some other web developer toolbar open, you'll notice the search views hit a different search endpoint than the one developers are allowed to use. That interface returns json with a) the links already unshortened and b) the image information already hydrated. It'd be ideal for this, but Twitter has that locked down for internal use right now (copying the url into a fresh browser window hits a Forbidden error). Pity. UPDATE: Yesterday, this URL returned Forbidden (and slightly different results, like hydrated image info [I think ... working from memory here]). Today, not so much. Interesting.
- Duplicate checking: This one's all me. I need to find a way to handle that better before it even hits the database and S3 rather than going through the trouble of adding it just to delete it later.
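For the duplicate checking, one way to catch repeats before anything hits the database or S3 is to keep a short-lived record of what has come through recently. The sketch below is only that, a sketch: `DuplicateGuard` is a hypothetical in-memory class, and the SHA-256 content digest is a swapped-in stand-in for the current image-size comparison, not what the app does today. With several workers, the record would presumably have to live somewhere shared instead of in memory.

```ruby
require "digest"

# Hypothetical in-memory guard (an assumption, not the app's current design).
class DuplicateGuard
  def initialize(window_seconds = 30 * 60)
    @window = window_seconds
    @seen = {} # key => Time it was first recorded
  end

  # Returns true if `key` was already recorded inside the window;
  # otherwise records it and returns false.
  def duplicate?(key, now = Time.now)
    @seen.delete_if { |_, at| now - at > @window } # drop expired entries
    return true if @seen.key?(key)

    @seen[key] = now
    false
  end
end

guard = DuplicateGuard.new
link = "http://photos.example/42" # made-up URL
image_bytes = "fake image data"   # placeholder for the downloaded file

# Cheap check first: same original link, no download needed.
link_dup = guard.duplicate?("link:#{link}")
# Then a content digest, still before any database or S3 write.
bytes_dup = guard.duplicate?("sha:#{Digest::SHA256.hexdigest(image_bytes)}")
# The same link showing up again inside the window is flagged.
repeat_dup = guard.duplicate?("link:#{link}")
```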
So, tweets aren't going to be nearly as realtime as I'd like (polling and rate limiting) and are going to be computationally expensive.
My thought right now, both to limit usage and make sure I'm getting the revenue I need to make everything make sense, is to limit this feature to those on the highest-end Kaboodle plan. I don't like to do that, but in this case, I think it's the best decision for now.
At this moment, the plan is to have twitter-based input working and deployed late tonight, so I can test it out on tweets coming from this weekend's Startup Weekend here in Indianapolis.
Again, suggestions welcome.
Montabe