Kinesis, the streaming data service that Amazon Web Services announced in November, is now publicly available. The service is comparable in theory to popular open source technologies such as Apache Storm, only Kinesis brings with it the fully managed experience that’s becoming par for the course within AWS.
Stream processing is becoming increasingly popular as companies, especially internet-based ones, look for ways to move beyond the batch-processing workloads on which they’ve been relying for years. Really, it’s about taking advantage of the timeliness of data rather than waiting minutes or even hours to analyze it along with everything else collected since the last batch process ran. Storm is probably the most popular tool for the job, processing the data as it crosses the wire before sending it someplace like Hadoop to be analyzed along with other historical data.
Twitter, which employs Storm creator Nathan Marz via its 2011 acquisition of Backtype, is possibly the prototypical Storm user. Storm lets Twitter do things like keep users’ timelines up to date and keep track of breaking trends, but the company relies on a bevy of other tools (Hadoop, of course, among them) to do things like analyze longer-term trends and train its search engine models.
Like Storm, Kinesis can process data in real time before shipping it into another data store, most likely Elastic MapReduce, Redshift or DynamoDB within the AWS platform. Unlike Storm, however, Kinesis can retain data for up to 24 hours and is automatically scalable up to hundreds of terabytes per hour via a software development kit, or SDK. Kinesis does include a connector for porting data to Storm, which AWS General Manager for Data Science Matt Wood said is an option for existing Storm users who want to keep using it to process data while automating the collection with Kinesis.
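To make the retention idea concrete, here is a minimal sketch in plain Python of a stream that holds records for a fixed window and lets a consumer read whatever is still inside it. This is a toy model, not the Kinesis SDK: the names `RetainedStream`, `put` and `read` are hypothetical, timestamps are plain seconds supplied by the caller, and the only figure taken from the article is the 24-hour window.

```python
from collections import deque
from dataclasses import dataclass

# The article's 24-hour retention figure, expressed in seconds.
RETENTION_SECONDS = 24 * 60 * 60


@dataclass
class Record:
    timestamp: float  # seconds on an arbitrary clock (assumption)
    payload: str


class RetainedStream:
    """Toy in-memory stream (hypothetical, not the Kinesis API).

    Records are assumed to arrive in non-decreasing timestamp order,
    so expired records are always at the front of the queue.
    """

    def __init__(self, retention_seconds=RETENTION_SECONDS):
        self.retention_seconds = retention_seconds
        self._records = deque()

    def put(self, timestamp, payload):
        """Append a record, then drop anything that has aged out."""
        self._records.append(Record(timestamp, payload))
        self._expire(timestamp)

    def _expire(self, now):
        """Remove records strictly older than the retention window."""
        cutoff = now - self.retention_seconds
        while self._records and self._records[0].timestamp < cutoff:
            self._records.popleft()

    def read(self, now):
        """Return the payloads still inside the retention window."""
        self._expire(now)
        return [record.payload for record in self._records]


stream = RetainedStream()
stream.put(0, "login")           # written at hour 0
stream.put(3600, "purchase")     # written at hour 1

recent = stream.read(now=3600)        # read at hour 1
later = stream.read(now=25 * 3600)    # read at hour 25
```

In this sketch a consumer reading at hour 1 sees both events, while one reading at hour 25 sees only the purchase, because the login has aged out of the window. A batch system, by contrast, would hand over nothing until its next scheduled run.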
The SDK is a big key to Kinesis, because the service really is designed to “open up the opportunities of building more responsive applications,” AWS said in an interview. Early-access users include mobile-game developer Supercell, which is using Kinesis to feed real-time dashboards with data streaming off its game servers, and marketing platform Bizo. Wood said Bizo has a “remarkably small development team” and, thanks to Kinesis, has been able to move a lot of precious man-hours away from managing a data pipeline and onto more valuable tasks.
As I wrote when Kinesis was announced, AWS is the only cloud provider offering a service like it, so we’ll see how long it is before others follow suit with streaming services of their own. Corporate computing infrastructure now includes a full pipeline for capturing and processing data, and other cloud providers are going to need a lot more than just a Hadoop service if they hope to stem the tide of customers already choosing AWS.