Monthly Archives: July 2012

Meet Amazon EC2: Big Data on the Cloud

I have been exploring Amazon EC2 as a cloud-based alternative for a midsized software product company. I stumbled across this slideshow presentation from Netflix, a pioneer in running its entire IT on Amazon’s EC2 platform. According to Netflix, they don’t have any data center of their own. Amazing, isn’t it?

So I started exploring Amazon EC2 and how a company can run its entire IT on it. I was mostly interested in the technology standpoint.

For starters, on Amazon you can create various Linux instances, including Ubuntu, for free. They are elastic servers: you can increase the RAM and processing power on demand by moving to a larger instance type. Once you set up the instance, you can ssh into the machine and do pretty much whatever you want. Refer to this YouTube link for how to set up Amazon.
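As a rough illustration of the ssh step, here is a minimal Python sketch that assembles the login command for an Ubuntu instance. The key file name and host name are hypothetical placeholders, and the default user `ubuntu` is an assumption that holds for the standard Ubuntu images (other images use a different login user).

```python
import shlex


def build_ssh_command(key_path, host, user="ubuntu"):
    """Return the argv list for logging in with an EC2 key pair file.

    `user` defaults to "ubuntu", the usual login on Ubuntu images;
    this is an assumption and differs for other images.
    """
    return ["ssh", "-i", key_path, f"{user}@{host}"]


# Hypothetical key file and public DNS name, for illustration only.
command = build_ssh_command(
    "my-key.pem", "ec2-203-0-113-10.compute-1.amazonaws.com"
)
print(shlex.join(command))
```

The list form can be handed to `subprocess.run` as is, which avoids shell quoting problems if the key path contains spaces.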

According to Amazon, you get 750 hours of free server usage per month; in simple words, that is plenty for testing your business idea. There are standard Amazon Machine Images (AMIs) with various pre-configured stacks, including LAMP. Developing a decent web application and exposing it to users is easy.
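To put the 750 hours in perspective, the short sketch below compares the allowance with the number of hours in a calendar month. It assumes usage is counted per instance-hour and added up across instances; the function names are made up for this example.

```python
import calendar

FREE_TIER_HOURS = 750  # monthly allowance quoted above


def hours_in_month(year, month):
    """Number of hours in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * 24


def always_on_instances(year, month, allowance=FREE_TIER_HOURS):
    """How many instances could run non-stop within the allowance.

    Assumes hours are summed across all running instances.
    """
    return allowance // hours_in_month(year, month)


# The longest months have 31 days, i.e. 744 hours, which is under 750.
print(hours_in_month(2012, 7))
print(always_on_instances(2012, 7))
```

In other words, the allowance is enough to keep one instance running around the clock for the whole month, while two always-on instances would exceed it.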

The interesting thing I noticed is that it has good Hadoop and MapReduce support. For more details on how to set up Hadoop on Amazon, refer to this YouTube link. There are a few command-line interface (CLI) tools to manage Elastic MapReduce (EMR).

On Amazon, the rough equivalent of HDFS is S3, and the equivalent of Hadoop is Elastic MapReduce (EMR).
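For readers new to the MapReduce model that Hadoop and Elastic MapReduce implement, here is a small in-memory word count in Python. It is only a sketch of the map, shuffle, and reduce phases: it does not call any Amazon API, and on a real cluster the input lines would be read from storage such as S3 rather than from a Python list. All function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


print(word_count(["big data on the cloud", "data on EC2"]))
```

The point of the split is that the map and reduce functions only ever see one line or one key at a time, which is what lets a framework spread the work across many machines.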