Tech Talk: Realtime Analytics at Ning

I am very excited about the Analytics tech talk today, April 27, 2011. It starts at 6pm in our Palo Alto office and is open to everyone interested. You can read the details and RSVP on the meetup page.

We do tech talks regularly at Ning. You can check out the video from our first one.

Free pizza, beer, and analytics! See you this evening.

Low latency Analytics at Ning

The Analytics team at Ning has recently been working on a low-latency feed from our platform. With more and more features being offered to our Network Creators, we need quick feedback from our products and systems: if a deployment introduces an overall page-rendering slowdown across Ning networks, for instance, we want to know about it in a matter of minutes, not hours.

Our platform is instrumented using our open-source Thrift collectors, which aggregate and dump data into HDFS. A periodic process imports the data from Hadoop into our data warehouse (Netezza), and we’ve built a variety of dashboards to visualize trends over time.
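
The full pipeline is covered in the post itself, but as a rough illustration, here is a minimal sketch of what the periodic import step might look like, assuming a line-oriented partition produced by the collectors and a plain JDBC connection to the warehouse. The path, table name, and JDBC URL are placeholders for the example, not the actual job:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WarehouseImport {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster; fs.defaultFS comes from core-site.xml.
        FileSystem fs = FileSystem.get(new Configuration());

        // Hypothetical hourly partition written by the collectors.
        Path part = new Path("/events/page_render/2011/04/27/14/part-00000");

        // The JDBC URL is passed as the first argument; Netezza ships its
        // own JDBC driver, and the table here is purely illustrative.
        try (Connection db = DriverManager.getConnection(args[0]);
             PreparedStatement insert = db.prepareStatement(
                     "INSERT INTO page_render_facts (line) VALUES (?)");
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(fs.open(part)))) {
            String line;
            while ((line = in.readLine()) != null) {
                insert.setString(1, line);
                insert.addBatch();
            }
            // Batching the inserts keeps round trips to the warehouse low.
            insert.executeBatch();
        }
        fs.close();
    }
}
```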

Continue reading “Low latency Analytics at Ning”

Sweeper, an HDFS browser

When monitoring a Hadoop cluster, two numbers come to mind: disk usage and the number of files tracked by the namenode. Alongside traditional monitoring tools (Nagios, Zenoss, …), we also use a small in-house utility, Sweeper, to quickly find large directories to clean up or small files to combine.
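
Sweeper itself is covered in its own post, but the kind of scan it performs is easy to express with the standard Hadoop FileSystem API. Below is a minimal sketch under stated assumptions: the size thresholds are arbitrary example values, not Sweeper's actual logic:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsScan {
    // Thresholds are arbitrary example values, not Sweeper's defaults.
    static final long LARGE_DIR_BYTES = 10L * 1024 * 1024 * 1024; // 10 GB
    static final long SMALL_FILE_BYTES = 1024 * 1024;             // 1 MB

    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        scan(fs, new Path(args.length > 0 ? args[0] : "/"));
        fs.close();
    }

    static void scan(FileSystem fs, Path dir) throws Exception {
        // getContentSummary reports total size and file count under a path:
        // the two numbers that matter for disk usage and namenode pressure.
        ContentSummary summary = fs.getContentSummary(dir);
        if (summary.getLength() > LARGE_DIR_BYTES) {
            System.out.printf("large: %s (%d bytes, %d files)%n",
                    dir, summary.getLength(), summary.getFileCount());
        }
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDirectory()) {
                scan(fs, status.getPath());
            } else if (status.getLen() < SMALL_FILE_BYTES) {
                // Many tiny files each cost a namenode entry; candidates
                // for combining into larger ones.
                System.out.println("small: " + status.getPath());
            }
        }
    }
}
```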

Continue reading “Sweeper, an HDFS browser”

Scribe at Ning – case study

Lately, the Analytics team at Ning has been rethinking the way we collect data and store it in HDFS.

The system we had in place was a combination of a few subsystems, including shell scripts (rsync run from cron, writing directly to HDFS) and a custom-built Hadoop endpoint named ‘collector’. The collector works as follows: clients post serialized Thrift events to a set of servers, which in turn write them into HDFS. These servers provide additional services, including data validation and bucketizing to limit the number of files the Hadoop namenode needs to track.
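
To make the bucketizing idea concrete, here is a hypothetical sketch of a writer that collapses many client posts into one HDFS file per (category, hour) bucket. The path layout, the length-prefixed framing, and the single-file-per-bucket policy are illustrative assumptions, not the collector's actual implementation:

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: one open HDFS file per (category, hour) bucket,
// so thousands of client posts collapse into a handful of files.
public class BucketizingWriter {
    private final FileSystem fs;
    private final Map<String, FSDataOutputStream> open = new HashMap<>();

    public BucketizingWriter(FileSystem fs) {
        this.fs = fs;
    }

    public synchronized void write(String category, long eventMillis,
                                   byte[] serializedThriftEvent) throws IOException {
        long hour = eventMillis / (3600 * 1000L);
        String bucket = category + "/" + hour;
        FSDataOutputStream out = open.get(bucket);
        if (out == null) {
            // One file per bucket instead of one file per event keeps the
            // namenode's file count bounded. A real writer would also roll
            // and rename files; this sketch keeps a single part file open.
            out = fs.create(new Path("/events/" + bucket + "/part-00000"));
            open.put(bucket, out);
        }
        // Simple length-prefixed framing so events can be read back.
        out.writeInt(serializedThriftEvent.length);
        out.write(serializedThriftEvent);
    }

    public synchronized void close() throws IOException {
        for (FSDataOutputStream out : open.values()) {
            out.close();
        }
        open.clear();
    }
}
```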

Continue reading “Scribe at Ning – case study”