Scribe at Ning – case study

Posted by pierre on June 28, 2010 – 12:33 PM

Lately, the analytics team at Ning has been rethinking the way we collect data and store it in HDFS.

The system we had in place was a combination of a few subsystems, including shell scripts (rsync run from cron, writing directly to HDFS) and a custom-built Hadoop endpoint named ‘collector’. The collector works as follows: clients post serialized Thrift events to a set of servers, which in turn write them into HDFS. These servers provide additional services, including data validation and bucketizing to limit the number of files the Hadoop Namenode needs to track.
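
To make the bucketizing idea concrete, here is a minimal sketch in Python. The path layout and names are hypothetical (not our actual scheme); the point is that events are grouped into a small number of hourly files per category and per collector, rather than one file per client upload:

    from datetime import datetime, timezone
    import socket

    def bucket_path(category, event_time, root="/events"):
        """Return the HDFS path of the hourly bucket an event is appended to."""
        host = socket.gethostname()  # one open bucket file per collector host
        return f"{root}/{category}/{event_time:%Y/%m/%d/%H}/{host}.thrift"

    if __name__ == "__main__":
        # All page_view events received during the same hour land in the same file.
        print(bucket_path("page_view", datetime.now(timezone.utc)))
        # e.g. /events/page_view/2010/06/28/19/collector01.thrift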

We decided to streamline this log aggregation process and improve the robustness of the architecture by making all subsystems use the collectors exclusively, giving us a single front door to HDFS. This forced us to rethink the way we scaled this endpoint. Historically, a load balancer (VIP) fronted a set of collectors that could be scaled horizontally to accommodate any increase in load. This approach had a couple of disadvantages. As the number of clients increased, we had to provision more collectors to handle the growing number of connections, even though our data transfer rate remained low. The other challenge was distributing this model across multiple data centers.

To solve these issues, we needed a way to abstract the event transfer pipes between the clients and the collectors. Scribe, Facebook’s log aggregator, suited our needs very well. We added a tree of Scribe servers between the clients and the collector VIP. These intermediate Scribe servers took care of aggregating and forwarding requests from the large set of clients. The collectors were thus able to process a large number of incoming events with a fixed set of connections from the Scribe servers. The overall deployment architecture is shown in the figure below.
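
For reference, clients talk to their local Scribe server over the standard Thrift interface defined in scribe.thrift. The sketch below shows the usual logging pattern in Python; it assumes the Thrift-generated scribe bindings are importable as `scribe`, and the exact module layout and struct constructors vary with your Thrift version:

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from scribe import scribe  # Thrift-generated bindings for scribe.thrift (layout may differ)

    payload = "an already-serialized Thrift event"  # placeholder for the real event bytes

    sock = TSocket.TSocket(host="127.0.0.1", port=1463)  # the local Scribe server
    transport = TTransport.TFramedTransport(sock)
    # Scribe speaks the non-strict binary protocol, hence strictRead/strictWrite=False.
    protocol = TBinaryProtocol.TBinaryProtocol(transport, strictRead=False, strictWrite=False)
    client = scribe.Client(protocol)

    transport.open()
    entry = scribe.LogEntry()          # constructor details vary with the Thrift version
    entry.category = "events"          # Scribe routes on the category down the tree
    entry.message = payload
    result = client.Log([entry])       # returns ResultCode.OK or ResultCode.TRY_LATER
    transport.close()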

Scribe architecture

Even though Scribe is capable of writing to HDFS directly, we added a Scribe endpoint to our collectors, which allows us to filter and transform the event data before storing it in HDFS. The primary advantage of this intermediate layer is that it lets us process the live event stream. For example, we broadcast certain events through JMS Topics that other parts of our system subscribe to.
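
As an illustration only (the real collectors are a separate codebase, and every name below is hypothetical), a handler behind the collectors' Scribe endpoint boils down to implementing the single Log() RPC from scribe.thrift: validate each entry, hand it to the HDFS writer, and re-publish selected categories to the message bus:

    from scribe import scribe  # Thrift-generated bindings for scribe.thrift (assumed importable)

    BROADCAST_CATEGORIES = {"signup", "payment"}  # hypothetical category names

    class CollectorHandler(object):
        """Implements the Log() RPC that the Scribe servers call on a collector."""

        def __init__(self, hdfs_writer, publish):
            self.hdfs_writer = hdfs_writer  # e.g. appends entries to the hourly bucket files
            self.publish = publish          # e.g. sends the entry to a JMS topic

        def Log(self, messages):
            for entry in messages:
                if not entry.message:       # basic validation of the serialized event
                    continue
                self.hdfs_writer(entry.category, entry.message)
                if entry.category in BROADCAST_CATEGORIES:
                    self.publish(entry.category, entry.message)  # live-stream fan-out
            return scribe.ResultCode.OK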

However, we encountered a few issues with this deployment under heavy load. By default, a Scribe server opens a single connection to the address it forwards messages to, which makes load balancing difficult when that address is a VIP: once a connection has been established, there is no way to rebalance the load as additional collectors are added. This resulted in some of our collectors being heavily loaded while others sat idle.

To address this problem, we added functionality to Scribe that forces it to reconnect after it has sent a fixed, configurable number of messages: the Scribe client closes and re-opens its connection, and because new connections are opened periodically, the load is rebalanced across the collectors. The graphs below show the load distributed across six collectors in a staging environment (the connection is recycled after one million events are sent).
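
The actual change lives in Scribe's C++ forwarding code (see the GitHub repository below), but the idea is simple enough to sketch in a few lines of Python. The names here are illustrative; only the counting-and-reconnecting logic matters:

    class RecyclingSender(object):
        """Close and re-open the upstream connection every max_messages sends,
        giving the VIP a chance to pick a different collector."""

        def __init__(self, connect, max_messages=1000000):
            self.connect = connect          # callable returning a fresh connection to the VIP
            self.max_messages = max_messages
            self.conn = None
            self.sent = 0

        def send(self, batch):
            if self.conn is None or self.sent >= self.max_messages:
                if self.conn is not None:
                    self.conn.close()
                self.conn = self.connect()  # the VIP assigns a (possibly different) collector
                self.sent = 0
            self.conn.send(batch)
            self.sent += len(batch)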

Events received by collectors over time in a staging environment

Rate of events received by collectors in a staging environment

The graph below shows the load in our production environment. The service currently handles 2,800 messages per second (about 20 Mbit/s of data). The high-frequency sinusoidal components illustrate the load-balancing behavior as connections are recycled, while the overall waveform reflects the day/night traffic cycle (the peak is around 10:00 AM PDT).

Rate of events received by collectors in production

This modification has been open-sourced; see our repository on GitHub. We are also working on open-sourcing the collectors themselves. Stay tuned for an announcement on the Ning Code blog.

4 Responses to “Scribe at Ning – case study”

  1. Hey Pierre,

    If you’re managing a large Scribe installation, you may be interested in checking out Flume: http://github.com/cloudera/flume.

    Later,
    Jeff

    By Jeff on Jul 2, 2010

  2. Interesting post!

    I had some questions and a few places where I’d like clarification:
    — can you say how large the individual events are?
    — can you say how many front-end and back-end servers are feeding the front/left tier of scribe servers? (servers/scribe?)
    — where are the data center boundaries in the architecture diagram that you alluded to in the prose?
    — I think the scribe open/close is primarily for the back/right scribe tier to do better load balancing across the collectors? Is this right?
    — What purpose does the front/left tier of scribes fanning out to the back/right tier of scribes serve? Are the data streams being partitioned here, or are they being replicated?

    Thanks,
    Jon.

    PS. Feel free to email, I would be really interested in chatting more about this!

    By jmhsieh on Jul 3, 2010

  3. Maybe I don’t have enough background information on the subject, and would appreciate someone throwing a Lisp tome my way to enlighten me. The architecture gives me cause for some concern, however.

    If we’re generating sufficient load that the VIP is going to throw an event to one of three Scribe front-ends, wouldn’t filtering the results to two Scribe back-ends just bottleneck the system? If we have two beefy machines on the back-end handling the aggregation, why not just put them up front, and save ourselves the need for the three front-end machines?

    No, I am positive I don’t have enough background on the subject — pointers to relevant documentation explaining why this is good architecture would be awesome. Thanks!!

    By Samuel A. Falvo II on Aug 25, 2010

1 Trackback(s)

  1. Aug 4, 2010: movw $0x1f01,0xb8000; hlt; » New Hadoop related open-source projects
