Feedback requested on Services Metrics proposal
rmiller at mozilla.com
Fri Oct 14 12:21:06 PDT 2011
On 10/14/11 12:14 PM, Mike Connor wrote:
> On 2011-10-14, at 3:04 PM, Rob Miller wrote:
>> On 10/13/11 8:13 PM, Richard Newman wrote:
>>> (Randomly diving in to parts of this. Sorry!)
>>>> * Regarding timers, I really like the concept, but I wonder whether releasing a single timer event is better than emitting two distinct "start" and "end" events as they are produced. The reason I like distinct "point" events is that they are emitted immediately and can therefore be consumed by near real-time monitors for more rapid reaction, potentially while an event is still in flight. They also make it easier to replay a system's behavior directly from the log stream, without needing to muck with the message timeline. You also don't have to maintain as much state in the producer, just an ID so a downstream system can correlate the start and end events of a particular pair. The downside is that downstream systems need to pair up the events, and there is the overhead of emitting an extra event. But I think the benefits are compelling.
>>> I concur. To add to Greg's thought:
>>> If the timer's enclosed code doesn't terminate within a reasonable amount of time, logging explicit start and end pairs (or just the start!) is waaay more useful than waiting to log until the code finishes!
>>> The term we used at Tellme for aggregating raw log events into more structured data was "sessionizing". Start and end correlation was one part of this. I presume that our metrics infrastructure has some similar capability for arbitrary stream processing of events. As you approach this problem of generating, delivering, and storing these raw log messages, it's worth thinking about the inevitable analysis layer that goes on top.
>> This is a great point. As described in the proposal, so far we've identified 3 concrete back ends:
>> - statsd (for counter and timer events)
>> - sentry (for errors)
>> - bagheera / hadoop (for everything else)
> - ArcSight (for security-relevant events) via CEF logs.
Noted, thanks. I'll include mention of this in the next round of spec
revisions that I'm about to make. :)
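For illustration, here is a minimal Python sketch of the distinct start/end "point" events and the "sessionizing" step discussed above. It assumes a plain in-memory list stands in for the log stream. `PointEvent`, `emit_start`, `emit_end`, `sessionize` and the `stuck_after` threshold are hypothetical names, not part of the proposal or of statsd, sentry, or bagheera. The producer keeps only a correlation ID. The downstream pass pairs events by that ID and flags starts that have had no matching end for too long, which is the in-flight case raised in the thread.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class PointEvent:
    # Hypothetical event shape, not defined by the proposal.
    kind: str      # "start" or "end"
    name: str      # timer name, e.g. "sync.upload" (illustrative)
    timer_id: str  # correlation ID shared by one start/end pair
    ts: float      # timestamp in seconds


def emit_start(name, log, now=time.time):
    """Append a "start" event immediately; return the ID the caller keeps."""
    timer_id = uuid.uuid4().hex
    log.append(PointEvent("start", name, timer_id, now()))
    return timer_id


def emit_end(name, timer_id, log, now=time.time):
    """Append the matching "end" event; the producer holds no other state."""
    log.append(PointEvent("end", name, timer_id, now()))


def sessionize(events, now, stuck_after):
    """Pair start/end events by timer_id.

    Returns (completed, stuck): completed holds (name, duration) for
    matched pairs; stuck holds (name, age) for starts with no end that
    are at least stuck_after seconds old. Ends with no matching start
    are ignored in this sketch.
    """
    open_starts = {}
    completed = []
    for ev in events:
        if ev.kind == "start":
            open_starts[ev.timer_id] = ev
        elif ev.kind == "end":
            start = open_starts.pop(ev.timer_id, None)
            if start is not None:
                completed.append((start.name, ev.ts - start.ts))
    stuck = [
        (s.name, now - s.ts)
        for s in open_starts.values()
        if now - s.ts >= stuck_after
    ]
    return completed, stuck
```

Usage: after `a = emit_start("sync.upload", log, now=lambda: 0.0)`, `emit_start("sync.download", log, now=lambda: 1.0)` and `emit_end("sync.upload", a, log, now=lambda: 2.5)`, calling `sessionize(log, now=100.0, stuck_after=30.0)` reports the upload as completed after 2.5 s and the download as still open after 99 s. A single combined timer event could not report that second case until the code finished.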