The Relicans

Vladimir Ulogov

A bit about Telemetry types

In the business of Monitoring and Observability, you constantly hear the words "Telemetry" and "Metrics", and for good reason. We perform Monitoring and Observation through telemetry that we receive or, and this is important, calculate. Without data produced by the sources, which could be hosts or other equipment, applications, frontends and backends, software and hardware components, cloud and container infrastructure, or business processes, we cannot make any observation, and therefore cannot draw any conclusion, about the infrastructure that we are observing.

But a big question remains: what is Telemetry, which types of Telemetry are we dealing with, and how do different types of Telemetry tell us stories about our infrastructure?

The end goal of Telemetry analysis is to come up with relevant stories about an infrastructure and the overall business that this infrastructure supports. People do not want to hear unsorted facts, but rather stories produced from those facts. Understanding how your Telemetry can tell you a story starts with understanding what your Telemetry is.

What is your Telemetry?

Where a Metric defines data, Telemetry represents data associated with a timestamp. We can define three major groups of Telemetry data types:

  • Facts
  • Calculations
  • Relations

What are your Facts?

A Fact is the most basic form of Telemetry. It originates from your infrastructure and tells you something about the infrastructure and business processes that you are observing. Facts are atomic and indivisible. A Fact describes a single atomic state, measured at some moment in time. Facts are always placed on a timeline. Facts are always produced by sources in the infrastructure or business processes. There is no such thing as an "orphaned Fact". Here are a few types of Facts that we can recognize:

  • Binary Fact
  • Value Fact
  • Complex Fact
  • Descriptive Fact

Binary Fact

A Binary Fact, as the name suggests, is always in one of two states: TRUE or FALSE, WORKING or NONWORKING, 1 or 0, and so on.

The use of Binary Facts is limited to indicating a binary state of some Telemetry. For example: "Application A on Host B: STARTED or STOPPED", or "User C on cluster D authentication: SUCCESSFUL or FAILED". If the state of the Telemetry cannot be represented as binary, it is a case for a Value or Complex Fact. Binary Facts can be an outcome of Calculations.

Value Fact

A Value Fact is an indicator of state for a Telemetry item that can be expressed through some value. The large majority of Telemetry items are Value Facts. A Value Fact is characterized by a single value associated with the given Fact, measured or obtained at a specific time. Examples of Value Facts:

  • Load average on host A is 0.25
  • Battery status on UPS B is CHARGING
  • Throughput on interface C of switch D is 10500 kBps

Value Facts can be used to represent a variety of Facts about the observed infrastructure or business process. When defining Value Facts, we must understand that there are different types of values for such Facts:

  • Raw value. The value represents the current state of the metric, as reported by the Telemetry item, without any extra assumptions. For example, if "Answer is 42", we take this value without any further interpretation.
  • Counter. The counter value represents a count of some event, such as the number of packets, or the number of starts or restarts of an application. A counter is always a non-negative integer and normally only increases. Every counter has a "reset timestamp": the timestamp at which the counter was set to 0.
  • Delta. The arithmetic difference between the previous value and the current value. Delta Facts are limited to situations where we do not care about the actual value, but rather about immediate trends.
  • Rate. The number of events counted per timeframe (second, minute, hour, etc.). Rate values apply, though not exclusively, to cases where we care neither about the actual count of events nor about deltas, but rather about how the counter is distributed along the timeline, which allows the observer to estimate various performance and load Metrics. For example: "Number of I/O operations in database is 100/sec".
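
To make the Counter, Delta and Rate value types more concrete, here is a minimal Python sketch. The `Sample` class, the function names and the readings are all hypothetical, and the handling of a counter reset is an assumption rather than a rule taken from any particular monitoring system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One counter reading: a timestamp in seconds and the counter value."""
    timestamp: float
    value: int


def delta(previous: Sample, current: Sample) -> int:
    """Delta: arithmetic difference between the previous and current value."""
    return current.value - previous.value


def rate(previous: Sample, current: Sample) -> float:
    """Rate: events per second between two readings of the same counter."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    if current.value < previous.value:
        # Assumption: a decrease means the counter was reset to 0 in between,
        # so only the events counted since the reset are known.
        counted = current.value
    else:
        counted = current.value - previous.value
    return counted / elapsed


# Hypothetical readings of an I/O operations counter, taken 10 seconds apart.
first = Sample(timestamp=0.0, value=5000)
second = Sample(timestamp=10.0, value=6000)

io_delta = delta(first, second)  # 1000 operations between the two readings
io_rate = rate(first, second)    # 100.0 operations per second
```

The same raw counter feeds both derived values, which is why a Delta or a Rate can be treated as the outcome of a calculation over previously collected Facts.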

As with Binary Facts, a Value Fact can be the outcome of calculations.

Complex Fact

A Value Fact is a variation of the Complex Fact; the only difference is in the number of values. A Value Fact has a single value, while a Complex Fact has multiple values identified by keys. The values of a Complex Fact can be of the same types as for Value Facts. Complex Facts apply to cases where multiple values are collected at the same time during the same probe or measurement. For example: Load average on host A is lavg1 X, lavg5 Y, lavg15 Z.

A Complex Fact can be the result of Computation.
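
One possible way to model Binary, Value and Complex Facts in code is sketched below. The class and field names are hypothetical, and the keys `lavg1`, `lavg5` and `lavg15` assume the 1-, 5- and 15-minute load averages commonly reported on Unix-like systems. The `split` helper illustrates the point that a Complex Fact is a keyed bundle of values that could each be a Value Fact.

```python
from dataclasses import dataclass
from typing import List, Mapping, Union

Scalar = Union[bool, int, float, str]


@dataclass(frozen=True)
class Fact:
    """Common part of every Fact: where it came from, what it is, and when."""
    source: str
    name: str
    timestamp: float


@dataclass(frozen=True)
class BinaryFact(Fact):
    state: bool                      # one of exactly two states


@dataclass(frozen=True)
class ValueFact(Fact):
    value: Scalar                    # a single value


@dataclass(frozen=True)
class ComplexFact(Fact):
    values: Mapping[str, Scalar]     # several values identified by keys


def split(fact: ComplexFact) -> List[ValueFact]:
    """Turn one Complex Fact into one Value Fact per key, same timestamp."""
    return [
        ValueFact(fact.source, f"{fact.name}.{key}", fact.timestamp, value)
        for key, value in fact.values.items()
    ]


# Hypothetical probe result: three load averages collected in one measurement.
load = ComplexFact(
    source="host-a",
    name="load_average",
    timestamp=1700000000.0,
    values={"lavg1": 0.25, "lavg5": 0.40, "lavg15": 0.35},
)
running = BinaryFact("host-b", "application-a.started", 1700000000.0, True)
parts = split(load)
```

Every Fact carries a source and a timestamp, which mirrors the statements that Facts always sit on a timeline and are never "orphaned".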

Descriptive Fact

A common example of Descriptive Facts is Logs. A Descriptive Fact is a special case of the Value Fact where the value is a string originally intended to be interpreted by humans. Several techniques, including parsing and pattern searching and matching, have been developed as Computation functions over the values of Descriptive Facts. These techniques are intended to produce Value Facts, Complex Facts or Binary Facts out of the values of Descriptive Facts.
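
As an illustration of parsing a Descriptive Fact, the sketch below extracts a binary state and a numeric value from a log line with a regular expression. The log format, the field names and the `parse_login` function are invented for this example; real logs will need their own patterns.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical log format, assumed only for this example:
#   <time> auth user=<name> result=<SUCCESS|FAILURE> elapsed_ms=<integer>
LOGIN_PATTERN = re.compile(
    r"^(?P<time>\S+) auth user=(?P<user>\S+) "
    r"result=(?P<result>SUCCESS|FAILURE) elapsed_ms=(?P<elapsed_ms>\d+)$"
)


def parse_login(line: str) -> Optional[Dict[str, Union[str, bool, int]]]:
    """Extract structured values from one log line, or None if it does not match."""
    match = LOGIN_PATTERN.match(line)
    if match is None:
        return None
    return {
        "timestamp": match["time"],
        "user": match["user"],
        # A two-state outcome: material for a Binary Fact.
        "authenticated": match["result"] == "SUCCESS",
        # A single number: material for a Value Fact.
        "elapsed_ms": int(match["elapsed_ms"]),
    }


line = "2024-01-01T10:00:00Z auth user=alice result=FAILURE elapsed_ms=42"
facts = parse_login(line)
```

Taken together, the extracted keys form what the article calls a Complex Fact, while each key on its own could be stored as a Binary or Value Fact.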

This is a rather non-exhaustive list of value types for Value Facts, but I hope it covers a wide variety of the most common use cases. If you think that I've missed some, let me know in the comments.

Do you compute this?

Computation types of telemetry data produce facts, but differ from telemetry Facts in their origin. Whereas Facts are data harvested from your infrastructure and business processes, Computations are performed over collected or computed Facts.

Computation is an integral part of data (Facts) collection. Not all Facts that are required for the analysis can or should be gathered directly. Many of them are the result of computation over already known Facts. Before we discuss the types of Computation, let's talk about the Telemetry Observation Matrix.

All metrics fit into a matrix

The Telemetry Observation Matrix (TOM) is a two- or three-dimensional matrix where the columns represent sources of telemetry, the rows represent specific metrics, and each individual cell holds the telemetry collected or calculated from a specific source for a specific metric.

When you fill the TOM with telemetry collected in about the same timeframe, you get a representative snapshot of the collected and calculated telemetry for your infrastructure or business processes as it exists at that specific moment.

By adding a third dimension to the TOM, we add a timeline.
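
A TOM can be sketched as a nested mapping in which rows are metrics, columns are sources, and each cell holds a list of timestamped samples. The class below is a hypothetical illustration, not a reference implementation. It assumes samples are recorded in time order, and its `snapshot` method returns the two-dimensional slice of the matrix as it looked at a given moment.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


class TelemetryObservationMatrix:
    """Rows are metrics, columns are sources, the third axis is the timeline."""

    def __init__(self) -> None:
        # cells[metric][source] -> samples, assumed to be recorded in time order
        self._cells: Dict[str, Dict[str, List[Sample]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def record(self, metric: str, source: str, timestamp: float, value: float) -> None:
        """Append one sample to the cell for (metric, source)."""
        self._cells[metric][source].append((timestamp, value))

    def snapshot(self, at: float) -> Dict[str, Dict[str, float]]:
        """Two-dimensional view: the latest value per cell at or before `at`."""
        result: Dict[str, Dict[str, float]] = {}
        for metric, row in self._cells.items():
            for source, samples in row.items():
                known = [value for timestamp, value in samples if timestamp <= at]
                if known:
                    result.setdefault(metric, {})[source] = known[-1]
        return result


tom = TelemetryObservationMatrix()
tom.record("load_average", "host-a", 60.0, 0.25)
tom.record("load_average", "host-a", 120.0, 0.75)
tom.record("load_average", "host-b", 60.0, 1.5)
tom.record("memory_used_pct", "host-a", 60.0, 40.0)

snapshot = tom.snapshot(at=90.0)
```

Cells for which no sample exists yet are simply absent from the snapshot, so a sparse matrix needs no special handling.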

Types of Computation

Now, let's discuss the types of Computation.

  • Free-form computation.
  • Pattern searching.
  • Aggregation.

What can you compute?

Free-form computation, as the name suggests, does not limit itself to Telemetry data arranged along any axis of the TOM, or to any other grouping of items. Free-form computation is performed over existing data located anywhere on the timeline, with the computing formula defined by the user. There are too many examples of free-form computation to list, so let me give just one: calculate the average per-process memory utilization of all processes whose name matches the string "java".
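
The "java" example could look like the following sketch. The record layout (`host`, `name`, `memory_mb`) and the sample values are assumptions made for illustration, and "matches" is interpreted here as a simple substring test.

```python
from statistics import mean
from typing import Iterable, Mapping, Optional


def average_memory(processes: Iterable[Mapping], name_part: str) -> Optional[float]:
    """Average memory (MB) of all processes whose name contains `name_part`."""
    matching = [p["memory_mb"] for p in processes if name_part in p["name"]]
    # With no matching process there is nothing to average.
    return mean(matching) if matching else None


# Hypothetical per-process Facts collected from two hosts.
processes = [
    {"host": "host-a", "name": "java", "memory_mb": 512.0},
    {"host": "host-a", "name": "nginx", "memory_mb": 64.0},
    {"host": "host-b", "name": "/usr/bin/java", "memory_mb": 1024.0},
]

java_average = average_memory(processes, "java")  # 768.0
```

Note that the computation ignores which host a process runs on: the selection is driven by the user's formula, not by a row or column of the TOM.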

It is all about patterns.

One of the important analytical tasks is to detect whether a Telemetry data sample matches one or more known patterns. Detection of patterns is a very important aspect of behavioral analysis. For example: did your memory utilization spike and then drop sharply? This could be an indication that some processes were killed due to memory overconsumption. There are a number of ways to detect patterns in data. The tried and true method is to apply statistical analysis to a known data sample. The use of Machine Learning is also becoming more and more popular for this task.
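
A very simple statistical take on the "spike, then sharp drop" pattern is sketched below. The rule used here is only one possible assumption: a sample is a spike if it lies more than `threshold` standard deviations above the mean of the data sample, and the drop is confirmed if the next sample is at or below that mean. The function name and the readings are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def spike_then_drop(values: Sequence[float], threshold: float = 2.0) -> List[int]:
    """Indexes of samples that spike above the norm and are followed by a drop."""
    if len(values) < 3:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # a perfectly flat series contains no spikes
    limit = mu + threshold * sigma
    return [
        i
        for i in range(len(values) - 1)
        if values[i] > limit and values[i + 1] <= mu
    ]


# Hypothetical memory utilization readings, in percent.
memory_pct = [40.0, 42.0, 41.0, 43.0, 95.0, 30.0, 31.0, 32.0]
spikes = spike_then_drop(memory_pct)  # [4]: the 95% reading followed by 30%
```

The result of such a detector is itself a Fact, for example a Binary Fact stating whether the pattern was found in the sample.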

Row, Column, Timeline ...

Aggregation is a process of computation over one of the three dimensions of the TOM.

  • Column Aggregation is a computation performed over the Telemetry stored in a column of the TOM. This brings different Telemetry items from a single source into the computation. A common use case is to aggregate data from related Telemetry in a single source. For example: compute the total utilization of all data partitions on the host.
  • Row Aggregation is a computation performed over the Telemetry stored in a row of the TOM. This brings the same Telemetry item from different sources into the computation. A common use case is to perform a computation over the Telemetry items in a cluster. Example: calculate the difference between the minimum and maximum load average across the members of a cluster.
  • Timeline Aggregation is a computation performed over the values of the same Telemetry item, from the same source, sampled along the timeline axis of the TOM. A common use case for this type of Aggregation is statistical, ML, or other analysis of the Telemetry item over time. Example: calculate the average memory utilization on host X over the last 60 minutes.
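
The three kinds of Aggregation can be sketched over a TOM stored as a nested dictionary, `tom[metric][source]`, holding time-ordered `(timestamp, value)` samples. The layout, the metric names and the helper functions are assumptions made for this example only.

```python
from statistics import mean
from typing import Dict, List, Tuple

Samples = List[Tuple[float, float]]      # (timestamp in seconds, value)
Tom = Dict[str, Dict[str, Samples]]      # tom[metric][source] -> samples


def latest(samples: Samples) -> float:
    """Most recent value of a cell; samples are assumed to be in time order."""
    return samples[-1][1]


def column_aggregate(tom: Tom, source: str, metric_prefix: str) -> float:
    """Column: sum related metrics (same name prefix) of a single source."""
    return sum(
        latest(row[source])
        for metric, row in tom.items()
        if metric.startswith(metric_prefix) and source in row
    )


def row_aggregate(tom: Tom, metric: str) -> float:
    """Row: spread (max - min) of one metric across all sources."""
    values = [latest(samples) for samples in tom[metric].values()]
    return max(values) - min(values)


def timeline_aggregate(tom: Tom, metric: str, source: str,
                       now: float, window_seconds: float) -> float:
    """Timeline: mean of one cell over the window ending at `now`.

    Raises statistics.StatisticsError if the window contains no samples.
    """
    values = [v for ts, v in tom[metric][source] if now - window_seconds <= ts <= now]
    return mean(values)


# Hypothetical matrix contents.
tom: Tom = {
    "disk_used_gb:/data1": {"host-x": [(5400.0, 120.0)]},
    "disk_used_gb:/data2": {"host-x": [(5400.0, 80.0)]},
    "load_average": {"host-x": [(5400.0, 0.25)], "host-y": [(5400.0, 1.75)]},
    "memory_used_pct": {"host-x": [(0.0, 90.0), (1800.0, 40.0), (5400.0, 60.0)]},
}

total_disk = column_aggregate(tom, "host-x", "disk_used_gb:")       # 200.0
load_spread = row_aggregate(tom, "load_average")                     # 1.5
recent_memory = timeline_aggregate(
    tom, "memory_used_pct", "host-x", now=5400.0, window_seconds=3600.0
)                                                                    # 50.0
```

The memory reading taken at timestamp 0 falls outside the 60-minute window and is therefore excluded from the timeline average.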

Outcome of computation

Regardless of what you are computing and which computing method you choose for the job, you will most likely get a Value Fact, Complex Fact or Binary Fact as the outcome of your computation. This Fact becomes a part of your TOM, timestamped with the time at which the computation was performed. Since the outcome of a computation is part of the TOM, other Computations can use it to produce new outcomes, which can in turn be the basis for further computation, and so on.

Computations over known Facts or other Computation outcomes are the heart and soul of Telemetry analysis and observation. If you rely only on data collected from your infrastructure or business processes, your observations are shallow, and you will likely not have a deeper understanding of the processes that you are trying to observe. Your conclusions are only as good as the data you have.

What are your Relations?

The third Telemetry type is the Relation. A Relation establishes a Fact that exists between two Entities. Facts linked to a Relation can be Binary Facts, Value Facts or Complex Facts. In addition to establishing the Fact that the relationship between the entities exists, a Relation establishes that certain Telemetry values and a timestamp have been recorded in association with that relationship. What is the use of the Relation Telemetry type? Whenever you want to analyze fine-grained facts, some of that analysis will be difficult, if possible at all, without establishing the relationship between entities and capturing Telemetry for it. Example: Number of requests from user X to application Y is Z.
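
The "requests from user X to application Y" example can be sketched as follows. The `Relation` class, its field names and the event tuples are hypothetical; the point is only that the recorded value belongs to the pair of entities rather than to either entity alone.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Relation:
    """A Fact attached to a pair of entities rather than to a single source."""
    subject: str      # first entity, e.g. a user
    target: str       # second entity, e.g. an application
    name: str         # what is being measured about the relationship
    timestamp: float  # when the value was recorded
    value: int


def request_relations(events: Iterable[Tuple[float, str, str]],
                      timestamp: float) -> List[Relation]:
    """Count requests per (user, application) pair and emit one Relation each."""
    counts = Counter((user, app) for _, user, app in events)
    return [
        Relation(user, app, "requests", timestamp, count)
        for (user, app), count in sorted(counts.items())
    ]


# Hypothetical request events: (timestamp, user, application).
events = [
    (1.0, "user-x", "app-y"),
    (2.0, "user-x", "app-y"),
    (3.0, "user-z", "app-y"),
]
relations = request_relations(events, timestamp=10.0)
```

Here the value is a plain count, so each Relation carries the equivalent of a Value Fact; a Binary or Complex payload could be attached in the same way.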

Conclusion

In this very short article, I've attempted to bring together information about the various types of Telemetry data that can be exchanged, stored and analyzed. This data is generated by some infrastructure and/or business processes and is usually statically typed. While I do not discount that I may have missed something, I hope that I have covered the most important types of Telemetry data.
