The Relicans

Mandy Moore

Open Evolution – Containers, Observability and Symmetry with Michael Hausenblas

Jonan Scheffler interviews Michael Hausenblas who is a Solution Engineering Lead in the AWS open source observability service team. He also serves as a Cloud Native Ambassador at the CNCF. Together, they chat about open source observability including but not limited to Prometheus/OpenMetrics, Grafana, OpenTelemetry, and OpenSearch.

Should you find a burning need to share your thoughts or rants about the show, please spray them at devrel@newrelic.com. While you're going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you'd like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.

Jonan Scheffler: Hello and welcome back to Observy McObservface, proudly brought to you by New Relic's Developer Relations team, The Relicans. Observy is about observability in something a bit more than the traditional sense. It's often about technology and tools that we use to gain visibility into our systems. But it is also about people because, fundamentally, software is about people. You can think of Observy as something of an observability variety show where we will apply systems thinking and think critically about challenges across our entire industry, and we very much look forward to having you join us. You can find the show notes for this episode along with all of The Relicans podcasts on developer.newrelic.com/podcasts. We're so pleased to have you here this week. Enjoy the show.

Hello and welcome back to Observy McObservface. I'm Jonan. And I'm joined today by my guest, Michael Hausenblas. How are you, Michael?

Michael Hausenblas: Thank you very much, and thank you very much for having me. Excellent. It's late at night here in Ireland, so the lights are almost out. And I'm very much looking forward to having a little chat.

Jonan: I recognize your Irish accent. I would have guessed you're from Ireland, yeah.

Michael: Oh yeah. As you can tell from my last name, Hausenblas.

Jonan: Hausenblas, yeah.

Michael: Truth to be told, I'm originally from Austria, and we moved to Ireland some 11 years ago. Our kids do have a very Irish accent, but unfortunately, or whatever, I don't. The German accent kind of shines through, but that's fine with me. I'm an old person. [laughs]

Jonan: Austria is in the Northern region of Germany, right? If there's one thing I know about Austrians it's that they like to be mistaken for Germans.

Michael: That is absolutely not the case.

Jonan: Okay. [laughs]

Michael: Yes. Thank you. It's a bit like UK and Ireland. You're looking to your bigger neighbor, and you go like, eh, whatever. No. Germany from Denmark down there to Switzerland and Austria, so Austria is geographically and mentally between Germany and Italy. We are a little bit funnier than the Germans, almost as chill as the Italians, so a nice combination of we have a certain strictness and so on. But at the same time, we have this dolce far niente. We can still lean back and just chill. That's a very nice mixture.

Jonan: I love Austria. I got to visit recently for a conference, and I'm trying to remember which one it was. It may have been Euruko. It was held in Vienna.

Michael: Oh, nice.

Jonan: The European Ruby Conference. And it was beautiful. I had a really nice time, yeah. So Austria, Germany, Ireland, and anywhere else in the world that you've lived as well?

Michael: Well, I've lived...it depends. In the late '90s...as I said, I am old. I'm almost 46 now.

Jonan: Oh my goodness.

Michael: So it was '97... Yeah, I know, right? Shocking. In '97, '98, these two years, I used to work at Shell Exploration Labs. It was huge, like 4,000-people labs, and I was there as an intern during my studies. And so two times, three months I was living in the Netherlands, but I don't know if that really counts as properly living there. But yeah.

Jonan: I think that definitely counts. I think most Americans, for us, it's maybe more rare to leave our countries than for those who grew up in Europe. And I think it is entirely reasonable to say for an American that living three months abroad means that you've lived overseas.

Michael: [laughs]

Jonan: And you're now an international citizen. So you worked at Shell. Did they ever let you go on an oil platform?

Michael: No, but I did have a fire training where they would light up something, and you would actually go there, and you would be super nervous. Like, did I do that right? And so on. And then like, "Yay, I made it." And then they would show you that they actually just turned off the gas, so you didn't really do anything.

Jonan: [laughs]

Michael: But you're prepared. You now know how to deal with that. And they paid me a three-month Dutch course. I did learn Dutch in the Netherlands, but that was like 20 odd years ago.

Jonan: Wow.

Michael: Besides, "Yes, please," and "Can I have a beer?" or whatever, not much left.

Jonan: That's enough to make a friend in any country in the world.

Michael: Absolutely.

Jonan: That's what you need. So you have been working at Shell, and then you went off and you worked at Red Hat for a while. But I'm sure it wasn't contiguous.

Michael: The first ten years actually I was in academia, research, applied research, so from 2001 until 2012. I started out as a developer, Java and then C++, and then in 2012, I wrapped up as a research fellow, so postdoc researcher doing all these research things. And then I said, "Okay, that's it. I'm going to change tracks. I'm going to move to an industry. I'm going to try something new." And I started at MapR. It was back then when Hadoop was still the hot, new thing. And MapR is now part of HPE, I believe. It was a startup, a San Jose-based startup, and from then on, yeah. Then Silicon Valley Bay Area back then every two years you move somewhere else and yeah, another startup Mesosphere and then Red Hat. And since 2019, I'm with AWS.

Jonan: I've heard of AWS. I hear that they're the most of the internet. [laughs]

Michael: Yes, we sell books, and we also sell cloud, yes.

Jonan: Yes, cloud, yeah. I'm sure many people have seen this, but I saw The Memo from Jeff Bezos back in the day when this transition to cloud services was first starting where it says, "Any team that's building anything here you are presenting an external-facing API for the rest of the company to use. If you're creating a service, you're not interacting through a database. You are pretending as though other customers can use it." And I'm reading through this whole email. I'm like, "Wow, this is a smart approach." And at the end, it says, "If you don't do this, you're fired." [chuckles] And I was like, okay, that seems a little heavy-handed, but okay.

Michael: [laughs] Absolute motivation there. I do like, and I have always admired, this idea of loose coupling. I think it was just in certain parts for the broader like the mainstream a little bit too early. Because if you think back, what was the first environment when you used in whatever capacity as a developer or whatever? Maybe you were brought up with VMs, or maybe it was really like, "Here's some hardware, some metal." I think that was 2004 or 2005 or whatever, right before or around the time when the AWS was conceptually brought up, and that's more than 16 years. A lot has happened in the meantime. We have a lot of…

Jonan: That's like eight different tech jobs worth of years.

Michael: [chuckles] Right, yeah.

Jonan: Wow, yeah. It's been quite a ride. Hasn't it? As cloud technology has taken over the world. I mean, it does make a lot of sense, the loose coupling approach. I spend a lot of time talking to people in code schools because I came through a code school myself about a decade ago. And it's surprising to me how often it comes up that people decide a convenient means of interaction is for different applications to share a database. I very often will let people understand why that's a poor choice by teaching them load testing, and they're like, "Well, we want these two applications to work together on the same database." And then they learn about database locks real quickly and other things like that. But it's really a world today where that's kind of just table stakes, like those kinds of understanding about cloud and why cloud is valuable. Those who are not on board with the plan for cloud today generally are there because they are forced somehow by historical legacy systems. They're still trapped in on-prem, or they have some privacy concerns or other things like that. It's pretty well accepted today that that is the best way to build software. And it's very interesting to me how quickly that transition happened because, for an entire industry of professionals to agree on a thing as the right approach, especially in something like software, that's pretty rare.

Michael: Right. I'm sometimes a little bit surprised how different levels of offerings in terms of abstraction level have a different acceptance rate or acceptance...like if you overall look… I can't remember when Gmail came out, but I believe 2005 as well or something around that time.

Jonan: That's about right.

Michael: Maybe it's already updated by now. I have no idea.

Jonan: Well, they updated the software, I think in 2014, so nine years sounds about right for Google's update process, yeah. I'm teasing you, Google. [laughs]

Michael: At least it's not yet on the Killed by Google site list. [laughs]

Jonan: Yeah, right?

Michael: So that's already a plus. No, I'm just kidding. So arguably, this is a SaaS or cloud offering that people have been using for a long, long time. I have never, or I very seldom hear company startups or whatever raising concerns around that saying like, "Oh my God." Literally your entire business conversations everything up to…here is a new password to the production server and so on. Everything goes through that, but that's okay. Whereas if I spin up a VM and if I can't control where it runs, if it cannot give me that guarantee that it runs in this country, that's the end of the world. It's like either actually both is bad or both is good. How is that different? I sometimes don't get it. Maybe I'm missing something, but I don't get where...if it's a higher level of abstraction, it's so obvious that this is not a web page, actually. It's like, yeah, sure there is a web page, but actually, this is a cloud service. You might be misleadingly using it because you think of it as mail, but this is really a cloud service where you have things stored, et cetera. And if it's breached and if someone gets into it, it's probably as bad as an open S3 bucket or a VM that someone can get into. So that's the thing that I still don't really get in terms of pushback or in terms of objections towards the cloud. But we have been using SaaS-level things for many, many years, much longer than any infrastructure-related things.

Jonan: Yeah. I mean, if you're working with air-gapped servers, well, I understand that's an inconvenient style of system to put in a cloud [chuckles] when you can't talk to it. But beyond that, I think a lot of the push around that kind of stuff where you see people doing stuff like legally forcing local cloud things…I think China and Russia require you've got to be using the servers there to get into their countries and things like that. And this is, obviously, I think all just to gerrymander the cloud landscape and make sure that it doesn't all end up in the Seattle area, which has plenty of issues for the internet. I get to now click away and accept cookies notification on every single website or a similar legislation that exists. Those pieces are important, though. And I appreciate that people are thinking about the privacy in the case of the GDPR compliance and things.

So you did mention VMs, and I wanted to talk about that. I think it's easy to draw a parallel between the transition to cloud and then the rapid transition that was to VMs and virtualization. We're not building bespoke servers for very long before we go over to the VM plan, and then we have containerization and many different options for containerization. And in the case of containerization specifically, Docker coming onto the scene and kind of changing the landscape. And I don't mean just because they have a cute whale, which they do, it's a cute whale, but because they create this standard of building that thing. They came on, and in a similar way, AWS created a standard for cloud services. The containerization transformation, when did that happen? Like five years ago?

Michael: Yeah, something like that. I would say 2014, 2015? I remember at DockerCon 2015 in San Francisco, where I was at the time working at Mesosphere, and that was, I think, maybe the second...I think it was the second DockerCon, and it was clear how impactful it was. You would see VMware there. You would see a lot of industry heavyweights there, essentially at this relatively, again, relatively small startup back then, and then not a very clear business plan and whatnot. But it was in some hotel off of Market Street somewhere. It wasn't a great location, but it was clear how vibrant, and how big, and how important that thing is. And I think the earliest thing that I recall around that was when I was still working at MapR, the job before, so like Hadoop stuff. And a very senior engineer…and we looked at Docker, and he had a strong background in security from IBM. And he looked at them and said, "But they're all sharing the same kernel. That's insane." [chuckles] And I was like, oh yeah, well, yeah, he has a point. And that's the first time when I was like, well, this has nothing to do...or you should not be comparing that with virtual machines where virtual machines you're essentially isolating yourself. You're building an environment that is isolated where you can run and have multi-tenancy. Whereas with Docker, what you're really doing is you're doing application-level dependency management. It's a bit like if someone knows Python, they know virtualenv where you can carve out part of the system where you can do whatever you like, and you don't touch or poison anything else on your server or machine. And that's the same thing, just on steroids, for any kind of language or any kind of environment. You can just carve out all your dependencies, make sure you have everything with you, and move that around. But it has nothing to do with isolation or that kind of thing.
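
To make the virtualenv comparison concrete, here is a minimal sketch using Python's standard-library `venv` module. The function name `create_isolated_env` and the directory name are illustrative assumptions, not anything from the conversation. It only carves out a separate dependency directory; as Michael points out, that is dependency management, not isolation.

```python
import tempfile
import venv
from pathlib import Path


def create_isolated_env(target: Path) -> Path:
    """Create a virtual environment at `target` and return its config file path."""
    # with_pip=False keeps the sketch offline and quick; a real project would
    # usually want pip available inside the environment.
    venv.create(target, with_pip=False)
    # pyvenv.cfg records which base interpreter the environment was built from.
    return target / "pyvenv.cfg"
```

Calling `create_isolated_env(Path(tempfile.mkdtemp()) / "demo-env")` produces a directory with its own interpreter entry point and its own package location, while the kernel and the rest of the machine stay shared, which is the same trade-off described for containers.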

Jonan: And the isolation pieces we've seen fall down pretty repeatedly over the years with various container escape exploits and things where you are able to get out of those environments and have access to other containers on the system. But I think you're right that the exact value and that in your case, the Python virtualenv…I've got that in the Docker container and right alongside it, I'm running Ruby with chruby, and right along that, I've got a Node container. And they all coexist and use the same syscalls and share that kernel, which very conveniently comes around using eBPF. And we'll get to the observability piece of that in a moment because that's really interesting to me.

Michael: Oh yeah.

Jonan: So we see this huge shift, and somehow Docker ends up being in the middle of it. I think what they created was just momentum in a single direction where a lot of different companies have the containerization technology, some of it proprietary, some of it public. And they were swimming in one place, and I think Go helped in no small part because Go was a burgeoning language people were very excited about. Yeah, I was going to talk maybe a little bit about the Go ecosystem, and then I want to hear your thoughts on what parallels you might draw between that and observability. But before that, maybe let's talk about how we moved from containers to now Kubernetes. We've got a whole nother layer of abstraction there. And when we first met, I said, "So, I understand that you know a lot about Kubernetes." And you said, "Well, I know some things." And I'm looking at a page, Michael, with six books, five of which have Kubernetes on the cover of them. I feel like that might be out of the average developer on the planet, maybe someone who knows more about Kubernetes than others. But please, let's talk about that a little bit, like how Go played into the container transition and then how containers led to Kubernetes.

Michael: Right. And just to add, certain people like to read a lot. I simply like to write a lot, but that doesn't mean that it's all good.

Jonan: [laughs]

Michael: It just means I write a lot. But seriously, the Go thing the aspect that you had there is really interesting. And I make the comparison a lot with the same role that C played in Unix land and then later on Linux land. You have this system language that defines the APIs and everything, all the system tools. The first generation of these are written in C. Nowadays, we see a lot of the good old tooling and so on being rewritten in Rust but from the '70s up until whatever a couple of years ago. And I see the same function or the same role Go is playing for cloud-native stuff. If you look around, there is Nomad, Docker, Kubernetes, a lot of if not all of HashiCorp stuff, a lot of other things are written in Go. And if you think of this Kubernetes as the core kernel or whatever of a cloud-native operating system or whatever, then what C is to Unix and Linux, Go is essentially to this kind of cloud-native environment. And I find it great, and I've done a couple of…in previous roles at Red Hat and also in a couple of...I don't know if you remember Velocity. That was one of O'Reilly’s when they still were doing conferences in this and in other contexts like even Go for sysadmins. So the idea Go is already so established for developers, but what about folks who nowadays would be more maybe SRE style, DevOps, infrastructure operators, who maybe you would use Bash or Python? We can get them over to Go. And the absolute joy and uptake there is like, that's awesome. Let us do all these things. And it's pretty easy to learn. You can learn Go in less than a day, right?

Jonan: Yeah.

Michael: It's very, very straightforward. And then, together with all the ecosystem, it's really great. And coming back to your question or your thoughts in that direction, I think that it was like a natural evolution from containers, Docker on a single host or a single node, to well, what if you have more because you want to scale out or you want to have high availability or whatever? Well, then you end up having multiple nodes, and then someone or something needs to coordinate, needs to say, "Yeah, launch this container over there." "And what if it fails over?" "Well, we restart it somewhere else," that and service discovery, how do you then know where a certain part is to draw traffic to it? And there are a number of these things that initially...I mentioned I used to work at Mesosphere. Things like Mesos is dead, Mesos and Marathon really on top of the frameworks that Docker did. Docker had this broad approach from Docker, the thing that you would use on your desktop to build and push something in a container image, but then also Docker Swarm and then all these Docker container orchestration things. And then Kubernetes came along, and HashiCorp has Nomad and many, many…does other things in containers as well, but you can also use it as a container orchestrator.

So it's like a natural evolution that, in a sense, initially, you had PCs, and when you would start somehow connecting them, well, there's a whole new set of things that you can do with them. There are things like; I don't know, playing that was one of my first things in high school in Novell 4, playing snake in the LAN in the network. Like, that's awesome. You cannot do that with ten people on your machine, a single machine. But if you have a network, then you can do that, and the same here. If you have one Node, yeah, sure, you can do certain things. But if you have many, then, well, there is this need for certain functionality to orchestrate or coordinate or whatever. And that's why these container orchestrators and leading amongst them in terms of a very clean API and an open-source project is Kubernetes. That's kind of natural evolution.
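
The three jobs named here (deciding where a container runs, restarting it elsewhere when a node fails, and finding out where it lives) can be sketched in a few lines. This is a toy model under stated assumptions: the `Cluster` class, its method names, and the "fewest containers wins" placement rule are all hypothetical, and it is not how Kubernetes, Mesos, or Nomad are implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Cluster:
    """Toy orchestrator covering placement, failover, and service discovery."""

    nodes: Dict[str, Set[str]] = field(default_factory=dict)  # node -> containers

    def add_node(self, name: str) -> None:
        self.nodes.setdefault(name, set())

    def launch(self, container: str) -> str:
        """Place the container on the node that currently runs the fewest."""
        if not self.nodes:
            raise RuntimeError("no nodes available")
        # Sorting first makes ties resolve alphabetically, so placement is deterministic.
        target = min(sorted(self.nodes), key=lambda n: len(self.nodes[n]))
        self.nodes[target].add(container)
        return target

    def fail_node(self, name: str) -> Dict[str, str]:
        """Drop a failed node and restart its containers somewhere else."""
        orphans = self.nodes.pop(name)
        return {c: self.launch(c) for c in sorted(orphans)}

    def locate(self, container: str) -> Optional[str]:
        """Service discovery: which node should traffic for this container go to?"""
        for node, containers in self.nodes.items():
            if container in containers:
                return node
        return None
```

With two nodes, launching `web` and `db` spreads them out; calling `fail_node` on the node that hosts `web` moves it to the survivor, and `locate("web")` then returns the new node. None of this is needed on a single machine, which is the point of the evolution described above.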

Jonan: So the snake LAN party I'm fascinated about. I think we should have another snake LAN party someday soon. That's something that needs to exist in, say, at a conference like a multiplayer snake on as old hardware as we can find.

Michael: Oh yeah.

Jonan: The Kubernetes transition what I'm curious...I mean, obviously, I think if we had a very clear understanding of how it was that Docker ended up being the winner for containers and Kubernetes ended up being a winner, we would both be billionaires. But what sorts of things about Kubernetes do you think led it to become such an explosive and popular choice for container orchestration? Because as you noted, there have been other platforms that do this work, and then there were many more, even open-source projects, that people were building. How did Kubernetes take the lead?

Michael: I think there are two factors. And remember, I was pretty much from the beginning...so the first activity around Kubernetes was for me in July 2015 at the 1.0 launch party or launch event in Portland when I was at Mesosphere, and we were sponsoring and being part of that Kubernetes launch. So I know the players and what's going on in the ecosystem quite well from the early days. And I think the combination of having an extremely clean API design where tons of lessons learned like the designers and the creators of Kubernetes being able to draw from Borg and Omega and so on. And so many, many years of oh, that's not a great way to do it, or oh, labels are actually a good thing or many, many lessons learned. If you have that opportunity that you can create something from scratch, and you can draw from these rich experiences, and then you get it right, and Kubernetes definitely got it right. Like, if I compared it with the set and then the really intertwined and not so fun APIs and payloads and whatnot that I had to deal with in the context of Mesos and Marathon, then clearly Kubernetes is a different piece. So that's from the technical part. But then also the very smart move from Google very early on to essentially create an independent and neutral place with CNCF to donate or to move Kubernetes there rather than to keep it. I would guess if Google would have kept the governance of Kubernetes, it might still have been a success. But it would have been one of the many successful projects at Google and not the mainstream neutral, huge project that draws, I don't know, 20,000 or 30,000 people at the KubeCon. And then you have user groups and whatnot in every country of the world, et cetera. So this is really, I think, after the Linux kernel, the second biggest success story around this Linux ecosystem, Linux Foundation.

Jonan: I hear it's catching on, this Kubernetes thing.

Michael: Yeah, getting around.

Jonan: KubeCon is doubling every year. And I think a lot of people, when they look at that from a business owner's perspective, you think why would Google possibly be motivated to take such an explosive and popular technology and release it to the world and open source the whole thing under this foundation that now exists to protect trademarks and open-source projects across the ecosystem and all the software ecosystems really? The model that the CNCF is demonstrating here for cloud-native is, I think, applicable across the board and really a valuable thing for open source long term. But why would Google possibly want to do that? Well, Google wouldn't have had what Kubernetes has become if they hadn't done it. And they continue to derive plenty of value from the ecosystem even existing. If you were looking to run Kubernetes in the cloud early on, Google had some pretty good offerings there because they had had it under development for a long time. So we have this world then today where the CNCF is expanding, and there are many other projects like Prometheus and Grafana I know you were involved in. And with returning to the observability discussion, I think there's a lot changing in observability right now that is pretty exciting. I wonder where you think we are in that similar ecosystem growth curve?

Michael: Funny you should mention it. That's exactly what I...if I look at that and it motivated me to...after two years in the container service team in AWS, I moved to the open-source observability service team. So service team, just for context, that's the folks in AWS that write and operate a service like EKS or Managed Grafana, Managed Prometheus. I would say we are somewhere maybe comparable to 2016, 2015, in container land, so the thing becoming mainstream. Nowadays, if you count the number of observability-related events alone this year, you would have o11y first, and o11y this, and o11y that. O11y being short for observability because there are 11 letters between o and y. These numeral things, whatever they're called, seem to be very popular nowadays, k8s and o11y and so on. So it's becoming mainstream, and the number of startups, the number of projects, everything, whatever indicator you're looking at seems to explode. And it's becoming now that this kind of base layer of okay, how do you build an image? How do you launch a container image as a container in Kubernetes or wherever? Now, that is kind of sorted. Now the focus is moving to well, okay, now I have a containerized microservice, and what do I do without observability? I'm flying blind. I don't know what's going on in the request path. I need some way to know what's going on there. I need metrics. I need traces. I need all these things to actually be able to get actionable insights to troubleshoot, to understand the system's health, to improve my application, and so on.
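
The abbreviation rule behind o11y and k8s (first letter, the count of letters in between, last letter) is small enough to write down. The function name `numeronym` is just an illustrative choice.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) <= 3:
        return word  # too short for the abbreviation to save anything
    return f"{word[0]}{len(word) - 2}{word[-1]}"
```

Applied to `observability` (13 letters) this gives `o11y`, and `kubernetes` (10 letters) gives `k8s`.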

Jonan: Which brings us to eBPF, I guess, which I am fascinated by where we're now able to instrument directly in the kernel. I mean, it's scary enough that we're sharing the kernel in the first place, but now we'll just run some arbitrary code in there. It'll be fine. But they've got this Extended Berkeley Packet Filter which using a VM makes it actually quite safe to do all of this in a very clearly defined API to do this work. And now we are able to watch those syscalls as they happen. So application A makes a network request out, and that's going through the same kernel that application B and application C are going through on the same machine. And we can measure the time it takes for those requests to come back from the kernel, and it means that I don't have to necessarily add that layer of instrumentation into my application. I could get rather a lot of information out of our example Python and JavaScript and Ruby apps running on that server or in that container VM. When those requests come back through, I have a lot of instrumentation without necessarily adding any code to those applications. I mean, I wouldn't even ask if you think that's going to continue to become popular because I think it is.
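
The measurement idea in this description can be modeled without touching the kernel. The sketch below is plain user-space Python, not an eBPF program: it only imitates the bookkeeping that latency tools built on eBPF commonly do, which is to remember a timestamp when a call enters the kernel and subtract it when the call returns. The event tuple shape and the function name `syscall_latencies` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp_ns, pid, kind), kind is "enter" or "exit".
Event = Tuple[int, int, str]


def syscall_latencies(events: Iterable[Event]) -> Dict[int, List[int]]:
    """Pair enter/exit events per process and return latencies in nanoseconds."""
    started: Dict[int, int] = {}  # pid -> timestamp of the call still in flight
    latencies: Dict[int, List[int]] = defaultdict(list)
    for ts, pid, kind in events:
        if kind == "enter":
            started[pid] = ts
        elif kind == "exit" and pid in started:
            # Only exits with a recorded entry can be timed; others are skipped.
            latencies[pid].append(ts - started.pop(pid))
    return dict(latencies)
```

Because every process on the machine passes through the same kernel, one observer like this sees the Python, Ruby, and Node applications alike, and none of them needs instrumentation code of its own. A real tool would key on thread as well as process and would run the pairing logic inside the kernel.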

Michael: Oh yeah, absolutely. I think BPF is the only technology that I know that changed its name twice, once in one direction, then back originally Berkeley Packet Filters and then E as in the extended BPF, and then last year or two, actually, let's drop the E again, it's actually only BPF. I was like, okay, whatever [laughs] I get it. And now it doesn't stand for anything. BPF is just BPF. And on the one hand, this idea of having a very small footprint, virtual machine in the kernel that you can just feed any kind of code with certain restrictions that just run your user code in the kernel, that's super exciting. But for many years to me…I'm a Brendan Gregg fan; looking at what he does there and looking at that, it's like, okay, that's cool sh*t. I want to do that as well. But I didn't quite understand besides these very clear use cases that he had and still has in the context of Netflix performance and whatnot; where would I want to use that as a kind of run-of-the-mill developer or whatever? And then Isovalent came along and things like Cilium in the context of Kubernetes.

And nowadays, like the last half a year, it was clear when one acquaintance and friend after the other started working at Isovalent, I was like, ah yeah, I get what's going on. [laughs] And if you just looked who joined the last couple of months, it's pretty clear what's going on. So yes, eBPF, and they are not the only ones. There are, like, many, many different startups and companies betting more and more. And to me, this has enormous potential. This is not just observability for eBPF. You could imagine, if you think in the context of a service mesh, why not have the functionality that you currently have as a sidecar, for example, Envoy or whatever you're using as the service mesh data plane, implemented directly in the kernel? It's a bunch of BPF programs. Why not? So this is really way beyond the currently obvious use cases or areas like observability and security, networking-related stuff. Super, super exciting. And I'm super bullish about what's going on in this space. The thing that I'm still a little bit looking at because I'm a huge fan of open standards and open specifications, and we do see that in the context of CNCF. If you look at OpenTelemetry, if you look at OpenMetrics, et cetera, I'm not saying the BPF stuff; the ecosystem there is not an open standard. Pretty much everything there is part of the kernel, right?

Jonan: Yeah.

Michael: So that means it's an open standard. But higher-level things, once you start layering things on top of that, I would love to see something like that being an open specification as well so that vendors can compete on the implementation, but there is an interoperable basis there too. You swap out because you're moving from that one vendor or one cloud to the other or whatever is the reason for the changes.

Jonan: I'm a huge fan of OpenTelemetry. I think it's a huge step forward for observability generally. And I think it's driving a lot of the vendors in competing on the implementation discussion that you had touched on there. I think eBPF is still very early to have those layers of abstraction that are going to enable that kind of stuff. But observability suddenly, I feel, is becoming much more of a household name in tech than it was, and I don't just mean the marketing term. We can talk about monitoring or whatever we want to call it, telemetry. But the idea that this is a thing, I think, is partially driven by containers. But I wonder if you have any thoughts on...like, if we draw that same comparison, the Docker containers which grew into this abstraction and Kubernetes, then we have projects like Prometheus and Grafana and OpenTelemetry. The Prometheus polling approach became very popular very quickly. Most anything that is in this ecosystem exposes a Prometheus metrics endpoint. So what is coming in terms of layers of abstractions where we had containers, and that turned into Kubernetes, and now we've got this observability trend and all of these open standards, and that's going to turn into something. Let's make a guess.
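
For readers who have not seen one, a Prometheus metrics endpoint is just an HTTP route that returns current values as plain text, which the Prometheus server polls on a schedule. The following standard-library sketch shows the shape of it. The metric name `demo_http_requests_total`, the `REQUESTS` counters, and the `serve` helper are invented for illustration; production services would normally use an official client library rather than hand-rendering the format.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical in-process counters: request path -> requests served so far.
REQUESTS: Dict[str, int] = {"/": 0, "/checkout": 0}


def render_metrics(counts: Dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP demo_http_requests_total Requests served, by path.",
        "# TYPE demo_http_requests_total counter",
    ]
    for path in sorted(counts):
        lines.append(f'demo_http_requests_total{{path="{path}"}} {counts[path]}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block and serve /metrics; the port number is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Running `serve()` and pointing a scraper at `http://127.0.0.1:8000/metrics` is all the polling model asks of an application, which helps explain why so many projects in this ecosystem expose such an endpoint.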

Michael: Right. Obviously, I'm biased because I'm working upstream in a project called Polly, but let's get to that in a second. Coming from the bottom layer, if you look at...there are the sources, like, I don't know, VPC Flow Logs and your application that you've instrumented and whatnot, your database, and so on. They all emit signals, logs and traces, and metrics, and whatnot. And then you have the agents that scrape or ingest or whatever these signals to certain destinations like you want to look at it in Grafana, or you want to use OpenSearch or wherever you want to consume and get insights there. I heard you folks also have something like that that might be related, so whatever you're using.

Jonan: Whatever it is that this New Relic company does over here. [laughs]

Michael: Whatever you're using, exactly. And I would argue that in the telemetry layer, so getting the signals from the sources to the destinations, we are already in 2021 in a very good place. We have OpenTelemetry and OpenMetrics; unfortunately, logs are still a little bit up in the air, but we have the telemetry layer consolidation around OpenTelemetry as the collector, as the agent, pretty much covered. So by the end of the year, we should be in a really, really good place. So, where do we move from there? Well, what about the rest of the stack or the rest of the solution? Well, there's still a lot of things going on. You don't have a standard way of dashboarding or alerting. Maybe you could argue that the Prometheus Alertmanager is also a standard given what Grafana now did with 8.0, supporting that in the unified alerting. But still, we don't have, and I don't think that we will ever have, a standard for these front ends or destinations. But there are certain things that you can standardize. And again, the basic idea of these standards, of these open specifications, is make it boring. Make it table stakes so that anyone who provides something as a vendor in that space can compete on the implementation and not on different formats, different protocols, different APIs, and so on. And that is the motivation behind Polly, which is kind of like Prometheus mixins reloaded. Think of it just as a way to configure observability systems.

So if you take, for example, Grafana, you have dashboards and alerts in there. And those dashboards consist of certain panels that say, "Draw this graph here or show this number here." But they have certain parameters in there that may vary; for example, if you have a Kubernetes cluster, you have different namespaces than I have. So this is a parameter that varies between our environments. So I'm going to make this part of the dashboard variable, where I say, "Someone supply me that parameter," when they install the dashboard, for example, or import a dashboard. And that is the focus of Polly: provide a framework, provide an end-to-end flow. How are these things published? How are these things consumed? It's based on a language called CUE, which also comes out of Google, around their...I can't remember what they call it. I think it's Borg Configuration Language, BCL, how they configure their jobs. It's essentially the same lessons learned from Borg and Omega going into Kubernetes. Just the same happened here for CUE, and CUE being this very flexible...and sometimes if you squint, it looks a little bit like Go for certain things, very, very powerful way, which is used in a number of things that we found out, like Dagger, for example, the new thing that the Docker co-founder is currently establishing around deployment, and many others in the last half a year. I would say this uptake of CUE is really fascinating. For Polly, this is like an implementation detail, but it's interesting to see how certain things happen in parallel, and CUE is certainly beyond observability. It just happens to be that we in Polly are using CUE as well.
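To make the parameterized-dashboard idea concrete, here is a minimal sketch in Python. It is not Polly's actual format and not the configuration language Michael mentions; the template structure, the `${namespace}` placeholder syntax, and the `install_dashboard` helper are all assumptions made up for illustration. The only point is that the namespace is left open in the published dashboard and supplied by whoever installs it.

```python
import copy

# Hypothetical, simplified dashboard template. The shape loosely resembles a
# dashboard with panels, but it is invented for this illustration.
DASHBOARD_TEMPLATE = {
    "title": "Pod CPU usage",
    # Parameters the installer must supply; the author leaves them open.
    "parameters": ["namespace"],
    "panels": [
        {
            "type": "graph",
            "title": "CPU by pod in ${namespace}",
            "query": (
                "sum(rate(container_cpu_usage_seconds_total"
                '{namespace="${namespace}"}[5m])) by (pod)'
            ),
        }
    ],
}


def install_dashboard(template, **params):
    """Return a concrete dashboard with every declared parameter filled in."""
    missing = [name for name in template["parameters"] if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))

    # Work on a copy so the published template stays reusable.
    dashboard = copy.deepcopy(template)
    del dashboard["parameters"]
    for panel in dashboard["panels"]:
        for field in ("title", "query"):
            for name, value in params.items():
                panel[field] = panel[field].replace("${" + name + "}", value)
    return dashboard


if __name__ == "__main__":
    mine = install_dashboard(DASHBOARD_TEMPLATE, namespace="payments")
    print(mine["panels"][0]["query"])
```

Two people can install the same template with different `namespace` values and get dashboards that differ only in that parameter, which is the variation between environments described above.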

So the next step, to wrap up: what happens next? I guess standardizing; now that we have the telemetry layer more or less standardized, moving on to open specifications in this destination part. So how do we configure? How do we combine? How do we correlate things? If you look at correlation, there are no standards. There are no specifications around. The only thing that comes to mind, which you can argue is a standard, is Exemplars, right? How do you get from a metric in OpenMetrics to a trace? By embedding a trace ID, you can jump to the trace that is relevant for that metric. That's the only thing that is standardized, if you wish, but all the other transitions, how do I get from logs to traces, logs to metrics, and so on, are not. There are many, many transitions possible. And different vendors and different systems provide, each in their own context, different correlation methods, but much of it is still manual. You need to manually, as a human, look at it and then, via a timestamp or by whatever method, align and jump from one to the other. So that's where I'm hoping others might jump in and want to help in standardizing this part because it's useful.
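As a rough illustration of the exemplar idea, the sketch below parses one line of OpenMetrics text exposition that carries an exemplar (the part after ` # `) and pulls out a trace ID. The sample line, the `trace_id` label name (a common convention, not something the format mandates), the `parse_exemplar` and `trace_url` helpers, and the tracing URL are all assumptions for illustration, and the parsing is deliberately simplified (it does not handle spaces or escaped characters inside label values).

```python
import re

# Made-up sample line: a histogram bucket followed by an exemplar.
SAMPLE = (
    'http_request_duration_seconds_bucket{le="0.5"} 129 '
    '# {trace_id="4bf92f3577b34da6a3ce929d0e0e4736"} 0.42 1620000000.000'
)

# Exemplar part: {labels} value [optional timestamp]
EXEMPLAR_RE = re.compile(
    r'^\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)(?:\s+(?P<ts>\S+))?\s*$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_exemplar(line):
    """Return the exemplar attached to a sample line, or None if absent."""
    sample, sep, exemplar = line.partition(" # ")
    if not sep:
        return None
    match = EXEMPLAR_RE.match(exemplar)
    if match is None:
        return None
    ts = match.group("ts")
    return {
        "series": sample.split(" ")[0],
        "labels": dict(LABEL_RE.findall(match.group("labels"))),
        "value": float(match.group("value")),
        "timestamp": float(ts) if ts else None,
    }


def trace_url(exemplar, base="https://tracing.example.com/trace/"):
    """Build a link to the trace; the base URL is a placeholder."""
    return base + exemplar["labels"]["trace_id"]


if __name__ == "__main__":
    ex = parse_exemplar(SAMPLE)
    print(ex["series"], "->", trace_url(ex))
```

The jump from metric to trace is mechanical here because the ID travels with the sample. The other transitions mentioned above (logs to traces, logs to metrics) have no comparable shared convention, which is why they tend to fall back on manual timestamp matching.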

Jonan: It is really useful. And I think it's...I mean, when you're making these kinds of predictions, it's often enough to look at what the vendors are doing to provide added value on top of the existing standards, much like New Relic, at least it was...I mean, I worked there, so I probably was pretty myopic, but one of the first places I saw this cross-application tracing implementation where we've got this New Relic ID and the headers go across a wire, and we just follow the number across the whole thing. Similar technologies exist right now. A huge part of the value of New Relic and similar platforms is that all of the data is in one place, and they do all this correlation work. So if I'm understanding the proposal, I want to make sure that I'm making it clear to myself here. You're talking about the rest of those implementations. We have the OpenMetrics trace ID stuff allowing us to do that cross-application tracing now. And then we have similar needs to correlate a particular log entry to that trace or an error or an alert to tie all of those things together, and then yet the standards don't exist for those different formats of data to tie them in one piece.

Michael: Correct, yeah.

Jonan: So when we have that, if we had that, then on top of that, you also talked about this UI layer, which I found very interesting because we're using React for ours. People write their UI components in React for the dashboards. And then you've got different languages, like Grafana uses PromQL in some cases, but actually, it's more flexible than that now. You can use SQL and different languages in there as well. Do you think there's a world where the actual query language in between comes to be something like we standardize on GraphQL in the observability community? I can't even imagine what that would look like. But do you think that there's some standard way of working at that layer with a UI?

Michael: That's an interesting question I haven't really thought about yet. But it's kind of obvious now that you say it. I would argue that for any kind of time-related things, so if you look at metrics, that's certainly something that happens in a time context. Logs as well; they might not be regular because they happen whenever they happen, but you also have this time dimension. Traces equally. So you could have a...I'm not saying that this must necessarily be PromQL, but you could have a...in contrast to vanilla SQL, which doesn't have a time component per se, a time-related query language that essentially is able to deal with all kinds of signals no matter what it is. I'm not sure how profiles would work in that context, but at least for the three basic signals, logs, metrics, and traces, that would actually be an interesting thing, having a universal time-based query language, which, if you are a fan of the label semantics, you already have to a certain extent. Within Grafana, you have PromQL for metrics. And I would need to lie; I don't know what the...is it LogQL, what Loki uses? I don't know, the other one. But it has the same...you're dealing with labels, and you have this time component, and it looks very, very similar. I don't know if it would be a win to have a standardized language for all signals or if it's better to have different time-based query languages for different signal types. But yeah, I guess we have found some new activity where it's worth it to drill deeper and maybe use the upcoming KubeCon, or events when we are allowed to travel again, to sit together and think about that. That's actually a very nice way to go about that. Because these kinds of conversations always lead to some insights, and it's like, oh, actually, why don't we go about that? Maybe someone is already working on that. If you're working on that, hit me up. I'm interested.
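One way to picture the "labels plus a time component" idea across signal types is the toy sketch below. It is not any existing query language or API; the `Signal` record, the `select` function, and the sample data are all hypothetical. It only shows that the same label matchers and time range can, in principle, be applied uniformly to metrics, logs, and traces.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Set


@dataclass
class Signal:
    kind: str                 # "metric", "log", or "trace"
    timestamp: float          # seconds; every signal type has a time dimension
    labels: Dict[str, str]    # shared label semantics across signal types
    body: Any                 # sample value, log line, trace reference, ...


def select(
    signals: Iterable[Signal],
    matchers: Dict[str, str],
    start: float,
    end: float,
    kinds: Optional[Set[str]] = None,
) -> List[Signal]:
    """Return signals in [start, end] whose labels equal all matchers."""
    return [
        s
        for s in signals
        if start <= s.timestamp <= end
        and (kinds is None or s.kind in kinds)
        and all(s.labels.get(k) == v for k, v in matchers.items())
    ]


# Made-up data for illustration only.
SIGNALS = [
    Signal("metric", 100.0, {"namespace": "shop", "service": "cart"}, 0.93),
    Signal("log", 101.5, {"namespace": "shop", "service": "cart"}, "checkout failed"),
    Signal("trace", 101.6, {"namespace": "shop", "service": "cart"}, "trace 4bf92f35"),
    Signal("log", 250.0, {"namespace": "shop", "service": "cart"}, "checkout ok"),
    Signal("metric", 101.0, {"namespace": "billing", "service": "invoice"}, 0.12),
]

if __name__ == "__main__":
    for s in select(SIGNALS, {"service": "cart"}, start=100.0, end=102.0):
        print(s.kind, s.timestamp, s.body)
```

Whether one language for all signals would actually be a win, as opposed to separate time-based languages per signal type, is exactly the open question raised in the conversation; the sketch takes no position on it.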

Jonan: Yeah, this is what I miss so much about physical events. Well, we've made our prediction. And the last step in every show, the interview question, is what advice you might give someone now having had a long and healthy career in the industry. What advice would you give either yourself or someone else starting out and looking to follow in your footsteps a bit? I think, especially with regards to your success, finding a path that all along has kept you in open source, which I think is quite rare. Maybe you have some advice for people starting out today.

Michael: Right. I think I'm really bad at advising because I would certainly not listen to myself there.

Jonan: [chuckles]

Michael: But I would say stay curious. As long as you're curious, everything else, I guess, follows. I very often in interviews and what not say that I was just at the right time, at the right place. I was lucky, and others would say, "Yeah, that might be true." And I still believe that no matter who you look at, people with hindsight saying, "Yeah, the reason why I'm successful, this is the reason. Here are the steps, and you just need to reproduce that, and then you get to the same position." That's bullshit. That's not true. That's survivorship bias. You cannot say that. What you can say is if you're prepared, if you're working towards something, if you want to be an artist, or if you want to be a developer, or if you want to be whatever and you put some energy and effort into that preparing yourself, then you might be better prepared for when the opportunity comes along or whatever strikes to grab that and make something out of it. Whereas if you're not, then maybe you don't even recognize the opportunity, maybe you go like, "Eh, whatever." But don't underestimate there's a lot of luck. Only 1 out of whatever 50 or 100 startups really makes it. It's not that the others were slackers. No, everyone was working hard, but this one was in a particular lucky situation, and they made it. And the only thing that you can really do is, yeah, put the effort in. But if you are curious, if you ask, "Why?" And it's like, oh, that's a great idea. Why is that the case? Just ask why. Curiosity, I think, is something that just keeps you, no matter how old you are; it keeps you just young in your mind and your worldview.

Jonan: Yeah. And we're very fortunate to work in an industry where that's possible. We get to learn and to be curious our entire lives.

Michael: That's exactly it, yeah.

Jonan: Well, it's been an absolute pleasure talking to you, Michael. Thank you so much for coming on the show. I look forward to seeing you out at a physical event sometime. I want to sit down and plan out our GraphQL takeover of observability [laughter], whatever that ends up being. And I'm going to go check out CUE. I hadn't heard of it, and I'm quite excited that there is an option beyond YAML, to be frank. So, I will go look into that. But thank you again, and I will see you next time.

Michael: Thanks for having me. Cheers.

Jonan: Thank you so much for joining us. We really appreciate it. You can find the show notes for this episode along with all of the rest of The Relicans podcasts on therelicans.com. In fact, most anything The Relicans get up to online will be on that site. We'll see you next week. Take care.
