The Relicans

Mandy Moore

Resilient Elasticity: Green is Not Enough with Jay Gordon

Jonan Scheffler interviews Microsoft Cloud Advocate Jay Gordon about how important it is to maintain your reputation as a business, that monitoring and observability are both integral parts of how we keep things online and running nowadays, and how serverless is the future and how it’s going to eventually make problems like hosting easier for devs.

Should you find a burning need to share your thoughts or rants about the show, please spray them at While you’re going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you’d like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.

Jonan Scheffler: Hello and welcome back to Observy McObservface, proudly brought to you by New Relic's developer relations team, The Relicans. Observy is about observability in something a bit more than the traditional sense. It's often about technology and tools that we use to gain visibility into our systems. But it is also about people because, fundamentally, software is about people. You can think of Observy as something of an observability variety show where we will apply systems thinking and think critically about challenges across our entire industry. And we very much look forward to having you join us. You can find the show notes for this episode along with all of The Relicans podcasts on We're so pleased to have you here this week. Enjoy the show.

Welcome back to Observy McObservface. I am Jonan Scheffler. I work at New Relic. I want to remind everyone before we kick off here that New Relic has a big user conference coming up called FutureStack. And if you are interested in attending, and you should absolutely be interested in attending because our entire DevRel team, The Relicans, is going to be there speaking. You should stop by and sign up. You have until May 25th to get on it, so I recommend you go today. So, I am joined today by my guest, Jay Gordon. How are you, Jay?

Jay Gordon: I'm doing good. How are you?

Jonan: I'm hanging in there. I am anxiously awaiting a post-pandemic world. I understand you recently got a vaccine.

Jay: Yeah. Luckily, here in New York City, where I'm based out of, there's been a really great distribution of the vaccine. And I think we're on a path to some sort of positive post-Covid-19 world. And I really do believe in the resiliency of this City and something I hold pretty near and dear to my heart. And so I'm very thankful that I got my second dose, and it will be all set just in time for me to go to my first Yankees game of 2021.

Jonan: That's awesome. I'm really happy for you. I'm out here in Oregon. We have our anarchist jurisdiction out here lagging behind a little bit. We had our vaccine distribution cut significantly. We also have some issues as a state getting the things rolled out, but it's coming. In the next couple of months, vaccines are supposed to be available to everyone. My parents were able to get their first round recently, and it's nice to be on the other side. I agree with you about New York, by the way; New Yorkers are nothing if not resilient. For all of the conversations I've had in my life about East Coast, West Coast communication, I think there is something to that. Like, the perspective of people on the coasts is definitely different, but most of the time, especially in tech, we find enough common ground. But New Yorkers, I think resilient is a good way to describe them. That’s great. I'm glad you're going back to the Yankees’ games.

Jay: Thank you. I'm terminally East Coast. I don't know how I'm ever going to leave the New York City area. I've tried a number of times, and the closest I got to leaving was going to New Jersey. [laughter] So I've been here in New York City since I was a teenager. I moved here very, very young, and I can't imagine my life anywhere else at this point.

Jonan: I feel kind of the same about Oregon, actually. Like, having been born in Colorado, I moved out here when I was quite young, maybe when I was about 10 years old, but I'm definitely Oregon-people. I would like to live overseas again at some point, but I expect that I'll settle somewhere with lots of trees; that's my jam. So, let's talk about some nerd stuff. I understand you've been doing this computer thing for quite some time. You and I are about the same age, actually, exactly the same age. And you came into software about the time that I also finished school, which was right around the dot-com crash, the big bubble bursting, right?

Jay: Yeah. I was a college dropout, and one of the big reasons is that I came from a place where there wasn't a lot of money, period. There was just not a lot of money in my home. And so I was obligated to go find full-time work at a very young age. And so around the time I was maybe 20, I had started working as, I guess, what you would call a sysadmin and working in operations for different companies, some very, very small, till around 2002 when I found myself at a company out of New Jersey called Datapipe where I really did make my bones, if you will, or get things started. And I'm very thankful for having some longevity in this world because it's one of those industries that it's very easy to get burnt out on. And I've definitely had moments where I wanted to kind of say, “Hey, maybe I should have just been a barber or a butcher.” But I don't know if I would have the experiences I'm having now if I would have ended up doing those types of things.

Jonan: Yeah, definitely not. I was having a conversation with a friend the other day about this, someone who's just coming into software. I talk to a lot of code school graduates because I came out of a code school myself. And they're all so excited to be here. They've just made this big change. They were working in bars, you know, a lot of people right now are from the restaurant and bar industry because they took such a big hit right now. So I think of the friend who was a bartender saying to me, “I can't wait to get into tech. It's going to be such an exciting journey to see where I can take this.” And I'm thinking, having now been here for 10, 11, 12 years, there have definitely been times I wanted to walk because it's got a lot of problems. But for all of that, we certainly have opportunity. I feel very fortunate for that.

Jay: The interesting part about that opportunity is that for a long time, it was very, very gated. There was only a real certain type of archetype, a person that was in technology. And there was a term that I was actually talking about with someone last night that I really believe was overly popularized in the time that I got into tech, this kind of character, and that was the Bastard Operator From Hell, this person that believed that their thoughts, opinions, and viewpoints on technology were just infallible. They could not be questioned. And that way of going about your business is just not something you can do now.

Jonan: Yeah. And I'm so excited to see it. When I get in these moments, I have a strategy, a couple of strategies actually, but one of them is to just look back at what the industry looked like when I got here and how much progress we've made. And it's true that opportunity is not spread evenly today, not even close. But we have made progress, and I can see a world where that progress continues to march forward. That like, I guess, rude genius in the corner that was tolerated in the industry for so long, what people didn't see I think at the time is that it doesn't matter if you have your 10X engineer in the corner, even though that's not really a thing, if they're a jerk to everyone, they're going to push far more than their productivity out the door just by being a jerk.

Jay: The average 10X rockstar ninja person is more than likely just overworked, and they are under some sort of poor management that allows them to be the dominant voice and person. I like to always go back to The Phoenix Project and Brent. We always need fewer Brents, not that Brent was a terrible person as a character, but the work that he had to take on in that book was terrible. And if you've not read the book, I recommend you do because I think that a lot of what we talk about in modern tech, really the DevOps movement, agile, they all kind of lend themselves, or I should say they allow themselves to kind of grab some of the big ideas that came from that book. And the thing that stuck with me the most was always that need to go through a single person to get anything really accomplished.

And I mentioned this to you before we started recording; you know, I worked at a place for years that we did not have any real way of triaging support issues. I was in a Linux department or a Unix department, as we called it. The phone calls would go into the main line, and the people were offered three choices, and it wasn't by a punch down, you know, dial tone choice. It was literally talking to an operator, and they said, “Do you want Windows, Unix, or sales?” And if it went to the Unix team, I was just transferred a call, and I had no clue what it was in reference to. It was the luck of the draw; pick it up and see what was wrong. And so in real-time, I was dealing with real problems that on the other side -- and we're talking about in the early 2000s where you couldn't just spin up a new VM and say, “Okay, things are fixed.” It meant sometimes having to look real-time into systems, log in via SSH. I was in Windows for a little bit, so I had to go into the server, go into IIS or, in this case, go onto machines and troubleshoot massive email like Qmail exploits or things like that where people had expectations of what their technology solutions were supposed to look like. But what they didn't realize is that we were in an environment at that time where things were just kind of Wild West. You had no clue how certain intrusions would happen. There were so many ways to get into things.

And I think back to the days of PHP-Nuke and all these frameworks that existed that allowed people to easily create a CMS. A blog, at that time, was just a personal website. And the problem with them is that when you cowboy your technology solutions, you find out really quickly where your gaps are because they present themselves. They say to you, “Hey, guess what? You've got 2 million emails outbound in your queue right now. Do you know why?” “Me neither,” but we have to spend time figuring it out. And so early tools around monitoring were very, very black and white, especially, and we were super reactive as opposed to proactive to these types of things. The big things that I can recall is a company I worked for had a service that they had inside of the company that someone built, and it was called what's down? And it was like a combination of Nagios and some front-end scripts and stuff like that. But literally, it was just like, what's down? And once in a while, there would need to be manual intervention. Someone would have to, in the data center, make changes, find servers, replace drives. And while all this was going on, when it shows it's down to us, it's probably been worse for the user or the customer way beforehand.

And so the thing that I really appreciate about modern tooling around say, monitoring, observability, whatever you want to call them, they're both important parts of how we keep things online nowadays, is the fact that we look for practical ways of measuring uptime and failure as opposed to is it just green? And I think we know nowadays that something just being green isn't necessarily enough. How much time is it taking for an HTTP call or HTTPS call to happen for someone to go and retrieve data from a database and then modify that data into something that's presentable? We actually can stick tools in the middle and actually take a look and say, “You know what? This particular HTTP call is taking too much time, and we need to decide if it's negatively impacting our business if we're missing out on sales of a particular product because someone is sitting in looking at a call that's just taking too long.” Ultimately, we talk about reputation and how your reputation as a business can be damaged because of these small, extra gaps in time that happened between the initial click and what happens after that.
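As a rough illustration of the "green is not enough" point, here is a minimal Python sketch that grades a single probe on both its HTTP status and how long it took. The `CheckResult` type, the `classify` function, the 500 ms budget, and the choice to treat only 5xx responses as "down" are all assumptions made for this example; none of them come from the conversation or from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    status_code: int   # HTTP status returned by the endpoint
    elapsed_ms: float  # wall-clock time for the whole request


def classify(result: CheckResult, budget_ms: float = 500.0) -> str:
    """Return 'down', 'degraded', or 'healthy' for one probe.

    budget_ms is a hypothetical latency budget, not a recommended value.
    """
    if result.status_code >= 500:
        return "down"
    if result.elapsed_ms > budget_ms:
        # A binary up/down check would still show this probe as green.
        return "degraded"
    return "healthy"


if __name__ == "__main__":
    for probe in (CheckResult(200, 120.0),
                  CheckResult(200, 2300.0),
                  CheckResult(503, 80.0)):
        print(probe, "->", classify(probe))
```

The second probe is the interesting one: it returns 200, so an on/off dashboard would call it fine, yet it is well over the assumed budget and is reported as degraded.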

Jonan: Yeah, this damage to the reputation piece is interesting. I think it's representative of the fact that there were grades. You talked a little bit about green is not enough. This binary thinking that existed in the system was called what's down? As in, it's either on, or it's off, and when it's off, then we go and fix it. And it's very much this reactive world. And today, we're able to recognize not even just is this one HTTP request taking too long when we communicate with this API? But hey, we used to check the Twitter API, and it took us this 50 milliseconds, and now it's taking us 150, and it's growing. What's going on there? We can investigate before it gets to timeout level, and we stop actually getting the tweets that we need for our application to function, and we actually have that business impact. Right now, it's slow. And there have been lots of studies done about if a thing takes too long, the user goes away like shopping cart abandonment with things. You get too many spinners on the website.
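The 50-to-150-millisecond example above could be sketched roughly like this in Python: compare the median of recent latency samples against a known baseline and warn once the ratio crosses a threshold, well before anything times out. The function name `latency_drift` and the `warn_factor` of 2.0 are hypothetical choices for illustration only.

```python
from statistics import median


def latency_drift(samples_ms, baseline_ms, warn_factor=2.0):
    """Compare the median of recent samples with a baseline latency.

    Returns (ratio, warn), where ratio is median / baseline and warn is
    True once the ratio reaches warn_factor (an assumed threshold).
    """
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if baseline_ms <= 0:
        raise ValueError("baseline must be positive")
    ratio = median(samples_ms) / baseline_ms
    return ratio, ratio >= warn_factor


if __name__ == "__main__":
    # Calls that used to take about 50 ms now take about 150 ms.
    print(latency_drift([140, 150, 160], baseline_ms=50))
    # Calls that are still close to the baseline.
    print(latency_drift([48, 50, 55], baseline_ms=50))
```

Using the median rather than the mean is a design choice that keeps one slow outlier from triggering the warning; a real system would more likely track percentiles over a sliding window.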

Jay: Yeah, there's a greater impact nowadays. I just wanted to make sure that I said that one of the big things that you get in real-time now is a response, not just from your direct customer that's actually purchasing services whether you're a big cloud provider or you're someone small. You're actually hearing it from the people who are trying to use these services in real-time on places like Twitter. So you're going to find out real quick when someone thinks your product or your website or your application is not working the way they assumed it would. And like I said, it all goes back to the reputation that your business has, and your application has.

I'm going to talk actually tomorrow with Ana Medina of Gremlin who's just one of the best people I know. And one of the things that she helps people understand is that we're going to fail. It's going to happen at some point. It's how you prepare for the unexpected that really shows how resilient of not just a website you have or an app but how resilient as a business you are and how you can proactively prepare for the worst. And I think that if we’ve learned anything in the last 12 to 13 months, is that all good plans can go right into the toilet. I know I had plans to go to places all over the world last year; it certainly didn't happen. So, what did I have to do? I had to go to my failure mitigation plan, and I had to say, “How am I going to do this job?” Because I work in DevRel as well for Microsoft doing DevRel around DevOps and fundamentals on Azure and I had to start saying, “How am I going to reach people? How am I going to communicate? How am I going to effectively get people to understand what it is I want them to do?” And so I had to, on the fly, make a decision on how to do this, how to present it, how to make things work for not just me, but for the people who are going to ingest the content. And so I really believe in testing out your pressure points now and finding the pain before the pain finds you. And I believe in that not just in our applications, but I believe in that in our day-to-day lives. We have to sometimes do some chaos engineering on ourselves.

So it's been a very interesting trip through technology over the last 20 something years because I've watched all these different waves of technology movements and their proliferation across our business. I remember when Infrastructure as Code really just took off and how we all made the assumptions that the world was going to be run by Chef and Puppet or CFEngine or something for the rest of our lives. And then we realized things move on, things change. People need new ways of doing things. We need maybe simpler tools. Maybe we need ways to integrate other tools and then, eventually, the cloud. And the cloud really, I think, made us all rethink how we were doing things like monitoring, observability, whatever you want to call it. We all had to sit down and think there's no longer just one place to get your information. You have to go across a number of different services. You have to recognize the impact of a failure across those different services at different points and make decisions on how to, like I said, mitigate that and move on.

Jonan: Yeah. And we had to change our thinking around the tooling that we had built already. There was this huge shift where suddenly -- like, the answer used to be well, hey, when your customer starts using too much of their server, you get them a bigger server. And now the answer is we'll get them 100, just spin up some more VMs, get them more containers. And we're now in this world where we have technologies like Kubernetes. You've got to adapt your observability strategy to address a much more complex world than we had in many ways. But we also have an advantage there in that we can watch these things evolve, and we can respond in real-time, and we get feedback in real-time through channels like Twitter. Unfortunately, in these public forums, if your stuff is slow or broken, you're going to find out real quick, and you have to respond quickly if you want to manage the reputation. Because when you were back in that call center, it was an individual customer having a problem, and yeah, they may talk to 10 people about it, but they weren't tweeting about it.

Jay: No, they would have to go to some -- I remember the early days of people having to go to their users on other forums or Usenet or things like that to be able to scream out how frustrated they were. And sometimes they would be frustrated with the company I worked for, sometimes network outages occur, a line card would die in a switch, and we'd have to replace the switch. They would be frustrated if we didn't have replacement RAM for a server that they had for seven years or something stupid like that. We're dealing with much different problems now. And I think that one of the things that really changed everything, at least for me, was how elasticity really impacted the way we work in technology, the availability of resources on a dime and the ability to destroy them when you need to and how that impacted our overall costs. I think that that was a really huge innovation for us.

And I was working for that hosting company when they were one of the very first Managed Service Providers for AWS. And I thought, wow, why would they want to do this? They can just get a bigger server with us. And I didn't grok how important elasticity would become until I actually was in a situation where it was so important, and that was when I was working at BuzzFeed. And working at BuzzFeed, elasticity ended up becoming so critical because we had one of our largest nights of traffic ever while I was there. I was on call that night, and there was that dress that no one knew what color it was. And it was funny to me how these little innocuous things could require so many different resources to keep everything online. So we're talking about a Mongo server for collecting the votes. We're talking about a CMS on the front end so that people could actually write posts about this thing, and they couldn't even accomplish that at first because the resources weren't really available. And what we had to do was on the fly say, “Well, let's make sure auto-scaling for this is enabled. Let's make sure that we are able to migrate from X size RDS instance to Y size RDS instance without it overly impacting us. How are we going to create larger Memcached servers?” These were things that, prior to working at a place like BuzzFeed, were secondary thoughts to me. And now all these things that I'm talking about, this is every day, matter of fact, that's old. You don't hear people talking about “Yeah, I’ve spun up a Memcached server just to make sure that the…” No, I mean, you just don't hear about that a lot nowadays because there are other easier ways of accomplishing our goals with keeping things online and making it fast. And so, watching a revolution happen and watching companies build tools to take part in that revolution. I remember New Relic, and I don't want to just be, y’all, company man for you on your product, but I remember when New Relic was still fresh and new, and APM was something that people didn't really talk about a lot. And now, I dare you to find a major company with a website that isn't using some sort of APM tooling or some sort of observability tool to ensure that what they have is going to stay online and be reliable.

Jonan: Yeah. I mean, the idea would be absurd if you were starting out a company and you just didn't install these things from the beginning to make sure that you could grow and scale. I think that the industry around that observability piece has shifted a lot with this transition to our new world, where this elasticity has become so important. But I'm curious to hear what you think might be coming. You talked a little bit about this world where replacing the server was not an option, and now you can do it with a click. And now we have these containers, and we have Kubernetes. So if I was to have you back in a year, what do you think will have changed? Maybe in a year is too soon, let's say a couple of years, two to five-year scale. What do you think is coming for us? Which parts of the movements you see happening now do you think are important and will perpetuate?

Jay: Serverless. It was originally kind of a joke where people are like, “You need servers. It's not truly serverless.” But I've watched how Microsoft Azure Functions have been so huge for us and how it becomes integrated into all these other more complex services to reduce overhead for people. And I believe that ultimately, we've gone into Database as a Service; we've gone into Containers as a Service and being able to bring up these things on the fly. And I think that eventually, startup time, boot time, whatever you want to call it, is going to just become intolerable.

I always still like to call it hosting, and I think it's important to just still refer to it as hosting. I think we're going to find ways to make hosting look easier for developers and ultimately reduce how much people are going to need to configure. I don't know if you're going to be sitting down and writing as much YAML in a couple of years. I don't know if you're going to be so concerned about container orchestration in a few years. And I think it's the same way I'm not really thinking about Xen hosts or KVM hosts anymore, you know why? Sure, six, seven years ago, just top of mind, always. How am I going to understand how virtualization works? We don't even think about it now. We just say, “Yeah, we need a VM.” And the fact that you can run a VM on your local computer and simulate a lot of the same experiences with virtual machines or even Minikube or something like that to be able to do local development on these things. And so I think that at the same time, we're going to see serverless functions and things like that become more important. And it's always going to be the reason why I think it's always been this way is the developers are going to lead the charge. And this is going to mean that people in the world of operations are going to have to continue to evolve. There was a way to skin a cat for the longest time, and now we're not dealing with a cat. We're dealing with a thousand kittens all at the same time. So I think that to wrangle all those kittens, we're going to just stop having to need one giant litter box. We're going to have to figure out a way to present things faster, easier, and they don't stink quite as much. I love my analogies if you can't tell.

Jonan: I really liked that analogy, actually. I think you're right that the tooling and the modernization that's happened -- I'm thinking about this local development environment you mentioned. I very rarely run anything outside of a container for local development. Like, when I'm working with a Postgres database that's in a container, I've got a volume, so my data persists. And when I need to upgrade, I'm not even thinking about what random files has this installation of whatever software changed on my computer that I need to go find and delete and get ready to upgrade? I just blow the container away, and there's a new container available, right? Because that's the reality, we have today. And that level of abstraction lets me move faster and do the thing that I like, which is shipping. I want to deliver software; that's why I work in software. And I feel like that's the one common thread that's going to drive us to keep innovating with these things. I'm excited to see a world where that perpetuates and where technologies like serverless continue to play an important role. I want to ask you, too, we talked a little bit about these days when you were on the phone, the blue dress effect, the butterfly effect that happened when you were at BuzzFeed where this meme comes about, and then wow, everyone's up and on-call and responding to this thing. At whatever point in your career you choose, there are people out there today listening to this who feel like they're in about that place. What advice would you have for yourself at some point in the past?

Jay: I used to do a podcast called On-Call Nightmares. It's since been sunset. But when I did that podcast, I got a lot of time to reflect on hearing other people's nightmares that they dealt with in their technology careers, and I got to reflect on some of mine. And I think if I had time to sit in and find 2005 Jay and have a real conversation, one, I would tell him, “Be a little bit more quiet,” in general. The other thing I would tell him is that “The worst is not necessarily the absolute worst. You don't need to get too upset over what you think is tragic and brutal as far as dealing with outages or dealing with failure in technology,” because I would take things to heart and I'd get upset. Nowadays, I've come to learn that failure is as much a part of our lives as breathing oxygen. We're going to fail. It is impossible to live a life that is completely devoid of failure. And so I would tell someone else right now, I think it's just as important, that “Bad things will happen, but you will figure them out, and you will persevere because that's the only way that you will continue to have a career that's meaningful, not just for who you work for, but for yourself.”

Jonan: Yeah. That's, I think, the resiliency we started the episode off discussing. The resiliency of New Yorkers, I actually think, is something that human beings have in common generally. We encounter a lot of difficult things. We fail a lot. We usually find our way around those failures. That's part of what it means to be human.

Jay: How you get up after you trip and fall is an important story to be able to tell people.

Jonan: Yeah, it has been a really nice conversation. I really appreciate you coming on the show with us today, Jay. If people wanted to follow up with you, track some of your work online, where would they go?

Jay: Sure. You can find me on Twitter. It’s pretty much the easiest place, @jaydestro, J-A-Y-D-E-S-T-R-O. And I do a weekly live stream for Microsoft Azure; it's called Azure Fun Bytes. Every week, I have a guest, we sit down for an hour, and we learn about the fundamentals and the products that make up Azure and also the complementary products as well. So I hope to see someone like yourself join me and help people understand New Relic's tooling nowadays on Azure.

Jonan: I would love that. Sounds fun. Thank you again, Jay. I hope you have a wonderful day.

Jay: You do the same. Thanks a lot for having me on, Jonan.

Jonan: I want to remind you all that New Relic and The Relicans are going to be at our upcoming conference, FutureStack coming up on May 24th. You can stop by and read about it. We would love to have you there. I hope you have a wonderful day.

Thank you so much for joining us. We really appreciate it. You can find the show notes for this episode along with all of the rest of The Relicans podcasts on In fact, most anything The Relicans get up to online will be on that site. You'll also find news there of FutureStack, our upcoming conference here at New Relic. We would love to have you join us. We'll see you next week. Take care.
