Amy Tobey, Principal SRE at Equinix, talks to Jonan about leadership anxiety and how to manage it, about using SLOs as durable processes in our businesses that drag our focus back to customers on a regular basis, and about the fact that, as software developers, we can't learn it all because it's impossible.
Instead, Amy says that in order to succeed in the field, we’ve got to pick something that we can dig into and get good at some small corner of it. Then we need to find ourselves a little secure space and work outward from there.
Should you find a burning need to share your thoughts or rants about the show, please spray them at firstname.lastname@example.org. While you’re going to all the trouble of shipping us some bytes, please consider taking a moment to let us know what you’d like to hear on the show in the future. Despite the all-caps flaming you will receive in response, please know that we are sincerely interested in your feedback; we aim to appease. Follow us on the Twitters: @ObservyMcObserv.
Jonan Scheffler: Hello and welcome back to Observy McObservface, proudly brought to you by New Relic's developer relations team, The Relicans. Observy is about observability in something a bit more than the traditional sense. It's often about technology and tools that we use to gain visibility into our systems. But it is also about people because, fundamentally, software is about people. You can think of Observy as something of an observability variety show where we will apply systems thinking and think critically about challenges across our entire industry. And we very much look forward to having you join us. You can find the show notes for this episode along with all of The Relicans podcasts on developer.newrelic.com/podcasts. We're so pleased to have you here this week. Enjoy the show.
Jonan: I am joined today by my guest, Amy Tobey. How are you, Amy?
Amy Tobey: I'm doing well. Thank you.
Jonan: Thank you so much for coming on the podcast. I love your -- in the background here on the call, I'm seeing a bunch of stuffies, and there's the dog from the “this is fine” graphic where everything's burning around him. [Chuckles]
Amy: It’s interesting being a 40-something engineer on Zoom calls all day and having this be the first thing that people see when they meet me. It's not just me but also my array of books and stuffies. Usually I stand so that they can't quite see Deadpool until I move around, and then they're squinting at me. Then there's my Moogle, which is from the Final Fantasy universe. The lion was one of an earlier dog's favorite toys, so I still keep that around. So there's a lot of personal stuff here. I've got Totoro, a Chocobo, and a–
Jonan: You said Totoro like you speak Japanese. Do you speak Japanese?
Amy: [Speaking Japanese]
Jonan: [Speaking Japanese] So that means “a little bit.” In Japanese culture, anytime anyone asks you as a foreigner whether you're capable of doing a thing, the correct response is, “Oh, I can do a tiny bit.” Even if you were fluent in Japanese and had lived there for 20 years, the answer to “Do you speak Japanese?” is “I try; I do a little bit.”
Amy: [Laughs] I've kind of picked up on that, but only after a couple of times where I said I speak Japanese, and they were like, “Oh?” and then they started talking at full speed. And then [Speaking Japanese] [Laughter]
Jonan: Your pronunciation is pretty on point. Well done. I'm a huge fan of Totoro as well. Working remotely is really interesting because I think many of us working in software are experienced with it. It's a harsh transition, though, for a lot of the world. And I know that for a lot of my coworkers here at New Relic, it has been a challenge. We had an in-office culture for certain parts of the business in a big way. What's the hardest part about going remote for you?
Amy: It's always been the same. I've been remote on and off since 2008, when I had my first remote gig. The main thing is I miss the water cooler. And when I go to virtual conferences, I miss the hallway: the unrecorded free space where people just talk and are human together. I feel that's the thing I miss most, and I think that's what hurts our ability to work together the most over time. It's harder to develop relationships and banter and culture when you don't have those places for people to mix without structure.
Jonan: Yeah. I think those casual socializing spaces are really important. The water cooler thing is interesting because the first time I ever heard of this was as a rule at GitHub, though I'm sure other companies have similar rules: you don't talk about business around the water cooler. That is to say, if you're in a hallway or out to lunch with someone, you can't talk about the projects you're working on. If you're talking about a project that someone else on your team or in the company would want to know about, the conversation needs to happen in a way that they can replay when they're remote. The key to remote work is putting all of that in a Slack conversation or a recorded meeting.
Amy: In theory.
Jonan: In theory.
Amy: I was at GitHub for a little while, and I never heard that there, but everybody was conscious of it, because when you were in the office, there was almost nobody else from your team there. I would go in because I only lived an hour away, but it was really rare for anybody else I worked with daily to be around. Most everybody else was scattered all over the world. So really, what you're trying to solve there is that you and I go off to the water cooler, have a discussion, and make some decisions, and then we don't go back and write that down. Almost nobody remembers to do that; it's just a human thing. Then, obviously, we've excluded the rest of the team from that context, that conversation, and that decision. So it's mainly in hybrid remote work where I think that's supercritical. But if your team's at an offsite and they start talking tech, I don't think somebody should step in and say we shouldn't talk about this because there's a rule against it. You should have the conversation and then go back and talk with the rest of your team. The guide there is to make sure you're transparent and include the rest of the group, especially in situations where it's hard to remember to do it.
Jonan: I like the way you talk about this because you're very forgiving of humans in this whole thing like, look, this is just what people do. And I think we have a tendency to make hard and fast rules about things: this is the way it must be done. And you're right that we should be taking a much more iterative approach.
Amy: We have to. I don't believe we have a choice going forward into the future. We'll circle back to a little tech here, or something a little closer to tech. How long have you been doing tech?
Jonan: I've been here about ten years in various spheres.
Amy: I'm at year 21 in this industry. I know everybody's tired of hearing me say that, but I find it useful to draw from that experience. And where I am now, having been in operations, SRE, and DevOps the whole time, is that if I hadn't learned how to just dance with the system, to really relax and let people be people and do their silly people things, I wouldn't have any hair left. It would be all gray and falling out.
Amy: I’d just hate myself, because there's nothing we can do about it. People are going to be people, and human dynamics are human dynamics. And all the rules in the world: I've just never seen them solve the problems. Even worse, and this is what I was saying about going into the future, things ain't getting any simpler. When I started, LAMP was the rage. The first few years of my career were on LAMP, and that's pretty simple. You've got a box; you start up Apache, throw some Perl or PHP in there, and you've got MySQL. You can do it in a day and be up and running. Now, in the work I'm doing at Equinix, when we think about a product we're going to ship, we've got thousands of companies to think about. The complexity space that we work in today is so massive that we have to get to the point where we're enabling the full capacity of humans to think and act independently, because the control structures just aren't going to get us there. We already know that.
Jonan: This is a really good take. I hope to get to a point where I have that perspective; I think to some degree, I do. The part where rules don't really solve problems... I love the expression “dancing with systems,” by the way.
Amy: It's not mine. I'm horrible at remembering names for citations, which is why I would absolutely fail in academia. Somebody will probably add it later; we can look it up for the show notes. But I heard it once, and I was like, this is so perfect. You're at New Relic; I’m at Equinix. We could go to any one of these companies, even some of the startups, and just look at how much stuff there is, the trade-offs we have to make, and the sheer amount of technology we build every day. Then you put the people systems on top of that, and customer systems, and sales, and marketplaces, and communities, and there's just no hope for one person to understand it all. So what do we do? We dance.
Jonan: I love it. I came from a non-traditional background. I was a poker dealer and a car salesman and all these things. I came into tech, and I remember having this inclination to just learn everything. If I can just read enough documentation, I will be even better. And in the beginning, this is true.
Amy: It is.
Jonan: There's a lot to learn, and you do have to level up. But you get to a point where you have diminishing returns on that. It's not about learning more technology or reading more documentation faster. It's about learning about people and how they interact with each other. Speaking of Equinix, I don't think that we've talked much about what Equinix does. Maybe you have something to share.
Amy: Sure. Most people know the name Equinix from the older business: a 20-something-year-old data center company doing colocation, kind of premium. I've been in a lot of data centers over the years, and you walk into some of them, and there's a guy at the door, and it's like going to the convenience store. He lets you in, and there are boxes lying around, servers on the floor, and wires hanging everywhere. Equinix data centers are a little different. When you go into them, there's a mantrap; there are handprint readers; you've got to show your ID. They're always very clean and very well wired, and our connectivity is pretty hard to match. So that's Equinix, which owns, I think, 200-something data centers around the world.
Amy: What's happening is we have this little thing out in the world called cloud, and the world is changing. The world of data center infrastructure is changing. Companies don't want data center engineers on their payroll anymore. This is the world we're going into: businesses have to focus on their core competency and stop picking up extra, undifferentiated businesses. We were talking earlier about how New Relic had some data center space and was looking at the cloud, and there are reasons why a vendor like New Relic would need some on-prem stuff: federal or other government work, or older companies' security requirements that maybe we don't agree with. The customer still needs to fulfill their own rules, so there are reasons for it. But what's frustrating as an engineer is that I don't want to go into the data center anymore. It's cold, it's loud, it's a lot of work. I did that stuff in my youth, and it's cool; I love it. But today, I just want an API. I want to hit an API and have a server pop up somewhere. And earlier in the call, I was like, you know this VM thing? I just don't like it very much: the performance variability, and they used to be unreliable. These days they're pretty reliable; that's all changed. But still, there are things you can do on bare metal that you can't do as well in a VM. So there's a market here, and that's what we're going after with Equinix Metal, which people used to know as Packet.
Jonan: This is a very succinct summary of a solution to a problem a lot of large corporations have: we can talk about cloud all day long, but there are a certain number of problems that just won't be solved there. And I'm excited to hear that Equinix is the company bringing that to market; that's excellent news. So we were talking a little bit about dancing with systems and about maintaining flexibility. We talk on this show a lot about best practices. Here's what you should be doing: you should have this incident commander, and then you should have this scribe, and then you should... We can talk about that all day. The reality for most people who are out there practicing this stuff is that we have 10% of that at best, much of it is broken anyway, and we're just doing what we can to get it to work. So I want to hear your thoughts on that.
Amy: Well, I have a few.
Jonan: A couple, huh?
Amy: [Chuckles] Being in operations most of my career, I've been around incidents the whole time, but it was from 2015 through 2017, when I was at Netflix on the core team, that I really started to get my head around how important this work is, both during incident coordination and afterward, when you're doing the analysis and figuring out how to change the system. On that first part: Netflix isn't exactly begging for money in the streets. The technology arm of that company is a tiny, tiny part of how much money they make and spend. So that team was extremely well funded, and it has the best in the world these days: J. Paul Reed is there, Ryan Kitchens, Jessica DeVita. I’m missing some people that I really like, and I hope they're not mad at me.
So what happens there is that there are so many microservices that nobody can really keep it all in their head. So you have to run in a style where teams operate independently, but when you do that, there's still a question: when a team makes a mistake and misses something, or emergent behavior happens and something weird that nobody could have anticipated occurs, who catches it when it falls through the system? The core team responded to alerts on core metrics for the whole company. How many people are watching, or are able to watch, Netflix at any given time was the only metric we got alerted on. There were a couple of other ones, but they rarely ever alerted. If that started to go sideways, we would get paged, show up, and have to figure out what the heck was going on across thousands of microservices, find the right team, get them online, and get it fixed. That coordination phase is where I really learned it and saw the value in it, because before, it was okay: you get the three or four people who really know how to fix stuff on a call, we sit down and stare at dashboards and terminals together for a few hours, and we always get to a solution, so it didn't seem like a big deal.
But then, when you see that many people having to work together at the size of a Netflix or a Google, or really even a lot of medium-sized businesses out there, there's so much to track with any given incident these days. If an incident gets even a little bit complex and you just have people milling around in a Slack channel, it's going to take a lot longer to get to a solution, and you're going to miss things. You're going to have more mistakes; you're going to have more people going off on wild goose chases. You even have people try things to help and make the situation worse. We've all seen that before, too. So it's not the incident commander’s job to stop those things from happening but to make sure there's common ground among all those people so that it happens less frequently.
Again, it’s back to dancing with the system. What we're there for is to make sure that everybody has that cadence of “this is where we're at,” and that everybody's always on the same page with status updates. That's one of the things I think is really important, and the best part, which is what I tell all the incident commanders, is that those regular status updates aren't really for the team so much, because with the team you just ask, “Hey, where are you at?” and go around to your subject matter experts. The real magic is that when you start posting regular status updates, the leadership team's anxiety plummets. Otherwise, they're mother-henning, watching the channel, trying to help and meddling, and then you get power dynamics involved, and now everything is harder. By providing that regular post of information, now they know what's going on, and good leaders, as long as they've got an idea of what's going on, can trust their people to take care of things. So that's one of the things the incident commander, or the scribe, does.
But to go a little bit into the theory there: nobody funds a scribe; let's just stop pretending. I hear about the Incident Command System all the time in this space. Everybody keeps going, hey, let's look at the Incident Command System. The Incident Command System is amazing. It is also huge. It's meant for massive things like wildfires covering tens of thousands of acres of land, which is a different space, and it's a command-and-control routine, and that doesn't work in the field that we're in. Well, it would work, but it's expensive, and none of us have the funding to do it all the way. So we do these boiled-down versions, and we really have to focus on what the real value is.
Jonan: Yeah, and in a moment I want to come back to that executive anxiety piece. But focusing on the value and the outcome is an important part of all of this in a way that a lot of people don't see upfront. What you said at the beginning about Netflix, that the one metric you care about is whether people are watching or able to watch Netflix: that's what drives us at every step, the customer experience. Keep them at the front.
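Amy's description of the Netflix core team paging on a single company-wide customer metric can be sketched as a simple check. This is a minimal illustration, not Netflix's actual alerting: it assumes a hypothetical per-minute count of active streams, a baseline taken as the median of the same minute in previous weeks, and an arbitrary 20% drop threshold.

```python
from statistics import median
from typing import Sequence


def should_page(current: float, same_slot_history: Sequence[float], max_drop: float = 0.2) -> bool:
    """Return True when a core customer metric falls too far below its baseline.

    current: the metric for the current minute (hypothetical, e.g. active streams).
    same_slot_history: the same minute's value in previous weeks (hypothetical).
    max_drop: fraction below baseline that triggers a page (assumed 20%).
    """
    if not same_slot_history:
        return False  # no history yet, so there is nothing to compare against
    baseline = median(same_slot_history)
    if baseline <= 0:
        return False  # a zero or negative baseline makes the ratio meaningless
    drop = (baseline - current) / baseline
    return drop > max_drop


history = [1000.0, 980.0, 1020.0]   # median baseline is 1000
print(should_page(700.0, history))  # 30% below baseline -> True
print(should_page(950.0, history))  # 5% below baseline -> False
```

Comparing against the same time slot in earlier weeks is one way to account for daily and weekly traffic patterns; a fixed threshold on raw counts would page every night when usage naturally falls.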
Amy: Are we talking about SLOs now?
Jonan: Well, we’re hinting at SLOs.
Jonan: I gather you have some opinions on SLOs. I want to talk very quickly, though, about the point you made about leadership anxiety and managing that. I wish I had learned much earlier in my career how to manage up a little better and understand that if you want your boss to back off with the micromanaging, you should push too much information at your boss: “Hey, here's what I'm doing, here's what I'm doing,” so much that your boss gets bored by the constant updates and looks elsewhere for problems to solve or leadership to apply.
Amy: That works for some personality types, yeah.
Jonan: Yeah, it does, and it varies across the board, but those regular status updates, very calming for leaders. That's a really good point. So yeah, about those SLOs.
Amy: Focus on your customers. If we look around at where success is in our industry, and if we look at perhaps -- there's one company I think is a little bit famous for being customer-obsessed. I'm a little uncomfortable with being obsessed with customers, but I definitely think we should put them front and center as who we're serving when we're building technology. That's what SLOs are really designed to do: not just to say it and have it be part of the culture, which is a good thing to do and we all should, but to actually put a durable process in our businesses that drags our focus back to the customer on a regular basis. If we start to veer in terms of availability and reliability, which is what we're usually measuring with SLOs, we get a signal. We get a ping that says, “Hey, your error budget is disappearing, and you need to go and take care of your customers,” and that's really what it comes down to. The form of it, the actual implementation, I think is going to be unique to every single company out there, just like KPIs. Everybody has to adapt them to the local organization. So every KPI implementation is unique, and every SLO implementation is unique, because every business does its processes differently.
To me, what's important, and why I haven't done it yet in my current gig, is that there's some stuff I want to take care of first to set us up for success. What I'm doing in the meantime is educating my leadership team, ICs, and the people around me about what's coming with SLOs and why we're going to do them, so that when we start spinning up this process, we already have a little bit of buy-in and people already know what it's going to be. Then I can start working with the product team and with engineering leadership to get them ready so that they can own the right pieces of it.
Jonan: This is a really important point in SRE and in work generally. In any business, by the time you have the meeting to announce the thing and talk about the project you want to do, your convincing should already be done. Everyone should have already had their input; they're all on board, and they own it a little bit. The buy-in is there so that when you have the conversation, “Hey, I think we should do this,” --
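The error budget signal Amy describes can be sketched as arithmetic over a measurement window. This is a minimal illustration under assumed inputs, not any particular company's or vendor's implementation: an availability SLO expressed as a target success fraction (99.9% here), request counts for the window, and a hypothetical 25% remaining-budget threshold for the “go take care of your customers” ping.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the window's error budget still unspent (negative once overspent).

    With a 99.9% target, 0.1% of requests in the window are allowed to fail;
    that allowance is the error budget.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic in the window, so none of the budget is spent
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def budget_alert(remaining: float, threshold: float = 0.25) -> bool:
    """Hypothetical policy: ping the team once less than `threshold` of the budget is left."""
    return remaining < threshold


# 1,000,000 requests at a 99.9% target allows about 1,000 failures.
healthy = error_budget_remaining(0.999, 1_000_000, 250)  # about 0.75 of the budget left
burning = error_budget_remaining(0.999, 1_000_000, 900)  # about 0.10 of the budget left
print(round(healthy, 2), budget_alert(healthy))
print(round(burning, 2), budget_alert(burning))
```

In keeping with Amy's point that every SLO implementation is unique, the window length, what counts as a failed request, and the alert threshold are all local decisions.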
Amy: Are we just giving away all of my secrets of how I get stuff done or? [Laughs]
Jonan: [Chuckles] Yeah. We're going to go through them all. I think they're important to share. I want to talk about this customer-obsessed thing too. Just a side note, I know we both speak Japanese. You probably know this expression in Japanese. In English, we say the customer is always right. In Japanese, they say ‘Okyakusama wa kamisama desu,’ which means the customer is God.
Jonan: Yeah, which is a somewhat more intense perspective.
Amy: Can't you also translate that as “the customer is a demon”? [Laughs]
Jonan: So kamisama is like, yeah, an honored spirit kind of role.
Amy: Right. I just like pointing that out for people who try to make it out like it's always some nice god. It's more complex than that.
Jonan: It is. There are some mean gods.
Amy: The customer is going to demand respect, and if you don't give it, there's probably going to be consequences.
Jonan: There's going to be a demon.
Jonan: When I sold cars, my boss told us one time, “Look, you treat your customers like family, but the bottom line is there are some people you don't want in your family.” [Laughs]
Jonan: Yeah, a really good manager had our back when people were just out of line with the way they dealt with us. A lot of people have a lot of anxiety coming into that sales process, and some people were just awful to you, just abusive, and having a manager who had your back was really nice. So we were talking about getting buy-in across the organization and giving away all of your secrets.
Jonan: The part that drove this conversation is that we have these imperfect systems and we dance with them anyway. I wonder if you have general advice for people coming into this industry about getting used to that. As we talked about in the beginning, it was hard for me, and I know it's hard for a lot of people early on, to realize that it's not about more rules; it's about just being human and dancing with the system, and to come to be okay with the fact that the rules aren't going to solve the problem. I feel like there's an anxiety peak a couple of years into tech for a lot of people when they have to realize that. Do you have any advice for people going through that?
Amy: It's hard. It's been a long time since I came into the industry. And you mentioned it: you can't learn it all. So the thing I've been advising young, or I should say early-career, people is to pick a silo, start with it, and grow out from it. Pick data science, or SRE, or DevOps, or software engineering, or product engineering, but pick one of those. Because if you go general, like programming or software engineering broadly, it's such a huge field that we can't learn it all; it's impossible. So you've got to pick something you can actually dig into, get good at some corner of it, find yourself a little secure space, and work outward into the system from there. That's the general advice. Then, going back to the complex-systems way of looking at the world, and this is a little bit more revolutionary, really look at the way we're running companies today. Most corporations out there are still run on the same kinds of patterns that go back to the 1920s. And that sort of worked then; I say sort of because empires and fortunes were built. If we talk about the industrial revolution, millions and millions of cars were built. They sucked for a long time.
Amy: I don't know if you noticed. I used to work on cars when I was younger, and American cars until the late '80s or mid-'90s were problematic. They'd get up to 80,000 or 100,000 miles and just start falling apart, which was great for me because I was poor at the time, and I could just buy them broken and fix them. But I think about that a lot: the evolution throughout my lifetime to where now you can buy just about any brand of car, and it's going to last 100,000 to 200,000 miles. That's table stakes now, so the bar has moved. How did that happen? In tech, we can talk about Kanban, going back to Japanese again, which came out of the Toyota Production System. How did Toyota pull ahead in manufacturing in terms of quality and polish? Part of how they did that is they didn't do the silly stuff that we've been doing in the West for so long. We run on Taylorism, where everything has to be scientific management. There's got to be a number on literally everything. What that does is people just work to the numbers.
Amy: And you don't get the outcomes you want, because they're too complex for the kinds of numbers everybody's measuring against. If you're trying to build a car by numbers... these are such huge, complex systems today, and that's why we have systems like Kanban and --
Jonan: The cord.
Amy: It has a fancy name, and I can never remember it. [Chuckles]
Jonan: I can't either, but there's a cord in the Toyota system that anyone on the production line can pull when they feel like something is off, to halt the line and alert people to come and investigate.
Amy: And so what that's really about is the people at the sharp end. This is a resilience engineering term, or I think it comes out of cognitive systems engineering. One of these days, I'm going to get you all the real, good version of this from some of the folks who really know this stuff.
Jonan: Yes. There are a lot of pedants who are waiting to correct us all, yes.
Amy: The sharp end, right?
Amy: The people building the car, the people digging the ditches, the people writing the software know what's happening for real. The people operating our production systems know what's happening in production for real. Everybody even a single layer removed from that has a more abstract, more incomplete view of what happens in the real world and how the work actually gets done. As you move up in the organization, it gets more and more vague, so we call that the blunt end. You get up to the CEO, and that's as blunt as it gets, because if you're the CEO of a multi-thousand-person company, there is no way... It comes down to information theory again: you can't understand all of it.
For new people coming in, one of the things I want you to think about is that our managers and our leadership teams just don't have a clear picture; just accept that. There's a lot of work to do to get these teams to accept that too, but it starts with somebody coming in and saying, “This is actually what I do every day,” being really clear about what the work really is, and communicating outward about that. This is one of the things I'm really excited about in the growth of DevRel in the world: it's a whole cohort of people whose work is to communicate more about what the work actually is. One of the grassroots changes I can see in how we work on these huge complex systems is recognizing that difference: the people doing the work have an understanding that just isn't available to people higher up, and the people higher up need to understand that they have to go to the people doing the work to really understand what's happening.
Jonan: And read the status updates and write the status updates, and this is really good advice.
Amy: It’s just back to DevOps. Just talk to people, please.
Jonan: Just talk. It's people all the way down. It turns out software is made of people.
Amy: It really is.
Jonan: Well, thank you so much for coming on the show. Where can people find you on the internet if they want to hear more of these thoughts?
Amy: On Twitter, I'm @MissAmyTobey, and that's my primary platform, so look for me there.
Jonan: Thank you again. I look forward to having you back in a year for the entirely improved state of software. I think this is going to be the year where we–
Amy: We’re going to turn that around, the whole industry.
Jonan: This and the Linux desktop. Can’t wait.
Amy: Absolutely. It's time.
Jonan: It's time. Have a great day.
Amy: You too.
Jonan: Take care.
Thank you so much for joining us. We really appreciate it. You can find the show notes for this episode along with all of the rest of The Relicans podcasts on therelicans.com. In fact, most anything The Relicans get up to online will be on that site. Right now, we're running a hackathon in partnership with dev.to called Hack the Planet, where we're giving away $20,000 in cash prizes along with many other fabulous gifts simply for participating. You'll also find news there shortly of FutureStack, our upcoming conference here at New Relic. We would love to have you join us. We'll see you next week. Take care.