A Conversation with Werner Vogels
JG = Jim Gray, WV = Werner Vogels
http://queue.acm.org/detail.cfm?id=1142065
JG: Can you crystallize what you’ve learned from this? If you had to list three to five lessons learned, what would you say?
WV: The first and foremost lesson is a meta-lesson: If applied, strict service orientation is an excellent technique to achieve isolation; you come to a level of ownership and control that was not seen before. A second lesson is probably that by prohibiting direct database access by clients, you can make scaling and reliability improvements to your service state without involving your clients. Other lessons are related to how you access services: If you want to be able to aggregate services easily, if you want to insert advanced infrastructure techniques such as decentralized request routing or distributed request tracking, you need a single unified service-access mechanism.
Another lesson we’ve learned is that it’s not only the technology side that was improved by using services. The development and operational process has greatly benefited from it as well. The services model has been a key enabler in creating teams that can innovate quickly with a strong customer focus. Each service has a team associated with it, and that team is completely responsible for the service—from scoping out the functionality, to architecting it, to building it, and operating it.
There is another lesson here: Giving developers operational responsibilities has greatly enhanced the quality of the services, both from a customer and a technology point of view. The traditional model is that you take your software to the wall that separates development and operations, and throw it over and then forget about it. Not at Amazon. You build it, you run it. This brings developers into contact with the day-to-day operation of their software. It also brings them into day-to-day contact with the customer. This customer feedback loop is essential for improving the quality of the service.
JG: You spend a fair amount of time recruiting. What’s your recruiting strategy? How do you decide if somebody should work at Amazon?
WV: The Amazon development environment requires engineers and architects to be very independent creative thinkers. We are building things that nobody else has done before, so you need to be able to think outside the box. You need to have a strong sense of ownership, because in the small teams in which you will work at Amazon, your colleagues will count on you to pull your weight—especially when it comes to operating the service that you have built. Can you take responsibility for making this the best it can be?
A very important point is whether candidates think the right way about customers and technology. Technology is useless if not used for the greater good of serving the customer. We are a strongly customer-oriented company, and we often use the “working from the customer backwards” approach. This means that in your thought process, you start with the customer and work your way backwards until you have found the simple and minimal technology that you need to satisfy the new customer requirement. It is important that engineers who come to work at Amazon understand that we do not build technology for technology’s sake, but to support the customer.
JG: You spent time at universities. What do you think about what they’re doing now?
WV: Different groups at Amazon interact with academia. Often a service needs to develop new revolutionary technology from scratch, and they will look at who in the research world worked on these topics before and who can help out.
As an example, at the infrastructure level we are building several systems that are a synthesis of some of the very exciting decentralized computing work that has rocked the operating systems and distributed systems world in the past few years. But we are finding that much of the academic technology is just not complete enough to be applied in real-life systems, as incomplete assumptions were often made. This may have been OK in an academic setting as it allowed for technology evaluation in isolation, but it just doesn’t work in real-world engineering. The limitations of the physical world mean you have to work under certain constraints, regardless of how you would like it to be.
This requires us to do a lot of advanced development to fill in the gaps that these research technologies left behind. As a reward for this extra effort, we do end up with technologies that are extremely robust and are simple to deploy and manage.
JG: Can people in academia help Amazon? What would you say about the current university situation?
WV: I realize that it’s hard in academia to do research at the scale of operation that Amazon requires. So we don’t look to academia to solve those challenges for us. We’re building data sets here at Amazon, however, to provide to academics so that we can get interactions going on some of the issues where they can contribute.
We have a number of internships and sabbatical positions where Ph.D. candidates, as well as professors, can spend some time in a very high-tech production environment. I really urge students to take at least one internship in a nonresearch environment, so that they can start to understand what it means to be effective inventors and how to develop technologies that can be used to build production systems. Doing your research at a research lab is certainly fascinating, but I find that the students who have come to Amazon for an internship find it extremely gratifying to be in the loop of building something real.
I have recently seen a few papers of students detailing experiences with building and operating distributed systems in a planet-lab environment. When analyzing the experiences in these papers, the main point appears to be that engineering distributed systems is an art that Ph.D. students in general do not yet possess. And I don’t mean reasoning about hardcore complex technologies—students are very good at that—but building production-style distributed services requires a whole set of different skills that you will never encounter in a lab. These are skills your professor can’t teach you because he or she never worked outside the lab either. If you really want to learn about building complex robust distributed services, an internship at Amazon will definitely give you that.
The same goes for the few professors who have spent some of their sabbaticals at Amazon. It has been eye-opening for them, and we welcome more of them.