Building Data Science Teams
http://radar.oreilly.com/2011/09/building-data-science-teams.html
Everyone wants to build a data-driven organization. It’s a popular phrase and there are plenty of books, journals, and technical blogs on the topic. But what does it really mean to be “data driven”? My definition is:
A data-driven organization acquires, processes, and leverages data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.
I’ve found that the strongest data-driven organizations all live by the motto “if you can’t measure it, you can’t fix it” (a motto I learned from one of the best operations people I’ve worked with). This mindset gives you a fantastic ability to deliver value to your company by:
- Instrumenting and collecting as much data as you can. Whether you’re doing business intelligence or building products, if you don’t collect the data, you can’t use it.
- Measuring in a proactive and timely way. Are your products and strategies succeeding? If you don’t measure the results, how do you know?
- Getting many people to look at data. Any problems that may be present will become obvious more quickly: “with enough eyes all bugs are shallow.”
- Fostering increased curiosity about why the data has changed or is not changing. In a data-driven organization, everyone is thinking about the data.
The Roles of a Data Scientist
- Decision sciences and business intelligence
- One critical aspect of decision-making support is defining, monitoring, and reporting on key metrics.
- Once metrics and reporting are established, the dissemination of data is essential. As tools get more sophisticated, they typically add the ability to annotate and manipulate (e.g., pivot with other data elements) to provide additional insights.
- Data isn’t just the property of an analytics group or senior management. Everyone should have access to as much data as legally possible.
- Building predictive models that can be tested against existing data or data that needs to be acquired.
- One word of caution: people new to data science frequently look for a “silver bullet,” some magic number around which they can build their entire system. If you find it, fantastic, but few are so lucky. The best organizations look for levers that they can lean on to maximize utility, and then move on to find additional levers that increase the value of their business.
- Product and marketing analytics
- Fraud, abuse, risk and security
- Data services and operations
- Data engineering and infrastructure
Organizational and reporting alignment
As vague as that answer is, here are the three lessons I’ve learned:
- If the team is small, its members should sit close to each other. There are many nuances to working with data, and high-speed interaction between team members resolves painful, trivial issues.
- Train people to fish: it only increases your organization’s ability to be data driven. As previously discussed, organizations like Facebook and Zynga have democratized data effectively. As a result, these companies have more people conducting more analysis and looking at key metrics. This kind of access was nearly unheard of as little as five years ago. There is a downside: the increased demands on the infrastructure and the need for training. The infrastructure challenge is largely a technical problem, and one of the easiest ways to manage training is to set up “office hours” and schedule data classes.
- All of the functional areas must stay in regular contact and communication. As the field of data science grows, technology and process innovations will also continue to grow. To keep up to date, it is essential for all of these teams to share their experiences. Even if they are not part of the same reporting structure, there is a common bond of data that ties everyone together.
What Makes a Data Scientist?
The term that seemed to fit best was data scientist: those who use both data and science to create something new.
But how do you find data scientists? Whenever someone asks that question, I refer them back to a more fundamental question: what makes a good data scientist? Here is what I look for:
- Technical expertise: the best data scientists typically have deep expertise in some scientific discipline.
- Curiosity: a desire to go beneath the surface and discover and distill a problem down into a very clear set of hypotheses that can be tested.
- Storytelling: the ability to use data to tell a story and to communicate it effectively.
- Cleverness: the ability to look at a problem in different, creative ways.
These are some examples of training that hone the skills a data scientist needs to be successful:
- Finding rich data sources.
- Working with large volumes of data despite hardware, software, and bandwidth constraints.
- Cleaning the data and making sure that data is consistent.
- Melding multiple datasets together.
- Visualizing that data.
- Building rich tooling that enables others to work with data effectively.
And experiences like my own suggest that the best way to become a data scientist isn’t to be trained as a data scientist, but to do serious, data-intensive work in some other discipline.
Hiring data scientists was such a challenge at every place I’ve worked that I’ve adopted two models for building and training new hires. First, hire people with diverse backgrounds who have histories of playing with data to create something novel. Second, take incredibly bright and creative people right out of college and put them through a very robust internship program.
=Hiring and talent=
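(This part is very good, and it actually has little to do with building data science teams specifically; it applies fairly generally.)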
Many people focus on hiring great data scientists, but they leave out the need for continued intellectual and career growth. These key aspects of growth are what I call talent growth.
Would we be willing to do a startup with you? (Are you a good fit to join?) This is the first question we ask ourselves as a team when we meet to evaluate a candidate. It sums up a number of key criteria:
- Time: If we’re willing to do a startup with you, we’re agreeing that we’d be willing to be locked in a small room with you for long periods of time. The ability to enjoy another person’s company is critical to being able to invest in each other’s growth. (Both sides must commit enough time to working and communicating.)
- Trust: Can we trust you? Will we have to look over your shoulder to make sure you’re doing an A+ job? That may go without saying, but the reverse is also important: will you trust me? If you don’t trust me, we’re both in trouble. (Both sides must trust each other.)
- Communication: Can we communicate with each other quickly and efficiently? If we’re going to spend a tremendous amount of time together and if we need to trust each other, we’ll need to communicate. Over time, we should be able to anticipate each other’s needs in a way that allows us to be highly efficient. (Efficient communication.)
Can you “knock the socks off” of the company in 90 days? (What is commonly called a probation period?)
- Once the first criterion has been met, it’s critical to establish mechanisms to ensure that the candidate will succeed. We do this by setting expectations for the quality of the candidate’s work, and by setting expectations for the velocity of his or her progress.
- First, the “knock the socks off” part: by setting the goal high, we’re asking whether you have the mettle to be part of an elite team. More importantly, it is a way of establishing a handshake for ensuring success. That’s where the 90 days comes in. A new hire won’t come up with something mind-blowing if the team doesn’t bring the new hire up to speed quickly. The team needs to orient new hires around existing systems and processes. Similarly, the new hire needs to make the effort to progress quickly. Does this person ask questions when they get stuck? There are no dumb questions, and toughing it out because you’re too proud or insecure to ask is counterproductive. Can the new hire bring a new system up in a day, or does it take a week or more? It’s important to understand that doing something mind-blowing in 90 days is a team goal, as much as an individual goal. It is essential to pair the new hire with a successful member of the team. Success is shared.
- This criterion sets new hires up for long-term success. Once they’ve passed the first milestone, they’ve done something that others in the company can recognize, and they have the confidence that will lead to future achievements. I’ve seen everyone from interns all the way to seasoned executives meet this criterion. And many of my top people have had multiple successes in their first 90 days.
In four to six years, will you be doing something amazing? (Long-term planning.)
- What does it mean to do something amazing? You might be running the team or the company. You might be doing something in a completely different discipline. You may have started a new company that’s changing the industry. It’s difficult to talk concretely because we’re talking about potential and long-term futures. But we all want success to breed success, and I believe we can recognize the people who will help us to become mutually successful.
- With each new generation of professionals, the number of organizations and even careers has increased. So rather than fight it, embrace the fact that people will leave, so long as they leave to do something amazing. What I’m interested in is the potential: if you have that potential, we all win and we all grow together, whether your biggest successes come with my team or somewhere else.
- Finally, this criterion is mutual. A new hire won’t do something amazing, now or in the future, if the organization he or she works for doesn’t hold up its end of the bargain. The organization must provide a platform and opportunities for the individual to be successful. Throwing a new hire into the deep end and expecting success doesn’t cut it. Similarly, the individual must make the company successful to elevate the platform that he or she will launch from.
Building the LinkedIn Data Science Team
What I found really surprised me. The companies all had fantastic sets of employees who could be considered “data scientists.” However, they were uniformly discouraged. They did first-rate work that they considered critical, but that had very little impact on the organization. They’d finish some analysis or come up with some ideas, and the product managers would say “that’s nice, but it’s not on our roadmap.” As a result, the data scientists developing these ideas were frustrated, and their organizations had trouble capitalizing on what they were capable of doing.
It’s important that our data team wasn’t comprised solely of mathematicians and other “data people.” It’s a fully integrated product group that includes people working in design, web development, engineering, product marketing, and operations. The silos that have traditionally separated data people from engineering, from design, and from marketing, don’t work when you’re building data products.
Interaction between the data science teams and the rest of corporate culture is another key factor.
But it’s a mistake to treat data science teams like any old product group. (It is probably a mistake to treat any old product group like any old product group, but that’s another issue.) To build teams that create great data products, you have to find people with the skills and the curiosity to ask the big questions. You have to build cross-disciplinary groups with people who are comfortable creating together, who trust each other, and who are willing to help each other be amazing. It’s not easy, but if it were easy, it wouldn’t be as much fun.