What habits make a programmer great?


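Meta-habit: learn to adopt different habits for different situations. Different domains and environments call for different habits, sometimes even contradictory ones, so understand the environment you're in and apply the habits that fit it. With that in mind, some techniques I've found useful for various situations: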

"Researchey" green-field development for data-science-like problems: 刚开始研究某个领域

  1. If it can be done manually first, do it manually. You'll gain an intuition for how you might approach it.
  2. Collect examples. Start with a spreadsheet of data that highlights the data you have available.
  3. Make it work for one case before you make it work for all cases.
  4. Build debugging output into your algorithm itself. You should be able to dump the intermediate results of each step and inspect them manually with a text editor or web browser.
  5. Don't bother with unit tests - they're useless until you can define what correct behavior is, and when you're doing this sort of programming, by definition you can't.
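
As a rough illustration of point 4, here is a minimal sketch of a pipeline that can dump each intermediate result to a file you can open in a text editor. The word-counting steps, the `run_pipeline` name, and the numbered JSON file layout are all made up for the example; the only point is that the dump hook lives inside the algorithm itself.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(text, dump_dir=None):
    """Run each step in order, optionally dumping every intermediate result."""
    # Hypothetical steps; a real pipeline would have its own.
    steps = [
        ("tokenize", lambda s: s.lower().split()),
        ("strip_punctuation", lambda toks: [t.strip(".,!?") for t in toks]),
        ("count", lambda toks: {t: toks.count(t) for t in sorted(set(toks))}),
    ]
    value = text
    for index, (name, step) in enumerate(steps, start=1):
        value = step(value)
        if dump_dir is not None:
            # One file per step, numbered so they sort in execution order.
            out = Path(dump_dir) / f"{index:02d}_{name}.json"
            out.write_text(json.dumps(value, indent=2))
    return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(run_pipeline("The cat saw the dog. The dog ran!", dump_dir=d))
        for path in sorted(Path(d).iterdir()):
            print(path.name)
```

When a result looks wrong, you open the numbered files in order and find the first step whose output surprises you.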

Maintenance programming for a large, unfamiliar codebase (how to start getting familiar with a large code repository):

  1. Take a look at file sizes. The biggest files usually contain the meat of the program, or at least a dispatcher that points to the meat of the program. main.cc is usually tiny and useless for finding your way around.
  2. Single-step through the program with a debugger, starting at the main dispatch loop. You'll learn a lot about control flow.
  3. Look for data structures, particularly ones that are passed into many functions as parameters. Most programs have a small set of key data structures; find them and orienting yourself to the rest becomes much easier.
  4. Write unit tests. They're the best way to confirm that your understanding of the code is actually how the code works.
  5. Remove code and see what breaks. (Don't check it in though!)
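
For point 1, a small script is usually enough to rank source files by size. This is only a sketch: the `largest_files` name and the default list of suffixes are assumptions, so adjust them to whatever languages the codebase actually uses.

```python
from pathlib import Path


def largest_files(root, suffixes=(".cc", ".h", ".py"), top=10):
    """Return (size_in_bytes, path) pairs for the biggest source files under root."""
    sized = [
        (p.stat().st_size, p)
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes
    ]
    # Sort on size only, largest first.
    return sorted(sized, key=lambda pair: pair[0], reverse=True)[:top]


if __name__ == "__main__":
    for size, path in largest_files("."):
        print(f"{size:>10}  {path}")
```

Byte size is a crude proxy for where the logic lives (generated code and data tables can be huge too), but it's a quick first pass before you reach for the debugger.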

Performance work (how to measure and improve performance):

  1. Don't, unless you've built it and it's too slow for users. Have performance targets for how much you need to improve, and stop when you hit them.
  2. Before all else (even profiling!), build a set of benchmarks representing typical real-world use. Don't let your performance regress unless you're very certain you're stuck at a local maximum and there's a better global solution just around the corner. (And if that's the case, tag your branch in the VCS so you can back out your changes if you're wrong.)
  3. Many performance bottlenecks are at the intersection between systems. Collect timing stats in any RPC framework, and have some way of propagating & visualizing the time spent for a request to make its way through each server, as well as which parts of the request happen in parallel and where the critical path is.
  4. Profile.
  5. Oftentimes you can get big initial wins by avoiding unnecessary work. Cache your biggest computations, and lazily evaluate things that are usually not needed.
  6. Don't ignore constant factors. Sometimes an algorithm with asymptotically worse performance will perform better in practice because it has much better cache locality. You can identify opportunities for this in the functions that are called a lot.
  7. When you've got a flat profile, there are often still very significant gains that can be obtained through changing your data structures. Pay attention to memory use; often shrinking memory requirements speeds up the system significantly through less cache pressure. Pay attention to locality, and put commonly-used data together. If your language allows it (shame on you, Java), eliminate pointer-chasing in favor of value containment.
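
To make points 2 and 5 concrete, here is a minimal sketch of a benchmark harness used to compare an expensive function against a cached version of it. `slow_score`, the workload, and the `benchmark` helper are stand-ins invented for the example; in practice the workload should be a recording of typical real-world use.

```python
import time
from functools import lru_cache


def slow_score(n):
    """Stand-in for an expensive computation (hypothetical workload)."""
    return sum(i * i for i in range(n))


# Same function wrapped in an unbounded cache keyed on its argument.
cached_score = lru_cache(maxsize=None)(slow_score)


def benchmark(func, workload, repeats=5):
    """Return the best wall-clock time in seconds for running func over workload."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in workload:
            func(item)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Repeated inputs, as you'd expect if real traffic has hot keys.
    workload = [20_000, 50_000, 20_000, 50_000] * 25
    print(f"uncached: {benchmark(slow_score, workload):.4f}s")
    print(f"cached:   {benchmark(cached_score, workload):.4f}s")
```

Note that the cache stays warm across repeats, so the cached number reflects steady-state behavior, not the first request. Whether that's the number you care about depends on your performance target.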

General code hygiene:

  1. Don't build speculatively. Make sure there's a customer for every feature you put in.
  2. Control your dependencies carefully. That library you pulled in for one utility function may have helped you save an hour implementing the utility function, but it adds many more places where things can break - deployment, versioning, security, logging, unexpected process deaths.
  3. When developing for yourself or a small team, let problems accumulate and fix them all at once (or throw out the codebase and start anew). When developing for a large team, never let problems accumulate; the codebase should always be in a state where a new developer could look at it and say "I know what this does and how to change it." This is a consequence of the reader:writer ratio - startup code is written a lot more than it is read and so readability matters little, but mature code is read much more than it is written. (Switching to the latter culture when you need to develop like the former to get users & funding & stay alive is left as an exercise for the reader.)