Rule 1. You can’t tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you’ve proven that’s where the bottleneck is.
Rule 2. Measure. Don’t tune for speed until you’ve measured, and even then don’t unless one part of the code overwhelms the rest.
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don’t get fancy. (Even if n does get big, use Rule 2 first.)
Rule 4. Fancy algorithms are buggier than simple ones, and they’re much harder to implement. Use simple algorithms as well as simple data structures.
Rule 5. Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
Rule 5 reminds me of this Linus Torvalds quote:
…git actually has a simple design, with stable and reasonably well-documented data structures. In fact, I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful […] I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
It’s nice to see an argument that doesn’t boil down to ‘more of X is better’.
This reminds me of something I heard on the radio many years ago, paraphrasing:
A fundamentalist is someone who reacts to contradictory information by saying “we need to X harder/more”
Very interesting. The key is that eBPF programs can be attached to tracepoints.
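To make that concrete, here is a minimal sketch using the BCC Python bindings, one common way to attach an eBPF program to a kernel tracepoint. This is illustrative, not from the linked article, and it assumes a Linux machine with the `bcc` package installed and root privileges, so it won't run in an ordinary sandbox:

```python
# Sketch only: requires Linux, the `bcc` package, and root privileges.
from bcc import BPF

program = r"""
TRACEPOINT_PROBE(syscalls, sys_enter_openat) {
    // Runs inside the kernel every time the openat tracepoint fires.
    bpf_trace_printk("openat by pid %d\n", bpf_get_current_pid_tgid() >> 32);
    return 0;
}
"""

b = BPF(text=program)   # compiles the C snippet and loads it into the kernel
b.trace_print()         # streams the kernel's trace pipe to stdout
```

The point of the attachment model is that the probe runs in kernel context at the tracepoint, with no code change to the traced program.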
I’ve been experimenting with Google Cloud Dataflow recently, which has gotten me interested in systems in this space. Here are two good related papers.
I believe the first is the basis for Cloud Dataflow and Apache Beam.
What I love about both of these is the work to find new abstractions that radically simplify the user-facing model.
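As a toy illustration of that user-facing model, here is a stdlib-only sketch of the core abstraction: a pipeline is a deferred chain of transforms over a collection, and the runner, not the user, decides how to execute it. The names (`Pipeline`, `map`, `group_by_key`) are hypothetical, not Beam's actual API (Beam's real primitives are `PCollection` and `PTransform`):

```python
# Toy sketch of a dataflow-style pipeline; not real Beam code.
from collections import defaultdict


class Pipeline:
    """Records transforms lazily; run() executes the whole chain."""

    def __init__(self, data):
        self._data = list(data)
        self._steps = []  # deferred (kind, fn) transform steps

    def map(self, fn):
        self._steps.append(("map", fn))
        return self

    def group_by_key(self):
        # Expects (key, value) pairs; groups values by key.
        self._steps.append(("group_by_key", None))
        return self

    def run(self):
        data = self._data
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:  # group_by_key: (k, v) pairs -> (k, [v, ...])
                groups = defaultdict(list)
                for k, v in data:
                    groups[k].append(v)
                data = sorted(groups.items())
        return data


# Word count, the canonical dataflow example:
words = ["data", "code", "data", "data", "code"]
counts = (Pipeline(words)
          .map(lambda w: (w, 1))
          .group_by_key()
          .map(lambda kv: (kv[0], sum(kv[1])))
          .run())
# counts == [("code", 2), ("data", 3)]
```

The appeal is that the user writes only the chain of transforms; choices like batch vs. streaming execution, parallelism, and distribution live entirely in the runner.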