If you’ve been following me talk about Method-R, you might wonder how I learned it. In my case, I learned it the “right way”, the “official way”: My employer brought in a consultant to teach a three-day class in performance tuning. Method-R was the method he taught.
It also got me very, very interested in code profiling, which is a way to measure performance at a much more detailed level — not just is it fast enough, but, instead, how long does each piece take?
I don’t remember most of the three days, but I do remember the promise the consultant made on day one, that the class would revolutionize the way we did performance tuning on our teams. He knew the bar was set high, and he wanted to reach for it.
Like I said, the next three days were a blur. Our problem was mostly not pure database queries (“tuning SQL”), but instead the programming language that ran on top of the SQL: the “Procedural Language” extension to SQL, or PL/SQL.
Here’s an example of the kind of performance problems we had in our code:
Function Calculate_Bill(Date BillDate, int CustomerID) returns integer
    int total_price = 0
    FOR Each line_item in (BillDate, CustomerID)
        int billingcode = get_billing_code(CustomerID)
        if (billingcode is not 50) total_price = total_price + line_item.price
    END FOR
    return total_price
Notice that if we have 100 line_items, we’ll call get_billing_code 100 times. If get_billing_code is the slowest operation, we could pull it up above the loop and execute it only once. For that matter, if billingcode is frequently 50, we could execute the FOR loop only if billingcode is not 50, because if billingcode is 50, we can just return zero.
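Here is a rough sketch of that rewrite in Python rather than PL/SQL, just to make the shape of the change concrete. The helper functions and their data are made up for illustration; in the real system both would be database calls.

```python
from datetime import date

# Hypothetical stand-ins for the database; names and data are illustrative only.
CALLS = {"get_billing_code": 0}


def get_billing_code(customer_id):
    """Pretend lookup; counts how often it is called."""
    CALLS["get_billing_code"] += 1
    return 50 if customer_id == 7 else 10


def get_line_items(bill_date, customer_id):
    """Pretend query returning the line-item prices for one bill."""
    return [5, 10, 15]


def calculate_bill(bill_date, customer_id):
    # Look up the billing code once, before the loop, instead of per line item.
    billing_code = get_billing_code(customer_id)
    if billing_code == 50:
        return 0  # nothing is ever added for code 50, so skip the loop
    total_price = 0
    for price in get_line_items(bill_date, customer_id):
        total_price += price
    return total_price


print(calculate_bill(date(2020, 1, 1), 1))
print(CALLS["get_billing_code"])
```

However many line items the bill has, get_billing_code now runs once per bill.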
Pulling logic, especially anything that needs to go to disk, out of loops is a classic kind of performance improvement; I learned it two weeks into my first job in 1997.
The interesting thing our consultant told us was that wildly pulling code out of loops can be an incredible waste of time.
On A Real Program
Those seven lines of code might be the bottleneck for the entire application — it could be that get_billing_code is incredibly slow. Or they might not be; perhaps the typical customer only has one or two items a month. The opportunity for improvement might be somewhere else.
The way to find out is to profile the application, to find out how long the software spends on each line of code. Without that, we’re not improving; we’re guessing.
The trainer gave us tools to run our application (which was batch) and then find out how much time was spent on each line of code. We might find, for example, that 95 minutes (out of 110) are spent in function load_bill, and, when we run load_bill by itself, that 85 minutes are spent in calculate_bill, and within that, 80 minutes are the call to get_billing_code. In that case, yes, we need to fix that function.
Or it might be that of those 110 minutes, two are spent in calculate_bill, and ‘fixing’ that function will have an impact that is not even noticeable.
Without profiling, we don’t know.
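To give a feel for what that looks like, here is a minimal sketch using cProfile and pstats from the Python standard library. It is not the tooling from the class, and it reports time per function rather than per line. The function bodies are invented stand-ins, with time.sleep playing the part of a slow lookup.

```python
import cProfile
import pstats
import time


def get_billing_code(customer_id):
    time.sleep(0.002)  # stand-in for a slow lookup (assumption for the demo)
    return 10


def calculate_bill(customer_id, prices):
    total_price = 0
    for price in prices:
        # Deliberately left inside the loop, as in the original example.
        if get_billing_code(customer_id) != 50:
            total_price += price
    return total_price


def load_bill():
    # One customer with 100 line items, each priced at 1.
    return calculate_bill(1, [1] * 100)


profiler = cProfile.Profile()
profiler.enable()
result = load_bill()
profiler.disable()

# Show the five entries with the largest cumulative time.
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

The report should show get_billing_code called 100 times, with nearly all of the elapsed time underneath it. That is the evidence you want before you touch the loop.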
Most programming languages have profilers; here’s typical output from the manual page for dprofpp, which displays the data collected by Devel::DProf, a free profiler for Perl 5:
Total Elapsed Time = 1.67 Seconds
User Time = 0.61 Seconds
%Time  Seconds  #Calls  sec/call  Name
 52.4    0.320       2    0.1600  main::foo
 45.9    0.280     200    0.0014  main::bar
 0.00    0.000       1    0.0000  DynaLoader::import
 0.00    0.000       1    0.0000  main::baz
Imagine running a program, and seeing that kind of output. What would you optimize?
I’m guessing a small improvement in the speed of bar would improve the overall application a great amount, wouldn’t it?
That is the power of profiling.
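If you want to put rough numbers on that question, a few lines of Python will do the arithmetic. This is only a back-of-the-envelope sketch: the rows are copied from the report above, and the "twice as fast" figure is an arbitrary assumption, not a measurement.

```python
# Rows copied from the profiler report: %Time, Seconds, #Calls, sec/call, Name.
REPORT = """\
52.4 0.320 2 0.1600 main::foo
45.9 0.280 200 0.0014 main::bar
0.00 0.000 1 0.0000 DynaLoader::import
0.00 0.000 1 0.0000 main::baz
"""


def parse_report(text):
    """Return {name: {"seconds": float, "calls": int}} for each report row."""
    rows = {}
    for line in text.splitlines():
        _pct, seconds, calls, _per_call, name = line.split()
        rows[name] = {"seconds": float(seconds), "calls": int(calls)}
    return rows


def seconds_saved(rows, name, speedup):
    """Seconds saved if `name` ran `speedup` times faster (2.0 = twice as fast)."""
    spent = rows[name]["seconds"]
    return spent - spent / speedup


rows = parse_report(REPORT)
for name in rows:
    saved = seconds_saved(rows, name, 2.0)
    print(f"{name}: {saved:.3f} s saved if twice as fast")
```

The most a function can save you is the time it currently takes, so the two rows at the bottom are not worth a minute of anyone's effort.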
The Method-R part is making sure that the thing you are profiling is the right thing: the full application, end-to-end. This is easier to do with a batch program, where end-to-end begins with hitting the Enter key, than with a GUI or web application.
We’ll have more on that to come.