All opinions expressed in this webpage are my own. <- My employer made me say that. (Seriously!)
Programming is tough and boring. There is a huge gap between what is taught in school and what happens in the real world. If you are fresh out of school or have just picked up programming, and think you know programming, oh no, you don't.
Real world programs are very big. How big? How about as many files as the number of lines in your biggest toy program? Typically, you only write a small part of it. Parts or much of the code may be "leveraged", meaning it has gone through many hands, is ugly, is poorly documented (or worse, the documentation is out of sync), and you don't fully understand it. Yet, you are asked to maintain or add features to it, typically given very little time. Also, you may have to integrate two (or more!) sub-systems. Add to these other not-so-well-advised decisions, and it is not surprising that there are bugs.
I am a firmware R&D engineer in the Business Printing Division (BPD) of Hewlett Packard. I bet most of you have heard of HP, but not BPD. BPD is one division in the Imaging and Printing Group (IPG). IPG is one of the several groups that make up HP. We focus on business-class inkjet printers.
What makes a business-class inkjet printer? Basically, speed, cost-per-page, total-cost-of-ownership, sharing between users, duplexing, multiple trays and so on. On the downside, business-class inkjet printers are usually expensive, big, noisy and ugly. They also have none of the fancy features such as 6-ink printing, LCD preview and memory card slots for direct printing.
My primary role is doing in-printer Pen Alignment. It is rather specialized and will bore the general reader to death. However, if you do know something about it, I will gladly discuss this topic with you, within limits. (Have to be careful as intellectual properties abound in this area.)
However, I venture out of this role from time to time, sometimes out of curiosity, sometimes for the cool-factor, sometimes because I feel something is important yet no one else bothers, and sometimes because I am assigned to troubleshoot certain things. I discovered many things in this process. And this is what I want to share.
As mentioned, I am a firmware programmer. What this means is that we write and compile our program on a development system, then download it to the printer to run. The printer has only a serial output, shared by everyone; your output is constantly messed up by output from other parts of the program. There is no keyboard and no debugger. Compilation is slow. Downloading is slow. How do you program effectively? Well, welcome to embedded systems programming. :-)
Embedded systems or not, programming is still programming. Programming bugs transcend systems, programs and languages. Program hangs, memory leaks, mixing two compilers, field failures, high CPU usage, portability, build failures on some development systems only, optimization bugs — everyone's favourite — and so on. I'm sure all of us have our share of the problems.
However, many people take the easy way out and do not track down the bugs. Some do not have the skills and are not interested in learning them. Some use inefficient techniques. I have a different approach. If I care enough, I will attempt to find the root cause.
Non-programmers often wonder how hard programming can be and why things go wrong so easily. Here is why. Writing code is hard. Making it work right is hard. Making it work right when things go wrong is even harder. And troubleshooting is the hardest of all.
Why? Although it is just one program to the end-user, it is really made up of many sub-programs, say 5 to 30. The interaction between them makes it difficult to reproduce the symptoms, because they depend on the exact timing and sequence of events. The most common interaction bug is the race condition.
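To make the race condition concrete, here is a minimal sketch in C. It is not taken from any real firmware: the counter and the task names are hypothetical, and the "scheduling" is replayed by hand so the outcome is the same on every run. It assumes two sub-programs that each increment a shared counter with a non-atomic read-then-write.

```c
/* Hypothetical shared counter, e.g. a page count updated by two sub-programs. */
static int shared_count = 0;

/* One in-flight increment: the value read before it is written back. */
typedef struct { int tmp; } task_t;

/* A non-atomic increment, split into its two steps so they can be interleaved. */
static void task_read(task_t *t)  { t->tmp = shared_count; }
static void task_write(task_t *t) { shared_count = t->tmp + 1; }

/* Each task finishes before the other starts: both increments survive. */
int run_sequential(void)
{
    task_t a, b;
    shared_count = 0;
    task_read(&a);  task_write(&a);
    task_read(&b);  task_write(&b);
    return shared_count;
}

/* Task B runs between A's read and A's write: both tasks read the same old
 * value, so one increment is overwritten and lost. */
int run_interleaved(void)
{
    task_t a, b;
    shared_count = 0;
    task_read(&a);
    task_read(&b);
    task_write(&a);
    task_write(&b);
    return shared_count;
}
```

In a real system the second ordering only happens when a task switch lands in a window a few instructions wide, which is why such a bug may show up once in thousands of runs and then refuse to reproduce.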
Interaction aside, each sub-program can be thought of as a state machine. Each state has a different behaviour and the state changes in response to events. The most common bugs are the so-called corner cases. They are seldom executed and even more rarely tested.
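As an illustration only, a sub-program of this kind might be sketched as a table-driven state machine in C. The states and events below are invented for the example and do not describe any actual printer firmware. Writing the transitions as a table forces every state/event pair to be spelled out, including corner cases such as a paper jam arriving while the job is paused.

```c
/* Hypothetical states and events for a print-job sub-program. */
typedef enum { ST_IDLE, ST_PRINTING, ST_PAUSED, ST_ERROR, ST_COUNT } state_t;
typedef enum { EV_START, EV_PAUSE, EV_RESUME, EV_DONE, EV_JAM, EV_CLEAR, EV_COUNT } event_t;

/* next_state[current][event]; columns follow the order of event_t:
 *                  START        PAUSE        RESUME       DONE       JAM       CLEAR */
static const state_t next_state[ST_COUNT][EV_COUNT] = {
    [ST_IDLE]     = { ST_PRINTING, ST_IDLE,     ST_IDLE,     ST_IDLE,   ST_ERROR, ST_IDLE     },
    [ST_PRINTING] = { ST_PRINTING, ST_PAUSED,   ST_PRINTING, ST_IDLE,   ST_ERROR, ST_PRINTING },
    /* Corner case: a jam while paused must still go to ST_ERROR. */
    [ST_PAUSED]   = { ST_PAUSED,   ST_PAUSED,   ST_PRINTING, ST_PAUSED, ST_ERROR, ST_PAUSED   },
    /* Only EV_CLEAR leaves the error state. */
    [ST_ERROR]    = { ST_ERROR,    ST_ERROR,    ST_ERROR,    ST_ERROR,  ST_ERROR, ST_IDLE     },
};

/* Return the state that follows s when event e arrives.
 * Out-of-range inputs are treated as an error rather than indexing the table. */
state_t step(state_t s, event_t e)
{
    if ((unsigned)s >= ST_COUNT || (unsigned)e >= EV_COUNT)
        return ST_ERROR;
    return next_state[s][e];
}
```

Even this toy machine has 4 x 6 = 24 state/event pairs, and most of them are the "nothing should happen" cases that rarely get exercised in testing.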
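Put the state machines and the sub-programs together and you get a practically infinite number of combinations. Even 10 sub-programs with 5 states each give 5^10, nearly ten million, combined states, and that is before the order of events is considered. This is why programming is difficult.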
Think programming is boring? Troubleshooting is even more so, and tedious too. You need to try many cases, many of them similar, and repeat them over and over again. It helps to be patient, persistent, open-minded and cheerful.
Troubleshooting is also known as debugging and testing in different phases of the development.
A good tester always tries to reproduce the sequence that leads to the failure. To do this, he must be a good listener: he must get all the symptoms and the steps that lead to the failure. And then he tries to reproduce the bug. This is a trial-and-error process. Domain knowledge, heuristics and luck all play a role. In my opinion, it is impossible to give a definite date to this step, despite what managers want to think.
Once it is possible to reproduce the bug, even if only under some circumstances, it is time to narrow it down and shorten the steps. The simpler, the better. This is because we need to reproduce the bug again and again in the process of solving it.
It is said a good tester would make a good programmer, but not vice-versa. I agree. One, nothing is more difficult than troubleshooting; two, programming is 30% coding and 70% troubleshooting; so if you are good at troubleshooting, you will be good at programming. Helping other people troubleshoot will make you a better programmer. :-)
I want to earn a reputation as a bug-killer. This has nothing to do with company objectives or project goals. This is my personal obsession. :-)
The key thing is to pick my battles carefully. Given limited resources and time, you want to solve problems that have the most impact. After all, time spent on this additional task means I am not working on my assigned job, and that is bad. At the same time, I don't want to troubleshoot problems that are too simple. There is no point since it doesn't enhance my reputation. Still, it is useful to note how other people solve their problems. Sometimes their solution is incomplete, misses the cause, or is done wrongly. This happens more often than you think, for one reason or another.
No. I do not give a detailed walkthrough of the program, only a general breakdown. The development process is covered in greater detail, but all you will learn is how inefficient we are. If nothing else, I hope this will push the relevant folks to improve the development process.
(void *) &NHY;