Troubleshooting is more of an art form than an exact science. However, to be an efficient and effective troubleshooter, you must approach the problem in an organized and methodical manner. Remember, you are looking for the cause, not the symptom. As a troubleshooter, you must be able to quickly and confidently eliminate as many alternatives as possible so that you can focus on the things that might be the cause of the problem. In order to do this, you must be organized.
Understanding the following five phases of troubleshooting will help you focus on the cause of the problem and lead you to a permanent fix.
Phase I: Define the Problem
The first phase is the most critical and, often, the most ignored. Without a complete understanding of the entire problem, you can spend a great deal of time working on the symptoms instead of the cause. The only tools required for this phase are a pad of paper, a pen (or pencil), and good listening skills.
Listening to the client or coworker (the computer user) is your best source of information. Don't assume that just because you are the expert, the operator doesn't know what caused the problem. Remember, you might know how the computer works and be able to find the technical cause of the failure, but the users were there before and after the problem started and are likely to recall the events that led up to the failure.
Ask a few specific questions to help identify the problem and list the events that led up to the failure. You might want to create a form that contains the standard questions that follow (and other questions specific to the situation) for taking notes.
- When did you first notice the problem or error?
- Has the computer been moved recently?
- Have you made any changes to software or hardware?
- Has anything happened to the computer? Was it dropped or was something dropped on it? Was coffee or soda spilled on the keyboard?
- When exactly does the problem or error occur? During the startup process? After lunch? Only on Monday mornings? After using e-mail?
- Can you reproduce the problem or error?
- If so, how do you reproduce the problem?
- What does the problem or error look like?
- Describe any changes in the computer coinciding with the problem (such as noise, screen changes, lights, and so forth).
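The note-taking form suggested above could be kept as a small script rather than on paper. The following is a minimal sketch, not a prescribed tool: the question wording comes from the list above, while the names `IntakeForm`, `record`, and `unanswered` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Question wording is taken from the list above; everything else
# (class name, method names) is an assumption made for this sketch.
STANDARD_QUESTIONS = [
    "When did you first notice the problem or error?",
    "Has the computer been moved recently?",
    "Have you made any changes to software or hardware?",
    "Has anything happened to the computer?",
    "When exactly does the problem or error occur?",
    "Can you reproduce the problem or error?",
    "If so, how do you reproduce the problem?",
    "What does the problem or error look like?",
    "Describe any changes in the computer coinciding with the problem.",
]


@dataclass
class IntakeForm:
    """Notes taken while interviewing the user in Phase I."""
    user: str
    answers: dict = field(default_factory=dict)

    def record(self, question: str, answer: str) -> None:
        # Store the user's own words; interpretation comes later.
        self.answers[question] = answer.strip()

    def unanswered(self, extra_questions=()):
        # Questions still open, in the order they should be asked.
        # A blank answer counts as unanswered.
        questions = STANDARD_QUESTIONS + list(extra_questions)
        return [q for q in questions if not self.answers.get(q)]


form = IntakeForm(user="front desk")
form.record(STANDARD_QUESTIONS[0], "Monday morning, right after startup")
form.record(STANDARD_QUESTIONS[1], "No")
print(len(form.unanswered()))  # questions still to ask
```

Situation-specific questions can be passed to `unanswered` as `extra_questions`, so the standard list stays the same from one call-out to the next.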
Phase II: Zero In on the Cause
The next step involves the process of isolating the problem. There is no particular correct approach to follow, and there is no substitute for experience. The best you can do is to eliminate any obvious problems and work from the simplest problems to the more complex. The purpose is to narrow your search down to one or two general categories. The following table provides 14 possible categories you can use to narrow your search.
|Category|Subcategory|Symptoms|
|---|---|---|
|Electrical Power|Electric utility|Intermittent errors on POST. Device not working/not found.|
| |Properly seated cards (chip/boards); front panel wires (lights and buttons)|Device not working. Device not found. Intermittent errors on a device.|
| |CMOS (chip and settings)|Consistent errors on POST. CMOS text errors. RAM, hard disk drive, floppy disk drive, video errors.|
|Memory|DRAM: proper type and setup; DRAM CMOS settings; SRAM: proper type and setup; SRAM CMOS settings|GPF with consistent addresses.|
|Mass storage|Hard disk drives, floppy disk drives, CD-ROM drives, Zip drives, tape drives; filenames and attributes|"Missing operating system"; "File not found"; "No boot device"; "Abort, Retry, Fail"|
| |Serial port settings; parallel port settings; card jumper settings|System locks up. Device not responding. Bizarre behavior from a device.|
| |FCBs (File Control Blocks); paths and prompts; external MS-DOS commands|"Missing operating system"; "Bad or missing command interpreter"; "Insert disk with COMMAND.COM"; "Insufficient File Handles"|
| |Knowledge of capabilities; knowledge of bugs, incompatibilities, work-arounds|Application doesn't work properly. Lock-up only in specific application.|
|Device drivers|All devices in CONFIG.SYS, SYSTEM.INI, or Registry|Device lockups on access. Computer runs in safe mode only.|
|Memory management|HIMEM.SYS settings; MSDOS.SYS options (Win95); Windows resource usage|"Not enough memory" error. Missing XMS, EMS memory. GPFs at KRNL386.EXE. GPFs at USER.EXE or GDI.EXE.|
|Configuration/setup|Files used for initialization; basic layout of initialized files|Programs refuse to do something they should. Missing options in program. Missing program or device.|
| |Knowledge of virus symptoms|Computer runs slow.|
|Operator Interface|Lack of training/understanding; fear of the computer|"I didn't touch it!"; "It always does that!"|
| | |User forgets password. Cable or NIC card problems.|
Be sure to observe the failure yourself. If possible, have someone demonstrate the failure to you. If it is an operator-induced problem, it is important to observe how it is created, as well as the results.
Intermittent problems are the most difficult ones to isolate. They never seem to occur when you are present. The only way to resolve them is to re-create the set of circumstances that causes the failure. Sometimes, moving step-by-step to eliminate the possible causes is all you can do. This takes time and patience. The user will have to keep a detailed record of what was being done before the failure and at the moment it occurs. In such cases, tell the user not to do anything with the computer when the problem recurs, except call you. That way, the "evidence" will not be disturbed.
For a totally random, intermittent problem, always suspect the power supply.
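One way to picture the narrowing step is as a lookup from observed symptoms to candidate categories. The sketch below is only an illustration under stated assumptions: the symptom phrases are copied from a few named rows of the table above, while the dictionary layout and the function name `candidate_categories` are hypothetical. A simple substring match like this is no substitute for experience; it only shortens the list of places to look first.

```python
# Symptom phrases copied from some of the named rows of the table above.
# The dictionary layout and the function name are assumptions made
# for this sketch; only a handful of rows are included.
SYMPTOMS_BY_CATEGORY = {
    "Memory": ["gpf with consistent addresses"],
    "Mass storage": ["file not found", "no boot device", "abort, retry, fail"],
    "Device drivers": ["device lockups on access", "computer runs in safe mode only"],
    "Memory management": ["not enough memory", "gpfs at krnl386.exe"],
    "Configuration/setup": ["missing options in program", "missing program or device"],
}


def candidate_categories(observed: str) -> list:
    """Return the categories whose listed symptoms appear in the notes."""
    text = observed.lower()
    return [
        category
        for category, symptoms in SYMPTOMS_BY_CATEGORY.items()
        if any(symptom in text for symptom in symptoms)
    ]


notes = 'Screen shows "No boot device" and the computer runs in safe mode only.'
print(candidate_categories(notes))  # ['Mass storage', 'Device drivers']
```

An empty result does not mean there is no problem; it means the notes did not match any listed phrase and the search has to be narrowed by hand.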
Phase III: Conduct the Repair
After you have zeroed in on a few categories, the process of elimination begins.
Make a Plan
Create a planned approach to isolating the problem based on your knowledge at this point. Your plan should start with the most obvious or easiest solution to eliminate and move forward. Put the plan in writing!
The first step of any plan should be to document and back up.
If possible, make no assumptions. If you must make any assumptions, write them down. You might need to refer back to them later.
Follow the Plan from Beginning to End
Once a plan is created, it is important to follow it through. Jumping around and randomly trying things can often lead to more serious problems.
Document every action you take and its results.
If the first plan is not successful (plans won't always be), create a new plan based on what you discovered with the previous plan. Be sure to refer to any assumptions you might have made.
Repair or Replace
After locating the problem, either repair or replace the defect. If the problem is software-oriented, be sure to record the "before" and "after" changes.
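The advice in this phase (write the plan down, start with a backup, try the easiest things first, follow the steps in order, and log every result) can be sketched as a small data structure. This is a minimal illustration, not a standard tool: the names `Step` and `RepairPlan`, and the numeric `effort` scale, are assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    effort: int          # 1 = easiest to try; the scale is an assumption
    result: str = ""     # filled in after the step is carried out


@dataclass
class RepairPlan:
    assumptions: list = field(default_factory=list)
    steps: list = field(default_factory=list)

    def __post_init__(self):
        # The first step of any plan: document and back up.
        self.steps.insert(0, Step("Document the current state and back up", 0))

    def add_step(self, description: str, effort: int) -> None:
        self.steps.append(Step(description, effort))
        # Keep the easiest checks first; the backup step (effort 0) stays on top.
        self.steps.sort(key=lambda s: s.effort)

    def next_step(self):
        # Follow the plan in order: the first step without a result is next.
        for step in self.steps:
            if not step.result:
                return step
        return None

    def log(self, result: str) -> None:
        # Record the outcome against the step that is currently due.
        step = self.next_step()
        if step is None:
            raise ValueError("every step already has a result")
        step.result = result


plan = RepairPlan(assumptions=["The wall outlet is good"])
plan.add_step("Swap the power supply", effort=3)
plan.add_step("Reseat the expansion cards", effort=1)
plan.log("Backup written to external drive")
print(plan.next_step().description)  # Reseat the expansion cards
```

Because `log` always writes to the step that is due, the structure discourages jumping around: a later step cannot be recorded until the earlier ones have a result.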
Phase IV: Confirm the Results
No repair is complete without confirmation that the job is done. Confirmation involves two steps:
- Make sure that the problem no longer exists. Ask the user to test the solution and confirm client satisfaction.
- Make sure that the fix did not create other problems. You have not done a professional job if the repair has been completed at the expense of something else.
Phase V: Document the Results
Finally, document the problem and the repair. There is no substitute for experience in troubleshooting. Every new problem presents you with an opportunity to expand that experience. Keeping a copy of the repair procedure in your technical library will come in handy in a year or two when the problem (or one like it) occurs again. This is one way to build, maintain, and share experience.
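A technical library of past repairs is only useful if it can be searched when a similar problem returns. The sketch below shows one possible shape for such a record, stored as JSON and searched by keyword. The field names, the file name `repairs.json`, and the sample entry are all made up for illustration.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict


@dataclass
class RepairRecord:
    date: str       # ISO date string, e.g. "2024-03-01"
    problem: str    # what the user reported and what was observed
    cause: str      # the cause that was found, not the symptom
    fix: str        # what was repaired or replaced


def save_records(records, path):
    # One JSON file acts as the "technical library".
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([asdict(r) for r in records], handle, indent=2)


def load_records(path):
    with open(path, encoding="utf-8") as handle:
        return [RepairRecord(**item) for item in json.load(handle)]


def search(records, keyword):
    # Case-insensitive match against the problem, cause, and fix text.
    needle = keyword.lower()
    return [
        r for r in records
        if needle in f"{r.problem} {r.cause} {r.fix}".lower()
    ]


# Hypothetical sample entry, written to a temporary directory.
library = os.path.join(tempfile.mkdtemp(), "repairs.json")
save_records(
    [RepairRecord("2024-03-01", "Random lockups", "Failing power supply",
                  "Replaced power supply")],
    library,
)
found = search(load_records(library), "power supply")
print(found[0].fix)  # Replaced power supply
```

Recording the cause separately from the reported problem keeps the distinction this lesson stresses between the cause and its symptoms.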
The following points summarize the main elements of this lesson:
- Learning doesn't stop with certification. To stay at the top of your profession, you must keep learning.
- Staying connected with your peers is an important part of learning.
- Maintain a proper set of the tools of the trade.
- Know where and how to get technical support.
- Good troubleshooting requires a plan. To be successful, you must stick to your plan.