
The Trouble with Teraflops

Imagine a database with 50,000 account records, perhaps for a public utility. Imagine a user requirement has been allowed to slip through the change control net (perhaps it's even genuinely needed) that allows users to search for customers by the first letter of their surnames. Chances are we will get about 2000 hits when we enter "D."

The prototype established that they want to see the hits back in a picklist that shows Name and Full Address. Table 12-2 describes the data fields returned for a single hit.

Field               Size (bytes)
ID                       4
Surname                 50
ForeName                50
AddressLine1            50
AddressLine2            50
AddressLine3            50
Town/City               50
County                  30
Postcode InCode          4
Postcode OutCode         4
TOTAL                  342

Table 12-2  The fields in the customer surname query
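
For what it's worth, a single hit could be sketched as a Visual Basic user-defined type. The names here are assumptions rather than a real schema, and the fixed-length strings simply mirror the column widths in Table 12-2 (VB holds fixed-length strings as Unicode in memory, so this reflects what the database sends rather than the exact client-side footprint).

' A hypothetical sketch of one picklist hit, mirroring Table 12-2.
Private Type CustomerHit
    ID As Long                      ' 4 bytes
    Surname As String * 50          ' 50 bytes
    ForeName As String * 50         ' 50 bytes
    AddressLine1 As String * 50     ' 50 bytes
    AddressLine2 As String * 50     ' 50 bytes
    AddressLine3 As String * 50     ' 50 bytes
    TownCity As String * 50         ' 50 bytes
    County As String * 30           ' 30 bytes
    PostcodeInCode As String * 4    ' 4 bytes
    PostcodeOutCode As String * 4   ' 4 bytes
End Type                            ' 342 bytes per hit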

So every time we do one of these searches, we can expect about 684 KB across the network (2000 hits at 342 bytes each). We might consider sending all 2000 hits back in one go, but then all 2000 have to arrive before the user interface code can load them into the picklist (probably a nice resource-hungry grid control). Alternatively, we could send the rows one at a time and try to load the first into the user interface as soon as it arrives.

The issue is one of scalability: system constraints are preventing us from doing what we wish to do. The approach we adopt is bound to be a compromise between cosseting bandwidth and providing responsiveness to the user. We would actually prefer never to send more than 100 rows back at a time, so let's see whether we can design away the problem.
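
As a minimal sketch of that 100-row cap, ADO lets the client ask the provider for no more than a fixed number of rows through the Recordset's MaxRecords property. The connection, table, and column names below are assumptions, and the query asks only for the columns the picklist actually displays.

' A minimal sketch, assuming an open ADODB.Connection (cnCustomers) and a
' hypothetical Customer table; needs a project reference to Microsoft
' ActiveX Data Objects.
Private Function FetchPicklist(ByVal cnCustomers As ADODB.Connection, _
                               ByVal strInitial As String) As ADODB.Recordset
    Dim rsHits As ADODB.Recordset
    Set rsHits = New ADODB.Recordset

    With rsHits
        .MaxRecords = 100                ' never ask for more than 100 rows
        .CursorLocation = adUseClient    ' client-side cursor we can disconnect
        .Open "SELECT ID, Surname, ForeName, AddressLine1, TownCity " & _
              "FROM Customer " & _
              "WHERE Surname LIKE '" & strInitial & "%' " & _
              "ORDER BY Surname, ForeName", _
              cnCustomers, adOpenStatic, adLockReadOnly, adCmdText
        Set .ActiveConnection = Nothing  ' let go of the server once the rows are cached
    End With

    Set FetchPicklist = rsHits
End Function

Not every provider honors MaxRecords, in which case the cap can go into the SQL itself (for example, SELECT TOP 100 on SQL Server). In production code a parameterized Command object would also be preferable to string concatenation, but the shape of the call is the same: the limit is designed in rather than left to chance.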

If the user really must scroll down hundreds of rows, we can send back the first 100 and then indicate that there are more. Then, if we really must, we can fetch the others in the background while our user is still looking at the first 100 (although I would really prefer to make them narrow their query). The alternative of bringing down the next batch only when the user asks for it (usually by scrolling down a grid or list box) makes for an awkward user interface, since the chances are high that the user can scroll faster than we can fetch the data. Even when we can keep up, the user is likely to see strangely jerky and unexpected delays.
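
One way to sketch the "first 100 now, the rest quietly in the background" approach is ADO's asynchronous fetch: with the adAsyncFetchNonBlocking option the first CacheSize rows are available as soon as Open returns, the remainder trickle down behind the scenes, and the FetchComplete event fires once they have all arrived. The form-level names here (rsHits, lstHits, lblMore) and the query are assumptions.

' Declared WithEvents at form level so we receive ADO's fetch events.
Private WithEvents rsHits As ADODB.Recordset

Private Sub LoadFirstHundred(ByVal cnCustomers As ADODB.Connection)
    Dim lngShown As Long

    Set rsHits = New ADODB.Recordset
    rsHits.CursorLocation = adUseClient
    rsHits.CacheSize = 100              ' the first 100 rows arrive before Open returns
    rsHits.Open "SELECT ID, Surname, ForeName, TownCity " & _
                "FROM Customer WHERE Surname LIKE 'D%' ORDER BY Surname", _
                cnCustomers, adOpenStatic, adLockReadOnly, _
                adCmdText Or adAsyncFetchNonBlocking

    ' Show the rows we already have; don't wait for the rest.
    Do While Not rsHits.EOF And lngShown < 100
        lstHits.AddItem rsHits!Surname & ", " & rsHits!ForeName
        lstHits.ItemData(lstHits.NewIndex) = rsHits!ID
        rsHits.MoveNext
        lngShown = lngShown + 1
    Loop
    lblMore.Caption = "Retrieving further matches..."
End Sub

' Fires when the background fetch has brought down every remaining row.
Private Sub rsHits_FetchComplete(ByVal pError As ADODB.Error, _
        adStatus As ADODB.EventStatusEnum, ByVal pRecordset As ADODB.Recordset)
    lblMore.Caption = pRecordset.RecordCount & " matches retrieved"
End Sub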

We could consider setting up a special index on the server: a covering index for all the fields we need in the picklist, but one that contains only every 50th row. We bring down the first 100 rows, and if the user wants more we fetch the index instead. The user chooses from the index, and we bring down the 50 or so rows that fall between two index entries. Tree views are quite good for this, and the new hierarchical cursors in ADO and the Hierarchical FlexGrid control are particularly interesting here. If users want more data than we are prepared to send, we can send down the index and let them narrow things down from that. Other user interface solutions include tabs, with each tab representing an index section, rather like a phone or address book.
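
A tree view version of the sparse index idea might look something like the following sketch. Each index entry (every 50th surname) becomes a parent node, and the 50 or so rows beneath it are fetched only when the user expands that node. The TreeView is the one in the Windows Common Controls; FetchRange and the other names are assumptions.

' Load one parent node per entry in the sparse index (every 50th row).
Private Sub LoadIndexNodes(ByVal rsIndex As ADODB.Recordset)
    Dim nodEntry As MSComctlLib.Node
    Dim lngEntry As Long

    Do While Not rsIndex.EOF
        lngEntry = lngEntry + 1
        Set nodEntry = tvwHits.Nodes.Add(, , "K" & lngEntry, rsIndex!Surname & "...")
        nodEntry.Tag = rsIndex!Surname              ' remember where this range starts
        ' A dummy child gives the node a [+] so that Expand can fire later.
        tvwHits.Nodes.Add nodEntry, tvwChild, , "(retrieving...)"
        rsIndex.MoveNext
    Loop
End Sub

' Fetch the rows for a range only when the user opens it.
Private Sub tvwHits_Expand(ByVal Node As MSComctlLib.Node)
    Dim rsRange As ADODB.Recordset

    If Node.Children = 1 And Node.Child.Text = "(retrieving...)" Then
        tvwHits.Nodes.Remove Node.Child.Index       ' drop the placeholder
        Set rsRange = FetchRange(Node.Tag)          ' hypothetical helper: ~50 rows from here
        Do While Not rsRange.EOF
            tvwHits.Nodes.Add Node, tvwChild, , rsRange!Surname & ", " & rsRange!ForeName
            rsRange.MoveNext
        Loop
    End If
End Sub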

Another question that needs consideration is whether to bring back the full details of each record in the picklist, or only the data needed for display to do the picking. Suppose that the full record is 700 bytes: with 2000 hits, that would mean moving 1.4 MB for each picklist. It sounds like a bad idea; imagine the effect on the network.

Once we have the list, how do we work with it? The simplest method is to create objects on an as-needed basis. That is, an object is created and populated with data only after the user has chosen an item from the picklist. It almost never makes sense to create an object and load its data for every entry in a picklist. Our user interface might behave as if that is what we have done, but in reality it will all be a front. There are a number of options for implementation here.
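
One of them can be sketched as follows: the picklist carries nothing but the display columns plus the ID (tucked into the list box's ItemData), and a full customer object is created and loaded only at the moment the user actually picks a row. CCustomer, its Load method, and frmCustomerDetail are assumptions standing in for whatever business object and detail form the design calls for.

' Nothing is created while the user merely scrolls the picklist.
Private Sub lstHits_DblClick()
    Dim objCustomer As CCustomer     ' hypothetical business-object class

    If lstHits.ListIndex >= 0 Then
        Set objCustomer = New CCustomer
        ' Only now do we pay for the full record, and only for the one
        ' row the user picked (fetched by primary key inside Load).
        objCustomer.Load lstHits.ItemData(lstHits.ListIndex)
        frmCustomerDetail.ShowCustomer objCustomer
    End If
End Sub

The Load call is where the remaining columns finally cross the network, which also answers the earlier question: the picklist itself never needs to carry the full 700-byte record.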