Optimizing Database Applications

Level:
Level3

Like optimization techniques for Visual Basic applications in general, a lot of trees have given their lives to publish information on optimizing database applications. Also like Visual Basic in general, much of the information presented misses the mark on what will have the greatest impact on performance of the program. You cannot ignore basic design fundamentals such as indexing, but there are other factors that will have a greater impact that are often ignored. Here's what I'll be covering:
  • Indexing
    Properly indexing tables can dramatically speed up data access. You need to understand how indexes can be used and what trade-offs are involved in creating an index.
  • The Database Engine
    The database engine can do some things more efficiently and effectively than your application code, so let it do its job while you do yours.
  • Limiting Data Access
    Reducing the amount of data returned by a query can be your most effective tool for improving performance.
  • Unbound Controls
    VB's data control is a convenient tool for building applications, but carries a price tag that can hamper performance and limit the application design.
  • Summary
    A review of the key points.

Indexing


Introduction

Indexing a table speeds access to the data by giving the database engine a set of pointers to rows that it can use to retrieve records without scanning the entire table. There are two fundamental types of indexes that can be created:
  • Unique Index
    In a unique index, each item in the index must be unique within the table. Unique indexes are most commonly used with primary keys, but may also be used with other fields.
  • Non-unique Index
    In a non-unique index, there may be repeating values in the index. Non-unique indexes may be used for foreign key fields or for fields which are commonly used for search and sort operations.
Indexes can also be categorized as clustered and non-clustered. In general, an index is a tree structure. With a clustered index, the actual data pages (rows) are the leaf nodes of the tree. In a non-clustered index, the leaf nodes are pointers to records. This means that the records in a clustered index are physically ordered in the table according to the clustered index. Note that you cannot create a clustered index on a Jet table. They are supported by many other database engines, however, including MS SQL Server.

Determining the Fields to Index

The choice of which fields to index and which not to index is at times simple and obvious and at other times can be largely guess work. Let's look first at some fields which should normally always be indexed.
  • Primary Key
    A primary key by definition is unique within a table, and should always have a unique index assigned. This both allows the database engine to enforce the uniqueness of the data in the field and allows fast access to records when using the primary key field or fiels as the search criteria.
  • Foreign Keys
    Fields used for foreign keys should also be indexed, although in most cases the index will not be unique. This allows for significant performance benefits when joining tables in queries.
  • Query Criteria and Sort Fields
    Here the decision to create an index is somewhat less clear. A good general guideline is that if you are using fields as part of the WHERE or ORDER BY clause for a query that is the record source for a commonly used form or report, you should create an index. Indexing these fields can offer a performance advantage in your application.
For the balance of the fields in the database, the decision of whether or not to create an index is more of an art than a science. In general, you will gain a benefit in retrieval speed if you index fields which are often used for ad-hoc queries and filters.

Useful Indexes

Different database engines may use different methods of determining a plan for executing a query, however, most will follow these general guidelines for when an index is useful. In most cases, these guidelines will apply only to multiple field indexes. Let's take look at an example to illustrate where an index may or may not be helpful:

You have a table Customers with (among others) the fields CustFirst and CustLast, representing the customer's first and last name. Since the data is often sorted in last name, first name order, you create an index idxCustName on the fields CustLast, CustFirst in that order.

  • If you create a query with a criteria or a sort based on the last name, the index can be used.
  • If you create a query and sort by last name, first name, the index can be used.
  • If you create a query with a sort based on first name, last name, the index cannot be used.
Keep in mind also that in this example two separate indexes - one on the CustLast field and one on the CustFirst field - could be used in all three scenarios.

When to Avoid Indexes

While having an index available will nearly always improve retrieval time for records, it will also increase the time required for inserts and updates. If you're application has significant insert and update activity, you may want to limit the number of fields that are indexed and the most heavily used tables. Remember that when an insert or an update is performed, the database engine not only has to write the record to the physical file, it must also update the indexes.

The Database Engine


Visual Basic is a remarkable programming tool. Used in combination with the Jet Data Access Objects, it's relatively easy to create complex database applications. However, there are some things that are better left to the database engine than your application. Primarily this involves issues of data integrity. With the Jet engine, the databases you create can be shared with Microsoft Access in their native form. If you've coded all the rules that apply to the data into your application, its possible for a user to either unintentionally or maliciously circumvent your rules and manipulate the data outside the confines of the application. While it's possible to prevent an end-run around your coded rules by securing the data, you can still introduce violations through errors or omissions in your own code. To avoid data integrity problems, you should enforce as many of the constraints as possible at the database design level. Let's look at the types of rules for data integrity which can be enforced.
  • Entity Integrity
    This type of integrity prevents the insertion of duplicate records into a table and should be enforced through the use of a unique, primary key and a properly normalized database design.
  • Domain Integrity
    This ensures that the values entered into a field fall into an acceptable range or list of valid entries. Domain integrity can be enforced by using foreign key constraints or column level rules.
  • Referential Integrity
    This ensures that there are no foreign key values that point to non-existent records. Referential integrity should be enforce using foreign key constraints (known as relationships in Access).
  • Business Rules
    These are any additional rules which apply to your data and may cross column, row, and even table or database boundaries. Whenever possible, you should enforce business rules at the database design level as well through column or row level rules or foreign key constraints.
It's possible to enforce all of these rules in your application code, but it certainly isn't recommended. Keep in mind that as a database developer, your first responsibility is to the data in the database. While the needs of the end-user are certainly important and the goals of the management must be achieved, the application will be essentially useless if the data it manages cannot be counted upon to be correct.

Another factor to keep in mind is that in most cases it's more efficient from both a performance and a maintenence point of view to let the database engine do it's job enforcing rules against the data. For example, for you to enforce a referential integrity constraint, you would need to create a recordset based on a foreign key value and examine it for matching records - a task the database engine can do with considerably greater efficiency. In the case of something like a column level rule, there may be little performance difference between the database engine and your Visual Basic code, but if you let the database engine enforce the rule, you eliminate the possibility that an error in your code could introduce errors in the data. Additionally, you can choose to take the approach in your code of simply trapping errors from the engine rather than explicitly testing for rules. While you may decide that as a matter of good coding practice that you would prefer to test the data where possible before attempting any data manipulation, with the engine enforcing the rules, you leave open the choice.

Limiting Data Access


Limiting the amount of data returned by a query can be your most powerful tool for improving the performance of database applications. While indexing and a solid database design can help, no index will help much if you regularly run queries that return thousands of records on multiple table joins or try to populate a grid, list box, or combo box with more than a few dozen records. Here are a few of the key areas where restricting the data returned by a query can return big gains in performance:
  • Limit the number of rows returned.
    In most cases, end users will normally be working with only a single row at a time or a few rows displayed in a grid, list, or combo box. There's simply no reason to return all the rows in a table when a user will at most probably only need to work with a handful. Filling list boxes and combo boxes in particular can be tremendously slow if you are returning a lot of records.
  • Don't return more columns than you need.
    Programmers often use SELECT * FROM for simplicity or expediency. Avoid this habit and provide an explicit list of columns limited to those that you will actually display. Memo and binary data in particular are very slow to retrieve and should be eliminated from the results if possible. Most database engines, including Jet, do not actually store memo and binary data in the table, but simply keep a pointer to a linked list of pages where the actual data resides. Forcing the database engine to work its way though the pages of data when the information isn't required is a waste of resources in terms of both clock cycles and memory consumption.
  • Limit the number of joins.
    If you're basing a form on a query with a 15 table join, it's probably time for you to either rethink the form design or denormalize the database. The current generation of database engines are powerful, but not that powerful. Putting anything beyond four or five tables in a query will begin to seriously degrade performance.
  • Avoid sorts.
    If possible, avoid sorting the data returned by queries. Sorts can be slow, especially if the data is not indexed. If you do need to sort the data in a query, try to sort on an indexed field.
  • Limit the rows used for pick lists.
    While its practical and convenient to use a list or combo box to allow the user to pick from a list of a few values, choosing from a list of dozens or hundreds of values is both slow and impractical for the user. If you need to have the user choose a value from a large list, provide some other means which doesn't require populating a large list of rows. A reasonable guideline that I've used is that the largest pick list I'll supply to an application is the postal codes for the U.S. states. If you include territories and Canadian provinces, that's about 65 records. Even this list is longer than I would normally consider reasonable, but its so often needed that it's an acceptable violation.
It doesn't take a database design expert of Visual Basic programming guru to tell that you'll gain performance if you're not moving as much data around. Disk operations in particular are notoriously expensive. Take advantage of the performance benefits available to you by reducing the amount of data you're working with.

Unbound Controls


The data control that ships with Visual Basic is a powerful tool that can make building a database application a much simpler exercise, but it comes with a price tag that will impose performance penalties and restrict your design. If you're building a simple application, the data control may be sufficient for your needs, but if you're building a complex database application, you will almost certainly need to work directly with the data using the Jet Data Access Objects. Once you've learned to use the DAO model, taking the additional step of working exclusively with unbound controls isn't that difficult.

Here are a few of the reasons for using unbound controls:

  • You have explicit control over updates.
    With the data control, the control handles updates for you. While this is convenient, you give up control over how and when the records are updated.
  • You're not limited by query expressions.
    If you need to present data that can't be generated by a normal query expression, you'll need to use unbound controls.
  • You can reduce locking problems.
    By having explicit control over updates, you can avoid many locking problems or at least automate retrying updates when lock conflicts occur.
  • You can wrap form-based operations in transactions.
    The data control uses implicit transactions for inserting and updating records. Using unbound controls, you can allow the user to edit data which hasn't been committed then write the data to the tables when it's appropriate to do so.
Building an application with unbound controls isn't an entirely trivial task, however. Here's some of the extra steps you'll need to take:
  • Populating Controls
    You'll need to write code to retrieve data from the tables and populate the controls on forms. This typically would consist of a few lines of DAO code to retrieve a record and series of statements assigning the values from the fields to the controls.
  • Updating and Inserting Records
    Once the user has completed editing data, you'll need to retrieve the values from the controls and update or insert the records in the table. Retrieving the data from the controls is essentially the opposite of the process of populating the controls. To update the tables, you can use either a recordset and the AddNew and Edit methods with the Update method or use SQL statements to write the data using the Execute method of the Database object. Using a recordset object would allow you to assign control values directly to the fields, but the SQL Execute method is generally faster.
  • Navigation
    You'll need to provide your own tools to let the user navigate between records. If you're maintaining an open recordset for the form, this can be as simple as issuing a Move method against the recordset and calling your procedure to populate the controls. If not, you'll need to open a recordset, retrieve the appropriate record, and then populate.
Clearly there's some extra work involved in writing code to handle unbound controls. However, as a programmer your job is to write code, so you really can't complain if you actually have to do some work. While we've focused mainly on application design factors, keep in mind that working with unbound controls can actually be faster than working with the data control. Not only do you avoid the overhead of including the custom control in the application, but you're also more likely to restrict the amount of data you're working with. Additionally, since you will most often be working with only a single row at a time, there's no need to run queries that return more than a single row. If you need to have a total record count displayed for the form (such as displaying Record x of x in a status bar), you can quickly get a total count by running a query based on the same criteria with only a SELECT Count(*) in the field list.

Summary


Let's briefly review some of the key points:
  • Use indexes to speed up data retrieval.
  • Take advantage of the database engine for enforcing rules.
  • Limit the number of rows returned in queries.
  • Avoid populating list boxes, combo boxes, and grids with large recordsets.
  • Return only the fields you need in queries.
  • Limit the number of joins in queries.
  • Use unbound controls.

Originally written by Joe Garrick

If you enjoyed this post, subscribe for updates (it's free)

This helpful

Thank you. I'll try index my table :)