Like optimization techniques for Visual Basic applications in general, a lot of trees have given their lives
to publish information on optimizing database applications. Also like Visual Basic in general, much of the
information presented misses the mark on what will have the greatest impact on performance of the program.
You cannot ignore basic design fundamentals such as indexing, but there are other factors that will have
a greater impact that are often ignored. Here's what I'll be covering:
-
Indexing
Properly indexing tables can dramatically speed up data access. You need to understand how indexes can
be used and what trade-offs are involved in creating an index.
-
The Database Engine
The database engine can do some things more efficiently and effectively than your application code, so
let it do its job while you do yours.
-
Limiting Data Access
Reducing the amount of data returned by a query can be your most effective tool for improving performance.
-
Unbound Controls
VB's data control is a convenient tool for building applications, but carries a price tag that can
hamper performance and limit the application design.
-
Summary
A review of the key points.
Introduction
Indexing a table speeds access to the data by giving the database engine a set of pointers to rows that it
can use to retrieve records without scanning the entire table. There are two fundamental types of indexes
that can be created:
-
Unique Index
In a unique index, each item in the index must be unique within the table. Unique indexes are most commonly
used with primary keys, but may also be used with other fields.
Non-unique Index
In a non-unique index, there may be repeating values in the index. Non-unique indexes may be used for foreign
key fields or for fields which are commonly used for search and sort operations.
Indexes can also be categorized as clustered and non-clustered. In general, an index is a tree structure. With
a clustered index, the actual data pages (rows) are the leaf nodes of the tree. In a non-clustered index, the
leaf nodes are pointers to records. This means that the records in a clustered index are physically ordered in
the table according to the clustered index. Note that you cannot create a clustered index on a Jet table. They
are supported by many other database engines, however, including MS SQL Server.
Determining the Fields to Index
The choice of which fields to index and which not to index is at times simple and obvious and at other times
can be largely guess work. Let's look first at some fields which should normally always be indexed.
-
Primary Key
A primary key by definition is unique within a table, and should always have a unique index assigned.
This both allows the database engine to enforce the uniqueness of the data in the field and allows
fast access to records when using the primary key field or fiels as the search criteria.
-
Foreign Keys
Fields used for foreign keys should also be indexed, although in most cases the index will not be unique.
This allows for significant performance benefits when joining tables in queries.
-
Query Criteria and Sort Fields
Here the decision to create an index is somewhat less clear. A good general guideline is that if you are
using fields as part of the WHERE or ORDER BY clause for a query that is the record source for a commonly
used form or report, you should create an index. Indexing these fields can offer a performance advantage
in your application.
For the balance of the fields in the database, the decision of whether or not to create an index is more of
an art than a science. In general, you will gain a benefit in retrieval speed if you index fields which are
often used for ad-hoc queries and filters.
Useful Indexes
Different database engines may use different methods of determining a plan for executing a query, however, most
will follow these general guidelines for when an index is useful. In most cases, these guidelines will apply
only to multiple field indexes. Let's take look at an example to illustrate where an index may or may not be
helpful:
You have a table Customers with (among others) the fields CustFirst and CustLast,
representing the customer's first and last name. Since the data is often sorted in last name, first name order,
you create an index idxCustName on the fields CustLast, CustFirst in that order.
-
If you create a query with a criteria or a sort based on the last name, the index can be used.
-
If you create a query and sort by last name, first name, the index can be used.
-
If you create a query with a sort based on first name, last name, the index cannot be used.
Keep in mind also that in this example two separate indexes - one on the
CustLast field and one on the
CustFirst field - could be used in all three scenarios.
When to Avoid Indexes
While having an index available will nearly always improve retrieval time for records, it will also increase
the time required for inserts and updates. If you're application has significant insert and update activity,
you may want to limit the number of fields that are indexed and the most heavily used tables. Remember that
when an insert or an update is performed, the database engine not only has to write the record to the
physical file, it must also update the indexes.
Visual Basic is a remarkable programming tool. Used in combination with the Jet Data Access Objects, it's
relatively easy to create complex database applications. However, there are some things that are better left
to the database engine than your application. Primarily this involves issues of data integrity. With the Jet
engine, the databases you create can be shared with Microsoft Access in their native form. If you've coded
all the rules that apply to the data into your application, its possible for a user to either unintentionally
or maliciously circumvent your rules and manipulate the data outside the confines of the application. While
it's possible to prevent an end-run around your coded rules by securing the data, you can still introduce
violations through errors or omissions in your own code. To avoid data integrity problems, you should
enforce as many of the constraints as possible at the database design level. Let's look at the types
of rules for data integrity which can be enforced.
-
Entity Integrity
This type of integrity prevents the insertion of duplicate records into a table and should be enforced
through the use of a unique, primary key and a properly normalized database design.
-
Domain Integrity
This ensures that the values entered into a field fall into an acceptable range or list of valid entries.
Domain integrity can be enforced by using foreign key constraints or column level rules.
-
Referential Integrity
This ensures that there are no foreign key values that point to non-existent records. Referential integrity
should be enforce using foreign key constraints (known as relationships in Access).
-
Business Rules
These are any additional rules which apply to your data and may cross column, row, and even table or
database boundaries. Whenever possible, you should enforce business rules at the database design level
as well through column or row level rules or foreign key constraints.
It's possible to enforce all of these rules in your application code, but it certainly isn't recommended.
Keep in mind that as a database developer, your first responsibility is to the data in the database.
While the needs of the end-user are certainly important and the goals of the management must be
achieved, the application will be essentially useless if the data it manages cannot be counted upon to
be correct.
Another factor to keep in mind is that in most cases it's more efficient from both a performance and a
maintenence point of view to let the database engine do it's job enforcing rules against the data. For
example, for you to enforce a referential integrity constraint, you would need to create a recordset
based on a foreign key value and examine it for matching records - a task the database engine can do
with considerably greater efficiency. In the case of something like a column level rule, there may be
little performance difference between the database engine and your Visual Basic code, but if you let
the database engine enforce the rule, you eliminate the possibility that an error in your code could
introduce errors in the data. Additionally, you can choose to take the approach in your code of simply
trapping errors from the engine rather than explicitly testing for rules. While you may decide that as
a matter of good coding practice that you would prefer to test the data where possible before attempting
any data manipulation, with the engine enforcing the rules, you leave open the choice.
Limiting the amount of data returned by a query can be your most powerful tool for improving the performance of
database applications. While indexing and a solid database design can help, no index will help much if you
regularly run queries that return thousands of records on multiple table joins or try to populate a grid, list box,
or combo box with more than a few dozen records. Here are a few of the key areas where restricting the data
returned by a query can return big gains in performance:
-
Limit the number of rows returned.
In most cases, end users will normally be working with only a single row at a time or a few rows displayed in
a grid, list, or combo box. There's simply no reason to return all the rows in a table when a user will at most
probably only need to work with a handful. Filling list boxes and combo boxes in particular can be tremendously
slow if you are returning a lot of records.
-
Don't return more columns than you need.
Programmers often use SELECT * FROM for simplicity or expediency. Avoid this habit and provide an explicit
list of columns limited to those that you will actually display. Memo and binary data in particular are
very slow to retrieve and should be eliminated from the results if possible. Most database engines, including
Jet, do not actually store memo and binary data in the table, but simply keep a pointer to a linked list of
pages where the actual data resides. Forcing the database engine to work its way though the pages of data
when the information isn't required is a waste of resources in terms of both clock cycles and memory consumption.
-
Limit the number of joins.
If you're basing a form on a query with a 15 table join, it's probably time for you to either rethink the
form design or denormalize the database. The current generation of database engines are powerful, but not
that powerful. Putting anything beyond four or five tables in a query will begin to seriously degrade
performance.
-
Avoid sorts.
If possible, avoid sorting the data returned by queries. Sorts can be slow, especially if the data is not
indexed. If you do need to sort the data in a query, try to sort on an indexed field.
-
Limit the rows used for pick lists.
While its practical and convenient to use a list or combo box to allow the user to pick from a list of
a few values, choosing from a list of dozens or hundreds of values is both slow and impractical for the
user. If you need to have the user choose a value from a large list, provide some other means which doesn't
require populating a large list of rows. A reasonable guideline that I've used is that the largest pick
list I'll supply to an application is the postal codes for the U.S. states. If you include territories and
Canadian provinces, that's about 65 records. Even this list is longer than I would normally consider
reasonable, but its so often needed that it's an acceptable violation.
It doesn't take a database design expert of Visual Basic programming guru to tell that you'll gain performance
if you're not moving as much data around. Disk operations in particular are notoriously expensive. Take advantage
of the performance benefits available to you by reducing the amount of data you're working with.
The data control that ships with Visual Basic is a powerful tool that can make building a database application
a much simpler exercise, but it comes with a price tag that will impose performance penalties and restrict
your design. If you're building a simple application, the data control may be sufficient for your needs, but if
you're building a complex database application, you will almost certainly need to work directly with the data
using the Jet Data Access Objects. Once you've learned to use the DAO model, taking the additional step of
working exclusively with unbound controls isn't that difficult.
Here are a few of the reasons for using unbound controls:
-
You have explicit control over updates.
With the data control, the control handles updates for you. While this is convenient, you give up control
over how and when the records are updated.
-
You're not limited by query expressions.
If you need to present data that can't be generated by a normal query expression, you'll need to use unbound
controls.
-
You can reduce locking problems.
By having explicit control over updates, you can avoid many locking problems or at least automate retrying
updates when lock conflicts occur.
-
You can wrap form-based operations in transactions.
The data control uses implicit transactions for inserting and updating records. Using unbound controls,
you can allow the user to edit data which hasn't been committed then write the data to the tables when
it's appropriate to do so.
Building an application with unbound controls isn't an entirely trivial task, however. Here's some of the
extra steps you'll need to take:
-
Populating Controls
You'll need to write code to retrieve data from the tables and populate the controls on forms. This typically
would consist of a few lines of DAO code to retrieve a record and series of statements assigning the values
from the fields to the controls.
-
Updating and Inserting Records
Once the user has completed editing data, you'll need to retrieve the values from the controls and
update or insert the records in the table. Retrieving the data from the controls is essentially the
opposite of the process of populating the controls. To update the tables, you can use either a
recordset and the AddNew and Edit methods with the Update method or use SQL statements to write the
data using the Execute method of the Database object. Using a recordset object would allow you to
assign control values directly to the fields, but the SQL Execute method is generally faster.
-
Navigation
You'll need to provide your own tools to let the user navigate between records. If you're maintaining
an open recordset for the form, this can be as simple as issuing a Move method against the recordset
and calling your procedure to populate the controls. If not, you'll need to open a recordset, retrieve
the appropriate record, and then populate.
Clearly there's some extra work involved in writing code to handle unbound controls. However, as a
programmer your job is to write code, so you really can't complain if you actually have to do some work.
While we've focused mainly on application design factors, keep in mind that working with unbound controls
can actually be faster than working with the data control. Not only do you avoid the overhead of
including the custom control in the application, but you're also more likely to restrict the amount
of data you're working with. Additionally, since you will most often be working with only a single
row at a time, there's no need to run queries that return more than a single row. If you need to have
a total record count displayed for the form (such as displaying Record x of x in a status bar), you
can quickly get a total count by running a query based on the same criteria with only a SELECT Count(*)
in the field list.
Let's briefly review some of the key points:
- Use indexes to speed up data retrieval.
- Take advantage of the database engine for enforcing rules.
- Limit the number of rows returned in queries.
- Avoid populating list boxes, combo boxes, and grids with large recordsets.
- Return only the fields you need in queries.
- Limit the number of joins in queries.
- Use unbound controls.
Originally written by Joe Garrick
Comments
This helpful
Thank you. I'll try index my table :)
Post new comment