When working with databases many acronyms and terms are thrown around. Often this can make people feel very overwhelmed. This article will help you greatly. It explains what a database is and how you use it effectively.
Before moving on, lets describe the data that we wish to record for each component of the application:
The solution to this problem is to use a relational model for the data. Don't let the terminology get you down - the concept isn't that hard to understand. It simply means that in this example each order entered is related to a customer record, and each line item is related to an order record. A relational database management system (RDBMS) is then a piece of software that manages groups of records which are related to one another. Let's take our flat file and break it up into three tables: Customers, Orders, and OrderDetails. The fields are just as they are shown above, with a few additions. To the Orders table, we will add a Customer Number field, and to the OrderDetails table we will add an Order Number field. Here's the list again with the required additional fields and modified field names.
OK, back to the tables. What we've done besides the name change is to add fields to the Orders and OrderDetails tables. Each have key fields used to provide a link to the associated Customers and Orders records, respectively. These additional fields are called foreign keys.
There are two types of key fields we are dealing with: primary keys and foreign keys. A primary key is a field that uniquely identifies a record in a table. No two records can have the same value for a primary key. Each value in a primary key will identify one and only one record. A foreign key represents the value of primary key for a related table. Foreign keys are the cornerstone of relational databases. In the Orders table, the OrdCustID field would hold the value of the CustID field for the customer who placed the order. By doing this, we can attach the information for the customer record to the order by storing only the one value. We'll discuss how to put the data back together again next.
One of the inherent problems of any type of data management system is that ultimately the human users of the system will only be able to view data in two dimensions, which in the end become rows and columns in a table either on the screen or on paper. While people can conceptualize objects in three dimensions, its very difficult to represent detail data in anything other than a flat table. After all the effort we went through to break down the original flat file into three tables, we are now going to undo that effort and make a flat file again.
We're going to accomplish this amazing feat of backwards progress by using queries. A query is simply a view of data which represents the data from one or more tables. Lets say we want to see the orders placed by our customers. We can link the Customers and Orders tables using the CustID field from Customers and the OrdCustID field from Orders - remember, the value of the OrdCustID field represents a related record in the Customers table and is equal to the CustID value from that record. By joining together the two tables based on this relationship, we can add fields from both tables and see all orders along with any pertinent customer data.
Let's take the earlier example and build a query to look at customer orders. Here's the SQL for it:
SELECT CustName, CustCity, CustState, OrdDate FROM Customers INNER JOIN Orders ON Customer.CustID = Orders.OrdCustID;That wasn't too tough. Lets look in a little more detail. This query starts with the SELECT keyword. Most of the queries you'll be building will be SELECT queries. SELECT simply means that we wish to "select" records, or retrieve records from the tables. Following the SELECT keyword is the list of fields. Next comes the FROM keyword. This is used to indicate where the data is coming from. In this case, its coming from the Customers table and the Orders table. The key to this query is the INNER JOIN. There are two basic types of joins which can be done between tables: inner joins and outer joins. An inner join will return records for which only the matching fields in both tables are equal. An outer join will return all the records from one table, and only the matching records from the other table. Outer joins are further divided into left joins and right joins. The left or right specifies which side of the join returns all records. The balance of the example query specifies which fields are used to join the table. In this case we are matching the CustID field from Customers to the OrdCustID field (the foreign key) in Orders.
One thing that should be noted is that this is Jet SQL. Each RDBMS has its own particular dialect of SQL, just as Visual Basic is derived from some original BASIC language somewhere, Jet SQL is a variation of SQL particular to Microsoft's Jet database engine. For a complete description of the features of Jet SQL, search the VB help files for the topics "Jet SQL" and "reserved word, Jet".
Database applications have several choices available for enforcing referential integrity, but if possible, you should let the database engine do its job and handle this for you. The latest advanced database engines allow you to use declarative referential integrity. You specify a relationship between tables at design time, indicating if updates and deletes will cascade through related tables. If cascading updates are enabled, changes to the primary key in a table are propagated through related tables. If cascading deletes are enabled, deletions from a table are propagated through related tables.
Looking again at our order entry system, if cascading updates are enabled, a change to the CustID for a Customers table record would change all of the related OrdCustID values in the Orders table. If cascading deletes are enabled, deleting a record from Customers would delete any related records in the Orders table. In contrast, if cascading updates or deletes are not enabled, you would be prevented from changing the primary key or deleting a record from Customers if any related records exist in the Orders table.
Keep in mind also that if you have enforced referential integrity in the relationship between Orders and OrderDetails, this relationship can also have an effect on your ability to manage records in Customers. Just as you can't delete a customer with orders, neither can you delete an order with detail items. The result is passed along as far as necessary. If you cascade deletes from Customers to Orders, but not from Orders to OrderDetails, you will be prevented from deleting a record in Customers if there are any Orders records which have related OrderDetails records.
Before you go ahead and enable cascading deletes on all your relationships keep in mind that this can be a dangerous practice in some situations. Lets say you have a table called States, which lists the U.S.P.S two letter state abbreviation for each of the states in the country, along with the full name of the state. You use this table as a lookup table and to enforce the validity of the state entered in the Customers table. If you define a relationship between the States table and the Customers table with cascading deletes enabled, then delete a record from States, you will delete all Customers table records where the customer is located in that state. In most cases, I've chosen to let my applications handle cascading deletes. This gives me a bit of a buffer against errors in the application and helps to prevent the loss of data. The database engine will prevent deletions if related records exist, forcing me to account for those records explicitly.
If you look at the Customers table, you can see that it isn't really necessary to include the CustCity and CustState fields since a US ZIP Code uniquely defines a city and state in the US. I'm not aware of how postal codes in other countries work, but I'm going to guess that a postal code will in most cases define a city uniquely enough to get the mail delivered. However, when taken to extremes, you'll pay a performance penalty for excessive normalization. If you were to fully normalize the Customers table, you would need to remove the CustCity and CustState fields and create a table, perhaps called ZIPCodes, which included these fields, then include only the CustZIP field and join the Customers table to the ZIPCodes table in order to reconstruct the full address. The problem with this is that you add the overhead of an additional join in every query where you need to have the full address available.
There aren't any hard and fast rules for when to stop normalizing a database. You need to make your own choices based on the practicality of the data structures and the performance trade-offs involved. If possible, you should at least try to design the application so that you can restructure the data to accomodate normalizing or denormalizing the tables.
For a more in-depth discussion of building a relational database, you can also visit my Database Design Fundamentals page.
Originally written by Joe Garrick