Using Industrial Strength Database Features
Data integrity, in an overwhelmingly small nutshell, means that an operation on one piece of data cannot invalidate, orphan, or otherwise corrupt other pieces of data in a database. Data integrity can be enforced in many ways, but I'll just cover the basics here:
Enforcing Data Integrity In the Code
This is what currently goes on more regularly than most people would probably like to admit. This is probably due in large part to the fact that in many open source projects, there is no 'proper' database administrator (DBA) involved in any phase of the project. The developer is the DBA!
Here's an example of enforcing data integrity 'in the code' - don't fret - you won't have to be a coder to get this:
You have a database with 3 database tables, named 'people', 'places' and 'things', respectively. Every person in the 'people' table lives in a 'place' and has one or more 'things'. Even if you've done only simple database applications, you can probably envision that there's an 'id' field in the 'things' table, which corresponds to an 'id' in the 'people' table, so you can match up people with the things they own. But what happens when a person goes away and you need to delete the record in 'people' for this person? Quite naturally, you'd need to also delete their 'things'! In a large many software packages, this means the application does a database call, saying 'DELETE FROM people WHERE id = 1'. The application then does an additional database call, saying 'DELETE FROM things WHERE id = 1'. This is not to even mention what to do about the 'places' table!
If you've seen similar situations in your applications, and you've handled similar problems in the same way, you have, indeed, enforced data integrity in your own code. You have also (quite possibly unwittingly) done the database's job, and made your own job more complex as a result. Enforcing data integrity solely through the use of front end application code will heretofore be referred to as 'the wrong way'.
Enforcing Data Integrity in the Database
Enforcing data integrity is a job that has fallen squarely on the shoulders of the back end database for many years. The simplest way to handle this task in the most database-agnostic way possible is through the use of foreign keys. In older releases of MySQL, foreign keys are only supported at the syntax level. What this means is that defining a foreign key in the SQL code won't cause MySQL to make loud belching noises, but the relationships you meant to form with them are not enforced.
In some newer 3.x versions of MySQL using InnoDB tables, some functionality related to foreign keys has been supported. In the 4.x versions and upward, foreign keys are starting to become more usable. PostgreSQL has had very good, mature support for foreign keys for some time. No matter which database you use, you should be using foreign keys. I'll refer back to our 3 example tables to explain how they basically work.
In the earlier example, you had to code all of the checks involved in making sure that deleting a record in one table didn't invalidate data in other tables. Now, with a foreign key, things are a bit smoother. You define the 'id' field in your 'things' table as a foreign key referencing the corresponding 'id' field in the 'people' table. You add a restriction to the foreign key definition along the lines of 'ON DELETE CASCADE' or 'ON DELETE RESTRICT', and with those three words, you've potentially saved yourself many lines of front-end code.
In the first case, a DELETE operation on the 'people' table will 'cascade' down to the child table, such that all of the records in 'things' that had the same key value will also be deleted. In the second case, your database will throw an error if you try to delete from the 'people' table someone who still has 'things' in the database.
There's not nearly enough room here to go into a detailed discussion on how foreign keys make your life easier. However, if you've sifted through huge lists of database features not knowing which ones were really important, foreign keys, to most administrators, rank right up there with... I dunno... tables?!
Views are one of the DBA's best friends. Sure, you can string together some subselects and fancy SQL hacks to sort of emulate the functionality of a view, but it's not nearly as clean and elegant, and it's not nearly as easy to use or maintain. MySQL 5.0 or 5.1 will reportedly have views. PostgreSQL already has usable, though not perfect, support for views.
You can think of views as a 'virtual table' that stores a particularly useful view of your data. So, for example, using our imaginary tables again, if you had an application that listed all of the things in a particular place, you would generally do a join on the two tables involved, and select a subset of the data from the result of the join. You would repeat this activity every time you needed data that depended on fields from more than one table... ad nauseum.
Instead, what you can do is basically store this join in the database as a view, and use a much simpler 'SELECT' statement - against the view itself. A view, to an application, looks, smells and acts just like a table. It doesn't know the difference. It's quite a lot like assigning a value (in the form of an SQL statement) to a variable (in the form of the view name), and then calling the variable to get some subset of the data it holds.
If you're thinking this sounds like an abstraction layer over your 'base' data tables, you've pretty much got it. As an application developer, this allows you the freedom to make changes to the base tables between releases for the purpose of adding functionality (or removing it), without greatly affecting the front-end code, without the need for a complex install routine that has to migrate tons of data to an entirely new database design, and without requiring your users to bear the pain of some crazy, manual data migration. As long as the view of the data that the application needs can be maintained, everything should 'just work'(tm). You could even rewrite the view definitions if necessary, to allow for name changes in the base tables or columns. As long as the end result looks like the same table the application has always called upon, your code can remain unchanged.
For applications (or parts of applications) that provide 'read only' services to the end user (such as report generation), views also act as a layer of security for your base tables. While your base tables pretty much have to be read-write, your views can certainly be made 'read only'. In addition, in the event that the application is somehow hijacked, the malicious user would have a tough time finding a way to write to a read-only view, and guessing the base table names would be quite difficult.
In conclusion I do hope that this information provides you with some inspiration to go off and find ways to exploit (slightly) more advanced features of the back end to help your applications be more robust, more secure, faster, and easier to maintain and develop. While you could get by treating a database as an arbitrary collection of tables, with some research, you'll find many features that are easy to use, and have a very large impact on your application.
Indeed, it is entirely possible, with the use of views, temporary tables, stored procedures, triggers, views and other advanced database features, to offload entire sections of an applications functionality to the database. Furthermore, though this may sound extremely impractical, some corporate policies greatly restrict the operations that can be directly performed on data using front-end application code, preferring to have those applications call stored procedures and other back-end functionality developed by the database administration and development team. The reasons for this revolve only partially around performance and flexibility. Other reasons revolve around accountability and data security, or the fact that they want their C-coders to code C, not SQL - so it's easier to call a function than write the SQL the function represents.
This is just a light intro. There is much more to discover. I will leave you simply by saying that those who develop database software build in advanced and, at times, complex features for a reason. Being that databases are pretty useless without applications to access them, it stands to reason that some of these features might be useful to application developers. Do your research, and use 'the right tool for the job'.