On the Web today, content is king. After you’ve mastered HTML and learned a few neat tricks in JavaScript and Dynamic HTML, you can probably build a pretty impressive-looking Web site design. But then comes the time to fill that fancy page layout with some real information. Any site that successfully attracts repeat visitors has to have fresh and constantly updated content. In the world of traditional site building, that means HTML files–and lots of ‘em.
The problem is that, more often than not, the people providing the content for a site are not the same people handling its design. Oftentimes, the content provider doesn’t even know HTML. How, then, is the content to get from the provider onto the Web site? Not every company can afford to staff a full-time Webmaster, and most Webmasters have better things to do than copying Word files into HTML templates anyway.
Maintenance of a content-driven site can be a real pain, too. Many sites (perhaps yours?) feel locked into a dry, outdated design because rewriting those hundreds of HTML files to reflect a new design would take forever. Server-side includes (SSI’s) can help alleviate the burden a little, but you still end up with hundreds of files that need to be maintained should you wish to make a fundamental change to your site.
The solution to these headaches is database-driven site design. By achieving complete separation between your site’s design and the content you are looking to present, you can work with each without disturbing the other. Instead of writing an HTML file for every page of your site, you only need to write a page for each kind of information you want to be able to present. Instead of endlessly pasting new content into your tired page layouts, create a simple content management system that allows the writers to post new content themselves without a lick of HTML!
In this 10-part weekly series of articles, I’ll provide a hands-on look at what’s involved in building a database-driven Web site. We’ll be using two new tools for this: the PHP scripting language and the MySQL relational database. If your Web host provides PHP/MySQL support, you’re in great shape. If not, we’ll be looking at the set-up procedures under Unix and Windows, so don’t sweat it.
These articles are aimed at intermediate or advanced Web designers looking to make the leap into server-side programming. You’ll be expected to be comfortable with HTML, as I’ll be making use of it without explanation. A teensy bit of JavaScript may serve us well at some point, but I’ll be sure to keep it simple for the uninitiated.
By the end of this series, you can expect to have a grasp of what’s involved in setting up and building a database-driven Web site. If you follow along with the examples, you’ll also learn the basics of PHP (a server-side scripting language that allows you to do a lot more than access a database easily) and Structured Query Language (SQL — the standard language for interacting with relational databases). Most importantly, you’ll come away with everything you need to get started on your very own database-driven site in no time!
- Part 1: Installation
- Part 2: Getting Started with MySQL
- Part 3: Getting Started with PHP
- Part 4: Using PHP to access a MySQL database
Parts 5-10 are only available in print from SitePoint.com.
- Part 5: Relational Database Design
- Part 6: A Content Management System
- Part 7: Content Formatting and Submission
- Part 8: MySQL Administration
- Part 9: Advanced SQL
- Part 10: Advanced PHP
Guaranteeing Data Integrity with MySQL 5.0
The IT crown jewels of any organization lies in their database infrastructure. Obviously, well-designed front end applications play a leading role when it comes to ensuring that critical information is captured or reviewed, but the buck stops at the database when it comes to data retention and protection.
Because such things are the database’s responsibility, it is important that the underlying engine act in ways that preserve the integrity and validity of the data. This means the RDBMS should stand guard and deny entry of any and all invalid data through the welcome gates of the actual database.
For users of MySQL, this issue of server-enforced data integrity has been somewhat of a blessing and a curse. Traditionally, MySQL has taken the position that the front-end application should bear the responsibility of validating incoming data. There are countless users of MySQL who are completely fine with this approach, and many who actually prefer this form of data validation.
However, the meteoric rise in MySQL’s popularity has brought many database “switch out” projects where MySQL is replacing proprietary or other open source database back ends where data integrity enforcement is expected at the database engine layer and not at the application layer. Most hard-core database professionals expect a database to validate incoming data and reject any data that is of the wrong datatype or data that violates certain data integrity rules.
The good news is that MySQL 5.0 now offers server-enforced data integrity. While the database server can still be used in its former mode of operation, database professionals wanting the RDBMS to be the ultimate authority with respect to accepting or rejecting data into/from the database can now easily make this happen in 5.0.
How Things Used to Be
Prior to version 5.0, MySQL handled incoming invalid data in a manner that most non-MySQL users wouldn’t expect. The best way to demonstrate this is through the following example. Let’s create a table with several different datatypes and then insert some bad data into the table to see how MySQL handles it.
First, let’s create our table and insert some clean data:
mysql> create table mytest (c1 int, c2 varchar(10), c3 date) engine=myisam; Query OK, 0 rows affected (0.00 sec) mysql> insert into mytest values (1,'testing',now()); Query OK, 1 row affected (0.00 sec) mysql> select * from mytest; +------+---------+------------+ | c1 | c2 | c3 | +------+---------+------------+ | 1 | testing | 2005-11-29 | +------+---------+------------+ 1 row in set (0.00 sec)
So far so good. But now let’s use MySQL’s multi-insert capability (a great feature by the way…!) to run some invalid data through each column in our table and see what happens:
mysql> insert into mytest values
-> ('bad number','testing',now()),
-> (1,1,now()),
-> (1,'testing','bad date');
Query OK, 3 rows affected, 2 warnings (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 2
mysql> show warnings;
+---------+------+------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------+
| Warning | 1264 | Out of range value adjusted for column 'c1' at row 1 |
| Warning | 1265 | Data truncated for column 'c3' at row 3 |
+---------+------+------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> select * from mytest;
+------+---------+------------+
| c1 | c2 | c3 |
+------+---------+------------+
| 1 | testing | 2005-11-29 |
| 0 | testing | 2005-11-29 |
| 1 | 1 | 2005-11-29 |
| 1 | testing | 0000-00-00 |
+------+---------+------------+
4 rows in set (0.00 sec)
The first set of values contains a string value for the first column, which is a numeric datatype. The second set of values contains a numeric value for the second column, which is a character datatype (variable). And the third set of values contains a string value for the third column, which is a date datatype.
What does MySQL do? It ’silently’ converts the incoming bad data into the default values for each of the table’s column datatypes. So in the above example, the string data targeting column one in our table gets converted to a zero (the default value for numeric columns), the numeric data targeting the varchar column goes in as is, and the bad date gets converted to be a zero-filled date value.
This behavior has been criticized by some and termed a MySQL ‘gotcha’. Indeed, for those coming from traditional database management systems, this style of data handling is out of character. However, there are some who actually prefer this type of behavior. For example, some MySQL users don’t want large data loads interrupted with errors and would rather have the load finish and then perform data validation. This type of approach is somewhat mirrored in Oracle10g release 2, where, during a load, bad data can be redirected into a mirror table that contains all rows that have violated data integrity rules.
But, as has already been stated, the vast majority of the database community prefers to have traditional server-enforced data integrity. And that’s exactly what they can have with MySQL 5.0.
Enforcing Data Integrity through MySQL 5.0
In version 5.0, MySQL users have the option to retain the old behavior of data integrity validation or switch to the new mode of server-enforced data integrity. At the heart of this change is the sql_mode configuration parameter. A DBA can modify this parameter in a variety of ways to produce exactly the type of data integrity enforcement they want. Let’s see how easy this change can occur. We’ll use the most restrictive setting for the sql_mode parameter and repeat our prior tests to see how differently MySQL behaves:
mysql> show variables like 'sql_mode';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sql_mode | |
+---------------+-------+
mysql> set sql_mode=strict_all_tables;
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'sql_mode';
+---------------+-------------------+
| Variable_name | Value |
+---------------+-------------------+
| sql_mode | STRICT_ALL_TABLES |
+---------------+-------------------+
mysql> insert into mytest values ('testing','testing',now());
ERROR 1264 (22003): Out of range value adjusted for column 'c1' at row 1
mysql> insert into mytest values (1,'testing','bad date');
ERROR 1292 (22007): Incorrect date value: 'testing' for column 'c3' at row 1
First, notice that the server-enforced data integrity capability is dynamic - it can be changed without stopping/starting the MySQL server. Next, take note that the above example changed server-enforced data integrity at the client level only:
mysql> show global variables like 'sql_mode'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | sql_mode | | +---------------+-------+ 1 row in set (0.00 sec) mysql> show variables like 'sql_mode'; +---------------+-------------------+ | Variable_name | Value | +---------------+-------------------+ | sql_mode | STRICT_ALL_TABLES | +---------------+-------------------+ 1 row in set (0.00 sec)
A DBA can globally set such data integrity enforcement by changing the sql_mode configuration parameter in the my.cnf file or they can start the MySQL daemon/service and specifically specify it via command line startup if they wish. A client (given the privileges) can also override the global setting if they desire.
However, the main thing to observe in the above test is that MySQL is now rejecting invalid data and is not converting it into a column’s default value. Note that this data integrity enforcement includes all aspects of MySQL including things like ENUM constraints:
mysql> create table emp(name varchar(30), gender enum('M','F')) engine=myisam;
Query OK, 0 rows affected (0.00 sec)
mysql> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)
mysql> show variables like 'sql_mode';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sql_mode | |
+---------------+-------+
1 row in set (0.00 sec)
mysql> insert into emp values ('fred','A');
Query OK, 1 row affected, 1 warning (0.00 sec)
mysql> show warnings;
+---------+------+---------------------------------------------+
| Level | Code | Message |
+---------+------+---------------------------------------------+
| Warning | 1265 | Data truncated for column 'gender' at row 1 |
+---------+------+---------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from emp;
+------+--------+
| name | gender |
+------+--------+
| fred | |
+------+--------+
1 row in set (0.00 sec)
mysql> set sql_mode='strict_all_tables';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into emp values ('john','A');
ERROR 1265 (01000): Data truncated for column 'gender' at row 1
mysql> select * from emp;
+------+--------+
| name | gender |
+------+--------+
| fred | |
+------+--------+
1 row in set (0.00 sec)
The setting used for sql_mode above, strict_all_tables, enables server-enforced data integrity across all of MySQL’s pluggable storage engines. However, you have much more flexibility in setting this parameter. There are wide arrays of options to choose from and you can choose to set more than one value for the parameter if you wish. For example, you can choose to only have server-enforced data integrity for all your transaction tables, but not allow zero dates in any table regardless of underlying storage engine. For a complete listing of all the available parameters, please see http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html in the MySQL Reference Manual.
The last thing to note is that the global sql_mode configuration parameter is intentionally left blank for version 5.0 of MySQL. Countless MySQL applications are written with MySQL’s former data-enforcement behavior in mind, so no global change has been introduced for MySQL’s defaults at this time. This may change in a future release of the server.
Conclusion
Because databases should contain only accurate and valid data, DBAs must work hard to ensure that no misleading information ever enters the gates of their operational data stores and data warehouses. Beginning in MySQL 5.0, DBAs can now enforce data integrity through the MySQL server and not have to depend on data validation via application front ends.
Making the change from no integrity enforcement to global integrity enforcement is easy to do, so MySQL DBAs upgrading from previous MySQL versions should seriously consider its use. And anyone new to MySQL who is deploying version 5.0 should definitely make the new server-enforced data integrity the default.
