Relational Database Management Systems
(RDBMSs), such as those
covered in this book, are the primary engines of information systems
worldwide, particularly web applications and distributed
client/server computing systems. They enable a multitude of users to
quickly and simultaneously access, create, edit, and manipulate data
without impeding or impacting other users. They also allow developers
to write useful applications to access their resources as well as
provide administrators with the capabilities they need to maintain,
secure, and optimize organizational data resources.
An RDBMS is defined as a system whose users view data as a collection
of tables related to each other through common data values. Data is
stored in tables, which are composed of
rows and columns. Tables of
independent data can be linked (or related) to
one another if they each have unique, identifying columns of data
(called keys) that represent a data value held
in common. E. F.
Codd first described relational database theory in his landmark paper
"A Relational Model of Data for Large Shared Data
Banks," published in the Communications of
the ACM (Association for Computing Machinery) in June,
1970. Under Codd's new relational data model, data
was structured (into tables of rows and columns); manageable using
operations like selections, projections, and joins; and consistent as
the result of integrity rules like keys and referential integrity.
Codd also articulated rules that governed how a relational database
should be designed; the process for applying these rules is now known
SQL99 had two main
parts, Foundation:1999 and Bindings:1999. The
SQL2003 Foundation section includes
all of the Foundation and Bindings standards from SQL99, but a new
section called Schemata was created.
The Core requirements of SQL2003 did not change from Core SQL99. So
the database platforms that conformed to Core SQL99 automatically
conform to SQL2003. Although the Core of SQL2003 has no additions
(except for a few new reserved words), a number of individual
statements and behaviors have been updated or modified. Because these
updates are reflected in the individual syntax descriptions of each
statement in Chapter 3, we
won't spend time on them here.
A few elements of the Core in
SQL99 have been deleted from SQL2003,
The BIT and BIT VARYING
The UNION JOIN clause
The UPDATE...SET ROW statement
In addition, a number of features, most of which were rather obscure,
have been added, deleted, or renamed. At this point in time, many of
the new features of SQL2003 standard are interesting mostly from an
academic standpoint because none of the database platforms support
the feature yet. However, there are a few new features that hold more
than passing interest:
The constantly evolving nature of the SQL standard has given rise to a number
of SQL dialects
among the various vendors and platforms.
These dialects most commonly evolved because the user community of a
given database vendor required capabilities in the database before
the ANSI committee created a standard. Occasionally, though, a new
feature is introduced by the academic or research communities due to
competitive pressures from competing technologies. For example, many
database vendors are augmenting their current programmatic offerings
with Java (as is the case with DB2, Oracle, and Sybase) or VBScript
(as Microsoft is doing). In the future, programmers and developers
will use these programming languages in concert with SQL to build SQL
Many of these dialects include conditional processing capabilities
(such as that controlled through IF . . . THEN
statements), control-of-flow functions (such as
WHILE loops), variables, and error handling.
Because ANSI had not yet developed a standard for these important
features at the time users began to demand them, RDBMS developers and
vendors were free to create their own commands and syntax. In fact,
some of the earliest vendors from the 1980s have variances in the
most elementary commands, such as SELECT,
because their implementations predate the standards. ANSI is now
refining standards that address these inconsistencies.
SQL in a Nutshell, Second Edition, describes the
SQL standard and the platform-specific implementations of five
IBM's popular DBMS runs on hardware platforms
ranging from PCs all the way up to huge mainframes. It runs on a
multitude of operating systems, including the burgeoning Linux OS.
DB2 is very popular in corporate environments, especially those with
a healthy investment in IBM software, equipment, or services. This
book covers DB2 Universal Database Version 8.1 for Linux, Unix, and
MySQL is a popular open source DBMS that is known for its blistering
performance. It runs on numerous operating systems, including most
Linux variants. To improve performance, it has a slimmer feature set
than do many other DBMSs. This book covers MySQL 4.0.
Oracle is a leading RDBMS in the commercial sector. It runs on a
multitude of operating systems and hardware platforms. Its scalable,
reliable architecture has made it the platform of choice for many
users. SQL in a Nutshell, Second Edition, covers
Oracle Database 10g.
To begin to use SQL, you must understand how statements are written.
SQL syntax falls into four main categories. Each category is
introduced in the following list and then explained in further detail
in the sections that follow:
Describe a user- or system-supplied name for database objects, such
as a database, a table, a constraint on a table, the columns in a
table, a view, etc.
Describe a user- or system-supplied string or value that is not
otherwise an identifier or a keyword. Literals may be strings like
"hello", numbers like 1234, dates
like "Jan 01, 2002", or Boolean
values like TRUE.
Are symbols specifying an action performed on one or more
expressions, most often in
INSERT, SELECT, or
UPDATE statements. Operators are also used
frequently in the creation of
Have special meaning to the database SQL parser, such as
DELETE, or CREATE.
Reserved words are words that cannot be used as
identifiers within the database platform, usually commands or SQL
statements. Keywords are words that may become
reserved some time in the future. You can circumvent the restriction
on using reserved words and keywords as identifiers through
quoted identifiers, described in a moment. This
is not recommended! Elsewhere in the book (besides here), we use the
term keyword to describe both concepts.
A table can contain one or many columns. Each column must be defined
with a datatype that provides a general classification of the data
that the column will store. In real-world applications, datatypes
provide some control and efficiency as to how tables are defined and
how the data is stored within the table. Using specific datatypes
enables better, more understandable queries and helps control the
integrity of data.
The tricky thing about SQL2003 datatypes is that they do not always
map directly to an identical implementation in a given platform.
Although the platforms specify
"datatypes" that correspond to the
SQL2003 datatypes, these are not always true SQL2003 datatypes. For
example, MySQL's implementation of a
BIT datatype is actually identical to a
CHAR(1) datatype value. Nonetheless, each of the
platform datatypes is close enough to the standard to be both easily
understandable and job-ready.
The official SQL2003 datatypes (as opposed to platform-specific
datatypes) fall into the general categories described in Table 2-8. (Note that the SQL2003 standard contains a
few rarely used datatypes (ARRAY,
MULTISET, REF, and
ROW) that are shown only in Table 2-8 and not elsewhere in the book.)
Constraints are defined when you create or alter a table. The general
is shown here:
The syntax elements are as follows:
Begins a constraint definition and, optionally, provides a name for
the constraint. When you omit
constraint_name, the system will create
one for you automatically. On some platforms, like DB2, you may omit
the CONSTRAINT keyword as well. System-generated
names are often incomprehensible. It is good practice to specify
human-readable, sensible names for constraints.
Declares the constraint as one of the allowable types:
CHECK, PRIMARY KEY,
UNIQUE, or FOREIGN KEY.
More information about each type of constraint appears later in this
Associates one or more columns with the constraint. Specify the
columns in a comma-delimited list. The column list must be enclosed
in parentheses. The column list should be omitted for column-level
constraints. Columns are not used in every constraint. For example,
CHECK constraints do not generally use column
When researching a command in this chapter:
Read Section 3.2.
Check the platform support table.
Read the section on SQL2003 syntax and description even if you are
looking for a specific platform implementation.
Finally, read the specific platform implementation information.
Any common features between the platform implementations of a command
are discussed and compared against the SQL2003 topic. Thus, the
subsection on a platform's implementation of a
particular command may not describe every aspect of that command,
since some of its details may be covered in the SQL2003 section.
Please note that if there is a keyword that appears in a command
syntax but not in its keyword description, this is because we chose
not to repeat descriptions that appear under the ANSI entry.
If you skip straight to a platform implementation and do not see a
discussion of a keyword or clause that you are interested in, that
means it is described (in common across all supporting platforms)
under the SQL2003 entry of the command.
Table 3-1 provides a listing of the SQL
statements, the platforms that support them, and the degree to which
they support them. The following list offers useful tips for reading
Table 3-1, as well as what each abbreviation
stands for. The sections that follow describe the commands from the
table in detail:
The first column contains the alphabetized SQL commands.
The SQL statement class for each command is indicated in the second
The subsequent columns list the vendor's level of
The platform supports the SQL2003 standard for the particular command.
The platform supports the SQL2003 standard for the particular
command, using vendor-specific code or syntax.
The platform supports some but not all of the functions specified by
the SQL2003 standard for the particular command.
The platform does not support the particular command according to the
Remember that even if a specific SQL2003 command is listed as
"Not supported," the platform
usually has alternative coding or syntax to enact the same command or
function. Therefore, be sure to read the discussion and examples for
each command later in this chapter. Likewise, a few of the commands
in Table 3-1 are not found in the SQL2003
standard, and these have been indicated with the term
"Non-ANSI" under the heading
"SQL2003 class" in the table.
The ALL operator performs a Boolean test of a
subquery for the existence of a value in all rows. The
ANY operator, and its synonym
SOME, performs a Boolean test of a subquery for
the existence of a value in any of the rows tested.
Tests a scalar expression, such as a column, against every value in
ALL, and against every value until a match is
found for ANY and SOME. All
rows must match the expression to return a Boolean TRUE value for the
ALL operator, while one or more rows must match
the expression, for ANY and
SOME, to return a Boolean TRUE value.
expression to the
comparison must be a standard comparison
operator like =, <>, !=, >, >=, <, or <=.
The ALL operator returns a Boolean TRUE value
when one of two things happen: either the subquery returns an empty
set (i.e., no records) or every record in the set meets the
comparison. ALL returns FALSE when any records
in the set do not match the value comparison.
ANY and SOME operators
return a Boolean TRUE when at least one record in the subquery
matches the comparison operation, and FALSE when no record matches
the comparison operation (or when a subquery returns an empty result
set). If even one return value of the subquery is NULL, then the
operation evaluates as NULL, not as TRUE.
4.1. Types of Functions
functions return a single value based
upon a set of other values. If used among other expressions in the
item list of a SELECT statement, the
SELECT must have a GROUP BY
or HAVING clause. No GROUP
BY or HAVING clause is required if
the aggregate function is the only value retrieved by the
SELECT statement. The supported aggregate
functions and their syntax are listed in Table 4-1.
Table4-1.ANSI SQL aggregate functions
Computes the average value of a column given by
Computes a correlation coefficient.
Counts the rows defined by the
Counts all rows in the specified table or view.
Computes population covariance.
Computes sample covariance.
Computes the relative rank of a hypothetical row within a group of
rows, where the rank is equal to the number of rows less than or
equal to the hypothetical row divided by the number of rows in the
Generates a dense rank (no ranks are skipped) for a hypothetical row
value_list) in a group of rows generated
by GROUP BY.
allows for a
aggregate function calls, the addition of which makes those functions
into window functions. Both Oracle and DB2 support this window
function syntax. This section describes how to use the
Oracle and DB2.
Oracle tends to refer to window functions as analytic
Window, or analytic, functions are
similar to standard aggregate functions in that they operate on
multiple rows, or groups of rows, within the result set returned from
a query. However, the groups of rows that a window function operates
on are defined not by a GROUP BY clause, but by partitioning and
windowing clauses. Furthermore, the order within these groups is
defined by an ordering clause, but that order only affects function
evaluation, and has no effect on the order in which rows are returned
by the query.
Window functions are the last items in a query to be evaluated except
for the ORDER BY clause. Because of this late
evaluation, window functions cannot be used
within the WHERE, GROUP BY,
or HAVING clauses.
ANSI SQL scalar functions return a
single value each time they are invoked. The SQL standard provides
many scalar functions that can be used to manipulate date and time
types, strings, and numbers, as well as retrieve system information
such as the current user or login name.
Scalar functions fall into the categories
listed in Table 4-2.
Table4-2.Categories of scalar functions
Performs operations on values or settings built into the database.
Oracle uses the term "built-in" to
describe all the specialty functions that are provided by Oracle, and
thus "built into" their DBMS. This
is a distinct and separate usage from the
"built-in" functions described
CASE and CAST
While these two functions operate on scalar input values, they are in
a category all their own. CASE supplies
IF-THEN logic to SQL statements and
CAST can convert values from one datatype to
Date and Time
Performs operations on temporal datatypes and
returns values in a temporal datatype format.
There is no SQL2003 function that operates on a temporal datatype and
returns a temporal result. The closest function is
EXTRACT (covered in Section 4.4.3 later in this
chapter), which operates on temporal values and returns numeric
values. Functions returning temporal values but operating on no
arguments are covered in the later Section 4.4.1.
This section provides an alphabetical listing of
that are specific to DB2 and not from the ANSI standard, with
examples and corresponding results.
Returns the arc cosine of a number ranging from -1 to 1. The result
ranges from 0 to and is expressed in radians. For example:
Returns the ASCII code of the first character of text. For example:
Returns the arc sine of
from -1 to 1. The resulting value ranges from -/2 to /2 and is
expressed in radians. For example:
Returns the arctangent of any
resulting value ranges from -/2 to /2 and is expressed in radians.
Returns the arctangent of
nbr. The values for
are not restricted, but the results range from - to and are expressed
is similar to
with the exception that the signs of
y are used to determine the quadrant of
the result. For example:
database applications, large and small, involves many steps. Careful
thought must be given to application architecture, and especially to
the following issues:
How to map application data, which is typically object-oriented, to a
How to handle errors gracefully
How to maximize performance and scalability
A typical database application will require many different SQL
statements. The management of so many statements is simplified by the
fact that all SQL statements will follow roughly the same pattern of
execution within an application. Figure 5-1 is a
state diagram showing how SQL
statements are prepared, executed,
and then processed by a database application when interacting with a
relational database system. The state diagram has been broken down
into eleven steps, four of which are optional (and are indented in
Figure5-1.Statement execution state diagram
Following are detailed descriptions of each step shown in Figure 5-1:
: Establishing connectivity is the
first step in every successful database application. It is in this
step that the client, or database application, makes a physical
connection to the database that will be used to transmit the SQL
statements to the database and the results back to the client. The
actual physical connection could be over a LAN, WAN, or even a simple
logical connection back to the client in cases where the database
application and server are running on the same machine. For more
information on how to establish connectivity, please see the section
Section 5.2 later in this chapter.
with ADO.NET requires instantiating a connection object with a
properly formatted connection string and then invoking the
Open method on the connection object. The
connection object can be an
SqlConnection, or an
OdbcConnection is designed for any ODBC
datasource, and the
OleDbConnection type will work
with any OLE DB Provider. For the highest performance data access,
use connection objects specifically tuned to the specific database
platform, such as
SqlConnection for Microsoft SQL
Server. Following is the syntax for creating a
Connection object in ADO.NET:
The format of the connection string is the same for all of the
connection types. The format is a string of key/value pairs delimited
by semicolons. For example:
While the format is the same for every connection type, the keys and
values are quite different. Tables Table 5-1
through Table 5-3 list the attributes for the
three connection types listed above. Many database platforms support
additional attributes that can also be set through the connection
string. For a list of these attributes, please consult the
appropriate database vendor documentation.
5.3. Closing a Database Connection