Lecture 14
Functional Dependencies
Functional dependencies (FD) are are type of constraint that is based on
keys. A superkey is defined as in the relational
schema R , where:
a subset K of R is a subkey of R if, in
any legal relation r(R), for all pairs, t_{1 }and
t_{2} in tuple r such that t_{1 }is
not equal to t_{2} then t_{1}[K]_{ }is
not equal to t_{2}[K].
Or, no two rows (tuples) have the same value in the attribute(s) K,
which is the key. Now, if there are two attributes (or sets of attributes)
A and B that are legal in the relation schema R, we
can have a functional dependency where
for all pairs of tuples such that t_{1}[A]_{ }is
equal to t_{2}[A] and t_{1}[B]_{ }is
equal to t_{2}[B]. This allows us to state that K
is a superkey of R if K implies R. For example,
in a relation that has names and social security numbers, whenever your
Social Security number is the student ID, the name in that tuple can only
contain your name. That is because your name is not unique, but your
Social Security is. If I go to the Social Security Administration
and search their database for the name "Gary Burt", the results is a large
number of people. If I search of the social security number "123456789",
the result is one and only one person.
Another example is in the loan information that we looked at before:
Loaninfoschema = (branchname, loanname, customername, amount)
it can be shown that the loannumber implies both the amount
and the branchname. It does not imply the customername
because there may be more than one person listed on the load, such as a
husband and wife, or parent and child (when the parent cosigns the loan).
Functional dependencies:

specify a set of constraints on a legal relation.

test relations to see if they are legal.
Some relations are said to be trivial when they are satisfied by all relations:.

A implies A

A implies B and B implies A.
Closure of a Set of Functional Dependencies
It is not enough to look at a single FD. All FDs must be considered
in a relation! Given the schema R = ( A, B, C, G,
H, I) and the FDs of:
A implies B
A implies C
CG implies H
CG implies I
B implies H
We can show that A implies H because A implies B
which implies H.
The notional for a FD is F. The notation of F^{+}
is the set of all FDs logically implied by F. There is a set
of rules, called Armstrong's axioms, that we can use to to compute closure.

Reflexivity rule: If A is a set of attributes, and B
is a set of attributes that are completely contained in A, the A
implies B.

Augmentation rule: If A implies B, and C
is a set of attributes, then if A implies B, then AC
implies BC.

Transitivity rule: If A implies B and B
implies C, then A implies C.
These can be simplified if we also use:

Union rule: If A implies B and A implies
C, the A implies AC.

Decomposition rule: If A implies BC then A
implies B and A implies C.

Pseudotransitivity rule: If A implies B and CB
implies D, then AC implies D.
Using mathematical principles, we can not test a set of attributes to see
if they are a legal superkey.
Pitfalls in RelationalDatabase Design
Obviously, we can have good and bad designs. Among the undesirable
design items are:

Repetition of information

Inability to represent certain information
The relation lending with the schema is an example of a bad
design:
LendingSchema=(branchname, branchcity, assets, cutomername,
loannumber, amount)
branchname 
branchcity 
assets 
customername 
loannumber 
amount 
Downtown 
Brooklyn 
9000000

Jones 
L17 
1000

Redwood 
Palo Alto 
2100000

Smith 
L23 
2000

Perryridge 
Horseneck 
1700000

Hayes 
L15 
1500

Downtown 
Brooklyn 
9000000

Jackson 
L14 
1500

Mianus 
Horseneck 
400000

Jones 
L93 
500

Round Hill 
Horseneck 
8000000

Turner 
L11 
900

Pownal 
Bennington 
300000

Williams 
L29 
1200

North Town 
Rye 
3700000

Hayes 
L16 
1300

Downtown 
Brooklyn 
9000000

Johnson 
L23 
2000

Perryridge 
Horseneck 
1700000

Glenn 
L25 
2500

Brighton 
Brooklyn 
7100000

Brooks 
L10 
2200

Looking at the Downtown and Perryridge, when a new loan is added, the branchcity
and assets must be repeated. That makes updating the table more difficult,
because the update must guarantee that all tuples are updated. Additional
problems come from having two people take out one loan (L23). More
complexity is involved when Jones took out a loan at a second branch (maybe
one near home and the other near work.) Notice that there is
no way to represent information on a branch unless there is a loan.
Decomposition
The obvious solution is that we should decompose this relation. As
an alternative design, we can use the Decomposition rule: If A
implies BC then A implies B and A implies C.
This gives us the schemas:

branchcustomerschema = (branchname, branchcity, assets, customername)

customerloanschema = (customername, loannumber, amount)
branchname 
branchcity 
assets 
customername 
Downtown 
Brooklyn 
9000000

Jones 
Redwood 
Palo Alto 
2100000

Smith 
Perryridge 
Horseneck 
1700000

Hayes 
Downtown 
Brooklyn 
9000000

Jackson 
Mianus 
Horseneck 
400000

Jones 
Round Hill 
Horseneck 
8000000

Turner 
Pownal 
Bennington 
300000

Williams 
North Town 
Rye 
3700000

Hayes 
Downtown 
Brooklyn 
9000000

Johnson 
Perryridge 
Horseneck 
1700000

Glenn 
Brighton 
Brooklyn 
7100000

Brooks 
customername 
loannumber 
amount 
Jones 
L17 
1000

Smith 
L23 
2000

Hayes 
L15 
1500

Jackson 
L14 
1500

Jones 
L93 
500

Turner 
L11 
900

Williams 
L29 
1200

Hayes 
L16 
1300

Johnson 
L23 
2000

Glenn 
L25 
2500

Brooklyn 
L10 
2200




Then when we need to get back to the original table, we can do a natural
join on the two relations branchcustomer and customerloan.
Evaluating this design, how does it compare to the first version?

Looking at the Downtown and Perryridge, when a new loan is added, the branchcity
and assets must be repeated. Problem still exists.

Problems come from having two people take out one loan (L23). Problem
still exists.

More complexity is involved when Jones took out a loan at a second branch.
Problem still exists.

Notice that there is no way to represent information on a branch
unless there is a loan. Problem still exists.
Worse, there is a new problem! When we do the natural join, we get
back four additional tuples that did not exists in the original table:

(Downtown, Brooklyn, 9000000, Jones, L93, 500)

(Perryridge, Horseneck, 1700000, Hayes, L16, 1300)

(Mianus, Horseneck, 400000, Jones, L17, 1000)

(North Town, Rye, 3700000, Hayes, L15, 1500)
We are no long able to represent in the database information about which
customers are borrows from which branch. This is called a lossy
decomposition or lossyjoin decomposition. A decomposition
that is not a lossydecomposition is a losslessjoin decomposition.
Losslessjoins are a requirement for good design and this causes
constraints on the set of possible relations. We say that a relation
is legal if it satisfies all rules, or constraints, imposed.
The proper way to decomposition this example so that we can have a losslessjoin
is to use three relations.

branchschema = (branchname, assets, branchcity)

loanschema = (branchname, loannumber, amount)

borrowerschema = (customername, loannumber)
Normalization Using Functional Dependencies
Using FDs, it is possible to define several normal forms to help
develop good database designs. The two that we will example are BoyceCodd
Normal Form (BCNF) and Third Normal Form (3NF). The requirements
for good decomposition are

LosslessJoin Decomposition

Dependency Preservation

Lack of Repetition of Information
We've discussed the lossless decomposition. Dependency preservation
specifies that the design insure that when an update is made to the database,
that it does not create an illegal relation. In regard to the repetition
of information, it is necessary to include the key of another table, so
that the joins can be properly formed. That is the only information
that should be in both tables!
BoyceCodd Normal Form
A relation schema R is said to be in BCNF with respect to a set
F of FDs, if for all FDs in F^{+} of the form A implies
B, where A is a subset of R and B is a subset of R and at
least one of the following rules is true:

A implies B is a trivial FD (B is a subset of A)

A is a superkey for schema R
Without doing the mathematically proofs, it can be shown that the BCNF
results in dependency preservation.
Third Normal Form
A relation schema R is in 3NF with respect to a set F of
FDs if, for all FDs in the F^{+} of the form A implies B,
where A is a subset of R and B is a subset of R and at least
one of the following rules is true:

A implies B is a trivial FD

A is a superkey for schema R

Each attribute in the result of the expression BA is contained in a candidate
key for R
CSEE
 461
 461
S'99  lectures
 news
 help