Thursday, 3 November 2011

Databases: Multivalued Dependencies

I don't know about anyone else, but multivalued dependencies took me quite a while to get my head around. I think I've got there now, but I thought working through an example might help. This is all new for me so it is possible that I haven't understood correctly, but here is my understanding.

Example

Credit: BerndH,
Wikipedia Commons
Credit: Beezhive
Wikipedia Commons
I have a relation Garden which has attributes location (which part of the garden), plant (which plant is growing there), colour (colour of the plant), That is, Garden(position, plant, colour). I'll do several examples of different data to illustrate different possible functional and multivalued relations.

For the first part, let's look at the colour of the flowers. I always plant mixed colours of plants when they exist so whenever I plant narcissus, I get some white and some yellow. That means, that whenever I have narcissus in a location, I should have an entry in the relation that says they are yellow and another that says they are white. This means I have the multivalued relation

plant ->> location

since whenever I have narcissus at a location, I have both white and yellow ones there, and whenever I have bluebells, I always have blue and white ones there.

locationplantcolour
frontnarcissusyellow
frontnarcissuswhite
backnarcissusyellow
backnarcissuswhite
leftnarcissusyellow
leftnarcissuswhite
rightbluebellsblue
rightbluebellswhite

Credit: MichaelMaggs,
Wikipedia Commons
Suppose I plant some bluebells in the front. As I know that I have the multivalued relation plant ->> location, to satisfy that relation, I have to plant both white and blue bluebells. If I only planted blue then the relation would no longer hold as I have to have every combination of the unmentioned attribute colour. If the colour combination (bluebell, white) occurs in one location, it has to occur everywhere bluebell occurs. I don't have yellow bluebells since they don't occur anywhere in the table, so the yellow doesn't suddenly get attached to the bluebell just because it is the colour of some flower. Basically, I just cut and paste the two lines for bluebells with just the location changed.

locationplantcolour
frontnarcissusyellow
frontnarcissuswhite
frontbluebellsblue
frontbluebellswhite
backnarcissusyellow
backnarcissuswhite
leftnarcissusyellow
leftnarcissuswhite
rightbluebellsblue
rightbluebellswhite

Now I'm going to add in another property of the flowers. Are they scented or not? Let's suppose we just check the flowers in the front and find out that yellow narcissus are not scented, but some of the white are scented and some aren't. With the information so far, can we say whether our multivalued dependency still holds or not? We don't have scented yellow narcissus but we do have scented white ones so all combinations of white/yellow and scented/unscented aren't present. This is not a problem since the multivalued dependency only tells us that if we have a certain combination of the attributes which aren't mentioned for a particular plant, then every time that plant occurs with a location, that combination of the unmentioned attributes must occur.

For our example, we know in the front we have yellow, unscented and also unscented and scented white narcissus and since we have narcissus in the back, they must also be unscented and yellow; white and unscented; white and scented. We also have narcissus to the left so they must also be present in the three combinations. Note how the table has got bigger and how easy it would be to miss a combination somewhere.

locationplantcolourscent
frontnarcissusyellowscented
frontnarcissuswhitescented
frontnarcissuswhiteunscented
frontbluebellsblueunscented
frontbluebellswhiteunscented
backnarcissusyellowscented
backnarcissuswhitescented
backnarcissuswhiteunscented
leftnarcissusyellowscented
leftnarcissuswhitescented
leftnarcissuswhiteunscented
rightbluebellsbluescented
rightbluebellswhitescented

How could we better represent this data without this horrible explosion when you just wanted to add scented/unscented? We could use the multivalued dependency plant ->> location to make the relation Garden(location,plant, colour, scent) in 4th Normal Form. We take out the attributes mentioned in the dependency and they form their own relation Where(plant, location) and then remove the right hand side from the Garden to form Flowers(plant, colour, scent).

Where
locationplant
frontnarcissus
frontbluebells
backnarcissus
leftnarcissus
rightbluebells
Flowers
plantcolourscent
narcissusyellowscented
narcissuswhitescented
narcissuswhiteunscented
bluebellsblueunscented
bluebellswhiteunscented

If we take the natural join of Flowers with Where, we get back exactly the tuples in Garden.