|
Page 7 of 12
Pivot Table "Spreadsheet Format" to Normalized Data
Consider the reverse: Suppose you had a pivot table, or the data in a spreadsheet-like format, and you wanted a normalized table of exams. For this example, the table nonormal is needed. This table is defined and created as follows:
SQLite version 3.0.8
Enter ".help" for instructions
sqlite> attach database 'examdatabase' as e1;
sqlite> attach database 'examdatabase2' as e2;
sqlite> create table e1.nonormal as
select ln,fn,
sum(score*(1-abs(sign(exam-1)))) as exam1,
sum(score*(1-abs(sign(exam-2)))) as exam2,
sum(score*(1-abs(sign(exam-3)))) as exam3,
sum(score*(1-abs(sign(exam-4)))) as exam4
from (select 'e1' as db,* from e1.exam union select 'e2' as db,* from e2.exam)
group by ln,fn;
sqlite> .header on
sqlite> .mode column
sqlite> select * from e1.nonormal;
ln fn exam1 exam2 exam3 exam4
---------- ---------- ---------- ---------- ---------- ----------
Anderson Bob 75 82 92 95
Carter Sue 89 100 99 95
Stoppard Tom 88 90 92 95
The nonormal table was created in the examdatabase, since "e1." was given before the name. Again, the objective here is to go backwards and create a normalized table from the pivot table, a table that will list all exam scores in one field and all the exam numbers in another, without having a separate field for each exam. In addition, the goal is to do all this in one select statement without looping through the data. First, it is necessary to create a number table, "enum", and it must have the field "e" from 1..N where N is the number of exams (which is four in this case).
sqlite> CREATE TABLE enum (e int);
sqlite> INSERT INTO "enum" VALUES(1);
sqlite> INSERT INTO "enum" VALUES(2);
sqlite> INSERT INTO "enum" VALUES(3);
sqlite> INSERT INTO "enum" VALUES(4);
The coalesce function is used in an interesting way for this example.
sqlite> .mode list
sqlite> select n.ln,n.fn,
1*(1-abs(sign(e.e-1)))+
2*(1-abs(sign(e.e-2)))+
3*(1-abs(sign(e.e-3)))+
4*(1-abs(sign(e.e-4))),
coalesce(0/(e.e-1),n.exam1)+
coalesce(0/(e.e-2),n.exam2)+
coalesce(0/(e.e-3),n.exam3)+
coalesce(0/(e.e-4),n.exam4)
from enum as e,e1.nonormal as n;
Anderson|Bob|1|75
Carter|Sue|1|89
Stoppard|Tom|1|88
Anderson|Bob|2|82
Carter|Sue|2|100
Stoppard|Tom|2|90
Anderson|Bob|3|92
Carter|Sue|3|99
Stoppard|Tom|3|92
Anderson|Bob|4|95
Carter|Sue|4|95
Max Min Problems
Assume you have the following table of names, ages, and salaries. Find the age, name, and salary of the youngest person making the overall highest salary, or first find the highest salary, then, from this group, select the youngest person.
create table salary (name varchar(3),age int, salary double);
insert into salary values ('dan',23,67);
insert into salary values ('bob',45,94);
insert into salary values ('tom',24,94);
insert into salary values ('sue',23,45);
insert into salary values ('joe',45,51);
insert into salary values ('sam',22,51);
Once you have the data entered, you will have the following;
sqlite> .headers on
sqlite> .mode column
sqlite> select * from salary;
name age salary
---------- ---------- ----------
dan 23 67
bob 45 94
tom 24 94
sue 23 45
joe 45 51
sam 22 51
sqlite>
The following select will give you the youngest person making the top salary in the company:
sqlite> select 1000-max(salary*1000-age)%1000 from salary;
1000-max(salary*1000-age)%1000
------------------------------
24
This is the correct answer. The highest salary is 94 for Bob and Tom. Tom is the youngest at 24.
Why the number 1000? Well, no one lives to be 1000, so we know that age will never be >= 1000. Therefore, max(salary*1000-age) will clearly choose the highest salary independent of age, as long as salary is >= 1. In cases of a tie in salary, the youngest person will subtract the least amount from the salary, so this value will return as the highest. It's easy to remove the salary part from this number. Since salary is multiplied by 1000, it will disappear with mod 1000, since it's a perfect factor of 1000.
To understand how this works, it is helpful to break the statement into separate, smaller parts, as follows:
sqlite> select salary*1000-age,salary*1000,-age from salary;
salary*1000-age salary*1000 -age
--------------- ----------- ----------
66977 67000 -23
93955 94000 -45
93976 94000 -24
44977 45000 -23
50955 51000 -45
50978 51000 -22
sqlite>
But what about the negative value for age? With the non-Knuth method of the mod function, "%", when x < 0, then x % y will return x, if abs(x) < abs(y).
x % y is defined as follows:
x % y == x - INT(x/y)*y
and undefined for y == 0. C and Fortran use this method.
In contrast, the Knuth method, found in Python and accepted in mathematics, defines this function as follows:
x mod y == x - floor(x/y),
and equal to x if y == 0
The difference between the two shows up with negative values for x.
Or, put another way, as long as -x != y, then -x % y = -x. For example, assume x=4 and y=5, then -4 % 5 will return a -4. Here are a few other examples. Again this is not the Knuth method for the mod function.
-1 % 5 = -1
-2 % 5 = -2
-3 % 5 = -3
So what we are really doing is the following:
1000 + -1*(1000-age) = age
|