...
Any column that is named with an _sk
suffix is a Synthetic Key that points to a corresponding dimension table, named with the part of the column before the _sk
; the dimension table will have a _dim
suffix in the name. This way, it's easy to write the JOIN
's, the column name is a clue to the dimension table, which will have a column of the same name. Almost every dimension table will consist of a *_sk
PRIMARY KEY
and a name column.
...
A typical dimension table
...
The pfx_dw.jobstatus_dim table is one of the few exceptions to the normal dimension table structure; it provides a mapping between the integer and human-readable status values.
Code Block |
---|
mysql> SELECT * FROM jobstatus_dim;
+--------------+------------+-------------+----------------+-------------+
| jobstatus_sk | status_int | status_char | effective_date | expiry_date |
+--------------+------------+-------------+----------------+-------------+
| 1 | 16 | complete | 1999-12-31 | 9999-12-31 |
| 2 | 32 | failed | 1999-12-31 | 9999-12-31 |
| 3 | 48 | killed | 1999-12-31 | 9999-12-31 |
| 4 | 272 | blocked | 1999-12-31 | 9999-12-31 |
| 5 | 288 | waiting | 1999-12-31 | 9999-12-31 |
| 6 | 304 | suspended | 1999-12-31 | 9999-12-31 |
| 7 | 320 | pending | 1999-12-31 | 9999-12-31 |
| 8 | 323 | waiting | 1999-12-31 | 9999-12-31 |
| 9 | 325 | badlogin | 1999-12-31 | 9999-12-31 |
| 10 | 336 | running | 1999-12-31 | 9999-12-31 |
| 11 | 261 | dying | 1999-12-31 | 9999-12-31 |
+--------------+------------+-------------+----------------+-------------+ |
...
, the "user_dim" table
For example, the user_sk column can be used to do a SQL `INNER JOIN` to the `user_dim` INNER JOIN
to the user_dim table.
Code Block |
---|
mysql> SHOW CREATE TABLE pfx_dw.user_dim\G *************************** 1. row *************************** Table: user_dim Create Table: CREATE TABLE `user_dim` ( `user_sk` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(255) NOT NULL, PRIMARY KEY (`user_sk`), UNIQUE KEY `name` (`name`) ) ENGINE=MyISAM AUTO_INCREMENT=10 DEFAULT CHARSET=latin1 1 row in set (0.00 sec) |
Code Block |
mysql> SELECT name FROM user_dim;
+---------------+
| name |
+---------------+
| Administrator |
| bar |
| foo |
| foobar |
| fubar |
| jburk |
| perforce |
| root |
| shinya |
+---------------+ |
Get a count of all jobs for a particular user:
Code Block |
---|
mysql> SELECT COUNT(*) FROM job_fact AS fact INNER JOIN user_dim AS user ON fact.user_sk = user.user_sk WHERE user.name = 'jburk' ; +----------+ | COUNT(*) | +----------+ | 5868 | +----------+ |
...
The pfx_dw.time_dim table is provided so that you don't have to perform date/time operations on every row in a fact table (since they can run into the 100's of millions of rows), instead you do a SQL INNER JOIN
to it and use the values in the time_dim table in your WHERE
clause. The time_sk column in every fact table has an identical value in the time_dim
table which has a single row with a primary key time_sk. The time_sk
value is actually the unix epoch time in seconds:
Code Block |
---|
mysql> SELECTSELECT * FROMFROM time_dimdim ORDER BYBY time_sksk DESC LIMIT 1; ; +------------+---------------------+------+------------+-----+------------+-------+------+ | time_sk | date_time | hour | date | dow | month_name | month | year | +------------+---------------------+------+------------+-----+------------+-------+------+ | 1392339600 | 2014-02-13 17:00:00 | 17 | 2014-02-13 | 5 | February | 2 | 2014 | +------------+---------------------+------+------------+-----+------------+-------+------+ 1 row in set (0.00 sec) mysql> SELECTSELECT FROM_UNIXTIME(time_sk) , date_timetime FROMFROM time_dimdim ORDER BYBY time_sksk DESC LIMIT 1; +------------------------+---------------------+ | FROM_UNIXTIME(time_sk) | date_time | +------------------------+---------------------+ | 2014-02-13 17:00:00 | 2014-02-13 17:00:00 | +------------------------+---------------------+ |
The "job status" dimension table
The pfx_dw.jobstatus_dim table is one of the few exceptions to the normal dimension table structure; it provides a mapping between the integer and human-readable status values.
Code Block |
---|
mysql> SELECT * FROM jobstatus_dim;
+--------------+------------+-------------+----------------+-------------+
| jobstatus_sk | status_int | status_char | effective_date | expiry_date |
+--------------+------------+-------------+----------------+-------------+
| 1 | 16 | complete | 1999-12-31 | 9999-12-31 |
| 2 | 32 | failed | 1999-12-31 | 9999-12-31 |
| 3 | 48 | killed | 1999-12-31 | 9999-12-31 |
| 4 | 272 | blocked | 1999-12-31 | 9999-12-31 |
| 5 | 288 | waiting | 1999-12-31 | 9999-12-31 |
| 6 | 304 | suspended | 1999-12-31 | 9999-12-31 |
| 7 | 320 | pending | 1999-12-31 | 9999-12-31 |
| 8 | 323 | waiting | 1999-12-31 | 9999-12-31 |
| 9 | 325 | badlogin | 1999-12-31 | 9999-12-31 |
| 10 | 336 | running | 1999-12-31 | 9999-12-31 |
| 11 | 261 | dying | 1999-12-31 | 9999-12-31 |
+--------------+------------+-------------+----------------+-------------+ |
Get a count of all jobs for a particular user for January, 2014:
Code Block |
---|
mysql> SELECT COUNT(*) FROM job_fact AS fact INNER JOIN user_dim AS user ON fact.user_sk = user.user_sk INNER JOIN time_dim AS time ON fact.time_sk=time.time_sk WHERE user.name = 'jburk' AND time.month = 1 AND time.year = 2014 ; +----------+ | COUNT(*) | +----------+ | 83 | +----------+ |
Get a count of all jobs for each user for all of 2013:
Code Block |
---|
mysql> SELECT user.name , time.month_name , COUNT(*) as "job count" FROM job_fact AS fact INNER JOIN user_dim AS user ON fact.user_sk = user.user_sk INNER JOIN time_dim AS time ON fact.time_sk=time.time_sk WHERE time.year = 2013 GROUP BY user.name , time.month ORDER BY user.name , time.month ; +---------------+------------+-----------+ | name | month_name | job count | +---------------+------------+-----------+ | Administrator | January | 4 | | bar | July | 1 | | foo | July | 2 | | foobar | July | 1 | | foobar | September | 2 | | fubar | July | 1 | | jburk | March | 1 | | jburk | May | 123 | | jburk | June | 24 | | jburk | July | 220 | | jburk | August | 140 | | jburk | September | 68 | | jburk | October | 560 | | jburk | November | 59 | | jburk | December | 4467 | | perforce | January | 128 | | perforce | February | 4 | | root | January | 31 | | root | February | 37 | | root | March | 7 | | root | April | 17 | | root | May | 7 | | root | June | 1 | | root | July | 11 | | root | September | 2 | | root | December | 2 | | shinya | January | 23 | | shinya | February | 69 | | shinya | March | 7 | | shinya | April | 20 | | shinya | May | 2 | +---------------+------------+-----------+ |
Get a count of all jobs for each user for all of 2013, broken down by month and the job's final status:
Code Block |
---|
mysql> SELECT user.name , time.year , time.month_name , status.status_char , COUNT(*) as "job count" FROM job_fact AS fact INNER JOIN user_dim AS user ON fact.user_sk=user.user_sk INNER JOIN time_dim AS time ON fact.time_sk=time.time_sk INNER JOIN jobstatus_dim as status ON fact.jobstatus_sk=status.jobstatus_sk WHERE time.year = 2013 GROUP BY user.name , time.month , status.status_int ORDER BY user.name , time.month ,status.status_char ; +---------------+------+------------+-------------+-----------+ | name | year | month_name | status_char | job count | +---------------+------+------------+-------------+-----------+ | Administrator | 2013 | January | failed | 4 | | bar | 2013 | July | complete | 1 | | foo | 2013 | July | complete | 2 | | foobar | 2013 | July | complete | 1 | | foobar | 2013 | September | complete | 2 | | fubar | 2013 | July | complete | 1 | | jburk | 2013 | March | complete | 1 | | jburk | 2013 | May | complete | 100 | | jburk | 2013 | May | failed | 11 | | jburk | 2013 | May | killed | 12 | | jburk | 2013 | June | complete | 4 | | jburk | 2013 | June | failed | 20 | | jburk | 2013 | July | complete | 134 | | jburk | 2013 | July | failed | 33 | | jburk | 2013 | July | killed | 53 | | jburk | 2013 | August | complete | 67 | | jburk | 2013 | August | failed | 25 | | jburk | 2013 | August | killed | 48 | | jburk | 2013 | September | complete | 38 | | jburk | 2013 | September | failed | 17 | | jburk | 2013 | September | killed | 13 | | jburk | 2013 | October | complete | 450 | | jburk | 2013 | October | failed | 61 | | jburk | 2013 | October | killed | 49 | | jburk | 2013 | November | complete | 38 | | jburk | 2013 | November | failed | 12 | | jburk | 2013 | November | killed | 9 | | jburk | 2013 | December | complete | 3812 | | jburk | 2013 | December | failed | 627 | | jburk | 2013 | December | killed | 28 | | perforce | 2013 | January | failed | 46 | | perforce | 2013 | January | killed | 82 | | perforce | 2013 | February | complete | 3 | | perforce | 2013 | February | killed | 1 | | root | 2013 | January | complete | 24 | | root | 2013 | January | failed | 6 | | root | 2013 | January | killed | 1 | | root | 2013 | February | complete | 34 | | root | 2013 | February | killed | 3 | | root | 2013 | March | complete | 7 | | root | 2013 | April | complete | 9 | | root | 2013 | April | failed | 4 | | root | 2013 | April | killed | 4 | | root | 2013 | May | complete | 6 | | root | 2013 | May | killed | 1 | | root | 2013 | June | complete | 1 | | root | 2013 | July | complete | 7 | | root | 2013 | July | failed | 1 | | root | 2013 | July | killed | 3 | | root | 2013 | September | complete | 1 | | root | 2013 | September | failed | 1 | | root | 2013 | December | complete | 2 | | shinya | 2013 | January | complete | 4 | | shinya | 2013 | January | failed | 7 | | shinya | 2013 | January | killed | 12 | | shinya | 2013 | February | complete | 60 | | shinya | 2013 | February | failed | 2 | | shinya | 2013 | February | killed | 7 | | shinya | 2013 | March | complete | 2 | | shinya | 2013 | March | failed | 2 | | shinya | 2013 | March | killed | 3 | | shinya | 2013 | April | complete | 16 | | shinya | 2013 | April | killed | 4 | | shinya | 2013 | May | killed | 2 | +---------------+------+------------+-------------+-----------+ |