You are viewing an old version of this page. View the current version.

    Compare with Current View Page History

    « Previous Version 9 Next »


    The pfx_dw schema is a fairly textbook "star schema", with things like jobs and worker usage in fact tables, and the various things that you might use in a SQL WHERE clause in dimension tables.

    The job dimension table, a typical "fact" table.

    For example, to get reports about various jobs over time, you'll be querying the pfx_dw.job_fact table:

    The "job_time*" columns are stored in UNIX epoch time

    Icon

     

    To convert them to human-readable format use the MySQL FROM_UNIXTIME() function.

     

    mysql> SELECT job_id, job_timesubmit, FROM_UNIXTIME(job_timesubmit) FROM pfx_dw.job_fact WHERE job_id=98269;
    +--------+----------------+-------------------------------+
    | job_id | job_timesubmit | FROM_UNIXTIME(job_timesubmit) |
    +--------+----------------+-------------------------------+
    |  98269 |     1414709214 2014-10-30 15:46:54           |
    +--------+----------------+-------------------------------+

     

    "*_sk" columns can be used to do INNER JOINs to a similarly named dimension table

    Any column that is named with an _sk suffix is a Synthetic Key that points to a corresponding dimension table, named with the part of the column before the _sk; the dimension table will have a _dim suffix in the name. This way, it's easy to write the JOIN's, the column name is a clue to the dimension table, which will have a column of the same name. Almost every dimension table will consist of a *_sk PRIMARY KEY and a name column.

     

    A typical dimension table, the "user_dim" table

    For example, the user_sk column can be used to do a SQL INNER JOIN to the user_dim table.

     

    Get a count of all jobs for a particular user:

    The time dimension table

    The pfx_dw.time_dim table is provided so that you don't have to perform date/time operations on every row in a fact table (since they can run into the 100's of millions of rows), instead you do a SQL INNER JOIN to it and use the values in the time_dim table in your WHERE clause. The time_sk column in every fact table has an identical value in the time_dim table which has a single row with a primary key time_sk.  The time_sk value is actually the unix epoch time in seconds:

    The "job status" dimension table

    The pfx_dw.jobstatus_dim table is one of the few exceptions to the normal dimension table structure; it provides a mapping between the integer and human-readable status values.

     

    Get a count of all jobs for a particular user for January, 2014:
    Get a count of all jobs for each user for all of 2013:

    Get a count of all jobs for each user for all of 2013, broken down by month and the job's final status:
    Get the sum total of cpu_seconds used for each user for the last 7 days, broken down by user, date, and the job's final status:

     

    SELECT
        user.name
        time.date
        , status.status_char
        SUM(fact.cpu_seconds) as "cpu_time"
    FROM
        job_fact AS fact
    INNER JOIN
        user_dim AS user
    ON
        fact.user_sk=user.user_sk
    INNER JOIN
        time_dim AS time
    ON
        fact.time_sk=time.time_sk
    INNER JOIN
        jobstatus_dim AS status
    ON
        fact.jobstatus_sk=status.jobstatus_sk
    WHERE
        DATEDIFF(CURDATE(), time.date_time) < 7
    GROUP BY
        user.name
        time.date
        , status.status_int
    ORDER BY
        time.date
        , cpu_time DESC
        , status.status_char
    ;
     
    +--------+------------+-------------+----------+
    name   date       | status_char | cpu_time |
    +--------+------------+-------------+----------+
              << snipped >>
    | jburk  | 2014-07-14 | complete    |   351036 |
    | jburk  | 2014-07-14 | killed      |    60029 |
    | jburk  | 2014-07-14 | failed      |      139 |
    | coxj   | 2014-07-14 | killed      |       98 |
    | garza  | 2014-07-14 | killed      |        0 |
    | jburk  | 2014-07-15 | complete    |    28910 |
    | fubar  | 2014-07-15 | complete    |    18610 |
    | foobar | 2014-07-15 | complete    |    18561 |
    | jburk  | 2014-07-15 | killed      |    16967 |
    | jburk  | 2014-07-15 | failed      |       27 |
    | jburk  | 2014-07-16 | complete    |    46797 |
    | jburk  | 2014-07-16 | killed      |    17136 |
    | jburk  | 2014-07-16 | failed      |        2 |
              << snipped >>
    +--------+------------+-------------+----------+

     

     

     

    • No labels