@RELEASE: 6.5-2


    ==== CL 12016 ====

    @FIX: worker and supervisor install do not register for all users on Windows

    ==== CL 12006 ====
    @FIX: ERROR 1146 (42S02) at line 87 in file: './create_job_fact.sql': Table 'pfx_stats.memusage' doesn't exist. Swapped the order of table assignment and creation, since some versions of MySQL raise this error.

    ==== CL 11989 ====
    @FIX: worker_drive_map and worker_path_map were not correctly saved via "Configure local host"; they are now formatted to match what the API updatelocalconfig expects

    ==== CL 11987 ====
    @FIX: made the _user_duties and _prgp_duties IntHash variables local to the queuereject() routine for thread safety; they were previously data members of the supervisor class.

    ZD: 10342

    ==== CL 11986 ====

    @FIX: added code to handle timing issues where a command, such as
    preemption, could be issued multiple times by different threads on the
    same running subjob, leaving those jobs in odd states. One common symptom
    was the "aberrant report" message in the supelog, with those jobs getting
    stuck in the "running" state despite all the frames being 100% complete.

    ==== CL 11985 ====

    @FIX: converseWorkerWithRetries() and converseSubSupervisorWithRetries()
    routines were fixed so that they properly return success when there are no
    communication errors. These routines were retrying when the server
    responded with an rpy.tag() of QB_MESSAGE_ERROR, which does not indicate a
    communication error but rather that the server encountered some general
    internal error. This caused unwanted retries.

    ZD: 10527
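
    The retry rule described in CL 11985 can be illustrated with a minimal
    Python sketch. It is not the Qube source: CommunicationError, MESSAGE_ERROR,
    converse_with_retries() and the send callable are hypothetical names used
    only for illustration. The point is that only transport failures trigger a
    retry, while a reply carrying an error tag is returned as-is.

```python
class CommunicationError(Exception):
    """Hypothetical: raised when no reply could be obtained from the server."""


# Hypothetical stand-in for a reply tag such as QB_MESSAGE_ERROR.
MESSAGE_ERROR = "error"


def converse_with_retries(send, request, max_attempts=3):
    """Call send(request), retrying only when the transport itself fails.

    Any reply the server sends back is returned immediately, even if its
    tag reports a server-side error, because the conversation succeeded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(request)
        except CommunicationError as exc:
            # Only a communication failure is worth another attempt.
            last_error = exc
    raise last_error
```

    A caller would then inspect the returned tag itself to decide how to
    handle a server-side error, rather than relying on the retry loop.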

    ==== CL 11982 ====
    @FIX: contradictory job log entries saying a failed frame is being reported as complete when, a few lines earlier, it was (correctly) reported as failed.

    ==== CL 11980 ====
    @FIX: QB_CONVERT_PATH() not getting evaluated when worker_path_map is undefined or empty

    ==== CL 11963 ====
    @FIX: catch jobs with package data that cause _qb.packageStrToDict to raise an exception

    ==== CL 11961 ====
    @CHANGE: added sanity checks to the cleanup script; the number of log directory deletions is now limited to a fraction of the total jobs in qube, and the limit can be overridden with an option flag.

    ==== CL 11931 ====
    @CHANGE: create the backfill_fact (supervisor dispatch efficiency) dataWarehouse "12-hour" table every 5 minutes rather than every 15 to keep the chart data more current - full-range table is small enough to support this

    ==== CL 11915 ====
    @FIX: fixed cross-dependency created in CL11893.

    JIRA: QUBE-176

    ==== CL 11908 ====
    @CHANGE: changed/added code to set up the following default my.cnf parameters

    all OSs:
    query_cache_size = 0 # disable the query cache, hit rate is almost 0% due to qube being very write-intensive
    thread_cache_size = 16 # acts like supervisor_idle_threads
    table_open_cache = 2500 # MySQL will cache the file handles needed to keep this number of tables open
    open_files_limit = 50000 # table_open_cache will drive the number of open files, MyISAM needs a max of 2 per table, but MySQL can also open other files past the table_open_cache*2 value - refer to: http://dev.mysql.com/doc/refman/5.1/en/table-cache.html

    JIRA: QUBE-175

    ==== CL 11899 ====
    @FIX: made the path map translations case-insensitive on OSX and Windows platforms.

    @NEW: added an optional third parameter to QbString::replace() that specifies case sensitivity; it defaults to TRUE.

    JIRA: QUBE-177
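
    As an illustration of the case-insensitive translation in CL 11899, here is
    a minimal Python sketch. It is not QbString::replace() itself:
    replace_prefix() and its case_sensitive argument are hypothetical names, and
    the sketch assumes a path map entry is applied as a prefix substitution.

```python
import re


def replace_prefix(path, old, new, case_sensitive=True):
    """Replace old with new at the start of path, optionally ignoring case.

    Hypothetical helper: a path that does not start with old is returned
    unchanged.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # re.escape() keeps characters such as ':' or '\\' in old literal.
    match = re.compile(re.escape(old), flags).match(path)
    if match is None:
        return path
    # Concatenate instead of using re.sub() so that backslashes in new
    # are never interpreted as escape sequences.
    return new + path[match.end():]
```

    With case_sensitive=False, a mapping written as "S:/projects" would also
    match a path that starts with "s:/Projects", which is the behavior wanted
    on platforms whose file systems usually ignore case.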

    ==== CL 11895 ====
    @NEW: exposed the C API routine "qbisadmin()" as "qb.isadmin()" in Python API and "qb::isadmin()" in Perl API.

    JIRA: QUBE-174

    ==== CL 11893 ====
    @CHANGE: "qbadmin {s|w} -configuration" now displays both the integer AND string values of all "*_flags" (such as "supervisor_flags") parameters for readability

    JIRA: QUBE-176

    ==== CL 11856 ====
    @FIX: added code to fix jobs getting stuck in the "dying" state, which can occur due to race conditions.

    Dispatched instances of jobs that were requested to be "killed" before they
    properly finished starting up on the workers were ending up getting stuck
    in the "dying" state.

    ZD: 10369

    ==== CL 11850 ====
    @FIX: C4D AppFinder jobs crash when paths or filenames wrapped in QB_CONVERT_PATH() start with a number

    ==== CL 11829 ====
    @FIX: Issue with grid jobs where some instances would start running multiple times on the dispatched host, causing the job to eventually fail.

    ZD: 10325

    ==== CL 11820 ====
    @FIX: disabled the permission check of worker_logpath, as it was creating false alarms and putting the worker into panic mode unnecessarily.

    ZD: 5445 5236
    BUGZID: 63683

    See also CL9234

    ==== CL 11815 ====
    @FIX: on Linux, the /etc/init.d/worker script now allows a longer timeout (15 seconds) for the worker to shut down cleanly before forcefully killing (i.e. "kill -9") the processes.

    The default short timeout of 3 seconds was not sufficient on many systems for all child worker threads to exit and the main thread to release the running subjobs and report to the supervisor that it's "down".

    JIRA: QUBE-90
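
    The shutdown sequence described in CL 11815 (ask the process to exit, wait
    up to a timeout, then force-kill) can be sketched in Python as follows. This
    is an illustration only, not the contents of the init script;
    stop_process() is a hypothetical helper, and the 15-second default simply
    mirrors the timeout mentioned above.

```python
import subprocess


def stop_process(proc, timeout=15.0):
    """Ask proc to exit; force-kill it if it is still alive after timeout.

    proc is a subprocess.Popen object. Returns "clean" if the process exited
    within the timeout, or "forced" if it had to be killed.
    """
    proc.terminate()  # SIGTERM on POSIX: a polite request to shut down
    try:
        proc.wait(timeout=timeout)
        return "clean"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, the equivalent of "kill -9"
        proc.wait()  # reap the process so it does not linger as a zombie
        return "forced"
```

    A longer timeout gives child threads time to exit and the main thread
    time to report its state before the forced kill happens.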

    ==== CL 11807 ====
    @FIX: added dependency ("requires") on the "expat" package for qube-core RPM packages.

    JIRA: QUBE-68
    ZD: 8499

    ==== CL 11801 ====
    @FIX: fixed qbjoborder() routine so that it respects the queuing algorithm's job-host pair rejection routine, queuereject().

    This bug, for example, was causing the routine to return jobs that shouldn't qualify to run on
    the given host because of the "worker_restrictions" settings of the worker.

    ZD: 10231
    JIRA: QUBE-158

    ==== CL 11795 ====
    @FIX: issue where the Python API qb.convertpath() would cause a bus error (crash) in the caller if called with no args.

    @FIX: issue where the 2-argument invocation of qb.convertpath() was not
    working and could cause a bus error. This turned out to be a bug in the
    internal conversion routine _qb_py_dict_pathmap().

    ==== CL 11793 ====
    @FIX: bug with modifying user and group permissions.

    Operations such as adding or deleting a group or users would generate an
    error message in the supelog, like the following:

    [Sep 20, 2013 11:39:35] HOSTNAME[25107]: group permissions modified for user foobar by user USERNAME
    [Sep 20, 2013 11:39:35] HOSTNAME[25107]: ERROR: database query error: via TCP/IP - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ') = 'test' AND LOWER(user) = 'foobar'' at line 1 (1064)
    [Sep 20, 2013 11:39:35] HOSTNAME[25107]: SELECT access FROM grp WHERE valid = 1 AND LOWER(name)) = 'test' AND LOWER(user) = 'foobar'

    JIRA: QUBE-157

    ==== CL 11790 ====
    @FIX: fixed inaccurately reported host.processor_speed (CPU frequency in MHz) property on OSX workers.

    JIRA: QUBE-153

    ==== CL 11788 ====
    @CHANGE: added a "GRANT SELECT ON *.*" statement for the qube_readonly user on "localhost".

    JIRA: QUBE-105

    ==== CL 11771 ====
    @FIX: problem where it was impossible to undefine worker_properties and worker_resources once they were defined in qbwrk.conf or qb.conf, even if the lines were removed from the config files.

    JIRA: QUBE-85
    ZD: 10227

    ==== CL 11767 ====
    @FIX: Setting "worker_cpus=0" or removing a "worker_cpus=N" line from qbwrk.conf had no effect, and the previous setting would get stuck.

    JIRA: QUBE-80, QUBE-112

    ==== CL 11748 ====
    @FIX: helloWorld example jobtype couldn't create the job archive file job.qja below QBDIR/examples because that area is read-only; it now writes to a temp directory

    ==== CL 11740 ====
    @CHANGE: removed dependencies between Windows MSI installers; any qube component can now be installed or uninstalled independently of the others

    ==== CL 11733 ====
    @FIX: qbtail.py now prints out the help screen if run without arguments.
