
Qube 6.9-1 Release Notes

################################################################################

@RELEASE: 6.9-1

################################################################################

 

@SUMMARY: This is a maintenance release of 6.9, and includes a number of fixes
and improvements to 6.9-0. Recommended upgrade for all 6.9 customers.

 

################################################################################


==== CL 17696 ====
@UPDATE: add explanation for "deferTableCreation" to the python qb.submit() API routine.

JIRA: QUBE-2400
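
A minimal sketch of using the documented parameter, assuming it is accepted as a keyword argument of qb.submit() and that the qb module is importable; the job fields and the shape of the return value are illustrative, not taken from this note:

    import qb

    job = {
        'name': 'example job',
        'prototype': 'cmdline',
        'package': {'cmdline': 'echo hello'},
    }

    # assumption: deferTableCreation is passed as a keyword argument here;
    # see the qb.submit() documentation added in this CL for its exact meaning
    submitted = qb.submit([job], deferTableCreation=True)
    for j in submitted:
        print(j.get('id'))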

==== CL 17692 ====
@FIX: another memory leak plugged in the startHost()-related routine, startQualifiedJobsOnHost(). This was causing successful iterations of startHost() (i.e., an instance was dispatched to a worker) to bloat memory. Among other places, it was affecting the background helper thread (when it does the "requeuing host" routine).

JIRA: QUBE-2382

==== CL 17649 ====
@FIX: memory leak in preemption code, especially when preemption policy is set to passive or is disabled by the algorithm.

JIRA: QUBE-2382

==== CL 17634 ====
@FIX: memory leak in one of the host-triggered dispatch routines,
startQualifiedJobsOnHost(), which is called from startHost().

Among other things, this was bloating the memory usage inside the helper
routine running in a background thread/process (cleanermain()).

JIRA: QUBE-2382
ZD: 16952

==== CL 17610 ====
@FIX: memory corruption that would cause python or perl to crash when the function was called inside jobs.

JIRA: QUBE-2389

==== CL 17595 ====
@FIX: fixed memory leak in QbPack::store() and storeXML() methods, which were causing, among other things, supervisor threads to bloat when processing large job submissions

JIRA: QUBE-2382

==== CL 17594 ====
@FIX: plugged a potential memory leak in QbDaemon communication code, affecting all server (supervisor, worker) programs

JIRA: QUBE-2382

==== CL 17593 ====
@FIX: plugged memory leak in dispatch code

JIRA: QUBE-2382

==== CL 17592 ====
@FIX: plugged potential memory leak in user permission-check routine, specifically in the group-access check code

JIRA: QUBE-2382

==== CL 17566 ====
@NEW: qbwrk.conf loading optimization (and thus "qbadmin w -reconfig" speed-up) by explicitly listing template names and non-existent hostnames in the new [global_config] section

* added [global_config] section to the qbwrk.conf file, and allow new config parameters "templates" to list all qbwrk.conf template section names, and "non_existent" to list all non-existent hostnames (see the sketch after this entry)

* supe skips ip-address resolution for all section names included in "templates" and "non_existent", and all reserved names, i.e.: "global_config", "default", "linux", "osx", and "winnt", thus speeding up the loading of the qbwrk.conf file, which in turn speeds up supervisor boot time and the "qbadmin w -reconfig" operation.

JIRA: QUBE-2346
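
A hedged sketch of what such a qbwrk.conf might look like; only the [global_config] section and the "templates"/"non_existent" parameter names come from this note, while the template name, hostnames, list separator, and worker parameter shown are assumptions:

    # sketch only -- verify separator and parameter syntax against the qbwrk.conf docs
    [global_config]
    templates = rendertemplate
    non_existent = retiredhost01, retiredhost02

    [rendertemplate]
    worker_cpus = 8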

==== CL 17540 ====
@CHANGE: removed unnecessary submit-time check/rejection of omithosts and omitgroups.

ZD: 16907, 16908
JIRA: QUBE-2366

==== CL 17449 ====
@FIX: directory deletion during log cleanup can fail if the supervisor is updating the job history file at the same time

==== CL 17435 ====
@FIX: supervisor process handling a qbping request should always reread the license file before replying

There was a code path that instructs the supe thread to force-read the
license file, but the read was not happening under certain conditions; the
code was returning the old cached data if available, or the default count
of 2 if the cache isn't available.

* add a few more informational lines to print to the supelog at license re-reading.

JIRA: QUBE-2317

==== CL 17422 ====
@FIX: make formatting and object instantiation compatible with Python 2.6

==== CL 17416 ====
@FIX: remove unnecessary error message in the schema upgrade routine

JIRA: QUBE-2283

==== CL 17414 ====
@CHANGE: Add more text to describe the subtle yet significant difference between the "retry" and "requeue" Python API routines

JIRA: QUBE-2049
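
For reference, a minimal usage sketch; it assumes qb.retry() and qb.requeue() each accept a job id, and it deliberately defers to the updated API text for what actually distinguishes the two:

    import qb

    jobid = 12345         # hypothetical job id
    qb.retry(jobid)       # the updated API text explains how this differs
    # qb.requeue(jobid)   # from requeue; pick the call that matches your intent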

==== CL 17403 ====
@FIX: jobs with status "registering" appear when submissions are rejected due to incorrect requirements specifications

ZD: 16408
JIRA: QUBE-2034

==== CL 17402 ====
@FIX: intermittent bug where some supe threads won't properly read the supervisor license key from qb.lic

* add warning message to print to supelog when the license file reader returns zero-length data

ZD: 16828
JIRA: QUBE-2317

==== CL 17390 ====
@FIX: post-flight should only be run when qbreportwork() is invoked with an agenda item in a terminal state

JIRA: QUBE-2032
ZD: 16412

==== CL 17376 ====
@FIX: Triggers incorrectly executing multiple times

When a composite (i.e., using && or ||) trigger is specified for a job's callback,
such as "done-job-job1 && done-job-job2", the callback would erroneously get run
multiple times.

ZD: 16282
JIRA: QUBE-1881
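
A hedged sketch of attaching such a composite trigger at submission time; the callback dictionary keys ('triggers', 'language', 'code') reflect typical qb callback usage and are assumptions here, as are the upstream job labels "job1" and "job2":

    import qb

    job = {
        'name': 'downstream job',
        'prototype': 'cmdline',
        'package': {'cmdline': 'echo downstream work'},
        # composite trigger of the kind described above; "job1" and "job2"
        # stand in for the labels/ids of two hypothetical upstream jobs
        'callbacks': [{
            'triggers': 'done-job-job1 && done-job-job2',
            'language': 'python',
            'code': 'print("both upstream jobs are done")',
        }],
    }

    qb.submit([job])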

==== CL 17369 ====
@FIX: issue introduced in 6.9 where the requestwork() jobtype backend routine will crash when frame padding is 40 or greater.

Python jobtype backends, in particular, were found to crash during a call to
the API routine qb.requestwork(), with a "*** stack smashing detected ***:"
error message and a backtrace.

ZD: 16759
JIRA: QUBE-2318

==== CL 17290 ====
@TWEAK: license-reading routine prints the total license count to the supelog

JIRA: QUBE-2003

==== CL 17289 ====
@TWEAK: "ping" handler to print out more info to supelog

Every "qbping" will now print something like the following to the supelog:

[Nov 18, 2016 16:25:55] shinyambp[11662]: INFO: responded to ping request from [127.0.0.1]: 6.9-0 bld-custom osx - - host - 0/11 unlimited licenses (metered=0/0) - mode=0 (0)

JIRA: QUBE-2002

==== CL 17231 ====
@FIX: disabled verbose option for logging libcurl actions

==== CL 17208 ====
@CHANGE: Populate the subjob (instance) objects with more data (like status), and not just the IDs, when subjob info is requested via "qbhostinfo" (qb.hostinfo(subjobs=True) for the python API)

Previously, only jobid, subid, and host info (name, address, macaddress)
were filled. Now, things like "status", "timestart", "allocations",
etc. are properly filled in.

JIRA: QUBE-2073
ZD: 16541
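
A small sketch of reading the newly populated fields, assuming qb.hostinfo(subjobs=True) returns a list of host dicts each carrying a 'subjobs' list; the dictionary key names beyond those mentioned above are assumptions:

    import qb

    for host in qb.hostinfo(subjobs=True):
        for subjob in host.get('subjobs', []):
            # 'status' and 'timestart' are among the fields this change fills in
            print(host['name'], subjob.get('jobid'), subjob.get('subid'),
                  subjob.get('status'), subjob.get('timestart'))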

==== CL 17206 ====
@FIX: When the "migrate_on_frame_retry" job flag is set, prevent the backend from doing further processing (especially another requestwork()) after a work item fails

This was causing race conditions that would leave agenda items stuck in
the "retrying" state while no instances were processing them.

Now the reportwork() API routine is modified so that if it's invoked to
report that a work item "failed", and "migrate_on_frame_retry" is set on the
job, it will stop processing (it does a long sleep) and let the worker/proxy
do the process cleanup.

JIRA: QUBE-2202
ZD: 16553
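
For reference, a hedged sketch of enabling the flag at submission time, assuming job flags are passed as a comma-separated string in the job's 'flags' field and that qb.genframes() can build the per-frame agenda; the other fields are illustrative:

    import qb

    job = {
        'name': 'frame job that migrates on retry',
        'prototype': 'cmdrange',
        'package': {'cmdline': 'echo QB_FRAME_NUMBER'},
        'agenda': qb.genframes('1-100'),   # per-frame agenda items
        # the job flag discussed above; additional flags would be
        # comma-separated in this same string (an assumption)
        'flags': 'migrate_on_frame_retry',
    }

    qb.submit([job])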

==== CL 17186 ====
@FIX: "VirtualBox Host-Only Ethernet Adapter" is now skipped when daemons (supe, worker) try to pick a primary mac address

JIRA: QUBE-2149
ZD: 16561

==== CL 17182 ====
@CHANGE: all classes that inherit from QbObject print as a regular dictionary, and no longer have a __repr__ which prints the job data as a single flat string
@NEW: add qb.validatejob() function to the python API, to help find malformed jobs that crash the user interfaces
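
A minimal sketch of how the new helper might be used before submission; the calling convention and return value of qb.validatejob() are assumptions here, since this note only records its addition:

    import qb

    job = {
        'name': 'sanity-checked job',
        'prototype': 'cmdline',
        'package': {'cmdline': 'echo hello'},
    }

    # assumption: validatejob() takes a job dict and reports problems in some
    # form (return value or exception) rather than failing silently
    result = qb.validatejob(job)
    print(result)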

==== CL 17141 ====
@FIX: Any job submitted from within a running job picks up the pgrp of the submitting job

By design, if the submission environment has QBGRPID and QBJOBID set, the
API's submission routine will set the job's pgrp and pid, respectively, to
the values specified in the environment variables.

One couldn't override this "inheritance" behavior even by explicitly
specifying "pgrp" or "pid" in the job being submitted, for instance with
the "-pgrp" command-line option of qbsub.

Fixed, so that setting "pgrp" to 0 on submission means that the job should
generate its own pgrp instead of inheriting it from the environment.

JIRA: QUBE-2141
ZD: 16545
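
A short sketch of the new behavior from inside a running job, assuming the job dict accepts an integer 'pgrp' field as described above; the other fields are illustrative:

    import qb

    # submitted from inside a running job (QBJOBID/QBGRPID are set in the
    # environment); pgrp=0 now asks for a fresh pgrp instead of inheriting
    # the submitting job's process group
    child = {
        'name': 'independent child job',
        'prototype': 'cmdline',
        'package': {'cmdline': 'echo spawned'},
        'pgrp': 0,
    }

    qb.submit([child])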

==== CL 17101 ====
@NEW: add "-dying" and "-registering" options to qbjobs.
@CHANGE: also add dying and registering jobs to the "-active" filter.

JIRA: QUBE-2091
ZD: 16469

==== CL 17083 ====
@FIX: Python API: qbping(asDict=True) crashes when used against older (pre-6.9) supe

Among other things, this was causing WV to crash and AV to note an
exception (but not crash) when starting up with an older supervisor.

JIRA: QUBE-2084
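
For context, a hedged usage sketch, assuming the python routine in question is exposed as qb.ping() and that asDict=True returns the supervisor information as a dictionary; the contents of that dictionary are not specified by this note:

    import qb

    info = qb.ping(asDict=True)   # parsing fixed here for pre-6.9 supervisors
    print(info)                   # supervisor version/license details as a dict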
