1
0
-1

In the early days of the refactored database, the following seemed to be about the "least bad" way to grab combined exposure table and image table info on the processed catalogs for a nite:

SELECT e.expnum, s.filename, e.nite, e.object, e.band, i.ccdnum, i.airmass,
        e.mjd_obs, e.exptime, i.skybrite, i.skysigma, 
        i.elliptic as image_ellipt, 0.27*i.fwhm as image_fwhm_arcsec,
        i.saturate as image_sat_level, c.filetype,
        i.crpix1, i.crpix2, i.naxis1, i.naxis2
FROM PROD.EXPOSURE e, 
     PROD.IMAGE i, 
     PROD.WDF, 
     PROD.CATALOG c, 
     (select c.filename 
      from PROD.catalog c, PROD.ops_proctag optag, PROD.exposure e 
      where e.expnum=c.expnum and
            c.filename like concat(optag.unitname,'%') and 
            c.filetype='cat_finalcut' and
            optag.tag='Y2N_FIRSTCUT' and 
            c.nite='20141027' and 
            c.band='g') s 
WHERE e.expnum=i.expnum AND
      i.filename=wdf.parent_name AND 
      wdf.child_name=c.filename AND 
      c.filename=s.filename;

I understand that there have been a lot of improvements to the refactored database in preparation for Y2A1.  Can you suggest a simpler/cleaner query to get this information?

 

Thanks!

 

Best regards,

Douglas

 

    CommentAdd your comment...

    2 answers

    1.  
      2
      1
      0

      Note, that I have altered the query to point to the "Y2A1_FINALCUT" processing tag.   There are currently backfill operations running that are propagating pfw_attempt_id's for older runs/attempts (so last year's Y2N_FIRSTCUT).  I believe PROD.catalog has already been updated, once PROD.image finishes this query should work for all tags in PROD.proctag.

       

      SELECT e.expnum, s.filename, e.nite, e.object, e.band, i.ccdnum, i.airmass,

             e.mjd_obs, e.exptime, i.skybrite, i.skysigma, i.elliptic as image_ellipt,

             0.27*i.fwhm as image_fwhm_arcsec, i.saturate as image_sat_level, s.filetype,

             i.crpix1, i.crpix2, i.naxis1, i.naxis2

      FROM PROD.EXPOSURE e, PROD.IMAGE i, PROD.PROCTAG t, PROD.CATALOG s

      WHERE t.tag='Y2A1_FINALCUT'

          AND t.pfw_attempt_id=i.pfw_attempt_id

          AND t.pfw_attempt_id=s.pfw_attempt_id

          AND s.filetype='cat_finalcut'

          AND i.filetype='red_immask'

          AND i.expnum=e.expnum

          AND i.ccdnum=s.ccdnum

          AND i.band='g'

          AND e.nite=20141027;

       

      1. Douglas Tucker

        Thanks, Robert!

        Could you say a few words about how PFW_ATTEMPT_ID is created and how it is related to REQNUM/ATTNUM? 

      CommentAdd your comment...
    2.  
      2
      1
      0

      There are other people that probably have a deeper grasp of the decisions that led to this, but I was present for many of the discussions/debates.  This is my version of how these all came to be (and explanation of their inter-relationship).  As the refactored framework and associated DB tables were being developed it was found that in order to have the flexibility to uniquely identify one processing attempt from another (and their associated products) that three parameters were needed.  Those were:

      1. REQNUM (request number): identifies a processing request 
      2. UNITNAME (unit name): the fundamental unit(s) being processed within that request (so an exposure, a tile, a calibration...)
      3. ATTNUM (attempt number): a running number to identify each specific attempt to process each unit within a request (so ATTNUM=1 fails... but we know why and it is deemed reasonable to simply re-run... and you get ATTNUM=2).

      So every attempt to process, for example, an exposure by a specific pipeline would start with a REQNUM to process a UNITNAME (or a set of UNITNAME(s)) and a running ATTNUM associated with each attempt to make that processing happen (because sometimes thing break and you have to start over...).  

      PFW means Processing FrameWork.  All tables in db-desoper with this prefix (PFW_REQUEST, PFW_ATTEMPT, PFW_JOB, PFW_WRAPPER, PFW_EXEC) are related to the submission and execution of those processing attempts and basically the triplet (REQNUM, UNITNAME, ATTNUM) are used throughout to track/record that progress.  So the problem arises that when you are trying to relate between processing attempts and their products in the DB you will end up making joins that look like (TABLE1.REQNUM=TABLE2.REQNUM and TABLE1.UNITNAME=TABLE2.UNITNAME and TABLE1.ATTNUM=TABLE2.ATTNUM). 

      Such queries are somewhat cumbersome to type... but moreover if you start playing "games" where you are not using these in the same way the indexes are defined you can develop queries that are truly horrendous in their performance.

      Enter PFW_ATTEMPT_ID.  At the level of the PFW_ATTEMPT table, an attempt already has a triplet that is unique to a specific processing attempt.  Therefore to "simplify" subsequent DB queries and to facilitate our use of tags to point to those attempts, a column  PFW_ATTEMPT.ID was added to this table (and then propagated elsewhere).   In keeping with best practices that means that this ID in other tables would be referred to as PFW_ATTEMPT_ID (to reflect which ID it matches).  We have propagated this concept both to the tables tracking the processing attempts and to the tables where the products of those attempts (IMAGE, CATALOG, etc...) are tracked so that the tags which are tracked in (PROCTAG) can be related to the processing attempt and the products by joining on those IDs.

       

      Understand that PFW_ATTEMPT_IDs were essentially added to the system just prior to the start of Y2A1_FINALCUT processing.  So older processing (Y2N_FIRSTCUT, Y2N_SNDIFF, Y3N_FIRSTCUT and Y3N_SNDIFF) did not have these IDs propagated as the processing occurred.  We are in the midst of going through these tables and backfilling the PFW_ATTEMPT_ID columns (and DESFILE_ID columns) so that there is a homogenous set.   Currently you cannot exploit these inter-relationships everywhere (backfill of PROD.CATALOG was completed a couple days ago... backfill of PROD.IMAGE was running yesterday when I left (January 27, 2016).   So one might need to be patient waiting for all these backfill operations to propagate for the earlier processing campaigns.

      1. Douglas Tucker

        Thanks for the detailed and clear explanation, Robert!  This is very useful!

      CommentAdd your comment...