Data loading steps

  1. Associated literature:
    refcode 1995ApJS...98..129K

  2. Read associated literature to learn about the nature of the dataset:
  3. Received electronic versions of tables from Kim:
    Table 1A (tb1a.txt) & 1B(tb1b.txt): not the final version published - names and positions are different to those published
    Table 2A & 2B (tbl2.txt): not the final version published
    Table 3A & 3B (tbl3.tex): added "*" to first entry (NGC 23) after communication with Kim.

  4. Using data listed in step 2, build metadata table - for this dataset, Kim has actually done most of the hard work,
    listed in "in.tbl". There are a few corrections to this table:
    (i) obs_region should be "nucleus" not "Whole_Galaxy"
    (ii) given_name - phases "blue"/"red"/"blue+red" removed
    (iii) lambda_s & lambda_e - only had 4s.f. - re-read from fitsheader to 6 s.f. (see below - readFITShead.pl)
    (iv) cra2000 & cdec2000 - had too many d.p. in arcsec - re-calculated to 2d.p. with "/Users/bchan/bin/d2t.pl"
    (v) n_pix - incorrectly set to 800 for both telescopes - re-read from fitsheader (see below - readFITShead.pl)

  5. Look at fits header to identify any required parameter available - in this case there are two types form the two telescopes,
    use "/Users/bchan/bin/fitshead", e.g.:
    	:~> fitshead filename.fits
    It turns out that the FITS header keys used by the two telescopes are the same:
    	CRVAL1    = lambda_s
    	CDELT1    = dlambda
    	AIRMASS   = airmass
    	EXPOSURE  = exptime
    	NAXIS     = naxis
    	NAXIS1    = n_pix
  6. Using the script "readFITShead.pl":
    look at documentation:
    	:~> perl readFITShead.pl -h
    run it first time to generate required data file - INPUT_fitsheader_identifier.dat:
    	:~> perl readFITShead.pl
    edit data file - INPUT_fitsheader_identifier.dat (resluts of step 4) and re-run script:
    	:~> perl readFITShead.pl *.fits
    output file = OUTPUT_identified_fitsheaders.dat

  7. Using EXCEL, created "metadata_table.xls" and merged "in.tbl" & "OUTPUT_identified_fitsheaders.dat", corrected problems listed in step 3. The columns that still need to be filled are:
    (i) seeing: Palomar 200in = 1.5; MKO 2.2m = 1
    (ii) instrument: Palomar 200in = Double Spectrograph; MKO 2.2m = Faint Object Spectrograph
    (iii) ex_reg_leng: Need original cz measurement: (see step 2)
    (vi) flag_abs_cal: Need electronic version of table 2 or 3

  8. Further issues with Kims original metadata table:
    (i) given_name: some are different to those published - use names in (tbl3.tex): created tbl3.name - tab-separated with names - entry 140 changed from Mrk 331 to MCG+03-60-036 (NOTE: in the publication, table 1A - last entry - had UGC 12812 while tables 2A & 3A used MCG+03-60-036).
    (ii) cra1950 & cdec1950 - are actually NED positions, not those published.

  9. Matching metadata table names to those from table 3:
    	:~> perl ~/bin/FINDdiff.pl -keepall -m -col=1,2 -sep="\t" metadata_table.names tbl3.name > LOG
    Names with 'problems' will be reported in the LOG file, they involve: (1) minor name-format differences; (2) strange and unexpected differences; and (3) What the??? (typographical error?)
    		tbl3.name            metadata_table.names
    	(1)	IC5135               NGC7130
    		MCG+0029023          MCG+00-29-023
    		MCG+0142088          MCG+01-42-088
    		MCG+0204025          MCG+02-04-025
    		MCG+0310045          MCG+03-10-045
    		MCG+0818012          MCG+08-18-012
    		MCG-0201051N         MCG-02-01-051N
    		MCG-0201051S         MCG-02-01-051S
    		MCG-0233098E         MCG-02-33-098E
    		MCG-0233098W         MCG-02-33-098W
    		MCG-0304014          MCG-03-04-014
    		NGC1143/44           NGC1143/4
    		NGC6090NE            N6090NE
    		NGC6090SW            N6090SW
    		NGC7771Main          NGC7771
    	(2)	IR10565+2448Main     IR10565+2448W
    		IR10565+2448SE       IR10565+2448NE
    		NGC3508N             NGC3508E
    		NGC3508S             NGC3508W
    		NGC5257              NGC5257/8E
    		NGC5258              NGC5257/8W
    		UGC9913E             UGC9913N
    		UGC9913W             UGC9913S
    		Zw448.020Main        Zw448.020SE
    	(3)	IR04335-2524 (14?)   ESO484-G036
    In conclusion - I need an alternate source to varify these discripancies.

  10. * Kay has supplied OCR of table 1A and 1B - Kay_table_1A_1B.txt
    * After fixing a few minor problems, generated EXCEL file: Kay_table_1A_1B.xls
    * Since the order of objects within Tables 1, 2, and 3 are the same, I merged Kim's tbl3.name "in-order" to over come most of the confusions listed in step 9, and fixed a couple of misplaced error and a bunch of OCR error by hand..
    	OCR error examples:
    	NOC 1204      -> NGC 1204
    	ESO 550-1G025 -> ESO 550-IG025
    	NGC 3508 F    -> E
    	IR 152503609  -> IR 15250+3609
    	ESO 343-10013 -> ESO 343-IG013
    	ESO 602-0025  -> ESO 602-G025
    	ESO 534-0009  -> ESO 534-G009
    	IR 093032736  -> IR 09303+2736
    	IR 093392835  -> IR 09339+2835
    	IR 35483+4227 -> IR 15483+4227
    	IR 35519+3537 -> IR 15519+3537
    	ESO 593-10008 -> ESO 593-IG008
    	IR 22279-1312 -> IR 22279-1112
    * It's obvious that this ORC process still require a fair bit of human intervention and most of these error are difficult to spot!
    * After fixing these problems, a tab-separated file is created: Kay_table_1A_1B.tab

  11. Try step 9 again - matching metadata table names to those from Kay_table_1A_1B.tab:
    	:~> perl ~/bin/FINDdiff.pl -keepall -m -col=1,1 -sep="\t" metadata_table.names Kay_table_1A_1B.tab > LOG2
    Again the problematic names will be reported in the LOG2 file, this time only a few simple ones:
    	Kay_table_1A_1B.tab            metadata_table.names
    	NGC 5953/4                     NGC 5953
    	NGC 6090 NE                    N6090 NE
    	NGC 6090 SW                    N6090 SW
    The output file is named: metadata_table.names_Kay_table_1A_1B.tab.common, once the above 'problem lines' are added back in (by hand), this list can be readily merged back into the EXCEL file: metadata_table.xls

  12. Marion has provided the 'cat-prep' file, to re-cover the link between NED-objects and Kim's-objects: Kim_catprep_file.txt
    * First, a few lines in this file were removed:
    --> These two lines are not link to Kim's paper as the refcode is not persent:
    	0_E_|G     |145447.8   +244858.   |                                |G_UGC 09618 NED01                                                                                                                                                                       
    	E -                                                                                                                                                                                                                                                         
    --> These ones are used for rearranging NED objects in database:
    	0___+ZAP   +000000.0000+000000.000+O_NONEXISTENT OBJECT 774        |I_IRAS F09338+3133              |                                |                                -     -           -        -                    |1111...............| 0.0   0.0    0  
    	0___|ZAP   |000000.0000+000000.000-I_IRAS  09338+3133               O_NONEXISTENT OBJECT 774        -I_IRAS F09338+3133                                                                                                                                     
    	0_R_|ZAP   |000000.0000+000000.000+I_IRAS  09338+3133               O_CG 0024                       +I_IRAS F09338+3133                                                                                                                                     
    	R + 6930 IRAS  09338+3133              
    	R + 6929 IRAS F09338+3133              
    * With nedit, I sorted the catprep file (a few example lines shown below) and extracted two name columns (shown in bold below) and check for the correct refcode (column under-lined below).
    	0_Z_|G     |000833.4   -122310.   |                                |  MRK 0938                        |G_NGC 0034                          |                                |     |           |   5860 |                    |1995ApJS...98..129K|                 
    	0_Z_|G     |001618.0   -103914.   |                                |  MCG -02-01-051                |G_MCG -02-01-051 S              |                                |     |           |   8170 |                    |1995ApJS...98..129K|                 
    	0_Z_|G     |001618.0   -103914.   |                                |  MCG -02-01-052                |G_MCG -02-01-051 N              |                                |     |           |   8080 |                    |1995ApJS...98..129K|                 
    * In the two name columns, the names with "G_" prefix (mostly on the right column) are those object names used in the publication, while the column on the left are mostly the identified NED-names. There are however a few cases where the published names were added into NED directly (hence only the left-column is filled - with "G_" prefix names), a few examples are shown below.
    	0+S_|G     |191132.3   -212422.   |                                |G_ESO 593-IG 008 N              |                                |                                |     |           |[0.0434]|                    |1995ApJS...98..129K|  18.0 6.0    83 
    	0+S_|G     |191132.3   -212422.   |                                |G_ESO 593-IG 008 S              |                                |                                |     |           |[0.0437]|                    |1995ApJS...98..129K|  18.0 6.0    83 
    	0+S_|G     |205505.3   +165603.   |                                |G_CGCG 448-020 NW             |                                |                                |     |           |[0.0359]|                    |1995ApJS...98..129K|  35.0 7.0    70 
    	0+S_|G     |205505.3   +165603.   |                                |G_CGCG 448-020 SE              |                                |                                |     |           |[0.0355]|                    |1995ApJS...98..129K|  35.0 7.0    70 
    * The "G_" prefix and the " ID" postfix (see examples below) in the published names were removed, and the two name columns are put into an EXCEL file Kim_catprep_file.xls
    	0_S_+G     |092054.8   +394327.   |                                |  IRAS F09209+3943              +G_IRAS  09209+3943 ID           |                                -     -           +[0.0923]-                    |1995ApJS...98..129K|                 
    	0_S_+G     |092151.9   +342838.   |                                |  IRAS F09218+3428              +G_IRAS  09218+3428 ID           |                                -     -           +[0.0677]-                    |1995ApJS...98..129K|                 
    	0_S_+G     |092431.6   +351728.   |                                |  IRAS F09245+3517              +G_IRAS  09245+3517 ID           |                                -     -           +[0.1391]-                    |1995ApJS...98..129K|                 
    * This list cannot be matched to Kay_table_1A_1B.tab yet as the published names in the "catprep" file are in NED format (losely)... I now use the code nedname to attemp an unification...

    (a) in Kim_catprep_file.xls: the column published-names was pass to nedname to generate the column published-names-formated which was used to create the column complete_published-names-formated. Then the file Kim_catprep_file.tab was created...
    (b) similarly in Kay_table_1A_1B.xls: "conditioned" column-final-name --> nedname --> column-NED-formated --> column-complete_final-name_formated --> updated file-Kay_table_1A_1B.tab. The only extra step was the conditioning of "final-name" which included only replacing all "IR "-prefix to "IRAS ".

  13. Match up the two lists Kim_catprep_file.tab and Kay_table_1A_1B.tab:
    	:~> perl ~/bin/FINDdiff.pl -keepall -m -col=2,2 -sep="\t" Kay_table_1A_1B.tab Kim_catprep_file.tab > LOG3
    A few simple mismatch names are reported in the LOG3 file:
    	Kay_table_1A_1B.tab            Kim_catprep_file.tab
    	MCG+03-60-036                  UGC12812 (we decided to use the name MCG+03-60-036 for this object)
    	UGC6436NW                      UGC06436NW   
    	UGC6436SE                      UGC06436SE   
    	UGC4881SW                      UGC04881SW   
    	UGC4881NE                      UGC04881NE   
    	UGC8335NW                      UGC08335SE   
    	UGC8335SE                      UGC08335NW   
    	Zw448.020NW                    CGCG448-020NW
    	Zw448.020SE                    CGCG448-020SE
    	NGC0985                        NGC0958 (error in Kim's paper - see Marion's email, comment added in table)
    The output file is named: Kay_table_1A_1B.tab_Kim_catprep_file.tab.common, the above 'problem lines' are added back in (by hand).

  14. Finally, match up the two lists metadata_table.names and Kay_table_1A_1B.tab_Kim_catprep_file.tab.common:
    	:~> perl ~/bin/FINDdiff.pl -keepall -m -col=1,1 -sep="\t" metadata_table.names Kay_table_1A_1B.tab_Kim_catprep_file.tab.common > LOG4
    Checking LOG4 shows no problems at all in this match as I have already edited out the problems found in step (11) before matching. The result list (renamed metadata_Kay_table__Kim_catprep.common) can be directly merged back into metadata_table.xls
    	:~> perl ~/bin/NEDcoordtrans.pl metadata_table.pub_pos > LOG_NEDcoordtrans
  15. From Kim, we have were reminded that ex_reg_leng can be obtained from the fitsheader parameters: APLOW \& APHIGH (the range of pixel values used for the extracted spectra). So, we used readFITShead.pl to extract these values, and merged into the EXCEL file: metadata_table.xls
    To convert the ex_reg_leng length from pixels to arcsec, we needed know the pixel scales. The following were found from the web, and with help from Kim and Joe, we decided the pixel scales used in this paper. NOTE: the final adopted pixel-scales (MKO 2.2m = 0.65 arcsec/pixel; Palomar Red = 0.58 arcsec/pixel; Palomar Blue = 0.78 arcsec/pixel) for the Palomar is different to thos listed in the tables below as the instrument has changed many times over the years.
    	Length: 128 arcsec
    	Red:            0.468 arcsec/pixel; CCD 21; 1024 x 1024; 24 pixels
    	Blue:           0.390 arcsec/pixel; CCD 20; 2688 x 512 (+100 pixels x-axis overscan, 2788 total) ; 15 pixels
    	Alternate Blue: 0.624 arcsec/pixel; CCD 14; 1024 x 1024; 24 pixels
                	 Table 7.8: Faint Object Spectrograph Characteristics             
    	 Collimators           400mm          blue, visual and red (marked)           
    	 Cameras       Z135    135mm       Zeiss f/2.0 5975158 (broad: 400-1000 nm)   
    	 Slits         160mum x 8mm           1.5arcsec x 74arcsec                       
    	 Beam Size            40mm                                                    
    	 Field Size at Slit   26mm                   4 arcmin                         
    	 Image Scale                                                                  
    	   at detector       135mm      27.2 arcsec mm (1);   0.65 arcsec/24 micron   
  16. Finally, after all the metadata are collected, the content of the EXCEL file: metadata_table.xls is copied into a text file (named metadata_table.fin), used as the input to the loader script generate_spectraDB_loader.pl:
    	perl generate_spectraDB_loader.pl metadata_table.fin
  17. This script also generate the input file for the script which makes the previews:
    	perl generate_spectraDB_preview.pl