Below are all of our import statements. You probably won't need to mess with these, but if you want to use a new package put it here!
%matplotlib inline
import h5py
import numpy as np
import matplotlib.pyplot as plt
import helperfunctions as hf
from collections import defaultdict
from itertools import product
Be sure to change this to the run number you want to study
%run ../Scripts/convert_hdf5_raw.py -r Run_1323
%run ../Physics/pulseAnalysis_shapetime.py -r Run_1323
dataDir contains our processed, uninterlaced pulses
blocksDir contains our interlaced pulse blocks
run = 'Run_1323'
dataDir = hf.checkDir(hf.getRootDir()+"/Data/Processed/Pulses/"+run+"/")
blocksDir = hf.checkDir(hf.getRootDir()+"/Physics/Blocks_Final/"+run+"/")
Here we print out what amplitudes are available in this dataset, as well as which amplitude we are currently looking at
amps = hf.getAmps(dataDir)
print('The available amplitudes are:', ', '.join(amps))
amp = amps[0] # if your data has more than one amplitude, changing 0 will allow you to look at those too
print('The amplitude you are using is: ' + amp)
One benefit of saving our data in a .hdf5 format is that we can easily save various attributes about the testboard while we are taking data. This allows us to know what we were doing at the testboard just by looking at the data.
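Before loading the real file, here is a minimal sketch of how HDF5 attributes work, using a throwaway in-memory file (nothing here is from our real data; the attribute names and values are just made up to mirror the ones we'll see below):

```python
import h5py
import numpy as np

# Throwaway in-memory HDF5 file; backing_store=False means nothing touches disk
with h5py.File("example.hdf5", "w", driver="core", backing_store=False) as f:
    f.attrs["adc_freq"] = 40                           # a value saved once
    f.attrs["dac_ibi_g20"] = np.array([63, 63, 127])   # a value saved per measurement
    attrs_seen = dict(f.attrs)                         # attributes read back like a dict

print(attrs_seen)
```

Attributes travel with the file, so anyone opening the data later can recover the testboard configuration without any external bookkeeping.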
Let's load up our data and see what attributes are saved in it
inFile = h5py.File(blocksDir+"Blocks_Amp"+amp+"_1x.hdf5", "r")
print([attr for attr in inFile.attrs])
We can look at one of these attributes by accessing the inFile attributes like a dictionary. Let's see what the ADC sampling frequency in MHz is
print(inFile.attrs['adc_freq'])
Some attributes are saved every time a measurement is taken, for example dac_ibi_g20
print(inFile.attrs['dac_ibi_g20'])
That's a lot to look at, so let's instead just look at the unique values of dac_ibi_g20
print(np.unique(inFile.attrs['dac_ibi_g20']))
The main thing we want to do now that we have datasets with multiple configurations is to produce plots comparing them. To do this, we first need to see what HDF5 attributes were varied as we took data. While we're at it, let's also print out all the values these attributes can have
# First loop through all the attributes saved in inFile and save the ones with more than 1 unique value
cut_attrs = [attr for attr in inFile.attrs if len(np.unique(inFile.attrs[attr])) > 1 and attr != 'attribute_combos']
# Next, loop through these attributes and save the unique values they can take
cut_combos = [np.unique(inFile.attrs[attr]).tolist() for attr in cut_attrs]
print('The attributes we can vary are:', cut_attrs)
print('Their values are:', cut_combos)
Now that we know what we're working with, we can start making comparison plots. The bit of code that produces these is rather tricky, but if we walk through it step by step it should start to make sense:
First we use the product function from Python's itertools module to get all possible combinations of our cut attributes. This could be accomplished by nesting several for loops, but if we were varying 10 different parameters we'd need 10 nested loops
print(list(product(*cut_combos)))
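To see what product is doing, here is a toy comparison with the equivalent nested loops. The attribute values are made up for illustration, not taken from our data:

```python
from itertools import product

values_a = [63, 127]        # hypothetical values of one attribute
values_b = [False, True]    # hypothetical values of another

# Nested loops: one level of nesting per attribute
nested = [(a, b) for a in values_a for b in values_b]

# product: handles any number of attributes with no extra nesting
combos = list(product(values_a, values_b))

print(combos)  # [(63, False), (63, True), (127, False), (127, True)]
```

The two approaches give identical results, but product(*cut_combos) keeps working unchanged no matter how many attributes we vary.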
Next up is the first line within our for loop, where we define cut_index. Let's work through this with an example.
Let's take the third product of our cut attributes
prod = list(product(*cut_combos))[2]
print(prod)
print(cut_attrs) # print these so we know what each value corresponds to
Now we compare the value of each attribute to the value listed in prod, which tells us which measurements had each of these attribute values. The result is an array of True and False values, where True means the measurement had the attribute value we are looking at now
print( np.array( [inFile.attrs[cut_attrs[i]] == prod[i] for i in range(len(prod))]) )
Each row in this array corresponds to an attribute we varied, and each column corresponds to a different measurement during data taking.
Now that we know where each individual attribute matches what we want, we can combine the rows to find the measurements where every attribute matches.
cut_index = np.logical_and.reduce(np.array( [inFile.attrs[cut_attrs[i]] == prod[i] for i in range(len(prod))]))
print( cut_index )
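Here is the same pattern on a small made-up array: each row says whether one attribute matched, and np.logical_and.reduce ANDs the rows together column by column, so a measurement survives only if every attribute matched:

```python
import numpy as np

# Two attributes checked across five measurements (made-up values)
matches = np.array([[True, True, False, True, False],   # did attribute 1 match?
                    [True, False, False, True, True]])  # did attribute 2 match?

# True only in columns where every row is True
cut_index = np.logical_and.reduce(matches)
print(cut_index)  # [ True False False  True False]
```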
So we have this big array of True's and False's, now what? Well, NumPy supports boolean indexing: you can index an array with a boolean array, and you get back only the values where the index was True. Let's see a quick example:
test_array = np.array([0,1,2,3,4])
test_index = [True, True, False, False, True]
print( test_array[test_index] )
We also want to check that at least one measurement matches the particular combination of attributes we've chosen, which we can do using any
print( any(cut_index) )
If we were to instead look at the first combination of attributes product generated, we would find that we didn't actually take any measurements with that particular setup.
The rest of this bit of code packages the data up into a dict and then plots it all on one set of axes
pulse_data = {}
for prod in product(*cut_combos):
    cut_index = np.logical_and.reduce(np.array([inFile.attrs[cut_attrs[i]] == prod[i] for i in range(len(prod))]))
    if not any(cut_index): continue
    cut_attrs_str_title = "; ".join([cut_attrs[i]+' '+str(prod[i]) for i in range(len(prod))])
    pulse_data.update({cut_attrs_str_title: inFile["/coluta1/channel2/samples"][()][cut_index][0:1200]})
for name, data in pulse_data.items():
    plt.plot(np.mean(data, axis=0), label=name)
plt.legend()
plt.title('COLUTA 1, Channel 2, Pulse Comparison, Amplitude ' + amp)
plt.show()
Instead of plotting all 6 combinations at once, let's look at a comparison with the Gain 20 amplifier on and off and the other attributes at their default.
First let's print out the dict keys so we know how to access the data
print( list(pulse_data.keys()) )
plt.plot(np.mean(pulse_data['dac_ibi_g20 63; on_g20 False; sw_ibo_g20 False'], axis=0),
         label='dac_ibi_g20 63; on_g20 False; sw_ibo_g20 False')
plt.plot(np.mean(pulse_data['dac_ibi_g20 63; on_g20 True; sw_ibo_g20 False'], axis=0),
         label='dac_ibi_g20 63; on_g20 True; sw_ibo_g20 False')
plt.title('COLUTA 1, Channel 2, on_g20 Comparison, Amplitude ' + amp)
plt.legend()
plt.show()
So turning on_g20 on makes the pulse amplitude higher, as would be expected when we turn on an amplifier. But does it look like the gain is actually 20?
Let's plot the noise for both of these pulses too:
plt.plot(np.std(pulse_data['dac_ibi_g20 63; on_g20 False; sw_ibo_g20 False'], axis = 0))
plt.title('COLUTA 1, Channel 2 Noise, on_g20 False, Amplitude ' + amp)
plt.show()
plt.plot(np.std(pulse_data['dac_ibi_g20 63; on_g20 True; sw_ibo_g20 False'], axis = 0))
plt.title('COLUTA 1, Channel 2 Noise, on_g20 True, Amplitude ' + amp)
plt.show()
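As a quick sanity check of what axis=0 means in these plots, here is a tiny made-up block of three "measurements" of four "samples" each: the mean and std are taken across measurements, giving one value per sample, which is exactly the average pulse shape and the per-sample noise we've been plotting:

```python
import numpy as np

# Three repeated measurements (rows) of four samples (columns); numbers are made up
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [1.0, 2.0, 3.0, 4.0],
                 [1.0, 2.0, 9.0, 4.0]])

mean_pulse = np.mean(data, axis=0)  # average over measurements: one value per sample
noise = np.std(data, axis=0)        # spread over measurements: one value per sample

print(mean_pulse)  # [1. 2. 5. 4.]
print(noise)       # zero except at the sample where the measurements disagree
```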
These two plots are very different, and they're both different from the noise plot we saw in the previous notebook.
At this point you've seen how to load in a data file, loop through measurements, make cuts based on various attributes, and make comparison plots. Here are a few ideas for investigating further: look at pulse features you might have noticed during these two notebooks, make plots comparing pulses in different ways, or look at pulses from newer data-taking runs