Role of phytoplankton in ecosystem.
Structure of phytoplankton - need to observe at a small scale
Existing technology: microscopes, sampling, 2D or not small scale enough.
How does holographic technology help?
What does the 4Deep camera do?
How much data does it collect?
1) Halifax Harbor
2) Narragansett Bay and Jamestown zooplankton
3) Sargasso Sea
What are locations (coordinates), chla at the time of sampling, general community composition?
In this stage, the goal is to preprocess raw holograms before reconstruction in order to filter out holograms which do not contain particles.
Marine phytoplankton are responsible for a significant share of global net primary production (NPP), amounting to 50 x 10^15 grams of carbon per year and rivaling the NPP on land (Chavez, 2011). This primary production feeds the bulk of the marine food web and produces significant amounts of oxygen. On the other hand, sudden blooms of phytoplankton can cause regions of hypoxia and fish kills, and certain bloom-forming species are toxic to humans and other organisms. Because phytoplankton affect the fisheries industry and can be hazardous to human health, it is in our interest to study their behavior and life cycles.
The structure of phytoplankton results from their interactions with a fluid environment and their need to acquire light and nutrients while avoiding predation and sinking. Because phytoplankton have limited motility, the fluid flow around them influences their distribution in the water column. For example, the interplay between cell motility and vertical shear produces thin layers, in which phytoplankton biomass is concentrated in a layer of ocean a few centimeters to a meter thick (Durham et al., 2009). To study these interactions, phytoplankton should be observed in situ, in the context of ocean circulation and turbulence.
Conventional methods for measuring plankton include light microscopy, which requires fixing samples to a slide, and in situ measurements of chlorophyll a fluorescence. Flow cytometry has been introduced as a laboratory alternative to microscopic analysis that automates the counting of plankton and biomass in prepared samples. This method relies on fluorescence measurements of pigments to help identify particles, and it requires preparation steps such as staining (Hofstraat et al., 1994). Samples can also be imaged with confocal microscopy, which has the advantage of imaging a 3D volume by representing it as a stack of 2D frames. However, confocal microscopy is also limited to fluorescent samples (Jericho et al., 2001).
Similar to a confocal microscope, a holographic microscope also captures information about the imaging volume and represents it as a stack of 2D images. Unlike other forms of microscopy, it is well suited for making in-situ observations because it does not require sample preparation and has a frame rate and resoluti
The initial datasets provided by 4Deep were imaged in Halifax Harbor.
Over a period of four weeks, 19 recordings were made in Narragansett Bay.
The tradeoff is processing effort: previous authors curated their data manually. The purpose of this project is to automate that curation.
The classification of satellite land-surface images is a similar problem encountered in image processing and recognition. The support vector machine (SVM) has proved to be competitive in the binary classification of high-dimensional data sets (Huang et al., 2002).
A small subset of 88 Narragansett Bay holograms was manually sorted into two classes, empty (43) and zooplankton (45). The MATLAB Computer Vision System Toolbox was used to extract a bag of features from this subset, and a linear SVM classifier was trained on these features. Five-fold cross-validation indicated that the linear SVM had an accuracy of 78.4%. This process took approximately an hour.
This trained SVM classifier was used to classify holograms recorded from the Jamestown sample to determine its performance against a different data set.
To see how well a classifier trained on a small subset (10%) of holograms can predict the rest of the data set, the Jamestown holograms were partitioned into training and validation sets. A linear SVM classifier was created and trained using 90 of the 900 holograms. However, before the classifier can be applied to the remaining 810 holograms, they must undergo feature extraction using bagOfFeatures. Running bagOfFeatures on the 810 holograms did not terminate after 9 hours. Considering that actual data sets will contain thousands of holograms, this is not a feasible method.
Another approach compares the variance of the pixels in a hologram against the mean variance of representative empty holograms.
The 264 Narragansett holograms were manually sorted into ‘zooplankton’ or ‘empty’ categories prior to assessing this method. From the empty subset, 8 of the most consistent holograms were chosen and averaged to represent the background noise of the dataset.
This averaged background is subtracted from all 264 holograms in the directory. The normalized variance of each resulting difference matrix is then used as a measure of how much that hologram differs from the background signal.
A threshold value is calculated from the 8 empty holograms: the background is subtracted from each, the variances of the resulting pixels are averaged, and one standard deviation is added. Holograms with a variance greater than the threshold are categorized as zooplankton and those below the threshold as empty.
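The background-subtraction test described above can be sketched in a few lines. This is a minimal illustration, not the code used in the project: it uses only the Python standard library, represents each hologram as a nested list of pixel intensities, and all function names are hypothetical. Two assumptions are made because the text does not pin them down: the normalization applied to the variance is omitted (plain population variance is used), and the "standard deviation" added to the threshold is taken to be the standard deviation of the empty holograms' variances.

```python
from statistics import mean, pstdev, pvariance


def average_background(empties):
    """Pixel-wise mean of the representative empty holograms."""
    n = len(empties)
    rows, cols = len(empties[0]), len(empties[0][0])
    return [[sum(h[r][c] for h in empties) / n for c in range(cols)]
            for r in range(rows)]


def difference_variance(hologram, background):
    """Population variance of the pixels left after subtracting the background."""
    diff = [p - b
            for row, bg_row in zip(hologram, background)
            for p, b in zip(row, bg_row)]
    return pvariance(diff)


def empty_threshold(empties, background):
    """Mean variance of the empties plus one standard deviation (assumed: of those variances)."""
    variances = [difference_variance(h, background) for h in empties]
    return mean(variances) + pstdev(variances)


def classify(hologram, background, threshold):
    """Label a hologram 'zoo' when its variance exceeds the threshold, else 'empty'."""
    if difference_variance(hologram, background) > threshold:
        return "zoo"
    return "empty"
```

In use, `average_background` and `empty_threshold` would be called once on the manually chosen empties, and `classify` would then be applied to every hologram in the directory.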
The result of this method on the Narragansett dataset is shown in Figure 14. 80% of the zoo holograms and 53% of the empty holograms were correctly categorized.
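The percentages can be checked against the counts recorded in the notes for Figure 14 (zoopos = 24, zooneg = 6, emppos = 111, empneg = 123). The sketch below assumes that "pos" means a hologram fell above the threshold and "neg" below it, so a zoo hologram is correct when positive and an empty hologram is correct when negative. The function name and the `kept_fraction` quantity are illustrative additions.

```python
def filter_rates(zoo_pos, zoo_neg, emp_pos, emp_neg):
    """Summarize a threshold filter from its four outcome counts."""
    zoo_total = zoo_pos + zoo_neg
    emp_total = emp_pos + emp_neg
    return {
        # zoo holograms above the threshold are correctly kept
        "zoo_correct": zoo_pos / zoo_total,
        # empty holograms below the threshold are correctly discarded
        "empty_correct": emp_neg / emp_total,
        # share of all holograms that would be passed on to reconstruction
        "kept_fraction": (zoo_pos + emp_pos) / (zoo_total + emp_total),
    }


rates = filter_rates(zoo_pos=24, zoo_neg=6, emp_pos=111, emp_neg=123)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Under this reading the counts total 264 holograms and reproduce the 80% and 53% figures; about half of the directory would still be passed on for reconstruction.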
< Comment on some high peaks in the empty section >
Due to the binary categorization of ‘zoo’ versus ‘empty’, many holograms in the empty category contain small particles or plankton which might be interesting to reconstruct.
Choosing which holograms are suitable to represent the dataset’s background noise is a step involving manual selection. Although this step is necessary, the number of empty holograms selected should be optimized to give the best results from the fewest manually selected holograms.
In the above analysis, the 8 best empty holograms were chosen to perform the method. However, this required manually sorting through the entire dataset to find the most ‘empty’ holograms.
A method that better represents the actual use case would be to randomly choose a subset of the data to be labeled manually as either zoo or empty. The effect of the size of the randomly chosen subset on the method’s performance was evaluated. Random subsets of 50, 10, and 5 percent of the dataset were labeled as empty or zoo, and the resulting empties were used to generate the threshold for the dataset.
Table 9 shows the results of using different percentages of the Narragansett dataset to create the threshold value. The random 10% subset was chosen three times to show the effect of the images in the representative set. Both 10 and 5 percent subsets outperform the 50 percent subset of empties in finding zoos, but the 50 percent subset also filters out empty holograms at a better rate due to the higher threshold value.
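The random-subset procedure might look like the following sketch. It is illustrative only: the function names are hypothetical, the seed parameter exists just to make a draw repeatable, and the per-hologram variances (after background subtraction) and the manual labels are assumed to be supplied by the operator.

```python
import random
from statistics import mean, pstdev


def sample_for_labeling(n_holograms, fraction, seed=0):
    """Pick a random subset of hologram indices to be labeled by hand."""
    rng = random.Random(seed)
    k = max(1, round(n_holograms * fraction))
    return sorted(rng.sample(range(n_holograms), k))


def threshold_from_labels(variances, labels, indices):
    """Threshold from the sampled holograms that were labeled 'empty'.

    variances: background-subtracted variance for every hologram
    labels:    mapping from sampled index to 'empty' or 'zoo'
    """
    empties = [variances[i] for i in indices if labels[i] == "empty"]
    if not empties:
        raise ValueError("no empty holograms in the labeled subset")
    return mean(empties) + pstdev(empties)
```

Repeating `sample_for_labeling` with different seeds corresponds to drawing the 10 percent subset several times, which exposes how sensitive the threshold is to the particular holograms drawn.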
Because the Narragansett dataset is small (236 holograms), subsets smaller than 5 percent were not assessed.
Comparing variance against the variance of representative empty holograms shows how much a hologram differs from the average background noise. Another metric which may be used to flag a hologram for reconstruction is the change from one hologram to the next. A significant change in the pixels over consecutive holograms would indicate that a particle had entered or left the imaging volume.
Consecutive holograms of the Narragansett dataset were paired up, and the variance of the difference (d/dt) between the two holograms in each pair was calculated.
Similarly to the previous method, a threshold is calculated from the difference between the 8 best pairs of representative ‘empty’ holograms showing little change from one to the next. The results are plotted in Figure 16.
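A minimal sketch of the pairwise metric follows. It assumes that each hologram is paired with the one recorded immediately after it (overlapping pairs), that the difference is a plain pixel-wise subtraction, and that holograms are nested lists of pixel intensities. The function names are hypothetical.

```python
from statistics import pvariance


def pair_difference_variance(first, second):
    """Population variance of the pixel-wise difference between two holograms."""
    diff = [b - a
            for row_a, row_b in zip(first, second)
            for a, b in zip(row_a, row_b)]
    return pvariance(diff)


def consecutive_variances(holograms):
    """One value per consecutive pair: entry i compares hologram i with hologram i + 1."""
    return [pair_difference_variance(a, b)
            for a, b in zip(holograms, holograms[1:])]
```

A sequence of n holograms yields n - 1 values, and a jump in the series marks the frames between which a particle entered or left the imaging volume.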
The calculated threshold value T is too low to separate holograms for reconstruction. Because the majority of holograms are not of interest for reconstruction, keeping an arbitrary percentage of the pairs with the highest d/dt variance can coarsely filter for holograms that may contain larger particles without any manual work.
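Keeping a fixed share of the highest-variance pairs needs no manual input, as the sketch below suggests. The function name and the default fraction are illustrative; the input is assumed to be a list holding one d/dt variance per consecutive pair.

```python
def top_fraction_pairs(pair_variances, keep_fraction=0.5):
    """Indices of the pairs with the highest difference variance."""
    if not pair_variances:
        return []
    n_keep = max(1, int(len(pair_variances) * keep_fraction))
    # rank pair indices from highest to lowest variance
    ranked = sorted(range(len(pair_variances)),
                    key=lambda i: pair_variances[i],
                    reverse=True)
    # return the kept indices in recording order
    return sorted(ranked[:n_keep])
```

For example, `top_fraction_pairs([0.1, 5.0, 0.2, 9.0])` keeps pairs 1 and 3. With `keep_fraction=0.5` the cut falls at the median of the pair variances.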
When a pair of holograms has a difference variance above the median value, it is possible that one is empty while the next contains a particle, or that both contain interesting particles (see Figure 17). At this stage we have not determined a way to identify which out of each hologram pair should be kept for reconstruction.
To narrow down the holograms ultimately selected for reconstruction, a two-step process is used. First, the 50% of hologram pairs with the highest variance in their difference are kept. Then the holograms in those pairs are filtered to remove any with a variance lower than a threshold calculated from the variance of the selected representative empty holograms.
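The two steps can be combined as in the sketch below. Since it has not been determined which member of a pair should be kept, this illustration makes the assumption that both members of every retained pair become candidates and that the variance threshold is then applied to each candidate individually. The function name and argument layout are hypothetical; the per-pair and per-hologram variances are assumed to be computed beforehand.

```python
def two_step_filter(pair_variances, hologram_variances, threshold,
                    keep_fraction=0.5):
    """Select hologram indices for reconstruction.

    pair_variances:     one value per consecutive pair (i, i + 1)
    hologram_variances: background-subtracted variance of every hologram
    threshold:          value derived from the representative empties
    """
    if not pair_variances:
        return []
    # Step 1: keep the pairs with the highest difference variance.
    n_keep = max(1, int(len(pair_variances) * keep_fraction))
    ranked = sorted(range(len(pair_variances)),
                    key=lambda i: pair_variances[i],
                    reverse=True)
    candidates = set()
    for i in ranked[:n_keep]:
        candidates.update((i, i + 1))  # assumption: both members are candidates
    # Step 2: drop candidates whose own variance is below the threshold.
    return sorted(i for i in candidates if hologram_variances[i] >= threshold)
```

Because the first step needs no labels, only the threshold in the second step depends on manually selected empty holograms.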
To help qualify this method, the datasets from Narragansett Bay and the Sargasso Sea were partitioned by hand into empty, plankton, and zoo categories. The zoo category includes holograms which contain more discrete particles which are easily identified when quickly glancing over the holograms. The plankton category contains holograms which have smaller particles and any ambiguous holograms. The empty category contains all holograms which would not be chosen for reconstruction.
Tables 11 and 12 show the results of this method on a directory of 236 holograms from Narragansett Bay (raw1), and on a directory of 4862 holograms taken in the Sargasso Sea.
Figure 14 in Overleaf
zoopos = 24, zooneg = 6, emppos = 111, empneg = 123
-Put labels on it
-Point out example holograms above and below threshold
Plot of variance of the difference between consecutive holograms.
An example pair of consecutive holograms where both contain interesting particles.
TABLE 11 - results of Narragansett
TABLE 12 - results of EN575