Hi, I’m Anne Carpenter from the Broad Institute of Harvard and MIT, and I co-lead the CellProfiler project. Sometimes you can answer your biological question of interest by measuring properties of the entire image, such as the total amount of fluorescence, or the texture or smoothness of staining across the entire field of view. But that’s fairly rare. Usually you’re going to need to measure individual biological entities within the image, and those entities can be things like nuclei, cells, synapses, or vesicles. They could be regions of organisms or tissue samples, or colonies growing on a plate. Segmentation is the process of identifying individual cells or structures within the image, and for most bioimage analysis workflows, segmentation is a key step. Segmentation gets its name from segmenting, or dividing, the pixels in the image into regions of interest, otherwise called ROIs, such as individual cells as shown here. In some software this step may be called segmentation, but it could also be called identification or object detection. The regions of interest can be shown using outlines, bounding boxes, or arbitrarily colored collections of pixels known as a label matrix, because each separate entity is given a different numeric label. Now, you don’t see the numeric labels directly, but you see them indirectly, reflected in the colors that are shown. So let’s start with a super simple segmentation algorithm called thresholding. You can check out Kurt Thorn’s iBiology video if you want to understand it in more detail. Thresholding is often just the first step in a segmentation workflow. Sometimes your image might be straightforward enough that thresholding is all you need to answer your question. That can be the case if the cells are very nicely separated and you have a high signal-to-noise ratio. But often that’s not the case, and thresholding is just the first step. So let’s dig into how thresholding works.
To understand thresholding, we need to take a look at a histogram. The pixel values shown in the histogram are what the digital camera reads when it’s looking at your specimen of interest. The pixel values range all the way from the low, dark values, which are the background in your image, up to the high values, which are the bright parts of your image; that’s where the cells are located. If you think about thresholding, the decision to be made is: how do we pick a pixel value below which all of the pixels belong to the background of the image, and above which all the pixels belong to the cells of interest? If we pick a threshold like the one shown here, it’s much too lenient; we’re starting to get some background together with the cells. But if we pick a higher value, it may be too stringent, and now we’re not capturing the entire cell. So we want to pick a threshold value that’s just right, as shown here. Again, "algorithm" is a fancy word for a mathematical routine that we run across the image, and it really couldn’t be simpler in this case: we pick a literal value, and if a pixel in the image is less than that value, it’s background; if it’s higher, it’s foreground. We get a result like this one, which very nicely distinguishes the background from the foreground. But as you can see, in this case that’s not enough to solve our problem. We did a decent job of separating foreground and background, but now we need to separate touching objects. Again, I’m going to show you just one simple example algorithm, where we again apply a mathematical routine to all the pixels in the image. To understand this algorithm, called seed-based watershedding, it’s helpful to think of the image as a landscape with mountains where the cells are located. You can see that kind of depiction here.
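Before moving on, here is the thresholding rule just described, sketched in a few lines of Python with NumPy. This is only an illustration, not CellProfiler’s code; the tiny synthetic image and the threshold value are invented for the example:

```python
import numpy as np

# A tiny synthetic 8x8 image: dim background (~10) with two bright "cells" (~200).
image = np.full((8, 8), 10, dtype=np.uint8)
image[1:4, 1:4] = 200   # a 3x3 "cell"
image[5:7, 5:7] = 200   # a 2x2 "cell"

# The whole algorithm: pixels above the chosen value are foreground,
# everything else is background.
threshold = 100
foreground = image > threshold   # boolean mask: True where the cells are

print(foreground.sum())   # 13 foreground pixels (9 + 4)
```

In a real workflow the threshold is usually chosen automatically (for example with Otsu’s method) rather than typed in by hand, but the comparison itself is exactly this one line.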
It’s the same data as in the original image; we’re just looking at it in a different way. If you think about what distinguishes cells that are stuck together, you can think of them as mountain peaks. Using this algorithm, we first want to find seeds, and what we’re looking for are the peaks in this landscape image. What that means is: for every pixel in the image, you march all the way across the image and ask, "Am I the highest pixel in my local neighborhood?" If so, you’ve found a mountaintop. Now, you do have to smooth the image first; that’s a pre-processing step you would run before this algorithm. If you smooth it, then each cell will have just one peak pixel, and that gives you, as you can see here, a red dot on top of each mountain. From there you can do the watershedding step, where you can picture water spurting out of the top of each of these mountains, flowing down, and settling in the basins, the dim regions in between each of those touching cells. That gives us a very nice result: we end up with all the cells nicely separated from each other, as shown here. I hope that from these two simple examples you have a sense that image processing algorithms involve just processing the pixels using fairly simple mathematical rules. Of course, there can be much more complex algorithms. For example, I show here some untangling of worms, which can involve a lot of complexity in terms of the expected shape of the given organism. The same would be true for algorithms that are very good at distinguishing neurons with branching patterns, and so on. But sometimes even those complex, customized image processing algorithms, where you apply mathematical rules to the pixels, are not adequate to solve a problem.
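The smooth-then-find-peaks-then-flood recipe can be sketched with off-the-shelf routines. This is a hypothetical example using scikit-image’s `gaussian`, `peak_local_max`, and `watershed` on a synthetic image of two overlapping cells; it is not the exact pipeline from the video:

```python
import numpy as np
from skimage.filters import gaussian
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Two overlapping Gaussian "cells" form the intensity landscape.
yy, xx = np.mgrid[0:40, 0:40]
image = (np.exp(-((yy - 15) ** 2 + (xx - 14) ** 2) / 40.0)
         + np.exp(-((yy - 24) ** 2 + (xx - 26) ** 2) / 40.0))

# Step 1: smooth so each cell has a single peak, then find the local maxima.
# Each peak becomes a numbered seed ("red dot on top of each mountain").
smoothed = gaussian(image, sigma=2)
peaks = peak_local_max(smoothed, min_distance=5)
seeds = np.zeros(image.shape, dtype=int)
for i, (r, c) in enumerate(peaks, start=1):
    seeds[r, c] = i

# Step 2: watershed floods outward from each seed. We negate the image so
# mountains become basins, and mask out the dim background found earlier.
mask = smoothed > 0.2
labels = watershed(-smoothed, seeds, mask=mask)

print(labels.max())   # number of separated cells
```

The result is a label matrix just like the one described earlier: pixel value 0 is background, and each separated cell carries its own integer label.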
That can sometimes be the case for non-fluorescent samples, where the intensity of the image just doesn’t reflect the amount of material present, which makes some of the standard thresholding and watershed techniques not work very well. Some examples are brightfield, differential interference contrast, and phase contrast, as well as some especially challenging tissue or histopathology samples, as shown here. In these cases, we need to turn to machine learning. Instead of choosing a particular image processing algorithm, a simple mathematical routine, we’re actually going to train the computer interactively to detect the right kinds of pixels. This is called supervised machine learning, because the expert supervises the computer as it learns to detect the right kinds of pixels to label with each class. Luckily, there are very nice open-source tools for this step. One example, shown here, is called ilastik; it’s open-source and very user-friendly. The idea is that you scribble on the structures, or just make dots if you want the algorithm to process even faster, and the algorithm looks at the pixels you have marked for each class, such as the membrane regions, the nuclei, and the mitochondria, and then predicts for the remainder of the image. Once your classifier has been interactively trained, and you’ve corrected errors so it gets better and more accurate, you can run this trained model, this trained classifier, on a huge set of images completely automatically. The result is called a probability map, and it looks a bit more like a fluorescence image, so you can then use standard image processing algorithms to process it. You can think of supervised machine learning as a sort of very sophisticated pre-processing step that allows you to carry out segmentation in a more convenient way.
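As a rough illustration of the idea behind interactive pixel classification, here is a minimal sketch: a random forest trained on a handful of "clicked" pixels, then asked for a probability map over the whole image. This is not ilastik’s actual code; ilastik computes a much richer bank of intensity, edge, and texture features, and the scribble indices and synthetic image here are invented:

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class image: noisy dim background plus a brighter "structure".
rng = np.random.default_rng(0)
image = rng.normal(20, 3, size=(32, 32))
image[8:24, 8:24] += 40   # the structure we want to detect

# Per-pixel features: raw intensity plus a local-mean filter response.
local_mean = uniform_filter(image, size=5)
features = np.stack([image.ravel(), local_mean.ravel()], axis=1)

# "Scribbles": a few hand-picked pixel indices per class (hypothetical clicks).
bg_clicks = [0, 1, 33, 1000]      # flattened indices in the background
fg_clicks = [300, 330, 500, 530]  # flattened indices inside the structure

X = np.concatenate([features[bg_clicks], features[fg_clicks]])
y = np.array([0] * len(bg_clicks) + [1] * len(fg_clicks))

# Train on the few labeled pixels, then predict for every pixel in the image.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
prob_map = clf.predict_proba(features)[:, 1].reshape(image.shape)
```

The `prob_map` array plays the role of the probability map described above: it looks like a fluorescence image of the chosen class, and you can threshold and watershed it with the standard tools from earlier.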
Some downsides of using supervised machine learning are that it takes some time to do the training, and you might need to retrain on each individual batch of images if you have some technical variations. But these algorithms are very powerful, and they don’t require much background knowledge of how the image processing is working. All you’re doing is clicking pixels and scribbling on the image, and it just sort of magically learns what you need it to learn. Now, there’s one type of machine learning called deep learning, also known as neural networks, and it’s a type of what you might have seen called artificial intelligence in headlines. There’s really a revolution in image processing happening right now, where deep learning is blowing away prior records for accuracy in detecting objects in images. You may have seen deep learning advancing other fields, such as autocorrecting your typing, or detecting faces in images, and so on. These deep learning models are so powerful because they work by layers of processing, and that’s why they’re called deep. The layers give you a lot of flexibility to learn the different cell structures, and that makes the models powerful. But it also means that you need a lot of examples and a lot of annotation, which means a lot of annotated images to show the model what the right answer should be. The typical workflow, then, is to train the network by providing tens or maybe hundreds of images that you’ve fully annotated to mark the cell structures of interest. Of course, that’s a significant limitation, as it takes a fair bit of time, and a major area of research right now is to reduce the number of annotations needed. Now, there’s another challenge, and that is that it takes a bit of computational expertise to operate some of these algorithms these days. There aren’t very user-friendly tools quite yet, though I expect this to change within the next year or so.
Already, it is the case that if you have a trained deep learning model, you can run it in user-friendly tools such as CellProfiler and ImageJ. The bad news is that those already-trained models can very rarely be taken from one project and actually work on another project. Again, I expect this to change in the near future. In fact, my lab organized the 2018 Data Science Bowl, where we annotated 20,000 nuclei from a huge variety of fluorescence and histology images. We covered all kinds of cell types, all kinds of microscopes and experiments and stains. The fact is, a biologist can look at these images and very easily outline the nuclei in them, and we want to train a deep learning model that has that same level of knowledge: not just trained for a specific experiment, but with more broad capability. So in the future, I fully expect that you will be able to sit down at your microscope, snap a picture, and have it automatically, without you tweaking any software, just know where the nuclei are, and maybe even give you a DNA content histogram. Okay. So, whether you’ve identified your objects using classical image processing algorithms, supervised machine learning, or even deep learning, once the objects or regions of interest are identified, you can manipulate them as you need to customize your own project. For example, one really common step: if you have identified nuclei in your image, you can expand those borders outward by a process of dilation to find the edges of the cell. Once you’ve done that, you can do a simple subtraction procedure, taking the cell area minus the nucleus area, to end up with the cytoplasm. This allows you to measure a fluorescent protein that might be located within the cytoplasm. As another example, you might have identified speckles within nuclei, and you can associate each of those individual speckles with a nucleus to figure out which cell they belong to.
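The dilate-and-subtract trick for getting a cytoplasm compartment can be sketched like this, using a toy example with an invented nucleus mask and fluorescence image rather than CellProfiler’s actual modules:

```python
import numpy as np
from scipy import ndimage as ndi

# A small nucleus mask (one 4x4 nucleus in a 20x20 field).
nuclei = np.zeros((20, 20), dtype=bool)
nuclei[8:12, 8:12] = True

# Expand the nucleus border outward by dilation to approximate the cell edge.
cells = ndi.binary_dilation(nuclei, iterations=3)

# Cytoplasm = cell area minus nucleus area.
cytoplasm = cells & ~nuclei

# Measure a "fluorescent protein" in the cytoplasm; values here are made up.
fluor = np.where(cytoplasm, 50.0, 5.0)
print(fluor[cytoplasm].mean())   # mean cytoplasmic intensity: 50.0
```

Real pipelines usually guide the expansion with a cell-body stain or stop it at a fixed distance per cell, but the subtraction step is exactly this boolean arithmetic on masks.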
There are many other things you can do to manipulate objects in this way to make sure you’re measuring what you’d like to measure. You can also track objects if you have time-lapse images, and I hope you’ll check out Kevin Eliceiri’s video on tracking.