Transcription of: C4W3L10 Region Proposals




If you look at the object detection literature, there's a set of ideas called region proposals that has been very influential in computer vision as well. I wanted to make this video optional because I tend to use the region proposal set of algorithms a bit less often, but nonetheless it has been an influential body of work and an idea that you might come across in your own work. Let's take a look.

So if you recall the sliding windows idea, you would take a trained classifier and run it across all of these different windows, and run a detector to see if there's a car, a pedestrian, or maybe a motorcycle. Now, you could run the algorithm convolutionally, but one downside of the algorithm is that it just classifies a lot of regions where there's clearly no object. So this rectangle down here is pretty much blank; there's clearly nothing interesting there to classify. And maybe it was also running it on this rectangle, which looks like there's nothing that interesting there.

So what Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik proposed, in a paper cited at the bottom of the slide, is an algorithm called R-CNN, which stands for regions with convolutional networks, or regions with CNNs. What that does is it tries to pick just a few regions that make sense to run your ConvNet classifier on. So rather than running your sliding windows on every single window, you instead select just a few windows and run your ConvNet classifier on just those.

The way that they perform the region proposals is to run an algorithm called a segmentation algorithm, which results in this output on the right, in order to figure out what could be objects. So, for example, the segmentation algorithm finds a blob over here, and so you might pick that bounding box and say, let's run the classifier on that blob. It looks like this little green thing finds a blob there as well, so you might also run the classifier on that rectangle to see if there's anything interesting there. And in
this case, this blue blob: if you run the classifier on that, hopefully you'll find a pedestrian, and if you run it on this light cyan blob, maybe you'll find a car, maybe not, not sure. So, the details of this: this is called a segmentation algorithm, and what you do is you find maybe 2,000 blobs, place bounding boxes around those 2,000 blobs, and run your classifier on just those 2,000 blobs. This can be a much smaller number of positions on which to run your ConvNet classifier than if you had to run it at every single position throughout the image. This is especially the case if you're running your ConvNet not just on square-shaped regions, but also on tall, skinny regions to try to find pedestrians, on wide, fat regions to try to find cars, and at multiple scales as well.

So that's the R-CNN, or regions with CNN, or regions with CNN features idea. Now, it turns out the R-CNN algorithm is still quite slow, so there's been a line of work to explore how to speed up this algorithm. The basic R-CNN algorithm would propose regions using some algorithm and then classify the proposed regions one at a time. For each of the regions, it outputs the label (is there a car, is there a pedestrian, is there a motorcycle there?) and also outputs a bounding box, so you can get an accurate bounding box if indeed there is an object in that region. So just to be clear, the R-CNN algorithm doesn't just trust the bounding box it was given. It also outputs a bounding box bx, by, bh, bw in order to get a more accurate bounding box than whatever happened to surround the blob that the image segmentation algorithm gave it. So it can get pretty accurate bounding boxes.

Now, one downside of the R-CNN algorithm was that it is actually quite slow, so over the years there have been a few improvements to the R-CNN algorithm. Ross Girshick proposed the Fast R-CNN algorithm, and it's basically the R-CNN algorithm but with a convolutional
implementation of sliding windows. The original implementation would actually classify the regions one at a time, so Fast R-CNN uses a convolutional implementation of sliding windows, and this is roughly similar to the idea you saw in the fourth video of this week. That speeds up R-CNN quite a bit.

It turns out that one of the problems with the Fast R-CNN algorithm is that the clustering step to propose the regions is still quite slow. So a different group, Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, proposed the Faster R-CNN algorithm, which uses a convolutional neural network instead of one of the more traditional segmentation algorithms to propose the blobs, or the regions. That wound up running quite a bit faster than the Fast R-CNN algorithm, although I think most implementations of the Faster R-CNN algorithm are usually still quite a bit slower than the YOLO algorithm.

So the idea of region proposals has been quite influential in computer vision, and I want you to know about these ideas because you'll see others still use them. For myself, and this is my personal opinion, not the opinion of the computer vision research community as a whole, I think that region proposals is an interesting idea, but that not having two steps (first propose regions, then classify them), and being able to do everything more or less at the same time, similar to the YOLO, or You Only Look Once, algorithm, seems to me like a more promising direction for the long term. But that's my personal opinion and not necessarily the opinion of the whole computer vision research community, so feel free to take that with a grain of salt. But I think that with the R-CNN idea, you might come across others using it, so it's well worth knowing as well so you can understand others' algorithms better.

So with that, we've now finished up our material for this week on object detection. I hope you enjoy working on this week's programming exercise, and I look forward to seeing you next week.
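
As a rough illustration of the region proposal step described in this video, here is a minimal Python sketch. It assumes the segmentation algorithm's output is available as a small 2D grid of integer labels, where 0 is background and every other label marks one blob; that input format, the toy grid, and the function names boxes_from_segmentation and count_sliding_windows are all made up for this example and do not come from the lecture or the R-CNN paper. It places one bounding box around each blob and compares the number of proposals with the number of positions a dense sliding-window pass would visit.

```python
def boxes_from_segmentation(label_map, background=0):
    """Return {label: (row_min, col_min, row_max, col_max)} for each blob.

    Assumption for this sketch: every non-background label is one blob.
    """
    boxes = {}
    for r, row in enumerate(label_map):
        for c, label in enumerate(row):
            if label == background:
                continue
            if label not in boxes:
                # First pixel seen for this blob: box is a single cell.
                boxes[label] = (r, c, r, c)
            else:
                # Grow the box so it also covers this pixel.
                r0, c0, r1, c1 = boxes[label]
                boxes[label] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return boxes


def count_sliding_windows(height, width, window, stride):
    """Number of square-window positions in a dense sliding-window pass."""
    if window > height or window > width:
        return 0
    rows = (height - window) // stride + 1
    cols = (width - window) // stride + 1
    return rows * cols


# Toy segmentation output: 0 is background, 1 and 2 are two blobs.
label_map = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
]

proposals = boxes_from_segmentation(label_map)
dense = count_sliding_windows(height=5, width=6, window=2, stride=1)

print(proposals)
print(len(proposals), "proposals vs", dense, "sliding-window positions")
```

On this toy grid the classifier would run on two proposed boxes instead of 20 dense window positions, and that is at a single window size. In an R-CNN-style pipeline each proposed box would then go to the classifier, which outputs a label and a refined bounding box; that second stage is left out of this sketch.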