YOLOv7: Trainable Bag-of-Freebies (arxiv.org)
1. Intro - a note on the overall problem domain (object detection in this case), zoomed in a bit to the DL space.
2. Related work - work so far in the domain, without criticizing it.
3. Problem statement - what is the knowledge gap in the related work that this paper is talking about.
4. Solution - how did we address the gap.
5. Validation - how do we support the claim that our solution addressed the gap it was intended to address.
This paper's abstract covers only the last part and sporadically a bit of 2. What I want to know from this abstract is "what is the new learning in the yolov7 arch?"
Perhaps the bigger picture here is that it points to metrics chasing as a proxy for a "research agenda" in the ML community.
If you go to the associated code, you'll see that it needs a 'backbone', 'neck' etc. What is a backbone? Questions that arise directly from the code will lead you towards good blog articles, etc. https://huggingface.co/spaces/nateraw/yolov6/blob/main/yolov...
OTOH, you could go and have a look at (for instance) the Stanford vision courses for a more 'theoretical' approach. But the code itself is often a solid guide to what's going on (the frameworks used for Deep Learning map well onto what's being discussed in blogs/lectures/papers).
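To make the 'backbone', 'neck' question above a bit more concrete, here is a rough shape-only sketch of how detectors of this kind are usually split up: a backbone extracts feature maps at several scales, a neck fuses them, and a head turns them into predictions. Every name and number below (strides, channel counts, 80 classes, 3 anchors per cell) is an assumption chosen for illustration, not something taken from the YOLOv7 code.

```python
# Toy, shape-only sketch of the backbone -> neck -> head split.
# All names and numbers are illustrative assumptions, not YOLOv7's code.

def backbone(image_hw, strides=(8, 16, 32), channels=(256, 512, 1024)):
    """Feature extractor: one (channels, height, width) map per scale."""
    h, w = image_hw
    return [(c, h // s, w // s) for c, s in zip(channels, strides)]

def neck(features, out_channels=256):
    """Feature fusion across scales, modelled here only as unifying channel counts."""
    return [(out_channels, h, w) for _, h, w in features]

def head(features, num_classes=80, anchors_per_cell=3):
    """Predictions: per cell and anchor, 4 box coords + 1 objectness + class scores."""
    per_anchor = 4 + 1 + num_classes
    return [(anchors_per_cell * per_anchor, h, w) for _, h, w in features]

def detector_output_shapes(image_hw=(640, 640)):
    """Chain the three stages and return the output shape at each scale."""
    return head(neck(backbone(image_hw)))

for shape in detector_output_shapes():
    print(shape)
```

Real implementations do actual convolutions at each stage, of course; the point is only that the config files you see in the code name these three stages.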
Here's a good resource: https://eli.thegreenplace.net/2018/depthwise-separable-convo....
Does anyone know of any more?
Something called YOLOv7.
Yikes. It's not clear to me if that's the upper limit on accuracy or a limit imposed by requiring that it run at 30 FPS, but still...yikes.
From the paper:
> For example, multi-object tracking [94, 93], autonomous driving [40, 18], robotics [35, 58], medical image analysis [34, 46], etc.
LOL, these are all great use cases for a model with < 60% accuracy!
While the author likely didn't have that intention, that's what came across.
Even with YOLO meaning "You Only Look Once", YOLO and v7 do not go together well.
The point I was making is that YOLO and v7 don't go well together, and that is true for either meaning of YOLO.
It's not as if this is named "the final algorithm v7"
Because distinguishing an object as belonging to one class out of a thousand with 50% accuracy doesn't mean it's a coin flip. You'd need a thousand-sided coin. Random chance in that case is 0.1%, which makes 50% way, way better.
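The arithmetic behind that, as a small sketch. It assumes uniform random guessing over equally likely classes and a single label per object, as in the thousand-class example; detection benchmarks report scores such as average precision, which are computed differently. The function names are made up for illustration.

```python
# Chance baseline for a K-class guess, assuming uniform random guessing
# over equally likely classes (an illustrative simplification).

def chance_accuracy(num_classes):
    """Expected accuracy of picking one of num_classes uniformly at random."""
    return 1.0 / num_classes

def lift_over_chance(accuracy, num_classes):
    """How many times better than uniform guessing a given accuracy is."""
    return accuracy * num_classes

print(chance_accuracy(2))           # the actual coin flip
print(chance_accuracy(1000))        # the thousand-sided coin
print(lift_over_chance(0.5, 1000))  # 50% accuracy relative to chance
```

So 50% on a two-class problem is no better than guessing, while 50% on a thousand-class problem is hundreds of times better than guessing.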
This is definitely not a coin flip; it's actually fairly close to what a human would produce, IMHO.
I assure you it’s highly useful in the real, real world.