YOLOv7: Trainable Bag-of-Freebies

YOLOv7: Trainable Bag-of-Freebies (arxiv.org)

92 points by groar 3 years ago | 31 comments

sriku 3 years ago |

A rather tangential comment: this paper is an example of how NOT to write an abstract. An abstract is expected to tell me what new piece of knowledge I can learn by reading more. The content of this abstract is only 20% of what a real abstract should be. The first half of the first sentence is almost all that's needed (it could include which architectures it beats). The rest of the abstract needs to cover the following (perhaps one sentence each):

1. Intro - a note on the overall problem domain: object detection in this case, zoomed in a bit to the DL space.
2. Related work - work so far in the domain, without criticizing it.
3. Problem statement - what is the knowledge gap in the related work that this paper is talking about.
4. Solution - how did we address the gap.
5. Validation - how do we claim our solution addressed the gap it was intended to address.

This paper's abstract covers only the last part and sporadically a bit of 2. What I want to know from this abstract is "what is the new learning in the yolov7 arch?"

Perhaps the bigger picture here is that it points to metrics chasing as a proxy for a "research agenda" in the ML community.

yeldarb 3 years ago | |

We summarized the high level improvements here: https://blog.roboflow.com/yolov7-breakdown/

kylevedder 3 years ago |

Probably the most interesting trick from the paper is using the head as a soft supervisor for earlier layers of the network. The intuition is that if the earlier layers learn to imitate the higher-capacity later layers, it frees up the capacity of the later layers to better learn the residual, and it provides a denser supervisory signal.
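
A minimal sketch of that intuition, not the paper's actual label assigner: assume a hypothetical two-head setup in which the final ("lead") head's sigmoid outputs are treated as constants and used as soft targets for an auxiliary head attached to earlier layers. Every name below, and the 0.25 weight, is an illustrative assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard (0/1) or soft."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)

def combined_loss(lead_logits, aux_logits, hard_labels, aux_weight=0.25):
    """Lead head learns from ground truth; aux head learns from the lead head.

    aux_weight is an arbitrary illustrative value, not taken from the thread.
    """
    lead_probs = [sigmoid(z) for z in lead_logits]
    aux_probs = [sigmoid(z) for z in aux_logits]
    lead_loss = bce(lead_probs, hard_labels)
    # Soft targets: the lead head's predictions, treated as constants
    # (a real framework would stop gradients from flowing through them).
    soft_targets = list(lead_probs)
    aux_loss = bce(aux_probs, soft_targets)
    return lead_loss + aux_weight * aux_loss, lead_loss, aux_loss

lead_logits = [2.0, -2.0, 0.5]
hard_labels = [1.0, 0.0, 1.0]
total_agree, lead_agree, aux_agree = combined_loss(
    lead_logits, [2.0, -2.0, 0.5], hard_labels)
total_disagree, _, aux_disagree = combined_loss(
    lead_logits, [-2.0, 2.0, -0.5], hard_labels)
```

The auxiliary term is smaller when the early head agrees with the lead head, which is the "imitate the later layers" pressure described above.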

lostmsu 3 years ago | |

Yes, but to my surprise the "compound scaling" provides 3x more improvement in their ablation study. Also, I don't understand Table 8 in their ablation study for aux heads, specifically: why does it have different base benchmark values from Tables 6 and 7?

squarefoot 3 years ago |

As someone who only got his feet wet with OpenCV like 20 years ago, so basic shape recognition and no AI involved, what reading, software, etc. would you suggest to catch up and play with current technology without being inundated by theory that I'm sure I couldn't grasp?

montanalow 3 years ago | |

Go to huggingface.com and start with some of the tutorials. The operational/engineering skill sets alone are all you need to treat modern ML models like any other black box API/SDK.

intpx 3 years ago | | |

They call it ‘Tasks’

https://huggingface.co/tasks

Tempest1981 3 years ago | | |

https://huggingface.co (no 'm')

synergy20 3 years ago | | |

Went there and there is a lot of stuff indeed, but I failed to find anything related to "operational/engineering skill sets"?

mdda 3 years ago | |

To just play with something: https://huggingface.co/spaces/nateraw/yolov6 (There's an images tab, and some samples below).

If you go to the associated code, you'll see that it needs a 'backbone', 'neck' etc. What is a backbone? Questions that arise directly from the code will lead you towards good blog articles, etc. https://huggingface.co/spaces/nateraw/yolov6/blob/main/yolov...

OTOH, you could go and have a look at (for instance) the Stanford vision courses for a more 'theoretical' approach. But the code itself is often a solid guide to what's going on (the frameworks used for Deep Learning map well onto what's being discussed in blogs/lectures/papers).

bj-rn 3 years ago | |

MS put up some courses on github: https://microsoft.github.io/ML-For-Beginners

https://microsoft.github.io/AI-For-Beginners/

bigdict 3 years ago | |

Start with theory you're sure you could grasp. Understand how convolutions work and that covers a good chunk of theory.

Here's a good resource: https://eli.thegreenplace.net/2018/depthwise-separable-convo....
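
For a feel of what a convolution does before reading further, here is a minimal pure-Python sketch. It assumes a single-channel image, no padding, and stride 1, and (as deep learning frameworks do) it slides the kernel without flipping it. The function name and the toy image are made up for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` and sum the elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Toy image: dark left half (0), bright right half (1).
image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
# 1x2 kernel that responds where brightness increases left to right.
edge_kernel = [[-1.0, 1.0]]
response = conv2d_valid(image, edge_kernel)
# Each output row is [0.0, 1.0, 0.0]: the kernel fires only at the edge.
```

A learned convolutional layer is many such kernels applied across many channels, with the kernel values found by training rather than written by hand.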

isoprophlex 3 years ago |

Github repo mentions "teaser: Yolov7-mask" showing segmentation as well. Highly relevant to my interests. Sadly I can't easily discern any other info on this topic.

Does anyone know more, maybe?

hwers 3 years ago | |

What are you using it for, if you can share? I’ve thought about training some of these and releasing the weights, but I’ve never found a reason they’d really be useful personally, so it never really happened.

isoprophlex 3 years ago | | |

I'm working on a computer vision pipeline that relies heavily on segmentation to detect objects in video feeds. We capture about 6 hours of video each day. So being somewhat close to real time with our processing rate is important ...

anewpersonality 3 years ago |

We should stop calling it YOLO after the creator quit machine learning.

isoprophlex 3 years ago | |

Especially hilarious considering some other people ALSO jumped on the "we made an object detector so let's call it YOLOvX" wagon and released...

Something called YOLOv7.

https://github.com/jinfagang/yolov7

DonHopkins 3 years ago | | |

Looking forward to the cat detector in YOLOv9.

binibus 3 years ago | |

Why? For me at this point YOLO means a family of detectors that, in a single pass, propose a bounding box per pixel and filter them with some clustering algorithm. When I see YOLOfoo I know what kind of architecture to expect. A more descriptive name like YOLO-tricks instead of YOLOvX would be nice though.
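
The filtering step in detectors like this is typically greedy non-maximum suppression (NMS). A minimal sketch, assuming boxes given as (x1, y1, x2, y2) tuples and an arbitrary 0.5 IoU threshold; this is the generic algorithm, not YOLOv7's exact post-processing code, and the function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not heavily overlap a better-scoring kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Two near-duplicate proposals for one object, plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
# kept == [0, 2]: the lower-scoring duplicate (index 1) is suppressed.
```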

SrslyJosh 3 years ago |

> the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher

Yikes. It's not clear to me if that's the upper limit on accuracy or a limit imposed by requiring that it run at 30 FPS, but still...yikes.

JustFinishedBSG 3 years ago | |

It's clearly the latter and I don't see why it would be "yikes". Real time detectors are useless if "real time" means 1fps.

SrslyJosh 3 years ago | | |

What good is speed if the accuracy isn't significantly better than a coin flip?

From the paper:

> For example, multi-object tracking [94, 93], autonomous driving [40, 18], robotics [35, 58], medical image analysis [34, 46], etc.

LOL, these are all great use cases for a model with < 60% accuracy!

IncRnd 3 years ago |

In YOLOv7, YOLO and v7 don't go well together. No, not at all. YOLO normally means "You Only Live Once", and v7 means it's lived at least six times before this.

While the author likely didn't have that intention, that's what came across.

Even for YOLO meaning "You Only Look Once" YOLO and v7 do not go together well.

gchq-7703 3 years ago | |

YOLO in this case stands for "You Only Look Once".

DonHopkins 3 years ago | | |

YAML originally stood for "Yet Another Markup Language" until somebody pointed out that it wasn't actually a markup language, so they retro-named it "YAML Ain't Markup Language".

IncRnd 3 years ago | | |

Yes.

The point I was making is that YOLO and v7 don't go well together, and that is true for either meaning of YOLO.