ffmpeg uses a sectioned parameter approach: input parameters followed by the input (optionally repeated for multiple inputs), then output parameters followed by the output target.
In my example I specified copy as the codec, so the streams are copied as-is without re-encoding. I used map to select video streams 1 and 2 of the first input, plus the default (i.e. all) audio streams.
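To illustrate the sectioned order, here is a minimal Python sketch that only assembles the argument list and does not run ffmpeg. The helper build_ffmpeg_args and the file names are hypothetical. The stream specifiers assume that "video streams 1 and 2" means the first two video streams, since ffmpeg numbers the streams of each type starting at 0.

```python
import shlex


def build_ffmpeg_args(inputs, output_opts, output):
    """Assemble an ffmpeg argv in its sectioned order:

        ffmpeg [input options] -i INPUT ... [output options] OUTPUT

    inputs: list of (input_options, input_path) pairs; options placed
    before an -i apply to that input only.
    output_opts: options that apply to the output following them.
    """
    args = ["ffmpeg"]
    for in_opts, path in inputs:
        args.extend(in_opts)        # input section: options first...
        args.extend(["-i", path])   # ...then the input itself
    args.extend(output_opts)        # output section: options first...
    args.append(output)             # ...then the output target
    return args


# Hypothetical file names; stream selection mirrors the description above.
args = build_ffmpeg_args(
    inputs=[([], "input.mkv")],
    output_opts=[
        "-map", "0:v:0",  # first video stream of input 0
        "-map", "0:v:1",  # second video stream of input 0
        "-map", "0:a",    # all audio streams of input 0
        "-c", "copy",     # copy streams as-is, no re-encoding
    ],
    output="output.mkv",
)
print(shlex.join(args))
# prints: ffmpeg -i input.mkv -map 0:v:0 -map 0:v:1 -map 0:a -c copy output.mkv
```

If ffmpeg is installed, the resulting list could be passed directly to subprocess.run(args, check=True), which avoids shell quoting issues.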
The ffmpeg reference documentation is thorough. The wiki has additional guides, examples, and explanatory documentation.
That assessment entirely depends on what you consider “complex” and “easy”.
What do you mean by “it’s bad at easy things but good at complex things”? I don’t see how it would handle something complex better than something easy.
Current AI is not smarter than humans. It needs supervised training and then acts according to that training. That is inherently incompatible with novelty and correct exploration.
You lay out a highly sophisticated attack when it would be simple to modify the downloaded software to phone home. Why would anyone invest that much in something like that (you left out where “some other databases” would be and how reliable they would be) when there are much simpler and more reliable approaches?