It looks like you're trying to one-shot programs by writing prompts into a chat interface. Most people talking about "vibe coding" are using tools like Cursor, which has an edit model, does a lot of prompt assembly under the hood, and preserves the normal iterative programming loop.
I note that your problem statement is squarely in the sweet spot of "hard things that fool nonprogrammers into thinking they're easy." If I were to attempt that task as a human programmer without AI, I would not expect it to go smoothly, and I would budget multiple days for whatever went wrong.
It also looks like this is a task where most of the difficulty lies in architectural choices, but your prompts may have tricked the models into skipping over architectural considerations and rushing into an implementation. For a task like this, where the hard part is the architecture (i.e., choosing the language, libraries, etc.), I usually describe the problem setup and ask for alternatives with pros and cons.
I'm not sure whether the presence of errors in a prompt causes current-gen models to imitate the more error-prone parts of their training distribution, or whether fine-tuning removes that effect, but referring to HTML as a programming language is incorrect: HTML is a markup language, and the programming language in question would be JavaScript.
As for the specific issues you ran into, judging by your prompts:
* It sounds like the first attempt involved dragging and dropping onto a batch file. In Windows, dragging a file onto an executable runs the executable with the dragged file's path as its first argument; I believe this is what it was trying to do. However, this only works if you're dragging an actual file (i.e., from a file manager or your browser's downloads list), not if you're dragging an image out of a web page. I believe you nudged it into this mistake by talking about Windows and Python.
* There are significant differences in which browser APIs you're allowed to use on a regular web page, in an HTML file opened from a file:// URI, and in an iframe. Doing this task from either of the latter two is probably possible in theory, but the restrictions are severe. They are responsible for the localStorage-related errors you saw.
* Your request for logging "directly on the HTML page" suggests you may not have been watching the developer tools console. If you weren't watching the console, you'd have missed a lot of vital information. One of your prompts mentions not being able to see which network error had occurred; that information would have been in the console. For security reasons, JS can neither suppress it nor read it in order to copy the error message somewhere else.
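For the drag-and-drop case in the first bullet, here is a minimal sketch of what the script on the receiving end actually sees. It assumes Python 3 and a batch file (hypothetical) that forwards its arguments with something like `python script.py %*`. Windows hands each dropped file's path to the program as a command-line argument, so the script only ever receives paths to files on disk; something that isn't a real file, such as an image dragged out of a web page, won't arrive this way. The function name `dropped_paths` is illustrative, not from any library.

```python
from __future__ import annotations

import sys
from pathlib import Path


def dropped_paths(argv: list[str]) -> list[Path]:
    """Return the file paths passed on the command line.

    When files are dropped onto a script or batch file in Windows Explorer,
    each file's path is passed as a separate argument after argv[0].
    """
    return [Path(arg) for arg in argv[1:]]


def main() -> None:
    paths = dropped_paths(sys.argv)
    if not paths:
        # No arguments: the script was launched directly, or nothing
        # that is a real file on disk was dropped onto it.
        print("No file path received. Drag a saved file onto the script.")
        return
    for path in paths:
        # Report whether each received path points at an existing file.
        status = "found" if path.is_file() else "missing"
        print(f"{path}: {status}")


if __name__ == "__main__":
    main()
```

Running `python script.py photo.png` from a terminal reproduces what a drop does, which makes this much easier to test than repeatedly dragging files around.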