I don't think it even has to do with the number of calls or the number of images downloaded, since, now that I think about it, I had the problem even before starting to code, when parsing only a single product page. I thought the fix was the UserAgent. I'll have to do more testing, maybe try a few flags and see if anything changes. Last night it didn't work from 12 AM until 3 AM, when I turned off the PC (I didn't really try much, since I was doing other stuff; I was hoping it would fix itself). I also avoid calling it too often, since I consider being blocked a strong possibility (another reason to use update=-1).

Yincognito wrote: ↑May 24th, 2024, 11:30 am
Did you consider the fact that the measures downloading the images finish at a different time than the product ones? So maybe you should add the enabling to the former? Optionally, with a [!Delay ...] before enabling as well, just to be sure?
Well, I use my tricks mostly for single-item (and different-site) displays, changeable via scrolling, as you already know, and that doesn't exactly suit your current scenario, where you poll the same site and display multiple items and their properties at the same time. In my scenarios, I use a single "product" (which might have multiple "properties", of course), so a single "set" of measures / meters grabbing the said properties is enough. This makes a sequential system trivial to implement, e.g. no enabling / disabling is needed.
Anyway, besides the considerations regarding the finish actions and the potential delays, or other tricks for fully sequential access (which can be added to the bangs variable easily, although its existence is no longer necessary, since starting things for the 1st product automatically continues with the others through the finish actions), the ideal solution would involve a single request for multiple products via an API. Checking the Network tab in the browser's Developer Tools after reloading the page might reveal such a system / link - I already checked, and there are no transparent API calls in this case; besides, even if there were, the images would still be downloaded individually. Personally, I don't like redundancy and polling a site more than once, but yeah, in some cases it's unavoidable.
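To make the finish-action chaining concrete, here is a minimal sketch of how sequential access could look in a skin. The measure names, URLs, and RegExp are placeholders, not the actual skin's code; the idea is simply that each WebParser measure stays disabled until the previous one finishes, so the site only ever sees one request at a time:

```ini
; Hypothetical sketch of sequential WebParser requests via FinishAction.
[MeasureProduct1]
Measure=WebParser
URL=https://www.example.com/product1
RegExp=(?siU)<title>(.*)</title>
; When this measure finishes, enable the next one and force it to update
FinishAction=[!EnableMeasure MeasureProduct2][!CommandMeasure MeasureProduct2 "Update"]

[MeasureProduct2]
Measure=WebParser
URL=https://www.example.com/product2
RegExp=(?siU)<title>(.*)</title>
Disabled=1
FinishAction=[!EnableMeasure MeasureProduct3][!CommandMeasure MeasureProduct3 "Update"]
```

The same pattern would continue for each further product (and could extend to the image-downloading measures, with an optional [!Delay ...] before the enabling, as discussed above).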
P.S. I might adjust your code to fit my ideas later on, but I don't guarantee it.
EDIT: Careful with the calls to the site: they have a captcha and all that (it asked me twice in the browser, no shame whatsoever, lol). By the way, I get the encoding issue from the OP even with the UserAgent set, so it looks like some header / flag / codepage configuration is needed for a "by the book" retrieval. So the failed retrievals might have something to do with either the captcha or the encoding. I'll stop trying for now, in order to not make it worse or get blocked / banned and such, but I didn't abuse it anyway - just about 5 attempts so far.
I read somewhere that Amazon doesn't like being "scraped" (I guess that's the same as "parsed"), and something about Python and the Amazon API. I'm not sure how to implement that. APIs are a new thing to me and I have no idea how they work - is that free? No idea.
They also mention Amazon blocking access, and say something about the headers. How does the headers thing work in WebParser? In the examples I saw, they do it like this (in Python):
Code: Select all
custom_headers = {
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15',
'Accept-Language': 'da, en-gb, en',
'Accept-Encoding': 'gzip, deflate, br',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
'Referer': 'https://www.google.com/'
}
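For reference, WebParser can send custom request headers via its Header option (Header, Header2, Header3, and so on, one option per header line). A minimal sketch mirroring the Python dictionary above might look like the following; the measure name, URL, and RegExp are placeholders, and I've left out Accept-Encoding on the assumption that asking for compressed responses could interfere with parsing if the plugin doesn't decompress them itself:

```ini
[MeasureProductPage]
Measure=WebParser
URL=https://www.example.com/some-product-page
; Custom request headers, one Header option per line (Header, Header2, ...)
Header=User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 13_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15
Header2=Accept-Language: da, en-gb, en
Header3=Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Header4=Referer: https://www.google.com/
RegExp=(?siU)<title>(.*)</title>
```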