[Feature] Extra Feature for Webparser

Post by **tass_co** » July 6th, 2022, 10:13 pm

Hi everyone

As you know, WebParser can't get data from some sites.
RARBG is one example. Also, its RSS feed is not working properly.
For sites like this, I save the page as HTML using a browser plugin, and then I can get the data I want with WebParser.
This is a somewhat tedious workaround.

My suggestion: WebParser should be able to save the page as HTML or TXT.

Only the source code (without audio, images, etc.).

Thank you

Post by **Jeff** » July 7th, 2022, 5:02 pm

tass_co wrote: ↑July 6th, 2022, 10:13 pm My suggestion, Webparser should be able to save the page as HTML or TXT :thumbup:
Only source code (without audio/image etc)

Rainmeter already does that.
That's why Download and DownloadFile exist.
If that doesn't work, you can always do ["""curl "example.com" > file.txt"""] to basically do the same thing.
And if you're more advanced than that, there's Invoke-WebRequest in PowerShell. I think PowerShell is the way to go since it allows you to get data after a site gets modified by JavaScript or PHP (so something like the YouTube home page). It's not quite clear from your post what the problem is specifically, so I'm assuming it's this.

Could you provide the measures so we can diagnose your problem further? This could just be a Rainmeter Help topic instead of Bug & Suggestion.
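
For comparison, the "save only the source" idea is small in any scripting language. Here is a minimal Python sketch (not Rainmeter code; `save_page_source` is a hypothetical name) that fetches just the document at a URL and writes it to a file, roughly what the curl one-liner does. Like curl, it does not run JavaScript, so it stores the page as the server sends it.

```python
import urllib.request
from pathlib import Path


def save_page_source(url: str, dest: str, timeout: float = 30.0) -> int:
    """Download the raw document at `url` and write it to `dest` unchanged.

    Only the document itself is requested; images, audio and scripts that
    the page links to are never fetched. Returns the number of bytes written.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
    return Path(dest).write_bytes(body)


if __name__ == "__main__":
    # example.com is a placeholder, as in the curl command above.
    written = save_page_source("https://example.com/", "file.txt")
    print(f"saved {written} bytes")
```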

Post by **tass_co** » July 7th, 2022, 5:58 pm

Jeff wrote: ↑July 7th, 2022, 5:02 pm Rainmeter already does that.
That's why Download and DownloadFile exist.
If that doesn't work, you can always do ["""curl "example.com" > file.txt"""] to basically do the same thing.
And if you're more advanced than that, Invoke-WebRequest in PowerShell. I think PowerShell is the way to go since it allows you to get data after a site gets modified by JavaScript or PHP (so something like the YouTube home page). Not quite clear from your post what the problem is specifically, I'm assuming it's this.

Could you provide the measures so we can do more diagnosis on your problem? This can just be a Rainmeter Help topic instead of Bug & Suggestion.

I'm sorry. I did as you said with Debug. Thanks

Post by **Yincognito** » July 27th, 2022, 8:38 am

Also, the RSS is working properly, but it depends on the link you use, on whether it's a plain RSS or ATOM, and of course on whether the code you write to extract info from it is correct and flexible enough...
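
As a rough illustration of what "flexible enough" can mean here, the sketch below reads item titles from either a plain RSS 2.0 feed or an Atom feed. It is written in Python rather than as a WebParser RegExp, purely to show the structural difference between the two formats; the function name `feed_titles` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace; plain RSS 2.0 elements have none.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_titles(xml_text: str) -> list:
    """Return the entry titles of an RSS 2.0 or Atom feed, in document order."""
    root = ET.fromstring(xml_text)
    if root.tag == ATOM_NS + "feed":
        # Atom: <feed><entry><title>...</title></entry></feed>
        entries = root.findall(ATOM_NS + "entry")
        title_tag = ATOM_NS + "title"
    else:
        # RSS 2.0: <rss><channel><item><title>...</title></item></channel></rss>
        entries = root.findall("./channel/item")
        title_tag = "title"
    return [(entry.findtext(title_tag) or "").strip() for entry in entries]
```

A RegExp written for one layout (`<item>` blocks) will not match the other (`<entry>` blocks), which is one reason the same code can work for one link and fail for another.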

Post by **tass_co** » July 27th, 2022, 5:16 pm

Yincognito wrote: ↑July 27th, 2022, 8:38 am Also, the RSS is working properly, but it depends on the link you use, on whether it's a plain RSS or ATOM, and of course on whether the code you write to extract info from it is correct and flexible enough...

Unfortunately, the problem is not RegExp-related.
During the day I get the following code (also in the browser), with no data in it:

<rss version="2.0">
<channel>
<title>RARBG</title>
<description>RARBG rss feed direct download</description>
<link>https://rarbg.to</link>
<lastBuildDate>Wed, 27 Jul 2022 19:04:41 CEST</lastBuildDate>
<copyright>(c) 2022 RARBG</copyright>
<item>

If I try 10 times, this happens 5-6 times.
For this reason, I tried saving the page instead.
But that didn't go very well either.
Now I save the page from the browser and process it.
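
One way to cope with a feed that intermittently comes back without items is to check the response and retry. The following is a minimal Python sketch, not a WebParser feature; `has_items`, `fetch_until_populated` and the `fetch` callable are hypothetical names, and it assumes an "empty" response is one with no `<item>` that has a title (or one that does not parse at all).

```python
import time
import xml.etree.ElementTree as ET


def has_items(xml_text: str) -> bool:
    """True if the feed parses and contains at least one <item> with a title."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # A truncated or malformed response counts as "no data".
        return False
    return any(item.findtext("title") for item in root.iter("item"))


def fetch_until_populated(fetch, attempts: int = 10, delay: float = 60.0):
    """Call fetch() up to `attempts` times; return the first populated feed.

    `fetch` is any zero-argument callable returning the feed text.
    Returns None if every attempt came back empty.
    """
    for attempt in range(attempts):
        text = fetch()
        if has_items(text):
            return text
        if attempt < attempts - 1:
            time.sleep(delay)  # wait before asking the server again
    return None
```

Keeping the download step behind a `fetch` callable means the same check works whether the text comes from a URL or from a file saved by the browser.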

Post by **Yincognito** » July 27th, 2022, 7:29 pm

tass_co wrote: ↑July 27th, 2022, 5:16 pm Unfortunately, the problem is not RegExp related.
During the day I get the following code (also from the browser).
no data...
<rss version="2.0">
<channel>
<title>RARBG</title>
<description>RARBG rss feed direct download</description>
<link>https://rarbg.to</link>
<lastBuildDate>Wed, 27 Jul 2022 19:04:41 CEST</lastBuildDate>
<copyright>(c) 2022 RARBG</copyright>
<item>
If I try 10 times, it does 5-6 times.
For this reason, I tried the way to save the page.
But that wasn't very positive either.
Now I save the page from the browser and process it.

So, let me get this straight: is the feed legitimately not populated during the day, OR it's populated but you get it unpopulated via WebParser (unless you save the page from the browser as you mentioned)? Also, does saving the page from the browser get you the populated version of the feed (or in other words, does that solve the problem)?

Post by **tass_co** » July 27th, 2022, 9:15 pm

Yincognito wrote: ↑July 27th, 2022, 7:29 pm So, let me get this straight: is the feed legitimately not populated during the day, OR it's populated but you get it unpopulated via WebParser (unless you save the page from the browser as you mentioned)? Also, does saving the page from the browser get you the populated version of the feed (or in other words, does that solve the problem)?

WebParser works fine. The problem is that the RSS feed is not stable during the day.
Sometimes it works, sometimes not. When it doesn't work, it doesn't work in the browser either.
Therefore, I open the page normally in a web browser (not in RSS format) and save it.
Then I get the data from the saved file using WebParser.

Post by **Yincognito** » July 27th, 2022, 10:43 pm

tass_co wrote: ↑July 27th, 2022, 9:15 pm The webparser works fine. The problem is that RSS is not stable during the day.
Sometimes works, sometimes not. If it doesn't work, it doesn't work in the browser either.
Therefore, I open the page in a web browser normally (not in RSS format) and save the page.
Then I get the data from the saved file using webparser.

But when it works in the browser, does it also work with WebParser - the RSS I mean? Or is it that you actually save the main page instead of the RSS page maybe?

Post by **tass_co** » July 27th, 2022, 11:39 pm

Yincognito wrote: ↑July 27th, 2022, 10:43 pm But when it works in the browser, does it also work with WebParser - the RSS I mean? Or is it that you actually save the main page instead of the RSS page maybe?

I'm sorry, I guess I didn't explain fully.

I open the page normally in the browser and save it (HTML format): https://proxyrarbg.org/torrents.php
This is the only way I can get correct results.
I have to do it this way because WebParser gets stuck at the security check.
28-07-2022 02_16_39-Mozilla Firefox (Gizli Gezinti).png

The RSS does not always work correctly. I mostly get the blank RSS page code I posted above.
RSS page: https://proxyrarbg.org/rssdd.php

Also, there is no point in using it because there is no topic link. It includes the magnet link only.

28-07-2022 02_35_00-Mozilla Firefox.png

I don't use the curl command (or its derivatives) either, because it also gets stuck at the security check.

We've talked to balala about UserAgent before: https://docs.rainmeter.net/manual/measures/webparser/#UserAgent
Maybe the security check can be passed with the UserAgent option, but we don't know how to use it.
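
For reference, at the HTTP level a user agent is just a request header that tells the server what client is asking. The Python sketch below only shows how such a header is attached to a request; `build_request` is a hypothetical helper and the Firefox-style string is an example value, not something taken from this thread. Whether a browser-like value gets past a given site's check is not something this sketch can promise, since such checks often depend on more than this one header.

```python
import urllib.request

# Example value only: a browser-style user agent string.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) "
    "Gecko/20100101 Firefox/102.0"
)


def build_request(url: str, user_agent: str = BROWSER_UA) -> urllib.request.Request:
    """Return a request for `url` that announces itself with `user_agent`."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    request = build_request("https://example.com/")
    with urllib.request.urlopen(request, timeout=30) as response:
        print(response.status, len(response.read()))
```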

Post by **Yincognito** » July 28th, 2022, 9:24 am

tass_co wrote: ↑July 27th, 2022, 11:39 pm I'm sorry, I guess I didn't fully explain.
I open and save normally in the browser (html format). https://proxyrarbg.org/torrents.php

RSS does not always work correctly. I mostly get the blank rss page code I posted above. RSS page https://proxyrarbg.org/rssdd.php
[...]
Also, there is no point in using it because there is no topic link. Includes magnet link only.

Ah, ok, I understand now. You access and save a different page than the RSS one in the browser, because the former is more complete than the latter (but only when accessed from the browser). Yeah, it happens on some sites: all kinds of scripts grant access only after a certain routine is followed. That's why the downloaded page contains what you need while the other one does not.

I'm not sure whether having magnet links only is a problem, but if you're talking about being unable to get the RSS "by topic" aka by category, you just didn't look hard enough, because it's possible. For example, categories 2, 23, 24, 25 and 26 are apparently (see the link address) for "Top Music", and similarly, other categories are for other stuff. You only have to check how the links for a category look on the main page, and replace category or category[] with categories when adding it to your https://rarbg.to/rssdd.php? URL path (example: https://...rarbg.org/torrents.php?category=2;23;24;25;26 becomes https://rarbg.to/rssdd.php?categories=2;23;24;25;26 in your RSS link).
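
The rewrite described above can be sketched in a few lines of Python. This is only an illustration: `category_url_to_rss` is a hypothetical helper, and joining repeated `category[]` values with semicolons is an assumption based on the single example given, not something confirmed for the site.

```python
from urllib.parse import urlsplit, unquote

RSS_BASE = "https://rarbg.to/rssdd.php"  # RSS path quoted in the example above


def category_url_to_rss(page_url: str, rss_base: str = RSS_BASE) -> str:
    """Turn a torrents.php category link into the matching rssdd.php link."""
    ids = []
    for pair in urlsplit(page_url).query.split("&"):
        key, _, value = pair.partition("=")
        # Accept both spellings seen on the main page: category and category[].
        if unquote(key) in ("category", "category[]"):
            ids.extend(part for part in unquote(value).split(";") if part)
    if not ids:
        return rss_base  # no category filter found: plain feed
    # Assumption: all ids go into one semicolon-separated "categories" value.
    return rss_base + "?categories=" + ";".join(ids)
```

Called with a page link ending in `torrents.php?category=2;23;24;25;26`, it returns `https://rarbg.to/rssdd.php?categories=2;23;24;25;26`.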

Unfortunately, apart from basic categorizing, further filtering, sorting or getting a certain "page" of the results is something that is either not possible or not known yet. But who knows, if you look around or experiment with query variations enough, maybe you'll find out. Heck, I even found a PowerShell script that got such info, and other users wanting to get the same thing you do for use in a Rainmeter skin.
