Wikipedia:Link rot/URL change requests/Archives/2021/November


Daftar Situs / Judi Slots

I can't quite get a handle on this. It appears that an Indonesian gambling site has usurped a number of domains, and both Refill and Citation Bot are pulling in the new page title, including in one case from an archived version at the Wayback Machine. The range of usurped domains can be found by searching for pages containing either Daftar Situs or Judi slots. There are of course a number of articles using the original domain names for citations which do not show either of the telltale phrases but will go to the Indonesian gambling site on clicking the link, e.g. bcsportshalloffame.com, robinsonlibrary.com. Lyndaship (talk) 10:51, 20 October 2021 (UTC)

Fascinating discovery, User:Lyndaship. Found up to xx domains globally and 106 domains on enwiki:


I can run the bot one-time on these domains, but it raises the question of how to do it on an ongoing basis. Industrialized/automated usurpation has never been encountered before. -- GreenC 18:52, 20 October 2021 (UTC)

It's all beyond me, so whatever you can do; I will be interested to see how many instances there are. It appears to be even bigger: another word which often appears in the usurpations is Terpercaya (always in conjunction with Situs and/or Judi, I think), which shows that other domains in addition to your list above have been usurped, e.g. newctzen.com. Lyndaship (talk) 19:29, 20 October 2021 (UTC)
It's a difficult problem for sure. Even worse, we are only seeing cases where a |title= was missing, which is actually somewhat rare. It suggests there are many more that have a title and are thus invisible to detection using this method. Thank you for continuing to find more cases. -- GreenC 20:18, 20 October 2021 (UTC)

@Lyndaship and GreenC: I am tracking these (hence the template) and will ask for reports on these (the 'COIBot' reports in the links).

I know it is not going to be fun, but the suggested actions are: 1) blacklist these sites to avoid new additions, 2) convert ALL these links to archived links, 3) remove all current links without prejudice (people WILL follow the original link if it is still there and arrive at a wrong target; at worst the new target could include phishing or malware techniques, or acquire them in the future), 4) make sure you whitelist all the archive links if they include the blacklisted link (one mass request will do). (It is probably safe to do 2, 3 and 4 before 1, but if there are ongoing additions of the links then it will be a never-ending task.) And I expect that this needs to be done by members of the WikiProject that is closest to these subjects (so they know best what to do in case archives don't exist, or whatever - the links HAVE to go). --Dirk Beetstra T C 05:27, 21 October 2021 (UTC)

Results

  • 75 domains (list above)
  • Edit 1,034 pages
  • Add 1,354 archive URLs - Example, Example
  • Change 1,085 |url-status=dead or live -> |url-status=usurped - Example
  • Convert 44 {{webarchive}} to straight archive - Example
  • Convert 182 usurped titles - Example
  • Delete 256 citations in 63 articles no archive avail - Example, Example
  • Domains marked permadead in IABot database (TBD)
  • Various general fixes
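
The |url-status= change in the list above can be illustrated with a small Python sketch. This is not the bot's actual code; it assumes a single flat citation template (no nested templates) that is already known to cite a usurped domain and already has an archive URL. The function name and the sample citation are hypothetical.

```python
import re

# Matches "|url-status=dead" or "|url-status=live", allowing optional spaces.
URL_STATUS = re.compile(r"(\|\s*url-status\s*=\s*)(?:dead|live)\b")

def mark_usurped(citation: str) -> str:
    """Set |url-status= to usurped in one flat citation template string."""
    return URL_STATUS.sub(r"\1usurped", citation)

# Hypothetical citation, for illustration only.
cite = (
    "{{cite web |url=http://bcsportshalloffame.com/page "
    "|title=Example |archive-url=https://web.archive.org/web/2015/"
    "http://bcsportshalloffame.com/page |url-status=live}}"
)
fixed = mark_usurped(cite)
```

A citation without an archive URL would need different handling (in the results above those were deleted), which this sketch does not attempt.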

I believe it is done, unless Lyndaship finds more. The domains could now be blacklisted, but since the domains still exist in most citations it will cause unintended problems by tripping edit filters. I would not recommend it unless there is evidence of active spamming. Given the relatively low number of links for 75 domains, it looks like passive spamming of old links. @Lyndaship, Beetstra, and Billinghurst:. -- GreenC 17:57, 22 October 2021 (UTC)

I see you've done a huge amount of work but I fear it's not all done. Firstly, searching for pages containing Daftar Situs turns up results like Llanddowror and List of ValuJet destinations. Secondly, searching for pages containing Terpercaya produces a large number of results linked to www.leighrayment.com (previously mentioned in another thread) plus unrelated ones such as List of largest running events. Thirdly, searching for pages containing slot online produces a small number such as Visa requirements for holders of passports issued by the Sovereign Military Order of Malta. Also, searching for pages containing Judi online finds pages like Luke Hayes-Alexander. Lyndaship (talk) 18:29, 22 October 2021 (UTC)

Results (batch 2)

  • 31 new domains added to list
  • Edit 2,808 pages
  • Add 1,684 archive URLs
  • Change 2,633 |url-status=dead or live -> |url-status=usurped
  • Convert 314 {{webarchive}} to straight archive
  • Convert 28 usurped titles
  • Delete 327 citations in 270 articles no archive avail
  • Domains marked permadead in IABot database (TBD)
  • Various general fixes
  • Re-processed 2,526 Wikipedia:Link_rot/URL_change_requests#leighrayment.com URLs which resulted in a few deletions (lack of archive URL) and removal of 98 spam strings from |title= and |website=.

New domains keep showing up, perhaps 1-5 per week. Search: insource:/(Daftar Situs|Judi Slot|Terpercaya|(slot|Judi) online)/i. It's easier to process in batches, since it is the same work to process 1 as 10. Some domains have re-expired. At least one has been reclaimed by the original (or new) non-spam owner. @Lyndaship:. -- GreenC 05:38, 26 October 2021 (UTC)
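
The same keyword pattern can also be applied offline, for example to citation titles extracted from a dump. A minimal Python sketch, assuming the same alternation as the insource: query; the function name and the sample titles are made up for illustration.

```python
import re

# Same alternation as the insource: query above, case-insensitive.
SPAM_TITLE = re.compile(
    r"(Daftar Situs|Judi Slot|Terpercaya|(slot|Judi) online)",
    re.IGNORECASE,
)

def looks_usurped(title: str) -> bool:
    """Return True if a citation title contains one of the known spam phrases."""
    return SPAM_TITLE.search(title) is not None

# Hypothetical titles, for illustration only.
titles = [
    "Daftar Situs Judi Slot Online Terpercaya",
    "BC Sports Hall of Fame: Inductees",
    "Agen judi online resmi",
]
flagged = [t for t in titles if looks_usurped(t)]
```

As noted earlier in the thread, this only catches citations where the spam title was pulled in; a citation that kept its original |title= would not be flagged.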

Been watching you and your bot sort these; it would have been impossible without it. I've done a final check (for now), searching on lesser-used keywords which have appeared in the resolved ones. The following domains appear to have problems: dailyguideafrica.com, nirbhayatheplay.com, div52.org, bayankhongor.com, 1972summitseries.com, belarusianstudies.org, wikireadia.org, jpnews.org, iccnow.org, ampsforchrist.com, biomedicalstatistics.info, tvtoymemories.com and irannovinfilm.com. I'll look again in a month or so to see if any new ones have popped up. Thanks again for a super job. Lyndaship (talk) 14:38, 26 October 2021 (UTC)
Ok great. These 13 domains are in 264 pages; I'll wait for the next set before running again. Agreed, I don't see how this would be practical without a bot. I had to manually edit about 200 citation removals (mainly for hindoonnet.com), which took the better part of a day. -- GreenC 17:08, 26 October 2021 (UTC)

Lyndaship: I've created a new project page, Wikipedia:Link rot/cases/Judi, with the shortcut WP:JUDI. It will be the main page for tracking. Feel free to continue posting updates and new discoveries on this page; if you want, I can move them over. -- GreenC 16:54, 31 October 2021 (UTC)

@Beetstra and Billinghurst: in case you want to track WP:JUDI (Indonesian for "gambling"). -- GreenC 17:16, 31 October 2021 (UTC)

Venezuelan Winter League Stats (baseball) Site

Hey all,

I deal mainly in baseball player pages. I've noticed that the site most pages link to for Venezuelan stats, for example John Morris (http://www.purapelota.com/lvbp/mostrar.php?id=morrjoh001), is now at a domain that is for sale. I tracked down the new site at: John Morris (https://www.pelotabinaria.com.ve/beisbol/mostrar.php?ID=morrjoh001). There are thousands of pages in the baseball category that link to the old site.
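
The mapping between the two example URLs can be sketched in Python as follows. This covers only the one URL form shown above and assumes the player ID carries over unchanged, as it does in the John Morris example; the function name is hypothetical, and any other link forms on the old site are not handled.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

NEW_BASE = "https://www.pelotabinaria.com.ve/beisbol/mostrar.php"

def move_purapelota(url: str) -> Optional[str]:
    """Map an old purapelota.com player URL to the new site.

    Returns None if the URL is not the /lvbp/mostrar.php?id=... form.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in ("purapelota.com", "www.purapelota.com"):
        return None
    if parts.path != "/lvbp/mostrar.php":
        return None
    ids = parse_qs(parts.query).get("id")
    if not ids:
        return None
    # The new site uses an upper-case "ID" parameter in the example above.
    return f"{NEW_BASE}?ID={ids[0]}"

old = "http://www.purapelota.com/lvbp/mostrar.php?id=morrjoh001"
new = move_purapelota(old)
```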

I checked with the Wayback Machine and it appears to have the same data points, and appears to be as good a source as exists for Venezuelan Winter League stats.

Cheers! DaffydAtzinger (talk) 01:18, 31 October 2021 (UTC)

@DaffydAtzinger: Checking [2] there are about 1000 links of various forms.

2 out of 4 work, with the first form being over 95% of all links. I can move them and archive where unable to move. -- GreenC 19:25, 31 October 2021 (UTC)

@GreenC: thanks mate. much obliged. have a great one. DaffydAtzinger (talk) 19:35, 31 October 2021 (UTC)

Results

  • Edited 1,058 pages
  • Moved 1,059 links (Example)
  • Added 51 archive URLs (Example)

Each of the moved URLs was verified working, including checks for soft-404s (redirect to home page etc.). This was complicated by an aggressive bot blocking system, even on header checks. I was able to get past it. It didn't matter anyway, as everything is working as expected. I have never seen that before; with domain migrations there is always stuff that doesn't get moved. This was a clean sweep. Props to the sys admins at pelotabinaria.com.ve -- GreenC 04:25, 1 November 2021 (UTC)
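
One common soft-404 signal mentioned above, a redirect to the home page, can be checked with a heuristic like the Python sketch below. It is not the bot's actual check. It assumes the caller has already fetched the page and knows the final URL after redirects and the HTTP status; the function name is hypothetical.

```python
from urllib.parse import urlsplit

def is_soft_404(requested_url: str, final_url: str, status: int) -> bool:
    """Heuristic: a 200 response that landed on the site root
    although a deeper page was requested."""
    if status != 200:
        return False  # a hard error, not a soft 404
    req = urlsplit(requested_url)
    fin = urlsplit(final_url)
    asked_for_deep_page = req.path not in ("", "/") or bool(req.query)
    landed_on_home = fin.path in ("", "/") and not fin.query
    return asked_for_deep_page and landed_on_home

page = "https://www.pelotabinaria.com.ve/beisbol/mostrar.php?ID=morrjoh001"
home = "https://www.pelotabinaria.com.ve/"
redirected_home = is_soft_404(page, home, 200)
stayed_on_page = is_soft_404(page, page, 200)
```

Other soft-404 patterns, such as a "not found" message served with status 200, need a content check and are outside this sketch.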

afriquepanorama.com usurped


This site has been usurped. It doesn't have that many active links in mainspace (5 enwiki, 24 globally), but marking it as usurped would be great. Perryprog (talk) 15:27, 31 October 2021 (UTC)

@Perryprog: Thank you for reporting this domain. It is part of the large Wikipedia:Link_rot/URL_change_requests#Daftar_Situs_/_Judi_Slots problem. It will be added to the list and processed once enough domains are found for the next batch. The page title "Data Keluaran Togel HK Dan LiveDraw Hongkongpools" provides a new set of key terms to search on to find others. -- GreenC 15:55, 31 October 2021 (UTC)
Holy moly. Good luck, and thank you for your hard work on this. Perryprog (talk) 15:58, 31 October 2021 (UTC)
It's our old friend Daftar Situs/Judi online. There are a number of domains turning up on these keywords which have been usurped - list to follow. Lyndaship (talk) 15:57, 31 October 2021 (UTC)
Ok, found the following (the first looks quite big): abscbnpr.com, 1924.org, bbauindia.org, networkedblogs.com, sevensisterspost.com, lotteryage.com, cienfuegoscity.org and coxautodata.com. Lyndaship (talk) 16:38, 31 October 2021 (UTC)
Added to WP:JUDI. -- GreenC 16:55, 31 October 2021 (UTC)

IT and communication (jkorpela)

Jukka Korpela's website "IT and communication" was moved from www.cs.tut.fi/~jkorpela/ to jkorpela.fi in 2017, but old links stopped redirecting to the new addresses earlier this year. The broken links were fixed in the Finnish Wikipedia a couple months ago, but I noticed that there are still broken links in enwiki. Thus links beginning with "http(s)://www.cs.tut.fi/~jkorpela/" should be changed to "https://jkorpela.fi/". --TommiWalle (talk) 17:23, 7 November 2021 (UTC)
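
The requested change is a plain prefix swap, which can be sketched in Python like this. It assumes that paths after the prefix are unchanged on the new site, as the request implies. Treating a percent-encoded tilde (%7E) the same as "~" is an added assumption, and the function name and sample path are hypothetical.

```python
import re

# Old prefix with http or https; "~" may also appear percent-encoded as %7E.
OLD_PREFIX = re.compile(r"^https?://www\.cs\.tut\.fi/(?:~|%7E)jkorpela/", re.IGNORECASE)
NEW_PREFIX = "https://jkorpela.fi/"

def move_jkorpela(url: str) -> str:
    """Swap the old cs.tut.fi prefix for jkorpela.fi; other URLs pass through."""
    return OLD_PREFIX.sub(NEW_PREFIX, url, count=1)

# Hypothetical path, for illustration only.
moved = move_jkorpela("http://www.cs.tut.fi/~jkorpela/some/page.html")
```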

It is done, on 32 pages. Thanks for the report. -- GreenC 18:21, 7 November 2021 (UTC)

gendisasters.com

At some point, www3.gendisasters.com was moved to www.gendisasters.com, with 323 pages that need to be fixed. Ionmars10 (talk) 02:10, 8 November 2021 (UTC)

This is done. Edited 324 pages (one in File space). Almost everything remaining at www3 should be an archive URL. This is because when looking up the www URL at Wayback it returns the www3 one, and I didn't want to remove existing archive URLs. There were at least 3 www3 URLs that did not work at www. If you see anything else, let me know. Thanks for reporting! -- GreenC 18:40, 8 November 2021 (UTC)
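
A host swap that leaves archive URLs alone, as described above, can be sketched in Python by anchoring the pattern at the start of the URL, so a www3 address embedded inside a Wayback URL is not touched. This is an illustration rather than the bot's code; the function name and sample paths are hypothetical.

```python
import re

# Anchored at the start, so "https://web.archive.org/web/.../http://www3..."
# does not match and the archive URL is left as-is.
WWW3 = re.compile(r"^(https?://)www3\.gendisasters\.com(?=[/:?#]|$)", re.IGNORECASE)

def move_gendisasters(url: str) -> str:
    """Rewrite www3.gendisasters.com to www.gendisasters.com in a live URL."""
    return WWW3.sub(r"\1www.gendisasters.com", url)

# Hypothetical paths, for illustration only.
live = move_gendisasters("http://www3.gendisasters.com/some/story")
archived = move_gendisasters(
    "https://web.archive.org/web/2015/http://www3.gendisasters.com/some/story"
)
```

The few www3 URLs that did not work at www would still need a liveness check after the rewrite, which this sketch does not do.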

pifeedback.com

While fixing a large disambiguation link set, I discovered that this domain has been usurped by a gaming site. Diff: Special:Diff/1057945556. – robertsky (talk) 15:47, 30 November 2021 (UTC)

robertsky, thanks; added to the queue at Wikipedia:Link rot/cases/Judi to be marked as usurped. -- GreenC 16:51, 30 November 2021 (UTC)