Wikipedia:Link rot/URL change requests/Archives/2023/May


It seems many sources of this use WebCite and Wayback Machine links (WebCite archives no longer work, and FC has been excluded from Wayback due to robots.txt). Only their archive.today links work. Hope these can replace the existing non-working links. Kailash29792 (talk) 05:48, 27 April 2023 (UTC)

Yeah, OK. WaybackMedic can detect bad archive links and search for better ones. I'll run it through soon and see how it does. Thanks! -- GreenC 14:14, 27 April 2023 (UTC)
This will take another day or so. So far it's getting about a 10% hit rate, i.e. 1 in 10 cites to TFC have a Wayback URL. -- GreenC 21:05, 28 April 2023 (UTC)
Done. Example: Special:Diff/1152698626/1152714898 -- GreenC 12:15, 2 May 2023 (UTC)

https://en.wikipedia.org/w/index.php?title=Special:LinkSearch&limit=500&offset=0&target=http%3A%2F%2Fznaci.net (214 links to update) –Vipz (talk) 04:03, 3 May 2023 (UTC)

Will do. Each link will be verified as actually migrated, i.e. this is more than a search-and-replace. -- GreenC 04:09, 3 May 2023 (UTC)
User:Vipz, done. Example: Special:Diff/1153279062/1153744731. -- GreenC 03:00, 8 May 2023 (UTC)

Discovery Kids and Familia link rot

I don't know if this is the right place to put this, but these Discovery Kids links (discoverykidsbrasil.uol.com.br, tudiscoverykids.com and discoverykidsplay.com) are downloading strange files, and the Discovery Familia link (tv.discoveryfamilia.com) just redirects to the Warner Bros. Discovery page. I already fixed them on enwiki; maybe add these websites to the IABot blocklist? Thanks. Notrealname1234 (talk) 02:47, 6 May 2023 (UTC)

User:Notrealname1234, thank you for adding archives on enwiki. They are all now permadead/blacklisted in IABot. -- GreenC 03:08, 8 May 2023 (UTC)

epaperbeta.timesofindia.com

It seems the site no longer works, but has archives like this. Kailash29792 (talk) 06:45, 6 May 2023 (UTC)

Done. There are a lot of {{dead link}} tags for some reason; maybe 40 or 50% have no archives available. -- GreenC 14:44, 8 May 2023 (UTC)

Usurped url

I have found that realtvnews.com.ar has been usurped. I already changed the link on the Boomerang (Latin America) page; please change the other links. Thanks, Notrealname1234 (talk) 23:06, 11 May 2023 (UTC)

Done. -- GreenC 01:04, 12 May 2023 (UTC)

Broken link

jetixeurope.com is another broken link; it is spread across Jetix pages. Please change them. Thanks, Notrealname1234 (talk) 16:26, 12 May 2023 (UTC)

Done. -- GreenC 19:39, 12 May 2023 (UTC)

When fixing dead links in A Sister's All You Need, I noticed some pages from http://www.b2c.hachettebookgroup.com/ are now at https://yenpress.com/

Example:

PianoPoet (talk) 06:55, 15 March 2023 (UTC)

~50 pages; I can probably write a semi-automated script for this with regex. It's likely best to do a raw regex replacement on the url parameter, remove the url-status, archive-url, etc. parameters, and run IABot to re-add those. If someone can tackle this before I get a chance, feel free. EpicPupper (talk) 20:27, 11 April 2023 (UTC)
It's already done, sorry I didn't post a reply earlier. -- GreenC 22:48, 11 April 2023 (UTC)
Actually, I see a problem. I thought there would be a redirect, and since none existed I didn't do the conversion (I only treated it as a dead domain), but I now see it is possible to do the conversion with info from the old URL. The method you gave might work, but it's awkward with a custom script since there are CS1|2 templates, square-bracket and bare URLs to deal with, plus {{dead link}} templates to manage, plus checking whether the new URL works before replacement. That's part of what WaybackMedic manages, among other things. I'll re-run this with the conversion to yenpress. -- GreenC 23:02, 11 April 2023 (UTC)
@GreenC any updates on this? Thanks! Frostly (talk) 23:37, 12 May 2023 (UTC)
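For illustration only, here is a minimal sketch of the regex approach EpicPupper outlines, assuming the citation is a single CS1 template with a |url= parameter. The function name `rewrite_cite` is hypothetical, and the host-only swap is a simplifying assumption: as GreenC notes, the real yenpress.com path has to be derived from the old URL, and square-bracket URLs, bare URLs, {{dead link}} tags and liveness checks are not handled here.

```python
import re

OLD_HOST = "b2c.hachettebookgroup.com"
NEW_HOST = "yenpress.com"

# Archive-related CS1 parameters to drop so IABot can re-add them later.
STALE_PARAMS = ("archive-url", "archive-date", "url-status")

# "|url=<value>" up to the next "|" or "}}" (does not match |archive-url=).
URL_PARAM = re.compile(r"(\|\s*url\s*=\s*)([^|}]*)")
OLD_URL = re.compile(r"https?://(?:www\.)?" + re.escape(OLD_HOST))


def rewrite_cite(template: str) -> str:
    """Swap the host in |url= and strip stale archive parameters.

    Hypothetical helper: host-only swap (assumed path mapping) and
    no check that the new URL is actually live.
    """
    m = URL_PARAM.search(template)
    # Leave citations whose |url= is not on the old host untouched.
    if not m or not OLD_URL.match(m.group(2).strip()):
        return template
    new_value = OLD_URL.sub("https://" + NEW_HOST, m.group(2), count=1)
    template = template[: m.start(2)] + new_value + template[m.end(2):]
    for name in STALE_PARAMS:
        # Remove "|name=value" up to the next "|" or "}}".
        template = re.sub(r"\|\s*" + re.escape(name) + r"\s*=[^|}]*", "", template)
    return template
```

Running IABot afterwards would then re-add archive parameters for the new URL, which is the step EpicPupper suggests delegating rather than scripting.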

Twitter may kill accounts inactive for over 30 days - archive bot needed for twitter.com usage

Twitter just changed its rules to say that accounts that aren't logged into for 30 days are liable to be deleted. I've seen claims that some accounts have already been deleted.

We have a tremendous number of Twitter links in references - 57,972 articles have twitter.com links as I write this. Could someone please point an archive bot at them?

There's also {{cite tweet}}, but I understand that's already being attended to by an archive bot.

It's possible that Twitter will backtrack on this rule. But we should archive this stuff anyway - Twitter is already a much more fragile and unreliable platform than it was in October 2022. - David Gerard (talk) 19:14, 12 May 2023 (UTC)

When you say "we should archive", do you mean find all the links on Wikipedia that point to Twitter and add them to the Wayback Machine? Or find all those links on Wikipedia, find Wayback Machine links for them, and add those to Wikipedia? The former is done automatically by another process. The latter would be treated like any dead link: it would die, and the bots would detect it and add an archive URL. Generally, the Wayback Machine has excellent coverage of well-known sites, and if there are dead links the bots will add archive URLs naturally through normal processing. Since this is an ongoing thing (30-day account deletions), we want it to work automatically so we don't have to do special runs every 30 days. New Twitter links are added to Wikipedia every day, so any bespoke archiving effort now would miss those future additions. -- GreenC 20:22, 12 May 2023 (UTC)
Does a bot have to be made for this? @GreenC Notrealname1234 (talk) 20:31, 12 May 2023 (UTC)
does a bot have to be made for this? @GreenC Notrealname1234 (talk) 20:31, 12 May 2023 (UTC)
No - everything working as normal. GreenC 20:33, 12 May 2023 (UTC)
I forgot about IABot. Maybe finding Twitter links and adding them to the Wayback Machine would be a good idea, since most tweets are not archived and some of them are blocked (see https://wiki.archiveteam.org/index.php/List_of_websites_excluded_from_the_Wayback_Machine/Partial_exclusions/Twitter_accounts); use archive.today for those. @GreenC Notrealname1234 (talk) 20:37, 12 May 2023 (UTC)
most tweets are not archived - links added to Wikipedia get archived, including Twitter. -- GreenC 21:01, 12 May 2023 (UTC)
Oh, OK. If some tweets are blocked from the Wayback Machine, please use archive.today for them (see the link I sent above). @GreenC Notrealname1234 (talk) 21:40, 12 May 2023 (UTC)
Tweets aren't archived in IA unless and until someone (or some bot) happens to push the button. I admit I have no idea how well covered in IA our links to Twitter are! Also, a lot of IA links to tweets fail to retrieve the tweets correctly in my experience - may be worth doing an archive.today copy too. (I admit I'm suggesting work to others I've little idea how to set up.) - David Gerard (talk) 22:36, 12 May 2023 (UTC)
There are processes, created and managed by the Internet Archive, that monitor every edit to Wikipedia (all projects and languages), look for URLs, check whether each URL is in the Wayback Machine, and if not, add it. They don't skip Twitter. -- GreenC 03:07, 13 May 2023 (UTC)
Courtesy ping to @TheresNoTime who has an open BRFA relating to the matter. Frostly (talk) 23:39, 12 May 2023 (UTC)
@Frostly, that user is asleep. Notrealname1234 (talk) 01:26, 13 May 2023 (UTC)
Then "find all those links on Wikipedia, and find Wayback Machine links for them, and add those to Wikipedia" is a good idea. @GreenC Notrealname1234 (talk) 01:10, 14 May 2023 (UTC)
Created a Phabricator ticket requesting IABot support for Cite Tweet. Frostly (talk) 20:30, 14 May 2023 (UTC)
Any updates? Notrealname1234 (talk) 21:56, 17 May 2023 (UTC)
I am running a similar bot at NLwiki for all kinds of websites that articles link to. Its primary goal is to archive linked websites before they become dead. Even with InternetArchiveBot running, my bot still saves websites to the Wayback Machine: see example edits mentioning "# link(s) gearchiveerd". It adds the archive links to the articles too, but I can switch this off here if that is unwanted. Wikiwerner (talk) 13:00, 29 May 2023 (UTC)
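As a rough illustration of the check-then-save step GreenC describes (look up a URL in the Wayback Machine, and request a capture if none exists), here is a minimal sketch. It assumes the public Wayback availability endpoint (archive.org/wayback/available) and the Save Page Now URL form (web.archive.org/save/<url>); it is not how the Internet Archive's own processes or IABot are implemented, and it omits authentication, rate limiting, retries and the archive.today fallback mentioned above. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

AVAILABILITY_API = "https://archive.org/wayback/available?url="
SAVE_PREFIX = "https://web.archive.org/save/"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the closest snapshot URL from an availability-API JSON body, or None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def save_url(link: str) -> str:
    """Build a Save Page Now request URL for a link (assumed URL form)."""
    return SAVE_PREFIX + link


def check_and_save(link: str) -> str:
    """Reuse an existing snapshot if there is one; otherwise request a capture.

    Network step only: no retries, auth or rate limiting (assumption-level sketch).
    """
    query = AVAILABILITY_API + urllib.parse.quote(link, safe="")
    with urllib.request.urlopen(query, timeout=30) as resp:
        body = resp.read().decode("utf-8")
    snapshot = closest_snapshot(body)
    if snapshot:
        return snapshot
    request = save_url(link)
    urllib.request.urlopen(request, timeout=120).close()
    return request
```

Keeping the JSON parsing separate from the network call means the decision logic can be checked offline against a saved response.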