Winamp Developer Wiki:Community Portal
Welcome to the community portal. This is the place to find out what is happening on the Winamp Wiki! Learn what tasks need to be done and share news about recent events or current activities.
Dealing with wiki spam
As I'm sure you've noticed, this wiki seems to get a fair amount of spam. How can we deal with this? A quick Google search turned up the following pages:
- Basic anti-spam features of mediawiki installations (a neat overview)
- More overview
- Several tips for stopping spam pages with divs
Do you think we could use any of these tips to cut down our spam count? For example, many of the spam pages seem to use the same div tag to prevent viewing of the 'edit' tab. Could we put this tag in the spam regex in our local settings?
I'm not sure if any sysops have time to work on this right now, but I thought I would start a page for discussion. Thanks for reading. --Culix 06:57, 19 June 2009 (UTC)
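As a rough illustration of the spam-regex idea, here is a minimal Python sketch of a pattern that flags a div whose inline style sets `position:absolute`, the text that the reply below identifies as hiding the edit tab. The names `SPAM_DIV` and `looks_like_spam` are hypothetical, and the pattern is only an assumption about what the spam template looks like; it would need to be adapted to PHP regex syntax before going into MediaWiki's `$wgSpamRegex`.

```python
import re

# Hypothetical pattern: matches a <div> whose inline style attribute
# contains "position:absolute" (whitespace and letter case may vary).
SPAM_DIV = re.compile(
    r"<div[^>]*style\s*=\s*[\"'][^\"']*position\s*:\s*absolute",
    re.IGNORECASE,
)


def looks_like_spam(wikitext: str) -> bool:
    """Return True if the page text contains the suspicious div."""
    return SPAM_DIV.search(wikitext) is not None
```

A filter like this would only raise the bar: a spammer who changes the template slightly would get past it, and a legitimate page that uses absolute positioning would be blocked.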
- Okay, I tried a basic test - I think it is the div text 'position:absolute;' that is hiding the edit tab (or at least helping). If you look at this page before the edit, I am unable to see the edit tab in Firefox 3.0.11. I am, however, able to view the edit tab on this page after the edit. The only difference is that the text 'position:absolute;' was removed from the div tag.
- Could we use this text to perhaps prevent spammers from saving the page? Or is such an action futile if they just quickly change their spam template? It might raise the bar a little bit. --Culix 07:09, 19 June 2009 (UTC)
- So it looks like most of our pages are spam rather than real content :( I think we need a way to deal with these quickly rather than trying to combat them all by hand. How do you feel about creating a bot account and giving it access to delete pages? We could use something like this deletion-helper bot to go through the pages. There are some instructions for using the bot on non-wikimedia projects. I'm willing to try and use the bot if admins are okay with me creating an account for it. --Culix 13:01, 26 June 2009 (UTC)
- Okay, I manually deleted enough pages to make most of the first 50 Popular pages point to actual content. With 40,000 pages in the wiki though, it looks like 99% of them are spam, and dealing with all of those by hand would be tedious. After some off-wiki discussion with Gistbane, we have a battle plan: some filters will be added to the wiki's blacklist regex, and Gistbane set up an anti-spam bot account to help delete pages.
- Based on my first test, it looks like it takes about 2.5 hours to delete 1000 pages using the deletion script, so that's roughly 100 hours of running the script if we want to clean the whole wiki. This may take a while, but I'll try to run the script for a few hours every day and see how it goes. --Culix 13:08, 30 June 2009 (UTC)
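The running-time estimate above is simple proportional arithmetic, and can be sketched as follows. The function name `estimated_hours` and its parameters are hypothetical; the figures (about 2.5 hours per 1000 pages, roughly 40,000 pages) come from the comments above, and the sketch assumes the deletion rate stays constant.

```python
def estimated_hours(total_pages, hours_per_batch=2.5, batch_size=1000):
    """Scale the measured time for one batch up to the whole wiki."""
    return total_pages / batch_size * hours_per_batch


# About 40,000 pages at 2.5 hours per 1000 pages.
print(estimated_hours(40_000))
```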
Okay, it took 55 hours of running the bot, but 20,206 spam pages have been deleted :) I think this is most of them, not counting user pages (counting user pages it's only about 50%).
Methodology: I started by going through Special:AllPages and made a big list of every page on the wiki. I searched through these by hand to try and spot real pages to make sure no real content would be deleted, even if it was an orphan. Everything else was put into 'the spam list'. I then started from the Main Page and spidered through all real links to actual content (about 100 pages). I cross-referenced this list with the spam list to make sure no known good pages were marked as spam.
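The cross-referencing step described above can be sketched as a set difference: everything on the wiki, minus the pages spotted by hand and the pages reachable from the Main Page. This is only an illustration, not the script that was actually used; the function names, the `links` dictionary, and the sample page titles are all hypothetical, and the link map is assumed to contain only links that lead to real content.

```python
from collections import deque


def reachable_pages(links, start):
    """Breadth-first walk over a page -> linked-pages mapping."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def build_spam_list(all_pages, hand_checked_real, links, start="Main Page"):
    """Return every page that is neither hand-checked nor reachable."""
    known_good = set(hand_checked_real) | reachable_pages(links, start)
    return sorted(set(all_pages) - known_good)


# Hypothetical sample data for illustration only.
links = {"Main Page": ["Plugins"], "Plugins": ["SDK"]}
all_pages = ["Main Page", "Plugins", "SDK", "Orphan FAQ", "Cheap pills", "Buy now"]
print(build_spam_list(all_pages, ["Orphan FAQ"], links))
```

Keeping the hand-checked list separate from the spidered list means an orphan page with real content survives even though nothing links to it.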
I think my method worked alright, but if you know of a legitimate page that was deleted please let me know! Seeing as the wiki was 0.25% actual content, it's quite possible a page or two was missed ;)
If no one has objections I am also going to run the bot against all redirects that are user pages. I see exactly one user page that is a legitimate redirect, and 20,000+ that point to (now nonexistent) spam pages. --Culix 07:23, 8 July 2009 (UTC)