As part of the fundraising drive last Dec (not last Dec, but last-last Dec... you know what I mean), I promised to import some of the old wotmania messageboards that Tor and I scraped from the site in its last few months.
It's taken me far, faaar longer than anticipated (approx 45 hours over the last two weeks alone), mostly due to the appalling state of the HTML I was trying to extract the data from. I also ended up needing to re-write most of the messageboard functionality to allow the archive posts to show properly.
You can view the archived boards in the new "Archive" section of the menu over on the left.
- [CMB2](/archive/cmb2/)
- [CMB3](/archive/cmb3/)
It's not possible to show the archive boards in the same way as the current ones, mostly because there are no "recent" posts to display. Instead, the core homepage allows you to do some basic filtering of top-level threads, whereas the search page provides much more in-depth functionality to find specific posts.
There are a couple of things to watch out for:
- The base filter form at the top of the board page won't allow a date range of more than 31 days. This is to stop the server from dying a fiery death as it tries to process a LOT of data. Even a full month's worth of posts takes a good few seconds to load. Keeping date ranges small is a good way to keep the website running quickly.
- If you have a premium account, you'll be able to use the "favourite post" functionality on the archived boards in the same way as you can on the normal boards.
- I've added the ability for comments/new posts to be added to these old threads. The comment form is at the bottom of the page when viewing a post. It works in the same way as comments on quickpolls/journals/bugs/etc; just a single unthreaded list of comments for the whole thread. You can use the "order" option on the filter form to order threads by the number of "new" replies they have.
- The way in which we saved the posts from wotmania means that on posts with over 100 responses, the sub-threading has been lost. The data I have allows me to identify the top-level responses within a particular thread, but not the sub-threading thereafter. The affected replies are instead shown with an ![exclamation](/site_media/images/messageboard/mbtips.gif) icon next to them, to show that they should be in a sub-thread.
- The quality of Mike's HTML was terrible, and, even worse, was different depending on the contents of the post. This means that I needed to extract data in a number of different ways, depending on just how badly messed up the code was. The end result of this is that the formatting of many posts just isn't quite right. In particular, edit links show even though they don't work, post links are shown twice, and there tends to be a lot of extra space at the end of the post. Signatures are also poorly formatted, and tend to be a lot bigger than they should be.
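To give a flavour of what "extracting data in a number of different ways" means, here's a minimal sketch in Python. It is only an illustration, not the actual import code: the function names, the regex patterns and the markup they match are all made up for the example. The idea is simply to try the strictest extraction first and fall back to cruder ones when the HTML doesn't cooperate.

```python
import re
from typing import Optional

# NOTE: every pattern below is hypothetical. The real wotmania markup
# is not reproduced here; these just stand in for "tidy", "messy" and
# "hopeless" variants of a scraped post page.


def extract_from_div(html: str) -> Optional[str]:
    """Tidy case: the post body sits inside a clearly marked <div>."""
    match = re.search(r'<div class="post">(.*?)</div>', html, re.DOTALL)
    return match.group(1).strip() if match else None


def extract_from_table_cell(html: str) -> Optional[str]:
    """Messier case: the body is the content of the first table cell."""
    match = re.search(r"<td[^>]*>(.*?)</td>", html, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def extract_raw(html: str) -> Optional[str]:
    """Last resort: strip every tag and keep whatever text is left."""
    text = re.sub(r"<[^>]+>", "", html).strip()
    return text or None


# Ordered from strictest to crudest.
EXTRACTORS = [extract_from_div, extract_from_table_cell, extract_raw]


def extract_post_body(html: str) -> Optional[str]:
    """Return the result of the first extractor that finds anything."""
    for extractor in EXTRACTORS:
        body = extractor(html)
        if body is not None:
            return body
    return None
```

A last-resort step like `extract_raw` keeps whatever text survives, which is one plausible way for stray link text and extra whitespace to end up in an imported post.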
Still, it's good enough for now. If anyone's interested in a more in-depth explanation of how we scraped the data in the first place, and how I went about extracting it and importing it into RAFO, let me know. It's all terribly exciting.
As a final note, you may notice that there is a big gap between the final post on CMB2 (14th Nov 2003) and the first post on CMB3 (9th Apr 2004). This is because, as per [Mike's first post](/archive/cmb3/1/), the original CMB3 got corrupted and was wiped out.
CrazedWeasel
"Do not waste time bothering whether you "love" your neighbor; act as if you did...When you are behaving as if you loved someone you will presently come to love him."-- C. S. Lewis