Mirroring Wikipedia

This page describes how to mirror Wikipedia on your own hardware. This How-To is written with MediaWiki version 1.17.0 in mind.

Installing Required Packages
On Fedora you will need to install the following:

yum install httpd mysql-server mysql php php-pdo perl-DBD-MySQL php-xml

Configuring MySQL
Add or modify the /etc/my.cnf file to include the following entry, which raises the maximum packet size so that large revisions can pass between the import scripts and the server:

max_allowed_packet=128M
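The setting belongs in the [mysqld] section of the file. A minimal fragment, assuming an otherwise default Fedora my.cnf:

```ini
# /etc/my.cnf
[mysqld]
# Allow large article revisions to be sent to the server during import.
max_allowed_packet=128M
```

Restart the server afterwards so the setting takes effect: /sbin/service mysqld restart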

Building the MediaWiki Install
To start with, we need an installed version of MediaWiki; you can find the current version at http://www.mediawiki.org/wiki/Download

cd /var/www/
wget http://download.wikimedia.org/mediawiki/1.17/mediawiki-1.17.0.tar.gz
mkdir -p wikipedia
tar -xzf mediawiki-1.17.0.tar.gz --strip-components=1 -C wikipedia
chown -R apache:apache /var/www/wikipedia

The --strip-components=1 drops the tarball's mediawiki-1.17.0/ top-level directory so the files land directly in /var/www/wikipedia. Once the files are in place, open the wiki in a browser and run the installer to create your database and generate LocalSettings.php.

Via SVN
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/CategoryTree/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/CharInsert
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/Cite/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/ExpandTemplates
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/SyntaxHighlight_GeSHi/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/Poem/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/OpenSearchXml/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/WikiEditor/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/wikihiero/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/Vector/

A list of needed extensions can be found on the MediaWiki site here: http://www.mediawiki.org/wiki/Category:Extensions_used_on_Wikimedia
 * Cite - Adds two parser hooks to MediaWiki, <ref> and <references>; these operate together to add citations to pages.
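Each checked-out extension still has to be enabled in LocalSettings.php. A small sketch that prints the needed require_once lines; it assumes each extension's entry file is named after its directory, which holds for most 1.17 extensions, but check each extension's README. Append the output to your LocalSettings.php:

```shell
# Print LocalSettings.php include lines for the extensions checked out above.
# Assumption: each extension's entry file is <Extension>/<Extension>.php.
for ext in CategoryTree CharInsert Cite ExpandTemplates SyntaxHighlight_GeSHi \
           Poem OpenSearchXml WikiEditor wikihiero Vector; do
  printf 'require_once("$IP/extensions/%s/%s.php");\n' "$ext" "$ext"
done
```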

Downloading the data
Download & Import Script (pass the dump date as its only argument, for example 20110620):

#!/bin/bash
DOWNLOADDIR=/var/www/wikipedia-downloads
MEDIAWIKIDIR=/var/www/wikipedia
SQLNAME='wikipedia_db'
DATE=$1
DIR=$DOWNLOADDIR/$DATE

mkdir -p $DIR
cd $DIR
echo "Date: $DATE"
echo "##############################################"
wget -c http://dumps.wikimedia.org/enwiki/$DATE/enwiki-$DATE-md5sums.txt
echo "Finished downloading MD5 file"
echo ""
for a in `egrep "enwiki-........-pages-articles.\.xml\.bz2|enwiki-........-pages-articles..\.xml\.bz2" enwiki-$DATE-md5sums.txt |awk '{print $2}'`
do
  echo ""
  echo "##############################################"
  echo "Working on: $a"
  echo ""
  wget -c http://dumps.wikimedia.org/enwiki/$DATE/$a
  php $MEDIAWIKIDIR/maintenance/initStats.php --update
  php $MEDIAWIKIDIR/maintenance/importDump.php $DIR/$a
  /sbin/service mysqld restart
done
echo "UPDATE site_stats SET ss_total_views = 0 WHERE ss_row_id = 1; UPDATE page SET page_counter = 0;" |mysql $SQLNAME
php $MEDIAWIKIDIR/maintenance/initStats.php --update
php $MEDIAWIKIDIR/maintenance/rebuildrecentchanges.php
echo "##############################################"
echo "Done!!!"
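The egrep pattern used in that loop is easy to sanity-check in isolation. A quick sketch with a made-up md5sums file (the checksums here are fake, for illustration only); it keeps only the pages-articles parts with one- or two-digit part numbers and drops everything else:

```shell
# Feed the filter a fake md5sums file and show which filenames survive.
cat > md5sums-sample.txt <<'EOF'
00000000000000000000000000000000  enwiki-20110620-pages-articles1.xml.bz2
11111111111111111111111111111111  enwiki-20110620-pages-articles27.xml.bz2
22222222222222222222222222222222  enwiki-20110620-pages-meta-history1.xml.7z
EOF
# Prints only the two pages-articles filenames, one per line.
egrep "enwiki-........-pages-articles.\.xml\.bz2|enwiki-........-pages-articles..\.xml\.bz2" md5sums-sample.txt | awk '{print $2}'
rm -f md5sums-sample.txt
```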

Importing the data into MediaWiki
Scripts
Script to download the needed files:

for a in 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
do
  cd /var/www/wikipedia-downloads/20110620/
  wget -c http://download.wikimedia.org/enwiki/20110620/enwiki-20110620-pages-articles$a.xml.bz2
  bzip2 -d /var/www/wikipedia-downloads/20110620/enwiki-20110620-pages-articles$a.xml.bz2
done

Script for importing the data (this can take days):

for a in 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
do
  echo "---  Working on enwiki-20110620-pages-articles$a.xml  ---"
  php /var/www/wikipedia/maintenance/importDump.php /var/www/wikipedia-downloads/20110620/enwiki-20110620-pages-articles$a.xml
  php /var/www/wikipedia/maintenance/initStats.php --update
done
php maintenance/rebuildrecentchanges.php
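Before kicking off the multi-day import, it can save grief to verify that every decompressed part is actually present. A small sketch; the directory, date, and part count of 32 are taken from the loops above, so adjust them if your dump differs:

```shell
# Report any missing decompressed dump parts before starting the import.
check_parts () {
  local dir=$1 date=$2 a missing=0
  for a in $(seq 1 32); do
    if [ ! -f "$dir/enwiki-$date-pages-articles$a.xml" ]; then
      echo "missing part $a"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all 32 parts present"
  fi
  return 0
}

check_parts /var/www/wikipedia-downloads/20110620 20110620
```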