Mirroring Wikipedia
This page describes how to mirror Wikipedia on your own hardware. This how-to was written with MediaWiki version 1.17.0 in mind.
Installing Required Packages
On Fedora you will need to install the following:
yum install httpd mysql-server mysql php php-pdo perl-DBD-MySQL php-xml
Configuring MySQL
Edit /etc/my.cnf and add or modify the following entry in the [mysqld] section:
max_allowed_packet=128M
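The same edit can be scripted. The following is a minimal sketch rather than a hardened tool: set_max_allowed_packet is a hypothetical helper name, it assumes GNU sed, and it assumes the setting does not already appear under some other section of the file.

```bash
#!/bin/bash
# Hypothetical helper: set max_allowed_packet in a my.cnf-style file.
# Assumes GNU sed, and that any existing max_allowed_packet line
# belongs to the [mysqld] section.
set_max_allowed_packet() {
    local cnf="$1" value="$2"
    if grep -q '^max_allowed_packet' "$cnf"; then
        # The setting already exists: replace its value in place.
        sed -i "s/^max_allowed_packet.*/max_allowed_packet=$value/" "$cnf"
    elif grep -q '^\[mysqld\]' "$cnf"; then
        # The section exists: add the setting right after its header.
        sed -i "/^\[mysqld\]/a max_allowed_packet=$value" "$cnf"
    else
        # No [mysqld] section yet: append one with the setting.
        printf '[mysqld]\nmax_allowed_packet=%s\n' "$value" >> "$cnf"
    fi
}

# Usage (as root; MySQL must be restarted to pick up the change):
#   set_max_allowed_packet /etc/my.cnf 128M && /sbin/service mysqld restart
```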
Building the MediaWiki Install
To start, we need a working MediaWiki installation. You can find the current version at http://www.mediawiki.org/wiki/Download
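cd /var/www/
wget http://download.wikimedia.org/mediawiki/1.17/mediawiki-1.17.0.tar.gz
# tar -C needs an existing directory; --strip-components=1 drops the tarball's top-level folder
mkdir -p wikipedia
tar -xzf mediawiki-1.17.0.tar.gz -C wikipedia --strip-components=1
chown -R apache:apache /var/www/wikipedia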
Needed Extensions for Wikipedia
Via SVN
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/CategoryTree/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/CharInsert
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/Cite/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/ExpandTemplates
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/SyntaxHighlight_GeSHi/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/Poem/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/OpenSearchXml/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/WikiEditor/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/wikihiero/
svn checkout http://svn.wikimedia.org/svnroot/mediawiki/branches/REL1_17/extensions/Vector/
A list of the needed extensions can be found on the MediaWiki site: http://www.mediawiki.org/wiki/Category:Extensions_used_on_Wikimedia
- Cite - Adds two parser hooks to MediaWiki, <ref> and <references />; these operate together to add citations to pages.
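Checked-out extensions still have to be loaded by MediaWiki. This page does not cover that step, so treat the following as a sketch under assumptions: it assumes the extensions were checked out into the extensions directory of the install, that each one is loaded from LocalSettings.php with a require_once line, and that each entry point is named extensions/<Name>/<Name>.php. extension_includes is a hypothetical helper name.

```bash
#!/bin/bash
# Extension names taken from the svn checkout commands above.
EXTENSIONS="CategoryTree CharInsert Cite ExpandTemplates SyntaxHighlight_GeSHi Poem OpenSearchXml WikiEditor wikihiero Vector"

# Print one require_once line per extension name given as an argument.
# Assumption: the entry point is extensions/<Name>/<Name>.php.
extension_includes() {
    local ext
    for ext in "$@"; do
        # Single quotes keep $IP literal; it is a PHP variable, not a shell one.
        printf 'require_once( "$IP/extensions/%s/%s.php" );\n' "$ext" "$ext"
    done
}

# Usage (review the output before appending it to LocalSettings.php):
#   extension_includes $EXTENSIONS
```

Printing the lines instead of writing them directly makes it easy to check each one against the extension's own documentation first.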
Downloading the data
Download & Import Script:
#!/bin/bash
# Usage: pass the dump date (for example 20110620) as the first argument.
DOWNLOADDIR=/var/www/wikipedia-downloads
MEDIAWIKIDIR=/var/www/wikipedia
SQLNAME='wikipedia_db'
########################################
DATE=$1
DIR=$DOWNLOADDIR/$DATE
mkdir -p "$DIR"
cd "$DIR" || exit 1
echo "Date: $DATE"
echo "##############################################"
wget -c http://dumps.wikimedia.org/enwiki/$DATE/enwiki-$DATE-md5sums.txt
echo "Finished downloading MD5 file"
echo ""
# Pick the numbered pages-articles parts (one or two characters after "articles") from the checksum list
for a in $(egrep "enwiki-........-pages-articles.\.xml\.bz2|enwiki-........-pages-articles..\.xml\.bz2" enwiki-$DATE-md5sums.txt | awk '{print $2}')
do
    echo ""
    echo "##############################################"
    echo "Working on: $a"
    echo ""
    wget -c http://dumps.wikimedia.org/enwiki/$DATE/$a
    php $MEDIAWIKIDIR/maintenance/initStats.php --update
    php $MEDIAWIKIDIR/maintenance/importDump.php "$DIR/$a"
    /sbin/service mysqld restart
done
# Reset the view counters, then rebuild the site statistics and recent changes
echo "UPDATE site_stats SET ss_total_views = 0 WHERE ss_row_id = 1; UPDATE page SET page_counter = 0;" | mysql $SQLNAME
php $MEDIAWIKIDIR/maintenance/initStats.php --update
php $MEDIAWIKIDIR/maintenance/rebuildrecentchanges.php
echo "##############################################"
echo "Done!!!"
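The script downloads the md5sums file but never checks the dumps against it. A minimal sketch of such a check follows. It assumes the md5sums file uses the usual md5sum layout (checksum in the first column, file name in the second), and verify_dump is a hypothetical helper name.

```bash
#!/bin/bash
# verify_dump SUMS_FILE DUMP_FILE
# Succeeds only if DUMP_FILE is listed in SUMS_FILE and its MD5 matches.
verify_dump() {
    local sums="$1" file="$2" expected actual
    # Look up the checksum recorded for exactly this file name.
    expected=$(awk -v f="$(basename "$file")" '$2 == f {print $1}' "$sums")
    # Compute the checksum of the file that was actually downloaded.
    actual=$(md5sum "$file" | awk '{print $1}')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Usage inside the download loop, before the import step:
#   verify_dump "enwiki-$DATE-md5sums.txt" "$DIR/$a" || { echo "Bad checksum: $a"; continue; }
```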
Importing the data into MediaWiki
Scripts
Script to download the needed files:
for a in 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 10 9 8 7 6 5 4 3 2 1
do
    cd /var/www/wikipedia-downloads/20110620/
    wget -c http://download.wikimedia.org/enwiki/20110620/enwiki-20110620-pages-articles$a.xml.bz2
    bzip2 -d /var/www/wikipedia-downloads/20110620/enwiki-20110620-pages-articles$a.xml.bz2
done
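A hard-coded list of part numbers is easy to get wrong (the list above has no 11, for instance). One alternative is to read the part names from the dump's md5sums file, as the download and import script does. The sketch below assumes the file naming used elsewhere on this page; list_parts is a hypothetical helper name.

```bash
#!/bin/bash
# list_parts SUMS_FILE
# Print the numbered pages-articles part names listed in a dump's
# md5sums file, one per line (file name is the second column).
list_parts() {
    awk '$2 ~ /^enwiki-[0-9]+-pages-articles[0-9]+\.xml\.bz2$/ {print $2}' "$1"
}

# Usage:
#   for f in $(list_parts enwiki-20110620-md5sums.txt); do
#       wget -c http://download.wikimedia.org/enwiki/20110620/$f
#   done
```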
Script for importing the data (this can take days):
for a in 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 10 9 8 7 6 5 4 3 2 1
do
    echo "----------- Working on enwiki-20110620-pages-articles$a.xml -----------"
    php /var/www/wikipedia/maintenance/importDump.php /var/www/wikipedia-downloads/20110620/enwiki-20110620-pages-articles$a.xml
    php /var/www/wikipedia/maintenance/initStats.php --update
done
php /var/www/wikipedia/maintenance/rebuildrecentchanges.php
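Because the import can run for days, it helps if an interrupted run can be restarted without re-importing the parts that already finished. The following is a minimal sketch of one way to do that, by logging each completed part to a file. run_once and the DONE_LOG path are hypothetical names, not part of MediaWiki.

```bash
#!/bin/bash
# Hypothetical log of parts that have been imported successfully.
DONE_LOG="${DONE_LOG:-/var/www/wikipedia-downloads/import-done.log}"

# run_once NAME COMMAND [ARGS...]
# Run COMMAND unless NAME is already recorded in DONE_LOG.
# NAME is recorded only when COMMAND exits successfully.
run_once() {
    local name="$1"
    shift
    if [ -f "$DONE_LOG" ] && grep -qxF "$name" "$DONE_LOG"; then
        echo "Skipping $name (already imported)"
        return 0
    fi
    "$@" && echo "$name" >> "$DONE_LOG"
}

# Usage inside the import loop:
#   run_once "articles$a" php /var/www/wikipedia/maintenance/importDump.php \
#       /var/www/wikipedia-downloads/20110620/enwiki-20110620-pages-articles$a.xml
```

A part whose import was interrupted is never logged, so the next run starts it again from the beginning.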