Following my previous post on a version of Wikipedia for Windows Mobile, improved from the original Pocket Wikipedia 1.0 by free-soft.ro, I decided to find a MobiPocket (PRC) version to read on my Blackberry phone. Unfortunately, I could not find a usable one: the versions I found, including the PRC ones, are incomplete, have their images stripped, or are not suitable for mobile viewing. There is also expensive commercial Wikipedia software for most mobile platforms. TomeRaider offers a few free versions of Wikipedia (one with images, and a compact one without images at around 50MB) in its proprietary format. As none of these suited my needs, I decided to create my own PRC version of Wikipedia.
The article database
I decided to use the same article database (Wikipedia.wi) as Pocket Wikipedia 1.0, which turns out to be the 2007 School Wikipedia selection. Although the source code was never released, the binary was not obfuscated and after a bit of decompiling using .NET Reflector, I was able to extract the articles and images from the 180MB SevenZip-compressed database.
Building the PRC ebook
My first thought was to rely on the MobiPocket Creator user interface. However, its UI is terrible: there is no way to add multiple HTML/image files at a time, so you have to add them one by one. Although drag and drop is supported, the application stops responding when a lot of files are added. I therefore decided to create the OPF file myself and feed it into mobigen or kindlegen to create the final PRC file.
The source code to extract the articles and create the OPF file was written in .NET and can be downloaded here. Because there are more than 5000 articles and 24,000 images, kindlegen/mobigen takes more than 15 minutes on a 3GHz processor to create the final PRC file from the OPF.
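For readers who want to see what generating the OPF involves, here is a minimal Python sketch. It is not the .NET code mentioned above, only an illustration under stated assumptions: the extracted HTML and image files sit in one folder, the media types in `MEDIA_TYPES` are acceptable to your converter, and the function name `build_opf`, the default identifier and the folder name are all hypothetical.

```python
from pathlib import Path
from urllib.parse import quote
from xml.sax.saxutils import escape, quoteattr

# Assumed media types; adjust them to whatever your converter expects.
MEDIA_TYPES = {
    ".html": "application/xhtml+xml",
    ".htm": "application/xhtml+xml",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
}


def build_opf(content_dir, title, language="en", identifier="pocket-wikipedia"):
    """Return OPF 2.0 package XML listing every HTML and image file found."""
    content_dir = Path(content_dir)
    manifest, spine = [], []
    files = sorted(p for p in content_dir.rglob("*") if p.is_file())
    for n, path in enumerate(files):
        suffix = path.suffix.lower()
        media_type = MEDIA_TYPES.get(suffix)
        if media_type is None:
            continue  # skip anything that is neither an article nor an image
        item_id = f"item{n}"
        # Paths in the manifest are relative to the OPF file and URL-encoded.
        href = quote(path.relative_to(content_dir).as_posix())
        manifest.append(
            f'    <item id="{item_id}" href={quoteattr(href)} '
            f'media-type="{media_type}"/>'
        )
        if suffix in (".html", ".htm"):
            # Only articles go into the reading order; images are referenced
            # from the HTML and just need a manifest entry.
            spine.append(f'    <itemref idref="{item_id}"/>')
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" '
        'unique-identifier="BookId">',
        '  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">',
        f"    <dc:title>{escape(title)}</dc:title>",
        f"    <dc:language>{escape(language)}</dc:language>",
        f'    <dc:identifier id="BookId">{escape(identifier)}</dc:identifier>',
        "  </metadata>",
        "  <manifest>", *manifest, "  </manifest>",
        "  <spine>", *spine, "  </spine>",
        "</package>",
    ]
    return "\n".join(lines) + "\n"
```

The returned string would be saved next to the content (for example as `book.opf` inside a hypothetical `articles` folder) and then passed to kindlegen or mobigen. The sketch orders the articles alphabetically by path, which is only one possible reading order.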
Some of the articles contain Unicode characters (for example, various currency symbols) but were extracted and saved in ASCII format. I tried various methods in System.Text.Encoding to convert them to Unicode before saving, without success. The only solution I found was to use UTFCast Express (freeware) to convert the HTML files to Unicode before feeding them into kindlegen/mobigen.
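As an alternative to a separate conversion tool, the re-encoding step can be sketched in a few lines of Python. This is only an illustration, and it rests on an assumption the post does not confirm: that the extracted files are really in a single-byte Windows code page (`cp1252` here) rather than pure ASCII. If the characters were already replaced by `?` during extraction, no conversion can bring them back. The function name `convert_to_utf8` is hypothetical.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="cp1252"):
    """Rewrite one file as UTF-8; return True if the file was changed."""
    path = Path(path)
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (pure ASCII included), leave it alone
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not valid UTF-8 is in source_encoding.
    # Bytes with no mapping in that code page become U+FFFD instead of failing.
    text = raw.decode(source_encoding, errors="replace")
    # Write bytes rather than text so existing line endings stay untouched.
    path.write_bytes(text.encode("utf-8"))
    return True
```

Running it over every `.html` file in the extraction folder before building the book would play the same role as the UTFCast Express step. The HTML files should also declare UTF-8 in their own charset declaration, if they have one.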
The product: Wikipedia on a 214MB PRC file
The final compressed PRC file can be downloaded here. It's a multi-part RAR file, so you will need to download both parts to the same directory and use WinRAR to extract the PRC file.
PocketWikipedia.part1.rar
PocketWikipedia.part2.rar
It contains all the articles and images of the original version, with a subject list and an index where the titles of all articles can be looked up. As the title list is generated automatically by guessing from the first few words of each article, there are cases where the title is not retrieved properly. These can be resolved by editing the index manually in the OPF file before calling kindlegen to generate the PRC file.
Because of the large file size, some desktop versions of the MobiPocket ebook reader may fail to open the file with a Win32 exception. Mobile versions, in particular those for Blackberry and Windows Mobile, seem to open the PRC file properly.
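To make the title-guessing idea concrete, here is a small Python sketch of one possible heuristic. It is not the logic used for the file above: it first looks for an `<h1>` heading (an addition of mine), and only then falls back to the first few words of the article text. The names `guess_title` and `max_words` are hypothetical.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")


def _plain_text(fragment):
    """Strip tags, decode entities and collapse runs of whitespace."""
    return " ".join(unescape(TAG_RE.sub(" ", fragment)).split())


def guess_title(html, max_words=5):
    """Guess an article title from its HTML source."""
    heading = re.search(r"<h1\b[^>]*>(.*?)</h1>", html, re.IGNORECASE | re.DOTALL)
    if heading:
        title = _plain_text(heading.group(1))
        if title:
            return title
    # Fallback: drop script/style blocks, then take the first few words.
    body = re.sub(r"<(script|style)\b.*?</\1>", " ", html,
                  flags=re.IGNORECASE | re.DOTALL)
    return " ".join(_plain_text(body).split()[:max_words])
```

A heuristic like this will still get some titles wrong, for instance when an article opens with a caption or a navigation line, which is why the manual clean-up of the index remains useful.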
UPDATE (8 Oct 2010): A Vietnamese reader has used my instructions to create an improved version of the Wikipedia ebook in PRC format. The new version, which fixes some font problems and has improved search support, can be downloaded here and here. It's a multi-part RAR file, so you will need to download both parts to the same directory and use WinRAR to extract the original PRC file.
Thanks! This is great.
Wow! This is great! Better than any encyclopedia :)
Thank you very much. I have been searching for this for a long time. Also, is there any way to read Wikipedia offline on an S60v2 phone such as the Nokia 6600?
Hey, I have downloaded the files but they seem to be corrupted.
Try the version at the bottom of the article. Also, you need to download both parts and use WinRAR to extract it.
Thanks very much. I am downloading both and can't wait to see the results....
Well, it's working great even on a Nokia 6600 with just 8MB of RAM. But now, with a Nokia N82 (256MB of RAM), my expectations have increased, as most terms related to medicine are not present, perhaps because the given Wikipedia is 4 years old. Can you tell me where or how I can create my own .mobi Wikipedia from the latest files? By the way, I have been using Linux for 3 years, so a Linux way of doing this is not a problem.
adil, you can build your own version of Wikipedia using the instructions in this article. The Wikipedia database can be freely downloaded from the Wikimedia website. However, I don't think you can find any version specially designed for medical terms, and the full version is far too big to put on a mobile phone. A better approach would be to use a medical dictionary in PRC ebook format - there should be one if you look hard enough. Good luck.
Thank you very much.
I have been looking for this for a long time.
And I am also looking for an Oxford English grammar dictionary in PRC format to use with my mobile phone.
An example of the dictionary is in this link
http://www.mobipocket.com/en/eBooks/...sp?BookID=8793
Through this article, I think that you can help me.
I am looking forward to seeing your reply.
Thank you for this useful article.
the full link is http://www.mobipocket.com/en/eBooks/eBookDetails.asp?BookID=8793
please contact me via Y!M hungchuthanh@yahoo.com
thank you
Well, right now I'm using this wiki on my N73 and it is working very well. A few of my friends are also happy with the speed, as MobiPocket works seamlessly on most phones. Great, but I am unable to make my own new Wikipedia, and the one you provided is old, although it works. I hope that if you get some time you could help us by creating a newer version. Thank you and merry Christmas.
adil, glad you find the wiki useful :) Will try to make a newer version when I have the time. :)