(👑=top-quality, ✅=good, ❌=bad, ⛔=deprecated)
About this project
Visuddhimagga is one of the most important commentarial works in Theravada, yet in 2023 we still don't have a good electronic edition. The Buddhist Publication Society (BPS) created and published one in 2011, but released it only in print and as a PDF. Sadhu for their work, but it is still not good enough for electronic devices, let alone future-proof.
Vimuttimagga is an earlier meditation manual, with strong influence on Visuddhimagga. Recently, a new translation by Bhikkhu Nyanatusita was published, but is only available in paper form. Therefore, this work is based on the translation by Soma & Kheminda Theras.
The goal is to provide a high-quality machine-readable Visuddhimagga, plus several machine-generated electronic editions (such as HTML, ePub, PDF). The primary focus is the English reader, not the Pali scholar.
- (0. ✅ politics)
- Is this legal? I contacted BPS, which holds the © (no reply so far). I believe BPS is motivated by the benefit of all practitioners and that there won't be any objections. It would be nice to have a green light (or even the source data) from BPS. Eventually, sites like accesstoinsight.org might offer “our” electronic version, in addition to their PDF.
- 1. ✅ semantic representation
- The BPS edition was used as the basis for a ≈TEI representation of the contents (≈ denotes light customization); TEI is an academic XML-based standard for this kind of work. The result is quite good semantic data (that is, hyperlinks, paragraphs, sections, cross-references &c are tagged as such). Outputs generated from these data show their soundness.
- 2. ✅ verification
- This seems to be done: making sure that the current data are complete and did not suffer from a structural problem during the automated conversion.
- 3. ✅ automation
- The repository has CI (continuous integration) which produces various formats. Tuning those formats is not yet the objective, but they look good and are already usable.
- 4. 🔧 corrections
- Hand-editing the data to fix the non-structural errors mentioned above. LaTeX- and Sphinx-based formats have issue-reporting hyperlinks on page numbers and § labels, so it is easy to help. The errors include:
- ✅ obvious errors in the BPS edition (few);
- ✅ trailing lines broken off paragraphs (since pdfminer's pdf2txt uses lots of heuristics);
- cross-references which were not parsed correctly;
- ✅ incorrectly parsed entries in the index and glossary;
- ✅ some (sub)section headers are missing, as they were not parsed correctly;
- tables: format-specific;
- ✅ tables in the introduction (not particularly pretty)
- appendix tables (not a priority).
- 5. 🔧 enhancements
- Routines to enhance the data, such as detecting more hyperlinks (for example, links to precise locations in online suttas for Tipitaka references).
- 6. ✨ publication
- Final and fine-tuned outputs.
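The verification step (point 2) could include a check that every cross-reference points at an existing element. The following is only a minimal sketch: the element names, the `SAMPLE` data and the `dangling_refs` helper are assumptions for illustration, following the general TEI convention of `xml:id` attributes and `<ref target="#…">` links, and are not taken from this repository.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical TEI-like fragment; the real data will use different ids.
SAMPLE = """<text>
  <div xml:id="ch1"><p xml:id="p1">See <ref target="#p2">par. 2</ref>.</p></div>
  <div xml:id="ch2"><p xml:id="p2">See <ref target="#p9">par. 9</ref>.</p></div>
</text>"""


def dangling_refs(xml_source):
    """Return the targets of internal <ref> links that match no xml:id."""
    root = ET.fromstring(xml_source)
    # Collect every xml:id declared anywhere in the document.
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID)}
    missing = []
    for ref in root.iter("ref"):
        target = ref.get("target", "")
        # Only internal links ("#id") can be checked against local ids.
        if target.startswith("#") and target[1:] not in ids:
            missing.append(target)
    return missing


print(dangling_refs(SAMPLE))
```

A check of this kind can run in CI, so that a broken cross-reference fails the build rather than surfacing in a generated PDF.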
The basis data for Vimuttimagga is an OCRed scan of the print edition, which is semantically tagged and saved in ODT (LibreOffice) format. This format is used as the starting point (ODT → ≈TEI → other outputs, using the same machinery as Visuddhimagga). The following tasks are outstanding:
- 1. Improve sectioning within chapters
- The sectioning is flat and relatively difficult to navigate. The sectioning in Nyanatusita's edition (not yet looked at) could be an inspiration.
- 2. Fix bibliographical references
- Convert quotation locations in the canon to modern format (so that they can hyperlink to suttacentral.net)
- 3. Correct Pali quotations in footnotes
- Use current electronic Pali texts to fuzzy-match footnotes and fix any typos there (possibly many due to OCR)
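The fuzzy-matching idea in point 3 can be sketched with Python's standard `difflib`. This is an illustration under assumptions, not the project's implementation: the `fold` and `correct_quotation` helpers, the 0.8 cutoff and the sample lines are all hypothetical, and a real run would load the reference lines from an electronic Pali edition.

```python
import difflib
import unicodedata


def fold(text):
    """Lower-case and strip diacritics, since OCR often drops or mangles them."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def correct_quotation(ocr_text, reference_lines, cutoff=0.8):
    """Return the closest reference line, or None if nothing is similar enough."""
    # Map the diacritic-free form back to the original reference line.
    folded = {fold(line): line for line in reference_lines}
    hits = difflib.get_close_matches(fold(ocr_text), list(folded), n=1, cutoff=cutoff)
    return folded[hits[0]] if hits else None


# Hypothetical sample data standing in for an electronic Pali text.
reference = [
    "sabbe saṅkhārā aniccā",
    "sabbe saṅkhārā dukkhā",
    "sabbe dhammā anattā",
]

# An OCR-damaged footnote: diacritics lost and "i" misread as "l".
print(correct_quotation("sabbe sankhara anlcca", reference))
```

Returning `None` below the cutoff leaves uncertain cases for hand review rather than silently replacing them.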
How can you help?
If you spot anything wrong, file an issue on GitHub or open a pull request with corrected data.
If you want to work on any of the other points, go ahead and let me know!