Scripts to export the SPIP MySQL database of the current website to plain text Markdown files with YAML front-matter metadata.

SPIP Database to Markdown

spip2md is a little Python app that exports a SPIP database into a plain-text repository of Markdown files with YAML front-matter, usable with static site generators.

Features

spip2md is currently able to:

  • Export every section (spip_rubriques), with every article (spip_articles) they contain
    • Replace author (spip_auteurs) IDs with their names (in the YAML block)
    • Generate different files for each language found in <multi> blocks
    • Copy over all the attached files (spip_documents), with proper links
    • Convert SPIP markup language to Markdown
    • Convert SPIP ID-based internal links (like <art123>) into path-based, normal links
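The per-language export from <multi> blocks can be sketched as follows. This is a minimal illustration assuming SPIP's `<multi>[fr]Bonjour[en]Hello</multi>` syntax, not spip2md's own code: the helper names `split_multi`, `translate` and `languages_in`, and the fallback order for a missing language, are hypothetical.

```python
import re

# One SPIP <multi>...</multi> block (non-greedy, may span several lines).
MULTI_RE = re.compile(r"<multi>(.*?)</multi>", re.DOTALL)
# A language marker such as [fr] or [en] inside a <multi> block.
LANG_RE = re.compile(r"\[([a-z]{2,3}(?:_[a-z]+)?)\]")


def split_multi(body: str) -> dict[str, str]:
    """Map each language code of a <multi> body to its text.

    Text before the first [xx] marker is stored under the empty key "".
    """
    # With one capture group, re.split returns [before, lang1, text1, lang2, text2, ...]
    parts = LANG_RE.split(body)
    versions = {}
    if parts[0].strip():
        versions[""] = parts[0].strip()
    for lang, text in zip(parts[1::2], parts[2::2]):
        versions[lang] = text.strip()
    return versions


def translate(text: str, lang: str) -> str:
    """Replace every <multi> block in text with its version in lang."""
    def pick(match: re.Match) -> str:
        versions = split_multi(match.group(1))
        # Hypothetical fallback: untagged text, then the first version found.
        return versions.get(lang) or versions.get("") or next(iter(versions.values()), "")

    return MULTI_RE.sub(pick, text)


def languages_in(text: str) -> set[str]:
    """Collect every language code used in the <multi> blocks of text."""
    langs = set()
    for body in MULTI_RE.findall(text):
        langs.update(code for code in split_multi(body) if code)
    return langs
```

For a title such as `<multi>[fr]Bonjour[en]Hello</multi> !`, `languages_in` yields `{"fr", "en"}`, so one file per language would be written, with `translate(title, "en")` giving `Hello !`.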

Dependencies

spip2md needs Python 3.9 or later.

spip2md uses three Python libraries (as defined in pyproject.toml):

  • Peewee, with a database driver for your database:
    • pymysql (MySQL/MariaDB)
  • PyYaml
  • python-slugify (unidecode)
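Before running spip2md, you might verify your environment with a small check like the one below. It is a hypothetical helper, not part of spip2md; the import names (`yaml` for PyYAML, `slugify` for python-slugify, `unidecode` for its unidecode extra) are the usual module names for those packages.

```python
from __future__ import annotations

import importlib.util
import sys

# Assumed mapping from package name to the module name it installs.
REQUIRED_MODULES = {
    "peewee": "peewee",
    "pymysql": "pymysql",
    "PyYAML": "yaml",
    "python-slugify": "slugify",
    "Unidecode": "unidecode",
}


def missing_dependencies() -> list[str]:
    """Return the names of required packages that cannot be imported."""
    return [
        package
        for package, module in REQUIRED_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]


def environment_problems() -> list[str]:
    """List every problem found: Python version first, then missing packages."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("spip2md needs Python 3.9 or later")
    problems.extend(f"missing package: {name}" for name in missing_dependencies())
    return problems
```

An empty list from `environment_problems()` means the interpreter and all listed packages were found; `find_spec` only locates modules, so nothing is actually imported.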

Usage

First make sure you have the dependencies installed (system-wide or in a Python virtual-environment).

Next, make sure you have access to the SPIP database you want to export on a MySQL/MariaDB server. By default, spip2md expects a database named spip hosted on localhost, with a user named spip whose password is password, but you can configure this, along with other settings, in the YAML config file.
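A minimal sketch of opening that connection with Peewee, assuming the defaults above. `db_settings` and `connect` are hypothetical helpers, not spip2md's API; Peewee's `MySQLDatabase` forwards `host`, `user` and `password` to the pymysql driver.

```python
from __future__ import annotations

from typing import Any

# Database defaults as stated in this README.
DB_DEFAULTS = {
    "db": "spip",
    "db_host": "localhost",
    "db_user": "spip",
    "db_pass": "password",
}


def db_settings(config: dict[str, Any]) -> dict[str, Any]:
    """Overlay the database keys of a parsed config on the defaults."""
    return {key: config.get(key, default) for key, default in DB_DEFAULTS.items()}


def connect(config: dict[str, Any]):
    """Open a MySQL/MariaDB connection (needs peewee, pymysql and a live server)."""
    # Imported lazily so db_settings() can be used without peewee installed.
    from peewee import MySQLDatabase

    settings = db_settings(config)
    database = MySQLDatabase(
        settings["db"],
        host=settings["db_host"],
        user=settings["db_user"],
        password=settings["db_pass"],
    )
    database.connect()
    return database
```

With an empty config, `connect({})` would try `spip@localhost` with password `password`; overriding only `db_host` keeps the other defaults.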

If you want to copy over attached files such as images, you also need access to the data directory of your SPIP website (usually named IMG): either place it in your current working directory under the name data, or set the data_dir setting to its path.
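Copying an attached file then amounts to joining its stored path to the data directory. The sketch below assumes that each spip_documents row stores a path relative to IMG (such as jpg/photo.jpg) and that files are copied next to the exported article; `copy_attachment` is a hypothetical name, and spip2md's actual output layout may differ.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def copy_attachment(fichier: str, data_dir: str | Path, dest_dir: str | Path) -> Path:
    """Copy one attached file from the SPIP data directory into dest_dir.

    `fichier` is assumed to be relative to the IMG/data directory,
    e.g. "jpg/photo.jpg". Returns the path of the copy.
    """
    source = Path(data_dir) / fichier
    if not source.is_file():
        raise FileNotFoundError(f"attachment not found: {source}")
    target = Path(dest_dir) / source.name
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the file's modification time.
    shutil.copy2(source, target)
    return target
```

Keeping the copy beside the Markdown file lets the article link to it with a plain relative link such as `![](photo.jpg)`.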

Currently, the path of the config file can be given as the only CLI parameter; if no parameter is given, the program looks for a spip2md.yml file in the current working directory. Here are the configuration options:

db: Name of the database (default is spip)
db_host: Host of the database (default is localhost)
db_user: The database user (default is spip)
db_pass: The database password (default is password)
data_dir: The directory in which SPIP images & files are stored (default is data)
export_languages: Array of languages to export (default is ["en",])
output_dir: The directory in which files will be written (default is output/)
prepend_h1: Whether to prepend the article title as a Markdown h1 (default true)
prepend_id: Whether to prepend the object's ID to its directory slug
prepend_lang: Whether to prepend the object's language to its directory slug
export_drafts: Whether to export drafts (default true)
remove_html: Whether to remove remaining HTML blocks (default true)
unknown_char_replacement: String that replaces broken encoding that cannot be repaired
clear_log: Clear the logfile between runs instead of appending to it (default false)
clear_output: Clear the output directory between runs instead of merging into it (default false)
logfile: Name of the log file (default is spip2md.log)
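Putting the lookup rule and the documented defaults together, loading the configuration might look like the sketch below. The function names are hypothetical, falling back to defaults when no file exists is an assumption, the data_dir default is inferred from the Usage section, and the database keys plus options without a documented default (prepend_id, prepend_lang, unknown_char_replacement) are left out.

```python
from __future__ import annotations

import copy
import sys
from pathlib import Path
from typing import Any

# Documented defaults for the non-database options.
DEFAULTS: dict[str, Any] = {
    "data_dir": "data",  # inferred from the Usage section
    "export_languages": ["en"],
    "output_dir": "output/",
    "prepend_h1": True,
    "export_drafts": True,
    "remove_html": True,
    "clear_log": False,
    "clear_output": False,
    "logfile": "spip2md.log",
}


def config_path(argv: list[str]) -> Path:
    """Config file given as the only CLI argument, else ./spip2md.yml."""
    return Path(argv[1]) if len(argv) > 1 else Path.cwd() / "spip2md.yml"


def merge_config(raw: dict[str, Any] | None) -> dict[str, Any]:
    """Overlay user settings on a deep copy of the defaults (empty YAML -> None)."""
    config = copy.deepcopy(DEFAULTS)
    config.update(raw or {})
    return config


def load_config(argv: list[str] | None = None) -> dict[str, Any]:
    """Read the YAML file if it exists (assumed: defaults otherwise)."""
    import yaml  # PyYAML, imported lazily

    path = config_path(sys.argv if argv is None else argv)
    raw = yaml.safe_load(path.read_text(encoding="utf-8")) if path.is_file() else None
    return merge_config(raw)
```

Deep-copying the defaults keeps a caller that edits `export_languages` from mutating the shared `DEFAULTS` list.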

TODO

These tables seem to contain less useful information, but this needs to be investigated:

  • spip_evenements
  • spip_meta
  • spip_mots
  • spip_syndic_articles
  • spip_mots_liens
  • spip_zones_liens
  • spip_groupes_mots
  • spip_meslettres
  • spip_messages
  • spip_syndic
  • spip_zones

These tables seem technical and SPIP-specific:

  • spip_depots
  • spip_depots_plugins
  • spip_jobs
  • spip_ortho_cache
  • spip_paquets
  • spip_plugins
  • spip_referers
  • spip_referers_articles
  • spip_types_documents
  • spip_versions
  • spip_versions_fragments
  • spip_visites
  • spip_visites_articles

These tables are empty:

  • spip_breves
  • spip_evenements_participants
  • spip_forum
  • spip_jobs_liens
  • spip_ortho_dico
  • spip_petitions
  • spip_resultats
  • spip_signatures
  • spip_test
  • spip_urls