update readme

This commit is contained in:
Guilhem Fauré 2023-06-06 10:44:08 +02:00
parent 32844f451b
commit 3c2e6597fc
2 changed files with 89 additions and 74 deletions

145
README.md
View File

@ -1,78 +1,93 @@
---
lang: en
---
# SPIP Database to Markdown
Python scripts to export the SPIP MySQL database of the current website to plain text Markdown files with YAML front-matter metadata.
## Notes on exporting the SPIP MySQL data to Markdown files
There are 40 tables, of which:
`spip2md` is a litle Python app that can export a SPIP database into a plain text,
Markdown + YAML repository, usable with static site generators.
- 8 contain the major part of the data
- 4 are relations between other tables
- 5 contain as few data as global parameters
- 13 seems to be technical information specific to SPIP
- 10 are completely empty
## Features
### Tables & Database schema
Elements to take into account:
`spip2md` is currently able to:
- Export every section (`spip_rubriques`), with every article (`spip_articles`) they
contain
- Replace authors (`spip_auteurs`) IDs with their name (in YAML block)
- Generate different files for each language found in `<multi>` blocks
- Copy over all the attached files (`spip_documents`), with proper links
- Convert SPIP [Markup language](https://www.spip.net/fr_article1578.html)
- Convert SPIP ID-based internal links (like `<art123>`) into path-based, normal links
## Usage
To use the app, simply run the command `spip2md`. However, you probably want to
configure certain settings before running it, like the database credentials.
Here are the available _configuration options_, to put in a `spip2md.yml` file:
```yaml
db: Name of the database (default is spip)
db_host: Host of the database (default is localhost)
db_user: The database user (default is spip)
db_pass: The database password (default is password)
data_dir: The directory in which SPIP images & files are stored
export_languages: Array of languages to export (default is ["en",])
output_dir: The directory in which files will be written (default is output/)
prepend_h1: Should spip2md prepend the title of articles as Markdown h1 (default true)
prepend_id: Whether to prepend ID of the object to directory slug
prepend_lang: Whether to prepend lang of the object to directory slug
export_drafts: Should we export drafts (default true)
remove_html: Should we clean remaining HTML blocks (default true)
unknown_char_replacement: Broken encoding that cannot be repaired is replaced with that
clear_log: Clear logfile between runs instead of appending to (default false)
clear_output: Clear output dir between runs instead of merging into (default false)
logfile: Name of the logs file (default is spip2md.log)
```
## External links
- SPIP [Markup language](https://www.spip.net/fr_article1578.html)
- SPIP [Database structure](https://www.spip.net/fr_article713.html)
- SPIP [HTML templates](https://www.spip.net/fr_article879.html)
#### Main tables, with a lot of data
These tables contains a lot of data. Each row will probably correspond to one Markdown file.
## TODO
- spip_articles
- spip_auteurs
- spip_documents
- spip_evenements
- spip_meta
- spip_mots
- spip_rubriques
- spip_syndic_articles
These tables could represent additional data to export:
#### Relational tables, making links between main tables
These tables join information between main tables. They will probably correspond to entries in YAML front-matters.
- `spip_evenements`
- `spip_meta`
- `spip_mots`
- `spip_syndic_articles`
- `spip_mots_liens`
- `spip_zones_liens`
- spip_auteurs_liens
- spip_documents_liens
- spip_mots_liens
- spip_zones_liens
- `spip_groupes_mots`
- `spip_meslettres`
- `spip_messages`
- `spip_syndic`
- `spip_zones`
#### Tables with little data
These tables contains a few rows. They will probably correspond to global configuration files in static website.
- `spip_depots`
- `spip_depots_plugins`
- `spip_jobs`
- `spip_ortho_cache`
- `spip_paquets`
- `spip_plugins`
- `spip_referers`
- `spip_referers_articles`
- `spip_types_documents`
- `spip_versions`
- `spip_versions_fragments`
- `spip_visites`
- `spip_visites_articles`
- spip_groupes_mots
- spip_meslettres
- spip_messages
- spip_syndic
- spip_zones
These tables are empty:
#### Technical tables
These tables contain technical information that is probably specific to SPIP or the system on which it is installed.
- spip_depots
- spip_depots_plugins
- spip_jobs
- spip_ortho_cache
- spip_paquets
- spip_plugins
- spip_referers
- spip_referers_articles
- spip_types_documents
- spip_versions
- spip_versions_fragments
- spip_visites
- spip_visites_articles
#### Empty tables
These tables are empty, so they dont need to be treated.
- spip_breves
- spip_evenements_participants
- spip_forum
- spip_jobs_liens
- spip_ortho_dico
- spip_petitions
- spip_resultats
- spip_signatures
- spip_test
- spip_urls
- `spip_breves`
- `spip_evenements_participants`
- `spip_forum`
- `spip_jobs_liens`
- `spip_ortho_dico`
- `spip_petitions`
- `spip_resultats`
- `spip_signatures`
- `spip_test`
- `spip_urls`

View File

@ -19,21 +19,21 @@ class Configuration:
db_host: str = "localhost" # Where is the DB
db_user: str = "spip" # A DB user with read access to SPIP database
db_pass: str = "password" # Password of db_user
output_dir: str = "output/" # The directory to which DB will be exported
export_languages = ("en",) # Languages that will be exported
data_dir: str = "data/" # The directory in which SPIP images & documents are stored
output_dir: str = "output/" # The directory to which DB will be exported
prepend_h1: bool = True # Add the title of the article as a Markdown h1
prepend_id: bool = True # Add the ID of object before slug
prepend_lang: bool = False # Add the lang of object before slug
unknown_char_replacement: str = "??" # Replaces unknown characters
export_languages = ("fr", "en") # Languages that will be exported
export_filetype: str = "md" # Extension of exported text files
export_drafts: bool = True # Should we export drafts as draft:true articles
clear_output: bool = False # Remove eventual output dir before running
clear_log: bool = False # Clear log before every run instead of appending to
logfile: str = "spip2md.log" # File where logs will be written, relative to wd
logname: str = "spip2md" # Labelling of logs
loglevel: str = "WARNING" # Minimum criticity of logs written in logfile
remove_html: bool = True # Should spip2md remove every HTML tags
unknown_char_replacement: str = "??" # Replaces unknown characters
clear_log: bool = False # Clear log before every run instead of appending to
clear_output: bool = False # Remove eventual output dir before running
logfile: str = "spip2md.log" # File where logs will be written, relative to wd
loglevel: str = "WARNING" # Minimum criticity of logs written in logfile
logname: str = "spip2md" # Labelling of logs
export_filetype: str = "md" # Extension of exported text files
max_articles_export: int = 1000 # TODO reimplement
max_sections_export: int = 500 # TODO reimplement