Last summer I migrated my blog from WordPress to Pelican. I did this for a couple of reasons (see my post here), but one thing that worried me a bit about the migration was that Pelican's options for site search didn't look promising.

There was an outdated plugin called tipue-search, but when I looked at it I could tell it was on its last legs.

I thought about it, and since my blog isn't highly trafficked AND you can use Google to search a specific site, I figured I could wait a bit and see what options came up.

After waiting a few months, I decided it would be interesting to see if I could write a SQLite utility to get the data from my blog, add it to a SQLite database and then use datasette to serve it up.

I wrote the beginning scaffolding for it last August in a utility called pelican-to-sqlite, but I ran into several technical issues I just couldn't overcome. I thought about giving up, but sometimes you just need to take a step away from a thing, right?

After the first of the year I decided to revisit my idea, but first looked to see if there was anything new for Pelican search. I found a plugin called search that was released last November and is actively being developed, but as I read through the documentation there was just A LOT of stuff:

  • Stork
  • requirements for the structure of your page HTML
  • static asset hosting
  • deployment requires updating your nginx settings

These all looked a bit scary to me, and since I've done some work using datasette I thought I'd revisit my initial idea.

My First Attempt

As I mentioned above, I wrote the beginning scaffolding late last summer. In my first attempt I tried a few tools to read the md files and parse their YAML metadata, and it just didn't work out. I also realized that Pelican content can be written in reStructuredText, so any approach that parsed only the md files would never work for those file types.

My Second Attempt

The Plugin

During the holiday I thought a bit about approaching the problem from a different perspective. My initial idea had been to write a datasette-style package to read the data from Pelican. I decided instead to see if I could write a Pelican plugin to get the data and then add it to a SQLite database. It turns out I can, and it's not that hard.

Pelican uses signals to make plugin creation pretty easy. I read a post and the documentation and was able to start my effort to refactor pelican-to-sqlite.

From The missing Pelican plugins guide I saw lots of different options, but realized that the article_generator_write_article signal was what I needed to get the article content.

I then also used sqlite_utils to insert the data into a database table.

import sqlite_utils

def save_items(record: dict, table: str, db: sqlite_utils.Database) -> None:  # pragma: no cover
    db[table].insert(record, pk="slug", alter=True, replace=True)
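
Those keyword arguments do a lot of quiet work: pk="slug" keys each article by its slug, replace=True means rebuilding the site upserts rows instead of erroring on duplicates, and alter=True automatically adds any new columns to the table if the shape of the record changes.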

Below is the method I wrote to take the content and turn it into a dictionary which can be used in the save_items method above.

import html2text

def create_record(content) -> dict:
    record = {}
    author = content.author.name
    category = content.category.name
    # content.content is the rendered HTML; convert it to plain text for searching
    post_content = html2text.html2text(content.content)
    published_date = content.date.strftime("%Y-%m-%d")
    slug = content.slug
    summary = html2text.html2text(content.summary)
    title = content.title
    url = "https://www.ryancheley.com/" + content.url
    status = content.status
    # Only published articles get a populated record; anything else returns {}
    if status == "published":
        record = {
            "author": author,
            "category": category,
            "content": post_content,
            "published_date": published_date,
            "slug": slug,
            "summary": summary,
            "title": title,
            "url": url,
        }
    return record
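
One thing to notice: anything that isn't published comes back as an empty dict, so the caller needs to skip those rather than hand sqlite_utils an empty record.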

Putting these together, I get a method used by the Pelican plugin system that will generate the data I need for the site AND insert it into a SQLite database:

def run(_, content):
    record = create_record(content)
    if record:  # create_record returns {} for anything unpublished; skip those
        save_items(record, "content", db)

def register():
    signals.article_generator_write_article.connect(run)
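
For completeness, the snippets above assume some module-level setup along these lines (a sketch, not the package's exact source; the pelican.db filename matches the database that gets published to Vercel below):

import html2text
import sqlite_utils
from pelican import signals

# The database the plugin writes to; `make html` below regenerates it
db = sqlite_utils.Database("pelican.db")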

The HTML template update

I use a custom implementation of a Smashing Magazine theme, which lets me make my own edits, though I mostly keep it pretty stock. In this case it meant I could make a small edit to the base.html template to include a search form.

In order to add the search form I added the following code to base.html below the nav tag:

    <section class="relative h-8">
    <section class="absolute inset-y-0 right-10 w-128">
    <form
    class = "pl-4"
    <
    action="https://search-ryancheley.vercel.app/pelican/article_search?text=name"
    method="get">
            <label for="site-search">Search the site:</label>
            <input type="search" id="site-search" name="text"
                    aria-label="Search through site content">
            <button class="rounded-full w-16 hover:bg-blue-300">Search</button>
    </form>
    </section>
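
Submitting the form issues a GET request to the article_search page on the Vercel-hosted datasette instance, with whatever was typed in the search box passed along as the text parameter that the saved query below expects.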

Putting it all together with datasette and Vercel

Here's where the magic starts. Publishing data to Vercel with datasette is extremely easy with the datasette plugin datasette-publish-vercel.

You do need to have the Vercel CLI installed, but once you do, the steps for publishing your SQLite database are really well explained in the datasette-publish-vercel documentation.
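
If you're starting from scratch, the setup amounts to installing the plugin alongside datasette and grabbing the Vercel CLI (assuming you use npm; other install methods exist):

pip install datasette datasette-publish-vercel
npm install -g vercel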

One final step was to add a make target so I could type one quick command to create my content, generate the SQLite database, AND publish the SQLite database to Vercel. I added the below to my Makefile:

vercel:
    { \
    echo "Generate content and database"; \
    make html; \
    echo "Content generation complete"; \
    echo "Publish data to vercel"; \
    datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \
    echo "Publishing complete"; \
    }
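
With that in place, regenerating the site, rebuilding the search database, and pushing it all to Vercel is just a quick make vercel.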

The line

datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \

has an extra flag passed to it (--metadata) that allows me to use metadata.json to create a saved query, which I call article_search. The SQL for that saved query is:

select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date
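
In metadata.json that query lives under the database name, something like this (a sketch based on datasette's canonical query format; only the article_search entry is shown):

{
    "databases": {
        "pelican": {
            "queries": {
                "article_search": {
                    "sql": "select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date"
                }
            }
        }
    }
}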

This saved query is what gives the form's action above a datasette URL to point at, and it's what returns the search results!

With just a few tweaks I'm able to include a search tool, powered by datasette for my pelican blog. Needless to say, I'm pretty pumped.

Next Steps

There are still a few things to do:

  1. separate the search form into its own html file (for my site)
  2. format datasette to match the site (for my Vercel-powered instance of datasette)
  3. update the README for the pelican-to-sqlite package to better explain how to fully implement it
  4. get pelican-to-sqlite added to the pelican-plugins page