Adding Search to My Pelican Blog with Datasette

Last summer I migrated my blog from WordPress to Pelican. I did this for a couple of reasons (see my post here), but one thing I was a bit worried about when I migrated was that Pelican's options for site search didn't look promising.

There was an outdated plugin called tipue-search, but when I looked at it I could tell it was on its last legs.

I thought about it, and since my blog isn't highly trafficked AND you can use Google to search a specific site, I figured I could wait a bit and see what options came up.

After waiting a few months, I decided it would be interesting to see if I could write a SQLite utility to get the data from my blog, add it to a SQLite database and then use datasette to serve it up.

I wrote the beginning scaffolding for it last August in a utility called pelican-to-sqlite, but I ran into several technical issues I just couldn't overcome. I thought about giving up, but sometimes you just need to take a step away from a thing, right?

After the first of the year I decided to revisit my idea, but first looked to see if there was anything new for Pelican search. I found a plugin called search that was released last November and is actively being developed, but as I read through the documentation there was just A LOT of stuff:

  • stork
  • requirements for the structure of your page html
  • static asset hosting
  • deployment requires updating your nginx settings

These all looked a bit scary to me, and since I've done some work using datasette I thought I'd revisit my initial idea.

My First Attempt

As I mentioned above, I wrote the beginning scaffolding late last summer. In my first attempt I tried to use a few tools to read the md files and parse their YAML front matter, and it just didn't work out. I also realized that Pelican content can be written in reStructuredText, so any attempt to parse just the md files would never work for those posts.

My Second Attempt

The Plugin

During the holiday I thought a bit about approaching the problem from a different perspective. My initial idea was to try and write a datasette-style package to read the data from Pelican. I decided instead to see if I could write a Pelican plugin to get the data and then add it to a SQLite database. It turns out I can, and it's not that hard.

Pelican uses signals to make plugin creation pretty easy. I read a post and the documentation and was able to start refactoring pelican-to-sqlite.

From The missing Pelican plugins guide I saw lots of different options, but realized that the article_generator_write_article signal was exactly what I needed to get the article content.

I then also used sqlite_utils to insert the data into a database table.

def save_items(record: dict, table: str, db: sqlite_utils.Database) -> None:  # pragma: no cover
    # Upsert keyed on slug; alter=True adds any new columns, replace=True overwrites an existing row
    db[table].insert(record, pk="slug", alter=True, replace=True)

Below is the method I wrote to take the content and turn it into a dictionary which can be used in the save_items method above.

def create_record(content) -> dict:
    record = {}
    author = content.author.name
    category = content.category.name
    post_content = html2text.html2text(content.content)
    published_date = content.date.strftime("%Y-%m-%d")
    slug = content.slug
    summary = html2text.html2text(content.summary)
    title = content.title
    url = "https://www.ryancheley.com/" + content.url
    status = content.status
    # Only published articles get a record; anything else returns an empty dict
    if status == "published":
        record = {
            "author": author,
            "category": category,
            "content": post_content,
            "published_date": published_date,
            "slug": slug,
            "summary": summary,
            "title": title,
            "url": url,
        }
    return record

Putting these together, I get the functions used by the Pelican plugin system that will generate the data I need for the site AND insert it into a SQLite database.

def run(_, content):
    record = create_record(content)
    save_items(record, "content", db)

def register():
    signals.article_generator_write_article.connect(run)
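
For these snippets to hang together, the plugin module also needs its imports and a database handle. Roughly, the top of the module looks something like this (a sketch; pelican.db matches the database that gets published later on, but the exact setup in the real package may differ):

import html2text
import sqlite_utils
from pelican import signals

# Database the plugin writes to; pelican.db is the file that gets published to Vercel below
db = sqlite_utils.Database("pelican.db")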

The HTML template update

I use a custom implementation of Smashing Magazine, which allows me to make my own edits, though I mostly keep it pretty stock. In this case it let me make a small edit to the base.html template to include a search form.

In order to add the search form I added the following code to base.html below the nav tag:

    <section class="relative h-8">
        <section class="absolute inset-y-0 right-10 w-128">
            <form
                class="pl-4"
                action="https://search-ryancheley.vercel.app/pelican/article_search?text=name"
                method="get">
                <label for="site-search">Search the site:</label>
                <input type="search" id="site-search" name="text"
                       aria-label="Search through site content">
                <button class="rounded-full w-16 hover:bg-blue-300">Search</button>
            </form>
        </section>
    </section>

Putting it all together with datasette and Vercel

Here's where the magic starts. Publishing data to Vercel with datasette is extremely easy with the datasette plugin datasette-publish-vercel.

You do need to have the Vercel CLI installed, but once you do, the steps for publishing your SQLite database are really well explained in the datasette-publish-vercel documentation.
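
For reference, the broad strokes look something like this (a sketch; it assumes npm and pip are available, and the project name is the one I use):

# Install the Vercel CLI and the datasette plugin
npm install -g vercel
pip install datasette datasette-publish-vercel

# Authenticate with Vercel, then publish the SQLite database
vercel login
datasette publish vercel pelican.db --project=search-ryancheley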

One final step was to add a Make target so I could type a single command that would create my content, generate the SQLite database, AND publish the SQLite database to Vercel. I added the below to my Makefile:

vercel:
    { \
    echo "Generate content and database"; \
    make html; \
    echo "Content generation complete"; \
    echo "Publish data to vercel"; \
    datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \
    echo "Publishing complete"; \
    }

The line

datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \

has an extra flag passed to it (--metadata) which allows me to use metadata.json to create a saved query which I call article_search. The contents of that saved query are:

select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date

This is what allows the action in the form above to have a URL to link to in datasette and return data!
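
For anyone curious what that wiring looks like, the saved query is a datasette canned query defined in metadata.json under the database name (pelican, from pelican.db), roughly like this:

{
    "databases": {
        "pelican": {
            "queries": {
                "article_search": {
                    "sql": "select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date"
                }
            }
        }
    }
}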

With just a few tweaks I'm able to include a search tool, powered by datasette, for my Pelican blog. Needless to say, I'm pretty pumped.

Next Steps

There are still a few things to do:

  1. Move the search form into a separate HTML file (for my site)
  2. Format datasette to match the site (for my Vercel-powered instance of datasette)
  3. Update the README for the pelican-to-sqlite package to better explain how to fully implement it
  4. Get pelican-to-sqlite added to the pelican-plugins page

Prototyping with Datasette

At my job I work with some really talented Web Developers who are saddled with a pretty creaky legacy system.

We're getting ready to start on a new(ish) project where we'll be taking an old project built on this creaky legacy system (VB.NET) and re-implementing it with a C# backend and an Angular front end. We'll be working on a lot of new features and integrations, so it's worth rebuilding it versus shoehorning the new requirements into the legacy system.

The details of the project aren't really important. What is important is that as I was reviewing the requirements with the Web Developer Supervisor he said something to the effect of, "We can create a proof of concept and just hard code the data in a JSON file to fake the backend."

The issue is ... we already have the data we'll need in an MS SQL database (it's what runs the legacy version); it's just a matter of getting it into the right JSON "shape".

Creating a 'fake' JSON object that kind of/maybe mimics the real data is something we've done before, and it ALWAYS seems to bite us in the butt. We don't account for proper pagination, or the real lengths of data in the fields, or NULL values, or whatever shenanigans happen to befall real-world data!

This got me thinking about Simon Willison's project Datasette and using it to prototype the API end points we would need.

I had been trying to figure out how to use db-to-sqlite to extract data from an MS SQL database into a SQLite database, and I was successful (see my PR to db-to-sqlite here).
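
The extraction itself ends up being a one-liner once you have a SQLAlchemy-style connection string for the SQL Server instance. Something along these lines (the server, credentials, and database names are placeholders):

pip install db-to-sqlite pyodbc

# Copy every table from the MS SQL database into a local SQLite file
db-to-sqlite "mssql+pyodbc://user:password@legacy-server/LegacyDB?driver=ODBC+Driver+17+for+SQL+Server" legacy.db --all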

With this idea in hand, I reviewed it with the Supervisor and then scheduled a call with the web developers to review datasette.

During this meeting, I wanted to review:

  1. The motivation behind why we would want to use it
  2. How we could leverage it to do Rapid Prototyping
  3. A quick demo using the data from the stored procedure that currently returns the data for the legacy project

In all it took less than 10 minutes to go from nothing to a local instance of datasette running with a prototype JSON API for the web developers to see.
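
Those 10 minutes are really just two steps: point datasette at the SQLite file and then hit the .json version of any table or saved query. Something like this (the database and table names are placeholders carried over from the sketch above):

# Serve the extracted data locally
datasette legacy.db --port 8001

# Every table (and canned query) automatically gets a JSON endpoint, e.g.
# http://localhost:8001/legacy/orders.json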

I'm hoping to see the Web team use this approach more going forward, as I can see huge benefits for rapid prototyping of ideas, especially if you already have the data housed in a database. But even if you don't, the datasette ecosystem has tons of tools to get data from a variety of sources into a SQLite database, and then you can do the rapid prototyping!

My First Python Package

A few months ago I was inspired by Simon Willison and his project Datasette and its related ecosystem to write a Python package for it.

I use toggl to track my time at work and I thought this would be a great opportunity to use that data with Datasette and see if I couldn't answer some interesting questions, or at the very least, do some neat data discovery.

The purpose of this package is to:

Create a SQLite database containing data from your toggl account

I followed the tutorial for publishing a package to PyPI and did the first few pushes manually. Then, using a GitHub Action from one of Simon's Datasette projects, I was able to automate it so the package is published whenever I make a release on GitHub!
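
I won't reproduce the whole workflow here, but the shape of it is: trigger on a GitHub release, build the package, and upload it to PyPI with a token stored in the repo's secrets. A trimmed-down sketch (the secret name and action versions are assumptions):

name: Publish Python Package

on:
  release:
    types: [created]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install build dependencies
        run: pip install setuptools wheel twine
      - name: Build and publish
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}  # assumed secret name
        run: |
          python setup.py sdist bdist_wheel
          twine upload dist/*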

Since the initial commit on March 7 (my birthday, BTW) I've had 10 releases, with the most recent one coming yesterday. It fixed an issue where one of the tables reported back an API key, which, if published on the internet, could be a bad thing ... so hooray for security enhancements!

Anyway, it was a fun project, and got me more interested in authoring Python packages. I’m hoping to do a few more related to Datasette (although I’m not sure what to write honestly!).

Be sure to check out the package on PyPI.org and the source code on GitHub.