Adding Search to My Pelican Blog with Datasette
Last summer I migrated my blog from WordPress to Pelican. I did this for a couple of reasons (see my post here), but one thing I was a bit worried about when I migrated was that Pelican's offering for site search didn't look promising.
There was an outdated plugin called tipue-search, but when I was looking at it I could tell it was on its last legs.
I thought about it, and since my blog isn't super highly trafficked AND you can use Google to search a specific site, I figured I could wait a bit and see what options came up.
After waiting a few months, I decided it would be interesting to see if I could write a SQLite utility to get the data from my blog, add it to a SQLite database and then use datasette to serve it up.
I wrote the beginning scaffolding for it last August in a utility called pelican-to-sqlite, but I ran into several technical issues I just couldn't overcome. I thought about giving up, but sometimes you just need to take a step away from a thing, right?
After the first of the year I decided to revisit my idea, but first looked to see if there was anything new for Pelican search. I found a plugin called search that was released last November and is actively being developed, but as I read through the documentation there was just A LOT of stuff:
- stork
- requirements for the structure of your page html
- static asset hosting
- deployment requires updating your nginx settings
These all looked a bit scary to me, and since I've done some work using datasette I thought I'd revisit my initial idea.
My First Attempt
As I mentioned above, I wrote the beginning scaffolding late last summer. In my first attempt I tried to use a few tools to read the md files and parse their yaml structure, and it just didn't work out. I also realized that Pelican can have reStructuredText content and that any attempt to parse just the md files would never work for those file types.
My Second Attempt
The Plugin
During the holiday I thought a bit about approaching the problem from a different perspective. My initial idea was to try and write a datasette style package to read the data from pelican. I decided instead to see if I could write a pelican plugin to get the data and then add it to a SQLite database. It turns out, I can, and it's not that hard.
Pelican uses signals to make plugin creation a pretty easy thing. I read a post and the documentation and was able to start my effort to refactor pelican-to-sqlite.
From The missing Pelican plugins guide I saw lots of different options, but realized that the signal article_generator_write_article is what I needed to get the article content.

I then also used sqlite_utils to insert the data into a database table.
def save_items(record: dict, table: str, db: sqlite_utils.Database) -> None:  # pragma: no cover
    db[table].insert(record, pk="slug", alter=True, replace=True)
Below is the method I wrote to take the content and turn it into a dictionary which can be used in the save_items method above.
def create_record(content) -> dict:
    record = {}
    author = content.author.name
    category = content.category.name
    post_content = html2text.html2text(content.content)
    published_date = content.date.strftime("%Y-%m-%d")
    slug = content.slug
    summary = html2text.html2text(content.summary)
    title = content.title
    url = "https://www.ryancheley.com/" + content.url
    status = content.status
    if status == "published":
        record = {
            "author": author,
            "category": category,
            "content": post_content,
            "published_date": published_date,
            "slug": slug,
            "summary": summary,
            "title": title,
            "url": url,
        }
    return record
Putting these together, I get a method used by the Pelican plugin system that will generate the data I need for the site AND insert it into a SQLite database:
def run(_, content):
    record = create_record(content)
    save_items(record, "content", db)


def register():
    signals.article_generator_write_article.connect(run)
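For context, the functions above lean on a few imports and a shared db object. Here's a minimal sketch of the top of the plugin module as I think of it; the pelican.db file name is illustrative rather than anything required by Pelican:

# Minimal sketch of the plugin module's setup -- the imports and the db object
# that create_record(), save_items(), and run() rely on.
import html2text
import sqlite_utils
from pelican import signals

# the database file name here is illustrative
db = sqlite_utils.Database("pelican.db")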
The HTML template update
I use a custom implementation of the Smashing Magazine theme. This allows me to do some edits, though I mostly keep it pretty stock. However, it did allow me to make a small edit to the base.html template to include a search form.

In order to add the search form I added the following code to base.html below the nav tag:
<section class="relative h-8">
<section class="absolute inset-y-0 right-10 w-128">
<form
class = "pl-4"
<
action="https://search-ryancheley.vercel.app/pelican/article_search?text=name"
method="get">
<label for="site-search">Search the site:</label>
<input type="search" id="site-search" name="text"
aria-label="Search through site content">
<button class="rounded-full w-16 hover:bg-blue-300">Search</button>
</form>
</section>
Putting it all together with datasette and Vercel
Here's where the magic starts. Publishing data to Vercel with datasette is extremely easy with the datasette plugin datasette-publish-vercel.

You do need to have the Vercel CLI installed, but once you do, the steps for publishing your SQLite database are really well explained in the datasette-publish-vercel documentation.
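If you haven't set those up before, the installs boil down to roughly the following two commands (the exact commands may differ depending on your setup):

# Vercel CLI (needs Node.js)
npm install -g vercel

# the datasette plugin that adds the `publish vercel` subcommand
pip install datasette-publish-vercel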
One final step was to add a make command so I could type a single command that would create my content, generate the SQLite database, AND publish the SQLite database to Vercel. I added the below to my Makefile:
vercel:
	{ \
	echo "Generate content and database"; \
	make html; \
	echo "Content generation complete"; \
	echo "Publish data to vercel"; \
	datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \
	echo "Publishing complete"; \
	}
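With that in place, running make vercel from the project root regenerates the site content and pelican.db, and then pushes the database up to Vercel in one go.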
The line

datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \

has an extra flag passed to it (--metadata) which allows me to use metadata.json to create a saved query which I call article_search. The contents of that saved query are:
select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date
This is what allows the action in the form above to have a URL to link to in datasette and return data!
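For reference, the saved query lives in metadata.json using datasette's canned query format; my file looks roughly like this (the pelican database name comes from the pelican.db file):

{
    "databases": {
        "pelican": {
            "queries": {
                "article_search": {
                    "sql": "select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date"
                }
            }
        }
    }
}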
With just a few tweaks I'm able to include a search tool, powered by datasette, for my pelican blog. Needless to say, I'm pretty pumped.
Next Steps
There are still a few things to do:
- separate search form html file (for my site)
- formatting datasette to match site (for my vercel powered instance of datasette)
- update the README for the pelican-to-sqlite package to better explain how to fully implement it
- get pelican-to-sqlite added to the pelican-plugins page
Prototyping with Datasette
At my job I work with some really talented Web Developers that are saddled with a pretty creaky legacy system.
We're getting ready to start on a new(ish) project where we'll be taking an old project built on this creaky legacy system (VB.net) and re-implementing it on a C# backend and an Angular front end. We'll be working on a lot of new features and integrations so it's worth rebuilding it versus shoehorning the new requirements into the legacy system.
The details of the project aren't really important. What is important is that as I was reviewing the requirements with the Web Developer Supervisor he said something to the effect of, "We can create a proof of concept and just hard code the data in a json file to fake the backend."
The issue is ... we already have the data that we'll need in a MS SQL database (it's what is running the legacy version); it's just a matter of getting it into the right json "shape".
Creating a 'fake' json object that kind of/maybe mimics the real data is something we've done before, and it ALWAYS seems to bite us in the butt. We don't account for proper pagination, or the real lengths of data in the fields or NULL values or whatever shenanigans happen to befall real world data!
This got me thinking about Simon Willison's project Datasette and using it to prototype the API end points we would need.
I had been trying to figure out how to use db-to-sqlite to extract data from a MS SQL database into a SQLite database and was successful (see my PR to db-to-sqlite here).
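As a rough sketch, the extraction is a single db-to-sqlite call with a SQLAlchemy connection string; the server, database, and driver details below are placeholders rather than our actual configuration:

# db-to-sqlite needs a SQLAlchemy-compatible driver for MS SQL, e.g. pyodbc
pip install db-to-sqlite pyodbc

# copy every table from the MS SQL database into a local SQLite file
db-to-sqlite "mssql+pyodbc://user:password@legacy-server/LegacyDb?driver=ODBC+Driver+17+for+SQL+Server" legacy.db --all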
With this idea in hand, I reviewed it with the Supervisor and then scheduled a call with the web developers to review datasette.
During this meeting, I wanted to review:
- The motivation behind why we would want to use it
- How we could leverage it to do Rapid Prototyping
- Give a quick demo of the data from the stored procedure that handles the current data return for the legacy project
In all it took less than 10 minutes to go from nothing to a local instance of datasette running with a prototype JSON API for the web developers to see.
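To give a flavor of what the web developers saw, the demo boiled down to something like this (legacy.db and the table name are placeholders):

# serve the SQLite file locally -- datasette defaults to http://127.0.0.1:8001
datasette legacy.db

# every table (and SQL query) automatically gets a JSON endpoint, e.g.
curl "http://127.0.0.1:8001/legacy/orders.json"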
I'm hoping to see the Web team use this concept more going forward, as I can see huge benefits for Rapid Prototyping of ideas, especially if you already have the data housed in a database. But even if you don't, the datasette ecosystem has tons of tools to get the data from a variety of sources into a SQLite database, and then you can do the rapid prototyping!
My First Python Package
A few months ago I was inspired by Simon Willison and his project Datasette and its related ecosystem to write a Python package for it.
I use toggl to track my time at work and I thought this would be a great opportunity to use that data with Datasette and see if I couldn’t answer some interesting questions, or at the very least, do some neat data discovery.
The purpose of this package is to:
Create a SQLite database containing data from your toggl account
I followed the tutorial for publishing a package to PyPI and did the first few pushes manually. Then, using a GitHub action from one of Simon’s Datasette projects, I was able to automate it when I make a release on GitHub!
Since the initial commit on March 7 (my birthday BTW) I’ve had 10 releases, with the most recent one coming yesterday, which fixed an issue with one of the tables reporting back an API key which, if published on the internet, could be a bad thing ... so hooray for security enhancements!
Anyway, it was a fun project, and got me more interested in authoring Python packages. I’m hoping to do a few more related to Datasette (although I’m not sure what to write honestly!).
Be sure to check out the package on PyPI.org and the source code on GitHub.