Putting it All Together

In this final post I'll be writing up how everything fits together. As a recap, here are the steps I go through to create and publish a new post:

Create Post

  1. Create .md for my new post
  2. Write my words
  3. Edit post
  4. Change status from draft to published

Publish Post

  1. Run make html to generate the SQLite database that powers my site's search tool[1]
  2. Run make vercel to deploy the SQLite database to vercel
  3. Run git add <filename> to add post to be committed to GitHub
  4. Run git commit -m <message> to commit to GitHub
  5. Post to Twitter with a link to my new post

My previous posts have gone over how each step was automated, but now we'll 'throw it all together'.

I updated my Makefile with a new command:

tweet:
    ./tweet.sh

When I run make tweet it calls tweet.sh. I wrote about the tweet.sh file in Auto Generating the Commit Message so I won't go into it deeply here. What it does is automate steps 1 - 5 of the Publish Post section above.
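Since this is the wrap-up post, here's a rough sketch of what the whole flow now looks like end to end (the post title and category here are purely illustrative):

make newpost title='My Great New Post' category='productivity'
# ... write and edit in VS Code, then change Status from draft to published ...
make vercel   # runs make html and deploys the SQLite database to Vercel
make tweet    # runs tweet.sh: git add, git commit "New Post: ...", git push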

And that's it really. I've now been able to automate the file creation and publish process.

Admittedly these are the 'easy' parts. The hard part is the actual writing, but it does remove a ton of potential friction from my workflow and this will hopefully lead to more writing this year.

  1. make vercel actually runs make html so this isn't really a step that I need to do.

Automating the file creation

In my last post Auto Generating the Commit Message I indicated that in this post I would "throw it all together to get to a spot where I can run one make command that will do all of this for me".

I decided to take a brief detour though as I realized I didn't have a good way to create a new post, i.e. the starting point wasn't automated!

In this post I'm going to go over how I create the start of a new post using a Makefile and the command make newpost.

My initial idea was to create a new bash script (similar to the tweet.sh file), but as a first iteration I went in a different direction based on this post How to Slugify Strings in Bash.

The command that was finally arrived at in the post above was:

newpost:
    vim +':r templates/post.md' $(BASEDIR)/content/blog/$$(date +%Y-%m-%d)-$$(echo -n $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md

which was really close to what I needed, but my static site is set up a bit differently and I'm not using vim (I'm using VS Code) to write my words.

The first change I needed to make was to remove the use of vim from the command and instead use touch to create the file:

newpost:
    touch $(BASEDIR)/content/blog/$$(date +%Y-%m-%d)-$$(echo -n $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md

The second was to change the file path for where to create the file. As I've indicated previously, the structure of my content looks like this:

content
├── musings
├── pages
├── productivity
├── professional\ development
└── technology

giving me an updated version of the command that looks like this:

touch content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md

When I run the command make newpost title='Automating the file creation' category='productivity' I get an empty new file created.

Now I just need to populate it with the data.

There are seven bits of metadata that need to be added, but four of them are the same for each post:

Author: ryan
Tags:
Series: Remove if Not Needed
Status: draft

That allows me to have the newpost command look like this:

newpost:
    touch content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Author: ryan" >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Tags: " >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Series: Remove if Not Needed"  >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Status: draft"  >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md

The remaining metadata to be added are:

  • Title
  • Date
  • Slug

Of these, Date and Title are the most straightforward.

Bash has a command called date whose output can be formatted the way I want with %F. Using this I can get the date like this:

echo "Date: $$(date +%F)" >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md

For Title I can take the input parameter title like this:

echo "Title: $${title}" > content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md

Slug is just Title but slugified. Trying to figure out how to do this is how I found the article above.

Using a slightly modified version of the code that generates the file, we get this:

printf "Slug: " >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
echo "$${title}" | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
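To see what that slug pipeline actually produces, it can be run directly in a shell (the title here is just an example):

echo -n "Automating the file creation" | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z
# automating-the-file-creation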

One thing to notice here is the printf. I needed/wanted to use echo -n but make didn't seem to like that. This StackOverflow answer helped me get a fix (using printf), though I'm sure there's a way I can get it to work with echo -n.

Essentially, since this was a first pass, and I'm pretty sure I'm going to end up re-writing this as a shell script, I didn't want to spend too much time getting a perfect answer here.

OK, with all of that, here's the entire newpost recipe I'm using now:

newpost:
    touch content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Title: $${title}" > content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Date: $$(date +%F)" >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Author: ryan" >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Tags: " >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    printf "Slug: " >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "$${title}" | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Series: Remove if Not Needed"  >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md
    echo "Status: draft"  >> content/$$(echo $${category})/$$(echo $${title} | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z.md).md

This allows me to type make newpost and generate a new file for me to start my new post in![1]
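Since I'm pretty sure the rewrite as a shell script is coming anyway, here's a minimal sketch of what that might look like (a hypothetical newpost.sh, not something I'm actually using yet):

#!/bin/bash
# newpost.sh (hypothetical): create a new draft post with its metadata stubbed out
# usage: ./newpost.sh 'Automating the file creation' productivity
title=$1
category=$2

# slugify the title: non-alphanumerics to -, squeeze repeated -, lower case
slug=$(echo -n "$title" | sed -e 's/[^[:alnum:]]/-/g' | tr -s '-' | tr A-Z a-z)
file="content/$category/$slug.md"

printf 'Title: %s\nDate: %s\nAuthor: ryan\nTags: \nSlug: %s\nSeries: Remove if Not Needed\nStatus: draft\n' \
    "$title" "$(date +%F)" "$slug" > "$file"

echo "Created $file"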

  1. When this post was originally published the slug command didn't account for making all of the text lower case. This was fixed in a subsequent commit.

Auto Generating the Commit Message

In my first post of this series I outlined the steps needed in order for me to post. They are:

  1. Run make html to generate the SQLite database that powers my site's search tool[1]
  2. Run make vercel to deploy the SQLite database to vercel
  3. Run git add <filename> to add post to be committed to GitHub
  4. Run git commit -m <message> to commit to GitHub
  5. Post to Twitter with a link to my new post

In this post I'll be focusing on how I automated step 4, Run git commit -m <message> to commit to GitHub.

Automating the "git commit ..." part of my workflow

In order for my GitHub Action to auto post to Twitter, my commit message needs to be in the form of "New Post: ...". What I'm looking for is to be able to have the commit message be something like this:

New Post: Great New Post https://ryancheley.com/yyyy/mm/dd/great-new-post/

This is basically just three parts from the markdown file, the Title, the Date, and the Slug.

In order to get those details, I need to review the structure of the markdown file. For Pelican writing in markdown my file is structured like this:

Title:
Date:
Tags:
Slug:
Series:
Authors:
Status:

My words start here and go on for a bit.

In the last post I wrote about how to git add the files in the content directory. Here, I want to take the file that was added to git and get the first 7 rows, i.e. the details from Title to Status.

The file that was updated that needs to be added to git can be identified by running

find content -name '*.md' -print | sed 's/^/"/g' | sed 's/$/"/g' | xargs git add

Running git status now will display which file was added with the last command and you'll see something like this:

 git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        content/productivity/auto-generating-the-commit-message.md

What I need though is a more easily parsable output. Enter the porcelain flag which, per the docs:

Give the output in an easy-to-parse format for scripts. This is similar to the short output, but will remain stable across Git versions and regardless of user configuration. See below for details.

which is exactly what I needed.

Running git status --porcelain you get this:

❯ git status --porcelain
?? content/productivity/more-writing-automation.md

Now, I just need to get the file path and exclude the status (the ?? above in this case[2]), which I can do by piping the results to sed:

❯ git status --porcelain | sed s/^...//
content/productivity/more-writing-automation.md

The sed portion says

  • search the output string starting at the beginning of the line (^)
  • find the first three characters (...)[3]
  • replace them with nothing (//)

There are a few lines here that I need to get the content of for my commit message:

  • Title
  • Slug
  • Date
  • Status[4]

I can use head to get the first n lines of a file. In this case, I need the first 7 lines of the output from git status --porcelain | sed s/^...//. To do that, I pipe it to head!

git status --porcelain | sed s/^...// | xargs head -7

That command will return this:

Title: Auto Generating the Commit Message
Date: 2022-01-24
Tags: Automation
Slug: auto-generating-the-commit-message
Series: Auto Deploying my Words
Authors: ryan
Status: draft

In order to get the Title, I'll pipe this output to grep to find the line with Title

git status --porcelain | sed s/^...// | xargs head -7 | grep 'Title: '

which will return this

Title: Auto Generating the Commit Message

Now I just need to remove the leading Title: and I've got the title I'm going to need for my Commit message!

git status --porcelain | sed s/^...// | xargs head -7 | grep 'Title: ' | sed -e 's/Title: //g'

which returns just

Auto Generating the Commit Message

I do this for each of the parts I need:

  • Title
  • Slug
  • Date
  • Status

Now, this is getting to have a lot of parts, so I'm going to throw it into a bash script file called tweet.sh. The contents of the file look like this:

TITLE=`git status --porcelain | sed s/^...// | xargs head -7 | grep 'Title: ' | sed -e 's/Title: //g'`
SLUG=`git status --porcelain | sed s/^...// | xargs head -7 | grep 'Slug: ' | sed -e 's/Slug: //g'`
POST_DATE=`git status --porcelain | sed s/^...// | xargs head -7 | grep 'Date: ' | sed -e 's/Date: //g' | head -c 10 | grep '-' | sed -e 's/-/\//g'`
POST_STATUS=` git status --porcelain | sed s/^...// | xargs head -7 | grep 'Status: ' | sed -e 's/Status: //g'`

You'll see above that the Date piece is a little more complicated, but it's just doing a find and replace on the - to update them to / for the URL.
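For example, the date from the metadata gets reshaped like this (using the date from the sample output above):

echo "2022-01-24" | sed -e 's/-/\//g'
# 2022/01/24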

Now that I've got all of the pieces I need, it's time to start putting them together

I define a new variable called URL and set it

URL="https://ryancheley.com/$POST_DATE/$SLUG/"

and the commit message

MESSAGE="New Post: $TITLE $URL"

Now, all I need to do is wrap this in an if statement so the command only runs when the Status is published:

if [ $POST_STATUS = "published" ]
then
    MESSAGE="New Post: $TITLE $URL"

    git commit -m "$MESSAGE"

    git push github main
fi

Putting this all together (including the git add from my previous post) and the tweet.sh file looks like this:

# Add the post to git
find content -name '*.md' -print | sed 's/^/"/g' | sed 's/$/"/g' | xargs git add


# Get the parts needed for the commit message
TITLE=`git status --porcelain | sed s/^...// | xargs head -7 | grep 'Title: ' | sed -e 's/Title: //g'`
SLUG=`git status --porcelain | sed s/^...// | xargs head -7 | grep 'Slug: ' | sed -e 's/Slug: //g'`
POST_DATE=`git status --porcelain | sed s/^...// | xargs head -7 | grep 'Date: ' | sed -e 's/Date: //g' | head -c 10 | grep '-' | sed -e 's/-/\//g'`
POST_STATUS=` git status --porcelain | sed s/^...// | xargs head -7 | grep 'Status: ' | sed -e 's/Status: //g'`

URL="https://ryancheley.com/$POST_DATE/$SLUG/"

if [ $POST_STATUS = "published" ]
then
    MESSAGE="New Post: $TITLE $URL"

    git commit -m "$MESSAGE"

    git push github main
fi

When this script is run it will find an updated or added markdown file (i.e. article) and add it to git. It will then parse the file to get data about the article. If the article is set to published it will commit the file with a message and will push to GitHub. Once at GitHub, the Tweeting action I wrote about will tweet my commit message!

In the next (and last) article, I'm going to throw it all together to get to a spot where I can run one make command that will do all of this for me.

Caveats

The script above works, but if you have multiple articles that you're working on at the same time, it will fail pretty spectacularly. The final version of the script has guards against that and looks like this

  1. make vercel actually runs make html so this isn't really a step that I need to do.
  2. Other values could just as easily be M or A
  3. Why the first three characters? Because that's how porcelain outputs the status.
  4. I will also need the Status to do some conditional logic, otherwise I may have a post that is in draft status that I want to commit and the GitHub Action will run, posting a tweet with an article and URL that don't actually exist yet.

git add filename automation

In my last post I mentioned the steps needed in order for me to post. They are:

  1. Run make html to generate the SQLite database that powers my site's search tool[1]
  2. Run make vercel to deploy the SQLite database to vercel
  3. Run git add <filename> to add post to be committed to GitHub
  4. Run git commit -m <message> to commit to GitHub
  5. Post to Twitter with a link to my new post

In that post I focused on number 5, posting to Twitter with a link to the post using GitHub Actions.

In this post I'll be focusing on how I automated step 3, "Run git add <filename> to add post to be committed to GitHub".

Automating the git add ... part of my workflow

I have my pelican content set up so that the category of a post is determined by the directory a markdown file is placed in. The structure of my content folder looks like this:

content
├── musings
├── pages
├── productivity
├── professional\ development
└── technology

If you just run git status on a directory it will give you the status of all of the files in that directory that have been changed, added, or removed. Something like this:

 git status
On branch main
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        content/productivity/more-writing-automation.md
        Makefile
        metadata.json

That means that when you run git add . all of those files will be added to git. For my purposes all that I need is the one updated file in the content directory.

The command find does a great job of taking a directory and allowing you to search for what you want in that directory. You can run something like

find content -name '*.md' -print

And it will return essentially what you're looking for. Something like this:

content/pages/404.md
content/pages/curriculum-vitae.md
content/pages/about.md
content/pages/brag.md
content/productivity/adding-the-new-file.md
content/productivity/omnifocus-3.md
content/productivity/making-the-right-choice-or-how-i-learned-to-live-with-limiting-my-own-technical-debt-and-just-be-happy.md
content/productivity/auto-tweeting-new-post.md
content/productivity/my-outlook-review-process.md
content/productivity/rules-and-actions-in-outlook.md
content/productivity/auto-generating-the-commit-message.md
content/productivity/declaring-omnifocus-bankrupty.md

However, because one of my categories has a space in its name (professional development), if you pipe the output of this to xargs git add it fails with the error:

fatal: pathspec 'content/professional' did not match any files

In order to get around this, you need to surround the output of find with double quotes ("). You can do this by using sed:

find content -name '*.md' -print | sed 's/^/"/g' | sed 's/$/"/g'

What this says is: take the output of find and pipe it to sed and use a global find and replace to add a " to the start of the line (that's what the ^ does), and then pipe that to sed again and use a global find and replace to add a " to the end of the line (that's what the $ does).

Now, when you run

find content -name '*.md' -print | sed 's/^/"/g' | sed 's/$/"/g'

The output looks like this:

"content/pages/404.md"
"content/pages/curriculum-vitae.md"
"content/pages/about.md"
"content/pages/brag.md"
"content/productivity/adding-the-new-file.md"
"content/productivity/omnifocus-3.md"
"content/productivity/making-the-right-choice-or-how-i-learned-to-live-with-limiting-my-own-technical-debt-and-just-be-happy.md"
"content/productivity/auto-tweeting-new-post.md"
"content/productivity/my-outlook-review-process.md"
"content/productivity/rules-and-actions-in-outlook.md"
"content/productivity/auto-generating-the-commit-message.md"
"content/productivity/declaring-omnifocus-bankrupty.md"

Now, you can pipe your output to xargs git add and there is no error!

The final command looks like this:

find content -name '*.md' -print | sed 's/^/"/g' | sed 's/$/"/g' | xargs git add
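As an aside, another common way to handle paths with spaces (not what I'm using here, but worth knowing about) is to have find emit NUL-terminated paths and tell xargs to expect them:

find content -name '*.md' -print0 | xargs -0 git add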

In the next post, I'll walk through how I generate the commit message to be used in the automatic tweet!

  1. make vercel actually runs make html so this isn't really a step that I need to do.

Auto Tweeting New Post

Each time I write something for this site there are several steps that I go through to make sure that the post makes its way to where people can see it.

  1. Run make html to generate the SQLite database that powers my site's search tool[1]
  2. Run make vercel to deploy the SQLite database to vercel
  3. Run git add <filename> to add post to be committed to GitHub
  4. Run git commit -m <message> to commit to GitHub
  5. Post to Twitter with a link to my new post

If there are more than 2 things to do, I'm totally going to forget to do one of them.

The above steps are all automatable, but the one I wanted to tackle first was the automated tweet. Last night I figured out how to tweet with a GitHub Action.

There were a few things to do to get the auto tweet to work:

  1. Find a GitHub Action in the Marketplace that did the auto tweet (or try to write one if I couldn't find one)
  2. Set up a twitter app with Read and Write privileges
  3. Set the necessary secrets for the repo (API Key, API Key Secret, Access Token, Access Token Secret, Bearer)
  4. Test the GitHub Action

The action I chose was send-tweet-action. It's got easy-to-read documentation on what is needed. Honestly the hardest part was getting a Twitter app set up with Read and Write privileges.

I'm still not sure how to do it, honestly. I was lucky enough that I already had an app sitting around with Read and Write from the WordPress blog I had previously, so I just regenerated the keys for that one and used them.

The last bit was just testing the action and seeing that it worked as expected. It was pretty cool running an action and then seeing a tweet in my timeline.

The TIL for this was that GitHub Actions can have conditionals. This is important because I don't want to generate a new tweet each time I commit to main. I only want that to happen when I have a new post.

To do that, you just need this in the GitHub Action:

    if: "contains(github.event.head_commit.message, '<String to Filter on>')"

In my case, the <String to Filter on> is New Post:.

The send-tweet-action has a status field which is the text tweeted. I can use the github.event.head_commit.message in the action like this:

    ${{ github.event.head_commit.message }}

Now when I have a commit message that starts with 'New Post:' against main I'll have a tweet get sent out too!

This got me to thinking that I can/should automate all of these steps.

With that in mind, I'm going to work on getting the process down to just having to run a single command. Something like:

    make publish "New Post: Title of my Post https://www.ryancheley.com/yyyy/mm/dd/slug/"
  1. make vercel actually runs make html so this isn't really a step that I need to do.

Adding Search to My Pelican Blog with Datasette

Last summer I migrated my blog from WordPress to Pelican. I did this for a couple of reasons (see my post here), but one thing that I was a bit worried about when I migrated was that Pelican's offering for site search didn't look promising.

There was an outdated plugin called tipue-search, but when I was looking at it I could tell it was on its last legs.

I thought about it, and since my blog isn't super highly trafficked AND you can use Google to search a specific site, I figured I could wait a bit and see what options came up.

After waiting a few months, I decided it would be interesting to see if I could write a SQLite utility to get the data from my blog, add it to a SQLite database, and then use datasette to serve it up.

I wrote the beginning scaffolding for it last August in a utility called pelican-to-sqlite, but I ran into several technical issues I just couldn't overcome. I thought about giving up, but sometimes you just need to take a step away from a thing, right?

After the first of the year I decided to revisit my idea, but first looked to see if there was anything new for Pelican search. I found a plugin called search that was released last November and is actively being developed, but as I read through the documentation there was just A LOT of stuff:

  • stork
  • requirements for the structure of your page html
  • static asset hosting
  • deployment requires updating your nginx settings

These all looked a bit scary to me, and since I've done some work using datasette I thought I'd revisit my initial idea.

My First Attempt

As I mentioned above, I wrote the beginning scaffolding late last summer. In my first attempt I tried to use a few tools to read the md files and parse their yaml structure, and it just didn't work out. I also realized that Pelican can use reStructuredText, so any attempt to parse just the md files would never work for those file types.

My Second Attempt

The Plugin

During the holiday I thought a bit about approaching the problem from a different perspective. My initial idea was to try and write a datasette-style package to read the data from pelican. I decided instead to see if I could write a pelican plugin to get the data and then add it to a SQLite database. It turns out I can, and it's not that hard.

Pelican uses signals to make plugin creation a pretty easy thing. I read a post and the documentation and was able to start my effort to refactor pelican-to-sqlite.

From The missing Pelican plugins guide I saw lots of different options, but realized that the signal article_generator_write_article was what I needed to get the article content.

I then also used sqlite_utils to insert the data into a database table.

def save_items(record: dict, table: str, db: sqlite_utils.Database) -> None:  # pragma: no cover
    db[table].insert(record, pk="slug", alter=True, replace=True)

Below is the method I wrote to take the content and turn it into a dictionary which can be used in the save_items method above.

def create_record(content) -> dict:
    record = {}
    author = content.author.name
    category = content.category.name
    post_content = html2text.html2text(content.content)
    published_date = content.date.strftime("%Y-%m-%d")
    slug = content.slug
    summary = html2text.html2text(content.summary)
    title = content.title
    url = "https://www.ryancheley.com/" + content.url
    status = content.status
    if status == "published":
        record = {
            "author": author,
            "category": category,
            "content": post_content,
            "published_date": published_date,
            "slug": slug,
            "summary": summary,
            "title": title,
            "url": url,
        }
    return record

Putting these together I get a method used by the Pelican Plugin system that will generate the data I need for the site AND insert it into a SQLite database

def run(_, content):
    record = create_record(content)
    save_items(record, "content", db)

def register():
    signals.article_generator_write_article.connect(run)

The html template update

I use a custom implementation of the Smashing Magazine theme. This allows me to make some edits, though I mostly keep it pretty stock. In this case it allowed me to make a small edit to the base.html template to include a search form.

In order to add the search form I added the following code to base.html below the nav tag:

    <section class="relative h-8">
    <section class="absolute inset-y-0 right-10 w-128">
    <form
    class="pl-4"
    action="https://search-ryancheley.vercel.app/pelican/article_search?text=name"
    method="get">
            <label for="site-search">Search the site:</label>
            <input type="search" id="site-search" name="text"
                    aria-label="Search through site content">
            <button class="rounded-full w-16 hover:bg-blue-300">Search</button>
    </form>
    </section>

Putting it all together with datasette and Vercel

Here's where the magic starts. Publishing data to Vercel with datasette is extremely easy with the datasette plugin datasette-publish-vercel.

You do need to have the Vercel CLI installed, but once you do, the steps for publishing your SQLite database are really well explained in the datasette-publish-vercel documentation.
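For reference, my assumption is that the prerequisites boil down to the Vercel CLI (distributed via npm) and the plugin itself:

npm install -g vercel
pip install datasette datasette-publish-vercel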

One final step was to add a make command so I could type one quick command which would create my content, generate the SQLite database, AND publish the SQLite database to Vercel. I added the below to my Makefile:

vercel:
    { \
    echo "Generate content and database"; \
    make html; \
    echo "Content generation complete"; \
    echo "Publish data to vercel"; \
    datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \
    echo "Publishing complete"; \
    }

The line

datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \

has an extra flag passed to it (--metadata) which allows me to use metadata.json to create a saved query which I call article_search. The contents of that saved query are:

select summary as 'Summary', url as 'URL', published_date as 'Published Data' from content where content like '%' || :text || '%' order by published_date

This is what allows the action in the form above to have a URL to link to in datasette and return data!
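As an illustration, the same saved query can also be hit from the command line; appending .json to the query name asks datasette for machine-readable results instead of the HTML page (the search term is just an example):

curl "https://search-ryancheley.vercel.app/pelican/article_search.json?text=datasette"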

With just a few tweaks I'm able to include a search tool, powered by datasette for my pelican blog. Needless to say, I'm pretty pumped.

Next Steps

There are still a few things to do:

  1. separate search form html file (for my site)
  2. formatting datasette to match site (for my vercel powered instance of datasette)
  3. update the README for pelican-to-sqlite package to better explain how to fully implement
  4. Get pelican-to-sqlite added to the pelican-plugins page

The Well Maintained Test

At the beginning of November Adam Johnson tweeted

I’ve come up with a test that we can use to decide whether a new package we’re considering depending on is well-maintained.

and linked to an article he wrote.

With the help of Twitter, he came up with twelve questions to ask of any library that you're looking at:

  1. Is it described as “production ready”?
  2. Is there sufficient documentation?
  3. Is there a changelog?
  4. Is someone responding to bug reports?
  5. Are there sufficient tests?
  6. Are the tests running with the latest <Language> version?
  7. Are the tests running with the latest <Integration> version?
  8. Is there a Continuous Integration (CI) configuration?
  9. Is the CI passing?
  10. Does it seem relatively well used?
  11. Has there been a commit in the last year?
  12. Has there been a release in the last year?

I thought it would be interesting to turn that checklist into a Click App using Simon Willison's Click App Cookiecutter.

I set out in earnest to do just that on November 8th.

What started out as just a simple Click app quickly turned into a pretty robust CLI using Will McGugan's Rich library.

I started by using the GitHub API to try and answer the questions, but quickly found that it couldn't answer them all. Then I came across the PyPI API, which helped to answer almost all of them programmatically.
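For anyone curious, the PyPI side is just a JSON endpoint per package, so it's easy to poke at directly; for example:

curl -s https://pypi.org/pypi/the-well-maintained-test/json | python -m json.tool | head -n 20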

There's still a bit of work to do to get it to where I want it, but it's pretty sweet that I can now run a simple command and review the output to see if the package is well maintained.

You can even try it on the package I wrote!

the-well-maintained-test https://github.com/ryancheley/the-well-maintained-test

Which will return (as of this writing) the output below:

1. Is it described as 'production ready'?
        The project is set to Development Status Beta
2. Is there sufficient documentation?
        Documentation can be found at
https://github.com/ryancheley/the-well-maintained-test/blob/main/README.md
3. Is there a changelog?
        Yes
4. Is someone responding to bug reports?
        The maintainer took 0 days to respond to the bug report
        It has been 2 days since a comment was made on the bug.
5. Are there sufficient tests? [y/n]: y
        Yes
6. Are the tests running with the latest Language version?
        The project supports the following programming languages
                - Python 3.7
                - Python 3.8
                - Python 3.9
                - Python 3.10

7. Are the tests running with the latest Integration version?
        This project has no associated frameworks
8. Is there a Continuous Integration (CI) configuration?
        There are 2 workflows
         - Publish Python Package
         - Test

9. Is the CI passing?
        Yes
10.  Does it seem relatively well used?
        The project has the following statistics:
        - Watchers: 0
        - Forks: 0
        - Open Issues: 1
        - Subscribers: 1
11.  Has there been a commit in the last year?
        Yes. The last commit was on 11-20-2021 which was 2 days ago
12. Has there been a release in the last year?
        Yes. The last commit was on 11-20-2021 which was 2 days ago

There is still one question that I haven't been able to answer programmatically with an API and that is:

Are there sufficient tests?

When that question comes up, you're prompted in the terminal to answer either y/n.

But, it does leave room for a fix by someone else!

Styling Clean Up with Bash

I have a side project I've been working on for a while now. One thing that happened over time is that the styling of the site grew organically. I'm not a designer, and I didn't have a master set of templates or design principles guiding the development. I kind of hacked it together and made it look "nice enough".

That was until I really started going from one page to another and realized that the styling of various pages wasn't just a little off ... but A LOT off.

As an aside, I'm using tailwind as my CSS Framework

I wanted to make some changes to the styling and realized I had two choices:

  1. Manually go through each html template (the project is a Django project) and catalog the styles used for each element

OR

  2. Try and write a bash command to do it for me

Well, before we jump into either choice, let's see how many templates there are to review!

As I said above, this is a Django project. I keep all of my templates in a single templates directory, with each app having its own subdirectory.

I was able to use this one line to count the number of html files in the templates directory (and all of the sub directories as well)

ls -R templates | grep html | wc -l

There are 3 parts to this:

  1. ls -R templates will recursively list all of the files in the templates directory and any subdirectories it encounters
  2. grep html will make sure to only return those lines containing html
  3. wc -l uses the word, line, character, and byte count tool to return the number of lines returned by the previous command

In each case one command is piped to the next.

This resulted in 41 html files.
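A roughly equivalent count, if you'd rather not parse ls output, is to lean on find directly:

find templates -name '*.html' | wc -l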

OK, I'm not going to want to manually review 41 files. Looks like we'll be going with option 2, "Try and write a bash command to do it for me"

In the end the bash script is actually relatively straightforward. We're just using grep twice, but it's the options on grep (as well as the regex used) that make the magic happen.

The first thing I want to do is find all of the lines that have the string class= in them. Since these are html templates, that's a pretty sure-fire way to find all of the places where the styles I am interested in are being applied.

I use a package called djhtml to lint my templates, but just in case something got missed, I want to ignore case when doing my regex, i.e., class= should be found, but so should cLass= or Class=. In order to get that I need to have the i flag enabled.

Since the html files may be in the base templates directory or one of its subdirectories, I need to search recursively, so I include the r flag as well.

This gets us

grep -ri "class=" templates/*

That command will output a whole bunch of lines like this:

templates/tasks/steps_lists.html:    <table class="table-fixed w-full border text-center">
templates/tasks/steps_lists.html:                <th class="w-1/2 flex justify-left-2 p-2">Task</th>
templates/tasks/steps_lists.html:                <th class="w-1/4 justify-center p-2">Edit</th>
templates/tasks/steps_lists.html:                <th class="w-1/4 justify-center p-2">Delete</th>
templates/tasks/steps_lists.html:                    <td class="flex justify-left-2 p-2">
templates/tasks/steps_lists.html:                    <td class="p-2 text-center">
templates/tasks/steps_lists.html:                        <a class="block hover:text-gray-600"
templates/tasks/steps_lists.html:                            <i class="fas fa-edit"></i>
templates/tasks/steps_lists.html:                    <td class="p-2 text-center">
templates/tasks/steps_lists.html:                        <a class="block hover:text-gray-600"
templates/tasks/steps_lists.html:                            <i class="fas fa-trash-alt"></i>
templates/tasks/step_form.html:        <section class="bg-gray-400 text-center py-2">
templates/tasks/step_form.html:            <button type="submit" class="bg-blue-500 hover:bg-blue-700 text-white font-bold py-2 px-4 rounded">{{view.action|default:"Add"}} </button>

Great! We have the data we need, now we just want to clean it up.

Again, we'll use grep, only this time we want to look for an honest-to-goodness regular expression. We're trying to identify everything in between the first open angle bracket (<) and the first closed angle bracket (>).

A bit of googling, searching Stack Overflow, and playing with the great site regex101.com gets you this:

<[^\/].*?>

OK, we have the regular expression we need, but what options do we need to use in grep? In this case we actually have two options:

  1. Use egrep (which allows for extended regular expressions)
  2. Use grep -E to make grep behave like egrep

I chose to go with option 2, use grep -E. Next, we want to return ONLY the part of the line that matches the regex. For that, we can use the o option. Putting it all together we get:

grep -Eo "<[^\/].*?>"

Now, we can pipe the results from our first command into our second command and we get this:

grep -ri "class=" templates/* | grep -Eo "<[^\/].*?>"

This will output to standard out, but next I really want to use a tool for aggregation and comparison. It was at this point that I decided the best next tool to use would be Excel. So I sent the output to a text file and then opened that text file in Excel to do the final review. To output the above to a text file called tailwind.txt we run:

grep -ri "class=" templates/* | grep -Eo "<[^\/].*?>" > tailwind.txt

With these results I was able to find several styling inconsistencies and then fix them up. In all it took me a few nights of working out the bash commands and then a few more nights to get the styling consistent. In the process I learned so much about grep and egrep. It was a good exercise to have gone through.
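As an aside, if you'd rather stay in the terminal than jump to Excel, a quick and dirty aggregation of the same output can be done with sort and uniq:

grep -ri "class=" templates/* | grep -Eo "<[^\/].*?>" | sort | uniq -c | sort -rn | head -n 20
# counts each distinct opening tag/class combination, most common first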

djhtml and justfile

I had read about a project called djhtml and wanted to use it on one of my projects. The documentation is really good for adding it to precommit-ci, but I wasn't sure what I needed to do to just run it on the command line.

It took a bit of googling, but I was finally able to get the right incantation of commands to be able to get it to run on my templates:

djhtml -i $(find templates -name '*.html' -print)

But of course, because I have the memory of a goldfish and this is more than 3 commands to try to remember to string together, instead of telling myself I would remember it, I simply added it to a justfile and now have this recipe:

# applies djhtml linting to templates
djhtml:
    djhtml -i $(find templates -name '*.html' -print)

This means that I can now run just djhtml and I can apply djhtml's linting to my templates.

Pretty darn cool if you ask me. But then I got to thinking, I can make this a bit more general for 'linting' type activities. I include all of these in my precommit-ci, but I figured, what the heck, might as well have a just recipe for all of them!

So I refactored the recipe to be this:

# applies linting to project (black, djhtml, flake8)
lint:
    djhtml -i $(find templates -name '*.html' -print)
    black .
    flake8 .

And now I can run all of these linting style libraries with a single command: just lint.

Prototyping with Datasette

At my job I work with some really talented Web Developers that are saddled with a pretty creaky legacy system.

We're getting ready to start on a new(ish) project where we'll be taking an old project built on this creaky legacy system (VB.net) and re-implementing it on a C# backend and an Angular front end. We'll be working on a lot of new features and integrations so it's worth rebuilding it versus shoehorning the new requirements into the legacy system.

The details of the project aren't really important. What is important is that as I was reviewing the requirements with the Web Developer Supervisor he said something to the effect of, "We can create a proof of concept and just hard code the data in a json file to fake the backend."

The issue is ... we already have the data that we'll need in a MS SQL database (it's what is running the legacy version); it's just a matter of getting it into the right json "shape".

Creating a 'fake' json object that kind of/maybe mimics the real data is something we've done before, and it ALWAYS seems to bite us in the butt. We don't account for proper pagination, or the real lengths of data in the fields or NULL values or whatever shenanigans happen to befall real world data!

This got me thinking about Simon Willison's project Datasette and using it to prototype the API end points we would need.

I had been trying to figure out how to use db-to-sqlite to extract data from a MS SQL database into a SQLite database and was successful (see my PR to db-to-sqlite here).

With this idea in hand, I reviewed it with the Supervisor and then scheduled a call with the web developers to review datasette.

During this meeting, I wanted to review:

  1. The motivation behind why we would want to use it
  2. How we could leverage it to do Rapid Prototying
  3. Give a quick demo using data from the stored procedure that currently returns the data for the legacy project.

In all it took less than 10 minutes to go from nothing to a local instance of datasette running with a prototype JSON API for the web developers to see.
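The general shape of that demo, with the connection string and table name as placeholders (the exact driver and URL details depend on your SQL Server setup), was roughly:

# pull the relevant table(s) out of MS SQL into a local SQLite file
db-to-sqlite "mssql+pyodbc://user:password@my_dsn" prototype.db --table legacy_table

# serve it locally; the JSON API endpoints come along for free
datasette prototype.db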

I'm hoping to see the Web team use this concept more going forward as I can see huge benefits for Rapid Prototyping of ideas, especially if you already have the data housed in a database. But even if you don't, datasette has tons of tools to get data from a variety of sources into a SQLite database, and then you can do the rapid prototyping!

