Inserting a URL in Markdown in VS Code

Since I switched my blog to Pelican last summer I've been using VS Code as my writing app. And it's really good for writing, not just code but prose as well.

The one problem I've had is there's no keyboard shortcut for links when writing in markdown ... at least not a default / native keyboard shortcut.

In other (macOS) writing apps you just select the text and press ⌘+k and boop! There's a markdown link set up for you. But not so much in VS Code.

I finally got to the point where the 'friction' this caused may have actually been keeping me from writing!

So, I decided to figure out how to fix that.

I did have to do a bit of googling, but eventually found this StackOverflow answer.

Essentially the answer is:

  1. Open the Command Palette: ⌘+Shift+P
  2. Select Preferences: Open Keyboard Shortcuts (JSON)
  3. Update the keybindings.json file to include a new key

The new key looks like this:

{
    "key": "cmd+k",
    "command": "editor.action.insertSnippet",
    "args": {
        "snippet": "[${TM_SELECTED_TEXT}]($0)"
    },
    "when": "editorHasSelection && editorLangId == markdown "
}
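For what it's worth, ${TM_SELECTED_TEXT} is a built-in VS Code snippet variable that expands to whatever text is currently selected, and $0 is the final cursor position — so after pressing ⌘+k the selection becomes the link text and the cursor lands between the parentheses, ready for the URL.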

Honestly, it's little things like this that can make life so much easier and more fun. Now I just need to remember to do this on my work computer 😀

Logging Part 2

In my previous post I wrote about inline logging, that is, using logging in the code without a configuration file of some kind.

In this post I'm going to go over setting up a configuration file to support the various different needs you may have for logging.

Previously I mentioned this scenario:

Perhaps the DevOps team wants robust logging messages on anything ERROR and above, but the application team wants to have INFO and above in a rotating file name schema, while the QA team needs to have the DEBUG and up output to standard out.

Before we get into how we might implement something like the above, let's review the parts of the logging system: formatters, handlers, and loggers.

Formatters

In a logging configuration file you can have multiple formatters specified. The above example doesn't state WHAT each team needs, so let's define it here:

  • DevOps: They need to know when the error occurred, what the level was, and what module the error came from
  • Application Team: They need to know when the error occurred, the level, what module and line
  • The QA Team: They need to know when the error occurred, the level, what module and line, and they need a stack trace

For the DevOps team we can define a formatter as such¹:

'%(asctime)s - %(levelname)s - %(module)s'

The Application team would have a formatter like this:

'%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'

while the QA team would have one like this:

'%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'

Handlers

The Handler controls where the data from the log is going to be sent. There are several kinds of handlers, but based on our requirements above, we'll only be looking at three of them (see the documentation for more types of handlers).

From the example above we know that the DevOps team wants to save the output to a file, while the Application team wants the log data saved in a way that keeps the log files from getting too big. Finally, we know that the QA team wants the output to go directly to stdout.

We can handle all of these requirements via the handlers. In this case, we'd use logging.FileHandler, logging.handlers.RotatingFileHandler, and logging.StreamHandler.

Configuration File

Above we defined the formatter and handler. Now we start to put them together. The basic format of a logging configuration has 3 parts (as described above). The example I use below is YAML, but a dictionary or a conf file would also work.

Below we see five keys in our YAML file:

version: 1
formatters:
handlers:
loggers:
root:
  level:
  handlers:

The version key is to allow for future versions in case any are introduced. As of this writing, there is only 1 version ... and it's version: 1

Formatters

We defined the formatters above, so let's add them here and give them names that map to the teams:

version: 1
formatters:
  devops:
    format: '%(asctime)s - %(levelname)s - %(module)s'
  application:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
  qa:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'

Right off the bat we can see that the formatters for application and qa are the same, so we can either keep them separate to allow for easier updates in the future (and to be more explicit) OR we can merge them into a single formatter to adhere to DRY principles.

I'm choosing to go with option 1 and keep them separate.

Handlers

Next, we add our handlers. Again, we give them names that map to the teams. For each handler we set a level (which will map to the level from the specs above) and attach the matching formatter via the formatter key.

Additionally, each handler has keys associated based on the type of handler selected. For example, logging.FileHandler needs to have the filename specified, while logging.StreamHandler needs to specify where to output to.

When using logging.handlers.RotatingFileHandler we have to specify a few more items in addition to a filename so the logger knows how and when to rotate the log writing.

version: 1
formatters:
  devops:
    format: '%(asctime)s - %(levelname)s - %(module)s'
  application:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
  qa:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
handlers:
  devops:
    class: logging.FileHandler
    level: ERROR
    formatter: devops
    filename: 'devops.log'
  application:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: application
    filename: 'application.log'
    mode: 'a'
    maxBytes: 10000
    backupCount: 3
  qa:
    class: logging.StreamHandler
    level: DEBUG
    formatter: qa
    stream: ext://sys.stdout

What the setup above does for the devops handler is send the log data to a file called devops.log, while the application handler outputs to a rotating set of files based on application.log. The application.log file will hold a maximum of 10,000 bytes. Once the file is 'full', the handler renames application.log to application.log.1 (shifting any existing backups up a number) and starts writing to a fresh application.log. With backupCount: 3 it keeps up to three backups, giving the application team the following files:

  • application.log
  • application.log.1
  • application.log.2
  • application.log.3

Finally, the handler for QA will output directly to stdout.

Loggers

Now we can take all of the work we did above to create the formatters and handlers and use them in the loggers!

Below we see how the loggers are set up in the configuration file. It seems a bit redundant because I've named my formatters, handlers, and loggers all matching terms, but 🤷‍♂️

The only new thing we see in the configuration below is propagate: no for each of the loggers. Named loggers like these have the root logger as a parent, so this prevents the logging information from being sent 'up' the chain and handled a second time by the root logger's handlers.

The documentation has a good diagram showing the workflow for how the propagate works.

Below we can see what the final, fully formed logging configuration looks like.

version: 1
formatters:
  devops:
    format: '%(asctime)s - %(levelname)s - %(module)s'
  application:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
  qa:
    format: '%(asctime)s - %(levelname)s - %(module)s - %(lineno)s'
handlers:
  devops:
    class: logging.FileHandler
    level: ERROR
    formatter: devops
    filename: 'devops.log'
  application:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: application
    filename: 'application.log'
    mode: 'a'
    maxBytes: 10000
    backupCount: 3
  qa:
    class: logging.StreamHandler
    level: DEBUG
    formatter: qa
    stream: ext://sys.stdout
loggers:
  devops:
    level: ERROR
    handlers: [devops]
    propagate: no
  application:
    level: INFO
    handlers: [application]
    propagate: no
  qa:
    level: DEBUG
    handlers: [qa]
    propagate: no
root:
  level: ERROR
  handlers: [devops, application, qa]
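As a quick preview of how this gets used, here's a minimal sketch of loading the file with logging.config.dictConfig (this assumes the config is saved as logging_config.yaml and that PyYAML is installed):

import logging
import logging.config

import yaml

# read the YAML file into a dict and hand it to dictConfig
with open("logging_config.yaml") as f:
    config = yaml.safe_load(f)

logging.config.dictConfig(config)

# each team grabs its own logger by name
qa_logger = logging.getLogger("qa")
qa_logger.debug("debug output for the QA team")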

In my next post I'll write about how to use the above configuration file to allow the various teams to get the log output they need.

  1. Full documentation on what is available for the formatters can be found here: https://docs.python.org/3/library/logging.html#logrecord-attributes

Logging Part 1

Logging

Last year I worked on an update to the package tryceratops with Gui Latrova to include a verbose flag for logging.

Honestly, Gui was a huge help and I wrote about my experience here but I didn't really understand why what I did worked.

Recently I decided that I wanted to better understand logging, so I dove into some posts from Gui and sat down and read the documentation on the logging module from the standard library.

My goal with this was to (1) be able to use logging in my projects, and (2) write something that may be able to help others.

Full disclosure, Gui has a really good article explaining logging and I think everyone should read it. My notes below are a synthesis of his article, my understanding of the documentation from the standard library, and the Python HowTo written in a way to answer the Five W questions I was taught in grade school.

The Five W's

Who are the generated logs for?

Anyone trying to troubleshoot an issue, or monitor the history of actions that have been logged in an application.

What is written to the log?

The formatter determines what to display or store.

When is data written to the log?

The logging level determines when to log the issue.

Where is the log data sent to?

The handler determines where to send the log data whether that's a file, or stdout.

Why would I want to use logging?

To keep a history of actions taken during the execution of your code.

How is the data sent to the log?

The loggers determine how to bundle all of it together through calls to various methods.

Examples

Let's say I want a logger called my_app_errors that writes all ERROR level incidents and higher to a file, recording the date and time, level, message, and logger name, along with a stack trace of the error. I could do the following:

import logging

message = 'oh no! an error occurred'
formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s - %(name)s')
logger = logging.getLogger('my_app_errors')
fh = logging.FileHandler('errors.log')
fh.setFormatter(formatter)
logger.addHandler(fh)
logger.error(message, stack_info=True)

The code above would generate something like this to a file called errors.log

2022-03-28 19:45:49,188 - ERROR - oh no! an error occurred - my_app_errors
Stack (most recent call last):
  File "/Users/ryan/Documents/github/logging/test.py", line 9, in <module>
    logger.error(message, stack_info=True)

If I want a logger that will do all of the above AND output debug information to the console I could:

import logging

message = 'oh no! an error occurred'

logger = logging.getLogger('my_app_errors')
logger.setLevel(logging.DEBUG)  # without this, DEBUG records get filtered at the default WARNING level

ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)  # the console gets everything from DEBUG up
fh = logging.FileHandler('errors.log')
fh.setLevel(logging.ERROR)  # the file only gets ERROR and above

formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s - %(name)s')

fh.setFormatter(formatter)
ch.setFormatter(formatter)

logger.addHandler(fh)
logger.addHandler(ch)

logger.error(message, stack_info=True)
logger.debug(message, stack_info=True)

Again, the code above would generate something like this to a file called errors.log

2022-03-28 19:45:09,406 - ERROR - oh no! an error occurred - my_app_errors
Stack (most recent call last):
  File "/Users/ryan/Documents/github/logging/test.py", line 18, in <module>
    logger.error(message, stack_info=True)

but it would also output to stderr in the terminal something like this:

2022-03-27 13:18:45,367 - ERROR - oh no! an error occurred - my_app_errors
Stack (most recent call last):
  File "<stdin>", line 1, in <module>

The above is a bit hard to scale though. What happens when we want to have multiple formatters, for different levels, that get output to different places? We can incorporate all of that into something like what we see above, OR we can start to leverage logging configuration files.

Why would we want to have multiple formatters? Perhaps the DevOps team wants robust logging messages on anything ERROR and above, but the application team wants to have INFO and above in a rotating file name schema, while the QA team needs to have the DEBUG and up output to standard out.

You CAN do all of this inline with the code above, but would you really want to? Probably not.

Enter configuration files to allow easier management of log files (and a potential way to make everyone happy) which I'll cover in the next post.

I made a Slackbot!

Building my first Slack Bot

I had added a project to my OmniFocus database in November of 2021 which was, "Build a Slackbot", after watching a video by Mason Egger. I had hoped that I would be able to spend some time on it over the holidays, but I was never able to really find the time.

A few weeks ago, Bob Belderbos tweeted:

And I responded

I didn't really have any more time now than I did over the holiday, but Bob asking and me answering pushed me to actually write the darned thing.

I think one of the problems I encountered was what backend / tech stack to use. I'm familiar with Django, but going from 0 to something in production has a few steps and although I know how to do them ... I just felt ~overwhelmed~ by the prospect.

I felt equally ~overwhelmed~ by the prospect of trying FastAPI to create the API or Flask, because I am not as familiar with their deployment story.

Another thing that was different now was that I had worked on a Django Cookie Cutter, and that was 'good enough' to try it out. So I did.

I ran into a few problems while working with my Django Cookie Cutter, but I fixed them and then dove head first into writing the Slack Bot.

The model

The initial implementation of the model was very simple ... just 2 fields:

class Acronym(models.Model):
    acronym = models.CharField(max_length=8)
    definition = models.TextField()

    def save(self, *args, **kwargs):
        # normalize acronyms to lowercase before saving
        self.acronym = self.acronym.lower()
        super().save(*args, **kwargs)

    class Meta:
        unique_together = ("acronym", "definition")
        ordering = ["acronym"]

    def __str__(self) -> str:
        return self.acronym
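Because of that save() override, every acronym is stored lowercase, and lookups can stay case-insensitive. A quick sketch of what that looks like in the Django shell (the example acronym is made up):

>>> Acronym.objects.create(acronym="API", definition="Application Programming Interface")
<Acronym: api>
>>> Acronym.objects.get(acronym__iexact="api").definition
'Application Programming Interface'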

Next I created the API using Django Rest Framework using a single serializer

class AcronymSerializer(serializers.ModelSerializer):
    class Meta:
        model = Acronym
        fields = [
            "id",
            "acronym",
            "definition",
        ]

which is used by a single view

class AcronymViewSet(viewsets.ReadOnlyModelViewSet):
    serializer_class = AcronymSerializer
    queryset = Acronym.objects.all()

    def get_object(self):
        queryset = self.filter_queryset(self.get_queryset())
        acronym = self.kwargs["acronym"]
        obj = get_object_or_404(queryset, acronym__iexact=acronym)

        return obj

and exposed on 2 endpoints:

from django.urls import include, path

from .views import AcronymViewSet, AddAcronym, CountAcronyms, Events

app_name = "api"

user_list = AcronymViewSet.as_view({"get": "list"})
user_detail = AcronymViewSet.as_view({"get": "retrieve"})

urlpatterns = [
    path("", user_list, name="acronym-list"),
    path("<acronym>/", user_detail, name="acronym-detail"),
    path("api-auth/", include("rest_framework.urls", namespace="rest_framework")),
]

Getting the data

At my joby-job we use Jira and Confluence. In one of our Confluence spaces we have a Glossary page which includes nearly 200 acronyms. I had two choices:

  1. Copy and Paste the acronym and definition for each item
  2. Use Python to get the data

I used Python to get the data, via a Jupyter Notebook, but I didn't seem to save the code anywhere (🤦🏻), so I can't include it here. But trust me, it was 💯.

Setting up the Slack Bot

Although I had watched Mason's video, since I was building this with Django I used this article as a guide in the development of the code below.

The code from my views.py is below (with the imports it needs added at the top):

import ssl

import requests
import slack
from django.conf import settings
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

ssl_context = ssl.create_default_context()
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE

SLACK_VERIFICATION_TOKEN = getattr(settings, "SLACK_VERIFICATION_TOKEN", None)
SLACK_BOT_USER_TOKEN = getattr(settings, "SLACK_BOT_USER_TOKEN", None)
CONFLUENCE_LINK = getattr(settings, "CONFLUENCE_LINK", None)
client = slack.WebClient(SLACK_BOT_USER_TOKEN, ssl=ssl_context)

class Events(APIView):
    def post(self, request, *args, **kwargs):

        slack_message = request.data

        if slack_message.get("token") != SLACK_VERIFICATION_TOKEN:
            return Response(status=status.HTTP_403_FORBIDDEN)

        # verification challenge
        if slack_message.get("type") == "url_verification":
            return Response(data=slack_message, status=status.HTTP_200_OK)
        # greet bot
        if "event" in slack_message:
            event_message = slack_message.get("event")

            # ignore bot's own message
            if event_message.get("subtype"):
                return Response(status=status.HTTP_200_OK)

            # process user's message
            user = event_message.get("user")
            text = event_message.get("text")
            channel = event_message.get("channel")
            url = f"https://slackbot.ryancheley.com/api/{text}/"
            response = requests.get(url).json()
            definition = response.get("definition")
            if definition:
                message = f"The acronym '{text.upper()}' means: {definition}"
            else:
                confluence = CONFLUENCE_LINK + f'/dosearchsite.action?cql=siteSearch+~+"{text}"'
                confluence_link = f"<{confluence}|Confluence>"
                message = f"I'm sorry <@{user}> I don't know what *{text.upper()}* is :shrug:. Try checking {confluence_link}."

            if user != "U031T0UHLH1":
                client.chat_postMessage(
                    blocks=[{"type": "section", "text": {"type": "mrkdwn", "text": message}}], channel=channel
                )
                return Response(status=status.HTTP_200_OK)
        return Response(status=status.HTTP_200_OK)

Essentially what the Slack Bot does is take the request.data['text'] and check it against the DRF API endpoint to see if there is a matching Acronym.
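For context, the relevant parts of the payload Slack POSTs to the endpoint look roughly like this (all IDs and values here are made up):

{
    "token": "verification-token",
    "type": "event_callback",
    "event": {
        "type": "message",
        "user": "U12345678",
        "text": "api",
        "channel": "C12345678"
    }
}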

If there is, then it returns the acronym and its definition.

If there isn't, you get a message saying it's not sure what you're looking for, but that maybe Confluence¹ can help, along with a link to our Confluence Search page.

The last thing you'll notice is that if the User has a specific ID it won't respond with a message. That's because in my initial testing I just had the Slack Bot replying to the user saying 'Hi' with a 'Hi' back to the user.

I had a missing bit of logic though, so once you said hi to the Slack Bot, it would reply back 'Hi' and then keep replying 'Hi' because it was talking to itself. It was comical to see in real time 😂.

Using ngrok to test it locally

ngrok is a great tool for taking a local url, like localhost:8000/api/endpoint, and exposing it on the internet with a url like https://a123-45-678-901-234.ngrok.io/api/endpoint. This allows you to test your local code and see any issues that might arise when pushed to production.
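With the Django dev server running locally on port 8000, exposing it is a single command:

ngrok http 8000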

As I mentioned above, the Slack Bot continually said "Hi" to itself in my initial testing. Since I was running ngrok to serve up my local server, I was able to stop the infinite loop by stopping my local web server. This would have been a little more challenging if I had to push my code to an actual web server first and then test.

Conclusion

This was such a fun project to work on, and I'm really glad that Bob tweeted asking what Slack Bot we would build.

That gave me the final push to actually build it.

  1. You'll notice that I'm using an environment variable to define the Confluence Link and may wonder why. It's mostly to keep the actual Confluence Link used at work non-public and not for any other reason 🤷🏻

Adding Search to My Pelican Blog with Datasette

Last summer I migrated my blog from WordPress to Pelican. I did this for a couple of reasons (see my post here), but one thing that I was a bit worried about when I migrated was that Pelican's offering for site search didn't look promising.

There was an outdated plugin called tipue-search, but when I was looking at it I could tell it was on its last legs.

I thought about it, and since my blog isn't super highly trafficked AND you can use Google to search a specific site, I could wait a bit and see what options came up.

After waiting a few months, I decided it would be interesting to see if I could write a SQLite utility to get the data from my blog, add it to a SQLite database, and then use datasette to serve it up.

I wrote the beginning scaffolding for it last August in a utility called pelican-to-sqlite, but I ran into several technical issues I just couldn't overcome. I thought about giving up, but sometimes you just need to take a step away from a thing, right?

After the first of the year I decided to revisit my idea, but first looked to see if there was anything new for Pelican search. I found a plugin called search that was released last November and is actively being developed, but as I read through the documentation there was just A LOT of stuff:

  • stork
  • requirements for the structure of your page html
  • static asset hosting
  • deployment requires updating your nginx settings

These all looked a bit scary to me, and since I've done some work using datasette I thought I'd revisit my initial idea.

My First Attempt

As I mentioned above, I wrote the beginning scaffolding late last summer. In my first attempt I tried to use a few tools to read the md files and parse their yaml structure, and it just didn't work out. I also realized that Pelican posts can be written in reStructuredText, so any attempt to parse just the md files would never work for those file types.

My Second Attempt

The Plugin

During the holiday I thought a bit about approaching the problem from a different perspective. My initial idea was to try and write a datasette style package to read the data from pelican. I decided instead to see if I could write a pelican plugin to get the data and then add it to a SQLite database. It turns out, I can, and it's not that hard.

Pelican uses signals to make plugin creation a pretty easy thing. I read a post and the documentation and was able to start my effort to refactor pelican-to-sqlite.

From The missing Pelican plugins guide I saw lots of different options, but realized that the signal article_generator_write_article would get me the article content I needed.

I then also used sqlite_utils to insert the data into a database table.

import sqlite_utils


def save_items(record: dict, table: str, db: sqlite_utils.Database) -> None:  # pragma: no cover
    db[table].insert(record, pk="slug", alter=True, replace=True)

Below is the method I wrote to take the content and turn it into a dictionary which can be used in the save_items method above.

import html2text


def create_record(content) -> dict:
    record = {}
    author = content.author.name
    category = content.category.name
    post_content = html2text.html2text(content.content)
    published_date = content.date.strftime("%Y-%m-%d")
    slug = content.slug
    summary = html2text.html2text(content.summary)
    title = content.title
    url = "https://www.ryancheley.com/" + content.url
    status = content.status
    if status == "published":
        record = {
            "author": author,
            "category": category,
            "content": post_content,
            "published_date": published_date,
            "slug": slug,
            "summary": summary,
            "title": title,
            "url": url,
        }
    return record

Putting these together, I get a method used by the Pelican plugin system that will generate the data I need for the site AND insert it into a SQLite database.

from pelican import signals

# the same database file that later gets published to Vercel
db = sqlite_utils.Database("pelican.db")


def run(_, content):
    record = create_record(content)
    save_items(record, "content", db)


def register():
    signals.article_generator_write_article.connect(run)

The html template update

I use a custom implementation of the Smashing Magazine theme. This allows me to do some edits, though I mostly keep it pretty stock. However, it did allow me to make a small edit to the base.html template to include a search form.

In order to add the search form I added the following code to base.html below the nav tag:

    <section class="relative h-8">
    <section class="absolute inset-y-0 right-10 w-128">
    <form
    class = "pl-4"
    <
    action="https://search-ryancheley.vercel.app/pelican/article_search?text=name"
    method="get">
            <label for="site-search">Search the site:</label>
            <input type="search" id="site-search" name="text"
                    aria-label="Search through site content">
            <button class="rounded-full w-16 hover:bg-blue-300">Search</button>
    </form>
    </section>

Putting it all together with datasette and Vercel

Here's where the magic starts. Publishing data to Vercel with datasette is extremely easy with the datasette plugin datasette-publish-vercel.

You do need to have the Vercel cli installed, but once you do, the steps for publishing your SQLite database are really well explained in the datasette-publish-vercel documentation.

One final step was to add a make target so I could type one quick command to create my content, generate the SQLite database, AND publish the SQLite database to Vercel. I added the below to my Makefile:

vercel:
    { \
    echo "Generate content and database"; \
    make html; \
    echo "Content generation complete"; \
    echo "Publish data to vercel"; \
    datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \
    echo "Publishing complete"; \
    }

The line

datasette publish vercel pelican.db --project=search-ryancheley --metadata metadata.json; \

has an extra flag passed to it (--metadata) which allows me to use metadata.json to create a saved query which I call article_search. The contents of that saved query are:

select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date

This is what allows the action in the form above to have a URL to link to in datasette and return data!
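For reference, a minimal sketch of what that metadata.json might look like, using datasette's canned query format (treat the exact file as illustrative):

{
    "databases": {
        "pelican": {
            "queries": {
                "article_search": {
                    "sql": "select summary as 'Summary', url as 'URL', published_date as 'Published Date' from content where content like '%' || :text || '%' order by published_date"
                }
            }
        }
    }
}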

With just a few tweaks I'm able to include a search tool, powered by datasette for my pelican blog. Needless to say, I'm pretty pumped.

Next Steps

There are still a few things to do:

  1. separate search form html file (for my site)
  2. formatting datasette to match site (for my vercel powered instance of datasette)
  3. update the README for the pelican-to-sqlite package to better explain how to fully implement it
  4. Get pelican-to-sqlite added to the pelican-plugins page

The Well Maintained Test

At the beginning of November Adam Johnson tweeted

I’ve come up with a test that we can use to decide whether a new package we’re considering depending on is well-maintained.

and linked to an article he wrote.

He came up with (with the help of Twitter) twelve questions to ask of any library that you're looking at:

  1. Is it described as “production ready”?
  2. Is there sufficient documentation?
  3. Is there a changelog?
  4. Is someone responding to bug reports?
  5. Are there sufficient tests?
  6. Are the tests running with the latest <Language> version?
  7. Are the tests running with the latest <Integration> version?
  8. Is there a Continuous Integration (CI) configuration?
  9. Is the CI passing?
  10. Does it seem relatively well used?
  11. Has there been a commit in the last year?
  12. Has there been a release in the last year?

I thought it would be interesting to turn that checklist into a Click App using Simon Willison's Click App Cookiecutter.

I set out in earnest to do just that on November 8th.

What started out as just a simple Click app quickly turned into a pretty robust CLI using Will McGugan's Rich library.

I started by using the GitHub API to try and answer the questions, but quickly found that it couldn't answer them all. Then I came across the PyPI API, which helped to answer almost all of them programmatically.

There's still a bit of work to do to get it where I want it to be, but it's pretty sweet that I can now run a simple command and review the output to see if a package is well maintained.

You can even try it on the package I wrote!

the-well-maintained-test https://github.com/ryancheley/the-well-maintained-test

Which will return (as of this writing) the output below:

1. Is it described as 'production ready'?
        The project is set to Development Status Beta
2. Is there sufficient documentation?
        Documentation can be found at
https://github.com/ryancheley/the-well-maintained-test/blob/main/README.md
3. Is there a changelog?
        Yes
4. Is someone responding to bug reports?
        The maintainer took 0 days to respond to the bug report
        It has been 2 days since a comment was made on the bug.
5. Are there sufficient tests? [y/n]: y
        Yes
6. Are the tests running with the latest Language version?
        The project supports the following programming languages
                - Python 3.7
                - Python 3.8
                - Python 3.9
                - Python 3.10

7. Are the tests running with the latest Integration version?
        This project has no associated frameworks
8. Is there a Continuous Integration (CI) configuration?
        There are 2 workflows
         - Publish Python Package
         - Test

9. Is the CI passing?
        Yes
10.  Does it seem relatively well used?
        The project has the following statistics:
        - Watchers: 0
        - Forks: 0
        - Open Issues: 1
        - Subscribers: 1
11.  Has there been a commit in the last year?
        Yes. The last commit was on 11-20-2021 which was 2 days ago
12. Has there been a release in the last year?
        Yes. The last commit was on 11-20-2021 which was 2 days ago

There is still one question that I haven't been able to answer programmatically with an API and that is:

Are there sufficient tests?

When that question comes up, you're prompted in the terminal to answer either y/n.

But, it does leave room for a fix by someone else!

Styling Clean Up with Bash

I have a side project I've been working on for a while now. One thing that happened over time is that the styling of the site grew organically. I'm not a designer, and I didn't have a master set of templates or design principles guiding the development. I kind of hacked it together and made it look "nice enough".

That was until I really started going from one page to another and realized that the styling of the various pages wasn't just a little off ... but A LOT off.

As an aside, I'm using tailwind as my CSS Framework

I wanted to make some changes to the styling and realized I had two choices:

  1. Manually go through each html template (the project is a Django project) and catalog the styles used for each element

OR

  1. Try and write a bash command to do it for me

Well, before we jump into either choice, let's see how many templates there are to review!

As I said above, this is a Django project. I keep all of my templates in a single templates directory, with each app having its own subdirectory.

I was able to use this one line to count the number of html files in the templates directory (and all of its subdirectories as well):

ls -R templates | grep html | wc -l

There are 3 parts to this:

  1. ls -R templates recursively lists all of the files in the templates directory and all of its subdirectories
  2. grep html keeps only the lines that contain html
  3. wc -l uses the word, line, character, and byte count utility to count the number of lines returned by the previous command

In each case one command is piped to the next.

This resulted in 41 html files.

OK, I'm not going to want to manually review 41 files. Looks like we'll be going with option 2, "Try and write a bash command to do it for me"

In the end the bash script is actually relatively straightforward. We're just using grep two times. But it's the options on grep that change (as well as the regex used) that make the magic happen.

The first thing I want to do is find all of the lines that have the string class= in them. Since these are html templates, that's a pretty sure-fire way to find all of the places where the styles I'm interested in are being applied.

I use a package called djhtml to lint my templates, but just in case something got missed, I want to ignore case when doing my regex, i.e., class= should be found, but so should cLass= or Class=. In order to get that I need to have the i flag enabled.

Since the html files may be in the base directory templates or one of the subdirectories, I need to recursively search, so I include the r flag as well

This gets us

grep -ri "class=" templates/*

That command will output whole lines like this:

templates/tasks/steps_lists.html:    <table class="table-fixed w-full border text-center">
templates/tasks/steps_lists.html:                <th class="w-1/2 flex justify-left-2 p-2">Task</th>
templates/tasks/steps_lists.html:                <th class="w-1/4 justify-center p-2">Edit</th>
templates/tasks/steps_lists.html:                <th class="w-1/4 justify-center p-2">Delete</th>
templates/tasks/steps_lists.html:                    <td class="flex justify-left-2 p-2">
templates/tasks/steps_lists.html:                    <td class="p-2 text-center">
templates/tasks/steps_lists.html:                        <a class="block hover:text-gray-600"
templates/tasks/steps_lists.html:                            <i class="fas fa-edit"></i>
templates/tasks/steps_lists.html:                    <td class="p-2 text-center">
templates/tasks/steps_lists.html:                        <a class="block hover:text-gray-600"
templates/tasks/steps_lists.html:                            <i class="fas fa-trash-alt"></i>
templates/tasks/step_form.html:        <section class="bg-gray-400 text-center py-2">
templates/tasks/step_form.html:            <button type="submit" class="bg-blue-500 hover:bg-blue-700 text-white font-bold py-2 px-4 rounded">{{view.action|default:"Add"}} </button>

Great! We have the data we need, now we just want to clean it up.

Again, we'll use grep, only this time we want to look for an honest-to-goodness regular expression. We're trying to identify everything in between the first open angle bracket (<) and the first closed angle bracket (>).

A bit of googling, searching stack overflow, and playing with the great site regex101.com gets you this

<[^\/].*?>

OK, we have the regular expression we need, but what options do we need to use in grep? In this case we actually have two options:

  1. Use egrep (which allows for extended regular expressions)
  2. Use grep -E to make grep behave like egrep

I chose to go with option 2, using grep -E. Next, we want to return ONLY the part of the line that matches the regex. For that, we can use the -o option. Putting it all together we get

grep -Eo "<[^\/].*?>"

Now, we can pipe the results from our first command into our second command and we get this:

grep -ri "class=" templates/* | grep -Eo "<[^\/].*?>"

This will output to standard out, but next I really want a tool for aggregation and comparison. It was at this point that I decided the best next tool to use would be Excel. So I sent the output to a text file and then opened that text file in Excel to do the final review. To output the above to a text file called tailwind.txt we run:

grep -ri "class=" templates/* | grep -Eo "<[^\/].*?>" > tailwind.txt
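As an aside, the aggregation could also stay entirely in the shell. A sketch using sort and uniq (instead of Excel) to count how often each distinct tag appears, most common first:

grep -ri "class=" templates/* | grep -Eo "<[^\/].*?>" | sort | uniq -c | sort -rn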

With these results I was able to find several styling inconsistencies and then fix them up. In all it took me a few nights of working out the bash commands and then a few more nights to get the styling consistent. In the process I learned so much about grep and egrep. It was a good exercise to have gone through.

djhtml and justfile

I had read about a project called djhtml and wanted to use it on one of my projects. The documentation is really good for adding it to precommit-ci, but I wasn't sure what I needed to do to just run it on the command line.

It took a bit of googling, but I was finally able to get the right incantation of commands to be able to get it to run on my templates:

djhtml -i $(find templates -name '*.html' -print)
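The $(...) is command substitution: find prints the path of every .html file under templates, and that list of paths is what gets handed to djhtml's -i flag to indent the files in place.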

But of course, because I have the memory of a goldfish and this is more than 3 commands to try to remember to string together, instead of telling myself I would remember it, I simply added it to a justfile and now have this recipe:

# applies djhtml linting to templates
djhtml:
    djhtml -i $(find templates -name '*.html' -print)

This means that I can now run just djhtml to apply djhtml's linting to my templates.

Pretty darn cool if you ask me. But then I got to thinking, I can make this a bit more general for 'linting' type activities. I include all of these in my precommit-ci, but I figured, what the heck, might as well have a just recipe for all of them!

So I refactored the recipe to be this:

# applies linting to project (black, djhtml, flake8)
lint:
    djhtml -i $(find templates -name '*.html' -print)
    black .
    flake8 .

And now I can run all of these linting style libraries with a single command: just lint

Prototyping with Datasette

At my job I work with some really talented Web Developers that are saddled with a pretty creaky legacy system.

We're getting ready to start on a new(ish) project where we'll be taking an old project built on this creaky legacy system (VB.net) and re-implementing it on a C# backend and an Angular front end. We'll be working on a lot of new features and integrations so it's worth rebuilding it versus shoehorning the new requirements into the legacy system.

The details of the project aren't really important. What is important is that as I was reviewing the requirements with the Web Developer Supervisor he said something to the effect of, "We can create a proof of concept and just hard code the data in a json file to fake the backend."

The issue is ... we already have the data that we'll need in a MS SQL database (it's what the legacy version runs on); it's just a matter of getting it into the right json "shape".

Creating a 'fake' json object that kind of/maybe mimics the real data is something we've done before, and it ALWAYS seems to bite us in the butt. We don't account for proper pagination, or the real lengths of data in the fields or NULL values or whatever shenanigans happen to befall real world data!

This got me thinking about Simon Willison's project Datasette and using it to prototype the API end points we would need.

I had been trying to figure out how to use db-to-sqlite to extract data from a MS SQL database into a SQLite database, and I was successful (see my PR to db-to-sqlite here).
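The extraction itself is a one-liner; a sketch of what it looks like (the connection string is illustrative — db-to-sqlite takes a SQLAlchemy URL, and MS SQL also needs the pyodbc driver installed):

db-to-sqlite "mssql+pyodbc://user:password@my-server/legacy_db?driver=ODBC+Driver+17+for+SQL+Server" legacy.db --all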

With this idea in hand, I reviewed it with the Supervisor and then scheduled a call with the web developers to review datasette.

During this meeting, I wanted to review:

  1. The motivation behind why we would want to use it
  2. How we could leverage it to do Rapid Prototying
  3. Give a quick demo using data from the stored procedure that returns the data for the legacy project.

In all it took less than 10 minutes to go from nothing to a local instance of datasette running with a prototype JSON API for the web developers to see.

I'm hoping to see the Web team use this concept more going forward as I can see huge benefits for Rapid Prototyping of ideas, especially if you already have the data housed in a database. But even if you don't, datasette has tons of tools to get the data from a variety of sources into a SQLite database to use and then you can do the rapid prototyping!

Contributing to Tryceratops

I read about a project called Tryceratops on Twitter when it was tweeted about by Jeff Triplett.

I checked it out and it seemed interesting. I decided to use it on my simplest Django project just to give it a test drive, running this command:

tryceratops .

and got this result:

Done processing! 🦖✨
Processed 16 files
Found 0 violations
Failed to process 1 files
Skipped 2340 files

This is nice, but what is the file that failed to process?

This left me with two options:

  1. Complain that this awesome tool created by someone didn't do the thing I thought it needed to do

OR

  1. Submit an issue to the project and offer to help.

I went with option 2 😀

My initial commit was made in a pretty naive way. It did the job, but not in the best way for maintainability. I had a really great exchange with the maintainer Guilherme Latrova about the change, and he pointed me in a different direction.

The biggest thing I learned while working on this project (for Python at least) was the logging library. Specifically I learned how to add:

  • a formatter
  • a handler
  • a logger

For my change, I added a simple format with a verbose handler in a custom logger. It looked something like this:

The formatter:

"simple": {
    "format": "%(message)s",
},

The handler:

"verbose_output": {
    "class": "logging.StreamHandler",
    "level": "DEBUG",
    "formatter": "simple",
    "stream": "ext://sys.stdout",
},

The logger:

"loggers": {
    "tryceratops": {
        "level": "INFO",
        "handlers": [
            "verbose_output",
        ],
    },
},

This allows the verbose flag to output the message to standard out with an INFO level of detail.
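Assembled into a complete config and loaded with dictConfig, the pieces above fit together something like this (a sketch, not the exact Tryceratops source):

import logging
import logging.config

LOGGING_CONFIG = {
    "version": 1,
    "formatters": {
        "simple": {
            "format": "%(message)s",
        },
    },
    "handlers": {
        "verbose_output": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simple",
            "stream": "ext://sys.stdout",
        },
    },
    "loggers": {
        "tryceratops": {
            "level": "INFO",
            "handlers": [
                "verbose_output",
            ],
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger("tryceratops")
logger.info("this shows up on stdout when verbose is on")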

Because of what I learned, I've started using the logging library on some of my work projects where I had tried to roll my own logging tool. I should have known there was a logging tool in the Standard Library BEFORE I tried to roll my own 🤦🏻‍♂️

The other thing I (kind of) learned how to do was to squash my commits. I had never had a need (or desire?) to squash commits before, but the commit message is what Guilherme uses to generate the change log. So, with his guidance and help, I tried my best to squash those commits. Although in the end he had to do it (I'm still not entirely sure what I did wrong), I was exposed to the idea of squashing commits and why they might be done. A win-win!

The best part about this entire experience was getting to work with Guilherme Latrova. He was super helpful and patient and had great advice without telling me what to do. The more I work within the Python ecosystem, the more I'm blown away by just how friendly and helpful everyone is, and it's what makes me want to do these kinds of projects.

If you haven't had a chance to work on an open source project, I highly recommend it. It's a great chance to learn and to meet new people.

