Using Python to Check for File Changes in Excel

The Problem

Data exchange in healthcare is ... harder than it needs to be. Not all partners in the healthcare arena understand and use technology to its fullest benefit.

Take for example several health plans that want data reported to them for CMS (Centers for Medicare and Medicaid Services) regulations. They will ask their 'delegated' groups to fill out an Excel file. As in, they expect you will actually fill out an Excel file, either by manually entering the data OR by copying and pasting your data into their Excel file.

They will also, quite frequently, change their mind on what they want AND the order in which they want the data to appear in their Excel file. But there's no change log to tell you what (if anything) has changed. All that you will get is an email which states, "Here's the new template to be used for report XYZ" ... even if this 'new' report is the same as the last one that was sent.

One solution might be to use version control software (like Git), but all it will do is tell you that there is a difference, not what the difference is. For example, when looking at a simple Excel file added to Git and using git diff you see:

diff --git a/Book3.xlsx b/Book3.xlsx
index 05a8b41..e96cdb5 100644
Binary files a/Book3.xlsx and b/Book3.xlsx differ

This has been a giant pain in the butt for a while, but with the recent shelter-in-place directives, I have a bit more time on the weekends to solve these kinds of problems.

The Solution

Why Python of Course!

Only two libraries are needed to make the comparison: (1) os, (2) pandas

The basic idea is to:

  1. Load the files
  2. Use pandas to compare the files
  3. write out the differences, if they exist

Load the Files

The code below loads the necessary libraries, and then loads the Excel files into two pandas DataFrames. One thing that my team has to watch out for is tab names that have leading spaces, which aren't easy to see inside of Excel. This can cause all sorts of nightmares from a troubleshooting perspective.

import os
import pandas as pd

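# Paths and file names are placeholders; substitute your own
file_original = os.path.join('\\path\\to\\original\\file', 'original_file.xlsx')
file_new = os.path.join('\\path\\to\\new\\file', 'new_file.xlsx')

# Tab names must match exactly, including any leading or trailing spaces
sheet_name_original = 'name_of_sheet_in_original_file'
sheet_name_new = 'name_of_sheet_in_new_file'

df1 = pd.read_excel(file_original, sheet_name=sheet_name_original)
df2 = pd.read_excel(file_new, sheet_name=sheet_name_new)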
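One way to guard against those hard-to-see spaces is to look the tab up by its cleaned name before loading it. The sketch below is only an illustration: find_sheet and the tab names are hypothetical, and in practice the list of names could come from pd.ExcelFile(file_new).sheet_names.

```python
def find_sheet(wanted, available):
    """Return the actual sheet name matching `wanted`, ignoring stray whitespace and case."""
    target = wanted.strip().lower()
    matches = [name for name in available if name.strip().lower() == target]
    if len(matches) != 1:
        raise ValueError(f"expected one sheet like {wanted!r}, found {matches!r}")
    return matches[0]


# Hypothetical tab names, one with a leading space and one with a trailing space
tabs = [" Appeals", "Summary ", "Notes"]

# repr() makes the hidden whitespace visible
print(repr(find_sheet("Appeals", tabs)))
```

The value returned is the name exactly as it is stored in the workbook, so it can be passed straight to pd.read_excel as sheet_name.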

Use Pandas to compare

This is just a one-liner, but it is super powerful. pandas DataFrames have a method to check whether two frames are the same. So easy!

data_frame_same = df1.equals(df2)
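It helps to know how strict equals is. The sketch below uses small made-up frames (the column names and values are assumptions, not data from the real reports) to show that a renamed header or a changed dtype is enough for the check to report a difference.

```python
import pandas as pd

# Made-up data standing in for a loaded template
original = pd.DataFrame({"first name": ["Ann", "Bob"], "age": [70, 81]})

same = original.copy()
renamed = original.rename(columns={"first name": "first_name"})
as_float = original.astype({"age": "float64"})

print(original.equals(same))      # identical data and headers
print(original.equals(renamed))   # one header changed
print(original.equals(as_float))  # same values, different dtype
```

Only the first comparison reports the frames as equal, which is why a template whose headers were quietly reworded gets flagged.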

Write out the differences if they exist:

First we specify where we're going to write the differences. We open the file in w+ mode, which creates it if it doesn't exist and empties it if it does. The f.truncate(0) call is therefore redundant, but it makes the intent explicit: each run should record only its own differences. If we appended to the file over and over again instead ... that could get confusing.

f = open('\\path\\to\\file\\to\\write\\differences.txt', 'w+')
f.truncate(0)
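A minimal sketch of the same idea using a with block, assuming a hypothetical report path in a temporary directory. Opening a file in 'w' (or 'w+') mode already discards any previous contents, so only the latest run's output survives, and the with block closes the file even if an error occurs partway through.

```python
import os
import tempfile

# Hypothetical location; a temporary directory keeps the sketch self-contained
report_path = os.path.join(tempfile.mkdtemp(), "differences.txt")

for run in (1, 2):
    # 'w' creates the file or empties an existing one; the with block closes it
    with open(report_path, "w") as f:
        f.write(f"Report from run {run}\n")

with open(report_path) as f:
    contents = f.read()

print(contents)
```

After two runs the file holds only the text written by the second one.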

Next, we check to see if there are any differences. If there are none, we write a simple message to our text file from above:

if data_frame_same:
    f.write('No differences detected')

If differences are found, then we loop through the column headers of both files, finding the differences and writing them to our file:

else:
    f.write('*** WARNING *** Differences Found\n\n')
    for c in range(max(len(df1.columns), len(df2.columns))):
        try:
            header1 = df1.columns[c].strip().lower().replace('\n', '')
            header2 = df2.columns[c].strip().lower().replace('\n', '')
            if header1 == header2:
                f.write(f'Headers are the same: {header1}\n')
            else:
                f.write(f'Difference Found: {header1} -> {header2}\n')
        except (IndexError, AttributeError):
            # One file has no column at this position, or a header isn't text
            pass

f.close()

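The code above loops over the longer of the two column header lists (the file may have had a new column added). The try/except skips any position that exists in only one of the files, so an added or removed column at the end is passed over silently rather than raising an error.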

Next, we check for differences between header1 and header2. If they are the same, we just write that out; if they aren't, we indicate that header1 was changed to header2.

A sample of the output when the column headers have changed is below:

*** WARNING *** Differences Found

Headers are the same: beneficiary first name
...
Difference Found: person who made the request -> who made the request?
...
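If you also want added or removed columns to show up in the report, one option is to pair the headers with itertools.zip_longest instead of indexing inside a try/except. This is a sketch of an alternative, not the script above: the function names, the extra message wording, and the header lists are all made up for illustration. With pandas the lists would be list(df1.columns) and list(df2.columns).

```python
from itertools import zip_longest


def normalize(header):
    """Mirror the cleanup used in the script: trim, lowercase, drop newlines."""
    return str(header).strip().lower().replace("\n", "")


def compare_headers(old_headers, new_headers):
    """Return one report line per column position."""
    lines = []
    # zip_longest pads the shorter list with None instead of stopping early
    for old, new in zip_longest(old_headers, new_headers):
        if old is None:
            lines.append(f"Column added: {normalize(new)}")
        elif new is None:
            lines.append(f"Column removed: {normalize(old)}")
        elif normalize(old) == normalize(new):
            lines.append(f"Headers are the same: {normalize(old)}")
        else:
            lines.append(f"Difference Found: {normalize(old)} -> {normalize(new)}")
    return lines


# Hypothetical header lists; the new file has one extra column
old = ["Beneficiary First Name", "Person who made the request"]
new = ["Beneficiary First Name", "Who made the request?", "Date Received"]

for line in compare_headers(old, new):
    print(line)
```

Because the comparison is by position, a column inserted in the middle would still show every later header as changed; this only makes the extra or missing columns at the end visible.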

Future Enhancements

After using it just a couple of times, I've already spotted a couple of possible enhancements:

  1. Use input to allow the user to enter the names/locations of the files
  2. Read the tab names and allow user to select from command line
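As a rough idea of what that could look like, here is a sketch that takes the file locations and tab names from the command line with argparse (swapped in here for input(), which would work too). The argument names are my own invention, and the default of 0 for the sheets is an assumption that relies on pd.read_excel treating 0 as "the first sheet".

```python
import argparse


def build_parser():
    """Describe the command line: two workbook paths plus optional tab names."""
    parser = argparse.ArgumentParser(description="Compare the headers of two Excel files.")
    parser.add_argument("original", help="path to the original workbook")
    parser.add_argument("new", help="path to the new workbook")
    parser.add_argument("--sheet-original", default=0, help="tab name in the original file")
    parser.add_argument("--sheet-new", default=0, help="tab name in the new file")
    return parser


# Parse a hypothetical command line instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(["old.xlsx", "new.xlsx", "--sheet-new", " Appeals"])
print(args.original, args.new, args.sheet_original, repr(args.sheet_new))
```

In a real script you would call parse_args() with no arguments and pass the results to pd.read_excel.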

Conclusion

I'm looking forward to implementing the enhancements mentioned above to make this even more user friendly. In the meantime, it'll get the job done and allow someone on my team to work on something more interesting than comparing Excel files to try to find (and hopefully not miss) differences.

Mischief Managed

A few weeks back I decided to try and update my Python version with Homebrew. I had already been through an issue where an update like this was going to cause a problem, but I also knew what the fix was.

With this knowledge in hand I happily performed the update. To my surprise, 2 things happened:

  1. The update seemed to have me go from Python 3.7.6 to 3.7.3
  2. When trying to reestablish my Virtual Environment two packages wouldn’t install: psycopg2 and django-heroku

Now, the update/backdate isn’t the end of the world. Quite honestly, next weekend I’m going to just ditch Homebrew and go with the standard download from Python.org because I’m hoping that this nonsense won’t be an issue anymore.

The second issue was a bit more irritating though. I spent several hours trying to figure out what the problem was, only to find out, there wasn’t one really.

The ‘fix’ to the issue was to

  1. Open PyCharm
  2. Go to Settings
  3. Go to ‘Project Interpreter’
  4. Click the ‘+’ to add a package
  5. Look for the package that wouldn’t install
  6. Click ‘Install Package’
  7. Voilà ... mischief managed

The next time this happens I’m just buying a new computer

CBV - PasswordChangeDoneView

From Classy Class Based Views PasswordChangeDoneView

Render a template. Pass keyword arguments from the URLconf to the context.

Attributes

  • template_name: Much like the LogoutView the default view is the Django skin. Create your own password_change_done.html file to keep the user experience consistent across the site.
  • title: the default uses the function gettext_lazy() and passes the string ‘Password change successful’. The function gettext_lazy() will translate the text into the local language if a translation is available. I’d just keep the default on this.

Example

views.py

from django.contrib.auth.views import PasswordChangeDoneView

class myPasswordChangeDoneView(PasswordChangeDoneView):
    pass

urls.py

path('password_change_done_view/', views.myPasswordChangeDoneView.as_view(), name='password_change_done_view'),

password_change_done.html

{% extends "base.html" %}
{% load i18n %}

{% block content %}
    <h1>
    {% block title %}
        {{ title }}
    {% endblock %}
    </h1>
<p>{% trans "Password changed" %}</p>
{% endblock %}

settings.py

LOGIN_URL = '/<app_name>/login_view/'

The above assumes that you have this set up in your urls.py.

Special Notes

You need to set the LOGIN_URL value in your settings.py. It defaults to /accounts/login/. If that path isn’t valid you’ll get a 404 error.

Diagram

A visual representation of how PasswordChangeDoneView is derived can be seen here:

PasswordChangeDoneView

Conclusion

Again, not much to do here. Let Django do all of the heavy lifting, but be mindful of the needed work in settings.py and the new template you’ll need/want to create.

CBV - PasswordChangeView

From Classy Class Based Views PasswordChangeView

A view for displaying a form and rendering a template response.

Attributes

  • form_class: The form that will be used by the template created. Defaults to Django’s PasswordChangeForm
  • success_url: If you’ve created your own custom PasswordChangeDoneView then you’ll need to update this. The default is to use Django’s, but unless your top-level urls.py has a URL named password_change_done you’ll get an error.
  • title: defaults to ‘Password Change’ and is translated into local language

Example

views.py

class myPasswordChangeView(PasswordChangeView):
    success_url = reverse_lazy('rango:password_change_done_view')

urls.py

path('password_change_view/', views.myPasswordChangeView.as_view(), name='password_change_view'),

password_change_form.html

{% extends "base.html" %}
{% load i18n %}

{% block content %}
    <h1>
    {% block title %}
        {{ title }}
    {% endblock %}
    </h1>
    <form method="post">
        {% csrf_token %}
        {{ form.as_p }}
        <button type="submit">{% trans "Change my password" %}</button>
    </form>
{% endblock %}

Diagram

A visual representation of how PasswordChangeView is derived can be seen here:

PasswordChangeView

Conclusion

The only thing to keep in mind here is the success_url, which will most likely need to be set based on the application you’ve written. If you get a NoReverseMatch error saying that reverse can’t find password_change_done, that’s the issue.

CBV - LoginView

From Classy Class Based Views LoginView

Display the login form and handle the login action.

Attributes

  • authentication_form: Allows you to subclass AuthenticationForm if needed. You would want to do this IF you need other fields besides username and password for login OR you want to implement other logic than just account creation, i.e. account verification must be done as well. See the example by Vitor Freitas for more details.
  • form_class: The form that will be used by the template created. Defaults to Django’s AuthenticationForm
  • redirect_authenticated_user: If set to True, a user who is already logged in and attempts to go to your login page will be redirected to the LOGIN_REDIRECT_URL configured in your settings.py. Defaults to False.
  • redirect_field_name: similar idea to updating what the next field will be from the DetailView. If this is specified then you’ll most likely need to create a custom login template.
  • template_name: The default value for this is registration/login.html, i.e. a file called login.html in the registration directory of the templates directory.

There are no required attributes for this view, which is nice because you can just add pass to the view and you’re set (for the view anyway; you still need an html file).

You’ll also need to update settings.py to include a value for the LOGIN_REDIRECT_URL.

Note on redirect_field_name

Per the Django Documentation:

If the user isn’t logged in, redirect to settings.LOGIN_URL, passing the current absolute path in the query string. Example: /accounts/login/?next=/polls/3/.

If redirect_field_name is set then the URL would be:

/accounts/login/?<redirect_field_name>=/polls/3

Basically, you only use this if you have a pretty good reason.
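To make the effect concrete, here is a small standard-library illustration of how the parameter name changes the URL. It is not Django's own code, and login_redirect_url and the return_to name are hypothetical; it only mimics the shape of the redirect.

```python
from urllib.parse import urlencode


def login_redirect_url(login_url, current_path, redirect_field_name="next"):
    """Build the URL an anonymous user would be sent to."""
    # safe="/" keeps the slashes in the path readable instead of percent-encoding them
    query = urlencode({redirect_field_name: current_path}, safe="/")
    return f"{login_url}?{query}"


print(login_redirect_url("/accounts/login/", "/polls/3/"))
print(login_redirect_url("/accounts/login/", "/polls/3/", redirect_field_name="return_to"))
```

Whatever name you choose also has to be the name of the hidden field in your login template, which is why changing it usually means writing a custom template.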

Example

views.py

from django.contrib.auth.views import LoginView

class myLoginView(LoginView):
    pass

urls.py

path('login_view/', views.myLoginView.as_view(), name='login_view'),

registration/login.html

{% extends "base.html" %}
{% load i18n %}

{% block content %}
<form method="post" action=".">
  {% csrf_token %}

  <div class="mui--text-danger">
    {% for error in form.non_field_errors %}
      {{error}}
    {% endfor %}
  </div>

  <div class="mui-textfield">
    {{ form.username.label }}
    {{ form.username }}
  </div>
  <div class="mui-textfield">
    {{ form.password.label }}
    {{ form.password }}
  </div>

  <input class="mui-btn mui-btn--primary" type="submit" value="{% trans 'Log in' %}" />
  <input type="hidden" name="next" value="{{ request.GET.next }}" />
</form>

<br><div class="mui-divider"></div><br>
{% endblock %}

settings.py

LOGIN_REDIRECT_URL = '/<app_name>/'

Diagram

A visual representation of how LoginView is derived can be seen here:

LoginView

Conclusion

Really easy to implement right out of the box but allows some nice customization. That being said, make those customizations IF you need to, not just because you think you want to.

CBV - LogoutView

From Classy Class Based Views LogoutView

Log out the user and display the 'You are logged out' message.

Attributes

  • next_page: redirects the user on logout.
  • redirect_field_name: The name of a GET field containing the URL to redirect to after log out. Defaults to next. Overrides the next_page URL if the given GET parameter is passed. 1
  • template_name: defaults to registration/logged_out.html. Even if you don’t have a template the view does get rendered but it uses the default Django skin. You’ll want to create your own to allow the user to logout AND to keep the look and feel of the site.

Example

views.py

from django.contrib.auth.views import LogoutView

class myLogoutView(LogoutView):
    pass

urls.py

path('logout_view/', views.myLogoutView.as_view(), name='logout_view'),

registration/logged_out.html

{% extends "base.html" %}
{% load i18n %}

{% block content %}
<p>{% trans "Logged out" %}</p>
{% endblock %}

Diagram

A visual representation of how LogoutView is derived can be seen here:

LogoutView

Conclusion

I’m not sure how it could be much easier to implement a logout page.

  1. Per Django Docs ↩︎

CBV - DeleteView

From Classy Class Based Views DeleteView

View for deleting an object retrieved with self.get_object(), with a response rendered by a template.

Attributes

There are no new attributes, but 2 that we’ve seen are required: (1) queryset or model; and (2) success_url

Example

views.py

class myDeleteView(DeleteView):
    queryset = Person.objects.all()
    success_url = reverse_lazy('rango:list_view')

urls.py

path('delete_view/<int:pk>', views.myDeleteView.as_view(), name='delete_view'),

<template_name>.html

Below is just the form that would be needed to get the delete to work.

    <form method="post">
    {% csrf_token %}
    <table border="1">
        <tr>
        <th>First Name</th>
        <th>Last Name</th>
        </tr>
        <tr>
            <td>{{ person.first_name }}</td>
            <td>{{ person.last_name }}</td>
        </tr>
    </table>
    <div>
        <a href="{% url 'rango:list_view' %}">Back</a>
        <input type="submit" value="Delete">
    </div>
    </form>

Diagram

A visual representation of how DeleteView is derived can be seen here:

DeleteView

Conclusion

As far as implementations go, adding a form to delete data is about the easiest thing you can do in Django. It requires next to nothing. We now have step 4 of a CRUD app!

CBV - UpdateView

From Classy Class Based Views UpdateView

View for updating an object, with a response rendered by a template.

Attributes

Two attributes are required to get the template to render. We’ve seen queryset before and in CreateView we saw fields. As a brief refresher:

  • fields: specifies what fields from the model or queryset will be displayed on the rendered template. You can set fields to __all__ if you want to return all of the fields
  • success_url: you’ll want to specify this after the record has been updated so that you know the update was made.

Example

views.py

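from django.urls import reverse_lazy
from django.views.generic import UpdateView

from rango.models import Person


class myUpdateView(UpdateView):
    queryset = Person.objects.all()
    fields = '__all__'
    extra_context = {
        'type': 'Update'
    }
    success_url = reverse_lazy('rango:list_view')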

urls.py

path('update_view/<int:pk>', views.myUpdateView.as_view(), name='update_view'),

<template>.html

{% block content %}
    <h3>{{ type }} View</h3>
    {% if type == 'Create' %}
        <form action="." method="post">
    {% else %}
        <form action="{% url 'rango:update_view' object.id %}" method="post">
    {% endif %}
    {% csrf_token %}
    <table>
    {{ form.as_p }}
    </table>
    <button type="submit">SUBMIT</button>
    </form>
{% endblock %}

Diagram

A visual representation of how UpdateView is derived can be seen here:

UpdateView

Conclusion

A simple way to implement a form to update data in a model. Step 3 for a CRUD app is now complete!

My first commit to an Open Source Project: Django

Last September the annual DjangoCon was held in San Diego. I really wanted to go, but because of other projects and conferences for my job, I wasn’t able to make it.

The next best thing was to watch the videos from DjangoCon on YouTube. I watched a couple of the videos, but one that really caught my attention was by Carlton Gibson titled “Your Web Framework Needs You: An Update by Carlton Gibson”.

I took what Carlton said to heart and thought, I really should be able to do something to help.

I went to the Django Issues site and searched for an Easy Pickings issue that involved documentation and found issue 31006 “Document how to escape a date/time format character for the |date and |time filters.”

I read the steps on what I needed to do to submit a pull request, but since it was my first time ever participating like this … I was a bit lost.

Luckily there isn’t anything that you can break, so I was able to wander around for a bit and get my bearings.

I forked the GitHub repo and I cloned it locally.

I then spent an embarrassingly long time trying to figure out where the change was going to need to be made, and exactly what needed to change.

Finally, with my changes made, I pushed my code changes to GitHub and waited.

Within a few hours Mariusz Felisiak replied and asked about a suggestion he had made (but which I missed). I dug back into the documentation, found what he was referring to, and made what I thought was his suggested change.

Another push and a bit more waiting.

Mariusz Felisiak replied back with some input about the change I pushed up, and I realized I had missed the mark on what he was suggesting.

OK. Third time’s a charm, right?

Turns out, in this case it was. I pushed up one last time and this time, my changes were merged into the master and just like that, I am now a contributor to Django (albeit a very, very, very minor contributor).

Overall, this was a great experience, both with respect to learning about contributing to an open source project, as well as learning about GitHub.

I’m hoping that with the holidays upon us I’ll be able to find the time to pick up one or two (maybe even three) Easy Pickings issues from the Django issue tracker.

CBV - FormView

From Classy Class Based Views FormView

A view for displaying a form and rendering a template response.

Attributes

The only new attribute to review this time is form_class. That being said, there are a few implementation details to cover

  • form_class: takes a Form class and is used to render the form on the html template later on.

Methods

Up to this point we haven’t really needed to override a method to get any of the views to work. This time though, we need some way for the view to take the validated data and save it somewhere.

  • form_valid: called once the submitted data has passed validation; this is where you save it to the database. Without overriding this method your form doesn’t do anything beyond redirecting to success_url

Example

This example is a bit more than previous examples. A new file called forms.py is used to define the form that will be used.

forms.py

from django.forms import ModelForm
from rango.models import Person


class PersonForm(ModelForm):
    class Meta:
        model = Person
        exclude = [
            'post_date',
        ]

views.py

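from datetime import datetime

from django.urls import reverse_lazy
from django.views.generic import FormView

# Assumes forms.py lives in the rango app alongside models.py
from rango.forms import PersonForm
from rango.models import Person


class myFormView(FormView):
    form_class = PersonForm
    template_name = 'rango/person_form.html'
    extra_context = {
        'type': 'Form'
    }
    success_url = reverse_lazy('rango:list_view')

    def form_valid(self, form):
        # Called only after the form has validated; save the record here
        Person.objects.create(
            first_name=form.cleaned_data['first_name'],
            last_name=form.cleaned_data['last_name'],
            post_date=datetime.now(),
        )
        return super().form_valid(form)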

urls.py

path('form_view/', views.myFormView.as_view(), name='form_view'),

<template_name>.html

    <h3>{{ type }} View</h3>
    {% if type != 'Update' %}
        <form action="." method="post">
    {% else %}
        <form action="{% url 'rango:update_view' object.id %}" method="post">
    {% endif %}
    {% csrf_token %}
    <table>
    {{ form.as_p }}
    </table>
    <button type="submit">SUBMIT</button>
    </form>

Diagram

A visual representation of how FormView is derived can be seen here:

FormView

Conclusion

I really struggled with understanding why you would want to implement FormView. I found this explanation on Agiliq and it helped me grok the why:

FormView should be used when you need a form on the page and want to perform certain action when a valid form is submitted. eg: Having a contact us form and sending an email on form submission.

CreateView would probably be a better choice if you want to insert a model instance in database on form submission.

While my example above works, it’s not the intended use of FormView. Really, it’s just an implementation of CreateView using FormView.

