• 11
    May - 2019

    Analytics
    12 min | 224

    #Analytics: Beautiful Python using PEP8


    analytics
    coding
    docker
    jupyter
    pep-8
    programming languages
    python

    PEP-8 (sometimes written PEP 8 or PEP8) is a coding standard and style guide aimed at the readability and long-term maintainability of Python code. It was written in 2001 by Guido van Rossum, Barry Warsaw, and Nick Coghlan and provides guidelines and best practices for writing Python code. PEP stands for Python Enhancement Proposal; PEPs describe and document how the Python language evolves, providing a reference point (and, in a way, a standard) for the Pythonic way to write code.

    This tutorial presents some of the most important points of PEP-8. If you want to, you can read the full PEP-8 documentation. My goal for the next articles is to develop a continuous integration (CI) framework in which you program in Python and, every time you commit your code to a repository, it is checked automatically to verify that it is written 'correctly' and that it works (test-driven development, TDD). Thus, I start here with a quick description of PEP-8.

    This is what I will try to accomplish in this tutorial. You will be able to:

    1. Understand the reasoning behind the PEP-8 guidelines
    2. Write Python code that conforms to PEP-8
    3. Set up a local environment based on Jupyter so that you can quickly write PEP-8 compliant Python code

    PEP-8 readability

    PEP-8 primarily improves the readability of Python code. Python's high readability is what Python programmers like most about it. Indeed, readability is one of the most important factors in the design of the Python language, following the recognized fact that code is read much more often than it is written. You may spend minutes, or a whole day, writing a piece of code to analyze your data. Once you've written it, you're (almost) never going to (re-)write it again. But you'll most surely have to read it often. That piece of code might remain part of a project, or you may reuse it in other projects, and you'll have to remember what that code does and why you wrote it. So, code readability counts! Try the following:

    >>> import this
    The Zen of Python, by Tim Peters
    [...]

    When a Pythonista (veteran Python developer) calls portions of your code not "Pythonic", they usually mean that these lines do not follow the common guidelines and fail to express their intent in what is considered the most readable way. Following PEP-8 is particularly important in three situations: if you are looking for a development job and want to show your professionalism and that you understand how to structure code; if you need to collaborate with others and share your code; and if you are new to Python and want to remember what a piece of code does a few days, or weeks, after you've written it.

    General concepts

    Naming Convention

    When you write Python code, you need to define a lot of things: packages, classes, functions, variables, constants etc. A big problem is choosing names for those definitions. Sensible names will save you time and energy, and PEP-8 defines a naming style that you should use. This helps because, when you read a name, you can see what type of thing (variable, constant, function etc.) you are dealing with. The following table outlines some of the common naming types in Python:

    Type | Naming Convention | Examples
    Function | Lowercase word or words. Separate words with underscores to improve readability. | function, my_function
    Variable | Lowercase single letter (*), word, or words. Separate words with underscores to improve readability. | x, var, my_variable
    Class | Start each word with a capital letter. Do not separate words with underscores. This style is called CapWords (or CamelCase). | Model, MyClass
    Method | Lowercase word or words. Separate words with underscores to improve readability. | class_method, method
    Constant | An uppercase single letter (*), word, or words. Separate words with underscores to improve readability. | CONSTANT, MY_CONSTANT, MY_LONG_CONSTANT
    Module | A short, lowercase word or words. Separate words with underscores to improve readability. | module.py, my_module.py
    Package | A short, lowercase word or words. Do not separate words with underscores. | package, mypackage


    Note: Avoid using O, l, or I as single-letter variable names. Depending on the font of your editor, they can be confused with 0 and 1.

    Note (*): Avoid single-letter names such as x unless, for example, x is the argument of a mathematical function. Otherwise it is not clear what x represents.
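
The conventions in the table can be seen together in one small sketch. The module content below is hypothetical (imagine it saved as shapes.py); the names DEFAULT_RADIUS, CircleShape, get_area and describe_circle are made up purely for illustration.

```python
"""Hypothetical module (imagine it saved as shapes.py) with PEP-8 names."""

import math

# Constant: uppercase words separated by underscores.
DEFAULT_RADIUS = 1.0


class CircleShape:
    """Class: each word starts with a capital letter, no underscores."""

    def __init__(self, radius=DEFAULT_RADIUS):
        # Variable/attribute: lowercase word.
        self.radius = radius

    def get_area(self):
        """Method: lowercase words separated by underscores."""
        return math.pi * self.radius ** 2


def describe_circle(circle):
    """Function: lowercase words separated by underscores."""
    rounded_area = round(circle.get_area(), 2)
    return f'Circle with radius {circle.radius} and area {rounded_area}'


print(describe_circle(CircleShape(2.0)))
```

Just by looking at a name, a reader can tell whether it refers to a constant, a class, or a function.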

    Explicit code

    The most explicit and straightforward manner is always preferred in Python:

    Not recommended:

    def get_coordinates(*args):
        x, y = args
        return dict(**locals())
    
    def read(filename):
        # code for reading a csv or json
        # depending on the file extension
        ...

    Recommended:

    def get_coordinates(x, y):
        return {'x': x, 'y': y}
    
    def read_csv(filename):
        # code for reading a csv
        ...
    
    def read_json(filename):
        # code for reading a json
        ...

    In the "recommended" example, the function get_coordinates(...) explicitly receives x and y from the caller and returns a dictionary. The other two functions each read one well-defined file type, csv or json. Developers using these functions can tell from the signatures alone what to pass in and what each function does, which is not the case with the "not recommended" example above.
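
As a minimal sketch of how the two explicit readers could be filled in, the following uses only the standard library modules csv and json. The return types (a list of dictionaries for csv, the decoded object for json) are my assumption; the original leaves the function bodies open.

```python
import csv
import json


def read_csv(filename):
    """Return the rows of a CSV file as a list of dictionaries."""
    with open(filename, newline='', encoding='utf-8') as csv_file:
        return list(csv.DictReader(csv_file))


def read_json(filename):
    """Return the decoded content of a JSON file."""
    with open(filename, encoding='utf-8') as json_file:
        return json.load(json_file)
```

Each function does exactly one thing, so a caller never has to guess which format is being handled.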

    Names

    Choosing names for variables, functions, classes and so on is always challenging. Take some time to decide on your naming choices when writing code, as it will make your code more readable. Use descriptive names that make it clear what the objects represent. Avoid the words "function", "variable", "constant", "module" and "package" in names; the type of object should already be clear from the naming convention. This helps you to make your code explicit. For example, if you use x to store a person's name as a string, you could end up with something like this:

    Not recommended:

    >>> x = 'Mauro Riva'
    >>> y, z = x.split()
    >>> print(z, y, sep=', ')
    Riva, Mauro

    The code works, but if you read it quickly you will not know what x, y and z are, and that can be very confusing (are they coordinates?). With well-chosen names, the same code looks like this:

    Recommended:

    >>> staff_member_name = 'Mauro Riva'
    >>> first_name, last_name = staff_member_name.split()
    >>> print(last_name, first_name, sep=', ')
    Riva, Mauro

    Abbreviations can also be a problem, e.g. if you have a function that doubles a value and you write the following:

    Not recommended:

    def db(x):
        return x * 2

    At first glance, everything looks fine, but in a few days you might think that you are dealing with a database (db) function and not a multiply-by-two function. A much clearer way is the following:

    Recommended:

    def multiply_by_two(x):
        return x * 2

    One statement per line

    I used to put more than one variable initialization on the same line. However, while some compound statements are appreciated for their brevity and expressiveness, it is bad practice to put two disjointed statements on the same line of code:

    Not recommended:

    print('one'); print('two')
    
    if(x==1): print('starting')
    
    if <complex_comparison> and <complex_comparison>:
        # do something!

    Recommended:

    print('one')
    print('two')
    
    if x == 1:
        print('starting')
    
    comparison_cond_1 = <complex_comparison>
    comparison_cond_2 = <complex_comparison>
    if comparison_cond_1 and comparison_cond_2:
        # do something!
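
The placeholders above can be made concrete. The following is a small sketch in which each complex comparison gets its own named variable; the sensor scenario, the function name is_valid_reading and the value ranges are hypothetical and chosen only for illustration.

```python
def is_valid_reading(temperature, humidity):
    """Return True if both (hypothetical) sensor values are plausible."""
    # Each comparison gets a descriptive name on its own line.
    temperature_in_range = -40 <= temperature <= 85
    humidity_in_range = 0 <= humidity <= 100
    return temperature_in_range and humidity_in_range


if is_valid_reading(21.5, 40):
    print('starting')
```

The final condition now reads almost like a sentence, and each part can be inspected separately while debugging.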

    Indentation

    Indentation is extremely important in Python. The indentation level determines which lines are together in a group. If you have an if block like this:

    # I wrote this article while I was sick! ;)
    temperature = get_temperature()
    if temperature >= 38:
        print('Warning! You have a fever!')

    The indented print is executed only if the temperature is greater than or equal to 38, i.e. only if the condition of the if statement evaluates to True. The key indentation rules laid out by PEP 8 are the following:

    • Use 4 consecutive spaces to indicate indentation.
    • Prefer spaces over tabs.

    You should use spaces instead of tabs when indenting code. In VSCode, you can see the setting on the status bar (see Fig. 1) and change it by clicking on it. Alternatively, press Ctrl+Shift+P, go to "Preferences: Open Settings", and look for the setting there. There are also extensions that set it automatically depending on the type of the opened file (e.g. 4 spaces for Python but only 2 spaces for Ruby).

    Visual Studio Code - Status Bar
    Fig. 1: Indentation configuration on VSCode (red arrow)

    If you are using Python 2, you will not see errors when running code that uses a mixture of tabs and spaces for indentation. To check for consistency, you can add the -t or -tt flag to the interpreter when running the code on the command line.

    $ python2 -t module_temperature.py
    module_temperature.py: inconsistent use of tabs and spaces in indentation
    
    $ python2 -tt module_temperature.py
      File "module_temperature.py", line 10
        print(i, j)
                 ^
    TabError: inconsistent use of tabs and spaces in indentation

    With -t, you get warnings if your code is inconsistent. With -tt, the interpreter issues errors instead of warnings and your code will not run.

    Python 3 does not allow mixing of tabs and spaces, and you will get errors issued automatically.
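
The Python 3 behaviour can be checked without creating a file. The sketch below compiles a string whose first indented line uses a tab and whose second uses eight spaces; the helper name has_consistent_indentation is made up for this example.

```python
# The second line is indented with a tab, the third with eight spaces.
mixed_source = 'def show():\n\tfirst = 1\n        second = 2\n'
clean_source = 'def show():\n    first = 1\n    second = 2\n'


def has_consistent_indentation(source):
    """Return False if Python 3 rejects the source with a TabError."""
    try:
        compile(source, '<example>', 'exec')
    except TabError:
        return False
    return True


print(has_consistent_indentation(mixed_source))
print(has_consistent_indentation(clean_source))
```

TabError is a subclass of IndentationError, so code that already catches IndentationError or SyntaxError will see it as well.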

    Code Layout

    Maximum Line Length and Line Breaking

    PEP-8 suggests that you limit all code lines to a maximum of 79 characters. For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.

    This makes it possible to have several files open side by side, avoids line wrapping, and works well with code review tools (e.g. side-by-side diff views) that present two versions in adjacent columns.
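
A linter normally reports overlong lines for you, but the rule is simple enough to sketch by hand. The helper find_long_lines below is hypothetical and only illustrates the 79-character limit; it is not how any particular tool implements the check.

```python
MAX_CODE_LINE_LENGTH = 79


def find_long_lines(source, max_length=MAX_CODE_LINE_LENGTH):
    """Return (line_number, length) pairs for lines longer than max_length."""
    return [
        (line_number, len(line))
        for line_number, line in enumerate(source.splitlines(), start=1)
        if len(line) > max_length
    ]


sample_source = 'short = 1\n' + 'long_name = ' + '1' * 80 + '\n'
print(find_long_lines(sample_source))
```

Passing max_length=72 applies the same check with the limit PEP-8 suggests for docstrings and comments.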

    If it is not possible to keep the lines to a maximum of 79 characters, PEP-8 outlines some ways to allow statements to run over several lines. Python assumes line continuation if code is contained within parentheses, brackets, or braces. If you are using continuation to keep the lines under a maximum of 79 characters, you should use some indentation to improve readability. You can align the indented block with the opening delimiter:

    def multiply_arguments(arg_one, arg_two,
                           arg_three, arg_four):
        [...]
        return result

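    Sometimes only 4 spaces are needed to align with the opening delimiter. This often happens with if statements, because "if (" is exactly four characters long. The continuation line then starts in the same column as the nested code, and it is hard to tell the two apart: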

    t = 5
    if (t > 3 and
        t < 10):
        print(t)

    In this case PEP-8 provides two alternatives to improve the readability of the code:

    # add a comment after the final condition
    t = 5
    if (t > 3 and
        t < 10):
        # add a comment here
        print(t)
    
    # add extra indentation (4 spaces) to the line continuation
    t = 5
    if (t > 3 and
            t < 10):
        print(t)

    Another style of indentation is known as hanging indent.

    def multiply_arguments(
            arg_one, arg_two,
            arg_three, arg_four):
        [...]
        return result

    In this case, you should add an extra level of indentation to distinguish the continued lines from the code contained inside the function.

    If it is not possible to use implied continuation, backslashes may still be appropriate in some cases, e.g. for long with statements that open several files:

    # eg 1
    with open('/path/to/some/file/you/want/to/read') as file_1, \
         open('/path/to/some/file/being/written', 'w') as file_2:
        file_2.write(file_1.read())
    
    # eg 2
    from mypackage import example1, \
        example2, example3

    If a line break needs to be placed around binary operators such as +, -, * and /, it should come before the operator. This follows the tradition from mathematics and usually results in more readable code:

    income = (gross_wages
              + taxable_interest
              + (dividends - qualified_dividends)
              - ira_deduction
              - student_loan_interest)

    Blank Lines

    Blank lines between code lines can greatly improve the readability of your code. If the code is bunched up all together, it is difficult to read. However, if there are too many blank lines, you may need to scroll more than necessary to understand the written code. PEP-8 suggests four key points on how to use vertical whitespace:

    • Surround top-level function and class definitions with two blank lines.
    • Surround method definitions inside a class with a single blank line.
    • Use blank lines in functions, sparingly, to indicate logical sections.
    • Extra blank lines may be used (sparingly) to separate groups of related functions.

    import math


    def top_level_function():
        return None


    class MyFirstClass:
        pass


    class Vector2D:
    
        def __init__(self, x=0.0, y=0.0):
            self.x = x
            self.y = y
    
        def get_norm_coordinates(self):
            xx = pow(self.x, 2)
            yy = pow(self.y, 2)
            magnitude = math.sqrt(xx + yy)
    
            self.x = self.x/magnitude
            self.y = self.y/magnitude
    
            return (self.x, self.y)

    Closing Brace

    As I mentioned, line continuation allows you to break lines inside parentheses, brackets, or braces to keep them under 79 characters. It is easy to forget the closing delimiter, so PEP-8 offers two ways to make it clearly visible:

    # Line up the closing brace with the first
    # non-whitespace character of the previous line:
    list_of_fruits = [
        'apple', 'kiwi', 'orange',
        'strawberry', 'raspberry', 'blueberry',
        ]
    
    # Line up the closing brace with the first
    # character of the line that starts the construct:
    list_of_fruits = [
        'apple', 'kiwi', 'orange',
        'strawberry', 'raspberry', 'blueberry',
    ]
    

    Choose one and use it consistently.

    Whitespace in Expressions and Statements

    Adding extra whitespace can make code harder to read. Avoid extraneous whitespace in the following situations:

    • Immediately inside parentheses, brackets, or braces:

      # Not recommended
      spam( ham[ 1 ], { eggs: 2 } )
      
      # Recommended
      spam(ham[1], {eggs: 2})
    • Between a trailing comma and a following close parenthesis or before a comma, semicolon, or colon:

      # Not recommended
      foo = (0, )
      print(x , y)
      
      # Recommended
      foo = (0,)
      print(x, y)
    • Before the open parenthesis that starts the argument list of a function call or an indexing or slicing:

      def multiply_by_two(x):
          return x * 2
      
      # Not recommended
      multiply_by_two (3)
      dct ['key'] = lst [index]
      
      # Recommended
      multiply_by_two(3)
      dct['key'] = lst[index]
    • More than one space around an assignment (or other) operator to align it with another.

      # Not recommended
      var1          = 1
      var2          = 2
      some_long_var = 3
      
      # Recommended
      var1 = 1
      var2 = 2
      some_long_var = 3


    Avoid adding whitespace at the end of a line. This is known as trailing whitespace. It is invisible and can produce errors that are difficult to trace.
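
Because trailing whitespace is invisible, it helps to let a program find it. Most editors and linters can do this for you; the hypothetical helper below only sketches the idea.

```python
def find_trailing_whitespace(source):
    """Return the numbers of the lines that end with spaces or tabs."""
    return [
        line_number
        for line_number, line in enumerate(source.splitlines(), start=1)
        if line != line.rstrip(' \t')
    ]


# Line 1 ends with a space and line 3 ends with a tab.
sample_source = 'x = 1 \ny = 2\nz = 3\t\n'
print(find_trailing_whitespace(sample_source))
```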

    In some other situations, it is recommended to add spaces to simplify the reading:

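    • Always surround the following binary operators with a single space on either side: assignment (=), augmented assignment (+=, -=, etc.), comparisons (==, <, >, !=, <=, >=, in, not in, is, is not), and Booleans (and, or, not):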

      # Not Recommended
      var1=1
      var2=2
      if (var1<2 and var2>2):
          # do something
      
      # Recommended
      var1 = 1
      var2 = 2
      if (var1 < 2 and var2 > 2):
          # do something
    • If operators with different priorities are used, consider adding whitespace only around the operators with the lowest priority:

      # Not Recommended
      var1=var1+1
      var2 = var1 * 2 - 1
      hypot2 = x * x + y * y
      c = (a + b) * (a - b)
      
      # Recommended
      var1 = var1 + 1
      var2 = var1*2 - 1
      hypot2 = x*x + y*y
      c = (a+b) * (a-b)

    Comments

    You should always keep the comments up-to-date when the code changes. As PEP-8 says "Comments that contradict the code are worse than no comments".

    When adding comments to your code, you should remember the following points:

    • Limit the line length of comments and docstrings to 72 characters.
    • Comments should be complete sentences.
    • Try to write your comments in English "unless you are 100% sure that the code will never be read by people who don't speak your language".
    • Make sure to update comments if you change your code.

    Block Comments

    Use this type of comment to document a small section of code. Each line of a block comment starts with a # and a single space. To separate paragraphs, use a line containing a single #.

    def get_magnitude(self):
        # Calculate the magnitude of a vector
        #
        # This second paragraph is needed

    Block comments are the most common way to comment code. Make sure to update them whenever you change the code they describe.
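
Here is a complete, runnable version of a block comment with two paragraphs. The function takes x and y as plain arguments instead of self, which is a simplification I made so that the example is self-contained.

```python
import math


def get_magnitude(x, y):
    # Square each component first so that negative values
    # contribute positively to the sum.
    #
    # The square root of that sum is the Euclidean length
    # of the vector.
    sum_of_squares = x ** 2 + y ** 2
    return math.sqrt(sum_of_squares)


print(get_magnitude(3, 4))
```

Note that the comment is indented to the same level as the code it describes.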

    Inline Comments

    Inline comments explain a single line of code. PEP-8 says the following:

    • Use inline comments sparingly
    • Start inline comments with a # and a single space on the same line they refer to
    • Separate inline comments from the statement by two or more spaces
    • Don’t use them to explain the obvious, e.g.
      x = x + 1   # Increment x

    Instead of using an inline comment, think about changing the variable name, e.g.:

    # Not recommended
    x = 'Mauro Riva'  # Staff member
    
    # Recommended
    staff_member = 'Mauro Riva'

    Documentation Strings

    PEP-257 covers docstrings. A docstring is a string enclosed in triple double (""") or triple single (''') quotation marks that appears as the first statement of a function, class, method, or module.

    The important rules are the following:

    • Surround docstrings with three double quotes on either side
    • Write them for all public modules, functions, classes, and methods
    • Put the """ that ends a multiline docstring on a line by itself
    • For one-line docstrings, keep the """ on the same line

    def get_magnitude(self):
        """Calculate the magnitude of a vector.

        This second paragraph is needed.
        """

    def get_angle(self):
        """This is a one-line docstring."""

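Unlike comments, docstrings are kept at runtime and can be read by tools. The sketch below shows a multiline and a one-line docstring on two self-contained functions; the plain x and y arguments and the function bodies are my assumptions, chosen so that the example runs on its own.

```python
import math


def get_magnitude(x, y):
    """Return the magnitude of the vector (x, y).

    The magnitude is the Euclidean length, i.e. the square
    root of the sum of the squared components.
    """
    return math.hypot(x, y)


def get_angle(x, y):
    """Return the angle of the vector (x, y) in radians."""
    return math.atan2(y, x)


# The docstring is stored on the function object and shown by help().
print(get_angle.__doc__)
```

Calling help(get_magnitude) in an interactive session displays the full multiline docstring.
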
    Programming Recommendations

    PEP-8 also contains some programming recommendations. I list a few here; check the full documentation for more.

    • Do not compare boolean values to True or False using the equality operator (==):

      fever = get_temperature() >= 38
      
      # Not recommended
      if fever == True:
          print('You are ill!')
      
      # Recommended
      if fever:
          print('You are ill!')
    • Use the fact that empty sequences are falsy in if statements:

      my_list = []
      
      # Not recommended
      if not len(my_list):
          print('List is empty!')
      
      # Recommended
      if not my_list:
          print('List is empty!')
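    • Be consistent in return statements. Either all return statements in a function should return an expression, or none of them should. If any return statement returns a value, the others should state return None explicitly, and an explicit return should be present at the end of the function:

      import math

      # Not recommended
      def foo(x):
          if x >= 0:
              return math.sqrt(x)

      def bar(x):
          if x < 0:
              return
          return math.sqrt(x)

      # Recommended
      def foo(x):
          if x >= 0:
              return math.sqrt(x)
          else:
              return None

      def bar(x):
          if x < 0:
              return None
          return math.sqrt(x)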
    • Use is not rather than not ... is in if statements:

      # Not recommended
      if not foo is None:
          return 'foo exists!'
      
      # Recommended
      if foo is not None:
          return 'foo exists!'
    • When catching exceptions, mention specific exceptions whenever possible instead of using a bare except: clause:

      try:
          import platform_specific_module
      except ImportError:
          platform_specific_module = None
    • Use .startswith() and .endswith() instead of string slicing to check for prefixes or suffixes:

      # Not recommended
      if foo[:3] == 'bar':
      
      # Recommended
      if foo.startswith('bar'):
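
One more PEP-8 recommendation is worth a runnable example: do not write if x: when you really mean if x is not None:. The two tests differ for values such as 0 or an empty string, which are falsy but not None. The helper describe_offset below is hypothetical.

```python
def describe_offset(offset=None):
    """Describe an optional numeric offset (hypothetical helper)."""
    # Not recommended: "if offset:" would treat 0 like a missing value.
    if offset is not None:
        return f'offset is {offset}'
    return 'no offset given'


print(describe_offset(0))
print(describe_offset())
```

With a plain truthiness test, the first call would wrongly report that no offset was given.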

    PEP-8 Tools

    There are many tools that help you make sure that your code is PEP-8 compliant while you are writing it. For old projects, there are also tools that help you update the code to make it PEP-8 compliant.

    Linters

    A code linter is a program that analyses your code for potential errors. It can also provide suggestions on how to fix them. Linters are usually available as extensions to your editor, and they flag errors and styling problems while you are writing your code.
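
To give an idea of what a linter does internally, here is a toy check built on the standard library module ast. It flags function names that are not written in lowercase with underscores. This is only a sketch under my own assumptions (the regular expression and the helper name are made up); real linters implement many more rules and handle many more cases.

```python
import ast
import re

# Optional leading underscores, then lowercase letters, digits, underscores.
SNAKE_CASE_PATTERN = re.compile(r'^_*[a-z][a-z0-9_]*$')


def find_badly_named_functions(source):
    """Return (line_number, name) pairs for functions not in snake_case."""
    tree = ast.parse(source)
    return [
        (node.lineno, node.name)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
        and not SNAKE_CASE_PATTERN.match(node.name)
    ]


sample_source = 'def MultiplyByTwo(x):\n    return x * 2\n'
print(find_badly_named_functions(sample_source))
```

The important point is that the check works on the parsed structure of the code, not on the raw text, which is why linters can report precise line numbers.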

    Visual Studio Code

    For VSCode you can find a list of linters on the following link. By default, linting for Python is enabled in VSCode using Pylint, but you can enable other linters of your choice. Follow the link above to install those extensions.

    Auto formatters

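    Auto formatters reformat your existing code automatically so that it conforms to PEP-8. The best-known package is black. Note that black's default line length is 88 characters; pass --line-length 79 if you want to keep the PEP-8 limit.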

    Black works with Python 3.6+ and you can install it using:

    pip install black

    You can run the following command via the command line:

    $ black code.py
    reformatted code.py
    All done!

    Two other auto formatters, autopep8 and yapf, perform actions that are similar to what black does.

    Jupyter

    autopep8 can be used in Jupyter to reformat/prettify code in a notebook's code cell. You need to install the corresponding package:

    pip install autopep8 [--user]

    Then you can activate it, provided that you have jupyter-contrib-nbextensions installed (see Fig. 2, column 4). To install the notebook extensions:

    # pip 
    pip install jupyter_contrib_nbextensions
    
    # conda
    conda install -c conda-forge jupyter_contrib_nbextensions

    If you want to extend your Jupyter Notebooks with extensions, and you are running this using Docker, please read the following article.

    Jupyter Nbextensions
    Fig. 2: Jupyter Nbextensions

    Conclusions

    This tutorial presented some key points of PEP-8 to help you write high-quality Python. However, I recommend that you read the full documentation. It is long, but it can help you a lot.

    To start with PEP-8, you should install the recommended tools. If you are using VSCode for Python programming, check these links to activate a code linter. If you are using Jupyter, activate the autopep8 extension. Finally, if you have old code, you can use black to make it PEP-8 compatible.

    There are also some PEP-8 cheat sheets that can be also used. I found a good one here. It summarizes the most important points of the PEP-8 guideline.

    If you want to learn more about PEP-8, you can read the full documentation. Visit also pep8.org. The website contains the same information as the documentation, but it is nicely formatted.

    For some examples of good Python style, see these slides from a Python user group.

