Learn Python Series (#4) - Round-Up #1

What Will I Learn?

You will learn how to combine essential Python language mechanisms, and the built-in string methods, to program your own, self-defined, real-life and useful functions,
In the code examples I'll be only using what I've covered in the previous Learn Python Series episodes.

Requirements

A working modern computer running macOS, Windows or Ubuntu
An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution
The ambition to learn Python programming

Difficulty

Intermediate

Tutorial Contents

A full description of the topics of this video tutorial, plus the contents of the tutorial itself.

Curriculum (of the `Learn Python Series`):

Learn Python Series (#4) - Round-Up #1

This is the first Round-up episode within the Learn Python Series, in which I will show you how to build interesting things using just the mechanisms that were covered already in the previous Learn Python Series episodes.

Of course, as the series progress, with each tutorial episode more tools are added to our tool belt, so to keep things organized I'll try to use mostly what was covered in the last few episodes.

Getting creative with strings

Programming is a creative task. Depending on the complexity of what you want to build, you first need to have a fairly clear idea of how to achieve your goal - a working program - and while you're coding you oftentimes run into problems (or "puzzles") that need to be solved. In order to become a proficient programmer, in Python and in any programming language, it's very important that you enjoy trying to solve those "puzzles". The more "tools" you have on your "tool belt", the complexer the puzzles you're able to solve. To get better at programming, I think it's also important to keep pushing your limits: get out of your comfort zone and expand your horizons!

Up until now, in the previously released Handling Strings tutorials, we've been discussing the usage of individual string methods. But of course we can combine their individual strengths to create self-defined functions that do exactly what we want them to do! That's actually the beauty of the Python programming language: we can pick (and import!) individual "tools", just the tools we need to use per project / script, use them as "building blocks", and then create even better tools or more advanced "building blocks" for our own purposes.

Disclaimer: The following two "mini-projects" cover how to program self-defined, somewhat useful, string handling functions. I'm not stating these are the best, let alone only, ways to program them. The goal is to show, to the reader / aspiring Python programmer, that only understanding what was covered already is enough to program interesting and useful code with!

Mini project `parse_url()`

In case you want to program a web crawler, to fetch unstructured data from web pages if an API providing structured JSON data is missing, or in case you want to build an run a full-fledged search engine, you need to handle URLs. URLs come in many forms, but still have components that are characteristic to any URL. In order to properly use URLs, you need to "parse" them and "split" them into their components.

Let's see how to develop a parse_url() function that splits several URL components and returns them as a tuple. We're looking to return these components:

protocol or scheme (e.g. https://),
host (which could be an IP address or something like www.google.com),
the domain name (e.g. steemit.com),
the Top Level Domain TLD (e.g. .com),
the subdomain (e.g. staging in https://staging.utopian.io),
and the file path (e.g. index.php?page=321)

PS: The explanations are put inside the code as # comments!

def parse_url(url):
    
    # First we initiate the variables we want 
    # the function to return, set all to an empty string.
    
    scheme = host = subdomain = tld = domain = path = ''    
    
    # ---------------------------------------------------
    # -1- Identify and, if applicable, isolate the scheme
    # ---------------------------------------------------
    
    needle = '://'
    if needle in url:
        
        scheme_index = url.find(needle)
        scheme = url[:scheme_index + len(needle)]        
        
        # Slice the scheme from the url
        
        url = url[len(scheme):]
    
    # ---------------------------------------------------
    # -2- Identify and, if applicable, isolate 
    #     the file path from the host
    # ---------------------------------------------------
    
    needle = '/'
    if needle in url:
        
        # Split the host from the file path.
        
        host, path = url.split(sep=needle, maxsplit=1)
        
    else:
        
        # The remaining url is the host
        
        host = url
        
    # ---------------------------------------------------
    # -3- Check if the host is an IP address or if it
    #     contains a domain
    # ---------------------------------------------------

    # Remove the dots from the host
    
    needle = '.'
    no_dots = host.replace(needle, '')
    if no_dots.isdigit() == False:
        
        # The host contains a domain, so continue
    
        # ---------------------------------------------------
        # -4- Identify and isolate the tld
        # ---------------------------------------------------    

        num_dots = host.count(needle)

        # --- NB1: ---
        # When num_dots == 0 , the string wasn't a url! ;-)
        # But let's just assume for now the string is a valid url.    

        if num_dots == 1:

            # The host does not contain a subdomain

            domain = host
            tld = host[host.find(needle)+1:]

        elif num_dots > 1:

            # The host might contain a subdomain

            # --- NB2: ---
            # In order to distinguish between a host containing
            # one or more subdomains, and a host containing a 3rd
            # or higher level tld, or both, we need a list 
            # that contains all tlds.
            #
            # That list seems to be here ...
            #
            # https://publicsuffix.org/list/public_suffix_list.dat
            #
            # ... but we haven't covered yet how to fetch 
            # data from the web.
            #
            # So for now, let's just create a list containing
            # some 3rd level tlds, and just assume it is complete.

            all_3rdlevel_tlds = ['co.uk', 'gov.au', 'com.ar']

            for each_tld in all_3rdlevel_tlds:
                if each_tld in host:

                    # Apparently the tld in the url is a 3rd level tld

                    tld = each_tld                
                    break

            # ---------------------------------------------------
            # PS: Notice that this `else` belongs to the `for`
            #     and not the `if` ! It only runs when the `for`
            #     exhausted but did not break.
            # ---------------------------------------------------

            else:            
                tld = host[host.rfind(needle)+1:]            

            # ---------------------------------------------------
            # -5- Identify and, if applicable, isolate 
            #     the subdomain from the domain
            # ---------------------------------------------------  

            host_without_tld = host[:host.find(tld)-1]        
            num_dots = host_without_tld.count(needle)

            if num_dots == 0:

                # The host doesn't contain a subdomain

                domain = host_without_tld + needle + tld

            else:

                # The host contains a subdomain

                subdomain_index = host_without_tld.rfind('.')
                subdomain = host_without_tld[:subdomain_index]
                domain = host[subdomain_index+1:]        

    return scheme, host, subdomain, domain, tld, path

# Let's test the function on several test urls!

test_urls = [
    'https://www.steemit.com/@scipio/recent-replies',
    'https://steemit.com/@scipio/recent-replies',
    'http://www.londonlibrary.co.uk/index.html',
    'http://londonlibrary.co.uk/index.html',
    'https://subdomains.on.google.com/',
    'https://81.123.45.2/index.php'
]

# And finally call the parse_url() function,
# and print its returned output!

for url in test_urls:
    print(parse_url(url))

# YES! It works like a charm! ;-)
# ---------
# Output:
# ---------
# ('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
# ('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
# ('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
# ('https://', '81.123.45.2', '', '', '', 'index.php')

('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
('https://', '81.123.45.2', '', '', '', 'index.php')

Mini project `encode_gibberish()` and `decode_gibberish()`

Remember my hidden message that was contained in the "Gibberish string", covered in Handling Strings Part 1? For a brief reminder, we used a -3 negative stride on a reversed string that contained the hidden message hidden within a bunch of nonsense.

This was the code:

gibberish_msg = """!3*oJ6iFupOGiF6cNFSHU 6dmVhoKUrTvfHi 
                    KteBrgHvaIgsX$snTeIgmV0 HvnYGembdJRd*&i$6h&6 &5a*h BGsF@iGv NhsIgiYdh67T"""
print(gibberish_msg[::-3])
# This is a hidden message        from Scipio!

This is a hidden message        from Scipio!

Now as the second "mini-project" for this Round-Up, let's learn how to program a function encode_gibberish() to encode a gibberish string from a message, and another one decode_gibberish() to reveal / decode the hidden message contained inside the gibberish!

PS: The explanations are put inside the code as # comments!

def encode_gibberish(message, stride=1):
    
    # Let's use a mixed-up `chars` list containing lower-case letters,
    # upper-case letters, integers 0-9, and some other characters,
    # all found on a regular keyboard.
        
    chars = ['x', '-', 'G', 'H', 'l', 'a', '{', 'r', 2, ']', 
         ';', 'F', 'E', 'A', 'V', ')', '$', '?', '/', 
         'i', 'M', 'p', 9, 'C', 'w', 'k', '}', ':', 
         '_', '%', 'D', 'I', 'b', 'z', 'd', 6, 'N', 
         'L', 'c', '.', 1, 'X', 'h', 4, '!', 'S', '~', 
         'u', '+', 'f', 'R', 8, 3, '&', '<', 'y', 'Z', 
         'P', 'n', '^', 'J', 'q', 5, 'o', 'W', '*', 'Q', 
         7, 'B', 'g', 'O', 'K', 'm', ',', 's', '>', 
         'T', '(', '#', 't', 'j', 'e', 
         'Y', '@', '[', 'v', '=', 'U'
    ]
    
    # Initialize an iterator for the `chars` list
    
    chars_index = 0

    # Convert the message string to a list
    
    message = list(message)
    
    # Quick fix for negative strides:
    # if stride is negative, use the 
    # absolute (positive) value
    
    abs_stride = stride * -1 if stride < 0 else stride
    
    # For all characters from the `message` list,
    # add characters from the `chars` list    
    
    for index in range(len(message)):
        
        # Iterate over the `chars` list, and per
        # `message` character concatenate as many
        # characters as the `stride` argument
        
        salt = ''        
        for i in range(abs_stride):
            salt += str(chars[chars_index])
            if chars_index == len(chars)-1:
                chars_index = 0
            else:
                chars_index += 1
        message[index] = message[index] + salt
    
    # Convert back to string
    message = ''.join(message)
 
    # In case of a negative stride, 
    # reverse the message    
    if stride < 0:
        message = message[::-1] 
    
    return message

def decode_gibberish(encoded_msg, stride=1):
    
    # Simply decode the encoded message using
    # the `stride` argument
    
    stride = stride + 1 if stride > 0 else stride -1
    return encoded_msg[::stride]

# Let's see if this works!
stride = -5
msg1 = "This is a very secret message that must be encoded at all cost. Because it's secret!"

# Encode, and decode
encoded_msg = encode_gibberish(msg1, stride)
decoded_msg = decode_gibberish(encoded_msg, stride)

# Print the encoded and decoded message strings
print(encoded_msg)
print(decoded_msg)

7Q*Wo!5qJ^ntPZy<&e38Rf+ru~S!4chX1.ceLN6dzsbID%_ :}kwCs9pMi/'?$)VAtEF;]2ir{alH G-xU=ev[@Yesjt#(Tu>s,mKaOgB7Qc*Wo5qeJ^nPZBy<&38 Rf+u~.S!4hXt1.cLNs6dzbIoD%_:}ckwC9p Mi/?$l)VAEFl;]2r{aalHG- xU=v[t@Yejta#(T>s ,mKOgdB7Q*Weo5qJ^dnPZy<o&38Rfc+u~S!n4hX1.ecLN6d zbID%e_:}kwbC9pMi /?$)VtAEF;]s2r{aluHG-xUm=v[@Y ejt#(tT>s,maKOgB7hQ*Wo5tqJ^nP Zy<&3e8Rf+ug~S!4haX1.cLsN6dzbsID%_:e}kwC9mpMi/? $)VAEtF;]2re{alHGr-xU=vc[@Yejet#(T>ss,mKO gB7Q*yWo5qJr^nPZye<&38Rvf+u~S !4hX1a.cLN6 dzbIDs%_:}kiwC9pM i/?$)sVAEF;i]2r{ahlHG-xT
This is a very secret message that must be encoded at all cost. Because it's secret!

What did we learn, hopefully?

That, although we have yet still only covered just a few Python languages mechanisms and haven't even used an import statement, which we will cover in the next Learn Python Series episode, we already have "the power" to program useful functions! We only needed 4 tutorial episodes for this, so let's find out just how much more we can learn in the next episodes! See you there!

Thank you for your time!

Posted on Utopian.io - Rewarding Open Source Contributors

Learn Python Series (#4) - Round-Up #1

Learn Python Series (#4) - Round-Up #1

What Will I Learn?

Requirements

Difficulty

Tutorial Contents

Curriculum (of the Learn Python Series):

Learn Python Series (#4) - Round-Up #1

Getting creative with strings

Mini project parse_url()

Mini project encode_gibberish() and decode_gibberish()

What did we learn, hopefully?

Thank you for your time!

Curriculum (of the `Learn Python Series`):

Mini project `parse_url()`

Mini project `encode_gibberish()` and `decode_gibberish()`