Learn Python Series (#4) - Round-Up #1
What Will I Learn?
- You will learn how to combine essential Python language mechanisms and the built-in string methods to program your own self-defined, real-life, useful functions.
- In the code examples I'll only be using what I've covered in the previous Learn Python Series episodes.
Requirements
- A working modern computer running macOS, Windows or Ubuntu
- An installed Python 3(.6) distribution, such as (for example) the Anaconda Distribution
- The ambition to learn Python programming
Difficulty
Intermediate
Tutorial Contents
Curriculum (of the Learn Python Series):
- Learn Python Series - Intro
- Learn Python Series (#2) - Handling Strings Part 1
- Learn Python Series (#3) - Handling Strings Part 2
Learn Python Series (#4) - Round-Up #1
This is the first Round-up episode within the Learn Python Series, in which I will show you how to build interesting things using just the mechanisms that were already covered in the previous Learn Python Series episodes.
Of course, as the series progresses, each tutorial episode adds more tools to our tool belt, so to keep things organized I'll try to use mostly what was covered in the last few episodes.
Getting creative with strings
Programming is a creative task. Depending on the complexity of what you want to build, you first need to have a fairly clear idea of how to achieve your goal - a working program - and while you're coding you often run into problems (or "puzzles") that need to be solved. In order to become a proficient programmer, in Python or in any other programming language, it's very important that you enjoy trying to solve those "puzzles". The more "tools" you have on your "tool belt", the more complex the puzzles you're able to solve. To get better at programming, I think it's also important to keep pushing your limits: get out of your comfort zone and expand your horizons!
Up until now, in the previously released Handling Strings tutorials, we've been discussing the usage of individual string methods. But of course we can combine their individual strengths to create self-defined functions that do exactly what we want them to do! That's actually the beauty of the Python programming language: we can pick (and import!) individual "tools", just the tools we need to use per project / script, use them as "building blocks", and then create even better tools or more advanced "building blocks" for our own purposes.
Disclaimer: The following two "mini-projects" cover how to program self-defined, somewhat useful, string handling functions. I'm not stating these are the best, let alone the only, ways to program them. The goal is to show the reader / aspiring Python programmer that understanding only what was already covered is enough to program interesting and useful code!
Mini project parse_url()
In case you want to program a web crawler, to fetch unstructured data from web pages if an API providing structured JSON data is missing, or in case you want to build and run a full-fledged search engine, you need to handle URLs. URLs come in many forms, but they all share components that are characteristic of any URL. In order to properly use URLs, you need to "parse" them and "split" them into their components.
Let's see how to develop a parse_url() function that splits a URL into several components and returns them as a tuple. We're looking to return these components:
- the protocol or scheme (e.g. https://),
- the host (which could be an IP address or something like www.google.com),
- the domain name (e.g. steemit.com),
- the Top Level Domain, or TLD (e.g. com),
- the subdomain (e.g. staging in https://staging.utopian.io),
- and the file path (e.g. index.php?page=321).
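To make the goal concrete, here is a minimal sketch of the kind of tuple we are aiming for, unpacked into separate names. The values are written out by hand for the example URL https://staging.utopian.io/index.php?page=321 (they are an illustration, not output produced by running the function), and the field order is an assumption that follows the function's return statement.

```python
# Hand-written illustration of the target tuple for
# 'https://staging.utopian.io/index.php?page=321'.
# Assumed field order: scheme, host, subdomain, domain, tld, path
expected = ('https://', 'staging.utopian.io', 'staging',
            'utopian.io', 'io', 'index.php?page=321')

# Unpack the tuple into six separate variables in one go
scheme, host, subdomain, domain, tld, path = expected

print(host)
print(domain)
```

Unpacking like this only works when the number of names on the left equals the number of items in the tuple.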
PS: The explanations are put inside the code as # comments!
def parse_url(url):
    # First we initiate the variables we want
    # the function to return, set all to an empty string.
    scheme = host = subdomain = tld = domain = path = ''
    # ---------------------------------------------------
    # -1- Identify and, if applicable, isolate the scheme
    # ---------------------------------------------------
    needle = '://'
    if needle in url:
        scheme_index = url.find(needle)
        scheme = url[:scheme_index + len(needle)]
        # Slice the scheme from the url
        url = url[len(scheme):]
    # ---------------------------------------------------
    # -2- Identify and, if applicable, isolate
    #     the file path from the host
    # ---------------------------------------------------
    needle = '/'
    if needle in url:
        # Split the host from the file path.
        host, path = url.split(sep=needle, maxsplit=1)
    else:
        # The remaining url is the host
        host = url
    # ---------------------------------------------------
    # -3- Check if the host is an IP address or if it
    #     contains a domain
    # ---------------------------------------------------
    # Remove the dots from the host
    needle = '.'
    no_dots = host.replace(needle, '')
    if not no_dots.isdigit():
        # The host contains a domain, so continue
        # ---------------------------------------------------
        # -4- Identify and isolate the tld
        # ---------------------------------------------------
        num_dots = host.count(needle)
        # --- NB1: ---
        # When num_dots == 0 , the string wasn't a url! ;-)
        # But let's just assume for now the string is a valid url.
        if num_dots == 1:
            # The host does not contain a subdomain
            domain = host
            tld = host[host.find(needle)+1:]
        elif num_dots > 1:
            # The host might contain a subdomain
            # --- NB2: ---
            # In order to distinguish between a host containing
            # one or more subdomains, and a host containing a 3rd
            # or higher level tld, or both, we need a list
            # that contains all tlds.
            #
            # That list seems to be here ...
            #
            # https://publicsuffix.org/list/public_suffix_list.dat
            #
            # ... but we haven't covered yet how to fetch
            # data from the web.
            #
            # So for now, let's just create a list containing
            # some 3rd level tlds, and just assume it is complete.
            all_3rdlevel_tlds = ['co.uk', 'gov.au', 'com.ar']
            for each_tld in all_3rdlevel_tlds:
                # Only match the tld at the very end of the host,
                # not somewhere in the middle of it
                if host.endswith(needle + each_tld):
                    # Apparently the tld in the url is a 3rd level tld
                    tld = each_tld
                    break
            # ---------------------------------------------------
            # PS: Notice that this `else` belongs to the `for`
            # and not the `if` ! It only runs when the `for`
            # exhausted but did not break.
            # ---------------------------------------------------
            else:
                tld = host[host.rfind(needle)+1:]
            # ---------------------------------------------------
            # -5- Identify and, if applicable, isolate
            #     the subdomain from the domain
            # ---------------------------------------------------
            # Cut the tld, plus the dot in front of it, off the
            # end of the host
            host_without_tld = host[:-(len(tld) + 1)]
            num_dots = host_without_tld.count(needle)
            if num_dots == 0:
                # The host doesn't contain a subdomain
                domain = host_without_tld + needle + tld
            else:
                # The host contains a subdomain
                subdomain_index = host_without_tld.rfind(needle)
                subdomain = host_without_tld[:subdomain_index]
                domain = host[subdomain_index+1:]
    return scheme, host, subdomain, domain, tld, path
# Let's test the function on several test urls!
test_urls = [
    'https://www.steemit.com/@scipio/recent-replies',
    'https://steemit.com/@scipio/recent-replies',
    'http://www.londonlibrary.co.uk/index.html',
    'http://londonlibrary.co.uk/index.html',
    'https://subdomains.on.google.com/',
    'https://81.123.45.2/index.php'
]
# And finally call the parse_url() function,
# and print its returned output!
for url in test_urls:
    print(parse_url(url))
# YES! It works like a charm! ;-)
# ---------
# Output:
# ---------
# ('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
# ('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
# ('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
# ('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
# ('https://', '81.123.45.2', '', '', '', 'index.php')
('https://', 'www.steemit.com', 'www', 'steemit.com', 'com', '@scipio/recent-replies')
('https://', 'steemit.com', '', 'steemit.com', 'com', '@scipio/recent-replies')
('http://', 'www.londonlibrary.co.uk', 'www', 'londonlibrary.co.uk', 'co.uk', 'index.html')
('http://', 'londonlibrary.co.uk', '', 'londonlibrary.co.uk', 'co.uk', 'index.html')
('https://', 'subdomains.on.google.com', 'subdomains.on', 'google.com', 'com', '')
('https://', '81.123.45.2', '', '', '', 'index.php')
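The for ... else construct used in step 4 is easy to misread, so here is a minimal stand-alone sketch of it. The helper name find_suffix() and its short suffix list are hypothetical, made up for this illustration. It uses endswith() so that a suffix only counts when it sits at the very end of the host, which is a stricter check than a plain `in` test.

```python
def find_suffix(host, known_suffixes):
    """Return the first known multi-part suffix that `host` ends with,
    or fall back to whatever follows the last dot."""
    for suffix in known_suffixes:
        if host.endswith('.' + suffix):
            found = suffix
            break
    else:
        # This branch runs only when the loop finished without `break`
        found = host[host.rfind('.') + 1:]
    return found

# Hypothetical, deliberately incomplete list of multi-part suffixes
suffixes = ['co.uk', 'gov.au', 'com.ar']

for host in ['www.londonlibrary.co.uk', 'www.steemit.com', 'co.uk.example.com']:
    print(host, '->', find_suffix(host, suffixes))
```

The last test host contains co.uk in the middle rather than at the end, so the loop never breaks and the else branch supplies the fallback.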
Mini project encode_gibberish() and decode_gibberish()
Remember my hidden message that was contained in the "Gibberish string", covered in Handling Strings Part 1? As a brief reminder, we used a negative stride of -3 on a reversed string in which the message was hidden within a bunch of nonsense.
This was the code:
gibberish_msg = """!3*oJ6iFupOGiF6cNFSHU 6dmVhoKUrTvfHi
KteBrgHvaIgsX$snTeIgmV0 HvnYGembdJRd*&i$6h&6 &5a*h BGsF@iGv NhsIgiYdh67T"""
print(gibberish_msg[::-3])
# This is a hidden message from Scipio!
This is a hidden message from Scipio!
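To see why this works, it can help to try the same slice on a much shorter, hand-made string. In this small sketch the dots stand in for the nonsense characters; the string itself is invented for the illustration.

```python
# Counting backwards from the end, every third character
# belongs to the hidden message; the dots are just filler.
tiny = 'c..b..a'
print(tiny[::-3])       # abc

# The same result in two steps: reverse the string first,
# then take every third character from the front.
print(tiny[::-1][::3])  # abc
```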
Now as the second "mini-project" for this Round-Up, let's learn how to program a function encode_gibberish() to encode a gibberish string from a message, and another one, decode_gibberish(), to reveal / decode the hidden message contained inside the gibberish!
PS: The explanations are put inside the code as # comments!
def encode_gibberish(message, stride=1):
    # Let's use a mixed-up `chars` list containing lower-case letters,
    # upper-case letters, integers 1-9, and some other characters,
    # all found on a regular keyboard.
    chars = ['x', '-', 'G', 'H', 'l', 'a', '{', 'r', 2, ']',
             ';', 'F', 'E', 'A', 'V', ')', '$', '?', '/',
             'i', 'M', 'p', 9, 'C', 'w', 'k', '}', ':',
             '_', '%', 'D', 'I', 'b', 'z', 'd', 6, 'N',
             'L', 'c', '.', 1, 'X', 'h', 4, '!', 'S', '~',
             'u', '+', 'f', 'R', 8, 3, '&', '<', 'y', 'Z',
             'P', 'n', '^', 'J', 'q', 5, 'o', 'W', '*', 'Q',
             7, 'B', 'g', 'O', 'K', 'm', ',', 's', '>',
             'T', '(', '#', 't', 'j', 'e',
             'Y', '@', '[', 'v', '=', 'U'
             ]
    # Initialize an index into the `chars` list
    chars_index = 0
    # Convert the message string to a list
    message = list(message)
    # Quick fix for negative strides:
    # if stride is negative, use the
    # absolute (positive) value
    abs_stride = stride * -1 if stride < 0 else stride
    # For all characters from the `message` list,
    # add characters from the `chars` list
    for index in range(len(message)):
        # Walk through the `chars` list, and per
        # `message` character concatenate as many
        # characters as the `stride` argument
        salt = ''
        for i in range(abs_stride):
            # The integers in `chars` must be converted to strings
            salt += str(chars[chars_index])
            # Start over when the end of `chars` is reached
            if chars_index == len(chars)-1:
                chars_index = 0
            else:
                chars_index += 1
        message[index] = message[index] + salt
    # Convert back to string
    message = ''.join(message)
    # In case of a negative stride,
    # reverse the message
    if stride < 0:
        message = message[::-1]
    return message
def decode_gibberish(encoded_msg, stride=1):
    # Decode the encoded message: keep one character, then
    # skip as many characters as the `stride` argument.
    # A stride of 0 means nothing was added, so the step is 1.
    step = stride + 1 if stride >= 0 else stride - 1
    return encoded_msg[::step]
# Let's see if this works!
stride = -5
msg1 = "This is a very secret message that must be encoded at all cost. Because it's secret!"
# Encode, and decode
encoded_msg = encode_gibberish(msg1, stride)
decoded_msg = decode_gibberish(encoded_msg, stride)
# Print the encoded and decoded message strings
print(encoded_msg)
print(decoded_msg)
7Q*Wo!5qJ^ntPZy<&e38Rf+ru~S!4chX1.ceLN6dzsbID%_ :}kwCs9pMi/'?$)VAtEF;]2ir{alH G-xU=ev[@Yesjt#(Tu>s,mKaOgB7Qc*Wo5qeJ^nPZBy<&38 Rf+u~.S!4hXt1.cLNs6dzbIoD%_:}ckwC9p Mi/?$l)VAEFl;]2r{aalHG- xU=v[t@Yejta#(T>s ,mKOgdB7Q*Weo5qJ^dnPZy<o&38Rfc+u~S!n4hX1.ecLN6d zbID%e_:}kwbC9pMi /?$)VtAEF;]s2r{aluHG-xUm=v[@Y ejt#(tT>s,maKOgB7hQ*Wo5tqJ^nP Zy<&3e8Rf+ug~S!4haX1.cLsN6dzbsID%_:e}kwC9mpMi/? $)VAEtF;]2re{alHGr-xU=vc[@Yejet#(T>ss,mKO gB7Q*yWo5qJr^nPZye<&38Rvf+u~S !4hX1a.cLN6 dzbIDs%_:}kiwC9pM i/?$)sVAEF;i]2r{ahlHG-xT
This is a very secret message that must be encoded at all cost. Because it's secret!
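A quick way to gain confidence in an encoder / decoder pair is a round-trip check: encode, decode, and compare the result with the original. The sketch below is self-contained, so it uses compact stand-in versions of both functions. The names encode() and decode(), the shortened FILLER string, and the modulo-based wrap-around are assumptions made for this example, not the code from this episode.

```python
FILLER = 'x-GHla{r2];FEAV)$?/'  # hypothetical, shortened filler alphabet

def encode(message, stride=1):
    """Append abs(stride) filler characters after each message character;
    reverse the result when stride is negative."""
    pieces = []
    filler_index = 0
    for char in message:
        salt = ''
        for _ in range(abs(stride)):
            # The modulo wraps the index back to 0 at the end of FILLER
            salt += FILLER[filler_index % len(FILLER)]
            filler_index += 1
        pieces.append(char + salt)
    encoded = ''.join(pieces)
    return encoded[::-1] if stride < 0 else encoded

def decode(encoded, stride=1):
    """Keep one character, then skip abs(stride) filler characters."""
    step = stride + 1 if stride >= 0 else stride - 1
    return encoded[::step]

message = 'Round-trip me!'
for stride in [-5, -1, 0, 1, 3]:
    # Decoding with the same stride must give back the original message
    assert decode(encode(message, stride), stride) == message
print('All round trips succeeded')
```

Testing a negative, a zero and a positive stride in one loop covers the three branches that are easiest to get wrong.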
What did we learn, hopefully?
That, although we have so far covered only a few Python language mechanisms and haven't even used an import statement (which we will cover in the next Learn Python Series episode), we already have "the power" to program useful functions! We only needed 4 tutorial episodes for this, so let's find out just how much more we can learn in the next episodes! See you there!
Thank you for your time!
Posted on Utopian.io - Rewarding Open Source Contributors