Removing Single Line Comments: Python for Beginners

by Ajay Singh Rana (@h3avren), March 7th, 2023

Too Long; Didn't Read

I was working on a college project that was similar to markup and wanted to add comment support to it. After failing multiple times, I set two basic rules for myself: if the first character of the line is ‘#’, remove the whole line; if the number of apostrophes and the number of quotes before the “#” symbol are both even, remove everything from the “#” symbol to the end of the line.

How to Remove Single Line Comments…

I was recently working on a college project of mine (pyramid) that was similar to markup, and I wanted to add comment support to it. Comments are very useful for documentation, and I find them a blessing, so I wanted my project to have this feature too.


So, I set out to write code for parsing them. Several of my approaches failed, but I eventually succeeded and wanted to share the joy of having coded a working comment remover.


I wanted to implement comments starting with “#”. After failing multiple times, I set two basic rules for myself:


  • If the first character of the line is ‘#’, remove the whole line.


  • If the number of apostrophes and the number of quotes before the “#” symbol are both even, then remove everything from the “#” symbol to the end of the line.


With these two rules in place, I had a direction to move in. I tested my code against the following text:


test.txt


#this is a comment
this is not a "#comment"
this is a # comment and #this follows in
"#this is not a comment" but #this is
"# not a comment"

Implementing the First Rule Is as Easy As:

with open("test.txt","r") as file:
    text = file.read()

lines = text.strip().split('\n')    # split the text into lines
comments = []    # whole-line comments, collected for removal
for line in lines:
    # startswith() is safe on empty lines, where line[0] would raise IndexError
    if(line.startswith("#")):
        comments.append(line)

for line in comments:
    lines.remove(line)
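
As a side note, the same first rule can be sketched in a single pass with a list comprehension, which avoids building a separate removal list. The function name `drop_comment_lines` is hypothetical and only used here for illustration; it is not part of the project's code.

```python
def drop_comment_lines(lines):
    """Return only the lines that do not start with '#' (rule 1)."""
    # str.startswith() returns False for an empty string, so blank lines are kept.
    return [line for line in lines if not line.startswith("#")]


sample = [
    '#this is a comment',
    'this is not a "#comment"',
    '"# not a comment"',
]
kept = drop_comment_lines(sample)
```

Here `kept` holds the two lines that begin with something other than “#”. Like the loop above, this only looks at the very first character, so an indented “#” is not treated as a whole-line comment.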


Doing this removes all the lines that start with a “#”. Now we move on to the second rule, which was quite interesting to implement. Here is how it goes:


  • Maintain a list with the indexes of apostrophes, quotes, and hash symbols for each line, and a separate list holding the index of the comment for each line.


  • For each hash, count the apostrophes and the quotes whose indexes are less than that of the hash itself. Two cases arise here:


  • If the count of apostrophes as well as the count of quotes is even, then add the index of the hash symbol to the comments list; do not check the remaining hash symbols.


  • If the count of apostrophes or the count of quotes is odd, then check the next hash in the line. If there are no more hashes in the line, add 0 as an index to the comments list.


  • Now, we have the index of the start of the comment in each line, and lines that do not have a comment have an index of 0. Therefore, we’ll now remove the text from the index of the hash to the end of the line in order to remove the comment.


  • We do nothing when the index is 0, as that line doesn’t have any comment. A real comment can never start at index 0 here, because lines beginning with “#” were already removed by the first rule.
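
To make the counting step concrete, here is a condensed sketch of the check for a single line. It uses `str.count` on the text before each hash instead of separate index lists, and the helper name `find_comment_start` is hypothetical. It assumes lines starting with “#” were already removed by the first rule, so a return value of 0 can safely mean “no comment”.

```python
def find_comment_start(line):
    """Return the index of the first '#' that starts a comment, or 0 if none."""
    for i, char in enumerate(line):
        if char != "#":
            continue
        before = line[:i]  # everything to the left of this hash
        # Even counts of both quote kinds mean no quoted text is still open.
        if before.count("'") % 2 == 0 and before.count('"') % 2 == 0:
            return i
    return 0


line = '"#this is not a comment" but #this is'
start = find_comment_start(line)
```

For this line, the first hash has one quote before it (odd), so it is skipped; the second hash has two quotes before it (even), so `start` points at the second hash and `line[:start]` keeps only the text in front of the comment.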

Here’s the Full Implementation of the Rule Appended to the Above Code:

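lines = text.strip().split('\n')
literals_indexes = []    # per line: [apostrophe indexes, quote indexes, hash indexes]
comments = []    # whole-line comments, collected for removal
for line in lines:
    if(line.startswith('#')):    # rule 1; startswith() is also safe on empty lines
        comments.append(line)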
    else:
        index_apos = []
        index_quote = []
        index = []
        for (i,char) in enumerate(line):
            if(char == "'"):
                index_apos.append(i)
            if(char == '"'):
                index_quote.append(i)
            if(char == '#'):
                index.append(i)
        literals_indexes.append([index_apos,index_quote,index])

for comment in comments:
    lines.remove(comment)
comments = []    # reused: index where the comment starts in each remaining line (0 = none)
for indexes in literals_indexes:
    if(indexes[2] != []):
        for hashes in indexes[2]:
            count_apos = 0
            count_quotes = 0
            append_flag = False
            if(indexes[0] != []):
                for apos in indexes[0]:
                    if(apos < hashes):
                        count_apos += 1
                    else:
                        break
            if(indexes[1] != []):
                for quotes in indexes[1]:
                    if(quotes < hashes):
                        count_quotes += 1
                    else:
                        break
            if(((count_apos % 2) == 0) and ((count_quotes % 2) == 0)):
                append_flag = True
                comments.append(hashes)
                break
        if(not append_flag):
            comments.append(0)
    else:
        comments.append(0)

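new_text = []
for (line,index) in zip(lines,comments):
    if(index != 0):
        # slice instead of replace(): replace() would also delete any
        # identical text that appears earlier in the same line
        line = line[:index]
    new_text.append(line)
new_text = '\n'.join(new_text)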
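
One limitation of counting apostrophes and quotes separately is a line such as `"don't" # note`, where the apostrophe inside the double quotes makes the apostrophe count odd, so the comment is kept. A possible alternative, sketched below, tracks which quote character is currently open while scanning the line. This is a different technique from the code above, not a drop-in copy of it; the names `strip_comment` and `strip_comments` are hypothetical, and escaped quotes (such as `\"`) are not handled.

```python
def strip_comment(line):
    """Remove a trailing '#' comment, ignoring any '#' inside quoted text."""
    open_quote = None  # the quote character that is currently open, if any
    for i, char in enumerate(line):
        if open_quote is not None:
            if char == open_quote:
                open_quote = None  # the quoted text ends here
        elif char in ("'", '"'):
            open_quote = char  # quoted text starts here
        elif char == "#":
            return line[:i]  # a hash outside quotes starts the comment
    return line


def strip_comments(text):
    """Apply both rules to every line of the text."""
    stripped = []
    for line in text.strip().split("\n"):
        if line.startswith("#"):
            continue  # rule 1: drop whole-line comments
        stripped.append(strip_comment(line))
    return "\n".join(stripped)


sample = (
    '#this is a comment\n'
    'this is not a "#comment"\n'
    'this is a # comment and #this follows in\n'
    '"#this is not a comment" but #this is\n'
    '"# not a comment"\n'
)
result = strip_comments(sample)
```

On the test text this gives the same lines as the two rules, while `strip_comment('"don\'t" # note')` also drops the comment. A quote that is never closed keeps the rest of the line untouched, which matches the odd-count case of the second rule.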


Though not the best solution, this worked for me. I hope I was able to write a tidy article on my experience. I am a happy man after having implemented this tiny feature.


I am well aware of popular tools such as regex, and I wouldn’t be surprised if someone came up with a regular expression to remove comments (it would be tough, though).
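
For the curious, here is one way such a regular expression might look, offered as a rough sketch rather than a finished solution. The pattern matches either a complete quoted string, which is captured and put back unchanged, or a “#” running to the end of the line, which is dropped. The names `TOKEN` and `strip_comment_regex` are hypothetical, and escaped quotes are not handled.

```python
import re

# Either a complete quoted string (captured in group 1) or a '#' up to the end of the line.
TOKEN = re.compile(r'''("[^"]*"|'[^']*')|#.*''')


def strip_comment_regex(line):
    """Drop a '#' comment from one line, leaving quoted strings untouched."""
    # For a quoted string, group 1 is set and is returned as-is;
    # for a comment, group 1 is None and the match is replaced with ''.
    return TOKEN.sub(lambda m: m.group(1) or "", line)


cleaned = strip_comment_regex('"#this is not a comment" but #this is')
```

Note that this behaves differently from the second rule when a quote is never closed: the regex only protects complete quoted strings, so in `it's # note` the comment is removed, whereas the counting approach would keep it. Whole-line comments come back as empty strings and would still need to be filtered out to match the first rule.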