Removing Single Line Comments: Python for Beginners

How to Remove Single Line Comments…

I was recently working on a college project of mine (pyramid), which is similar to a markup language, and wanted to add comment support to it. Comments are very useful for documentation, and I find them a blessing, so I wanted my project to have this feature too.

So, I set out to write code to parse them. Several of my approaches failed, but I eventually succeeded and wanted to share the joy of having coded a working comment remover.

I wanted to implement comments starting with “#”. After failing multiple times, I set two basic rules for myself:

If the first character of the line is ‘#’, remove the whole line.

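If the number of apostrophes and the number of quotes before the “#” symbol are both even, remove everything from the “#” symbol to the end of the line. (An even count means every string opened before the “#” has also been closed, so the “#” sits outside a string.)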
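The second rule can be tried on its own before any file handling is involved. The snippet below is a minimal sketch, not the code used in the project: the helper name `is_comment_hash` is made up for illustration, and it simply counts the apostrophes and quotes that appear before a given “#”.

```python
def is_comment_hash(line, pos):
    """Return True if the '#' at line[pos] starts a comment under the second rule.

    Hypothetical helper for illustration: the '#' counts as a comment start
    only when both the apostrophe count and the quote count before it are even.
    """
    before = line[:pos]
    return before.count("'") % 2 == 0 and before.count('"') % 2 == 0


line = '"#this is not a comment" but #this is'
for pos, char in enumerate(line):
    if char == "#":
        # Show each hash position and whether the rule treats it as a comment start
        print(pos, is_comment_hash(line, pos))
```

The first hash in that line sits inside a quoted string, so the rule rejects it; the second one follows a closed string and is accepted.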

Having set these two rules, I now had a direction for myself to move in. I was testing my code against the following text:

test.txt

#this is a comment
this is not a "#comment"
this is a # comment and #this follows in
"#this is not a comment" but #this is
"# not a comment"

Implementing the First Rule Is as Easy As:

with open("test.txt", "r") as file:
    text = file.read()

lines = text.strip().split('\n')  # split the text into lines
comments = []  # full-line comments, collected for removal
for line in lines:
    # startswith() is safe on empty lines, where line[0] would raise IndexError
    if line.startswith("#"):
        comments.append(line)

for line in comments:
    lines.remove(line)

Doing this removes all the lines that start with a “#”. Now, we head on to the second rule, which was quite interesting to implement. Here is how it goes:

For each line, maintain a list holding the indexes of its apostrophes, quotes, and hash symbols. Keep a separate list holding the index where the comment starts in each line.

For each hash, count the apostrophes and the quotes whose index is less than that of the hash. Two cases arise here:

If the counts of apostrophes and quotes are both even, add the index of the hash symbol to the comments list and do not check the remaining hash symbols.

If either count is odd, check the next hash in the line. If there are no more hashes in the line, add 0 as the index to the comments list.

Now we have the index where the comment starts in each line; lines without a comment have an index of 0. (0 is safe as a marker because lines that start with “#” have already been removed by the first rule.) We then remove the text from the index of the hash to the end of the line, and do nothing when the index is 0, as the line doesn’t have a comment.

Here’s the Full Implementation of the Rule Appended to the Above Code:

lines = text.strip().split('\n')
literals_indexes = []  # per line: [apostrophe indexes, quote indexes, hash indexes]
comments = []
for line in lines:
    if line.startswith('#'):
        # Rule 1: the whole line is a comment
        comments.append(line)
    else:
        index_apos = []
        index_quote = []
        index = []
        for (i, char) in enumerate(line):
            if char == "'":
                index_apos.append(i)
            if char == '"':
                index_quote.append(i)
            if char == '#':
                index.append(i)
        literals_indexes.append([index_apos, index_quote, index])

for comment in comments:
    lines.remove(comment)

comments = []  # reused: now holds the comment start index for each remaining line
for indexes in literals_indexes:
    if indexes[2] != []:
        append_flag = False
        for hashes in indexes[2]:
            count_apos = 0
            count_quotes = 0
            for apos in indexes[0]:
                if apos < hashes:
                    count_apos += 1
                else:
                    break
            for quotes in indexes[1]:
                if quotes < hashes:
                    count_quotes += 1
                else:
                    break
            # Rule 2: both counts even means the hash is outside any string
            if (count_apos % 2) == 0 and (count_quotes % 2) == 0:
                append_flag = True
                comments.append(hashes)
                break
        if not append_flag:
            comments.append(0)
    else:
        comments.append(0)

new_text = []
for (line, index) in zip(lines, comments):
    if index != 0:
        # Slice instead of replace(): replace() would also delete any earlier
        # copy of the same text inside the line
        line = line[:index]
    new_text.append(line)
new_text = '\n'.join(new_text)
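The same two rules can also be folded into a single pass over each line. What follows is a sketch of that alternative, not the code from the project: the function name `strip_comments` is an assumption, and instead of index lists it keeps two running flags that record whether the number of apostrophes and quotes seen so far is even.

```python
def strip_comments(text):
    """Apply both rules to every line of text and return the cleaned text.

    Illustrative alternative to the index-list version: parity is tracked
    with two flags while walking the line once.
    """
    result = []
    for line in text.strip().split("\n"):
        if line.startswith("#"):
            continue  # rule 1: drop full-line comments
        apos_even = True
        quote_even = True
        cut = len(line)  # default: keep the whole line
        for i, char in enumerate(line):
            if char == "'":
                apos_even = not apos_even
            elif char == '"':
                quote_even = not quote_even
            elif char == "#" and apos_even and quote_even:
                cut = i  # rule 2: first hash outside any string
                break
        result.append(line[:cut])
    return "\n".join(result)


sample = "\n".join([
    '#this is a comment',
    'this is not a "#comment"',
    'this is a # comment and #this follows in',
    '"#this is not a comment" but #this is',
    '"# not a comment"',
])
print(strip_comments(sample))
```

Using `len(line)` as the default cut point avoids the need for 0 as a special marker, since slicing up to the full length leaves the line unchanged.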

Though not the best solution, this worked for me. I hope I was able to write a tidy article on my experience. I am a happy man after having implemented this tiny feature.

I am well aware of popular tools such as regex, and I wouldn’t be surprised if someone came up with a regular expression to remove comments (it would be tough, though).
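For the curious, here is one way such a regular expression could look. It is only a sketch under assumptions, not something I used in the project: the names `PATTERN` and `strip_comments_regex` are made up, and the pattern matches whole quoted strings first (keeping them) and only then a “#” with everything after it (dropping it). Because it pairs quotes up rather than counting them, it can disagree with the even-count rule on lines with unbalanced or nested quote characters.

```python
import re

# Group 1 captures a complete quoted string; the second alternative is a comment.
PATTERN = re.compile(r'''("[^"]*"|'[^']*')|#.*''')


def strip_comments_regex(text):
    """Remove '#' comments line by line, leaving quoted strings untouched."""
    kept = []
    for line in text.strip().split("\n"):
        if line.startswith("#"):
            continue  # full-line comment
        # Keep matched strings as they are; replace matched comments with nothing
        kept.append(PATTERN.sub(lambda m: m.group(1) or "", line))
    return "\n".join(kept)


print(strip_comments_regex('"#this is not a comment" but #this is'))
```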
