
Using AI to Analyze Open Source Projects Quickly

by ls1911, August 27th, 2024

Too Long; Didn't Read

A.I. can be used to help make sense of complex open-source projects. I used OpenAI to analyze and describe an open source project called GenAIPot. The project is a honeypot simulation tool that emulates services such as SMTP and POP3, enhanced with AI-generated responses.

In my quest to explore how A.I. can lend a hand in cybersecurity and development, I used OpenAI to analyze and describe GenAIPot, an open-source project that takes honeypots to the next level by using A.I. to generate their content.


If you’re curious about how artificial intelligence can help make sense of complex open-source projects – and maybe save you a few headaches along the way – you’re in for a treat. Below, I’ll share what I found and how A.I. helped me understand the nuts and bolts of GenAIPot.


I first asked it to describe the project visually, but the result wasn’t very helpful.


Graphical Analysis of Open Source Project



After some tweaking of my requests, OpenAI provided the representation of the project I wanted.


Instead of asking for a visual presentation, I asked it to illustrate the project’s structure using specific characters, hoping this would better convey my intent.


So my prompt was:

show the files structure with arrows and ---->. |


And it worked!
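For comparison, if you would rather hand the model an exact tree than have it infer one, a short script can print a similar arrow-and-pipe layout locally. This is a minimal sketch, not part of GenAIPot: the function name `arrow_tree` is hypothetical, and the `|---->` style simply mirrors the characters from the prompt.

```python
import os


def arrow_tree(root, prefix=""):
    """Return the tree under `root` as text lines drawn with '|' and '---->'."""
    lines = []
    # Sorted for deterministic output; hidden entries such as .git are skipped
    for name in sorted(os.listdir(root)):
        if name.startswith("."):
            continue
        path = os.path.join(root, name)
        if os.path.isdir(path):
            lines.append(f"{prefix}|----> {name}/")
            # Recurse with a deeper prefix so children line up under their parent
            lines.extend(arrow_tree(path, prefix + "|     "))
        else:
            lines.append(f"{prefix}|----> {name}")
    return lines
```

Running `print("\n".join(arrow_tree(".")))` from a repository root prints the layout, which can then be pasted into the prompt so the model reasons about the real structure instead of reconstructing it.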



After analyzing the project’s directory and file structure, it was time to see what OpenAI could tell me about the code’s entry point.

OpenAI analyzed it as follows:


The genaipot.py file is the main entry point for a cybersecurity project called GenAIPot. 
This project appears to be a honeypot simulation tool that emulates services such as SMTP and POP3, 
enhanced with AI-generated responses. Below are the key components of the project:

	1.	Imports:
	•	The file imports various modules, including smtp_protocol and pop3.pop3_protocol 
        for handling SMTP and POP3 protocols, respectively.
	•	It also includes ai_services to integrate AI-driven responses, possibly leveraging OpenAI’s services.
	•	Functions related to authentication (auth module) and database setup (database module) are also included.
	•	The project uses Twisted, a powerful networking engine, for handling asynchronous networking tasks.
	•	The Halo library is used for displaying a spinner in the terminal, likely for visual feedback during operations.
	•	Other standard imports include argparse for parsing command-line arguments, and logging for logging debug 
        and operational information.

	2.	Command-Line Interface:
	•	The tool supports various command-line arguments, allowing users to configure the honeypot, deploy it using Docker, 
        and start specific honeypot services (SMTP and POP3) or all services at once.
	•	A --debug option enables detailed logging, which is essential for debugging and monitoring purposes.

	3.	Logging and Configuration:
	•	Based on the provided arguments, the file configures the logging level. Debug mode provides detailed output 
        to help with troubleshooting.

	4.	Modular Design:
	•	The project is modular, importing and utilizing different modules for handling various tasks like AI responses, 
        authentication, and protocol handling. This makes the codebase more maintainable and scalable.
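To make the command-line and logging points above concrete, here is a minimal sketch using argparse and logging. The flag names `--smtp`, `--pop3`, and `--all` are hypothetical stand-ins, since only `--debug` is named in the description; the INFO default level is also an assumption, and the Docker deployment option is left out.

```python
import argparse
import logging


def build_parser():
    # Hypothetical flags modelled on the description; GenAIPot's real options may differ
    parser = argparse.ArgumentParser(description="Honeypot launcher (sketch)")
    parser.add_argument("--smtp", action="store_true", help="start the SMTP honeypot")
    parser.add_argument("--pop3", action="store_true", help="start the POP3 honeypot")
    parser.add_argument("--all", action="store_true", help="start every service")
    parser.add_argument("--debug", action="store_true", help="enable detailed logging")
    return parser


def configure(argv=None):
    """Parse arguments, set the log level, and return the services to start."""
    args = build_parser().parse_args(argv)
    # --debug switches from the (assumed) INFO default to DEBUG
    level = logging.DEBUG if args.debug else logging.INFO
    logging.basicConfig(level=level, format="%(asctime)s %(levelname)s %(message)s")
    # --all implies every service; otherwise start only the ones requested
    services = [name for name in ("smtp", "pop3") if args.all or getattr(args, name)]
    logging.getLogger(__name__).debug("Services selected: %s", services)
    return args, level, services
```

Keeping parsing in `configure(argv)` rather than reading `sys.argv` directly makes the launcher easy to exercise with a fixed argument list.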


Okay, cool. Pretty quickly, I was able to get a brief overview of what’s happening inside the application and how it works.


Next, I wanted to explore how it uses A.I., so I looked into a file called ai_services.py.


From the description above, I gathered that this file is responsible for managing the A.I. aspects of GenAIPot.


Here is what ChatGPT had to say about it:


Purpose:

This module provides methods for interacting with OpenAI’s API, 
focusing on generating AI-driven responses that likely enhance the realism of the honeypot. 
It also includes functionality for managing configurations, logging, and saving/loading responses.

Key Components:

	1.	Configuration Management:
	•	  The configparser module is used to read settings from a configuration file (located at etc/config.ini).
          This allows the project to be easily configured for different deployment environments or preferences.

	2.	AIService Class:
	      The AIService class handles the core interactions with OpenAI’s API.
	•	Attributes:
	  	  technology, domain, and segment: These fields are likely configured to adjust the AI’s responses based on 
          the specific context or scenario being emulated by the honeypot.
	•	anonymous_access: 
          This might control whether the honeypot simulates anonymous or authenticated interactions.
	•	debug_mode: 
          A flag to enable or disable verbose logging for debugging purposes.

	3.	Initialization:
	•	  The __init__ method initializes the AIService class, accepting an optional API key and debug mode setting. 
          This setup ensures that the class can operate flexibly depending on whether it is run in a production or 
          development environment.

	4.	Logging:
	•	  The logging configuration is set to ERROR by default to minimize noise, 
          but this can be overridden by enabling debug mode, which is useful during development or troubleshooting.

The AI service module’s role is crucial in generating dynamic and context-aware responses, 
which can make the honeypot appear more realistic and convincing to attackers. 
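The configuration and initialization steps described above can be sketched with Python’s configparser. This is an assumption-heavy illustration rather than GenAIPot’s code: the `[honeypot]` section name, the fallback values, the `config_path` parameter, and the `OPENAI_API_KEY` environment-variable fallback are all hypothetical; only the attribute names and the ERROR/DEBUG logging behavior come from the description.

```python
import configparser
import logging
import os

logger = logging.getLogger("ai_services")


class AIService:
    """Sketch of the configuration and initialization steps described above."""

    def __init__(self, api_key=None, debug_mode=False, config_path="etc/config.ini"):
        config = configparser.ConfigParser()
        config.read(config_path)  # a missing file just leaves the parser empty
        # Assumed layout: a [honeypot] section; the real config.ini may differ
        self.technology = config.get("honeypot", "technology", fallback="generic")
        self.domain = config.get("honeypot", "domain", fallback="example.com")
        self.segment = config.get("honeypot", "segment", fallback="general")
        self.anonymous_access = config.getboolean("honeypot", "anonymous_access", fallback=False)
        # Prefer an explicit key, otherwise fall back to the conventional env var
        self.api_key = api_key or os.environ.get("OPENAI_API_KEY")
        self.debug_mode = debug_mode
        # ERROR by default to keep output quiet; DEBUG when debug mode is on
        logger.setLevel(logging.DEBUG if debug_mode else logging.ERROR)
```

Using `fallback=` keeps the service usable with a partial or missing config file, which matters when the same code runs in development and in a deployed honeypot.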


Okay, cool. The answer provided me with more technical details, but I wanted to dive deeper into the specifics of this file, so I asked for more information:



where is it generate the content dynamically in the code using a.i ?


OpenAI Dynamic Content Generation Explanation:


1. query_responses Method:

•	    This method is responsible for querying OpenAI to generate responses based on a given prompt.
•	    The method takes a prompt and a response_type as parameters. The prompt is what the AI will respond to,
      and response_type indicates the type of response expected (e.g., “email”).
•	The code then makes an API call using openai.ChatCompletion.create, which sends the prompt to OpenAI’s service.


2. Extracting the Response:

•	The response from OpenAI is captured from the API response, specifically from response.choices[0]['message']['content'].

•	The response is then cleaned up and returned as response_text.


3. Saving Responses:

•	After generating the response, the module provides functionality to save it to a file for future reference. This can help optimize performance by avoiding repeated API calls for the same interaction.


Example Code Snippet:

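import logging

import openai  # legacy interface: openai.ChatCompletion exists only in openai<1.0

logger = logging.getLogger(__name__)


# Method of the AIService class (shown on its own for readability)
def query_responses(self, prompt, response_type):
    """
    Query OpenAI for responses based on the provided prompt and response type.

    Args:
        prompt (str): The input prompt for the AI.
        response_type (str): The type of response expected (e.g., "email").

    Returns:
        str: The response text from OpenAI.
    """
    logger.debug(f"Querying OpenAI for {response_type} responses...")

    # Send the prompt as a single user message to the chat model
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}]
    )

    # Take the first choice's message text and trim surrounding whitespace
    response_text = response.choices[0]['message']['content'].strip()

    # Persist the raw text via a helper defined elsewhere in the class
    self._save_raw_response(response_text, response_type)

    return response_text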
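One caveat about the excerpt above: `openai.ChatCompletion.create` belongs to the pre-1.0 `openai` Python package and was removed in version 1.0, so it fails on a current install. A minimal sketch of the same flow against the 1.x client might look like this. Passing the client in as a parameter is a choice made for this sketch (it lets the function be exercised without a network call), not how GenAIPot is structured; the model name is simply carried over from the excerpt.

```python
import logging

logger = logging.getLogger(__name__)


def query_responses(client, prompt, response_type, model="gpt-3.5-turbo"):
    """Send `prompt` to a chat model via an openai>=1.0 style client and return the text."""
    logger.debug("Querying OpenAI for %s responses...", response_type)
    # openai>=1.0 exposes chat completions as client.chat.completions.create
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # The 1.x SDK returns typed objects, so use attribute access instead of dict keys
    return response.choices[0].message.content.strip()
```

With the real SDK installed, `from openai import OpenAI` followed by `query_responses(OpenAI(), prompt, "email")` would make the call, with the client reading `OPENAI_API_KEY` from the environment.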



Summary:


•	The ai_services.py file dynamically generates content by querying OpenAI’s API.

•	The method query_responses is where the interaction with the AI happens, producing contextually relevant responses based on the prompts provided by the honeypot’s protocols.

•	These responses are then saved and can be used to respond to attackers in a way that seems authentic and believable.


This dynamic content generation is what makes GenAIPot particularly powerful, as it allows the honeypot to adapt and generate realistic responses on the fly, increasing its effectiveness in deceiving attackers.
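The saving step mentioned above is what lets the honeypot avoid repeated API calls for the same interaction. GenAIPot’s own save/load helpers aren’t shown here, so the following is a hypothetical sketch of the idea: a JSON file keyed by a hash of the prompt and response type. The class name `ResponseCache`, the file name, and the `get_or_create` method are all illustrative.

```python
import hashlib
import json
from pathlib import Path


class ResponseCache:
    """File-backed cache so an identical prompt does not trigger a second API call (sketch)."""

    def __init__(self, path="responses_cache.json"):
        self.path = Path(path)
        # Load previously saved responses if the file already exists
        self.data = json.loads(self.path.read_text()) if self.path.exists() else {}

    @staticmethod
    def _key(prompt, response_type):
        # Hash type + prompt so keys stay short and safe to store as JSON keys
        return hashlib.sha256(f"{response_type}\n{prompt}".encode("utf-8")).hexdigest()

    def get_or_create(self, prompt, response_type, generate):
        key = self._key(prompt, response_type)
        if key not in self.data:
            # Only call the (expensive) generator on a cache miss, then persist to disk
            self.data[key] = generate(prompt, response_type)
            self.path.write_text(json.dumps(self.data, indent=2))
        return self.data[key]
```

Here `generate` would be whatever function actually calls the model, so misses go to the API and hits come from disk. One trade-off: replaying cached replies makes repeated sessions identical, which a careful attacker could notice.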



The response above shows that ChatGPT is capable of analyzing open-source projects from GitHub and of providing both technical and non-technical details quickly and fairly efficiently.

GenAIPot doesn’t stop at generative A.I.; it also integrates Machine Learning (ML) capabilities for predictions, anomaly detection, and more.


So, let’s dive into the code and explore what it’s really doing, without looking directly at the code!


OpenAI’s description of the analytics.py file:

The analytics.py file in the GenAIPot project is responsible for analyzing and predicting patterns in the data captured by
the honeypot, as well as detecting anomalies. Here’s a detailed breakdown of how it works:

Key Components and Functions:

	1.	Prediction with Prophet (perform_prediction):
	•	This function uses the Prophet library, a powerful tool developed by Facebook for time-series forecasting, to predict patterns in the data.
	•	Specifically, it predicts the length of commands over time, helping to identify trends or unusual patterns in the commands attackers issue to the honeypot.
	How it works:
	•	The function takes a DataFrame (df) as input, which contains timestamps and commands.
	•	It calculates the length of each command and then fits a Prophet model to this data.
	•	The model forecasts future command lengths, saving the results to a CSV file (future_forecast.csv), making it easy to analyze and visualize later.


Example Code:

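import pandas as pd
from prophet import Prophet


def perform_prediction(df):
    df = df.copy()  # avoid mutating the caller's DataFrame
    # Prophet expects a datetime column 'ds' and a numeric target 'y'
    df['y'] = df['command'].str.len()
    df['ds'] = pd.to_datetime(df['timestamp'])
    model = Prophet()
    model.fit(df[['ds', 'y']])
    # Extend the timeline 30 one-second steps past the last observation
    future = model.make_future_dataframe(periods=30, freq='s')
    forecast = model.predict(future)
    forecast.to_csv("future_forecast.csv", index=False)
    print("Prediction complete. Results saved to future_forecast.csv")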


	2.	Anomaly Detection (detect_anomalies):
	•	This function identifies anomalies in two key areas: command lengths and IP address connection frequencies.
	•	By detecting unusual patterns, the function helps flag potentially malicious activity that deviates from normal behavior.
	How it works:
	•	The function creates separate Prophet models for command lengths and IP address connection frequencies.
	•	It uses these models to detect and flag any anomalies, saving the results to CSV files (command_anomalies.csv, ip_anomalies.csv), which can be used for further investigation.


Example Code:

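import pandas as pd
from prophet import Prophet


def _fit_and_flag(series):
    # Fit Prophet and flag points outside its uncertainty interval as anomalies
    series = series.sort_values("ds").reset_index(drop=True)  # predict() returns rows sorted by ds
    model = Prophet()
    model.fit(series)
    forecast = model.predict(series[["ds"]])
    lower, upper = forecast["yhat_lower"].to_numpy(), forecast["yhat_upper"].to_numpy()
    return series[(series["y"] < lower) | (series["y"] > upper)], forecast


def detect_anomalies(df):
    ts = pd.to_datetime(df["timestamp"])
    df_command = pd.DataFrame({"ds": ts, "y": df["command"].str.len()})
    # Connection frequency: rows per minute (assumes one row per connection and an 'ip' column)
    df_ip = df.groupby(ts.dt.floor("min"))["ip"].count().rename_axis("ds").reset_index(name="y")
    for name, series in (("command", df_command), ("ip", df_ip)):
        anomalies, forecast = _fit_and_flag(series)
        anomalies.to_csv(f"{name}_anomalies.csv", index=False)
        forecast.to_csv(f"{name}_forecast.csv", index=False)
    print("Anomaly detection complete. Results saved to command_anomalies.csv, "
          "command_forecast.csv, ip_anomalies.csv, and ip_forecast.csv")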


Summary:

	•	Dynamic Content Generation: While the analytics.py file does not generate content in the traditional sense, 
        it dynamically analyzes data and predicts future behavior based on past interactions with the honeypot.
	•	Prediction: The perform_prediction function uses time-series forecasting to predict command patterns, 
        which can be used to anticipate attacker behavior.
	•	Anomaly Detection: The detect_anomalies function identifies irregularities in command patterns and 
        IP connection frequencies, flagging potential security threats.
	•	Output: Both functions save their results in CSV files, allowing for further analysis or visualization.

In essence, the analytics.py file equips the GenAIPot project with the ability to detect and respond to suspicious 
behavior, making it a powerful tool for identifying and understanding potential threats. 




The conclusion?


In a very short time, we can inspect different projects and understand how they are structured and how they work technically, without having to dig into the code and spend a lot of time.


Does it save time? Yep.

Is it easy to use? Yep.

Is it a friendly process? Not so much; A.I. could be smarter.


One last thing: I asked OpenAI to create an image that describes the project.


OpenAI: The traditional Chinese fortress symbolizes the honeypot, and the subtle, futuristic elements are blended with flowing ink and watercolor strokes, giving it a harmonious mix of classical art and modern cybersecurity themes.


If you’ve read this far and haven’t tried GenAIPot yet, give it a try :)