## The Problem: Research Takes Too Long

As a researcher and developer, I found myself spending hours manually searching academic databases, reading abstracts, and trying to synthesize findings across multiple sources. For my work on **circular economy and battery recycling**, I needed to:

- Search 5 different academic databases
- Read through dozens of papers
- Extract key findings and methodologies
- Identify research gaps and trends
- Synthesize everything into a coherent report

This process took **4-6 hours per week**. I knew there had to be a better way.

## The Solution: Full-Stack Research Automation

I built an n8n workflow that does all of this automatically in under 5 minutes. Here's what it does:

- ✅ Queries 5 academic APIs simultaneously
- ✅ Uses AI to extract insights from each paper
- ✅ Scores papers for relevance and quality
- ✅ Stores everything in Google Sheets
- ✅ Generates a comprehensive synthesis report
- ✅ Emails me a beautiful HTML report

**Result:** I went from 4-6 hours of manual work to **one click and 5 minutes of waiting**.

## Tech Stack

Here's what powers this automation:

| Tool | Purpose |
| --- | --- |
| n8n | Workflow automation platform (open-source) |
| Groq AI | Llama 3.3 70B model for content extraction & synthesis |
| Semantic Scholar | Computer science & general academic papers |
| OpenAlex | 200M+ open-access research papers |
| Crossref | DOI registry & journal metadata |
| arXiv | Preprints in physics, math, CS |
| PubMed | Biomedical & life sciences |
| Google Sheets | Data persistence |
| Gmail | Report delivery |
## Architecture Overview

Here's the complete workflow in 7 stages:

```
Manual Trigger → Configuration → 5 Parallel API Calls → Normalize & Deduplicate
→ AI Extraction → Score & Filter → Google Sheets + AI Synthesis → Email Report
```

- **Total nodes:** 23
- **Execution time:** 2-5 minutes
- **Papers processed:** up to 50 (10 per API)

## Stage 1: Configuration

Instead of hardcoding search parameters in every node, I created a central configuration node:

```json
{
  "keywords": "circular economy battery recycling remanufacturing",
  "min_year": "2020",
  "min_citations": "2",
  "max_results": "10",
  "relevance_threshold": "15"
}
```

This makes the workflow easy to customize. Want to research a different topic? Just change the `keywords` field.

## Stage 2: Parallel API Collection

The workflow queries all 5 APIs **simultaneously** using n8n's parallel execution. Each API call is preceded by a "Rate Limit Delay" node to avoid 429 errors.

### Example: Semantic Scholar API

Here's the HTTP Request node configuration:

```json
{
  "url": "https://api.semanticscholar.org/graph/v1/paper/search",
  "queryParameters": {
    "query": "{{ $json.keywords }}",
    "limit": "{{ $json.max_results }}",
    "fields": "title,abstract,year,authors,citationCount,venue,externalIds,url,openAccessPdf",
    "year": "{{ $json.min_year }}-"
  }
}
```

**Key insight:** All API nodes have `onError: "continueRegularOutput"` set. This means that if one API fails, the workflow continues with whatever data it successfully retrieved.

## Stage 3: Normalization & Deduplication

Each API returns data in a different format. This JavaScript code normalizes everything into a standard structure:

```javascript
// Simplified normalization logic
const apiResponses = $input.all(); // merged items from all five API branches
const results = [];
const seenDOIs = new Set();
const seenTitles = new Set();

for (const item of apiResponses) {
  // Skip failed or rate-limited API responses
  if (item.json.error || item.json.status === 429) {
    console.log('Skipping failed API');
    continue;
  }

  // Normalize Semantic Scholar
  if (item.json.data && Array.isArray(item.json.data)) {
    for (const paper of item.json.data) {
      const normalized = {
        title: paper.title || 'Unknown',
        abstract: paper.abstract || 'No abstract',
        year: paper.year || null,
        authors: (paper.authors || []).map(a => a.name).join(', '),
        citations: paper.citationCount || 0,
        doi: paper.externalIds?.DOI || null,
        source: 'Semantic Scholar'
      };

      // Deduplicate by DOI, falling back to title for papers without one
      const titleKey = normalized.title.toLowerCase();
      if (normalized.doi ? !seenDOIs.has(normalized.doi) : !seenTitles.has(titleKey)) {
        if (normalized.doi) seenDOIs.add(normalized.doi);
        seenTitles.add(titleKey);
        results.push(normalized);
      }
    }
  }

  // Similar logic for OpenAlex, Crossref, arXiv, PubMed...
}

return results.map(paper => ({ json: paper }));
```
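The code above shows only the Semantic Scholar branch. As an illustration of one of the elided branches, here's roughly what the OpenAlex mapping could look like: a sketch that slots into the loop above, not the workflow's exact code (field names follow the OpenAlex works schema). The main wrinkle is that OpenAlex returns abstracts as an inverted index, which has to be reassembled into plain text:

```javascript
// Sketch of the OpenAlex branch (not the workflow's exact code).
// OpenAlex returns results under `results` and stores the abstract as an
// inverted index: { word: [positions...] }.
if (item.json.results && Array.isArray(item.json.results)) {
  for (const work of item.json.results) {
    // Rebuild the abstract by placing each word at its recorded positions
    let abstract = 'No abstract';
    if (work.abstract_inverted_index) {
      const words = [];
      for (const [word, positions] of Object.entries(work.abstract_inverted_index)) {
        for (const pos of positions) words[pos] = word;
      }
      abstract = words.join(' ');
    }

    const paper = {
      title: work.display_name || 'Unknown',
      abstract,
      year: work.publication_year || null,
      authors: (work.authorships || []).map(a => a.author.display_name).join(', '),
      citations: work.cited_by_count || 0,
      // OpenAlex DOIs come as full URLs; strip the prefix so dedup matches
      doi: work.doi ? work.doi.replace('https://doi.org/', '') : null,
      source: 'OpenAlex'
    };

    const titleKey = paper.title.toLowerCase();
    if (paper.doi ? !seenDOIs.has(paper.doi) : !seenTitles.has(titleKey)) {
      if (paper.doi) seenDOIs.add(paper.doi);
      seenTitles.add(titleKey);
      results.push(paper);
    }
  }
}
```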
**Why this matters:** Without normalization, the next stages wouldn't know how to read the data. Without deduplication, you'd get the same paper multiple times.

## Stage 4: AI-Powered Content Extraction

Now comes the magic. For each paper, I use **Groq's Llama 3.3 70B** model to extract structured insights.

### The Prompt

```javascript
const prompt = `You are an expert research analyst. Analyze this paper:

Title: ${paper.title}
Abstract: ${paper.abstract}
Year: ${paper.year}
Authors: ${paper.authors}

Extract:
1. research_question: Main research question
2. methodology: Research methods used
3. key_findings: Main findings (2-3 sentences)
4. conclusion: Key conclusions
5. themes: Array of themes (e.g., ["circular economy", "battery recycling"])

Return ONLY valid JSON with these exact keys.`;
```

### The API Call

```javascript
// Groq API request body
{
  "model": "llama-3.3-70b-versatile",
  "messages": [
    {
      "role": "system",
      "content": "You are a research extraction assistant. Always return valid JSON only."
    },
    { "role": "user", "content": prompt }
  ],
  "temperature": 0.3, // low temperature for factual extraction
  "max_tokens": 2000
}
```

**Optimization:** I use n8n's batching feature to process 5 papers at once, reducing API calls by 80%.
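Even with "Return ONLY valid JSON" in the system message, models occasionally wrap their output in markdown fences or add a stray preamble. A small defensive parse keeps one bad response from breaking the whole run. This is a minimal sketch, not the workflow's exact code; the fallback object simply mirrors the extraction keys from the prompt:

```javascript
// Defensively parse the model's output (sketch, not the workflow's exact code)
function parseExtraction(raw) {
  const cleaned = raw
    .replace(/^```(?:json)?\s*/i, '') // drop a leading ```json fence
    .replace(/```\s*$/, '')           // drop a trailing fence
    .trim();

  try {
    return JSON.parse(cleaned);
  } catch (err) {
    // Fallback keeps the pipeline moving; shape mirrors the prompt's keys
    return {
      research_question: null,
      methodology: null,
      key_findings: null,
      conclusion: null,
      themes: [],
      parse_error: err.message
    };
  }
}
```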
## Stage 5: Scoring & Filtering

Not all papers are equally relevant, so I built a custom scoring algorithm.

### Relevance Score (max 70 points)

```javascript
const keywords = ['circular', 'economy', 'recycling', 'remanufacturing',
                  'sustainability', 'waste', 'battery', 'lithium'];

// title, abstract, and themes are assumed to be lower-cased upstream
let score = 0;
for (const keyword of keywords) {
  if (title.includes(keyword)) score += 6;
  if (abstract.includes(keyword)) score += 4;
  if (themes.includes(keyword)) score += 3;
}

// Cap at 70 so relevance and quality keep their 70/30 weighting
const relevanceScore = Math.min(score, 70);
```

### Quality Score (max 30 points)

```javascript
const citationScore = Math.min(citations / 3, 20);                // max 20 points
const recencyScore = Math.max(0, 10 - (currentYear - paperYear)); // max 10 points
const qualityScore = citationScore + recencyScore;
```

### Total Score

```javascript
const totalScore = relevanceScore + qualityScore; // max 100 points
```

Papers below the threshold (default: 15) are filtered out.

## Stage 6: Storage & Synthesis

Filtered papers are saved to **Google Sheets** for future reference. Then all papers are aggregated into one massive prompt for AI synthesis:

```javascript
const prompt = `You are a research synthesis expert analyzing ${papersCount} papers.

[Paper summaries with all extracted data...]

Generate a comprehensive synthesis report with:
- Executive Summary
- Key Themes Identified
- Emerging Trends
- Research Gaps
- Methodological Approaches
- Key Findings Summary
- Future Research Directions`;
```

Groq's AI generates a **multi-section markdown report** analyzing all papers together.
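The bracketed line in the synthesis prompt stands in for the per-paper summaries. Here's a sketch of how that block might be assembled in a Code node; the field names follow the Stage 4 extraction schema, but the exact formatting in the workflow may differ:

```javascript
// Build one text block per paper for the synthesis prompt (sketch;
// field names follow the Stage 4 extraction schema)
const papers = $input.all().map((item) => item.json);

const summaries = papers
  .map((p, i) => [
    `--- Paper ${i + 1}: ${p.title} (${p.year}) ---`,
    `Research question: ${p.research_question}`,
    `Methodology: ${p.methodology}`,
    `Key findings: ${p.key_findings}`,
    `Themes: ${(p.themes || []).join(', ')}`
  ].join('\n'))
  .join('\n\n');

return [{ json: { papersCount: papers.length, summaries } }];
```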
## Stage 7: Email Delivery

The final node converts the markdown synthesis into a beautiful HTML email:

```javascript
// Markdown to HTML conversion
let html = synthesisText
  .replace(/## (.*?)\n/g, '<h2 style="color: #2c3e50;">$1</h2>')
  .replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>')
  .replace(/^- (.*?)$/gm, '<li>$1</li>');
```

The email includes:

- Paper count and date
- Data source summary
- Full synthesis report with styling
- Link to the Google Sheet
- Timestamp
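One gap in the quick regex pass above: consecutive `<li>` elements are never wrapped in a `<ul>`, which some mail clients render oddly. A one-line follow-up pass fixes it; this is a sketch, not code from the workflow:

```javascript
// Wrap each run of consecutive <li> lines in a <ul> (sketch; follow-up to
// the quick regex conversion above)
html = html.replace(/(?:<li>.*<\/li>\n?)+/g, (run) => `<ul>${run}</ul>`);
```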
## Results & Impact

### Before Automation

- ⏱️ **4-6 hours per week** manually searching and reading
- 📊 **10-15 papers** reviewed per session
- 📝 **Inconsistent** note-taking and synthesis

### After Automation

- ⏱️ **5 minutes** (one click + wait time)
- 📊 **50 papers** processed automatically
- 📝 **Structured** AI-generated synthesis every time
- 📧 **Professional reports** delivered to my inbox

**Time saved:** ~20 hours per month.

## Key Learnings

1. **Error handling is critical.** Setting `onError: "continueRegularOutput"` on all API nodes means one failed API doesn't crash the entire workflow.
2. **Batching saves API costs.** Processing 5 papers at once reduced my Groq API calls by 80%.
3. **Prompt engineering matters.** Specifying "Return ONLY valid JSON" in the system message dramatically improved parsing reliability.
4. **Centralised configuration is essential.** One configuration node makes the workflow easy to customize and maintain.
5. **Parallel execution is powerful.** Querying 5 APIs simultaneously reduced execution time from 15+ minutes to under 5 minutes.

## How to Use This Workflow

### Prerequisites

- n8n instance (self-hosted or cloud)
- Groq API key (free tier available)
- Google account (for Sheets & Gmail)

### Setup Steps

1. **Import the workflow** into n8n.
2. **Configure credentials:** Groq API (OpenAI-compatible endpoint), Google Sheets OAuth2, and Gmail OAuth2.
3. **Update the configuration:** change `keywords` to your research topic, adjust `relevance_threshold` for filtering, and update the email recipient.
4. **Run the workflow** and wait for your report!

## Code Repository

The complete workflow JSON and setup guide are available on GitHub:

👉 https://github.com/chidoziemanagwu/Research-Automation-Workflow

Includes:

- `workflow.json` - the complete n8n workflow
- `README.md` - overview and features
- `SETUP.md` - step-by-step configuration guide

## Future Enhancements

I'm planning to add:

- 📄 **PDF full-text analysis** for deeper insights
- 📊 **Citation network visualisation**
- ⏰ **Scheduled execution** (daily/weekly)

## Conclusion

Building this automation taught me that *the best code is the code you don't have to write*. By combining n8n's visual workflow builder, Groq's powerful AI, and multiple academic APIs, I created a system that does in 5 minutes what used to take me hours.

If you're a researcher, student, or anyone who regularly reviews academic literature, I encourage you to build something similar. The tools are accessible, the APIs are (mostly) free, and the time savings are massive.

## About the Author

Chidozie Managwu is a contributor to FreeCodeCamp, a software engineer, and an automation expert focused on using AI and workflow tools to solve real-world problems.