Naman Bhoj

What I Worked On

ForYourResearch

2023-2025
React, Tailwind CSS, Django, PostgreSQL, AWS (EC2, Lambda, CloudWatch), LangChain, Pinecone, OpenAI, Docker, GitHub Actions, Cursor, Scrapy, Airflow

Customer: Researchers conducting systematic literature reviews (35 ICPs at UofC, 50 stakeholders engaged).
What we did: Automated web app using scraping and text analysis for data collection.
Status Quo: Traditional manual approach (12-18 months, $30,000-50,000) vs. automated approach (1 week, hundreds of dollars).
Outcomes: 98% timeline reduction and 99% cost reduction.
Context: Evidence-based research requiring efficient literature review processes. Conducted customer discovery to identify beachhead market and develop quantified value proposition.
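
As a rough check, the outcome percentages can be recomputed from the Status Quo ranges. The sketch below assumes 12 and 18 months are about 52 and 78 weeks, and uses $300 as a stand-in for "hundreds of dollars"; both are illustrative assumptions, not project figures.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Assumption: 12 months ~ 52 weeks, 18 months ~ 78 weeks; automated run ~ 1 week.
timeline_low = percent_reduction(before=52, after=1)
timeline_high = percent_reduction(before=78, after=1)

# Assumption: "hundreds of dollars" is taken as $300 purely for illustration.
cost_low = percent_reduction(before=30_000, after=300)
cost_high = percent_reduction(before=50_000, after=300)

print(f"Timeline reduction: {timeline_low:.1f}% to {timeline_high:.1f}%")
print(f"Cost reduction: {cost_low:.1f}% to {cost_high:.1f}%")
```

Under these assumptions the results land close to the 98% and 99% figures quoted above; a different dollar amount for the automated approach would shift the cost figure slightly.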

Vaccine Rules Intelligence Pipeline

2025
FastAPI, AWS Lambda, SQS, SNS, SES, S3, EventBridge, Glue, Athena, CDK, Claude API, Retool

Customer: Teams or individuals needing automated AI-generated recommendations delivered daily across multiple regions.
What we did: Built a serverless, event-driven pipeline that processes AI prompts through Claude, fans results out to email and a queryable data lake, and runs fully automated on a daily cron — no manual intervention required.
Status Quo: Manually running prompts, copy-pasting outputs, emailing results, and managing spreadsheets vs. a fully automated pipeline that ingests, processes, stores, and delivers AI results across any number of countries daily.
Outcomes: 100% elimination of manual prompt-to-delivery workflow. Horizontally scalable to any number of jobs with zero additional infrastructure changes.
Context: Event-driven architecture with decoupled producers and consumers — FastAPI (entry point with rate limiting) queues jobs to SQS instantly; Lambda workers pull jobs and call Claude API; SNS fans out results in parallel to an email queue (Lambda → SES) and an S3 queue (Lambda → Bronze JSON layer); EventBridge triggers a scheduler Lambda daily to auto-submit one job per country and kicks off a Glue Workflow that transforms Bronze JSON → Silver Parquet; Athena queries the Silver layer; Retool surfaces results in a dashboard. All infrastructure provisioned as code via AWS CDK.
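
The flow described above can be sketched in memory, with no AWS calls at all. The queue names mirror the ones in this write-up; the country list, the prompt text, the `fake_claude` placeholder, and the `{country}.json` file name are hypothetical and only illustrate the shape of the pipeline.

```python
from collections import deque
from datetime import date

# In-memory stand-ins for the SQS queues; nothing here talks to AWS.
job_queue, email_queue, s3_queue = deque(), deque(), deque()

COUNTRIES = ["CA", "IN", "US"]  # hypothetical configuration


def scheduler_worker(run_date: date) -> None:
    """Queue one job per configured country (what the daily trigger does)."""
    for country in COUNTRIES:
        job_queue.append({
            "country": country,
            "date": run_date.isoformat(),
            "prompt": f"Summarize vaccine rules for {country}",  # hypothetical
        })


def fake_claude(prompt: str) -> str:
    """Placeholder for the Claude API call; returns a canned string."""
    return f"[recommendation for: {prompt}]"


def publish(result: dict) -> None:
    """SNS-style fan-out: every subscriber queue receives its own copy."""
    for subscriber in (email_queue, s3_queue):
        subscriber.append(dict(result))


def claude_worker() -> None:
    """Drain job_queue, call the model, and publish each result."""
    while job_queue:
        job = job_queue.popleft()
        publish({**job, "result": fake_claude(job["prompt"])})


def bronze_key(result: dict) -> str:
    """Hypothetical object key under recommendations/{date}/ for one result."""
    return f"recommendations/{result['date']}/{result['country']}.json"


scheduler_worker(date(2025, 1, 1))
claude_worker()
print(len(email_queue), len(s3_queue))
print([bronze_key(r) for r in s3_queue])
```

Because the producer only appends to a queue and each consumer only reads from its own queue, adding another country or another subscriber does not change the other stages, which is the decoupling the architecture relies on.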

Services (where and why each was used):
FastAPI: Entry point; exposes POST /submit, validates requests, enforces rate limiting (10 req/min per IP via slowapi), pushes jobs to SQS, and returns 202 Accepted immediately.
SQS (job_queue): Decouples the API from processing; holds jobs until a Lambda worker picks them up. FIFO ensures order, and a DLQ captures failures after 3 attempts.
Lambda (claude_worker): Pulls jobs from job_queue, calls the Claude API with the prompt, and publishes the structured result to SNS.
Claude API: The AI brain; takes the prompt and returns the generated recommendation/result.
SNS: Fan-out hub; receives the Claude result and delivers it to the email queue and the S3 queue in parallel.
SQS (email_queue): Buffers results destined for email delivery; decouples SNS fan-out from the email worker.
Lambda (email_worker): Triggered by email_queue; formats the result and sends it via SES.
SES: Delivers the final AI-generated result to the recipient's inbox.
SQS (s3_queue): Buffers results destined for storage; decouples SNS fan-out from the S3 writer.
Lambda (s3_worker): Triggered by s3_queue; writes the result as a JSON file to the S3 Bronze layer under recommendations/{date}/.
EventBridge: Serves two roles: (1) triggers the scheduler_worker Lambda daily at midnight UTC to auto-submit one job per country; (2) triggers the Glue Workflow daily to run the ETL.
Lambda (scheduler_worker): Loops through the configured countries and pushes one job per country into job_queue; fully automated, no human trigger needed.
S3: Three-zone storage: Bronze (raw JSON results), Silver (Parquet after ETL), and Athena query results.
Glue Crawler (Bronze): Scans the Bronze S3 prefix, infers the schema, and registers the table in the Data Catalog.
Glue ETL Job: Transforms Bronze JSON into Silver Parquet for efficient querying.
Glue Crawler (Silver): Updates the Data Catalog after the ETL writes new Silver partitions.
Glue Data Catalog: Central metadata store; holds schema, location, and partitions for both Bronze and Silver tables.
Athena: SQL engine over S3; queries the Silver Parquet layer, with results written back to S3.
Retool: User-facing dashboard; connects to Athena and lets users query, filter, and visualize AI results without writing SQL.
CDK: Infrastructure as code; provisions every resource above with a single cdk deploy and outputs the queue URLs, SNS ARN, and S3 bucket name.
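
The rate-limit rule on the entry point (10 requests per minute per IP) can be illustrated with a sliding-window counter. The project itself uses slowapi; the class below is a standard-library stand-in with hypothetical names, meant only to show the idea, and is not slowapi's implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int = 10, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # an API would typically answer 429 here
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# Eleven requests from one (documentation-range) IP, one per second.
results = [limiter.allow("203.0.113.7", now=float(t)) for t in range(11)]
after_window = limiter.allow("203.0.113.7", now=60.0)
print(results.count(True), results.count(False), after_window)
```

Timestamps are passed in explicitly so the behavior is deterministic; a real deployment would read the clock and would need shared state if more than one API instance is running.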

Pawitraa

2019-2021
React, Django, PostgreSQL, DigitalOcean, Tailwind CSS, Docker

Customer: Students in Himalayan regions of India lacking access to quality education.
What we did: Built web platform delivering evidence-based educational content in Maths, Life Sciences, and Computer Science.
Status Quo: Limited access to quality education resources in remote regions.
Outcomes: Bridged accessibility gap through scalable digital platform.
Context: Non-profit tech initiative addressing educational inequity in underserved communities.

Real-Time Digital Twin Creation Using LiDAR and iPad

2023-2024
Unity, LangChain, OpenAI, C#, Apple LiDAR, ARKit, Gaussian Splatting, Docker, GitHub Actions

Customer: Construction teams, facility managers, and AR collaborators who need spatial understanding; built for customer support workers at TELUS Communications.
What we did: Built real-time pipeline capturing spatial data via iPad LiDAR and rendering as 3D digital twins using Gaussian Splatting.
Status Quo: Manual site documentation and remote collaboration limited by 2D representations.
Outcomes: Real-time spatial capture enabling immersive digital twins for analysis and collaboration.
Context: Applications in smart construction, facility management, and distributed AR collaboration.

End to End Remote Support Solution with LLM Powered Avatar

2024-2025
Unity, Meta Quest, LLM, Speech-to-Text, ARKit, GitHub Actions, Docker

Customer: Remote support teams and field technicians at TELUS Communications who need hands-free, context-aware assistance.
What we did: Built end-to-end pipeline scanning physical environments with iPad LiDAR, reconstructing as digital twins, streaming to Meta Quest with LLM-powered voice interactions.
Status Quo: Traditional remote support lacks spatial context and requires manual reference materials.
Outcomes: Hands-free, context-aware VR experiences enabling remote experts to assist with full environmental understanding.
Context: VR-based remote support for field operations requiring spatial reasoning and collaborative problem-solving.

Doc to HTML

2025
Next.js, FastAPI, OpenAI, Docker, TypeScript

Customer: Organizations in mission-critical industries (Healthcare, Defense, Law) managing internal documentation.
What we did: Built rapid prototype converting internal docs to structured HTML guides using AI.
Status Quo: Manual documentation conversion is time-consuming and error-prone.
Outcomes: Prototype developed within 6 hours; streamlines multi-guide conversion to HTML.
Context: Early-stage prototype; future improvements include semantic search for document clustering and comprehensive guide generation.

Time-Series Energy Consumption Prediction for Smart Homes

2022
Python, TensorFlow, Keras, PyTorch, scikit-learn, Docker, Jupyter Notebook, AWS SageMaker

Customer: Utility companies and residential homeowners in smart home ecosystems.
What we did: Developed advanced time-series forecasting models for energy consumption prediction.
Status Quo: Traditional forecasting methods lack accuracy for demand-response optimization.
Outcomes: Improved energy usage predictions enabling optimized power generation, smarter demand-response systems, and reduced carbon footprints.
Context: Data integration for smart home energy management, contributing to sustainability and grid efficiency.

LSTM Powered Identification of Clickbait Content

2021
Python, TensorFlow, Keras, LSTM, Random Forest, NLP, Docker, Jupyter Notebook, AWS SageMaker

Customer: Online content consumers and content platforms combating misinformation.
What we did: Built NLP-based detection system using LSTM neural networks for clickbait identification.
Status Quo: Manual content moderation insufficient for scale; poor-quality headlines erode trust.
Outcomes: LSTM model achieved 95.03% accuracy, outperforming Random Forest (93.89%) and Naive Bayes (93.32%).
Context: Research addressing information quality and user trust in digital media.

AI-Powered, Low-Latency Intrusion Detection for Power Systems

2021
Python, scikit-learn, Random Forest, Gradient Boosting, SVM, AWS SageMaker

Customer: Power grid operators and critical infrastructure protecting against cyber threats.
What we did: Developed machine learning-based intrusion detection system using feature selection and ensemble methods.
Status Quo: Traditional rule-based systems miss novel attack patterns and have high false-positive rates.
Outcomes: Achieved 11.874% accuracy improvement using Random Forest feature selection with SVM, requiring only 30 features.
Context: Critical infrastructure cybersecurity and data integration, enhancing power system resilience.

Effective Identification of Spam Job Postings Using Employer-Defined Linguistic Features

2022 · 2022 1st International Conference on AI in Cybersecurity (ICAIC)
Python, NLP, scikit-learn, TF-IDF, Feature Engineering, Pandas, NumPy

Customer: Job seekers and recruitment platforms combating fraudulent listings.
What we did: Designed a linguistic feature extraction pipeline from employer-authored text to classify spam job postings.
Status Quo: Manual review of job posts at scale is infeasible; fraudulent listings erode platform trust.
Outcomes: Effective classification using domain-specific linguistic cues derived from employer language patterns.
Context: NLP and feature engineering research at the intersection of labor market integrity and text classification.

Feature Selection and Scaling for Random Forest Powered Malware Detection System

2021 · 2021 10th IEEE International Conference on Communication Systems and Network Technologies
Python, scikit-learn, Random Forest, Feature Selection, SMOTE, Pandas, NumPy

Customer: Cybersecurity teams and endpoint protection platforms.
What we did: Evaluated feature selection and scaling strategies to optimize a Random Forest malware detection pipeline.
Status Quo: High-dimensional malware feature spaces degrade classifier performance and increase inference cost.
Outcomes: Targeted feature selection and proper scaling significantly improve detection accuracy and reduce model complexity.
Context: Security-focused data science research establishing best practices for ML-based malware classification.

Improved Identification of Negative Tweets Related to Covid-19 Vaccination by Mitigating Class Imbalance

2021 · 2021 13th International Conference on Computational Intelligence and Communication Networks
Python, NLP, LSTM, SMOTE, scikit-learn, TensorFlow

Customer: Public health agencies and misinformation researchers monitoring vaccine sentiment.
What we did: Built a sentiment classification pipeline to identify negative vaccine-related tweets, with techniques to address severe class imbalance.
Status Quo: Negative vaccine content is rare but high-impact; class imbalance causes most models to underperform on the minority class.
Outcomes: Improved recall and F1 on negative tweet detection by applying SMOTE and re-weighting strategies.
Context: Applied NLP research supporting real-time public health surveillance during the Covid-19 pandemic.
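
The summary above does not say which re-weighting scheme was used. As one common possibility, the sketch below computes "balanced" class weights, n_samples / (n_classes * class_count), the same heuristic scikit-learn applies for class_weight="balanced". The label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels: list[str]) -> dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 90 non-negative tweets, 10 negative ones.
labels = ["other"] * 90 + ["negative"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

The rare class receives the larger weight, so each misclassified negative tweet contributes more to the training loss than a misclassified majority example.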

Naive and Neighbour Approach for Phishing Detection

2021 · 2021 10th IEEE International Conference on Communication Systems and Network Technologies
Python, scikit-learn, Naive Bayes, KNN, URL Feature Extraction, Pandas

Customer: End users and email/browser security platforms targeted by phishing attacks.
What we did: Built and benchmarked Naive Bayes and KNN classifiers on extracted URL and page-level features for phishing site detection.
Status Quo: Rule-based blocklists lag behind rapidly evolving phishing campaigns.
Outcomes: Demonstrated lightweight ML models as viable real-time phishing detectors with competitive accuracy and low inference overhead.
Context: Security data science research evaluating interpretable, low-latency classifiers suitable for edge deployment.
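
To give a sense of what URL-level features can look like, here is a minimal sketch using only the standard library. The specific features are illustrative assumptions; the summary above does not list the ones actually used in the paper.

```python
import re
from urllib.parse import urlparse


def url_features(url: str) -> dict[str, int]:
    """Extract a few simple lexical features from a URL (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ipv4": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        "uses_https": int(parsed.scheme == "https"),
    }


print(url_features("http://192.168.0.1/login"))
print(url_features("https://example.com/account"))
```

Each URL becomes a small numeric vector, which is the kind of input that Naive Bayes and KNN classifiers can consume cheaply at inference time.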

Comparative Analysis of Feature Selection Techniques for Malicious Website Detection in SMOTE Balanced Data

2021 · RS Open Journal on Innovative Communication Technologies, Vol. 2(3)
Python, scikit-learn, SMOTE, Random Forest, Decision Tree, Feature Selection, Pandas

Customer: Browser security vendors and enterprise network defenders.
What we did: Conducted comparative analysis of feature selection techniques on SMOTE-balanced datasets for malicious website detection.
Status Quo: Imbalanced security datasets and noisy features reduce classifier reliability in real-world deployments.
Outcomes: Identified optimal feature selection approaches that maximize detection precision after data balancing.
Context: Data engineering and ML research establishing reproducible preprocessing workflows for web security classification.
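
The core idea behind SMOTE balancing is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below is a simplified, standard-library illustration of that step with hypothetical data; it is not the implementation used in the paper.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D minority-class samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=3)
print(new_points)
```

Every synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies. Feature selection is then run on the balanced set.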

A Machine Learning Framework for Security and Privacy Issues in Building Trust for Social Networking

2023 · Cluster Computing, Vol. 26(6), 3907–3930
Python, scikit-learn, Graph ML, NLP, Privacy-Preserving ML

Customer: Social network operators and users requiring trust and privacy guarantees.
What we did: Designed a machine learning framework addressing security threats and privacy leakage in social network data.
Status Quo: Social platforms face multi-vector threats — fake accounts, data harvesting, malicious content — with no unified ML defense layer.
Outcomes: Proposed and validated a cohesive framework integrating threat detection, anomaly scoring, and privacy-aware data handling.
Context: Systems-level research published in Cluster Computing, combining ML pipeline design with privacy engineering.

Challenges and Opportunities in Edge Computing Architecture Using Machine Learning Approaches

2022 · Artificial Intelligence and Machine Learning for EDGE Computing, pp. 395–409
Python, scikit-learn, Edge ML, Distributed Systems, TensorFlow Lite

Customer: IoT system architects and edge infrastructure engineers.
What we did: Surveyed and analyzed ML deployment strategies for edge computing environments, identifying key architectural trade-offs.
Status Quo: Centralized ML inference creates latency and bandwidth bottlenecks unsuitable for real-time edge applications.
Outcomes: Mapped opportunities for lightweight model deployment at the edge, informing architecture decisions for constrained environments.
Context: Systems research bridging ML and edge computing infrastructure design.

Tree Based Classification of Firewall Logs to Dodge Intrusion

2021 · 2021 10th IEEE International Conference on Communication Systems and Network Technologies
Python, scikit-learn, Decision Tree, Random Forest, Log Parsing, Pandas

Customer: Network security operations teams monitoring firewall activity.
What we did: Applied tree-based classifiers to structured firewall log data to detect and classify intrusion attempts.
Status Quo: Manual log analysis is slow and misses patterns across high-volume network traffic.
Outcomes: Demonstrated that tree-based models can accurately classify intrusion events from raw firewall logs with low overhead.
Context: Data pipeline and ML research applied to network security operations.

On Analysis of Effectiveness of Ensemble, Distance and Tree Based Methods for Secure Power Systems

2021 · 2021 International Conference on Data Analytics for Business and Industry (ICDABI)
Python, scikit-learn, Gradient Boosting, KNN, Decision Tree, Ensemble Methods

Customer: Power grid security engineers evaluating ML-based anomaly detection.
What we did: Benchmarked ensemble, distance-based, and tree-based classifiers for detecting anomalies in power system data.
Status Quo: No consensus on which ML family performs best for critical infrastructure anomaly detection.
Outcomes: Provided empirical guidance on model selection for power system security, identifying performance trade-offs across families.
Context: Comparative data science research supporting robust ML adoption in critical infrastructure.

On Learning from Natural Language to Develop Intelligent System to Identify Potential Medical Condition

2021
Python, NLP, LSTM, BERT, scikit-learn, TensorFlow, Healthcare NLP

Customer: Healthcare providers and clinical decision support systems.
What we did: Explored NLP-based approaches to extract and classify medical conditions from natural language text.
Status Quo: Unstructured clinical text contains high-value diagnostic signals inaccessible to traditional rule-based systems.
Outcomes: Developed intelligent system capable of identifying potential medical conditions from patient-authored and clinical text.
Context: Applied NLP research toward automating clinical triage and decision support.

Robust Malware Detection Using Machine Learning

2021 · 2021 10th IEEE International Conference on Communication Systems and Network Technologies
Python, scikit-learn, Random Forest, SVM, Feature Engineering, Pandas

Customer: Endpoint security vendors and enterprise IT defense teams.
What we did: Built and evaluated a robust ML pipeline for malware detection using static and behavioral features.
Status Quo: Signature-based antivirus fails against novel and obfuscated malware variants.
Outcomes: ML-based pipeline demonstrated strong generalization against unseen malware families with high detection rates.
Context: Security data science research establishing ML as a reliable layer in multi-stage malware defense.