Data Engineering / Web Scraping · Biomedical Data Science · Machine Learning Engineering
Posted about 8 hours ago
Internship
⏱️ 3-6 months
💼 Hybrid
📅 Expires in 13 days
Job description
Overview
This internship aims to build a complete data pipeline and analytics platform to monitor e-commerce competitors, extract product information and provide actionable insights for management.
The project combines web scraping, hybrid database architecture (PostgreSQL, MongoDB, DuckDB), interactive dashboards, forecasting models and LLM-generated summaries for tactical recommendations.
Objectives
Automate data scraping and product information extraction from multiple e-commerce sources to obtain structured product, price and availability data.
Build a hybrid database architecture (PostgreSQL, MongoDB, DuckDB, …) to support real-time analytics, historical storage and fast analytical queries.
Design and implement an interactive dashboard for competitive analysis and KPI visualization to support decision-making.
Integrate predictive models to forecast price trends and market changes and provide forward-looking KPIs.
Generate AI-written summaries and insights using an LLM for management reporting and recommend strategic actions.
Required skills & technologies
Strong programming skills in Python and experience with web automation libraries (BeautifulSoup, Scrapy, Selenium) for resilient scraping pipelines (a minimal scraping sketch follows this list).
Proficiency in data analysis and visualization (Pandas, Plotly, Streamlit, Dash) to build dashboards and perform exploratory analysis.
Knowledge of machine learning for forecasting and recommendation systems (LSTM, scikit-learn) to model price trends and product demand.
Familiarity with LLMs and text summarization techniques (GPT, BERT, RAG, LangChain) to produce concise management reports and insights.
Experience with PostgreSQL, MongoDB and DuckDB for hybrid data management and real-time analytics; design schemas and ETL processes.
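As a rough illustration of the scraping work in scope, here is a minimal sketch using requests and BeautifulSoup. The URL, CSS selectors and field names are hypothetical placeholders; a production scraper would also need retries, rate limiting and change detection as described under the deliverables.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical competitor listing page; real selectors depend on the target site.
LISTING_URL = "https://example-competitor.com/catalog?page=1"

def scrape_listing(url: str) -> list[dict]:
    """Fetch one catalog page and extract product name, price and availability."""
    resp = requests.get(url, headers={"User-Agent": "price-monitor/0.1"}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")

    products = []
    for card in soup.select("div.product-card"):  # placeholder selector
        products.append({
            "name": card.select_one(".product-title").get_text(strip=True),
            "price": card.select_one(".price").get_text(strip=True),
            "in_stock": card.select_one(".availability") is not None,
        })
    return products

if __name__ == "__main__":
    for product in scrape_listing(LISTING_URL):
        print(product)
```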
Deliverables & tasks
Implement robust, scalable scrapers and data ingestion pipelines that handle changes in source websites, rate limits and data quality issues.
Design the hybrid database schema, implement data storage strategies (transactional vs analytical), and enable efficient joins/queries across stores (see the cross-store query sketch after this list).
Develop an interactive dashboard (Streamlit/Dash/Plotly) showing competitive KPIs, price evolution, product comparisons and alerts.
Train and validate forecasting models (e.g., LSTM, classical ML) for price/demand prediction and integrate them into the analytics pipeline (a baseline forecasting sketch follows this list).
Implement an LLM-based summarization and insight-generation module (RAG/LangChain pipeline) to produce periodic reports and recommend actions to management (a minimal summarization sketch follows this list).
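A sketch of the kind of fast analytical query DuckDB could serve over scraped price history. The table and column names are assumptions for illustration, not a prescribed schema; in the real pipeline the data would be loaded from the transactional stores or from exported files.

```python
import duckdb

# In-memory example; table/column names are hypothetical.
con = duckdb.connect()
con.execute("""
    CREATE TABLE price_history (
        competitor VARCHAR,
        product_id VARCHAR,
        price DOUBLE,
        scraped_at TIMESTAMP
    )
""")
con.execute("""
    INSERT INTO price_history VALUES
        ('shop_a', 'sku-1', 19.90, '2024-05-01 08:00'),
        ('shop_a', 'sku-1', 18.50, '2024-05-02 08:00'),
        ('shop_b', 'sku-1', 21.00, '2024-05-02 08:00')
""")

# Latest price per competitor and product -- the kind of KPI a dashboard would read.
latest = con.execute("""
    SELECT competitor, product_id,
           arg_max(price, scraped_at) AS latest_price,
           max(scraped_at)            AS last_seen
    FROM price_history
    GROUP BY competitor, product_id
    ORDER BY competitor
""").df()
print(latest)
```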
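For the forecasting deliverable, a minimal classical-ML baseline with scikit-learn, framing next-day price prediction as regression over lagged prices. The synthetic series and lag window are illustrative only; an LSTM or a proper time-series cross-validation would slot into the same shape.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic daily price series standing in for one product's scraped history.
rng = np.random.default_rng(0)
prices = 20 + np.cumsum(rng.normal(0, 0.3, size=200))

# Turn the series into a supervised problem: predict tomorrow from the last 7 days.
LAGS = 7
X = np.array([prices[i - LAGS:i] for i in range(LAGS, len(prices))])
y = prices[LAGS:]

# Chronological split: never validate on data older than the training window.
split = int(len(X) * 0.8)
model = GradientBoostingRegressor().fit(X[:split], y[:split])

preds = model.predict(X[split:])
mae = np.mean(np.abs(preds - y[split:]))
print(f"Hold-out MAE: {mae:.3f}")
```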
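For the LLM reporting module, a hedged sketch using the OpenAI chat API as one possible backend (the posting also mentions RAG/LangChain, which would add retrieval over stored KPIs). The model name, prompt and KPI payload below are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Placeholder KPI payload; in the real pipeline this would come from the analytics layer.
kpi_snapshot = """
shop_a sku-1: 18.50 EUR (-7% week over week), in stock
shop_b sku-1: 21.00 EUR (stable), low stock
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You write short competitive-pricing summaries for management, "
                    "ending with one recommended action."},
        {"role": "user",
         "content": f"Summarize this week's competitor data:\n{kpi_snapshot}"},
    ],
)
print(response.choices[0].message.content)
```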
How to apply
To apply, send your CV and a brief motivation email to hr@iovision.io, indicating relevant projects or examples of scraping/ML work.
Use the email subject: "Application for 07 09 08 Objectives PFE" so your application is routed to the correct project contact.