About Me
I’m Pedro Igor Gomes de Morais, a Data Scientist based in Vitória, ES, Brazil, working at the crossroads of Natural Language Processing, predictive modeling, and scalable data infrastructure. I enjoy translating academic insights into production-ready systems and automating workflows that unlock meaningful efficiency gains.
Experience
Machine Learning Engineer – Labic · Conecta Turismo (Apr 2024 – Present)
- Design and maintain ETL pipelines powered by Google Places and Instagrapi data to keep tourism intelligence dashboards current for 10,000+ businesses.
- Automate data quality checks and reporting so stakeholders receive actionable insights with 75% faster turnaround than manual processes.
Undergraduate Researcher – Data Science Lab (DSL) · UFES (May 2023 – Present)
- Built churn prediction models in PyTorch for TIM Brasil that reached 85% accuracy and informed retention guidance worth more than US$50K/month in mitigated churn risk.
- Leading the development of an automated WhatsApp ingestion and BERT-based sentiment analysis stack to monitor misinformation in national conversations.
- Co-author of machine learning and NLP publications presented at AINA’25 and SBRC’24.
Undergraduate Researcher – Neoscópio · UFES (Sep 2023 – Sep 2024)
- Collected and processed 50K+ Portuguese news articles with BeautifulSoup to support large-scale NLP research.
- Improved dataset usability by 40% through optimized preprocessing pipelines, which sped up downstream experimentation.
Education
B.S. in Computer Science · Federal University of Espírito Santo (UFES) – Ongoing
Relevant coursework: Data Science, Deep Learning, NLP, Data Mining, Distributed Systems.
Skills & Interests
- Languages: Python, SQL, Kotlin/Java, C/C++, Elixir
- ML & NLP: PyTorch, Transformers, Scikit-learn, SHAP, FAISS
- Tools & Infrastructure: Docker, Git, Linux, DVC, Pandas, PySpark, GCP
- Visualization: Matplotlib, Seaborn
- Interests: Research translation, reusable pipelines, applied NLP for Portuguese, clean code / documentation, community knowledge sharing.
Papers
- Discourse Dynamics on X and Bluesky Amid Brazil’s 2024 Environmental and Political-Digital Crises
Maps how the 2024 environmental emergencies and political debates shaped conversations as audiences moved from X to Bluesky, highlighting emerging digital spaces and how climate/political narratives adapted to those platforms.
- Além da Conexão: Combinando Múltiplas Fontes de Dados para Entender e Prever Evasão de Internet Residencial (SBRC 2024)
Fuses internal operator logs with Anatel public data to build churn predictors that reached ~80% accuracy with 70%+ precision/recall, while surfacing the features that most influence residential exit decisions.
- The Value of Complaints: Churn Prediction in a Major Residential Internet Service Provider Using Textual Data (Springer 2025)
Uses NLP to turn regulator complaints into churn signals; Integrated Gradients attributions on the recall-improved models reveal word-level patterns, giving retention teams actionable insights.
Contact
I’m always open to collaborations, research conversations, and data challenges. Connect on LinkedIn or drop a line at pedroigorgm@gmail.com.