About Me

I’m Pedro Igor Gomes de Morais, a Data Scientist based in Vitória, ES, Brazil, working at the crossroads of Natural Language Processing, predictive modeling, and scalable data infrastructure. I enjoy translating academic insights into production-ready systems and automating workflows that unlock meaningful efficiency gains.

Experience

Machine Learning Engineer – Labic · Conecta Turismo (Apr 2024 – Present)

Design and maintain ETL pipelines fed by Google Places data and Instagram data (collected via Instagrapi) to keep tourism intelligence dashboards current for 10,000+ businesses.
Automate data quality checks and reporting so stakeholders receive actionable insights with 75% faster turnaround than manual reporting.

Undergraduate Researcher – Data Science Lab (DSL) · UFES (May 2023 – Present)

Built churn prediction models in PyTorch for TIM Brasil with 85% accuracy, delivering cost-reduction guidance that mitigates more than US$50K/month in churn risk.
Lead the development of an automated WhatsApp ingestion and BERT-based sentiment analysis stack that monitors misinformation in national conversations.
Co-author of machine learning and NLP publications presented at AINA’25 and SBRC’24.

Undergraduate Researcher – Neoscópio · UFES (Sep 2023 – Sep 2024)

Collected and processed 50K+ Portuguese news articles with BeautifulSoup to support large-scale NLP research.
Improved dataset usability by 40% through optimized preprocessing pipelines that sped up downstream experimentation.

Education

B.S. in Computer Science · Federal University of Espírito Santo (UFES) – Ongoing
Relevant coursework: Data Science, Deep Learning, NLP, Data Mining, Distributed Systems.

Skills & Interests

Languages: Python, SQL, Kotlin/Java, C/C++, Elixir
ML & NLP: PyTorch, Transformers, scikit-learn, SHAP, FAISS
Tools & Infrastructure: Docker, Git, Linux, DVC, pandas, PySpark, GCP
Visualization: Matplotlib, Seaborn
Interests: translating research into production, reusable pipelines, applied NLP for Portuguese, clean code and documentation, community knowledge sharing.

Papers

Discourse Dynamics on X and Bluesky Amid Brazil’s 2024 Environmental and Political-Digital Crises
Maps how Brazil's 2024 environmental emergencies and political debates shaped online conversation as audiences migrated from X to Bluesky, highlighting how climate and political narratives adapted to these emerging platforms.
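Além da Conexão: Combinando Múltiplas Fontes de Dados para Entender e Prever Evasão de Internet Residencial [Beyond the Connection: Combining Multiple Data Sources to Understand and Predict Residential Internet Churn] (SBRC 2024)
Combines an operator's internal logs with public data from Anatel, Brazil's telecommunications regulator, to build churn predictors that reached ~80% accuracy with 70%+ precision and recall, while identifying the features that most influence residential customers' decisions to cancel.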
The Value of Complaints: Churn Prediction in a Major Residential Internet Service Provider Using Textual Data (Springer 2025)
Uses NLP to turn complaints filed with the regulator into churn signals; Integrated Gradients attributions on the recall-optimized models reveal word-level patterns that give retention teams actionable insights.

Contact

I’m always open to collaborations, research conversations, and data challenges. Connect on LinkedIn or drop a line at pedroigorgm@gmail.com.