About Me

I’m Pedro Igor Gomes de Morais, a Data Scientist based in Vitória, ES, Brazil, working at the crossroads of Natural Language Processing, predictive modeling, and scalable data infrastructure. I enjoy translating academic insights into production-ready systems and automating workflows that unlock meaningful efficiency gains.

Experience

Machine Learning Engineer – Labic · Conecta Turismo (Apr 2024 – Present)

  • Design and maintain ETL pipelines fed by Google Places data and Instagram data collected with Instagrapi, keeping tourism intelligence dashboards current for 10,000+ businesses.
  • Automate data quality checks and reporting so stakeholders receive actionable insights with 75% faster turnaround than manual processes.

Undergraduate Researcher – Data Science Lab (DSL) · UFES (May 2023 – Present)

  • Built churn prediction models in PyTorch for TIM Brasil with 85% accuracy, informing retention guidance worth more than US$50K/month in mitigated churn risk.
  • Lead development of an automated WhatsApp ingestion and BERT-based sentiment analysis stack to monitor misinformation in national conversations.
  • Co-authored machine learning and NLP publications presented at AINA’25 and SBRC’24.

Undergraduate Researcher – Neoscópio · UFES (Sep 2023 – Sep 2024)

  • Collected and processed 50K+ Portuguese news articles with BeautifulSoup to support large-scale NLP research.
  • Improved dataset usability by 40% through optimized preprocessing pipelines that sped up downstream experimentation.

Education

B.S. in Computer Science · Federal University of Espírito Santo (UFES) – Ongoing
Relevant coursework: Data Science, Deep Learning, NLP, Data Mining, Distributed Systems.

Skills & Interests

  • Languages: Python, SQL, Kotlin/Java, C/C++, Elixir
  • ML & NLP: PyTorch, Transformers, Scikit-learn, SHAP, FAISS
  • Tools & Infrastructure: Docker, Git, Linux, DVC, Pandas, PySpark, GCP
  • Visualization: Matplotlib, Seaborn
  • Interests: Research translation, reusable pipelines, applied NLP for Portuguese, clean code / documentation, community knowledge sharing.

Papers

  • Discourse Dynamics on X and Bluesky Amid Brazil’s 2024 Environmental and Political-Digital Crises
    Maps how the 2024 environmental emergencies and political debates shaped conversations as audiences moved from X to Bluesky, highlighting emerging digital spaces and how climate/political narratives adapted to those platforms.
  • Além da Conexão: Combinando Múltiplas Fontes de Dados para Entender e Prever Evasão de Internet Residencial (SBRC 2024)
    Fuses internal operator logs with Anatel public data to build churn predictors that reached ~80% accuracy with 70%+ precision/recall, while surfacing the features that most influence residential customers' decisions to cancel service.
  • The Value of Complaints: Churn Prediction in a Major Residential Internet Service Provider Using Textual Data (Springer 2025)
    Uses NLP to turn regulator complaints into churn signals; the models, tuned for higher recall, surface word-level patterns via Integrated Gradients, giving retention teams actionable insights.

Contact

I’m always open to collaborations, research conversations, and data challenges. Connect on LinkedIn or drop a line at pedroigorgm@gmail.com.