Software consultant and data engineer with a strong background in data processing infrastructure and information retrieval (IR).

Key Skills

  • Coaching teams on best practices for their data processing pipelines.
  • Software engineering best practices.
  • Ruby, Java, Scala, JavaScript.
  • Elasticsearch, Logstash, Kibana, Beats.
  • Neo4j, DEX, MySQL.
  • Ansible.
  • Hadoop, Spark, Hive.

Elsewhere

Presentations and Conferences


2016

The log shipping scene has been with us for a long time: from syslog and rsyslog to today's Fluentd, Flume and Logstash. Logstash has been pushing hard to introduce new features that make the experience better for everyone. At the end of the day, a healthy shipper means a happy sysadmin. The latest Logstash includes persistence to reduce the chance of data loss, monitoring to show how everything is going, and configuration management to make your life a lot easier. But wait, there’s more! Offline support, improved shutdown semantics, and other features that will get your logs shipped and leave you a rested sysadmin. In this talk we’ll see these features in action through a real live sensor monitoring example. By the end of the session, you will be able to use the full power of Logstash in your own deployments.

A short talk on how we validate our releases inside the Logstash project with JRuby, RSpec and SSH.

2015

Technical hands-on session and overview of the Elastic Stack.
In this talk, we will cover several strategies for successfully scaling Logstash. Through the lens of several real-life war stories, you will learn how to make Logstash sing alongside RabbitMQ, Redis, ZeroMQ, Kafka and much more. If you are ready to grow at scale and make your infrastructure more resilient, this talk is for you. Talk given at OSDC 2015 in Berlin.

2014

Slides used to introduce the graph processing workshop held at eurucamp 2014.

Publications


2016

Admins and developers love logs, which help them track down problems and errors. But when relevant events are logged across several machines in different formats, analysis becomes quite cumbersome. Logstash brings together various event sources and feeds them, for example, into an Elasticsearch search index.

2010

The analysis of the relationships among data entities has led to modeling them as graphs. Since the size of datasets has grown significantly in recent years, it has become necessary to implement efficient graph databases that can load and manage these huge datasets. In this paper, we evaluate the performance of four of the most scalable native graph database projects (Neo4j, Jena, HypergraphDB and DEX). We implement the full HPC Scalable Graph Analysis Benchmark and test the performance of each database for different typical graph operations and graph sizes, showing that, at their current stage of development, DEX and Neo4j are the most efficient graph databases.