My development environment

A customized development environment can be a huge productivity boost in day-to-day work. In this post, I will share the tools and configurations I currently use.

Continue reading

git commit fixup

In this article, I will describe a git option to quickly fix a previous commit. I typically need this when I want to fix a typo in a previous commit after a few new commits have been made. The goal is to keep a “clean” git history made of consistent, feature-focused commits, which facilitates code reviews.
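
A minimal sketch of the workflow the post covers, using git's `--fixup` and `--autosquash` options (the commit hash `abc1234` and the staged file are placeholders):

```sh
# Stage the fix and record it as a fixup of the earlier commit (abc1234 is a placeholder hash)
git add README.md   # hypothetical file containing the typo
git commit --fixup abc1234

# Later, reorder and squash the fixup commit into its target automatically
git rebase -i --autosquash abc1234~1
```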

Continue reading

Using Docker to simplify Cassandra development in JHipster

JHipster is an open-source project that generates a fully working application in seconds. With minimal configuration, JHipster accelerates the start of new projects by integrating the frontend, the backend, security and a database.

Cassandra is one of the supported databases and JHipster generates all the configuration needed to access the cluster.

But it is often hard for developers to configure and maintain a local Cassandra cluster.
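
As an illustration of how Docker lowers that barrier, a single-node cluster can be started from the official cassandra image (the container name and image tag below are assumptions, not the setup JHipster generates):

```sh
# Start a throwaway single-node Cassandra, exposing the CQL native port
docker run --name local-cassandra -d -p 9042:9042 cassandra:3

# Open a CQL shell against it once the node is up
docker exec -it local-cassandra cqlsh
```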

Moreover, there is no standard tooling to manage schema migrations, like Liquibase or Flyway for SQL databases, making it difficult to keep the schema synchronized between local configurations and every other environment.
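
To make the gap concrete, here is the kind of versioned CQL script such migration tooling typically manages (the file naming convention and keyspace are purely illustrative):

```sql
-- 0001_create_user_table.cql (hypothetical naming convention)
CREATE TABLE IF NOT EXISTS myapp.user (
    id uuid PRIMARY KEY,
    login text,
    email text
);
```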

JHipster’s goal is to provide the simplest and most productive out-of-the-box development environment for developers, and such a tool has been added in the latest version (3.4.0).

In this post, I’ll describe the design of the tool and the basic commands to use it.

Continue reading

A tour of Databricks Community Edition: a hosted Spark service

With the recent announcement of the Community Edition, it’s time to have a look at the Databricks Cloud solution. Databricks Cloud is a hosted Spark service from Databricks, the team behind Spark.

Continue reading

Testing strategy for Spark Streaming – Part 2 of 2

In a previous post, we saw why it’s important to test your Spark jobs and how you can easily unit test a job’s logic, first by designing your code to be testable and then by writing unit tests.

In this post, we will look at applying the same pattern to another important part of the Spark engine: Spark Streaming.
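
As a taste of the pattern, here is a minimal sketch of testing a DStream transformation batch by batch with spark-testing-base's StreamingSuiteBase (the word-count operation and the test values are illustrative, not taken from the post):

```scala
import com.holdenkarau.spark.testing.StreamingSuiteBase
import org.apache.spark.streaming.dstream.DStream
import org.scalatest.FunSuite

class WordCountStreamingSpec extends FunSuite with StreamingSuiteBase {

  // The operation under test: a plain DStream-to-DStream function
  def countWords(lines: DStream[String]): DStream[(String, Int)] =
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

  test("count words per batch") {
    // One input batch and the output batch we expect for it
    val input = List(List("a b a"))
    val expected = List(List(("a", 2), ("b", 1)))
    testOperation(input, countWords _, expected, ordered = false)
  }
}
```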

Continue reading

Testing strategy for Apache Spark jobs – Part 1 of 2

Like any other application, Apache Spark jobs deserve good testing practices and coverage.

Indeed, the cost of running jobs against production data makes unit testing a must-have for getting a fast feedback loop and discovering errors earlier.

But because of its distributed nature and the RDD abstraction on top of the data, Spark requires special care for testing.

In this post, we’ll explore how to design your code for testability, how to set up a simple unit test for your job logic and how the spark-testing-base library can help.
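
As a hint of the approach (the job logic and test data here are hypothetical placeholders), the transformation is extracted into a function that takes and returns an RDD, so it can be exercised with the local SparkContext that spark-testing-base's SharedSparkContext provides:

```scala
import com.holdenkarau.spark.testing.SharedSparkContext
import org.apache.spark.rdd.RDD
import org.scalatest.FunSuite

// Hypothetical job logic, extracted so it can be tested in isolation
object LogJob {
  // Keep only error lines and strip the level prefix
  def extractErrors(lines: RDD[String]): RDD[String] =
    lines.filter(_.startsWith("ERROR")).map(_.stripPrefix("ERROR").trim)
}

class LogJobSpec extends FunSuite with SharedSparkContext {
  test("extract error messages") {
    // sc is the SparkContext provided by SharedSparkContext
    val input = sc.parallelize(Seq("ERROR boom", "INFO all good"))
    val result = LogJob.extractErrors(input).collect()
    assert(result === Array("boom"))
  }
}
```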

Continue reading