programming

Laws - or principles - can give us guidance and teach us lessons from our peers’ mistakes. In this article, I will introduce you to five laws I always have in the back of my mind when designing or implementing a software. Some of them relate to pure development, some are related to system organizations. All of them should be useful for your growth as a software engineer.

Fix the Git completion with Oh-my-Zsh on Mac

in programming

July 16, 2017

In a previous post, I have explained how I have setup oh-my-zsh with the git plugin. I am also using homebrew to manage the packages installed on my Mac. After upgrading Git recently, I have noticed the Git completion was not as powerful anymore.

MongoDB and Apache Spark - Getting started tutorial

in programming

May 3, 2017

MongoDB and Apache Spark are two popular Big Data technologies.

In my previous post, I listed the capabilities of the MongoDB connector for Spark. In this tutorial, I will show you how to configure Spark to connect to MongoDB, load data, and write queries.

To demonstrate how to use Spark with MongoDB, I will use the zip codes from MongoDB tutorial on the aggregation pipeline documentation using a zip code data set. I have prepared a Maven project and a Docker Compose file to get you started quickly.

Introduction to the MongoDB connector for Apache Spark

in programming

March 31, 2017

MongoDB is one of the most popular NoSQL databases. Its unique capabilities to store document-oriented data using the built-in sharding and replication features provide horizontal scalability as well as high availability.

Apache Spark is another popular “Big Data” technology. Spark provides a lower entry level to the world of distributed computing by offering an easier to use, faster, and in-memory framework than the MapReduce framework. Apache Spark is intended to be used with any distributed storage, e.g. HDFS, Apache Cassandra with the Datastax’s spark-cassandra-connector and now the MongoDB’s connector presented in this article.

By using Apache Spark as a data processing platform on top of a MongoDB database, you can benefit from all of the major Spark API features: the RDD model, the SQL (HiveQL) abstraction and the Machine Learning libraries.

In this article, I present the features of the connector and some use cases. An upcoming article will be a tutorial to demonstrate how to load data from MongoDB and run queries with Spark.

Add a Git hook to automatically verify a repository's email

in programming

December 3, 2016

I use Git a lot and I often have to switch between my personal repositories (ie: Github) and my professional (Ippon) repositories on the same laptop. My default Git email is configured to my personal email and I have often forgotten to configure it to my professional email when creating/cloning a repository for my company. Like everything in Git, this can be automated to avoid mistakes.

In this post, I will show how I use a Git hook to check the email configured in any repository before every commit.

My development environment

in programming

November 27, 2016

A customized development environment could be a huge productivity boost in the day to day work. In this post, I will share the tools and configurations I currently use.

git commit fixup

in programming

November 23, 2016

In this article, I will describe a git option to quickly fix a previous commit. This sometimes happens when I want to fix a typo in a previous commit after few new commits. The goal is to keep a “clean” git history with consistent commits adding features to facilitate the code reviews.

Using Docker to simplify Cassandra development in JHipster

in programming

June 27, 2016

JHipster is an open source project that generates a fully working application in seconds. With a minimal configuration, JHipster accelerates the start of new projects by integrating frontend, backend, security and a database.

Cassandra is one of the supported databases and JHipster generates all the configuration needed to access the cluster.

But it is often hard for the developers to configure and maintain a local Cassandra cluster.

Moreover, there is no standard tooling to manage the schema migrations, like Liquibase or Flyway for SQL databases, making it difficult to synchronize the schema between every environment and a local configuration.

JHipster’s goal is to provide the most simple and productive development environment out of the box for the developers, and this tool has been added in the latest (3.4.0) version.

In this post, I’ll describe the design of the tool and the basic commands to use it.

A tour of Databricks Community Edition: a hosted Spark service

in programming

April 13, 2016

With the recent announcement of the Community Edition, it’s time to have a look at the Databricks Cloud solution. Databricks Cloud is a hosted Spark service from Databricks, the team behind Spark.

Testing strategy for Spark Streaming – Part 2 of 2

in programming

March 30, 2016

In a previous post, we’ve seen why it’s important to test your Spark jobs and how you could easily unit test the job’s logic, first by designing your code to be testable and then by writing unit tests.

In this post, we will look at applying the same pattern to another important part of the Spark engine: Spark Streaming.

Raphael Brugier

5 laws every developer should know

Fix the Git completion with Oh-my-Zsh on Mac

MongoDB and Apache Spark - Getting started tutorial

Introduction to the MongoDB connector for Apache Spark

Add a Git hook to automatically verify a repository's email

My development environment

git commit fixup

Using Docker to simplify Cassandra development in JHipster

A tour of Databricks Community Edition: a hosted Spark service

Testing strategy for Spark Streaming – Part 2 of 2

Search