AWS Lambda Execution Context in Java demystified

In this article, I will demystify the Execution Context of a Lambda invocation and show how you can take advantage of it. I will focus on Lambdas written in Java, but most of this applies to any language.
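
To make this concrete, here is a minimal sketch (the class and field names are illustrative, not taken from the article) of a Java handler that makes Execution Context reuse visible: anything initialized outside of handleRequest survives between warm invocations of the same container.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class ExecutionContextDemo implements RequestHandler<String, String> {

    // Initialized once per execution context (i.e. per container), then
    // reused across warm invocations. This is the place to cache expensive
    // resources such as SDK clients or database connections.
    private static final long CONTAINER_STARTED = System.currentTimeMillis();

    @Override
    public String handleRequest(String input, Context context) {
        // On a warm invocation, CONTAINER_STARTED keeps its original value,
        // showing that the JVM survived between calls.
        return "Container started at: " + CONTAINER_STARTED;
    }
}
```

Invoking the function twice in a row typically returns the same timestamp, which is exactly the reuse the article explores.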

Continue reading

5 laws every developer should know

Laws - or principles - can give us guidance and teach us lessons from our peers’ mistakes. In this article, I will introduce you to five laws I always keep in the back of my mind when designing or implementing software. Some of them relate to pure development, others to the organization of systems. All of them should be useful for your growth as a software engineer.

Continue reading

Fix the Git completion with Oh-my-Zsh on Mac

In a previous post, I explained how I set up oh-my-zsh with the Git plugin. I also use Homebrew to manage the packages installed on my Mac. After upgrading Git recently, I noticed that the Git completion was not as powerful anymore.

Continue reading

MongoDB and Apache Spark - Getting started tutorial

MongoDB and Apache Spark are two popular Big Data technologies.

In my previous post, I listed the capabilities of the MongoDB connector for Spark. In this tutorial, I will show you how to configure Spark to connect to MongoDB, load data, and write queries.

To demonstrate how to use Spark with MongoDB, I will use the zip code data set from the aggregation pipeline tutorial in the MongoDB documentation. I have prepared a Maven project and a Docker Compose file to get you started quickly.
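
As a taste of what the tutorial walks through, here is a minimal sketch in Java. It assumes the mongo-spark-connector dependency is on the classpath and that the zip code data set has been imported into a test.zips collection; the connection URI is illustrative.

```java
import static java.util.Collections.singletonList;

import com.mongodb.spark.MongoSpark;
import com.mongodb.spark.rdd.api.java.JavaMongoRDD;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.Document;

public class ZipCodesExample {
    public static void main(String[] args) {
        // Point the connector at the MongoDB instance and collection.
        SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("MongoSparkTutorial")
                .set("spark.mongodb.input.uri", "mongodb://localhost:27017/test.zips");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Load the collection as an RDD of BSON documents.
        JavaMongoRDD<Document> zips = MongoSpark.load(jsc);

        // Push an aggregation pipeline down to MongoDB so that only the
        // matching documents are sent back to Spark.
        JavaMongoRDD<Document> bigZips = zips.withPipeline(
                singletonList(Document.parse("{ $match: { pop: { $gt: 100000 } } }")));

        System.out.println("Zip codes with more than 100,000 inhabitants: " + bigZips.count());
        jsc.close();
    }
}
```

The withPipeline call is worth noticing: the filter is executed by MongoDB itself, so Spark never receives the documents it does not need.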

Continue reading

Introduction to the MongoDB connector for Apache Spark

MongoDB is one of the most popular NoSQL databases. It stores document-oriented data, and its built-in sharding and replication features provide horizontal scalability as well as high availability.

Apache Spark is another popular “Big Data” technology. Spark lowers the entry barrier to the world of distributed computing by offering an in-memory framework that is easier to use and faster than MapReduce. Apache Spark is designed to work with any distributed storage: HDFS, Apache Cassandra through Datastax’s spark-cassandra-connector, and now MongoDB through the connector presented in this article.

By using Apache Spark as a data processing platform on top of a MongoDB database, you can benefit from all of the major Spark API features: the RDD model, the SQL (HiveQL) abstraction and the Machine Learning libraries.
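
As an illustration of the SQL abstraction, here is a hedged sketch (assuming Spark 2.x, the mongo-spark-connector on the classpath, and an illustrative test.zips collection borrowed from the tutorial above) that registers a MongoDB collection as a temporary view and queries it with Spark SQL:

```java
import com.mongodb.spark.MongoSpark;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MongoSparkSqlExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("MongoSparkSQL")
                .config("spark.mongodb.input.uri", "mongodb://localhost:27017/test.zips")
                .getOrCreate();

        // Load the collection and let the connector infer a schema
        // by sampling documents.
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        Dataset<Row> zips = MongoSpark.load(jsc).toDF();

        // Query the document-oriented data with plain SQL.
        zips.createOrReplaceTempView("zips");
        spark.sql("SELECT state, SUM(pop) AS population FROM zips "
                + "GROUP BY state ORDER BY population DESC")
             .show(5);

        spark.stop();
    }
}
```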

In this article, I present the features of the connector and some use cases. An upcoming article will be a tutorial to demonstrate how to load data from MongoDB and run queries with Spark.

Continue reading

Add a Git hook to automatically verify a repository's email

I use Git a lot, and I often switch between my personal repositories (e.g. on GitHub) and my professional (Ippon) repositories on the same laptop. My default Git email is set to my personal address, and I have often forgotten to switch it to my professional address when creating or cloning a repository for my company. Like everything in Git, this check can be automated to avoid mistakes.

In this post, I will show how I use a Git hook to check the email configured in any repository before every commit.
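
The post has the details, but as a rough sketch (not necessarily the exact script from the post, and with placeholder domains), such a pre-commit hook could look like this:

```sh
#!/bin/sh
# .git/hooks/pre-commit -- must be executable (chmod +x).
# Aborts the commit when the configured email does not belong to an
# expected domain. The domains below are placeholders to adapt.

email=$(git config user.email)

case "$email" in
  *@personal.example|*@company.example)
    exit 0 ;;
  *)
    echo "pre-commit: unexpected user.email '$email' for this repository." >&2
    echo "pre-commit: set it with: git config user.email <address>" >&2
    exit 1 ;;
esac
```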

Continue reading