Laws, or principles, can give us guidance and teach us lessons from our peers’ mistakes. In this article, I will introduce you to five laws I always keep in the back of my mind when designing or implementing software. Some of them relate to pure development, others to system organization. All of them should be useful for your growth as a software engineer.
In a previous post, I explained how I set up oh-my-zsh with the git plugin. I also use Homebrew to manage the packages installed on my Mac. After upgrading Git recently, I noticed that Git completion was not as powerful as before.
MongoDB and Apache Spark are two popular Big Data technologies.
To demonstrate how to use Spark with MongoDB, I will use the zip code data set from the MongoDB aggregation pipeline tutorial. I have prepared a Maven project and a Docker Compose file to get you started quickly.
MongoDB is one of the most popular NoSQL databases. It stores document-oriented data, and its built-in sharding and replication features provide horizontal scalability as well as high availability.
Apache Spark is another popular “Big Data” technology. Spark lowers the barrier to entry to distributed computing by offering an in-memory framework that is easier to use and faster than MapReduce. Apache Spark is intended to be used with any distributed storage, e.g. HDFS, Apache Cassandra with DataStax’s spark-cassandra-connector, and now MongoDB with the connector presented in this article.
By using Apache Spark as a data processing platform on top of a MongoDB database, you can benefit from all of the major Spark API features: the RDD model, the SQL (HiveQL) abstraction and the Machine Learning libraries.
In this article, I present the features of the connector and some use cases. An upcoming article will be a tutorial to demonstrate how to load data from MongoDB and run queries with Spark.
I use Git a lot, and I often have to switch between my personal repositories (e.g. GitHub) and my professional (Ippon) repositories on the same laptop. My default Git email is set to my personal address, and I have often forgotten to change it to my professional address when creating or cloning a repository for my company. Like everything in Git, this can be automated to avoid mistakes.
In this post, I will show how I use a Git hook to check the email configured in any repository before every commit.
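To give an idea of what such a check can look like, here is a minimal sketch of a pre-commit hook written in Python. It is not necessarily the hook described in the post: the host `git.company.example`, the domain `@company.example`, and the helper names are hypothetical placeholders to adapt to your own setup. The hook reads `user.email` and the `origin` remote URL, and refuses the commit when a company remote is paired with a non-company email.

```python
#!/usr/bin/env python3
"""Sketch of a pre-commit hook: save as .git/hooks/pre-commit and make it executable."""
import subprocess
import sys

# Hypothetical rule: remotes containing this host require an email in this domain.
RULES = {
    "git.company.example": "@company.example",
}


def git(*args):
    """Run a git command and return its trimmed stdout ('' if it fails)."""
    try:
        result = subprocess.run(["git", *args], capture_output=True, text=True)
    except OSError:
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


def email_matches(remote_url, email, rules=RULES):
    """Return True when the email satisfies the first rule matching the remote URL."""
    for host, domain in rules.items():
        if host in remote_url:
            return email.lower().endswith(domain)
    return True  # no rule applies to this remote: do not block the commit


def check_repository():
    """Check the current repository and report a mismatch on stderr."""
    email = git("config", "user.email")
    remote = git("config", "--get", "remote.origin.url")
    if email_matches(remote, email):
        return True
    print(
        f"pre-commit: user.email '{email}' does not match remote '{remote}'",
        file=sys.stderr,
    )
    return False


if __name__ == "__main__":
    # A non-zero exit status makes Git abort the commit.
    if not check_repository():
        sys.exit(1)
```

Keeping the comparison in a pure function (`email_matches`) makes the rule easy to test without touching a real repository. When the hook blocks a commit, running `git config user.email <address>` inside the repository fixes the setting for that repository only.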