A step-by-step guide to deploying Amundsen on AWS

Savio
Feb 11, 2021

Why use Amundsen?

Amundsen is a data discovery and metadata platform. It catalogs your data and its metadata, lets you automate that cataloging, and provides Google-style search with PageRank-style ranking of results.

Amundsen Architecture

Amundsen consists of five main components.

  1. Frontend service: the user interaction portal, built with Flask.
  2. Metadata service: a proxy layer for interacting with the graph database.
  3. Neo4j: the back-end graph database that stores all the metadata extracted from different sources.
  4. Databuilder: a data ingestion library (not a long-running service) for building ETL jobs that insert, update and/or delete metadata; Apache Airflow can be used to orchestrate the Databuilder jobs.
  5. Search service: a proxy layer for the search back end, which is powered by Elasticsearch.
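To make the component list concrete, here is a small hypothetical helper that records the host port each container publishes in the upstream docker-amundsen.yml (5000 frontend, 5001 search, 5002 metadata, 7474/7687 Neo4j, 9200 Elasticsearch). These values are assumptions based on the default file, so check the `ports:` entries in your copy. Databuilder has no entry because it is a library you run as a job, not a container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: default host ports published by the upstream
# docker-amundsen.yml (assumed; verify against the "ports:" in your copy).
amundsen_port() {
  case "$1" in
    frontend)      echo 5000 ;;  # Flask UI you open in the browser
    search)        echo 5001 ;;  # search proxy in front of Elasticsearch
    metadata)      echo 5002 ;;  # metadata proxy in front of Neo4j
    neo4j)         echo 7474 ;;  # Neo4j browser/HTTP (Bolt listens on 7687)
    elasticsearch) echo 9200 ;;  # Elasticsearch REST API
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Print every container-backed component with its port.
for svc in frontend search metadata neo4j elasticsearch; do
  printf '%-14s %s\n' "$svc" "$(amundsen_port "$svc")"
done
```

These are also the ports you would open in a security group if you want to reach a service from outside the instance.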

Starting the setup

After this short introduction, I will demonstrate an easy way to provision Amundsen on AWS using an EC2 instance and Docker.

In the EC2 console, launch a new Linux instance (I used a t2.medium) and save your .pem key in a safe place; we will need it to access the instance via SSH.
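If you prefer the command line to the console, the same step can be sketched with the AWS CLI. This assumes the CLI is installed with credentials and a default region configured. `KEY_NAME` and `AMI_ID` are placeholders (pick a Linux AMI for your region), and with no security group given the instance uses the default one. Nothing runs until you call the functions.

```shell
#!/usr/bin/env bash
# Sketch of the console steps with the AWS CLI (assumes configured credentials).
KEY_NAME="${KEY_NAME:-amundsen-key}"       # hypothetical key-pair name
AMI_ID="${AMI_ID:-ami-xxxxxxxxxxxxxxxxx}"  # placeholder: a Linux AMI in your region

# SSH refuses private keys that other users can read, so lock the file down.
secure_key() {
  chmod 400 "$1"
}

# Create a key pair and save its private key as <name>.pem.
create_key_pair() {
  aws ec2 create-key-pair --key-name "$KEY_NAME" \
    --query 'KeyMaterial' --output text > "$KEY_NAME.pem"
  secure_key "$KEY_NAME.pem"
}

# Launch one t2.medium instance and print its instance ID.
launch_instance() {
  aws ec2 run-instances --image-id "$AMI_ID" --instance-type t2.medium \
    --key-name "$KEY_NAME" --count 1 \
    --query 'Instances[0].InstanceId' --output text
}

# Usage: create_key_pair && launch_instance
```

Running `chmod 400` on a key downloaded from the console is worthwhile too, since SSH rejects keys with open permissions.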

Starting a Linux EC2 instance

In the EC2 console, select your newly created instance, click Connect, choose the SSH client option and copy the example connection command. It should look something like this:

$ ssh -i "yourkey.pem" ec2-user@ec2-00-000-00-00.compute-1.amazonaws.com

Open a terminal, navigate to the directory where the .pem key is located and run the SSH connection command.
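The host name in that command is derived from the instance's public IPv4 address. A minimal sketch of that derivation, assuming the usual EC2 public DNS pattern (`compute-1.amazonaws.com` in us-east-1, `<region>.compute.amazonaws.com` elsewhere) and the `ec2-user` login of Amazon Linux AMIs; `ec2_public_dns` and `ec2_connect` are hypothetical names:

```shell
#!/usr/bin/env bash
# Build the EC2 public DNS name from a public IPv4 address (assumed pattern).
ec2_public_dns() {
  local ip="$1" region="${2:-us-east-1}"
  local dashed="${ip//./-}"   # 54.12.3.4 -> 54-12-3-4
  if [ "$region" = "us-east-1" ]; then
    echo "ec2-${dashed}.compute-1.amazonaws.com"
  else
    echo "ec2-${dashed}.${region}.compute.amazonaws.com"
  fi
}

# Connect as ec2-user using the key pair saved at launch time.
ec2_connect() {
  local key="$1" ip="$2" region="${3:-us-east-1}"
  ssh -i "$key" "ec2-user@$(ec2_public_dns "$ip" "$region")"
}

# Usage: ec2_connect yourkey.pem 54.12.3.4 us-east-1
```

Connecting with `ec2-user@<public-ip>` directly works just as well; the DNS form only mirrors what the console shows.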

SSH connection with EC2

Transferring Amundsen files to EC2

First, let’s update our instance using the command:

$ sudo yum update -y

There are two ways to get the Amundsen files onto the instance: copy the Amundsen .zip from your machine with SCP, or download it directly on the EC2 instance. In this tutorial we copy it from our machine. Place the .zip and the .pem key in the same folder, open another terminal window, navigate to that folder and run the following command:

$ scp -i /folder/yourkey.pem /folder/amundsen_files.zip ec2-user@ec2_ip:/destination_folder

If unzip is not installed on the instance, install it with the following command:

$ sudo yum install unzip

Connect to the instance via SSH and unzip the file you copied with the following command:

$ unzip /destination_folder/amundsen_files.zip
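The copy, unzip install and extraction steps can be combined into one helper run from your own machine. This is a minimal sketch: `deploy_zip` and `run` are hypothetical names, the key, zip, host and destination arguments are placeholders, and setting `DRY_RUN=1` prints the commands instead of executing them. It assumes the destination directory already exists and is writable by `ec2-user` (your home directory, for example).

```shell
#!/usr/bin/env bash
# Execute a command, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy the .zip to the instance, then unzip it there in a single SSH call.
deploy_zip() {
  local key="$1" zip="$2" host="$3" dest="$4"
  run scp -i "$key" "$zip" "ec2-user@${host}:${dest}/"
  # Install unzip only if it is missing, then extract into the destination.
  run ssh -i "$key" "ec2-user@${host}" \
    "command -v unzip >/dev/null || sudo yum install -y unzip; unzip -o ${dest}/$(basename "$zip") -d ${dest}"
}

# Usage: deploy_zip yourkey.pem amundsen_files.zip 54.12.3.4 /home/ec2-user
```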

Install Docker and docker-compose on the instance with the following commands:

$ sudo yum install docker -y
$ sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
$ sudo chmod +x /usr/local/bin/docker-compose
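Before moving on, it is worth confirming that both binaries are on the PATH. A small hypothetical check (`check_tools`), assuming only that each tool accepts `--version`; with no arguments it checks docker and docker-compose:

```shell
#!/usr/bin/env bash
# Report whether each tool is on the PATH, with its version line.
check_tools() {
  local missing=0 tool
  [ "$#" -eq 0 ] && set -- docker docker-compose
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok: $tool ($("$tool" --version 2>&1 | head -n 1))"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"   # non-zero if any tool is missing
}

# Usage on the instance: check_tools
```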

Starting Amundsen with Docker on EC2

Start Docker and raise the kernel's limit on memory-mapped areas (Elasticsearch requires vm.max_map_count to be at least 262144) with the following commands:

$ sudo service docker start
$ sudo sysctl -w vm.max_map_count=262144
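A setting applied with `sysctl -w` is lost on reboot, and the `docker`/`docker-compose` commands that follow run without sudo, which only works if `ec2-user` belongs to the `docker` group. A minimal sketch covering both, assuming Amazon Linux with `/etc/sysctl.conf`; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Elasticsearch's bootstrap check needs vm.max_map_count >= 262144.
REQUIRED_MAP_COUNT=262144

# $1: value to check; defaults to the live kernel setting on Linux.
map_count_ok() {
  local current="${1:-$(cat /proc/sys/vm/max_map_count)}"
  [ "$current" -ge "$REQUIRED_MAP_COUNT" ]
}

# Persist the setting so it survives a reboot, then reload sysctl.conf.
persist_map_count() {
  echo "vm.max_map_count=$REQUIRED_MAP_COUNT" | sudo tee -a /etc/sysctl.conf
  sudo sysctl -p
}

# Let ec2-user run docker without sudo; log out and back in afterwards.
allow_docker_without_sudo() {
  sudo usermod -a -G docker ec2-user
}

# Usage: map_count_ok || persist_map_count; allow_docker_without_sudo
```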

Navigate to the folder containing the “docker-amundsen.yml” file and run the following command to start Amundsen with Docker:

$ docker-compose -f docker-amundsen.yml up
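Instead of watching the logs, you can start the stack in the background with `up -d` and poll until the frontend port accepts connections. A minimal sketch using bash's `/dev/tcp` redirection; `wait_for_port` is a hypothetical name and 5000 is assumed to be the frontend port from the default docker-amundsen.yml. An open frontend port does not prove the search and metadata services are ready, so you can poll 5001 and 5002 the same way.

```shell
#!/usr/bin/env bash
# Poll host:port every 5 seconds until it accepts a TCP connection or the
# timeout (in seconds, default 300) expires.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-300}" waited=0
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "${host}:${port} is accepting connections"
}

# Usage:
#   docker-compose -f docker-amundsen.yml up -d
#   wait_for_port localhost 5000
```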

Wait until all the services have started, then run:

$ docker ps

You should see five containers listed with their respective IPs and ports. You can then open the frontend in a browser using your instance's public IPv4 address and the frontend port (5000 in the default docker-amundsen.yml). If the page does not load, open the frontend port in the instance's security group in the EC2 console and try again.
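Opening the port can also be scripted. This sketch allows the frontend port only from your current public IP, which is safer than opening it to everyone. It assumes the AWS CLI is configured, that `curl` can reach https://checkip.amazonaws.com, and that the frontend listens on 5000; `SG_ID`-style arguments such as `sg-0123456789abcdef0` are placeholders for your instance's security group ID.

```shell
#!/usr/bin/env bash
FRONTEND_PORT=5000  # assumed default frontend port

# URL to open in the browser for a given public IPv4 address.
frontend_url() {
  echo "http://$1:${FRONTEND_PORT}"
}

# Allow inbound TCP on the frontend port from your current public IP only.
allow_my_ip() {
  local sg_id="$1" my_ip
  my_ip="$(curl -s https://checkip.amazonaws.com)"
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port "$FRONTEND_PORT" --cidr "${my_ip}/32"
}

# Compact view of running containers and their published ports.
show_containers() {
  docker ps --format 'table {{.Names}}\t{{.Status}}\t{{.Ports}}'
}

# Usage: allow_my_ip sg-0123456789abcdef0 && frontend_url 54.12.3.4
```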

docker ps command
Amundsen front-end

Final considerations

This article focused on how to provision Amundsen on AWS in the simplest possible way. In my next post, I will show how to configure the instance to start the Amundsen containers automatically when the EC2 instance boots, and how to run a simple Databuilder job that extracts Redshift metadata and automate it with a Linux cron job.


Savio

Data engineer, passionate about the world of data