Dam in mist. Photo by Paul Mocan on Unsplash

Elixir vs Python for real world AI/ML (Part 2)

Chris Hopkins

12 January 2023 · 7 min read

This is part two of our blog post comparing Elixir and Python when applied to a real world AI/ML problem. If you haven’t read the first part yet, check it out before continuing.

Just to recap, we're trying to train a model to predict the water level of a dam. In part one we fetched the data from an API, cleaned it and combined it into a single DataFrame. In part two we’re going to build, train and evaluate a neural network using both Python and Elixir. Then we’ll take a look back at the Elixir and Python implementations and compare and contrast the two ecosystems.

Prepare data for the model

We have our data neatly in one DataFrame (df in the code below). Now we need to sort it into:

  • Training data
  • Testing data
  • Validation data (just in the Elixir implementation)

Each of these categories is subdivided further into:

  • Features (input to the model)
  • Labels (expected output of the model)

We want 80% of our data for training and 20% for testing. The training data will be split further so that 20% of the training data is used for validation while training. The features will be the rainfall and stream level data. The labels will be the water level of the dam.
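
To make those proportions concrete, here is a small Python sketch of the arithmetic. It assumes the 4,640 rows that appear later in this post (`row_no = 4640`) and the same `round(decimal * row_no)` rounding the Elixir helper uses; the function name `split_sizes` is made up for illustration.

```python
def split_sizes(n_rows, train_fraction=0.8):
    """Return (train, validation, test) row counts for a nested 80/20 split."""
    # First cut: rows available for training (including validation) vs testing.
    n_train_total = round(train_fraction * n_rows)
    n_test = n_rows - n_train_total
    # Second cut: the same fraction again, this time training vs validation.
    n_train = round(train_fraction * n_train_total)
    n_validation = n_train_total - n_train
    return n_train, n_validation, n_test


print(split_sizes(4640))  # (2970, 742, 928)
```

Under those assumptions the model trains on 2,970 rows, validates on 742 and is tested on 928.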

Here we’ll be using the Elixir library Nx, a multi-dimensional tensor library. The real beauty of it is that the tensors can target three backends: native Elixir code, Google XLA (the same backend as TensorFlow) and LibTorch (the backend of PyTorch). This lets your neural networks achieve performance similar to PyTorch and TensorFlow. See our introductory post about Nx.

Here’s the Python code:

feature_columns = [
  "v_568051_10", "v_568045_10", "v_563079_10",
  "v_563046_10", "v_563035_10", "v_212250_100",
]

label_column = "water_level_difference"
model_columns = feature_columns + [label_column]

train_data = df.copy()[model_columns].sample(frac=0.8, random_state=12345)
test_data = df.copy()[model_columns].drop(train_data.index)

train_features = train_data.copy()
test_features = test_data.copy()

train_labels = train_features.pop("water_level_difference").values.reshape(-1, 1)
test_labels = test_features.pop("water_level_difference").values.reshape(-1, 1)

And the Elixir code:

# snippet of the helper module
defmodule Helper do
  alias Explorer.DataFrame, as: DF

  # Shuffle the rows, then split them at the given fraction.
  def split_data(df, decimal) do
    row_no = DF.n_rows(df)

    {first, second} =
      df
      |> DF.to_rows()
      |> Enum.shuffle()
      |> Enum.split(round(decimal * row_no))

    {DF.new(first), DF.new(second)}
  end

  # Zip feature batches with label batches, the format Axon expects.
  def df_to_batches(df, feature_cols, label_col, batch_size \\ 1) do
    Stream.zip(
      df_to_tensor_batches(df[feature_cols], batch_size),
      df_to_tensor_batches(df[[label_col]], batch_size)
    )
  end

  def df_to_tensor_batches(df, batch_size) do
    df
    |> df_to_tensor()
    |> Nx.shuffle(axis: 0)
    |> Nx.to_batched(batch_size, leftover: :discard)
  end
end

feature_columns = [
  "v_568051_10", "v_568045_10", "v_563079_10",
  "v_563046_10", "v_563035_10", "v_212250_100"
]

label_column = "water_level_difference"
model_columns = [label_column | feature_columns]

df = DF.select(df, model_columns)

{train_df, test_df} = Helper.split_data(df, 0.8)
{train_df, validation_df} = Helper.split_data(train_df, 0.8)

training_batches = Helper.df_to_batches(train_df, feature_columns, label_column, 8)
validation_batches = Helper.df_to_batches(validation_df, feature_columns, label_column, 1)
testing_batches = Helper.df_to_batches(test_df, feature_columns, label_column, 1)

As you can see, the Elixir code is a little more involved. Axon (Elixir’s neural network library) is choosy about the format it receives the data in: it won’t take an Explorer DataFrame, only zipped batches of Nx tensors. However, there isn’t an obvious and optimal way to convert an Explorer DataFrame into an Nx tensor. The same is true for splitting the data - there’s no out-of-the-box way of doing it.
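
For readers more at home in Python, the shape of that glue code can be sketched with nothing but the standard library. This is only an illustration of the idea (shuffle, split, then pair feature batches with label batches), not a translation of the helper module: plain lists stand in for DataFrames and tensors, and the names `split_rows`, `to_batches`, `rain`, `stream` and `level` are all hypothetical.

```python
import random


def split_rows(rows, fraction, seed=None):
    """Shuffle a copy of the rows and split it in two at the given fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def to_batches(rows, feature_cols, label_col, batch_size=1):
    """Build (features, labels) batches, discarding any incomplete final batch."""
    batches = []
    n_full = len(rows) // batch_size * batch_size
    for start in range(0, n_full, batch_size):
        chunk = rows[start:start + batch_size]
        features = [[row[col] for col in feature_cols] for row in chunk]
        labels = [[row[label_col]] for row in chunk]
        batches.append((features, labels))
    return batches


# Toy data: ten rows with two feature columns and one label column.
rows = [{"rain": float(i), "stream": float(i) * 2, "level": float(i) / 10} for i in range(10)]
train, test = split_rows(rows, 0.8, seed=12345)
batches = to_batches(train, ["rain", "stream"], "level", batch_size=3)
print(len(train), len(test), len(batches))  # 8 2 2
```

One design choice worth calling out: the sketch cuts features and labels from the same row in the same pass, so every label stays attached to the features it belongs to.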

Building the model

Now that we have all our data in the right format, we are ready to build our neural network. Funnily enough, this is actually the easiest part. In our Python version we will use TensorFlow and in Elixir we’re going to use Axon.

Axon is a neural network library built completely on top of Nx. Axon has a bunch of sensible APIs that are simple enough for a beginner but flexible enough that an expert can easily do just what they want.

Let’s take a look at the TensorFlow model in Python first, then compare with Elixir.


normaliser = tf.keras.layers.Normalization(axis=1)

test_model = tf.keras.Sequential(name="stream_and_rain_model", layers=[
    normaliser,
    layers.Dropout(rate=0.5),
    layers.Dense(units=16, activation="relu"),
    layers.Dropout(rate=0.5),
    layers.Dense(units=1),
])


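model =
  # the input name is illustrative; one value per feature column
  Axon.input("features", shape: {nil, length(feature_columns)})
  |> Axon.dropout(rate: 0.5)
  |> Axon.dense(16)
  |> Axon.relu()
  |> Axon.dropout(rate: 0.5)
  |> Axon.dense(1)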

You can see that the two are remarkably similar. The only major difference is that the Python model has a normalising layer. There isn’t a layer for this in Axon yet, as it’s simple enough to do the transformation yourself before the features are input to the neural network. The layer would be a nice quality of life improvement though.
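
Doing the transformation yourself could look roughly like the sketch below: learn a mean and standard deviation per feature column from the training rows, then rescale every row with them. It is written in plain Python to keep it self-contained, the helper names are hypothetical, and the `epsilon` guard against constant columns is my own assumption rather than something taken from either library.

```python
import math


def fit_normaliser(rows):
    """Compute the per-column mean and (population) standard deviation."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(col) / n for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n)
        for col, m in zip(columns, means)
    ]
    return means, stds


def normalise(rows, means, stds, epsilon=1e-7):
    """Rescale each column to zero mean and unit variance."""
    return [
        [(x - m) / max(s, epsilon) for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


train_rows = [[1.0, 10.0], [3.0, 30.0]]
means, stds = fit_normaliser(train_rows)
print(normalise(train_rows, means, stds))  # [[-1.0, -1.0], [1.0, 1.0]]
```

The statistics should come from the training rows only and then be reused unchanged for the validation and test rows, so that no information about held-out data leaks into training.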




history = test_model.fit(
    train_features,
    train_labels,
    # Suppress logging.
    verbose=0,
    # Calculate validation results on 20% of the training data.
    validation_split = 0.2)


model_params =
  model
  |> Axon.Loop.trainer(:mean_absolute_error, Axon.Optimizers.adam(0.001))
  |> Axon.Loop.validate(model, validation_batches)
  |> Axon.Loop.metric(:mean_absolute_error, "validation_loss")
  |> Axon.Loop.run(training_batches, %{}, epochs: 30)

Again, the semantics are almost exactly the same. Python does a little more for you here - remember how we didn’t need to make a validation_batches variable? That’s because we can simply tell Python to validate while training for us.
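
For context, Keras documents `validation_split` as holding out the last fraction of the samples you pass to `fit`, taken before any shuffling. A minimal plain-Python sketch of that behaviour follows. The helper name `holdout_last` is hypothetical, and rounding the cut point down is my reading of the Keras implementation, so treat it as an assumption.

```python
import math


def holdout_last(samples, validation_split):
    """Split samples into (training, validation), taking validation from the end."""
    split_at = math.floor(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


training, validation = holdout_last(list(range(10)), 0.2)
print(training)    # [0, 1, 2, 3, 4, 5, 6, 7]
print(validation)  # [8, 9]
```

Because the held-out rows always come from the end, it matters that the training data was already shuffled, which `sample(frac=0.8, ...)` takes care of in the Python code earlier in this post.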


Both models are training well: we can see the reduction in loss and validation loss over the epochs. While Axon does print this information in the terminal, there wasn’t a clear way to extract it, so the chart below is from the Python code.

Here is the Python code:

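hist_df = pd.DataFrame(history.history)
hist_df["epoch"] = history.epoch
hist_df = hist_df.rename(
    columns={"loss":"training_loss", "val_loss":"validation_loss"},
)

alt.Chart(hist_df).transform_fold(
    fold=['training_loss', 'validation_loss'],
    as_=['variable', 'loss']
).mark_line().encode(x="epoch:Q", y="loss:Q", color="variable:N")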

It produces:

Training results

We can clearly see the training loss and the validation loss going down.

We can use our test data to evaluate our model.


test_model.evaluate(test_features, test_labels)
# 29/29 [==============================] - 0s 1ms/step - loss: 0.0208


model
|> Axon.Loop.evaluator()
|> Axon.Loop.metric(:mean_absolute_error)
|> Axon.Loop.run(testing_batches, model_params, epochs: 1)
# Batch: 927, mean_absolute_error: 0.0245324

These loss values of 0.0208 and 0.0245 mean that, on average, the model is within approximately 0.02 m of the real change in dam water level. Not too bad.
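
If the metric name is unfamiliar, mean absolute error is simply the average size of the miss, ignoring its direction. Here is a tiny worked example in plain Python; the numbers are invented for illustration and are not taken from the dam data.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all samples."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical daily changes in water level, in metres.
actual = [0.10, -0.05, 0.00, 0.30]
predicted = [0.08, -0.02, 0.01, 0.26]

# The individual misses are 0.02, 0.03, 0.01 and 0.04, so the mean is 0.025.
mae = mean_absolute_error(actual, predicted)
print(round(mae, 3))  # 0.025
```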

We can observe this visually by feeding the model all of the feature data, and then comparing the predictions of the change in water level with the real change in water level.

In Python we are using Altair (a Vega-Lite binding). In Elixir we’re using VegaLite (an Elixir wrapper around Vega-Lite), so it produces near identical graphs.

Here is the Python code for that:

y = test_model.predict(df[feature_columns])

compare_df = pd.DataFrame({
    "t": df[["t"]].values.flatten(),
    "actual": df[["water_level_difference"]].values.flatten(),
    "prediction": y.flatten()
})

base = alt.Chart(compare_df.reset_index()[0:300]).encode(
    x="index:Q"
)

base.mark_line(color="blue").encode(
    y="actual:Q"
) + base.mark_line(color="orange").encode(
    y="prediction:Q"
)

And the Elixir code:

defmodule Helper do
  # Stack each column's tensor side by side into one {rows, columns} tensor.
  def df_to_tensor(df) do
    df
    |> DF.names()
    |> Enum.map(&Explorer.Series.to_tensor(df[&1]))
    |> Nx.stack(axis: 1)
  end
end


all_features = Helper.df_to_tensor(df[feature_columns])

all_labels =
  df[label_column]
  |> Series.to_tensor()
  |> Nx.to_flat_list()

{_init_fn, predict_fn} = Axon.build(model, mode: :inference)

predictions =
  predict_fn.(model_params, all_features)
  |> Nx.to_flat_list()

row_no = 4640

chart_data =
  DataFrame.new(
    prediction: predictions,
    actual: all_labels,
    count: Enum.map(1..row_no, & &1)
  )
  |> DataFrame.to_rows()

Vl.new(width: 400, height: 400)
|> Vl.data_from_values(Enum.take(chart_data, 300))
|> Vl.layers([
  Vl.new()
  |> Vl.param("prediction_chart", select: :interval, bind: :scales, encodings: ["x", "y"])
  |> Vl.encode(:y, field: "prediction", type: :quantitative)
  |> Vl.encode(:x, field: "count", type: :quantitative)
  |> Vl.mark(:line, color: "orange"),
  Vl.new()
  |> Vl.encode(:x, field: "count", type: :quantitative)
  |> Vl.encode(:y, field: "actual", type: :quantitative)
  |> Vl.mark(:line, color: "blue")
])

This produces:

Actual water level vs predicted

The orange line is the model’s prediction, the blue line is the ground truth. We can see that it seems to predict large spikes quite well, but not necessarily their magnitude. The model seems to have a slight systematic bias towards predicting lower values than reality. I think this may be because our dataset is incomplete as we don’t have data about the outflow of the dam.
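
A bias like this can be measured as well as eyeballed. Mean absolute error hides the direction of the miss, so a simple check is the mean signed error (prediction minus actual): a negative value means the model under-predicts on average. The sketch below uses invented numbers purely to show the calculation.

```python
def mean_signed_error(actual, predicted):
    """Average of (predicted - actual); negative means under-prediction."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values where every prediction is a little too low.
actual = [0.10, 0.20, 0.30]
predicted = [0.08, 0.17, 0.29]

bias = mean_signed_error(actual, predicted)
print(round(bias, 3))  # -0.02
```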

I think for a model where the input data is only from the day before and it’s predicting the next day, it’s performing remarkably well. This model could be improved, of course, with more data sources but also by inputting multiple days’ worth of data at once. The purpose of this post was to compare Python and Elixir on a real problem, so that’s outside its scope. If I’ve piqued your interest you can see the repo (including a multi-day model in Python so the accuracy is better).
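
To give a flavour of what "multiple days at once" means for the data, here is one way to build such inputs in plain Python: slide a window over the daily feature rows and flatten each window into a single input vector. This is a sketch of the general technique only, not the approach used in the repo, and the function name `make_windows` is hypothetical.

```python
def make_windows(features, labels, days):
    """Pair each run of `days` consecutive feature rows with the label of its last day."""
    samples = []
    for last in range(days - 1, len(features)):
        window = features[last - days + 1:last + 1]
        # Flatten the window so the network sees one long feature vector.
        flat = [value for row in window for value in row]
        samples.append((flat, labels[last]))
    return samples


daily_features = [[1, 10], [2, 20], [3, 30], [4, 40]]
daily_labels = [0.1, 0.2, 0.3, 0.4]

samples = make_windows(daily_features, daily_labels, days=2)
print(samples[0])  # ([1, 10, 2, 20], 0.2)
```

Note that the first `days - 1` rows have no complete window behind them, so the dataset shrinks slightly as the window grows.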

Elixir is punching way above its weight

I was not expecting Elixir to be better than Python for data science, and right now it isn’t. The Python data science ecosystem is huge and has had a vast amount of money poured into it by big tech. However, it’s important to acknowledge just how far Elixir’s data science ecosystem has come in the past two years. It’s gone from zero to being a totally viable option for a data science project. It’s done this by standing on the shoulders of giants, taking all the best bits from the whole ecosystem and packaging them together in a simple but flexible way.

The Elixir data science ecosystem has early adopter challenges

Being an early adopter in a young ecosystem will always be harder than participating in a mainstream one. The Elixir data science ecosystem is no different. While you’ll find lots of excitement online, you might not be able to find a blog or forum post with the information you need to do ‘X’, or how to get rid of error ‘Y’. Fortunately a lot of love has been given to the documentation on these projects - practically everything is documented. The quick start guides and working examples are especially helpful.

Another niggle I’ve found is that there is some ‘glue’ missing between the libraries within the ecosystem. For example, why is there no function to turn an Explorer data frame into a 2D Nx tensor? It seems like an obvious omission. And why doesn’t some kind of test_train_split/2 function exist within Axon? These are just a couple of example friction points and there may be good reasons for them, but I did find myself having to build my own, probably inefficient, solutions for something that should already exist.

It’s not a deal breaker; it’s the side effect of a young and exciting ecosystem. But the lack of small developer conveniences does break your flow when working on data engineering problems. As the community and ecosystem mature we should see the developer experience improve.

It feels better to program in Elixir

I may well be biased, but it felt way better programming in Elixir. I found the code easier to reason about and more understandable.

The main reason for this is that the Elixir was far more readable. Just look at this example:

# python
water_level_df = water_level_df[~water_level_df["q"].isin([201, 255])]

# elixir
water_level_df = DF.filter(water_level_df, q != 201 and q != 255)

Both of these are one-liners, but the Elixir is much more self-explanatory. Even a junior developer would know exactly what the code was doing without prior knowledge.

I also found that I was never looking over my shoulder, as I was in Python, wondering whether a function returned a reference to an object or whether the data was copied and then modified. I found this especially with pandas, and it actually caused a couple of pesky bugs. These bugs encourage you to defensively .copy() your data, which isn’t good for performance. In contrast, in Elixir I knew that I was always passing an immutable object. It removed a whole category of errors from the project.
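
The same trap exists in plain Python, without pandas involved at all. The sketch below is an analogy rather than a reproduction of the bugs I hit: two hypothetical helpers filter out flagged readings, one by mutating the list it was given and one by building a new list, and from the call site they look identical.

```python
def remove_flagged_in_place(readings, bad_codes):
    """Filter by mutating the list the caller passed in."""
    readings[:] = [r for r in readings if r["q"] not in bad_codes]
    return readings


def remove_flagged(readings, bad_codes):
    """Filter by building a new list; the caller's data is left untouched."""
    return [r for r in readings if r["q"] not in bad_codes]


raw = [{"q": 1, "v": 2.5}, {"q": 201, "v": 0.0}, {"q": 255, "v": 0.0}]

cleaned = remove_flagged(raw, {201, 255})
rows_after_copying = len(raw)   # still 3: raw was not modified

aliased = remove_flagged_in_place(raw, {201, 255})
rows_after_mutating = len(raw)  # now 1: raw changed underneath us

print(rows_after_copying, rows_after_mutating, aliased is raw)  # 3 1 True
```

Nothing in the two call sites tells you which behaviour you are getting, which is exactly the uncertainty that immutable data removes.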


All in all this was a rewarding experience. I’d still say for now, most of the time, Python is a safe option. From the outset you know it can do whatever you need it to. But if you know it can be done with the current Elixir tooling, I’d go for it - Elixir is much more enjoyable to work in. Where I think the Elixir data science tools can really shine is when you already have an Elixir codebase (e.g. a web app) and you want to incorporate some data driven feature (e.g. recommendations). This is now viable to do in-app without having the burden of spinning off a Python microservice to process your data.

The Elixir data science tooling and community are still very young. Two years ago they were practically non-existent and now Elixir is a viable and performant option. Python has had a ten year head start, but if Elixir carries on at this rate I may be telling you it is the clear winner before too long.

Chris Hopkins

Chris is a full-stack software developer, with a keen interest in writing reliable, fault tolerant and resilient software. He has in-depth experience with load testing, automated testing and observability for deep application introspection. He's a big fan of functional and actor model programming utilising Test Driven Development.
