Leveling Up Your Elixir Option Handling

With the NimbleOptions library, you can define powerful and flexible argument validation schemas.

When writing libraries in any programming language, it is helpful to validate function arguments as much as possible up front, to avoid spending time doing work that will eventually fail. This is a significant advantage of type systems, since many of the argument validations you might want to perform amount to ensuring that the correct types are passed to your function. With Elixir being dynamically typed (for now), other language idioms are used to achieve argument validation. For example, you might use a combination of multiclause functions, pattern matching, and function guards to achieve a similar outcome to static type checks.

def negate(int) when is_integer(int) do
  -int
end

def negate(bool) when is_boolean(bool) do
  not bool
end

def negate(_) do
  raise "Type must be \"int\" or \"bool\""
end

This is the example that José Valim uses in his recent ElixirConf US talk entitled The Foundations of the Elixir Type System. As you can see, you can use multiclause functions to separate the concerns for each type that the function accepts, and you can include a final catch-all clause that matches every other type and raises an error with an appropriate message. This is a perfectly fine solution for rudimentary argument validation, but once you start needing to validate properties that aren't captured by guards, or to validate keyword options, it can get a bit messier.

You can always elect to do manual keyword validation using Keyword.validate/2, but if you have multiple functions that share similar validation, you might find it repetitive, in which case you might extract those validations into a function. Then you start realizing that you want more powerful validations, and soon enough you decide to extract that logic into its own module. Well, as it turns out, the folks over at Dashbit have already done that with the NimbleOptions package!
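Here is a rough sketch of what that manual approach can look like with Keyword.validate!/2 (the :name and :timeout options are just hypothetical):

defmodule MyLib do
  # Allow only :name and :timeout; :timeout falls back to a default when not given.
  def connect(opts \\ []) do
    opts = Keyword.validate!(opts, [:name, timeout: 5_000])
    # ... do the actual work with the validated opts ...
    opts
  end
end

MyLib.connect(name: "primary")
#=> [name: "primary", timeout: 5000] (key order may vary)

MyLib.connect(timout: 1_000)
#=> raises ArgumentError pointing out the unknown :timout key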

Intro to NimbleOptions

NimbleOptions is described as "a tiny library for validating and documenting high-level options." It allows you to define schemas against which to validate your keyword options, and it raises an appropriate error when a validation fails. It ships with many built-in types for defining schemas, or you can provide custom definitions. It also lets you pair the documentation for an option with the option itself, and it can then conveniently generate the documentation for all of your options. You can reuse schema definitions and compose them as you would any keyword list (since schemas are defined as keyword lists), and you can even compile your schemas ahead of time (assuming you have no runtime-only terms in your definitions).
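Before getting into the real-world use case, here is a minimal sketch of the basic workflow (the :name and :timeout options are again hypothetical, and the exact error wording may differ slightly):

schema = [
  name: [type: :string, required: true, doc: "The name of the resource."],
  timeout: [type: :non_neg_integer, default: 5_000, doc: "Timeout in milliseconds."]
]

{:ok, opts} = NimbleOptions.validate([name: "primary"], schema)
# opts now has the :timeout default filled in, e.g. [name: "primary", timeout: 5000]

NimbleOptions.validate!([name: 42], schema)
# raises NimbleOptions.ValidationError because :name expects a string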

The API is very easy to learn, and the library is small enough that you can feel good about incorporating it into your code base. One other benefit of the library is that it allows you to transform parameters while validating them, which was a real benefit to me when writing bindings to an external API. If I didn't like a particular API parameter, or if one did not make sense in the context of Elixir, then I could change the Elixir-facing API and use NimbleOptions to transform the parameter into the field required by the external API. Let's dive deeper into that real-world use case.

Leveling Up Your Validations

As I was writing EXGBoost, I found that one of the pain points was finding a good way to handle the bevy of parameter validation needed for the library. XGBoost itself accepts many different parameters, and the way some parameters behave can depend on other parameters.

There were several unique considerations I had when writing my parameter validations. For each one, I will explain the problem and show how NimbleOptions helped me solve it.

💡
The code that I will be referencing is available in its totality here, and you can find the documentation that was generated (including that which was generated with NimbleOptions) here.

Custom Validations

As with many machine learning models, there are parameters that must be validated against a real-number range (not an Elixir Range), so I knew those would need repeated custom validation. Think parameters such as regularization terms (alpha, beta), learning rates (eta), etc. One interesting case for XGBoost is the colsample_by* family of parameters. The XGBoost C API treats each one as a separate parameter, but they all share the same validation. These parameters also work cumulatively, since they control tree sampling according to different characteristics, so a valid option could be {'colsample_bytree': 0.5, 'colsample_bylevel': 0.5, 'colsample_bynode': 0.5}, which would reduce 64 features to 8 features at each split. I wanted to simplify this API a bit to

colsample_by: [tree: 0.8, node: 0.8, level: 0.8]

We can do this by taking advantage of a custom type in the definition schema. First, let's write the definition for this parameter:

colsample_by: [
    type: {:custom, EXGBoost.Parameters, :validate_colsample, []},
    doc: """
    This is a family of parameters for subsampling of columns.
    All `colsample_by` parameters have a range of `(0, 1]`, a default value of `1`, and specify the fraction of columns to be subsampled.
    `colsample_by` parameters work cumulatively. For instance, the combination
    `colsample_by: [tree: 0.5, level: 0.5, node: 0.5]` with `64` features will leave `8` features at each split.
      * `:tree` - The subsample ratio of columns when constructing each tree. Subsampling occurs once for every tree constructed. Valid range is (0, 1]. The default value is `1`.
      * `:level` - The subsample ratio of columns for each level. Subsampling occurs once for every new depth level reached in a tree. Columns are subsampled from the set of columns chosen for the current tree. Valid range is (0, 1]. The default value is `1`.
      * `:node` - The subsample ratio of columns for each node (split). Subsampling occurs once every time a new split is evaluated. Columns are subsampled from the set of columns chosen for the current level. Valid range is (0, 1]. The default value is `1`.
    """
  ]

Next, we write the validator. With a custom type in NimbleOptions, your validator must return {:error, reason} or {:ok, value}, where value is the validated (and possibly transformed) option value.

def validate_colsample(x) do
  unless is_list(x) do
    {:error, "Parameter `colsample` must be a list, got #{inspect(x)}"}
  else
    Enum.reduce_while(x, {:ok, []}, fn entry, {_status, acc} ->
      case entry do
        {key, value} when key in [:tree, :level, :node] and is_number(value) ->
          if in_range(value, "(0,1]") do
            # Expand e.g. `tree: 0.8` into the C API's `colsample_bytree: 0.8`
            {:cont, {:ok, [{String.to_atom("colsample_by#{key}"), value} | acc]}}
          else
            {:halt,
             {:error, "Parameter `colsample: #{key}` must be in (0,1], got #{inspect(value)}"}}
          end

        {key, _value} ->
          {:halt,
           {:error,
            "Parameter `colsample` must be in [:tree, :level, :node], got #{inspect(key)}"}}

        _ ->
          {:halt, {:error, "Parameter `colsample` must be a keyword list, got #{inspect(entry)}"}}
      end
    end)
  end
end

And just like that, we can now have all three possible colsample_by* options succinctly under one key, while still adhering to the XGBoost C API. We will touch more on other transformations later.
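To see the transformation in action, here is a quick sketch of calling the validator directly (the exact key order in the result may differ, and it assumes the in_range/2 helper accepts values in (0, 1]):

EXGBoost.Parameters.validate_colsample(tree: 0.8, node: 0.8, level: 0.8)
#=> {:ok, [colsample_bylevel: 0.8, colsample_bynode: 0.8, colsample_bytree: 0.8]}

EXGBoost.Parameters.validate_colsample(tree: 1.5)
#=> {:error, "Parameter `colsample: tree` must be in (0,1], got 1.5"}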

Overridable Configuration Defaults

There are certain parameters that must be passed to the XGBoost API, but we don't necessarily want the user to have to define them on each API call. So instead, we can define a default value that the user can either override globally or override per call. One case of this is the nthread parameter for EXGBoost.train and EXGBoost.predict. It would be tedious for the user to pass the nthread option on each invocation of these high-level APIs, so instead we use the Application.compile_env/3 function to set a default in our NimbleOptions schema definitions.

 nthread: [
      type: :non_neg_integer,
      default: Application.compile_env(:exgboost, :nthread, 0),
      doc: """
      Number of threads to use for training and prediction. If `0`, then the
      number of threads is set to the number of cores.  This can be set globally
      using the `:exgboost` application environment variable `:nthread`
      or on a per booster basis.  If set globally, the value will be used for
      all boosters unless overridden by a specific booster.
      To set the number of threads globally, add the following to your `config.exs`:
      `config :exgboost, nthread: n`.
      """
    ]

With this definition, the highest precedence goes to an nthread option passed directly to the API call. If the user does not provide that option, it falls back to the value they set under the same key in their config.exs file. If that key is not set either, the default of 0 is used, which in this case means using all available cores. Additionally, since we used Application.compile_env/3, this schema still has no runtime-only terms and can thus be compiled ahead of time (which also means that runtime changes to the application environment will not be reflected).
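For completeness, the global override is just an entry in config.exs (the value 4 here is arbitrary), and an nthread option passed directly to EXGBoost.train or EXGBoost.predict still takes precedence:

# config/config.exs
import Config

# Compile-time default picked up by Application.compile_env(:exgboost, :nthread, 0)
config :exgboost, nthread: 4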

Parameter Transformation

The XGBoost C API requires that all parameters be JSON string encoded, which means that many of the parameter names it uses do not translate cleanly into Elixir atoms. For example, all of the Learning Task Parameters objectives use colons (:) to separate the objectives used for different types of models ("reg:squarederror", "binary:logistic", "multi:softmax"). We could just use those strings directly, but that does not feel very "Elixir" to me. So instead I opted to use atoms and simply replace the colons with underscores (:reg_squarederror, :binary_logistic, :multi_softmax), which looks much cleaner within a keyword list. Atoms convey to the user that there is a limited enumeration of valid options, whereas a string conveys a user-defined option.

To achieve this, we write the definition with a custom validation:

objective: [
      type: {:custom, EXGBoost.Parameters, :validate_objective, []},
      default: :reg_squarederror,
      doc: ~S"""
      Specify the learning task and the corresponding learning objective. The objective options are:
        * `:reg_squarederror` - regression with squared loss.
        * `:reg_squaredlogerror` - regression with squared log loss $\frac{1}{2}[\log (pred + 1) - \log (label + 1)]^2$. All input labels are required to be greater than `-1`. Also, see metric rmsle for possible issue with this objective.
        * `:reg_logistic` - logistic regression.
        * `:reg_pseudohubererror` - regression with Pseudo Huber loss, a twice differentiable alternative to absolute loss.
        * `:reg_absoluteerror` - Regression with `L1` error. When tree model is used, leaf value is refreshed after tree construction. If used in distributed training, the leaf value is calculated as the mean value from all workers, which is not guaranteed to be optimal.
        * `:reg_quantileerror` - Quantile loss, also known as pinball loss. See later sections for its parameter and Quantile Regression for a worked example.
        * `:binary_logistic` - logistic regression for binary classification, output probability
        * `:binary_logitraw` - logistic regression for binary classification, output score before logistic transformation
        * `:binary_hinge` - hinge loss for binary classification. This makes predictions of `0` or `1`, rather than producing probabilities.
        * `:count_poisson` - Poisson regression for count data, output mean of Poisson distribution.
            * `max_delta_step` is set to `0.7` by default in Poisson regression (used to safeguard optimization)
        * `:survival_cox` - Cox regression for right censored survival time data (negative values are considered right censored). Note that predictions are returned on the hazard ratio scale (i.e., as `HR = exp(marginal_prediction)` in the proportional hazard function `h(t) = h0(t) * HR`).
        * `:survival_aft` - Accelerated failure time model for censored survival time data. See [Survival Analysis with Accelerated Failure Time](https://xgboost.readthedocs.io/en/latest/tutorials/aft_survival_analysis.html) for details.
        * `:multi_softmax` - set XGBoost to do multiclass classification using the softmax objective; you also need to set num_class (number of classes)
        * `:multi_softprob` - same as softmax, but output a vector of ndata * nclass, which can be further reshaped to ndata * nclass matrix. The result contains predicted probability of each data point belonging to each class.
        * `:rank_ndcg` - Use LambdaMART to perform pair-wise ranking where Normalized Discounted Cumulative Gain (NDCG) is maximized. This objective supports position debiasing for click data.
        * `:rank_map` - Use LambdaMART to perform pair-wise ranking where Mean Average Precision (MAP) is maximized
        * `:rank_pairwise` - Use LambdaRank to perform pair-wise ranking using the ranknet objective.
        * `:reg_gamma` - gamma regression with log-link. Output is a mean of gamma distribution. It might be useful, e.g., for modeling insurance claims severity, or for any outcome that might be gamma-distributed.
        * `:reg_tweedie` - Tweedie regression with log-link. It might be useful, e.g., for modeling total loss in insurance, or for any outcome that might be Tweedie-distributed.
      """
    ]

Then define the validation function itself:

def validate_objective(x) do
  if(
    x in [
      :reg_squarederror,
      :reg_squaredlogerror,
      :reg_logistic,
      :reg_pseudohubererror,
      :reg_absoluteerror,
      :reg_quantileerror,
      :binary_logistic,
      :binary_logitraw,
      :binary_hinge,
      :count_poisson,
      :survival_cox,
      :survival_aft,
      :multi_softmax,
      :multi_softprob,
      :rank_ndcg,
      :rank_map,
      :rank_pairwise,
      :reg_gamma,
      :reg_tweedie
    ],
    do: {:ok, Atom.to_string(x) |> String.replace("_", ":")},
    else:
      {:error,
       "Parameter `objective` must be in [:reg_squarederror, :reg_squaredlogerror, :reg_logistic, :reg_pseudohubererror, :reg_absoluteerror, :reg_quantileerror, :binary_logistic, :binary_logitraw, :binary_hinge, :count_poisson, :survival_cox, :survival_aft, :multi_softmax, :multi_softprob, :rank_ndcg, :rank_map, :rank_pairwise, :reg_gamma, :reg_tweedie], got #{inspect(x)}"}
  )
end
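A quick sanity check shows the atom-to-string transformation (an unrecognized atom returns the long {:error, message} tuple above):

EXGBoost.Parameters.validate_objective(:multi_softmax)
#=> {:ok, "multi:softmax"}

EXGBoost.Parameters.validate_objective(:reg_squaredlogerror)
#=> {:ok, "reg:squaredlogerror"}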

Another situation where these transformations led to a much cleaner, Elixir-styled API is certain evaluation metrics. Again, these are defined in the XGBoost API as strings. Certain options can be suffixed with a - to indicate an inversion of sorts (ndcg vs. ndcg-), other options can be parameterized to indicate a cut-off value (ndcg vs. ndcg@n, where n is an integer), and the two modifiers can even be combined (such as ndcg@n-). While it would be perfectly serviceable to just use strings in these situations, again it felt inelegant compared to the usual elegance of the language. So, as a way of keeping the options as atoms while still allowing these modifications, I changed the API to prefix the atom names with inv_ for metrics that would otherwise get a trailing - in their string form, and to use a 2-tuple for the metrics that would be parameterized with @ in their string counterparts. This leads to an API such as :ndcg, :inv_ndcg, and {:inv_ndcg, n}.

def validate_eval_metric(x) do
    x = if is_list(x), do: x, else: [x]

    metrics =
      Enum.map(x, fn y ->
        case y do
          {task, n} when task in [:error, :ndcg, :map, :tweedie_nloglik] and is_number(n) ->
            task = Atom.to_string(task) |> String.replace("_", "-")
            "#{task}@#{n}"

          {task, n} when task in [:inv_ndcg, :inv_map] and is_number(n) ->
            [task | _tail] = task |> Atom.to_string() |> String.split("_") |> Enum.reverse()
            "#{task}@#{n}-"

          task when task in [:inv_ndcg, :inv_map] ->
            [task | _tail] = task |> Atom.to_string() |> String.split("_") |> Enum.reverse()
            "#{task}-"

          task
          when task in [
                 :rmse,
                 :rmsle,
                 :mae,
                 :mape,
                 :mphe,
                 :logloss,
                 :error,
                 :merror,
                 :mlogloss,
                 :auc,
                 :aucpr,
                 :ndcg,
                 :map,
                 :tweedie_nloglik,
                 :poisson_nloglik,
                 :gamma_nloglik,
                 :cox_nloglik,
                 :gamma_deviance,
                 :aft_nloglik,
                 :interval_regression_accuracy
               ] ->
            Atom.to_string(task) |> String.replace("_", "-")

          _ ->
            raise ArgumentError,
                  "Parameter `eval_metric` must be one of [:rmse, :rmsle, :mae, :mape, :mphe, :logloss, :error, :merror, :mlogloss, :auc, :aucpr, :ndcg, :map, :tweedie_nloglik, :poisson_nloglik, :gamma_nloglik, :cox_nloglik, :gamma_deviance, :aft_nloglik, :interval_regression_accuracy], optionally as an {metric, value} tuple or an :inv_-prefixed variant, got #{inspect(y)}"
        end
      end)

    {:ok, metrics}
  end
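And a quick sketch of the resulting transformation, which just exercises the clauses above:

EXGBoost.Parameters.validate_eval_metric([:rmse, {:ndcg, 3}, :inv_map, {:tweedie_nloglik, 1.5}])
#=> {:ok, ["rmse", "ndcg@3", "map-", "tweedie-nloglik@1.5"]}

# A single metric is wrapped in a list automatically:
EXGBoost.Parameters.validate_eval_metric(:auc)
#=> {:ok, ["auc"]}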

Composability

Since definition schemas are just normal keyword lists, we can compose definitions as we would any other keyword list. For example, XGBoost has a class of booster called a Dart Booster, which is just a Tree Booster with dropout. So, assuming we have our Tree Booster schema defined as @tree_booster_params, we can define our Dart Booster as:

@dart_booster_params @tree_booster_params ++
   [
     sample_type: [
       type: {:in, [:uniform, :weighted]},
       default: :uniform,
       doc: """
       Type of sampling algorithm.
          * `:uniform` - Dropped trees are selected uniformly.
          * `:weighted` - Dropped trees are selected in proportion to weight.
       """
     ],
     normalize_type: [
       type: {:in, [:tree, :forest]},
       default: :tree,
       doc: """
       Type of normalization algorithm.
          * `:tree` - New trees have the same weight of each of dropped trees.
              * Weight of new trees are `1 / (k + learning_rate)`.
              * Dropped trees are scaled by a factor of `k / (k + learning_rate)`.
          * `:forest` - New trees have the same weight of sum of dropped trees (forest).
              * Weight of new trees are `1 / (1 + learning_rate)`.
              * Dropped trees are scaled by a factor of `1 / (1 + learning_rate)`.
       """
     ],
     rate_drop: [
       type: {:custom, EXGBoost.Parameters, :in_range, ["[0,1]"]},
       default: 0.0,
       doc: """
       Dropout rate (a fraction of previous trees to drop during the dropout). Valid range is [0, 1].
       """
     ],
     one_drop: [
       type: {:in, [0, 1]},
       default: 0,
       doc: """
       When this flag is enabled, at least one tree is always dropped during the dropout (allows Binomial-plus-one or epsilon-dropout from the original DART paper).
       """
     ],
     skip_drop: [
       type: {:custom, EXGBoost.Parameters, :in_range, ["[0,1]"]},
       default: 0.0,
       doc: """
       Probability of skipping the dropout procedure during a boosting iteration. Valid range is [0, 1].
          * If a dropout is skipped, new trees are added in the same manner as gbtree.
          * **Note** that non-zero skip_drop has higher priority than rate_drop or one_drop.
       """
     ]
   ]
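Since the composed definition is still an ordinary keyword list with no runtime-only terms, it can also be compiled into a schema ahead of time. A minimal sketch of that (the attribute name matches the schemas used below):

# Compile the raw definition once, at compile time, into a ready-to-use schema.
@dart_booster_schema NimbleOptions.new!(@dart_booster_params)

# Validation later runs against the precompiled schema:
# NimbleOptions.validate!(params, @dart_booster_schema)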

Putting It All Together

XGBoost separates its training and prediction options into four classes of parameters:

* General parameters relate to which booster we are using to do boosting, commonly a tree or linear model
* Booster parameters depend on which booster you have chosen
* Learning task parameters decide on the learning scenario. For example, regression tasks may use different parameters than ranking tasks.
* Command line parameters relate to the behavior of the CLI version of XGBoost

Additionally, there are global parameters, which we will treat as application-level parameters. We can ignore the last class (command line parameters) since it doesn't apply to our use case. Since certain Learning Task Parameters are only valid for certain objectives, we must separate those concerns, similar to how Booster Parameters depend on options from the General Parameters.

Our final validation flow will look like this:

stateDiagram-v2
    G: General Parameters
    Dart: Dart Parameters
    Tree: Tree Booster Parameters
    Linear: Linear Booster Parameters
    Learning: Learning Task Parameters
    Survival: Survival Parameters
    Rank: Rank Parameters
    Reg: Reg Parameters
    Multi: Multi Parameters
    [*] --> G
    G --> Dart: if Booster == dart
    G --> Tree: if Booster == gbtree
    G --> Linear: if Booster == gblinear
    Dart --> Learning
    Tree --> Learning
    Linear --> Learning
    Learning --> Survival: if task == "survival"
    Learning --> Rank: if task == "rank"
    Learning --> Reg: if task == "reg"
    Learning --> Multi: if task == "multi"
    Survival --> [*]
    Rank --> [*]
    Reg --> [*]
    Multi --> [*]
    Learning --> [*]: else

We define a separate validate!/1 function which we will invoke to validate this flow when our API is used. So let's first stub out our function:

def validate!(params) when is_list(params) do
end

Now we get and validate only the general parameters from all of the options passed:

def validate!(params) when is_list(params) do
  general_params =
    Keyword.take(params, Keyword.keys(@general_params))
    |> NimbleOptions.validate!(@general_schema)
end

Now, as a way of giving the user an escape hatch in case they want to directly pass through to the XGBoost C API, forgoing the niceties we're providing, we also add a :validate_parameters boolean option, which defaults to true, but the user can set to false:

  if general_params[:validate_parameters] do
    ...
  else
    params
  end
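The schema entry for this flag isn't shown above, but a minimal sketch of what it could look like is:

validate_parameters: [
  type: :boolean,
  default: true,
  doc: """
  Whether to validate the given parameters against the EXGBoost schemas
  before passing them through to the XGBoost C API.
  """
]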

Now, within the true branch, we gather and validate the booster params:

booster_params =
  case general_params[:booster] do
    :gbtree ->
      Keyword.take(params, Keyword.keys(@tree_booster_params))
      |> NimbleOptions.validate!(@tree_booster_schema)

    :gblinear ->
      Keyword.take(params, Keyword.keys(@linear_booster_params))
      |> NimbleOptions.validate!(@linear_booster_schema)

    :dart ->
      Keyword.take(params, Keyword.keys(@dart_booster_params))
      |> NimbleOptions.validate!(@dart_booster_schema)
  end

Next we gather the Learning Task params:

learning_task_params =
    Keyword.take(params, Keyword.keys(@learning_task_params))
    |> NimbleOptions.validate!(@learning_task_schema)

Now we use the selected :objective option from the learning_task_params to validate the parameters which are objective-dependent:

extra_params =
    case learning_task_params[:objective] do
      "reg:tweedie" ->
        Keyword.take(params, Keyword.keys(@tweedie_params))
        |> NimbleOptions.validate!(@tweedie_schema)

      "reg:pseudohubererror" ->
        Keyword.take(params, Keyword.keys(@pseudohubererror_params))
        |> NimbleOptions.validate!(@pseudohubererror_schema)

      "reg:quantileerror" ->
        Keyword.take(params, Keyword.keys(@quantileerror_params))
        |> NimbleOptions.validate!(@quantileerror_schema)

      "survival:aft" ->
        Keyword.take(params, Keyword.keys(@survival_params))
        |> NimbleOptions.validate!(@survival_schema)

      "rank:ndcg" ->
        Keyword.take(params, Keyword.keys(@ranking_params))
        |> NimbleOptions.validate!(@ranking_schema)

      "rank:map" ->
        Keyword.take(params, Keyword.keys(@ranking_params))
        |> NimbleOptions.validate!(@ranking_schema)

      "rank:pairwise" ->
        Keyword.take(params, Keyword.keys(@ranking_params))
        |> NimbleOptions.validate!(@ranking_schema)

      "multi:softmax" ->
        Keyword.take(params, Keyword.keys(@multi_soft_params))
        |> NimbleOptions.validate!(@multi_soft_schema)

      "multi:softprob" ->
        Keyword.take(params, Keyword.keys(@multi_soft_params))
        |> NimbleOptions.validate!(@multi_soft_schema)

      _ ->
        []
    end

Finally, we return all of the params we gathered:

general_params ++ booster_params ++ learning_task_params ++ extra_params

The final validation function looks like this:

@doc """
  Validates the EXGBoost parameters and returns a keyword list of the validated parameters.
  """
@spec validate!(keyword()) :: keyword()
def validate!(params) when is_list(params) do
  # Get some of the params that other params depend on
  general_params =
    Keyword.take(params, Keyword.keys(@general_params))
    |> NimbleOptions.validate!(@general_schema)
  
  if general_params[:validate_parameters] do
    booster_params =
      case general_params[:booster] do
        :gbtree ->
          Keyword.take(params, Keyword.keys(@tree_booster_params))
          |> NimbleOptions.validate!(@tree_booster_schema)
  
        :gblinear ->
          Keyword.take(params, Keyword.keys(@linear_booster_params))
          |> NimbleOptions.validate!(@linear_booster_schema)
  
        :dart ->
          Keyword.take(params, Keyword.keys(@dart_booster_params))
          |> NimbleOptions.validate!(@dart_booster_schema)
      end
  
    learning_task_params =
      Keyword.take(params, Keyword.keys(@learning_task_params))
      |> NimbleOptions.validate!(@learning_task_schema)
  
    extra_params =
      case learning_task_params[:objective] do
        "reg:tweedie" ->
          Keyword.take(params, Keyword.keys(@tweedie_params))
          |> NimbleOptions.validate!(@tweedie_schema)
  
        "reg:pseudohubererror" ->
          Keyword.take(params, Keyword.keys(@pseudohubererror_params))
          |> NimbleOptions.validate!(@pseudohubererror_schema)
  
        "reg:quantileerror" ->
          Keyword.take(params, Keyword.keys(@quantileerror_params))
          |> NimbleOptions.validate!(@quantileerror_schema)
  
        "survival:aft" ->
          Keyword.take(params, Keyword.keys(@survival_params))
          |> NimbleOptions.validate!(@survival_schema)
  
        "rank:ndcg" ->
          Keyword.take(params, Keyword.keys(@ranking_params))
          |> NimbleOptions.validate!(@ranking_schema)
  
        "rank:map" ->
          Keyword.take(params, Keyword.keys(@ranking_params))
          |> NimbleOptions.validate!(@ranking_schema)
  
        "rank:pairwise" ->
          Keyword.take(params, Keyword.keys(@ranking_params))
          |> NimbleOptions.validate!(@ranking_schema)
  
        "multi:softmax" ->
          Keyword.take(params, Keyword.keys(@multi_soft_params))
          |> NimbleOptions.validate!(@multi_soft_schema)
  
        "multi:softprob" ->
          Keyword.take(params, Keyword.keys(@multi_soft_params))
          |> NimbleOptions.validate!(@multi_soft_schema)
  
        _ ->
          []
      end
  
    general_params ++ booster_params ++ learning_task_params ++ extra_params
  else
    params
  end
end
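Note that since we use the bang variant NimbleOptions.validate!/2 throughout, any invalid option raises a NimbleOptions.ValidationError rather than returning an error tuple. Roughly (the exact message depends on the option and schema):

EXGBoost.Parameters.validate!(booster: :not_a_booster)
# ** (NimbleOptions.ValidationError) invalid value for :booster option:
#    expected one of [:gbtree, :gblinear, :dart], got: :not_a_booster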

Conclusion

I hope I have demonstrated how NimbleOptions can turn some fairly complex validation logic into very manageable and intuitive code. I wanted to share my thoughts on the library since it really worked wonders for my particular use case, which I believe exercised a wide range of validation techniques. With this validation flow, I can have complex sets of parameters such as the following:

params = [
      num_boost_rounds: num_boost_round,
      tree_method: :hist,
      obj: :multi_softprob,
      num_class: num_class,
      eval_metric: [
        :rmse,
        :rmsle,
        :mae,
        :mape,
        :logloss,
        :error,
        :auc,
        :merror,
        :mlogloss,
        :gamma_nloglik,
        :inv_map,
        {:tweedie_nloglik, 1.5},
        {:error, 0.2},
        {:ndcg, 3},
        {:map, 2},
        {:inv_ndcg, 3}
      ],
      max_depth: 3,
      eta: 0.3,
      gamma: 0.1,
      min_child_weight: 1,
      subsample: 0.8,
      colsample_by: [tree: 0.8, node: 0.8, level: 0.8],
      lambda: 1,
      alpha: 0,
      grow_policy: :lossguide,
      max_leaves: 0,
      max_bin: 128,
      predictor: :cpu_predictor,
      num_parallel_tree: 1,
      monotone_constraints: [],
      interaction_constraints: []
    ]

and completely validate it within Elixir (although the XGBoost C API performs its own validation as well).
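For example, running the list above through validate!/1 hands back a keyword list shaped for the C API. A rough sketch of the interesting transformations (assuming :eval_metric in the learning task schema uses the custom validator shown earlier):

validated = EXGBoost.Parameters.validate!(params)

# The single :colsample_by entry comes back expanded into the C API names:
#   colsample_bytree: 0.8, colsample_bylevel: 0.8, colsample_bynode: 0.8
# and the :eval_metric atoms/tuples come back as XGBoost metric strings, e.g.
#   "tweedie-nloglik@1.5", "error@0.2", "ndcg@3", "map@2", "ndcg@3-"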

Finally, when you need to produce documentation for your modules, NimbleOptions makes it incredibly easy. Since all of my documentation was paired with the parameters themselves, my final module documentation just looks like this:

@moduledoc """
  Parameters are used to configure the training process and the booster.

  ## Global Parameters

  You can set the following params either using a global application config (preferred)
  or using the `EXGBoost.set_config/1` function. The global config is set using the `:exgboost` key.
  Note that using the `EXGBoost.set_config/1` function will override the global config for the
  current instance of the application.

  ```elixir
  config :exgboost,
    verbosity: :info,
    use_rmm: true
  ```
  #{NimbleOptions.docs(@global_schema)}

  ## General Parameters
  #{NimbleOptions.docs(@general_schema)}

  ## Tree Booster Parameters
  #{NimbleOptions.docs(@tree_booster_schema)}

  ## Linear Booster Parameters
  #{NimbleOptions.docs(@linear_booster_schema)}

  ## Dart Booster Parameters
  #{NimbleOptions.docs(@dart_booster_schema)}

  ## Learning Task Parameters
  #{NimbleOptions.docs(@learning_task_schema)}

  ## Objective-Specific Parameters

  ### Tweedie Regression Parameters
  #{NimbleOptions.docs(@tweedie_schema)}

  ### Pseudo-Huber Error Parameters
  #{NimbleOptions.docs(@pseudohubererror_schema)}

  ### Quantile Error Parameters
  #{NimbleOptions.docs(@quantileerror_schema)}

  ### Survival Analysis Parameters
  #{NimbleOptions.docs(@survival_schema)}

  ### Ranking Parameters
  #{NimbleOptions.docs(@ranking_schema)}

  ### Multi-Class Classification Parameters
  #{NimbleOptions.docs(@multi_soft_schema)}
  """

Not only does it produce great documentation for users of the library, but it is now logically organized and would be easy to navigate for any future developers or contributors. The use of NimbleOptions makes my code much more self-documenting, which is a great quality to strive for in a code base. I hope I've convinced you to try the library out for yourself. Let me know how it goes!
