# Node Reputation Scoring

Each incoming web transaction will be graded on the following characteristics to ultimately assign each Grass node a reputation score.

**Completeness**: This aspect evaluates whether the data is whole or if there are missing elements. It assesses if the data set includes all the necessary data points for the intended use case.

**Consistency**: This dimension checks for uniformity in the data across different data sets or within the same data set over time.

**Timeliness**: This measures whether the data is up-to-date and available when needed.

**Availability**: This assesses the degree to which the data from each node can be made available.

To turn these characteristics into a single reputation score for each node, we define a formula that combines all four dimensions. Let $R$ denote the reputation score, and denote the characteristics as follows:

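$C$ for Completeness

$N$ for Consistency

$T$ for Timeliness

$A$ for Availability

The reputation score is then the weighted sum:

$$R = w_C \cdot C + w_N \cdot N + w_T \cdot T + w_A \cdot A$$

Where:

$w_C$, $w_N$, $w_T$, and $w_A$ are the weights assigned to Completeness, Consistency, Timeliness, and Availability, respectively, and are used to normalize the reputation score.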

Each variable described:

$R$: Reputation Score - The final score assigned to each node, calculated based on the weighted sum of the four characteristics.

$C$: Completeness - Evaluates if the data set is complete, including all necessary data points.

$N$: Consistency - Checks for uniformity in the data, ensuring consistency across different sets or over time.

$T$: Timeliness - Measures if the data is current when needed.

$A$: Availability - Assesses the likelihood that data will be available from each node.

$w_C, w_N, w_T, w_A$: Weights - These are the relative importance assigned to each characteristic, determining how much each aspect influences the final reputation score.
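As an illustration of the weighted sum described above, the following is a minimal Python sketch. It assumes each characteristic is already expressed as a number in the range 0 to 1 and that the weights sum to 1; neither the scale nor the names `Weights` and `reputation_score` come from the Grass specification, they are hypothetical and used here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Hypothetical container for w_C, w_N, w_T and w_A."""
    completeness: float
    consistency: float
    timeliness: float
    availability: float


def reputation_score(c: float, n: float, t: float, a: float, w: Weights) -> float:
    """Return R as the weighted sum of C, N, T and A.

    Assumption: c, n, t and a are each scored in [0, 1] and the weights
    sum to 1, so R also falls in [0, 1].
    """
    return (
        w.completeness * c
        + w.consistency * n
        + w.timeliness * t
        + w.availability * a
    )


# Illustrative values only; these are not real node measurements.
example_weights = Weights(completeness=0.5, consistency=0.2,
                          timeliness=0.2, availability=0.1)
example_score = reputation_score(c=1.0, n=0.5, t=0.5, a=1.0, w=example_weights)
```

With weights that sum to 1, a node scoring 1 on every characteristic receives $R = 1$, which is one way to read the statement that the weights normalize the score.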

### Dynamic allocation of weights:

$w_C = 0.5$ (Completeness, $C$, is highly prioritized).

For a time period $t_1$ to $t_2$:

$\textbf{N}_{t_1, t_2}$ is the set of all observed values of $N$ from each active node in the time period.

$\textbf{T}_{t_1, t_2}$ is the set of all observed values of $T$ from each active node in the time period.

$\textbf{A}_{t_1, t_2}$ is the set of all observed values of $A$ from each active node in the time period.

$\sigma (\textbf{N}_{t_1, t_2})$ is the standard deviation of $\textbf{N}_{t_1, t_2}$; it measures how much variability there is in consistency across all active nodes during the time period $t_1$ to $t_2$.

$\sigma (\textbf{T}_{t_1, t_2})$ is the standard deviation of $\textbf{T}_{t_1, t_2}$; it measures how much variability there is in timeliness across all active nodes during the time period $t_1$ to $t_2$.

$\sigma (\textbf{A}_{t_1, t_2})$ is the standard deviation of $\textbf{A}_{t_1, t_2}$; it measures how much variability there is in availability across all active nodes during the time period $t_1$ to $t_2$.

The remaining total weight of 0.5 is distributed across $w_N$, $w_T$, and $w_A$ linearly according to their respective standard deviations.
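One possible reading of this allocation is that each of the three weights receives a share of the remaining 0.5 in proportion to its standard deviation. The Python sketch below follows that reading. The function name `allocate_weights`, the use of the population standard deviation, and the equal-split fallback when all three standard deviations are zero are assumptions made for illustration, not details stated in the text.

```python
from statistics import pstdev
from typing import Dict, Sequence


def allocate_weights(
    n_obs: Sequence[float],
    t_obs: Sequence[float],
    a_obs: Sequence[float],
    w_c: float = 0.5,
) -> Dict[str, float]:
    """Split the weight left over after w_C across w_N, w_T and w_A.

    Assumption: shares are proportional to the population standard
    deviation of each characteristic over the observation window.
    """
    sigmas = {
        "w_N": pstdev(n_obs),  # variability in consistency
        "w_T": pstdev(t_obs),  # variability in timeliness
        "w_A": pstdev(a_obs),  # variability in availability
    }
    remaining = 1.0 - w_c
    total_sigma = sum(sigmas.values())

    if total_sigma == 0:
        # Assumed fallback: with no variability anywhere, split equally.
        shares = {name: remaining / len(sigmas) for name in sigmas}
    else:
        shares = {
            name: remaining * sigma / total_sigma
            for name, sigma in sigmas.items()
        }
    return {"w_C": w_c, **shares}


# Illustrative observations only; these are not real node measurements.
example = allocate_weights(
    n_obs=[0.0, 1.0],
    t_obs=[0.25, 0.75],
    a_obs=[0.25, 0.75],
)
```

Under this reading, a characteristic that varies more across active nodes in the window carries more weight, since it does more to distinguish one node from another, and the four weights always sum to 1.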
