Repository
https://github.com/steemit/steem
Introduction
HF20 removed the so-called "vote dust threshold" for a better user experience.
"If a vote is placed that is below the required threshold, it will be rejected by the blockchain. This can create a bad user experience for new users, as their votes can fail for seemingly no reason. ...
In hardfork 20, this “vote dust threshold” will be removed. After this change users with any amount of SP will be able to cast votes so long as they have sufficient bandwidth. Votes that are below the threshold will be posted to the blockchain but will have no impact on rewards. This will allow users to have a better user experience on all Steem-based applications by enabling them to vote whenever they want to (as long as they don’t exceed their generous bandwidth allocation), without adding to the computational load on the blockchain by requiring that it calculate the impact of effectively powerless votes on the rewards pool. ..."
@steemitblog/hardfork-20-velocity-development-update
But by now we all know that the truly bad user experience is being unable to write.

Consider a typical new user with a generous starting 15 SP, who can make only 1 comment but 20 votes.
- The RC required for a comment is much higher (about 14 times) than for a vote. Even if they can vote, voting only makes them wait longer before they can write something.
- If they find out that their votes don't count at all, the experience may be even worse for both voters and authors.
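The following back-of-the-envelope sketch makes the trade-off concrete. It rests on three illustrative assumptions, none of which are values read from the chain:
- The example user's full RC pool is worth 20 votes.
- A comment costs about 14 votes' worth of RC (the ratio above).
- RC regenerates linearly from empty to full over 5 days.

The names `MAX_RC_IN_VOTES`, `COMMENT_COST_IN_VOTES`, `REGEN_HOURS` and `hours_until_comment` are hypothetical.

```python
# Back-of-the-envelope sketch; every constant below is an illustrative assumption.
MAX_RC_IN_VOTES = 20        # hypothetical: full RC pool = 20 votes (the example user above)
COMMENT_COST_IN_VOTES = 14  # a comment costs roughly 14x a vote
REGEN_HOURS = 5 * 24        # assumed: RC refills linearly from 0 to full in 5 days


def hours_until_comment(votes_cast: int) -> float:
    """Hours a user starting with full RC must wait to comment after `votes_cast` votes."""
    remaining = MAX_RC_IN_VOTES - votes_cast
    shortfall = max(0, COMMENT_COST_IN_VOTES - remaining)
    hours_per_vote_unit = REGEN_HOURS / MAX_RC_IN_VOTES  # 6 hours per vote's worth of RC
    return shortfall * hours_per_vote_unit


for n in (0, 6, 10, 20):
    print(f"{n} votes -> {hours_until_comment(n)} hours before the next comment")
```

Under these assumptions the first 6 votes are free, but every further vote pushes the next comment back by about 6 hours.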

Now imagine a post receiving lots of dust votes. Would you like that if you were the author? Probably not.

Moreover, dust votes also place a burden on the Steem blockchain, and they can be a low-hanging attack point. Suppose tons of accounts cast dust votes (currently even an account with 0 SP can vote). Then an entire block could be filled with dust votes. The vote dust threshold may need to be reintroduced for both user experience and blockchain health.
Scope
- I analyzed the details of votes on "almost" all posts from 8-15-2018 to 12-14-2018 (HF20 went live on Sep 25/26).
- Total 1.1 million posts
- 675K posts after HF20 that have valid voting data (out of 957K posts in total; see below)
- 434K posts before HF20 that have valid voting data (out of 806K posts in total; see below)
- Total 62 million votes (38 million after HF20)
The reason for "almost": this kind of analysis cannot (and, in my opinion, should not) be done with the Steem API. Unfortunately, in @steemsql the `active_votes` field of the `Comments` table is sometimes still pending, which means it doesn't have finalized voting data yet, so I filtered those posts out. (Note that `TxVotes` cannot be used, since it doesn't have rshares information.) Since the remaining posts are still very numerous, the filtering acts as a natural random sample (55~70% coverage, which is actually far more than enough). If you want to do the same analysis, you should definitely use a much smaller sample, because parsing `active_votes` and categorizing each vote takes quite a while.
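The coverage figures follow from the post counts in the Scope list. A quick check, using the rounded "K" counts given there:

```python
# Post counts from the Scope list: (posts with finalized voting data, all posts)
periods = {
    "after HF20": (675_000, 957_000),
    "before HF20": (434_000, 806_000),
}

for name, (valid, total) in periods.items():
    print(f"{name}: {valid / total:.1%} coverage")

valid_total = sum(valid for valid, _ in periods.values())
print(f"posts analyzed: {valid_total:,}")  # about 1.1 million
```

With these rounded counts the two periods come out at roughly 54% and 70%.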
Technical Background
This analysis requires some technical background. First of all, we need to know how to identify a dust vote. You may think you know how, but it's easy to miss something.
Dust vote: a vote with fewer than 50,000,000 rshares (defined as `STEEM_VOTE_DUST_THRESHOLD` in the Steem source code)
How are dust votes handled?
The vote is still accepted, but the threshold is subtracted from every vote's rshares and the result is floored at zero. Thus, a vote below the threshold ends up with zero rshares.
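The rule described above can be re-expressed in Python as a minimal sketch. This is not the actual steemd C++ code. Applying the deduction to the absolute value and then restoring the sign for downvotes is an assumption. `effective_rshares` is a hypothetical helper name.

```python
STEEM_VOTE_DUST_THRESHOLD = 50_000_000  # rshares


def effective_rshares(raw_rshares: int) -> int:
    """Post-HF20 rule as described: subtract the threshold from every vote, floor at zero."""
    sign = -1 if raw_rshares < 0 else 1  # assumption: downvotes handled symmetrically
    deducted = max(0, abs(raw_rshares) - STEEM_VOTE_DUST_THRESHOLD)
    return sign * deducted


print(effective_rshares(30_000_000))   # 0: a dust vote ends up with zero rshares
print(effective_rshares(80_000_000))   # 30000000: every vote pays the threshold
print(effective_rshares(-80_000_000))  # -30000000 (sign handling is an assumption)
```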
Here is the code that removes dust votes:
How do we find dust votes? Be careful: a dust vote is not simply any vote with 0 rshares.
It's tempting to count every vote with zero rshares, but this is wrong: any late vote, cast after payout, also has zero rshares! So the two need to be separated. I also counted downvotes (negative rshares) separately. The whole process takes a lot of time, mainly because `active_votes` is a text field in @steemsql. You first need to parse it into JSON and then determine whether each vote is a late vote or a dust vote. To tell the two apart, you need to compare the post creation time (`created`) with the voting time (`time` in `active_votes`), which is again a string.
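The procedure above can be sketched for a single vote as follows. This version assumes:
- both timestamps are ISO-8601 strings without a timezone (e.g. "2018-10-01T12:00:00"),
- payout happens 7 days after creation,
- the post-HF20 rule applies, where a dust vote is stored with 0 rshares.

`classify_vote` and the label "normal" are hypothetical names for illustration.

```python
from datetime import datetime, timedelta

PAYOUT_WINDOW = timedelta(days=7)  # posts pay out 7 days after creation


def classify_vote(vote: dict, created: str) -> str:
    """Label one entry of a post's parsed active_votes (post-HF20 rule)."""
    created_at = datetime.fromisoformat(created)
    voted_at = datetime.fromisoformat(vote["time"])
    # Check lateness first: late votes also have 0 rshares
    if voted_at - created_at > PAYOUT_WINDOW:
        return "late"
    rshares = int(vote["rshares"])  # may arrive as a string or an int
    if rshares < 0:
        return "down"
    if rshares == 0:  # post-HF20: a dust vote is stored with 0 rshares
        return "dust"
    return "normal"


print(classify_vote({"time": "2018-10-09T00:00:00", "rshares": 0}, "2018-10-01T00:00:00"))
print(classify_vote({"time": "2018-10-02T00:00:00", "rshares": "0"}, "2018-10-01T00:00:00"))
```

The order of the checks is the whole point: testing for zero rshares before lateness would misclassify every late vote as dust.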
Results
Dust votes are increasing
- Dust votes are increasing.
- Late votes and down votes do not have clear trend.
Dust votes are clearly increasing. You may think it's because the total number of votes is increasing. If you think so, you're not an active Steemit user :) Where have you been?
The ratio of dust votes is also increasing.
- Dust vote ratio is increasing. Over the last 2.5 months, the dust vote ratio has increased by 150% (from 0.5% to 1.25%).
The total number of votes (red, right axis) is unfortunately decreasing, as you can probably feel if you're an active user. As a result, the ratio of dust votes to total votes is increasing as well.
- Right after HF20, dust and late vote ratios are quite high.
Note that the spikes in the dust and late vote ratios are not errors. Come on, have you already forgotten that interesting moment when even Dolphins couldn't write and most users couldn't even vote? Even those who could vote cast far more dust votes than usual because of the voting power reset. And many people voted on posts they liked, but too late.
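A ratio graph like this can be sketched from per-post counts such as the `num_votes` and `num_dustvotes` columns produced by `categorize_vote` in the Tools and Scripts section. This version makes two assumptions:
- Votes are grouped by the post's creation day. The exact grouping behind the graphs isn't stated.
- The ratio is pooled (total dust votes / total votes per day) rather than an average of per-post ratios.

`daily_dust_ratio`, `relative_change` and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def daily_dust_ratio(posts):
    """posts: iterable of (day, num_votes, num_dustvotes), one tuple per post."""
    votes = defaultdict(int)
    dust = defaultdict(int)
    for day, num_votes, num_dustvotes in posts:
        votes[day] += num_votes
        dust[day] += num_dustvotes
    # pooled ratio per day; days without votes are skipped
    return {d: dust[d] / votes[d] for d in sorted(votes) if votes[d] > 0}


def relative_change(old, new):
    return (new - old) / old


# Hypothetical toy data, not values from the analysis
sample = [
    (date(2018, 10, 1), 150, 1),
    (date(2018, 10, 1), 50, 0),
    (date(2018, 12, 1), 80, 1),
]
print(daily_dust_ratio(sample))
# The increase quoted above, from 0.5% to 1.25%
print(f"{relative_change(0.005, 0.0125):.0%}")  # 150%
```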
Wait a sec, how was it before HF20?
Very good question! First of all, there was a vote dust threshold back then, so any vote below it wasn't even accepted, i.e., it never made it onto the blockchain. Thus, strictly speaking, we cannot compare the two periods in the same way.
But don't give up. One thing we can do is find votes whose rshares do not exceed the threshold (`STEEM_VOTE_DUST_THRESHOLD`, 50,000,000). What's the problem with this? Obviously, we can't help underestimating dust votes, since many dust votes must have been rejected. But at least this helps show the trend.
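A minimal sketch of this pre-HF20 proxy follows. It assumes the input is the rshares of on-time votes, with late votes already removed. Like the Python script at the end, it counts non-negative votes at or below the threshold. The `threshold` parameter makes it easy to repeat the check with higher thresholds. `dust_proxy_ratio` and the sample values are hypothetical.

```python
STEEM_VOTE_DUST_THRESHOLD = 50_000_000  # rshares


def dust_proxy_ratio(rshares_values, threshold=STEEM_VOTE_DUST_THRESHOLD):
    """Share of votes with 0 <= rshares <= threshold (a lower bound before HF20).

    Votes below the threshold were rejected before HF20 and never reached
    the chain, so this proxy underestimates the true dust-vote share.
    """
    values = list(rshares_values)
    if not values:
        return 0.0
    dust = sum(1 for r in values if 0 <= r <= threshold)
    return dust / len(values)


# Hypothetical rshares values, for illustration only
sample = [0, 40_000_000, 120_000_000, 900_000_000, -60_000_000]
for t in (STEEM_VOTE_DUST_THRESHOLD, 2 * STEEM_VOTE_DUST_THRESHOLD, 4 * STEEM_VOTE_DUST_THRESHOLD):
    print(f"threshold {t:,}: {dust_proxy_ratio(sample, t):.0%}")
```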
- The trend of dust votes clearly changed after HF20.
Again, the small number of pre-HF20 dust votes could be mainly due to the threshold, so let me explain further with the ratio graph.
Dust vote ratio was quite stable before HF20 but increased after HF20
Although a fair comparison is impossible, one thing is certain: the ratio was stable before HF20. How many votes were rejected because of the threshold? We don't know, since they are not on the blockchain, but I believe there weren't many. I also tried higher thresholds, and they didn't change the trend of the pre-HF20 ratio.
- Downvote and late vote ratios are pretty stable before and after HF20.
This is because there was no policy change affecting them. The graph also clearly shows the downtime at HF20 and another downtime before HF20.
The absence of a dust vote ratio spike during the first downtime makes perfect sense, since, unlike at HF20, there was no voting power reset. So there is only a late vote ratio spike. The reasons behind the spikes were already explained with the post-HF20 graphs.
Conclusion
Contrary to its original intention, the removal of the vote dust threshold doesn't seem to have improved the user experience. Once users realize that a dust vote doesn't increase the payout at all but still consumes voting power and RC, both voters and authors will have a bad experience. Dust votes have been increasing since HF20, and this may become a real concern for the Steem blockchain in the future. Therefore, the vote dust threshold may need to be reintroduced for both user experience and blockchain health.
"I can't write!"
"You can still vote."
"Thanks! Just voted your kind reply."
"Gotcha! You now wait even more to write."
"???"
Tools and Scripts
- @steemsql: query the raw data.
```sql
SELECT author, permlink, total_payout_value, CAST(active_votes AS TEXT) AS active_votes, created
FROM Comments
WHERE DEPTH = 0
  AND created < last_payout
  AND created BETWEEN '2018-09-26' AND '2018-12-14'
ORDER BY created ASC
```
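You may want to split the time range into several queries, process each part separately, and then combine the results, since the data set is huge. Note that 2018-09-26 is the date of HF20, so this query covers the post-HF20 period. For the pre-HF20 period, change the date range accordingly.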
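- Python: analyze `active_votes`

```python
import json

import swifter  # registers the df.swifter accessor used at the bottom
from dateutil.parser import parse

STEEM_VOTE_DUST_THRESHOLD = 50000000  # rshares
PAYOUT_WINDOW_SECONDS = 7 * 24 * 60 * 60  # posts pay out 7 days after creation
# Run the two periods separately (the data is too big otherwise):
# True for post-HF20 (dust votes have 0 rshares), False for pre-HF20.
POST_HF20 = True


def categorize_vote(x):
    # active_votes is stored as text in steemsql, so parse it into JSON first
    votes = json.loads(x['active_votes'])
    x['num_votes'] = len(votes)
    num_dustvotes = 0
    num_downvotes = 0
    num_latevotes = 0
    for v in votes:
        ts_voted = parse(v['time'])  # the voting time is a string, too
        # votes cast after payout are late votes (they also have 0 rshares)
        if (ts_voted - x['created']).total_seconds() > PAYOUT_WINDOW_SECONDS:
            num_latevotes += 1
            continue
        rshares = int(v['rshares'])
        if rshares < 0:
            num_downvotes += 1
        elif POST_HF20 and rshares == 0:
            num_dustvotes += 1
        elif not POST_HF20 and rshares <= STEEM_VOTE_DUST_THRESHOLD:
            num_dustvotes += 1
    x['num_dustvotes'] = num_dustvotes
    x['num_downvotes'] = num_downvotes
    x['num_latevotes'] = num_latevotes
    return x


# df: the query result as a pandas DataFrame indexed by the (datetime) `created` column
df['created'] = df.index
df = df.swifter.apply(categorize_vote, axis=1)
```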
Note that this is a large data set, so you may want to use some parallelization. In my case, `swifter` worked well. It's simple to use: just put `swifter` between `df` and `apply`, that's all :)
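The tip about splitting the SQL query's time range can be sketched as follows: generate consecutive date windows and build one query per window. Weekly windows are an arbitrary choice. Running each query against @steemsql and concatenating the results is left out. `QUERY_TEMPLATE` and `date_chunks` are hypothetical names.

```python
from datetime import date, timedelta

QUERY_TEMPLATE = """
SELECT author, permlink, total_payout_value, CAST(active_votes AS TEXT) AS active_votes, created
FROM Comments
WHERE DEPTH = 0
  AND created < last_payout
  AND created >= '{start}' AND created < '{end}'
ORDER BY created ASC
"""


def date_chunks(start: date, end: date, days: int = 7):
    """Yield consecutive half-open [lo, hi) windows covering [start, end)."""
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=days), end)
        yield lo, hi
        lo = hi


queries = [
    QUERY_TEMPLATE.format(start=lo.isoformat(), end=hi.isoformat())
    for lo, hi in date_chunks(date(2018, 9, 26), date(2018, 12, 14))
]
print(len(queries))  # one query per weekly window
```

Half-open windows (`>=` and `<`) are used instead of `BETWEEN` because `BETWEEN` is inclusive at both ends. With `BETWEEN`, a post created exactly on a boundary would be fetched twice by adjacent chunks.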
Relevant Links and Resources
Original intention of removing vote dust threshold
Two posts that are useful to understand rshares better (yes, they're mine :)
- Effect of haircut, early voting, beneficiary on dust payout by @blockchainstudio
- 100% SP vs 50:50 Which one is better? Does the haircut matter? by @blockchainstudio
Note that 'dust payout' and 'dust vote' are totally different kinds of dust :)
Proof of Authorship
While this is my first Utopian article using large-scale data (I finally opened the gates of hell and got @steemsql access :), here are my other recent Utopian articles, in addition to the two above.
- [Bug Fix - Merged and Live!] Finally, Busy can edit posts older than 7 days! If you're using Busy and editing old posts, you owe me :)
- busy feed/blog/replies/follow bugs due to API no longer supported
- Why SBD print rate is still 1% despite the haircut? Bug report, explanation, and suggestions
P.S. The reviewer @abh12345 let me know that there was a prior analysis of dust votes; I believe it is this one:
- Don't cast worthless votes - zero-value votes in HF20 by @crokkon and another cited herein.
I really like @crokkon's posts. My main contribution here is showing the increasing trend and explaining it in terms of UX and vulnerability. I hope Steemit will think about this problem seriously when they have more resources :) Thanks.