Project Information
- Repository: https://github.com/steemit/hivemind
- Project Name: Hivemind
- Publisher: Steemit Inc.
- Related issue on GitHub: https://github.com/steemit/hivemind/issues/191
Problem
The Hivemind-backed api.steemit.com reports invalid or missing following data for some accounts when compared to a full node.
How to reproduce
- Query the user curbot's following list (condenser_api.get_following):
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://api.steemit.com
- Run the same query against a full node (https://rpc.usesteem.com):
curl -s --data '{"jsonrpc":"2.0", "method":"condenser_api.get_following", "params":["curbot",null,"blog",100], "id":1}' https://rpc.usesteem.com
You can see that the response from api.steemit.com is different and incomplete.
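The same comparison can be scripted without curl. Below is a minimal sketch using only the Python standard library. The helper names (build_request, fetch_following, following_names, missing_on) are made up for illustration, and the sketch assumes that each entry in the response's result list carries a following key.

```python
import json
from urllib.request import Request, urlopen


def build_request(account, limit=100):
    """Build the same condenser_api.get_following payload used in the curl calls."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_following",
        "params": [account, None, "blog", limit],
        "id": 1,
    }


def fetch_following(node_url, account, limit=100):
    """POST the request to a node and return the raw 'result' list."""
    body = json.dumps(build_request(account, limit)).encode("utf-8")
    request = Request(
        node_url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response).get("result", [])


def following_names(result):
    """Extract followed account names (assumes a 'following' key per entry)."""
    return {entry["following"] for entry in result}


def missing_on(hivemind_result, full_node_result):
    """Names the full node reports that the Hivemind-backed node does not."""
    return following_names(full_node_result) - following_names(hivemind_result)
```

To use it, call fetch_following once with https://api.steemit.com and once with https://rpc.usesteem.com, then pass both results to missing_on.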
A Python script to detect discrepancies
I believe this is not an exceptional case. I have seen more discrepancies like this while testing and benchmarking tower's new endpoints.
This Python script detects discrepancies in follower lists.
from steem import Steem
from steem.account import Account


def get_diff(account):
    # Follower list as reported by the Hivemind-backed node.
    followers_on_hivemind = Account(
        account,
        steemd_instance=Steem(
            nodes=["https://api.steemit.com"])
    ).get_followers()
    # Follower list as reported by a full node.
    followers_on_full_node = Account(
        account,
        steemd_instance=Steem(
            nodes=["https://rpc.usesteem.com"])
    ).get_followers()

    print(
        "Accounts listed on api.steemit.com but not in the rpc.usesteem.com")
    print(set(followers_on_hivemind).difference(set(followers_on_full_node)))
    print("*" * 42)
    print(
        "Accounts listed on rpc.usesteem.com but not in the api.steemit.com")
    print(set(followers_on_full_node).difference(set(followers_on_hivemind)))


get_diff("emrebeyler")
The result for @emrebeyler's followers:
Accounts listed on api.steemit.com but not in the rpc.usesteem.com
set()
******************************************
Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'hariyati.amin', 'curbot', 'kenzyobiadi', 'erhanbute'}
After some digging, I found a rare case involving a differently formatted custom_json.
For example, I checked curbot's account history to find exactly when they followed my account, and found this transaction:
Transaction ID: aaccccb73b6dfcb4bbf95f6d2dcb76e1c87137e9
It looks like curbot was bundling several follow operations into one transaction, and steemd picked these up and registered them as valid follow actions.
However, Hive's indexer ignores the custom_json op if the loaded JSON's length is greater than 2.
In this case it is greater than 2 because the format looks like this:
[
    ['follow', {
        'follower': 'curbot',
        'following': 'kevinwong',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'nothingismagick',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'simnrodrguez',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'steem-ua',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'decentraland',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'mikepm74',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'empath',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'emrebeyler',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'eroche',
        'what': ['blog']
    }],
    ['follow', {
        'follower': 'curbot',
        'following': 'ervinneb',
        'what': ['blog']
    }]
]
This explains why curbot is missing.
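To illustrate how an indexer could accept the bundled format, here is a minimal sketch that normalizes a follow custom_json payload into a list of (command, body) pairs. This is not Hivemind's code. The function name split_follow_ops is hypothetical, and the three payload shapes it handles are only the ones that appear in this post.

```python
import json


def split_follow_ops(raw_json):
    """Return a list of (command, body) pairs from a follow custom_json payload.

    Shapes handled (hypothetical normalization, not Hivemind's implementation):
      1. legacy dict:   {"follower": ..., "following": ..., "what": [...]}
      2. single pair:   ["follow", {...}]
      3. bundled pairs: [["follow", {...}], ["follow", {...}], ...]
    """
    payload = json.loads(raw_json)
    if isinstance(payload, dict):
        return [("follow", payload)]
    if not isinstance(payload, list):
        return []
    # A single [command, body] pair.
    if (len(payload) == 2 and isinstance(payload[0], str)
            and isinstance(payload[1], dict)):
        return [(payload[0], payload[1])]
    # Otherwise treat every well-formed [command, body] item as one op.
    ops = []
    for item in payload:
        if (isinstance(item, list) and len(item) == 2
                and isinstance(item[0], str) and isinstance(item[1], dict)):
            ops.append((item[0], item[1]))
    return ops
```

With this shape, curbot's transaction would yield ten separate follow operations instead of being dropped as a whole. Malformed items inside a bundle are skipped silently here, which is a design choice for the sketch rather than a statement about how steemd behaves.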
Regarding my other 3 missing followers:
| Follower | Following | Tx ID | Block num | Timestamp |
|---|---|---|---|---|
| erhanbute | emrebeyler | d10dcd1bdb661fc4e63f2464fa2262624db5d003 | 26710986 | 2018-10-11T09:55:21 |
| kenzyobiadi | emrebeyler | 9ef235eb36aac5e466b97ad3e459b7eb9495f898 | 26492393 | 2018-10-03T19:38:45 |
| hariyati.amin | emrebeyler | 383a36f7aa65724eb634ebdae141366674dc1df8 | 26450469 | 2018-10-02T08:41:33 |
The timestamps show that these follows happened between 2018-10-02 and 2018-10-11. The transactions don't involve anything unusual.
Additionally, I checked roadscape's followers on Steem and got these discrepancies:
{'curbot', 'kamvreto', 'msutyler'}
We already know the problem with curbot, so I checked the other accounts.
kamvreto followed roadscape at 2016-07-25T22:35:12.
Here is the account history output:
{
    'trx_id': '2b7595b1f3e0e0105156d518b83d7eeaa19b6070',
    'block': 3514062,
    'trx_in_block': 3,
    'op_in_trx': 0,
    'virtual_op': 0,
    'timestamp': '2016-07-25T22:35:12',
    'op': ['custom_json', {
        'required_auths': [],
        'required_posting_auths': ['kamvreto'],
        'id': 'follow',
        'json': '{"follower":"kamvreto","following":"roadscape","what":["posts","blog"]}'
    }]
}
It was a legacy custom_json transaction. The tricky part is that the transaction's what property includes two elements.
You can see that the Follow constructor expects one element:
https://github.com/steemit/hivemind/blob/60dc61ee4bbde2080421a3fdf10c5b83be840e8b/hive/indexer/follow.py#L71
For this reason, Hive ignores this operation as well.
The problem is the same for roadscape's other missing follower, msutyler:
{
    'trx_id': 'c7694ff17ba7ba3fbe1740f05c2727ecbd98cd62',
    'block': 3409232,
    'trx_in_block': 1,
    'op_in_trx': 0,
    'virtual_op': 0,
    'timestamp': '2016-07-22T06:18:27',
    'op': ['custom_json', {
        'required_auths': [],
        'required_posting_auths': ['msutyler'],
        'id': 'follow',
        'json': '{"follower":"msutyler","following":"roadscape","what":["posts","blog"]}'
    }]
}
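Both legacy operations carry a what list with two elements, ["posts", "blog"]. One way an indexer could tolerate this is to collapse the list into a single state before processing it. The sketch below is only an illustration under stated assumptions: the name normalize_what is hypothetical, and the rule that any list containing blog counts as a follow (and any list containing ignore counts as a mute) is my assumption, not something confirmed against steemd or Hivemind.

```python
def normalize_what(what):
    """Collapse a follow op's 'what' list into a single state string.

    Assumed mapping (not verified against steemd or Hivemind):
      - contains 'blog'   -> 'blog'   (follow)
      - contains 'ignore' -> 'ignore' (mute)
      - anything else     -> ''       (no follow state)
    Returns None when 'what' is not a list at all.
    """
    if not isinstance(what, list):
        return None
    if "blog" in what:
        return "blog"
    if "ignore" in what:
        return "ignore"
    return ""
```

Under that assumption, normalize_what(["posts", "blog"]) yields "blog", so the kamvreto and msutyler follows would be kept rather than ignored.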
Expanding the sample size:
Discrepancies on @utopian-io's followers:
Accounts listed on rpc.usesteem.com but not in the api.steemit.com
{'qawazd', 'steemgems', 'curbot'}
| Follower | Following | Tx ID | Block num | Timestamp |
|---|---|---|---|---|
| steemgems | utopian-io | 25e9c3d8e625e634b68bd5e16e99327fd37174ae | 26722368 | 2018-10-11T19:25:27 |
| qawazd | utopian-io | 8de43899a8ad84b8bd65a896e71e3e0eafda0757 | 26838941 | 2018-10-15T20:37:51 |
These follow operations are valid. Their dates, 2018-10-11 and 2018-10-15, are close to the ones missing on @emrebeyler's account.
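To make the clustering easier to see, the five otherwise unremarkable missing follows from the two tables can be grouped by month. This is a small sketch. The timestamps are copied from the tables in this post, while the names MISSING_FOLLOWS and count_by_month are made up for the example.

```python
from collections import Counter
from datetime import datetime

# (follower, following, timestamp) rows copied from the tables in this post.
MISSING_FOLLOWS = [
    ("erhanbute", "emrebeyler", "2018-10-11T09:55:21"),
    ("kenzyobiadi", "emrebeyler", "2018-10-03T19:38:45"),
    ("hariyati.amin", "emrebeyler", "2018-10-02T08:41:33"),
    ("steemgems", "utopian-io", "2018-10-11T19:25:27"),
    ("qawazd", "utopian-io", "2018-10-15T20:37:51"),
]


def count_by_month(rows):
    """Count follow operations per YYYY-MM bucket."""
    return Counter(
        datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S").strftime("%Y-%m")
        for _, _, timestamp in rows
    )
```

For these rows, count_by_month(MISSING_FOLLOWS) returns a single bucket, 2018-10, with a count of 5.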
TL;DR
- We have missing follow ops on api.steemit.com's Hive instance, generally clustered around 2018-10.
- Hive ignores a follow operation if it bundles multiple follows, although steemd accepts it. This is the case with @curbot.
- Hive ignores some legacy follow operations because these ops may include two elements in the what property (e.g. ["posts", "blog"]).