@unarist Hmmm. I don't think this breaks since_id. Even if it's a newly-fetched status, since_id is about chronology: "Give me toots written after this."
I agree with the concern of faking time.
The delay shouldn't be a problem: regardless of when Sidekiq executes the delivery, "published" is still the original created_at, isn't it?
@Gargron The new behavior seems correct for published order, but it differs from the current behavior, which is arrival order. Today we can fetch the complete timeline with since_id={cached_statuses.latest.id}, but that is no longer guaranteed with the new behavior.
Technically, even a 1 ms delay in a status's arrival may cause it to be dropped from the timeline for apps that only use the REST API (e.g. Amaroq). Also, ISO8601 encoding strips the millisecond part, so the timestamp will already be in the past even with zero delivery latency. This case is not rare.
cc @Clworld
@unarist @Gargron I think really big delays aren't caused by Sidekiq itself but by other work (fetching the boosted original status, searching for the toot URL, etc.). I suggest arrival-order numbering for simple toot delivery. (Sidekiq delay is not big, because it retries only 3 times for OStatus and 8 times for ActivityPub.)
@Clworld @Gargron btw, since the chronological order of the lower bits of the new ID is not guaranteed, a newer status may get a smaller ID than an older one if both were created within the same millisecond, I think.
If so, using the latest status ID as since_id may still cause statuses to be missed. That is what many apps (incl. the WebUI) do, and what we've provided via the Link header: https://github.com/tootsuite/mastodon/blob/d5091387c6ddbe03b118b0cfb6d74cf821b84fb2/app/controllers/api/v1/timelines/public_controller.rb#L48 If a client wants to fetch new statuses without gaps, it should clear the lower bits of the ID before using it as since_id.
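To illustrate, here is a minimal client-side sketch (hypothetical helper name, assuming the ID layout discussed in this thread: a 48-bit millisecond timestamp in the high bits and 16 unordered low bits):

```ruby
# Hypothetical client-side helper, assuming snowflake-style IDs with a
# 48-bit millisecond timestamp in the high bits and 16 unordered low bits.
SEQ_MASK = 0xffff

# Zero the non-chronological low 16 bits before reusing the newest
# cached status ID as since_id, so other statuses created in the same
# millisecond are not skipped.
def safe_since_id(latest_cached_id)
  latest_cached_id & ~SEQ_MASK
end

id_a = (1_506_643_200_000 << 16) | 0x9f3c # two statuses in the same ms
id_b = (1_506_643_200_000 << 16) | 0x02e1
# Using id_a directly as since_id would skip id_b (since id_b < id_a);
# the widened since_id includes both.
safe_since_id(id_a) < id_b # => true
```

The trade-off is that this re-fetches every status from that millisecond, so the client still needs to drop duplicates it has already seen.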
@Gargron @unarist @Clworld Jumping in here: strong agree on "don't backdate IDs when they're 'normal' incoming statuses".
Redis home TL keys aren't an issue: they now have their ID as the value too, and Redis sorts on key and then value. (Although it sorts on ASCII value, which means there's a minor chance of a wrong ordering around the point where IDs roll over to another digit, but that's unlikely to bite anyone and won't happen again for decades.)
The "sequence" part is effectively random for each millisecond (and then incremented within it), so it's somewhat unlikely to ever roll over in practice, but zeroing out the last 16 bits in Link headers when the most recent status is included is probably safest, yes. I can investigate doing that if you'd like.
Any other concerns? 🙂
We still need something in case multiple things happen in the same millisecond (which is increasingly likely/common). Removing it also doesn't actually gain us anything: a few bits fewer, but we'd still need to deal with big-ish ints and a potentially wrapping sequence counter. If we keep millisecond precision, the impact is minimal and easy to address. (Using a separate table to reset the counter to zero every millisecond adds a lot of overhead too.)
@Gargron @Clworld @unarist I actually missed that specific post, but it's a common way to handle things: it's what I'd do if I had to merge lots of app servers into the database without a single serial counter. The low 16 bits would let us do that (say, 2**6 app servers and then 10 bits for a counter) in the future without expanding the ID. The hashed ID is just a way of letting the DB assign IDs without giving away a count or requiring coordination between servers.
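A sketch of that hypothetical future partition (illustrative only; this is not Mastodon's actual implementation, just one way the 16 low bits could be split):

```ruby
# Illustrative only: one possible partition of the 16 low bits into
# a 6-bit app-server ID (up to 2**6 servers) and a 10-bit counter.
def compose_low_bits(server_id, counter)
  ((server_id & 0x3f) << 10) | (counter & 0x3ff)
end

compose_low_bits(0x3f, 0x3ff) # => 0xffff, the full 16-bit field
```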
@Gargron @Clworld @unarist I should note that both the Instagram and Twitter models are only serializable between milliseconds, not overall: if two different processes handle messages in the same millisecond, they may be out of order by ID. This is basically standard. We can handle it from the Link headers by just zeroing out the last bits, with effectively no ill effects.
@Gargron @Clworld @unarist Oh, sorry, yes, if you narrow to one shard on the database. However, note that their low bits still wrap around, because they do a modulus on a Postgres sequence. There's no good way to reset that to zero every millisecond in Postgres.
(I maintain that this isn't a problem, and if two statuses that arrived in the same millisecond show up out of order, nobody will notice/care. We just have to make sure they aren't skipped.)
@Gargron @aschmitz @unarist I think what this means is:
The snowflake IDs Mastodon currently uses are not sequential within a millisecond.
So, not returning statuses from the same millisecond can cause statuses within that millisecond to be missed in some paging cases.
Returning same-millisecond statuses helps clients fetch all statuses, but it causes duplicates when multiple statuses were created in the same millisecond.
So, when calling the REST API, clients should dedup, ignoring statuses they have already received.
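That client-side dedup step could look like this (a minimal sketch; the helper name and status shape are made up):

```ruby
require 'set'

# Minimal client-side dedup across successive REST API pages:
# Set#add? returns nil when the ID was already present, so previously
# seen statuses are filtered out.
def dedup(statuses, seen_ids)
  statuses.select { |status| seen_ids.add?(status[:id]) }
end

seen = Set.new
page1 = [{ id: 3 }, { id: 2 }, { id: 1 }]
page2 = [{ id: 5 }, { id: 4 }, { id: 3 }] # id 3 repeated at the range border
dedup(page1, seen).size               # => 3
dedup(page2, seen).map { |s| s[:id] } # => [5, 4]
```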
@Gargron @aschmitz @unarist
There are 2 problems:
1. ID non-sequentiality within a millisecond. This causes statuses to be missed when fetching new statuses.
2. The Redis key precision problem.
Additional filtering of the Redis result can solve 2, but cannot solve 1.
Returning same-millisecond statuses at the range border can solve both 1 and 2, but it occasionally duplicates statuses, so clients should dedup them.
The public page should also do some additional filtering for dedup (I think the public page does not have a 'read new' function).
@Clworld @unarist @Gargron The proposed fix returns statuses with IDs:
> since_id & ~0xffff
and
< max_id | 0xffff
but excludes the statuses whose IDs are exactly since_id or max_id.
In the (somewhat uncommon) case that two toots occurred in the same millisecond, this will return a spurious duplicate that the client will have to dedup (or otherwise deal with).
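In code, that range check might look like this (a sketch with hypothetical names, following the two expressions above):

```ruby
# Widen since_id down and max_id up to whole-millisecond boundaries
# (the 16 low bits), then drop the two boundary statuses themselves.
MASK = 0xffff

def in_widened_range?(id, since_id:, max_id:)
  id > (since_id & ~MASK) &&
    id < (max_id | MASK) &&
    id != since_id &&
    id != max_id
end

since = (1 << 16) | 3
max   = (2 << 16) | 7
in_widened_range?((1 << 16) | 5, since_id: since, max_id: max) # => true (same ms as since_id)
in_widened_range?(since, since_id: since, max_id: max)         # => false (boundary itself)
```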
@aschmitz @unarist @Gargron Note: when filtering the Redis result, we should take care that the API still returns up to DEFAULT_STATUSES_LIMIT statuses whenever more statuses exist in the range. Returning fewer than DEFAULT_STATUSES_LIMIT statuses signals that there are no more statuses in the requested range. Since we remove the max_id and since_id statuses themselves, we should fetch 2 more statuses from Redis.
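A sketch of that over-fetch (DEFAULT_STATUSES_LIMIT is Mastodon's real constant; the helper name and its value here are assumptions for illustration):

```ruby
DEFAULT_STATUSES_LIMIT = 20 # assumed value for illustration

# Because the since_id and max_id statuses themselves are removed after
# filtering, fetch DEFAULT_STATUSES_LIMIT + 2 candidates from Redis so
# that a full page can still be returned when more statuses exist.
def page_from_candidates(candidate_ids, since_id:, max_id:)
  candidate_ids
    .reject { |id| id == since_id || id == max_id }
    .first(DEFAULT_STATUSES_LIMIT)
end

candidates = (1..(DEFAULT_STATUSES_LIMIT + 2)).to_a.reverse # 22 candidates, newest first
page_from_candidates(candidates, since_id: 1, max_id: 22).size # => 20
```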
@Clworld @unarist @Gargron I made some notes to move this off a super-long thread: https://github.com/tootsuite/mastodon/issues/5228
@aschmitz @Gargron @Clworld I like option 4 too, although we should take care that it only skips the *most recent* millisecond. If the latest status was created long enough ago, it shouldn't be skipped. We may want to determine the "most recent millisecond" in the DB, since the DB and Rails can be running on different servers (with different clocks).