You seem to be arguing against something that the article doesn't claim. The article isn't equating inactivity with fake/spam; it claims that, of the accounts that actively send tweets, ~20% are fake/spam.
Sure that's a different question from what proportion of all users are fake/spam, but this is still a perfectly valid question to ask, and the fact that they're only considering active users is in the title so I really don't get your complaint.
If you want an analysis that attempts to answer a different question go find or write one that addresses the question you want answered...
The article clearly states (emphasis mine):
> This represents the largest set of accounts on Twitter we could acquire, but it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit Twitter’s definition of mDAUs (monetizable Daily Active Users).
From the linked Twitter earnings report:
> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.
EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.
That edit was made 40m before you joined the conversation. Noting your edits is a social convention and voluntary concession offered by a post's author to validate replies that were made before the edit, while clarifying the author's intended message for future readers. If those future readers use the content of the edit message to shallowly refute the post, consider the incentive this creates for all authors to not follow that convention in the future. If you have a valid refutation, surely you can find evidence for it in the body of the message rather than nitpicking the edit history.
I think you misunderstood their response. They are saying that the study has an unusual definition of "active", and that your need to clarify the definition proves that it is unusual.
Though personally I think filtering specifically for users that actively send tweets makes sense, since that's really what matters when it comes to measuring how healthy and authentic the discourse is
It seems like everyone is arguing about different metrics and it makes more sense to discuss different, specific measures that might fall into a range of behaviors that are "active" in some sense rather than focusing on which definition of "active" is somehow the best one.
What would be more interesting would be to adapt this and answer several different questions about the proportion of spam among accounts with different metrics of activity to see how things change. For example, does the percentage of spam accounts go down a lot if we lower the bar for "active"? How much & how fast?
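To make that concrete, here is a minimal sketch of the kind of sweep I mean. Everything in it is hypothetical: the `Account` fields, the `spam_share_by_window` helper, and the toy sample are made up for illustration and are not the article's data or method. It only shows how the spam share could be recomputed as the bar for "active" is loosened.

```python
from typing import NamedTuple, Optional


class Account(NamedTuple):
    days_since_last_tweet: int  # hypothetical activity signal
    is_spam: bool               # hypothetical label from some classifier


def spam_share_by_window(accounts, windows) -> dict[int, Optional[float]]:
    """Map each window (in days) to the spam share among accounts
    that tweeted within that window; None if no account qualifies."""
    shares = {}
    for window in windows:
        active = [a for a in accounts if a.days_since_last_tweet <= window]
        if active:
            shares[window] = sum(a.is_spam for a in active) / len(active)
        else:
            shares[window] = None
    return shares


# Made-up toy sample, NOT real Twitter data.
sample = [
    Account(1, True), Account(3, False), Account(10, True), Account(20, False),
    Account(45, False), Account(80, False), Account(200, False), Account(400, True),
]

shares = spam_share_by_window(sample, [7, 30, 90, 365])
for window, share in shares.items():
    print(f"tweeted within {window:>3} days: spam share = {share:.2f}")
```

With real labels, plotting those shares against the window would answer the "how much & how fast" question directly.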
Twitter's quarterly earnings define active users thusly:
> Twitter defines monetizable daily active usage or users (mDAU) as people, organizations, or other accounts who logged in or were otherwise authenticated and accessed Twitter on any given day through twitter.com, Twitter applications that are able to show ads, or paid Twitter products, including subscriptions.
I'm pretty sure I've heard a similar definition from Facebook.
This definition supports g-clef's critique that the article picks an unorthodox way to measure active users, resulting in an inflated percentage of accounts being measured as spam/fake accounts, vs what the percentage would be if measured against Twitter's definition of 'active', which includes lurkers.
Strange rant. It's not about you editing your post in general. It's that your edit shows that saying "active accounts" when you really mean "accounts that have recently tweeted" is wrong, like the very title of this submission.
Look, there are dozens of potentially interesting and valuable questions to ask on this subject. Answers to which may produce a wide range of insights and conclusions. And there's a whole potential conversation about which questions are most important, that may have different answers depending on the context.
But there's no reason to pin the whole frame of the conversation to the one question for which Twitter corporate chose to publish an answer, unless the only question we are interested in is "did Twitter technically lie" which is the most uninteresting question in this whole situation. If this is the sole context you are using to frame this issue then maybe you should consider if you're following the current news cycle a little too closely.
The idea that there is such a thing as an 'inaccurate definition of active' is silly.
In the light of Musk's statements, which presumably precipitated this timely article, I would say the question of whether Twitter technically lied is the most important question for Musk doing the things he does.
If you're more interested in Twitter's ecosystem as a whole, it is less interesting.
At every company I've worked at any time someone has asked "How many active users do we have?" it was a difficult question to answer because everyone's idea of "active user" was different.
"Active, as in logs in regularly? Wait, what is 'regularly'? Once a week? Once a month? Every day? Does 'active user' mean, online right now?"
Etc, etc...
Their definition of "active user" is relative, not inaccurate.
> I don't know, that seems more interesting than most questions that could be asked about Twitter.
Why? Twitter is a for profit corporation. If, on the balance, lying serves their interests (I'm sorry, I meant "is consistent with their fiduciary duty to their shareholders") more than edging up to the line without crossing it, that's what they will do.
Even the watchdog organizations such as the FTC and SEC that police the speech of corporations more or less limit themselves to material statements that move markets or influence consumer behavior in ways that can be considered fraudulent. The FTC, FDA, and others are concerned with a fairly narrow reading of consumer harm, the SEC is motivated by the health and trustworthiness of the public market. In any case, there pretty much always has to be some sort of alleged harm. Lying per-se is hardly ever forbidden. So if the advantages of a lie outweigh the (risk adjusted) penalties and reputational risks, that's that.
I think a conversation about what ways we expect and permit corporations to lie, either specifically in financial statements or to the general public, is much more interesting than a discussion of exactly how many fake tweets there are and exactly how many accounts are making them, though I guess you could construe that as broadly part of the same conversation.
> I think a conversation about what ways we expect and permit corporations to lie, either specifically in financial statements or to the general public, is much more interesting than a discussion of exactly how many fake tweets there are
I agree, that would also be a much more interesting conversation than "did Twitter technically lie."
Sure. But if I'm looking to purchase Twitter, I think I'd be much more interested in and concerned about this "white" lie than you are as a general consumer.
I think it's pretty easy to argue that their definition is intentionally misleading, which may not be technically inaccurate, but is arguably just as bad.
The big story in the news last week was "Elon Musk says deal on hold while verifying twitter's 5% Monthly Active Users stat", or something to that effect.
That's the context this article was published in. It is transparently obvious they are re-using the phrase "active twitter accounts" to cause confusion with the definition of "active" that has been bandied around. The post is using such a title as clickbait, to hop aboard a trend.
I think the title, and the lack of significant clarification in the article, make it clearly misleading, and I don't think a pedantic "well technically active can have multiple definitions" changes the reality of the situation meaningfully.
> Why? Twitter included lurkers in its dataset, this article didn't, why should that impact stats in the direction of fake accounts being smaller?
Because you usually don't create fake accounts to lurk, but to do "something".
I'm speculating, but even when you create bots to boost follower counts you'd probably make them post now and then so as to seem "active".
It makes sense that the proportion of bots among tweeting accounts is much higher than among lurkers. And since there are also more lurkers than posters, I would say that the real number across all accounts is much lower than that.
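The arithmetic behind that argument is just a weighted average. A rough sketch, where only the ~20% figure for tweeting accounts comes from the article; the tweeter fraction and the lurker bot rate are assumptions picked purely for illustration, and `overall_bot_share` is a hypothetical helper:

```python
def overall_bot_share(tweeter_fraction: float,
                      bot_rate_tweeters: float,
                      bot_rate_lurkers: float) -> float:
    """Weighted average of the bot rate over tweeters and lurkers."""
    lurker_fraction = 1.0 - tweeter_fraction
    return (tweeter_fraction * bot_rate_tweeters
            + lurker_fraction * bot_rate_lurkers)


# ~20% among tweeters is the article's figure; the other two inputs are guesses.
estimate = overall_bot_share(
    tweeter_fraction=0.25,   # assumed: a quarter of logged-in accounts tweet
    bot_rate_tweeters=0.20,  # article's headline number
    bot_rate_lurkers=0.02,   # assumed: bots rarely just lurk
)
print(f"overall bot share: {estimate:.1%}")
```

Under those made-up inputs the overall share comes out around 6.5%, far below 20%. The result is entirely driven by the two assumed numbers, which nobody outside Twitter can observe.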
I don't buy the speculation as obviously accurate.
Let's say I own a Twitter bot farm. I make 20k accounts, set up a system that logs into each of them from a unique IP each month at random times to make sure they're not banned yet, and advertise it. In month 1, someone buys 1000 of them as followers. In month 2, someone buys 1000 of them to tweet spam. And so on.
Each month, there's 20k active bot accounts (logged in to verify they weren't banned). Only a small number may actually tweet though since buyers may have not gotten them yet. Bot accounts lurk too, for months on end, before ever acting.
I'm not claiming this is accurate, but I am claiming this is a reasonable alternative which doesn't align with the view of bot accounts being more prevalent in tweeting accounts than lurking accounts.
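Putting numbers on the scenario: the 20k farm size and the 1000 tweeting bots are from the example above, while the human counts are invented so the comparison can be computed at all. `bot_share` is a hypothetical helper, and this is a sketch of the alternative, not a claim about real proportions.

```python
def bot_share(bots: int, humans: int) -> float:
    """Fraction of a population that is bots."""
    return bots / (bots + humans)


FARM_SIZE = 20_000       # from the scenario: all bots log in monthly
TWEETING_BOTS = 1_000    # from the scenario: bots sold to tweet spam
HUMAN_LOGINS = 180_000   # assumed, for illustration only
HUMAN_TWEETERS = 45_000  # assumed, for illustration only

share_logged_in = bot_share(FARM_SIZE, HUMAN_LOGINS)
share_tweeting = bot_share(TWEETING_BOTS, HUMAN_TWEETERS)

print(f"bots among logged-in accounts: {share_logged_in:.1%}")
print(f"bots among tweeting accounts:  {share_tweeting:.1%}")
```

With these inputs the bot share is higher among logged-in accounts than among tweeting ones, which is the opposite of the parent's intuition. Different assumed human counts would flip it back, which is the point: the direction isn't obvious without internal data.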
A metric they've artificially inflated by gating tweets, which works to their advantage when calculating spam. With that in mind, I think I'm more inclined to look at spam as a percentage of active tweeters and ignore lurkers.
I thought the parent was criticizing Twitter's active monthly user definition, which only includes people who have tweeted in the past 90 days. The article used this definition of active use as well.
Twitter requires users to log in before lurking so their definition of activity is intentionally selective. I'd be surprised if Twitter doesn't know how active their users actually are, even the lurkers.
I read lots of tweets and don't have a Twitter account, or at least one that I've logged into in the last 10 years... The philosophical question seems to be, "am I a Twitter user"?
You could probably argue that most of the world reads Twitter and hence are users, account or none. It's that pervasive.
But then there's the next question: "am I a user that reportedly matters to Twitter's business?". What people are trying to land on, in light of Elon's tweet that the deal is on hold pending investigation of Twitter's metrics reporting, seems to be a framework for carving out what exactly constitutes a user that brings the platform revenue that shows up in quarterly reports and hence would directly relate to the tangible value of the enterprise.
In reality, nobody knows what numbers are being thrown around behind closed doors. This article is just one framing.
It's not an active account if by "active" they mean "generating content". While Twitter isn't a typical content aggregation site like YouTube or Reddit, tweets are still "content" in the sense that they drive further user engagement on the site.
Words used to mean things. The current HN submission title just says active, heavily implying accounts with any kind of activity (e.g. like, follow/unfollow), not "users who Tweet".
Sure, clickbait headlines are the norm and the devil lurks in the details, but still, many comments have been spent on this, because it's clearly misleading.
~80% of email is spam, it doesn't surprise anyone, because it's so cheap to send spam. Similarly it's easy to create fake accounts and spam, yet it doesn't mean much.
Who's counted as "engaged"? The people reading, or only the people writing? More to the point, if Twitter moved to a subscription model, would zero lurkers buy in?
Seems like social networks aren't interested in counting those that don't use all potential features of the platform. I'd say a lurker/ghost member is definitely an active account.
I would say that if someone is able to be advertised to (since that is what makes the business money) then they should be counted. So yes, there should be no requirement to tweet to be counted.
"Active user" is a common industry term with a well-defined meaning. It's misleading to use it to mean something else, particularly when there are a number of more appropriate choices, e.g. "20% of Twitter posters".
The article clearly defines those accounts as "active" because it's the only way an external observer can somehow isolate an "active" group. Only twitter can know how many users are "lurkers".
And since they are trying most probably to get some PR for their company, they use their specific definition of "active Twitter account".
When the context is:
- Twitter determines the active status of an account using logins
- People are wondering about the % of active users as defined by Twitter's metrics
and you then use your own definition of active, write only a one-liner on the difference, offer no reflection on the impact it might have, and give no warning that you are answering a different question,
then my conclusion is that you want people to make this mistake.
> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.
Made me laugh because you had to add it and made more effort than the author of the article to prevent the confusion :D.
Interesting. This could be a bracketing error, because I read
> it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit
> Twitter’s definition of mDAUs (monetizable Daily Active Users)
As implying that they think accounts that haven't tweeted in the past 90 days don't fit Twitter's mDAU definition. Given the placement of the qualifying phrase, I think that's a reasonable parsing of the sentence, but I see your point that they could be trying to imply their set doesn't fit the definition. If so, that sentence is very badly constructed.
The full quote doesn't do SparkToro and Followerwonk any credit:
> Followerwonk selected a random sample from only those accounts that had public tweets published to their profile in the last 90 days, a clear indication of “activity.” Further, Followerwonk regularly updates its profile database (every 30 days) to remove any protected or deleted accounts. We believe this sample is both large enough in size to be statistically significant, and curated to most closely resemble what Twitter might consider a monetizable Daily Active User (mDAU).
The fact that they don't even consider the concept of a non-tweeting lurker to be an mDAU brings their entire analysis into question. Let's face it - Twitter is an emotionally-charged enough place, and tweets have such a way of living forever and being taken out of context, that there are many who use it to consume (and perhaps Like) content but will not tweet publicly. These people are still viewing and engaging with advertisements! Twitter absolutely should consider them monetizable!
But of course, engagement data on lurkers is internal only, and Likes data counts against global API caps: https://developer.twitter.com/en/docs/twitter-api/tweets/lik.... Which means that SparkToro and Followerwonk are incentivized to ignore these users. That they do ignore them, and don't address it anywhere in their methodology, is highly suspect.
The article is just clickbait. The title is obviously clickbait (based on your edit you've realized that "active account" !== "accounts that tweet"). Then they try to define active account:
> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”
Ok, but "consuming the activity on their timeline" is essentially unknowable outside of Twitter, since you can't see what tweets people are viewing. It turns out they're trying to infer this through some other signals like follower count, etc. But you can imagine why that might be sketchy.
Then they constrain the analysis:
> A more fair assessment of Mr. Musk’s Twitter following would only include accounts that have tweeted in the past 90 days
Let's be real, if you look at a list of Elon tweet replies, they might as well all be spam. Just search @elonmusk and sort by latest. Then compare that to the sorted tweet replies under an actual tweet. IDK how many millions of dollars and man-hours went into the AI that sorted this list, but it seems to just be putting the blue checks at the top and shrugging at the rest. I doubt this three-man team is doing any better at spam detection.
For manipulation / spam purposes I don't really care about accounts that don't actively post/like/retweet/follow. The mDAU isn't useful at all for determining if the activity on Twitter is done largely by bots.
I do wonder how "fake" is calculated. Is @tweetsfrommydog fake? It's a real person making tweets that are funny and provide value to the platform, but it's not a real person as an individual tweeting their personal thoughts. Are corporate accounts or parody accounts fake?
It is valid criticism because the context of this article is that Elon Musk wants to know whether Twitter's own claim of ~5% fake/spam accounts is accurate. We do really want an analysis that investigates that precise question and not a related one.
According to Matt Levine, that's "not how any of this works". The $1B is if he could not secure financing, but it appears we are now past that point. The relevant question is whether the Twitter board wants to sue in court to compel a sale.
Given what Musk does to the personal lives of his opponents, I'm not sure I would want to fight him. But given how many laws and rules he's broken at this point, I think there is a clear failure of justice if he can just do whatever he feels like without repercussions due to his common popularity.