Invitation for technical discussion on next-generation Thunderbird (Semantic Desktop: Pyramid of Users including for Privacy)

Paul Fernhout pdfernhout at kurtz-fernhout.com
Sun Apr 30 17:01:40 UTC 2017


On 28.04.2017 09:49, Gervase Markham wrote:
> On 25/04/17 04:26, Paul D. Fernhout wrote:
>> So, for me, a good assumption is that people will eventually have
>> billions of message items comprising terabytes of data in their local
>> (shared-with-email) store.
> 
> I don't think that's a good assumption at all, because the "delete"
> button exists. Who has access to their text messages from 10 years ago?
> I think I do, but then I'm a data migration geek. I doubt one person in
> 100 does. People don't want to manage this much data.

It would be helpful to clarify what you mean by "People"? All people? 
Almost all people? Most people? Some people? A few people?

As an example of why such clarity matters, one of the biggest mistakes
Mozilla made in the Firefox UI redesign was misunderstanding the pyramid
of users of any application. That mistake is part of the reason for
Firefox's falling market share and will likely result in massive
(negative) changes for Mozilla as search revenue contracts dry up.
Ideally Thunderbird would not repeat that same mistake.

Essentially, in very round numbers, there is a pyramid of users for any 
major extensible application:
* <0.1% of an app's user base are the developers
* 1% of the user base are power users who customize the app for
themselves, make plugins, answer deeper questions, and strongly
advocate for it; they also test heavily through their own use, report
bugs, and may make some improvements to the core
* 9% of users are fairly savvy and also answer simpler questions and use 
at most a few of the customization features
* 90% of users just use it out of the box because it is popular and well 
maintained

One of Mozilla's major mistakes with Firefox (in order to chase Chrome's 
user base) was to say, essentially, we don't need to support the 1% of 
users who want to customize it and heavily use it in new ways. Mozilla 
decided to focus on the 99% of users because that is easier and 
customization just causes confusion for some users and supporting all 
the legacy customizations of the 1% would be hard (and Mobile and IoT 
were the new shiny things money went into instead). And guess what -- 
that 1% of Firefox power users got really annoyed and started leaving 
the platform (at least emotionally). And it turns out, there go all
the volunteers who had made Mozilla a success in the first place by
evangelizing Firefox and testing it and reporting bugs and creating an 
ecosystem around it. For example, I used to install Firefox for family 
and friends on desktops -- now I just tell people to buy a Chromebook.

TB:NG is facing similar design decisions right now. Is TB:NG for the
Thunderbird power users who care about maintaining Thunderbird, some of
whom have archives of millions of messages? If we decide not to support
power users, then why bother with doing more than reskinning Nylas Mail 
2.0?

To make a point, this is my first time posting to this list from my ISP's
RoundCube webmail from my Chromebook instead of Thunderbird (although I 
have used that webmail service on-and-off for years from laptops, BCCing 
myself like now so Thunderbird POP on my desktop eventually gets a copy, 
given Thunderbird's lack of a server component). RoundCube works for 
email. It's a feasible solution for email for most people -- even if it 
is not everything I want.

=== Deletion takes energy

Deletion is management! If you want to make things easy for users, 
either delete everything automatically on a schedule as it comes in or 
delete nothing ever. Deciding what to delete selectively is a very hard 
problem involving a lot of human effort and anxiety. For what value, when
disk is so cheap?

That's probably one other reason you still have all those chat messages 
-- it would take more effort and emotion to delete most of them than it 
does to just keep them all and search them locally when you want.

And further, as Pat Helland wrote of append-only big data (and it
applies as well to local data, given that systems that delete stuff are
more fragile):
"Accountants Don’t Use Erasers"
https://blogs.msdn.microsoft.com/pathelland/2007/06/14/accountants-dont-use-erasers/
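
To illustrate the append-only idea for a local message store, here is a
minimal Python sketch. It is only an illustration under my own
assumptions, not Pat Helland's design or anything TB:NG has decided: the
class name AppendOnlyLog and its methods are hypothetical, and a real
store would append to a file or database rather than an in-memory list.
"Deleting" a message just appends a "hide" entry, so the original is
still there if the user guessed wrong.

```python
class AppendOnlyLog:
    """Hypothetical append-only message log: entries are added, never erased."""

    def __init__(self):
        # Assumption: an in-memory list stands in for an append-only file.
        self._entries = []

    def _append(self, kind, message_id, body=None):
        entry = {"seq": len(self._entries), "kind": kind,
                 "id": message_id, "body": body}
        self._entries.append(entry)
        return entry["seq"]

    def add(self, message_id, body):
        """Record a new message."""
        return self._append("add", message_id, body)

    def hide(self, message_id):
        """Record that the user no longer wants to see a message."""
        return self._append("hide", message_id)

    def visible(self):
        """Replay the whole log to compute what is currently shown."""
        shown = {}
        for entry in self._entries:
            if entry["kind"] == "add":
                shown[entry["id"]] = entry["body"]
            elif entry["kind"] == "hide":
                shown.pop(entry["id"], None)
        return shown

    def history(self, message_id):
        """Everything ever recorded about one message, hidden or not."""
        return [e for e in self._entries if e["id"] == message_id]


log = AppendOnlyLog()
log.add("m1", "Minutes of the historical society meeting")
log.add("m2", "Some newsletter")
log.hide("m2")
print(sorted(log.visible()))   # only m1 is shown
print(len(log.history("m2")))  # but both entries about m2 are still kept
```

The point of the sketch is that hiding is cheap and reversible, while
the log itself only ever grows.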

I guess I'm naturally biased towards preservation rather than deletion. 
I'm also a trustee for my historical society -- and even as we
keep up some old buildings, I'm concerned about the loss of our digital 
history as well and the fact that almost no one seems to care about it. 
Yet, the 1% who do care (often not the financial 1%) can make a big 
difference for the community.

Theodore Sturgeon (who wrote "The Skills of Xanadu", previously
mentioned, which inspired Ted Nelson and hypertext and so on) also
coined a law: "ninety percent of everything is crap".
https://en.wikipedia.org/wiki/Sturgeon%27s_law

I won't disagree that, say, we could make the web itself a lot easier to 
use (and more accurate) for most users by deleting 90% of it. :-) But 
rather than censorship, we use search engines and social processes and 
semantic tagging and other tools instead. By not deleting the 90% of the 
web most of us don't need, we avoid deleting the 10% (or less) of 
content that really matters if we had guessed wrong. TB:NG as a 
comprehensive messaging platform should adopt the same strategy.

=== Video is an increasingly bigger part of messaging

On-demand movie apps and video conferencing apps are typically 
implemented through processing streams of short messages. Any regular 
user can reach millions of messages handled (or stored) fairly quickly 
these days. In round numbers, a 1 GB (1,000,000 K) movie or video 
conference stream divided by 100K per message for a video frame is 
10,000 messages. Watch 100 movies a year or have 100 video conferences 
per year for ten years, and that is 10,000,000 messages handled.
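
The round-number estimate above can be checked with a few lines of
Python. This is just the same arithmetic written out; the function and
parameter names are made up for illustration, and the 100K-per-message
figure is the same rough assumption as in the text.

```python
def messages_handled(stream_kb=1_000_000, message_kb=100,
                     streams_per_year=100, years=10):
    """Rough count of messages handled, using the round numbers above."""
    # 1 GB stream / 100K per message = messages in one movie or conference
    per_stream = stream_kb // message_kb
    return per_stream * streams_per_year * years


print(messages_handled(streams_per_year=1, years=1))  # 10000 per stream
print(messages_handled())                             # 10000000 over ten years
```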

Now you may say, but wait, TB:NG does not need to handle streaming video 
messages. Well, that's a design choice. But why make that choice without 
discussion -- especially given streaming video is an increasing part of
corporate communications and now a new feature of Slack? Why not think 
about how text messaging (including annotations, summaries, chat 
messages, and of course emails) interacts with video?

=== Privacy value in local stores to keep search local

And of course, why shouldn't people be able to subscribe to thousands of
busy mailing lists for years, just in case they want to search them
later? Why
should we assume key information will be on the web versus discussions? 
My secret weapon as a software developer years ago was using DejaNews to 
look up technical information from Usenet. Now I use a search engine and 
look at discussions in web forums, blogs, or issue trackers. But why 
should I not want to have a local copy of all Bugzilla issues for all 
apps I use, continuously updated via the equivalent of an RSS feed? Why
should I want to be dependent on someone else's web servers for 
searching for issues of interest?

From a privacy standpoint, it makes much more sense to subscribe to lots
of lists and also download their full archives locally and search 
locally than it does to give up private (even strategic) information 
about what I am thinking about at the moment by using a search engine. 
When I was at IBM Research around the turn of the century (among other 
things, helping develop a forerunner to Siri called the IBM Personal 
Speech Assistant, implementations of XML standards like XSL-FO, and a 
digital video system that embedded HTML in videos), I always had to 
think twice about whatever I typed into a search query as far as what 
IBM Confidential information a search might disclose. Makes me realize a 
potential market for a TB:NG handling billions of messages may actually 
be big research companies who want to keep searches in-house.
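
As a rough sketch of what "search locally" means in practice, here is a
tiny inverted index in Python. Everything in it is hypothetical (the
LocalIndex name, the message ids, the naive tokenizer), and something
this simple would not scale to billions of messages; it only shows that
the query never has to leave the machine.

```python
import re
from collections import defaultdict


class LocalIndex:
    """Hypothetical in-memory inverted index over locally stored messages."""

    def __init__(self):
        self._postings = defaultdict(set)  # word -> ids of messages containing it

    @staticmethod
    def _tokens(text):
        # Naive tokenizer: lowercase runs of ASCII letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, message_id, text):
        """Index one downloaded message."""
        for token in self._tokens(text):
            self._postings[token].add(message_id)

    def search(self, query):
        """Return ids of messages containing every word in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        matches = set.intersection(
            *(self._postings.get(t, set()) for t in tokens))
        return sorted(matches)


index = LocalIndex()
index.add("bug-1", "Crash when opening a large POP folder")
index.add("bug-2", "POP download stalls on slow links")
print(index.search("pop crash"))  # the query is answered entirely on this machine
```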

While I can appreciate the limits of small companies in maintaining a 
reliable IT infrastructure, I still don't get why many bigger companies 
use Slack instead of hosting chat in-house with, say, Mattermost, or why 
the smaller companies don't host Mattermost with third parties. I 
outline some of the privacy risks of Slack here:
http://pdfernhout.net/reasons-not-to-use-slack-for-free-software-development.html#Slacks_privacy_policy_guarantees_very_little

Now that it looks like ISPs can sell user browsing data, thinking about 
alternative ways to support privacy by just downloading everything and 
searching locally seems worthwhile to me. But that means we need a 
personal messaging system that scales.
https://www.eff.org/deeplinks/2017/03/five-creepy-things-your-isp-could-do-if-congress-repeals-fccs-privacy-protections

My family has a personal library of 1000s of books. One of the 
surprising values in that library in a digital age is knowing that 
corporations and government are not looking over my shoulder when I read 
those books. Sadly I'm increasingly allergic to dust from books, so I 
use a Kindle more and more -- and I don't like that situation. Why not 
just download all of, say, Project Gutenberg and then choose what I read 
privately on a great e-reader? And then why not download all discussions 
about all those books and then privately follow the threads I am 
interested in?

Also, I have been on mailing lists where email archives went away. Why 
not get archives now, before, say, Google decides to paywall or delete 
or even alter them?

=== Organizing all our messages is worthwhile

Of course, I am also all for writing great software that helps in
prioritizing what we look at, of what we have, if we have billions of
messages stored locally. But that is a different issue (and one involving
Library Science and "The Discipline of Organizing" and more).
http://disciplineoforganizing.org/
"This book changed my view of organizing as the dull, tedious task of 
putting things in order into thinking of it as the marvelous study of 
how people add structure to things ... Glushko is the master of the 
discipline of organizing, painting the fascinating story of how 
different organizational schemes change our behavior and our thoughts."

So, is organizing millions or even billions of messages a challenge?
Yes. But for the right people it is an interesting and worthwhile 
challenge.

=== Most users don't care, but that does not matter

You'd be right to say 90% to 99% of users don't care about privacy. 
Sigh. But as above, it is the 1% of users who do who could make TB:NG a 
success by evangelizing it, testing it, fixing it, and so on.

--Paul Fernhout (pdfernhout.net)
"The biggest challenge of the 21st century is the irony of technologies 
of abundance in the hands of those still thinking in terms of scarcity."

