Reading this article left me wondering just where the 200 million tags this guy needs are supposed to come from. Manual curation?! Automatically derived by file extension? File headers? What is the cost of opening a file, parsing its filetype, comparing against a reference, writing it to a database, etc.? How is that cheaper than current indexers (which all seem to work fine, btw)?
I rarely waste effort trying to remember filenames in the first place, much less needing some expensive tag curation to locate files. I simply use a bit of discipline organizing the directory structure(s). If I do ever need to actually search for something, it will be constrained to a narrow subset of directories and ignore the other 199.9 million files or whatever.
Moreover, I just don't have the problem of searching for filename fragments to begin with. Nor do I see a reasonable way to use a whole host of powerful unix techniques with a whackadoodle tiny tags filesystem. Or the need to produce a list of 20 million images in 2 seconds. What use would that be anyway? I'm not going to read a list like that - I'm going to operate on it.
Please correct me if I'm wrong, but the versatility of `find` is far more powerful if you actually need to handle/sort through that many files, and something like `fzf` probably curtails all these complaints in the first place.
If I had a penny for every time someone on HN responds with something like this - "just become more disciplined and you don't need X" - I'd be a millionaire. Doesn't matter what it is, type systems, memory safety, a better UI for Git… there's always someone ready to chime in with how their workflow means these problems don't happen, or, even better, asking the question why would anyone need this?
Yes, why would anyone need better search or a faster, easier to organise file system? I can't think why.
A better search and tagging can be valuable tools. But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.
Being able to think about how to order your files is a fundamental skill in this day and age and doing this on a big scale does indeed require discipline.
IMO it is just a false hope to think tags would help with the root cause of a lack of care about the data.
> Being able to think about how to order your files is a fundamental skill in this day and age and doing this on a big scale does indeed require discipline.
I'm not sure that's true, because no one does that on mobile devices. Some people have even suggested that young people who've grown up with mobile phones struggle with filesystems because they have no experience of file management despite having plenty of experience of computing.
Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.
The same is true to a lesser extent with online office suites. You don't need to know the name of a file in Google Docs - you refer to things by their titles.
Moving from file names to tags, or any meta data really, would be possible. Whether it'd be better is a matter of opinion.
> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.
I think the way Android does it is completely the wrong way around, as it makes everything centered around apps, not documents. That makes you a slave to the app, which in turn gets used to force you into using cloud services. It goes so far that you don't even have control over your files anymore: if you delete an app, the files created by that app get deleted with it, and you don't even get a warning.
I rarely use Android, but every interaction with it has been god-awful. And from what I hear, new versions of Android will start making tools like SSHelper impossible, so you can't even work around the madness anymore.
There are two main places where Android apps store files: within an application-private slice of the main filesystem, and the shared /sdcard (which, as its name implies, was originally a removable SD card, but nowadays is just another slice of the main filesystem). What the parent is complaining about is the former (and a per-application directory on the latter), which is removed whenever the application is uninstalled (or the user tells Android to clear the application state); and unless you have root on the phone (or are looking at the per-application directory below /sdcard), or the application explicitly exports it, it's not even visible to any file explorer.
> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.
That is an inherent weakness of mobile OSs and prevents them from competing with traditional computers in workplaces. It works as long as you never want to do anything complicated with your computer, but most people working in offices need the organization offered by a file system, not to mention how crippling this would be in a development environment.
We have systems, but they don’t need to be “files”.
On development environments: we manage all our code under version control, our IDEs index all the sources and most of us navigate in pretty unusual ways. I guess we would be fine without any direct access to the fs as long as we had a layer that gave back streams of the files we look for and a way to commit changes to git. For a lot of devs I think everything is already abstracted by their IDE.
A lot of people work the same way. For instance, navigating exclusively through links sent to them, their office app’s “recent documents” list, and the “open” dialogue pointing at the folder where they stored all their docs, without necessarily understanding where exactly it is (unless it’s simply their “Desktop” folder). I think a ton of adults get by with that in their everyday job.
I’m still largely pro-file-systems, but your comment made me think. Here’s some loose thoughts, just to get into the problem space.
I navigate my codebase at work primarily using the name of the entity I’m aiming to inspect or work on next (e.g. “Popup” or “uiEventStream”)
- usually using fuzzy search. This matches by file name, but feasibly could operate by entity symbolic name to the same effect
- increasingly using VSCode’s “find references”, which already operates by entity name (at least that’s how the UI appears)
However, I also use the file tree, because important and meaningful application structure is encoded in the tree. The tree (and its node names) gives me sections of the app, collections of entity types, and hints about how they’re connected. This is invaluable. It helps new colleagues learn the application structure and it helps old hands get to what they want faster. It forms a “silent” background context against which all entity-based decisions get made.
The structure could be encoded as tags, with all the files dumped in a single directory. I have yet to see a tagging interface work as well for tag hierarchies as a directory tree works.
Tag hierarchies are a specialised use, or extension, of generalised tags. Tagging systems typically emphasise (through UI and explanatory notes) the unstructured approach. Structure and unstructure are basically opposed, so making a single UI work for both seems problematic. Educating users, most of whom won’t use the word “taxonomy” in daily life, how to use a tool supporting a model with an almost inherent self-contradiction seems like a mammoth task.
To add to that, there is IMO a renaissance of “fuzzy” navigation, with macOS’s quick search (the Sherlock/Quicksilver clone) and tools like Obsidian spreading the model of moving to apps and files by a set of partial keywords.
I agree with you that a tree structure is important, the same way people still look at trees to navigate pages on most sites, going through categories, sub-categories, etc. More and more, the tree is just dissociated from the actual representation, the same way URLs don’t exactly match the site structure on so many sites.
I’d imagine the tags would have some hierarchical relations if we were to use tags exclusively.
Many programming languages have a tagging system for that: namespaces, packages, etc.
Often they have a close mapping to filesystem, but with IDE support that isn't strictly needed. (In reality however it currently is, as the filesystem is the language agnostic common interface between version control system, IDE, etc.)
> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.
...and completely cripple the user's creative powers, making him a passive consumer. You cannot get serious work done on mobile, and that is as true now as it was ten years ago.
> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.
It's ok when you have a file type exclusive to an app AND the app provides export functionality. But it breaks as soon as you need to share files between apps.
Arguably, this is more about implementations than the principle, but ask any musician about the iOS music apps and they'll tell you they're great... except the file management.
> Mobiles have shown that it's possible to remove the concept of files entirely from the user facing side of an OS.
File browsers got very popular very quickly on Android and there is a bundled one on there now. So mobile has shown that despite the designers' attempt to deprecate files, it didn't work out.
I strongly doubt that regular users ever bothered to use Android's file system.
If there are even files involved, most people will just use the app that received or created that file to interact with it. Especially since apps sometimes put files in completely random places, that even I as a technical person have trouble finding.
> I'm not sure that's true, because no one does that on mobile devices.
Even my mother sorts her pictures into gallery folders. Granted – a lot of sorting on mobile happens automatically (per app).
But "consumer devices don't need an accessible file system" is not a good argument to extrapolate that to machines people use productively. Don't get me wrong, I do think we can improve filesystems in terms of usability – I just don't think having some of it in your head will go away any time soon (and if it does, it will not be an improvement).
My point is, that in a productive environment the filesystem becomes part of your brain, just like a carpenter's workshop becomes part of their brain. This is not a bug, it is a feature. You don't need to think about where things are, because you arranged your environment in a way that suits the tasks you are doing 99% of the time. Now if someone came in and arranged the tools for you, moved them around automatically by their own logic, chances are that it doesn't fit your current task, your personal preferences, etc.
Moving from a world where you blindly know where something is, to one where you have to guesstimate what another entity "thought" would be an appropriate place for the thing you are looking for is not progress. If you were to make an automatic system that can read thoughts and put the file precisely in the place people are expecting it to be – that would be an improvement, but everything else not so much.
The key difference for me here is the one between productive work and consumption: If you are in a space where you are consuming (e.g. food at a buffet) it is totally acceptable to not have it your way. Who cares if it takes you 5 seconds more to find the balsamico for your salad? The same goes for tasks that you don't do productively, like looking at pictures on your smartphone – who cares if it takes you a minute more to find a thing? But if you are a professional photographer and you look for that one picture you took in a specific session 4 years ago, not a lot will beat a well built folder structure.
Because you don't really have valuable data on a mobile phone. It's mainly just photos, and they are all in one folder ordered by date. So adding tags to that is a feasible strategy.
Everything else on the phone most people don't consider as permanent data, so it's not worth organizing it. Your contacts are in the cloud, so are your chat logs,... And app configuration data can always be recreated with some effort.
Let's remove street addresses altogether, because that requires hard memorization, right? Instead let's all put any house of a city all along a contiguous space and refer to them via a description of their appearance.
I don't disagree with your premise that mobile abstracts away the file system. However many people do put rather a lot of effort into organizing files on mobile. I clean and sort my downloads folder just like on my computer, but more importantly the majority of people I know use folders to handle the now thousands of pictures we generate on mobile.
> Some people have even suggested that young people who've grown up with mobile phones struggle with filesystems because they have no experience of file management despite having plenty of experience of computing.
Currently teaching introductory programming at college level, can confirm.
> But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.
I personally don't see folders (or traditional file organization) and tags as competitive technologies. I think they're complementing each other very well. I generally put my stuff under well defined folders, but tag the notes (or the files if I have the capability).
95% of the time, I can just go to the folder and get what I need, but sometimes I need to search for something when I don't remember whether I have it or think I misplaced it. In that case file indexing and search really come in handy.
I apply these methods very enthusiastically in at least three places: Pagico, Evernote and Tiddly Wiki. All three have hierarchical organization models (it's fixed in Evernote, not mandated in Pagico, and Tiddly is just free floating by nature), but they're meticulously tagged. I rarely use search in any of these. However this doesn't mean tags save me serious time or effort. I think both ways of organizing are very useful, at the end of the day.
As a pet peeve, I really don't like these strongly worded titles and posts. You don't have to kill something that works well in order to enhance it with something that makes it better for some or every use case.
I’m on the side of people who gave up on organization entirely, and have a stream of scanned documents with only the scanned date as the title for 99% of them.
The docs are OCRed and I retrieve them mostly by search, with tags for a few critical docs and by approximate dates for the rest if search by content fails completely.
This is viable, and there’s no way I’ll go back to manually setting up tags and names on all the docs that we scan in case it’s needed some day. It’s like asking everyone to do inbox zero with their mail, why would you put your time in the hand of uncontrolled external forces feeding you more info day after day?
> But no matter how good search gets, it will not stop users from putting files untagged into one big junk folder.
The difference is that it's more likely the user will notice other tags that apply to each thing that would have been moved to 'junk' than if they had to coarsely categorize all the junk in advance.
I’m all for optimising usability but there comes a point where one over optimises. While I love the app-orientated design of Android and iOS (ie you share data between apps rather than apps sharing a file system), they are effectively toy OSs for toy devices. Sure some people are hugely productive on them these days but they’re the exception rather than the norm. Whereas I depend on a file system to organise my data. In fact the file system is directly interpreted as name spacing in a number of different programming languages.
I get that most people are either too lazy or too technologically inept to own a computer but this race to the bottom to support everyone who doesn’t give a crap has to end somewhere. You see it with Windows UI in Win11 removing popular options so the designers can streamline the UI. You see it with this article too. Some people are always going to struggle simply because they are forced to use a computer day in and day out rather than wanting to use it. But designing a unified system that panders to them while servicing everyone just makes the experience shit for those of us who genuinely know how to use a computer and depend on these features.
To use a car analogy (because for some reason people love comparing cars to computers…) I have no issue with track cars being sold without air con, a radio, etc because they’re a toy not a tool. So you optimise for that single purpose: racing on the track. But I sure as hell want the kitchen sink thrown into my family car.
I sometimes wonder if the problem isn’t computers but rather our assumption that everyone should be able to use a computer without training. If your job depends on using a file system correctly then you should be trained on that in exactly the same way that you’re taught how to use any of the specialist applications. In fact pre-computers, companies did exactly that with training their staff in how the filing system works!
By analogy: if you were to store all your paper copies of bank statements, bills, mortgage papers, etc... Would you just dump them in one big pile in the middle of your living room, or would you sort them into vaguely themed folders to impose some organisation? Hierarchical filesystems are valuable to those that want to organise data in ways that tags can only emulate.
Can I shout "Plumber invoice 2021" at that pile and the right document will come flying out? Or just "invoices 2021"? A pile that could do this would probably be fine for me.
The difference is that if a file isn't sorted in a file system, it just sits on your Desktop or Download folder and is still easily visible. If you use tags and your file doesn't get tagged properly it just is lost in the pile and impossible to retrieve.
I find tags work very well for discovering other people's content on the web, but really don't help much with organizing your own data.
One operation that seems especially problematic with tags is copying. If I want to modify something on a file system and keep a backup, I just copy the directory tree before I do my modifications. If I copy something with tags, I end up with two things with the same tags, which is not very useful for keeping them separate.
Also how do you deal with removable media (USB, DVD) in a purely tagged system? What if the tags on the media conflict with your own tags? Once you allow filtering by device, you are just reinventing the file system again.
The solution should be a combination of folders and tags. E.g. imagine a folder that just contains all your photos without any substructures. It would be easy to just select e.g. all "2018 photos", "birthday photos" or "photos of your parents" in there without needing specific subfolders for those things (especially since those subfolders would conflict with each other).
But I agree that I wouldn't want those pictures to be mixed with e.g. other random screenshots or drawings that I made, even if they could be separated by tags somehow. So folders as a hard separation would still make sense.
> "The difference is that if a file isn't sorted in a file system, it just sits on your Desktop or Download folder and is still easily visible. If you use tags and your file doesn't get tagged properly it just is lost in the pile and impossible to retrieve."
Wouldn't a shortcut to a view of un-tagged files sorted by recent basically serve the same role as an unsorted Downloads/whatever folder?
Anecdotally, my short answer to this is 'no'. At least a Downloads folder has a loosely defined purpose. I have often gone back to try and find something I downloaded that's no longer available (or too big to download again), and it's tricky at best to find something already.
Now, imagine how many files are being shifted around on a regular basis. Temp files, cached downloads, automatic installs of software updates, all sorts of crap your IT department may remotely put on your work laptop, etc. Sorting by date isn't all that useful. At least with folders, we have an 'enforced' blast radius if random junk shows up; my temp files stay in my temp folders, install files stay where they should, system files stay with my OS, etc.
And don't get me started on how things can end up in odd places on mobile device OSs.
The lost file might not be recent and it might not even be untagged, it might just be tagged incorrectly (e.g. `vacation-2019` vs `vaccination2019`).
The nice thing with a hierarchical directory structure is that every file has a place, even if a file is misplaced or misnamed, there is a good chance it will be near where it needs to be.
With tagging you don't really have that, it's just a pile and you have to hope that you can remember a query to make the file show up again.
The biggest problem however is that I don't see how you can actually work within a tagged system. How do you extract a `.zip`? How do you copy a file? How do you deal with removable media (DVD, USB)? Finding a file and handing it over to an app is not the only way we deal with files.
Your local file system is a work environment, where you are the one creating and modifying files. Tagging seems to work best when it comes to exploring other people's content, but that kind of exploration is not something I do on my local machine with my own files, since I already know where I put them.
As long as you know exactly what's in the pile I guess? A good folder/directory structure tells you pretty quickly what you've got available and makes it easy to browse and explore even if you've got no idea what you might find going in.
It gets annoying though when you have just single invoices or documents that don't quite fit in your structure. I eventually started throwing all received documents in one folder and used the name as tags basically. E.g. 20210214_plumber_invoice.pdf or 20181204_someshop_invoice_playstation.pdf...
(And yes I use the filename for tagging because I don't trust e.g. Windows tagging system to be there forever or be copied properly onto other operating systems.)
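For anyone curious how far filename-as-tags can go, here is a minimal sketch in Python. It assumes the `YYYYMMDD_tag_tag.ext` convention from the examples above; the helper names `parse_tagged_name` and `matches` are made up for illustration and are not part of any existing tool.

```python
from datetime import date, datetime
from pathlib import Path


def parse_tagged_name(filename: str) -> tuple[date, list[str]]:
    """Split a name like 20210214_plumber_invoice.pdf into a date and tags."""
    stem = Path(filename).stem            # drop the extension
    date_part, *tags = stem.split("_")    # first field is the date, the rest are tags
    return datetime.strptime(date_part, "%Y%m%d").date(), tags


def matches(filename: str, *wanted: str) -> bool:
    """True when every wanted tag appears in the filename."""
    _, tags = parse_tagged_name(filename)
    return set(wanted) <= set(tags)


names = [
    "20210214_plumber_invoice.pdf",
    "20181204_someshop_invoice_playstation.pdf",
]

# "invoices 2021": filter on a tag and on the year encoded in the name
invoices_2021 = [
    n for n in names
    if matches(n, "invoice") and parse_tagged_name(n)[0].year == 2021
]
```

Because the tags live in the name itself, they survive any copy between operating systems, which is the whole point of the approach.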
Do you mean Joe the Plumber from London, UK or Joe the Plumber from London, Ohio?
We all know how this ends up. It ends up being like Google, where the search engine uses word embeddings and the like and removes words from your search queries or replaces November with December because they are both months, so you can substitute one for the other, right?
They could help if the tags were 'Occupation = Plumber' and 'Name = Joe'. Now a search for all files where both tags are present will get you Joe the Plumber's invoices. If you want everything from any plumber or from any person named Joe, then just leave off one of the tags from your search. It is very much like when querying rows in a relational database, just adjust your WHERE clause.
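A rough sketch of that idea in Python, treating each file's tags as key/value pairs and a search as an AND over the keys you supply. The catalog contents and the `query` helper are hypothetical, purely to illustrate the WHERE-clause analogy; this is not how any particular tagging system implements it.

```python
# Hypothetical catalog: file name -> key/value tags
files = {
    "invoice_0412.pdf": {"Occupation": "Plumber", "Name": "Joe"},
    "quote_roof.pdf": {"Occupation": "Roofer", "Name": "Joe"},
    "invoice_0519.pdf": {"Occupation": "Plumber", "Name": "Ann"},
}


def query(catalog, **wanted):
    """Return files whose tags match every key=value given (like ANDed WHERE terms)."""
    return sorted(
        name
        for name, tags in catalog.items()
        if all(tags.get(key) == value for key, value in wanted.items())
    )


joe_the_plumber = query(files, Occupation="Plumber", Name="Joe")
any_plumber = query(files, Occupation="Plumber")   # leave off the Name tag
any_joe = query(files, Name="Joe")                 # leave off the Occupation tag
```

Dropping a keyword argument widens the search, the same way dropping a condition from a WHERE clause does.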
I agree. Unfortunately, right now the 'well structured relational database' is completely separate from the file system. Didgets was designed to combine the two into a single coherent system so that you can't update one without the other. By 'combining' I don't mean that I did what WinFS tried to do and just took a filesystem and a database and stuck them together somehow. I built a completely new system from the ground up that incorporates traditional filesystem features (block allocation, stream management, metadata control, folder hierarchies, etc.) with solid relational database features (schema, tags).
If your memory is anything like mine, chances are you momentarily don't remember the word "plumber".
Hierarchical structures, while inflexible and sometimes prone to mis-categorization, provide navigational cues that tags don't provide. It's almost like with GUIs vs CLI - if you know what's already possible and want to express yourself precisely, you want a CLI (tags with lots of Boolean operators to precisely include/exclude). And conversely, if you don't know what's already possible, but could figure it out if you have the options laid out in front of you, then GUI (a hierarchy with all choices already laid out) will be more relevant.
I keep passports, degree certificates, deeds, health insurance docs.. the things I would grab if I was running out the door, in a single box file.
Everything else is basically unsorted, maybe vaguely sorted by date of putting on top of the pile, by placing things together after searching for them once, or by 'I think I know where I saw it'.
I have tried putting everything in themed folders, it's a waste of time. The time spent searching for something is much less than the time spent organizing everything in advance. The modal piece of paper will be thrown away after a few years without ever having been needed.
Is that not effectively a first-level hierarchy with no further subdivisions? The "important stuff" category and "everything else" category are already a useful taxonomy, even if very minimalist.
One of the biggest problems with folder hierarchies is that files can often be classified in several different ways. To take your paper statements analogy, do you organize by year, by institution, or by category? What if you have a 2002 bank statement? Do you put it in the '2002' pile or the 'bank statements' pile? Using existing file hierarchies allows you to store the digital document in the '2002' folder and then create a hard or soft link in the 'bank statements' folder, but that can be a hassle. Tags allow you to attach them to documents, photos, videos, etc. without worrying about how you might organize them. Luckily, Didgets lets you organize your files using either a hierarchical folder structure or just by using tags. It is your choice.
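For reference, the soft-link workaround mentioned above looks roughly like this in Python. It is only a sketch: the folder and file names are invented, everything happens in a throwaway temp directory, and on Windows creating symlinks may need extra privileges.

```python
import tempfile
from pathlib import Path

# Throwaway directory so the sketch does not touch real files
root = Path(tempfile.mkdtemp())
by_year = root / "2002"
by_kind = root / "bank statements"
by_year.mkdir()
by_kind.mkdir()

# The document lives in exactly one place...
original = by_year / "statement.pdf"
original.write_text("placeholder contents")

# ...and a soft link makes it reachable under the second classification
link = by_kind / "statement.pdf"
link.symlink_to(original)

same_file = link.resolve() == original.resolve()
```

It works, but you have to create and maintain one link per extra classification by hand, which is the hassle being described.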
To me, it seems best to exclude as many possibilities as I can during each step of the (naturally recursive) search. Filtering by "is bank account statement" excludes a lot more files than "was incorporated into my files in 2002", since most people only have a few bank accounts but a lot of photos, videos and other things that they create or download in a given year.
I think the best system is actually a mix of hierarchy and tags. Top-level, very broad "semantic zones" (aka is this .PDF a bank statement, a cake recipe, a textbook, or some temporary file from the browser cache) would lend themselves to being represented as a shallow hierarchy, and items within a specific semantic zone could be then freely tagged or further subdivided into a hierarchy, whichever approach makes sense for that particular semantic zone.
You assume that there are fewer bank statements than 2002 files in your argument. What if you loaded in a million bank statements in 2005 but only created a few thousand files in 2002? With Didgets, I can tell how many objects have each tag attached so I order the search to eliminate based on how likely the set I am searching for has each tag.
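A minimal sketch of ordering a search by how many objects carry each tag, in Python. This is an assumption about the general technique (intersect the smallest sets first), not a description of how Didgets does it internally; the `index` layout and the `search` function are hypothetical.

```python
def search(index: dict[str, set[int]], tags: list[str]) -> set[int]:
    """Intersect the ID sets for the given tags, rarest tag first."""
    # Sorting by set size means the running result starts as small as possible
    ordered = sorted(tags, key=lambda t: len(index.get(t, ())))
    result = None
    for tag in ordered:
        ids = index.get(tag, set())
        result = set(ids) if result is None else result & ids
        if not result:          # nothing left, no need to look at bigger sets
            break
    return result if result is not None else set()


# Hypothetical index: many bank statements, few files from 2002
index = {
    "bank-statement": set(range(100_000)),
    "2002": set(range(99_000, 102_000)),
}

hits = search(index, ["bank-statement", "2002"])
```

With per-tag counts available, the order adapts to the data, so it does not matter whether bank statements or 2002 files happen to be the rarer group.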
How many bank accounts does an average person have? Even for the most extreme cases, we're looking at the low hundreds of statements annually, at the maximum. If you're not an average person but instead a business or an archivist, then you need a custom system anyway.
I'm really not trying to criticise or diminish the value of your system. All I'm saying is that even without an additional tag (or hybrid tag+hierarchy) overlay, a hierarchical system can be quite useful as long as it's well thought-out by the user.
Having every file pollute a global namespace seems to require more discipline than the current hierarchical system where you can easily copy a directory tree without having to worry about breaking something else.
That is the main problem with these so called "solutions", they usually take more effort and discipline than the problem they originally set out to solve. The right solution is just to learn the original system properly rather than trying to invent an even worse way to work around it.
Yeah, you can see the downside in the demo video. When he shows off the search for pictures, there's a random mixture of actual photos and things like toolbar icons and whatnot. Sure, you could fix this by tagging everything and doing a more complex search, but that sounds like a lot of work and discipline, more than eg the guy doing the demo was willing to put into it.
Actually no. I wanted the demo video to be short (4 minutes) so I didn't do a lot of complex searches. I have other videos, but to show everything takes a 20 minute video and I didn't think that was a good length for an introduction.
The article proposes to replace the current file system approach (which works just fine for me, by the way, thank you very much) with something different to solve a problem that I (just like the post you reply to) have no interest in.
Better search? Sure! Improved speed of storage and retrieval? Great! But either show that it is not degrading current functionality or be ready for pushback from people suspicious that their current setups will break. My 2c.
Don't you think a tag system would require even more discipline?
What do you think happens if you make a mistake with your tags and/or there are typos in the filename? With a directory structure, you can navigate to the location and see the list of items to quickly identify what you were looking for. It is far more forgiving when it comes to poor organisation or mistakes. With a pure tag system, a file with the wrong name/tags is pretty much forever lost.
Not necessarily. Missing or misspelled tags could be discovered just like a row in a database that has a column value that is missing or misspelled can be. For example if you want all your photos to have a 'Year' tag attached for when the photo is taken, just query for all photos WHERE 'Year = NULL'. The same goes for values like names. If you see that you have 10,000 files that have 'Name = Karl' attached but only one that has 'Name = Kral' attached, then that is an easy fix.
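To make that concrete, here is a small Python sketch of both checks: files with no 'Year' tag at all, and tag values that occur suspiciously rarely. The photo catalog is made-up sample data and the helper names `missing` and `rare_values` are hypothetical.

```python
from collections import Counter

# Made-up sample data: 50 correctly tagged photos plus two problem files
photos = {f"img_{i:04d}.jpg": {"Name": "Karl", "Year": "2019"} for i in range(50)}
photos["img_9001.jpg"] = {"Name": "Kral", "Year": "2019"}   # misspelled value
photos["img_9002.jpg"] = {"Name": "Karl"}                   # no Year tag


def missing(catalog, key):
    """Files that lack the given tag entirely (the 'Year is NULL' case)."""
    return sorted(name for name, tags in catalog.items() if key not in tags)


def rare_values(catalog, key, max_count=1):
    """Tag values used at most max_count times, which are likely typos."""
    counts = Counter(tags[key] for tags in catalog.values() if key in tags)
    return {value: n for value, n in counts.items() if n <= max_count}


untagged_year = missing(photos, "Year")
suspect_names = rare_values(photos, "Name")
```

Note this only surfaces outliers if someone goes looking for them; it does nothing for a search that silently returns fewer results than it should.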
With only a few hundred files, it's easy to look at the list and spot the outliers. In a real world scenario, how would you know that a few dozen files are missing when you search for "karl" and the files tagged with "Kral" don't show up? In a small file collection that only you have access to, you might remember them and notice that they aren't part of the results, but that doesn't work for large libraries or if multiple people are collaborating.
With a directory structure at least you can look into the folder of the project, see what's inside and open the files to find the one you were looking for. If you were looking for a specific Word file and only a dozen of them are present in the folder, you can always just open all of them manually to check what's inside regardless of how poorly they were named/managed. Good luck trying to find the Word file with bad tags when searching for "*.docx" returns thousands of results.
Cleaning data for tags is about the same as cleaning data in a relational database table. Here is a demo video of how Didgets does that: https://www.youtube.com/watch?v=kqkNeU1LYEQ
Just think of each defined tag as one of the columns in the table.
If you were trying to find a physical copy of an important tax related letter, would you prefer to search for it in a folder dedicated to tax documents from that year or in a room filled with every single piece of paper that you have ever received by mail in your life?
A pure tag system only works for small libraries, it requires far more discipline by properly tagging every single file, it does not scale and it does not work well when you collaborate with other people. It works well in situations where you can automate the tagging (e.g. a collection of pirated movies) but is pure garbage for the normal files that you typically use.
It's a lot easier to tell people to place pictures of karl in the "karl" folder than it is to make sure that every single picture gets properly tagged with the word "karl". I can imagine hundreds of different scenarios where it gets tagged slightly wrong. Typos won't be easy to fix because they will simply not show up in the search when you type it. How many files with "K arl", "Carl" or " karrl" are there? no one will know.
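One partial mitigation, sketched in Python under the assumption that you can list every distinct tag value in use: normalise whitespace and case, then flag anything that is close to the tag you meant. The `near_misses` helper and the 0.7 cutoff are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher


def normalise(tag: str) -> str:
    """Remove all whitespace and ignore case, so 'K arl' and 'karl' compare equal."""
    return "".join(tag.split()).casefold()


def near_misses(canonical: str, vocabulary: list[str], cutoff: float = 0.7) -> list[str]:
    """Tags that are not the canonical one but look a lot like it."""
    target = normalise(canonical)
    return [
        tag
        for tag in vocabulary
        if tag != canonical
        and SequenceMatcher(None, normalise(tag), target).ratio() >= cutoff
    ]


# Distinct tag values found in a hypothetical library
vocabulary = ["karl", "K arl", "Carl", " karrl", "taxes"]
suspects = near_misses("karl", vocabulary)
```

This still needs a human in the loop: "Carl" might be a typo or might be a different person, and the check only helps if somebody actually runs it.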
There seems to be a lot of confusion here about Didget's tagging system. It is not meant to replace the file hierarchy, but to supplement it. With Didgets you can still organize all your files in a plain old folder hierarchy without tagging everything. Tags just provide a secondary way to search for things. So you can still stick all your photos of Karl in a folder named 'Karl' if you like.
Because such a system would entail its own drawbacks, such as a larger CPU load or a more fragile disk organization, whilst most people wouldn't really need it.
This is obviously hyperbole and you're well aware you haven't read 100 million Hacker News comments, let alone that many with exactly the same basic message.
But I was curious what this might be equivalent to in terms of time investment. A quick style guide check recommends 15-20 words per sentence for English-language written communication. Assuming the low end of that, and minimal single-sentence comments, that is still equivalent to reading the entire 14-book Wheel of Time series 369 times, and a single read-through tends to take most people several years.
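For anyone who wants to check the arithmetic, here it is spelled out. The series length is not stated above, so this sketch does not assume one; it backs out the word count that the 369 figure implies.

```python
# A penny per comment and a million dollars means 100 million comments.
COMMENTS = 1_000_000 * 100
WORDS_PER_COMMENT = 15      # low end of the 15-20 words-per-sentence range
READ_THROUGHS = 369         # figure quoted in the comment above

total_words = COMMENTS * WORDS_PER_COMMENT
implied_series_words = total_words / READ_THROUGHS

print(f"{total_words:,} words in total")
print(f"about {implied_series_words:,.0f} words per read-through")
```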
I guess the point is, the proposed system wouldn’t actually be easier to organize. The metadata that would make searching so easy is what’s missing. But a new data structure doesn’t solve for the missing metadata. And without that extra metadata, searching would not actually be improved.
Surely that's the tagging. I use tags extensively on my Mac because it's so useful to me but it's clearly an afterthought for Apple, and I struggle to use it at times.
Making tags a first-class citizen would improve things immensely. Making the search index a first-class citizen would, again, also improve things: why should I find Spotlight indexes loitering in the dark corners of my filesystem as dot files? I know there's a file index kept somewhere full of inodes and suchlike, so why isn't search index data kept with it?
I also don't know why I have to rely on file system watchers that seem to be external to the file system and thus end up sucking vast amounts of CPU, when a hook into the main index would suffice. I don't write file systems, so I can't tell why this is the case, or whether such a hook does in fact exist and it only appears to me that it doesn't every time I need to kill a file watcher.
Most of the suggestions in the article seemed good to me (immutable files, smaller metadata pages etc.), and I'm sure there are others around, but I'm also not sure why there's a need among some to protect the status quo by relying on good behaviour, of all things.
With Didgets, tags are an integral part of the system. They don't get lost or forgotten when you copy a file from one place to another. Searches are a native part of the system as well so you aren't relying on a separate indexing service that has its own database somewhere else.
BTW, managing file data using folders and tags is just one of the features of the system. I found out that the columnar stores I used for tagging could easily be used to form relational tables as well. I can load in a 100 million row, 40 column table and do queries against it much faster than the same data loaded into Postgres, MySQL or SQL Server.
Where's the "don't need X" part? Have I dismissed hierarchical file systems out of hand? Where did I suggest greater discipline should be the approach?
I also use automated tools to help me with the tagging but I think that it's not a magic bullet - did I claim it was?
No, and I didn't do any of the other things I asked for evidence of either.
So, if I had a penny for every time someone misquoted me I'd have a penny more right now.
i try to organize my stuff, but sometimes i forget where in the organization i put something. then a brute-force search helps. if i keep good directory and filenames, then locate will do the trick. once i found one item, any related other things are usually nearby.
folders are more convenient because they are part of the file system. there is no ls by tag or even a gui filemanager that shows files by tag. that's one reason why tags need to be part of the filesystem, because if they are not, then most filemanagers would not support them.
and technically, file extensions are kind of like tags. and it's really ugly that they are in the filename string. that messes up a lot of things. it would be better if they were proper tags independent of the name. so you can rename a file without changing the tags, similar to the problem with EXIF.
or more importantly, you could reference a file without that reference depending on the tags of the file. your jpg/jpeg example is also a problem caused by this situation. it would go away with proper tags
macOS does store tags in the filesystem (which you can access using xattr at the command line) but I have no earthly idea how you find files by tag or really do anything with them.
The master tag list seems to be Finder-specific preference data though.
it is from 2005, so not really current, but the arguments are interesting.
in short: filesystem attributes are systemwide (but you and i may want to have different tags on the same shared file) and the user needs to have permission on the files, so you can't tag files that you can read but can't change.
i believe these issues are solvable, esp. the latter would work if we have permissions to add tags but not the content of a file. (like you can rename a file even if you don't have write permission to the file)
xattrs can certainly be used to store tagging info. There are a couple of major problems with them though. 1) xattrs are not supported by all file systems and they are not enabled by default in some. If you copy a file with xattrs from one file system to another that either doesn't support them or didn't enable their use, then your xattrs are thrown away in the copy. 2) Searching for files based on xattrs in a large folder tree (e.g. several million files across thousands of folders) is exceptionally slow by nature.
right, but the alternative is no support for tags at all, so xattr gets us halfway there, and filesystems that don't have it need to keep up.
searching can be sped up by building an index. apps that want to use tags will need to do that, just like they build an index of files already. because searching filenames is also slow.
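A rough sketch of what such an index could look like: an inverted map from tag to the set of paths that carry it. The `file_tags` dictionary below is a hypothetical stand-in for whatever a real app would collect by reading the xattrs once (and then keep up to date); none of these names come from an existing tool.

```python
from collections import defaultdict


def build_index(file_tags):
    """Invert {path: set_of_tags} into {tag: set_of_paths}."""
    index = defaultdict(set)
    for path, tags in file_tags.items():
        for tag in tags:
            index[tag].add(path)
    return index


def search(index, *tags):
    """Return the paths that carry every requested tag (set intersection)."""
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result


# Hypothetical data; a real app would gather this by scanning the tree once.
file_tags = {
    "/photos/a.jpg": {"karl", "2022"},
    "/photos/b.jpg": {"karl", "2021"},
    "/docs/tax.pdf": {"tax", "2022"},
}
index = build_index(file_tags)
print(sorted(search(index, "karl", "2022")))
```

A lookup then costs a few set operations instead of a walk over millions of files. The hard part this leaves out is keeping the index in sync when files change behind the app's back.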
> Plus, this is how we organise [0] stuff in real life.
The way I organize people I know and places and all sorts of other entities I cannot physically place into boxes and folders is a lot more like the tag approach, though.
Google Photos is pretty amazing. I enter a search for "car" and immediately can see the photos of several of the cars I've owned over the years.
One day I needed to remember when I had travelled to a certain city, searched on my Google Photos and it instantly showed the photos I took in the city, including the exact dates.
Yes, I know letting Google know all about my life like that through photos may not be the greatest idea... but wow, does the photo search work nicely?!
The Google Photos image search is amazing. The other day I was trying to remember how long it had been since I smashed my toe doing yard work so I tried searching “toe nail” and it pulled up exactly the picture I was looking for.
> It sounds somewhat like “gmail for files” which is … problematic because email search works well enough because it’s relatively rarely done.
Gmail does not work for me.
As someone in IT I get some number of automated messages (e.g., cron). With Gmail all I can do is tag them and have a "folder" / view of just those tagged messages. But they also pollute my Archive 'folder' as well.
But I do not want them there, because they are not a priority generally, and they pollute search results.
I want an actual separate folder to file these messages in that is out of the way so as not to pollute the rest of the namespace.
And that’s the issue. The status quo may be the best choice for the lowest common denominator. But some power users could get much more out of something with a different approach. You can’t force a one-size-fits-all ontology onto the masses.
People need to wake up and realize that not all software technologies need to be popular to be successful or useful. It seems people around here assume this without even thinking about it first.
My response to these types of proposals is “just imagine that folders are tags and each level of hierarchy is a tag, symlink for multiple tags.”
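As a sketch of what that looks like in practice, assuming a layout where each tag is a directory under some `tags/` root and each tagged file is a symlink inside it. The layout and the function names are my own illustration, not an established convention:

```python
import os
import tempfile


def tag_with_symlinks(tag_root, file_path, tags):
    """Create tag_root/<tag>/<basename> symlinks pointing at file_path."""
    links = []
    for tag in tags:
        tag_dir = os.path.join(tag_root, tag)
        os.makedirs(tag_dir, exist_ok=True)
        link = os.path.join(tag_dir, os.path.basename(file_path))
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(file_path), link)
        links.append(link)
    return links


def files_with_tag(tag_root, tag):
    """A poor man's 'ls by tag': names linked under one tag directory."""
    tag_dir = os.path.join(tag_root, tag)
    return sorted(os.listdir(tag_dir)) if os.path.isdir(tag_dir) else []


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    photo = os.path.join(root, "photos", "beach.jpg")
    os.makedirs(os.path.dirname(photo))
    open(photo, "w").close()

    tag_root = os.path.join(root, "tags")
    tag_with_symlinks(tag_root, photo, ["karl", "2022"])

    karl_files = files_with_tag(tag_root, "karl")
    same_file = os.path.samefile(
        photo, os.path.join(tag_root, "2022", "beach.jpg")
    )

print(karl_files, same_file)
```

The usual caveats apply: two files with the same basename clash inside a tag directory, and the links dangle if the original is moved or renamed.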
It’s funny because the author just proposed a different, I think worse due to novelty and minimal benefit, organizing hierarchy.
I think Apple has a decent approach where their Spotlight indexes very well (I hit Command+Space and type the first letter or two instead of navigating Finder), and they support tagging files.
When importing files into Didgets, the program automatically gathers information from the source file system and attaches specific tags to each file. For example, the file name is attached as a 'name' tag. Each folder name in its path is attached as a 'folder' tag. The file extension is attached as an 'extension' tag. In addition, a SHA1 hash is created from the data stream and attached as a tag. You can also import files by dropping them, or a folder, onto a 'drop zone' on the create tab in the GUI. Any tags attached to that drop zone are also automatically attached to any file dropped on it. So dropping 100 photos on the 'My Wedding' drop zone might attach the tags 'Event = Wedding' and 'Year = 2022' to every photo. A search for files that have the tag 'Folder = Microsoft' would find every file that had 'Microsoft' as a folder anywhere in its path.
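For readers who want to see roughly what that kind of automatic tagging involves, here is a small sketch. It is not Didgets code; `derive_tags` and the `(tag, value)` pair format are assumptions made for illustration, using only what the description above mentions (name, folder, extension, SHA1, plus any drop-zone tags).

```python
import hashlib
import os
import tempfile


def derive_tags(path, drop_zone_tags=()):
    """Return (tag, value) pairs derived from a file on disk."""
    path = os.path.abspath(path)
    folder, name = os.path.split(path)
    tags = [("name", name)]

    # One 'folder' tag per directory component in the path.
    for part in folder.split(os.sep):
        if part:
            tags.append(("folder", part))

    ext = os.path.splitext(name)[1].lstrip(".")
    if ext:
        tags.append(("extension", ext))

    # Hash the data stream in chunks so large files are not read at once.
    sha1 = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            sha1.update(chunk)
    tags.append(("sha1", sha1.hexdigest()))

    # Tags inherited from the (hypothetical) drop zone the file landed on.
    tags.extend(drop_zone_tags)
    return tags


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as root:
    sample = os.path.join(root, "Microsoft", "Office", "report.docx")
    os.makedirs(os.path.dirname(sample))
    with open(sample, "wb") as fh:
        fh.write(b"hello")
    tags = derive_tags(sample, [("Event", "Wedding"), ("Year", "2022")])

for tag, value in tags:
    print(f"{tag} = {value}")
```

Everything here is derived from information the file system already has, which answers part of the "where do the tags come from" question; the content-based tags people actually want to search on still have to come from somewhere else.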
> it will be constrained to a narrow subset of directories and ignore the other 199.9 million files or whatever.
I think this is a vastly underrated point. I am usually not interested in searching the majority of files on my filesystem. I can't remember the last time I needed to search through system files for normal computer use reasons.
I also think the author completely skips over how to handle related files. If my application needs to load a library, how does it find the file to use? If it's by name, how are name clashes handled? I suppose it could be by tag, with built-in tags, but then you won't be able to change the tags without having to change configs or the binary itself.
The core problem with keeping files organized is that, unless you are dealing with a stream of effectively pre-tagged files, tagging/categorizing/grouping emerges after a sufficient number of files arrive. Therefore organizing is proactive