This will probably get buried, but I'm wondering: Has any scripting language attempted to use SQLite as its memory model? You get GC, memory safety, atomicity guarantees, core dump functionality, and rich data structures for free. If configured to run in-memory, I reckon it wouldn't impose enough overhead to be problematic.
It sounds crazy, but most scripting languages add this stuff anyway.
With rqlite this can be extended to distributed systems programming as well.
That's interesting. There would be overhead compared to directly managing memory. But it would be wild to be able to declare an array of structs that you could directly query.
var people person[5]
people[4] // { name: "Human Person", age: 39 }
people.query("SELECT name FROM people WHERE age = 39") // "Human Person"
If everything in memory was a table structure, you could do joins across variables too.
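A minimal Python sketch of what the pseudocode above could desugar to: each variable becomes a table in an in-memory SQLite database, so indexing is a keyed lookup and arbitrary relational queries come for free. All names and data here are made up for illustration.

```python
import sqlite3

# The in-memory database stands in for the program's heap.
heap = sqlite3.connect(":memory:")
heap.row_factory = sqlite3.Row

# var people person[5] -- the array of structs becomes a table.
heap.execute("CREATE TABLE people (idx INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
heap.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(i, f"person{i}", 30 + i) for i in range(5)])

# people[4] -- indexing is a primary-key lookup.
row = heap.execute("SELECT name, age FROM people WHERE idx = 4").fetchone()
print(dict(row))  # {'name': 'person4', 'age': 34}

# people.query(...) -- and any variable can be queried (or joined) relationally.
name = heap.execute("SELECT name FROM people WHERE age = 34").fetchone()["name"]
print(name)  # person4
```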
Would it have to be a scripted language? Seems like you could do it as a compiled language.
I love SQL, so if I had the option to use it within an application I'd be hard pressed not to. But from what I've learned from the popularity of ORMs and NoSQL, most don't share that feeling.
I'm not prisenco, but does LINQ have its memory model in SQL?
What tickles me about the idea I posted above is that you could take a snapshot of your entire program state (at sequence points) and it's just an SQLite db file.
LINQ uses a bunch of iterator functions that accept lambdas. C# allows the lambdas to be captured as expression trees so a provider can convert the expression into SQL or whatever.
It also has an SQL-like syntax you can use that is compiled into those iterators.
The interface is generic, with built-in support for IEnumerable types and SQL data sources.
To take it even further, the language grammar could be a superset of the SQL grammar.
> Would it have to be a scripted language?
Not at all. It's just the first thing which came to mind. I'm tickled by the idea of having the entire program state (stack frames and all!) in SQLite so that you could ship a snapshot of your program around.
> There would be overhead compared to directly managing memory.
Of course, but if the critical queries were pre-compiled via sqlite3_prepare, I doubt the overhead would so greatly exceed the memory management overhead in language runtimes which ship with a GC that it would be a show-stopper.
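A rough check of that claim, assuming the stdlib sqlite3 module: it keeps an LRU cache of compiled statements (the `cached_statements` connection parameter), so a repeated query string pays sqlite3_prepare only once and each lookup is essentially a b-tree probe.

```python
import sqlite3
import timeit

# cached_statements controls the prepared-statement LRU cache (default 128).
db = sqlite3.connect(":memory:", cached_statements=256)
db.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")
db.executemany("INSERT INTO kv VALUES (?, ?)", ((i, str(i)) for i in range(10_000)))

def lookup():
    # Identical SQL text on every call, so the compiled statement is reused.
    return db.execute("SELECT v FROM kv WHERE k = ?", (1234,)).fetchone()[0]

per_call = timeit.timeit(lookup, number=10_000) / 10_000
print(f"{per_call * 1e6:.2f} microseconds per indexed lookup")
```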
While not serialized to a database, memory-dump/snapshot-based distribution was done with Smalltalk and various Lisps. Basically you dump the GC heap, and the executable just reloads it right into RAM. Wasn't the best, from what I've read over time.
Having an SQL based callstack sounds painfully slow.
I don't have references, just bits of notions from reading various accounts online over the years. Most of the software that was distributed as an image would also run as an image and stop by saving that image back to disk. This could lead to persistent, difficult-to-debug errors (imagine a random RAM bit flip getting persisted into the image) that couldn't be recreated because they only existed on that one client's machine. Each customer was effectively modifying their version as they ran, saved, and reloaded the software, and the seller was contracted to make local modifications for them, so the install base would diverge over time. There was no central or single source to rebuild the program from. Turning it off and on again just reloaded the saved bugs.
Let's pretend that someone had made "bad" changes and saved them in their current image file, and they had not archived their other changes.
They would be able to recover all their "good" changes without the "bad" changes because their changes were automatically recorded in a transaction log file.
They could replay the "good" changes from the changes log into an original image file, and not replay the "bad" changes.
Considering this further... I think Lua might be a good match for this idea. Everything it can do can be pretty easily mapped to SQLite queries and its type system is pretty close to the limited set of types offered by SQLite.
We could also integrate this directly with something like Lua plus SQLite compiled with the Cosmopolitan project, to ship a cross-platform single-executable SQLite.
The possibilities are endless.
If you are genuinely interested in this project, I am willing to help in my free time. But I am a novice: I don't have experience in low-level languages like C/Rust and much prefer Go.
this is such a crazy good take, I was thinking of creating a Go-based module loader / code loader directly from a SQLite file, and a tool/CLI that can help you with this. Let's say my CLI name is goter (go + motor, IDK):
goter init (creates a SQLite file and a main.go etc., but main.go etc. are just symlinks to something inside the SQLite file; I am not sure, it's just a really, really hypothetical idea)
goter add dep <some dependency>
(which it also then loads into that SQLite file)
goter run main.go (which can run it)
goter share (which can output a SQLite file)
goter load <file>.db (which could even have whole Go toolchain versions inside of it)
goter build docker (creates a Docker image, if possible)
goter collab ipc (creates an IPC gateway between two servers, using something like piping-web if behind NAT, or directly as well) (to make multi-machine deployments really easy)
and integration with popular tools like go-blueprint, or even the ability to create custom templates like go-blueprint does, if possible, since clack has been ported to Go by some guy.
this doesn't have to be Go-specific; it can be language agnostic.
My thought here was to create a dead-simple cloud editor / code runner (where the code runs inside bubblewrap), where the whole codebase of a single user is just a SQLite file: insanely simple to transport, insanely efficient.
I am not sure if this makes sense, but you have guided me in a direction that feels just as enlightening as the day I discovered the Cosmopolitan project / redbean on Hacker News (offtopic) (cross-platform executables, say what!)
THANKS A LOT, I AM BOOKMARKING THIS. THIS IS SO GOOD AN IDEA THAT I MIGHT'VE LOST A BIT OF MY MIND AT 1 AM comprehending how good it might be.
I also think about this sometimes; maybe not as the only memory model, but a simple scripting language with tight SQLite integration would be pretty cool.
P.S. Actually, I think I saw something similar somewhere in the past; if it comes to mind I'll send it.
I had a situation with HashBackup (I'm the author) where I needed a map-like data structure that could get rather large (hundreds of MB to a few GB) with big backups. In Python, storing this data in a dict was not really feasible because every unique integer takes 24 bytes, and since the data structure stores things like blockid and file offsets, there are a lot of unique integers.
Instead, I created an in-memory SQLite database for the data structure and it was something like 1/3 the size of the Python dict. It was around 10-15% slower, but for the memory savings it was an okay trade-off.
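The pattern described above, sketched under the assumption of an integer blockid-to-file-offset map (table and column names are mine, not HashBackup's): an in-memory SQLite table replaces a Python dict, storing integers as packed varints in the b-tree instead of full Python int objects.

```python
import sqlite3

# In-memory SQLite table used as a big integer -> integer map.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blockmap (blockid INTEGER PRIMARY KEY, offset INTEGER)")
db.executemany("INSERT INTO blockmap VALUES (?, ?)",
               ((i, i * 4096) for i in range(100_000)))

def offset_of(blockid):
    # Equivalent of d[blockid] on a dict, at roughly a third of the memory.
    return db.execute("SELECT offset FROM blockmap WHERE blockid = ?",
                      (blockid,)).fetchone()[0]

print(offset_of(42))  # 172032
```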
Have you taken a peek at sqlsync? It's a cool take where you write your entire application logic to operate as transactions and queries over a sqlite database but you do it in wasm so that the same transaction logic can run everywhere. The end result is nifty. That's at least my recollection of the thing.
I think the core idea is fascinating. This model of making the data of the application transparently persistent was a fad in the 1980s-1990s when object databases like Objectivity and GemStone were all the rage. Those faded from the mainstream for various reasons, though they're still around and still heavily used in industries such as finance.
Object databases like Objectivity, POET, Matisse, and others provided tools that took your source code and augmented your classes with low-level code that automatically loaded objects from the database. So you wrote your classes, ran the tools, and suddenly got magical persistent objects. (A lot like CORBA and COM, but usually without an IDL.) These databases initially only did C++, but later they supported Java, which had the big benefit that the tool could take the compiled byte code, disassemble it, inject the persistence glue, and then assemble it again. (GemStone was Smalltalk, which already has this idea of persistence built in.)
Someone eventually applied this to Python which, being a dynamic language, had the benefit of letting everything be done entirely at runtime. The most successful project was Zope, the CMS, which came with the ZODB. With ZODB, you could treat entire graphs of Python values as transparently persisted to a database file: You just did "db.users[1].name = 'bob'" and that was it.
Of course, this results in the access path problem described in the tar pit paper. Without a way to slice and dice data from any angle, you're stuck with the ad-hoc graph structure you've created through values and their relationships, as opposed to the relational model. I'm not sure this is an entirely unsolvable problem, but lack of queryability was the Achilles' heel of all these object databases.
Another persistent language is M [1], also known as MUMPS, a quirky language and toolchain that came out of NIH and has been popular in the medical industry since the 1970s, and also in banking (Fidelity National).
M is tightly coupled with a kind of hierarchical, array-oriented database; any variable can be declared as persistent, meaning that it will be read from and written to the data store and persist across runs. Since the underlying representation is sparse arrays, you can have lots of huge, nested data structures that are super efficient to read and write. I'm not sure there is anything else like it.
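A rough emulation of the idea (my sketch, not how M implementations actually store globals): an M-style persistent sparse array like ^PATIENT(id, field) = value becomes rows keyed by the subscript path, and pointing the connection at a file instead of ":memory:" makes it persist across runs.

```python
import sqlite3

# ":memory:" for the demo; a file path would make the globals persistent.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE globals (
                  name  TEXT,   -- global name, e.g. ^PATIENT
                  subs  TEXT,   -- comma-joined subscript path
                  value TEXT,
                  PRIMARY KEY (name, subs))""")

def gset(name, subs, value):
    # SET ^PATIENT(17,"name")="Smith, Jane"
    db.execute("INSERT OR REPLACE INTO globals VALUES (?, ?, ?)",
               (name, ",".join(map(str, subs)), str(value)))

def gget(name, subs):
    # $GET(^PATIENT(17,"name")) -- missing nodes are simply absent rows.
    row = db.execute("SELECT value FROM globals WHERE name = ? AND subs = ?",
                     (name, ",".join(map(str, subs)))).fetchone()
    return row[0] if row else None

gset("^PATIENT", (17, "name"), "Smith, Jane")
print(gget("^PATIENT", (17, "name")))  # Smith, Jane
```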
In fact, InterSystems, the main commercial vendor of M software, markets their implementation as an object database (it also does SQL, on top of M).
So I just found brainfck written in SQL and it's crazy good.
I was hoping I could find something like a Lisp interpreter in brainfck, and AI hallucinated a hell of a lot.
I implemented a brainfck interpreter in pure SQL using CTEs
23 points • 15 comments • submitted 6 years ago by chunkyks ("SQLite, db of champions") to r/SQL
I did this in SQLite, but it'll probably work in other databases.
Why? No good reason. Anyways, this program is a bit traditional:
$ sqlite3 < bf.sql
Hello World!
WITH RECURSIVE
program AS
(SELECT '++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.' AS p,
'' AS input, 3 AS width
),
jumpdepth AS
(SELECT 0 AS idx, 0 AS jumpdepth, '' AS jumplist, NULL as jumpback, NULL AS direction, p || '0' AS p, width FROM program
UNION ALL
SELECT idx+1, CASE SUBSTR(p, idx+1, 1)
WHEN '[' THEN jumpdepth+1
WHEN ']' THEN jumpdepth-1
ELSE jumpdepth END,
CASE SUBSTR(p, idx+1, 1)
WHEN '[' THEN SUBSTR('0000000' || (idx+1), -width) || jumplist
WHEN ']' THEN SUBSTR(jumplist,width+1)
ELSE jumplist END,
CASE SUBSTR(p, idx+1, 1)
WHEN ']' THEN CAST(SUBSTR(jumplist,1,width) AS INTEGER)
ELSE NULL END,
CASE SUBSTR(p, idx+1, 1)
WHEN '[' THEN 'L'
WHEN ']' THEN 'R'
ELSE NULL END,
p, width
FROM jumpdepth
WHERE LENGTH(p)>=idx),
jumptable(a,b,dir) AS
(SELECT idx,jumpback,'L' FROM jumpdepth WHERE jumpback IS NOT NULL
UNION ALL
SELECT jumpback,idx+1,'R' FROM jumpdepth WHERE jumpback IS NOT NULL),
bf(ep, p, width, defaulttapeentry, ip, dp, instruction, output, input, tape) AS
(SELECT 0, p, width, SUBSTR('0000000', -width), 1, 1, '', '', input, SUBSTR('000000', -width)
FROM program
UNION ALL
SELECT ep+1, p, width, defaulttapeentry, CASE WHEN jumptable.b IS NOT NULL AND
     ((dir='R' AND CAST(SUBSTR(tape, width*(dp-1)+1, width) AS INTEGER)=0)
      OR
      (dir='L' AND CAST(SUBSTR(tape, width*(dp-1)+1, width) AS INTEGER)!=0)) THEN jumptable.b
     ELSE ip+1 END,
   CASE SUBSTR(p, ip, 1)
     WHEN '>' THEN dp+1
     WHEN '<' THEN MAX(dp-1,1)
     ELSE dp END,
   SUBSTR(p, ip, 1),
   CASE WHEN SUBSTR(p, ip, 1)='.' THEN (output || CHAR(SUBSTR(tape, (dp-1)*width+1, width))) ELSE output END,
   CASE WHEN SUBSTR(p, ip, 1)=',' THEN SUBSTR(input, 2) ELSE input END,
   CASE SUBSTR(p, ip, 1)
     WHEN '<' THEN CASE WHEN dp=1 THEN defaulttapeentry || tape ELSE tape END
     WHEN '>' THEN CASE WHEN dp*width=LENGTH(tape) THEN tape || defaulttapeentry ELSE tape END
     WHEN '+' THEN SUBSTR(tape,1,width*(dp-1)) || SUBSTR('0000000' || (CAST(SUBSTR(tape,width*(dp-1)+1,width) AS INTEGER)+1), -width) || SUBSTR(tape,width*dp+1)
     WHEN '-' THEN SUBSTR(tape,1,width*(dp-1)) || SUBSTR('0000000' || (CAST(SUBSTR(tape,width*(dp-1)+1,width) AS INTEGER)-1), -width) || SUBSTR(tape,width*dp+1)
     WHEN ',' THEN SUBSTR(tape,1,width*(dp-1)) || SUBSTR('0000000' || (UNICODE(SUBSTR(input,1,1))), -width) || SUBSTR(tape,width*dp+1)
     ELSE tape END
   FROM bf LEFT JOIN jumptable ON jumptable.a=ip WHERE LENGTH(p) >= ip)
SELECT output FROM bf ORDER BY ep DESC LIMIT 1;
> When the value of auto-vacuum is 2 or "incremental" then the additional information needed to do auto-vacuuming is stored in the database file but auto-vacuuming does not occur automatically at each commit as it does with auto_vacuum=full. In incremental mode, the separate incremental_vacuum pragma must be invoked to cause the auto-vacuum to occur.
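Sketch of the incremental mode the quoted passage describes: the pragma must be set before the first table is created (or followed by VACUUM to rebuild an existing file), and space is reclaimed only when incremental_vacuum is explicitly invoked. The filename and page count are just for illustration.

```python
import sqlite3

db = sqlite3.connect("vacuum_demo.db")
db.execute("PRAGMA auto_vacuum = INCREMENTAL")
db.execute("VACUUM")  # rebuild the file so the setting takes effect
db.execute("CREATE TABLE IF NOT EXISTS blobs (data BLOB)")

# ... later, e.g. from a periodic maintenance task:
db.execute("PRAGMA incremental_vacuum(100)")  # reclaim up to 100 free pages
```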
The note that several of them are defaults for a web app kind of sets expectations for me. These are defaults for when you're writing a web app and want it to behave a bit more like MySQL or Postgres. They aren't defaults for using SQLite as an in-memory cache in a complex piece of analysis.
SQLite is an amazingly widely used piece of software. It's impossible for one set of defaults to be perfect for all use cases.
I agree. While it’s probably not possible to settle on defaults that work for each and every scenario, my personal preference is that factory defaults should tend to optimise for safety primarily. (Both operational safety, but also in regards to usage.)
For example, OP suggests setting the `synchronous` pragma to `NORMAL`. This can be a performance gain, but it also comes at the cost of slightly decreased durability. So for that setting, I’d feel that `FULL` (the default) makes more sense as factory default for a database.
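The trade-off in code, under the assumption of WAL mode (the filename is illustrative): NORMAL syncs the WAL only at checkpoints, so a power cut can lose the last few committed transactions without corrupting the database, while FULL syncs on every commit.

```python
import sqlite3

db = sqlite3.connect("wal_demo.db")
db.execute("PRAGMA journal_mode = WAL")
db.execute("PRAGMA synchronous = NORMAL")   # OP's suggestion: faster, slightly less durable
# db.execute("PRAGMA synchronous = FULL")   # the safer factory default
```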
I agree that they are tradeoffs and I find it strange to use a term like "sensible" to describe them. In my mind, it implies that if you're not using them you're not being sensible. And in my experience, people who can't explain that these are tradeoffs often use words like that[0]. Of course English is not my first language so that may be the reason why I think that.
These may be useful settings for a certain kind of application under a certain workload. As usual, monitor your application and decide what is suitable for your situation. It is limiting to think in a binary way of something being sensible or not.
[0] Like "Best Practices". Anything else you can think of?
That's the essence of defaults: they're just better defaults, not an excuse to stop caring about what better settings to use once you have runtime performance metrics to guide whether they need changing and which direction to change them in.
Right, but they are not the official defaults. If you are changing the official defaults, then you might as well do it in a more principled way.
If you are going to need to optimise with performance metrics anyway, then why not stick to just the official defaults (unless the official defaults are non-functional, is that the case?)
I think anyone who is reading a blog post on “better defaults” is front loading some of the optimisation, so you could let them make a principled choice straight away for marginal extra cost.
There is some principle here: these are more sensible defaults (but still only defaults) for a very different purpose when compared to the defaults that make sense for "generic SQLite use" that has to keep those set to something that works across all versions. Having different, domain-specific defaults based on an assumption of "you're setting up a new project using the current version of SQLite" makes a whole lot of sense.
A lot of these aren’t defaults because of backwards compatibility. IMO there is no reason to not use WAL mode, but it’s not default because it came later
As long as all the processes are on the same machine, WAL mode is a good idea.
It's not good in the case where multiple machines share the same database, say, a shared settings file that lets multiple VMs be configured in one place.
Obviously the same-machine situation is the most common. But you asked for a reason why WAL is not appropriate.
> If you are going to need to optimise with performance metrics anyway, then why not stick to just the official defaults (unless the official defaults are non-functional, is that the case?)
Well, why don't you do the research and tell us?
My experience from 10-15 years of dealing with the official defaults of many programs is that they do work, but in 90% of cases they are overly conservative.
“To accelerate searching the WAL, SQLite creates a WAL index in shared memory. This improves the performance of read transactions, but the use of shared memory requires that all readers must be on the same machine [and OS instance]. Thus, WAL mode does not work on a network filesystem.”
“It is not possible to change the page size after entering WAL mode.”
“In addition, WAL mode comes with the added complexity of checkpoint operations and additional files to store the WAL and the WAL index.”
“SQLite does not guarantee ACID consistency with ATTACH DATABASE in WAL mode. “Transactions involving multiple attached databases are atomic, assuming that the main database is not ":memory:" and the journal_mode is not WAL. If the main database is ":memory:" or if the journal_mode is WAL, then transactions continue to be atomic within each individual database file. But if the host computer crashes in the middle of a COMMIT where two or more database files are updated, some of those files might get the changes where others might not.”
WAL isn't the footgun, SQLite over network attached storage is. Additional files are not a problem. If you need cross-DB file ACID you can just turn off WAL mode in that instance.
>> WAL isn't the footgun, SQLite over network attached storage is
I agree that using SQLite over a network is not a good idea. But people do do that, and it works tolerably well. On most networks it'll work just fine, especially if writes are sparse.
The point of defaults is that they should cover the "widest range". By all means read the docs, and choose to use Wal. But that's an action you take and understand.
The point of articles like this though is to help alert people to the settings that exist. You should always read the docs and apply them to your context before using any setting.
It’s unreliable on any network for good reason: SQLite relies on the kernel for state information on the file system. Two machines sharing one file over a network have two unsynchronized versions of that state. One machine’s write does not update the other’s in-kernel image.
These all sound like load-specific / ancillary concerns.
I don't think most people are in one of those cases, and most people will greatly benefit from WAL. WAL should absolutely be the default unless you have a good reason to not use it.
ACID guarantees work just fine. They don't work perfectly in scenarios where SQLite should not be used anyway: multiple machines accessing the same DB file over network, and having multiple attached databases.
> You may prefer non-acid characteristics in your databases in general.
No. Quite the opposite. However the multi-database case strikes me as a very limited edge case, I would be surprised if double digit % users of sqlite even knew it’s an option let alone used it, and the concern is further lessened because afaik things like FKs do not work cross-db.
Unfortunately, there doesn't seem to be a PRAGMA option for strict tables. The docs at the following link say that 'STRICT' must be appended to the end of the CREATE TABLE statement to disable the flexible-typing 'feature'.
I don't think there can be with sqlite's backwards compatibility model. The DB has to assume it may be accessed with multiple versions of the library or clients using different pragma settings. So one client disabling strict (or using a version predating that feature) and inserting a bunch of violating records would break the DB file when used by another client with strict enabled.
With it part of table creation, which IIRC is read for each connection to create the schema, it's part of the DB file not the client configuration. If you open a DB using strict with a version that doesn't support it, it will fail but not cause problems for other users.
I wish there was a branch of SQLite that cleaned up everything, set new sane behavior (and broke backward compatibility since it’s needed to achieve that).
Eg strict mode by default, WAL2, BEGIN CONCURRENT, etc.
I would go higher than this by default. Probably the best default is something like "amount of time a human would be willing to wait for something they want before giving up" which, if you have to pick one value, is hopefully longer than 5 sec. Perhaps 30000.
Now, there are also cases where fail-fast is best. However 5000 isn't good for those either. And that assumes a wider system that handles failures in quite a nice way, which doesn't happen without special effort, so doesn't make sense to target with a default value.
> Why?: Prevents "database is locked" errors by giving SQLite 5 seconds to wait for a locked resource before returning an error, useful for handling multiple concurrent accesses.
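Both knobs below set the same thing, namely how long SQLite retries on a locked database before giving up; the stdlib `timeout` connect argument is in seconds, the pragma in milliseconds. The filename is illustrative, and 30000 is the higher value suggested above.

```python
import sqlite3

db = sqlite3.connect("app.db", timeout=30)  # seconds; stdlib default is 5
db.execute("PRAGMA busy_timeout = 30000")   # same knob, in milliseconds
print(db.execute("PRAGMA busy_timeout").fetchone()[0])  # 30000
```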
Alongside that one of the most important things to do is to have strong read/write segregation: separate readonly and r/w pools, the latter having a single connection which is immediately set to BEGIN IMMEDIATE when it's borrowed out.
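A sketch of that segregation, with illustrative names and paths: a single writer connection whose transactions begin as IMMEDIATE, so the write lock is taken up front rather than failing mid-transaction on upgrade, plus a pool of read-only connections that cannot write even by accident.

```python
import sqlite3

DB = "pool_demo.db"

writer = sqlite3.connect(DB, isolation_level=None)  # autocommit; we BEGIN manually
writer.execute("PRAGMA journal_mode = WAL")
writer.execute("PRAGMA busy_timeout = 5000")

def write(sql, params=()):
    writer.execute("BEGIN IMMEDIATE")  # contend for the write lock here, not later
    try:
        writer.execute(sql, params)
        writer.execute("COMMIT")
    except Exception:
        writer.execute("ROLLBACK")
        raise

write("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
write("INSERT INTO t VALUES (?)", (1,))

# The read-only pool: mode=ro connections never hold a write lock.
readers = [sqlite3.connect(f"file:{DB}?mode=ro", uri=True) for _ in range(4)]
print(readers[0].execute("SELECT COUNT(*) FROM t").fetchone()[0])
```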
How is the meaning not clear? If you're trying for a "there's no connections in sqlite" type gotcha that's not even true. SQLite itself uses the term internally and it shows up in the docs.
In my tests, with all these settings, if you hammer SQLite with reads/writes, it won't be able to do any WAL checkpoints (they need no active transactions), and the WAL file can quickly grow from, say, 8 MB to tens or hundreds of GB until your machine runs out of space (the docs mention this). So my default is to also keep track of the size of the WAL file, and if it grows beyond some threshold, temporarily suspend all reads/writes and checkpoint manually.
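A sketch of that guard, with the threshold, filenames, and function name being my own inventions: watch the size of the -wal sidecar file and force a TRUNCATE checkpoint when it grows past the limit.

```python
import os
import sqlite3

DB = "walwatch_demo.db"
WAL_LIMIT = 64 * 1024 * 1024  # 64 MB, an arbitrary threshold

db = sqlite3.connect(DB)
db.execute("PRAGMA journal_mode = WAL")
db.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")

def maybe_checkpoint():
    try:
        wal_size = os.path.getsize(DB + "-wal")
    except FileNotFoundError:
        return  # no WAL file yet, nothing to do
    if wal_size > WAL_LIMIT:
        # Waits for readers to finish, copies the WAL back into the main
        # database file, then truncates the WAL to zero bytes.
        db.execute("PRAGMA wal_checkpoint(TRUNCATE)")

maybe_checkpoint()
```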