
I tried just repeating guó as many times as there were symbols, and the repetition was not recognized.

I do like the active aspect of the approach, though. Language apps where sound is the main form of learning should have a great advantage, since written text only confuses things: every country has its own spin on orthography. Even pinyin, despite making sense, has so many conflicting symbols for a beginner.


> I tried just repeating guó as many times as there were symbols, and the repetition was not recognized.

Can you elaborate? I'm not sure I understand.


I think he's saying transliteration and romanization are horribly flawed in some instances.

I think this is standard. It applies to domains as well. I've experienced government-service blocks too -- they send me an email, yet block my reply. I complain every time and rarely does anyone care; the support person doesn't escalate, so my email remains blocked. Sometimes I'm told the system is working as configured, completely ignoring that I am a real person and the system is hostile towards me.

It's just the general fragility of tech and the lack of care from the creators/maintainers. These systems are steampunk, fragile contraptions that no one cares to actually make human-friendly, or that are built on crappy foundations.


We call it the email mafia.

To send emails we need to pay for a mail service. Or get ads -- of course Gmail is part of the ring.

Like most things, it started with good intentions: to fight spam. As if it even worked -- I guess they'd say we would get far more spam without it.


It's one of the downsides of decentralized networks. Trust is built or pay-your-way-into'd.

This has nothing to do with decentralized networks. It's simple incompetence.

If you haven't received any mail from a mail system before (or in a long time) and then it sends you one message, it probably isn't spam, because spammers are typically going to send you a large number of messages. You also typically want to let the first few messages through so the recipient can see them and then classify it as spam or not, so that you get some data on how to treat future messages from that sender.

This is the same thing a centralized system should be doing with individual users. You impose some reputation on accounts (e.g. by sender/registration IP address) and then if that address starts spamming people it gets blocked, and otherwise it doesn't.
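
A minimal sketch of that kind of per-sender reputation tracking, assuming hypothetical thresholds and a made-up feedback hook (this is not any particular provider's logic, just the idea described above):

    from collections import defaultdict

    # Toy sender-reputation filter: unknown senders get a small probation
    # allowance so recipients can see (and classify) their first messages;
    # after that, accumulated reputation drives blocking.
    # Thresholds are illustrative only.
    PROBATION_MESSAGES = 5
    BLOCK_THRESHOLD = -3.0

    class ReputationFilter:
        def __init__(self):
            self.seen = defaultdict(int)      # messages seen per sender key
            self.score = defaultdict(float)   # running reputation per sender key

        def should_deliver(self, sender_key: str) -> bool:
            """sender_key could be a sending IP address or domain."""
            if self.score[sender_key] <= BLOCK_THRESHOLD:
                return False                  # established bad reputation
            if self.seen[sender_key] < PROBATION_MESSAGES:
                self.seen[sender_key] += 1
                return True                   # let early messages through to gather data
            return self.score[sender_key] >= 0

        def record_user_feedback(self, sender_key: str, marked_spam: bool):
            # Recipients' classifications feed back into the sender's score.
            self.score[sender_key] += -1.0 if marked_spam else 0.25

    f = ReputationFilter()
    print(f.should_deliver("203.0.113.7"))        # True: first contact is delivered
    f.record_user_feedback("203.0.113.7", True)   # recipient marks it as spam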


Is there a requirement for the government to be reachable by its citizens? This would seem to violate it.

I mean, yes? But that's by sending a letter, or a fax. Email is not part of this...

This is one of the things that E-Delivery (something which Europe is now implementing[1,2,3]) is going to fix.

It's sort of like email, but based on the XML stack (SOAP / WSDL / XML Crypto / XML Sig), with proper citizen authentication and cryptographically-signed proof of sending and delivery.

[1] https://ec.europa.eu/digital-building-blocks/sites/spaces/DI... [2] https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A... [3] https://ec.europa.eu/digital-building-blocks/sites/spaces/DI...
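
For the "signed proof of sending and delivery" part, here's a toy sketch of the idea using Ed25519 signatures from the Python `cryptography` package. The real eDelivery/AS4 stack uses SOAP with XML Signature and a full PKI; the `submit`/`acknowledge` names and the raw-bytes payload here are purely illustrative.

    # Toy model of evidence-based messaging: the sender signs the message,
    # and the receiving access point returns a signed receipt, so both sides
    # hold cryptographic proof of sending and of delivery.
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    sender_key = Ed25519PrivateKey.generate()
    receiver_key = Ed25519PrivateKey.generate()

    def submit(message: bytes):
        # Sender signs the message payload: proof of sending.
        return message, sender_key.sign(message)

    def acknowledge(message: bytes, sender_signature: bytes):
        # Receiving access point verifies the sender's signature, then
        # signs a receipt over the same payload: proof of delivery.
        sender_key.public_key().verify(sender_signature, message)  # raises if invalid
        receipt = b"DELIVERED:" + message
        return receipt, receiver_key.sign(receipt)

    msg, sig = submit(b"tax notice #123 for citizen X")
    receipt, receipt_sig = acknowledge(msg, sig)
    receiver_key.public_key().verify(receipt_sig, receipt)  # sender keeps this as evidence
    print("delivery receipt verified")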


How ugly it is...

This should have been updated decades ago to include email. Is it possible for any government to function properly?

We are repeating obvious things here, aren't we? I moved to Germany from Finland, a very pro-IT country. I've been here for 15 years now, and while I still disagree with their dismissal of email, I've kind of got used to it. A couple more decades and it'll happen...

The main issue is: who is supposed to implement it? The government has two options: hire a contractor, or do it themselves. DIY has the issue that nobody wants to work for the government, because as an IT specialist you'd earn a third or a quarter of what you would earn at a private company. State workers here cannot be fired, so you trade money for extreme "stability" (read: laziness). Hiring a contractor requires money they also don't see the necessity of spending. And that's how you end up in this situation. There are also other issues, like no nationwide implementation plan: every state and every commune has to figure out and build things themselves.

This internal compulsion is just learned behavior. Society conditions you to work instead of play.

Nothing wrong with that, I have that compulsion as well.

Having a compulsion to play, purely for the sake of playing, is a much healthier view. Useful or not useful, hard problem or easy problem -- it shouldn't matter, you're playing.

Sometimes you can't be useful, yet you can always play.

It all stems from our inability to have systems without labor. Work, work.

I like how Pope John Paul II flipped the narrative and said work exists for the person, as a way for the person to express themselves. It made me realize how even communism stays trapped in a labor mentality.


As we mature mentally, we need more interesting games to play, more interesting challenges. Work is often the result of this.

It's the same with romance. When we are children we have a crush on somebody, become pretend "boyfriend and girlfriend", and as we mature the game becomes more interesting as it becomes real.

But it's all a game throughout life.

So perhaps it is those who enjoy work who have elevated their spiritual level, and not the other way around?


Was about to comment the same till I found your comment.

I have this compulsion too, and did some deep-diving at some point through therapy. I found that really it's just likely conditioning from family/society.

If you are generally praised for helping out while growing up, and that is when you receive a lot of love/attention, it's natural to build pathways that favour this, and thus the corresponding behavioural patterns.


I like this thought. It is interesting to look at our current societal/economic systems on the earth and realize none of them will survive the death of scarcity.


In abstract terms capitalism doesn't depend on scarcity? Capitalism as in centralisation of the means of production (even when there is no human labor anymore).


It’s funny how there is continuous reinvention of parsing approaches.

Why isn't there already some parser generator with vector instructions, PGO, and low stack usage? Instead there are endless rewrites of recursive descent, with caching optimizations sprinkled in when needed.


Hardware also changes over time. Something that was initially fast gets tried by people with new hardware, who find it not so fast for them, and so they create their own "fast X". Fast forward 10 more years, someone with newer hardware finds that, asks "huh, why isn't it using extension Y?", and now we have three libraries all called "Fast X".


Because you have to learn how to use any given parser generator, naive code is easy to write, and there are tons of applications for parsing that aren't really performance critical.


I'd say it's because parsing is a very specific kind of work, heavily dependent on the grammar you're dealing with.


A parser spends time:

1. Consuming tokens.

2. Recognizing the grammar.

3. Producing AST nodes.

Steps 1 and 3 are heavily dependent on the data types that make the most sense for the previous (lexing) and next (semantic analysis) phases of the compiler. There is no one Token type that works for every language, nor one AST type.

The grammar-recognition part is relatively easy, but since so much of the code is consuming and producing data types that are unique to a given implementation, it's hard to have very high-performance reusable libraries.
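
As a concrete illustration of those three phases (tokens in, grammar recognition, AST nodes out), here is a tiny hand-rolled parser for arithmetic; the token strings and the tuple-shaped AST nodes are specific to this toy, which is exactly the point:

    # Minimal recursive-descent parser for "+"/"*" arithmetic.
    # Phase 1 consumes tokens, phase 2 recognizes the grammar,
    # phase 3 produces AST nodes -- and both the token format and the
    # AST node shape are particular to this toy language.
    import re

    def lex(src):                                   # phase 1: tokens
        return re.findall(r"\d+|[+*()]", src)

    def parse_expr(toks, i=0):                      # expr := term ('+' term)*
        node, i = parse_term(toks, i)
        while i < len(toks) and toks[i] == "+":
            rhs, i = parse_term(toks, i + 1)
            node = ("add", node, rhs)               # phase 3: AST node
        return node, i

    def parse_term(toks, i):                        # term := factor ('*' factor)*
        node, i = parse_factor(toks, i)
        while i < len(toks) and toks[i] == "*":
            rhs, i = parse_factor(toks, i + 1)
            node = ("mul", node, rhs)
        return node, i

    def parse_factor(toks, i):                      # factor := number | '(' expr ')'
        if toks[i] == "(":
            node, i = parse_expr(toks, i + 1)
            return node, i + 1                      # skip ')'
        return ("num", int(toks[i])), i + 1

    ast, _ = parse_expr(lex("2*(3+4)"))
    print(ast)   # ('mul', ('num', 2), ('add', ('num', 3), ('num', 4)))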


There are good parser generators, but potentially not as Rust libraries.



Meanwhile C++ has more than a hundred, with a focus on production-ready rather than innovative design patterns.

Both products are anything but reliable. Redshift can't even get around partitioning limits, or S3 limits.

But what's funny is that Claude Code is from a US company, so it can't be used in a boycott scenario.


Redshift is used at the largest e-commerce site in the world and was built specifically to “shift” away from “Big Red” (Oracle).


What can I say, I expected more than what they actually offer. A Redshift job can fail because S3 tells it to slow down. How is this supposedly HA, high-performance product tripped up by that, given that its whole moat is an S3-based input/output interface?

As a compute engine its SQL capabilities are worse than the slowest pretend timeseries db like Elasticsearch.


Are you trying to treat an OLAP database with columnar storage like an OLTP database? If you are, you would probably have the same issue with Snowflake.

As far as S3, are you trying to ingest a lot of small files or one large file? Again Redshift is optimized for bulk imports.


Redshift does not fit into the AWS ecosystem. If you use Kinesis, you get up to 500 partitions with a bunch of tiny files, so now I have to build a pipeline after Kinesis that puts all of it into one S3 file, only to then import it into Redshift, which might again put it on S3-backed storage for its own file shenanigans.

ClickHouse -- even chDB's in-memory magic -- has a better S3 consumer than Redshift. It sucks up those Kinesis files like nothing.

It's a mess.

Not to mention none of its column optimizations work: the data footprint of gapless timestamp columns is not basically zero, as it is in any serious OLAP, but massive. So the way to improve performance is to just align everything on the same timeline, so its computation engine does not need to figure out how to join data that is actually time-aligned.

I really can’t figure out how anyone can do seriously big computations with Redshift. Maybe people like waiting hours for their SQL to execute and think software is just that slow.


Really good to hear that. We've had AWS reps trying to push Redshift on multiple occasions, after we'd already done our research and selected ClickHouse for our analytical workloads. Every time we have a meeting with them for some other reason, the topic of Redshift returns; they always want to discuss it again.

You realize "the pipeline" you have to build is literally just an Athena SQL statement, "Create table ... select * from …". Yes, you can run this directly against S3 and it will create one big file.

https://docs.aws.amazon.com/athena/latest/ug/ctas.html
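
If you drive it from Python, that compaction step is roughly a single CTAS submission like the sketch below. The bucket, database, and table names are placeholders; the `WITH (format, external_location)` options are standard Athena CTAS, but check them against the docs for your case.

    # Sketch: submit an Athena CTAS that reads the many small files behind
    # `raw_events` and writes one consolidated Parquet dataset to S3.
    import boto3

    athena = boto3.client("athena", region_name="us-east-1")

    ctas = """
    CREATE TABLE analytics.events_compacted
    WITH (
        format = 'PARQUET',
        external_location = 's3://my-bucket/compacted/events/'
    ) AS
    SELECT * FROM analytics.raw_events
    """

    resp = athena.start_query_execution(
        QueryString=ctas,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
    )
    print(resp["QueryExecutionId"])  # poll get_query_execution() until it finishes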

I have a sneaking suspicion that you are trying to use Redshift as a traditional OLTP database. Are you also normalizing your tables like an OLTP database instead of like an OLAP one?

https://fiveonefour.com/blog/OLAP-on-Tap-The-Art-of-Letting-...

And if you are using any OLAP database for OLTP, you're doing it wrong. It's also a simple "process" to move data back and forth with Aurora MySQL or Postgres, either by federating your OLTP database with Athena (handwavy because I haven't done it) or, the way I have done it, by using one SELECT statement to export to S3 and another to import into your OLTP database.

And before you say you shouldn't have to do this: you have always needed some process to take data from its normalized form to a denormalized form for reporting and analytics.

Source: doing boring enterprise stuff including databases since 1996, and working for 8 years with AWS services both outside AWS (startups and consulting companies) and inside AWS (Professional Services, no longer there).

Why are you doing this manually? There is a built in way of doing Kinesis Data Streams to Redshift

https://docs.aws.amazon.com/streams/latest/dev/using-other-s...

Also, while you can have S3 directly as a destination for Redshift through the Glue Catalog, by default it definitely doesn't use S3.


These things cost money, and Redshift handling live ingestion from Kinesis is tricky.

There is no need for Athena; Redshift ingestion is a simple query that reads from S3. I don't want to copy 10 TB of data just to have it in one file. And yes, the default storage is a bit better than S3, but for an OLAP database there seems to be no proper column compression, and the data footprint is too big, resulting in slow reads if one is not careful.

I mentioned ClickHouse; the data is obviously not in an OLTP schema.

I don't have normalized data. As I mentioned, the ClickHouse consumer goes through 10 TB of blobs and ends up with 15 GB of postprocessed data in like 5-10 minutes; the slowest part is downloading from S3.

I am not willing to pay 10k+ a month for something that absolutely sucks compared to a proper OLAP db.

Redshift is just made for some very specific, bloated, throw-as-many-software-pipelines-as-you-can, pay-as-much-money-as-you-can workflows that I just don't find valuable. Its compute engine and data representation are just laughably slow; yes, it can be as fast as you want by throwing parallel units at it, but it's a complete waste of money.


It seems like you want a time-series database, not an OLAP one. Every problem you described you would also have with Snowflake or another OLAP database.


Thanks for having this discussion with me. I believe I don't want a time series database. I want to be able to invent new queries and throw them at a schema, or create materialized views to have better queries etc. I just don't find Snowflake or Redshift anywhere close to what they're selling.

I think these systems are optimized for something else, probably organizational scale, predictable low value workloads, large teams that just throw their shit at it and it works on a daily basis, and of course, it costs a lot.

My experience after renting a $1k EC2 instance and slurping all of S3 onto it in a few hours, and Redshift being unable to do the same, made me not consider these systems reliable for anything other than ritualistic performative low value work.


I've told you my background. I'm telling you that you are using the wrong tool for the job. It's not an issue with the database. Even if you did need an OLAP database like Redshift, you are still treating it like an OLTP database as far as your ETL job goes. You really need to do some additional research.


I do not need JOINs. I do not need single row lookups or updates. I need a compute engine and efficient storage.

I need fast consumers, I need good materialized views.

I am not treating anything like OLTP databases, my opinion on OLTP is even harsher. They can’t even handle the data from S3 without insane amounts of work.

I do not even think in terms of OLTP OLAP or whatever. I am thinking in terms of what queries over what data I want to do and how to do it with the feature set available.

If necessary, I will align all postgresql tables on a timeline of discrete timestamps instead of storing things as intervals, to allow faster sequential processing.

I am saying that these systems, as a whole, are incapable of many things I've tried to do with them. I have managed to use other systems and do many more valuable things, because they are actually capable.

It is laughable that the task of loading data from S3 into whatever schema you want is better done by tech outside of the aws universe.

I can paste this whole conversation into an LLM unprompted and I don’t really see anything I am missing.

The only part I am surely missing are nontechnical considerations, which I do not care about at all outside of business context.

I know things are nuanced and there are companies with PBs of data doing something with Redshift, but people do random stuff with Oracle as well.


And you honestly still haven’t addressed the main point - you are literally using the wrong tool for the job and didn’t do your research for the right tool. Even a cursory overview of Redshift (or Snowflake) tells you that it should be used for bulk inserts, aggregation queries, etc.

Did you research how you should structure your tables for optimum performance in OLAP databases? Did you research the pros and cons of a column-based storage engine like Redshift versus a standard row-based storage engine in a traditional RDBMS? Not to mention that, depending on your use case, you might need Elasticsearch.

This is completely a you problem for not doing your research and using the worst possible tool for your use case. Seriously, reach out to an SA at AWS and they can give you some free advice; you are literally doing everything wrong.

That sounds harsh. But it’s true.


ClickHouse is column-based storage, and I can also apply delta compression, where gapless timestamp columns have basically zero storage cost. I can apply Gorilla as well and get nice compression on irregular columns. I am aware of Redshift's AZ64 columns, and they are a letdown.

I can change sort order, same as in Redshift with its sort keys, to improve compression and compute. Redshift does not really exploit this sort-key config as much as it could.
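
A quick sketch of why a gapless timestamp column costs almost nothing under delta (or delta-of-delta) coding: the second differences are all zero and run-length encode to a handful of values. This is the general idea, not ClickHouse's actual codec implementation.

    # Delta-of-delta on a perfectly regular (gapless) timestamp column:
    # the first differences are a constant step, the second differences are
    # all zeros, and a trivial run-length encoder collapses them to ~nothing.
    from itertools import groupby

    timestamps = list(range(1_700_000_000, 1_700_000_000 + 3600))  # 1 Hz, one hour

    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]   # all 1s
    deltas2 = [b - a for a, b in zip(deltas, deltas[1:])]          # all 0s

    def rle(values):
        return [(v, sum(1 for _ in grp)) for v, grp in groupby(values)]

    print(rle(deltas2))   # [(0, 3598)] -- a single (value, run_length) pair
    # Store: first timestamp + first delta + this one RLE pair,
    # instead of 3600 raw 64-bit integers.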

My own assessment is that I'm extremely skilled at making any kind of DB system yield to my will and get it to its limits.

I have never used Redshift, ClickHouse or Snowflake with 1-by-1 inserts. I have mentioned S3 consumers (a library or a service, optimized to work well with the autoscaling done by S3, respecting SlowDown -- something Redshift itself is incapable of respecting -- and achieving enormous download rates; some of the consumers I've used completely saturate the 200 Gbps limits of some EC2 machines at AWS). These consumers cannot be used in a 1-by-1 setting; the whole point is to have an insanely fast pipelining system with batched processing, interleaving network downloads with CPU compute, so that in the end any data repackaging and compression is negligible compared to the download. You can then predict how long the system will take to ingest just by knowing your peak download speed, because the actual compute is fully optimized and pipelined.
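
A rough sketch of that pattern with boto3 -- adaptive retries so SlowDown is backed off rather than fatal, and a thread pool so downloads stay ahead of the CPU work. The bucket, prefix, and `process_batch` hook are placeholders for whatever your pipeline does.

    # Sketch of a pipelined S3 consumer: list keys, download them concurrently,
    # and hand completed objects to CPU-side processing as they arrive, so the
    # wall clock is dominated by download bandwidth. Adaptive retry mode makes
    # the SDK back off on S3 SlowDown responses instead of failing the job.
    import boto3
    from botocore.config import Config
    from concurrent.futures import ThreadPoolExecutor

    s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10, "mode": "adaptive"}))

    BUCKET, PREFIX = "my-bucket", "kinesis-output/"   # placeholders

    def list_keys():
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=PREFIX):
            for obj in page.get("Contents", []):
                yield obj["Key"]

    def download(key):
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

    def process_batch(blobs):
        # placeholder for the CPU side: decode, repackage, compress, etc.
        return sum(len(b) for b in blobs)

    total = 0
    with ThreadPoolExecutor(max_workers=32) as pool:
        batch = []
        for body in pool.map(download, list_keys()):  # downloads overlap with processing
            batch.append(body)
            if len(batch) == 64:
                total += process_batch(batch)
                batch.clear()
        if batch:
            total += process_batch(batch)
    print(f"processed {total} bytes")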

Now, it might just be Redshift has bugs and I should report them, but I did not have the experience of AWS reacting quickly to any of the reports I've made.

I disagree, it's not a me problem. I am a bit surprised after all I've written that you're still implying I want OLTP, am using the wrong tool for the job. There are just some tools I would never pick, because they just don't work as advertised, Redshift is one of them. There are much better in-memory compute engines that work directly with S3, and you can create any kind of trash low-value pipelines with them, if you reach mem limits of your compute system, there are much better compute engine + storage combos than Redshift. My belief is that Redshift is purely a nontechnical choice.

Now, to steelman you, if you're saying:

* data warehouse as managed service,

* cost efficiency via guardrails,

* scale by policy, not by expertise,

* optimize for nontechnical teams,

* hide the machinery,

* use AWS-native bloated, slow or expensive glue (Glue, Athena, Kinesis, DMS),

* predictable monthly bill,

* preventing S3 abuse,

* preventing runaway parallelism,

* avoiding noisy-neighbor incidents (either by protecting me or protecting AWS infra),

* intentionally constrained to satisfy all of the above,

then yes, I agree, I am definitely using the wrong tool but as I said, if the value proposition is nontechnical, I do not really care about that.


> My own assessment is that I'm extremely skilled at making any kind of DB system yield to my will and get it to its limits.

Yes an according to my assessment I’m also very good in bed and extremely handsome.

But there is an existence proof: you are running into issues, yet millions of people use AWS services and know how to use the right tool for the job.

I'm not defending Redshift for your use case. I'm saying you didn't do your research and you did absolutely everything wrong. From my cursory research of ClickHouse, I probably would have chosen that too for your use case.


I did not do anything wrong. I had no choice with Redshift and had instructions from above. I made it work really well for what it can do and was surprised how much it sucks even when it has its own data inside of it and has to do compute. As a completely closed system, it's not impressive at all. It has absolutely shameful group-by SQL, completely inefficient sort-key and compression semantics, and absolutely can't attach itself to Kinesis directly without costing you insane amounts of money, because as you already know, Redshift is not a live service (you won't use it by connecting directly to it and expect good performance), it's primarily a parallel compute engine.

Your assessment of me is flawed. You haven't really shown any kind of low-level expertise in how these systems actually work; you've just name-dropped OLTP and OLAP as if that means anything at all. What is Timescale (now TigerData), OLTPOLAPBLAPBLAP? If someone tells you to use Timescale, you have to figure out how to use it and make the system yield to your will. If a system sucks, it yields harder; if a system is well designed, it's absolutely beautiful. For example, I would never use Timescale either, yet you can go on their page and see unicorns using it. I have no idea why, but let them have their fun. There are successful companies using Elasticsearch for IoT telemetry, so who am I to argue I wouldn't do that as well.

There's nothing wrong with using PostgreSQL for timeseries data, you just need to know how to use it. At some point, scaling wise, it will fail, but you're deciding on tradeoffs.

So yes, my assessments have a good track record, not only of myself but of others as well. I am extremely open to any kind of precise criticism, have been wrong a bazillion times, and I take part in these kinds of passionate discussions on the internet because I am aware I can absolutely be convinced of the other side. Otherwise, I would have quit a long time ago.


I remember being quite surprised that my implementation, which used manual stack updates, was much slower than what the compiler produced with recursion.

Turns out I was pushing to and popping from the stack on every conceptual "recursive call", while the compiler figured out it could keep 2-3 recursion levels in registers and only push/pop 30% of the time; it also kept more stuff in memory than my version did.

Even when I reduced memory reads/writes to ~50% of the recursive program's and kept most of the state in registers, the recursive program was still faster, simply because it used more registers than I did.

I realized then that I cannot reason about the microoptimizations at all if I'm coding in a high-level language like C or C++.

Hard to predict the CPU pipeline; sometimes profile-guided optimization gets me there faster than my own silliness of assuming I can reason about it.
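
For reference, this is the shape of the transformation being described: the same traversal written with recursion and with a manual stack. The register-allocation effects above are about compiled code; this Python sketch only illustrates the mechanical rewrite, where every conceptual call becomes an explicit push/pop.

    # Same tree-sum computed two ways: plain recursion vs. an explicit stack.
    # In compiled code, the recursive version lets the compiler keep the top
    # few "frames" in registers; the manual version pushes and pops on every
    # conceptual call, which is the overhead described above.
    import random

    def make_tree(depth):
        if depth == 0:
            return None
        return (random.randint(0, 9), make_tree(depth - 1), make_tree(depth - 1))

    def sum_recursive(node):
        if node is None:
            return 0
        value, left, right = node
        return value + sum_recursive(left) + sum_recursive(right)

    def sum_manual_stack(root):
        total, stack = 0, [root]
        while stack:
            node = stack.pop()                 # explicit pop per "call"
            if node is None:
                continue
            value, left, right = node
            total += value
            stack.append(left)                 # explicit push per "call"
            stack.append(right)
        return total

    tree = make_tree(12)
    assert sum_recursive(tree) == sum_manual_stack(tree)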


Brute-force thinking works in this case: given that there are only ~12*2^12 total states and the transition matrix is very sparse, 1/11 is quick to calculate.

But not all of these states are valid; the visited set is defined by just 2 markers on the circle (and the start position), so the state count is much smaller.

The ladybug needs to be on 7 or 5, with the right (7,5) visited state, to reach 6; movements inside (7,5) don't really matter, so the state count gets down to 12*11/2 = 66. Quite small, and enough to do by hand.

edit: I've been thinking a bit about finding a short proof, as 1/11 (or 1/(N-1) in the general case) sounds like it should have one, but it only made me realize how clean these constructive proofs are; any attempt to formalize this gets me into graph-theory vibes, where the proof seems to make non-symbolic leaps in reasoning that I just can't feel are true.
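
A quick brute-force check of the 1/(N-1) claim, assuming the standard setup (a symmetric random walk on a 12-cycle; the question is which vertex is the last one to be visited for the first time):

    # Monte Carlo check: on an N-cycle, each non-start vertex is equally
    # likely (probability 1/(N-1)) to be the last vertex visited for the
    # first time by a symmetric random walk starting at 0.
    import random
    from collections import Counter

    N, TRIALS = 12, 200_000

    def last_new_vertex():
        pos, visited = 0, {0}
        while len(visited) < N:
            pos = (pos + random.choice((-1, 1))) % N
            visited.add(pos)
        return pos  # the vertex that completed the tour

    counts = Counter(last_new_vertex() for _ in range(TRIALS))
    for v in range(1, N):
        print(v, round(counts[v] / TRIALS, 4))   # each hovers around 1/11 ~= 0.0909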


https://www.scylladb.com/2019/12/12/how-scylla-scaled-to-one...

I like this one, where they put a dataset on 80 machines, only for someone to then put the same dataset on one Intel NUC and outperform them in query time.

https://altinity.com/blog/2020-1-1-clickhouse-cost-efficienc...

Datasets never become big enough…


>Datasets never become big enough…

Not only is this a contrived non-comparison, but the statement itself is readily disproven by the limitations that basically _everyone_ using single-instance ClickHouse runs into if they actually have a large dataset.

Spark and Hadoop have their place, maybe not in rinky dink startup land, but definitely in the world of petabyte and exabyte data processing.


When a single server is not enough, you deploy ClickHouse on a cluster, up to thousands of machines, e.g., https://clickhouse.com/blog/how-clickhouse-powers-ahrefs-the...


Well, at my old company we had some datasets in the 6-8 PB range, so tell me how we would run analytics on that dataset on an Intel NUC.

Just because you don't have experience of these situations, it doesn't mean they don't exist. There's a reason Hadoop and Spark became synonymous with "big data."


These situations are rare, not difficult.

The solutions are well known even to many non-programmers who actually have that problem:

There are also sensor arrays that write 100,000 data points per millisecond. But again, that is a hardware problem, not a software problem.


Well yeah, but that's a _very_ different engineering decision with different constraints, it's not fully apples to apples.

Having materialised views increases insert load for every view, so if you want to slice your data in a way that wasn't predicted, or that would have increased ingress load beyond what you've got to spare, say, find all devices with a specific model and year+month because there's a dodgy lot, you'll really wish you were on a DB that can actually run that query instead of only being able to return your _precalculated_ results.


If you remove the diacritics, it's completely valid BCS with the same meaning.


BCS has a word for mermaid/siren though (sirena), so it's Mala Sirena. Which makes sense with the sea right there and the proximity to Greece, so Homeric legends about sirens will presumably be in the culture.


There's a nice song by Daleka Obala - Morska Vila. That's the first thing that rings a bell.


Makes sense. Maybe it's a bit like how informal English tends to use Germanic rather than Latin derivations?


Where's the great poet or singer-songwriter to capture the essence of our humanity today?

All the myths have been captured and repackaged already... that lucky old sun.

