Hacker News | Wiretrip's comments

Not just Altman, but also Amodei, Pichai and Nadella. These guys are desperate now.


Yes, but the main thrust of his criticism is the fragility of the whole 'AI' stack, where every part of the stack is unprofitable, unstable and inter-reliant; if any one of those parts goes tits up, everybody else goes with it.

Also, a lot of AI 'users' arrive, search for an actual use case, find none, and then move on.


You've got to remember that the average IQ is 100. There are a lot of people in the UK below this, just look at Brexit and who we elected, sigh....



Half of the population will have an IQ that's below average.


Actually I think it's 52% <ducks> :-)


I would love to see an article about how people pronounce some of the strange names and acronyms we have ended up with. Quick survey? 'etc', 'gif', 'uefi', 'wget', 'cifs' ?


I once made a British sysadmin T|N>K * when I referred to eff tee pee dot ick ack ook, a repository of open source software.

(* tea piped through nose out to keyboard)


ick ack uck, please!


I'll just say "et-sea" and stop there to avoid further arguments


Etcetra Gift Yoo-effy Wuh-get Sifs


Yes it was real. My fav phrase to describe the work was 'KY2K Jelly - helps you insert 4 digits where only 2 would go before' :-)


PDF is, without a doubt, one of the worst file formats ever produced and should really be destroyed with fire... That said, as long as you think of PDF as an image format it's less soul destroying to deal with.


PDF is good at what it's supposed to be good at. Parsing PDF to extract data is like using a rock as a hammer and a screw as a nail: if you try hard enough it'll eventually work, but it was never intended to be used that way.


I think my fastener analogy would probably involve something more like trying to remove a screw that's been epoxied in. Or perhaps trying to do your own repairs on a Samsung phone.

It's not that the thing you're trying to do is stupid. It's probably entirely legitimate, and driven by a real need. It's just that the original designers of the thing you're trying to work on didn't give a damn about your ability to work on it.


Actually, parsing text data from a PDF is more like using the rock to unscrew a screw, in that it was not meant to be done that way at all. But yeah, PDF was designed to provide a fixed-format document that could be displayed or printed with the same output regardless of the device used.

I'm not sure (I haven't thought about it a lot) that you could come up with a format that duplicates that function and is also easier to parse or edit.


It's closer to using a screwdriver to screw in a rock. The task isn't supposed to be done in the first place but the tool is the least wrong one.


I would think any word processing document format would duplicate that function and be better.


It's pretty silly when you think about it. There's an underlying assumption that you'll work with the data in the original format that you used to make the PDF.


“PDF is good at what it's supposed to be good at.”

QFT. PDF should really have been called “Print Description Format”. At heart it’s really just a long list of non-linear drawing instructions for plotting font glyphs; a sort of cut-down PostScript.

https://en.wikipedia.org/wiki/PostScript

(And, yes, I have done automated text extraction on raw PDF, via Python’s pdfminer. Even with library support, it is super nasty and brittle, and very document specific. Makes DOCX/XLSX parsing seem a walk in the park.)
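To make the "list of drawing instructions" point concrete, here is a deliberately naive extractor over a hand-written, uncompressed content stream (illustrative only, not how pdfminer works: real streams are usually Flate-compressed, use font-specific encodings, and split words across operators, which is exactly why extraction is brittle):

```python
import re

# A tiny hand-written PDF content stream: text is drawn as positioned
# glyph runs via Tj/TJ operators, not stored as readable prose.
stream = rb"""
BT
/F1 12 Tf
72 700 Td
(Hel) Tj
8 0 Td
(lo, ) Tj
[(Wor) -20 (ld)] TJ
ET
"""

# Naive extraction: grab every parenthesised string operand.  This
# already ignores escapes beyond the basics, CID fonts, encodings and
# positioning -- joining runs in stream order only "works" by luck.
parts = [
    m.group(1).decode("latin-1")
    for m in re.finditer(rb"\(((?:[^()\\]|\\.)*)\)", stream)
]

print("".join(parts))  # -> "Hello, World"
```

Note that the word "World" only comes out whole because the two runs happen to be adjacent in the stream; a kerned or multi-column layout breaks this immediately.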

What’s really annoying is that the PDF format is also extensible, which allows additional capabilities such as user-editable forms (XFDF) and Accessibility support.

https://www.adobe.com/accessibility/pdf/pdf-accessibility-ov...

Accessibility makes text content available as honest-to-goodness actual text, which is precisely what you want when doing text extraction. What’s good for disabled humans is good for machines too; who knew?

i.e. PDF format already offers the solution you seek. Yet you could probably count on the fingers of one hand the PDF generators that write Accessible PDF as standard.

(As for who’s to blame for that, I leave others to join up the dots.)


PDF is great at what it's meant to be: digital printed paper, with its pros (it will look exactly the same anywhere) and cons (you can't easily extract data from it or modify it).

Currently, there is no viable alternative if you want the pros but not the cons.


For me, the biggest con of PDFs is that like physical books, the font family and size cannot be changed. This means you can't blow the text up without having to scroll horizontally to read each line or change the font to one you prefer for whatever reason. It boggles my mind that we accept throwing away the raw underlying text that forms a PDF. PDF is one step above a JPEG containing the same contents.


> Currently, there is no viable alternative if you want the pros but not the cons

I remember OpenXPS being much easier to work with. That might be due to cultural rather than structural differences, mind: fewer applications generate OpenXPS, so there are fewer applications generating them in their own special-snowflake ways.


This is the first time I've heard of it. When I search for it, I only find the Wikipedia article and 99 links on how to convert it to PDF.

The problem with this is that from an average person perspective it doesn't have the pros. There is no built-in or first-party app that can open this format on Mac and Linux. More than 99% of the users only want to read or print it. It's hard to convince them to use an alternative format when it's way more difficult to do the only thing they want to do.


It's a Windows thing, since Windows 7, IIRC. It's OK now, but it was buggy for years, and since hardly anything consumes XPS files, however much better it is, it's not more useful.


It was too late and probably too attached to Microsoft to succeed. It is still used as the spool file format for modern printer drivers on Windows.


Screenshots of Smalltalk. (I'm joking.)


We have to fill existing PDFs from a wide range of vendors and clients. Our approach is to rasterize all PDFs to 300 DPI PNG images before doing anything with them.

Once you have something as a PNG (or any other format you can get into a Bitmap), throwing it against something like System.Drawing in .NET (Core) is trivial. Once you are in this domain, you can do literally anything you want with that PDF: barcodes, images, sideways text, HTML, OpenGL-rendered scenes, etc. It's the least stressful way I can imagine dealing with PDFs. For final delivery, we recombine the images into a PDF that simply has these scaled 1:1 to the document. No one can tell the difference between source and destination PDF unless they look at the file size on disk.

This approach is non-ideal if minimal document size is a concern and you can't deal with the PNG bloat compared to native PDF. It is also problematic if you would like to perform text extraction. We use this technique for documents that are ultimately printed, emailed to customers, or submitted to long-term storage systems (which currently get populated with scanned content anyways).


You could probably reduce file size by generating your additions as a single PDF, and then combining that with the original 'form', using something like

pdftk form.pdf multibackground additions.pdf output output.pdf


> No one can tell the difference between source and destination PDF unless they look at the file size on disk.

Not even when they try to select and copy text?


You can add PDF tag commands to make rasterised text selectable and searchable, though they probably aren't doing that.


Any recommended library for .NET to extract text by coordinates?


There's iText7 (also for Java). Not sure how it compares with other libraries, but it will parse text along with coordinates. You just need to write your own extraction strategy to parse how you want.

From my experience, it seems to grab text just fine; the tricky part is identifying and grabbing what you want, and ignoring what you don't want (for reasons mentioned in the article).

https://github.com/itext/itext7-dotnet

https://itextpdf.com/en/resources/examples/itext-7/parsing-p...


I don't know that this could exist for all PDFs.

Sounds like you are in need of OCR if you want to be able to use arbitrary screen coords as a lookup constraint.
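Whichever tool produces the (text, bounding-box) pairs, whether it's a layout-aware parser or OCR, the "by coordinates" part is just geometric filtering. A minimal sketch with hand-made, hypothetical word boxes (the words and coordinates below are invented for illustration):

```python
# Hypothetical (text, bbox) pairs as a layout-aware extractor or OCR
# engine might emit them.  bbox = (x0, y0, x1, y1) in page units.
words = [
    ("Invoice", (72, 700, 130, 712)),
    ("No.",     (135, 700, 160, 712)),
    ("10042",   (165, 700, 205, 712)),
    ("Total:",  (72, 400, 115, 412)),
    ("99.50",   (400, 400, 440, 412)),
]

def words_in_region(words, region):
    """Return, in reading order, every word whose box lies inside region."""
    rx0, ry0, rx1, ry1 = region
    hits = [
        text for text, (x0, y0, x1, y1) in words
        if x0 >= rx0 and y0 >= ry0 and x1 <= rx1 and y1 <= ry1
    ]
    return " ".join(hits)

# Grab whatever sits in the top-right "invoice number" area.
print(words_in_region(words, (130, 690, 595, 720)))  # -> "No. 10042"
```

The hard part in practice is the extractor, not this filter: getting reliable boxes per word (rather than per glyph or per line) is where libraries differ.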


Lots of people doing their daily jobs are not aware of the information loss that occurs whenever they are saving/exporting as PDF.


In the consulting industry I’ve seen PDF being used precisely because third parties couldn’t mess with the content anymore.


Yes, the company I once worked for used to supply locked PDF copies to make it slightly harder for casual readers to re-use / steal our text.


That’s the approach I’m using to reformat (“reflow”) PDFs for mobile in my app https://readerview.app/


The first link on your demo gives me an error (mobile safari) https://www.appblit.com/pdfreflow/viewdoc?url=http://arxiv.o...


I have been waiting for this for so long. It really works, well done.


Tell that to the entire commercial print industry, where they work very well.


Yup. I still have PTSD from a project where I needed to extract text from millions of PDFs


What alternative do you propose? Postscript?


Why not, .ps.gz works pretty well.


... and is much more difficult to extract text from than PDF, given that it's Turing-complete (hello, halting problem) and doesn't even restrict your output to a particular bounding box.


It was never meant to be a data storage format. It's for reading and printing.


Except it sucks for reading?


I haven't experienced problems reading articles and books in PDF format on my phone.


I read ebooks on my Nintendo DSi for several years when I was in college; the low-resolution screen combined with my need for glasses (and dislike of wearing them) made reading PDF files unbearable. Later on I got a cheap Android tablet and reading PDF files was easier, but it still required constant panning and zooming. Today I use a more modern device (2013 Nexus 7 or 2014 NVidia Shield), and I still don't like PDF files. I usually open the PDF in Word if possible, save it in another format, then convert to EPUB with Calibre, and dump the other formats.

Epubs in comparison are easy, as all it takes is a single tap or button press to continue. When there's no DRM on the file (thanks HB, Baen) I read in FBReader with custom fonts, colors, and text size. It doesn't hurt any that the epub files I get are usually smaller than the PDF version of the same book.

Personally, I think the fact that Calibre's format converter has so many device templates for PDF conversion says a lot.


Try being visually impaired.


I have a doubt. What am I missing?


You clearly haven't ever worked with MP3.


Totally agree. I worked on a project that had to try and extract tables from PDFs. It is much harder than it would first appear.


Detecting where tables are is still an active research area. Once we know where they are on the page, it’s easier to parse out their structure.


Actually a very good alternative would be Java and JavaFX, maybe using Eclipse as the IDE.


I've used both. Not really a good alternative IMO.


Well why not? I've used Delphi for years and also Java/JavaFX/Swing. There's not much reason to use Delphi anymore when Java IDEs can do the same things, and much more, and with docs that are just as good or better, all for free.


In Delphi I do desktop applications that perform real-time low-level device control, talk over USB and some specialized radio gizmos, DirectX graphics and multimedia processing, real-time data presentation, bulk data processing, game-like communications over UDP, etc., all at the same time from different threads. Good luck doing that in Java. I did try doing low-level stuff with it per a client's request and it felt like masochism. Java was created with different things in mind.


The only thing that you seem to be right about is the low level device control stuff. Everything else is pretty much already easy enough to do in Java. The language was designed with multithreading in mind.


"...multithreading in mind..." - Easy and efficient are two different things. Java synchronization primitives and its ability to flip bits incur a heavy performance penalty when compared to native ones. You might not care about it but I do.

You are also free to point me to a good DirectX framework in Java and/or one that lets you write DirectShow and Media Foundation filters and graphs as well.

Besides your logic reads like this: since language A does the same as language B and you happen to like language A then nobody needs B. Sorry but it works both ways.

When I was doing enterprise stuff Java provided more value to me in that particular situation. Windows desktop: sorry I'll choose Delphi any time.


Java synchronization primitives and its ability to flip bits incur a heavy performance penalty when compared to native ones

I think your knowledge on this might be a bit out of date. If you look at the Unsafe class (in old versions of Java) or the VarHandle class in newer versions, you can implement synchronisation and bit flipping with the same efficiency as Delphi, indeed it compiles down to optimal machine code.

You are also free to point me to a good DirectX framework in Java and/or one that lets you write DirectShow and Media Foundation filters and graphs as well.

Well, you can do this, albeit nobody really does because Java code tries culturally to be cross platform so using DirectX would be an odd thing for it to do. There are COM frameworks that let you write COM objects in Java. Whether you can do high performance image processing does depend on whether the JITC auto-vectorises though, at least until the vector API ships.

But you can for instance instantiate a JavaFX video player and then apply hw accelerated effects to it. Writing your own image filters would these days be best done in shader languages or lots of AVX anyway, so neither Delphi nor Java would really be appropriate for that.

Windows desktop: sorry I'll choose Delphi any time.

Yes, if you want to specifically write a Windows-only desktop app (that isn't WPF/WinUI/the new stuff I guess) and use lots of Win32 specific APIs, then Delphi is going to be better at that. On this we fully agree.

Still, these days people mostly want to write for at least Windows and macOS if not also Linux. And then you can write a good desktop app with JavaFX and the java packager tool (with native installers and the general appearance of a themed Windows app).

FWIW I used Delphi for years in the 1990s, even wrote a 3D video game with it in OpenGL. I've also written cross platform desktop apps with Java/JavaFX. So I have a lot of experience of both to compare them against. Delphi was fantastic at the time, and I still sometimes point out to people how the web platform fails to compare to even Delphi 2 + Windows 95 in many ways. I don't think we're really so far apart. But times have changed and Delphi wins only if you compare it to the web.


> Windows desktop: sorry I'll choose Delphi any time.

Let me quote you then.

> Besides your logic reads like this: since language A does the same as language B and you happen to like language A then nobody needs B. Sorry but it works both ways.

There are perfectly good enough tools for the Windows Desktop already and they are available directly from Microsoft.


Visual C is not such a good tool for writing desktop applications. .NET monstrosities with almost the same problems as Java in regards to low-level stuff, multimedia, etc., when I can have a single exe with no dependencies? Thanks but no thanks.

Also, I do not really appreciate the mentoring. I was doing just fine completely on my own for the last 20 years.


Delphi can show the assembly code generated for every line.

That is important for actually learning how the computer works.


I would suggest IntelliJ rather than Eclipse presently.


Agreed. I discounted that because of the cost, but I guess there's the community edition!


Or html canvas and js.


Positive reasons: 1) It runs out of the box without an Internet connection. 2) The IDE is extremely good. 3) It teaches OOP and memory management (something often forgotten by today's programmers). 4) It is capable of giving very low-level access to the host machine (like C). 5) You can even use (and teach) tricky things like pointers.


Last time I checked, recent versions of Delphi have DRM that needs an internet connection. I even remember someone on the official newsgroups saying that he had to install a crack so that he could use it on his laptop when he had no internet connection.


TBH, I found the low-level pointer stuff to be inaccessible. There is so much magic going on behind the scenes, and on the other hand it's hard to do pointer value arithmetic as well as pointer type arithmetic (you need to create aliases to no end). C is much more convenient for dealing with pointers, and it makes it much more obvious what happens from reading the code.

In Delphi culture, people say PChar instead of char* , and make tons of PFoo and PPFoo. And I think the reason is that some syntactical restrictions made Pascal easier to implement back in its time (last I checked there were still significant restrictions in Delphi)


I am a professional programmer and I still use Delphi. It's great!

