"why not alternative", would be better framed as, "here's a fun variation" — because both approaches are just playing around with technology, for fun / curiosity / exploration. Storing in the pixels is a fun approach, resulting in something Rube Goldberg-esque.
weetii 1 days ago [-]
Hey, yeah, I wrote the article. This (of course) would be more practical. Thanks for pointing it out. I wanted the payload to "live" in actual pixel data rather than hidden text inside an XML file. That’s why I went this way :)
peter-m80 1 days ago [-]
The ico file format allows multiple resolution icons, so a lot of data
econ 13 hours ago [-]
You can use any image format. Try using an animated gif.
Good point, I might add a section in the article where I list alternative approaches. Thanks
dodslaser 1 days ago [-]
Or just put the entire website in a <foreignObject> and render it in the favicon.
cogman10 1 days ago [-]
If you wanted to play around and do something a little more challenging (though you'd be bulking up the JavaScript), you could try a bespoke HTML compression. You could store each tag in 4 bits, e.g. `0001`: the first bit says whether the tag opens or closes, and the remaining 3 bits indicate which tag is being used (div/p/b/h1/etc). Reserve at least one value, like `0111`, to indicate that text follows, and another, like `1111`, to indicate that an unsupported tag follows.
If you extend it out to 8 bits you can pretty nearly store all the html tags (it'd give you 256 tags to play with).
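A rough sketch of the 4-bit idea, in Python for brevity. The tag table, the helper names and the two-per-byte packing are assumptions made up for illustration; only the bit layout (one open/close bit, three tag bits, `0111` and `1111` reserved) comes from the comment above.

```python
# Hypothetical tag table: 3 bits give 8 slots, index 7 is reserved.
TAGS = ["div", "p", "b", "h1", "h2", "a", "li"]  # indices 0..6
RESERVED = 0b111   # 0111 = text follows, 1111 = unsupported tag follows
CLOSE_BIT = 0b1000  # first bit: 0 = opening tag, 1 = closing tag


def tag_nibble(name: str, closing: bool) -> int:
    """Return the 4-bit code for an opening or closing tag."""
    if name not in TAGS:
        return CLOSE_BIT | RESERVED  # 1111: unsupported tag follows
    return (CLOSE_BIT if closing else 0) | TAGS.index(name)


def pack_nibbles(nibbles: list[int]) -> bytes:
    """Pack 4-bit codes two per byte.

    An odd count is padded with a zero nibble, which is ambiguous with
    "open div"; a real format would also need to store the code count.
    """
    if len(nibbles) % 2:
        nibbles = nibbles + [0]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


# <div><p></p></div> as four nibbles, then two bytes.
codes = [
    tag_nibble("div", False),
    tag_nibble("p", False),
    tag_nibble("p", True),
    tag_nibble("div", True),
]
packed = pack_nibbles(codes)
```

How text runs and unsupported tag names are stored after their marker nibbles is left open here, since the comment doesn't specify it.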
chrismorgan 1 days ago [-]
Regular expressions? Ugh. Encode it properly as XML in the correct namespace, load it so, and take it from that.
Or just serve the SVG file and use <foreignObject> to embed the HTML, and include <link rel="icon" href=""> inside it. In theory you should be able to define a <view id="icon"> and use <link rel="icon" href="#icon">, but in practice neither Firefox nor Chromium seems to be handling that properly in a favicon, which is disappointing.
Tepix 1 days ago [-]
It's a hack. A one-liner. Go crazy with it. Or touch grass ;-)
Oh yeah and favicon isn't part of the DOM.
decendegen 23 hours ago [-]
lol
reichstein 1 days ago [-]
Just because it's my windmill to tilt at: `[\s\S]` can be written shorter and more precisely as `[^]`.
MomsAVoxell 1 days ago [-]
[\s\S] vs. [^]
A quixotic windmill tilt if ever I saw one.
GoToRO 1 days ago [-]
ai says [^] is not portable; I did not test it. Too bad, I'll stick to [\s\S].
spiesd 1 days ago [-]
Too bad, indeed. It's well-defined in javascript (and thus, appropriate in this admittedly niche context). It's non-portable across different regex engines, yes.
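To make the portability point concrete, here is a small sketch using Python's `re` module, picked only as an example of a non-JavaScript engine. It treats `[^]` as an unterminated character set, while `[\s\S]` compiles and matches across newlines. The `compiles` helper is made up for the example.

```python
import re


def compiles(pattern: str) -> bool:
    """Report whether Python's re engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# JavaScript reads [^] as "any character"; Python's re rejects it.
empty_negated_ok = compiles(r"[^]")

# [\s\S] is accepted and matches any character, including newlines.
portable_ok = compiles(r"[\s\S]")
spans_newline = re.fullmatch(r"<p>[\s\S]*</p>", "<p>line one\nline two</p>") is not None
```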
You can use the favicon cache as storage too, by redirecting users across domains. It's been proposed as a potential fingerprinting risk[0], and if a browser naively reuses the cache for incognito mode, it could be used to track users across browser profiles.
My thoughts instinctively went to "this has to be being used for fingerprinting" when I read OP's blog. Are anti-fingerprinting measures taking into account the use of the canvas API with favicons?
The link to the supercookie site is dead unfortunately.
Walf 2 days ago [-]
PNG has comment chunks tEXt, zTXt, and iTXt. You can have a completely normal image whose file is stuffed with as much content as you want. That is less fun, I suppose.
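For anyone curious what that looks like on the wire, here is a minimal sketch in Python that builds a 1x1 grayscale PNG with one tEXt chunk and reads it back. The chunk layout (length, type, data, CRC-32 over type and data) follows the PNG format; the helper names and the sample payload are invented for the example, and it is not how the article's favicon works.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def chunk(kind: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def text_chunk(keyword: str, text: str) -> bytes:
    """Build a tEXt chunk: Latin-1 keyword, a NUL separator, Latin-1 text."""
    return chunk(b"tEXt", keyword.encode("latin-1") + b"\x00" + text.encode("latin-1"))


def read_text_chunks(png: bytes) -> dict[str, str]:
    """Walk the chunk list and collect every tEXt keyword/text pair."""
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return found


# 1x1 image, 8-bit grayscale: one filter byte plus one pixel byte.
ihdr = chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
idat = chunk(b"IDAT", zlib.compress(b"\x00\x00"))
png = (
    PNG_SIGNATURE
    + ihdr
    + text_chunk("Comment", "<h1>hello</h1>")
    + idat
    + chunk(b"IEND", b"")
)
```

zTXt (compressed) and iTXt (UTF-8) use different data layouts and are not covered by this sketch.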
weetii 1 days ago [-]
Yes, that would also work, thanks for pointing it out
1 days ago [-]
franciscop 2 days ago [-]
Is this a timing coincidence? I just submitted, 1h ago (30 mins before this), a website I made about storing your stock portfolio in a URL + favicon!
I found the aggressively staccato, clearly LLM-generated content extremely difficult to read.
k2enemy 1 days ago [-]
Halfway through I was sure that there would be a reveal at the end of the article that the article itself was stored in the site's favicon, thus explaining the short, terse sentences. I was genuinely disappointed when I realized it wasn't. Missed opportunity!
bstsb 1 days ago [-]
for the first time in a while on HN, i disagree with the characterisation as AI-generated. at most it was drafted with an LLM, but the final output is pretty human to me.
they used the wrong it’s/its, made But. its own one-word sentence, didn’t capitalise HTML, and used “okayy” in parentheses. all of this isn’t to criticise the writer - i enjoyed it more seeing these little imperfections that make up a blog post
FWIW -- I'm not as repulsed by it as the parent comment. But I do want to substantiate that it _is_ heavily LLM-written.
(If you're unfamiliar, Pangram has garnered a reputation as the leading LLM-detector, with a minimal rate of false positives; IME this has come with the tradeoff of being easy to manipulate/tweak your way into turning an LLM-generated piece of text into reporting a false negative, but for most folks that's worthwhile.)
darianvc 1 days ago [-]
People do be having too much time...
Is the navigation of the site also AI generated? This doesn't make any sense and proves why these AI detectors don't work
MomsAVoxell 1 days ago [-]
I have been writing for a long time, and using AI for as long as it was available to me, and I have noticed that I get accused of being an AI more and more - and I do not think that is because I am an AI, but because we are all being consumed with the AI mind-set, and thus the AI is taking over our thinking - so perhaps subconsciously, I am indeed formulating my thoughts in a more AI-centric manner, prompting the association across the vast distances of the internet by other human beings (- or, AI) of my conscious thought, with an artifice.
This is a banal insult, but it is also a dire warning wherever I see it - these days, people moaning about being AI may as well just be AI - automatic ignorance - but .. I do have to wonder.
Am I, AI?
phyzome 22 hours ago [-]
I wouldn't trust Pangram too much. I've seen it give "100% LLM-written, high confidence" to multiple 100% human written articles (high confidence).
benhill70 1 days ago [-]
I like the way it's written. I often write in a similar manner and I have never used LLMs to generate any writing for me. I have written exactly this way at work.
To me, the author is just trying to get to the point. They know people start skimming if there is too much text.
SamBam 1 days ago [-]
> The important catch
> The favicon doesn't actually contain the whole website itself.
This is the kind of thing that is extremely idiomatic LLM speak. There's nothing particularly wrong about it per se, but it just makes everyone who is familiar with LLMs say "oh, it's written by an AI" and it just becomes disappointing.
themadturk 19 hours ago [-]
I complained about this style of writing on Medium a few months ago. The author of the article replied that it's a preferred style if you anticipate your writing to be read on a small smartphone screen. This kind of makes sense. Whether that article (or this one) was AI-generated or not, I don't know.
istjohn 1 days ago [-]
I found the writing engaging and enjoyable to read.
estetlinus 1 days ago [-]
It’s the new internet. So, so annoying.
scottmcdot 1 days ago [-]
Which bit? The short sentences?
bonoboTP 1 days ago [-]
Not just the length but the structure, the way the headlines are phrased, the use of "honestly", the "not X but Y", many things cumulatively, not one particular thing in itself. If you work a lot with LLM writing, you notice. Same way you recognize the writing style of famous authors. It's never one particular thing but many.
stevenhuang 24 hours ago [-]
There should be a pathology for thinking things must be LLM generated when it's simply not always the case.
People's ability to discern is completely fried.
esquivalience 6 hours ago [-]
Do you think I was wrong about this one? Maybe.
I'm usually fairly forgiving about it and like to err on the side of being generous to the individual, but in this case it seemed very clear to me and got in the way of the message. I noticed the .de domain and wondered whether it might be AI translation, but I don't think it was, and in my experience, AI translation doesn't give the same uncanny valley vibes.
bonoboTP 1 days ago [-]
Agreed. Disappointing that more people don't notice it's AI.
noduerme 1 days ago [-]
Yeah, but it's kinda weird. The typical LLM headers and bullet points are there, but it's like someone took an axe to the rest of the spew. I too would rather read someone's original bad writing than their bad editing of AI writing, but it's kinda interesting how this all shakes out.
darianvc 1 days ago [-]
Might stop using bullet points to avoid being flagged as AI lol
"Very small" -> yeah, this header is mostly AI generated. No hate against the author but this doesn't make any sense as a header
netsharc 1 days ago [-]
It doesn't seem to be LLM, but reads like one. The author is German, maybe it's a language expertise thing, maybe he likes the LLM style (unrelated to his nationality).
But yeah, sentences that only have 3-4 words each feel like 3rd grade writing; I couldn't read it.
weetii 1 days ago [-]
Hey, I've always written like this. In school I couldn't stand subordinate clauses and long sentences because I'd lose my train of thought. But yea, I've noticed that people often find it hard to read so I'm going to work on that
SoMomentary 1 days ago [-]
I actually feel the opposite of what most people are saying here. I thought your writing style was great. It felt like you respected my time as a reader and got to the fucking point.
I thought the lack of fluff was refreshing!
darianvc 1 days ago [-]
You're good fr. People on here who try to make their day about being AI detectives. You're trying to work on it and that's what matters
cubefox 1 days ago [-]
Did you use an LLM to write at least part of the article?
bartvk 1 days ago [-]
I wish people would include their prompts.
MomsAVoxell 1 days ago [-]
Oh, I am so aligned with this mentality:
A monitor is storage.
A keyboard is storage.
Forum posts are storage. Markov-approved tweaks in an edit, over time, certainly enough for quite a lot of storage. Dual-use storage to boot, since .. you know .. sometimes the comments are socially interesting.
Best thing is, nobody really knows if their chicken casserole recipe isn't just a handle to a carefully constructed GUID pointing across to .. let's say, for humor .. a thousand different forum postings ...
I do have to wonder if the author is familiar with PoC||GTFO, for this is certainly a technique one will find deep within the depths of the Alchemist Owls' holy tomes...
drob518 1 days ago [-]
Codes within codes. Wheels within wheels.
clusmore 17 hours ago [-]
I'd love to see you try serving the exact same file for both. The trick is that you need to return different Content-Type headers depending on what is requested. When the browser requests /favicon from a navigation event it will send Accept: text/html etc., so you return the file with Content-Type: text/html. Inside the response you have a <link rel="icon" href="/favicon" type="image/png">, literally the same resource, but the browser will now likely fetch it with Accept: image/... and you can return the same file with Content-Type: image/png, so the same resource gets used for both. Unless the browser caches the response, I feel like this would work.
If you don't control the headers of your webserver (eg GitHub Pages) I would settle for a symlink favicon.png that just links back to favicon.html which I think would trick the server into returning different Content-Types.
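The server-side decision could be as small as the sketch below, written in Python. The function name and the sample Accept headers are assumptions for illustration; real browsers send varying headers, and whether a browser accepts the same bytes as both a page and an icon is exactly the untested part.

```python
def pick_content_type(accept_header: str) -> str:
    """Choose the Content-Type for the shared /favicon resource.

    Navigation requests list text/html explicitly, so that check comes
    first. Image fetches typically list image/* types without text/html.
    """
    accept = accept_header.lower()
    if "text/html" in accept:
        return "text/html"
    if "image/" in accept:
        return "image/png"
    return "text/html"  # fallback for clients that send */* or nothing


# Illustrative headers, loosely modelled on what browsers tend to send.
navigation = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
icon_fetch = "image/avif,image/webp,image/png,image/svg+xml,image/*,*/*;q=0.8"

as_page = pick_content_type(navigation)
as_icon = pick_content_type(icon_fetch)
```

A response that varies like this would also want a `Vary: Accept` header, so caches keep the two variants apart.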
econ 13 hours ago [-]
I played with this once hoping the browser wouldn't care what the file was.
The plan came from an experiment from long ago where I put 1x1 images at the end of my pages, the images loaded from websites my page linked to. Preloading the assets made those pages load much faster. Sadly it also broke pages that served "hot linking images not allowed" text on images.
The new plan was to have a javascript or css file called favicon.ico so that the browser would load it at the same time the html was requested. Then one wouldn't have to wait for the html to be parsed for the second round trip to happen.
Sadly it didn't work.
jorisw 1 days ago [-]
Fun Fact: You can use any inline SVG for a favicon and keep it right in the HTML document.
This also allows you to use an emoji directly as a favicon, like so:
Just as a heads-up, if you do this and you want to use #rrggbb color codes or url(#id) links, you have to escape the # as %23, otherwise it gets parsed as a URL fragment and your SVG code is cut off there.
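A quick way to see the fragment problem, sketched in Python with the standard URL parser standing in for the browser. The SVG string is a made-up example. The unescaped `#` splits the data URL at the colour code, while `%23` keeps the whole SVG in the path.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical inline SVG favicon using a #rrggbb colour.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16">'
    '<rect width="16" height="16" fill="#ff6600"/></svg>'
)

naive = "data:image/svg+xml," + svg
escaped = "data:image/svg+xml," + svg.replace("#", "%23")

naive_parts = urlsplit(naive)
escaped_parts = urlsplit(escaped)

# Everything after the first raw '#' is treated as the fragment.
cut_off_tail = naive_parts.fragment

# With %23 nothing is split off, and decoding restores the original SVG.
restored = unquote(escaped_parts.path)
```

Other characters may need escaping too, depending on how the href is quoted; this only covers the `#` case from the comment.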
A 256b intro coded by placing pixels in photoshop and saving into an exe.
inglor 1 days ago [-]
Cool! Here is a GH repo demonstrating unbounded favicons I made 11 years ago - it crashes some browsers - wanna guess how long it took each one to fix it :D https://github.com/benjamingr/favicon-bug
1vuio0pswjnm7 19 hours ago [-]
Not a new idea of course. For example, back in 2000, someone stored deCSS in a favicon
That’s awesome. I took this a bit further a few years ago, making a URL-only notepad quine that creates itself as you add data to it and can be saved as a bookmarklet. You have to watch the gif to understand.
I saw one before that was doing this and saved the whole thing as a base64 stream. So the url would dynamically update and have all its data. Pretty cool. I suppose the main obstacle is now where do I load the site from, and how much can be stored in itself or an image :)
purple-leafy 14 hours ago [-]
That’s a pretty clever idea to treat it as a stream, why didn’t I think of that…
tetrisgm 12 hours ago [-]
Lmk if you end up pushing this further!
berkes 1 days ago [-]
I'd imagine the (aggressive) caching of the favicon by browsers makes it a challenge, but you could generate the favicon dynamically, then have JS extract the blocks sequentially. Basically streaming arbitrarily large content to a webpage via favicons, in blocks of 239 bytes.
It may be a fun, novel way to proxy webpages that are otherwise blocked. Though, i guess, the service rendering the favicons can just as easily be blocked then.
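The splitting side of that idea is simple enough to sketch in Python. The 239-byte block size is taken from the comment; the function names are invented, and ordering, caching and the actual pixel encoding are all left out.

```python
BLOCK_SIZE = 239  # bytes carried per favicon, per the comment above


def split_blocks(payload: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the payload into consecutive blocks of at most `size` bytes."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def join_blocks(blocks: list[bytes]) -> bytes:
    """Reassemble blocks that arrive in their original order."""
    return b"".join(blocks)


# A 607-byte page needs three favicons: 239 + 239 + 129 bytes.
page = ("<p>" + "x" * 600 + "</p>").encode("utf-8")
blocks = split_blocks(page)
roundtrip = join_blocks(blocks)
```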
echoangle 1 days ago [-]
> The length header is important because the image itself may contain unused pixels at the end. If there's no length value, there's no way to know where the real payload stops.
Not really, can’t you just pad with 0 bytes and stop reading when you encounter one that’s not part of the current Unicode codepoint?
zahlman 1 days ago [-]
Zero bytes won't ever be part of a multi-byte character in UTF-8. They simply represent code point 0 (which is valid, but wouldn't appear in normal text) by themselves.
echoangle 1 days ago [-]
Ah even better so you can just use null terminated strings
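A small Python sketch of the null-terminated variant. The 768-byte capacity assumes a 16x16 icon with three usable bytes per pixel, which is an assumption and not a figure from the article; the function names are made up.

```python
CAPACITY = 16 * 16 * 3  # assumed: 16x16 pixels, RGB, one byte per channel


def pack(text: str, capacity: int = CAPACITY) -> bytes:
    """Encode text as UTF-8 and pad with zero bytes up to the capacity."""
    data = text.encode("utf-8")
    if b"\x00" in data:
        raise ValueError("payload must not contain U+0000")
    if len(data) >= capacity:
        raise ValueError("payload leaves no room for a terminator")
    return data + b"\x00" * (capacity - len(data))


def unpack(buffer: bytes) -> str:
    """Read up to the first zero byte.

    This is safe because UTF-8 never uses a zero byte inside a
    multi-byte sequence; it only ever encodes U+0000 itself.
    """
    end = buffer.find(b"\x00")
    if end == -1:
        end = len(buffer)
    return buffer[:end].decode("utf-8")


original = "<b>héllo wörld</b> 🙂"
buffer = pack(original)
decoded = unpack(buffer)
```

The trade-off against a length header is that the payload can no longer contain U+0000, which is rarely a problem for HTML.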
terrycody 15 hours ago [-]
Any real usage scenarios for this? I mean, can it be used as anti-filter or whatever things.
herodoturtle 1 days ago [-]
How long before someone ports DOOM into a favicon? ^_^
(For the technical gurus here, would that even be possible?)
shakna 1 days ago [-]
You can already play it in a favicon [0].
But as favicons can be svgs, and let you store foreign objects... You could store the whole thing in the favicon, but might also need a line of JS to extract it.
Have an index.html that's also (byte-to-byte equal) served as favicon.ico. If that page "works" and the favicon doesn't show garbage, it is a website stored in a favicon (by my standards).
divvsaxena 1 days ago [-]
This is one of those projects that's completely impractical but makes the web more interesting. I love seeing people explore weird constraints just to see what's possible.
Hmm this is cool but what are the practical use cases?
It didn’t load first time round on my browser (Brave) without disabling its prevent tracking feature…
MomsAVoxell 1 days ago [-]
Practical use cases for stashing data in places people least expect it?
Wallet password.
New ecosystem for the kids.
That's two, at least.
superjose 2 days ago [-]
Pretty cool tbh!!! Would have loved seeing the decoder code!!!
It's also pretty interesting to think how an attacker could exploit images on his behalf. Never thought that would be a way!!!
Thanks!
schobi 2 days ago [-]
I guess the decoder is more than the 208 bytes that this page uses..
But maybe you can misuse this and store a session ID / cookie in a favicon (give everyone a unique one) and survive some cookie cleanup and evade privacy restrictions?
Maybe you can still make it that the favicon looks like an image a little to not raise suspicion?
Favicons seem to be cached across private browsing sessions. Oh no
RetroTechie 1 days ago [-]
I'm tempted to think that only someone working for a company in the advertising industry could come up with that.
Must EVERYTHING be polluted by ad tech & privacy intrusions?
frankzero 1 days ago [-]
I personally won't do things this way, but this is really cool and I could see the applications already.
beardyw 1 days ago [-]
I would have used a minimal service worker to unpack the web data and present it as if it were just a normal page being loaded.
A neat improvement would be to make the decoder into a bookmarklet. This would avoid the overhead of serving the script. Of course you would rely on the user having the bookmarklet installed, but when you serve HTML you also rely on the user having a web browser installed.
Use this favicon.svg:
Use this in your <head> to use an SVG favicon:
Finally, use this in your <body> to extract it and add it to your document body:
I don't know what this is but it's huge.
https://news.ycombinator.com/favicon.ico
So you could layer this experiment: favicon is svg, that contains encoded raster, whose bytes are encoded html.
At the very least it would make a mindboggling CTF step.
Nope, you can do it all in a single file with an html/png polyglot (and nowadays you can get better compression ratios with newer formats like webp).
https://web.archive.org/web/20120801001616/http://daeken.com...
[0]: https://www.schneier.com/blog/archives/2021/02/browser-track...
https://news.ycombinator.com/item?id=48606396
“Pong in S Favicon” https://news.ycombinator.com/item?id=48608681
(HN isn't showing the emoji)
:f3=Ygbukte!in"c"Hbviann:1h3> =n= " Exgtyvhkle znt%pfsdafipg thfivlmu "vas dcdpeed hrondbvjbno"rixfls, ;.q>
I use Helium on Linux with Polish locale.
https://web.archive.org/web/20010408040524if_/http://decss.z...
To extract
https://github.com/con-dog/serverless-architecture
[0] https://vidferris.github.io/FaviconDoom/
Related interesting project: https://github.com/EtherDream/web2img
cp index.html favicon.png