Your house. Your family and kids. Your health status & insurances. Your job & income. Your car. Your friends. I'm guessing all of those things are, to a varying degree, important to most of us. We care about them, and that they are safe and well. We usually do not share too many details about them with complete strangers. Except when you use web sites online...
In daily regular off-line life it is fairly common to assume the following:
Yet these are details many happily share, knowingly or unknowingly, with complete strangers when they get online, be it people or companies. A surprising number of people actually do know they "over-share" information, but seem to think "it is alright, it isn't like Google/Facebook/Microsoft/whoever would be able to do anything with it anyway" and sort of picture the kind, smiling faces of Larry and Sergey, the founders of Google, who indeed look like very nice people (as opposed to the robot stare of Facebook founder Mark Zuckerberg).
Another misconception too many people have is "I have nothing to hide" or "I'm not that interesting and I'm not involved in illegal activities, good luck finding my details in the information haystack", without realising they are helping to overturn democratic elections around the globe: they are easily led and controlled, because they can (easily) be found, categorized and used as micro-influencers. It isn't people sifting through all the data. Computers do. Computers are really, really good at it too. That is what they do. That is why we build them and that is why we use them.
Have a look at the image to the right to see an outline of the "conclusions" about people that Cambridge Analytica managed to compile in the Facebook scandal, based on the data people shared and how, and with whom, they interacted. This type of data is what enabled Cambridge Analytica to successfully target large groups of people with just the right type of message triggers to make them vote in particular ways, in a controlled fashion.
Let's run a test: go to this website and let it load fully: https://www.whatismyip.com/my-ip-information/ This is mainly to get a baseline of facts that we can compare against later. If you are not connected to a VPN, it is quite likely that you will see your current Internet Protocol (IP) address, in this case an "IPv4" address, under the headline saying "My IP Information".
Below that there are headlines saying "Geolocation Info" and "Host Info". The Geolocation section is the service's attempt at guessing your actual physical location. The Host Info section is the service's guess at who your internet provider is.
In my case, I am looking at this link in Google Chrome without any special plugins, and not connected to a VPN. The service correctly picks out my currently assigned IP address and the country/state/town I'm in, along with the name of my Internet Service Provider (ISP). We will be using this data for comparison with the next link I will ask you to open.
Let's keep that tab open in our browser and open another tab. In the new tab, go to this web site and let it load fully: https://ipleak.net/
Now, this service overlaps to some degree with the first link, but it provides even more information. You can compare the information given on the first link with the second; that was the main reason for the first link anyway, so you can see that neither site just produces made-up mumbo-jumbo that you can't check or verify yourself. Both sites should be showing the same information.
The information presented on this second service is what any and all web sites can extract from you, just by you visiting them. I'm not saying all web sites/services do extract this data from you, but I'm saying that they can, should they choose to. Many do. This is before you start typing things into form fields (leaking information), and before you even start clicking on links or other things (selections that can be catalogued).
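To make "just by you visiting them" a bit more concrete, here is a minimal sketch of what a server could note down from one single page request. The function name `passive_visitor_record` and the keys in the result are made up for this illustration, and the IP address is a documentation-only example. The header names themselves (User-Agent, Accept-Language, Referer) are standard HTTP request headers that browsers normally send without asking you.

```python
from typing import Mapping

# Standard request headers a browser usually sends on its own.
# Which of them a real site actually stores is an assumption here.
PASSIVE_HEADERS = ("User-Agent", "Accept-Language", "Referer")


def passive_visitor_record(remote_addr: str, headers: Mapping[str, str]) -> dict:
    """Collect what one request reveals before the visitor clicks or types anything.

    `remote_addr` is the address the connection came from; `headers` are the
    request headers as name/value pairs. Both are hypothetical inputs that a
    web framework would normally hand to the site's own code.
    """
    # Header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    record = {"ip": remote_addr}
    for name in PASSIVE_HEADERS:
        record[name.lower()] = lowered.get(name.lower(), "")
    return record


# Example request, with invented values.
example = passive_visitor_record(
    "203.0.113.7",
    {"User-Agent": "ExampleBrowser/1.0", "Accept-Language": "en-GB,en;q=0.9"},
)
print(example)
```

The IP address is what services like the two above use for their location and ISP guesses; the headers add your browser, your preferred languages and possibly the page you came from.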
The reason I point this out is that advertising networks, for example, often place either a snippet of code or an image on the page you visit, but that code or image is in reality hosted somewhere else, on another server that they control. Simply visiting one web site URL can therefore mean you are actually giving away this data to several companies at the same time.
Depending on the default settings in your browser, zooming in on the map on that page can show your location with harrowing precision. In my unprotected Google Chrome it literally zooms all the way down to my house, which is a freestanding house in a crowded suburban area.
Having the exact, or even the approximate, address of someone is perfect for advertising purposes. They can easily check the median salary of that neighbourhood, house prices, shopping patterns, political party affiliation percentages and so on for our street, even if they don't have my exact house. All of those things are open data that they just have to connect to, if they have something to go on. You and I just gave them that.
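If you want to see this for yourself, the following is a rough sketch that lists which other servers a page asks your browser to contact. It only looks at the `src` attribute of script, image and iframe tags, and it treats any host name different from the page's own as "third party", which is a simplification (a site's own subdomains would be flagged too). The names `ResourceCollector` and `third_party_hosts` and all the example.* addresses are invented for the illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResourceCollector(HTMLParser):
    """Collect the addresses of scripts, images and iframes found in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "img", "iframe"):
            for name, value in attrs:
                if name == "src" and value:
                    self.urls.append(value)


def third_party_hosts(page_url: str, html: str) -> list:
    """Return the sorted host names the page loads from, other than its own."""
    page_host = urlparse(page_url).hostname
    collector = ResourceCollector()
    collector.feed(html)
    hosts = set()
    for url in collector.urls:
        host = urlparse(url).hostname  # None for relative addresses like /logo.png
        if host and host != page_host:
            hosts.add(host)
    return sorted(hosts)


# Invented page: one local image, one remote script, one remote "pixel" image.
SAMPLE_PAGE = (
    '<img src="/logo.png">'
    '<script src="https://ads.example.net/tag.js"></script>'
    '<img src="https://pixel.example.org/1x1.gif">'
)
print(third_party_hosts("https://shop.example.com/jeans", SAMPLE_PAGE))
```

Every host in that list receives its own request from your browser, with your IP address and browser details attached, even though you only typed in one address.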
If you ever download/use BitTorrent you should probably also do the Torrent test with the magnet link, to see what address you reveal there.
Scroll down a bit, and you'll find a headline saying "Geek details". You might not consider yourself "a geek", but still, those things can be educational to see, because I can tell you this: the companies we are talking about here, and that we are trying to avoid giving too much information to, have geeks employed for sure. I know. I used to be one of them. Showing "the geeks" that data teaches them quite a lot about the web browser and the computer you use, and thereby also more than you think about who you are and how you prioritise.
For example, in my case it is obvious that I'm using a very expensive iMac which is a couple of years old, though running the latest version of all its software. I might be in the market for purchasing another computer soon, right? But it also tells them that I probably spend more on computers and computer gear than my neighbours and others in my area. Any hi-tech advertising would probably not be wasted on me... or I might have surplus money to spend on something else they would like to push in my face. They could be wrong, but chances are they are quite good at these guessing games based on various criteria, in long chains of logic.
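As a small taste of that guessing game, here is a sketch that reads the kind of User-Agent text shown under "Geek details" and pulls out the platform and browser. The function name `describe_user_agent` is hypothetical and the patterns only cover a handful of common cases; real profiling uses far more signals than this one line of text. The sample string follows the usual format Chrome uses on a Mac, but the version numbers are only an example.

```python
import re


def describe_user_agent(user_agent: str) -> dict:
    """Make a rough guess at platform and browser from a User-Agent string."""
    info = {"platform": "unknown", "os_version": "",
            "browser": "unknown", "browser_version": ""}

    mac = re.search(r"Mac OS X ([\d_.]+)", user_agent)
    if mac:
        info["platform"] = "Mac"
        # Mac versions are written with underscores, e.g. 10_13_6.
        info["os_version"] = mac.group(1).replace("_", ".")
    elif "Windows NT" in user_agent:
        info["platform"] = "Windows"

    # Chrome also mentions "Safari" in its string, so Chrome is checked first.
    chrome = re.search(r"Chrome/([\d.]+)", user_agent)
    firefox = re.search(r"Firefox/([\d.]+)", user_agent)
    safari = re.search(r"Version/([\d.]+).*Safari/", user_agent)
    if chrome:
        info["browser"], info["browser_version"] = "Chrome", chrome.group(1)
    elif firefox:
        info["browser"], info["browser_version"] = "Firefox", firefox.group(1)
    elif safari:
        info["browser"], info["browser_version"] = "Safari", safari.group(1)
    return info


SAMPLE_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) "
             "AppleWebKit/537.36 (KHTML, like Gecko) "
             "Chrome/70.0.3538.77 Safari/537.36")
print(describe_user_agent(SAMPLE_UA))
```

From there it is a short step to "Mac owner, keeps software up to date", and that is one more data point to weigh against the neighbourhood statistics mentioned earlier.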
Now, just a reminder, all of the data above will give the following information:
As I said above, this is before I've even started clicking on things or filled in any forms, which obviously can teach them a lot more, and quite probably can reveal my real and true identity.
Now, if we shift our focus back to Facebook and Google, they are services where you have literally agreed with them that they can save your profile. Your profile in turn can then be linked to various "metadata". That metadata can contain everything they think is applicable to your profile, whether you like it or not, whilst they let you manage your own personal data (phone, email addresses, friends etc) to give you the illusion of control.
Even if you download "all your data" (of which they will still keep their copy), that is basically limited to the data you have provided to them, in various forms such as posts on a timeline, uploading photos, web sites you've browsed, who you are friends with etc. They will not be giving you their metadata about you, that they have created themselves based on you and your activities, mainly because they don't have to.
It is not only them though who can see your metadata: this is what they sell. This is their core product. They use that core product to develop various services for their paying customers. They cluster people (or their metadata) based on various criteria. Political affiliation. Willingness to spend. Your age group in your area. If you are easily influenced by your friends, etc.
Now combine that with what I told you above about code/images hosted on servers owned by other services that can also extract data from you. Think about how many of the web sites you use are not owned by Facebook and/or Google, but still have a "login with Facebook" or "login with Google" button. Both Google and Facebook know when you've visited those other sites, especially if you are, like most people, constantly logged in to Facebook/Google services. If you are using Google Chrome as your browser you are most likely logged in to your Google account anyway, and you are then providing Google with the history of web sites you've visited as well.
Then you have the "share this" buttons. They are also...yep, you guessed it, hosted directly from Facebook, Google, Twitter et al. If you can see them, they have tracked you already. It works like this:
The only way to properly share anything via those buttons is if they pass along the exact web site address (URL) of where you are. Otherwise your sharing would be pretty meaningless and go straight to the front page of the site, not to the awesome yellow jeans you were looking at and wished to share.
This means they know where you are, regardless of whether you decide to share the post/article/product or not. As they already know who you are and you are now providing them with all the things you are interested in, their profile (and their metadata) about you is constantly growing. Digital storage (i.e. hard drives) is cheap though, and there is a lot of money to be made by knowing exactly what triggers you to purchase things, share things or do things. They just have to give you a little push.
They know which of your friends post the things you "like" the most. They know which of your friends are most likely to like your posts. They know how many seconds you spend looking at a certain advert, they know where you live and the statistical probability of you acting in certain ways. They know what music you like (and dislike). They quite likely know if you are left/right on the political scale. If you like Trump and/or Brexit, or not. If you are religious or not. If you normally make impulse purchases or if you investigate beforehand. They know. You have taught them these things.
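Seen from the button's side, the bookkeeping can be very simple. The sketch below is a toy model, not how any particular company does it: it assumes your browser sends the button host its cookie (here called `cookie_id`) and the address of the page you are on (the Referer header) every time the button is loaded. The class name `ButtonHost` and all values are invented for the illustration.

```python
from collections import defaultdict


class ButtonHost:
    """Toy model of a server that hosts a "share this" button for many sites."""

    def __init__(self):
        # cookie id -> list of pages where the button was shown to that visitor
        self.pages_seen = defaultdict(list)

    def serve_button(self, cookie_id: str, referer: str) -> str:
        """Handle the request a browser makes just to display the button.

        No click is needed: the visit is recorded as soon as the button loads.
        """
        self.pages_seen[cookie_id].append(referer)
        return "<button>Share</button>"


host = ButtonHost()
# The same visitor browses two unrelated sites that both embed the button.
host.serve_button("visitor-123", "https://shop.example.com/yellow-jeans")
host.serve_button("visitor-123", "https://news.example.org/politics/article-42")
print(host.pages_seen["visitor-123"])
```

After two page views on two different sites, the button host already has the start of a browsing history for the same person, and nothing was ever shared.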
Over the coming weeks I will be posting articles on ways of limiting this: which other web browsers to use and how they should be configured, and what you should think about regarding your social media usage in general as well as how you move around the internet. There are lots of great alternatives that do not leak personal and private data. None of those ways are as easy as just going with the flow, using Google Chrome and all the Google web services, and handing over your life to Facebook. They will be worth the effort though, if for no other reason than that we now know our previous free sharing of "everything" has been a weapon in dismantling democracy as we know it around the globe.
Finally, don't just take my word for it. Have a look at this article from The Guardian outlining what types of data Google and Facebook sit on regarding you, including the data you kind of suspect they know. Please note that this data does not include their metadata about you.