In this essay, I will argue that although the generation, storage and processing of large amounts of personal information - known as ‘big data’ - brings a staggering number of benefits to society, like most disruptive technologies it also produces underlying and undesired consequences which should be properly addressed. Due to the advancement in big data technology, a worldwide push for establishing data-driven economies through a range of data collection, analytics, and monetization systems, and the proven potential for big data technologies in politics, society is experiencing several unintended effects. I am aware that the collection and processing of big data raises myriad problems, and that their roots run deep into other fascinating fields such as privacy, ethics, politics, medical and scientific research, free will, and human rights; however, in this paper, I will focus exclusively on a particular philosophical problem concerning personal identity. Because of the advancement, success, and consequent proliferation of big-data-driven technologies, the lines between our physical and digital selves are gradually being blurred. Individuals in society are experiencing a kind of fragmentation that can be framed in Gilles Deleuze’s concept of the dividual and Albert Borgmann’s concept of social hyperreality. This essay (I) defines the concept of big data and other key concepts to provide sufficient context to properly present the issue at hand, then (II) details and discusses a specific philosophical issue raised by big data concerning personal identity, and finally (III) reflects on the importance of this problem.

I. Context and key concepts surrounding the technology

Before we turn to the philosophical issue of big data in personal identity, I would like to properly define the term ‘big data’ (BD) and provide some information about the relevant mechanisms that have come into play in modern society as well as the paradigm shifts that have occurred because of its existence and its enormous financial and scientific success. It is a common misconception to think about big data solely as the analysis of massive amounts of data. This definition is not wrong, but because it focuses only on volume, it is insufficient. For this essay we will refer to it as the ‘folk definition of big data’, but if we are to understand the philosophical problems it brings, the term needs a more thorough definition. There are several aspects besides volume to take into consideration, and because of the complexity and wide use of the concept, several different definitions are in use. For practical purposes, and because it has withstood the test of time, we will use Rob Kitchin’s definition, detailed in his book, The Data Revolution. As we will see, the “big” in big data refers to more than its size: “big data is huge in volume, high in velocity, diverse in variety, exhaustive in scope, fine-grained in resolution, relational in nature, flexible (as it can add new fields easily), and scalable (as it can expand in size rapidly).” In other words, big data is much more than its folk definition suggests. One more property to consider when we talk about BD is what some call ‘data amplification’, which roughly means that datasets, when combined at a large scale, “enable far greater insights by revealing associations, relationships and patterns which remain hidden if the data remain isolated.” One of the consequences of this characteristic of BD is the consolidation of secondary and tertiary data markets as a multi-billion-dollar industry.
The next step is to think about its purpose, which can be generally framed as ‘to reveal associations, relationships and patterns, related to human behaviour and how and when we interact with each other.’ There is a wide range of applications for big data, from retail, finance, and marketing to health and life sciences, telecommunications, and scientific and medical research. Big data is not a new concept, but it is also far from reaching its full potential: this means that any effect or problem identified today could be amplified over time.

These characteristic features of application, power, and reach, coupled with its ambitious scope, make it arguably the fastest, most powerful, precise and versatile tool for data processing, monitoring, controlling, analyzing, predicting, and surveilling that has ever been available to anyone. It should come as no surprise that a large amount of wealth, and a series of satellite technologies and phenomena, have been generated by its creation. We will now look at those satellite technologies and phenomena that are relevant to our discussion about personal identity.

A new way of doing business has sprung from this tool, one that some call the ‘data-driven economy.’ The pronounced importance of big data in business has never been clearer:

“Data has become a new factor of production, in the same way as hard assets and human capital. Having the right technological basis and organizational structure to exploit data is essential.”

Because of this, several influential intellectuals and entrepreneurs have rightly used the following metaphor: ‘Data is the new oil.’ Now, oil is valued mainly for three reasons: its scarcity, its difficult and costly extraction, and its extremely high value when properly extracted and manufactured into a commodity. But these characteristics do not necessarily apply to BD, at least not all of them. Although, at first, a worldwide infrastructure had to be implemented and high investments had to be made, data today is being collected at alarming rates and at low cost from phones, tablets, wearables, cars, credit cards, video game consoles, smart devices, television sets, intelligent personal assistants, and - especially - personal computers. Virtually any device that has a connection to the internet or GPS is a potential source for data-gathering. I am referring to this topic in a worldwide context, since it could be argued that recent European law has limited some kinds of data mining. So, it follows not only that big data represents a metaphorical oil field, but that it is in fact better than an oil field because it is easier and cheaper to extract (if you know what you are doing), and also, as I have shown, is associated with a multiplying effect which keeps on driving profits through secondary and tertiary data markets. Therefore, big data arguably has more value than its predecessors in the long run.

The big data business model is often straightforward: users generate data in return for a ‘free’ service such as a communication platform, an app, Wi-Fi access, directions to a restaurant, etc. Companies insert smart ads into their service and sell ‘screen real estate’ or user attention time to their customers. On the side, some companies sell the data they mine to the highest bidder. What is important for our discussion is that this is being done at a massive scale and sometimes through means to which the user has not explicitly consented.

Sometimes the value of data is not immediate, or not only immediate, because data sets can reveal new and useful patterns still to come. The value of data “is unknown or uncertain until it is converted into the currency of information... and a robust data exchange, with so-termed data handlers and data brokers, has emerged to perform precisely this work of speculation.” One example of this type of marketplace is BlueKai, a platform “where buyers and sellers trade high-quality data like stocks.” This will come into the picture when we discuss the concept of dividuals in section II. But for now, let’s put it to one side and keep discussing big data.

Big data is revolutionary not only in the sense that it has driven the appearance of a new paradigm in online economies and scientific research, bringing new questions and problems to light, but also in that it has produced profound changes in politics. The results of this paradigm shift come from the fine-grained social metrics generated by the big data strategies of media platforms and social media sites. Arguably, contemporary political campaigns that do not use big data have a much smaller chance of success. The massive inefficiency and cost of running ads for general audiences are crippling when compared to the precision and low cost of targeted smart ads. The motivation for using big data in politics is a noble one: to show the right message to the right person at the right time; but this idea taken to the extreme, against a backdrop of constant and fierce competition for power, has led political campaigns into uncharted territory.

So, what have big data technologies generated in a free capitalist market? For some authors such as Frank Webster and Kevin Robins, big data is the potential engine of a political structure called ‘cybernetic capitalism,’ by which they mean “a socioeconomic system that is in part dependent on the capacity of state and corporate entities to collect and aggregate personal data.” I am not saying that states have adopted this political system today (conceptual possibility does not imply reality) but that there is a possible future in which this could happen. For such a system to become predominant around the world, some groundwork must be laid, and that is the development of what Mark Poster terms ‘interpellation’. This term refers to “a complicated configuration of unconsciousness, indirection, automation, and absentmindedness.” Such a concept may sound distant and obscure, but a contemporary example that proves otherwise is a company called ‘SenseTime’, the most valuable big data analytics company in the world today. Such uses for BD should be limited, but the complexity and speed of development of BD technologies make this almost impossible. My point is that it is highly unlikely that our laws and culture can keep up with understanding the effects of BD, let alone regulate them; as Raley notes, “neither the legal nor the political infrastructure has kept pace with the technology.” This brings me to my next point: how does this affect our personal identity?

II. The philosophical issue raised by big data in personal identity

The internet has become a dangerous place for our personal identity, and big data tools and systems amplify this problem. As Albert Borgmann puts it, online personal identity can be “mistaken by a credit agency, spied on by the government, foolishly exposed by yourself, pilloried by an enemy, pounded by a bully, or stolen by a criminal.” As we know, whatever happens on the internet can become permanent, because data sets that are properly collected and stored do not decay over time. This fact has prompted interesting privacy rights, such as the ‘right to be forgotten’, and has also sparked the imagination of many authors - for example, Viktor Mayer-Schönberger, who claimed that big data produces Borgesian figures called ‘Funes’ who cannot structure a temporal narrative because they have lost the capacity to forget.

The problem of online privacy is an important one, but we are going to discuss a different kind of danger for our personal identities, something more latent and nuanced. It is a problem that is difficult to define and thus difficult to address. This is partly because it has no precedent in history, carries no criminal implications, and shows no immediate moral or cultural repercussions. The idea that the technological paradigm of an age shapes humanity’s view of itself at a specific moment in time is not new. Examples of such ideas come from Descartes, Deleuze, and in some ways Foucault. However, understanding today’s complex personal identity is not an easy endeavour because the virtualization of our identities has been gradual and mostly invisible. One general feeling instilled by services with BD-driven business models, such as social media, communication platforms, and mobile browsers, is that they make us feel “displaced, distracted, and fragmented at the very times when to all appearances we seem to be connected, busy, and energetic.” However, this phenomenon is hard to measure and prove because of the subjectivity involved in these concepts. So, to properly approach and explain the problem, we will look at three moments in history which, taken together, set the stage for this digital absent-mindedness seen in the field of personal identity.

A considerable share of modern knowledge and culture is arguably descended from the movement known as the European Enlightenment. Its diverse paradigm shifts drove significant change in virtually every field and discipline: “Geographically, the Enlightenment came about through the journeys of discovery, religiously through the reformation, politically through democracy, and cognitively through the sciences.” Such radical transformations were welcomed by intellectuals around Europe in the 17th and 18th centuries, and their discussions set the stage for many contemporary world views, among them our view of personal identity. Three advancements in philosophy contributed drastically to this view. Firstly, Descartes proposed a mechanism that would allow one to “establish order and fix identities in a wide-open world.” This mechanism became known as the Cartesian coordinate system. It is still used today as a means of location, but more importantly, it deepened our knowledge and precision regarding physical structures, including ourselves. Secondly, Leibniz, almost half a century later, devised an innovative identification principle. His metaphysical breakthrough was the principle of the “identity of indiscernibles”, which states roughly that in reality no two things are the same: there is always a feature or a sequence of features that distinguishes one thing from everything else. Finally, Kant proposed a thesis that joined both of these ideas and corrected their weaknesses, in his words, “a coordinate system overlaid on the world allows one to locate a thing, but only after you established yourself as the origin or reference point of the coordinates” - therefore positioning the individual as the centre of his or her universe.

Some technologies reinforce this principle of individuality: supermarket carts represent the centre of our desires, vehicles are the centre of our spatial location, and radio and television amplify individual reach. In this same manner, the web platforms which collect and process big data have enabled us to become our own centres online. Today, BD-powered smartphone apps are the most recent technology that supports Kant’s principle. In a way, the internet is “the most impressive illustration of a person’s sovereign and central authority in space, time, and society.” Our new Cartesian locators are our smartphones, and it could be said that our new Leibnizian ‘indiscernibles’ are what is roughly known as our ‘online personas’ or ‘digital selves’, which are the version that we present to others and the version that others present to us. But what do these terms mean? Essentially, BD-driven platforms gather our personal data to create a profile. This statement raises the question: who is this person they know? What individual is represented in their database? And am I this individual? There are all sorts of puzzles that arise once we try to position ourselves as an ‘online persona’ or try to define our ‘digital selves.’ So, when a company or software uses big data to recommend a product to someone online, to whom are they recommending it exactly?

We will now look at two personal identity concepts that have emerged from the disruptive force of BD. Firstly, we will explore Gilles Deleuze’s concept of the dividual and how it comes into the picture. The term dividual roughly means the fragmented and dispersed data bodies “that result from the decomposition of individuals into data clouds subject to automated integration and disintegration.” In at least one way, the individual was a product of the Enlightenment and the dividual is a product of the digital revolution. This concept can be seen in how big data technologies are used to capture specific data points, or, in other words, a fraction of someone’s identity, in order to perform functions such as finding customers for a new product or potential voters for a political party. The problem is that if an individual can be vastly divided and integrated into data, then whoever owns this data has a deep understanding of and special epistemological access to a segment of that subject’s personal identity. Moreover, this virtual dividual has ontological privileges over its physical individual counterpart. What our dividual says about us is, if integrated with truthful data, more real than what we can say about ourselves. It has been claimed, for example, that the Cambridge Analytica algorithms can predict our personalities better than our family, our spouse or even ourselves.

Secondly, we will explore Borgmann’s concept of social hyperreality. It could be argued that we are at our most sovereign when we are freely clicking, scrolling, and tapping on our laptops and devices without interruption. Everything is easier with big data technologies because we don’t really bother with physical impediments. This preference for the virtual over the physical creates what Albert Borgmann calls a social hyperreality. BD-powered technologies, networks, and devices enable people to “offer one another stylized versions of themselves for amorous or convivial entertainment.” This kind of phenomenon is not often taken seriously because of its subtle nature, and especially because of the widespread use of the folk definition of big data, which, as I have mentioned, is not sufficient to thoroughly understand the technology. The nature of this hyperreality only amplifies the fragmentation already caused by dividuality, and it could be argued that this is already taking place in the physical realm.

III. Conclusions

As we have seen in the last section, our lives as individuals are highly dependent on what our dividuals say about us online. Therefore, previously mentioned data marketplaces (such as BlueKai) are trading much more than mere data: they are trading deep and complex integrations of segments of our personal identities and psychology. The immediate danger, besides the obvious fact that these databases could be hacked, is that the buyers of such dividuals can potentially tap into the minds of individuals and manipulate them in whatever manner desired, something that would be extremely difficult to prove, regulate, and track. It goes without saying that this is a complex ethical, social, and political issue that is not being addressed today, at least not properly. The line between seeing an ad and being effectively coerced into purchasing a product, subscribing to an ideology unknowingly, or voting on a political issue without understanding its underlying consequences is being blurred because of the fragmented nature of our digital selves. I would like to end this paper with a reflection on what actions we might take to ameliorate this problem. As we have seen, the term big data is not properly understood, and it never will be, because people will assume the term refers only to the data. I propose we use two terms to properly understand the cause and the effect explored in this essay: first, ‘coercive data’ to refer to the control and surveillance aspect of BD, and second, ‘digital absent-mindedness’ to refer to the effect produced by coercive data technologies. If the technology were properly understood and internalized throughout the world, and this kind of problem brought out and expressed publicly, society could have a better understanding of what the owners of these platforms have on their hands. This would not stop the exchange of personal data online, but it is a necessary step in the right direction.
From time to time, society may express that people have a right to privacy, and studies may show that people dislike ads and social media, but our daily behaviour in the digital sphere suggests otherwise. It suggests that these companies and institutions give us what we want: a narrow sense of connection, a short distraction, gossip, entertainment, news, information, a Wi-Fi connection or a login to a nice new app, and that we don’t care about what they get in return. Moreover, we generally don’t even bother to properly understand the processes which surround the collection, processing and refinement of the data we generate.