The death of Elaine Herzberg (August 2, 1968 – March 18, 2018) was the first recorded case of a pedestrian fatality involving a self-driving car, after a collision that occurred late in the evening of March 18, 2018. Herzberg was pushing a bicycle across a four-lane road in Tempe, Arizona, United States, when she was struck by an Uber test vehicle, which was operating in self-drive mode with a human safety backup driver sitting in the driving seat. Herzberg was taken to the local hospital where she died of her injuries. Following the fatal incident, the National Transportation Safety Board (NTSB) issued a series of recommendations and sharply criticized Uber. The company suspended testing of self-driving vehicles in Arizona, where such testing had been approved since August 2016. Uber chose not to renew its permit for testing self-driving vehicles in California when it expired at the end of March 2018. Uber resumed testing in December 2018, starting in Pittsburgh, Pennsylvania. In March 2019, Arizona prosecutors ruled that Uber was not criminally responsible for the crash. The back-up driver of the vehicle was charged with negligent homicide, pled guilty to endangerment, and was sentenced to three years' probation. While Herzberg was the first pedestrian killed by a self-driving car, driver Gao Yaning died in a Tesla semi-autonomous car two years earlier. A reporter for The Washington Post compared Herzberg's fate with that of Bridget Driscoll who, in the United Kingdom in 1896, was the first pedestrian to be killed by an automobile. The Arizona incident has magnified the importance of collision avoidance systems for self-driving vehicles. == Collision summary == Herzberg was crossing Mill Avenue (North) from west to east, approximately 360 feet (110 m) south of the intersection with Curry Road, outside the designated pedestrian crosswalk, close to the Red Mountain Freeway. She was pushing a bicycle laden with shopping bags, and had crossed at least two lanes of traffic when she was struck at approximately 9:58 pm MST (UTC−07:00) by a prototype Uber self-driving car based on a Volvo XC90, which was traveling north on Mill. The vehicle had been operating in autonomous mode since 9:39 pm, nineteen minutes before it struck and killed Herzberg. The car's human safety backup driver, Rafaela Vasquez, did not intervene in time to prevent the collision. Vehicle telemetry obtained after the crash showed that the human operator responded by moving the steering wheel less than a second before impact, and she engaged the brakes less than a second after impact. == Cause investigation == The county district attorney's office recused itself from the investigation, due to a prior joint partnership with Uber promoting their services as an alternative to driving under the influence of alcohol. Accounts differ on the speed limit at the place of the incident. According to Tempe police the car was traveling in a 35 mph (56 km/h) zone, but this is contradicted by a posted speed limit of 45 mph (72 km/h). The National Transportation Safety Board (NTSB) sent a team of federal investigators to gather data from vehicle instruments, and to examine vehicle condition along with the actions taken by the safety driver. Their preliminary findings were substantiated by multiple event data recorders and proved the vehicle was traveling 43 miles per hour (69 km/h) when Herzberg was first detected 6 seconds (378 feet (115 m)) before impact; during 4.7 seconds the self driving system did not infer that emergency braking was needed. A vehicle traveling 43 mph (69 km/h) can generally stop within 89 feet (27 m) once the brakes are applied. The machine needed to be 1.3 seconds (82 feet (25 m)) away prior to discerning that emergency braking was required, whereas at least that much distance was required to stop. The system failed to behave properly. A total stopping distance of 76 feet itself would imply a safe speed under 25 mph (40 km/h). Human intervention was still legally required. Computer perception–reaction time would have been a speed limiting factor had the technology been superior to humans in ambiguous situations; however, the nascent computerized braking technology was disabled the day of the crash, and the machine's apparent 4.7-second perception–reaction (alarm) time allowed the car to travel 250 feet (76 m). Video released by the police on March 21 showed the safety driver was not watching the road moments before the vehicle struck Herzberg. === Environment === In widely disseminated remarks that would shape the narrative about the crash, which were later seen as prejudicial and subsequently contradicted by her own department, Tempe Police Chief Sylvia Moir was quoted stating that the collision was "unavoidable" based on the initial police investigation, which included a review of the video captured by an onboard camera. Moir faulted Herzberg for crossing the road in an unsafe manner: "It is dangerous to cross roadways in the evening hour when well-illuminated, managed crosswalks are available." According to Uber, safety drivers were trained to keep their hands very close to the wheel all the time while driving the vehicle so they were ready to quickly take control if necessary. The driver said it was like a flash, the person walked out in front of them. His [sic] first alert to the collision was the sound of the collision. [...] it's very clear it would have been difficult to avoid this collision in any kind of mode (autonomous or human-driven) based on how she came from the shadows right into the roadway. Tempe police released video on March 21, 2018, showing footage recorded by two onboard cameras: one forward-looking, and one capturing the safety driver's actions. The forward-facing video shows that the self-driving car was traveling in the far right lane when it struck Herzberg. The driver-facing video shows the safety driver was looking down prior to the collision. The Uber operator is responsible for intervening and taking manual control when necessary as well as for monitoring diagnostic messages, which are displayed on a screen in the center console. In an interview conducted after the crash with NTSB, the driver stated she was monitoring the center stack at the time of the collision. After the Uber video was released, journalist Carolyn Said noted the police explanation of Herzberg's path meant she had already crossed two lanes of traffic before she was struck by the autonomous vehicle. The Marquee Theatre and Tempe Town Lake are west of Mill Avenue, and pedestrians commonly cross mid-street without detouring north to the crosswalk at Curry. According to reporting by the Phoenix New Times, Mill Avenue contains what appears to be a brick-paved path in the median between the northbound and southbound lanes; however, posted signs prohibit pedestrians from crossing in that location. When the second of the Mill Avenue bridges over the town lake was added in 1994 for northbound traffic, the X-shaped crossover in the median was installed to accommodate the potential closing of one of the two road bridges. The purpose of this brick-paved structure is purely to divert cars from one side to the other if a bridge is closed to traffic, and although it may look like a crosswalk for pedestrians, it is in fact a temporary roadway with vertical curbs and warning signs. === Software issues === Michael Ramsey, a self-driving car expert with Gartner, characterized the video as showing "a complete failure of the system to recognize an obviously seen person who is visible for quite some distance in the frame. Uber has some serious explaining to do about why this person wasn't seen and why the system didn't engage." The NTSB preliminary report, however, noted that the software did order the car to brake 1.3 seconds before the collision. A video shot from the vehicle's dashboard camera showed the safety driver looking down, away from the road. It also appeared that the driver's hands were not hovering above the steering wheel, which is what drivers are instructed to do so they can quickly retake control of the car. Uber had moved from two employees in every car to one. The paired employees had been splitting duties: one ready to take over if the autonomous system failed, and another to keep an eye on what the computers were detecting. The second person was responsible for keeping track of system performance as well as labeling data on a laptop computer. Mr. Kallman, the Uber spokesman, said the second person was in the car for purely data related tasks, not safety. When Uber moved to a single operator, some employees expressed safety concerns to managers, according to the two people familiar with Uber's operations. They were worried that going solo would make it harder to remain alert during hours of monotonous driving. The recorded telemetry showed the system had detected Herzberg six seconds before the crash, and classified her first as an unknown object, then as a
Content as a service
Content as a service (CaaS) or managed content as a service (MCaaS) is a service-oriented model, where the service provider delivers the content on demand to the service consumer via web services that are licensed under subscription. The content is hosted by the service provider centrally in the cloud and offered to a number of consumers that need the content delivered into any applications or system, hence content can be demanded by the consumers as and when required. Content as a Service is a way to provide raw content (in other words, without the need for a specific human compatible representation, such as HTML) in a way that other systems can make use of it. Content as a Service is not meant for direct human consumption, but rather for other platforms to consume and make use of the content according to their particular needs. This happens usually on the cloud, with a centralized platform which can be globally accessible and provides a standard format for your content. With Content as a Service, you centralize your content into a single repository, where you can manage it, categorize it, make it available to others, search for it, or do whatever you wish with it. == Overview == The content delivered typically could be one or more of the following The technical terminology related to equipment or spares that is required to procure or design the materials The industrial terminology of the equipment or spares Technical values pertaining to various types, specifications, applications, characteristics of equipment or spares Sourcing information which will help in procurement or supply-chain management of equipment or spares Descriptive specifications of equipment or spares based on the product reference number or identifier UNSPSC codes or industry practiced classifications ISO, IEC compliant terminology Ontology or Technical Dictionary of products & services Predefined content for specific business needs The term "Content as a service" (CaaS) is considered to be part of the nomenclature of cloud computing service models & Service-oriented architecture along with Software as a service (SaaS), Infrastructure as a service (IaaS), and Platform as a service (PaaS).
NATGRID
The National Intelligence Grid or NATGRID is an integrated intelligence master database structure for counter-terrorism purposes which connects databases of various core security agencies under the Government of India. It collects and analyses comprehensive patterns procured from 21 different organizations that can be readily accessed by security agencies round the clock. As of September 2025 its CEO is Hirdesh Kumar. NATGRID came into existence after the 2008 Mumbai attacks. The Government of India in July 2016 appointed Ashok Patnaik as the Chief Executive Officer (CEO) of NATGRID. The appointment is being seen as the government's effort to revive the project. Patnaik's appointment was valid till 31 December 2018. As of 2019, NATGRID is headed by an Indian Police Service (IPS) officer Ashish Gupta. The Ministry of Home Affairs on 5 February 2020 announced in Parliament that Project NATGRID with all its required physical infrastructures been completed as of 31 March 2020 and the NATGRID solution went live as of 31 December 2020. == Reason for establishment == The landscape of Terrorism in India and the subsequent response by Law enforcement in India have necessitated a sophisticated data-integration framework, positioning NATGRID as a vital tool for national security agencies. This shift towards Mass surveillance in India is rooted in a broader policy evolution of state monitoring, which is technologically enabled by the India Stack—the foundational digital infrastructure providing the API-based backbone for government service delivery and identity verification. This ecosystem is further bolstered by advanced Signal intelligence capabilities and the implementation of SIM binding, a security protocol that anchors a user’s digital identity to a specific mobile device and verified SIM card to prevent identity fraud and unauthorized access. Collectively, these elements form a 360-degree surveillance and authentication grid designed to preemptively identify threats by synthesizing historical, financial, and real-time communication data across disparate platforms. === Terror attacks in India === The 2008 Mumbai attacks led to the exposure of several weaknesses in India's intelligence gathering and action networks. NATGRID is part of the radical overhaul of the security and intelligence apparatuses of India that was mooted by the then Home Minister P. Chidambaram in 2009. The National Investigation Agency (NIA) and the National Counter Terrorism Centre (NCTC) are two organisations established in the aftermath of the Mumbai attacks of 2008. Before the Mumbai attacks, a Pakistani origin American Lashkar-e-Taiba (LeT) operative David Coleman Headley had visited India several times and done a recce of the places that came under attack on 26/11. Despite having travelled to India several times and having returned to the US through Pakistan or West Asia, his trips failed to raise the suspicion of Indian agencies as they lacked a system that could reveal a pattern in his unusual travel itineraries and trips to the country. It was argued that if they had a system like the NATGRID in place, Headley would have been apprehended well before the attacks. === Need for the integrated intelligence system === During the inauguration of NATGRID campus in Bengaluru, the Minister of Home Affairs, Amit Shah stated that a new national database is in the process of being made which will bring a change in the current ways of functioning of agencies once it's ready also adding that the government has entrusted the task of developing and operating a state-of-the-art and innovative technology system. It is accessible to 11 central agencies in the first phase and in later phases will be made accessible to police of all States and Union Territories and only authorized personnel are allowed access to the platform on a case-to-case basis for investigations into suspected cases of terrorism. NATGRID has a total fund allocation of ₹3,400 crore (US$355 million). d == Legal framework == Relevant legal framework: Digital Personal Data Protection Act, 2023 – The legislative framework governing how digital data is handled. Information Technology Act - Interception Rules, 2002 – The specific regulations under the Information Technology Act that govern these agencies. National Security Act of 1980, evidence-based preventative detention of suspects Right to Information Act, 2005, for obtaining information from the government and used by activists and whistleblowers == Structure and functions == === Multi-agency integrated intelligence database === NATGRID is an intelligence sharing network that collates data from the standalone databases of the various agencies and ministries of the Indian government. It is a counter terrorism measure that collects and collates a host of information from government databases including tax and bank account details, credit/debit card transactions, visa and immigration records and itineraries of rail and air travel. It also has access to the Crime and Criminal Tracking Network and Systems, a database that links crime information, including First Information Reports, across 14,000 police stations in India. This combined data will be made available to 11 central agencies, which are: the Research and Analysis Wing (R&AW), Intelligence Bureau (IB), National Investigation Agency (NIA), Central Bureau of Investigation (CBI), Narcotics Control Bureau (NCB), Financial Intelligence Unit (India) (FIU), Enforcement Directorate (ED), Central Board of Direct Taxes (CBDT), Central Board of Indirect Taxes and Customs (CBIC), Directorate of Revenue Intelligence (DRI) and Directorate General of GST Intelligence. Also as stated by the MHA, NATGRID will have an in-built mechanism for continuous upgradation. In the later phases of NATGRID integration, the central government further plans to integrate 950 additional organizations into it. === Key components and users === ==== Some important backend data feeds to the NATGRID (middleware) ==== National Crime Records Bureau's Crime and Criminal Tracking Network and Systems (CCTNS) national-integrated law-and-order database for the state-level police forces: CCTNS is a mission-mode project under the National e-Governance Plan that interconnects over 15,000 police stations across India. It serves as the primary source for NATGRID to access digitized FIR (First Information Report) data and criminal history records from state-level law enforcement. NSA's National Technical Research Organisation (NTRO) national security-based database feed to NATGRID: NTRO serves as a primary technical data provider to NATGRID, offering specialized intercepts and satellite imagery. While NATGRID functions as a centralized data-integration middleware under the Ministry of Home Affairs, NTRO reports to the National Security Advisor within the Prime Minister's Office. DRDO's NETRA (Network Traffic Analysis) ELINT-based mass surveillance system for monitor internal internet traffic for keywords related to terrorism and criminal activity within Indian borders: Developed by the Centre for Artificial Intelligence and Robotics (CAIR), NETRA is an internet monitoring system capable of scanning traffic for specific trigger words. It provides digital behavioral triggers that NATGRID can cross-reference against structural data like financial or travel records. NETRA is a massive software network used to intercept and analyze internet traffic (emails, social media, blogs) for keywords like "bomb," "attack," or "kill." The intelligence gathered by NETRA regarding suspicious digital patterns or "keyword hits" can be fed into NATGRID. This allows an investigator to see if a person flagged by NETRA also has suspicious travel (from airline databases) or financial records (from bank databases) linked within NATGRID. Department of Telecommunications (DoT's Central Monitoring System (CMS) for lawfully intercepting national and international telecomm data: CMS is the centralized system for lawful interception of all telecommunications (phone calls, SMS, and data) in India, managed by the Department of Telecommunications (DoT). While CMS focuses on the content and metadata of real-time communication, NATGRID focuses on historical/structural data (tax, travel, identity). They represent two halves of a 360-degree surveillance profile: CMS listens to what a suspect says, while NATGRID tracks where they go and what they own. The CMS allows for the lawful interception of telecommunications metadata and content in real-time. In the broader surveillance architecture, CMS provides the "active" communication profile while NATGRID provides the "static" historical profile. Telecom Enforcement Resource and Monitoring (TERM) - Telecomm Regulatory & Verification Node for telecomm KYC: TERM cells verify subscriber identity (KYC) and maintain the integrity of telecom databases. NATGRID relies on these audited records to ensure the accuracy of telephone-to-identity mapping. TERM
Omni-Path
Omni-Path Architecture (OPA) is a high-performance communication architecture developed by Intel. It aims for low communication latency, low power consumption and a high throughput. It directly competes with InfiniBand. Intel planned to develop technology based on this architecture for exascale computing. The current owner of Omni-Path is Cornelis Networks. == History == Production of Omni-Path products started in 2015 and delivery of these products started in the first quarter of 2016. In November 2015, adapters based on the 2-port "Wolf River" ASIC were announced, using QSFP28 connectors with channel speeds up to 100 Gbit/s. Simultaneously, switches based on the 48-port "Prairie River" ASIC were announced. First models of that series were available starting in 2015. In April 2016, implementation of the InfiniBand "verbs" interface for the Omni-Path fabric was discussed. In October 2016, IBM, Hewlett Packard Enterprise, Dell, Lenovo, Samsung, Seagate Technology, Micron Technology, Western Digital and SK Hynix announced a joint consortium called Gen-Z to develop an open specification and architecture for non-volatile storage and memory products—including Intel's 3D Xpoint technology—which might in part compete against Omni-Path. Intel offered their Omni-Path products and components via other (hardware) vendors. For example, Dell EMC offered Intel Omni-Path as Dell Networking H-series, following the naming-standard of Dell Networking in 2017. In July 2019, Intel announced it would not continue development of Omni-Path networks and canceled OPA 200 series (200-Gbps variant of Omni-Path). In September 2020, Intel announced that the Omni-Path network products and technology would be spun out into a new venture with Cornelis Networks. Intel would continue to maintain support for legacy Omni-Path products, while Cornelis Networks continues the product line, leveraging existing Intel intellectual property related to Omni-Path architecture. In 2021, Cornelis announced Omni-Path Express, which replaces PSM2-based drivers and middleware, which trace back to PathScale's PSM created in 2003, for the existing Omni-Path hardware, with a native libfabric provider.
NATGRID
The National Intelligence Grid or NATGRID is an integrated intelligence master database structure for counter-terrorism purposes which connects databases of various core security agencies under the Government of India. It collects and analyses comprehensive patterns procured from 21 different organizations that can be readily accessed by security agencies round the clock. As of September 2025 its CEO is Hirdesh Kumar. NATGRID came into existence after the 2008 Mumbai attacks. The Government of India in July 2016 appointed Ashok Patnaik as the Chief Executive Officer (CEO) of NATGRID. The appointment is being seen as the government's effort to revive the project. Patnaik's appointment was valid till 31 December 2018. As of 2019, NATGRID is headed by an Indian Police Service (IPS) officer Ashish Gupta. The Ministry of Home Affairs on 5 February 2020 announced in Parliament that Project NATGRID with all its required physical infrastructures been completed as of 31 March 2020 and the NATGRID solution went live as of 31 December 2020. == Reason for establishment == The landscape of Terrorism in India and the subsequent response by Law enforcement in India have necessitated a sophisticated data-integration framework, positioning NATGRID as a vital tool for national security agencies. This shift towards Mass surveillance in India is rooted in a broader policy evolution of state monitoring, which is technologically enabled by the India Stack—the foundational digital infrastructure providing the API-based backbone for government service delivery and identity verification. This ecosystem is further bolstered by advanced Signal intelligence capabilities and the implementation of SIM binding, a security protocol that anchors a user’s digital identity to a specific mobile device and verified SIM card to prevent identity fraud and unauthorized access. Collectively, these elements form a 360-degree surveillance and authentication grid designed to preemptively identify threats by synthesizing historical, financial, and real-time communication data across disparate platforms. === Terror attacks in India === The 2008 Mumbai attacks led to the exposure of several weaknesses in India's intelligence gathering and action networks. NATGRID is part of the radical overhaul of the security and intelligence apparatuses of India that was mooted by the then Home Minister P. Chidambaram in 2009. The National Investigation Agency (NIA) and the National Counter Terrorism Centre (NCTC) are two organisations established in the aftermath of the Mumbai attacks of 2008. Before the Mumbai attacks, a Pakistani origin American Lashkar-e-Taiba (LeT) operative David Coleman Headley had visited India several times and done a recce of the places that came under attack on 26/11. Despite having travelled to India several times and having returned to the US through Pakistan or West Asia, his trips failed to raise the suspicion of Indian agencies as they lacked a system that could reveal a pattern in his unusual travel itineraries and trips to the country. It was argued that if they had a system like the NATGRID in place, Headley would have been apprehended well before the attacks. === Need for the integrated intelligence system === During the inauguration of NATGRID campus in Bengaluru, the Minister of Home Affairs, Amit Shah stated that a new national database is in the process of being made which will bring a change in the current ways of functioning of agencies once it's ready also adding that the government has entrusted the task of developing and operating a state-of-the-art and innovative technology system. It is accessible to 11 central agencies in the first phase and in later phases will be made accessible to police of all States and Union Territories and only authorized personnel are allowed access to the platform on a case-to-case basis for investigations into suspected cases of terrorism. NATGRID has a total fund allocation of ₹3,400 crore (US$355 million). d == Legal framework == Relevant legal framework: Digital Personal Data Protection Act, 2023 – The legislative framework governing how digital data is handled. Information Technology Act - Interception Rules, 2002 – The specific regulations under the Information Technology Act that govern these agencies. National Security Act of 1980, evidence-based preventative detention of suspects Right to Information Act, 2005, for obtaining information from the government and used by activists and whistleblowers == Structure and functions == === Multi-agency integrated intelligence database === NATGRID is an intelligence sharing network that collates data from the standalone databases of the various agencies and ministries of the Indian government. It is a counter terrorism measure that collects and collates a host of information from government databases including tax and bank account details, credit/debit card transactions, visa and immigration records and itineraries of rail and air travel. It also has access to the Crime and Criminal Tracking Network and Systems, a database that links crime information, including First Information Reports, across 14,000 police stations in India. This combined data will be made available to 11 central agencies, which are: the Research and Analysis Wing (R&AW), Intelligence Bureau (IB), National Investigation Agency (NIA), Central Bureau of Investigation (CBI), Narcotics Control Bureau (NCB), Financial Intelligence Unit (India) (FIU), Enforcement Directorate (ED), Central Board of Direct Taxes (CBDT), Central Board of Indirect Taxes and Customs (CBIC), Directorate of Revenue Intelligence (DRI) and Directorate General of GST Intelligence. Also as stated by the MHA, NATGRID will have an in-built mechanism for continuous upgradation. In the later phases of NATGRID integration, the central government further plans to integrate 950 additional organizations into it. === Key components and users === ==== Some important backend data feeds to the NATGRID (middleware) ==== National Crime Records Bureau's Crime and Criminal Tracking Network and Systems (CCTNS) national-integrated law-and-order database for the state-level police forces: CCTNS is a mission-mode project under the National e-Governance Plan that interconnects over 15,000 police stations across India. It serves as the primary source for NATGRID to access digitized FIR (First Information Report) data and criminal history records from state-level law enforcement. NSA's National Technical Research Organisation (NTRO) national security-based database feed to NATGRID: NTRO serves as a primary technical data provider to NATGRID, offering specialized intercepts and satellite imagery. While NATGRID functions as a centralized data-integration middleware under the Ministry of Home Affairs, NTRO reports to the National Security Advisor within the Prime Minister's Office. DRDO's NETRA (Network Traffic Analysis) ELINT-based mass surveillance system for monitor internal internet traffic for keywords related to terrorism and criminal activity within Indian borders: Developed by the Centre for Artificial Intelligence and Robotics (CAIR), NETRA is an internet monitoring system capable of scanning traffic for specific trigger words. It provides digital behavioral triggers that NATGRID can cross-reference against structural data like financial or travel records. NETRA is a massive software network used to intercept and analyze internet traffic (emails, social media, blogs) for keywords like "bomb," "attack," or "kill." The intelligence gathered by NETRA regarding suspicious digital patterns or "keyword hits" can be fed into NATGRID. This allows an investigator to see if a person flagged by NETRA also has suspicious travel (from airline databases) or financial records (from bank databases) linked within NATGRID. Department of Telecommunications (DoT's Central Monitoring System (CMS) for lawfully intercepting national and international telecomm data: CMS is the centralized system for lawful interception of all telecommunications (phone calls, SMS, and data) in India, managed by the Department of Telecommunications (DoT). While CMS focuses on the content and metadata of real-time communication, NATGRID focuses on historical/structural data (tax, travel, identity). They represent two halves of a 360-degree surveillance profile: CMS listens to what a suspect says, while NATGRID tracks where they go and what they own. The CMS allows for the lawful interception of telecommunications metadata and content in real-time. In the broader surveillance architecture, CMS provides the "active" communication profile while NATGRID provides the "static" historical profile. Telecom Enforcement Resource and Monitoring (TERM) - Telecomm Regulatory & Verification Node for telecomm KYC: TERM cells verify subscriber identity (KYC) and maintain the integrity of telecom databases. NATGRID relies on these audited records to ensure the accuracy of telephone-to-identity mapping. TERM
Seq2seq
Seq2seq is a family of machine learning approaches used for natural language processing. Originally developed by Lê Viết Quốc, a Vietnamese computer scientist and a machine learning pioneer at Google Brain, this framework has become foundational in many modern AI systems. Applications include language translation, image captioning, conversational models, speech recognition, and text summarization. Seq2seq uses sequence transformation: it turns one sequence into another sequence. == History == One naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode. seq2seq is an approach to machine translation (or more generally, sequence transduction) with roots in information theory, where communication is understood as an encode-transmit-decode process, and machine translation can be studied as a special case of communication. This viewpoint was elaborated, for example, in the noisy channel model of machine translation. In practice, seq2seq maps an input sequence into a real-numerical vector by using a neural network (the encoder), and then maps it back to an output sequence using another neural network (the decoder). The idea of encoder-decoder sequence transduction had been developed in the early 2010s. The papers most commonly cited as the originators that produced seq2seq are two papers from 2014. In the seq2seq as proposed by them, both the encoder and the decoder were LSTMs. This had the "bottleneck" problem, since the encoding vector has a fixed size, so for long input sequences, information would tend to be lost, as they are difficult to fit into the fixed-length encoding vector. The attention mechanism, proposed in 2014, resolved the bottleneck problem. They called their model RNNsearch, as it "emulates searching through a source sentence during decoding a translation". A problem with seq2seq models at this point was that recurrent neural networks are difficult to parallelize. The 2017 publication of Transformers resolved the problem by replacing the encoding RNN with self-attention Transformer blocks ("encoder blocks"), and the decoding RNN with cross-attention causally-masked Transformer blocks ("decoder blocks"). === Priority dispute === One of the papers cited as the originator for seq2seq is (Sutskever et al 2014), published at Google Brain while they were on Google's machine translation project. The research allowed Google to overhaul Google Translate into Google Neural Machine Translation in 2016. Tomáš Mikolov claims to have developed the idea (before joining Google Brain) of using a "neural language model on pairs of sentences... and then [generating] translation after seeing the first sentence"—which he equates with seq2seq machine translation, and to have mentioned the idea to Ilya Sutskever and Quoc Le (while at Google Brain), who failed to acknowledge him in their paper. Mikolov had worked on RNNLM (using RNN for language modelling) for his PhD thesis, and is more notable for developing word2vec. == Architecture == The main reference for this section is. === Encoder === The encoder is responsible for processing the input sequence and capturing its essential information, which is stored as the hidden state of the network and, in a model with attention mechanism, a context vector. The context vector is the weighted sum of the input hidden states and is generated for every time instance in the output sequences. === Decoder === The decoder takes the context vector and hidden states from the encoder and generates the final output sequence. The decoder operates in an autoregressive manner, producing one element of the output sequence at a time. At each step, it considers the previously generated elements, the context vector, and the input sequence information to make predictions for the next element in the output sequence. Specifically, in a model with attention mechanism, the context vector and the hidden state are concatenated together to form an attention hidden vector, which is used as an input for the decoder. The seq2seq method developed in the early 2010s uses two neural networks: an encoder network converts an input sentence into numerical vectors, and a decoder network converts those vectors to sentences in the target language. The Attention mechanism was grafted onto this structure in 2014 and is shown below. Later it was refined into the encoder-decoder Transformer architecture of 2017. === Training vs prediction === There is a subtle difference between training and prediction. During training time, both the input and the output sequences are known. During prediction time, only the input sequence is known, and the output sequence must be decoded by the network itself. Specifically, consider an input sequence x 1 : n {\displaystyle x_{1:n}} and output sequence y 1 : m {\displaystyle y_{1:m}} . The encoder would process the input x 1 : n {\displaystyle x_{1:n}} step by step. After that, the decoder would take the output from the encoder, as well as the
Data exhaust
Data exhaust (also exhaust data) is the trail of data generated as a by-product of users' online activity, behaviour, and transactions, rather than data they deliberately create or submit. It forms part of a broader category of unconventional data that also includes geospatial, network, and time-series data, and may be useful for predictive analytics. Data exhaust can take the form of cookies, temporary files, log files, clickstream records and stored preferences. Actions such as visiting a web page, following a link, or dwelling on an element may all generate exhaust data that is recorded without the user's active awareness. Unlike primary content — which the user intentionally creates — exhaust data is a passive side effect of interaction. A bank, for example, might treat the amounts and parties involved in a transaction as primary data, while secondary data could include whether the transaction was carried out at a cash machine rather than a branch. == Uses == Data exhaust collected by companies is often information that is not immediately useful in isolation, but can be aggregated and analysed to improve products, personalise content, identify trends, and support quality control. Companies may also store exhaust data for future analysis or sell it to third parties. Shoshana Zuboff has described this practice as a core mechanism of what she terms surveillance capitalism, in which behavioural data generated by users is converted into predictive products. Kosciejew notes that large quantities of often raw data are collected in this way, much of which is never analysed. == Medical exhaust data == Many medical devices — including pacemakers, dialysis machines and surgical cameras — generate exhaust data as a by-product of their operation. The majority of this data is never captured or analysed, and is typically discarded once a procedure ends or a device completes its routine monitoring cycle. The potential use of data generated by implanted devices such as pacemakers raises additional legal and ethical questions around ownership and consent. Using electronic health records for research also creates challenges because of the volume of data involved, creating a need for automated algorithms to process it. == Privacy and regulation == The collection and distribution of data exhaust is not in itself illegal in most jurisdictions, but its use raises questions of privacy and informed consent. Steps commonly taken to address these concerns include data anonymisation, offering users an opt-out from the sale of their data, and publishing explicit privacy policies that disclose what data is collected and how it is used.