Apache Parquet

Apache Parquet

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem inspired by Google Dremel interactive ad-hoc query system for analysis of read-only nested data. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides data compression and encoding schemes with enhanced performance to handle complex data in bulk. == History == The open-source project to build Apache Parquet began as a joint effort between Twitter and Cloudera using the record shredding and assembly algorithm as described in Google's Dremel. Parquet was designed as an improvement on the Trevni columnar storage format created by Doug Cutting, the creator of Hadoop. The name 'parquet' (lit. 'small compartment') refers to a style of decorative flooring and was chosen to "evoke the bottom layer of a database with an interesting layout". The first version, Apache Parquet 1.0, was released in July 2013. Since April 27, 2015, Apache Parquet has been a top-level Apache Software Foundation (ASF)-sponsored project. == Features == Apache Parquet is implemented using the record-shredding and assembly algorithm, which accommodates the complex data structures that can be used to store data. The values in each column are stored in contiguous memory locations, providing the following benefits: Column-wise compression is efficient in storage space Encoding and compression techniques specific to the type of data in each column can be used Queries that fetch specific column values need not read the entire row, thus improving performance Apache Parquet is implemented using the Apache Thrift framework, which increases its flexibility; it can work with a number of programming languages like C++, Java, Python, PHP, etc. As of August 2015, Parquet supports the big-data-processing frameworks including Apache Hive, Apache Drill, Apache Impala, Apache Crunch, Apache Pig, Cascading, Presto and Apache Spark. It is one of the external data formats used by the pandas Python data manipulation and analysis library. == Compression and encoding == In Parquet, compression is performed column by column, which enables different encoding schemes to be used for text and integer data. This strategy also keeps the door open for newer and better encoding schemes to be implemented as they are invented. Parquet supports various compression formats: snappy, gzip, LZO, brotli, zstd, and LZ4. === Dictionary encoding === Parquet has an automatic dictionary encoding enabled dynamically for data with a small number of unique values (i.e. below 105) that enables significant compression and boosts processing speed. === Bit packing === Storage of integers is usually done with dedicated 32 or 64 bits per integer. For small integers, packing multiple integers into the same space makes storage more efficient. === Run-length encoding (RLE) === To optimize storage of multiple occurrences of the same value, run-length encoding is used, which is where a single value is stored once along with the number of occurrences. Parquet implements a hybrid of bit packing and RLE, in which the encoding switches based on which produces the best compression results. This strategy works well for certain types of integer data and combines well with dictionary encoding. == Cloud Storage and Data Lakes == Parquet is widely used as the underlying file format in modern cloud-based data lake architectures. Cloud storage systems such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage commonly store data in Parquet format due to its efficient columnar representation and retrieval capabilities. Data lakehouse frameworks—including Apache Iceberg, Delta Lake, and Apache Hudi —build an additional metadata layer on top of Parquet files to support features such as schema evolution, time-travel queries, and ACID-compliant transactions. In these architectures, Parquet files serve as the immutable storage layer while the table formats manage data versioning and transactional integrity. == Comparison == Apache Parquet is comparable to RCFile and Optimized Row Columnar (ORC) file formats — all three fall under the category of columnar data storage within the Hadoop ecosystem. They all have better compression and encoding with improved read performance at the cost of slower writes. In addition to these features, Apache Parquet supports limited schema evolution, i.e., the schema can be modified according to the changes in the data. It also provides the ability to add new columns and merge schemas that do not conflict. Apache Arrow is designed as an in-memory complement to on-disk columnar formats like Parquet and ORC. The Arrow and Parquet projects include libraries that allow for reading and writing between the two formats. == Implementations == Known implementations of Parquet include:

Medical imaging

Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology). Medical imaging seeks to reveal internal structures hidden by the skin and bones, as well as to diagnose and treat disease. Medical imaging also establishes a database of normal anatomy and physiology to make it possible to identify abnormalities. Although imaging of removed organs and tissues can be performed for medical reasons, such procedures are usually considered part of pathology instead of medical imaging. Measurement and recording techniques that are not primarily designed to produce images, such as electroencephalography (EEG), magnetoencephalography (MEG), electrocardiography (ECG), and others, represent other technologies that produce data susceptible to representation as a parameter graph versus time or maps that contain data about the measurement locations. In a limited comparison, these technologies can be considered forms of medical imaging in another discipline of medical instrumentation. As of 2010, 5 billion medical imaging studies had been conducted worldwide. Radiation exposure from medical imaging in 2006 made up about 50% of total ionizing radiation exposure in the United States. Medical imaging equipment is manufactured using technology from the semiconductor industry, including CMOS integrated circuit chips, power semiconductor devices, sensors such as image sensors (particularly CMOS sensors) and biosensors, and processors such as microcontrollers, microprocessors, digital signal processors, media processors and system-on-chip devices. As of 2015, annual shipments of medical imaging chips amount to 46 million units and $1.1 billion. The term "noninvasive" is used to denote a procedure where no instrument is introduced into a patient's body, which is the case for most imaging techniques used. == History == In 1972, engineer Godfrey Hounsfield from the British company EMI invented the X-ray computed tomography device for head diagnosis, which is commonly referred to as computed tomography (CT). The CT nucleus method is based on the projecting X-rays through a section of the human head, which are then processed by computer to reconstruct the cross-sectional image, known as image reconstruction. In 1975, EMI successfully developed a CT device for the entire body, enabling the clear acquisition of tomographic images of various parts of the human body. This revolutionary diagnostic technique earned Hounsfield and physicist Allan Cormack the Nobel Prize in Physiology or Medicine in 1979. Digital image processing technology for medical applications was inducted into the Space Foundation's Space Technology Hall of Fame in 1994. By 2010, over 5 billion medical imaging studies had been conducted worldwide. Radiation exposure from medical imaging in 2006 accounted for about 50% of total ionizing radiation exposure in the United States. Medical imaging equipment is manufactured using technology from the semiconductor industry, including CMOS integrated circuit chips, power semiconductor devices, sensors such as image sensors (particularly CMOS sensors) and biosensors, as well as processors like microcontrollers, microprocessors, digital signal processors, media processors and system-on-chip devices. As of 2015, annual shipments of medical imaging chips reached 46 million units, generating a market value of $1.1 billion. == Types == In the clinical context, "invisible light" medical imaging is generally equated to radiology or "clinical imaging". "Visible light" medical imaging involves digital video or still pictures that can be seen without special equipment. Dermatology and wound care are two modalities that use visible light imagery. Interpretation of medical images is generally undertaken by a physician specialising in radiology known as a radiologist; however, this may be undertaken by any healthcare professional who is trained and certified in radiological clinical evaluation. Increasingly interpretation is being undertaken by non-physicians, for example radiographers frequently train in interpretation as part of expanded practice. Diagnostic radiography designates the technical aspects of medical imaging and in particular the acquisition of medical images. The radiographer (also known as a radiologic technologist) is usually responsible for acquiring medical images of diagnostic quality; although other professionals may train in this area, notably some radiological interventions performed by radiologists are done so without a radiographer. As a field of scientific investigation, medical imaging constitutes a sub-discipline of biomedical engineering, medical physics or medicine depending on the context: Research and development in the area of instrumentation, image acquisition (e.g., radiography), modeling and quantification are usually the preserve of biomedical engineering, medical physics, and computer science; Research into the application and interpretation of medical images is usually the preserve of radiology and the medical sub-discipline relevant to medical condition or area of medical science (neuroscience, cardiology, psychiatry, psychology, etc.) under investigation. Many of the techniques developed for medical imaging also have scientific and industrial applications. === Radiography === Two forms of radiographic images are in use in medical imaging. Projection radiography and fluoroscopy, with the latter being useful for catheter guidance. These 2D techniques are still in wide use despite the advance of 3D tomography due to the low cost, high resolution, and depending on the application, lower radiation dosages with 2D technique. This imaging modality uses a wide beam of X-rays for image acquisition and is the first imaging technique available in modern medicine. Fluoroscopy produces real-time images of internal structures of the body in a similar fashion to radiography, but employs a constant input of X-rays, at a lower dose rate. Contrast media, such as barium, iodine, and air are used to visualize internal organs as they work. Fluoroscopy is also used in image-guided procedures when constant feedback during a procedure is required. An image receptor is required to convert the radiation into an image after it has passed through the area of interest. Early on, this was a fluorescing screen, which gave way to an Image Amplifier (IA) which was a large vacuum tube that had the receiving end coated with cesium iodide, and a mirror at the opposite end. Eventually the mirror was replaced with a TV camera. Projectional radiographs, more commonly known as X-rays, are often used to determine the type and extent of a fracture as well as for detecting pathological changes in the lungs. With the use of radio-opaque contrast media, such as barium, they can also be used to visualize the structure of the stomach and intestines – this can help diagnose ulcers or certain types of colon cancer. === Magnetic resonance imaging === A magnetic resonance imaging instrument (MRI scanner), or "nuclear magnetic resonance (NMR) imaging" scanner as it was originally known, uses powerful magnets to polarize and excite hydrogen nuclei (i.e., single protons) of water molecules in human tissue, producing a detectable signal that is spatially encoded, resulting in images of the body. The MRI machine emits a radio frequency (RF) pulse at the resonant frequency of the hydrogen atoms on water molecules. Radio frequency antennas ("RF coils") send the pulse to the area of the body to be examined. The RF pulse is absorbed by protons, causing their direction with respect to the primary magnetic field to change. When the RF pulse is turned off, the protons "relax" back to alignment with the primary magnet and emit radio waves in the process. This radio-frequency emission from the hydrogen atoms on water is what is detected and reconstructed into an image. The resonant frequency of a spinning magnetic dipole (of which protons are one example) is called the Larmor frequency and is determined by the strength of the main magnetic field and the chemical environment of the nuclei of interest. MRI uses three electromagnetic fields: a very strong (typically 1.5 to 3 teslas) static magnetic field to polarize the hydrogen nuclei, called the primary field; gradient fields that can be modified to vary in space and time (on the order of 1 kHz) for spatial encoding, often simply called gradients; and a spatially homogeneous radio-frequency (RF) field for manipulation of the hydrogen nuclei to produce measurable signals, collected through an RF antenna. Like CT, MRI traditionally creates a two-dimensional image of a thin "slice" of the body and is therefore considered a tomographic imaging technique. Modern MRI instruments are capable of producing images in the form of 3D blocks, which may be considered a generalization of the single-slice

Hubert Dreyfus's views on artificial intelligence

Hubert Dreyfus was a critic of artificial intelligence research. In a series of papers and books, including Alchemy and AI (1965), What Computers Can't Do (1972; 1979; 1992) and Mind over Machine (1986), he presented a skeptical and cautious assessment of AI's progress and a critique of the philosophical foundations of the field. Dreyfus' objections are discussed in most introductions to the philosophy of artificial intelligence, including Russell & Norvig (2021), a standard AI textbook, and in Fearn (2007), a survey of contemporary philosophy. Dreyfus argued that human intelligence and expertise depend primarily on yet-to-be understood informal and unconscious processes rather than symbolic manipulation and that these essentially human skills cannot be fully captured in formal rules. His critique was based on the insights of modern continental philosophers such as Merleau-Ponty and Heidegger, and was directed at the first wave of AI research which tried to reduce intelligence to high level formal symbols. When Dreyfus' ideas were first introduced in the mid-1960s, they were met in the AI community with ridicule and outright hostility. By the 1980s, however, some of his perspectives were rediscovered by researchers working in robotics and the new field of connectionism—approaches that were called "sub-symbolic" at the time because they eschewed early AI research's emphasis on high level symbols. In the 21st century, "sub-symbolic" artificial neural networks and other statistics-based approaches to machine learning were highly successful. Historian and AI researcher Daniel Crevier wrote: "time has proven the accuracy and perceptiveness of some of Dreyfus's comments." Dreyfus said in 2007, "I figure I won and it's over—they've given up." == Dreyfus' critique == === The grandiose promises of artificial intelligence === In Alchemy and AI (1965) and What Computers Can't Do (1972), Dreyfus summarized the history of artificial intelligence and ridiculed the unbridled optimism that permeated the field. For example, Herbert A. Simon, following the success of his program General Problem Solver (1957), predicted that by 1967: A computer would be world champion in chess. A computer would discover and prove an important new mathematical theorem. Most theories in psychology will take the form of computer programs. The press dutifully reported these predictions of the imminent arrival of machine intelligence. Dreyfus felt that this optimism was unwarranted and, in 1965, argued forcefully that predictions like these would not come true. He would eventually be proven right. Pamela McCorduck explains Dreyfus' position: A great misunderstanding accounts for public confusion about thinking machines, a misunderstanding perpetrated by the unrealistic claims researchers in AI have been making, claims that thinking machines are already here, or at any rate, just around the corner. These predictions were based on the success of the cognitive revolution, which promoted an "information processing" model of the mind. It was articulated by Newell and Simon in their physical symbol systems hypothesis, and later expanded into a philosophical position known as computationalism by philosophers such as Jerry Fodor and Hilary Putnam. In AI, the approach is now called symbolic AI or "GOFAI". Dreyfus argued that "symbolic AI" was the latest version of the ancient program of rationalism in philosophy. Rationalism had come under heavy criticism in the 20th century from philosophers like Martin Heidegger and Edmund Husserl. The mind, according to modern continental philosophy, is not "rationalist" and is nothing like a digital computer. Cognitivism led early AI researchers to believe that they had successfully simulated the essential process of human thought, thus it seemed a short step to producing fully intelligent machines. Dreyfus' last paper detailed the ongoing history of the "first step fallacy", where AI researchers tend to wildly extrapolate initial success as promising, perhaps even guaranteeing, wild future successes. === Dreyfus' four assumptions of artificial intelligence research === In Alchemy and AI and What Computers Can't Do, Dreyfus identified four philosophical assumptions, at least one of which he deems necessary for AI to succeed. "In each case," Dreyfus writes, "the assumption is taken by workers in AI as an axiom, guaranteeing results, whereas it is, in fact, one hypothesis among others, to be tested by the success of such work." Dreyfus argues that AI would be impossible without accepting at least one of these four assumptions: The biological assumption The brain processes information in discrete operations by way of some biological equivalent of on/off switches. In the early days of research into neurology, scientists found that neurons fire in all-or-nothing pulses. Several researchers, such as Walter Pitts and Warren McCulloch, speculated with great confidence that neurons functioned similarly to the way Boolean logic gates operate, and so could be imitated by electronic circuitry at the level of the neuron. When digital computers became widely used in the early 50s, this argument was extended to suggest that the brain was a vast physical symbol system, manipulating the binary symbols of zero and one. Dreyfus was able to refute the biological assumption by citing research in neurology that suggested that the action and timing of neuron firing had analog components. But Daniel Crevier observes that "few still held that belief in the early 1970s, and nobody argued against Dreyfus" about the biological assumption. The psychological assumption The mind can be viewed as a device operating on bits of information according to formal rules. He refuted this assumption by showing that much of what we know about the world consists of complex attitudes or tendencies that make us lean towards one interpretation over another. He argued that, even when we use explicit symbols, we are using them against an unconscious and informal background including commonsense knowledge and that without this background our symbols cease to mean anything. This background, in Dreyfus' view, was not implemented in individual brains as explicit individual symbols with explicit individual meanings. The epistemological assumption All knowledge can be formalized. This concerns the philosophical issue of epistemology, or the study of knowledge. Even if we agree that the psychological assumption is false, AI researchers could still argue (as AI founder John McCarthy has) that it is possible for a symbol processing machine to represent all knowledge, regardless of whether human beings represent knowledge the same way. Dreyfus argued that there is no justification for this assumption, since so much of human knowledge is not symbolic or even expressible using formal constructs. The ontological assumption The world consists of independent facts that can be represented by independent symbols AI researchers (and futurists and science fiction writers) often assume that there is no limit to formal, scientific knowledge, because they assume that any phenomenon in the universe can be described by symbols or scientific theories. This assumes that everything that exists can be understood as objects, properties of objects, classes of objects, relations of objects, and so on: precisely those things that can be described by logic, language and mathematics. The study of being or existence is called ontology, and so Dreyfus calls this the ontological assumption. If this is false, then it raises doubts about what we can ultimately know and what intelligent machines will ultimately be able to help us to do. === Knowing-how vs. knowing-that: the primacy of intuition === In Mind Over Machine (1986), written (with his brother) during the heyday of expert systems, Dreyfus analyzed the difference between human expertise and the programs that claimed to capture it. This expanded on ideas from What Computers Can't Do, where he had made a similar argument criticizing the "cognitive simulation" school of AI research practiced by Allen Newell and Herbert A. Simon in the 1960s. Dreyfus argued that human problem solving and expertise depend on our background sense of the context, of what is important and interesting given the situation, rather than on the process of searching through combinations of possibilities to find what we need. Dreyfus would describe it in 1986 as the difference between "knowing-that" and "knowing-how", based on Heidegger's distinction of present-at-hand and ready-to-hand. Knowing-that is our conscious, step-by-step problem solving abilities. We use these skills when we encounter a difficult problem that requires us to stop, step back and search through ideas one at time. At moments like this, the ideas become very precise and simple: they become context free symbols, which we manipulate using logic and language. These are the skills that Newell and Simon had demonstrated with both psy

Xcon

The R1, internally called XCON (Expert Configurer), program was a production-rule-based system written in OPS5 by John P. McDermott of Carnegie Mellon University in 1978 to assist in the ordering of Digital Equipment Corporation's (DEC) VAX computer systems by automatically selecting the computer system components based on the customer's requirements. == Overview == In developing the system, McDermott made use of experts from both DEC's PDP/11 and VAX computer systems groups. These experts sometimes even disagreed amongst themselves as to an optimal configuration. The resultant "sorting it out" had an additional benefit in terms of the quality of VAX systems delivered. XCON first went into use in 1980 in DEC's plant in Salem, New Hampshire, US. It eventually had about 2500 rules. By 1986, it had processed 80,000 orders, and achieved 95–98% accuracy. It was estimated to be saving DEC $25M a year by reducing the need to give customers free components when technicians made errors, by speeding the assembly process, and by increasing customer satisfaction. Before XCON, when ordering a VAX from DEC, every cable, connection, and bit of software had to be ordered separately. (Computers and peripherals were not sold complete in boxes as they are today.) The sales people were not always very technically expert, so customers would find that they had hardware without the correct cables, printers without the correct drivers, a processor without the correct language chip, and so on. This meant delays and caused a lot of customer dissatisfaction and resultant legal action. XCON interacted with the sales person, asking critical questions before printing out a coherent and workable system specification/order slip. XCON's success led DEC to rewrite XCON as XSEL—a version of XCON intended for use by DEC's salesforce to aid a customer in properly configuring their VAX (so they would not, say, choose a computer too large to fit through their doorway or choose too few cabinets for the components to fit in). Location problems and configuration were handled by yet another expert system, XSITE. McDermott's 1980 paper on R1 won the Association for the Advancement of Artificial Intelligence (AAAI) Classic Paper Award in 1999. Footnote 2 gave a humorous explanation for the name "R1" as "Four years ago I couldn't even say "knowledge engineer", now I ... [are one.]".

Six Little Dragons

Six Little Dragons (Chinese: 杭州六小龙), or Six Little Dragons of Hangzhou, are an informal grouping of the tech startups Game Science, DeepSeek, Unitree Robotics, DEEP Robotics, BrainCo and Manycore Tech. All six were established in Hangzhou, They are active in artificial intelligence, robotics, gaming, and brain-computer interface technology. Hangzhou is referred to as the China’s “e-commerce capital” (电商之都). The nickname "Six Little Dragons" originated from the Chinese internet. == Background == === Chinese government investments (2002 — 2010s) === From 2002 to 2007, under Xi Jinping's leadership as party secretary of Zhejiang, provincial spending on technology research grew over four times to 28 billion RMB. The province launched "Digital Zhejiang" (数字浙江) to advance modernization and the "Eight Eight Strategy" (八八战略), focusing on eight advantages and actions to boost industrial development, including specialized industries. In 2010, Hangzhou's government started "Project Eagle" (雏鹰计划) to aid science and technology startups. The project works with incubators and accelerators to find promising tech companies and offers public funding and other help, especially for startups by graduates and returning students. Unitree received support in the initial phase, along with government subsidies from Binjiang District. === AI-startups and further investments (2025 — present) === In January 2025, the Chinese government created the "Hangzhou AI Industry Chain High-Quality Development Action Plan" which focuses on computing power, LLM technologies, and AI applications. The plan was made to certify over 2,000 new high-tech enterprises, initiate over 300 major tech projects, and invest more than 300 billion RMB (US$40 billion) annually. The Chinese government also renewed "Project Eagle" and to allocate 15% of industrial policy funds for future industries. Hangzhou aimed to become a center for tech startups, highlighting the "six little dragons of Hangzhou," a nickname popularized in early 2025. This group includes DeepSeek, Game Science, Unitree Robotics, Manycore Tech, BrainCo, and DEEP Robotics, companies in gaming, robotics, and software development. Earlier in 2025, DeepSeek, one of the six dragons, launched an AI system at a much lower cost than those from Silicon Valley. Since then, DeepSeek and Alibaba have produced top-performing open source AI models. Game Science launched the successful video game Black Myth: Wukong in 2024, while Unitree gained attention for their dancing robots in the 2025 annual spring gala broadcast by Chinese state media. The group was acknowledged by Chinese authorities in Hangzhou in a New Years message for local businesses in January 2025. Hangzhou’s universities were given credit for the development of Chinese technological industry. Zhejiang University alumni founded three of the "Six Little Dragons". By September 2024, the university produced 102 executives in Chinese AI start-ups, ranking third among China's top institutions. On February 20, 2025, Alibaba's Eddie Wu stated that the company would focus on artificial generative intelligence and plans significant investment in AI. The company also sought to boost foreign investment to China's "Six Little Dragons" following Alibaba's founder Jack Ma attended General Secretary of the Chinese Communist Party Xi Jinping's business symposium with corporate leaders and entrepreneurs that same month. == Challenges == China's net foreign direct investment (FDI) fell by US$168 billion in 2024, marking the largest capital flight since 1990. Foreign investment peaked at US$344 billion in 2021 but has since declined according to the State Administration of Foreign Exchange. In 2024, foreign investors put in only US$4.5 billion while Chinese firms invested US$173 billion abroad. According to interviews conducted by The New York Times, some start-up company founders believe that Chinese government's support for Hangzhou's technological sector has deterred foreign investors. Tensions with the United States led many international companies to adopt a China Plus One strategy, while Chinese firms build factories overseas to avoid potential Trump tariffs. China also faced US restrictions on its access of advanced chips, forcing Chinese tech companies to stockpile Nvidia chips while Chinese producers like Huawei and Semiconductor Manufacturing International Corporation (SMIC) were competing to produce their own.

Collabora Online

Collabora Online (often abbreviated as COOL) is an open-source online office suite developed by Collabora, based on LibreOffice Online, the web-based edition of the LibreOffice office suite. It enables real-time collaborative editing of documents, spreadsheets, presentations, and vector graphics in a web browser. Optional applications are available for offline use on Android, ChromeOS, iOS, iPadOS, Linux distributions, macOS, and Windows. It supports the OpenDocument format and is compatible with other major formats, including those used by Microsoft Office. The Document Foundation (TDF), the nonprofit organization behind LibreOffice, states that a majority of the LibreOffice software development is done by its partners like Collabora. Collabora Online is an open-source alternative to proprietary cloud office platforms such as Google Workspace and Microsoft 365. Unlike these services, it can be self-hosted or hosted by third-party providers. The platform is marketed particularly toward enterprises and public institutions seeking greater digital sovereignty and independence from U.S.-based "big tech" companies. Collabora also develops Collabora Office, a standalone desktop and mobile app suite based on LibreOffice. Although Collabora Online has increasingly taken on a central role, both products may be used in parallel, similar to Microsoft Office and Microsoft 365. In November 2025, Collabora released Collabora Office Desktop and renamed the previous product Collabora Office Classic. The new product shares code with Collabora Online and brings the same user interface to the desktop on Linux, Windows and MacOS. A separate version, the Collabora Online Development Edition (CODE), is offered free of charge and is recommended for individuals, small teams, and developers. CODE provides early access to new features and serves as a testing and development platform for open-source community contributors. As TDF does not offer a free version of LibreOffice Online, CODE represents the primary freely available option for organizations and individuals interested in deploying LibreOffice in a web-based, collaborative setting. == Applications == Collabora Online includes several applications for document editing, available through the web-based interface and optional desktop and mobile apps: Collabora Writer – A word processor based on LibreOffice Writer, comparable to Microsoft Word and Google Docs. It supports WYSIWYG editing, styles, formatting tools, comment threads, and change tracking. Collabora Calc – A spreadsheet editor based on LibreOffice Calc, similar to Microsoft Excel and Google Sheets. Features include pivot tables, formulas, data validation, conditional formatting, advanced sorting and filtering, charts, and support for up to 16,000 columns. Compatible with some macros written in VBA. Collabora Impress – A presentation program based on LibreOffice Impress, comparable to Microsoft PowerPoint and Google Slides. It supports master slides, transitions, speaker notes, and multimedia elements. Collabora Draw is not a separate application, most of the functionality of the Draw application is now integrated in Writer and Impress – vector graphics editor based on LibreOffice Draw, comparable to Microsoft Visio and Google Drawings. == Features == Collabora Online can be accessed from modern web browsers without the need for plug-ins or add-ons. It supports real-time collaborative editing of word processing documents, spreadsheets, presentations, and vector graphics. Collaboration features include commenting, version tracking with document comparison and restoration, and integration with communication tools such as chat or video calls. These functions are often enabled through integration with enterprise open-source cloud platforms like Nextcloud, ownCloud, Seafile, EGroupware, GroupOffice and others. Collabora Online can also be embedded or integrated into a variety of third-party applications. Although client apps are not required to use the web-based suite, optional applications are available for offline use on Android, ChromeOS, iOS, iPadOS, Linux distributions, macOS, and Windows. These apps share the same LibreOffice-based core as the server version, ensuring document compatibility across platforms. Development of the LibreOffice core benefits both the online server and the client applications simultaneously. The mobile apps offer touch-optimized interfaces that adapt to different screen sizes and can be used offline, with optional integration into cloud storage services. Collabora Online supports OpenDocument formats (ODF; .odt, .odp, .ods, .odg) in accordance with ISO/IEC 26300. It is also compatible with Microsoft Office formats, including Office Open XML (.docx, .pptx, .xlsx) and legacy binary formats (.doc, .ppt, .xls). Additional supported formats include PDF, PNG, CSV, TSV, RTF, EPUB, and others. The suite can import a range of formats supported by LibreOffice, including Microsoft Visio and Publisher files, Apple Keynote, Numbers, and Pages files, as well as legacy formats used by Lotus 1-2-3, Microsoft Works, and Quattro Pro. The core of Collabora Online is written in C++ and utilizes LibreOfficeKit, a programming interface that enables reuse of much of LibreOffice's existing code for document saving, loading, and rendering. Collabora Online operates on the principle that documents remain on the server, with users viewing tile-rendered images of the document and sending their edits back to the server. The user interface is implemented in JavaScript. For file access and authentication with file hosting services, Collabora Online uses Microsoft's WOPI protocol, allowing compatibility with any service supporting Microsoft 365 integration. == Server == The server component can be self-hosted or deployed through third-party enterprise open-source cloud platforms, allowing organizations to maintain control over data and infrastructure. It is available for various Linux distributions and as a Docker image. The server enables features such as in-browser document editing, file synchronization, and real-time communication. These third-party cloud platforms typically offer additional functionality comparable to services such as Dropbox, Google Workspace, Microsoft 365, or Zoom, including file sharing, calendars, email, contacts, chat, and video conferencing. Collabora Online can be integrated into these applications, as well as with other services such as learning management systems and enterprise content platforms, through open APIs and an SDK. == Reception == Various online and print publications have discussed Collabora Online. In December 2016 the technology website Softpedia mentioned the availability of collaborative editing in version 2.0 and the integration with ownCloud, Nextcloud, and other file synchronization and sharing solutions. In June 2020, ZDNET reported that Collabora Online would be included as the standard office suite in Nextcloud version 19, noting that direct document editing was added to the native video conferencing software Talk. The technology blog OMG! Ubuntu! covered the release of Collabora's Android and iOS apps, emphasizing their offline functionality. In September 2020, Linux Magazine compared Collabora Online with OnlyOffice, noting the flexibility and platform independence of both tools and highlighting Collabora's extensive feature set derived from LibreOffice. === Digital sovereignty === Collabora Online's open-source design and support for self-hosting have made it notable in discussions about digital sovereignty—the ability of users and organizations to control their own data. This is particularly relevant in Europe, where concerns about dependence on U.S.-based "big tech" companies and data privacy have grown in recent years. On 10th June 2025, Microsoft executives under oath in the French Senate admitted that they cannot guarantee data sovereignty and would be compelled to pass French (and by implication the wider European Union) information to the US administration if requested via a warrant or subpoena. The Cloud Act is a law that gives the US government authority to obtain digital data held by US-based tech corporations, irrespective of whether that data is stored on servers at home or on foreign soil. A 2020 briefing by the European Parliament highlighted risks associated with reliance on major technology companies that collect and exploit user data. Legal decisions such as the Schrems II ruling have further underscored these concerns. Several European government agencies have adopted private cloud solutions using Collabora Online and related platforms to enhance data security and maintain control over sensitive information. == History == The former LibreOffice development team from SUSE joined Collabora in September 2013, forming the subsidiary Collabora Productivity. In 2015 Collabora and IceWarp announced the development of an enterprise-ready version of LibreOffice Online to compete wi

RightsCon

RightsCon is an annual conference on digital rights hosted by Access Now. It convenes international leaders and organizations to discuss global problems including internet censorship, the regulation of algorithms, electronic surveillance, the ethics of technology, online hate speech, content moderation, cyberwarfare, and more. == History == The conference was first convened by Access (today, Access Now) in Silicon Valley in 2011, with the intention of gathering civil society to discuss impacts of the growing tech industry on digital rights and human rights. It sought the participation of leaders from both industry (including companies such as Twitter, Google, Mozilla, and Comcast) and civil society organizations (such as the Electronic Frontier Foundation and New America). Keynote speakers included the then-Assistant Secretary of State, Michael Posner; Egyptian blogger and political prisoner, Alaa Abd El-Fattah; and then-director of public policy at Google, Bob Boorstin. RightsCon organizers have sought to ensure the event is accessible to attendees from across the globe, particularly global majority countries, informing the decision to hold the conference in Asia, the Middle East, and Latin America. === Online convenings === In 2020, RightsCon was to be held in San José, Costa Rica, but due to the COVID-19 pandemic, the meeting took place in an online format. In 2021, the 10th edition of RightsCon was again held online from Monday, June 7 to Friday, June 11, 2021, due to the continued global COVID-19 pandemic which altered several digital rights physical meetings. The topics for RightsCon2021 included: Artificial Intelligence (AI), automation, data protection and user control, digital futures, democracy, elections, new business models, content control, peacebuilding, censorship, internet shutdowns, freedom of the media and many others were discussed by several digital rights organizations and individuals. === 2026 cancellation === The 14th RightsCon was scheduled to be held in Zambia from May 5 to 8, 2026. On April 29, 2026, the Zambian government abruptly postponed the conference, writing in a statement that the postponement was "necessitated by the need for comprehensive disclosure […] relating to key thematic issues proposed for discussion during the Summit." In May 2026, the conference was cancelled due to pressure from the Chinese government. In a statement the same day, Access Now wrote that it was "told that diplomats from the People's Republic of China (PRC) were putting pressure on the Government of Zambia because Taiwanese civil society participants were planning to join us in person." == List of conferences == Past RightsCon conferences include: