AI Analytics Masters

AI Analytics Masters — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Amira (software)

    Amira (software)

    Amira (ah-MEER-ah) is a software platform for visualization, processing, and analysis of 3D and 4D data. It is being actively developed by Thermo Fisher Scientific in collaboration with the Zuse Institute Berlin (ZIB), and commercially distributed by Thermo Fisher Scientific — together with its sister software Avizo. == Overview == Amira is an extendable software system for scientific visualization, data analysis, and presentation of 3D and 4D data. It is used by researchers and engineers in academia and industry. It is a tool for processing, analysis and visualization of data from various modalities; e.g. micro-CT, PET, Ultrasound. It is used in many fields, such as microscopy in biology and materials science, molecular biology, quantum physics, astrophysics, computational fluid dynamics (CFD), finite element modeling (FEM), non-destructive testing (NDT), and many more. One of the key features, besides data visualization, is Amira's set of tools for image segmentation and geometry reconstruction. This allows the user to mark (or segment) structures and regions of interest in 3D image volumes using automatic, semi-automatic, and manual tools. The segmentation can then be used for a variety of subsequent tasks, such as volumetric analysis, density analysis, shape analysis, or the generation of 3D computer models for visualization, numerical simulations, or rapid prototyping or 3D printing. Other key Amira features are multi-planar and volume visualization, image registration, filament tracing, cell separation and analysis, tetrahedral mesh generation, fiber-tracking from diffusion tensor imaging (DTI) data, skeletonization, spatial graph analysis, and stereoscopic rendering of 3D data over multiple displays and immersive virtual reality environments, including CAVEs. As a commercial product Amira requires the purchase of a license or an academic subscription. A time-limited, but full-featured evaluation version is available for download free of charge. == History == === 1993–1998: Research software === Amira's roots go back to 1993 and the Department for Scientific Visualization, headed by Hans-Christian Hege at the Zuse Institute Berlin (ZIB). The ZIB is a research institute for mathematics and informatics. The Scientific Visualization department's mission is to help solve computationally and scientifically challenging tasks in medicine, biology, engineering and materials science. For this purpose, it develops algorithms and software for 2D, 3D, and 4D data visualization and visually supported exploration and analysis. At that time, the young visualization group at the ZIB had experience with the extendable, data flow-oriented visualization environments apE, IRIS Explorer, and Advanced Visualization Studio (AVS), but was not satisfied with these products' interactivity, flexibility, and ease-of-use for non-computer scientists. Therefore, the development of a new software system was started in a research project within a medically oriented, multi-disciplinary collaborative research center. Based on experiences that Tobias Höllerer had gained in late 1993 with the new graphics library IRIS Inventor, it was decided to utilize that library. The development of the medical planning system was performed by Detlev Stalling, who later became the chief software architect of Amira. The new software was called "HyperPlan", highlighting its initial target application – a planning system for hyperthermia cancer treatment. The system was being developed on Silicon Graphics (SGI) computers, which at the time were the standard workstations used for high-end graphics computing. The software was based on libraries such as OpenGL (originally IRIS GL), Open Inventor (originally IRIS Inventor), and the graphical user interface libraries X11, Motif (software), and ViewKit. In 1998, X11/Motif/Viewkit were replaced by the Qt toolkit. The HyperPlan framework served as the base for more and more projects at the ZIB and was used by a growing number of researchers in collaborating institutions. The projects included applications in medical image computing, medical visualization, neurobiology, confocal microscopy, flow visualization, molecular analytics and computational astrophysics. === 1998–today: Commercially supported product === The growing number of users of the system started to exceed the capacities that ZIB could spare for software distribution and support, as ZIB's primary mission was algorithmic research. Therefore, the spin-off company Indeed – Visual Concepts GmbH was founded by Hans-Christian Hege, Detlev Stalling, and Malte Westerhoff. In Feb 1998 the HyperPlan software was given the new, application-neutral name "Amira". This name is not an acronym, but was chosen for being pronounceable in different languages and providing a suitable connotation, namely "to look at" or "to wonder at", from the Latin verb "admirare" (to admire), which reflects a basic situation in data visualization. A major re-design of the software was undertaken by Detlev Stalling and Malte Westerhoff in order to make it a commercially supportable product and to make it available on non-SGI computers as well. In March 1999, the first version of the commercial Amira was exhibited at the CeBIT tradeshow in Hannover, Germany on SGI IRIX and Hewlett-Packard UniX (HP-UX) booths. Versions for Linux and Microsoft Windows followed within the following twelve months. Later Mac OS X support was added. Indeed – Visual Concepts GmbH selected the Bordeaux, France and San Diego, United States based company TGS, Inc. as the worldwide distributor for Amira and completed five major releases (up to version 3.1) in the subsequent four years. In 2003 both Indeed – Visual Concepts GmbH, as well as TGS, Inc. were acquired by Massachusetts-based Mercury Computer Systems, Inc. (NASDAQ:MRCY) and became part of Mercury's newly formed life sciences business unit, later branded Visage Imaging. In 2009, Mercury Computer Systems, Inc. spun off Visage Imaging again and sold it to Melbourne, Australia based Promedicus Ltd (ASX:PME), a leading provider of radiology information systems and medical IT solutions. During this time, Amira continued to be developed in Berlin, Germany and in close collaboration with the ZIB, still headed by the original creators of Amira. TGS, located in Bordeaux, France was sold by Mercury Computer systems to a French investor and renamed to Visualization Sciences Group (VSG). VSG continued the work on a complementary product named Avizo, based on the same source code but customized for material sciences. In August 2012, FEI, to that date the largest OEM reseller of Amira, purchased VSG and the Amira business from Promedicus. This brought the two software sisters Amira and Avizo back into one hand. In August 2013, Visualization Sciences Group (VSG) became a business unit of FEI. In 2016 FEI has been bought by Thermo Fisher Scientific and became part of its Materials & Structural Analysis division in early 2017. Amira and Avizo are still being marketed as two different products; Amira for life sciences and Avizo for materials science, but the development efforts are now joined once again. In the meantime, the number of scientific articles using the Amira / Avizo software, is in the order of 10 thousands. == Amira options == === Microscopy option === Specific readers for microscopy data Image deconvolution Exploration of 3D imagery obtained from virtually any microscope Extraction and editing of filament networks from microscopy images === DICOM reader === Import of clinical and preclinical data in DICOM format === Mesh option === Generation of 3D finite element (FE) meshes from segmented image data Support for many state-of-the-art FE solver formats High-quality visualization of simulation mesh-based results, using scalar, vector, and tensor field display modules === Skeletonization option === Reconstruction and analysis of neural and vascular networks Visualization of skeletonized networks Length and diameter quantification of network segments Ordering of segments in a tree graph Skeletonization of very large image stacks === Molecular option === Advanced tools for the visualization of molecule models Hardware-accelerated volume rendering Powerful molecule editor Specific tools for complex molecular visualization === Developer option === Creation of new custom components for visualizing or data processing Implementation of new file readers or writers C++ programming language Development wizard for getting started quickly === Neuro option === Medical image analysis for DTI and brain perfusion Fiber tracking supporting several stream-line based algorithms Fiber separation into fiber bundles based on user defined source and destination regions Computation of tensor fields, diffusion weighted maps Eigenvalue decomposition of tensor fields Computation of mean transit time, cerebral blood flow, and cerebral blood volume === VR option === Visualization of data on large tiled displays

    Read more →
  • Parchive

    Parchive

    Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data. Parchive was originally written to solve the problem of reliable file sharing on Usenet, but it can be used for protecting any kind of data from data corruption, disc rot, bit rot, and accidental or malicious damage. Despite the name, Parchive uses more advanced techniques (specifically error correction codes) than simplistic parity methods of error detection. As of 2015, PAR1 is obsolete, PAR2 is mature for widespread use, and PAR3 is a discontinued experimental version developed by MultiPar author Yutaka Sawada. The original SourceForge Parchive project has been inactive since April 30, 2015. A new PAR3 specification has been worked on since April 28, 2019 by PAR2 specification author Michael Nahas. An alpha version of the PAR3 specification has been published on January 29, 2022 while the program itself is being developed. == History == Parchive was intended to increase the reliability of transferring files via Usenet newsgroups. Usenet was originally designed for informal conversations, and the underlying protocol, NNTP was not designed to transmit arbitrary binary data. Another limitation, which was acceptable for conversations but not for files, was that messages were normally fairly short in length and limited to 7-bit ASCII text. Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8 bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but the unreliable nature of Usenet remained. With the introduction of Parchive, parity files could be created that were then uploaded along with the original data files. If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files. Parchive included the construction of small index files (.par in version 1 and .par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity. Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1 where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files (which allows them to be used on their own to verify the integrity of the data files if there is no small index file available). In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001. Par1 used Reed–Solomon error correction to create new recovery files. Any of the recovery files can be used to rebuild a missing file from an incomplete download. Version 1 became widely used on Usenet, but it did suffer some limitations: It was restricted to handle at most 255 files. The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.) The recovery algorithm had a bug, due to a flaw in the academic paper on which it was based. It was strongly tied to Usenet and it was felt that a more general tool might have a wider audience. In January 2002, Howard Fukada proposed that a new Par2 specification should be devised with the significant changes that data verification and repair should work on blocks of data rather than whole files, and that the algorithm should switch to using 16 bit numbers rather than the 8 bit numbers that PAR1 used. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002. Peter Clements then went on to write the first two Par2 implementations, QuickPar and par2cmdline. Abandoned since 2004, Paul Houle created phpar2 to supersede par2cmdline. Yutaka Sawada created MultiPar to supersede QuickPar. MultiPar uses par2j.exe (which is partially based on par2cmdline's optimization techniques) to use as MultiPar's backend engine. == Versions == Versions 1 and 2 of the file format are incompatible. (However, many clients support both.) === Par1 === For Par1, the files f1, f2, ..., fn, the Parchive consists of an index file (f.par), which is CRC type file with no recovery blocks, and a number of "parity volumes" (f.p01, f.p02, etc.). Given all of the original files except for one (for example, f2), it is possible to create the missing f2 given all of the other original files and any one of the parity volumes. Alternatively, it is possible to recreate two missing files from any two of the parity volumes and so forth. Par1 supports up to a total of 256 source and recovery files. === Par2 === Par2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The number after the "+" in the filename indicates how many blocks it contains, and the number after "vol" indicates the number of the first recovery block within the PAR2 file. If an index file of a download states that 4 blocks are missing, the easiest way to repair the files would be by downloading filename.vol003+04.PAR2. However, due to the redundancy, filename.vol007+06.PAR2 is also acceptable. There is also an index file filename.PAR2, it is identical in function to the small index file used in PAR1. Par2 specification supports up to 32,768 source blocks and up to 65,535 recovery blocks. Input files are split into multiple equal-sized blocks so that recovery files do not need to be the size of the largest input file. Although Unicode is mentioned in the PAR2 specification as an option, most PAR2 implementations do not support Unicode. Directory support is included in the PAR2 specification, but most or all implementations do not support it. === Par3 === The Par3 specification was originally planned to be published as an enhancement over the Par2 specification. However, to date, it has remained closed source by specification owner Yutaka Sawada. A discussion on a new format started in the GitHub issue section of the maintained fork par2cmdline on January 29, 2019. The discussion led to a new format which is also named as Par3. The new Par3 format's specification is published on GitHub, but remains being an alpha draft as of January 28, 2022. The specification is written by Michael Nahas, the author of Par2 specification, with the help from Yutaka Sawada, animetosho and malaire. The new format claims to have multiple advantages over the Par2 format, including support for: More than 216 files and more than 216 blocks. Packing small files into one block, as well as deduplication when a block appears in multiple files. UTF-8 file names. File permissions, hard links, symbolic/soft links, and empty directories. Embedding PAR data inside other formats, like ZIP archives or ISO disk images. "Incremental backups", where a user creates recovery files for some file or folder, change some data, and create new recovery files reusing some of the older files. More error correction code algorithms (such as LDPC and sparse random matrix). BLAKE3 hashes, dropping support for the MD5 hashes used in PAR2. == Software == === Multi-platform === par2+tbb (GPLv2) — a concurrent (multithreaded) version of par2cmdline 0.4 using TBB. Only compatible with x86 based CPUs. It is available in the FreeBSD Ports system as par2cmdline-tbb. Original par2cmdline — (obsolete). Available in the FreeBSD Ports system as par2cmdline. par2cmdline maintained fork by BlackIkeEagle. par2cmdline-mt is another multithreaded version of par2cmdline using OpenMP, GPLv2, or later. Currently merged into BlackIkeEagle's fork and maintained there. ParPar (CC0) is a high performance, multithreaded PAR2 client and Node.js library. Does not support verifying or repair, it can currently only create PAR2 archives. par2deep (LGPL-3.0) — Produce, verify and repair par2 files recursively, both on the command line as well as with the aid of a graphical user interface. It is available in the Python Package Index system as par2deep. par2cron (MIT License) is an o

    Read more →
  • Knuth–Plass line-breaking algorithm

    Knuth–Plass line-breaking algorithm

    The Knuth–Plass algorithm is a line-breaking algorithm designed for use in Donald Knuth's typesetting program TeX. It integrates the problems of text justification and hyphenation into a single algorithm by using a discrete dynamic programming method to minimize a loss function that attempts to quantify the aesthetic qualities desired in the finished output. The algorithm works by dividing the text into a stream of three kinds of objects: boxes, which are non-resizable chunks of content, glue, which are flexible, resizeable elements, and penalties, which represent places where breaking is undesirable (or, if negative, desirable). The loss function, known as "badness", is defined in terms of the deformation of the glue elements, and any extra penalties incurred through line breaking. Making hyphenation decisions follows naturally from the algorithm, but the choice of possible hyphenation points within words, and optionally their preference weighting, must be performed first, and that information inserted into the text stream in advance. Knuth and Plass' original algorithm does not include page breaking, but may be modified to interface with a pagination algorithm, such as the algorithm designed by Plass in his PhD thesis. Typically, the cost function for this technique should be modified so that it does not count the space left on the final line of a paragraph; this modification allows a paragraph to end in the middle of a line without penalty. The same technique can also be extended to take into account other factors such as the number of lines or costs for hyphenating long words. == Computational complexity == A naive brute-force exhaustive search for the minimum badness by trying every possible combination of breakpoints would take an impractical O ( 2 n ) {\displaystyle O(2^{n})} time. The classic Knuth-Plass dynamic programming approach to solving the minimization problem is a worst-case O ( n 2 ) {\displaystyle O(n^{2})} algorithm but usually runs much faster, in close to linear time. Solving for the Knuth-Plass optimum can be shown to be a special case of the convex least-weight subsequence problem, which can be solved in O ( n ) {\displaystyle O(n)} time. Methods to do this include the SMAWK algorithm. == Simple example of minimum raggedness metric == For the input text AAA BB CC DDDDD with line width 6, a greedy algorithm that puts as many words on a line as possible while preserving order before moving to the next line, would produce: ------ Line width: 6 AAA BB Remaining space: 0 CC Remaining space: 4 DDDDD Remaining space: 1 The sum of squared space left over by this method is 0 2 + 4 2 + 1 2 = 17 {\displaystyle 0^{2}+4^{2}+1^{2}=17} . However, the optimal solution achieves the smaller sum 3 2 + 1 2 + 1 2 = 11 {\displaystyle 3^{2}+1^{2}+1^{2}=11} : ------ Line width: 6 AAA Remaining space: 3 BB CC Remaining space: 1 DDDDD Remaining space: 1 The difference here is that the first line is broken before BB instead of after it, yielding a better right margin and a lower cost 11.

    Read more →
  • Token-based replay

    Token-based replay

    Token-based replay technique is a conformance checking algorithm that checks how well a process conforms with its model by replaying each trace on the model (in Petri net notation ). Using the four counters produced tokens, consumed tokens, missing tokens, and remaining tokens, it records the situations where a transition is forced to fire and the remaining tokens after the replay ends. Based on the count at each counter, we can compute the fitness value between the trace and the model. == The algorithm == Source: The token-replay technique uses four counters to keep track of a trace during the replaying: p: Produced tokens c: Consumed tokens m: Missing tokens (consumed while not there) r: Remaining tokens (produced but not consumed) Invariants: At any time: p + m ≥ c ≥ m {\displaystyle p+m\geq c\geq m} At the end: r = p + m − c {\displaystyle r=p+m-c} At the beginning, a token is produced for the source place (p = 1) and at the end, a token is consumed from the sink place (c' = c + 1). When the replay ends, the fitness value can be computed as follows: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})} == Example == Suppose there is a process model in Petri net notation as follows: === Example 1: Replay the trace (a, b, c, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity c {\displaystyle \mathbf {c} } consumes 1 token and produces 1 token ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} and c = 2 + 1 = 3 {\displaystyle c=2+1=3} ). Step 5: The activity d {\displaystyle \mathbf {d} } consumes 2 tokens and produces 1 token ( p = 5 + 1 = 6 {\displaystyle p=5+1=6} , c = 3 + 2 = 5 {\displaystyle c=3+2=5} ). Step 6: The token at the end place is consumed ( c = 5 + 1 = 6 {\displaystyle c=5+1=6} ). The trace is complete. The fitness of the trace ( a , b , c , d {\displaystyle \mathbf {a,b,c,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 0 6 ) + 1 2 ( 1 − 0 6 ) = 1 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {0}{6}})+{\frac {1}{2}}(1-{\frac {0}{6}})=1} === Example 2: Replay the trace (a, b, d) on the model M === Step 1: A token is initiated. There is one produced token ( p = 1 {\displaystyle p=1} ). Step 2: The activity a {\displaystyle \mathbf {a} } consumes 1 token to be fired and produces 2 tokens ( p = 1 + 2 = 3 {\displaystyle p=1+2=3} and c = 1 {\displaystyle c=1} ). Step 3: The activity b {\displaystyle \mathbf {b} } consumes 1 token and produces 1 token ( p = 3 + 1 = 4 {\displaystyle p=3+1=4} and c = 1 + 1 = 2 {\displaystyle c=1+1=2} ). Step 4: The activity d {\displaystyle \mathbf {d} } needs to be fired but there are not enough tokens. One artificial token was produced and the missing token counter is increased by one ( m = 1 {\displaystyle m=1} ). The artificial token and the token at place [ b , d ] {\displaystyle [\mathbf {b,d} ]} are consumed ( c = 2 + 2 = 4 {\displaystyle c=2+2=4} ) and one token is produced at place end ( p = 4 + 1 = 5 {\displaystyle p=4+1=5} ). Step 5: The token in the end place is consumed ( c = 4 + 1 = 5 {\displaystyle c=4+1=5} ). The trace is complete. There is one remaining token at place [ a , c ] {\displaystyle [\mathbf {a,c} ]} ( r = 1 {\displaystyle r=1} ). The fitness of the trace ( a , b , d {\displaystyle \mathbf {a,b,d} } ) on the model M {\displaystyle \mathbf {M} } is: 1 2 ( 1 − m c ) + 1 2 ( 1 − r p ) = 1 2 ( 1 − 1 5 ) + 1 2 ( 1 − 1 5 ) = 0.8 {\displaystyle {\frac {1}{2}}(1-{\frac {m}{c}})+{\frac {1}{2}}(1-{\frac {r}{p}})={\frac {1}{2}}(1-{\frac {1}{5}})+{\frac {1}{2}}(1-{\frac {1}{5}})=0.8}

    Read more →
  • ClearForest

    ClearForest

    ClearForest was an Israeli software company that developed and marketed text analytics and text mining solutions. == History == Founded in 1998, ClearForest had its headquarters just outside Boston and a development center in Or Yehuda. The company was acquired by Reuters in April, 2007. It now markets its services under the names Calais, OpenCalais, and OneCalais. ClearForest was previously venture-backed; its last funding round was led by Greylock Ventures and closed in 2005. Other investors included DB Capital Partners, Pitango, Walden Israel, Booz Allen, JP Morgan Partners and HarbourVest Partners. On February 7, 2008 Reuters announced the launch of Open Calais, a named-entity recognition and semantic analysis service that uses ClearForest technology. On April 30, 2007, Reuters announced that it would acquire ClearForest. Sources estimate the acquisition to be for $25 Million. == Solutions and products == ClearForest offers several hosted solutions, including: OpenCalais, a free web service and open API (for commercial and non-commercial use) that performs named-entity recognition and enables automatic metadata generation using the ClearForest financial module. Semantic Web Services (SWS), an on-demand service that makes ClearForest's natural language processing tools available as a standard web service. A subset of ClearForest's capabilities is available via SWS at no cost. Gnosis, a free Firefox extension that uses SWS to analyze the content of a web page. Gnosis identifies named entities such as people, companies, organizations, geographies and products on the page being viewed. Gnosis also automatically processes pages from Wikipedia, providing additional links for people, geographies and other entities which were not explicitly linked within the subject article. Harvest, a real-time machine-readable news service that uses SWS to process a company's news and document feeds and return machine-readable information about people, companies, locations and over 200 other entities facts and events. ClearForest also offers Text Analytics solutions targeted at specific business problems, including: Equity valuation for hedge funds and alternative investments firms Metadata & database creation for publishers and information providers/services Tapping "voice of customer" for market and survey research firms Quality Early Warning for vehicle, capital equipment & durable goods manufacturers

    Read more →
  • Berlekamp–Rabin algorithm

    Berlekamp–Rabin algorithm

    In number theory, Berlekamp's root finding algorithm, also called the Berlekamp–Rabin algorithm, is the probabilistic method of finding roots of polynomials over the field F p {\displaystyle \mathbb {F} _{p}} with p {\displaystyle p} elements. The method was discovered by Elwyn Berlekamp in 1970 as an auxiliary to the algorithm for polynomial factorization over finite fields. The algorithm was later modified by Rabin for arbitrary finite fields in 1979. The method was also independently discovered before Berlekamp by other researchers. == History == The method was proposed by Elwyn Berlekamp in his 1970 work on polynomial factorization over finite fields. His original work lacked a formal correctness proof and was later refined and modified for arbitrary finite fields by Michael Rabin. In 1986 René Peralta proposed a similar algorithm for finding square roots in F p {\displaystyle \mathbb {F} _{p}} . In 2000 Peralta's method was generalized for cubic equations. == Statement of problem == Let p {\displaystyle p} be an odd prime number. Consider the polynomial f ( x ) = a 0 + a 1 x + ⋯ + a n x n {\textstyle f(x)=a_{0}+a_{1}x+\cdots +a_{n}x^{n}} over the field F p ≃ Z / p Z {\displaystyle \mathbb {F} _{p}\simeq \mathbb {Z} /p\mathbb {Z} } of remainders modulo p {\displaystyle p} . The algorithm should find all λ {\displaystyle \lambda } in F p {\displaystyle \mathbb {F} _{p}} such that f ( λ ) = 0 {\textstyle f(\lambda )=0} in F p {\displaystyle \mathbb {F} _{p}} . == Algorithm == === Randomization === Let f ( x ) = ( x − λ 1 ) ( x − λ 2 ) ⋯ ( x − λ n ) {\textstyle f(x)=(x-\lambda _{1})(x-\lambda _{2})\cdots (x-\lambda _{n})} . Finding all roots of this polynomial is equivalent to finding its factorization into linear factors. To find such factorization it is sufficient to split the polynomial into any two non-trivial divisors and factorize them recursively. To do this, consider the polynomial f z ( x ) = f ( x − z ) = ( x − λ 1 − z ) ( x − λ 2 − z ) ⋯ ( x − λ n − z ) {\textstyle f_{z}(x)=f(x-z)=(x-\lambda _{1}-z)(x-\lambda _{2}-z)\cdots (x-\lambda _{n}-z)} where z {\displaystyle z} is some element of F p {\displaystyle \mathbb {F} _{p}} . If one can represent this polynomial as the product f z ( x ) = p 0 ( x ) p 1 ( x ) {\displaystyle f_{z}(x)=p_{0}(x)p_{1}(x)} then in terms of the initial polynomial it means that f ( x ) = p 0 ( x + z ) p 1 ( x + z ) {\displaystyle f(x)=p_{0}(x+z)p_{1}(x+z)} , which provides needed factorization of f ( x ) {\displaystyle f(x)} . === Classification of === F p {\displaystyle \mathbb {F} _{p}} elements Due to Euler's criterion, for every monomial ( x − λ ) {\displaystyle (x-\lambda )} exactly one of following properties holds: The monomial is equal to x {\displaystyle x} if λ = 0 {\displaystyle \lambda =0} , The monomial divides g 0 ( x ) = ( x ( p − 1 ) / 2 − 1 ) {\textstyle g_{0}(x)=(x^{(p-1)/2}-1)} if λ {\displaystyle \lambda } is quadratic residue modulo p {\displaystyle p} , The monomial divides g 1 ( x ) = ( x ( p − 1 ) / 2 + 1 ) {\textstyle g_{1}(x)=(x^{(p-1)/2}+1)} if λ {\displaystyle \lambda } is quadratic non-residual modulo p {\displaystyle p} . Thus if f z ( x ) {\displaystyle f_{z}(x)} is not divisible by x {\displaystyle x} , which may be checked separately, then f z ( x ) {\displaystyle f_{z}(x)} is equal to the product of greatest common divisors gcd ( f z ( x ) ; g 0 ( x ) ) {\displaystyle \gcd(f_{z}(x);g_{0}(x))} and gcd ( f z ( x ) ; g 1 ( x ) ) {\displaystyle \gcd(f_{z}(x);g_{1}(x))} . === Berlekamp's method === The property above leads to the following algorithm: Explicitly calculate coefficients of f z ( x ) = f ( x − z ) {\displaystyle f_{z}(x)=f(x-z)} , Calculate remainders of x , x 2 , x 2 2 , x 2 3 , x 2 4 , … , x 2 ⌊ log 2 ⁡ p ⌋ {\textstyle x,x^{2},x^{2^{2}},x^{2^{3}},x^{2^{4}},\ldots ,x^{2^{\lfloor \log _{2}p\rfloor }}} modulo f z ( x ) {\displaystyle f_{z}(x)} by squaring the current polynomial and taking remainder modulo f z ( x ) {\displaystyle f_{z}(x)} , Using exponentiation by squaring and polynomials calculated on the previous steps calculate the remainder of x ( p − 1 ) / 2 {\textstyle x^{(p-1)/2}} modulo f z ( x ) {\textstyle f_{z}(x)} , If x ( p − 1 ) / 2 ≢ ± 1 ( mod f z ( x ) ) {\textstyle x^{(p-1)/2}\not \equiv \pm 1{\pmod {f_{z}(x)}}} then gcd {\displaystyle \gcd } mentioned below provide a non-trivial factorization of f z ( x ) {\displaystyle f_{z}(x)} , Otherwise all roots of f z ( x ) {\displaystyle f_{z}(x)} are either residues or non-residues simultaneously and one has to choose another z {\displaystyle z} . If f ( x ) {\displaystyle f(x)} is divisible by some non-linear primitive polynomial g ( x ) {\displaystyle g(x)} over F p {\displaystyle \mathbb {F} _{p}} then when calculating gcd {\displaystyle \gcd } with g 0 ( x ) {\displaystyle g_{0}(x)} and g 1 ( x ) {\displaystyle g_{1}(x)} one will obtain a non-trivial factorization of f z ( x ) / g z ( x ) {\displaystyle f_{z}(x)/g_{z}(x)} , thus algorithm allows to find all roots of arbitrary polynomials over F p {\displaystyle \mathbb {F} _{p}} . === Modular square root === Consider equation x 2 ≡ a ( mod p ) {\textstyle x^{2}\equiv a{\pmod {p}}} having elements β {\displaystyle \beta } and − β {\displaystyle -\beta } as its roots. Solution of this equation is equivalent to factorization of polynomial f ( x ) = x 2 − a = ( x − β ) ( x + β ) {\textstyle f(x)=x^{2}-a=(x-\beta )(x+\beta )} over F p {\displaystyle \mathbb {F} _{p}} . In this particular case problem it is sufficient to calculate only gcd ( f z ( x ) ; g 0 ( x ) ) {\displaystyle \gcd(f_{z}(x);g_{0}(x))} . For this polynomial exactly one of the following properties will hold: GCD is equal to 1 {\displaystyle 1} which means that z + β {\displaystyle z+\beta } and z − β {\displaystyle z-\beta } are both quadratic non-residues, GCD is equal to f z ( x ) {\displaystyle f_{z}(x)} which means that both numbers are quadratic residues, GCD is equal to ( x − t ) {\displaystyle (x-t)} which means that exactly one of these numbers is quadratic residue. In the third case GCD is equal to either ( x − z − β ) {\displaystyle (x-z-\beta )} or ( x − z + β ) {\displaystyle (x-z+\beta )} . It allows to write the solution as β = ( t − z ) ( mod p ) {\textstyle \beta =(t-z){\pmod {p}}} . === Example === Assume we need to solve the equation x 2 ≡ 5 ( mod 11 ) {\textstyle x^{2}\equiv 5{\pmod {11}}} . For this we need to factorize f ( x ) = x 2 − 5 = ( x − β ) ( x + β ) {\displaystyle f(x)=x^{2}-5=(x-\beta )(x+\beta )} . Consider some possible values of z {\displaystyle z} : Let z = 3 {\displaystyle z=3} . Then f z ( x ) = ( x − 3 ) 2 − 5 = x 2 − 6 x + 4 {\displaystyle f_{z}(x)=(x-3)^{2}-5=x^{2}-6x+4} , thus gcd ( x 2 − 6 x + 4 ; x 5 − 1 ) = 1 {\displaystyle \gcd(x^{2}-6x+4;x^{5}-1)=1} . Both numbers 3 ± β {\displaystyle 3\pm \beta } are quadratic non-residues, so we need to take some other z {\displaystyle z} . Let z = 2 {\displaystyle z=2} . Then f z ( x ) = ( x − 2 ) 2 − 5 = x 2 − 4 x − 1 {\displaystyle f_{z}(x)=(x-2)^{2}-5=x^{2}-4x-1} , thus gcd ( x 2 − 4 x − 1 ; x 5 − 1 ) ≡ x − 9 ( mod 11 ) {\textstyle \gcd(x^{2}-4x-1;x^{5}-1)\equiv x-9{\pmod {11}}} . From this follows x − 9 = x − 2 − β {\textstyle x-9=x-2-\beta } , so β ≡ 7 ( mod 11 ) {\displaystyle \beta \equiv 7{\pmod {11}}} and − β ≡ − 7 ≡ 4 ( mod 11 ) {\textstyle -\beta \equiv -7\equiv 4{\pmod {11}}} . A manual check shows that, indeed, 7 2 ≡ 49 ≡ 5 ( mod 11 ) {\textstyle 7^{2}\equiv 49\equiv 5{\pmod {11}}} and 4 2 ≡ 16 ≡ 5 ( mod 11 ) {\textstyle 4^{2}\equiv 16\equiv 5{\pmod {11}}} . == Correctness proof == The algorithm finds factorization of f z ( x ) {\displaystyle f_{z}(x)} in all cases except for ones when all numbers z + λ 1 , z + λ 2 , … , z + λ n {\displaystyle z+\lambda _{1},z+\lambda _{2},\ldots ,z+\lambda _{n}} are quadratic residues or non-residues simultaneously. According to theory of cyclotomy, the probability of such an event for the case when λ 1 , … , λ n {\displaystyle \lambda _{1},\ldots ,\lambda _{n}} are all residues or non-residues simultaneously (that is, when z = 0 {\displaystyle z=0} would fail) may be estimated as 2 − k {\displaystyle 2^{-k}} where k {\displaystyle k} is the number of distinct values in λ 1 , … , λ n {\displaystyle \lambda _{1},\ldots ,\lambda _{n}} . In this way even for the worst case of k = 1 {\displaystyle k=1} and f ( x ) = ( x − λ ) n {\displaystyle f(x)=(x-\lambda )^{n}} , the probability of error may be estimated as 1 / 2 {\displaystyle 1/2} and for modular square root case error probability is at most 1 / 4 {\displaystyle 1/4} . == Complexity == Let a polynomial have degree n {\displaystyle n} . We derive the algorithm's complexity as follows: Due to the binomial theorem ( x − z ) k = ∑ i = 0 k ( k i ) ( − z ) k − i x i {\textstyle (x-z)^{k}=\sum \limits _{i=0}^{k}{\binom {k}{i}}(-z)^{k-i}x^{i}} , we may transition from f ( x ) {\displaystyle f(x)} to f ( x − z ) {\displaystyle f(x-z)} in O ( n 2 ) {\displaystyle O(n^{2})} time. Polynomial multiplication a

    Read more →
  • Information flow

    Information flow

    In discourse-based grammatical theory, information flow is any tracking of referential information by speakers. Information may be new, i.e., just introduced into the conversation; given, i.e., already active in the speakers' consciousness; or old, i.e., no longer active. The various types of activation, and how these are defined, are model-dependent. Information flow affects grammatical structures such as: Word order (topic, focus, and afterthought constructions). Active, passive, or middle voice. Choice of deixis, such as articles; "medial" deictics such as Spanish ese and Japanese sore are generally determined by the familiarity of a referent rather than by physical distance. Overtness of information, such as whether an argument of a verb is indicated by a lexical noun phrase, a pronoun, or not mentioned at all. Clefting: Splitting a single clause into two clauses, each with its own verb, e.g. ‘The chicken turtles tasted like chicken.’ becomes ‘It was the chicken turtle | that tasted like chicken.’ In this case, clefting is used to shift the focus of the sentence to the subject, the chicken turtle. Front focus: Placing at the start (front) of a sentence information that would normally occur later in the sentence, to give it extra prominence. For example, in pop culture, Yoda's speech often utilizes such syntactic construction, such as when he says 'much to learn you still have' to Luke Skywalker. End focus (or end weight): Given or familiar information followed by new information. This gives prominence to the final part of the sentences and can enable suspense to build, e.g. ‘Through the door came a gigantic wolf’.(Umer Prince)

    Read more →
  • Transaction data

    Transaction data

    Transaction data or transaction information is a category of data describing transactions. Transaction data/information gather variables generally referring to reference data or master data – e.g. dates, times, time zones, currencies. Typical transactions are: Financial transactions about orders, invoices, payments; Work transactions about plans, activity records; Logistic transactions about deliveries, storage records, travel records, etc.. == Management == Recording and storing transactions is called records management. The record of the transaction is stored in a place where the retention can be guaranteed and where data is archived or removed following a retention period. Formats of recorded transactions can be digital data in databases and spreadsheets, or handwritten texts in physical documents like former bankbooks. Transaction processing systems are application software that generate transactions and manage transaction data/information, e.g. SAP and Oracle Financials. == Data warehousing == Transaction data can be summarised in a data warehouse, which helps accessibility and analysis of the data.

    Read more →
  • Insider threat

    Insider threat

    An insider threat is a perceived threat to an organization that comes from people within the organization, such as employees, former employees, contractors or business associates, who have inside information concerning the organization's security practices, data and computer systems. The threat may involve fraud, the theft of confidential or commercially valuable information, the theft of intellectual property, or the sabotage of computer systems. == Overview == Insiders may have accounts giving them legitimate access to computer systems, with this access originally having been given to them to serve in the performance of their duties; these permissions could be abused to harm the organization. Insiders are often familiar with the organization's data and intellectual property as well as the methods that are in place to protect them. This makes it easier for the insider to circumvent any security controls of which they are aware. Physical proximity to data means that the insider does not need to hack into the organizational network through the outer perimeter by traversing firewalls; rather they are in the building already, often with direct access to the organization's internal network. Insider threats are harder to defend against than attacks from outsiders, since the insider already has legitimate access to the organization's information and assets. An insider may attempt to steal property or information for personal gain or to benefit another organization or country. The threat to the organization could also be through malicious software left running on its computer systems by former employees, a so-called logic bomb. == Research == Insider threat is an active area of research in academia and government. The CERT Coordination Center at Carnegie-Mellon University maintains the CERT Insider Threat Center, which includes a database of more than 850 cases of insider threats, including instances of fraud, theft and sabotage; the database is used for research and analysis. CERT's Insider Threat Team also maintains an informational blog to help organizations and businesses defend themselves against insider crime. The Threat Lab and Defense Personnel and Security Research Center (DOD PERSEREC) has also recently emerged as a national resource within the United States of America. The Threat Lab hosts an annual conference, the SBS Summit. They also maintain a website that contains resources from this conference. Complimenting these efforts, a companion podcast was created, Voices from the SBS Summit. In 2022, the Threat Lab created an interdisciplinary journal, Counter Insider Threat Research and Practice (CITRAP) which publishes research on insider threat detection. === Findings === In the 2022 Data Breach Investigations Report (DBIR), Verizon found that 82% of breaches involved the human element, noting that employees continue to play a leading role in cybersecurity incidents and breaches. According to the UK Information Commissioners Office, 90% of all breaches reported to them in 2019 were the result of mistakes made by end users. This was up from 61% and 87% over the previous two years. A 2018 whitepaper reported that 53% of companies surveyed had confirmed insider attacks against their organization in the previous 12 months, with 27% saying insider attacks have become more frequent. A report published in July 2012 on the insider threat in the U.S. financial sector gives some statistics on insider threat incidents: 80% of the malicious acts were committed at work during working hours; 81% of the perpetrators planned their actions beforehand; 33% of the perpetrators were described as "difficult" and 17% as being "disgruntled". The insider was identified in 74% of cases. Financial gain was a motive in 81% of cases, revenge in 23% of cases, and 27% of the people carrying out malicious acts were in financial difficulties at the time. The US Department of Defense Personnel Security Research Center published a report that describes approaches for detecting insider threats. Earlier it published ten case studies of insider attacks by information technology professionals. Cybersecurity experts believe that 38% of negligent insiders are victims of a phishing attack, whereby they receive an email that appears to come from a legitimate source such as a company. These emails normally contain malware in the form of hyperlinks. == Typologies and ontologies == Multiple classification systems and ontologies have been proposed to classify insider threats. Traditional models of insider threat identify three broad categories: Malicious insiders, which are people who take advantage of their access to inflict harm on an organization; Negligent insiders, which are people who make errors and disregard policies, which place their organizations at risk; and Infiltrators, who are external actors that obtain legitimate access credentials without authorization. == Criticisms == Insider threat research has been criticized. Critics have argued that insider threat is a poorly defined concept. Forensically investigating insider data theft is notoriously difficult, and requires novel techniques such as stochastic forensics. Data supporting insider threat is generally proprietary (i.e., encrypted data). Theoretical/conceptual models of insider threat are often based on loose interpretations of research in the behavioral and social sciences, using "deductive principles and intuitions of subject matter expert." Adopting sociotechnical approaches, researchers have also argued for the need to consider insider threat from the perspective of social systems. Jordan Schoenherr said that "surveillance requires an understanding of how sanctioning systems are framed, how employees will respond to surveillance, what workplace norms are deemed relevant, and what ‘deviance’ means, e.g., deviation for a justified organization norm or failure to conform to an organizational norm that conflicts with general social values." By treating all employees as potential insider threats, organizations might create conditions that lead to insider threats. == Sector-specific concerns == === Healthcare === The healthcare industry faces particularly acute insider threat risks due to the large number of workforce members who require access to sensitive patient records for legitimate clinical purposes. The U.S. Department of Health and Human Services has identified unauthorized access by insiders, including workforce snooping on patient records and theft of protected health information for identity fraud, as a persistent enforcement concern. The Health Insurance Portability and Accountability Act (HIPAA) Security Rule addresses insider threats through several administrative safeguards, including workforce security procedures requiring covered entities to implement policies for authorizing and supervising workforce members who work with electronic protected health information, as well as termination procedures to revoke access when employment ends (45 CFR 164.308(a)(3)). The rule also requires audit controls to record and examine information system activity (45 CFR 164.312(b)), enabling detection of unauthorized access by insiders. The December 2024 Notice of proposed rulemaking (NPRM) to overhaul the HIPAA Security Rule would strengthen insider threat defenses by mandating role-based access controls, requiring notification of relevant workforce members within 24 hours of any changes to access privileges, and requiring regular review of audit logs to detect anomalous access patterns.

    Read more →
  • Learning augmented algorithm

    Learning augmented algorithm

    A learning augmented algorithm (also called algorithm with predictions) is an algorithm that can make use of a prediction to improve its performance. Whereas in regular algorithms just the problem instance is inputted, learning augmented algorithms accept an extra parameter. This extra parameter often is a prediction of some property of the solution. This prediction is then used by the algorithm to improve its running time or the quality of its output. The most common application are online algorithms, where a prediction on the uncertain instance is provided. == Description == A learning augmented algorithm typically takes an input ( I , A ) {\displaystyle ({\mathcal {I}},{\mathcal {A}})} . Here I {\displaystyle {\mathcal {I}}} is a problem instance and A {\displaystyle {\mathcal {A}}} is the prediction. A prediction can be any object. Common are the following types: Prediction of an optimal solution. The prediction gives a solution to the problem or characterizes an optimal solution. Prediction of the input. This is mainly used for online problems. Prediction of algorithmic actions. A prediction tailored to a specific algorithm that suggests a specific algorithm execution. Learning augmented algorithms usually satisfy the following three properties: Consistency. A learning augmented algorithm is said to be consistent if the algorithm can be proven to have a good performance when it is provided with an accurate prediction. Smoothness. A learning augmented algorithm is called smooth if its performance can be bounded by a function of the quality of the prediction. Here, the quality can be measured in a problem specific way. This is also called the prediction error. Robustness. A learning augmented algorithm is called robust if its worst-case performance can be bounded even if the given prediction is inaccurate. Learning augmented algorithms generally do not prescribe how the prediction should be done. For this purpose machine learning can be used. == Applications == A few examples of problems where learning augmented algorithms have been applied are the following. === Online algorithms === The ski rental problem The weighted paging problem The set cover problem Nonclairvoyant scheduling The online bipartite matching problem === Warm starting === ==== Data structures ==== The binary search algorithm is an algorithm for finding elements of a sorted list x 1 , … , x n {\displaystyle x_{1},\ldots ,x_{n}} . It needs O ( log ⁡ ( n ) ) {\displaystyle O(\log(n))} steps to find an element with some known value y {\displaystyle y} in a list of length n {\displaystyle n} . With a prediction i {\displaystyle i} for the position of y {\displaystyle y} , the following learning augmented algorithm can be used. First, look at position i {\displaystyle i} in the list. If x i = y {\displaystyle x_{i}=y} , the element has been found. If x i < y {\displaystyle x_{i} y {\displaystyle x_{i}>y} , do the same as in the previous case, but instead consider i − 1 , i − 2 , i − 4 , … {\displaystyle i-1,i-2,i-4,\ldots } . The error is defined to be η = | i − i ∗ | {\displaystyle \eta =|i-i^{}|} , where i ∗ {\displaystyle i^{}} is the real index of y {\displaystyle y} . In the learning augmented algorithm, probing the positions i + 1 , i + 2 , i + 4 , … {\displaystyle i+1,i+2,i+4,\ldots } takes log 2 ⁡ ( η ) {\displaystyle \log _{2}(\eta )} steps. Then a binary search is performed on a list of size at most 2 η {\displaystyle 2\eta } , which takes log 2 ⁡ ( η ) {\displaystyle \log _{2}(\eta )} steps. This makes the total running time of the algorithm 2 log 2 ⁡ ( η ) {\displaystyle 2\log _{2}(\eta )} . So, when the error is small, the algorithm is faster than a normal binary search. This shows that the algorithm is consistent. Even in the worst case, the error will be at most n {\displaystyle n} . Then the algorithm takes at most O ( log ⁡ ( n ) ) {\displaystyle O(\log(n))} steps, so the algorithm is robust. ==== More examples ==== The maximum weight matching problem === Approximation algorithms === The maximum cut problem The vertex cover problem === Mechanism Design === The facility location problem

    Read more →
  • Reservoir sampling

    Reservoir sampling

    Reservoir sampling is a family of randomized algorithms for choosing a simple random sample, without replacement, of k items from a population of unknown size n in a single pass over the items. The size of the population n is not known to the algorithm and is typically too large for all n items to fit into main memory. The population is revealed to the algorithm over time, and the algorithm cannot look back at previous items. At any point, the current state of the algorithm must permit extraction of a simple random sample without replacement of size k over the part of the population seen so far. == Motivation == Suppose we see a sequence of items, one at a time. We want to keep 10 items in memory, and we want them to be selected at random from the sequence. If we know the total number of items n and can access the items arbitrarily, then the solution is easy: select 10 distinct indices i between 1 and n with equal probability, and keep the i-th elements. The problem is that we do not always know the exact n in advance. == Simple: Algorithm R == A simple and popular but slow algorithm, Algorithm R, was created by Jeffrey Vitter. Initialize an array R {\displaystyle R} indexed from 1 {\displaystyle 1} to k {\displaystyle k} , containing the first k items of the input x 1 , . . . , x k {\displaystyle x_{1},...,x_{k}} . This is the reservoir. For each new input x i {\displaystyle x_{i}} , generate a random number j uniformly in { 1 , . . . , i } {\displaystyle \{1,...,i\}} . If j ∈ { 1 , . . . , k } {\displaystyle j\in \{1,...,k\}} , then set R [ j ] := x i . {\displaystyle R[j]:=x_{i}.} Otherwise, discard x i {\displaystyle x_{i}} . Return R {\displaystyle R} after all inputs are processed. This algorithm works by induction on i ≥ k {\displaystyle i\geq k} . While conceptually simple and easy to understand, this algorithm needs to generate a random number for each item of the input, including the items that are discarded. The algorithm's asymptotic running time is thus O ( n ) {\displaystyle O(n)} . Generating this amount of randomness and the linear run time causes the algorithm to be unnecessarily slow if the input population is large. This is Algorithm R, implemented as follows: == Optimal: Algorithm L == If we generate n {\displaystyle n} random numbers u 1 , . . . , u n ∼ U [ 0 , 1 ] {\displaystyle u_{1},...,u_{n}\sim U[0,1]} independently, then the indices of the smallest k {\displaystyle k} of them is a uniform sample of the k {\displaystyle k} -subsets of { 1 , . . . , n } {\displaystyle \{1,...,n\}} . The process can be done without knowing n {\displaystyle n} : Keep the smallest k {\displaystyle k} of u 1 , . . . , u i {\displaystyle u_{1},...,u_{i}} that has been seen so far, as well as w i {\displaystyle w_{i}} , the index of the largest among them. For each new u i + 1 {\displaystyle u_{i+1}} , compare it with u w i {\displaystyle u_{w_{i}}} . If u i + 1 < u w i {\displaystyle u_{i+1} Read more →

  • Information professional

    Information professional

    The term information professional or information specialist refers to professionals responsible for the collection, documentation, organization, storage, preservation, retrieval, and dissemination of printed and digital information. The service delivered to the client is known as an information service. The term "information professional" is a versatile one, used to describe similar and sometimes overlapping professions, such as librarians, archivists, information managers, information systems specialists, information scientists, records managers, and information consultants. However, terminology differs among sources and organisations. Information professionals are employed in a variety of private, public, and academic institutions, as well as independently. == Skills == Since the term information professional is broad, the skills required for this profession are also varied. A Gartner report in 2011 pointed out that "Professional roles focused on information management will be different to that of established IT roles. An 'information professional' will not be one type of role or skill set, but will in fact have a number of specializations". Thus, an information professional can possess a variety of different skills, depending on the sector in which the person is employed. Some essential cross-sector skills are: IT skills, such as word-processing and spreadsheets, digitisation skills, and conducting Internet searches, together with skills loan systems, databases, content management systems, and specially designed programmes and packages. Customer service. An information professional should have the ability to address the information needs of customers. Language proficiency. This is essential in order to manage the information at hand and deal with customer needs. Soft skills. These include skills such as negotiating, conflict resolution, and time management. Management training. An information professional should be familiar with notions such as strategic planning and project management. Moreover, an information professional should be skilled in planning and using relevant systems, in capturing and securing information, and in accessing it to deliver service whenever the information is required. == Associations == Most countries have a professional association who oversee the professional and academic standards of librarians and other information professionals. There are also international associations related to LIS (library and information science), the most prominent of which is the International Federation of Library Associations and Institutions (IFLA). In many countries, LIS courses are accredited by the relevant professional association, as the American Library Association (ALA) in the USA, the Chartered Institute of Library and Information Professionals (CILIP) in the UK, and the Australian Library and Information Association (ALIA) in Australia. == Qualifications == Educational institutions around the world offer academic degrees, or degrees on related subjects such as Archival Studies, Information Systems, Information Management, and Records Management. Some of the institutions offering information science education refer to themselves as an iSchool, such as the CiSAP (Consortium of iSchools Asia Pacific, founded 2006) in Asia and the iSchool Caucus in the USA. There are also online e-learning resources, some of which offer certification for information professionals. === Africa === Information development in Africa started later than in other continents, mainly due to a lack of internet access, expertise and resources to manage digital infrastructure, and "opportunities for capacity development and knowledge-sharing". Nowadays, academic degrees in information studies are available at many universities of African countries, such as the University of Pretoria (South Africa), University of Nairobi (Kenya), Makerere University (Uganda), University of Botswana (Botswana), and University of Nigeria (Nigeria). === Asia === LIS-related studies are available in more than 30 Asian countries. Some examples listed by iSchools Inc. are the University of Hong Kong, University of Tsukuba, Japan, Yonsei University, South Korea, National Taiwan University and Wuhan University, China. Centre of Library and Information Management Science (CLIMS) at Tata Institute of Social Science in Mumbai, India. In Southeast Asia, the Congress of Southeast Asian Librarians (CONSAL) connects librarians and libraries in more than 10 countries with resources, networking opportunities, and support for growing library systems. === Australasia === The Australian Library and Information Association (ALIA) as of 2021 lists six schools offering undergraduate and postgraduate accredited university courses for "Librarian and Information Specialists" on their website. In New Zealand, the Open Polytechnic of New Zealand and the Victoria University of Wellington offer undergraduate and postgraduate degree courses for information professionals. === Europe === The majority of European countries have universities, colleges, or schools which offer bachelor's degrees in LIS studies. Over 40 universities offer master's degrees in LIS-related fields, and many institutions, such as the Swedish School of Library and Information Science at the University of Borås (Sweden), the University of Barcelona (Spain), Loughborough University (UK), and Aberystwyth University (Wales, UK) also offer PhD degrees. === North America === Information studies and degrees are available at numerous academic institutions throughout the U.S. and Canada. U.S. professional associations, together with their European counterparts, have undertaken many educational initiatives and pioneered many advances in the field of Information studies, such as increased interdisciplinarity and more effective delivery of distance learning. The Association for Intelligent Information Management, based in Silver Spring, Maryland, offers a qualification called Certified Information Professional (CIP), earned upon passing an examination, with certification remaining valid for three years. === South America === There are many schools and colleges in Latin America, which offer courses in Library Science, Archival Studies, and Information Studies, however these subjects are taught completely separately.

    Read more →
  • Symbaloo

    Symbaloo

    Symbaloo is a cloud-based site that allows users to organize and categorize web links in the form of buttons. Symbaloo works from a web browser and can be configured as a homepage, allowing users to create a personalized virtual desktop accessible from any device with an Internet connection. Symbaloo users, which must be previously registered, have a page with a grid of buttons that can be configured to link to a specific page. The site allows users to assign different colors to the buttons for easy visual classification. Symbaloo allows a single user to create different pages or screens with buttons. These screens called webmix are useful to separate topics and links can be shared with other users, making them public and sending the link via email. As of 2015 Symbaloo has 6 million users worldwide and mainly used as an online education resource. Symbaloo's slogan is "Start Simple".

    Read more →
  • Broadcast (parallel pattern)

    Broadcast (parallel pattern)

    Broadcast is a collective communication primitive in parallel programming to distribute programming instructions or data to nodes in a cluster. It is the reverse operation of reduction. The broadcast operation is widely used in parallel algorithms, such as matrix-vector multiplication, Gaussian elimination and shortest paths. The Message Passing Interface implements broadcast in MPI_Bcast. == Definition == A message M [ 1.. m ] {\displaystyle M[1..m]} of length m {\displaystyle m} should be distributed from one node to all other p − 1 {\displaystyle p-1} nodes. T byte {\displaystyle T_{\text{byte}}} is the time it takes to send one byte. T start {\displaystyle T_{\text{start}}} is the time it takes for a message to travel to another node, independent of its length. Therefore, the time to send a package from one node to another is t = s i z e × T byte + T start {\displaystyle t=\mathrm {size} \times T_{\text{byte}}+T_{\text{start}}} . p {\displaystyle p} is the number of nodes and the number of processors. == Binomial Tree Broadcast == With Binomial Tree Broadcast the whole message is sent at once. Each node that has already received the message sends it on further. This grows exponentially as each time step the amount of sending nodes is doubled. The algorithm is ideal for short messages but falls short with longer ones as during the time when the first transfer happens only one node is busy. Sending a message to all nodes takes log 2 ⁡ ( p ) t {\displaystyle \log _{2}(p)t} time which results in a runtime of log 2 ⁡ ( p ) ( m T byte + T start ) {\displaystyle \log _{2}(p)(mT_{\text{byte}}+T_{\text{start}})} == Linear Pipeline Broadcast == The message is split up into k {\displaystyle k} packages and sent piecewise from node n {\displaystyle n} to node n + 1 {\displaystyle n+1} . The time needed to distribute the first message piece is p t = m k T byte + T start {\textstyle pt={\frac {m}{k}}T_{\text{byte}}+T_{\text{start}}} whereby t {\displaystyle t} is the time needed to send a package from one processor to another. Sending a whole message takes ( p + k ) ( m T byte k + T start ) = ( p + k ) t = p t + k t {\displaystyle (p+k)\left({\frac {mT_{\text{byte}}}{k}}+T_{\text{start}}\right)=(p+k)t=pt+kt} . Optimal is to choose k = m ( p − 2 ) T byte T start {\displaystyle k={\sqrt {\frac {m(p-2)T_{\text{byte}}}{T_{\text{start}}}}}} resulting in a runtime of approximately m T byte + p T start + m p T start T byte {\displaystyle mT_{\text{byte}}+pT_{\text{start}}+{\sqrt {mpT_{\text{start}}T_{\text{byte}}}}} The run time is dependent on not only message length but also the number of processors that play roles. This approach shines when the length of the message is much larger than the amount of processors. == Pipelined Binary Tree Broadcast == This algorithm combines Binomial Tree Broadcast and Linear Pipeline Broadcast, which makes the algorithm work well for both short and long messages. The aim is to have as many nodes work as possible while maintaining the ability to send short messages quickly. A good approach is to use Fibonacci trees for splitting up the tree, which are a good choice as a message cannot be sent to both children at the same time. This results in a binary tree structure. We will assume in the following that communication is full-duplex. The Fibonacci tree structure has a depth of about d ≈ log Φ ⁡ ( p ) {\displaystyle d\approx \log _{\Phi }(p)} whereby Φ = 1 + 5 2 {\displaystyle \Phi ={\frac {1+{\sqrt {5}}}{2}}} the golden ratio. The resulting runtime is ( m k T byte + T start ) ( d + 2 k − 2 ) {\textstyle ({\frac {m}{k}}T_{\text{byte}}+T_{\text{start}})(d+2k-2)} . Optimal is k = n ( d − 2 ) T byte 3 T start {\displaystyle k={\sqrt {\frac {n(d-2)T_{\text{byte}}}{3T_{\text{start}}}}}} . This results in a runtime of 2 m T byte + T start log Φ ⁡ ( p ) + 2 m log Φ ⁡ ( p ) T start T byte {\displaystyle 2mT_{\text{byte}}+T_{\text{start}}\log _{\Phi }(p)+{\sqrt {2m\log _{\Phi }(p)T_{\text{start}}T_{\text{byte}}}}} . == Two Tree Broadcast (23-Broadcast) == === Definition === This algorithm aims to improve on some disadvantages of tree structure models with pipelines. Normally in tree structure models with pipelines (see above methods), leaves receive just their data and cannot contribute to send and spread data. The algorithm concurrently uses two binary trees to communicate over. Those trees will be called tree A and B. Structurally in binary trees there are relatively more leave nodes than inner nodes. Basic Idea of this algorithm is to make a leaf node of tree A be an inner node of tree B. It has also the same technical function in opposite side from B to A tree. This means, two packets are sent and received by inner nodes and leaves in different steps. === Tree construction === The number of steps needed to construct two parallel-working binary trees is dependent on the amount of processors. Like with other structures one processor can is the root node who sends messages to two trees. It is not necessary to set a root node, because it is not hard to recognize that the direction of sending messages in binary tree is normally top to bottom. There is no limitation on the number of processors to build two binary trees. Let the height of the combined tree be h = ⌈log(p + 2)⌉. Tree A and B can have a height of h − 1 {\displaystyle h-1} . Especially, if the number of processors correspond to p = 2 h − 1 {\displaystyle p=2^{h}-1} , we can make both sides trees and a root node. To construct this model efficiently and easily with a fully built tree, we can use two methods called "Shifting" and "Mirroring" to get second tree. Let assume tree A is already modeled and tree B is supposed to be constructed based on tree A. We assume that we have p {\displaystyle p} processors ordered from 0 to p − 1 {\displaystyle p-1} . ==== Shifting ==== The "Shifting" method, first copies tree A and moves every node one position to the left to get tree B. The node, which will be located on -1, becomes a child of processor p − 2 {\displaystyle p-2} . ==== Mirroring ==== "Mirroring" is ideal for an even number of processors. With this method tree B can be more easily constructed by tree A, because there are no structural transformations in order to create the new tree. In addition, a symmetric process makes this approach simple. This method can also handle an odd number of processors, in this case, we can set processor p − 1 {\displaystyle p-1} as root node for both trees. For the remaining processors "Mirroring" can be used. === Coloring === We need to find a schedule in order to make sure that no processor has to send or receive two messages from two trees in a step. The edge, is a communication connection to connect two nodes, and can be labelled as either 0 or 1 to make sure that every processor can alternate between 0 and 1-labelled edges. The edges of A and B can be colored with two colors (0 and 1) such that no processor is connected to its parent nodes in A and B using edges of the same color- no processor is connected to its children nodes in A or B using edges of the same color. In every even step the edges with 0 are activated and edges with 1 are activated in every odd step. === Time complexity === In this case the number of packet k is divided in half for each tree. Both trees are working together the total number of packets k = k / 2 + k / 2 {\displaystyle k=k/2+k/2} (upper tree + bottom tree) In each binary tree sending a message to another nodes takes 2 i {\displaystyle 2i} steps until a processor has at least a packet in step i {\displaystyle i} . Therefore, we can calculate all steps as d := log 2 ⁡ ( p + 1 ) ⇒ log 2 ⁡ ( p + 1 ) ≈ log 2 ⁡ ( p ) {\displaystyle d:=\log _{2}(p+1)\Rightarrow \log _{2}(p+1)\approx \log _{2}(p)} . The resulting run time is T ( m , p , k ) ≈ ( m k T byte + T start ) ( 2 d + k − 1 ) {\textstyle T(m,p,k)\approx ({\frac {m}{k}}T_{\text{byte}}+T_{\text{start}})(2d+k-1)} . (Optimal k = m ( 2 d − 1 ) T byte / T start {\textstyle k={\sqrt {{m(2d-1)T_{\text{byte}}}/{T_{\text{start}}}}}} ) This results in a run time of T ( m , p ) ≈ m T byte + T start ⋅ 2 log 2 ⁡ ( p ) + m ⋅ 2 log 2 ⁡ ( p ) T start T byte {\displaystyle T(m,p)\approx mT_{\text{byte}}+T_{\text{start}}\cdot 2\log _{2}(p)+{\sqrt {m\cdot 2\log _{2}(p)T_{\text{start}}T_{\text{byte}}}}} . == ESBT-Broadcasting (Edge-disjoint Spanning Binomial Trees) == In this section, another broadcasting algorithm with an underlying telephone communication model will be introduced. A Hypercube creates network system with p = 2 d ( d = 0 , 1 , 2 , 3 , . . . ) {\displaystyle p=2^{d}(d=0,1,2,3,...)} . Every node is represented by binary 0 , 1 {\displaystyle {0,1}} depending on the number of dimensions. Fundamentally ESBT(Edge-disjoint Spanning Binomial Trees) is based on hypercube graphs, pipelining( m {\displaystyle m} messages are divided by k {\displaystyle k} packets) and binomial trees. The Processor 0 d {\displaystyle 0^{d}} cyclically spreads packets to roots of ESB

    Read more →
  • ArchiMate

    ArchiMate

    ArchiMate ( AR-ki-mayt) is an open and independent enterprise architecture modeling language to support the description, analysis and visualization of architecture within and across business domains in an unambiguous way. ArchiMate is a technical standard from The Open Group and is based on concepts from the now superseded IEEE 1471 standard. It is supported by various tool vendors and consulting firms. ArchiMate is also a registered trademark of The Open Group. The Open Group has a certification program for ArchiMate users, software tools and courses. ArchiMate distinguishes itself from other languages such as Unified Modeling Language (UML) and Business Process Modeling and Notation (BPMN) by its enterprise modelling scope. Also, UML and BPMN are meant for a specific use and they are quite heavy – containing about 150 (UML) and 250 (BPMN) modeling concepts whereas ArchiMate works with just about 50 (in version 2.0). The goal of ArchiMate is to be ”as small as possible”, not to cover every edge scenario imaginable. To be easy to learn and apply, ArchiMate was intentionally restricted “to the concepts that suffice for modeling the proverbial 80% of practical cases". == Overview == ArchiMate offers a common language for describing the construction and operation of business processes, organizational structures, information flows, IT systems, and technical infrastructure. This insight helps the different stakeholders to design, assess, and communicate the consequences of decisions and changes within and between these business domains. The main concepts and relationships of the ArchiMate language can be seen as a framework, the so-called Archimate Framework: It divides the enterprise architecture into a business, application and technology layer. In each layer, three aspects are considered: active elements, an internal structure and elements that define use or communicate information. One of the objectives of the ArchiMate language is to define the relationships between concepts in different architecture domains. The concepts of this language therefore hold the middle between the detailed concepts, which are used for modeling individual domains (for example, the Unified Modeling Language (UML) for modeling software products), and Business Process Model and Notation (BPMN), which is used for business process modeling. == History == ArchiMate is partly based on the now superseded IEEE 1471 standard. It was developed in the Netherlands by a project team from the Telematica Instituut in cooperation with several Dutch partners from government, industry and academia. Among the partners were Ordina NV, Radboud Universiteit Nijmegen, the Leiden Institute for Advanced Computer Science (LIACS) and the Centrum Wiskunde & Informatica (CWI). Later, tests were performed in organizations such as ABN AMRO, the Dutch Tax and Customs Administration and the ABP. The development process lasted from July 2002 to December 2004, and took about 35 person years and approximately 4 million euros. The development was funded by the Dutch government (Dutch Tax and Customs Administration), and business partners, including ABN AMRO and the ABP Pension Fund. In 2008 the ownership and stewardship of ArchiMate was transferred to The Open Group. It is now managed by the ArchiMate Forum within The Open Group. In February 2009 The Open Group published the ArchiMate 1.0 standard as a formal technical standard. In January 2012 the ArchiMate 2.0 standard, and in 2013 the ArchiMate 2.1 standard was released. In June 2016, the Open Group released version 3.0 of the ArchiMate Specification. An update to Archimate 3.0.1 came out in August 2017. Archimate 3.1 was published 5 November 2019. The latest version of the ArchiMate Specification is version 3.2 released October 2022. Version 3.0 adds enhanced support for capability-oriented strategic modelling, new entities representing physical resources (for modelling the ingredients, equipment and transport resources used in the physical world) and a generic metamodel showing the entity types and the relationships between them. == ArchiMate framework == === Core framework === The main concepts and elements of the ArchiMate language are being presented as ArchiMate core framework. It consists of three layers and three aspects. This creates a matrix of combinations. Every layer has its passive structure, behavior and active structure aspects. ==== Layers ==== ArchiMate has a layered and service-oriented look on architectural models. The higher layers make use of services that are provided by the lower layers. Although, at an abstract level, the concepts that are used within each layer are similar, we define more concrete concepts that are specific for a certain layer. In this context, we distinguish three main layers: The business layer is about business processes, services, functions and events of business units. This layer "offers products and services to external customers, which are realized in the organization by business processes performed by business actors and roles". The application layer is about software applications that "support the components in the business with application services". The technology layer deals "with the hardware and communication infrastructure to support the application layer. This layer offers infrastructural services needed to run applications, realized by computer and communication hardware and system software". Each of these main layers can be further divided in sub-layers. For example, in the business layer, the primary business processes realising the products of a company may make use of a layer of secondary (supporting) business processes; in the application layer, the end-user applications may make use of generic services offered by supporting applications. On top of the business layer, a separate environment layer may be added, modelling the external customers that make use of the services of the organisation (although these may also be considered part of the business layer). In line with service orientation, the most important relation between layers is formed by use relations, which show how the higher layers make use of the services of lower layers. However, a second type of link is formed by realisation relations: elements in lower layers may realise comparable elements in higher layers; e.g., a ‘data object’ (application layer) may realise a ‘business object’ (business layer); or an ‘artifact’ (technology layer) may realise either a ‘data object’ or an ‘application component’ (application layer). ==== Aspects ==== Passive structure is the set of entities on which actions are conducted. In the business layer the example would be information objects, in the application layer data objects and in the technology layer, they could include physical objects. Behavior refers to the processes and functions performed by the actors. "Structural elements are assigned to behavioral elements, to show who or what displays the behavior". Active structure is the set of entities that display some behavior, e.g. business actors, devices, or application components. === Full framework === The Full ArchiMate framework is enriched by the physical layer, which was added to allow modeling of “physical equipment, materials, and distribution networks” and was not present in the previous version. The implementation and migration layer adds elements that allow architects to model a state of transition, to mark parts of the architecture that are temporary for the purpose, as the name says, of implementation and migration. Strategy layer adds three elements: resource, capability and course of action. These elements help to incorporate strategic dimension to the ArchiMate language by allowing it to depict the usage of resources and capabilities in order to achieve some strategic goals. Finally, there is a motivation aspect that allows different stakeholders to describe the motivation of specific actors or domains, which can be quite important when looking at one thing from several different angles. It adds several elements like stakeholder, value, driver, goal, meaning etc. == ArchiMate language == The ArchiMate language is formed as a top-level and is hierarchical. On the top, there is a model. A model is a collection of concepts. A concept can be either an element or a relationship. An element can be either of behavior type, structure, motivation or a so-called composite element (which means that it does not fit just one aspect of the framework, but two or more). The functionality of all concepts without a dependency on a specific layer is described by the generic metamodel. This layer-independent description of concepts is useful when trying to understand the mechanics of the Archimate language. === Concepts === ==== Elements ==== The generic elements are distributed into the same categories as the layers: Active structure elements Behavior elements Passive structure elements Motivation elements Active structure e

    Read more →