Concepts

Data journalism, visualization, and open data concepts explained.

278 concepts

Plain Text Format

A basic text format containing only human-readable characters without any formatting or styling. This format prioritizes simplicity and universal compatibility across different platforms and applications. It's commonly used for configuration files, readme documents, and when maximum compatibility is needed.

Text File Format

A specification that defines how text is organized and stored in a digital file. These formats can range from simple plain text to structured formats like JSON or XML. Understanding these formats is crucial for data interchange, as they determine how information is stored and read by different applications.

AI Ethics

The study and application of moral principles in artificial intelligence development and deployment. This field addresses crucial questions about AI's impact on society, privacy, and fairness. Key considerations include algorithmic bias, transparency in AI decisions, and ensuring AI systems benefit humanity while minimizing potential harm.

Word Segmentation

The process of dividing text into individual words. It is particularly important for languages written without spaces between words, such as Chinese or Japanese, and for languages with long compound words. This technique is vital in natural language processing.

Natural Language Tokenization

The process of breaking down text into smaller units called tokens, such as words or subwords. This fundamental step in natural language processing enables computers to analyze and understand human language. Picture breaking "I love coding" into ["I", "love", "coding"]; this step is essential for tasks like machine translation.
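
As a minimal sketch of word-level tokenization, the example below splits text with a regular expression. The `tokenize` function is a hypothetical helper written for illustration; production tokenizers, especially subword ones, are considerably more involved.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word tokens and standalone punctuation marks."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches a single punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("I love coding")
print(tokens)  # ['I', 'love', 'coding']
```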

Text Processing

The manipulation and analysis of text data through computer programs. This field encompasses tasks like parsing, filtering, and transforming text content. Parsing CSV files and extracting specific information from documents are typical cases where text processing proves invaluable in data analysis and automation.
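
The CSV case can be sketched with Python's standard `csv` module. The sample data and the `rows_matching` helper are made up for illustration.

```python
import csv
import io

# Hypothetical sample data; a real program would read this from a file.
raw = "name,city,age\nAna,Bogota,34\nLuis,Lima,28\nMarta,Bogota,41\n"

def rows_matching(csv_text: str, column: str, value: str) -> list[dict]:
    """Parse CSV text and keep only the rows where `column` equals `value`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row[column] == value]

bogota = rows_matching(raw, "city", "Bogota")
names = [row["name"] for row in bogota]
print(names)  # ['Ana', 'Marta']
```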

Text Files

Digital documents containing written information stored in a computer-readable format. They serve as containers for plain text data, making them essential for storing logs, documentation, and code. A notable characteristic is their universal compatibility across different operating systems and applications.

ASCII Text

Text that uses the American Standard Code for Information Interchange (ASCII), a character encoding standard limited to 128 basic characters. It includes letters, numbers, and basic symbols. ASCII remains fundamental in computing, particularly in scenarios requiring universal compatibility or minimal storage space.
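
To make the 128-character limit concrete, this small sketch checks whether a string fits in ASCII and shows the numeric codes behind a word. The `is_ascii` helper is illustrative only; Python strings also offer a built-in `str.isascii()` method.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in text)

# Each ASCII character maps to one byte with a value from 0 to 127.
codes = list("Data".encode("ascii"))
print(codes)             # [68, 97, 116, 97]
print(is_ascii("Data"))  # True
print(is_ascii("café"))  # False, because "é" lies outside ASCII
```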

Unformatted Text

Raw text content without any styling, formatting, or special characters. It represents the basic form of written information, containing only standard characters and line breaks. This format is crucial for data processing and system compatibility, such as when working with log files or data exchanges between different platforms.

No-Code Platforms

Software tools that allow users to create applications without writing code, using visual interfaces and drag-and-drop elements. These platforms democratize app development, enabling non-programmers to build functional solutions.

Citizen Development

The practice of non-professional developers creating applications using no-code or low-code platforms. This approach empowers business users to build solutions for their needs without extensive programming knowledge. It accelerates digital transformation while reducing dependency on IT departments.

Rapid Application Development

A software development methodology focusing on quick prototyping and iterative updates based on user feedback. This approach prioritizes speed and flexibility over extensive planning. It is popular for business applications where quick deployment and adaptation to changing requirements are essential.

E-Democracy

The use of digital technologies to enhance democratic processes and citizen participation. This approach modernizes civic engagement through online voting, public consultations, and digital forums. It aims to increase accessibility and transparency in democratic decision-making using technology.

Real-Time Monitoring

Continuous observation and analysis of systems, processes, or data as events occur. This immediate tracking enables quick responses to changes or issues. Essential in scenarios like network security surveillance or production line monitoring, where instant awareness is crucial.

Performance Metrics

Quantifiable measures used to evaluate success and efficiency in various processes or systems. These indicators help track progress, identify areas for improvement, and guide decision-making. Examples include website load times and specific indicators of employee productivity.

Communication Strategy

A planned approach to sharing information and engaging with stakeholders effectively. It encompasses channels, messaging, timing, and feedback mechanisms. This framework ensures consistent, purposeful communication that aligns with organizational goals and audience needs.

Enterprise Software

Applications designed for large organizations' complex needs, offering scalability, security, and integration capabilities. These solutions support critical business operations across departments. Think of ERP systems managing entire company resources or CRM platforms handling customer relationships at scale.

Licensed Software

Programs protected by legal agreements that specify terms of use, distribution, and modification. Licenses define user rights and limitations, ensuring proper usage and protecting intellectual property. These agreements might include restrictions on copying, modification, or redistribution of the software.

Closed Source

Software whose source code is not publicly available and is protected as intellectual property. This approach helps companies maintain competitive advantages and control over their products. Users receive compiled versions only, while the underlying code remains proprietary to the developer.

Commercial Software

Software developed for sale or commercial licensing. Unlike free or open-source alternatives, it typically requires payment and includes customer support, regular updates, and warranties. Notable examples can be found in business applications like Microsoft Office or Adobe Creative Suite.

Collaborative Development

A software development approach where multiple people work together on a project, sharing knowledge and resources. It emphasizes teamwork, version control, and continuous communication. Modern tools like Git and collaborative platforms enable developers worldwide to contribute to projects simultaneously, fostering innovation and code quality.

Free Software

Programs that users can freely use, study, modify, and distribute, under license terms that guarantee those freedoms. This software philosophy promotes transparency and community collaboration. Examples include the Linux operating system and the Firefox browser, where users have access to the source code and the freedom to modify it.

Web Applications

Software programs accessed through web browsers, eliminating the need for local installation. These applications offer cross-platform compatibility and instant updates for all users. Think of online banking platforms or web-based email clients that work across different devices.

Hosted Services

Applications and infrastructure managed by third-party providers and accessed via the internet. These services eliminate the need for local hardware and maintenance while providing scalability. Examples include email hosting services and web hosting platforms that manage website infrastructure.

Cloud Applications

Software programs that run on remote servers rather than local devices. These applications offer accessibility from anywhere with internet connection and automatic updates. Well-known instances include Google Docs for document editing or Salesforce for customer relationship management.

Data Organization

The systematic arrangement of information for efficient storage and retrieval. This process involves structuring data logically and establishing relationships between different elements. Good organization is crucial for data accessibility and analysis, like organizing customer records in a CRM system.

Data Harvesting

The systematic collection of data from multiple sources for analysis and use. This process involves gathering, organizing, and storing information efficiently. Commonly applied in market research, scientific studies, or when building comprehensive databases from diverse sources.

Content Extraction

The process of identifying and pulling specific information from various data sources. This technique focuses on retrieving relevant data while filtering out unnecessary elements. Used in scenarios like pulling product details from web pages or extracting key information from documents.

Web Scraping

The automated extraction of data from websites using software tools. This technique allows systematic collection of online information for analysis or monitoring. Common applications include price comparison, market research, and content aggregation from multiple web sources.
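
A minimal sketch of the extraction step, using the standard library's `html.parser` on an inline HTML string rather than a live website. The `LinkCollector` class and the sample page are hypothetical; a real scraper would also need to fetch pages and respect each site's terms of use.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page content standing in for a downloaded document.
page = '<html><body><a href="/prices">Prices</a><p>Text</p><a href="/news">News</a></body></html>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/prices', '/news']
```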

Text Classification

The process of categorizing text documents into predefined groups based on their content. This technique is fundamental in natural language processing, enabling automated organization of documents. Example: sorting emails into spam or non-spam, or categorizing news articles by topic.
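
The spam example can be illustrated with a deliberately naive keyword rule. This is only a sketch: the `SPAM_WORDS` set and `classify` function are invented here, and practical classifiers are usually trained on labeled examples rather than hand-written rules.

```python
import re

# Hypothetical keyword list chosen for illustration.
SPAM_WORDS = {"free", "winner", "prize"}

def classify(message: str) -> str:
    """Label a message 'spam' if it contains any keyword as a whole word."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return "spam" if words & SPAM_WORDS else "not spam"

print(classify("You are a WINNER! Claim your prize."))  # spam
print(classify("Meeting moved to Friday"))              # not spam
```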

Frequent Value

A data point or attribute that appears repeatedly in a dataset with significant frequency. Understanding these common values helps identify patterns and make informed decisions. In retail analytics, identifying frequently purchased items helps optimize inventory and marketing strategies.
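
One way to find the most frequent value is to count occurrences with `collections.Counter`. The purchase list below is made-up sample data.

```python
from collections import Counter

# Hypothetical purchase log.
purchases = ["milk", "bread", "milk", "eggs", "bread", "milk"]

counts = Counter(purchases)
top_item, top_count = counts.most_common(1)[0]
print(top_item, top_count)  # milk 3
```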

Cluster Analysis

A statistical method that groups similar data points together based on shared characteristics. This technique helps identify patterns and relationships within complex datasets. For instance, it can segment customers based on purchasing behavior or group documents by topic, making it valuable for business intelligence and research.
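
As one possible illustration, the sketch below runs a simple one-dimensional k-means, which is just one of many clustering methods. The `kmeans_1d` function, the starting centroids, and the spending figures are assumptions chosen for the example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid to its group's mean."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical monthly spending of six customers.
spending = [10, 12, 11, 95, 100, 105]
centers, groups = kmeans_1d(spending, centroids=[0, 50])
print(centers)  # [11.0, 100.0]
print(groups)   # [[10, 12, 11], [95, 100, 105]]
```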

Knowledge Discovery

The process of identifying meaningful patterns and relationships in large datasets to generate new insights. This field combines data mining with analytical techniques to uncover hidden knowledge. Used in scientific research, business intelligence, and understanding complex phenomena through data analysis.

Pattern Recognition

The ability to identify recurring structures or regularities in data. This field combines statistics and machine learning to detect meaningful patterns. Applications range from facial recognition in security systems to identifying customer behavior patterns in sales data.

Content Migration

The process of transferring digital content from one system or platform to another while preserving its integrity and structure. This involves moving various types of content like documents, images, and metadata. For instance, moving a website's content to a new content management system.

Database Migration

The process of moving database contents from one platform or format to another while maintaining data integrity. This technical operation requires careful planning to prevent data loss or corruption. Common scenarios include upgrading to newer database versions or switching between different database management systems.

System Migration

The process of transferring an entire system, including data and applications, from one environment to another. This complex operation requires careful planning and execution to minimize disruption, like moving from an old server to a new one or transitioning from on-premises to cloud infrastructure.

Landing Pages

Specialized web pages designed to convert visitors into taking specific actions, like signing up or making purchases. These pages focus on clear messaging and compelling call-to-action elements. Effective landing pages minimize distractions and guide visitors toward desired outcomes through strategic design.

Data Standardization

The process of converting data into a consistent, unified format following specific rules or conventions. This ensures compatibility and comparability across different systems and datasets. Example: standardizing date formats or units of measurement across an organization's databases.
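
The date example might look like the following sketch, which converts a few input formats to ISO 8601. The list of accepted formats and the `to_iso` helper are assumptions; real data usually needs an agreed rule for ambiguous day and month orders.

```python
from datetime import datetime

# Hypothetical set of formats the incoming data is assumed to use.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]

def to_iso(date_text: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known format in turn."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(date_text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {date_text!r}")

print(to_iso("31/01/2024"))  # 2024-01-31
print(to_iso("31.01.2024"))  # 2024-01-31
```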

Data Wrangling

The process of converting and mapping data from one format to another to make it more useful, often through hands-on, exploratory work. This approach involves cleaning, structuring, and enriching raw data. Consider transforming messy spreadsheets into organized databases or converting incompatible file formats for analysis.

Data Preparation

The process of cleaning and transforming raw data into a format suitable for analysis. This crucial step ensures data quality and consistency before processing. Activities include handling missing values, removing duplicates, and standardizing formats to make data analysis-ready.
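
The three activities named above can be sketched in a few lines. The record layout, the `prepare` function, and the choice to represent a missing age as `None` are all assumptions made for this example.

```python
# Hypothetical raw records with stray whitespace, a missing value, and a duplicate.
records = [
    {"name": " Ana ", "age": "34"},
    {"name": "Luis", "age": ""},
    {"name": "ana", "age": "34"},
]

def prepare(rows, default_age=None):
    """Standardize names, fill missing ages, and drop duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        name = row["name"].strip().title()                        # standardize format
        age = int(row["age"]) if row["age"] else default_age      # handle missing value
        key = (name, age)
        if key in seen:                                           # remove duplicates
            continue
        seen.add(key)
        cleaned.append({"name": name, "age": age})
    return cleaned

print(prepare(records))
# [{'name': 'Ana', 'age': 34}, {'name': 'Luis', 'age': None}]
```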

Remote Code Execution (RCE)

The ability to execute commands on a device or system from a different location over a network. Remote execution is essential for system administration and maintenance, but it requires strict security controls. In security contexts, RCE usually names a vulnerability that lets an attacker run arbitrary code on a target system without authorization.

Compiler Design

The process of creating software that translates programming code into machine-executable instructions. This field combines programming languages theory with practical implementation techniques. Compiler design focuses on optimization and error detection, ensuring efficient translation of high-level code into machine language.

Coding Syntax

The set of rules that define how to write code in a programming language correctly. This includes proper formatting, punctuation, and structure of code elements. Understanding syntax is fundamental for writing valid code, like knowing where to place semicolons or how to structure function declarations.

Relational Databases

Systems that organize data into tables with rows and columns, establishing relationships between different data sets. This structure enables efficient data management and complex queries. For example, a customer table linked to an order history table allows comprehensive data analysis and retrieval.
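
A customer table linked to orders can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table names, columns, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL
);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Luis');
INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 15.5), (3, 2, 9.0);
""")

# The shared customer id is the relationship that lets the two tables be joined.
totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(totals)  # [('Ana', 35.5), ('Luis', 9.0)]
```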

Data Manipulation Language (DML)

A set of SQL commands used to manage data within a database. These statements allow for inserting, updating, and deleting records. Essential in database management, DML commands like INSERT, UPDATE, and DELETE enable data maintenance and modifications.
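
The three commands named above can be tried against an in-memory SQLite database. The `products` table and its rows are hypothetical sample data.

```python
import sqlite3

dml_conn = sqlite3.connect(":memory:")
dml_conn.execute("CREATE TABLE products (name TEXT PRIMARY KEY, stock INTEGER)")

# INSERT adds new records.
dml_conn.execute("INSERT INTO products (name, stock) VALUES (?, ?)", ("pen", 10))
dml_conn.execute("INSERT INTO products (name, stock) VALUES (?, ?)", ("ink", 3))

# UPDATE changes existing records.
dml_conn.execute("UPDATE products SET stock = stock - 1 WHERE name = ?", ("pen",))

# DELETE removes records.
dml_conn.execute("DELETE FROM products WHERE name = ?", ("ink",))

remaining = dml_conn.execute("SELECT name, stock FROM products").fetchall()
print(remaining)  # [('pen', 9)]
```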

API Responses

The data returned by an API after processing a request. These responses follow specific formats and include status codes indicating success or failure. They might contain requested data, error messages, or confirmation of actions performed, helping applications communicate effectively.
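
How an application might interpret a response can be sketched without any network call. The `handle_response` function and the `data` and `error` keys are assumptions for illustration; each real API documents its own response layout.

```python
import json

def handle_response(status_code: int, body: str):
    """Return the payload on success (2xx), otherwise raise with the error message."""
    payload = json.loads(body)
    if 200 <= status_code < 300:
        return payload["data"]  # hypothetical key holding the requested data
    message = payload.get("error", "unknown error")  # hypothetical error key
    raise RuntimeError(f"request failed ({status_code}): {message}")

print(handle_response(200, '{"data": {"id": 7, "name": "Ana"}}'))
# {'id': 7, 'name': 'Ana'}
```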

Data Interchange

The standardized exchange of information between different computer systems or software applications. This process requires agreed-upon formats and protocols for successful communication. Similar to data exchange but focused on technical implementation, like using JSON or XML for transferring structured data.

Raster Graphics

Digital images composed of a grid of pixels, each with specific color values. These graphics are resolution-dependent and common in digital photography. For instance, JPEG photos or PNG images used on websites, where image quality depends on pixel density.

Data Exchange

The process of sharing information between different systems, organizations, or formats. This involves converting and transmitting data while maintaining its integrity and meaning. Common in business partnerships, where companies share customer data or transaction information securely.

SOAP API

A protocol for exchanging structured information in web services using XML. This standardized approach ensures reliable messaging and transaction security. Often used in enterprise environments where formal contracts between services and strict data typing are required.

REST API

A software architectural style for building web services that use HTTP methods for data operations. This approach emphasizes simplicity, scalability, and stateless communication. Popular in modern web development, REST APIs enable easy integration between different systems using standard web protocols.

API Endpoints

Specific URLs or locations where an API can receive requests and send responses. These points of connection enable communication between different software systems. Think of them as digital doorways where applications can request specific services or data, like getting user profiles or updating account information.

Enterprise Integration

The process of connecting different business systems, applications, and data sources within an organization to work as a unified whole. This approach enables seamless information flow and process coordination, like connecting payroll systems or linking inventory management with sales platforms.

Data Synchronization

Process of keeping data consistent across multiple systems or devices. This ensures that all platforms have the most current information available. Example: keeping calendar events updated between mobile devices and computers, or maintaining consistent product information across e-commerce platforms.

Data Architecture

Framework that defines how data systems are organized, integrated, and managed across an organization. This strategic design ensures efficient data flow and utilization. Data architecture provides the blueprint for how different data components connect and work together within an organization's ecosystem.

Data Modeling

Process of creating structured representations of data and its relationships to support efficient storage, retrieval, and analysis. This foundational activity shapes database design and functionality. Like creating architectural blueprints before building, data modeling defines how information will be organized and accessed within a system.

Data Storage

Infrastructure and processes for securely keeping data while ensuring its availability, integrity, and proper organization. This fundamental capability supports all data operations. Well-designed storage lets data be retrieved efficiently when needed, while maintaining proper backup and recovery capabilities.

Identity Management

System for managing how users are authenticated and authorized to access specific resources while maintaining security and privacy. This framework controls digital access rights and privileges.

Data Protection

Comprehensive measures and practices implemented to safeguard data from unauthorized access, breaches, and misuse throughout its lifecycle. This crucial framework maintains data security and privacy. Data protection combines technical, physical, and administrative controls to ensure information safety.

Sensitive Information

Data that requires special handling and protection due to its private, confidential, or regulated nature. This classification demands enhanced security measures and access controls.

Collaborative Programming

Development approach where multiple programmers work together on shared code, using version control and collaboration tools to coordinate their efforts. This methodology enhances code quality and knowledge sharing. Collaborative programming combines individual expertise for better software outcomes.

Government Accountability

Process where government entities provide transparent information about their activities, decisions, and use of public resources. This transparency ensures democratic oversight and trust.

Civic Engagement

Process of actively involving citizens in public decision-making and governance through data sharing, feedback collection, and collaborative initiatives. This engagement strengthens democratic processes.

Public Data

Information that is freely accessible to everyone, typically collected or generated using public resources and made available for unrestricted use and redistribution. This open resource supports civic engagement and innovation in various sectors. Think of it as a vast digital commons where everyone can access, use, and build upon shared information, though the quality and format of available data may vary across sources.

Transparency Initiatives

Systematic efforts by organizations and governments to provide open access to information about their operations, decisions, and data. These programs promote accountability and public trust through active disclosure.

Compliance Management

Process of ensuring an organization adheres to external regulations and internal policies regarding data handling, privacy, and security requirements. This ongoing responsibility maintains regulatory alignment. Compliance management continually monitors and verifies that data practices meet all necessary requirements and standards.

Data Standards

Established specifications and protocols that define how data should be formatted, stored, and exchanged to ensure consistency and interoperability across systems. These guidelines promote data quality and compatibility. Data standards ensure that information can be reliably shared and understood across different platforms and organizations.

Data Policies

Formal guidelines and rules governing how an organization collects, stores, uses, and shares data, ensuring compliance with regulations and best practices. These directives establish clear protocols for data handling. Data policies provide the framework for responsible information handling while protecting both organization and stakeholder interests.

Data Security

Comprehensive set of measures and practices designed to protect data from unauthorized access, corruption, or theft throughout its lifecycle. This critical framework ensures information remains confidential and intact. Data security combines multiple layers of protection, including encryption, access controls, and monitoring mechanisms.

Pipeline Automation

Systematic approach to automating the flow of data through various processing stages, from collection to final storage or analysis. This streamlined process eliminates manual intervention in routine data operations. Think of it as an automated assembly line for data, where information moves seamlessly through different processing stations, with each step automatically triggering the next.
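
The assembly-line idea can be sketched as a list of functions where each step's output feeds the next. The step names and the sample input are hypothetical; real pipelines typically add scheduling, logging, and error handling.

```python
def run_pipeline(data, steps):
    """Pass data through each step in order; every step's output feeds the next."""
    for step in steps:
        data = step(data)
    return data

def strip_values(values):
    return [v.strip() for v in values]

def keep_numbers(values):
    return [int(v) for v in values if v.isdigit()]

def total(values):
    return sum(values)

# Hypothetical raw input with stray whitespace and one invalid entry.
raw_values = [" 3", "4 ", "x", "5"]
result = run_pipeline(raw_values, [strip_values, keep_numbers, total])
print(result)  # 12
```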

Data Loading

Process of transferring data from various sources into a target system while ensuring accuracy, completeness, and proper formatting during the transfer. This crucial operation maintains data integrity throughout the import process. Data loading requires systematic procedures to ensure information arrives intact and properly organized.

Data Transformation

Process of converting data from one format or structure to another to improve usability, compatibility, or analysis potential. This technical operation enables data integration and enhancement. Like translating languages while preserving meaning, data transformation converts information between formats while maintaining its essential value and integrity.
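
A small sketch of a format conversion, here from CSV to JSON using only the standard library. The `csv_to_json` helper is illustrative; note that every value stays a string unless types are converted explicitly.

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)

converted = csv_to_json("id,name\n1,Ana\n2,Luis\n")
print(converted)
```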

Interface Design

Process of creating visual and functional elements of user interaction with digital products, focusing on layout, navigation, and overall user experience. This creative discipline balances aesthetics with functionality. Interface design creates intuitive digital environments that meet user needs effectively.

User Research

Systematic investigation of user needs, behaviors, and motivations through various research methods to inform product development and improvement. This comprehensive approach ensures user-centered design.

Usability Testing

Evaluation process that measures how effectively users can accomplish specific tasks with a product or system. This testing focuses on ease of use and task completion efficiency, showing how intuitively people can use a product or service.

Statistical Inference

Statistical methodology for drawing conclusions about populations based on sample data, accounting for uncertainty and variability. This approach enables broader insights from limited data. Like estimating a forest's diversity by studying selected areas, statistical inference helps understand larger patterns from smaller representative samples.
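
As a hedged illustration, the sketch below estimates a 95% confidence interval for a population mean from a small sample. It uses a normal approximation for simplicity; with samples this small a t-distribution would usually be more appropriate. The sample values and the `mean_confidence_interval` helper are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    center = mean(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(sample) / sqrt(n)
    return center - margin, center + margin

# Hypothetical measurements from a sample.
sample = [4.8, 5.1, 5.0, 4.9, 5.2]
low, high = mean_confidence_interval(sample)
print(round(low, 3), round(high, 3))
```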

Probability Theory

Mathematical framework for understanding random phenomena and calculating the likelihood of different outcomes in uncertain situations. This foundation enables predictive modeling and risk assessment. Similar to calculating odds in games of chance, probability theory provides tools for quantifying uncertainty and making informed decisions based on likely outcomes.
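
The games-of-chance comparison can be made concrete by enumerating every outcome of rolling two fair dice and counting those that sum to 7. The example assumes fair six-sided dice.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [roll for roll in outcomes if sum(roll) == 7]

probability = Fraction(len(favorable), len(outcomes))
print(probability)  # 1/6
```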

Data Analysis

Systematic examination of data to uncover patterns, relationships, and meaningful insights that support decision-making and understanding. This comprehensive process transforms raw data into actionable knowledge. Picture a detective analyzing evidence from multiple angles: data analysis similarly examines information through various methods to reveal hidden insights.

Beta Testing

Pre-release testing phase where a limited group of external users evaluates a near-final product version in real-world conditions. This crucial stage identifies potential issues before public release. Beta testing provides valuable feedback while allowing time for final adjustments before full product launch.

User Testing

Structured evaluation process where real users test a product or service, providing feedback on functionality, usability, and overall experience. This direct assessment reveals practical insights about user needs. User testing gathers genuine feedback about how people interact with a product.

Product Usage

Systematic tracking and analysis of how individuals interact with a product or service in real-world conditions, measuring patterns and behaviors. This ongoing assessment helps understand actual usage versus intended design.

Internal Testing

Process of validating software functionality within an organization's environment before external release. These evaluations verify system performance and reliability.

Rich Text Format

Text format that supports various styling elements like bold, italic, different fonts, and formatting while maintaining editability. This enhanced text capability enables more expressive content creation. Rich text format combines readability with visual enhancement options.

Word Processing

Computational analysis and manipulation of textual data to extract information, identify patterns, or transform content into desired formats. This fundamental capability enables text analysis and modification. Like having a sophisticated text editor that can analyze, modify, and understand written content at scale, word processing in this sense handles everything from simple formatting to complex analysis.

Statistical Range

Statistical measure that describes the difference between the largest and smallest values in a dataset, providing insight into data spread. This basic metric helps understand data variability. Statistical range captures the full span of values in a dataset.
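
The calculation is a single subtraction, as this short sketch shows. The `value_range` helper and the sample numbers are illustrative.

```python
def value_range(values):
    """Return the difference between the largest and smallest value."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)

print(value_range([3, 9, 4, 15, 7]))  # 12, because 15 - 3 = 12
```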

Data Spread

In the sense of data distribution, the process of sharing and delivering data across different systems, users, or locations while maintaining consistency and accessibility. This methodology ensures efficient data flow throughout an organization. Like a sophisticated delivery network, it manages how information moves between sources and destinations, considering factors like speed and reliability. In statistics, data spread more commonly refers to how widely the values in a dataset vary.

API Integration

Technical process of connecting different software systems and services to exchange data and functionality seamlessly. This integration enables applications to communicate and work together effectively. Picture building bridges between different islands of functionality – API integration connects various software services while maintaining security and data integrity.

Software Engineering

Systematic approach to designing, developing, and maintaining software systems using established principles and methodologies. This discipline combines technical expertise with project management skills. Like architecture for digital systems, software engineering provides frameworks and practices for building reliable, scalable, and maintainable software solutions.

Web Development

Process of creating and maintaining websites and web applications, combining front-end interface design with back-end functionality. This comprehensive discipline spans multiple technologies and practices.

CSS Styling

Process of defining visual presentation of web content using Cascading Style Sheets (CSS), controlling layout, colors, typography, and responsive design.

JavaScript Frameworks

Collections of pre-written JavaScript code that provide structured foundations for building web applications efficiently. These tools offer reusable components and standardized practices for common development tasks. JavaScript frameworks accelerate development while maintaining consistency and reducing potential errors.

Web Design

Process of creating and arranging digital content for internet presentation, focusing on visual appeal, usability, and functionality. This multifaceted discipline combines aesthetics with technical requirements. Picture architecting a digital storefront – web design balances visual elements, user experience, and technical performance to create effective online presence.

Database Management

Process of organizing, storing, and managing data collections while ensuring security, accessibility, and integrity. This complex system maintains data quality and usability over time. Like a sophisticated library system, database management coordinates various aspects of data storage, retrieval, and maintenance while enforcing security and access controls.

Discrete Numbers

Numbers that represent distinct, separate values without the possibility of intermediate values between them. These values jump from one to another without continuity.

Continuous Data

Numeric data that can take any value within a defined range, representing measurements that can be infinitely divided. These values flow smoothly between points without gaps. Think of temperature readings – continuous data can include any value between points, like 98.6 degrees being between 98 and 99 degrees.

Quantitative Data

Data expressed as numbers or measurements that can be mathematically analyzed and compared. This type of data enables statistical analysis and precise comparisons. Similar to measuring ingredients for a recipe, quantitative data provides exact numerical values that can be used for calculations and objective comparisons.

Tabular Format

Information organized in rows and columns with clear labels and consistent structure, facilitating easy reading and analysis. This standardized format supports efficient data manipulation and comparison. Like organizing information in a spreadsheet, tabular format provides a systematic way to present and analyze data through consistent row and column arrangements.

Relational Data

Data structured to show connections between different information sets through defined relationships and common fields. This organization enables complex queries and data integration. Consider how a family tree shows relationships between individuals – relational data similarly maps connections between different pieces of information using shared characteristics.

Source Data

Original information collected from its initial point of creation or generation, maintaining its original context and format. This authentic data serves as a reference point for verification and analysis.

Primary Data

Data collected firsthand through direct observation, measurement, or recording, representing original source information. This foundational data forms the basis for further analysis and research.

Unprocessed Data

Original data collected directly from its source without any manipulation, cleaning, or processing applied. This raw information represents the purest form of collected data before any transformation. Raw data contains both valuable information and potential imperfections, so it requires careful processing to reveal its full value.

Categorical Variables

Data type representing distinct groups or classifications where values belong to one of a limited number of categories. These variables organize data into mutually exclusive groups. Similar to sorting books by genre in a library, categorical variables provide a way to group and analyze data based on shared characteristics rather than numerical values.

Binary Data

Data type that represents information in two possible states, typically represented as 0/1, true/false, or yes/no. This fundamental format is crucial for computer operations and logical decisions. Like a switch that can only be on or off, binary data represents information in its simplest form with just two possible values.

Integer Values

Numeric data values that represent whole numbers without any decimal or fractional components, used in various computational and statistical applications. These discrete values support basic arithmetic operations. Like steps on a staircase, integer values represent complete units that can't be broken into smaller parts while maintaining meaning.

Countable Data

Data representing discrete quantities that can be counted in whole numbers, used for tracking occurrences or frequencies. These values can only exist as complete units without fractions. Consider inventory items in a warehouse – countable data similarly represents quantities that only make sense as whole numbers, like the number of products in stock.

Image Data

Digital representations of visual information, including photographs, diagrams, and graphics, stored as patterns of pixels or vector instructions. This complex data type requires specialized processing techniques. Like translating a painting into digital form, image data captures visual information while presenting unique challenges for storage and analysis.

Text Data

Raw information in written or typed format, including documents, comments, descriptions, and narratives. This unstructured data type requires specialized processing for analysis. Picture a library full of books – text data similarly contains rich information that needs specific techniques to extract meaningful patterns and insights.

Data Types

Classification system for organizing different forms of data based on their nature, structure, and possible operations. This framework helps determine appropriate analysis methods and storage requirements. Like organizing tools in a workshop where each type serves specific purposes, data types define how information can be stored, processed, and analyzed effectively.

Qualitative Data

Information that describes qualities or characteristics through non-numeric observations, often capturing subjective or descriptive aspects of phenomena. This type focuses on properties that can be observed but not measured numerically. Think of a wine tasting note describing flavors and aromas – qualitative data similarly captures descriptive attributes that aren't easily quantified.

Ordinal Data

Data that represents categories with a meaningful order or rank, though the intervals between values may not be uniform. This type allows comparison of relative positions but not precise differences. Similar to customer satisfaction ratings from 'very dissatisfied' to 'very satisfied', ordinal data shows which value is greater or lesser, though the exact distance between ratings isn't defined.
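
One way to work with ordered categories in code is to map each label to its rank, as in this small sketch. The scale and the responses are hypothetical.

```python
# Hypothetical satisfaction scale, listed from lowest to highest.
SCALE = ["very dissatisfied", "dissatisfied", "neutral", "satisfied", "very satisfied"]
RANK = {label: position for position, label in enumerate(SCALE)}

responses = ["satisfied", "neutral", "very satisfied", "dissatisfied"]

# Sorting and comparing by rank is meaningful for ordinal data.
ordered = sorted(responses, key=RANK.get)
highest = max(responses, key=RANK.get)

# Subtracting ranks is not meaningful: the distance between labels is undefined.
print(ordered)
print(highest)
```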

Nominal Data

Data type representing named categories or labels without any inherent order or ranking between values, used for classification and grouping. These values can only be compared for equality or difference, not magnitude. Like colored name tags at a conference, each label identifies a category but implies no hierarchy between participants: red, blue, and green are simply different from one another.

Open Licenses

Legal frameworks that specify how others can use, modify, and redistribute data while protecting attribution rights. These permissions enable open collaboration and innovation. Like a social contract for information sharing, open licenses clearly define what users can do with data while ensuring proper credit is given to original sources.

Data Sharing

Practice of making data available to others for collaboration, validation, or reuse under specific terms and conditions. This approach promotes innovation and knowledge exchange across organizations. Similar to an academic library's interlibrary loan system, data sharing enables broader access to valuable information resources while maintaining appropriate controls.

Data Transparency

Principle of making data openly available and easily accessible, including clear documentation about its source, collection methods, and limitations. This practice builds trust and enables verification of findings. Like having clear windows into an organization's operations, data transparency ensures accountability and promotes understanding.

Public Datasets

Collections of information made freely available for public access and use, often provided by government agencies or research institutions. These resources support transparency and innovation. Think of it as a community library of digital information, where anyone can access and use the data for research, analysis, or development purposes.

Data-Driven Decision-Making

Strategy of using objective data analysis and insights to guide business and operational choices rather than relying solely on intuition. This approach ensures more informed and objective decision-making. Like using a compass instead of guessing direction, data-driven decisions rely on concrete evidence rather than assumptions or gut feelings.

Data Literacy

Ability to read, understand, create, and communicate data as meaningful information. This fundamental skill enables individuals to interpret and work effectively with data in various contexts. Similar to traditional literacy with text, data literacy empowers people to critically evaluate and use data-driven information in their personal and professional lives.

Spreadsheet Format

Standardized way of organizing data in rows and columns with built-in features for calculations, sorting, and analysis. This versatile format supports various data management tasks. Picture a digital ledger that not only stores information but also provides tools for manipulating and analyzing it, making it a fundamental tool for data handling.

Data Import

Process of bringing external data into a system or application while ensuring proper format conversion and data integrity. This crucial operation enables organizations to utilize data from various sources. Like a customs checkpoint for information, data import verifies and processes incoming data to ensure it meets system requirements and maintains quality standards.

Tabular Data

Data organized in rows and columns where each row represents a record and each column contains a specific type of information. This structured format facilitates easy analysis and manipulation of information.
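
The row-as-record, column-as-field structure can be sketched with Python's csv module. The column names and values below are made up for illustration.

```python
import csv
import io

# Hypothetical table: the header row names the columns, each later row is one record.
raw = "city,visits\nBogotá,120\nMedellín,80\nCali,45\n"

rows = list(csv.DictReader(io.StringIO(raw)))

# Values read from CSV are strings, so convert before doing arithmetic.
total_visits = sum(int(row["visits"]) for row in rows)
print(rows[0]["city"], total_visits)
```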

Geographic Data

Information that describes locations, boundaries, and characteristics of places on Earth's surface, including coordinates, elevation, and spatial relationships. This specialized data type enables mapping and geographical analysis through various formats. Like a detailed atlas coming to life digitally, geographic data combines location information with attributes to support spatial understanding and decision-making.

Validation Data

Independent dataset used during model development to tune hyperparameters and assess performance before final testing. This intermediate evaluation helps detect overfitting and choose model settings. Picture a dress rehearsal before a performance – validation data helps refine the model's approach before facing completely new test data.

Test Data

Separate dataset used to evaluate a model's performance on new, unseen data after the training phase is complete. This independent evaluation helps assess real-world effectiveness. Similar to a final exam that tests learned knowledge, test data verifies how well a model can apply its training to new, unfamiliar situations.

Training Data

Dataset specifically curated for teaching machine learning models, often containing labeled examples that help algorithms learn patterns and relationships. This foundational data shapes how models will perform. Like textbooks for students, training data provides the examples and exercises from which artificial intelligence systems learn to make predictions.
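
The validation, test, and training sets described above are usually carved out of one dataset. The sketch below shows one simple way to do that. The function name split_dataset and the 70/15/15 proportions are assumptions for illustration, not a recommended standard.

```python
import random

def split_dataset(records, train_pct=70, validation_pct=15, seed=0):
    """Shuffle records and cut them into train, validation, and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * validation_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # whatever remains becomes the test set
    )

train_set, validation_set, test_set = split_dataset(range(100))
print(len(train_set), len(validation_set), len(test_set))
```

Each record lands in exactly one of the three sets, so the test set stays unseen until the final evaluation.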

Data Samples

Subset of a larger dataset selected for analysis and intended to represent the characteristics of the whole population. Sampling keeps data volume manageable while still supporting valid conclusions, provided the sample is representative. Consider how a food critic tastes portions of a dish – data samples similarly provide insights about the whole dataset through careful selection of representative elements.
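
A simple random sample is one common way to select such a subset. This sketch uses made-up numbers and a fixed seed so the draw is repeatable.

```python
import random
import statistics

population = list(range(1, 1001))  # hypothetical population of 1,000 values
rng = random.Random(42)            # fixed seed so the draw is repeatable

# Draw 50 distinct items; every item has the same chance of being chosen.
sample = rng.sample(population, 50)

print(statistics.mean(population))  # mean of the full population
print(statistics.mean(sample))      # estimate based on the sample only
```

The sample mean will generally be close to, but not identical to, the population mean.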

Data Collection

Systematic process of gathering and measuring information from various sources to get a complete and accurate picture of an area of interest. This methodology ensures comprehensive data capture for analysis. Like a field researcher carefully gathering specimens, data collection follows specific protocols to ensure the quality and relevance of gathered information.

Cloud Storage

Service that enables data storage and access through the internet rather than local computer storage. This technology allows flexible access to information from any location with internet connectivity. Picture a virtual storage facility that expands or contracts based on your needs, accessible from anywhere, though proper security measures are essential for data protection.

Platform as a Service

Cloud service model that provides a complete development and deployment environment in the cloud, including infrastructure, development tools, and database management systems. This platform enables developers to focus on application development. Like having a fully equipped workshop where all tools and materials are provided, PaaS streamlines the development process.

Infrastructure as a Service

Cloud computing service that provides virtualized computing resources over the internet, including servers, storage, and networking components. This model allows organizations to scale infrastructure without physical hardware investments. Similar to renting fully equipped office space, IaaS provides the foundation needed to run applications and services without managing physical infrastructure.

Version Control

System that tracks and manages changes to documents, code, or other digital assets over time, maintaining a history of modifications and enabling collaboration. This approach allows teams to work simultaneously while preserving previous versions. Think of it as a time machine for your work, recording every change and allowing you to revisit or restore any previous state when needed.

Publishing Workflow

Systematic process that governs how content moves from creation through review, approval, and final publication across different channels. This workflow ensures quality control and consistency in content delivery. Like an assembly line for information, a publishing workflow coordinates multiple stakeholders and steps, ensuring content meets standards before reaching its intended audience.

Content Management

System for organizing, storing, and publishing digital content while maintaining version control and workflow management. This platform streamlines content creation and distribution processes across channels. Picture an intelligent filing system that not only stores content but also manages its lifecycle, from creation through publication and archiving.

Data Repository

Centralized storage facility for collecting, managing, and sharing datasets within an organization or community. This infrastructure supports data preservation and reuse while maintaining security and access controls. Like a modern digital archive, it provides secure storage while enabling controlled access to valuable information assets for authorized users.

Metadata Registry

Centralized system for documenting and managing information about datasets, including their origin, format, and usage rights. This catalog ensures proper data governance and accessibility. Similar to a detailed inventory system for a vast warehouse, metadata registry tracks essential information about data assets, making them discoverable and usable.

Data Portal Software

Specialized software system that manages and provides access to structured data collections through a web-based interface. This tool facilitates data discovery, sharing, and integration across organizations. Picture a sophisticated digital gateway that connects users with data resources, offering features for search, visualization, and download capabilities.

Open Data Platform

Digital infrastructure that provides public access to datasets from various sources, promoting transparency and collaboration. This system enables users to discover, access, and utilize open data resources for research and innovation. Like a public library for digital information, it democratizes access to valuable data while maintaining quality and usability standards.

Exploratory Analysis

Systematic investigation of datasets to uncover patterns, relationships, and anomalies without preconceived hypotheses. This initial phase of data analysis helps shape further research directions and insights. Consider how a detective examines evidence from multiple angles – exploratory analysis similarly investigates data from various perspectives to discover meaningful patterns.

Statistical Analysis

Mathematical approach to collecting, analyzing, interpreting, and presenting data to identify patterns and trends. This methodology uses probability theory and mathematical models to draw conclusions from datasets. Think of it as a powerful magnifying glass that reveals hidden patterns and relationships in data, helping organizations make informed decisions based on numerical evidence.

Data Encryption

Process of converting data into encoded formats to protect it from unauthorized access and ensure confidentiality during storage or transmission. This security measure uses complex algorithms to transform readable information into seemingly random characters. Similar to how ancient civilizations used secret codes, data encryption provides a sophisticated shield against modern digital threats.

Natural Language Understanding

Advanced computational capability that enables machines to interpret and understand human language in its natural form, considering context, nuances, and variations in expression. This technology combines linguistics, machine learning, and pattern recognition to process text and speech.

Conversational AI

Technology that enables natural language interactions between humans and computers through text or speech. These systems use advanced language processing to understand and respond to user inputs naturally. Consider having a knowledgeable assistant who can understand and respond to questions – conversational AI similarly aims to provide helpful, context-aware responses through natural dialogue.

Data Dictionary

Comprehensive reference that defines and describes all data elements within a system or organization. This centralized repository ensures consistent understanding and usage of data across different teams. Similar to a detailed glossary in a technical manual, a data dictionary provides clear definitions and context for every piece of information in the system.

Metadata Management

Systematic approach to organizing, storing, and maintaining information about data assets within an organization. This framework helps track data definitions, relationships, and usage patterns. Like a library's catalog system for books, metadata management provides essential context and organization for data resources, making them easier to find and use effectively.

Breakpoints

Specific points in screen width where a website's layout changes to provide optimal viewing experience. These crucial transitions ensure content remains accessible and visually appealing across devices. Like transition points in a shape-shifting object, breakpoints determine when and how a design transforms to better suit different screen sizes.

Fluid Layouts

Design technique that uses relative units and flexible grids to create layouts that smoothly adjust to different screen sizes. This approach ensures content remains readable and well-organized regardless of viewport dimensions. Picture a water container – just as liquid adapts to its container's shape, fluid layouts naturally adjust to fill available space effectively.

Adaptive Design

Design approach that automatically adjusts layout and functionality based on device capabilities and screen sizes. This methodology ensures optimal user experience across different platforms and devices. Adaptive design modifies content presentation to suit various viewing contexts and user needs.

Data Accuracy

Measure of how correctly data represents the real-world values or concepts it's meant to describe. This fundamental quality metric ensures reliable analysis and decision-making based on stored information. Consider a precision measuring tool – data accuracy similarly determines how closely stored values match their actual real-world counterparts, though achieving perfection requires ongoing maintenance.

Data Validation

Process of verifying data accuracy, completeness, and consistency before it enters a system or database. This crucial step ensures data integrity and reliability throughout its lifecycle. Similar to quality control in manufacturing, data validation applies specific rules and checks to confirm that information meets predefined standards before being accepted into the system.
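
The rules and checks mentioned above can be expressed as small functions. The field names and limits in this sketch are hypothetical. Real systems define their own rules.

```python
def validate_record(record):
    """Return a list of problems found in one record (empty list if valid)."""
    problems = []
    name = record.get("name") or ""
    if not name.strip():
        problems.append("name is required")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age must be an integer between 0 and 120")
    email = record.get("email") or ""
    if "@" not in email:
        problems.append("email must contain '@'")
    return problems

good = {"name": "Ana", "age": 30, "email": "ana@example.org"}
bad = {"name": " ", "age": 150}

print(validate_record(good))
print(validate_record(bad))
```

Returning every problem at once, rather than stopping at the first, makes it easier to report back to whoever supplied the data.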

On-Premises Software

Software that is installed and runs on computers within an organization's physical premises rather than in the cloud. This traditional deployment model gives organizations complete control over their data and infrastructure. Think of it as owning your own power generator instead of relying on the grid – while requiring more maintenance, it offers greater control and customization options.

Cassandra

Distributed database management system designed for handling large amounts of data across multiple servers, ensuring high availability and fault tolerance. This NoSQL solution excels at managing massive datasets with no single point of failure. Like a network of interconnected storage facilities working together seamlessly, Cassandra provides reliable data storage and retrieval even when some nodes experience issues.

MongoDB

Popular NoSQL database that stores data in flexible, JSON-like documents rather than traditional table-based rows and columns. This approach allows for dynamic schema changes and easier handling of complex data structures. Consider how a filing cabinet can hold various types of documents – MongoDB similarly accommodates different data formats while maintaining quick access and scalability.

Graph Database

Database model that represents data through nodes and edges, showing relationships between different elements. This structure is ideal for analyzing complex networks and interconnected data. Much like a social network map showing connections between people, graph databases excel at managing and querying highly connected data structures with multiple relationships.
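
The node-and-edge model can be illustrated without a database engine. This sketch stores a small, hypothetical friendship network as an adjacency map and uses a breadth-first search to count the hops between two people. It is not how any particular graph database is implemented.

```python
from collections import deque

# Hypothetical edges: each pair is an undirected "knows" relationship.
edges = [("Ana", "Luis"), ("Luis", "Marta"), ("Marta", "Pablo"), ("Ana", "Sofia")]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def hops_between(graph, start, goal):
    """Breadth-first search: number of edges on the shortest path, or None."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None

print(hops_between(graph, "Ana", "Pablo"))
```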

Key Value Store

Database system that stores data as pairs of keys and values, offering quick data retrieval and flexible schema design. This approach is particularly effective for applications requiring fast access to simple data structures. Picture a library card catalog where each card (key) points directly to a book's location (value) – key-value stores provide similarly rapid access to stored information.
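
At its core the idea resembles a dictionary lookup. The class below is a toy in-memory sketch with hypothetical names. Real key-value stores add persistence, replication, and expiry on top of this interface.

```python
class KeyValueStore:
    """Minimal in-memory key-value store backed by a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)  # deleting a missing key is not an error

store = KeyValueStore()
store.put("session:42", {"user": "ana", "lang": "es"})
store.put("session:43", "values can have any shape")
print(store.get("session:42"))
```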

Automated Reporting

System that generates and distributes reports automatically based on predefined schedules and triggers, eliminating manual data compilation. This streamlines information sharing and decision-making processes across organizations. Like having a personal assistant who regularly prepares and delivers important updates, automated reporting ensures timely access to crucial business insights.

Workflow Automation

Process of converting manual, repetitive tasks into automated sequences using software tools and predefined rules. This approach increases efficiency and reduces human error in business processes. Think of it as creating a digital assembly line where routine tasks are handled automatically, allowing teams to focus on more strategic work while ensuring consistency in operations.

User Flow

Step-by-step path that users take to complete specific tasks or achieve goals on a website or application. This sequence of actions and decisions helps optimize user experience and conversion rates. Similar to mapping out a customer's journey through a store, user flow analysis reveals how people interact with digital interfaces and where they might encounter obstacles.

Navigation Design

Strategic planning and implementation of website navigation elements to create intuitive user experiences. This includes designing menus, links, and pathways that help users move efficiently through a site. Much like well-designed road signs guide travelers to their destination, effective navigation design helps website visitors find their way through digital content without confusion.

Content Hierarchy

Organizational structure that arranges website content based on importance and relationships between elements. This framework guides users through information levels, from general to specific content. Consider how a book is organized with chapters, sections, and subsections – content hierarchy similarly creates a clear path through digital information, ensuring users can find what they need efficiently.

Site Mapping

Process of documenting and organizing all pages and content elements within a website into a structured hierarchy. This comprehensive overview helps plan website architecture and improve user navigation. Like creating a detailed blueprint of a building, site mapping outlines how different pages connect and relate to each other, ensuring logical organization and easy access to information.

Labeled Data

Data that has been tagged with descriptive labels or annotations, making it suitable for training supervised machine learning models and validating their performance. This prepared dataset acts as a foundation for teaching algorithms to recognize patterns and make predictions. Picture a collection of photos where each image is marked with descriptions – labeled data similarly provides clear examples for machines to learn from, though preparation can be time-consuming.

Regression Models

Models that predict numeric outcomes based on relationships between variables. For example, a regression model might predict sales from advertising spend.
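
As a rough sketch of the sales example, the code below fits a straight line by ordinary least squares. The figures are invented and lie exactly on a line so the result is easy to check. Real data would be noisier.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

ad_spend = [1, 2, 3, 4, 5]     # hypothetical predictor
sales = [12, 14, 16, 18, 20]   # hypothetical outcome: exactly 10 + 2 * spend

slope, intercept = fit_line(ad_spend, sales)
predicted_sales = intercept + slope * 6  # prediction for an unseen spend of 6
print(slope, intercept, predicted_sales)
```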

Cross-Validation

Validation technique that assesses how well a model will generalize to new, unseen data by testing it on different data subsets. This method helps prevent overfitting and ensures reliable model performance. Like testing a recipe with different ingredients to ensure it works consistently, cross-validation evaluates model reliability across various data combinations.
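
One widely used variant is k-fold cross-validation, where each subset takes a turn as the held-out test set. The helper below is an illustrative sketch with a hypothetical name. It only produces the index splits and leaves model fitting out.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

In practice the data is usually shuffled first, and the model's score is averaged across the folds.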

Model Training

Process of teaching machine learning algorithms to recognize patterns and make decisions using sample data. This crucial phase involves feeding the model examples to help it learn and improve its performance. Consider how a student learns through practice problems – model training similarly involves presenting various scenarios to help the algorithm develop accurate prediction capabilities.

Neural Networks

Computing systems inspired by biological brain structures, designed to recognize patterns and learn from examples. These interconnected nodes process information in layers, enabling complex problem-solving and pattern recognition. Much like how human brains learn from experience, neural networks adapt and improve their performance through exposure to data, making them powerful tools for AI applications.

Conversion Rate

Performance metric measuring the percentage of website visitors who complete desired actions, such as purchases or sign-ups. This crucial indicator helps evaluate marketing effectiveness and user experience success. Like calculating the success rate of sales pitches, conversion rate shows how effectively your website turns visitors into customers or achieves other business goals.

Page Views

Metric that counts the total number of web pages viewed by visitors during their sessions, including repeated views of single pages. This measurement helps understand user engagement depth and content popularity. Similar to tracking how many rooms a museum visitor explores, page views indicate how extensively users interact with your website's content and navigation patterns.

Session Duration

Time measurement tracking how long users remain actively engaged during a single visit to a website. This metric helps evaluate content engagement and user interest levels. Think of it as a stopwatch that begins when visitors enter your site and stops when they leave or become inactive, providing insights into how compelling your content is and how well it holds attention.

Bounce Rate

Metric indicating the percentage of visitors who leave a website after viewing only one page without further interaction. This important indicator helps evaluate website effectiveness and user engagement. Picture a store where customers immediately turn around and leave – a high bounce rate similarly suggests potential issues with content relevance, design, or user experience that need addressing.
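
The calculation itself is simple: single-page sessions divided by all sessions. The session data here is invented, and analytics tools may define a bounce slightly differently.

```python
# Hypothetical log: number of pages viewed in each session.
pages_per_session = [1, 3, 1, 5, 2, 1, 1, 4]

# A session counts as a bounce when only one page was viewed.
bounces = sum(1 for pages in pages_per_session if pages == 1)
bounce_rate = 100 * bounces / len(pages_per_session)

print(f"{bounce_rate:.1f}%")
```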

User Tracking

Process of monitoring and collecting data about website visitors' interactions and behaviors online. This includes tracking page visits, clicks, scroll depth, and other engagement metrics to understand user preferences and patterns. Imagine a digital footprint tracker that records every step a visitor takes on your website, providing valuable insights into their journey and interests.

Google Analytics

Web analytics service that provides comprehensive insights into website traffic and user behavior. This powerful tool tracks various metrics, helping businesses understand their online audience and optimize their digital presence. Like having an all-seeing digital observer, it monitors everything from visitor demographics to interaction patterns, enabling data-driven decisions about website improvements.

Data Mining Techniques

Systematic processes and methodologies used to discover meaningful patterns and insights from large datasets. These techniques combine statistics, machine learning, and database management to extract valuable knowledge. Consider how archaeologists carefully uncover and analyze artifacts – data mining similarly reveals hidden patterns in data, using tools like clustering, classification, and pattern recognition to uncover business-critical insights.

Statistical Modeling

Process of creating mathematical representations of real-world phenomena to understand patterns and relationships in data. These models simplify complex systems while maintaining essential characteristics for analysis and prediction. Much like architectural models represent buildings, statistical models capture key relationships in data to guide decision-making and understanding.

Time Series Analysis

Specialized analytical approach focusing on data points collected sequentially over time, helping identify patterns, cycles, and trends. This methodology is crucial for understanding temporal patterns and making time-based predictions. Consider retail sales data – time series analysis reveals seasonal patterns, helping businesses anticipate demand fluctuations throughout the year.
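
A moving average is one of the simplest tools for exposing a trend in sequential data. The sketch below uses made-up monthly figures and a window of three.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

monthly_sales = [10, 12, 14, 20, 18, 16]  # hypothetical series, oldest first
smoothed = moving_average(monthly_sales, 3)
print(smoothed)
```

The smoothed series is shorter than the input because the first full window needs three observations.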

Regression Analysis

Statistical method that examines relationships between dependent and independent variables to understand how changes in one are associated with changes in others. This analysis supports prediction in many fields, though a statistical relationship alone does not establish cause and effect. Take housing prices – regression analysis might show how factors like location, size, and age relate to a home's value, enabling more accurate price predictions.

Forecasting Models

Mathematical and statistical techniques used to predict future trends based on historical data patterns. These models incorporate various factors and relationships to generate insights about potential future outcomes. For instance, weather forecasting models analyze atmospheric conditions and historical patterns to predict upcoming weather, though all predictions carry some degree of uncertainty.

Data Lake

Centralized repository that stores vast amounts of raw data in its native format until needed. This approach allows organizations to maintain a single source of truth while preserving data flexibility for various analytical purposes. Picture a vast warehouse where items are stored in their original packaging until specific needs arise – a data lake operates on a similar principle but for digital information.

Data Stewardship

Comprehensive management of data assets throughout their lifecycle, ensuring quality, accessibility, and security. This role combines technical expertise with organizational strategy to maintain data integrity and compliance with regulations. Similar to how a museum curator preserves and manages valuable artifacts, data stewards safeguard and optimize an organization's data resources.

Search Algorithms

Methods and techniques used to efficiently locate specific information within large datasets or structures. From basic linear searches to sophisticated binary and hash-based approaches, these algorithms form the backbone of modern information retrieval systems. Much like a well-organized library catalog helps locate books quickly, search algorithms provide systematic ways to find data effectively.
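
The difference between linear and binary search can be sketched in a few lines. Binary search relies on the list already being sorted. The version below uses the standard library's bisect module.

```python
from bisect import bisect_left

def linear_search(items, target):
    """Check every item in turn; works on unsorted data."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1

def binary_search(sorted_items, target):
    """Halve the search range each step; requires sorted input."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1

data = [2, 5, 8, 13, 21, 34]
print(linear_search(data, 13), binary_search(data, 13))
print(binary_search(data, 7))  # -1 means the value is not present
```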

Algorithmic Bias

Systematic errors in algorithms that result in unfair or discriminatory outcomes for certain groups based on characteristics like race, gender, or age. These biases often stem from historical data used to train systems, reflecting and potentially amplifying existing societal prejudices. Picture a biased referee in sports – if their past decisions show favoritism, any automated system learning from those calls would perpetuate similar patterns.

Recursive Functions

Programming concept where a function calls itself to solve complex problems by breaking them into smaller, identical tasks. This approach is particularly powerful for tasks that have a repetitive or nested structure, like calculating factorials or traversing tree data structures. Consider how Russian nesting dolls contain smaller versions of themselves – recursive functions work in a comparable way, handling each "layer" until reaching a base condition.
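
Both examples named above, factorials and tree traversal, fit in a short sketch. The tree structure used here is a hypothetical nested dictionary.

```python
def factorial(n):
    """n! computed recursively; the base case stops the chain of calls."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        return 1
    return n * factorial(n - 1)

def count_nodes(node):
    """Count a node plus all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))

tree = {
    "name": "root",
    "children": [
        {"name": "a", "children": [{"name": "a1"}]},
        {"name": "b"},
    ],
}

print(factorial(5))
print(count_nodes(tree))
```

A node without children is the base condition for count_nodes: the sum over an empty list is zero, so the recursion ends there.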

Web Standards

Comprehensive guidelines and technical specifications that ensure consistency and compatibility across the World Wide Web. These standards, maintained by organizations like W3C, cover everything from HTML structure to accessibility requirements, making websites work seamlessly across different browsers and devices. Think of them as the universal language that allows web technologies to communicate effectively, similar to how traffic rules keep roads safe and organized for everyone.

XLSX

A file with the XLSX extension is a spreadsheet in the Office Open XML format: a ZIP-compressed package of XML files. It is the default format of Microsoft Excel 2007 and later, and it can also be opened by other spreadsheet programs such as LibreOffice Calc and Google Sheets.

XLS

An XLS file is a spreadsheet file created by Microsoft Excel or exported by another spreadsheet program. It contains one or more spreadsheets, which store and display data in table format. XLS files can also store mathematical functions, charts, styles and formatting.

WEBP

WebP is a format developed by Google. It is based on the VP8 video codec and offers rich, high-quality images in a smaller size than PNG or JPEG.

Geospatial Visualization

Combination of maps and data to analyze phenomena with a spatial dimension.

Standard Deviation

Standard deviation measures the amount of variation or dispersion in a data set. It shows how far the values typically lie from the mean, and is calculated as the square root of the average squared distance from the mean.
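A short sketch of the calculation, assuming the data set is treated as a whole population rather than a sample; the list `values` and the helper `population_std` are made up for illustration.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set


def population_std(data):
    """Square root of the average squared distance from the mean."""
    mean = sum(data) / len(data)
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    return math.sqrt(variance)


manual = population_std(values)
library = statistics.pstdev(values)  # standard-library equivalent
```

For a sample drawn from a larger population, `statistics.stdev` divides by n - 1 instead of n.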

TXT

A TXT file is a standard text document containing plain, unformatted text. It can be opened and edited in any text editor or word processing program.

Algorithm Transparency

A principle that ensures the clear explanation of how algorithms function, decisions are made, and data is used.

Tokenization

The process of replacing sensitive data with unique identifiers or tokens that have no value outside of their context. It protects information such as credit card numbers or personal data.
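One way to picture this is a lookup table ("vault") that maps random tokens back to the original values. The sketch below is a toy under that assumption: the class `TokenVault`, the `tok_` prefix, and the in-memory dictionary are all hypothetical, and a real system would keep the vault in secured storage.

```python
import secrets


class TokenVault:
    """Toy in-memory vault mapping random tokens to sensitive values."""

    def __init__(self):
        self._token_to_value = {}

    def tokenize(self, value: str) -> str:
        # The token is random, so it reveals nothing about the value.
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only a party with access to the vault can recover the value.
        return self._token_to_value[token]


vault = TokenVault()
card_number = "4111 1111 1111 1111"  # widely used test number, not a real card
token = vault.tokenize(card_number)
```

Other systems can store and pass around `token` freely, since it is meaningless without the vault.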

Plain Text

In computing, plain text is text that has no formatting whatsoever.

Low Code

A software development method that uses visual tools and pre-built components to create applications with minimal coding. It accelerates development and reduces costs.

Civic Technology

Digital technologies and tools that enhance citizen participation, improve public services, and promote transparency. For example, platforms to track public budgets or report issues in urban transportation.

Dashboard

An interactive visual interface that centralizes and presents key data through charts, maps, and dynamic tables, allowing users to filter, explore, and analyze information.

Storytelling

A technique to communicate data findings clearly and persuasively by combining visualizations, narratives, and context. It helps transform complex data into understandable and actionable messages.

Proprietary Software

Proprietary software (also known as closed source software) is protected by intellectual property rights, such as copyright, and is distributed under a license that restricts how it can be used. It often requires payment, although some proprietary software is free of charge. Unlike open source software, proprietary software does not allow end users to view or modify the source code.

Open Source Software

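Open source software is released under a license that lets anyone use, study, modify, and redistribute it. It is still covered by copyright; the license is what grants these permissions. Its code can be viewed and modified by anyone with programming knowledge.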

Software As A Service (SaaS)

Sometimes organizations choose to subscribe to software services instead of installing expensive and complex software. These services are usually available online and are not tied to a specific device. An added advantage of SaaS is that you typically pay only for what you use, through a subscription or usage-based fee.

Data Management System

Data management systems enable organizations to maximize the information they can get from their data. With them, organizations can collate all their data, collect data from outside sources, and query their database to discover new information.

Data Scraping

Data scraping is a way of getting data from a website into a local file on the computer, such as a spreadsheet.

Optical Character Recognition (OCR)

A technology that converts printed or handwritten text into digitized, editable text. It is useful for digitizing documents, extracting information, and automating processes. Example: scanning invoices and converting them into text for accounting analysis.

Proportion

A proportion, in general, refers to a part, share, or number considered in comparative relation to a whole. If two sets of numbers increase or decrease in the same ratio, they are said to be directly proportional to each other. For example, 2/4, 4/8, and 1/2 all express the same proportion.

Natural Language Processing (NLP)

A branch of artificial intelligence that enables machines to understand, interpret, and generate human language. It is used in tasks such as machine translation and chatbots.

Open Data Portal

Open data portals are online user interfaces that give users access to collections of open data. The organizations that most commonly publish data through open data portals are governments and research organizations.

PNG

Portable Network Graphics (PNG) is a raster graphics file format that uses lossless image compression. It was designed as an improved, patent-free replacement for the Graphics Interchange Format (GIF).

PDF

A file with the .pdf extension is a Portable Document Format (PDF) file. The format was created with two goals in mind: that people could open the documents on any hardware or operating system without needing the application used to create them, and that the layout of the document would be retained when opened.

Large Language Models (LLM)

Artificial intelligence algorithms trained on large volumes of text to understand and generate human language coherently. They are useful in tasks such as writing, translation, and text analysis.

Mode

It is the value that appears most frequently in a set or group of values. It is possible for a group of values to have two or more modes, or for there to be none. It is an easy measure to interpret, and the values can be either quantitative or qualitative.
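A small sketch of finding the mode or modes, assuming the convention that a set in which no value repeats has no mode; the helper name `modes` and the sample lists are illustrative.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Convention assumed here: no repeated value means no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


one_mode = modes(["red", "blue", "red", "green"])  # qualitative values
two_modes = modes([1, 2, 2, 3, 3])                 # quantitative values
no_mode = modes([1, 2, 3])
```

The standard library's `statistics.multimode` does a similar job but follows a different convention: when no value repeats, it returns all of the values.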

Data Mining

The goal of the data mining process is to extract as much value and "usable" information as possible from the raw data.

Data Migration

The process of transferring data from one system or format to another, ensuring its integrity and compatibility. It is key in technological upgrades or platform changes.

Microsite

A webpage or set of pages dedicated to a specific topic, campaign, or project, separate from the main website. It can be external, such as a promotional site, or internal, designed for communication and resources within an organization, such as training or internal updates.

Small Data

Simple data that is easy for humans to process and understand. It is relevant for solving specific or local problems, such as purchase records in a small store.

Metadata

Metadata is data about data. A single piece of data usually comes with several pieces of metadata that describe it. A good example is a document on your computer. The document itself is the data, and information such as the time and date of creation, file size, and storage location are the metadata.

Median

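The median is the middle value of a list of data when the values are arranged in order. If the list has an even number of values, the median is the average of the two middle values.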
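The calculation can be sketched as follows, assuming numeric data; `manual_median` is a hypothetical helper shown next to the standard library's `statistics.median`.

```python
import statistics


def manual_median(values):
    """Sort the data, then take the middle value (or average the two middle ones)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_values = [7, 1, 3]      # sorted: 1, 3, 7
even_values = [4, 1, 3, 2]  # sorted: 1, 2, 3, 4

median_odd = manual_median(odd_values)
median_even = manual_median(even_values)
library_even = statistics.median(even_values)
```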

Mean

The result obtained by adding two or more quantities and dividing the total by the number of quantities. Some characteristics of the mean are that it considers all values, that the numerator in the formula is the sum of the values and the denominator is the number of values, and that when there are extreme values it may not provide an accurate representation of the sample.
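To illustrate both the formula and the effect of an extreme value, here is a brief sketch with made-up numbers; the lists `typical` and `with_outlier` are hypothetical.

```python
import statistics

typical = [10, 12, 11, 13]      # hypothetical measurements
with_outlier = typical + [200]  # the same data plus one extreme value

# Mean = sum of the values divided by the number of values.
mean_typical = sum(typical) / len(typical)
mean_outlier = statistics.mean(with_outlier)
```

A single extreme value pulls the mean from 11.5 to 49.2, far from where most of the data lies.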

Data Cleaning

The process of preparing data for proper use and analysis. It includes tasks such as handling missing values, correcting errors, adjusting formats, and verifying consistency, so that the data is reliable and accurate.

Programming Language

Programming languages allow humans to interact with computers in terms that both parties can understand and interpret. Some of the most common ones are Python, JavaScript, C#, C++ and C.

Structured Query Language (SQL)

SQL is a language for querying, inserting, updating, and deleting data stored in a relational database management system.
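As a small sketch of those operations, the following runs SQL statements against an in-memory SQLite database through Python's built-in `sqlite3` module. The table `cities` and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database, nothing is written to disk
conn.execute("CREATE TABLE cities (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [("Alpha", 700), ("Beta", 250), ("Gamma", 120)],  # made-up rows
)

# Query: find cities above a threshold.
large = conn.execute(
    "SELECT name FROM cities WHERE population > ? ORDER BY name", (200,)
).fetchall()

# Edit and delete existing rows.
conn.execute("UPDATE cities SET population = ? WHERE name = ?", (130, "Gamma"))
conn.execute("DELETE FROM cities WHERE name = ?", ("Beta",))

remaining = conn.execute(
    "SELECT name, population FROM cities ORDER BY name"
).fetchall()
conn.close()
```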

JSON

JavaScript Object Notation (JSON) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute-value pairs and arrays.
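A minimal sketch of storing and reading back such an object with Python's built-in `json` module; the `record` contents are made up.

```python
import json

# Attribute-value pairs plus an array, as described above.
record = {"name": "Ada", "languages": ["Python", "R"], "active": True}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object
```

The intermediate `text` is an ordinary human-readable string, which is what makes JSON convenient for exchanging data between programs written in different languages.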

JPEG

The .jpeg or .jpg file extensions are used for image files compressed to the Joint Photographic Experts Group (JPEG) standard. They support up to 24-bit color and use lossy compression, which can significantly reduce image quality when a high compression level is applied.

Interoperability

Ability of digital systems and services to exchange, understand, and use data in a fluid and standardized way. It is a fundamental principle in technological development that allows different platforms, applications, and databases to communicate efficiently with each other, regardless of their architecture or provider.

Internet Of Things (IoT)

More and more household devices are connecting to the Internet, giving rise to the concept of the Internet of Things (IoT): the integration of everyday objects with the network. In homes, this technology aims to make users' lives easier, while on a larger scale, it drives the development of smart cities by optimizing urban services.

User Interface

The user interface is the part of a piece of software that allows the user to interact directly with it. Websites and microsites are common user interfaces, and may contain numerous interactive elements such as drop-down menus or progress bars.

Application Programming Interface (API)

An API can be thought of as a messenger. It goes back and forth between two applications, receiving a request and returning a response.

Artificial Intelligence (AI)

Technology that simulates human capabilities such as learning, reasoning, or perception, used in applications like chatbots, virtual assistants, or recommendation systems.

Data Integration

Data integration is an aspect of data management that focuses on bringing together data from many different sources. Integrating data properly minimizes the margin of error in all data-driven decisions an organization makes.

Insight

Valuable knowledge or conclusion derived from the analysis of data or information. It helps understand trends, solve problems, or make strategic decisions.

Data Engineering

A data engineer creates structures to host and connect data. Before a data scientist can analyze large data sets, a data engineer first needs to build the mechanisms necessary to collect and process this data.

Data Infrastructure

Systems and resources required to collect, store, process, and share data.

Personally Identifiable Information (PII)

PII is the name given to any information related to a specific individual. It can be very basic information, such as a name or number, or very sensitive information, such as bank details or medical records. Because PII can say something about private individuals, it is often regulated by data protection legislation.

Hackathon

A hackathon is an event in which developers, data scientists, and people with related profiles work intensively on a specific project for a limited period of time. Hackathons can be organized to devise solutions for an organization's own projects or to tackle global problems.

Open Government

Open government is a governance model that recognizes that citizens have the right to access government documents and procedures. The concept has a broad scope, but is often linked to the ideas of access to information, participation, accountability, innovation, government coordination, integrity, civic engagement, budget transparency and anti-corruption.

Data Governance

A set of standards, processes, and institutional arrangements that manage the responsible use of data to maximize its value without compromising rights. It involves coordination among entities, adoption of common standards, and consultation with stakeholders, balancing openness, protection, innovation, and regulation.

Data Management

Data management describes the process of collecting, cataloging and processing data within an organization to achieve a certain outcome. More and more organizations are adopting "data management systems" to simplify data processes. These data management systems aim to make data management an everyday activity for non-technical staff.

Extract, Transform And Load (ETL)

It is the process by which data is taken from one source and moved into a larger container, or database, alongside lots of other data. Its name describes the process: data is taken ("extracted") from a source, converted ("transformed") into a uniform format and placed ("loaded") into a larger store. The aim is to store the data in a logical way so that it is easier to manipulate and use.
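The three steps can be sketched as three small functions. This is a toy example under stated assumptions: the source is an inline CSV string (`RAW_CSV`), the destination is an in-memory SQLite table named `sales`, and the function names are illustrative.

```python
import csv
import io
import sqlite3

# Hypothetical messy source data: inconsistent case and stray spaces.
RAW_CSV = "city,sales\nalpha, 10\nBeta,20 \nalpha,5\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert every row into one uniform format."""
    return [(row["city"].strip().title(), int(row["sales"])) for row in rows]


def load(rows, conn):
    """Load: place the cleaned rows into the larger store."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (city TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
).fetchall()
```

Once loaded in a uniform format, the data can be summarized with a single query, which is the practical payoff of the transform step.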

User Experience (UX)

User Experience (UX) aims to understand how users react to and feel about specific digital products, such as websites or applications. UX designers, using user-centered methodologies, create interfaces designed to maximize interaction and encourage user engagement.

Statistics

A statistic is a numerical value calculated from a sample data set that characterizes some aspect of that sample. It usually serves to estimate the true value of a corresponding parameter in an underlying population.

Embedding

To integrate or embed content, such as videos, graphics, or applications, within a webpage or another platform. It allows viewing external resources without leaving the current environment.

DOCX

A .docx file is a document file in Microsoft Word's Open XML format. These files are smaller and easier to support than .doc files because the format is XML-based: all content is stored as separate files, which are then packed into a single ZIP-compressed file.

Dispersion

In statistics, it is a means of describing how spread out data is around a central value or point. It helps to understand the distribution of data. A smaller spread indicates greater consistency in a manufacturing process or in data measurements, while a larger spread means less consistency.

Full-Stack Developer

A full-stack developer can work on both back-end and front-end development, so they have an overview of all aspects of building a website or software.

Front-End Developer

Some programmers specialize in creating the "front-end" or graphical interface of a website or software that users interact with. While back-end developers focus on building the components and functions that make a website or software work, front-end developers build the applications that allow users to access these components.

Back-End Developer

Back-end programmers specialize in the "behind the scenes" of a website or software. They deal with how things work on the inside and create the components that the user accesses through the front-end application.

Data Democratization

Sometimes, those who produce and manage data or information for public consumption are people in positions of power who decide what information to show. Increasingly, however, anyone can create, use and make decisions from data, which is known as data democratization.

Numerical Data

Numerical data is a type of data expressed in numbers. It is sometimes referred to as quantitative data. It differs from other data that merely takes the form of numbers, such as phone numbers or postal codes, in that arithmetic operations on it are meaningful.

Structured Data

In most cases, structured data are quantitative data. They are easy to organize in spreadsheets, relational databases and to visualize. Examples include names, order numbers and geolocation. It is easier to generate information from structured data than from unstructured data.

Raw Data

Raw data is data that has not been processed or transformed in any way. It is data that has been taken directly from the source.

Discrete Data

Discrete data is a type of quantitative data that includes numbers and statistics from individual, non-divisible data points that can be counted. Discrete data points are usually written as numbers that represent exact values, and discrete data usually represent single events that have already occurred.

Unstructured Data

Unstructured data is usually qualitative data and is often stored in NoSQL databases. They are useful, but sometimes not so practical for analysis and information generation purposes, as they cannot be visualized well in analysis tools such as graphs and tables. Some data of this type are video, audio or satellite images.

Categorical Data

Categorical data is data that can be divided into groups or categories.

Open Data

Open data is data that can be freely accessed, used, distributed and copied by anyone. It is typically published under an open license or placed in the public domain, so intellectual property rights do not restrict its reuse.

Splines

A spline is a mathematical function that generates a smooth and continuous curve from data points, fitting them precisely. It is used in computer graphics, geometric modeling, and data analysis to connect points with seamless transitions. It enables the creation of complex shapes with control over the curve while minimizing jumps between segments. Tools like PowerPoint or Illustrator make it easy to visualize with "curve line" options.

Data Culture

It can be understood from two perspectives. The first has to do with an organization's ability to use data to make data-driven decisions. The second is about the linkage and connections between culture, art and data. In either case, it speaks to the appropriation of data for conscious action.

CSV

CSV stands for "comma separated values". A CSV file, as the name implies, divides data with commas. This makes it easy to export them to tables such as spreadsheets, as the comma delimitation of the data gives them their own "field".
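A brief sketch of reading and writing CSV with Python's built-in `csv` module; the sample text is made up. It also shows a common wrinkle: a field that itself contains a comma is wrapped in quotes so it stays one field.

```python
import csv
import io

# Hypothetical CSV text; the first data field contains a comma, so it is quoted.
text = 'name,city\n"Doe, Jane",Lima\nJohn,Quito\n'

# Each line becomes a list of fields.
rows = list(csv.reader(io.StringIO(text)))

# Writing the rows back out re-applies the quoting where needed.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
round_trip = list(csv.reader(io.StringIO(buffer.getvalue())))
```

Using a CSV library instead of splitting on commas by hand avoids breaking quoted fields like `"Doe, Jane"`.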

Choropleth

A map in which delimited areas, such as countries or districts, are shaded or patterned according to the value of some property, showing how that property is distributed across regions.

Database Query

When used in relation to databases, the word "query" has a very similar meaning to "search". If someone performs a query on a database, it means that they have searched for a specific piece of data or set of data.

Data Set

A data set is a collection of data with a shared theme. People frequently look for data sets for their research. They are practical because, when analyzed as a whole, they provide the full context of a problem.

Cloud Computing

Technology that enables access to storage, processing, and applications over the internet without relying on local infrastructure.

CMS

A content management system (CMS) is software used to manage the content of a website.

CKAN

An open-source platform for efficiently managing and publishing open data. It facilitates the organization, access, and visualization of datasets, promoting transparency and data sharing.

Data Science

It can be described as the study of data. Data scientists conduct experiments and research, committed to solving problems and finding answers. They piece together different data, dissect patterns, look for anomalies, generate charts and graphs, explore machine learning and artificial intelligence. If data mining is about extracting value, data science is about generating value.

Cybersecurity

A set of practices and technologies designed to protect systems, networks, and data from attacks or unauthorized access.

Chatbot

An interface that uses natural language to automate tasks, answer questions, or explore complex data.

Data Catalog

A data catalog is a metadata management tool that companies use to inventory and organize data in their systems. Typical benefits include improvements in data discovery, governance and information access.

Responsiveness

The ability of a system or design to quickly adapt to user needs, such as automatically adjusting to the screen size.

Data Quality

High quality data is accurate, complete, consistent and valid. The better its quality, the greater the likelihood of obtaining valuable information from it.

Big Data

Big data is a field that works on solutions for collecting, transporting and storing very large amounts of data.

NoSQL Database

Some unstructured data, such as full documents, texts or videos, are often stored in NoSQL databases. These are databases designed with different types of data in mind. A NoSQL database is non-relational, which means that it does not require data to fit into the fixed tables of rows and columns used by relational databases.

Data Automation

The process of using tools and technologies to collect, process, and analyze data without manual intervention. It improves efficiency and reduces errors, such as automating daily sales report updates.

Information Architecture

The organization and logical structuring of content and data to facilitate access and understanding. It is applied in web design, applications, and complex systems, where clear menus and well-defined categories enhance the user experience.

Supervised Learning

A machine learning technique in which a model is trained with labeled data to make predictions or classifications. For example, identifying emails as "spam" or "not spam" based on previously classified examples.
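The spam example can be sketched with a deliberately crude word-count classifier. This is a toy under stated assumptions, not a production technique: the training messages are invented, and `train` and `predict` are hypothetical helpers that simply count how often each word appeared under each label.

```python
from collections import Counter

# Labeled examples: each text comes with its known, correct answer.
training_data = [
    ("win a free prize now", "spam"),
    ("free money win big", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("lunch on monday with the team", "not spam"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    for text, label in examples:
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts


def predict(model, text):
    """Pick the label whose training words best match the new text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)


model = train(training_data)
first = predict(model, "free prize inside")
second = predict(model, "monday team meeting")
```

The model never sees explicit rules; it infers them from the labeled examples, which is the defining trait of supervised learning.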

Deep Learning

A subfield of artificial intelligence (AI) and machine learning (ML) that uses artificial neural networks inspired by the structure of the human brain to process large amounts of data and solve complex problems. It is widely used in various applications due to its ability to learn hierarchical representations of data.

Machine Learning

Machine learning is a type of artificial intelligence whereby machines or computer algorithms "learn" to improve through experience.

Web Analytics

The process of collecting, measuring, and analyzing data on user behavior on a website. It helps optimize user experiences, identify trends, and make strategic decisions.

Predictive Analytics

A technique that uses historical data, algorithms, and statistical models to forecast future events or trends. It helps in making anticipatory decisions.

Data Warehouse

A data warehouse is a central repository that stores data consolidated from multiple sources, usually cleaned and structured, so that it can be queried and analyzed.

Algorithm

An algorithm can be understood as a set of instructions or a kind of recipe for carrying out a process. When applied to computers, an algorithm tells a computer how to perform a certain task. So, every time you give it the command to do something, the computer will know what steps to follow.

Accessibility

Ensuring that all individuals, regardless of their abilities or contexts, can interact with data and analysis tools. This involves creating inclusive interfaces, understandable formats, and open data.