Updated manuals, moved configuration files and profiles

2025-12-12 10:14:37 +01:00 · 2025-12-12 10:14:37 +01:00 · 9538919374
commit 9538919374
parent 79ed9c1d72
7 changed files with 1044 additions and 34 deletions
--- a/doc/English-manual.md
+++ b/doc/English-manual.md
@@ -184,4 +184,402 @@ src/module/b_copy.py                    118     8     10     5     2.4    76   -
 ---
-If you want, I can add a short step-by-step example that shows how to create a baseline, run duplicates, and export a CSV + UCC-style report from the GUI and from the CLI. Would you like a full worked example with sample files and commands?
+## 9. Duplicate Detection: Algorithms and Technical Details
 This section provides a deeper understanding of how PyUCC identifies duplicate code, what the algorithms do, and how to interpret the results.
 ### 9.1 Exact Duplicate Detection
 **How it works:**
 - PyUCC normalizes each file (strips leading/trailing whitespace from each line and optionally converts the text to lowercase).
 - Computes a SHA1 hash of the normalized content.
 - Files with identical hashes are considered exact duplicates.
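
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not PyUCC's actual code: the function names `normalized_sha1` and `group_exact_duplicates` are hypothetical, and the input is assumed to be a mapping of file name to file content already read into memory.

```python
import hashlib


def normalized_sha1(text: str, lowercase: bool = False) -> str:
    """Hash the content after stripping whitespace around each line."""
    lines = [line.strip() for line in text.splitlines()]
    if lowercase:
        lines = [line.lower() for line in lines]
    normalized = "\n".join(lines)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_exact_duplicates(files: dict[str, str]) -> list[list[str]]:
    """Group file names whose normalized content has the same digest."""
    by_digest: dict[str, list[str]] = {}
    for name, content in files.items():
        by_digest.setdefault(normalized_sha1(content), []).append(name)
    # Only groups with two or more members are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

Because whitespace around each line is discarded before hashing, two files that differ only in indentation or trailing spaces land in the same group.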
 **Use case:** Finding files that were copy-pasted unchanged, or changed only in leading/trailing whitespace per line (and letter case, when lowercasing is enabled), e.g. `utils.py` and `utils_backup.py`.
 **What you'll see:**
 - In the GUI table: pairs of files marked as "exact" duplicates with 100% similarity.
 - In the report: listed under "Exact duplicates" section.
 ### 9.2 Fuzzy Duplicate Detection (Advanced)
 Fuzzy detection identifies files that are *similar* but not identical. This is useful for finding:
 - Code that was copy-pasted and then slightly modified.
 - Refactored modules that share large blocks of logic.
 - Experimental branches or "almost-duplicates" that should be merged.
 **Algorithm Overview:**
 1. **K-gram Hashing (Rolling Hash with Rabin-Karp):**
   - Each file is divided into overlapping sequences of `k` consecutive lines (k-grams).
   - A rolling hash (Rabin-Karp polynomial hash) is computed for each k-gram.
   - This produces a large set of hash values representing all k-grams in the file.
 2. **Winnowing (Fingerprint Selection):**
   - To reduce the number of hashes (and improve performance), PyUCC applies a "winnowing" technique.
   - A sliding window of size `w` moves over the hash sequence.
   - In each window, the minimum hash value is selected as a fingerprint.
   - This creates a compact set of representative fingerprints for the file.
   - **Key property:** If two files share a run of at least `k + w - 1` identical consecutive lines, they will share at least one fingerprint.
 3. **Inverted Index:**
   - All fingerprints from all files are stored in an inverted index: `{fingerprint -> [list of files containing it]}`.
   - This allows fast lookup of which files share fingerprints.
 4. **Jaccard Similarity:**
   - For each pair of files that share at least one fingerprint, PyUCC computes the Jaccard similarity:
     ```
     Jaccard(A, B) = |A ∩ B| / |A ∪ B|
     ```
   - Where A and B are the sets of fingerprints for the two files.
   - If the Jaccard score is above the threshold (default: 0.85, meaning 85% similarity), the pair is flagged as a fuzzy duplicate.
 5. **Percent Change Calculation:**
   - PyUCC also estimates the percentage of lines that differ between the two files.
   - If `pct_change <= threshold` (e.g., ≤5%), the files are considered duplicates.
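
The fingerprinting pipeline above (k-grams, winnowing, inverted index, Jaccard) can be sketched as follows. This is an illustration under stated assumptions, not PyUCC's implementation: all function names are hypothetical, and a truncated SHA-1 digest per k-gram stands in for the rolling Rabin-Karp hash. It yields the same kind of hash sequence, just without the incremental update.

```python
import hashlib


def kgram_hashes(lines: list[str], k: int) -> list[int]:
    """Hash every run of k consecutive lines (stand-in for a rolling hash)."""
    hashes = []
    for i in range(len(lines) - k + 1):
        chunk = "\n".join(lines[i:i + k]).encode("utf-8")
        hashes.append(int.from_bytes(hashlib.sha1(chunk).digest()[:8], "big"))
    return hashes


def winnow(hashes: list[int], w: int) -> set[int]:
    """Keep the minimum hash of each window of w consecutive hashes."""
    if not hashes:
        return set()
    if len(hashes) < w:
        return {min(hashes)}
    return {min(hashes[i:i + w]) for i in range(len(hashes) - w + 1)}


def fingerprints(text: str, k: int, w: int) -> set[int]:
    """Fingerprint set of a file: normalize lines, hash k-grams, winnow."""
    lines = [line.strip() for line in text.splitlines()]
    return winnow(kgram_hashes(lines, k), w)


def jaccard(a: set[int], b: set[int]) -> float:
    """Jaccard(A, B) = |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(fps: dict[str, set[int]]) -> set[tuple[str, str]]:
    """Use an inverted index so only files sharing a fingerprint are paired."""
    index: dict[int, list[str]] = {}
    for name, fp in fps.items():
        for h in fp:
            index.setdefault(h, []).append(name)
    pairs = set()
    for names in index.values():
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                pairs.add((a, b) if a < b else (b, a))
    return pairs
```

Only the pairs returned by `candidate_pairs` need a Jaccard computation, which is what keeps the comparison from being quadratic in the number of files when most files share nothing.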
 **Parameters you can adjust:**
 - **`k` (k-gram size):** Number of consecutive lines in each k-gram. Default: 25.
  - Larger `k` → fewer false positives, but may miss small duplicates.
  - Smaller `k` → more sensitive, but may produce false positives.
 - **`window` (winnowing window size):** Size of the window for selecting fingerprints. Default: 4.
  - Larger window → fewer fingerprints, faster processing, but may miss some matches.
  - Smaller window → more fingerprints, slower, but more thorough.
 - **`threshold` (percent change threshold):** Maximum allowed difference (in %) to still consider two files duplicates. Default: 5.0%.
  - Lower threshold → stricter matching (only very similar files).
  - Higher threshold → more lenient (catches files with more differences).
 **Recommended settings:**
 | Use Case | k | window | threshold |
 |----------|---|--------|----------|
 | Strict duplicate finding (only near-identical files) | 30 | 5 | 3.0% |
 | Balanced (default) | 25 | 4 | 5.0% |
 | Loose matching (catch refactored code) | 20 | 3 | 10.0% |
 | Very aggressive (experimental) | 15 | 2 | 15.0% |
 ### 9.3 Understanding Duplicate Reports
 **GUI Table Columns:**
 - **File A / File B:** The two files being compared.
 - **Match Type:** "exact" or "fuzzy".
 - **Similarity (%):** For fuzzy matches, the Jaccard similarity score (0-100%).
 - **Pct Change (%):** Estimated percentage of lines that differ.
 **Textual Report (duplicates_report.txt):**
 The report is divided into two sections:
 1. **Exact Duplicates:**
   ```
   Exact duplicates: 3
   src/utils.py <=> src/backup/utils_old.py
   src/module/helper.py <=> src/module/helper - Copy.py
   ```
 2. **Fuzzy Duplicates:**
   ```
   Fuzzy duplicates (threshold): 5
   src/processor.py <=> src/processor_v2.py
     Similarity: 92.5% | Pct Change: 3.2%
   src/core/engine.py <=> src/experimental/engine_new.py
     Similarity: 88.0% | Pct Change: 4.8%
   ```
 **Interpretation:**
 - **High similarity (>95%):** Strong candidates for deduplication. Consider keeping only one version or merging.
 - **Medium similarity (85-95%):** Review manually. May indicate refactored code or intentional variations.
 - **Threshold violations:** Files that exceed the `pct_change` threshold won't appear in the report, even if they share some fingerprints.
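
For scripted post-processing, a report in the layout shown above could be parsed roughly like this. The sketch assumes the real `duplicates_report.txt` follows the sample exactly (a section header line, `A <=> B` pair lines, and an optional `Similarity | Pct Change` line). The function name and the dictionary keys are hypothetical.

```python
import re

PAIR_RE = re.compile(r"^(?P<a>.+?) <=> (?P<b>.+)$")
SCORE_RE = re.compile(r"Similarity:\s*([\d.]+)%\s*\|\s*Pct Change:\s*([\d.]+)%")


def parse_duplicates_report(text: str) -> list[dict]:
    """Return one dict per reported pair, tagged 'exact' or 'fuzzy'."""
    pairs: list[dict] = []
    match_type = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Exact duplicates"):
            match_type = "exact"
        elif line.startswith("Fuzzy duplicates"):
            match_type = "fuzzy"
        elif (m := PAIR_RE.match(line)) and match_type:
            pairs.append({"type": match_type, "file_a": m["a"], "file_b": m["b"]})
        elif (s := SCORE_RE.search(line)) and pairs:
            # Score lines belong to the most recently seen pair.
            pairs[-1]["similarity"] = float(s.group(1))
            pairs[-1]["pct_change"] = float(s.group(2))
    return pairs
```

Filtering the result for `similarity > 95` then gives the strong deduplication candidates described under "Interpretation".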
 ---
 ## 10. Reading and Interpreting Differ Reports
 The Differ functionality produces several types of output. Understanding each helps you track code evolution accurately.
 ### 10.1 Compact UCC-Style Table
 When you run *Differing*, PyUCC generates a compact summary table similar to the original UCC tool:
 **Example:**
 ```
 File                                   Code   Comm  Blank  Func  AvgCC   MI   ΔCode  ΔComm  ΔBlank  ΔFunc  ΔAvgCC  ΔMI
 ---------------------------------------------------------------------------------------------------------------
 src/module/a.py                         120    10     8      5     2.3    78   +10    -1     0       +0     -0.1    +2
 src/module/b.py                         118     8     10     5     2.4    76   -2     -2     +2      0      +0.1   -2
 src/new_feature.py                       45     5      3     2     1.8    82   +45    +5     +3      +2     +1.8   +82
 src/old_code.py                          --    --     --    --     --    --   -30    -5     -2      -1     -2.1   -75
 ```
 **Column Meanings:**
 | Column | Meaning |
 |--------|--------|
 | **File** | Relative path to the file |
 | **Code** | Current number of code lines |
 | **Comm** | Current number of comment lines |
 | **Blank** | Current number of blank lines |
 | **Func** | Number of functions detected (requires `lizard`) |
 | **AvgCC** | Average cyclomatic complexity per function |
 | **MI** | Maintainability Index (0-100, higher is better) |
 | **ΔCode** | Change in code lines (current - baseline) |
 | **ΔComm** | Change in comment lines |
 | **ΔBlank** | Change in blank lines |
 | **ΔFunc** | Change in function count |
 | **ΔAvgCC** | Change in average cyclomatic complexity |
 | **ΔMI** | Change in maintainability index |
 **Color Coding (GUI):**
 - **Green rows:** New files (Added) or improved metrics (e.g., ΔAvgCC < 0, ΔMI > 0).
 - **Red rows:** Deleted files or worsened metrics (e.g., ΔAvgCC > 0, ΔMI < 0).
 - **Yellow/Orange rows:** Modified files with mixed changes.
 - **Gray rows:** Unmodified files (identical to baseline).
 **What to look for:**
 - **ΔCode >> 0:** Significant code expansion. Is it justified by new features?
 - **ΔComm < 0:** Documentation decreased. Consider adding more comments.
 - **ΔAvgCC > 0:** Complexity increased. May indicate need for refactoring.
 - **ΔMI < 0:** Maintainability worsened. Review the changes.
 - **New files with high AvgCC:** New code is already complex. Flag for review.
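
The delta columns and the checks listed above can be expressed as a small helper. This is a sketch only: the metric keys (`code`, `comm`, `avg_cc`, ...) and the `code_growth_limit` default are assumptions chosen for illustration, since the manual does not quantify "ΔCode >> 0".

```python
METRICS = ("code", "comm", "blank", "func", "avg_cc", "mi")


def metric_deltas(current: dict, baseline: dict) -> dict:
    """Delta = current - baseline, for every metric in the compact table."""
    return {f"d_{m}": round(current[m] - baseline[m], 2) for m in METRICS}


def review_flags(deltas: dict, code_growth_limit: int = 100) -> list[str]:
    """Translate the 'what to look for' rules into review hints."""
    flags = []
    if deltas["d_code"] > code_growth_limit:  # hypothetical cut-off for ">> 0"
        flags.append("large code growth")
    if deltas["d_comm"] < 0:
        flags.append("comments decreased")
    if deltas["d_avg_cc"] > 0:
        flags.append("complexity increased")
    if deltas["d_mi"] < 0:
        flags.append("maintainability worsened")
    return flags
```

Applied to the `src/module/b.py` row of the example table, the helper reports fewer comments, higher complexity, and lower maintainability.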
 ### 10.2 Detailed Diff Report (diff_report.txt)
 A textual report is saved in the baseline folder:
 **Structure:**
 ```
 PyUCC Baseline Comparison Report
 =================================
 Baseline ID: MyProject__20251205T143022_local
 Snapshot timestamp: 2025-12-05 14:30:22
 Summary:
  New files: 3
  Deleted files: 1
  Modified files: 12
  Unchanged files: 45
 Metric Changes:
  Total Code Lines: +150
  Total Comments: -5
  Average CC: +0.2 (slight increase in complexity)
  Average MI: -1.5 (slight decrease in maintainability)
 [Compact UCC-style table here]
 Legend:
  A = Added file
  D = Deleted file
  M = Modified file
  U = Unchanged file
  ...
 ```
 ### 10.3 CSV Exports
 You can export any result table to CSV for further analysis in Excel, pandas, or BI tools.
 **Columns include:**
 - File path
 - All SLOC metrics (code, comment, blank lines)
 - Complexity metrics (CC, MI, function count)
 - Deltas (if from a Differ operation)
 - Status flags (A/D/M/U)
 **Use cases:**
 - Trend analysis over multiple baselines.
 - Generating charts (e.g., complexity over time).
 - Feeding into CI/CD quality gates.
 ---
 ## 11. Practical Use Cases and Workflows
 ### Use Case 1: Detecting Copy-Paste Code Before Code Review
 **Scenario:** Your team is developing a new module. You suspect some developers copy-pasted existing code instead of refactoring.
 **Workflow:**
 1. Create a profile for your project.
 2. Click **Duplicates** button.
 3. Set threshold to 5% (strict).
 4. Review the results table.
 5. For each fuzzy duplicate pair:
   - Double-click to open both files in the diff viewer (if implemented).
   - Assess whether the duplication is intentional or should be refactored into a shared utility.
 6. Export to CSV and share with the team for discussion.
 **Expected outcome:** You identify 3-5 near-duplicate files and create tickets to consolidate them.
 ---
 ### Use Case 2: Tracking Complexity During a Refactoring Sprint
 **Scenario:** Your team plans a 2-week refactoring sprint to reduce technical debt.
 **Workflow:**
 1. **Before the sprint:** Create a baseline ("Pre-Refactor").
   - Click **Differing** → Create baseline.
   - Name it "PreRefactor_Sprint5".
 2. **During the sprint:** Developers refactor code, extract functions, add comments.
 3. **After the sprint:** Run **Differing** against the baseline.
 4. Review the compact table:
   - Check ΔAvgCC: Should be negative (complexity reduced).
   - Check ΔMI: Should be positive (maintainability improved).
   - Check ΔComm: Should be positive (more documentation).
 5. Generate a diff report and attach to sprint retrospective.
 **Expected outcome:** Quantitative proof that refactoring worked: "We reduced average CC by 15% and increased MI by 8 points."
 ---
 ### Use Case 3: Ensuring New Features Don't Degrade Quality
 **Scenario:** You're adding a new feature to a mature codebase. You want to ensure the new code doesn't introduce excessive complexity.
 **Workflow:**
 1. Create a baseline before starting feature development.
 2. Develop the feature in a branch.
 3. Before merging to main:
   - Run **Differing** to compare current state vs. baseline.
   - Filter for new files (status = "A").
   - Check AvgCC and MI of new files.
   - If AvgCC > 5 or MI < 70, flag for refactoring before merge.
 4. Use **Duplicates** to ensure new code doesn't duplicate existing utilities.
 **Expected outcome:** New feature code passes quality gates before merge.
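
The merge check in step 3 can be automated against a CSV export. The sketch below is a hedged illustration: the column names `file`, `status`, `avg_cc`, and `mi` are assumptions about the export layout, not confirmed PyUCC headers. The thresholds (AvgCC > 5, MI < 70) are the ones given in the workflow.

```python
import csv
import io


def failing_new_files(csv_text: str, max_avg_cc: float = 5.0,
                      min_mi: float = 70.0) -> list[str]:
    """Return added files (status 'A') that break the complexity or MI limit."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["status"] != "A":
            continue  # only new files are gated in this workflow
        if float(row["avg_cc"]) > max_avg_cc or float(row["mi"]) < min_mi:
            failures.append(row["file"])
    return failures
```

A pipeline step could exit with a non-zero status whenever the returned list is not empty.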
 ---
 ### Use Case 4: Generating Compliance Reports for Audits
 **Scenario:** Your organization requires periodic code quality audits.
 **Workflow:**
 1. Create baselines monthly (e.g., "Audit_2025_01", "Audit_2025_02", ...).
 2. Each baseline automatically generates:
   - `countings_report.txt`
   - `metrics_report.txt`
   - `duplicates_report.txt`
 3. Archive these reports in a compliance folder.
 4. For the audit, provide:
   - Trend of total SLOC over time.
   - Trend of average CC and MI.
   - Number of duplicates detected and resolved each month.
 **Expected outcome:** Auditors see measurable improvement in code quality metrics over time.
 ---
 ### Use Case 5: Onboarding New Developers with Code Metrics
 **Scenario:** A new developer joins the team and needs to understand the codebase.
 **Workflow:**
 1. Run **Metrics** on the entire codebase.
 2. Export to CSV.
 3. Sort by AvgCC (descending) to identify the most complex modules.
 4. Share the list with the new developer:
   - "These 5 files have the highest complexity. Be extra careful when modifying them."
   - "These modules have low MI. They're candidates for refactoring—good learning exercises."
 5. Use **Duplicates** to show which parts of the code have redundancy (explain why).
 **Expected outcome:** New developer understands code hotspots and quality issues faster.
 ---
 ## 12. Tips for Effective Use
 ### 12.1 Profile Management
 - **Create separate profiles** for different subprojects or components.
 - Use **ignore patterns** aggressively to exclude:
  - `node_modules`, `venv`, `.venv`
  - Build outputs (`build/`, `dist/`, `bin/`)
  - Generated code
  - Test fixtures or mock data
 ### 12.2 Baseline Strategy
 - **Naming convention:** Use descriptive names with dates or version tags:
  - `Release_v1.2.0_20251201`
  - `PreRefactor_Sprint10`
  - `BeforeMerge_FeatureX`
 - **Frequency:** Create baselines at key milestones:
  - End of each sprint
  - Before/after major refactorings
  - Before releases
 - **Retention:** Keep at least 3-5 recent baselines. Archive older ones.
 ### 12.3 Interpreting Metrics
 **Cyclomatic Complexity (CC):**
 - **1-5:** Simple, low risk.
 - **6-10:** Moderate complexity, acceptable.
 - **11-20:** High complexity, review recommended.
 - **21+:** Very high complexity, refactoring strongly recommended.
 **Maintainability Index (MI):**
 - **85-100:** Highly maintainable (green zone).
 - **70-84:** Moderately maintainable (yellow zone).
 - **Below 70:** Low maintainability (red zone), needs attention.
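
The bands above map directly onto two lookup functions, for example when annotating a CSV export. The function names and return labels are hypothetical. Fractional averages such as an AvgCC of 5.5 are assigned to the next band up, which is an assumption, since the manual lists integer ranges only.

```python
def cc_band(cc: float) -> str:
    """Classify a cyclomatic complexity value into the bands listed above."""
    if cc <= 5:
        return "simple"
    if cc <= 10:
        return "moderate"
    if cc <= 20:
        return "high"
    return "very high"


def mi_band(mi: float) -> str:
    """Classify a Maintainability Index value into green/yellow/red zones."""
    if mi >= 85:
        return "green"
    if mi >= 70:
        return "yellow"
    return "red"
```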
 ### 12.4 Duplicate Detection Best Practices
 - Start with **default parameters** (k=25, window=4, threshold=5%).
 - If you get too many false positives, **increase k** or **decrease threshold**.
 - If you suspect duplicates are being missed, **decrease k** or **increase threshold**.
 - Always **review fuzzy duplicates manually**—not all similarities are bad (e.g., interface implementations).
 ---
 ## 13. Troubleshooting and FAQs
 **Q: Duplicates detection is slow on large codebases.**
 **A:** 
 - Use profile filters to limit the file types analyzed.
 - Increase `window` to select fewer fingerprints per file; increasing `k` also helps slightly, since a file of `n` lines yields `n - k + 1` k-grams.
 - Exclude large auto-generated files or test fixtures.
 **Q: Why are some files missing function-level metrics?**
 **A:** 
 - Function-level analysis requires `lizard`. Install it: `pip install lizard`.
 - Some languages may not be fully supported by `lizard`.
 **Q: Differ shows files as "Modified" but I didn't change them.**
 **A:**
 - Check if line endings changed (CRLF ↔ LF).
 - Verify the file wasn't reformatted by an auto-formatter.
 - PyUCC uses content hashing—any byte-level change triggers "Modified" status.
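
The line-ending case is easy to reproduce. The sketch below uses SHA-1 purely for illustration (the manual does not say which hash Differ uses) and shows that the same text with CRLF and LF endings produces different byte-level digests unless the endings are normalized first. Both function names are hypothetical.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Digest of the raw bytes: any byte-level change alters the result."""
    return hashlib.sha1(data).hexdigest()


def digest_ignoring_line_endings(data: bytes) -> str:
    """Digest after converting CRLF to LF, for comparison purposes only."""
    return hashlib.sha1(data.replace(b"\r\n", b"\n")).hexdigest()


lf_version = b"def f():\n    return 1\n"
crlf_version = b"def f():\r\n    return 1\r\n"

# The raw digests differ even though the visible text is identical.
print(content_digest(lf_version) == content_digest(crlf_version))
# After normalizing line endings the digests match.
print(digest_ignoring_line_endings(lf_version)
      == digest_ignoring_line_endings(crlf_version))
```

If the two raw digests differ but the normalized ones match, the "Modified" status was caused by a line-ending conversion rather than a real edit.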
 **Q: How do I reset all baselines?**
 **A:** 
 - Baselines are stored in the `baseline/` folder (default).
 - Delete the baseline folder or specific baseline subdirectories to reset.
 **Q: Can I run PyUCC in CI/CD pipelines?**
 **A:** 
 - Yes! Use the CLI mode:
  ```bash
  python -m pyucc differ create /path/to/repo
  python -m pyucc differ diff <baseline_id> /path/to/repo
  python -m pyucc duplicates /path/to/repo --threshold 5.0
  ```
 - Parse the JSON output or text reports in your pipeline scripts.
 ---
--- a/doc/Italian-manual.md
+++ b/doc/Italian-manual.md
@@ -184,4 +184,402 @@ src/module/b_copy.py                    118     8     10     5     2.4    76   -
 ---
-Se vuoi, posso aggiungere un esempio passo-passo che mostra come creare una baseline, eseguire la ricerca duplicati e esportare CSV + report UCC sia da GUI che da CLI. Vuoi che lo prepari con comandi e file di esempio?
+## 9. Rilevamento Duplicati: Algoritmi e Dettagli Tecnici
 Questa sezione fornisce una comprensione più approfondita di come PyUCC identifica il codice duplicato, cosa fanno gli algoritmi e come interpretare i risultati.
 ### 9.1 Rilevamento Duplicati Esatti
 **Come funziona:**
 - PyUCC normalizza ciascun file (rimuove spazi iniziali/finali da ogni riga, converte in minuscolo opzionalmente).
 - Calcola un hash SHA1 del contenuto normalizzato.
 - I file con hash identici sono considerati duplicati esatti.
 **Caso d'uso:** Trovare file che sono stati copiati e incollati senza o con modifiche minime (es. `utils.py` e `utils_backup.py`).
 **Cosa vedrai:**
 - Nella tabella GUI: coppie di file contrassegnate come duplicati "esatti" con similarità al 100%.
 - Nel report: elencate nella sezione "Duplicati esatti".
 ### 9.2 Rilevamento Duplicati Fuzzy (Avanzato)
 Il rilevamento fuzzy identifica file che sono *simili* ma non identici. È utile per trovare:
 - Codice che è stato copiato e poi leggermente modificato.
 - Moduli rifatti che condividono grandi blocchi di logica.
 - Branch sperimentali o "quasi-duplicati" che dovrebbero essere fusi.
 **Panoramica dell'Algoritmo:**
 1. **Hashing K-gram (Rolling Hash con Rabin-Karp):**
   - Ogni file è diviso in sequenze sovrapposte di `k` righe consecutive (k-gram).
   - Viene calcolato un rolling hash (hash polinomiale Rabin-Karp) per ogni k-gram.
   - Questo produce un grande insieme di valori hash che rappresentano tutti i k-gram del file.
 2. **Winnowing (Selezione delle Impronte Digitali):**
   - Per ridurre il numero di hash (e migliorare le prestazioni), PyUCC applica una tecnica di "winnowing".
   - Una finestra scorrevole di dimensione `w` si sposta sulla sequenza di hash.
   - In ogni finestra, viene selezionato il valore hash minimo come impronta digitale.
   - Questo crea un insieme compatto di impronte rappresentative per il file.
   - **Proprietà chiave:** Se due file condividono una sottostringa di almeno `k + w - 1` righe, condivideranno almeno un'impronta digitale.
 3. **Indice Invertito:**
   - Tutte le impronte digitali di tutti i file vengono memorizzate in un indice invertito: `{impronta -> [lista di file che la contengono]}`.
   - Questo permette una ricerca veloce di quali file condividono impronte.
 4. **Similarità di Jaccard:**
   - Per ogni coppia di file che condividono almeno un'impronta digitale, PyUCC calcola la similarità di Jaccard:
     ```
     Jaccard(A, B) = |A ∩ B| / |A ∪ B|
     ```
   - Dove A e B sono gli insiemi di impronte digitali per i due file.
   - Se il punteggio Jaccard è sopra la soglia (default: 0.85, ovvero 85% di similarità), la coppia viene segnalata come duplicato fuzzy.
 5. **Calcolo Percentuale di Modifica:**
   - PyUCC stima anche la percentuale di righe che differiscono tra i due file.
   - Se `pct_change <= threshold` (es. ≤5%), i file sono considerati duplicati.
 **Parametri regolabili:**
 - **`k` (dimensione k-gram):** Numero di righe consecutive in ogni k-gram. Default: 25.
  - `k` maggiore → meno falsi positivi, ma può perdere duplicati piccoli.
  - `k` minore → più sensibile, ma può produrre falsi positivi.
 - **`window` (dimensione finestra winnowing):** Dimensione della finestra per selezionare le impronte. Default: 4.
  - Finestra maggiore → meno impronte, elaborazione più veloce, ma può perdere alcune corrispondenze.
  - Finestra minore → più impronte, più lento, ma più accurato.
 - **`threshold` (soglia percentuale di modifica):** Differenza massima consentita (in %) per considerare ancora due file come duplicati. Default: 5.0%.
  - Soglia inferiore → corrispondenza più stretta (solo file molto simili).
  - Soglia superiore → più permissiva (cattura file con più differenze).
 **Impostazioni consigliate:**
 | Caso d'Uso | k | window | threshold |
 |----------|---|--------|----------|
 | Ricerca duplicati stretta (solo file quasi identici) | 30 | 5 | 3.0% |
 | Bilanciata (default) | 25 | 4 | 5.0% |
 | Corrispondenza lassa (cattura codice rifatto) | 20 | 3 | 10.0% |
 | Molto aggressiva (sperimentale) | 15 | 2 | 15.0% |
 ### 9.3 Interpretare i Report sui Duplicati
 **Colonne Tabella GUI:**
 - **File A / File B:** I due file confrontati.
 - **Match Type:** "exact" o "fuzzy".
 - **Similarity (%):** Per corrispondenze fuzzy, il punteggio di similarità Jaccard (0-100%).
 - **Pct Change (%):** Percentuale stimata di righe che differiscono.
 **Report Testuale (duplicates_report.txt):**
 Il report è diviso in due sezioni:
 1. **Duplicati Esatti:**
   ```
   Exact duplicates: 3
   src/utils.py <=> src/backup/utils_old.py
   src/module/helper.py <=> src/module/helper - Copy.py
   ```
 2. **Duplicati Fuzzy:**
   ```
   Fuzzy duplicates (threshold): 5
   src/processor.py <=> src/processor_v2.py
     Similarity: 92.5% | Pct Change: 3.2%
   src/core/engine.py <=> src/experimental/engine_new.py
     Similarity: 88.0% | Pct Change: 4.8%
   ```
 **Interpretazione:**
 - **Alta similarità (>95%):** Forti candidati per la deduplicazione. Considera di mantenere solo una versione o di fonderle.
 - **Media similarità (85-95%):** Rivedi manualmente. Può indicare codice rifatto o variazioni intenzionali.
 - **Violazioni della soglia:** I file che superano la soglia `pct_change` non appariranno nel report, anche se condividono alcune impronte.
 ---
 ## 10. Leggere e Interpretare i Report Differ
 La funzionalità Differ produce diversi tipi di output. Comprendere ciascuno aiuta a tracciare l'evoluzione del codice con precisione.
 ### 10.1 Tabella Compatta in Stile UCC
 Quando esegui *Differing*, PyUCC genera una tabella di riepilogo compatta simile allo strumento UCC originale:
 **Esempio:**
 ```
 File                                   Code   Comm  Blank  Func  AvgCC   MI   ΔCode  ΔComm  ΔBlank  ΔFunc  ΔAvgCC  ΔMI
 ---------------------------------------------------------------------------------------------------------------
 src/module/a.py                         120    10     8      5     2.3    78   +10    -1     0       +0     -0.1    +2
 src/module/b.py                         118     8     10     5     2.4    76   -2     -2     +2      0      +0.1   -2
 src/new_feature.py                       45     5      3     2     1.8    82   +45    +5     +3      +2     +1.8   +82
 src/old_code.py                          --    --     --    --     --    --   -30    -5     -2      -1     -2.1   -75
 ```
 **Significato delle Colonne:**
 | Colonna | Significato |
 |--------|--------|
 | **File** | Percorso relativo del file |
 | **Code** | Numero corrente di righe di codice |
 | **Comm** | Numero corrente di righe di commenti |
 | **Blank** | Numero corrente di righe vuote |
 | **Func** | Numero di funzioni rilevate (richiede `lizard`) |
 | **AvgCC** | Complessità ciclomatica media per funzione |
 | **MI** | Indice di Manutenibilità (0-100, più alto è meglio) |
 | **ΔCode** | Variazione nelle righe di codice (corrente - baseline) |
 | **ΔComm** | Variazione nelle righe di commenti |
 | **ΔBlank** | Variazione nelle righe vuote |
 | **ΔFunc** | Variazione nel conteggio funzioni |
 | **ΔAvgCC** | Variazione nella complessità ciclomatica media |
 | **ΔMI** | Variazione nell'indice di manutenibilità |
 **Codifica Colori (GUI):**
 - **Righe verdi:** File nuovi (Aggiunti) o metriche migliorate (es. ΔAvgCC < 0, ΔMI > 0).
 - **Righe rosse:** File eliminati o metriche peggiorate (es. ΔAvgCC > 0, ΔMI < 0).
 - **Righe gialle/arancioni:** File modificati con cambiamenti misti.
 - **Righe grigie:** File non modificati (identici alla baseline).
 **Cosa cercare:**
 - **ΔCode >> 0:** Espansione significativa del codice. È giustificata da nuove funzionalità?
 - **ΔComm < 0:** Documentazione diminuita. Considera di aggiungere più commenti.
 - **ΔAvgCC > 0:** Complessità aumentata. Può indicare necessità di refactoring.
 - **ΔMI < 0:** Manutenibilità peggiorata. Rivedi le modifiche.
 - **Nuovi file con alto AvgCC:** Il nuovo codice è già complesso. Segnala per revisione.
 ### 10.2 Report Diff Dettagliato (diff_report.txt)
 Un report testuale viene salvato nella cartella baseline:
 **Struttura:**
 ```
 PyUCC Baseline Comparison Report
 =================================
 Baseline ID: MyProject__20251205T143022_local
 Snapshot timestamp: 2025-12-05 14:30:22
 Summary:
  New files: 3
  Deleted files: 1
  Modified files: 12
  Unchanged files: 45
 Metric Changes:
  Total Code Lines: +150
  Total Comments: -5
  Average CC: +0.2 (slight increase in complexity)
  Average MI: -1.5 (slight decrease in maintainability)
 [Tabella compatta in stile UCC qui]
 Legend:
  A = Added file
  D = Deleted file
  M = Modified file
  U = Unchanged file
  ...
 ```
 ### 10.3 Esportazioni CSV
 Puoi esportare qualsiasi tabella dei risultati in CSV per ulteriori analisi in Excel, pandas o strumenti BI.
 **Le colonne includono:**
 - Percorso file
 - Tutte le metriche SLOC (codice, commenti, righe vuote)
 - Metriche di complessità (CC, MI, conteggio funzioni)
 - Delta (se da un'operazione Differ)
 - Flag di stato (A/D/M/U)
 **Casi d'uso:**
 - Analisi dei trend su più baseline.
 - Generazione di grafici (es. complessità nel tempo).
 - Integrazione in gate di qualità CI/CD.
 ---
 ## 11. Casi d'Uso Pratici e Workflow
 ### Caso d'Uso 1: Rilevare Codice Copiato Prima della Code Review
 **Scenario:** Il tuo team sta sviluppando un nuovo modulo. Sospetti che alcuni sviluppatori abbiano copiato e incollato codice esistente invece di fare refactoring.
 **Workflow:**
 1. Crea un profilo per il tuo progetto.
 2. Clicca sul pulsante **Duplicates**.
 3. Imposta threshold a 5% (stretto).
 4. Rivedi la tabella dei risultati.
 5. Per ogni coppia di duplicati fuzzy:
   - Fai doppio click per aprire entrambi i file nel visualizzatore diff (se implementato).
   - Valuta se la duplicazione è intenzionale o dovrebbe essere rifatta in un'utility condivisa.
 6. Esporta in CSV e condividi con il team per discussione.
 **Risultato atteso:** Identifichi 3-5 file quasi duplicati e crei ticket per consolidarli.
 ---
 ### Caso d'Uso 2: Tracciare la Complessità Durante uno Sprint di Refactoring
 **Scenario:** Il tuo team pianifica uno sprint di refactoring di 2 settimane per ridurre il debito tecnico.
 **Workflow:**
 1. **Prima dello sprint:** Crea una baseline ("Pre-Refactor").
   - Clicca **Differing** → Crea baseline.
   - Nominala "PreRefactor_Sprint5".
 2. **Durante lo sprint:** Gli sviluppatori rifanno il codice, estraggono funzioni, aggiungono commenti.
 3. **Dopo lo sprint:** Esegui **Differing** contro la baseline.
 4. Rivedi la tabella compatta:
   - Controlla ΔAvgCC: Dovrebbe essere negativo (complessità ridotta).
   - Controlla ΔMI: Dovrebbe essere positivo (manutenibilità migliorata).
   - Controlla ΔComm: Dovrebbe essere positivo (più documentazione).
 5. Genera un report diff e allegalo alla retrospettiva dello sprint.
 **Risultato atteso:** Prova quantitativa che il refactoring ha funzionato: "Abbiamo ridotto il CC medio del 15% e aumentato MI di 8 punti."
 ---
 ### Caso d'Uso 3: Assicurare che Nuove Funzionalità Non Degradino la Qualità
 **Scenario:** Stai aggiungendo una nuova funzionalità a una codebase matura. Vuoi assicurarti che il nuovo codice non introduca complessità eccessiva.
 **Workflow:**
 1. Crea una baseline prima di iniziare lo sviluppo della funzionalità.
 2. Sviluppa la funzionalità in un branch.
 3. Prima del merge su main:
   - Esegui **Differing** per confrontare lo stato corrente vs. baseline.
   - Filtra per nuovi file (status = "A").
   - Controlla AvgCC e MI dei nuovi file.
   - Se AvgCC > 5 o MI < 70, segnala per refactoring prima del merge.
 4. Usa **Duplicates** per assicurarti che il nuovo codice non duplichi utility esistenti.
 **Risultato atteso:** Il codice della nuova funzionalità supera i gate di qualità prima del merge.
 ---
 ### Caso d'Uso 4: Generare Report di Conformità per Audit
 **Scenario:** La tua organizzazione richiede audit periodici sulla qualità del codice.
 **Workflow:**
 1. Crea baseline mensili (es. "Audit_2025_01", "Audit_2025_02", ...).
 2. Ogni baseline genera automaticamente:
   - `countings_report.txt`
   - `metrics_report.txt`
   - `duplicates_report.txt`
 3. Archivia questi report in una cartella di conformità.
 4. Per l'audit, fornisci:
   - Trend di SLOC totale nel tempo.
   - Trend di CC e MI medi.
   - Numero di duplicati rilevati e risolti ogni mese.
 **Risultato atteso:** Gli auditor vedono un miglioramento misurabile nelle metriche di qualità del codice nel tempo.
 ---
 ### Caso d'Uso 5: Onboarding Nuovi Sviluppatori con Metriche del Codice
 **Scenario:** Un nuovo sviluppatore si unisce al team e ha bisogno di comprendere la codebase.
 **Workflow:**
 1. Esegui **Metrics** sull'intera codebase.
 2. Esporta in CSV.
 3. Ordina per AvgCC (decrescente) per identificare i moduli più complessi.
 4. Condividi l'elenco con il nuovo sviluppatore:
   - "Questi 5 file hanno la complessità più alta. Fai particolare attenzione quando li modifichi."
   - "Questi moduli hanno MI basso. Sono candidati per refactoring—buoni esercizi di apprendimento."
 5. Usa **Duplicates** per mostrare quali parti del codice hanno ridondanza (spiega perché).
 **Risultato atteso:** Il nuovo sviluppatore comprende più velocemente i punti critici e i problemi di qualità del codice.
 ---
 ## 12. Suggerimenti per un Uso Efficace
 ### 12.1 Gestione Profili
 - **Crea profili separati** per diversi sottoprogetti o componenti.
 - Usa **pattern di ignore** in modo aggressivo per escludere:
  - `node_modules`, `venv`, `.venv`
  - Output di build (`build/`, `dist/`, `bin/`)
  - Codice generato
  - Fixture di test o dati mock
 ### 12.2 Strategia Baseline
 - **Convenzione di denominazione:** Usa nomi descrittivi con date o tag di versione:
  - `Release_v1.2.0_20251201`
  - `PreRefactor_Sprint10`
  - `BeforeMerge_FeatureX`
 - **Frequenza:** Crea baseline ai traguardi chiave:
  - Fine di ogni sprint
  - Prima/dopo refactoring importanti
  - Prima dei rilasci
 - **Ritenzione:** Mantieni almeno 3-5 baseline recenti. Archivia quelle più vecchie.
 ### 12.3 Interpretare le Metriche
 **Complessità Ciclomatica (CC):**
 - **1-5:** Semplice, basso rischio.
 - **6-10:** Complessità moderata, accettabile.
 - **11-20:** Alta complessità, revisione raccomandata.
 - **21+:** Complessità molto alta, refactoring fortemente raccomandato.
 **Indice di Manutenibilità (MI):**
 - **85-100:** Altamente manutenibile (zona verde).
 - **70-84:** Moderatamente manutenibile (zona gialla).
 - **Sotto 70:** Bassa manutenibilità (zona rossa), necessita attenzione.
 ### 12.4 Best Practice per il Rilevamento Duplicati
 - Inizia con **parametri di default** (k=25, window=4, threshold=5%).
 - Se ottieni troppi falsi positivi, **aumenta k** o **diminuisci threshold**.
 - Se sospetti che i duplicati vengano persi, **diminuisci k** o **aumenta threshold**.
 - **Rivedi sempre manualmente i duplicati fuzzy**—non tutte le similarità sono negative (es. implementazioni di interfacce).
 ---
 ## 13. Risoluzione Problemi e FAQ
 **D: Il rilevamento duplicati è lento su grandi codebase.**
 **R:** 
 - Usa i filtri del profilo per limitare i tipi di file analizzati.
 - Aumenta `k` e `window` per ridurre il numero di impronte elaborate.
 - Escludi file auto-generati di grandi dimensioni o fixture di test.
 **D: Perché alcuni file mancano di metriche a livello di funzione?**
 **R:** 
 - L'analisi a livello di funzione richiede `lizard`. Installalo: `pip install lizard`.
 - Alcuni linguaggi potrebbero non essere completamente supportati da `lizard`.
 **D: Differ mostra file come "Modified" ma non li ho modificati.**
 **R:**
 - Controlla se le terminazioni di riga sono cambiate (CRLF ↔ LF).
 - Verifica che il file non sia stato riformattato da un auto-formatter.
 - PyUCC usa hashing del contenuto—qualsiasi modifica a livello di byte attiva lo stato "Modified".
 **Q: How do I reset all baselines?**
 **A:**
 - Baselines are stored in the `baseline/` folder (default).
 - Delete the baseline folder, or specific baseline subfolders, to reset.
 **Q: Can I run PyUCC in CI/CD pipelines?**
 **A:**
 - Yes! Use CLI mode:
  ```bash
  python -m pyucc differ create /path/to/repo
  python -m pyucc differ diff <baseline_id> /path/to/repo
  python -m pyucc duplicates /path/to/repo --threshold 5.0
  ```
 - Parse the JSON output or the text reports in your pipeline scripts.
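
The commands above can also be driven from a small Python wrapper in a pipeline step. This is a sketch under assumptions: it reuses only the subcommands and the `--threshold` flag shown above, the helper names are hypothetical, and it assumes PyUCC exits with a non-zero status on failure, which the manual does not state.

```python
import subprocess
import sys


def pyucc_commands(repo, baseline_id, threshold=5.0):
    """Build the argv lists for the diff and duplicates steps shown above."""
    base = [sys.executable, "-m", "pyucc"]
    return [
        base + ["differ", "diff", baseline_id, repo],
        base + ["duplicates", repo, "--threshold", str(threshold)],
    ]


def run_all(commands):
    """Run each command in order; stop at the first non-zero exit code."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Assumption: a non-zero exit code signals a failed step.
            print(result.stderr, file=sys.stderr)
            return result.returncode
    return 0


if __name__ == "__main__":
    # Usage: python ci_check.py /path/to/repo <baseline_id>
    if len(sys.argv) == 3:
        sys.exit(run_all(pyucc_commands(sys.argv[1], sys.argv[2])))
```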
 ---
--- a/profiles.json
+++ b/profiles.json
@ -0,0 +1,189 @@
 [
  {
    "name": "target_simulator",
    "paths": [
      "C:\\src\\____GitProjects\\S1005403_RisCC\\target_simulator"
    ],
    "languages": [
      "Python"
    ],
    "ignore": [
      "__pycache__",
      "*.pyc",
      "*.pyo",
      "*.pyd",
      ".Python",
      "env",
      "venv",
      ".venv",
      "build",
      "dist",
      "*.egg-info",
      ".eggs",
      "node_modules",
      ".git",
      ".hg",
      ".svn",
      ".idea",
      ".vscode",
      ".DS_Store",
      "*.class",
      "*.o",
      "*.so",
      "*.dylib",
      ".pytest_cache",
      ".mypy_cache",
      ".cache",
      "coverage",
      ".tox",
      "pip-wheel-metadata",
      "*.log",
      "*.tmp",
      "Thumbs.db"
    ]
  },
  {
    "name": "EIF",
    "paths": [
      "C:\\src\\GRIFO-E\\REP\\Projects\\GHost",
      "C:\\src\\GRIFO-E\\REP\\Projects\\GrifoFwIF",
      "C:\\src\\GRIFO-E\\REP\\Projects\\GrifoSdkEif",
      "C:\\src\\GRIFO-E\\REP\\Projects\\AesaAntennaLibrary",
      "C:\\src\\GRIFO-E\\REP\\Projects\\RpyOut\\IDD"
    ],
    "languages": [
      "C",
      "C++"
    ],
    "ignore": []
  },
  {
    "name": "test1",
    "paths": [
      "C:\\src\\____GitProjects\\__test"
    ],
    "languages": [
      "Python"
    ],
    "ignore": [
      "__pycache__",
      "*.pyc",
      "*.pyo",
      "*.pyd",
      ".Python",
      "env",
      "venv",
      ".venv",
      "build",
      "dist",
      "*.egg-info",
      ".eggs",
      "node_modules",
      ".git",
      ".hg",
      ".svn",
      ".idea",
      ".vscode",
      ".DS_Store",
      "*.class",
      "*.o",
      "*.so",
      "*.dylib",
      ".pytest_cache",
      ".mypy_cache",
      ".cache",
      "coverage",
      ".tox",
      "pip-wheel-metadata",
      "*.log",
      "*.tmp",
      "Thumbs.db"
    ]
  },
  {
    "name": "pyucc",
    "paths": [
      "C:\\src\\____GitProjects\\SXXXXXXX_PyUcc\\pyucc"
    ],
    "languages": [
      "Python"
    ],
    "ignore": [
      "__pycache__",
      "*.pyc",
      "*.pyo",
      "*.pyd",
      ".Python",
      "env",
      "venv",
      ".venv",
      "build",
      "dist",
      "*.egg-info",
      ".eggs",
      "node_modules",
      ".git",
      ".hg",
      ".svn",
      ".idea",
      ".vscode",
      ".DS_Store",
      "*.class",
      "*.o",
      "*.so",
      "*.dylib",
      ".pytest_cache",
      ".mypy_cache",
      ".cache",
      "coverage",
      ".tox",
      "pip-wheel-metadata"
    ]
  },
  {
    "name": "DSP",
    "paths": [
      "C:\\__temp\\Metrics\\attuale\\REP\\Projects\\DSP",
      "C:\\__temp\\Metrics\\attuale\\REP\\Projects\\DspAlgorithms"
    ],
    "languages": [
      "C",
      "C++"
    ],
    "ignore": [
      "__pycache__",
      "*.pyc",
      "*.pyo",
      "*.pyd",
      ".Python",
      "env",
      "venv",
      ".venv",
      "build",
      "dist",
      "*.egg-info",
      ".eggs",
      "node_modules",
      ".git",
      ".hg",
      ".svn",
      ".idea",
      ".vscode",
      ".DS_Store",
      "*.class",
      "*.o",
      "*.so",
      "*.dylib",
      ".pytest_cache",
      ".mypy_cache",
      ".cache",
      "coverage",
      ".tox",
      "pip-wheel-metadata",
      "*.log",
      "*.tmp",
      "Thumbs.db",
      "*.bak"
    ]
  }
 ]
--- a/pyucc/_version.py
+++ b/pyucc/_version.py
@ -6,10 +6,10 @@
 import re
 # --- Version Data (Generated) ---
-__version__ = "v.0.0.0.16-0-gd211a7d-dirty"
+__version__ = "v.0.0.0.19-0-g79ed9c1-dirty"
-GIT_COMMIT_HASH = "d211a7d3549a26561a54d32bc2ccd15abef5d714"
+GIT_COMMIT_HASH = "79ed9c1d728bb53ace64ca2c232eb3c038b69152"
 GIT_BRANCH = "master"
-BUILD_TIMESTAMP = "2025-12-01T13:12:51.174558+00:00"
+BUILD_TIMESTAMP = "2025-12-12T09:07:06.615288+00:00"
 IS_GIT_REPO = True
 # --- Default Values (for comparison or fallback) ---
@ -17,7 +17,6 @@ DEFAULT_VERSION = "0.0.0+unknown"
 DEFAULT_COMMIT = "Unknown"
 DEFAULT_BRANCH = "Unknown"
 # --- Helper Function ---
 def get_version_string(format_string=None):
    """
@ -45,38 +44,28 @@ def get_version_string(format_string=None):
    replacements = {}
    try:
-        replacements["version"] = __version__ if __version__ else DEFAULT_VERSION
+        replacements['version'] = __version__ if __version__ else DEFAULT_VERSION
-        replacements["commit"] = GIT_COMMIT_HASH if GIT_COMMIT_HASH else DEFAULT_COMMIT
+        replacements['commit'] = GIT_COMMIT_HASH if GIT_COMMIT_HASH else DEFAULT_COMMIT
-        replacements["commit_short"] = (
+        replacements['commit_short'] = GIT_COMMIT_HASH[:7] if GIT_COMMIT_HASH and len(GIT_COMMIT_HASH) >= 7 else DEFAULT_COMMIT
-            GIT_COMMIT_HASH[:7]
+        replacements['branch'] = GIT_BRANCH if GIT_BRANCH else DEFAULT_BRANCH
-            if GIT_COMMIT_HASH and len(GIT_COMMIT_HASH) >= 7
+        replacements['timestamp'] = BUILD_TIMESTAMP if BUILD_TIMESTAMP else "Unknown"
-            else DEFAULT_COMMIT
+        replacements['timestamp_short'] = BUILD_TIMESTAMP.split('T')[0] if BUILD_TIMESTAMP and 'T' in BUILD_TIMESTAMP else "Unknown"
-        )
+        replacements['is_git'] = "Git" if IS_GIT_REPO else "Unknown"
-        replacements["branch"] = GIT_BRANCH if GIT_BRANCH else DEFAULT_BRANCH
+        replacements['dirty'] = "-dirty" if __version__ and __version__.endswith('-dirty') else ""
        replacements["timestamp"] = BUILD_TIMESTAMP if BUILD_TIMESTAMP else "Unknown"
        replacements["timestamp_short"] = (
            BUILD_TIMESTAMP.split("T")[0]
            if BUILD_TIMESTAMP and "T" in BUILD_TIMESTAMP
            else "Unknown"
        )
        replacements["is_git"] = "Git" if IS_GIT_REPO else "Unknown"
        replacements["dirty"] = (
            "-dirty" if __version__ and __version__.endswith("-dirty") else ""
        )
        tag = DEFAULT_VERSION
        if __version__ and IS_GIT_REPO:
-            match = re.match(r"^(v?([0-9]+(?:\.[0-9]+)*))", __version__)
+            match = re.match(r'^(v?([0-9]+(?:\.[0-9]+)*))', __version__)
            if match:
                tag = match.group(1)
-        replacements["tag"] = tag
+        replacements['tag'] = tag
        output_string = format_string
        for placeholder, value in replacements.items():
-            pattern = re.compile(r"{{\s*" + re.escape(placeholder) + r"\s*}}")
+             pattern = re.compile(r'{{\s*' + re.escape(placeholder) + r'\s*}}')
             output_string = pattern.sub(str(value), output_string)
-        if re.search(r"{\s*\w+\s*}", output_string):
+        if re.search(r'{\s*\w+\s*}', output_string):
             pass # Or log a warning: print(f"Warning: Unreplaced placeholders found: {output_string}")
        return output_string
--- a/pyucc/config/profiles.py
+++ b/pyucc/config/profiles.py
@ -1,7 +1,7 @@
 """Profiles persistence for PyUcc.
-Stores user profiles as JSON in the user's home directory
+Stores user profiles as JSON in the application directory
-(`~/.pyucc_profiles.json`). Each profile is a dict with keys:
+(where the executable is located, `profiles.json`). Each profile is a dict with keys:
  - name: str
  - path: str
  - languages: list[str]
@ -12,9 +12,20 @@ This module exposes simple load/save/manage helpers.
 from pathlib import Path
 import json
 import sys
 from typing import List, Dict, Optional
-_DEFAULT_PATH = Path.home() / ".pyucc_profiles.json"
+# Get the directory where the application is running from
 # If frozen (PyInstaller), use the executable's directory
 # Otherwise use the directory of this module
 if getattr(sys, 'frozen', False):
    # Running as compiled executable
    _APP_DIR = Path(sys.executable).parent
 else:
    # Running as script - go up from config/ to pyucc/ to root
    _APP_DIR = Path(__file__).parent.parent.parent
 _DEFAULT_PATH = _APP_DIR / "profiles.json"
 def _read_file(path: Path) -> List[Dict]:
--- a/pyucc/config/settings.py
+++ b/pyucc/config/settings.py
@ -1,8 +1,19 @@
 from pathlib import Path
 import json
 import sys
 from typing import Dict, Optional
-_DEFAULT_PATH = Path.home() / ".pyucc_settings.json"
+# Get the directory where the application is running from
 # If frozen (PyInstaller), use the executable's directory
 # Otherwise use the directory of this module
 if getattr(sys, 'frozen', False):
    # Running as compiled executable
    _APP_DIR = Path(sys.executable).parent
 else:
    # Running as script - go up from config/ to pyucc/ to root
    _APP_DIR = Path(__file__).parent.parent.parent
 _DEFAULT_PATH = _APP_DIR / "settings.json"
 def _read_file(path: Path) -> Dict:
--- a/settings.json
+++ b/settings.json
@ -0,0 +1,14 @@
 {
  "baseline_dir": "C:\\src\\____GitProjects\\SXXXXXXX_PyUcc\\baseline",
  "max_keep": 5,
  "zip_baselines": false,
  "duplicates": {
    "threshold": 5.0,
    "extensions": [
      ".py",
      ".pyw"
    ],
    "k": 25,
    "window": 4
  }
 }